Greg Lahaye

Platinum Benchmarking: How Ideal Standards Eclipse Marginal Value

Your on-call rotation is burning your team out. The alerting system fires constantly. False positives, noisy thresholds, engineers waking up at 3am for things that don't matter. Someone proposes bumping the alerting threshold. It's a five-minute fix. It wouldn't catch everything, but it would kill the obvious noise while still catching real incidents.
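Concretely, the fix might be one constant and one guard. Here's a minimal sketch, assuming a hypothetical in-house alerting check; every name and number is illustrative, not from any real system:

```python
# A sketch of the five-minute fix: raise the threshold and require
# the condition to persist before paging. Illustrative values only.

ERROR_RATE_THRESHOLD = 0.05   # bumped from 0.01: the one-line change
SUSTAINED_MINUTES = 10        # only page if the condition persists

def should_page(error_rates_per_minute: list[float]) -> bool:
    """Page only when the error rate stays above the threshold for
    the whole window, instead of on any single one-minute spike."""
    window = error_rates_per_minute[-SUSTAINED_MINUTES:]
    return len(window) == SUSTAINED_MINUTES and all(
        rate > ERROR_RATE_THRESHOLD for rate in window
    )
```

A one-minute blip no longer pages anyone; a sustained incident still does.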

Someone else pushes back. That's a band-aid. The real solution is a full overhaul of the alerting strategy: rethink severity tiers, define proper runbooks, add auto-resolution, redesign the whole thing properly. A full overhaul probably would be better. But that's not the right comparison.

This is platinum benchmarking. It's when you compare a proposal to the ideal version of a solution instead of comparing it to what you have now.


Once the ideal becomes the reference point, every practical fix looks inadequate. The threshold bump isn't being compared to the current system, where people are exhausted and alerts are being ignored. It's being compared to a fully rethought alerting architecture.

But the question shouldn't be "is this as good as the best version we can picture?" It should be "is this better than what we're doing now, and is it worth the cost?" A five-minute threshold change that kills 80% of your false-positive alerts is a huge improvement. But once the ideal becomes the minimum bar, practical fixes get rejected for failing a test they were never supposed to pass.
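To put toy numbers on that (all assumed, purely for illustration):

```python
# What an 80% cut in false positives means for a week of on-call.
pages_per_week = 25     # total pages today (assumed)
false_positives = 20    # of which are noise (assumed)

pages_after = pages_per_week - 0.8 * false_positives
print(pages_after)      # 9.0 pages a week, down from 25
```

That's the marginal value. The perceived gap to the ideal system is a different number entirely, and it's the wrong one to decide with.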

[Figure: the status quo, the platinum benchmark, the marginal value of a fix, and the perceived gap to the ideal]

There's an asymmetry here, though: we don't hold existing solutions to the same standard we hold proposals to.

Before something exists, it gets judged against everything it's missing. After it exists, it only gets judged on whether improving it further is worth the effort.

A team that's been using a shared spreadsheet for six months isn't going to delete it because someone mentions a database would be better. They'd say the spreadsheet works fine. But if that same spreadsheet were proposed as a new solution in a meeting, it would get torn apart for not being a database.


A useful way to catch this is the look-back test. Imagine the quick fix has already been applied. The threshold change is live, the alerts have quieted down, the team is sleeping through the night. Now someone proposes the full overhaul. Would you actually prioritise it? Would you pull engineers off other work to spend weeks rebuilding a system that's working well enough?

If the answer is no, the objection was never really about value. If the answer is yes, great, do both. Ship the five-minute fix now and pursue the overhaul later. The two were never in competition. Platinum benchmarking just made them feel like they were.


This is why platinum benchmarking is costly. Teams don't usually reject a quick fix and then go build the ideal version. They reject the quick fix, and the ideal version turns out to be too big to prioritise. So neither gets done, and the original problem drags on longer than it needs to.

The question worth asking is always the same: compared to what we have now, how much better is this, and what does it cost?
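One way to make that question mechanical is a back-of-the-envelope value-per-cost comparison. Every figure below is assumed purely for illustration; plug in your own estimates.

```python
# Toy value-per-cost comparison; all numbers are assumed.
options = {
    "threshold bump": {"hours": 0.1, "noise_pages_removed_per_week": 16},
    "full overhaul":  {"hours": 300, "noise_pages_removed_per_week": 20},
}

for name, option in options.items():
    value_per_hour = option["noise_pages_removed_per_week"] / option["hours"]
    print(f"{name}: {value_per_hour:.2f} pages removed per engineering hour")

# threshold bump: 160.00 pages removed per engineering hour
# full overhaul: 0.07 pages removed per engineering hour
```

On these toy numbers the overhaul removes slightly more noise, but the quick fix is over three orders of magnitude cheaper per page removed, and shipping it first costs the overhaul nothing.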

Most of the time, the quick fix doesn't lose to a better solution. It loses to the conversation about one.