Platinum Benchmarking: How Ideal Standards Eclipse Marginal Value

12 Apr, 2026

Your on-call rotation is burning your team out. The alerting system fires constantly. False positives, noisy thresholds, engineers waking up at 3am for things that don't matter. Someone proposes raising the alerting threshold. It's a five-minute fix. It wouldn't eliminate every false alarm, but it would kill the obvious noise while still catching real incidents.

Someone else pushes back. That's a band-aid. The real solution is a full overhaul of the alerting strategy: rethink severity tiers, define proper runbooks, add auto-resolution, redesign the whole thing properly. A full overhaul probably would be better. But that's not the right comparison.

This is platinum benchmarking. It's when you compare a proposal to the ideal version of a solution instead of comparing it to what you have now.

Once the ideal becomes the reference point, practical fixes start to look inadequate. The threshold bump isn't being compared to the current system, where people are exhausted and alerts are being ignored. It's being compared to a fully rethought alerting architecture.

The better question is usually "is this better than what we're doing now, and is it worth the cost?" A five-minute threshold change that kills 80% of your false-positive alerts is a huge improvement. But once the ideal becomes the minimum bar, practical fixes can end up getting rejected for failing a test they were never supposed to pass.
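To make the two reference points concrete, here is a minimal Python sketch. The 80% reduction comes from the example above; everything else (100 false alerts a week today, an ideal system that leaves 5, and the function names) is an assumption made up for illustration.

```python
# Illustrative numbers only. Apart from the 80% reduction, every value
# here is an assumption, not a measurement.
BASELINE_FALSE_ALERTS = 100  # assumed false alerts per week today
IDEAL_FALSE_ALERTS = 5       # assumed level after a full overhaul


def gain_over_baseline(alerts_after: int) -> int:
    """False alerts removed per week, compared with the current system."""
    return BASELINE_FALSE_ALERTS - alerts_after


def gap_to_ideal(alerts_after: int) -> int:
    """False alerts still left over, compared with the ideal system."""
    return alerts_after - IDEAL_FALSE_ALERTS


threshold_bump_alerts = 20  # 80% fewer than the assumed baseline of 100

print(gain_over_baseline(threshold_bump_alerts))  # 80
print(gap_to_ideal(threshold_bump_alerts))        # 15
```

Measured against the baseline, the change removes 80 false alerts a week. Measured against the ideal, it falls 15 short. It is the same change either way; only the reference point differs.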

There's an asymmetry here, though. We tend not to hold existing solutions to the same standard we hold proposals to.

Before something exists, it gets judged against everything it's missing. After it exists, it mostly gets judged on whether improving it further is worth the effort.

A team that's been using a shared spreadsheet for six months isn't going to delete it because someone mentions a database would be better. They'd say the spreadsheet works fine. But if that same spreadsheet were proposed as a new solution in a meeting, it would probably get torn apart for not being a database.

A useful way to catch this is the look-back test. Imagine the quick fix has already been applied. The threshold change is live, the alerts have quieted down, the team is sleeping through the night. Now someone proposes the full overhaul. Would you actually prioritise it? Would you pull engineers off other work to spend weeks rebuilding a system that's working well enough?

If the answer is no, the objection to the quick fix probably wasn't about value, because the overhaul doesn't clear the bar on its own merits either. If the answer is yes, great, do both. Ship the five-minute fix now and pursue the overhaul later. The two aren't necessarily in competition. Platinum benchmarking can just make them feel like they are.
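The look-back test can be sketched as a small calculation: treat the quick fix as already shipped, then ask whether the overhaul's remaining gain is worth its cost. This is only a rough sketch. The function name, the "alerts removed per engineering hour" bar, and all the numbers are hypothetical, and a real decision would weigh far more than alert counts.

```python
def look_back_test(alerts_after_quick_fix: float,
                   alerts_after_overhaul: float,
                   overhaul_hours: float,
                   min_alerts_removed_per_hour: float) -> bool:
    """Return True if the overhaul is still worth building once the
    quick fix is live, i.e. judged against the new baseline."""
    # Only the improvement beyond the quick fix counts here.
    remaining_gain = alerts_after_quick_fix - alerts_after_overhaul
    return remaining_gain / overhaul_hours >= min_alerts_removed_per_hour


# Assumed: 20 false alerts a week after the quick fix, 5 after the
# overhaul, 240 engineering hours for the overhaul.
print(look_back_test(20, 5, 240, min_alerts_removed_per_hour=0.5))   # False
print(look_back_test(20, 5, 240, min_alerts_removed_per_hour=0.05))  # True
```

With a high bar, the overhaul fails the test and was never really the alternative. With a lower bar it passes, which is an argument for doing both rather than for blocking the quick fix.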

Next time a quick fix gets blocked, try asking: if this were already shipped, would we still build the alternative?