Why your p99 latency lies to you

Tail latency is the number everyone quotes and almost nobody computes correctly. We put p99 on dashboards, alert on it, and write it into SLOs — and then measure it in a way that quietly removes the worst cases from the sample.

Coordinated omission

The classic mistake is sending a request, waiting for the response, and only then sending the next one. When the system stalls, your load generator stalls with it, so the long pause never produces the flood of slow samples it should have. You measured the system's good behavior and called it the tail.

The fix is to decouple request scheduling from response timing: issue requests on a fixed schedule and record latency against the time the request should have started, not when you got around to sending it.

Measure the thing you promised

A p99 computed over the wrong population is not a conservative estimate — it is a different number that happens to share a name. Before you trust a tail metric, ask what got dropped to produce it.