Hacker News

trb, yesterday at 10:29 PM

Considering metrics other than p99 for user impact is unwise. Every user will at some point experience a <1% request. It's not as if half of all users only send requests that fall under your median latency; some of their requests will hit your worst case.

By focusing on the tail and optimizing worst cases you help users more than by improving your median latency.
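To put rough numbers on that: a minimal sketch, assuming each request independently has a 1% chance of landing in the slowest 1% (real traffic is usually correlated, so treat this as an approximation). The function name `prob_hits_tail` is made up for illustration.

```python
def prob_hits_tail(n_requests: int, tail_fraction: float = 0.01) -> float:
    """Chance that at least one of n requests lands in the slowest tail.

    Assumes requests are independent, which is a simplification.
    """
    return 1.0 - (1.0 - tail_fraction) ** n_requests


if __name__ == "__main__":
    # How quickly "rare" tail latency becomes something most users see.
    for n in (1, 10, 100, 500):
        print(f"{n:>4} requests: {prob_hits_tail(n):.1%} chance of a >p99 request")
```

Under that assumption, a user who sends 100 requests has roughly a 63% chance of seeing at least one request slower than p99.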


Replies

the8472, today at 10:05 AM

If your frontend fires hundreds of requests (which isn't uncommon), then the p99 is merely what most users will experience. Ideally you want a cumulative distribution chart that goes up to the max. And even then, that only covers the requests you measure. If something takes too long, the user might do something that cancels the request, which means the backend never completes its response and never records a time-to-response sample, so you need to account for dropped requests too.
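One way to account for dropped requests, as a rough sketch: keep cancelled requests in the denominator of the cumulative distribution so they never count as completed. The `None` marker, the function name, and the sample data below are all hypothetical, and this assumes the client (not the backend) records every request it issues.

```python
from typing import Optional, Sequence


def fraction_completed_within(
    latencies_ms: Sequence[Optional[float]], threshold_ms: float
) -> float:
    """Fraction of all issued requests that completed within threshold_ms.

    None marks a request that was cancelled or dropped (hypothetical
    convention): it counts toward the total but never as completed.
    """
    if not latencies_ms:
        raise ValueError("no requests recorded")
    completed = sum(
        1 for x in latencies_ms if x is not None and x <= threshold_ms
    )
    return completed / len(latencies_ms)


# Made-up samples: six completed requests and two cancelled ones.
samples = [12.0, 15.0, 20.0, 35.0, 80.0, 250.0, None, None]

if __name__ == "__main__":
    for threshold in (20, 100, 250, 1000):
        share = fraction_completed_within(samples, threshold)
        print(f"<= {threshold:>4} ms: {share:.1%} of issued requests")
```

With cancelled requests included, the curve plateaus below 100% even at the maximum observed latency, and that gap is the share of requests the backend-side latency samples never see.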

https://www.youtube.com/watch?v=lJ8ydIuPFeU
