And this reminds me of another story. I can't find the link, but it went something like this:
A game developer was making a PlayStation game and they were over their memory limit. They were approaching a deadline but couldn't cut anything else from the game to make it fit in memory (or on disk, I can't remember which). Then a senior dev came by, changed the code in a minute, and everything fit.
The thing was that at the start of each project he had declared a 2 MB variable that did nothing, so when every optimisation had been done and the game still didn't fit, he could just remove that variable and free up some more space.
There’s an episode of “Star Trek: The Next Generation” where Scotty, the engineer from the original Shatner-era series, tells the next-century engineer LaForge that this is how it’s done: you never tell the captain all you’ve got, so you always have something left to squeeze out at the fatal moment.
This is sound advice from both technical and nontechnical perspectives. I like to use this Express middleware in Node to inject semi-realistic delays into my mock data: https://github.com/boo1ean/express-delay
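If you'd rather not pull in a dependency, a hand-rolled equivalent is only a few lines. This is just a sketch of the same idea in plain Express; the route and the 200-500 ms window are made up for illustration, not taken from express-delay itself:

  // Delay middleware: hold every request for a random interval so
  // mock responses feel like they came from a real backend.
  const express = require('express');
  const app = express();

  function randomDelay(minMs, maxMs) {
    return (req, res, next) => {
      const waitMs = minMs + Math.random() * (maxMs - minMs);
      setTimeout(next, waitMs);
    };
  }

  app.use(randomDelay(200, 500)); // pretend the backend needs 200-500 ms
  app.get('/api/data', (req, res) => res.json({ ok: true })); // instant mock
  app.listen(3000);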
A lot of people are building big systems these days, and in many cases we have a guess at what a reasonable amount of time is for each step in the process if we want an answer in 600 ms.
While your trick makes you look good, setting the fake delays to match the budget might be more honest. And when the app slows down, you can point at the people whose step takes 250 ms when we agreed on 100 ms.
Really? Our 95th percentile only goes above 1s when we are having problems, and nobody with any power in the company thinks that's good enough. Think about how much hardware capacity you need for a site getting even hundreds of requests per second. If you can halve the p95, you can decommission or reallocate close to half of your servers.
As several other people on HN have pointed out more eloquently, it's the variability that kills you faster than the average throughput.
The 100ms was not about end-user response times; it refers to internal response times between servers. To build a page in 1 second you can't have three different services each taking 700ms to respond, even if you can make all three calls in parallel. And if you have to call a bunch of them sequentially, the 75th or even the 95th percentile for those services needs to be pretty good, otherwise the 95th percentile for the entire interaction will be very spiky.
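A quick Monte Carlo sketch (all the numbers here are invented for illustration) shows how the tails compound when you chain calls sequentially:

  // Each fake service answers in 80 ms 95% of the time and 400 ms
  // otherwise, so its own p95 is 80 ms. Chain five of them
  // sequentially and the p95 of the total jumps well past 5 * 80 ms.
  function sampleServiceLatency() {
    return Math.random() < 0.95 ? 80 : 400;
  }

  const trials = 100000;
  const totals = [];
  for (let i = 0; i < trials; i++) {
    let total = 0;
    for (let s = 0; s < 5; s++) total += sampleServiceLatency();
    totals.push(total);
  }
  totals.sort((a, b) => a - b);
  console.log('p95 of the chain:', totals[Math.floor(trials * 0.95)], 'ms');

With those made-up numbers, roughly one request in four hits at least one slow hop (1 - 0.95^5 is about 23%), so the chain's p95 lands around 720 ms even though every individual service looks fine on its own dashboard.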
Lesson learned: make demo functions slower on purpose so the real thing matches or beats them. It's not even deceptive: you're setting realistic expectations instead of giving a false impression.
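In the spirit of the 2 MB variable, one way to do that is a configurable padding wrapper you can quietly dial down once the real implementation lands. Everything here (the names, the 300 ms figure) is hypothetical:

  // Wrap a mock in artificial padding so the demo sets expectations
  // the real service can comfortably beat later.
  const DEMO_PADDING_MS = 300; // the time equivalent of the 2 MB variable

  function withPadding(fn, paddingMs = DEMO_PADDING_MS) {
    return async (...args) => {
      const result = await fn(...args);
      await new Promise((resolve) => setTimeout(resolve, paddingMs));
      return result;
    };
  }

  const fetchReportMock = async (id) => ({ id, rows: [] }); // answers instantly
  const fetchReport = withPadding(fetchReportMock); // but callers see ~300 ms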