> As a rate limiter in my application I found CPU seconds per second to be much more useful than requests per second.
This is really insightful and makes good sense. I've toyed for years with the idea of building a "Cost Benefit Cache" that basically tracked the request volume for a piece of data, the time to retrieve that data and the size of the data retrieved to make decisions about what to cache during system stress.
The idea was basically a cost benefit formula where the cost is the size of the data, so how much of the available cache space would it use, while the benefit is the seconds of savings from requests * average retrieval time. Requests per second alone doesn't really matter if the request is low/no impact.
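As a rough sketch of that scoring rule (Go, with invented names like CacheStats and Score, not from any actual library):

```go
package main

import "fmt"

// CacheStats tracks what the comment describes for one piece of data:
// how often it is requested, how long it takes to fetch, and how big it is.
// All names here are hypothetical illustrations.
type CacheStats struct {
	RequestsPerSec   float64 // observed request volume
	AvgRetrievalSecs float64 // average time to fetch on a cache miss
	SizeBytes        float64 // how much cache space it would occupy
}

// Score is benefit divided by cost: seconds of backend work saved per second,
// per byte of cache spent. Low-impact requests score near zero no matter how
// frequent they are.
func (s CacheStats) Score() float64 {
	if s.SizeBytes == 0 {
		return 0 // guard against division by zero
	}
	return (s.RequestsPerSec * s.AvgRetrievalSecs) / s.SizeBytes
}

func main() {
	hot := CacheStats{RequestsPerSec: 500, AvgRetrievalSecs: 0.002, SizeBytes: 2048}
	slow := CacheStats{RequestsPerSec: 5, AvgRetrievalSecs: 1.5, SizeBytes: 4096}
	fmt.Printf("hot: %.6f  slow: %.6f\n", hot.Score(), slow.Score())
}
```

In this toy example the 5 rps item outscores the 500 rps one because each of its misses is so much more expensive, which is exactly the point about request rate alone not mattering.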
Google had a library like that, and one of the footguns is that requests which are malformed or otherwise do zero work have incredibly low cost, which results in them being prioritized above all other requests. Every user had to configure some obscure options to set cost multipliers based on return code to try to correct for this, which felt like a pretty fundamental bug that was being hacked around.
Edit: this was actually for cost-based load balancing, not load shedding. You obviously can't load shed based on return code.
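To make the workaround concrete, here's a hedged sketch (Go, names invented, not the actual library's API) of per-return-code cost multipliers applied to a measured cost before it's reported to the balancer:

```go
package main

import "fmt"

type Code int

const (
	OK Code = iota
	InvalidArgument
	Internal
)

// costMultiplier is the kind of "obscure option" each user had to configure.
var costMultiplier = map[Code]float64{
	OK:              1.0,
	InvalidArgument: 50.0, // malformed requests do ~0 work; inflate their cost
	Internal:        10.0,
}

// adjustedCost turns observed CPU seconds into the cost fed to the balancer.
func adjustedCost(cpuSeconds float64, code Code) float64 {
	m, ok := costMultiplier[code]
	if !ok {
		m = 1.0
	}
	return cpuSeconds * m
}

func main() {
	fmt.Println(adjustedCost(0.0001, InvalidArgument)) // cheap failure, cost inflated
	fmt.Println(adjustedCost(0.0200, OK))              // real work, cost unchanged
}
```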
Some things are really hard to appropriately abstract, but there's so much similarity we can't help but be tempted to try.
Something that I wrote at $WORK recently was a data-oriented, exponentially smoothed, reservoir-sampling, order-statistic, randomized splay tree.
All of those components exist individually. Even picking just two of them, though: given a tree implementation, it's trivial (as a human) to create an analogous order-statistic tree. But in any pseudo-mainstream programming language, especially at a system level, how do you tack the "order-statistic" behavior onto an arbitrary tree?
It's not too bad if you additionally require some interface that the primary tree needs to adhere to. That interface is, necessarily, biased toward the needs of an order-statistic tree, though. There isn't actually that much difference between implementing a tree which adheres to the interface and one which just does every operation manually. Plus, that level of abstraction introduces some allocation questions and other inefficiencies (or, alternatively, unsoundness) which you can avoid by writing the whole thing from scratch.
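For illustration only, here's roughly what that "additional interface" can look like in Go (all names hypothetical). Note how much of the real work, maintaining subtree sizes through every insert, delete, and rotation, gets pushed onto the host tree:

```go
package orderstat

// OrderStatNode is what the host tree must expose. Keeping Size() correct
// across every structural change is the host's job, which is exactly the
// bias toward order-statistic needs described above.
type OrderStatNode interface {
	Left() OrderStatNode  // nil if absent
	Right() OrderStatNode // nil if absent
	Size() int            // nodes in this subtree, including itself
}

// Select returns the k-th smallest node (0-indexed), or nil if out of range.
func Select(root OrderStatNode, k int) OrderStatNode {
	for root != nil {
		leftSize := 0
		if l := root.Left(); l != nil {
			leftSize = l.Size()
		}
		switch {
		case k < leftSize:
			root = root.Left()
		case k == leftSize:
			return root
		default:
			k -= leftSize + 1
			root = root.Right()
		}
	}
	return nil
}
```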
Bringing it back to the $GOOG library, the problem is that you have a lot of code which _looks_ like it's the same, but it's impossible to make it actually _be_ the same without heavy lifting for callers of that library.
Taking those monolithic frameworks (e.g., inadvisedly popular rate-limiting shenanigans) and turning them into libraries of helpful utilities often helps. General solutions are harder to come by.
I think the answer is for the library to not be quite so ambitious. This detail isn’t a footgun, bug or hack (to quote the previous comment) if it’s explicitly part of the interface.
The user would need to declare the signals that indicate a request should be deprioritized, or even handle that logic themselves entirely. People will deride this as "more complex", but it's actually less complex than the solution that appears simple but doesn't really work.
After it gets enough usage, you’ll start noticing patterns and you can implement smaller abstractions to make things easier. You could have sensible defaults of codes to ignore, or integrate it with your error logging service.
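A minimal sketch of that less ambitious interface, assuming a Go-style library with invented names: the caller supplies the classification, and the library never guesses from return codes.

```go
package shed

import "time"

// Outcome is what the caller tells the library about a finished request.
type Outcome int

const (
	RealWork Outcome = iota // count its cost normally
	Rejected                // malformed/denied: don't let cheapness skew priorities
)

// Classifier is supplied by the user; the library ships no default guesses.
type Classifier func(statusCode int, latency time.Duration) Outcome

// Limiter accumulates cost only for outcomes the caller says are real work.
type Limiter struct {
	classify Classifier
	cpuSpent time.Duration
}

func New(c Classifier) *Limiter { return &Limiter{classify: c} }

func (l *Limiter) Observe(statusCode int, cpu time.Duration) {
	if l.classify(statusCode, cpu) == RealWork {
		l.cpuSpent += cpu
	}
}

func (l *Limiter) CPUSpent() time.Duration { return l.cpuSpent }
```

A prepackaged Classifier that treats, say, 4xx statuses as Rejected could then become one of those "sensible defaults" once enough real usage shows the pattern.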
When you've got 80,000 engineers you end up with lots of different opinions on what the wire protocol should look like.
* Sometimes, you want to fail closed - an error status is returned for every failure.
* Sometimes, you want to fail open - OK is returned even when something went wrong.
* Sometimes, you want to separate "failure" from "denied", the code "worked", but it didn't do what the client wanted - think "payment declined", so you return "OK", with the actual result in the response.
Compare the different costs of charging a credit card that is expired vs. one that works:
* The expired one can be checked before calling out to the processor - cheap, returns "OK + Declined".
* The success ends up with a call to the processor and takes a second or two, returns "OK + Success".
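A sketch of that third convention in Go (the types and the callProcessor hook are invented for illustration): the transport-level error is reserved for real failures, while "declined" rides in an OK response.

```go
package payments

import "time"

type ChargeResult int

const (
	Success ChargeResult = iota
	Declined
)

type ChargeResponse struct {
	Result ChargeResult
	Reason string
}

// Charge returns (response, nil) for both Success and Declined; a non-nil
// error is reserved for actual failures (processor unreachable, etc.).
func Charge(cardExpiry time.Time, callProcessor func() error) (ChargeResponse, error) {
	// Cheap local check: an expired card never reaches the processor.
	if time.Now().After(cardExpiry) {
		return ChargeResponse{Result: Declined, Reason: "card expired"}, nil
	}
	// Expensive path: a second or two at the processor.
	if err := callProcessor(); err != nil {
		return ChargeResponse{}, err // fail closed: a real error
	}
	return ChargeResponse{Result: Success}, nil
}
```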