
Scaling your API with rate limiters - edwinwee
https://stripe.com/blog/rate-limiters
======
ptarjan
Hi! I'm the author of the post and am happy to answer any questions you have.

There is also a corresponding set of code examples at the bottom that might be
of interest to you:
[https://gist.github.com/ptarjan/e38f45f2dfe601419ca3af937fff...](https://gist.github.com/ptarjan/e38f45f2dfe601419ca3af937fff574d)

~~~
b-ryan
I'm curious what your thoughts are on implementing this via a reverse proxy in
front of your services. In terms of implementation it appears to me much
simpler. But it obviously comes at the cost of maintaining the proxy if you
are, say, already using ELBs.

~~~
Psilidae
Before I read this article (having just seen the title) I assumed this was
going to be covering a reverse proxy system, and started pondering how I would
implement something like this myself. A very basic approach probably would be
simpler than middleware, but the question I got stuck on was whether or not a
reverse proxy would need to be "smart" about the content it was limiting. That
is, would it need to be programmed to parse requests and understand what the
intention and purpose was of each request?

For example, limiting traffic by IP address via reverse proxy is simple, but
it seems like it would be more difficult to limit by request priority.

I was surprised when the article revealed it was middleware, and suddenly that
made a lot more sense and seemed easier, because it no longer requires
duplicating application logic to understand the content of requests.
Middleware definitely seems like the better approach to me after these
considerations.

What kinds of methods would one use to solve the problem of needing to parse
and understand incoming requests using the reverse proxy method?

~~~
b-ryan
From the bit I know, I think there are at least some options available. For
example, in nginx I believe you can use the "map" module to mix and match
different components of the request into your limiting.

From what I saw, HAProxy appears to be even more powerful. Its ACLs can create
rules based on headers, source IPs in the request, etc., and you can compose
them into larger ACLs.

With the example of request priority: if you can determine the priority by its
URL or a header, let's say, you can achieve this with nginx. But if you need
to look the user up in a DB and see how much they're paying you, you obviously
have to do it in the application.
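
For what it's worth, here's a rough sketch of what that might look like in
nginx, combining `map` with `limit_req` (the `X-Priority` header name, zone
name, and rates are all invented for illustration):

```nginx
# Sketch only: derive the rate-limit key from a request component.
map $http_x_priority $limit_key {
    default $binary_remote_addr;   # normal traffic: limit per client IP
    "high"  "";                    # empty key = not counted in this zone
}

limit_req_zone $limit_key zone=per_client:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=per_client burst=20 nodelay;
        proxy_pass http://backend;
    }
}
```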

------
Animats
I've put fair queuing into a web API. Requests queue by IP address; if you
have a request in the mill, any further requests are processed behind those
from other IP addresses. This handles single-source overloads very well. It
doesn't require tuning; there are no fixed time constants or rate limits. For
about a month, someone was pounding on the system with thousands of requests
per minute. This caused no problems to other users. I didn't notice for
several days.

One would think that this would be a standard feature in web servers.

~~~
twright0
This is an interesting approach to request rate limiting - the simplicity and
lack of tuning is definitely appealing.

Did you do anything to mitigate the scenario where multiple users are behind
the same IP address? With this approach I would worry about locking out all
users in a NAT when a single user misbehaves.

It's also scary to me to process something "last" - under high enough load
you might never get back to the (potentially briefly) abusive user? Did you
attempt to guarantee some minimum passthrough rate even for misbehaving users?

~~~
Animats
As with fair queuing in routers [1], you have to avoid infinite overtaking. If
someone has a request in progress, their other requests are stuck behind that
one until the request in progress is serviced. Only then is their next request
eligible to be processed.

[1] [https://tools.ietf.org/html/rfc970](https://tools.ietf.org/html/rfc970)
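
A minimal sketch of this scheme (illustrative Python, not my actual
implementation): each client gets its own queue, and a round-robin cycle
serves one request per client, which is what prevents infinite overtaking.

```python
from collections import deque, defaultdict

class FairQueue:
    """Round-robin scheduler over per-client queues (illustrative sketch)."""

    def __init__(self):
        self.queues = defaultdict(deque)  # client_id -> pending requests
        self.ready = deque()              # round-robin order of active clients

    def enqueue(self, client_id, request):
        q = self.queues[client_id]
        q.append(request)
        # A client enters the round-robin only once, no matter how many
        # requests it has queued; a flood from one IP therefore cannot
        # overtake requests from other IPs.
        if len(q) == 1:
            self.ready.append(client_id)

    def dequeue(self):
        """Pick the next request: one per client, cycling through clients."""
        if not self.ready:
            return None
        client_id = self.ready.popleft()
        request = self.queues[client_id].popleft()
        if self.queues[client_id]:
            self.ready.append(client_id)  # client goes to the back of the cycle
        else:
            del self.queues[client_id]
        return request
```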

------
jfroma
We do the same, but instead of using Redis we built our own service for it,
currently on top of LevelDB.

This allows us to define the parameters of the token bucket on the service
instead of in the application, like:

    
    
      'requests':
        per_second: 100
        size: 200
    

Then in the app it's like:

    
    
      conformant = limitd.take('request', ip, 1)
    

I found the token bucket algorithm to be useful in a lot of scenarios, not
only for rate limiting but also for any kind of event debouncing. A common
example: let's say you want to email a user every time they trigger some
condition on the system, but you don't want to send the same mail more than
once a day.

Our project is open source; we're still working on it:

[https://github.com/auth0/limitd](https://github.com/auth0/limitd)
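
The debouncing example can be sketched with a minimal in-process token bucket
(illustrative Python; this is not the limitd API, and all names here are made
up):

```python
import time

class TokenBucket:
    """Minimal token bucket: refills continuously, capped at `size`."""

    def __init__(self, size, per_second, now=time.monotonic):
        self.size = size              # bucket capacity
        self.per_second = per_second  # refill rate, tokens per second
        self.tokens = size            # start full
        self.now = now                # injectable clock, eases testing
        self.last = now()

    def take(self, count=1):
        t = self.now()
        # Refill proportionally to elapsed time, capped at the bucket size.
        self.tokens = min(self.size,
                          self.tokens + (t - self.last) * self.per_second)
        self.last = t
        if self.tokens >= count:
            self.tokens -= count
            return True   # conformant
        return False      # over the limit

# Debouncing: at most one email per user per day (size=1 bucket that
# refills one token per 86400 seconds).
ONE_PER_DAY = 1 / 86400.0
buckets = {}

def should_email(user_id):
    bucket = buckets.setdefault(
        user_id, TokenBucket(size=1, per_second=ONE_PER_DAY))
    return bucket.take(1)
```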

------
jakozaur
Many engineers discover the need for rate limiters the hard way :-).

I just wonder: do you use rate limiters just for your external API, or also
for the internal APIs of your microservices?

~~~
hibikir
A big percentage of our internal services are near-real-time asynchronous
event consumers. Writing a consumer is not all that different from writing a
service. For events and consumers we have mechanisms that serve similar
functions to API rate limiting, but they have a different shape.

So the answer is probably a yes?

------
sinzone
Another way to add fast, distributed rate limiting on top of your APIs is by
using an API Gateway [1] like the open-source Kong [2]. This pattern is
becoming a standard, since it saves you from re-implementing and duplicating
the same rate-limiting logic in each backend microservice.

[1]
[http://microservices.io/patterns/apigateway.html](http://microservices.io/patterns/apigateway.html)

[2] [https://getkong.org/plugins/rate-limiting/](https://getkong.org/plugins/rate-limiting/)

~~~
LeonidBugaev
Yeah, worth mentioning [https://tyk.io](https://tyk.io) too.

Also, do not forget about quotas, which usually come along with rate limits.
Modern API gateways can handle so much stuff for you and help with API
scaling.

------
susi22
FYI: Redis now has a module that does rate limiting for you:
[https://github.com/brandur/redis-cell](https://github.com/brandur/redis-cell)

------
jdwyah
Can't help but put a link to [https://www.ratelim.it](https://www.ratelim.it)
in here. A rock-solid distributed rate limiter was something I desperately
missed when moving to a smaller company.

Primarily I use this for idempotency and throttling things like usage events.
But you can also use it for locking and concurrency control.

------
netinstructions
Reminds me of a website I built once (more like a side project) where I
anticipated heavy loads and all sorts of nefarious users / bad situations so I
spent a good chunk of time implementing and testing rate limiters into most of
the critical requests.

I ended up getting maybe... 3 or 4 well-behaved visitors / day for the first
two years.

------
crbaker
Great article that I expect will assist many teams that are facing scaling
issues.

I thought it appropriate to plug my Java-based rate-limiting library, which
implements the token bucket algorithm mentioned in the article:

[https://github.com/mokies/ratelimitj](https://github.com/mokies/ratelimitj)

------
coolg54321
Very interesting. We have throttling for our customer-facing API built with
PHP and found this library very useful: [https://github.com/sunspikes/php-ratelimiter#throttler-types](https://github.com/sunspikes/php-ratelimiter#throttler-types)

------
ge96
Ignorant guy here:

Wondering how the two words "scale" and "limit" go together. My only
experience right now with someone's API and rate limiting is Cloudinary. They
give you 500 requests/hr (free). Which can be a lot or nothing at all.

No sorry it does make sense, don't allow one person to exhaust/consume your
resources.

What about using Go? I hear crazy stuff like going from 2000 servers to 2, and
doing 25,000 requests per second. Or is this a bandwidth concern?

~~~
zitterbewegung
How would you handle the problem of one API user exhausting all your
resources, so that your servers don't melt? Other than spending an
ever-increasing amount of money expanding capacity, you would need rate
limiting. What if the API call is computationally expensive? What if someone
is trying to maliciously use your API?

~~~
ge96
Yeah, I'm not disagreeing with the author here. It didn't occur to me
immediately what was being done. "Scaling" and "limiting" seem like antonyms,
but I get the point. Especially if the API access is free. Even premium/paid
access would need limits.

For the case here, where the client insisted on keeping the 500/hr limit, I
cached the query-results with a database as the values don't really change.
But if they did change in the future that could be a problem, so you'd have to
update the "cache".

Regarding handling overflow, that's something I haven't done yet myself. I'm
still stuck in the LAMP days and haven't done something like benchmarking my
server to see how many concurrent connections/requests it could handle.

note: LAMP isn't an excuse; I'm just making a note that I'm behind in using
other technology that could be better. But I've heard of modular Apache
configs/routing... stuff beyond me at this point.

~~~
ge96
Man I have yet to experience what people are talking about here. By that I
mean actually have the traffic to worry about requests/sec. I'd be lucky to
get 100 visitors per month haha, I don't really have anything. Good stuff to
look forward to, glad to be able to learn from other people.

------
tofflos
Well written. I liked the "Building rate limiters in practice" section and
your advice on launching dark seems like a great idea.

------
fspear
Great post! However, does anyone know how rate limiting specific users is
typically implemented? For instance, if you have a SaaS API with multiple
subscription plans, you generally rate limit users based on tier, e.g. free
tier: up to X requests; paid tier: unlimited requests; etc. I am assuming this
is typically handled in the API itself.

Thanks in advance.

~~~
ptarjan
Thanks! The buckets are already per-user, so it's very simple to make the
constant factors user-dependent.

For my example in
[https://gist.github.com/ptarjan/e38f45f2dfe601419ca3af937fff...](https://gist.github.com/ptarjan/e38f45f2dfe601419ca3af937fff574d#file-1-check_request_rate_limiter-rb)
you would just set REPLENISH_RATE to a different value for different users.
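
As a sketch of that idea (illustrative Python, not the actual gist code, which
is Ruby): the lookup can be as simple as a per-tier table consulted when the
bucket is replenished. The tier names and rates here are invented:

```python
# Hypothetical per-tier replenish rates, in requests per second.
REPLENISH_RATE = {"free": 5, "paid": 100}
DEFAULT_RATE = 5  # fall back to the free-tier rate for unknown tiers

def replenish_rate_for(user):
    """Return the token-bucket replenish rate for this user's plan."""
    return REPLENISH_RATE.get(user.get("tier"), DEFAULT_RATE)
```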

~~~
fspear
Thanks!

------
rodionos
Bookmarking this so I can go back to it when we have similar issues which are
a function of scale, I assume.

------
cyberferret
This article kind of makes me glad we decided to build our SaaS API on top of
AWS API Gateway. At least I don't have to worry about the fine details of
implementing rate limiting - just outsource that hard stuff to AWS and tuck
our actual API endpoint behind their gateway.

------
ezekg
Does Stripe really allow 1,000 reqs/sec, or was that just an example limit? Is
that normal, i.e. expected, for an API service like Stripe to offer? I usually
see rate limits of 5-25 reqs/sec, but maybe I haven't been looking at the
right APIs?

~~~
nacs
Well, Stripe is handling payments for other companies, so any
unserved/blocked request could cost their clients literal money/customers.
Also, their customers could suddenly get an influx of orders following things
like Super Bowl commercials, so their limits would need to be pretty high.

I'm guessing here, but Stripe would probably only enforce these limits when
they have some infrastructure issue or emergency, and in most cases would
allow all non-abusive uses of their API through.

Twitter and other such services could use much lower rate limits because it's
OK if a user is unable to post a new tweet for a few seconds.

~~~
ezekg
Good point. That's what I figured. For example, a background job running
end-of-month subscription charges for a large co. could potentially slam the
API for a few hours.

------
daok
Mousewheel doesn't work for me on this website

~~~
edwinwee
Could you send me an email with what browser and OS version you're using?
edwin.wee@stripe.com

------
rdegges
I'm a bit torn here. I love Stripe, but I feel like this is awful advice for
API-first companies.

If you are building a developer product where your API is your business, using
rate-limiters is akin to preventing customers from giving you money. If your
product is an API, you should encourage usage. The more usage you get, the
more success both you and your customer will have. It's a win-win.

Because of this, I believe implementing rate-limiting strategies results not
only in poor DX for the product but also in a loss of trust (if these limits
are in place to prevent your backend from failing, what other things do I need
to worry about while using your product?), AND, most importantly, in a loss of
business for both the API company and the customer.

IMO, if you're an API company, and you can't handle bursts of traffic from
your customers, you should work on improving your backend and stop wasting
time messing around with implementing patterns like this. It's a lose-lose
situation for you and your customers.

I was really inspired by this recently after spending some time @ Twilio. Jeff
(and the rest of the Twilio team) are hardcore about their API first product /
thinking. They have a motto which is something along the lines of this: if you
and your customer are both more successful and both make more money when the
API usage goes up, do whatever you can to get out of the way and let the API
be used as much as possible. I thought that was an awesome approach to take.

~~~
jonaf
Sorry, but this is naive and bad advice. Rate limiting is a critically missing
component in many service APIs, for more reasons than I can even
comprehensively enumerate off the top of my head.

Scaling an API of any real value is NOT trivial, and struggling to scale an
API to meet user demand does NOT necessarily mean that the backend was poorly
designed. This is a naive generalization that is hazardous to the industry.
Please don't spread it.

Here are some reasons why a lack of rate limiting / user auth is practically
negligence. There are more, to be sure. I have experience operating a customer
facing API for Bazaarvoice, so I think I know what I'm talking about. (We do
thousands of requests per second and power reviews for the likes of Walmart,
Best Buy, and 4,000 other retailers and brands worldwide.)

* Multi-tenancy
  * client A overextends and causes client B to be unable to use the API
  * client A needs scale independent of clients B-F
* Monitoring
  * suddenly a client is making fewer than usual calls, why?
  * suddenly a client is making more than usual calls, why?
* Billing
  * want more requests/second? Upgrade your contract
  * it's easier to measure how much I should charge customers per request or
    type of request when I can see the rates of those requests and what it
    costs me
* Security
  * DDoS attack? Start by setting the limit to nothing, or rejecting the
    requests
  * leaking API auth info is less dangerous, if it happens

I think some other sibling comments mentioned other great reasons. The
takeaway is that a valuable API will most likely be difficult, expensive or
both difficult and expensive to scale, and rate limiting is extremely
important.

~~~
kpil
Rate limiting, and specifically leaky bucket algorithms that space out
requests evenly rather than servicing them as they happen to come in, was
shown to improve the overall performance of a few systems I worked with.

Using a leaky bucket algorithm and a per-customer bucket, I think it's
possible to build "fair" systems that also improve performance.

That is, you can run the system at a higher total transactions per second,
just by queuing "simultaneous" requests for a few milliseconds, as they will
complete more quickly.

The reason is probably that this reduces contention and levels out resource
usage.
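
A sketch of that spacing behavior (illustrative Python; a per-customer
instance is assumed): the bucket books each arriving request into the next
free slot, so requests are admitted at a fixed interval instead of in a burst.

```python
class LeakyBucket:
    """Leaky-bucket-as-queue sketch: spaces requests evenly over time."""

    def __init__(self, rate_per_second):
        self.interval = 1.0 / rate_per_second  # minimum spacing between requests
        self.next_free = 0.0                   # earliest time the next request may run

    def delay_for(self, now):
        """Return how long an arriving request must wait, and book its slot."""
        start = max(now, self.next_free)       # run now, or at the next free slot
        self.next_free = start + self.interval # reserve the slot after this one
        return start - now
```

Three requests arriving at the same instant are then started 1/rate apart
rather than all at once, which is the leveling effect described above.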

I thought it would be a feature in almost all web servers, since it's been
known "since forever" in the telecom world, but I have not seen it. (I haven't
looked specifically either, so maybe there is good support for this everywhere
and I missed it...)

