Fun weekend project but definitely not production-ready (no tests, no error handling, concurrent requests will cause a race condition, etc.). If readers are looking for something production-ready to use, consider https://github.com/go-redis/redis_rate (which implements GCRA/leaky bucket), or https://github.com/ulule/limiter (which uses a much simpler algorithm, but has good middleware).
> It is capable of handling distributed workloads with its Redis database support
Sounds like this is limited by Redis. For many organizations, this is fine. At my last gig, we used Redis for deduplication and it required over 150 Redis nodes with deterministic routing and consensus. Redis reportedly could support 200k rps per node, but in our case, we wouldn't see it get past around 50k rps no matter how we tuned it.
An interesting addition to this library would be to use an interface and allow your backing datastore of choice, letting teams use Redis, ZooKeeper, an in-mem Go instance of the same library, SQL, etc.
A fun exercise would be to figure out how to make the rate limiting itself distributed so you don't need a single store keeping everything in sync. Maybe a combination of deterministic request routing and a ring topology.
> An interesting addition to this library would be to use an interface and allow your backing datastore of choice, letting teams use Redis, ZooKeeper, an in-mem Go instance of the same library, SQL, etc.
Thanks for the feedback. I'm gonna implement an in-mem Go instance for local dev, but I'm not sure that will be enough to use in prod. Also, in the next release, I will make Redis optional.
RPS meaning reads or writes? What's the distribution of message sizes, and how large is your total dataset? What specs (core count, NIC) did each node have?
I'm asking because without this info, RPS is not a particularly useful metric. As an extreme example, if your dataset is <1MB, you could likely serve read-heavy requests from your SmartNIC's dcache at close to line rate.
It's been a number of years since I worked on it. I can try to answer your questions. The calls were nearly entirely INCR calls against keys that were typically around 150 bytes long and app logic was based on the return values. I believe each node took up around 20GB of memory when saturated. Redis uses a single CPU, so core count shouldn't matter, no? I'm not entirely sure what our NIC situation was, but it was tuned to handle gigabit traffic.
Given most backends use round robin for load balancing, an in-memory counter should be enough. Removing Redis as a downstream dependency is a big win.
For the Redis implementation, there should be a fallback to in-memory counting instead of blocking altogether. Currently Redis is a SPOF for the entire service.
If you're round-robining clients w/o sticky assignment, then you're going to allow nodes*limit consumption. Not correct.
Also, if you give limit/nodes per node and randomly assign a connection, you get correct answers on average, but a really janky pattern at the edge case (a user gets a 429, retries and succeeds, then gets a 429 again as they consume those last few requests).
> Given most backends use round robin for load balancing, an in-memory counter should be enough. Removing Redis as a downstream dependency is a big win.
Thanks for the feedback. Planning to make Redis optional in the next release.
1. Some tests, preferably over the wire, would be nice.
2. redis.go does not seem necessary; it just changes the signature of the Redis client constructor without much difference. Might as well inline its contents.
3. You're using fmt too much. If you don't need runtime variable encoding, you can do something simpler, like writing to w.Write([]byte) directly. fmt uses reflection and runtime type detection; better to avoid it if not needed.
4. Code comments do not follow godoc conventions: they should start with the symbol name. Did you run go fmt and some basic linters?
I really like the idea of getting code reviews from Hacker News for personal projects. There's so much knowledge and passion on here, it could be really useful. It would also help me, as a bystander, to understand the context of these recommendations. I've done a bit of Go, and some of these sound useful to know.
Selfishly, I'd love a documentation review if you (or anyone else) has the time to try out some golang projects I've been working on.
The code is frankly not very polished and not worth reviewing, but I'm curious to hear feedback on the documentation / README and whether or not you clearly understand what these libraries are for and how to use them.
If that's not polished code, then I'm really curious what you consider to be polished!
* It follows Go conventions and has helpful comments and a README file.
* Given it's not user facing prod code, I'm assuming high-performance isn't critical, and the nature of what it's doing means using `once.*` is worth the trade-off with concurrency.
* It also makes sense not to actually test against a database, given that's the whole point of this package.
* The code is well-structured, with "internal" code separated.
I don't think I'm good enough at Go to be able to provide any more useful feedback than that.
Thank you! Just saw this — kind of you to take the time, and I appreciate the compliments.
> Given it's not user facing prod code, I'm assuming high-performance isn't critical, and the nature of what it's doing means using `once.*` is worth the trade-off with concurrency.
I'm impressed you noticed this. I don't do a good job of explaining this in code comments or docstrings, but the linearization / contention is necessary for correctness. Basically, each time someone asks for a new testdb, the code needs to make sure the relevant user exists and the relevant template exists and has had the migrations run on it. If these things don't exist, the test will need to create them. With many tests running in parallel, they need some way to contend and make sure that the user/template is only created once.
Because golang runs the tests of separate packages as totally separate processes, the code does the contention with "advisory locks" inside the Postgres server. When two tests are contending on these advisory locks, they have to hold a connection open to the server. As a result, server speed and max simultaneous connections are the primary limiters of how many tests can be operating in parallel and how fast your test suite runs.
I added the `once.*` helpers to move the contention "left" where possible, to the memory of the packages that are being tested. Within a package, tests can (and should) also run in parallel. The `once.*` helpers force the different tests to contend in the shared process memory, the theory being that it's faster and that it reduces the number of server queries/connections held open just waiting around on a server-side lock.
I haven't actually tested the code with and without this, and thanks to your comment I will try to do this at some point!
Good for you for doing a thing! Please understand the community is likely very wary of single-maintainer, weekend-project, high-risk dependencies right now.
> For production use, fork it and make changes based on your needs.
...instead of the standard "this is work in progress/just a prototype/weekend project, don't use it in production!" disclaimer (which of course won't reliably stop people from using it in production either, but at least they can't say they haven't been adequately warned).
https://stanza.systems is a managed thing that offers this functionality (and a bit more) if y'all are looking for an escape valve away from redis as the coordination mechanism.
Thank you to everyone who provided me with suggestions to make it better. I got a lot of awesome feedback, and I will build/fix things. You guys are awesome.