100 requests per second? That is not too impressive. Yes, I know Python is their tool of choice, but it is probably the wrong tool for the problem at hand.
There are frameworks that can handle over a million requests per second (for simple JSON output), or at least tens of thousands of requests per second when DB queries are performed (admittedly on different hardware, but just compare the scale).
For simple queries Postgres can do thousands of requests per second on desktop hardware without any serious tuning. Of course if those are complex queries or if they need to handle a lot of data this can be much, much slower. But Postgres is not slow for simple queries.
If you look at these benchmarks a bit more deeply, you'll see that there isn't anything as featured and robust as Django that's more than 2-3 times as fast, which isn't much to justify a full rewrite. Django is at 4.4%, and Spring, for example, is at 9%. The exception would be ASP.NET Core, but I don't know enough about the ecosystem to judge it.
> 100 requests per second? That is not too impressive. Yes, I know Python is their tool of choice, but it is probably the wrong tool for the problem at hand.
I would imagine that even with Python, a fully asynchronous framework would be far better at requests-per-second than Django, but would require sacrificing features, since nothing async is yet at feature parity.
One thing that jumps out at me is that it seems they didn't use any Postgres connection pooling. They mention that going above 100 concurrent connections would bump them into a more expensive Postgres plan, which really only makes sense if you don't use a pool.
My first instinct is that the number of requests seems really low, but I have no idea about the complexity of each request. To me that is crucial information for actually evaluating anything in the blog post.
This is a great point. One of the ideas we were wanting to get across is that using a very simple tech stack you can get to a number of requests that most services will never meaningfully see. We do have our Django configured to pool connections, but it's configured per process, and not sharing them across all the various web processes that we're running: https://docs.djangoproject.com/en/3.2/ref/databases/#persist...
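For anyone unfamiliar, the per-process persistence that link describes comes down to a single setting. A minimal sketch (the database name and host are placeholders, not our actual config):

```python
# settings.py (sketch) -- per-process persistent connections in Django.
# CONN_MAX_AGE reuses a connection within a worker process; it does not
# share connections across processes.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "ads",          # placeholder database name
        "USER": "ads",
        "HOST": "db.internal",  # placeholder host
        "CONN_MAX_AGE": 60,     # seconds to keep an idle connection open; None = unlimited
    }
}
```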
I'd be curious to see if there is a meaningful gain from that approach, if anyone has done the transition before.
That's one part where having multiple completely independent processes is a disadvantage compared to just having threads. I never benchmarked this, but I always heard that Postgres connections are expensive (each one is a backend process) and you should always use a connection pool. But this is probably more noticeable on the DB server side, where each process uses memory. I never used it myself, but PgBouncer might be useful to you in this case.
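If you did put PgBouncer in front, Django mostly just needs to point at it instead of Postgres; a rough sketch, with made-up host and database names (6432 is PgBouncer's default port), assuming transaction-level pooling:

```python
# settings.py (sketch) -- Django talking to PgBouncer instead of Postgres directly.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "ads",
        "HOST": "pgbouncer.internal",  # the pooler, not the DB server
        "PORT": "6432",                # PgBouncer's default port
        # With transaction-level pooling, server-side cursors can't outlive
        # a transaction, so Django should not use them.
        "DISABLE_SERVER_SIDE_CURSORS": True,
    }
}
```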
The number of requests still feels slow, and it isn't clear to me from the blog post whether that is DB limited or CPU limited on the application servers. Even with writes on each access Postgres should still be bored at that kind of load.
Yeah, the IO and CPU utilization of our Postgres instance are around 20% currently. We haven't hit too many issues with Postgres, so we haven't delved too deeply into the performance.
I think people seeing performance posts are used to people pushing huge numbers. Our intent here was to show that using standard off the shelf tools with a tiny bit of architecting, you can hit huge performance numbers at a reasonable cost. We have lots of various ways to shave performance, but we're pretty sure we could scale 2-3x with the same stack without a ton of work.
Postgres is mostly bored (CPU sub 20% utilized) with 60 write requests per second which is normal peak traffic. Even traffic spikes to 100-120 don't usually change that much.
As our ad network grows though, everything has to grow pretty linearly (writes, reads, requests, etc.) and I hope our approach will continue to work well even with double or triple these numbers.
I don't think they need 100 concurrent connections. Assuming each process takes one connection, only a handful of connections are required.
I think there must be some misconfiguration there; otherwise, the throughput should not be that low.
If possible, the author should set up a demo (a contrived query, but similar to the production scenario). From there, others can give valid suggestions or improve the code and configuration.
A blog post we recently published talking about scaling up our operations. I love using standard tools (Django, Python, Postgres) to achieve this. Definitely shows that you don't need fancy tools until you get huge -- and here is a longer version of that story.
Has your experience led you to identify any areas for improvement within Django and/or Python?
(enjoyed the post, and I like your philosophy of using simple widely available tools; I'm hopeful that any changes you're able to provide back will have a kind of community-benefit flywheel effect)
Honestly, not really. A lot of the core issues that we'd hit were solved years ago (e.g. multiple DBs for a read replica, JSON in the DB). I think it's one of the big benefits of using standard "boring" tools: they do most everything we need them to do. We're not anxiously waiting on a new release; we're happily using new features implemented 5 years ago when we need them.
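The read-replica setup, for example, is just a couple of database aliases plus a router; a minimal sketch of the kind of router Django has supported for years (the alias names are made up):

```python
# db_routers.py (sketch) -- send reads to a replica, writes to the primary.
# "default" and "replica" refer to entries in settings.DATABASES.

class ReplicaRouter:
    def db_for_read(self, model, **hints):
        # With several replicas you could pick one at random here.
        return "replica"

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Primary and replica hold the same data, so relations are fine.
        return True
```

Wired up with `DATABASE_ROUTERS = ["path.to.db_routers.ReplicaRouter"]` in settings.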
We've run some tests with PyPy on Read the Docs itself but not for ads. For Sphinx documentation builds, builds took around 50% of the time. It was especially pronounced on builds with large numbers of doc files (hundreds) and therefore complicated side navigation.
Is the difference really that big? TechEmpower's benchmarks [1] show Express being a few times faster than Django, but it's an order of magnitude at most. You're talking about 4 orders of magnitude. Is it something that's not well covered by benchmarks?
It would be nice if Python could get the same amount of effort put into performance as JavaScript has.
But in this case the interpreter speed is largely irrelevant. This is a complex project, and more than half the response time is spent waiting for the database (which is C, not that it matters either).
We hit a few various issues. One of the worst was the instances becoming totally unavailable for random periods of time, and not being able to do anything really to debug or adjust them. We hacked some code to be able to eventually get into a shell on the machine, but that still didn't give us a ton of control.
Generally we've found that having basic control over the hosting infrastructure is important. We still depend on Azure's LBs and other infra for keeping things available, but having the actual instances our code is running on accessible makes a lot of things easier.
Forgive me for my ignorance (I'm not in the SaaS biz), but $300-$500/mo seems eye-wateringly expensive for what is essentially static content + abuse detection and metering.
Less than $100/mo buys you an absolute beast of a VPS at some hosting providers, which alone could handle way more than that.
Interesting read. Have you tried using Waitress[0]?
We got better performance from Waitress than from Gunicorn for an internal service, after trying different Gunicorn configurations to improve the performance of our workloads. We now mostly run (read: depend on) one instance of that service instead of the three we used to.
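For context, a minimal Waitress entry point for a Django project looks something like this (project name and thread count are placeholders, not our actual setup):

```python
# serve.py (sketch) -- running a Django WSGI app under Waitress.
from waitress import serve

from myproject.wsgi import application  # the standard Django WSGI entry point

if __name__ == "__main__":
    # Waitress is single-process and handles concurrency with a thread pool,
    # unlike Gunicorn's multi-process workers.
    serve(application, host="0.0.0.0", port=8000, threads=8)
```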
Maybe I didn't run waitress through as full a test as I should have. In my (admittedly shallow) tests, it didn't offer any significant performance benefits over Gunicorn and I stuck with Gunicorn for the simple reason that the rest of our infra used it. Would you be willing to share a bit more about your configuration that made waitress significantly better for you?
I'm not an expert on async views at all but my hunch is it wouldn't do much. I do think async has some pretty interesting applications though on Read the Docs itself in our small proxy that serves docs which are just static files in s3. Currently it's a stripped down Django setup running in separate process from the main RTD app and it mostly sets headers picked up by nginx/sendfile.
Async views alone might not do much, but ASGI support should mean that request handling is async, which might impact the request rate. Django's core is synchronous[1], though, so who knows.
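To make the distinction concrete, here's roughly what an async view looks like under Django 3.1+/ASGI; the ad-picking logic is a stand-in, not anyone's real code:

```python
# views.py (sketch) -- an async view under ASGI.
import asyncio

from django.http import JsonResponse


async def pick_ad(placement):
    # Stand-in for the real decision work (DB lookups, targeting, budget checks).
    await asyncio.sleep(0)
    return {"id": 1, "placement": placement}


async def ad_decision(request):
    # Under ASGI the event loop awaits this coroutine, so waiting on I/O here
    # doesn't tie up a whole worker the way it would in a sync (WSGI) view.
    ad = await pick_ad(request.GET.get("placement", "sidebar"))
    return JsonResponse(ad)
```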
If you do blocking DB operations before serving ads, latency won't be great. Assuming you don't have to wait for transactions to commit, you can probably push the required operations onto a queue (Redis would be fine for that) and serve the content immediately?
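A rough sketch of that pattern, with a made-up key name and payload (the worker that drains the queue into Postgres is left out):

```python
# sketch -- defer the per-view write by pushing it onto a Redis list from the
# request path; a separate worker pops from the list and writes in batches.
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379)


def record_view(ad_id, request_meta):
    # No Postgres round trip in the hot path -- just an enqueue.
    r.rpush("ad_views", json.dumps({
        "ad_id": ad_id,
        "ts": time.time(),
        "meta": request_meta,
    }))
```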
I think we could probably do it on 1 machine. We have multiple mostly for availability and handling spikes. There's definitely no reason we couldn't use 1 large machine to do this, just not a great reason to run a production app this way.
We do a write on every ad view, so there is a bit more complexity here. Most of those writes should be atomic (eg. each ad will only be viewed once) -- but there is a case where an ad might get viewed, then clicked, and we need the view data to know if the click is valid.
Just an aside, but if the primary goal was to increase capacity (transactions per second), actually performing a write on every ad view would be the first thing I'd seek to eliminate. Is it possible to keep the details as a class/object in RAM, update an in-memory array or cache, and do a bulk write to the DB once every, say, 5 minutes? Even if you lost 0.5% of your data (and I would expect your actual losses would be much lower), we're talking ad clicks, not bank transactions. Eliminating the DB write and confirmation on every request, especially over the network, could easily speed up responses and therefore capacity by 5x or more.
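A minimal sketch of that idea, with a hypothetical `AdView` model and an arbitrary flush interval (error handling and graceful shutdown omitted):

```python
# sketch -- buffer view events in memory and flush them to Postgres in bulk.
import threading
import time

_buffer = []
_lock = threading.Lock()


def record_view(ad_id):
    # Called in the request path; no DB round trip here.
    with _lock:
        _buffer.append(ad_id)


def _flush_forever(interval_seconds=5.0):
    from myapp.models import AdView  # hypothetical Django model
    while True:
        time.sleep(interval_seconds)
        with _lock:
            pending = _buffer[:]
            _buffer.clear()
        if pending:
            # One bulk INSERT per interval instead of one write per request.
            # A crash loses at most one interval's worth of events -- the
            # trade-off discussed above.
            AdView.objects.bulk_create([AdView(ad_id=a) for a in pending])


threading.Thread(target=_flush_forever, daemon=True).start()
```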
We may have to eventually do that. It also might make more sense for handling requests across continents. Every 5 minutes still doesn't seem frequent enough, but presumably the same approach could be used to write every 5-10s.
What we have considered is not doing any synchronous writes and just queuing it up and handling it async. There's definitely some questions about whether our current approach will scale 10x but it should be fine for the next 2-3x.
Just to shed a bit more light, we break things up into figuring out which ad to show and then handling when that ad is actually seen. The second part is mostly async already but the first part is the harder part. You want to choose the best ad for the content, the geographic targeting needs to match, and the ad campaign has to have budget left on the hour/day/total. If we only checked the budget every 5 minutes, that would be a problem. At some point, multiple servers need to know that there's still budget on a campaign and this requires some amount of synchronization.
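For illustration, one way to get that synchronization is to make each spend a single conditional UPDATE in Postgres, so every server sees the same remaining budget; a sketch with a hypothetical `Campaign` model:

```python
# sketch -- atomically reserve budget for one impression; returns False when
# the campaign is out of budget. Model and field names are hypothetical.
from django.db.models import F

from myapp.models import Campaign


def try_spend(campaign_id, cost):
    # Compiles to roughly:
    #   UPDATE campaign SET budget_remaining = budget_remaining - cost
    #   WHERE id = %s AND budget_remaining >= cost
    # Postgres row locking makes this safe when many servers call it at once.
    updated = Campaign.objects.filter(
        id=campaign_id,
        budget_remaining__gte=cost,
    ).update(budget_remaining=F("budget_remaining") - cost)
    return updated == 1
```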
Sure, I don't know your exact business; those numbers were just guesses. Still, if you are doing half of your stated capacity (50 requests per second), then a single database write every 5 seconds reduces that write traffic by 249/250. Much of it depends on how often the data needs to be read.
This is a common pattern, however. The easiest thing to set up is to do a customer database query for every request, while the far more efficient option, in many, many cases, is to cache the 1000 most-likely queries and serve up static or pre-rendered data. Not cache like one update per day, but perhaps one update per minute. Only you and your team can decide whether it would cost more to serve up a few free ads (over a customer's budget) because the cache was a minute or two old, vs doubling/tripling your infrastructure costs in order to make the system more precise. Sounds like you've got a good roadmap for making that decision.
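A sketch of that kind of short-TTL cache using Django's cache framework (the key shape and render function are made up):

```python
# sketch -- cache a pre-rendered ad decision per (placement, country) for a
# minute, bounding staleness to the 60-second timeout.
from django.core.cache import cache


def render_ad(placement, country):
    # Stand-in for the expensive query / targeting / rendering work.
    return {"placement": placement, "country": country, "ad_id": 1}


def cached_ad(placement, country):
    key = "ad:%s:%s" % (placement, country)
    # Serve the cached decision if it's under a minute old; otherwise compute
    # it once and store it for subsequent requests.
    return cache.get_or_set(key, lambda: render_ad(placement, country), timeout=60)
```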
I hate it when HN'ers just chime in and say "you're shit, why didn't you do this?"
So hopefully I'm not being that totally rude person (no shade intended) when I point out that, if for some random reason you haven't seen ClickHouse, you should check it out.
It solves problems that I parsed from your blog post and was originally built for exactly that use case.
100 reqs/s for an ad network is super duper mega tiny, so I hope you're successful and get bigger!
I was asked that question a lot. It’s true that a month after the Java app is running, it’ll have served more requests than the Python app, but, by then, the Python app will be on its fifth feature release.
Well... now that you ask, I've enjoyed using [Rust Tide](https://docs.rs/tide/0.16.0/tide/), because it supports async with response objects so you can build the response across async boundaries, unlike [actix](https://www.arewewebyet.org/topics/frameworks/#pkg-actix-web). I've found Diesel to be the lingua franca of ORMs right now, but a bit rough to use. Anyway, https://www.arewewebyet.org/ gives you the full compendium. My experience is that there isn't a soup-to-nuts framework in Rust like Django, but you can get there.
There are frameworks that can handle over a million requests per second (for simple JSON output), or at least tens of thousands of requests per second when DB queries are performed (admittedly on different hardware, but just compare the scale).
https://www.techempower.com/benchmarks/#section=data-r20&hw=... https://www.techempower.com/benchmarks/#section=data-r20&hw=...
I think, if performance is a strong requirement, they would have been better off with another programming language.