I've been lucky enough to have benefitted from Sqlalchemy and Mako (but it's been awhile). Thanks.
This article looked like it was going to hit the sweet spot of stuff I'm curious about, but I found I was still left with questions. If you (or anyone) will indulge me... I'll try and ask a question to help clarify matters.
I work at a University on legacy ERP system(s). During registration there are 800+ concurrent connections but normally it floats around 200. Most all of these connections are idle. As you pointed out, a db may not be io/context bound (still hazy on that one). At the end of the day I consider myself technically astute, but basically a crud business programmer. I understand ACID and transactions; threads and async, maybe not so much.
Where I've always thought async could provide benefit would be in the following scenario. Our apps make a large # of procedural db calls today. If after studying them I realize that many are independent (i.e. reads) and could be 'batched' could that not provide a big performance/latency improvement? I.e. instead of the serial sequence of calls that happen now (even if a stored proc), async allows me to submit multiple sql calls. What I'm calling batch. In this ideal world, sqlalchemy would take care of the details (perhaps with some guidance directives as to whether ordering of results was important) and assemble the results.
Is this not a possible future 'async' sqlalchemy with superior responsiveness? Don't threads block on each sql request?
The rudimental SQLAlchemy-emulation system within aiopg right now can probably accommodate this use case but it is Postgresql specific. "Legacy ERP system" sounds like there's some different database in play there, if you are relying upon closed-source drivers you'd have to get access to a non-blocking API within them. Else you're stuck with threads.
acveilleux's point about caching here is very relevant and I thought of that also. if these are indeed independently read sets of fairly non-changing data, pulling it from a cache is the more traditional approach to reducing latency (as well as the need for 800+ database connections).
This lets you parallelize work across multiple processes while maintaining consistency. When the client is the limiting factor you can use this with multiprocessing. When the db server is the limiting factor you can just use threads (or async.) Postgres backend processes are essentially single threaded.
That depends, do you want to save (some) development time or do you want to save (a lot of) money?
I did some work for a client some time ago that were expecting a lot of read load. Their backend had a bunch of constantly changing data in mongo - but it only refreshed every 10 seconds. I told them initially to just output the aggregated data to an S3 object and have all the clients access it from there. They decided to run loads of servers instead, they were muttering something about AWS Autoscale (even though I told them that wouldn't help).
As expected, I got a call one Friday evening asking if I could take a look at why their servers were timing out. When I got there, there were about 15 frontend servers hammering 1 poor mongo box that was aggregating the same query again and again - and within any 10 second window always getting the same result. I stripped it down to 1 frontend box with an nginx cache (after jumping through a hoop to support jsonp).
After the dust settled they apparently didn't want to admit that it was something that could just be solved with caching so it was described as a configuration issue to the business.
1. Not critical to the logic of your app and thus should be fairly obvious candidate for caching.
2. Critical to the logic of your app and quite possibly done within a specific DB connection / transaction for locking reason.
Trying to be too clever and parallel often flounders on mundane things like running out of DB connections.