I've built three distributed job systems at this point. A handy rule of thumb which I have promoted for years is "build for 10x your current scale."
If you need to handle 70 requests/second, design for 700. If you need to handle 20 servers running batch jobs, design for 200 servers. If you're in a startup that grows 100% per year, you'll be at 8x scale in 3 years. So you'll have time to rewrite as you grow!
Out of three job systems I built, the first one tried to avoid SQL for "scalability." Then we hit a bunch of edge cases that needed transactional integrity, and we were in a world of pain.
My two more recent distributed job systems use PostgreSQL as a coordinator. They're literally built around SELECT FOR UPDATE SKIP LOCKED. One of them routinely controls 350 workers and the other does elaborate prioritization for thousands of jobs. Both of them will continue to run just fine until they're earning millions of dollars a year—for example, the one controlling 350 workers should scale to about 2,000 CPUs with a little work.
Hyperscale technologies are fairly cheap right up until you discover you need transactions. At that point, faking transactional semantics on top of an eventually consistent data store becomes an engineering nightmare.
So sit down, and do the math. If your company was earning $100 million/year, how big would your distributed system need to be? Can you easily get a PostgreSQL instance big enough to handle that load? (Or could you just shard per client?) If so, strongly consider using PostgreSQL. It makes a hundred things trivial.
Similar experience here. Multiple times, I've pushed an SQL-based queue a couple orders of magnitude past the scale where others say SQL craps out and a distributed solution is an absolute requirement. And the SQL solution is typically simpler, requires fewer compute resources, and is easier to support in production.
But, to make it work, you've got to know the database well enough to know that things like SELECT FOR UPDATE SKIP LOCKED exist in the first place. Which is a kind of knowledge that's getting quite rare these days, because more and more engineers grow up never having known a world where they aren't walled off from their DBMS's true capabilities by a heavyweight ORM.
> Multiple times, I've pushed an SQL-based queue a couple orders of magnitude past the scale where others say SQL craps out and a distributed solution is an absolute requirement.
What about availability, though? The distributed solution is also useful to avoid downtime in case of single node failure.
Is there an off-the-shelf solution that lets me do that with Postgres? I know the newest version (16) just added active-active replication, but I wouldn’t know how to use that to achieve resilience.
But if you're using Postgres as your queuing system because you're already using it as your core database technology for your app, you've got the same issue. If your single Postgres instance is down then your app is, too, and won't be enqueuing more jobs.
And unless your jobs are trivial then it's highly likely that they interact with your app in some way so it doesn't really matter if your workers are distributed and up, they're not able to complete their work because your app is down because of a single-node Postgres.
What you're pointing out is an architectural constraint that's unrelated to how and where one queues jobs.
If background jobs need to be available while some other core application is down, that needs to be designed for, and that design can be achieved with any queue technology. Simply separate the queue system stack from the core application system stack.
> But if you're using Postgres as your queuing system because you're already using it as your core database technology
Note your own use of "database technology" and not "database server". It's common to have separate application and queue database servers when such an architectural constraint is present. Of course, this sacrifices the benefit of transactional guarantees when the application and background jobs run on the same server.
Like I said in the post, technology (and architectural) choices are tradeoffs all the way down :)
What the article writes about scalability also applies to availability. Does the queue need 99.999% or 99.9999% uptime? Or is the Service Level Objective actually 99.99%, 99.9% or even 99.5%?
With 99.99% you can have 4 minutes of downtime a month. If failover to a hot standby takes a minute then that shouldn't be a problem to achieve a 99.99% uptime SLO.
I don't know PostgreSQL as well. I mostly use MSSQL in production, and it's had good replication for ages, so I just wasn't really that worried about single-node failure for the critical stuff.
And, frankly, even for the less-critical stuff that was only running on a single node, I still dealt with fewer availability problems back in the day than I do now that everything's gone distributed. I think that a thing that's been forgotten over the years is that a lot of this stuff that distributed systems do to be reliable was more about digging oneself out of the hole that was created by running on lots of cheap hardware instead of using a single server with redundancy built-in. I acknowledge that, past a certain scale, that's the only option that makes sense. But if you're not operating at that scale then there's a good chance it's all just verschlimmbessern (German for "making something worse by trying to improve it").
Design (and test) for 10x your current scale, build for what you need now. The system has to be able to handle peak loads, and if you don't know what those are, build in a safety margin or a way to shed or defer work if you need to.
Everything is a tradeoff, optimize for the things that need optimizing, and determining what those are is the hallmark of a good engineer.
I've found the same. You should understand what your 10x / 100x growth solution would look like (assuming that that's relevant - obviously if you have no intent to hit that scale, don't bother). Build your system to handle your 1-1.5x, maybe 10x scale, and make sure you're not blocking the 10-100x solutions by doing so.
"faking transactional semantics on top of an eventually consistent data store becomes an engineering nightmare"
PREACH
If this is news to you, you just justified a week's worth of browsing HN on the clock reading this statement. Scribble it on a scrap of paper and keep it taped to your ATM card.
Advisory locks are slower to run, but when you keep the Postgres connection open you know the job is still running, while with FOR UPDATE SKIP LOCKED you'd basically need a status column and a job_timeout.
So pg_try_advisory_lock/pg_advisory_unlock can hold a lock across transactions while FOR UPDATE SKIP LOCKED can't; thus you'd either need to keep a transaction open or use status+job_timeout (and in Postgres you should not use long transactions).
We use C#, but we looked into https://github.com/que-rb/que which uses advisory locks. Since our jobs take anywhere from 1 minute to 2 hours, it was a no-brainer to use advisory locks. It's just not the best thing if you have thousands of fast jobs per second, but for a more moderate queue, say 10,000 jobs per minute/10 minutes/30 minutes where each takes 1 minute to 2 hours, it's fine.
We also don't delete jobs; we don't care about storage since the job table doesn't take up much. And we have a lot of time to catch up at night since we're only in Europe.
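For anyone curious, a minimal sketch of the session-scoped calls involved (the job id 42 is a stand-in; this isn't que's actual implementation). The lock lives with the connection rather than with a transaction, so a worker can hold it for the full two hours without keeping a transaction open:

    -- try to claim job 42; returns true iff no other session holds the lock
    SELECT pg_try_advisory_lock(42);
    -- ... run the job; no transaction needs to stay open ...
    -- release when done (the lock is also freed if the connection drops)
    SELECT pg_advisory_unlock(42);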
Here's my current favorite recipe for building complex job systems on PostgreSQL. I'm not thinking about "send an email"-type jobs, but bigger jobs that do complex tasks.
The usual trick I use is to have a `jobs.state` field containing "pending", "running", "done", or "error" (or whatever that job system needs). I only hold SELECT FOR UPDATE SKIP LOCKED long enough to:
1. Transition from "pending" to "running". Or a second time, to transition from "running" to either "done" or "error".
2. Store the current worker ID (often the Kubernetes pod name).
Then, I can build a watcher that wakes up every 5 minutes, looks for "running" jobs with no corresponding Kubernetes pod, and marks them as "error". I try to never hold a lock for more than a second or two, and to never lock more than one job at once. This gets me 80% of the way there.
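To make that concrete, here's a rough sketch of the claim step (column names like state and worker_id are assumptions, not a fixed schema; $1 is the worker ID bind parameter):

    -- claim exactly one pending job; the row lock lasts only for this statement
    UPDATE jobs
    SET state = 'running', worker_id = $1
    WHERE id = (
      SELECT id FROM jobs
      WHERE state = 'pending'
      ORDER BY created_at
      FOR UPDATE SKIP LOCKED
      LIMIT 1
    )
    RETURNING id;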
The reason I don't hold a transaction open for the entire job is because every transaction requires a PostgreSQL connection, and connections are surprisingly expensive. In fact, you may want to run connections through pgbouncer or a stateless REST API to avoid holding open hundreds or thousands of connections. Everything except PostgreSQL itself should ideally be stateless and restartable.
You might also have a retry system, or jobs that recursively queue up child jobs, or jobs that depend on other jobs, or more elaborate state machines. You might have timeouts. Most of these things are solvable with some mix of transactions, some SQL, CREATE INDEX or CREATE VIEW. A database gives you so many things for free, if you're just a little careful about it.
And since Grafana supports SQL, you can easily build really nice dashboards for support and ops by just querying the tables used by the job system.
There are other ways to do it! But I'm fond of this general strategy for coarse-granularity jobs.
IMO the main issue with it: advisory locks in Postgres require an open connection to be held the entire time the lock is taken. Combine that with a thread-per-connection model...
For several projects I’ve opted for the even dumber approach that works out of the box with every ORM/Query DSL framework in every language: using a normal table with SELECT FOR UPDATE SKIP LOCKED
I've done something even simpler, without locks (and no transaction logic): I select a row, then try to update a field marking it as taken. If 1 row is affected, it's mine. If 0, someone else got it before me and I select a new row.
I've used this for tasks at big organizations without issue. No need for any special deployments or new infra. Just spin up a few worker threads in your app. Perhaps a thread to reset abandoned tasks. But in three years this never actually happened, as everything was contained in try/catch that would add it back to the queue, and our Java app was damn stable.
PSA: This is a read-modify-write pattern, thus it is not safe under concurrency unless a transaction isolation level of SERIALIZABLE is specified, or some locking mechanism is used (select for update etc).
The part about checking the number of affected rows hints at using `UPDATE ... WHERE ...` which should act as an atomic CAS regardless of isolation level.
Edit: To clarify, I mean `SELECT id WHERE used = 0` followed by `UPDATE ... SET used = 1 WHERE id = ... AND used = 0`
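In other words, something like this hypothetical pair (table and column names invented for illustration; :id is a bind parameter):

    -- optimistic read; may race with other workers
    SELECT id FROM tasks WHERE used = 0 LIMIT 1;
    -- atomic compare-and-swap: only one worker will see rows affected = 1
    UPDATE tasks SET used = 1 WHERE id = :id AND used = 0;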
I don't get it :(. Why could the same task be executed more than once? From my understanding, if the UPDATE is atomic, only one worker will be able to set `used = 1`. If the update statement is not successful (affected != 1), then the worker should drop the task and do another select.
With a transaction isolation level below SERIALIZABLE you can have two transactions that both read the old row (with `used = 0`) at the time they perform the update (but before they commit the transaction). In that case, both transactions will have performed an update (rows affected = 1).
Why would both transactions see `used = 0`? The DB server tries to isolate transactions and actively hides effects of other transactions that have not committed yet.
This is not true in postgres. When the second transaction tries to update the row, it will wait for the first transaction to commit first and then recheck the WHERE.
This should be safe under SI (other than the ABA issue, which isn't even fixed with serializable). The update forces a W-W conflict, which is sufficient to make the behavior serializable under SI (and therefore, I think but am not sure, PG's RR level too).
I guess you update it with the assigned worker id, where the "taken by" field is currently null? Does it mean that workers have persistent identities, something like an index? How do you deal with workers being replaced, scaled down, etc?
Just curious. We maintained a custom background processing system for years but recently replaced it with off the shelf stuff, so I'm really interested in how others are doing similar stuff.
No, just UPDATE ... SET taken=1. If the row changed, you claimed it. If it didn't, someone updated it before you.
Our tasks were quick enough so that all fetched tasks would always be able to be completed before a scale down / new deploy etc, but we stopped fetching new ones when the signal came so it just finished what it had. I updated above, we did have logic to monitor if a task got taken but never got a finished status, but I can't remember it ever actually reporting on anything.
I would set the taken field to a timestamp. Then you could have a cleanup job that looks for any lingering jobs aged past a reasonable timeout and null out the field.
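A sketch of that cleanup, assuming a taken timestamp and a done flag (both invented names):

    -- reset claims older than a reasonable timeout so the work gets retried
    UPDATE tasks
    SET taken = NULL
    WHERE done = false
      AND taken < now() - interval '15 minutes';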
We have a "status flag" column which is either Available, Locked or Processed (A, L and P), an Updated column with a timestamp of when it was last updated, and a Version counter.
When grabbing a new message it selects "Available or (Locked with Updated timestamp older than configured timeout)". If successful it immediately tries to set the Locked status, Updated timestamp and bumps the Version counter, where the previous values of Status and Version has to match. If the update fails it retries getting a new message.
If the Version counter is too high, it moves the message to the associated dead-letter table, and retries getting a new message.
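Roughly, the guarded grab looks like this (column names assumed; zero rows affected means another consumer won the race and we fetch a new candidate):

    UPDATE messages
    SET status = 'L', updated = now(), version = version + 1
    WHERE id = :id
      AND status = :seen_status   -- the 'A' or timed-out 'L' we selected
      AND version = :seen_version;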
This isn't for high performance. I tested it and got 1000 messages/sec throughput with a handful of producers and consumers against a test db instance (limited hardware), which would be plenty for us.
I wrote it to be simple so we could easily move to something AMQP-ish like RabbitMQ or Azure Service Bus when needed. Overall quite easy to implement, and it has served us well so far.
It won't work with a timestamp, because each write will report one affected row (the writes happen at different times); setting a boolean is static.
That is the sort of thing that bites you hard when it bites. It might run perfectly for years but that one period of flappy downtime at a third party or slightly misconfigured DNS will bite you hard.
But compared to our rabbit setup where I work now, it was dead stable. No losing tasks or extra engineering effort on maintaining yet another piece of tech. Our rabbit cluster acting up has led to multiple disasters lately.
Agreed, I've had my own rabbit nightmares. But setting up a more robust queue on postgresql is easy, so you can easily gain a lot more guarantees without more complexity.
I've done this successfully with a web service front that retrieves jobs to send to workers for processing, by using a SQL table queue. That web service ran without a hitch for a long time, serving about 10 to 50 job consumers for fast and highly concurrent queues.
My approach was:
- Accept the inbound call
- Generate a 20 character random string (used as a signature)
- Execute a SQL query that selects the oldest job without a signature and writes the signature, returning the primary key of the job that was updated.
- If it errors for any reason, loop back and attempt again, but only 10 times, since at that point some underlying issue must exist (10 collisions is statistically improbable for my use case)
- Fetch the job by the primary key returned by that SQL query, comparing its signature to my random one.
- If a hit, return the job to the caller
- If a miss, loop back and start again, incrementing attempts by 1.
The caller has to handle the possibility that a call to this web service won't return anything, either due to no jobs existing, or the collision/error threshold being reached.
In either case, the caller backs off for its configured time, then calls again.
Callers are usually in 'while true' loops, only exiting if they get an external signal to close or an uncontrolled crash.
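A sketch of the claim-and-verify steps (schema invented for illustration; :sig is the random signature):

    -- stamp the oldest unclaimed job with this call's random signature
    UPDATE jobs
    SET signature = :sig
    WHERE id = (
      SELECT id FROM jobs
      WHERE signature IS NULL
      ORDER BY created_at
      LIMIT 1
    )
    RETURNING id;
    -- then re-read that job and hand it out only if its signature matches :sig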
If you take this approach, you will have a function or a web service that converts the SQL table into a job queue service. When you do that, you can build metrics on the amount of collisions you get while trying to pull and assign jobs to workers.
I had inbuilt processes that would sweep through jobs that were assigned (had a job signature) and weren't marked as complete, it actioned those to handle the condition of a crashed worker.
There are many, many other features that proper job queues offer, but that usually means more dependencies and code libraries / containers, so just build in the functionality you need.
If it is accurate, fast enough, and stable, you've got the best solution for you.
The reason why you want to use skip locked is so that Postgres can automatically skip rows that are being concurrently accessed for updating the "status". You are right, if you update a "status" field you don't really need to worry about advisory locks and skipping rows that are locked but it still helps with performance if you have a decent amount of concurrent consumers polling the table.
I recently got introduced to this system at work, and also built a new job using it. It works fine, but since I had to implement work stealing to deal with abandoned jobs in a timely manner, I wouldn't dare to use it for actions that absolutely must not happen twice.
Exactly-once is only meaningfully possible if you have a rollback for tasks of unknown completion state - for example if the task involves manipulating the same database as the one controlling the task execution. Otherwise, it becomes the (impossible to solve) two-generals problem between updating the task status and performing the task.
There is actually another possibility: there must be a way to check whether the receiving system has received the message. But this only works if there are no "rogue" senders.
Agenda uses this, and we found the hard way on mongo 4 that it can lead to mongo spinning the CPU at 100% if it gets too many at once. No idea if they've fixed it in later versions.
I recently published a manifesto and code snippets for exactly this in Postgres!
delete from task
where task_id in
( select task_id
from task
order by random() -- use tablesample for better performance
for update
skip locked
limit 1
)
returning task_id, task_type, params::jsonb as params
Holding a transaction open for the duration of a request to an external service makes me nervous. I've seen similar code lock up the database and bring down production. Are you using timeouts and circuit breakers to control the length of the transactions?
Yes, you absolutely need to set a reasonable idle transaction timeout to avoid a disaster (bugs in the code happen) - this can also be done globally in the database settings.
This article was written in 2015, a year before the idle_in_transaction_session_timeout parameter was added (in Postgres 9.6). It's unfortunately still disabled by default, but it's the easiest way to make sure no transaction sits idle for too long.
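Enabling it is a one-liner, per session or globally via ALTER SYSTEM; for example:

    -- terminate any session that sits idle inside a transaction for over a minute
    SET idle_in_transaction_session_timeout = '60s';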
In my experience, a queue system is the worst thing to find out isn't scaling properly, because once you find out your queue system can't architecturally scale, there's no easy fix that avoids data loss. You talk about "several thousand background jobs" but generally, queues are measured in terms of Little's Law [1], for which you need to be talking about rates: namely the average task arrival rate and the average time each task spends in the system. Raw numbers don't mean that much.
In the beginning you can do a naive UPDATE ... SET, which locks way too much. You can make your locking more efficient by doing UPDATE with SELECT subqueries for dequeues and SELECT FOR UPDATE SKIP LOCKED, but eventually your dequeue queries will throttle each other's locks and your queue will grind to a halt. You can try to disable enqueues at that point to give your DB more breathing room, but you'll have data loss on lost enqueues, and it'll mostly be your dequeues locking each other out.
You can try to quickly shard out your task tables to avoid locking, and that may work, but it's brittle to roll out across multiple workers and can result in data loss. You can of course drop a random subset of tasks, but this will cause data loss too. Any of these options is not only highly stressful in a production scenario but also very hard to recover from without a ground-up rearchitecture.
Is this kind of a nightmare production scenario really worth choosing Boring Technology? Maybe if you have a handful of customers and are confident you'll be working at tens of tasks per second forever. Having been in the hot seat for one of these I will always choose a real queue technology over a database when possible.
> and are confident you'll be working at tens of tasks per second forever.
It's more like a few thousand per second, and enqueues win, not dequeues like you say... on very small hardware without tuning. If you're at tens of tasks per second, you have a whole lot of breathing room: don't build for 100x current requirements.
This link is simply raw enqueue/dequeue performance. Factor in workers that perform work or execute remote calls and the numbers change. Also, I find when your jobs have high variance in times, performance degrades significantly.
> This doesn't really make sense to me. To me, the main problem seems to be that you end up with having a lot of snapshots around.
The dequeuer needs to know which tasks to "claim", so this requires some form of locking. Eventually this becomes a bottleneck.
> don't build for 100x current requirements
What happens if you get 100x traffic? Popularity spikes can do it, so can attacks. Is the answer to just accept data loss in those situations? Queue systems are super simple to use. I'm counting "NOTIFY/LISTEN" on Postgres as a queue, because it is a queue from the bottom up.
> Factor in workers that perform work or execute remote calls and the numbers change.
These don't occur on the database server, though... This merely affects the number of rows currently claimed.
> The dequeuer needs to know which tasks to "claim", so this requires some form of locking. Eventually this becomes a bottleneck.
These are just try locks, though-- the row locks are not contended. The big thing you run into is having lots of snapshots around and having to skip a lot of claimed rows for each dequeue.
> What happens if you get 100x traffic? Popularity spikes can do it, so can attacks.
If you get 100x the queueing activity for batch jobs, you're going to have stuff break well before the queue. It's probably not too easy to get 100x the drain rate, even if your queue system can handle it.
This scales well beyond 100M batch tasks per day, which gets you to 1M users with 100 tasks/day each.
Throttle the inputs. Rate-limiting doesn’t belong in the data layer.
While throttling due to organic popularity isn’t great, I’d argue the tradeoffs might be worthwhile. If it looks like the spike will last, stand up Redis during the throttling, double-write, and throttle down the Postgres queue until it’s empty. If you really need to, take a 15 minute outage to just copy data over.
What happens when you get 500x the traffic or 50x?
How does the system behave when the traffic rate is higher than what it was designed for or can currently handle? Because that number will always exist, even in a "scalable" system. One won't be able to add capacity at the same rate that work will increase.
NOTIFY/LISTEN isn't a queue; it has broadcast semantics. Postgres queueing is really just the SELECT FOR UPDATE SKIP LOCKED; NOTIFY/LISTEN lets you reduce the latency a bit but isn't essential.
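i.e., the two pieces combine roughly like this (channel name made up):

    -- consumer connection: subscribe to wakeups
    LISTEN new_job;
    -- producer, right after inserting a job row:
    NOTIFY new_job;
    -- consumers still claim work with SELECT ... FOR UPDATE SKIP LOCKED;
    -- the notification just saves them from tight polling loops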
If you find yourself in that situation, migrating to a more performant queuing solution is not that much of a leap. You already have an overall system architecture that scales well (async processing with a queue).
_Ideally_ the queuing technology is abstracted from the job-submitters/job-runners anyway. It's a bit more work if multiple services are just writing to the queue table directly.
I agree that the _moment_ the system comes to a screeching halt is definitely not fun.
You are going to have the same scaling issues with your datastore. I don't really understand why you say that your dequeue queries will throttle each other's locks and grind it to a halt? Isn't that the whole point of SKIP LOCKED?
>Applied to job records, this feature enables simple queue processing queries, e.g. SELECT * FROM jobs ORDER BY created_at FOR UPDATE SKIP LOCKED LIMIT 1.
Also, postgres partial indexes can be quite helpful in situations where you want to persist and query intermediate job lifecycle state and don't want multiple rows or tables to track one type of job queue
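For example, a sketch with assumed names: an index covering only rows that are still claimable, so the hot set stays small no matter how large the table grows:

    -- partial index: only rows with state = 'pending' are indexed
    CREATE INDEX jobs_pending_idx ON jobs (created_at)
    WHERE state = 'pending';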
Skip locked is useful until you have to maintain order for a group of messages with some "group_id", so that a set of related messages is sent one after the other.
Then you probably have to write complicated queries or use partitions of some sort.
The article says he also uses an ORDER BY clause. I wonder if that will severely limit throughput, since all messages need to be sorted on each lookup, but that can probably be solved by introducing an index.
It seems strictly worse to use ORDER BY in this case, since if you're using SKIP LOCKED you should be doing parallel processing anyway, and if you're doing parallel processing, ordering is already going out the window.
Unless you can guarantee that the processing time of each job is exactly the same, if you have multiple workers processing the same queue, you can’t order anything except the start time.
You can use locks to effectively break the queue into sub queues so that each sub queue is only being processed by 1 worker. Then you can order that sub queue.
Jobs should be attempted in the same order/priority in which they are enqueued; that's the meaning of the word "queue". That they take varying amounts of time is another matter.
Queue can clearly mean "work that needs to be completed" not necessarily 'work completed in order'. Your definition is much stricter than it needs to be for most use cases.
There is clearly a conceptual difference between a set of things from which you pull things out randomly, and a queue. A queue always has intrinsic criteria to select the next item to be pulled out.
There are many times when the start order doesn’t really matter, and the additional sorting overhead isn’t worth it. In those cases people will still tend to refer to the entity holding the jobs to be processed as a queue despite the fact that it doesn’t strictly follow FIFO order.
If they are being technically precise, queue isn’t the correct term, but language changes with context and time. Either way the implementation isn’t wrong if strict start order has been considered and isn’t important.
I’m not confusing anything. I’ve seen random selection “job queues” implemented many times. As long as you truly don’t care about start order, it’s fine to trade it for increased throughput.
It means that you are telling pg that you don’t care about order, so it is free to optimize the query in whatever way it wants to. The order can change query to query depending on numerous external factors.
I’m not using pg itself as an example. I’m using a specific implementation of a “job queue” built with pg.
I’ve seen and you can search for and find many implementations of “job queues” using relational databases where job start order guarantees are traded away for throughput.
You have no ordering guarantees, so how can order be important? If 4 work items are scheduled on 4 independent workers, you have no guarantee which will start first or finish first.
I think the order matters at least because you want some FIFO approximation; otherwise some tasks can be stuck in the queue forever and never get picked up.
I often see the "engineers copy FAANG infrastructure because they want to be cool, even though their needs are completely different" take as a kind of attack on engineers.
But I think a lot of it is also about knowledge and documentation. If I want to copy FAANG or another startup, and set up an infinitely scalable queue-based architecture, I can find dozens of high quality guides, tutorials, white papers etc, showing me exactly how to do it. Yes maintenance is higher, but I can get set up with redis, SQS, any of the 'scalable' solutions within a few hours of copy-pasting commands and code and configuration from a reputable source.
If I want to use NOTIFY in postgres? I googled "SQLALchemy notify listen postgres" and I find a few unanswered stackoverflow questions and a github gist that has some code but no context.
I would honestly love to use this approach for a side project, but I don't have 2-3 days to figure it out on my own. The direct choice for me might seem to be
* simple, but not scalable (ie just use postgres)
* complex, but scalable (ie redis, sqs, whatever)
and then it's a tradeoff, and the argument goes that I am blinded by cool tech and FAANG and I'm choosing complex but scalable, even though I don't need scalable.
But taking into account guides and other resources, the choice for me is actually
* complex and not scalable (this, because I don't know how to implement it and I can't predict what pitfalls I might face if I try)
* simple and scalable (what everyone actually does)
and that makes the engineer's choice to follow faang look a lot more reasonable.
Python, Flask, SQLAlchemy, and Postgres all have great documentation individually, but if I am building an application at the intersection often a guide on exactly how to join them all up is much faster than using each individually and trying to figure out the interactions in four places.
AWS white papers and engineering blogs tend to give me everything I need in one place, and I don't think there are any for apps built with NOTIFY.
SQLAlchemy is an extra abstraction blocking your path here. While you probably should still use an ORM for your regular relation queries, you are not gaining anything significant by trying to use SQLAlchemy for implementing a queue backend. You can write raw SQL with psycopg2 (which is already a dependency in your project thanks to SQLAlchemy), and wrap these raw queue management SQL in a nice little Python module which you can later reuse for other applications as well.
Without being rude, what are the nice features? I've worked with it a bit and constantly found myself wishing it was just SQL whenever I've bumped into it
If SQLAlchemy’s documentation doesn’t explain its use with LISTEN/NOTIFY, perhaps it’s the wrong tool for the job? You are presumably not going to use it with Redis or SQS queues, so why are you so hung up on it here?
IMHO - and this probably why I’ll never launch a product - you should understand each piece of your infra. Not necessarily to the metal on each, but I don’t think it’s unreasonable to be able to explain why each piece is necessary, what it’s doing, and how to troubleshoot it when it breaks.
With your mentioned list, three of them are Python, so that significantly reduces the breadth.
Scalability comes at a price. Unless you need it, it makes you less flexible. And that is exactly what you don't want to be as a startup.
For instance, if you use postgres with a low load, it is almost trivial to migrate schemas, add new constraints, do analytics etc.
If you use SQS, Cassandra, whatever, then you now get scalability/availability, but it becomes much more time-consuming to change things if you figure out that your original design doesn't work. Say the business comes and says "please add constraint X: all users of type foo must never combine value bar at the same time."
It is possible to implement that without postgres, but it is not easy or simple, especially if you need to make changes.
Therefore, my take is that you either use postgres to stay flexible or you use both postgres and something else on top of it when you know that you won't have to change things. Of course this means additional infrastructure/maintenance overhead.
In the end it's always a trade-off, you just need to know when to trade which thing off against what.
This is all true, important and often misunderstood, but beside the point made to which you reply.
There's a (sort of) objective trade-off to be made, but another dimension is how familiar you are with the solution and/or how quick can you implement it using documentation and examples.
If you happen to know exactly how to create a horizontally scalable microservice based hairball with nodejs, then maybe you are quicker with that than with some traditional django monolith using a nicely normalized sql database (or whatever).
In a startup, you are almost always squeezed for time, so making the objectively right tradeoff for your context is usually secondary to the simple question of 'when can you ship?' If the scalable-yet-inflexible option is what Stack Overflow abundantly recommends and documents, maybe that's quicker to get done now, whatever the consequences are in the longer run.
Then maybe I just don't understand your post. To me it sounds like you're saying "FAANG-technology" is chosen because of documentation. But I don't think that the documentation of e.g. SQS is better than Postgres's (if you can even compare the two).
If someone says "I choose X over Y because I used X before (or because X is better documented)" then fair enough - but I rarely hear that as an argument when choosing "FAANG-technology".
> If I want to copy FAANG or another startup, and set up an infinitely scalable queue-based architecture, I can find dozens of high quality guides, tutorials, white papers etc, showing me exactly how to do it.
I'm not sure about this either, though from reading typical developer blogs and listening to the hivemind, you do get the feeling that you must be scalable. Devs often don't really know when (usually not) that becomes important and how far the vast majority of apps can go with monoliths in big boxes (quite far).
But my response would then be that this is a stupid example in the context of this whole submission, because the submission talks about postgres, and trying to get postgres to scale "infinitely", let alone fulfill other properties like extremely high uptime, is just... insane. No one in their right mind tries to do that with postgres. It is one thing to do queueing with it, but "infinitely scalable" is a totally different one.
Therefore I can only say: yeah, to set up "an infinitely scalable queue-based architecture" you should not use postgres and the author in the submission says the same thing.
> Devs often don't really know when (usually not) that becomes important and how far the vast majority of apps can go with monoliths in big boxes (quite far).
Right, they make the wrong trade-offs. That is exactly what I wanted to express with my response.
I often find that the tooling I have at work helps speed up the development of more complicated solutions. Saying that FAANG solutions are easy to use and fast to build with is easy when you have FAANG-level support. Even just non-FAANG but large enterprises allow for that, but in a startup it's easy to forget how much the environment (including tooling) speeds up all of that work.
So yeah, I find a lot of the more complicated solutions to be simple, but mostly because it's well supported and not by just me.
But that is not what we are discussing here. From the submission:
> There’s a good chance that you’re already using a relational database, and if that relational database is Postgres, you should consider it for queues before any other software
The point is, if you are already using postgres, then the question is: should I use postgres for queueing as well as the rest, or should I use postgres for the rest and a FAANG solution for queueing on top of it?
Now the thing is that the FAANG solutions are great in certain ways: they let you scale a lot and have extremely high availability. But it comes at a cost; for example, those solutions don't support transactions like postgres does. So if you need those (and often you don't know in advance how the business of a startup will develop), you now have to build some technical solution on top of the FAANG solution, which is much, much slower and more complicated compared to doing it in postgres.
Even if you say that it's more difficult to set up and understand the queueing in postgres (and I agree), I would argue that in the end it is still faster, because you don't need to set up and maintain all the extra infrastructure (yeah, even if it runs in the cloud), unless this is a prototype and you don't care about security, documentation and all of that, and throw it away in the end anyway.
> Scalability comes at a price. Unless you need it, it makes you less flexible. And that is exactly what you don't want to be as a startup.
This is a valid comment. I’ve chosen Postgres in the past for the features, not the performance. For example guaranteed at most once delivery (via row locks) and filtering of jobs based on attributes (it’s a database after all).
> If I want to use NOTIFY in postgres? I googled "SQLALchemy notify listen postgres" and I find a few unanswered stackoverflow questions and a github gist that has some code but no context.
Author here. I would say that my post is less targeted at someone like you (application developer, presumably) and more targeted at library developers.
I don't think it's ideal for everyone to be implementing bespoke, Postgres-backed (or any other queue for that matter) background job workers in their applications. There's a lot of nuance and implementation details to get wrong with background jobs, and for that reason I think background work should generally be done by more comprehensive, dedicated libraries or frameworks.
If every Rails application didn't have Sidekiq/Active Jobs and instead had bespoke background worker implementations, Rails applications would likely have a much less rosy reputation on account of their unreliability.
I love the article's point, and I tend to feel that the "chasing the cargo cult of 'scale'" is maybe the biggest problem I see in development teams today. It is certainly the biggest problem that I rarely hear anybody talking about.
> Author here. I would say that my post is less targeted at someone like you (application developer, presumably) and more targeted at library developers.
I think the article might benefit from clarification on this point.
Reading the HN comments, I see that I'm not the only person who came away with a misunderstanding there.
This is what kills you if you're a small startup. Of course it gives you a lot too. But if you're belly up then it doesn't matter.
Of course go for whatever solution gives you the most benefits while not distracting you too much from your main goal.
I've seen a startup where devs spent around 80% of the time fighting their tools and infrastructure. They had a 3 month runway and today there's a massive hole at the end of that runway. I still shudder from just the thought of it.
Another point is: you don't need scalable now, but may (or even hope) to need it later, and you know that when you will need it you probably won't have time to invest into migrating this component.
Also: you may think that you may one day want to be hired by a FAANG.
How relevant is it to be hired by a FAANG? I have some experience with "web scale" systems, but I tend to reject FAANG recruiters because Leetcode makes me want to become an apple farmer (no pun).
> I tend to reject FAANG recruiters because Leetcode
I understand the pain of leetcode interviews. They’re terrible. But optimizing your career based on the interview process seems… backwards?
FAANG companies (for example) are very relevant if you want to make a lot of money and live in Silicon Valley without being a successful founder/VC. Apple farmers… not so much. If you live in Tokyo, then FAANG companies might be less relevant.
Either way, doesn’t seem like the interview is where you should draw the line.
I guess my career is pretty close to optimal. I get to work on interesting problems from anywhere I want and save a ton of money. If you are an EU candidate, FAANG companies want you to relocate to some city in the UK or Ireland, which would obliterate my savings rate, and are worse places to live than most mainland EU areas. I understand not everyone is as fortunate as me, which increases their motivation to grind Leetcode and the like.
Hard disagree. Some things are difficult to change later on, others not so much, and you can't do everything for v1. The product has to launch at some point. Your choice of queue is one of the things you'll be able to change. Don't complicate things unless you've run the numbers and know you'll need to. A lot of very large companies do just fine with using relational databases as queues.
If your first point holds, then all app components should be “scalable” from the beginning, because you may not have time to make it so later.
And that’s terrible advice, of course. You very likely will have time to scale things up (customer count almost never increase dramatically from one day to the next), and even if you don’t you’ll most likely never deliver a useable product if all components need to be “scalable” from the beginning.
This is all a matter of balancing constraints. I wrote "you know that when you will need it you probably won't have time to invest into migrating this component." I didn't write "always go for the more scalable".
For starters, "the most scalable component is always the most difficult to integrate and use" isn't true, and neither is "whatever your team knows or doesn't know, the challenges tied to integrating and then operating a given component are always the same". There are many parameters. In some contexts, taking the team's subjective preferences into account is crucial.
There is no universal rule, à la "always go for the most scalable, neglecting any other consideration" or "the minimal immediate effort is always the best option".
Building things you don't need in hopes that you'll need them because things you aren't spending time on will grow to demand them is like hiring an investment manager when you're in debt.
> but I can get set up with redis, SQS, any of the 'scalable' solutions within a few hours of copy-pasting commands and code and configuration from a reputable source
> [...] and that makes the engineer's choice to follow faang look a lot more reasonable.
I also agree with the linked article's overall point, but I think the specific "job queue" example from the article is actually a bad example because:
- "rolling your own" job queue is not rocket science but is nontrivial and easy to get wrong w.r.t. locking etc.
- the argument against taking additional dependencies is that now you have one more tool to master, understand, and manage. but my experience is that job queues like Sidekiq are not a significant overhead in terms of developer burden.
That's a great point, that often people misunderstand.
It's even worse than you say, though. As someone who has used neither Postgres nor Redis for queueing, how am I supposed to know what the "simple" solution is here and whether it really solves my problem?
Almost everyone uses solution X. A few people are saying "no, just use solution Y, it's obviously enough and far simpler". Even if it is far simpler, how am I supposed to know whether there are some hidden gotchas here?
Much safer to bet on technology that is proven to work, given that large amounts of people are using it in production for this purpose.
1. The main downside to using PostgreSQL as a pub/sub bus with LISTEN/NOTIFY is that LISTEN is a session feature, making it incompatible with statement level connection pooling.
2. If you are going to do this, use advisory locks [0].
Other forms of explicit locking put more pressure on the database while advisory locks are deliberately very lightweight.
My favorite example implementation is que [1] which is ported to several languages.
At my previous company, they switched from using NOTIFY/LISTEN for Postgres notifications to a custom solution built on top of logical replication. As I understand it, part of the reason was reliability. I never touched that part of the code, but I believe the idea was to subscribe to logical replication updates and send out notifications based on these.
Skype used postgres as a queue, with a small plugin, to process all their CDRs many years ago. I have no idea if it's used these days, but it was 'web scale' 10 years ago. Just working, while people on the internet argued that using a database as a queue is an anti-pattern.
The software works excellently in a development environment and performs well when running as a single instance. However, I encountered issues when scaling it up for high availability in a clustered setup. The system would fail inconsistently, with two masters consuming messages simultaneously, which wasn't ideal for my use case. Eventually, I switched to Kafka and haven't revisited the original solution since.
It's worth noting that these issues might have been due to my improper configuration. Nevertheless, if the configuration process is fraught with pitfalls, that's problematic in itself. I've had these experiences more than once.
Additionally, I found a critical race condition in the Python library, rendering it practically unusable for me. I submitted a bug report with a minimal example demonstrating the issue. I considered fixing it myself, but since using RabbitMQ wasn't crucial for my project, I switched to ZeroMQ, which didn't require a broker. The issue was acknowledged and fixed about a year later. At the time, I had to assume that nobody else was using the Python bindings.
Three years ago, I worked on a project that used the software for a Celery queue. Messages would occasionally go missing, although this could have been a configuration issue on our part. Ultimately, we replaced it with a Redis queue (not the best practice, I admit) and didn't look back. This was for a lower-availability use case where a single instance of Redis sufficed.
I used RabbitMQ for a while and had nothing but problems.
Admittedly I probably shouldn't have used it the way I did. I dumped many millions of tasks into it, then fanned out processes pulling from that queue that took a variable amount of time to run. Some ran in seconds, some hours.
I had picked RabbitMQ because I wanted that queue to be durable and to resist workers dying or being restarted. However, long-lived tasks like this are not really what it was designed for (in my opinion). I kept running into issues where it would take a long time to restart, or stop answering connections and need a restart to continue. I ended up having to write monitoring code to check for this and handle it to make it slightly reliable.
I'm sure it works well for smaller, short-lived messages, but considering the issues I bumped into I would be hesitant to try it. I'd probably reach for Redis first, with wrappers allowing me to swap in any other queue as required.
I can share our experience with RabbitMQ/SQS/Sidekiq. Our two major issues have been around the retry mechanism and resource bottlenecks.
The key retry problem is "What happens when a worker crashes?".
RabbitMQ solves this problem by tying "unacknowledged messages" to a tcp connection. If the connection dies, the in-flight messages are made available to other connections. This is a decent approach, but we hit a lot of issues with bugs in our code that would fail to acknowledge a message and the message would get stuck until that handler cycled. They've improved this over the past year or so with consumer timeouts, but we've already moved on.
The second problem we hit with RabbitMQ was that it uses one-erlang-process-per-queue and we found that big bursts of traffic could saturate a single CPU. There are ways to use sharded queues or re-architect to use dynamically created queues but the complexity led us towards SQS.
Sidekiq solves "What happens when a worker crashes?" by just not solving it. In the free version, those jobs are just lost. In Sidekiq Pro there are features that provide some guarantees that the jobs will not be lost, but no guarantees about when they will be processed (nor where they will be processed). Simply put, some worker sometime will see the orphaned job and decide to give it another shot. It's not super common, but it is worse in containerized environments where memory limits can trigger the OOM killer and cause a worker to die immediately.
The other issue with Sidekiq has been a general lack of hard constraints around resources. A single event thread in redis means that when things go sideways it breaks everything. We've had errant jobs enqueued with 100MB of json and seen it jam things up badly when Sidekiq tries to parse that with a lua script (on the event thread). While it's obvious that 100MB is too big to shove into a queue, mistakes happen and tools that limit the blast radius add a lot of value.
We've been leaning heavily on SQS the past few years and it is indeed Simple. It blocks us from doing even marginally dumb things (max message size of 256KB). The visibility timeout approach for handling crashing workers is easy to reason about. DLQ tooling has finally improved so you can redrive through standard AWS tools. There are some gaps we struggle with (e.g. firing callbacks when a set of messages are fully processed) but sometimes simple tools force you to simplify things on your end, and that ends up being a good thing.
Do you have any experience with NATS, and how would you compare it to RMQ/SQS?
The authors claim it guarantees exactly-once delivery with its JetStream component, and it looks very alluring from the documentation, but looks can be deceiving.
> The authors claim it guarantees exactly-once delivery
I find this definition has morphed from one meaningful to developers into one queue implementations like to claim. I've learned this generally means "multiple inserts will be deduped into only one message in the queue".
The only guarantee this `exactly-once` delivery provides is that I won't have two workers given the exact same job. Which is a nice guarantee, but I still have to decide on my processing behavior and am faced with the classic "at most once or at least once" dilemma around partially failed jobs. If I'm building my system to be idempotent so I can safely retry partially failed messages it doesn't do much for me.
It has multiple modes. One of them is explicit-acknowledgement mode. If the worker finishes processing the job but doesn't ack, the message will appear again.
SQS limits you further in other ways. For instance, scheduled tasks are capped to 15m (the DelaySeconds knob), so you'll be stuck when implementing the "cancel account if not verified in 7 days" workflow. You'll either reenqueue a message every 15m until it's ready (and eat the SQS costs), or build a bespoke solution only for scheduled tasks using some other store (the database usually) and another polling loop (at a fraction of the quality of any other OSS tool). This is a problem well solved by Sidekiq, despite the other drawbacks you mention.
If you wanted to handle this scenario with the serverless AWS stack, my recommendation would be to push records to Dynamo with TTLs, and then when they expire have a Lambda push them onto the queue. It would cost almost nothing to do this. If you had 10 million requests a month, your Lambda cost would be ~$150 to run this (depending on duration, but just pushing to a queue should be quick). Dynamo would be another ~$50 to run, depending on how big your tasks are.
Granted now you need 3 services instead of 1. I personally don't find the maintenance cost particularly high for this architecture, but it does depend on what your team is comfortable with.
I've explored this space pretty thoroughly, including the Dynamo approach you've described. Dynamo does not have a strict guarantee on when items get deleted:
> TTL typically deletes expired items within a few days. Depending on the size and activity level of a table, the actual delete operation of an expired item can vary. Because TTL is meant to be a background process, the nature of the capacity used to expire and delete items via TTL is variable (but free of charge). [0]
Because of that limitation, I would not use that approach. Instead I would do Scheduled Lambdas to check for items every 15 minutes in a Serverless Aurora and then add them to SQS with delays.
I've had my eye on this problem for a few years and keep thinking that a simple SaaS that does one-shot scheduled actions would probably be a worthy side project. Not enough to build a company around, but maintenance would be low and there's probably some pricing that would attract enough customers to be sustainable.
> Sidekiq solves "What happens when a worker crashes?" by just not solving it. In the free version, those jobs are just lost.
I've been using Sidekiq for 11+ years in production and I've never seen this happen. Sidekiq (free version) has a very robust retry workflow. What are you talking about here?
He is talking about the case where the worker itself dies for some reason, for example when the worker uses too much memory, hits a segfault, or whatever.
Yep. OOMs are the most common cause. It's definitely low frequency. On the order of one in a billion. For some systems that's once a year. For us that's once a week. If that's an important job and it just gets dropped, then you've got a problem.
With the paid features to keep jobs from getting dropped, things can still be painful. We have a lot of different workers, all with different concurrency settings and resource limits. A memory-heavy worker might need a few GB of memory and be capped at a concurrency of 2, while a lightweight worker might only need 512MB and have a concurrency of 20. If the big memory worker crashes, its jobs might get picked up by the lightweight worker (possibly hours later), which will then OOM, and all of its 19 other in-flight jobs end up in the orphanage. And now your alerts are going off saying the lightweight worker is OOMing, and your team is scratching their heads because that doesn't make any sense. It just gets messy.
Sidekiq probably works great outside of containerized environments. Many swear to me they've never encountered any of these problems. And maybe we should be questioning the containerization rather than Sidekiq, but ultimately our operations have been much simpler as we've moved off of Sidekiq.
Sidekiq will drop in-progress jobs when a worker crashes. Sidekiq Pro can recover those jobs but with a large delay. Sidekiq is excellent overall but it’s not suitable for processing critical jobs with a low latency guarantee.
> This said, I'd use a dedicated queue these days.
I agree, primary reason being that if you're in the cloud (thus this applies to a lot of people but obviously not everyone), all the cloud providers have extremely easy to use, and cheap, hosted queueing tech. Even if you're worried about vendor lockin, queueing primitives are so small (basically push and pop), that it's relatively easy to write things in a way so it would be easy to migrate if necessary.
I commented elsewhere, but in most cases I think it would be a bad idea to host your queue tables and logic in the same instance that hosts your primary data. Thus, if you spin up another PG instance in the cloud, it could very well end up costing you more than a default cloud-hosted queue service.
I don’t think that’s best for small things. Unless you can’t vertically scale your instance to handle to load, being able to join and keep transactions within one data store is massively valuable. I wouldn’t want to open myself to distributed systems problems unless I’m absolutely forced to.
> Skype used postgres as queue with a small plugin to process all their CDR many years ago. I have no idea if it used these days but it was 'web scale', 10 years ago. Just working, while people on the internet argued about using a database as a queue is an anti-pattern.
It works great until it doesn't, and the way it breaks puts you in a state that's very difficult to recover from. And if your excuse for using a database as a queue was that you were already running a database, that cuts both ways: congratulations, your queue mess has now brought down your primary datastore too.
> It works great until it doesn't, and the way it breaks puts you in a state that's very difficult to recover from.
This is similar to saying, 'if I mess up all the tables in one database I wreck the rest'. Just my opinion, but this is not actually a thing in databases.
Maybe the performance of one database could be compromised by another loading things up? I think databases are developed with this as an important consideration. I can't say I have seen this, but I admit it's a possibility.
Also, if you run one postgres, you won't have an issue running another if you have the experience in production.
> This is similar to saying, 'if I mess up all the tables in one database I wreck the rest'. Just my opinion, but this is not actually a thing in databases.
If you mess up the tables in one database it doesn't affect others, but if you lock up the server where it can't respond to queries, that affects every database running on that server.
> Also, if you run one postgres, you won't have an issue running another if you have the experience in production.
We're talking about using a different feature that you presumably haven't used before, so you won't necessarily know about the admin side of that.
> If you mess up the tables in one database it doesn't affect others, but if you lock up the server where it can't respond to queries, that affects every database running on that server.
How is it different from putting multiple queues on the same redis, where if one queue locks up, other queues are affected?
If that's a real risk, you can always put them into different instances.
The solution is exactly the same for redis or postgresql.
> How is it different from putting multiple queues on the same redis, where if one queue locks up, other queues are affected?
Queues by nature tend to be for tasks that you can tolerate delaying a bit. If your queues can impact your "live"/online processing, that's worse than just impacting other queues.
Also something like redis tends to be a lot simpler and less prone to locking up than the monster that is postgresql.
> No, you already know how to run and manage a postgres database.
Well, maybe. Postgres is large and has lots of features. NOTIFY/LISTEN is a distinct thing with its own quirks, and just because you've been running a postgres database up until now doesn't mean you're going to know about the admin side of these other features.
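For anyone who hasn't touched it, the basic mechanics are tiny, but the quirks are real - a minimal sketch:

    -- session A (the worker):
    LISTEN new_jobs;

    -- session B, inside the enqueueing transaction:
    NOTIFY new_jobs, '42';  -- only delivered when this transaction commits
    -- equivalent: SELECT pg_notify('new_jobs', '42');

    -- quirk: notifications are not stored; a listener that was
    -- disconnected when the NOTIFY fired never sees it, so workers
    -- still need a catch-up poll on (re)connect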
> congratulations, your queue mess has now brought down your primary datastore too.
Just don't put your queue tables/logic in the same DB instance as your datastore. There are still a lot of benefits to using the same tech even if you have segregated instances.
One of the biggest benefits imo of using Postgres as your application queue, is that any async work you schedule benefits from transactionality.
That is, say you have a relatively complex backend mutation that needs to schedule some async work (eg sending an email after signup). With a Postgres queue, if you insert the job to send the email and then a later part of the transaction fails and the transaction rolls back, the email is never queued to be sent.
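A minimal sketch of that flow, assuming hypothetical users and jobs tables:

    BEGIN;
    INSERT INTO users (email) VALUES ('new@example.com');
    -- the job row commits or rolls back together with the user row
    INSERT INTO jobs (kind, payload)
        VALUES ('send_signup_email', '{"to": "new@example.com"}');
    COMMIT;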
Worth being clear that bridging to another non-idempotent system necessarily requires you to pick at-least-once or at-most-once semantics. So for emails, if you fail awaiting confirmation of your email you still need to pick between failing your transaction and potentially duplicating the email, or continuing and potentially dropping it.
The big advantage is for code paths which async modify your DB; these can be done fully transactionally with exactly-once semantics since the Job consumption and DB update are in the same transaction.
That's kind of missing the parent's point. If you wanted to ensure emails arrive, that sounds like another queue that could be backed by a different table that is also produced into as part of the original transaction.
> One of the biggest benefits imo of using Postgres as your application queue, is that any async work you schedule benefits from transactionality.
This is a really important point. I often end up using a combination of Postgres and SQS since SQS makes it easy to autoscale the job processing cluster.
In Postgres I have a transaction log table that includes columns for triggered events and the pg_current_xact_id() for the transaction. (You can also use the built-in xmin of the row, but then you have to worry about transaction wraparound.) Inserting a row triggers a NOTIFY.
A background process runs in a loop. It selects all rows in the transaction table with a transaction id between the last run's xmin and the current pg_snapshot_xmin(pg_current_snapshot()), maps those events to jobs and submits them to SQS, records the xmin, then LISTENs to await the next NOTIFY.
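A rough sketch of the loop's query, with hypothetical table and column names (:last_xmin is the bound recorded on the previous pass):

    SELECT event, payload
    FROM transaction_log
    WHERE xact_id >= :last_xmin
      AND xact_id <  pg_snapshot_xmin(pg_current_snapshot());
    -- every transaction below that xmin bound has finished, so no
    -- still-in-flight insert can later appear in a range we've skipped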
Good point. We alleviate that a bit by scheduling our queue adds to not run until after commit. But then we still have some unsafety, and if the connection to Rabbit is down we're in trouble.
I agree - having to tell a database that something was processed, and fire off a message into RabbitMQ, say, is never 100% transactional. This would be my top reason to use this approach.
> With a Postgres queue, if you insert the job to send the email and then in a later part of the transaction, something fails and the transaction rollbacks, the email is never queued to be sent.
This is true - it's definitely worth isolating what should be totally separate database code into different transactions. On the other hand, if your user was never created in the DB, you might not want to send the signup email anyway. It just depends on the situation.
Another benefit of this is that you're guaranteed that the transaction is completed before the job is picked up. With redis-backed queues (or really anything else), you very quickly run into the situation where your queue executes a job depending on a database record existing prior to the transaction being committed (and the fix for this is usually awkward / complex code).
I'm not sure this is really an issue with transactionality, since a single request can obviously be split up into multiple transactions. Rather, even if you correctly flag the email as pending/errored, you either need to process those manually or have some other kind of background task that looks for them - at which point, why not just process them asynchronously?
> With a Postgres queue, if you insert the job to send the email and then in a later part of the transaction, something fails and the transaction rollbacks, the email is never queued to be sent.
An option could be to use a second connection and a separate transaction to insert data into the queue table.
One thing I love about Kafka is... It's just an append-only log, and a client is essentially just holding an offset. This is conceptually very simple to reason about. It's also persistent and pretty fault-tolerant (you can just go back and read any offset).
Unfortunately, Kafka carries with it enough complexity (due to the distributed nature) that it ends up not being worth it for most use-cases.
Personally I'd love something similar that's easier to operate. You'd probably be able to handle hundreds (if not thousands) of events per second on a single node, and without distributed complexity it'd be really nice.
And yes, in theory you could still use postgres for this (and just never delete rows). And maybe that's the answer.
Considering that you have a native "offset" (the auto-incrementing id) and the ability to partition by date, I would say postgres is a great candidate for a simple Kafka replacement. It will also be significantly simpler to set up consumers if you don't really need the whole consumer group, partition, etc. functionality.
Unfortunately `serial` is not sufficient on its own for that use case. If you observe the values 1, 2, and 4, you can't actually conclude whether 3 exists or not. That transaction may have failed after incrementing the sequence, meaning 3 does not exist, or it may be an ongoing transaction which hasn't yet committed, meaning 3 exists but is not yet visible to you.
So if you update your offset to 4 before the transaction for 3 commits, you'll lose 3 forever (unless you control for this, eg by periodically detecting & requeueing orphaned jobs, or by using a strictly serial/gap free integer rather than the built in auto incrementing type).
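The gap-free variant can be as simple as a single counter row, at the cost of serializing enqueuers on that row lock - a sketch with a hypothetical counters table:

    UPDATE counters SET value = value + 1
    WHERE name = 'queue_offset'
    RETURNING value;
    -- the row lock holds until the enqueueing transaction commits, so
    -- committed offsets are dense and consumers can trust the sequence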
SERIAL/SEQUENCE/IDENTITY increment immediately, not at commit. They’re just reading from a generator. You can also get the current key from it if you’d like.
Advisory locks also exist, if you want to implement logic in the application to inform you of various row conditions without having the DB care about it.
But for the example given, you could do many things (a sketch of the first two follows below):
* Add some boolean columns for ack and complete. Performance will eventually start to suck if these low-cardinality columns are being indexed, but by that point (millions of rows) you'll have had time to think of another solution.
* Add ctime and atime columns, each of which can be handled by Postgres natively to update when created / written, respectively. This has the advantage of lending itself nicely to partitioning by date range, if that becomes necessary.
* Have three tables - available, in_progress, and completed. Use triggers or application logic to move entries in an atomic manner.
None of this is necessarily normalized, but if you’re receiving JSON payloads for the job, 1NF went out the window anyway.
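A sketch of the first two options combined, with hypothetical column names, using SKIP LOCKED so concurrent workers never block on each other:

    UPDATE queue
    SET ack = true, atime = now()
    WHERE id = (
        SELECT id FROM queue
        WHERE NOT ack AND NOT complete
        ORDER BY ctime
        FOR UPDATE SKIP LOCKED
        LIMIT 1
    )
    RETURNING id, payload;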
We use exactly this for Windmill (OSS Retool alternative + modern Airflow) and run benchmarks every day. On a modest GitHub CI instance where one Windmill worker and Postgres run as containers, our benchmarks run at 1200 jobs/s. Workers can be added, and it will scale gracefully up to 5000 jobs/s. We are exploring using Citus to cross the barrier of 5000 jobs/s on our multi-tenant instance.
We used postgres for some of our queues back when we were at ~10 msg/s. It scaled quite a bit, but, honestly, setting up SQS or some other queue stack in AWS, GCP, or Azure is so simple and purpose built for the task (with DL queues and the like built in), I don’t know why you wouldn’t just go that route and not have to worry about that system shitting the bed and affecting the rest of the DB’s health.
It seems foolish. I am a big fan of “use the dumbest tool”, but sometimes engineers take it too far and you’re left with the dumbest tool with caveats that don’t seem worth it given the mainstream alternative is relatively cheap and simple.
What I've settled on is "store most job state in the DB, use task queues just to poke workers into working on the jobs".
Storing the job state in the DB means you can query state nicely. It's not going to exactly show the state of things but it's helpful for working through a live incident (especially when most job queues just delete records as work is processed).
And if you make all the background tasks idempotent anyways then you're almost always safe with running a thing like "send a job to the task queue to handle this job".
If you rely _just_ on message queues, there are a lot of times where you can have performance issues, yet have a lot of trouble knowing what's going on (for example, rabbitMQ might tell you the size of your queues, but offer little-to-no inspection of the data inside them).
Ultimately you have to figure out the separation of concerns of the job state and other core state. Ranging from “all state stored in message and will never become out of sync” to “no state stored in message and will never become out of sync”. In between you have “some state stored in db and some in message” and what I’ve found to be useful is keeping stuff in the db that needs to have high end state integrity (or as you said just making sure jobs are cancellable/idempotent).
Tangible example:
We have a video transcoder queue. The state of the video model in our db can change as the video is being finalized in various ways. The transcoder generates thumbnails and assets from the video and also updates its state in the db. So we store job information in the message about what thumbnails we want to generate and the video ID but nothing else. This allows us to look up the video row, see if the same media was already transcoded from the video (and cancel the job), and, if not, run the job and update the video row.
Also (and I know you’re not saying this), but I’ve never understood the argument that keeping queues in Postgres leads to higher data integrity via transaction guarantees. The job is still running on another process outside of the db. The only time this could be true is if the job itself mostly updates state in the db, in which case it’s the small minority of queued workloads (with the majority needing to do non-db compute work).
You can't do better than At Least Once if you're having side effects outside the database, so it's not clear that SQS's weaker semantics have any practical effect.
> I don’t know why you wouldn’t just go that route and not have to worry about that system shitting the bed
Because different software has different requirements. Not having an external service requirement other than Postgres might be a feature of an on-prem/b2b appliance.
Because some software may be projected to never outgrow the capabilities of Postgres, and if it does, moving to another service can be made very easy.
Because you want a transitional job system and the simplicity of doing it in Postgres.
Yep, we process hundreds of thousands and sometimes a few million jobs daily inside Postgres, using Oban in Elixir.
Having transactional semantics around background jobs is incredibly convenient for things like scheduling email only if the transaction is successful, and so on.
You do need to do a little bit of autovacuum tuning, but once sorted it’s been great for us.
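For the curious, that tuning can be as small as per-table storage parameters on the job table (numbers here are illustrative, not our exact values):

    ALTER TABLE oban_jobs SET (
        autovacuum_vacuum_scale_factor = 0,
        autovacuum_vacuum_threshold = 1000
    );
    -- vacuum kicks in after a fixed number of dead tuples instead of
    -- waiting for a percentage of a large table to die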
I'm always surprised that when I see people talk about queues I never see anyone mention beanstalkd. I've been using it for basically everything for 10 years and it's solid as a rock, incredibly simple and requires basically no maintenance. It Just Works™
Yes, I liked it when I encountered it a few years back. Lately been using redis and rq on a recent project since we're already using redis for caching. Is there much difference other than that?
We process around 1 million events a day using a queue like this in Postgres, and have processed over 400 million events since the system this is used in went live. Only issue we've had was slow queries due to the table size, as we keep an archive of all the events processed, but some scheduled vacuums every so often kept that under control.
We just have a single table, with a column indicating if the job has been taken by a worker or not. Probably could get a bit more performance out of it by splitting into two tables, but it works as it is for now.
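A partial index is one cheap trick for that setup - hypothetical names, but the idea is that the index only contains rows still waiting for a worker, so the claim query stays fast no matter how big the archive gets:

    CREATE INDEX events_pending_idx
        ON events (created_at)
        WHERE NOT taken;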
I do enjoy using https://github.com/graphile/worker for my postgresql queuing needs. Very scalable, the next release 0.14 even more so, and easy to use.
Running this exact implementation with 47M jobs processed and counting. SKIP LOCKED is great for VACUUM, and having durable storage with indexes makes otherwise expensive patterns like delayed jobs, retries, status updates, "at least once", etc. really easy to implement.
I'm sure Redis is much faster than an RDBMS with all the ACID features turned on. The biggest concern I always have with Redis is simply overwhelming the in-memory storage limits when someone wants to process a large number of good-sized messages at an inconvenient time. #tradeoffs
During my tenure as CTO at a fintech company I built a banking engine on a Postgres-backed queue system using Elixir / Phoenix. It's still in use today. The company processed large volumes of transactions, and we were able to do payments in real time. Our system reached a point where I realized that we could scale almost infinitely with just a 2-tier architecture (Elixir / Phoenix / Oban and PostgreSQL).
The industry standard for real-time was anything under 30s end-to-end. We hit many scaling limits as our platform grew. We worked on improving the system instead of looking at postgresql as a problem. We measured and measured and postgresql was never the problem when it came to scaling. Most of the problems that existed was mostly due to design decisions in the system or some technical debt that needed to be solved. When we solved those problems the system flew.
There was one incident where our PostgreSQL instance's CPU usage went up to 100% during peak hours and would cause problems in the system. We wondered why, and there was a lot of debate around the topic: PostgreSQL doesn't scale, we need a larger instance, we should move to a dedicated queue system, etc.
I didn't believe the hypothesis that PostgreSQL didn't scale. So I conducted a performance audit on the system myself and discovered that one of the engineers had forgotten to add an index on one of the columns of a major table with > 60m records. There was a lot of back and forth between engineers about whether we should add the index. I simply conducted a small experiment, and PG analyze clearly showed a missing index on one of the key tables. After we added a single line of code and deployed the index, DB CPU usage during peak hours would not even exceed 20%, and we had to scale down our DB to save some $$.
PostgreSQL backed queue system does work.
I've been building distributed systems at large and small scales for over a decade. Before building systems with Elixir / Phoenix I also used to work with Rails, where the default, as mentioned in this article, is to just adopt Redis. While Redis does add overhead in managing the infrastructure, it also worked well.
Having been through both, I'd say a simple PostgreSQL-backed solution is the clear winner when it comes to not having to manage a third dependency on top of an already complex system. There is no 3rd or 4th piece of the puzzle to theorize, hypothesize, or experiment with. You have just 2 pieces: it is either your application or your database that's the problem. That simplicity saved us a lot of time otherwise spent on debate, and doing experiments also squashed any dogmatic opinions that just didn't hold water.
Interesting how the immediate reaction is “postgres does not scale” when there is a single table lacking an index.
This also tells how important competence and knowledge of the system is. People that came in new and didn’t know the system like you do probably lacked the confidence/skills to just “get in” like that.
Yeah, though I think what happened in this scenario probably happens a lot everywhere else too. In my entire career, this type of scenario has been very typical. Lots of meetings, discussions, and standups, talking uselessly without getting face to face with the actual problem. Things get in the way of the science and the facts. Which is why it's important to remove fear, think from first principles, break things down, and get your hands dirty.
There was a fear that creating an index on a table that large would take a long time, and I think some of it was also ego: "I intentionally didn't add it, because of so-and-so reason." This was why I dug in and did my thing, debunking all the fear / opinions / rationalization. Sometimes you just gotta be able to tell people they're wrong, supported with empirical evidence. That's how the team will grow. There is just no need to dance around facts. I remember having to tell the team, "taking a long time to build an index is no reason to avoid creating the index".
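For what it's worth, the long-build fear has a standard answer these days (hypothetical names; it's slower and can't run inside a transaction block, but writes keep flowing while it builds):

    CREATE INDEX CONCURRENTLY payments_account_id_idx
        ON payments (account_id);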
>> I simply conducted a small experiment and PG analyze clearly showed a missing index in one of the key tables.
Based on this sentence, I interpreted that part as representing that the engineers did not believe the missing index was causing the problem (until the experiment was run).
Yes, one of the theories was that the index wasn't the problem because there was already a multi-column index covering that particular column. However, PG analyze showed that a particular query didn't utilize that index, so there needed to be a separate index just for that column.
There are a surprising number of reasons why an RDBMS - especially Postgres - can choose not to use an index. Sometimes it's your fault, sometimes it's the table statistics' fault.
Good on you for actually empirically determining reality.
For running queues on Postgres with Node.js backend(s), I highly recommend https://github.com/timgit/pg-boss. I'm sure it has its scale limits. But if you're one of the 90% of apps that never need any kind of scale that a modern server can't easily handle, it's fantastic. You get transactional queueing of jobs, and it automatically handles syncing across multiple job-processing servers using Postgres locks.
You don't even need a database to make a message queue. The Linux file system makes a perfectly good basis for a message queue since file moves are atomic.
My guess is that many people are implementing queuing mechanisms just for sending email.
You can see how this works in Arnie SMTP buffer server, a super simple queue just for emails, no database at all, just the file system.
This is true, and I’ve worked on systems that use this, but it’s a lot more work than just a rename.
I’d recommend that, if you have a Postgres database already, definitely use that instead. Your queues will be transactional and they will get backed up when the rest of your database does.
Well, if you have multiple writers then you need to decide who’s responsible for rotating the queues, and you need to serialise writes; or if each writer has its own queue then the reader has to do more work. And then you need to worry about fsync, and backup. And of course you need to be careful to flush to the queue after each write to avoid partial writes.
Basically I’m saying that there are just a number of potential footguns when using files as queues - I speak from experience! - which are trivially taken care of by a database, especially if you have one already.
I’m not saying that it’s not possible, just that for non trivial applications, it’s certainly more complex than just an atomic file move.
To do anything safe and interesting you’ll need transactions. Using SKIP LOCKED won’t be your bottleneck, your application will. Job queues are about side effects and the rest of your application needs to keep up.
Pretty common advice for scaling Postgres is to deploy pgbouncer in transaction mode in front of it to handle connection pooling.
Advisory locks don’t work in this setup (and will start behaving in strange ways if you do try to use them.) Something to consider if you go this route.
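If you do need advisory locks behind a transaction-mode pooler, the transaction-scoped variant is the one that behaves - a minimal sketch:

    BEGIN;
    -- released automatically at COMMIT/ROLLBACK, so it never leaks
    -- across the pooler's connection reuse the way session-level
    -- pg_advisory_lock can
    SELECT pg_advisory_xact_lock(42);
    -- ... do the work the lock guards ...
    COMMIT;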
Maybe it's changed in the last year or so, but from benchmarking and writing / running queue software for Postgres - SKIP LOCKED was/is significantly faster. Is that different for MySQL?
Another point is that task queue technology is highly fungible. There's nothing stopping you from starting with cron, adding in Postgres/Redis, then graduating to Kafka or something as need arises. And running all three in parallel, with different jobs in each. I would be surprised if the average Kafka shop didn't also have a bunch of random cron jobs doing things that could be implemented on Kafka or vice versa.
At some point you may want to refactor things to reduce tech debt, but it really is a "and" rather than "or" decision.
I agree with all of this except for the part about “cron”. Cron jobs in my experience quickly become hard to manage and effectively invisible over time.
Use almost anything else to manage job scheduling….
This is exactly what the Oban https://getoban.pro/ Elixir library uses, and combining Postgres plus actors for queues scales great for 90% of the needs out there. I have used it at my last few jobs at pretty decent scale, and I would take it over my 10 years using Celery to manage queues + supervisord, or setting up RabbitMQ or Redis. It's so simple: you only need Elixir and Postgres, not 3 or 4 infrastructure pieces, to manage a queue.
Temporal, which AFAIK was made by the Uber Cadence team, which was also involved in SQS, uses postgres as a backend.
I used it for a web automation system for an accounting client (automatically read files from a network share, look up the clients in a database, submit the documents to government websites using headless browsers, and put the resulting files back in the directory). It allows for completely effortless deterministic programs that call workers running the non-deterministic code, with built-in configurable retries (react to certain exception types, exponential back-off), so you can write code that works almost as if there were no issues with API connections, the filesystem, etc.
This code has been running for 5 or more years, with barely any maintenance, with 0 issues so far. It keeps everything in postgres, so even full reboots and crashes have no impact, it will just move the work back to the queue and it will run when there's an available worker.
Temporal is a pretty complicated system. It has sharding built in, stores the entire activity history and runs multiple queues for timers and events. I’m a big fan (worked at Uber) but it’s definitely not just postgres with a few indices.
I maintain QueueClassic (https://github.com/QueueClassic/queue_classic) for Rails/Ruby folks, which is basically what you're talking about - a queuing system for Postgres. A bonus reason, and why I originally wanted this, was the ability to use transactions fully - i.e. I can start one, do some stuff, add a job into the queue (to send an email)... and either commit, or roll back - avoiding sending the email. If you use resque, I found sometimes either you can't see the record (still doing other stuff and it's not committed), or it's not there (rollback) - so either way you had to deal with it.
QC (and equivs) use the same db, and same connection, so same transaction. Saves quite a bit of cruft.
> For example, this Hacker News comment stated that using Postgres this way is “hacky” and the commenter received no pushback. I found the comment to be a load of BS and straw-man arguments. This thinking seems to be “the prevailing wisdom” of the industry – if you want to talk about queue technology in public, it better not be a relational database.
I don't think that there's anything wrong with using a database as a queue, however, I think that there probably could have been better ways to get across the idea, rather than just dismissing an honest opinion as BS. I don't necessarily agree with all of what was said there, but at the same time I can see why those arguments would be reasonable: https://news.ycombinator.com/item?id=20022572
For example:
> Because it is hacky from the perspective of a distributed system architecture. It's coupling 2 components that probably ought not be coupled because it's perceived as "convenient" to do so. The idea that your system's control and data planes are tightly coupled is a dangerous one if your system grows quickly.
To me, this makes perfect sense, if you're using the same database instance for the typical RDBMS use case AND also for the queue. Then again, that could be avoided by having separate database instances/clusters and treating those as separate services: prod-app-database and prod-queue-database.
That said, using something like RabbitMQ or another specialized queue solution might also have the additional benefit of bunches of tutorials and libraries, as well as other resources available, which is pretty much the case whenever you have a well known and a more niche technology, even when the latter might be in some ways better! After all, there is a reason why many would use Sidekiq, resque, rq, Hangfire, asynq and other libraries that were mentioned and already have lots of content around them.
Though whether the inherent complexity of the system or the complexity of your code that's needed to integrate with it is more important, is probably highly situational.
1. Many queueing solutions struggle with at most once delivery. I've used a queue I built on top of RDBMS to solve this before. Ironically, when I was a junior we used a key/value store to lock certain messages for at most once delivery, which doubly compounds my next point.
2. A queue is more infrastructure to manage, and hard to troubleshoot infrastructure at that.
3. Many queueing systems struggle with deserialization into native types, which makes the code on either end unnecessarily obtuse. Using RDBMS queues makes your serialization as good as your drivers'.
One issue with Redis as a queue backend seems to be that persistence is quite expensive, at least for managed Redis instances. Using PG seems like it could be much cheaper, especially if you already have an instance with room to spare.
I thought it was an interesting article, and I'd love to hear more from people using PG for queues in production (my intuition would say you'd get a lot of table bloat and/or vacuum latency, but I haven't tested it myself), but when it comes to the conclusion - "choosing boring technology should be one’s default choice" - I can't think of anything more boring (in a good sense, mostly) than Sidekiq + Redis for a Rails app.
Same here. Sidekiq + Redis in a Rails app is a powerhouse, simple and reliable, but I do worry about losing the queue in Redis. It would be great to have that in Postgres as well.
There are a few mentions of Oban [1] here. Most people don't realise that Oban in fact uses SKIP LOCKED [2] as well.
Oban's been great, especially if you pay for Web UI and Pro for the extra features [3]
The main issue we've noticed, though, is that with its simple lock-based fetching, jobs aren't distributed evenly across your workers because of the greedy `SELECT...LIMIT X` [2]
If you have long-running and/or resource-intensive jobs, this can be problematic. Let's say you have 3 workers with a local limit of 10 per node. If there are only 10 jobs in the queue, the first node to fetch available jobs will grab and lock all 10, with the other 2 nodes sitting idle.
The way I implement my queues (usually as part of my monolith application) is as goroutines. Each instance of the app launches with a unique id and a role: it can be a worker or the app itself. When the app generates a queue item, it simply adds it to a table as pending. A worker will then, in a transaction, update a set of items to add its instance id as well as an expiration for this lock. If that succeeds, no other worker will pull a queue item whose instance id is set to a different worker and whose lock hasn't expired. The worker can then start processing and update item status accordingly. If it crashes, another worker will just repeat the process after the lock expires.
The code that does this is maybe 100 lines at most. It’s very effective especially if you deploy your app in kubernetes where you can expect instances to be ephemeral. It’s one of the components of my apps that has never needed any updates since I first wrote it circa 2017.
They poll. It's easy, however, to add pub/sub via Redis. Queues are categorized by frequency. So for example a notification queue might process every minute or 5 minutes, while a reconciliation queue might run every 12 hours, or at fixed times like daily at 2am and 6pm, etc. Each queue runs on its own goroutine on its own schedule, and every worker runs all queues. I can also adjust how many items a worker can pull each time so queues do not get backed up. Say there is some criteria an account needs to meet to be eligible for some feature. If that criteria requires some expensive db queries, I can run a daily job that pulls 100 accounts each time to check that they meet the criteria. But if the instances can process more, I can configure it to pull 1000 accounts. Or I can add 9 workers and each pulls 100 accounts.
Usually what I do is pull the records that were updated least recently (as in, they should be at the head of the queue). So if a previous worker locked the oldest X records, the second worker will pull the next batch, because the condition excludes the previously updated (locked) records. There's a lot of flexibility you can get from just these controls: schedule, frequency, batch size, number of workers.
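A SQL sketch of that claim step, with hypothetical names - note the null-or-expired guard is repeated in the outer WHERE so a row claimed by a concurrent worker in the meantime can't be overwritten:

    UPDATE queue_items
    SET instance_id = 'worker-7f3a',
        lock_expires_at = now() + interval '15 minutes'
    WHERE (instance_id IS NULL OR lock_expires_at < now())
      AND id IN (
          SELECT id FROM queue_items
          WHERE status = 'pending'
            AND (instance_id IS NULL OR lock_expires_at < now())
          ORDER BY updated_at ASC
          LIMIT 100
      )
    RETURNING id;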
I've implemented queues with tables in RDBMSs a few times and it's always great and usually all you need. Worried about future scale? Make a class to wrap the queue behind a decent interface and swap it for RabbitMQ or whatever you want down the road. The implementation stays opaque and you have an easy upgrade path later on.
In my Amazon team, we use PostgreSQL as a queue using skip-locked to implement transactional outbox pattern for our database inserts. People commenting 'just use a queue' are totally missing the need for transactional consistency. I agree with the author, it's an amazing tool and scales quite well.
> As an industry, we’ve become absolutely obsessed with “scale”.
I wish the industry was even half as concerned with efficiency as it was with scale. Bitcoin? Electron? 5MB web pages? 5/10/25GB downloads to run 20yr old CD-sized games on modern software?
I have been involved in a few projects using postgres-backed queues for a few years; scale hasn't been a problem so far.
On the other hand, I have done a few experiments with postgres LISTEN/NOTIFY, and while the feature seems nice at first glance, I concluded that it wasn't worth it for our use cases. Maybe it is different in other languages, but on the JVM you have to allocate one thread for polling these results, which also keeps a connection busy.
What I ended up doing is leveraging akka-stream to stream the queue data directly from the db, which makes it simple to define throttling rules. This is super simple and effective.
It's about the challenge of matching up transactions with queues - where you want a queue to be populated reliably if a transaction completes, and also reliably NOT be populated if it doesn't.
Brandur's pattern is to have an outgoing queue in a database table that gets updated as part of that transaction, and can then be separately drained to whatever queue system you like.
If you're on a cloud provider, I'd say just use their offering. For small/medium amounts of messages (single digit millions a day) the cost will be trivial.
A lot of frameworks already have queue/job libraries with adapters (so you're not really locked in) and cloud providers are highly scalable and fault tolerant.
It seems silly to try to build into Postgres something that is already cheap and readily available unless you find yourself in a situation where standing up additional infra is hard (embedded, certain on premise)
> If you're on a cloud provider, I'd say just use their offering. For small/medium amounts of messages (single digit millions a day) the cost will be trivial.
It's a good compromise but not suitable for every use case.
The thing I really don't like is that you need to be connected to the cloud even for local development and test.
The AWS SDK can stub responses. In general, I usually recommend creating local/in-memory implementations of infrastructure so you don't end up running a gigantic stack locally. You can write tests against the real integration that selectively run against a real environment, and write contract tests that cover the real implementation and your local stubs (or maybe interfaces in strongly typed languages are sufficient).
Some languages/frameworks have this built in (a lot of ORMs do this out of the box). A lot of frameworks also have facilities for conditionally wiring in different implementations based on runtime config.
When I wrote my own background task queue I looked at Postgres, because it was already in use in the stack. Postgres would work for a simple queue, but supporting queue priorities, delayed/ETA tasks, and broadcast tasks was too complicated. I decided on Redis, and it's scaled very well over the last year.
I've done the Postgres skip locked thing at least three times and I'm currently doing it, but IMO it is actually more maintenance and overhead, not less -- at least when compared with the queues made available by the major cloud providers. Compared with Pubsub or SQS you need to handle:
* Metrics, monitoring, alarming on depth, message age
* Autoscaling on your custom metrics
* Managing retries, dead lettering, backoff
* Managing the DB workload: it's update-heavy and may easily be more intensive than the rest of your app. You may need to repeatedly scale your tiny, startup-scale DB, causing customer disruptions, because of your homemade queue.
The arguments for it are either avoidance of lock-in, or the hand-wavy one in this article/many comments: "we're a small startup, we need to minimize complexity, and make things as simple as possible, maintenance will kill us!".
Lock-in makes sense (though other queue alternatives should still be considered), but the latter argument gets used to justify all kinds of harebrained, superficially-simple but actually-complicated engineering schemes. I generally put the PG skip locked approach in that bucket, particularly when the alternative on hand is Pubsub or SQS. If it's between a Postgres table and ActiveMQ I might feel more conflicted.
Something that always bothers me about "Use Postgres as a queue" (something I would suggest, even) is that there are shockingly few people publishing numbers around this. How am I supposed to know what scale I can handle with this solution? I've seen so, so few benchmarks, and maybe no recent benchmarks - a problem since performance has significantly changed across versions.
I just want to commend OP - if they’re here - for choosing an int64 for job IDs, and MD5 for hashing the payload in Neoq, the job library linked [0] from the article.
Especially given the emphasis on YAGNI, you don't need a UUID primary key, with all the problems it brings for B+trees (that thing the RDBMS is built on), nor do you need the collision resistance of SHA256 - the odds of you creating a dupe job hash with MD5 are vanishingly small.
As to the actual topic, it’s fine IFF you carefully monitor for accumulating dead tuples, and adjust auto-vacuum for that table as necessary. While not something you’d run into at the start, at a modest scale you may start to see issues. May. You may also opt to switch to Redis or something else before that point anyway.
EDIT: if you choose ULID, UUIDv7, or some other k-sortable key, the problem isn’t nearly as bad, but you still don’t need it in this situation. Save yourself 8 bytes per key.
Some time ago, I wrote a queue using SQLite[0]. Instead of SKIP LOCKED, you can use RETURNING to lock-and-read a message and ensure only one worker is going to pick it up:
UPDATE ... SET status = 'locked' ... RETURNING message_id
Or you can just use an IMMEDIATE transaction, SELECT the next message ID to retrieve, and UPDATE the row.
On top of that, if you want to be extra safe, you can do:
UPDATE Queue SET status = 'locked' WHERE status = 'ready' AND message_id = '....'
To make sure that the message you are trying to retrieve hasn't already been locked by another worker.
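Putting those two pieces together (SQLite 3.35+ for RETURNING; hypothetical columns) - zero rows back means another worker won the race, so just retry:

    UPDATE queue
    SET status = 'locked'
    WHERE message_id = (
        SELECT message_id FROM queue
        WHERE status = 'ready'
        ORDER BY created
        LIMIT 1
    )
    AND status = 'ready'
    RETURNING message_id;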
Could it be people are choosing over-engineered solutions because AWS has bad documentation and that's what the solutions architects tell them to do?
Could it be because microservices and so-called "server-less" have been sold as cost-saving measures that increase the business' flexibility and decrease capital investment?
When... in reality a single deployed Docker container is way more manageable than a distributed system constructed with "lambdas" and requires fewer engineers in the long run?
What I'm trying to say is that FAANG cargo-culting is only part of why developers choose to build solutions that scale larger than they need. Another large part is the cloud development ecosystem writ-large and the consulting culture that has built up around it.
All of shortwave.com is built on this concept. The super powerful bit here, as mentioned, is that enqueuing jobs is transactional with your other normal transactions, so you don't have to architect around using a different system for the queue than for the rest of your data.
Alternatively if you just want to quickly hack something into your application, here is a complete solution in one Python function with retries (ask ChatGPT to tell you what the table structure is):
import psycopg2
import psycopg2.extras
import random

db_params = {
    'database': 'jobs',
    'user': 'jobsuser',
    'password': 'superSecret',
    'host': '127.0.0.1',
    'port': '5432',
}

conn = psycopg2.connect(**db_params)
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

def do_some_work(job_data):
    # stand-in for real work; fails half the time to exercise the retry path
    if random.choice([True, False]):
        print('do_some_work FAILED')
        raise Exception
    else:
        print('do_some_work SUCCESS')

def process_job():
    # atomically claim and remove the oldest 'new' queue entry; SKIP LOCKED
    # lets concurrent workers claim different rows without blocking
    sql = """DELETE FROM message_queue
    WHERE id = (
        SELECT id
        FROM message_queue
        WHERE status = 'new'
        ORDER BY created ASC
        FOR UPDATE SKIP LOCKED
        LIMIT 1
    )
    RETURNING *;
    """
    cur.execute(sql)
    queue_item = cur.fetchone()
    if queue_item is None:
        # queue is empty; nothing to do
        conn.commit()
        return
    print('message_queue says to process job id: ', queue_item['target_id'])
    sql = """SELECT * FROM jobs WHERE id = %s AND status = 'new_waiting' AND attempts <= 3 FOR UPDATE;"""
    cur.execute(sql, (queue_item['target_id'],))
    job_data = cur.fetchone()
    if job_data:
        try:
            do_some_work(job_data)
            sql = """UPDATE jobs SET status = 'complete' WHERE id = %s;"""
            cur.execute(sql, (queue_item['target_id'],))
        except Exception:
            sql = """UPDATE jobs SET status = 'failed', attempts = attempts + 1 WHERE id = %s;"""
            # if we want the job to run again, insert a new item into the message queue with this job id
            cur.execute(sql, (queue_item['target_id'],))
    else:
        print('no job found, did not get job id: ', queue_item['target_id'])
    conn.commit()

process_job()
cur.close()
conn.close()
I feel like one of the problems with using Postgres as a queue is that it's hard to get started. There's a lot you need to know. Getting started with something like Pub/Sub on GCP is much easier for many developers.
I’ve experimented with making this easier via libraries that provide high-level APIs for using Postgres as a queue and manage the schemas, listen/notify, etc for you: https://github.com/adriangb/pgjobq
It seems like listen/notify doesn't play well with a serverless architecture. Would it make sense for Postgres to make a web request when there's work in the queue? Is that a thing?
For those using simple SELECTs, what kind of WHERE clause are you using that works well with lots of qualified pending messages and (somewhat) guarantees the most appropriate (oldest?) message?
The official JDBC driver for PostgreSQL provides no way for the database to push events to the client. So you need a dedicated connection to continuously poll the database to see if there are any events available. This seems half-baked and does not give me the warm and fuzzy feeling I crave when making architectural choices. Not to mention, it causes undesirable latency in event delivery unless I flood the database with queries.
This has its place. I would gladly use Postgres queues in places where I use redis queues today; but I would not consider replacing my SQS queues with Postgres.
I've been thinking a lot lately about how much of tech, and life more broadly, is ruined by the pursuit of scale.
Taking on problems you don't (and will never) have because some vanishingly small minority has experienced them is nuts. Over-engineering is as incorrect as under-engineering. The correctly sized, correctly complicated answer is what we're after.
> Over-engineering is as incorrect as under-engineering.
I know what you mean. The catch is figuring out 'the correctly sized' answer. Uncertainty is a huge factor here.
I think a lot of projects that are called 'over-engineered' have the benefit of hindsight. At least some of such projects were rationally designed based on what people knew at the time. There is often significant uncertainty. When engineering resources are hard to acquire, the powers-that-be often want to 'get it right' the first time. Translated: over-engineer it, because we don't know when we'll get resources for the next iteration.
I suppose my point is this: the next time somebody finds some 'over-engineered' code, it can be insightful to learn about the project history. (Example: some projects have overly optimistic prediction of usage. This perhaps suggests that pessimism is one antidote to over-engineering.)
>Postgres queue tech is a thing of beauty, but far from mainstream.
A small nitpick. FOR UPDATE SKIP LOCKED was mainstream (in a sense) way before it was copied from Oracle to Postgres. It's used under the hood in most non-trivial Oracle deployments. Like for replication or refreshing materialized views in parallel.
Not sure this particular article brings anything new to the table, but it's nice to spread the word.
Been meaning to build an Orleans stream provider for Postgres.. I believe that's the main missing component that would allow everything to "JustWork" with Postgres until you outgrow it.
I wrote a hacky version of SKIP LOCKED using advisory locks and a recursive CTE before it was released for a job queue. It worked splendidly, along with the transactional semantics of a proper database. I’m surprised more systems don’t realize they need it.
I don't have a problem with this approach at all, but I would argue that for many use cases a redis backed job library like Celery or Sidekiq might be even easier, especially if you are on a cloud provider that offers managed redis.
I’m not against using Postgres for this. But I am against rolling your own distributed task queue. It always seems like a simple task but snowballs in complexity. Any gains you get simplifying your stack will be wiped out by the fact that things like Celery (for example) don’t support using Postgres as a broker, so now you have to build your own DIY Celery instead of, say, just using Celery with the SQS broker (which… since we’ve established scale isn’t being considered here, SQS costs shouldn’t be an issue either).
Anyone know if there are Celery or Celery-like tools that support Postgres as a broker?
As a side-note, if you want a simple no-frills task scheduler ap-scheduler is a dead simple option. It’s even more limited than the solution described in OP (you can only run one worker so it’s not distributed at all) but often it is all you need especially for toy projects.
> I’m not against using Postgres for this. But I am against the rolling your own distributed task queue.
Good thing I didn't listen to your advice... my DIY background task queue saved my website when Celery couldn't scale. Why are you against rolling your own task queue besides it seeming complicated?
I hate to point out something uncomfortable, but this guy's combative writing style and penchant for abstract art reminds me of the articles by a certain infamous creator of the ReiserFS file system for linux.
I'm in the market for a Postgres-backed queue system with client libraries in NodeJS _and_ Python. Clients in both languages need to be able to read and write from the queue. Can anybody suggest one?
PGMQ does not require a client library (https://github.com/tembo-io/pgmq), so long as your language of choice can run SQL. All the functions live in Postgres, and you just call them with SQL statements. Very similar feel and semantics to SQS.
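From memory, the calls look roughly like this - check the README for exact signatures, since the API has shifted between versions:

    SELECT pgmq.create('tasks');
    SELECT * FROM pgmq.send('tasks', '{"hello": "world"}');
    -- read one message, hiding it from other readers for 30 seconds
    SELECT * FROM pgmq.read('tasks', 30, 1);
    SELECT pgmq.delete('tasks', 1);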
there's an important dimension of scalability that I think gets overlooked in a lot of these discussions about database-as-a-queue vs queue-system-as-a-queue:
are you queuing jobs, or are you queuing messages?
that's a fuzzy distinction, so somewhat equivalently, what's the expected time it takes for a worker to process a given queue item?
at one end, an item on the queue may take several seconds to a minute or longer to process. at the other end, an item might take only a few milliseconds to process. in that latter case, it's often useful to do micro-batching, where a single worker pulls 100 or 1000 items off the queue at once, and processes them as a batch (such as by writing them to a separate datastore)
the "larger" the items are (in terms of wall-clock processing time, not necessarily in terms of size in bytes of the serialized item payload) the more effective the database-as-a-queue solution is, in my experience.
as queue items get smaller / shorter to process, and start to feel more like "messages" rather than discrete "jobs", that's when I tend to reach for a queue system over a database.
for example, there's a RabbitMQ blog post [0] on cluster sizing where their recommendations start at 1000 messages/second. that same message volume on a database-as-a-queue would require, generally speaking, 3000 write transactions per second (if we assume one transaction to enqueue the message, one for a worker to claim it, and one for a worker to mark it as complete / delete it).
can Postgres and other relational databases be scaled & tuned to handle that write volume? yes, absolutely. however, how much write volume are you expecting from your queue workload, compared to the write volume from its "normal database" workload? [1]
I think that ends up being a useful heuristic when deciding whether or not to use a database-as-a-queue - will you have a relational database with a "side gig" of acting like a queue, or will you have a relational database that in terms of data volume is primarily acting like a queue, with "normal database" work relegated to "side gig" status?
1: there's also a Postgres-specific consideration here where a lot of very short-lived "queue item" database rows can put excessive pressure on the autovacuum system.
I’ve used PG as a message queue, actually it was used as a transactional front end to Kafka; we’d push messages to a PG table during a transaction, which would then be snarfed up to Kafka by a separate process after the transaction completed.
I’ve seen very high transaction rates from this arrangement, more than 20k messages/second.
We have our own queue, because it was easy, fun, and above all else has been exceedingly reliable. Far more so than other things we had tried. Cough Gearman cough SQS cough.
One endpoint accepts work to a named queue, writes it to a file in an XFS directory. Another locks a mutex, moves the file to an in progress directory and unlocks the mutex before passing the content to the reader. A third and final endpoint deletes the in progress job file. There is a configurable timeout, after which they end up at a dead letter box. I am simplifying only a little bit. It's a couple hundred lines of Go.
The way this is set up means a message will only ever be handed to one worker. That simplifies things a lot. The workers ask for work when they want it, rather than being constantly listening.
It took a little tuning but we process a couple billion events a day this way and it's been basically zero maintenance for almost 10 years. The wizards in devops even figured out a way to autoscale it.
They ask for work after they finish the previous job (or jobs, they can ask for more than one). Each worker is a single process built just for one task.
If there's no work for them there's a small timeout and they ask for more. Simple loop. It's all part of a library we built for building workers. For better or worse, it's all done over http.
You are right, though, it is one XFS volume per queue instance.
We just run multiple instances (EC2) behind a load balancer. Each instance of the queue gets its own set of workers, though, so the workers know the right server to report done to.
We want a way to have a single pool of workers, rather than a pool per queue instance, and have them talk to the load balancer rather than directly, but we haven't come up with a reasonable way to do that.
I like how GCP cloud tasks reverses the model. Instead of workers pinging the server asking for work, have the queue ping the worker and the worker is effectively a http endpoint. So you send a message to the server, it queues it and then pings a worker with the message.
Ooh, that's kind of interesting. Am I reading this right that it holds the HTTP connection open for up to thirty minutes waiting for the work to complete? That's kind of wild.
Indeed. If you're hitting AppEngine or GCP Functions, they auto scale workers up for you to manage long running tasks. Ideally though, you finish as quickly as possible by breaking the work down into more tasks. That way, you can parallelize as much as possible.
It is all configurable, but I've scaled up to hundreds of workers at a time to blast through tasks and it wasn't expensive at all.
Workers being an HTTP endpoint makes them super easy to implement and even better... write tests for.
I love Task Queues. We are using them extensively. Also, they give you deduplication for free, plus a lot of other nice features like delayed tasks, storing tasks for up to 30 days, extremely detailed rate limits, etc.
Yea, this is the only thing I don't like about them, that I can't test them locally.
More generally, is there something like an "on-prem cloud" which just replicates, say, Cloud Tasks (but also other Cloud APIs) using local compute as well as, say, a local db? For testing / development this would be very cool.
Built very similar but on S3. Jobs have statuses, land in /jobs, indexed by status at /indexes-jobs/PENDING, etc. Scheduler polls for jobs in PENDING index, acquire lock, pass job to processor, change its status to COMPLETE or DEAD.
~300 LOC or so and fairly easy to test. Wouldn't take that approach every time, but definitely worth it when you're aiming for a simple architecture.
Why files though, and why move them into different directories? You said billions a day. With files, the physical drive must be taking a beating. Not to mention potential issues with directory file limits (depending on OS and file system). Why not use some KV db?
As I understand (correct me if I'm wrong, it's been forever since I've worked with filesystems) - file renames are very cheap as the actual data does not get moved, simply a journal gets updated
Our industry is full of shysters pushing their own technology. Time and again, it turns out that a RDBMS will handle that job just fine. That's really the premise of this article.
So please, go on and back up your bold statements with some specifics. Why specifically is it not OK to use a database as a message queue?
Good counter-point. I see these main points over there:
1. It doesn't scale (there it is again)
2. Queuing with Postgres is super fiddly to get right
3. You're hacking a queue on top of something that isn't a queue
4. Running redis or rabbit isn't all that complicated
#1 as TFA argues, premature concern about scaling is the root of so much needless complexity. You should make scaling decisions like this: 1) assume boring tech like PG will satisfy your needs; 2) if it demonstrably does not, then find something that does.
#2 is obviously true; just look at this thread. There are battle-tested queuing libraries in most popular languages, but you do have to dig into the details of how they interact with things like pgbouncer.
#3 I guess so? But if the queue abstraction works and isn't leaky, what does it matter?
#4 can be debated. I've been running a moderately complicated setup with 2 databases, redis, and kafka for several years. There's no way I'm going to add another piece of tech unless there's no other choice. The cognitive cost is too high.
The main debate is in the tradeoff between #2 and #4. Personally, if I can use an existing piece of tech to solve a problem to avoid having to ops another piece of tech, then I'm going to do that every time.
Terrible idea. Using a database for queues means using a storage format that's not optimized for this, so unless you never delete rows (leaving an ever-growing table) you'll run into performance issues. Also the index on a processed flag will cause contention.
I'm certain you can work around those issues, but why if you can use a proper queue?
My main issue with pretty much all queue approaches is that they don't work across platforms. They are built for one technology stack, be it Python/NodeJS/etc. This is fine if you've only got one stack, but in a microservices world it doesn't work where jobs can span multiple systems. You might be able to find some abandoned library that supports that queue tech on the other platforms you need, but now you've basically become a queue tech maintainer.
> My main issue with pretty much all queue approaches is that they don't work across platforms.
What technology stack are you working in which doesn't support postgres?
> but in a microservices world it doesn't work where jobs can span multiple systems
The point of the queue system is to be able to span said microservices. You can have an OCaml service picking up from one queue, processing, then writing into another queue. That queue could then be processed by a TypeScript service.
There are queues like SQS and Google Pub/Sub where there are already official clients written for most popular languages. They also have an HTTP API, so you can use them in any language environment that can make HTTP requests.
I don't usually downvote posts, but this article is just garbage - a rant about "the cargo cult of scale" but no actual arguments as to why Postgres is better than redis or sqs or anything else. The main reason that people don't use postgres for this kind of thing isn't some kind of misguided obsession with scalability, it's because postgres is way more complicated to deploy / manage and harder to use for application developers than any of the other options.
The argument is stated-- you shouldn't adopt too many technologies. The author merely says that if you're running postgres already, you should seriously consider it for your queue. They also point out that many of the arguments against psql-as-queue are flawed.