Postgres Task Queues: The Secret Weapon Killing Specialized Queue Services? (fforward.ai)
42 points by olalonde 9 months ago | hide | past | favorite | 14 comments



I use Postgres as a queue for some bits at work using Graphile-worker [0] and it works perfectly. No need for another moving part when the data I need is in the db. Also avoids having to do outbox stuff.

It’s quite cool just how far you can get with postgres

[0] https://github.com/graphile/worker


Yep, graphile-worker seems to be so generally good that you can implement very robust pipelines with it also.

I think of graphile-worker as temporal.io's little sibling.


I'm not well-versed in the use of task queues; doesn't this make one's database layer even more of a bottleneck?

Like, I thought part of the logic of a separate task queue is: 1) TQ operations, such as writing to queue, updating status, etc, involve a lot of writes; 2) If your db is your TQ, that's a lot of extra writes to the DB, which still has to handle requests from the main application code; 3) this will lead you to have to scale your DB layer in some fashion earlier than you would otherwise have to, and scaling the DB layer via, let's say sharding or something, is harder/trickier than setting up or scaling a task queue.

Especially if early/small, who wants to worry about scaling their DB layer unless they have to?

Would appreciate some insight; I have a side project that involves downloading and processing a lot of PDFs; if it sees the light of day, it will need a task queue of some kind. I'm already using Postgres, so it would be tempting to use that as the TQ, so long as that doesn't require more careful management to get working right.
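For a workload like the PDF pipeline described above, the usual "Postgres as task queue" pattern is a jobs table plus `SELECT ... FOR UPDATE SKIP LOCKED` so concurrent workers never claim the same row. A minimal hedged sketch — the table and column names here are illustrative, not from any real schema:

```python
# Sketch of the Postgres-as-queue pattern: a jobs table plus an atomic
# claim query. Schema and names are hypothetical; you'd execute these
# via a driver such as psycopg2, one claim per transaction.

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS jobs (
    id         bigserial PRIMARY KEY,
    payload    jsonb NOT NULL,
    status     text NOT NULL DEFAULT 'pending',
    created_at timestamptz NOT NULL DEFAULT now()
);
"""

# Atomically claim the oldest pending job. FOR UPDATE SKIP LOCKED makes
# other workers skip rows already claimed inside an open transaction,
# so workers never block on (or double-process) each other's jobs.
CLAIM_JOB = """
UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id, payload;
"""
```

A worker loop would run `CLAIM_JOB` in a transaction, process the returned payload, then mark the row done (or back to pending on failure) before committing.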


> Especially if early/small, who wants to worry about scaling their DB layer unless they have to?

If early/small, who wants to worry about putting another piece of infrastructure (RabbitMQ, Redis, etc) in your stack that you have to operate?

It will probably surprise you how far you can go with just Postgres.


> If early/small, who wants to worry about putting another piece of infrastructure (RabbitMQ, Redis, etc) in your stack that you have to operate?

Yeah, I get that. I guess my assumption is kind of that these things are so common and mature at this point, that setting them up isn't that big of a deal. I suppose too that if one is early/small, one should probably use a managed solution for anything possible as long as it isn't crazy expensive; so a managed TQ should be easy to set up, and if you're using managed Postgres, scaling it horizontally should also be relatively easy (?)


Take this with a grain of salt, as I have not used PG Task Queues. I think the answer is there is no right answer here, and that it depends on your personnel and your data integrity/uptime guarantees. If anything, I suspect the only real "mistake" is using something like Kafka unless you really need it, but it's probably not that costly if you do.

Presumably, one easily could deploy a second PG to scale Task Queues independently of the DB layer as needed. So in that respect, there is no real issue, and it might be easier to start off with a single instance and split it up as needed for scale.

The real difference is if you find yourself dealing with subpar decisions from clueless management. Not giving them (or letting them know about) the option to put the queue and DB on the same instance can save your platform from total outages (especially if the queue load is much more variable than the DB load).


What’s the max throughput anyone’s got using Postgres as a queue? And on what spec machine?

Could I conceivably put 200 messages per second through a Postgres queue given a big enough RDS instance?

Wondering if we don’t really need to move to Kafka at work…


Postgres can be tuned to 10k inserts per second; Kafka is definitely overkill for 200/s.
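A quick back-of-the-envelope check of that headroom, using the thread's figures (the three-writes-per-message cost is an assumption, covering insert, claim, and completion updates):

```python
# Rough capacity check: 200 msgs/s against a ~10k inserts/s tuned
# Postgres, assuming each queued message costs about three writes.
msgs_per_sec = 200
writes_per_msg = 3          # insert + claim update + done update (assumed)
tuned_capacity = 10_000     # inserts/s, figure quoted in the thread

load = msgs_per_sec * writes_per_msg
print(load, load / tuned_capacity)  # 600 writes/s, ~6% of capacity
```

Even with the generous per-message write cost, 200 msgs/s uses only a few percent of that capacity.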


Thanks! I will definitely look into this some more.


When you read from the queue you do a SELECT FOR UPDATE, which uses up DB resources (locks). Whether that's an issue depends on how many locks you want to take concurrently and for how long.
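The usual way around that blocking concern is the SKIP LOCKED variant: a plain FOR UPDATE reader waits on a row another transaction holds, while SKIP LOCKED just moves on to the next unlocked row. A toy stdlib simulation of the semantics (rows modeled as `threading.Lock`s — this illustrates the behavior only, it is not Postgres):

```python
# Toy model of FOR UPDATE SKIP LOCKED semantics: each "row" is a lock,
# and a worker claims the first row it can lock without waiting.
import threading

rows = [threading.Lock() for _ in range(3)]

def claim_skip_locked(rows):
    """Return the index of the first row we can lock without blocking."""
    for i, lock in enumerate(rows):
        if lock.acquire(blocking=False):   # SKIP LOCKED: never wait
            return i
    return None                            # every row is claimed

rows[0].acquire()                   # another "worker" holds row 0
assert claim_skip_locked(rows) == 1 # we skip row 0 and claim row 1
```

A plain FOR UPDATE would correspond to `lock.acquire(blocking=True)` on row 0: the second worker would sit waiting until the first transaction committed.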


considering how simple sqs is, i’m confused how building your own queue system on postgres is keeping it simple.

if you’re not using aws, then i understand avoiding setting up an account and some IAM bits, i guess.


If you use some other system, you still end up needing to track all of your tasks and their progression in the database, then identify all the ways they may have gone off track and update their state.


yeah, you’re right


If you already have Postgres as a dependency, then adding a second is almost always more complicated than just using your existing one.



