Postgres is a great pub/sub and job server (2019) (webapp.io)
468 points by anonu on Dec 17, 2021 | 200 comments


Author here! A few updates since this was published two years ago:

- The service mentioned (now called https://webapp.io ) eventually made it into YC (S20) and still uses postgres as its pub/sub implementation, doing hundreds of thousands of messages per day. The postgres instance now runs on 32 cores and 128gb of memory and has scaled well.

- We bolstered Postgres's NOTIFY with Redis pub/sub for high-traffic code paths, but it's been nice having ACID guarantees as the default for less popular paths (e.g., webhook handling)

- This pattern only ever caused one operational incident, where a transaction held a lock, which caused the notification queue to start growing and eventually (silently) stop sending messages. Starting postgres with statement_timeout=(a few days) was enough to solve this.
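
For anyone curious, setting that cluster-wide looks roughly like this (the '3d' value and the queue_worker role are illustrative, not the exact values we used):

    -- Cap how long any statement (and the locks it holds) can run.
    ALTER SYSTEM SET statement_timeout = '3d';
    SELECT pg_reload_conf();

    -- Or scope it to just the role your queue workers connect as:
    ALTER ROLE queue_worker SET statement_timeout = '3d';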

Previous discussion: https://news.ycombinator.com/item?id=21484215

Happy to answer any questions!


> doing hundreds of thousands of messages per day

> The postgres instance now runs on 32 cores and 128gb of memory and has scaled well.

Am I the only one who thinks that's a lot of hardware for that message volume?


I should've clarified: the database handles more than just the "regular" pub/sub; some of the tables have over a billion rows.


Thank you. That makes a lot more sense and explains the value of the approach much better.


I assume that is their main database for everything, not just for pub/sub. One of the big benefits of doing it that way is that you have proper transaction handling across jobs and their related data.


Come on man… you can run the whole thing off of a few-GB instance. Such a huge instance should be able to do about 100k a second!


Potential and actual usage aren't related. They might have a lot of records and reads/writes, but maybe the actual pub/sub isn't that intensive. They seem to be using the same DB for everything.


Be careful not to confuse average load with peak instantaneous load. Bursty workloads are the bane of capacity planners everywhere.


Does postgres scale that well? I would be interested in case studies, as I've not seen many showing it achieving beyond 10k records per second.


Who's to say it can't?


Such a server is $400/mo; a backend developer who can confidently maintain kafka in production is significantly more expensive!


I think the point of interest was 32 cores to handle what sounds like 10 messages per second at most. That's not really a ton of throughput... It's certainly a valid point that an awful lot of use cases don't need Twitter-scale firehoses or Google-size Hadoop clusters.


Ah, the database does a lot more than just pub/sub - especially since the high traffic pub/sub goes through redis. I guess my point was that we never regretted setting up postgres as the "default job queue" and it never required much engineering work to maintain.

For an example, it handles stripe webhooks when users change their pricing tier - if you drop that message, users would be paying for something they wouldn't receive.


It said nothing about the distribution of traffic. It might well be thousands and thousands of pub/sub messages at some points of the day and 0 at others.


Fwiw I don't know the shape of the data, but I feel like you could do this with Firebase for a few bucks a month...


you 100% could, and this thread feels like the twilight zone with how many people are advocating for using an RDBMS for (what seems like) most people's queuing needs.


Dude you are seriously underestimating postgres' versatility. It does so many different things, and well!


I'm not underestimating anything. I am advocating for the right tool for the job. I have a hard time believing, despite the skewed sample size in this thread, that most people think using postgres as a message queue for most cases makes the most sense.


What is your idea of 'most cases'?

I've personally written real-time back-of-house order-tracking with rails and postgres pubsub (no redis!), and wrote a record synchronization queuing system with a table and some clever lock semantics that has been running in production for several years now -- which marketing relies upon as it oversees 10+ figures of yearly topline revenue.

Neither of those projects were FAANG scale, but they work fine for what is needed and scale relatively cleanly with postgres itself.

Besides, in a lot of environments corporate will only approve the use of certain tools. And if you already have one approved that does the job, then why not?


>some clever lock semantics

Most senior+ engineers that I know would hear that and recoil. Getting "clever" with concurrency handling in your home-rolled queuing system is not something that coworkers, especially more senior coworkers, will appreciate inheriting, adapting, and maintaining. Believe me.

I get that you're trying to flex some cool thing that you built, but it doesn't really have any bearing on the concept of "most cases" because it's an anecdote. Queuing systems are a thing for a reason, and in most cases, using them makes more sense than writing your own.


> Most senior+ engineers that I know would hear that and recoil. Getting "clever" with concurrency handling in your home-rolled queuing system is not something that coworkers, especially more senior coworkers, will appreciate inheriting, adapting, and maintaining. Believe me.

I am both a "senior+ engineer" that has inherited such systems and an author of such systems. I think you're overreacting.

Concurrency Control (i.e., "lock semantics") exists for a reason: correctness. Using it for its designed purpose is not horror. Yes, like any tool, you need to use it correctly. But you don't just throw away correctness because you don't want to learn how to use the right tool properly.

I have inherited poorly designed concurrency systems (in the database); yes, I recoiled in horror and did not appreciate it. So you know what I did? I fixed the design, and documented it to show others how to do it correctly.

I have also inherited out-of-the-box "Queuing Systems" that could not possibly be correct because they weren't integrated into the DB's built-in and already-used correctness system: Transactions and Concurrency Control. Those were always more horrific than poorly-implemented in-DB solutions. Integrating two disparate stores is always more trouble than just fixing one single source.

----

> I get that you're trying to flex some cool thing that you built, but it doesn't really have any bearing on the concept of "most cases" because it's an anecdote. Queuing systems are a thing for a reason, and in most cases, using them makes more sense than writing your own.

I get that you're trying to flex that you use turnkey Queueing Systems, but it doesn't really have any bearing on the concept of "most cases", because all you've presented are assertions without backing. Queuing systems are good, for a specific kind of job, but when you need relational logic you better use one that supports it. And despite what MongoDB and the NoSQL crowd has been screaming hoarsely for the past decade, in most cases, you have relational logic.


Well, you'd have to see it before you judge. It's super simple, like 5 or 10 lines total. Handles 1000x+ the traffic it sees. In any case concurrency is nothing to be afraid of. Do they not teach dining philosophers any more?

My point is that postgres is a swiss army knife and you and anyone else would be remiss to not fully understand what it is capable of and what you can do with it. Entire classes of software baggage can be eliminated for "most" use cases. One could even argue that reaching for all these extra fancy specialized tools is a premature optimization. Plus, who could possibly argue against having fewer moving parts?


I guess the clever lock semantics are SKIP LOCKED, which is designed to support efficient queues. The cleverness is inside PostgreSQL rather than in the application, other than the cleverness of knowing about this feature. https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock...
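
For anyone who hasn't seen it, the core of the pattern is roughly this (table and column names here are made up for illustration, not from the article):

    -- Hypothetical jobs table
    CREATE TABLE jobs (
        id         bigserial   PRIMARY KEY,
        payload    jsonb       NOT NULL,
        status     text        NOT NULL DEFAULT 'queued',
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- Each worker claims one job; locked rows are skipped rather than waited
    -- on, so two workers can never grab the same row.
    BEGIN;
    SELECT id, payload
      FROM jobs
     WHERE status = 'queued'
     ORDER BY created_at
     LIMIT 1
       FOR UPDATE SKIP LOCKED;
    -- ...do the work, then mark the claimed row done before committing:
    UPDATE jobs SET status = 'done' WHERE id = /* claimed id */;
    COMMIT;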


Yup, exactly that


No, you are misunderstanding. People are saying Postgres does message broking quite well. That makes it the right tool for the job for many people. You have a hard time believing it but people who have actually done it are saying otherwise. This is your misunderstanding.


There is also the issue of having to have up to n experts for n different "best tools". Programmer/devops time is expensive; the tool choice is not the only (and often the least) cost to consider.


Why should I rely on yet another microservice when I have PostgreSQL right there?


Everything is a nail, why should I use anything but this hammer?


Making every system as complex as you can with tech you are not really familiar with is a good plan for your small team? Under 100 people, your company does not have 100 devops engineers to make sure all these 'best of breed' tools are actually managed properly in production. If a service on top of postgres dies, I will find out why very quickly; with Kafka, even though I have used it a bunch of times, I usually have no clue; just restart and pray. Why would I force myself to use another tool when postgres actually works well enough? Resume-driven development?

Sometimes I agree with best tool for the job; if the constraints make something a very clear winner; if the difference is marginal for the particular case at hand, I pick what I/we know (I would actually argue that IS the best tool for the job; but in absolute 'what could happen in the future' terms it probably is not).


Postgres happens to be a very good hammer, thank you very much. You should try it sometime.

But seriously though, postgres's relational logic implementation makes for a very good queueing system for most cases. It's not a hack that's bolted on top. I know that's how quite a few "DBs" are designed and implemented, and maybe you've been burned by too many of them, but Postgres is solid. I've seen it inside and out.


I think you're helping bring balance to the enthusiasm here for using Postgres as a multi-purpose tool. However, there is a lot of room for you and the advocates favoring Postgres to both be right about tooling. I adopted RabbitMQ because I decided I didn't want to grow into needing it by dealing with many of the problems that motivated engineers to bring RabbitMQ into existence. However, I probably would have been fine with Postgres-pubsub, or Redis-pubsub/streams, both databases that I already used for their general purpose and have established capabilities for messaging. I noticed your earlier agreement with the person who mentioned using Firebase, and Firebase is yet another multi-purpose tool good enough at many things but still not better than the customized domain systems. If you agree with the claim for Firebase, others can now agree about Supabase. It's all Postgres beneath, though.


Agree with your point about multiple tools being good enough, but IMO firebase is not one of them. In my experience despite it claiming to be excellent at scaling, it performs worse than even a small Postgres instance. It’s good at the “real-time subscriptions”, but that’s about it.


Noted. Thanks for sharing your experiences with that. We need to hear more about lackluster investments in tech.


Does Firebase offer self-hosting these days?

What do you say to those who don't want Google to know their usage info?


Check out supabase.com. It is based on postgres.


But Kafka does significantly more.

And if your needs are simpler like in this case then there are dozens of smaller pub/sub/queue systems that you could compare this to.


Limit the types of server used to reduce system complexity. If you can have all your business state in the same place, ops are much easier.

Kafka does more for streaming data, but doesn't do squat for relational data. You always need a database, but you sometimes can get by without a queuing system.


Briefly what are some mandatory kafka use cases?


I would say postgres does much more. What use case can only Kafka handle?


That's the price on a popular cloud platform; you can buy such a server outright for 3-4 times that amount and use it for years.


Got a 128gb 32 core xeon workstation sitting under my desk off eBay and it was $400


That's not exactly a setup suitable for reliable production usage though.


Or rent it for a lot less at a traditional hosting company.


That's the job of a DevOps engineer, not a backend developer you're trying to overwork.


It probably fits within the free tier limits of a managed pubsub service.


I imagine their scaling problem isn't messages/day, it's probably lots of concurrent, persistent connections. And I don't think a connection pooler would work with this job queue setup.


Yeah, every home IoT hub processes more messages than that with less than a Raspberry Pi's worth of compute


I certainly appreciate the sentiment though I'm pretty sure I don't have the same reliability and uptime guarantees on my little Rpi3/MQTT/NodeRed/SQLite/ESP8266 home system :-)

That said, it's been running for upwards of 4 years and accumulated an insane number of temperature readings inside and above heating vents (heat source is heat pump)

SELECT count() as count FROM temperatures : msg : Object { _msgid: "421b3777.908118", topic: "SELECT count() as count FROM …", payload: 23278637 }

Ok, I need therapy for my data hoarding - 23 million temp samples is not a good sign :-)


Curious as to why you aren’t tracking that with a time series database?


It's a hobby diversion, so minimal effort is a factor. That, and that SQL query comes back in seconds. The initial experimentation was with ESP8266's, and MQTT/NodeRed/SQLite played a supporting role.

My experience with SQLite is that it can take you a long ways before needing to look elsewhere.


The IoT hubs are an embedded system, built with a minimal memory footprint and overhead, 512 mb of ram is typical, sometimes less. Here is an example: https://www.gl-inet.com/products/gl-s1300/

That means you can't have docker and different versions of Java, node and .Net all running in parallel.

You run a single process, and SQLite is a library that allows SQL operations and the database to be built in. Your 'budget' is like 100 MB of RAM, because other stuff has to run too.

All the time-series databases I know are a large, memory hungry hippo, built for distributed compute/kubernetes. Just very different usecase. If one was built with minimalism in mind, then it could be used.


Are you implying that given the specs, hundreds of thousands of messages per day is not good enough? I think you are, or at least that is what I was thinking myself.


Only for hundreds of thousands of messages per day, that's way too big of a server. But if you look on the rest of the thread, it doesn't do only that.

Anyway, for a server that only does pub/sub with ACID guarantees, those specs are so large that there is certainly a bottleneck before they matter. So it wouldn't be strange if somebody gets one that can't even handle that, it just would mean that there is some issue somewhere we don't see.


Is your point that the server has room to grow? Or that you just “ain’t impressed by that”?


I guess the point is that the scale is actually not that large, but that's perfectly ok because most problems will never need that large scale either.

In fact, the article makes a very good point: just doing it in postgres is great; it doesn't really scale (because of ACID), but adapting it for scale once you actually need to will lead to a better design than what you would do if you started optimizing without any information.


People are jumping on this. Question is: do the resource requirements outlined align with the usage you described, or is that the combined workload? By combined workload, I mean working set plus messaging. It's not a useful exercise to criticize a service that's multifaceted based on a single use case. Full disclosure: not a Postgres user, nor am I invested in the tech.


It’s such a low throughput requirement I think even bitcoin could support it.


No, this is probably still spiky enough to exceed 7 transactions per second at peak, and that's too much for Bitcoin.


To cherry pick two details of the post and insinuate something about it?

No.


Thanks for the great blog post - still relevant after a few years!

> statement_timeout=(a few days)

wouldn't you want this to be a few seconds or minutes? Maybe I'm missing the point of setting this to days...


We didn't want to deal with the ramifications of statement timeouts in a complex system. The failure mode mentioned (queue filling up) happened on the scale of 6 weeks, so it was very cheap operationally to set this timeout to some high value.


So, just to make sure I understand correctly: notifications are delivered while the notification queue size is increasing (due to the transaction holding a lock), and it doesn’t become a problem until the queue size reaches its maximum, at which point it causes dropped notifications?

But the queue grows precisely because some notifications aren’t getting delivered, right?


Since it's pub/sub, you just need one misbehaving client that LISTENs in a transaction to have problems; the other clients can still receive and process the NOTIFY event


"hundreds of thousands of messages per day"

This is not much load at all, an iPhone running RabbitMQ could process many millions of messages per day. Even 1M messages per day is only 11 messages per second average. i.e. not taxing at all.


I've built software that can process millions of messages per second on a single thread.

I find it amusing that we happily play these AAA gaming experiences that are totally fantastical in their ability to deal with millions of things per frame and then turn around and pretend like hundreds of thousands of things per day is some kind of virtue.


I assume you use polling workers looking for the next job to grab for themselves?

Personally I do see the niceness of having a good pattern implemented using existing technology. Less deployment nonsense, less devops, less complexity, a few tables at most. I've done similar things in the past, it is nice.

For anyone who'd criticize, having complex deployments can be just about as much dev time, AND if implemented well, they can theoretically convert this whole thing to rabbitmq with minimal effort just by swapping the queueing system.

In any case, happy to see people mentioning how using existing simple tech can lead to fairly simple to manage systems, and still solve the problems you're trying to solve.


I've always been curious, what kind of latency do you see between an insert, and when the notify goes out over the channel?


10ms or so


I'd be curious to know what the drawbacks of using PG for a pub/sub server are.


What are the options to use Postgres pub/sub with Java? Because the usual Java libraries don't seem to support the pub/sub functionality well; you have to actively poll when you want to subscribe.



This may depend on the JDBC driver's support for LISTEN/NOTIFY. Though if queue traffic is relatively steady, then maybe polling isn't so bad?


I very happily use this technique and I believe I found out about it from your original blog post. Thanks for the original writeup and for the update on how it's going a few years in!


Silly question but how does this compare to SQS? More cost friendly I assume?


SQS vs "Postgres Queue", I think mainly:

- Closed/lock-in vs. Open/lock-free

- Rigid data access pattern vs. Very flexible SQL access

- Managed by AWS vs. Managed by you/your team (although you could use one of those managed Postgres services to reduce ops burden)

- Integrates well with other AWS services (e.g. Lambda, SNS, DynamoDB, etc) vs. No integrations with the AWS ecosystem out of the box


What was the isolation level used when that incident occurred?


The default postgres one, serializable (if I remember correctly?)


The default isolation level in postgres is read committed.


I had thought about using postgres as a job queue before, but I couldn't figure out in my head how to make sure two processes didn't both take the same job. The "FOR UPDATE" and "SKIP LOCKED" were the keys to make this work in the article. Essentially, as far as I can tell, "SELECT FOR UPDATE" locks the rows as they're selected (locks are apparently visible outside the transaction), and "SKIP LOCKED" skips over rows for the select that other transactions have locked. Cool stuff.

The below article goes over some downsides to using postgres for a job queue: namely that the data model isn't optimized to find and send new jobs to large numbers of subscribers. Actual messaging systems like Kafka/NATS are better at this.

https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock...

Also, there are things that the dedicated messaging systems give you that postgres won't, such as transparent error handling and retry logic. If your worker grabs a job from postgres and fails (in some kind of transient way such that the task should be retried), you'll have to implement the retry logic yourself. A streaming platform like Kafka or NATS will notice that you haven't acknowledged the message and deliver it to someone else.

This is something you could pretty easily implement yourself with something like a process that just scans over the "processing" jobs and looks for timeouts, resetting them as it goes. But there are probably a few of these little papercuts that Kafka-like (…Kafkaesque?) systems would handle for you.
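
Such a sweeper is a single statement, assuming the claim step marks rows 'processing' and stamps a started_at column (the names and the 15-minute cutoff are illustrative):

    -- Re-queue jobs that have been "processing" for too long so another
    -- worker can retry them; run this periodically from cron or a loop.
    UPDATE jobs
       SET status = 'queued', started_at = NULL
     WHERE status = 'processing'
       AND started_at < now() - interval '15 minutes';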

So, I guess if you already have postgres set up in a reliable way, and there's no one around whose job it is to set up new production systems like Kafka, and you don't already have something like Kafka, and you only need basic job queue requirements or you're okay with implementing whatever you need on top of postgres yourself, and your incoming job rate is fewer than maybe a thousand per second, and you have fewer than maybe tens of consumers… postgres is probably a decent job queue.

The above paragraph seems snide but it probably does describe many people's environments.


This. The article seems like "one weird trick that message queue companies HATE" as it's utilizing, as far as I understand, some SQL semantics in a very specific way to cobble together a way of achieving what other software is designed to do out of the box. It seems fine for a toy system, but I wouldn't stake the success of a real company on this approach.

One could also use DNS TXT as an RDBMS with some interesting fault tolerance and distribution schemes. That doesn't mean it's a good idea or the best way to solve a problem.

If you haven't seen them already, the Jepsen analyses are really worth a read: https://aphyr.com/posts/293-jepsen-kafka https://aphyr.com/posts/282-jepsen-postgres https://aphyr.com/tags/jepsen


It's strange to see you suggesting this can't work for a "real" company. That's demeaning to the OP's company, which seems quite real. It's also odd that you're proposing replacing this with a distributed system and citing the Jepsen articles as support for it, when they prove the opposite. Distributed systems are hard. If you can avoid them and stay in the happy ACID town with Postgres, indeed why not?


If you want to carry messages across the internet SQS et al is fine. But within the same system, e.g. in the same computer, or cluster, it makes much more sense to use something like this rather than something like SQS. Different tool, different job.


Thank you.


If a job fails, the connection to the database will timeout. Postgres will rollback the transaction, which releases row locks, freeing a job to be retried.

Of course, the database client and server together form a distributed system. The client might continue processing a job under the mistaken impression that it still holds the lock. Jobs still need to be idempotent, as with the streaming platforms.


> If a job fails, the connection to the database will timeout. Postgres will rollback the transaction, which releases row locks, freeing a job to be retried.

Only if you begin a transaction when your job starts that isn’t committed until you job finishes. As I understand it, this is not a good idea for long-running jobs, since you’ll have a long-running Postgres transaction. Am I missing something? The linked article doesn’t seem to use this approach.

Does the “FOR UPDATE” decrease the lock to just a single row, thus making it unproblematic?


This assumes that you're creating a transaction per message, which I think is not advisable.


Postgres implicitly creates a transaction for any query modifying data outside of one.

Transactions in MVCC are relatively cheap. The main resource of contention is a global txid that can disastrously wrap around if autovacuum is disabled. That process is responsible for a few other important tasks, like updating statistics for the query planner and maintaining BRIN indexes.


> Postgres implicitly creates a transaction for any query modifying data outside of one.

I should clarify. I meant holding a transaction for the duration of the message.


You can also create transactions for batches of messages if you want. Same as acking batches of messages in a queue system.


if you care to elaborate, i'm curious -- what alternative(s) would you recommend instead of one transaction per message, and why?


What if you want to store an error message+status with the job if it fails? It means you have to commit something no? What if you want to mark the message as being worked on and what process locked it and when they did for some kind of job queue monitoring stats?

The times I have done this, I end up with a workflow where the "select for update ... skip locked" is used only to initially mark the job as "taken" so that other processes do not start working on it. That update is committed in one transaction and then the "work" is done in a second transaction.


The article's approach is to have a column that stores whether the job has been claimed, and then handles dead jobs by just having a timeout.


I think the key concept here is atomicity. If some API is responsible for creating a job, storing it in the database AND publishing it can never be an atomic operation. Both the database and pub/sub servers are separate network connections from the application server. For example, if you save the record first and then publish, it's quite possible that you save the record in the database and then lose the connection to the pub/sub server when publishing. If this happens, you can never know if the pub/sub server received the request and published it. In systems where it's critical to guarantee that a record was saved and guarantee that it was published as well, the only way to do that is by using a single external connection - in this case to a Postgres DB. We've used the same setup on an AWS RDS t2.medium machine to process over 600 records/second.


>If some API is responsible for creating a job, storing it in the database AND publishing it can never be an atomic operation.

We use transactional outbox for that - we insert a record about the event into the event table in the same transaction as the rest of the operation (providing atomicity), and then a special goroutine reads this table and pushes new events to the message broker on a different server. In our design, there are multiple services which might want to subscribe to the event, and the rule is that they shouldn't share their DBs (for proper scaling), so we can't handle event dispatch in some single central app instance. Of course we could implement our own pub/sub server in Go over a DB like Postgres if we wanted, but what's the point of reinventing the wheel if there are already battle-tested tools for that, considering you'd have to reimplement queues, exchanges, topics, delivery guarantees, proper error handling, monitoring, etc.
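
A rough sketch of what that looks like on the SQL side (table and column names are illustrative; the dispatch loop that actually pushes to the broker lives in application code):

    -- Outbox table living in the same database as the business data.
    CREATE TABLE event_outbox (
        id         bigserial   PRIMARY KEY,
        topic      text        NOT NULL,
        payload    jsonb       NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- The business write and the event record commit (or roll back) together.
    BEGIN;
    INSERT INTO orders (customer_id, total) VALUES (42, 99.00);
    INSERT INTO event_outbox (topic, payload)
         VALUES ('order_created', '{"customer_id": 42, "total": 99.00}');
    COMMIT;

    -- The dispatcher claims a batch inside a transaction, publishes it to the
    -- broker, and deletes the rows only once the broker acknowledges them.
    BEGIN;
    SELECT id, topic, payload
      FROM event_outbox
     ORDER BY id
     LIMIT 100
       FOR UPDATE SKIP LOCKED;
    -- ...publish to the broker, then:
    DELETE FROM event_outbox WHERE id IN (/* ids just published */);
    COMMIT;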


Distributed transactions are hard. And if queue data is the source of truth for something then it must be durable. All that said, when traffic volume, payload size, or throughput demands something specialized then it makes sense to do it.

If I'm building one shed then maybe I only need 5 tools, while a shed factory might use 100. Context matters.


You can use transactional outbox pattern to solve this.


Postgres is really just great for being able to build just about anything to get that first viable product built. It's basically the swiss army knife for anything data in my opinion. You got sql, nosql, job queues, full text indexing. It's great.

I use it as a SQL database and full-text search for a little personal project I work on off and on, and it works great. Since I got a promotion I haven't touched it for months except to check for security updates every few weeks, and it, the Golang app server, and the Python scripts have had no issue just churning along, keeping a 30-day archive of links found via Reddit and Twitter. Postgres is great.


This seems to come up on HN at least once a year. Sure it can work but LISTEN ties up a connection which limits scalability as connections are limited and expensive. Also, mitigation strategies like PgBouncer cannot be used with this approach (nor can scale out solutions like CitusDB I don't think).

Of course, if scalability is not a concern (or the connection limitations are eventually fixed in postgres - this has improved in 14), this would be a very viable approach.


Supabase's Realtime [1] is one of the solutions that can help with that. It doesn't exactly let you LISTEN at scale, but it allows applications to be notified of changes in the database.

> Listen to changes in a PostgreSQL Database and broadcasts them over WebSockets

[1]: https://github.com/supabase/realtime

Disclosure: I'm a Supabase employee.


I like Supabase's approach over pub/sub. One of the big advantages is they listen to the Postgres WAL which overcomes the 8000 bytes limitation[1] of the notify approach.

And Elixir is especially well suited for this type of workload. I actually extracted out much of the Supabase Realtime library so that I could work with the data directly in Elixir[2]

[1]: https://github.com/supabase/realtime#why-not-just-use-postgr...

[2]: https://github.com/cpursley/walex


Does Supabase have HA or failover yet? Asked a few months ago and got no answer


How many connected users does this scale to roughly?


Considering that it's Elixir/Erlang, presumably millions: https://phoenixframework.org/blog/the-road-to-2-million-webs...


> Sure it can work but LISTEN ties up a connection which limits scalability as connections are limited and expensive.

Scalability is always limited, no matter which solution you choose. This article argues that the scalability limit for this particular solution is acceptable for most people to begin with:

> It's rarely a mistake to start with Postgres and then switch out the most performance critical parts of your system when the time comes.


>LISTEN ties up a connection

Wait, what?? I can't keep doing things with my connection after I issue a LISTEN? That doesn't seem right! I would assume it would be isomorphic to how unix-y bg programs will occasionally write to the console (although I see how this might be hard to deal with at the driver level). Now I will have to go check.


You (the application) can certainly go on to issue other database commands. See the docs [1].

> With the libpq library, the application issues LISTEN as an ordinary SQL command, and then must periodically call the function PQnotifies to find out whether any notification events have been received.

[1]: https://www.postgresql.org/docs/current/sql-listen.html
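
The raw SQL side of it is just two statements; the driver-level question is only about how you get told that a notification arrived:

    -- Session A: subscribe to a channel (the channel name is arbitrary).
    LISTEN job_events;

    -- Session B: wake up every listener, optionally with a payload
    -- (payloads are capped at 8000 bytes).
    NOTIFY job_events, 'job 42 finished';
    -- equivalent, and usable from triggers/functions:
    SELECT pg_notify('job_events', 'job 42 finished');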


It’s also possible to use advisory locks to implement a job queue in Postgres. See e.g. Que[1]. Note there are a fair number of corner cases, so studying Que is wise if trying to implement something like this, as well as some (a bit older) elaboration[2].

We implemented a similar design to Que for a specific use case in our application that has a known low volume of jobs and for a variety of reasons benefits from this design over other solutions.

[1]: https://github.com/que-rb/que [2]: https://brandur.org/postgres-queues
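
For reference, the primitive Que builds on is roughly this (the lock key scheme here is an assumption; Que's real implementation handles many more corner cases):

    -- Try to claim job 42 without blocking; returns true if this session got
    -- the lock, false if another worker already holds it.
    SELECT pg_try_advisory_lock(42);

    -- Session-level advisory locks are released automatically if the
    -- connection dies, or explicitly when the job is done:
    SELECT pg_advisory_unlock(42);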


There's something even nicer than advisory locks, as of a few years ago. https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock...


Some comments from brandur on SKIP LOCKED in https://github.com/brandur/sorg/pull/263. I haven’t looked into it too much since what we have works without any issues; but good to know, thanks!


This is fantastic for relatively low volume queues with intensive work to be done by a small number of workers. For these types of use cases, I'll take this approach over RabbitMQ or Kafka all day long.

But once you get even just up to say, 40 messages per second with 100 worker processes, you're now up to 4000 updates per second just to see which worker got to claim which job, and up from there becomes untenable.


> This is fantastic for relatively low volume queues with intensive work to be done by a small number of workers. For these types of use cases, I'll take this approach over RabbitMQ or Kafka all day long.

That’s what our workload is like for our SaaS code analysis platform. We create a few tasks (~10 max) for every customer submission (usually triggered by a code push). We replaced Kafka with a PostgreSQL table a couple of years ago.

We made the schema, functions, and Grafana dashboard open source [0]. I think it’s slightly out-of-date but mostly the same as what we have now in production, and has been running perfectly.

[0] https://github.com/ShiftLeftSecurity/sql-task-queue


a few years back i worked on an enterprisey project that used postgres as a database, along with rabbitmq and celery for job processing.

of _course_ the system architecture had to have a job queue and it had to be highly available (implemented with a rabbitmq cluster)

what we learned after a few months in production was the only time the rabbitmq cluster had outages was when it got confused* and thought (incorrectly) there was a network partition, and flipped into partition recovery mode, causing a partial outage until production support could manually recover the cluster

the funny thing about this is that our job throughput was incredibly low, and we would have had better availability if we had avoided adding the temperamental rabbitmq cluster and instead implemented the job queue in our non-HA postgres instance that was already in the design --- if our postgres server went down then the whole app was stuffed anyway!

* this was a rabbitmq defect when TLS encryption was enabled and very large messages were jammed through the queue -- rabbitmq would be so busy encrypting / decrypting large messages that it'd forget to heartbeat, then it'd timeout and panic that it hadn't got any heartbeats, then assume that no heartbeats implied a network partition, and cause an outage, needing manual recovery. i think rabbitmq fixed that a few years back


Why jump from Postgres to Kafka? Celery + rabbitmq or redis is a great middle ground.


And RabbitMQ is a solid middle ground that you can scale like crazy.


I wrote StarQueue https://www.starqueue.org which uses Postgres as a back end.

I actually wrote StarQueue for MySQL, Microsoft SQL server and also Postgres - they all work just fine as a message queue back end, because they all support SKIP LOCKED.

I started out including Oracle as a supported database but found that I loathed Oracle's complexity.


Oban[1] from the Elixir ecosystem leans on Postgres for job scheduling

[1] https://github.com/sorentwo/oban


Wonder if theres a similar framework in Python


AFAIK the only one is Dramatiq [1] with dramatiq-pg [2]: a 3rd party message broker using Postgres LISTEN/NOTIFY.

[1]: https://dramatiq.io/

[2]: https://gitlab.com/dalibo/dramatiq-pg/


Strong disagree on using a database as a message queue. This article[0] covers many of the reasons why. Summary: additional application complexity and doesn't scale well with workers.

0. https://www.cloudamqp.com/blog/why-is-a-database-not-the-rig...

EDIT>> I am not suggesting people build their own rabbitmq infrastructure. Use a cloud service. The article is informational only.


I'm increasingly of the opinion that relational databases are absolutely the right way to build queue systems for most projects.

One of the biggest advantages comes when you start thinking about them in terms of transactions. Transactional guarantees are really useful here: guarantee that a message will be written to the queue if the transaction commits successfully, and guarantee that a message will NOT be written to the queue otherwise.

https://brandur.org/job-drain describes a great pattern for achieving that using PostgreSQL transactions.


The transaction feature seems nice but how often is your application dropping queue messages because something happened between tx.commit() and queue.send(msg)? My experience has been that this is not an issue.


I am SUPER worried about this when it affects something really important, like getting the client/customer/merchant/cardholder their money. It seems like the world has moved on without me —- “ohh, three nines is enough”… “hrmmm maybe?…”


If you're big enough to worry about the scalability of Postgres, you're big enough to experience this failure fairly often IMO.


Scalability was the second of two concerns I listed. The first was additional application complexity that real message queues hide from you by virtue of being a system built for that usage pattern.

>you're big enough to experience this failure fairly often IMO

Please explain how? You would either have to suffer from frequent network connectivity issues that affect only your db and not your queue, or your process must be mysteriously dying in the microseconds between those 2 operations. Neither of those is something I would consider to happen "fairly often," even if you were processing trillions of messages per day.

In my experience, the vast majority of message processing failures happen at the worker level.


If the queue goes down you end up updating the db without enqueuing a job and now an engineer needs to go in and re enqueue the missing jobs manually.


Oh that happens fairly often. In fact, some message will be lost every time your queue server reboots due to a power outage, PSU failure, kernel panic, OOM, etc. (Unless it spends almost all of its time idle in which case I guess no messages will be in flight)

You’re guaranteed to break the invariant sooner or later so you end up with all the usual complexity of keeping stuff in sync.


Your queue server rebooting is completely orthogonal to whether the application submitting the message can do so atomically or not. Use a cloud service if you care about durability.

Edit>> I see you edited your post after I responded. None of those scenarios qualify as "fairly often."


Wish we would stop saying "use a cloud service" for everything. People can and do operate their own hardware, manage their own databases, and build and maintain their own application stacks. I don't know how we got to this point of learned helplessness where now we just have to use cloud providers for everything.


It's not learned helplessness to avoid re-inventing the wheel. At the end of the day, the goal is to deliver value to customers, not invent a queueing system, unless your company's product is queueing systems.


The scalability problem is way overblown. Setting the correct isolation level and locking mode isn't that hard and a modern cloud-hosted PG/MySQL can push 10s of thousands of inserts/s no problem.

IMO, the downsides of hosting a queue inside your primary relational DB are very much outweighed by the downsides of 1) having to run a new piece of infra like rabbit and 2) having to coordinate consistency between your message queue and your relational DB(s)


In the end, engineering is just whether the thing works with minimal maintenance. And ultimately, I've had great experiences using DBaaQ for low volume data movement. It works as a persistent queue with easy to do retries etc.

For high throughput (we had ad tech servers with 1E7 hits/s) we used a home-built low-latency queue that supported real time and persisted data. But for low throughput stuff, the DBaaQ worked fine.

And ultimately, maybe it was a lack of imagination on our part since Segment was successful with a mid-throughput DBaaQ https://segment.com/blog/introducing-centrifuge/


well, rabbitmq is really, really hard to set up correctly, and stuff like priority or time-based scheduling isn't that much easier there than in postgres. in fact a queue adds more complexity, and it is not necessary until you outscale your database. not saying that rabbitmq might not be a better fit eventually; it's just not a good fit to start with. if you have a small team (< 8) it's better to stay with as few things as possible, and especially with things you know (well).


I wouldn't recommend setting up your own message queue infrastructure either. The cloudamqp link was more about the content than the product. All cloud providers come with extremely simple, scalable, and inexpensive message queue services with bindings to most languages.

A message queue is one of those things that is easy enough and worth the effort to do "right" early on, because it is not something you want to rip out and rewrite when you hit your scaling bottlenecks, given how critical it is and how many things it will end up touching.


well most cloud queues do not support priorities you can only create multiple subscriptions and prefer the messages from the higher one. so in the end you would build a system on a system anyway. also these queues lock you in quite hard (and do not work on premise)

Edit: also keep in mind most queues do not like "slow consumers" i.e. if your workload is bursty with long processing times, a database might be a better fit (i.e. rabbitmq does not like it)

Edit2: we implemented a queue with postgres since we need ACID, and having 10k inserts per second is highly unlikely since a customer upload takes longer than a second (we deal with files). we mostly have burst workloads: short periods of high volume followed by long pauses (i.e. nobody uploads stuff at night)


>well most cloud queues do not support priorities you can only create multiple subscriptions

At least on GCP PubSub, a subscription is a separate concept from a topic/queue. If you want different priorities, you create multiple topics. You create multiple subscriptions when you want to fan out a single message to multiple workers. As far as I know, multiple subscriptions have nothing to do with priorities. Can you explain?


ah yeah topics... I basically meant topics. But having multiple topics for priority is still way harder than lets say the rabbitmq priority stuff or the postgres stuff.


It seems notable that this is a blog post from the perspective of rabbitmq authors, or at least the author of a book about it. It talks very vaguely about one potential implementation of a queue on postgres.


Basically every piece of this article's criticism is wrong as applied to the source / link above.

- No need to poll the database table

- No table-level locks and manual handling: row locks used for handling the work in progress

- "Manual cleanup" -- uhhh

Etc.


Basically all of those reasons are solved by using LISTEN/NOTIFY and FOR UPDATE SKIP LOCKED, which every queue built on pg will use.


>all of those reasons are solved

How does it solve the additional code complexity problem?


Which complexity? Of running a SQL query on the same database you're already using, vs writing code to support some other new system?


>Of running a SQL query on the same database you're already using

Go back and read OPs link. They create new SQL types, tables, triggers, and functions, with non-trivial and very unforgiving atomic logic. And every system that needs to read or write from this "db queue" needs to leverage specific queries. That's the complexity.

>vs writing code to support some other new system

You mean using a stable well maintained library with a clean and sensible interface to a queueing system? Yes, that is far more simple.


Not necessarily. Many languages like Java already have queuing libraries that operate on rdbms through JPA. So not even a single additional line needs to be written for this to work. We got ours working in a day and it works great. I don't know if other languages have these libraries but I'm inclined to believe they do (at least nodejs has).


Do you have to set up the triggers, tables, and procedures beforehand or how does that work?


The libraries create their own tables, triggers etc during initialization.


So your application needs privileged access (to create tables, triggers, etc) to the database in order to run? That's an anti-pattern. Your deployed application should only need least privileges possible. If you need to do extra things to your database, it should be done in migrations, which should be more privileged, but now you've decoupled the creation of these extra db objects from the library itself, meaning if the library changes, your migrations will not be in sync.


No... JPA writes out a file with the necessary DDL and the administrator runs it.

If this is insufficient for more complicated migrations, there's tooling to support it. e.g. Flyway.


It proves again, that in 90% of the cases Postgres is totally enough. If you don’t expect a high load (and most applications don’t), then just go with Postgres, simplicity wins!


A single core small Redis server can do wonders.

For fire and forget type jobs you can use lists instead of pub/sub: save a job to a list by a producer, pop it on the other end by a consumer and execute it. It's also very easy to scale, just start more producers and consumers.

We're currently using this technique, to process ~2M jobs per day, and we're just getting started. Redis needs very little memory for this, just a few mb.

Redis also supports acid style transactions.


Beware of the scale up challenges with Redis. Redis can only utilize a single core. If you do anything sophisticated that needs to be atomic then you can't scale out to multiple servers, and you can't scale up to multiple cores.

At least with Postgres you can scale up trivially. Postgres will efficiently take advantage of as many cores as you give it. For scale out you will need to move to a purpose built queuing solution.


Good point. My assumption is that the first hit would be memory usage, way before core usage.

There are many options for scaling:

- vertically scale by adding more memory

- start another redis instance on another port (takes ~1 MB) if you decide to add more cores on the same VM

- separate data into another vm

- sharding comes out of the box, but that would be my last resort


The problem is that you can't atomically write to your other database and also put a message on a redis queue. So you'll either end up with db changes not conveyed to redis, or you'll have messages on redis not reflected by changes to the db.


I just spent two months unrolling an unruly pg_boss implementation and that experience has soured me on using postgres for pubsub, jobs, or messaging. For my money, Github Actions is fine for infrequent cron jobs, or hell even a tiny lambda with a CloudWatch rule is super cheap, with infra as code to make it easy (relatively so) for anyone to deploy. After all, I'd rather spend the majority of my time on writing code than being a DevOps. If I need pub/sub, I'd much rather use SNS quick and dirty, or SNS+SQS for the heavy lifts. Separation of concerns and separation of tiers is still important to me, and I have no desire to maintain a server moving forward.


I have mostly great results using pg_boss. There was one upgrade which had a major bug, but the response was quick and the work around was easy.

The benefit of having a distributed cron that is version controlled and type checked in my repo has been amazing. I am also confident that my pg_boss implementation has higher uptime than GitHub actions.


Why couldn't the code run from a GitHub Action or within a Lambda be type checked and in version control? All of mine are. As for uptime, it completely depends on where you host your pg instance. But I'd wager you know that already.


Anybody using graphile-worker[1] in production/heavy load? It looks awesome, and I coded up some simple prototype tasks (email, sms, etc), but question how it truly scales. They claim horizontal scaling is trivial.

> graphile-worker is horizontally scalable. Each instance has a customisable worker pool, this pool defaults to size 1 (only one job at a time on this worker) but depending on the nature of your tasks (i.e. assuming they're not compute-heavy) you will likely want to set this higher to benefit from Node.js' concurrency.

[1] https://github.com/graphile/worker


Graphile Worker maintainer here; keep in mind that postgres is not the ideal location for a job queue, so you’re going to be limited ultimately by postgres’ capabilities. I’ve seen Worker max out at around 10k jobs/second but very much YMMV - you should benchmark it for your expected use case. Personally I’d move to a dedicated job queue if I started having an average of more than 1-2k jobs per second. The majority of systems never hit anywhere near this (we very much cater to the “long tail” of job queue needs).

Regarding the horizontal scalability; that relates to if you have heavy tasks (tasks that take a second or more to execute) - you can use more instances to get higher throughput.

Hope this helps!


Largely agree at the scale this article is working with.

But frankly, if 10k/s inserts is the scale you are talking about, even worrying about a dedicated pub/sub solution seems odd.

Introducing something like Kafka for anything less than an order of magnitude more than that seems like an architectural blunder. By the time you are there Postgres will have obviously disqualified itself.


What does scale have to do with it? Pub/sub as an architectural pattern could be equally relevant for your use case whether there are a hundred users in your system or a billion.

And Kafka isn't the only solution for it. There are many lightweight pub/sub and queuing systems which also don't involve needlessly adding abstraction layers and application code into an RDBMS.


I think massive scale is the only reason you'd really want to adopt something like Kafka. If you're 10k inserts/s or less then there's no reason not to do everything in a single big relational DB where you get the warm fuzzy feeling of transactions, point in time backups, scalable read replication, etc


Yes but then 'enterprise architects' can't put their shiny cloud certifications to use. They need pubsub regardless of the scale because that's what is recommended by big cloud. And of course the companies also don't mind because there is no immediate devops cost for it. Problems like complexity of debugging and lock-in won't hit home at design time.


To spell out good reasons for doing this:

If you're already using Postgres, you can avoid increasing operational complexity by introducing another database. Less operational complexity means better availability.

You can atomically modify jobs and the rest of your database. For example, you can atomically create a row and create a job to do processing on it.


But one of the golden rules of databases is to not use them as queues/integration.

Granted I didn't even read the main article because it seems like such a casual headline.

Edit post-read: yeah, using it as a CI jobs database. He lists the alternatives, but seriously, Kafka? Kafka is for linear scaling pub/sub. This guy has a couple CI jobs infrequently run.

Sure this works if the entire thing is throwaway for a non critical pub/sub system.

"It's possible to scale Postgres to storing a billion 1KB rows entirely in memory - This means you could quickly run queries against the full name of everyone on the planet on commodity hardware and with little fine-tuning."

Yeah just because it can does not mean it is suited for this purpose.

Don't do this for any integration at even medium scale.


Avoiding increasing operational complexity is really important, but for pub/sub we are using Redis. While this does add complexity, it is very little, because it is incredibly easy to install and maintain.


Obviously you're in a better position to evaluate the trade-offs for your application than I am, so I'm not saying your decision is wrong, but this can potentially decrease availability if your application depends on both PostgreSQL AND Redis to be available to function.


Agree on the pros. A few cons are increased costs due to (relatively) high IOPS, higher coupling, and need for more (costly) connections.

Overall though I'd agree it can be good as a first step or default until lower cost or higher performance is needed.


If you are interested how to implement this in Python, check out this Gist: https://gist.github.com/kissgyorgy/beccba1291de962702ea9c237...

It's really simple. I used this snippet for a real-time application which can react to any kind of change in the database instantly.


Rolling your own pub/sub server implementation is often riddled with subtle concurrency bugs. For example, in our first iteration, when finding new events to dispatch, the implementation simply used the "last dispatched event ID" from the previous run. It sounded logical, because IDs are guaranteed to grow monotonically. But in MySQL, this approach is problematic under high load. There's a race condition: first, the server reserves a new autoincrement ID, and only then (in a different, non-atomic step) does it insert the actual row. So sometimes, the goroutine would retrieve a newer row while skipping an older row, if two sessions were concurrently reserving autoincrement IDs (and you could get a wrong order of events with ORDER BY id). We fixed that, but a month later another arcane concurrency bug was found due to wrong transaction model assumptions (I no longer remember the details). So if I were to choose between rolling my own implementation and using an existing well-tested tool, now I'd choose the latter.


This is a fantastic hack. This week I watched tech demos where AWS services like kinesis were used to articulate services processing on the order of .3 to 3 jobs per second. Lol. CLOUD


So SKIP LOCKED is a pretty well worn optimization at this point. People have been doing this for a while.

What kind of TPS are people seeing on queues based on psql?

edit: https://gist.github.com/chanks/7585810

10k/s here, but that was on a postgres from years ago - there have been like 4 or 5 major versions since then I think.

That's a good amount and I'm betting you can push it forward. Further, a queue is trivially sharded since messages are entirely isolated.

That said, Kafka can do hundreds of thousands if not millions of messages per second, so clearly there's room for optimizations.


You are correct that things have improved further.

You can now efficiently partition your job queue, just like you would do with Kafka to get higher scalability. You then "prepare" your "dequeue" query in Postgres and the planner will only look at the relevant partition, pruning all of the others. It's like having one logical queue to insert into and hundreds of actual physical queues, transparently partitioned, to pull from. You then assign your workers to specific partitions and plow through your jobs.

In PG 14, you can reasonably have a thousand partitions on a single job queue, each virtually independent performance-wise. As a bonus, you can have two-level partitions gated on the task's timestamp. Older partitions/tasks can then be detached and dropped without using DELETE, which makes it a fast operation in Postgres. Any index or table bloat from the older partitions disappears immediately. Pretty sweet.
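
A rough sketch of that two-level layout (names, partition counts, and the date ranges are assumptions):

    -- First level by task timestamp (for cheap retention), second level by a
    -- hash of the id (so workers can be pinned to specific partitions).
    CREATE TABLE tasks (
        id         bigint      GENERATED ALWAYS AS IDENTITY,
        payload    jsonb       NOT NULL,
        status     text        NOT NULL DEFAULT 'queued',
        created_at timestamptz NOT NULL DEFAULT now()
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE tasks_2021_12_17 PARTITION OF tasks
        FOR VALUES FROM ('2021-12-17') TO ('2021-12-18')
        PARTITION BY HASH (id);

    CREATE TABLE tasks_2021_12_17_p0 PARTITION OF tasks_2021_12_17
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    -- ...and so on for the remaining hash partitions.

    -- Retiring old tasks is a metadata operation instead of a huge DELETE:
    ALTER TABLE tasks DETACH PARTITION tasks_2021_12_17;
    DROP TABLE tasks_2021_12_17;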

Obviously, this takes more work to set up and there's ongoing operational stuff (cron job), but you retain all of the transactional guarantees of Postgres in the process and the performance is quite good. I like the overall operational simplicity of having everything in a single, inexpensive and rock-solid RDBMS. I like getting transaction-safe jobs "for free."

It's so cheap, too. People on this thread talk about using SQS. I spend ~$1600/month on SQS to process 20M messages per day, going on three years now. I can do far more than that on our Postgres instance, a $5K machine bought two years ago, sitting in our co-lo in downtown LA with a cheap 10Gb Cogent connection that also runs the rest of the business ($2300/month for the entire rack of machines + Internet).

But I'm forced to pay for that damn SQS queue because that's how a business partner gets us data. Such a waste of money for something so cheap and easy to do reliably with Postgres. I've now spent over $50K on something I can do more or less for free on a 2012 Dell server. Such is business.


Just because something can be used to do something doesn't mean it should. Kafka is specifically designed for this purpose, it is free, and it is easy to learn and use. If "starting with Postgres and then switching out when the time comes" saves money then I can understand. Otherwise use the right tool for the right job, right from the start.


I've been on stage at KafkaConf demo-ing my Kafka SRE chops, and I would avoid Kafka until I am sure it is necessary.

'easy to learn and use' is a downright lie.

edit: link to this same topic being discussed a few weeks ago: https://news.ycombinator.com/item?id=28903614#28904103


This is advice that seems reasonable but is actually pretty harmful.

Take a startup with a few users. The senior engineer decides they need pub/sub to ship a new feature. With Kafka, the team has to learn Kafka best practices, choose client libraries, and learn Kafka's quirks. They also need to spin up Kafka instances. They ship it in a month.

With postgres, they’ve got an MVP in a day, and shipped within a week.


I can set up an application to use AWS SQS or GCP PubSub in a day and it will scale without a second thought. I don't think it's productive to compare the worst case of scenario A and the best case of scenario B.


How does any of this not apply equally to PostgreSQL?

Is this some magical database where you don't need to worry about access patterns, best practices, or how it is deployed?


> How does any of this not apply equally to PostgreSQL?

1. Postgres is easier to set up and run (than Kafka)

2. Most shops already have Postgres running (TFA is targeted at these shops)

3. Postgres is easier to adapt to changing access patterns (than Kafka)

----

> Is this some magical ...

Why must your adversary (Postgres) meet some mythical standard when your fighter (Kafka) doesn't meet even basic standards?


Yes, it's that magical database, up to a certain scale.


> With postgres, they’ve got an MVP in a day, and shipped within a week.

And the next week they realize they want reader processes to block until there is work to do. Oops, that's not supported. Now they have to code that feature themselves... and soon they're reinventing Kafka.


That's where LISTEN comes in. It's very simple to get this loop exactly right.
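Something like this, sketch-wise (channel and table names made up):

  -- Worker session:
  LISTEN job_inserted;
  -- ...block on the connection; when a notification arrives, run the
  -- SKIP LOCKED dequeue query until it returns no rows, then block again.

  -- Producer, in the same transaction as the insert (the NOTIFY is only
  -- delivered on commit):
  BEGIN;
  INSERT INTO jobs (payload) VALUES ('{"task": "build"}');
  NOTIFY job_inserted;
  COMMIT;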


The very source we're talking about describes how to block until there is work to do -- listener.Listen("ci_jobs_status_channel")


I've been working with Kafka since 0.8, and I mildly beg to differ on "easy to learn and use", simply because to use it well you have to design your applications around its semantics, and tuning it requires a lot of in-depth understanding of its mechanics.

And I've seen a looot of bad designs and misconfigurations.

All that said, I'm a massive fan of Kafka, I'm the first to admit it's a complex tool, but it needs to be for the problem space it targets.


Kafka is not a queue. Kafka's parallelism is limited by the number of partitions you allocate, and you have to be sure to avoid head of line blocking.

Not the case with a queue.


This needs to be sung from the rooftops every time Kafka is mentioned. It's an amazing tool but it is the wrong wrong wrong tool if you need a queue. It will bite you in the ass and you'll be left with someone breathing down your neck wondering why jobs are processing so slowly and why you can't just spin up more workers.


Exactly. If you do want something very scalable that fixes these problems but shares a lot of architectural similarity with Kafka then you should check out Apache Pulsar.


what is "head of line" blocking?


A single partition is intended to be processed, more or less in order, by a single worker. If one of those messages, for whatever reason, ends up being really expensive or flaky, you can't move on until you've handled it.

That's head of line blocking.


> Kafka is specifically designed for this purpose, it is free, and it is easy to learn and use

I think Kafka is great, but it is absolutely not “easy to learn and use”.


If anyone is looking to use Postgres as a job server with Node.js clients, I can highly recommend the pg-boss library (https://github.com/timgit/pg-boss). It looks like it just added pub/sub yesterday, so I guess you could use it for that too.


Webapp.io people: I really don't get how your service works, exactly. I see the broad strokes, but I don't know how it would fit my own use cases, because I can't see how it would integrate with my architecture or workflows. Please create some more diagrams with actual examples of all the use cases you hint at on your website but never actually show (your config file examples don't answer any of my questions). Even if I could use your product at work, I won't try it now, because I can't see how it applies, and I don't want to spend an hour digging through your tech docs to figure out whether it does.


I know there used to be some O(n^2) logic associated with notify. Anyone know if that ever got fixed? Cursory search only turns up https://postgrespro.com/list/thread-id/2407396

  The benchmark shows that the time to send notifications grows 
  quadratically with the number of notifications per transaction without 
  the patch while it grows linearly with the patch applied and 
  notification collapsing disabled.


I actually implemented this after seeing this post back in the day, and it worked great, but writes got somewhat slow after it passed ~1M messages per day on my small VPS, so I switched to Redis.



You don't understand! Our small business with a few hundred customers needs to scale


Doesn't a LISTEN connection hold a transaction open and therefore prevent vacuuming of old rows?


Same goes for MySQL. I like to try new and different things, so I'm always trying to find reasons to use things like redis or other pub/sub or caching options. But I usually tend to end up sticking with tried and true relational DBs. Obviously, there is a scale at which those other options will be necessary, but I haven't hit that yet.

Many times even when I do use something like Redis, I run into limitations.

For example, I have been using it for caching calculated data that users need, but now we need to query that cached data by date, which puts us back to storing it in the relational database.


How would this Postgres feature work in a Postgres cluster? Does it work at all? If so, what are the requirements on the setup?


Could this be used with PostgREST?


So many mentions of Kafka and RMQ. What are people's thoughts on ActiveMQ?


There is a simpler method: RabbitMQ for accepting new jobs, then in the handler start a job via Spring Batch. Basically you get the same effect, but using existing tools.



