Reference: https://x.com/saisrirampur/status/1824694191537959159
Parent Tweet: https://x.com/craigkerstiens/status/1824114371737616794
Change Data Capture (CDC) is hard; it has hundreds of edge cases and failure points. At PeerDB (https://www.peerdb.io/), instead of spreading ourselves across multiple sources, we focused solely on Postgres. This let us give each edge case enough care to iron out as many as possible, and it also let us implement a number of Postgres-native performance and reliability optimizations. Our engineering blog (https://blog.peerdb.io/) has more on those optimizations and how we ironed out the edge cases.
Pipeline failures have been rare these days, and so far no source database has been affected by replication load. Also, most of our customers are in the shorter tail, i.e., data sizes from 300-400 GB up to 15-20 TB. This helped battle-test the product and make it seamless for the long tail.
However, I don’t think CDC is a solved problem: Postgres is full of mysteries and keeps evolving. We need to continue polishing the experience and evolving along with Postgres!
TL;DR: specialized CDC tools that focus on a single database (or a limited set of them) are a reliable way to provide a solid CDC experience.
MySQL, Oracle, SQL Server and Postgres are very close to each other in terms of market share (however you choose to define it).
The effect is that most database tooling folks almost never specialise in one database. They'd be giving up too much TAM for that.
I think this is a big part of why database tooling in general is generalised, not specialised. It's also a big part of why database tools aren't as good as they should be.