Change Data Capture (CDC) Tools should be database specialized not generalized
5 points by saisrirampur 5 months ago | 2 comments
Reference: https://x.com/saisrirampur/status/1824694191537959159
Parent tweet: https://x.com/craigkerstiens/status/1824114371737616794

Change Data Capture (CDC) is hard: it has hundreds of edge cases and failure points. At PeerDB (https://www.peerdb.io/), instead of supporting multiple sources, we focused only on Postgres. This let us give each edge case enough care to iron out as many as possible. We were also able to implement a number of Postgres-native performance and reliability optimizations. Our engineering blog (https://blog.peerdb.io/) has more on these optimizations and on how we ironed out edge cases.

Pipeline failures have been rare lately, and so far none of the source databases has been affected by CDC load. Also, most of our customers are in the shorter tail of larger deployments, i.e., data sizes from 300-400 GB up to 15-20 TB. This helped battle-test the product and make it seamless for the long tail.

However, I don’t think CDC is a solved problem, as Postgres is full of mysteries and it keeps evolving. We need to continue polishing the experience and evolve along with Postgres!

TL;DR: Specialized CDC tools that focus on a single database (or a limited set of databases) are a reliable way to provide a solid CDC experience.




One of the interesting things about databases is that, unlike in most other markets, no single database owns a majority of the market share.

MySQL, Oracle, SQL Server and Postgres are very close to each other in terms of market share (however you choose to define it).

The effect is that database tooling folks almost never specialise in one database. They'd be giving up too much TAM (total addressable market) for that.

I think this is a big part of why database tooling in general is generalised, not specialised. It's also a big part of why database tooling isn't as good as it should be.


Ack, true. VC-backed startups optimize for TAM and therefore tend to generalize. At PeerDB, we stayed focused on just Postgres and were expanding from CDC to other ETL/data-movement use cases for Postgres, such as Active<>Active replication and database migrations. Given the pace at which Postgres is growing, I believe ETL for Postgres alone can reach a billion-dollar TAM. Anyway, we were recently acquired by ClickHouse and are now doubling down on providing a world-class CDC experience from Postgres to ClickHouse. :)



