
Ask HN: Is it possible to replicate 30TB from 20 DBs to PostgreSQL real-time? - dba_leveling_up
If you had 20 databases spread across 20 servers, could you replicate them all into a single PostgreSQL instance? And can this be done on a server costing less than $100,000?

The databases total 30 TB on disk. There are ~30,000 tables. The maximum number of rows in a single table is 1.2 billion. The number of rows changed per second is modest (100s or less).

Some of the servers run PostgreSQL, some MySQL. The versions are different, but they're relatively current (PostgreSQL >= 9.1, MySQL >= 5.6).
======
slap_shot
How big is a row? If you're saying the change rate is "100s of records per
second or less," you could easily connect to each database as a replica,
stream the change data capture (CDC) events to Kafka, Google Pub/Sub, or
Kinesis, and apply those inserts to your destination tables.
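A minimal sketch of the last step in that pipeline (turning each change event into a statement against the destination), in Python. The event shape, the `event_to_sql` and `quote_ident` helpers, and the one-schema-per-source-database layout are all hypothetical, not any particular CDC tool's format. `ON CONFLICT` upserts require PostgreSQL 9.5 or newer on the destination. Assuming each upsert event carries the full row, replaying events in order from an earlier offset (e.g. after a consumer restart) converges to the same final state.

```python
from typing import Any


def quote_ident(name: str) -> str:
    # Double-quote a PostgreSQL identifier, doubling any embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def event_to_sql(event: dict[str, Any]) -> tuple[str, list[Any]]:
    # Hypothetical event shape (not any specific CDC tool's format):
    #   {"source": "db07", "table": "orders", "op": "upsert" | "delete",
    #    "pk": {"id": 42}, "row": {"id": 42, "status": "paid"}}
    # Each source database maps to its own schema in the destination, so
    # tables with the same name on different sources do not collide.
    target = f'{quote_ident(event["source"])}.{quote_ident(event["table"])}'
    pk_cols = list(event["pk"])

    if event["op"] == "delete":
        where = " AND ".join(f"{quote_ident(c)} = %s" for c in pk_cols)
        return f"DELETE FROM {target} WHERE {where}", [event["pk"][c] for c in pk_cols]

    if event["op"] != "upsert":
        raise ValueError(f"unknown op: {event['op']!r}")

    # Upsert the full row: insert it, or overwrite the non-key columns if the
    # primary key already exists (requires PostgreSQL 9.5+ on the destination).
    cols = list(event["row"])
    col_list = ", ".join(quote_ident(c) for c in cols)
    placeholders = ", ".join(["%s"] * len(cols))
    conflict = ", ".join(quote_ident(c) for c in pk_cols)
    updates = ", ".join(
        f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}" for c in cols if c not in pk_cols
    )
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    sql = (
        f"INSERT INTO {target} ({col_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict}) {action}"
    )
    return sql, [event["row"][c] for c in cols]
```

The returned `(sql, params)` pair uses `%s` placeholders, so it can be passed straight to a DB-API cursor's `execute()` with a driver such as psycopg2. Values never get spliced into the SQL text; only identifiers are quoted in.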

This should be possible for much less than $100,000. Assuming each database
gets its own process for reading the change feed, you're looking at 20 (add
more based on individual table/db needs) instances for the change data
capture, 3-5 instances for Kafka/ZK brokers, and a handful of machines for
reading the changes and writing them into the destination database.
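One detail of that layout, sketched under assumptions: if each source database's reader publishes to its own topic and keys every message by table and primary key, all changes to a given row land in one partition. Kafka preserves order within a partition, so the writers apply each row's changes in order. The topic naming and the `topic_for` / `partition_for` helpers are hypothetical, and `zlib.crc32` stands in for Kafka's actual default partitioner (which hashes the key bytes with murmur2) purely for illustration.

```python
import zlib


def topic_for(source_db: str) -> str:
    # Hypothetical naming scheme: one topic per source database, matching
    # one change-feed reader process per database.
    return f"cdc.{source_db}"


def partition_for(table: str, pk: tuple, num_partitions: int) -> int:
    # Stable hash of table + primary key, so every change to the same row
    # maps to the same partition. Python's built-in hash() is salted per
    # process for strings, so it is deliberately avoided here.
    key = f"{table}:{pk!r}".encode("utf-8")
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    print(topic_for("db07"))
    print(partition_for("orders", (42,), 12))
```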

The important thing here is not the number of databases or the size of the
tables; it's the speed at which inserts and updates happen. Even in a worst
case of 100 inserts/updates per second on every one of the 30k tables, that's
3M changes per second, which the architecture I described above can handle.
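The arithmetic above, spelled out. The 100-per-table figure is the comment's deliberate worst case; treating 1,000 changes/sec as the ceiling for the question's "100s or less" is an assumption, used only to show how much headroom that worst case leaves.

```python
def worst_case_rate(tables: int, changes_per_table_per_sec: int) -> int:
    # Upper bound if every table changed at the same rate at the same time.
    return tables * changes_per_table_per_sec


bound = worst_case_rate(30_000, 100)
print(f"{bound:,} changes/sec")  # 3,000,000 changes/sec

# Assumed ceiling for the question's "100s or less" (an assumption, not data).
ASSUMED_ACTUAL_CEILING = 1_000
print(bound // ASSUMED_ACTUAL_CEILING)  # 3000
```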

If this is a real problem you are trying to solve, I'd love to talk to you.
I'm a co-founder of a stealth company that is building better tools for
engineers solving problems like this.

Our product can solve this exact problem very easily. I'd love to hear what
you are working on and show you what we've built. Let me know how to contact
you.

Edit: corrected my math.

~~~
dba_leveling_up
The size of the rows varies, but very few tables have large blobs, and there
should be far fewer than 3M rows changing per second. Does the change capture
use triggers or a different method?

------
viraptor
[https://dba.stackexchange.com/](https://dba.stackexchange.com/)

~~~
dba_leveling_up
Thanks, posted. However, I am also interested in hearing from non-technical
stakeholders who have supported something similar.

------
rodri_vera
Good question. I'm interested as well.

