DBLog: A Change-Data-Capture Framework (netflixtechblog.com)
102 points by srijan4 on Feb 15, 2020 | hide | past | favorite | 13 comments

I'd love more info on how Netflix propagates schema changes to downstream stores. How do you apply migrations to heterogeneous databases? Applying binlog messages only works if the downstream stores are the same flavor of database as the source. And common message formats like Avro don't have a guaranteed migration strategy the way protobuf does.

I suspect it's more of a process solution than a technological solution. Are non-backwards-compatible migrations scheduled in advance, and broadcast to dependent teams? Are downstream consumers expected to have a replay/dead-letter queue?

This is a great question :) We may share details about that in a future blog post.

Is this similar to Debezium (https://debezium.io/)?

Debezium is mentioned in the post.

Yes, Debezium is a related project.

This was nice to read. In your view, what would be the minimal change to PostgreSQL and MySQL that would reduce complexity and better support tools like DBLog?

Isn't this just adding a Kafka layer to make the capture real-time? I guess a lot of people are trying to do this. What's the key point I'm missing here, anyone?

Kafka or similar is already used by the existing solutions mentioned in the post.

I think what this adds is a better way of dump processing without using database specific features.

DBLog has a very simple Output interface which allows plugging in a writer for whatever output is desired: a stream (like Kafka), a datastore, a service, ...

For example one can use MySQL as a source and have ElasticSearch as a direct output, without needing to go through an intermediate stream like Kafka.

The described properties of DBLog (see the blog post) hold regardless of the output, including capturing changes in real time and writing them to the desired destination.
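The pluggable-output idea described above can be sketched as a small interface. This is only an illustration of the concept; the interface name, event shape, and method signature here are assumptions, not DBLog's actual API:

```java
import java.util.ArrayList;
import java.util.List;

public class OutputSketch {
    // Simplified change event (real events would carry schema info, timestamps, etc.)
    record ChangeEvent(String table, String op, String payload) {}

    // Hypothetical pluggable writer: implementations could target Kafka,
    // ElasticSearch, another datastore, a service call, ...
    interface Output {
        void write(ChangeEvent event);
    }

    // Example implementation: collect events in memory, as a stand-in for
    // e.g. a writer that indexes each event directly into ElasticSearch.
    static class InMemoryOutput implements Output {
        final List<ChangeEvent> written = new ArrayList<>();
        public void write(ChangeEvent event) {
            written.add(event);
        }
    }

    public static void main(String[] args) {
        Output out = new InMemoryOutput();
        out.write(new ChangeEvent("members", "UPDATE", "{\"id\":1}"));
        System.out.println(((InMemoryOutput) out).written.size()); // prints 1
    }
}
```

With this shape, swapping Kafka for a direct ElasticSearch writer is just a different `Output` implementation; the capture side doesn't change.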

This depends on row-based binlog replication, correct? Has Netflix had to deal with systems using statement-based replication?

Correct. This way we can capture create, update, and delete events for individual rows. binlog_format must be set to ROW for this to work in MySQL. For Postgres we use replication slots, which provide row-based events.

We use MySQL RDS, and it has "mixed" as the default binlog_format. Mixed uses statement-based logging for some event types (see the MySQL docs for details). Hence statement-based replication is part of the mix unless one explicitly switches to ROW-based replication (which is required for DBLog).
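The settings discussed above can be checked and changed roughly like this. A minimal sketch; the slot name and output plugin for Postgres are illustrative assumptions, and on RDS binlog_format is actually changed via the DB parameter group rather than SET GLOBAL:

```shell
# MySQL: DBLog needs row-based binlog events; the RDS default is "mixed".
mysql -e "SELECT @@global.binlog_format;"
# On self-managed MySQL the format can be switched dynamically:
mysql -e "SET GLOBAL binlog_format = 'ROW';"

# Postgres: create a logical replication slot to receive row-based events
# (slot name and output plugin here are illustrative):
psql -c "SELECT pg_create_logical_replication_slot('dblog_slot', 'wal2json');"
```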

Interesting article, thanks. Maybe add a link to the previous article in the "Blog Series" section, like part 1 does.

This is a good point; we will fix that. Thanks for the feedback.
