

Change Data Capture: The Magic Wand We Forgot - timclark
http://martin.kleppmann.com/2015/06/02/change-capture-at-berlin-buzzwords.html

======
boothead
I have come to believe that storing your data as the semantic events that
happen, rather than the state at a given point in time, is the way to go. From
what I've seen, change data capture is the opposite: a process of trying to
extract an event stream back out of the data changes.

~~~
ZenoArrow
Why do you believe capturing semantic events (update statements, delete
statements, alter statements, etc...) is superior to capturing a log of the
data changes?

Whilst there is an element of compactness when it comes to capturing semantic
events, the benefit of using a simpler mechanism like logs is that you don't
need a full database engine to parse the data, and it may end up offering
better performance (for example, there's no need to calculate what a commit
rollback entails on every node; just do it on the master node and let the
other nodes read the logs to know what to update).
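
To make the distinction concrete, here's a rough sketch; the record shapes
are invented for illustration, not taken from any particular database's log
format:

    # A "semantic event" at the statement level: compact, but replaying it
    # requires a full engine that can evaluate the statement deterministically.
    semantic_event = {
        "sql": "UPDATE accounts SET balance = balance - 10 WHERE id = 42"
    }
    
    # A row-level change log record: more verbose, but a replica can apply it
    # blindly, with no need to re-run the statement or reason about rollbacks.
    row_change = {
        "op": "update",
        "table": "accounts",
        "key": 42,
        "before": {"id": 42, "balance": 100},
        "after": {"id": 42, "balance": 90},
    }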

~~~
boothead
I didn't explain that properly. I meant that the things that should be stored
are events like:

    
    
      CustomerCreated { stuff }
      CustomerMadeOrder { custId, stuff }
      ItemAddedToOrder { orderId, stuff }
      etc..

This is the event sourcing view of the world.
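
To make that concrete, here's a minimal sketch in Python. The event names come
from the list above; the field names are hypothetical stand-ins for the elided
"stuff". The key point is that current state is just a left fold over the
event log:

    from dataclasses import dataclass
    
    @dataclass(frozen=True)
    class CustomerCreated:
        customer_id: str
        name: str
    
    @dataclass(frozen=True)
    class CustomerMadeOrder:
        customer_id: str
        order_id: str
    
    @dataclass(frozen=True)
    class ItemAddedToOrder:
        order_id: str
        item: str
    
    def apply(state, event):
        """Fold one event into the current-state view."""
        if isinstance(event, CustomerCreated):
            state[event.customer_id] = {"name": event.name, "orders": {}}
        elif isinstance(event, CustomerMadeOrder):
            state[event.customer_id]["orders"][event.order_id] = []
        elif isinstance(event, ItemAddedToOrder):
            for customer in state.values():
                if event.order_id in customer["orders"]:
                    customer["orders"][event.order_id].append(event.item)
        return state
    
    # Replay the log to rebuild state as of any point in time.
    events = [
        CustomerCreated("c1", "Ada"),
        CustomerMadeOrder("c1", "o1"),
        ItemAddedToOrder("o1", "book"),
    ]
    state = {}
    for e in events:
        state = apply(state, e)
    print(state)  # {'c1': {'name': 'Ada', 'orders': {'o1': ['book']}}}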

~~~
ZenoArrow
What advantages does this give you? It appears to be the log data with extra
metadata, grouped by action; is that a fair assessment? If so, what value do
you gain from the extra metadata and the groupings?

------
adamtj
See also, "The Log: What every software engineer should know about real-time
data's unifying abstraction"

[http://engineering.linkedin.com/distributed-systems/log-what...](http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)

[https://news.ycombinator.com/item?id=6916557](https://news.ycombinator.com/item?id=6916557)

------
baseballmerpeak
Essentially, one database to rule them all?

~~~
brianxq3
It is very much the opposite. With this pattern, you end up with lots of
copies of your data, in different transformations, in potentially many
different data stores. The idea is that you take the stream of changes from
something like Postgres and use that stream to populate caches, indexes,
denormalized representations, counts, etc.
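
A rough sketch of that fan-out, assuming a generic change-record shape (this
isn't any particular CDC tool's API):

    # One logical change stream, several derived views kept in sync by
    # replaying it. The record shape here is invented for illustration.
    cache, search_index, counts = {}, {}, {}
    
    def consume(change):
        """Apply one row-level change to each downstream view."""
        old = cache.pop(change["key"], None)
        if old:
            search_index.pop(old["name"], None)  # drop any stale index entry
        if change["op"] in ("insert", "update"):
            cache[change["key"]] = change["row"]
            search_index[change["row"]["name"]] = change["key"]
        counts[change["op"]] = counts.get(change["op"], 0) + 1
    
    for change in [
        {"op": "insert", "key": 1, "row": {"name": "alice"}},
        {"op": "update", "key": 1, "row": {"name": "alicia"}},
        {"op": "delete", "key": 1},
    ]:
        consume(change)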

~~~
akkartik
One append-only data structure to rule all them databases.

~~~
endymi0n
If your nail looks like a smallish set of important data, CDC / immutable
datastores seem like a great hammer. For everything else, the answer is: it
depends. Some thoughts on the limitations of this approach:
[http://www.xaprb.com/blog/2013/12/28/immutability-mvcc-and-g...](http://www.xaprb.com/blog/2013/12/28/immutability-mvcc-and-garbage-collection/)

~~~
donjigweed
[https://news.ycombinator.com/item?id=7011102](https://news.ycombinator.com/item?id=7011102)

