

Cassandra Lan Party: 20 Nodes, 3 Data Centers, 1 Hour - tjake
http://www.datastax.com/dev/blog/cassandra-nyc-lan-party

======
tim_h
The article mentions that "Tokyo" DC was taken offline by disconnecting the
data center from the "New York" and "Paris" DCs. Then writes were performed,
Tokyo was reconnected, and the writes replayed, effectively re-syncing the
DCs.

My question is: how was consistency maintained? What mechanism prevented the
same record from being changed both in Tokyo and in New York while the network
was split? Or, what mechanism resolved the conflicts? I think the article
might be glossing over some details here. I'm guessing that either one side of
the split was offline (or at least read-only) during the outage or that the
writes were made without guarantees.

~~~
rbranson
Cassandra lets you specify the consistency level of reads and writes per
operation, so for these writes you would have had to trade consistency for
availability.
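To make that trade-off concrete, here is a rough Python sketch of how a coordinator decides whether a consistency level can be met during the split. It assumes a hypothetical replication factor of 3 in each DC (the article doesn't give the real numbers) and only covers ONE, QUORUM, LOCAL_QUORUM and ALL. QUORUM counts replicas across all DCs, while LOCAL_QUORUM only counts the coordinator's own DC.

```python
# Hypothetical topology: replication factor 3 in each of the article's DCs.
# The real replication factors used at the event are not stated.
REPLICATION = {"New York": 3, "Paris": 3, "Tokyo": 3}


def required_acks(level: str, local_dc: str) -> int:
    """Replica acknowledgements a coordinator in local_dc needs for a level."""
    total = sum(REPLICATION.values())
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return total // 2 + 1  # majority of all replicas, in any DC
    if level == "LOCAL_QUORUM":
        return REPLICATION[local_dc] // 2 + 1  # majority within the local DC only
    if level == "ALL":
        return total
    raise ValueError(f"unsupported level: {level}")


def can_succeed(level: str, local_dc: str, reachable_dcs: set[str]) -> bool:
    """True if enough replicas are reachable to satisfy the level."""
    if level == "LOCAL_QUORUM":
        # Only replicas in the coordinator's own DC count toward LOCAL_QUORUM.
        reachable = REPLICATION[local_dc] if local_dc in reachable_dcs else 0
    else:
        reachable = sum(REPLICATION[dc] for dc in reachable_dcs)
    return reachable >= required_acks(level, local_dc)


# Tokyo cut off from the other two DCs:
print(can_succeed("QUORUM", "Tokyo", {"Tokyo"}))        # False
print(can_succeed("LOCAL_QUORUM", "Tokyo", {"Tokyo"}))  # True
```

With Tokyo isolated, a QUORUM write from Tokyo needs 5 of 9 replicas but can only see 3, so it fails, while LOCAL_QUORUM or ONE still succeeds. The price is that those Tokyo writes need not overlap with QUORUM reads served from New York and Paris until the DCs re-sync. QUORUM writes plus QUORUM reads, by contrast, always share at least one replica because 5 + 5 > 9.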

Conflict resolution is done using a timestamp on each column: the newest write
wins. At first this might seem too crude to work, but it works well in
production on real-world applications.
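A minimal Python sketch of per-column timestamp resolution (last write wins). The `Cell` type and `merge_rows` function are made up for illustration, and the tie-break on equal timestamps is an assumption of this sketch, not a statement of Cassandra's exact rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # e.g. microseconds since epoch, supplied with the write


def merge_rows(a: dict[str, Cell], b: dict[str, Cell]) -> dict[str, Cell]:
    """Merge two replicas' versions of a row column by column (last write wins)."""
    merged = dict(a)
    for column, cell in b.items():
        current = merged.get(column)
        # Keep whichever cell has the newer timestamp. On an exact tie this
        # sketch prefers the larger value, so every replica picks the same
        # winner regardless of merge order (assumed tie-break).
        if current is None or (cell.timestamp, cell.value) > (current.timestamp, current.value):
            merged[column] = cell
    return merged


tokyo = {"email": Cell("ann@tokyo.example", 200), "name": Cell("Ann", 100)}
new_york = {"email": Cell("ann@ny.example", 150), "city": Cell("NYC", 300)}
print(merge_rows(tokyo, new_york))
```

Because resolution is per column, non-conflicting columns written on either side of the split both survive. Only a column written on both sides loses one of the writes, and it does so silently. That is also why clock sync between writers matters: a writer with a fast clock will win conflicts it shouldn't.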

When the datacenter link was healed, one of Cassandra's three consistency
repair mechanisms (Hinted Handoff, Anti-Entropy, or Read Repair) kicked in and
restored the data to a consistent state.
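Of the three, Hinted Handoff is the one that matches the "writes replayed" description in the article. Below is a toy Python sketch of the idea, not Cassandra's implementation: the coordinator stores a hint for every write an unreachable replica missed and replays those hints when the replica comes back. `Coordinator`, `mark_down` and `mark_up` are hypothetical names. Read Repair and Anti-Entropy (Merkle-tree comparison during a repair) are not shown.

```python
from collections import defaultdict


class Coordinator:
    """Toy coordinator that stores hints for unreachable replicas (hinted handoff)."""

    def __init__(self, replicas: list[str]):
        self.replicas = replicas
        self.up = set(replicas)
        self.data = {r: {} for r in replicas}  # replica -> {key: (value, ts)}
        self.hints = defaultdict(list)         # replica -> writes it missed

    def write(self, key: str, value: str, ts: int) -> None:
        for r in self.replicas:
            if r in self.up:
                self._apply(r, key, value, ts)
            else:
                self.hints[r].append((key, value, ts))  # remember the missed write

    def _apply(self, replica: str, key: str, value: str, ts: int) -> None:
        current = self.data[replica].get(key)
        if current is None or ts > current[1]:  # newer timestamp wins
            self.data[replica][key] = (value, ts)

    def mark_down(self, replica: str) -> None:
        self.up.discard(replica)

    def mark_up(self, replica: str) -> None:
        self.up.add(replica)
        # Replay every hint stored for this replica, then forget them.
        for key, value, ts in self.hints.pop(replica, []):
            self._apply(replica, key, value, ts)


c = Coordinator(["ny1", "paris1", "tokyo1"])
c.mark_down("tokyo1")          # link to Tokyo cut
c.write("user:1", "Ann", 100)  # Tokyo misses this write; a hint is stored
c.mark_up("tokyo1")            # link healed; the hint is replayed
print(c.data["tokyo1"])
```

Real hints are only kept for a configurable window; a replica that stays unreachable longer than that needs a full anti-entropy repair to catch up.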

------
nmilford
It was a lot of fun!

Most of the trouble was herding all the people who showed up. That took 80% of
the time; getting the cluster up once everyone was on the network was the easy
part.

