The Log: An Epic Software Engineering Article (bryanpendleton.blogspot.com)
152 points by tosh on Sept 20, 2014 | 12 comments

Hah, I just noticed something amusing:

Oracle headquarters buildings: http://en.wikipedia.org/wiki/Oracle_Corporation#mediaviewer/...

The typical symbolic representation of a database in a diagram: https://openclipart.org/detail/181674/database-symbol-by-ete...

> Focus on the data, not on the logic. The logic will emerge when you understand the data. -- Bryan Pendleton

Sounds similar to:

> Bad programmers worry about the code. Good programmers worry about data structures and their relationships. -- Linus Torvalds

Good discussion of an (indeed) epic article. Reading the other comments here, I would like to stress the complexity of allowing concurrent updates of the same object on distributed machines.

And I totally share the hopes for the current eventual consistency monopoly to perish. Some use cases should use strong consistency. You see people build stuff like this https://github.com/Netflix/s3mper to "fix" eventual consistency...awkward

Concurrent updates of the same object are actually an extremely common real-life use case. You need to handle the resulting conflicts anyway, so you might as well get the benefits of eventual consistency. If you don't have concurrent updates, then you don't have a distributed system. But that doesn't model the real world anymore. The state of the object is already distributed to multiple computers (e.g. mobile clients), and the database system needs to handle that.

You can, of course, handle the concurrency on the application level, but the question is, why bother? The database can help with managing that.

The whole industry has been slowly trending away from eventual consistency. Google proved with Spanner that it is possible to do linearizable transactions at mega scale. Even Amazon's DynamoDB offers strong consistency for individual rows.

Nobody wants to throw data away on concurrent writes, and CRDTs or application specific merge functions are complicated.
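
To give a feel for what a CRDT involves, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest ones. The class name, the replica ids and the phone/laptop scenario are made up for illustration and are not taken from any particular database. Each replica only bumps its own slot, and merging takes the per-replica maximum, so no concurrent increment is thrown away.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


# Two hypothetical replicas update concurrently, then exchange state.
phone = GCounter("phone")
laptop = GCounter("laptop")
phone.increment(2)
laptop.increment(3)
phone.merge(laptop)
laptop.merge(phone)
```

After the exchange both replicas report the same total. Counters are the easy case; merging richer structures such as lists or documents is where the complexity mentioned above shows up.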

RTS games are an application of this form of replication: each player has the full game state and sends their commands to the server, which timestamps them and sends them back to all players (including the one who initiated the action). Each action is then applied in the same order on every machine. The machines stay in sync through pure determinism.
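
A rough sketch of that lockstep idea, with a made-up command format and a toy game state (positions on a line); the `Server`, `GameState` and `replay` names are hypothetical. The only point is that a deterministic `apply` plus a shared timestamp order gives every client the same state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (server timestamp, player, operation, argument) -- an assumed format
Command = Tuple[int, str, str, int]


@dataclass
class GameState:
    positions: Dict[str, int] = field(default_factory=dict)

    def apply(self, command: Command) -> None:
        # Deterministic: the result depends only on the state and the command.
        _, player, op, arg = command
        if op == "set":
            self.positions[player] = arg
        elif op == "move":
            self.positions[player] = self.positions.get(player, 0) + arg
        else:
            raise ValueError(f"unknown operation: {op}")


class Server:
    """Assigns a total order to commands; it does not run the game itself."""

    def __init__(self) -> None:
        self.clock = 0
        self.log: List[Command] = []

    def submit(self, player: str, op: str, arg: int) -> Command:
        self.clock += 1
        command = (self.clock, player, op, arg)
        self.log.append(command)
        return command


def replay(commands: List[Command]) -> GameState:
    """Rebuild the state by applying commands in timestamp order."""
    state = GameState()
    for command in sorted(commands, key=lambda c: c[0]):
        state.apply(command)
    return state


server = Server()
server.submit("alice", "set", 10)
server.submit("bob", "move", 3)
server.submit("alice", "move", -4)

# Two clients receive the same commands, possibly in a different network order.
alice_view = replay(server.log)
bob_view = replay(list(reversed(server.log)))
```

Both views end up identical because each client sorts by the server's timestamp before applying. Anything nondeterministic inside `apply` (wall-clock time, unseeded randomness) would break this.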

I'm a little sad that neither article (I've read some more besides the ones linked there) mentions that the idea of logging here and processing there is pretty much the Unix principle of each program taking text (mostly lines of colon- or comma-separated data records) as input and producing another stream of text as output. Also, I don't know if the idea of a central data store (which is Kreps's suggestion, as far as I can understand) is really the solution to all our problems in a time of distributed everything.
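
For anyone who hasn't seen the Unix style spelled out, here is a toy sketch of it using Python generators in place of processes and pipes. The record layout (`user:action:amount`) and the stage names are invented for the example. Each stage consumes a stream of records and produces another stream, without knowing what comes before or after it.

```python
from typing import Dict, Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Turn colon-separated text lines into records."""
    for line in lines:
        user, action, amount = line.rstrip("\n").split(":")
        yield {"user": user, "action": action, "amount": int(amount)}


def only(records: Iterable[Dict[str, object]], action: str) -> Iterator[Dict[str, object]]:
    """Pass through only the records with the given action (a grep-like stage)."""
    for record in records:
        if record["action"] == action:
            yield record


def format_records(records: Iterable[Dict[str, object]]) -> Iterator[str]:
    """Serialize records back into text lines for the next consumer."""
    for record in records:
        yield f'{record["user"]}:{record["amount"]}'


# Assumed sample input; in a real pipeline this would be a file or stdin.
log_lines = ["alice:buy:3", "bob:view:0", "alice:buy:2"]
output = list(format_records(only(parse(log_lines), "buy")))
```

Here `output` holds the two "buy" lines re-serialized as `user:amount`, ready to be fed into yet another stage.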

Having several datacenters running Kafka with MirrorMakers between them (https://cwiki.apache.org/confluence/pages/viewpage.action?pa...) is far from being central data storage; it's more like a graph of (reasonably) ordered data distributed across the globe, which is available to any application.

I don't really see much of a relationship to unix text stream processing. Maybe in something like a Kafka -> Samza -> Kafka style system -- but even then, it feels like a stretch. There's much more of a focus on the ordering of messages rather than the concept of log here, process there.

The LinkedIn article seemingly discusses every possible issue except security and privacy, which seems rather important when handling customer data. Access control in Kafka seems to be work in progress:


The Bitcoin blockchain is another nice example of this principle.

Great article, horrible layout. Fix the layout by pasting this in the console: $('div#main').style.width = "100%".

The referenced article was discussed at https://news.ycombinator.com/item?id=6916557.
