
The exchange between Antirez and aphyr following the post about Redis Sentinel is a fascinating comparison between two engineering approaches. Antirez makes a qualitative argument (http://antirez.com/news/56, especially http://antirez.com/news/56#comment-910996445) about the behavior of the system in some 'real world' where complex network partitions are rare. On the other hand, aphyr makes a much more theoretically sound argument (including using TLA+ to demonstrate the validity of his argument) in his post (http://aphyr.com/posts/287-asynchronous-replication-with-fai...).

Despite having a huge amount of respect for Antirez and Redis, I strongly believe that the approach aphyr took is the one we are going to need as we build larger and more complex systems on unreliable infrastructure. Our engineering intuition, as excellent as it may be for single-node systems, almost always fails us with distributed systems. To get around this, we need to replace intuition. The tools that aphyr uses, such as TLA+ and carefully crafted counterexamples and diagrams, are an extremely good start in that direction. Getting a computer (in this case TLA+'s model checker TLC) to exhaustively test a design specification is very powerful. Comparing those results to the ones that we expected is even more powerful.
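To make "exhaustively test a design specification" concrete, here is a minimal sketch in Python. It is not TLA+ and not aphyr's actual specification; the toy model (one primary, one replica, a single failover), the bound MAX_WRITES, and the names next_states, invariant and check are all assumptions made up for illustration. It walks every reachable state of the model breadth-first and returns the shortest sequence of actions that loses an acknowledged write, if one exists.

```python
from collections import deque

MAX_WRITES = 2  # hypothetical bound that keeps the state space finite


def next_states(state, sync_ack):
    """Yield (action, new_state) pairs for a toy primary/replica model.

    state = (primary_len, replica_len, acked, failed_over), where the
    first three fields count writes held by the primary, writes held by
    the replica, and writes acknowledged to the client.
    """
    primary, replica, acked, failed_over = state
    if failed_over:
        return  # failover is terminal in this toy model
    if primary < MAX_WRITES:
        # Asynchronous mode acknowledges as soon as the primary has the write.
        new_acked = acked if sync_ack else acked + 1
        yield "write", (primary + 1, replica, new_acked, False)
    if replica < primary:
        # Synchronous mode acknowledges only once the replica has the write.
        new_acked = replica + 1 if sync_ack else acked
        yield "replicate", (primary, replica + 1, new_acked, False)
    # The replica can be promoted at any moment (e.g. during a partition).
    yield "failover", (primary, replica, acked, True)


def invariant(state):
    """Safety property: every acknowledged write survives on the current primary."""
    primary, replica, acked, failed_over = state
    authoritative = replica if failed_over else primary
    return acked <= authoritative


def check(sync_ack=False):
    """Breadth-first search over all reachable states.

    Returns the shortest list of actions that violates the invariant,
    or None if no reachable state violates it.
    """
    initial = (0, 0, 0, False)
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while parent[state] is not None:
                state, action = parent[state]
                trace.append(action)
            return trace[::-1]
        for action, nxt in next_states(state, sync_ack):
            if nxt not in parent:
                parent[nxt] = (state, action)
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(check(sync_ack=False))  # ['write', 'failover']
    print(check(sync_ack=True))   # None
```

In this toy model the asynchronous variant fails after just two steps (a write is acknowledged, then the replica is promoted without it), while the variant that acknowledges only after replication has no violating state. A real checker such as TLC explores far richer models, but the idea of comparing what the search finds against what you expected is the same.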

The comment made by Metaxis (http://antirez.com/news/56#comment-905001533) on Antirez's second reply is very good. Especially:

> I think your attempt to differentiate formal correctness and real world operations is deeply flawed and amounts to asserting anecdote - that what you have observed to be common makes for good design assumptions and better trade off decisions.

> Allow me to counter: Real world operations will inevitably tend to approach formal correctness in terms of observed failure modes. In other words, over time, you are more and more likely to see edge cases and freak occurrences that are predicted in theory but happen rarely in practice.

This closely matches my own experience. Just because I don't believe a network can behave in a particular way doesn't mean it won't. The real world is full of complex network partitions and Byzantine failures, and our systems need to be safe when they happen.




That all may be true - but is it asking the wrong question?

It can be tempting to stand on an ivory tower and proclaim theory, but what is the real world cost/benefit? Are you building a NASA Shuttle Crawler-transporter to get groceries?


> It can be tempting to stand on an ivory tower and proclaim theory, but what is the real world cost/benefit?

That's an important question. The real world cost of fully understanding the behavior of a system like Redis, amortized over all the users of that system, is likely to be very low. That's a huge benefit of using well-tested pieces (whether they are Redis, Cassandra or Oracle) as the foundation of systems.

The important question to ask is what you are losing. Losing availability in these complex cases is acceptable for the vast majority of applications. You can save a lot of complexity and cost there. Losing safety is much more of a problem, because the effects can be very long lived. Once you have put inconsistent data into your database, you are in a world of hurt for potentially a very long time.

I think that it's actually cheaper in the long run, even for small systems with small goals, to build on top of safe components. The costs of writing code to deal with the shifting reality of inconsistent data are higher, as are the costs of not writing that code.

I don't think that proving the safety of protocols, testing those safety properties once implemented, and understanding failure modes is "ivory tower" at all. It's just good engineering practice.


I tried to address that question in the full version of the article, but had to condense it somewhat for the InfoQ post. http://aphyr.com/posts/286-call-me-maybe-final-thoughts



