
"It's not so much that integrity and availability weren't focused on, it's that they're working with things that take enormous time and effort to fully debug."

You literally just contradicted yourself. You didn't mean to, but you did. What you said is that their software requirements and challenges are enormous. It's going to be hard to pull off both the theory and the implementation. There is room for huge problems in the protocol, custom code, libraries used, and OS interactions. Avoiding tons of debugging later requires stepping up QA in these situations. Maybe even add protocol analysis, like Amazon does with TLA+, on top of integration, fuzz, and unit tests and language-level analysis.

Then you said they were too focused on making it work to do that part: the part that was a prerequisite for making it work, as they're now seeing.

All I mean is that the evidence shows that no matter how much you care about getting it right, it will take several years to get this kind of system right. That was Google's and everyone else's experience with just Paxos, let alone a larger system that also involves time synchronization, transaction protocols, etc.

I don't think it was a matter of "ooops, we just didn't care enough". There's no way to build this kind of thing so that it comes out of the oven perfect the first time. There just isn't.

I agree that using TLA+ or the like from the very beginning would probably help. I also found the rules-based programming paper from the RAMCloud folks pretty convincing, but I haven't tried to put it into practice.

"it will take several years to get this kind of system right."

That's definitely true.

"I don't think it was a matter of 'ooops, we just didn't care enough'."

In the article, they said that's what happened. Little to no attention was paid to the problem. No QA person. Problems mounted. I don't know why people keep speculating on causes when the article itself said it was negligence they're now correcting. That's also why I'm countering all comments to the contrary.

Re: the RAMCloud paper. I might have missed it. Will look it up. Thanks.
