
We're talking about an in-development distributed database. Obviously it's nowhere near ready for production, but that doesn't mean it should be judged against the single-node reliability requirements of a relational monolith. Sure, the software should be capable, and obviously isn't yet. But the infrastructure it runs on will have a lower bar, so the software will need to tolerate more failures. In that sense, instability is a challenge they need to overcome in order to succeed, but comparisons with scale-up database stores just don't make sense here.



A combination of a memory-safe language, a restricted subset of it to ease analysis, and design-by-contract will knock out many problems with little cost. Likewise, the Cleanroom methodology did the same with regular languages. There are finance companies developing fast, crash-free software in Haskell and OCaml. One developer put together IRONSIDES, a DNS server immune to single-packet crashes, using just SPARK Ada. Finally, SQLite shows how rugged a database can be simply by integrating rigorous testing into its design, run every time something changes.
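
To make the design-by-contract point concrete, here's a minimal sketch in Go (a hypothetical example of the technique, not anything from CockroachDB or the article): preconditions and postconditions are asserted at function boundaries, so a violation fails loudly under test instead of surfacing later as corrupted state.

    package contract

    import "fmt"

    // require panics if a caller violates a precondition.
    func require(cond bool, msg string) {
        if !cond {
            panic(fmt.Sprintf("precondition violated: %s", msg))
        }
    }

    // ensure panics if the function violates its own postcondition.
    func ensure(cond bool, msg string) {
        if !cond {
            panic(fmt.Sprintf("postcondition violated: %s", msg))
        }
    }

    // Withdraw debits a balance, with its contract stated explicitly
    // instead of being left implicit in callers' heads.
    func Withdraw(balance, amount int) int {
        require(amount > 0, "amount must be positive")
        require(amount <= balance, "amount must not exceed balance")

        newBalance := balance - amount

        ensure(newBalance >= 0, "balance must not go negative")
        return newBalance
    }

The point is that the checks cost a few lines each, run on every test and every change, and turn whole classes of silent failures into immediate, diagnosable ones.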

All of this indicates that many of the problems they're having could've been avoided with a different methodology. I don't even need that argument, though, because they said they were ignoring stability. You keep forgetting that part in your comments. I forgive plenty of inadvertent failures, but intentionally ignoring QA in mission-critical software deserves harsh comments. ;)


[Cockroach Labs engineer here] Where are you getting the "ignoring stability" part? Or that we "skipped correctness and stability" (in another comment of yours)?

Correctness and robustness are the main factors behind the design and implementation choices we make. A significant part of the overall engineering effort has gone into stability for a long time, and we're now pushing that closer to 90% for a while.

None of this is unexpected in the development of a complex system, especially when many factors weigh into deciding how much to focus on various aspects. I have worked on a few unrelated systems that turned out stable and successful when released, and at a comparable stage in their development they were much less stable. So personally, I am very optimistic about CockroachDB.


It's implied in the article. It mentions many mounting stability problems, including that nobody was dedicated to working on stability, which implies a lack of QA. I based my claims on the article's. If those claims were misstated, then any of mine drawing on them won't apply, of course.

It just reads like a lack of QA in general, with most of the correctness effort focused on the protocol design itself.


It really boggles my mind that you're referring to this as mission-critical software when it's an in-development product in its infancy.

Surely you're trolling?


It's intended to eventually be mission-critical software. So you design it for verification from the beginning. You don't have to do all the QA at once. You just do a little, plus leave room in how it's structured and coded so you can do more easily later on (see the sketch below). This isn't trolling: it's standard practice in safety- and security-critical development. It's also used by teams outside those fields that just want their stuff to work reliably and be easy to maintain.
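
As a sketch of what "leaving room" for later QA might look like, here's a hypothetical Go fragment (not CockroachDB's code): route every mutation through one path with an invariant hook, so cheap checks today can become exhaustive checks under test tomorrow without restructuring callers.

    package store

    import "fmt"

    // checkInvariants can stay off in production and be enabled in tests,
    // so deeper verification can be added incrementally later.
    var checkInvariants = false

    type KV struct {
        data map[string]string
    }

    func NewKV() *KV { return &KV{data: make(map[string]string)} }

    // Put is the single mutation path; every write passes the invariant hook.
    func (kv *KV) Put(key, value string) error {
        if key == "" {
            return fmt.Errorf("empty key")
        }
        kv.data[key] = value
        if checkInvariants {
            kv.assertInvariants()
        }
        return nil
    }

    func (kv *KV) assertInvariants() {
        for k := range kv.data {
            if k == "" {
                panic("invariant violated: empty key stored")
            }
        }
    }

The structure costs almost nothing now, but it means heavier checks, fault injection, or model-based tests can be bolted onto that one seam later instead of being retrofitted across the whole codebase.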

Another principle from high-assurance security is that it's usually impossible to retrofit high robustness into a product after the fact. It has to be baked in with each decision you make. Interestingly, the author notes that correctness is usually impossible to retrofit and should be there from the beginning. So, they already know this. ;)



