He also did these awesome Tolkien-esque maps of the database engine ecosystem: https://martin.kleppmann.com/2017/03/15/map-distributed-data...
Anyway, I inject this sort of stuff directly into my veins, so thanks very much for the post!
I am the CTO/Founder of YugaByte and author of the above post. Thanks for your comments, glad you liked the post! You make a great point about YugaByte DB using Apache Kudu to start out, but I wanted to clarify a few things.
* This is a post about our architectural decisions in building a distributed SQL database, and less so about how we actually implement it. Hence there is no mention of any starting points as far as the codebase.
* Also, we have used the RocksDB and PostgreSQL codebases in their entirety in addition to Apache Kudu, and make no secret of this fact (https://docs.yugabyte.com/latest/architecture/concepts/ackno...).
Incredible work. And supporting full PostgreSQL.
This does sound a lot like a marketing post. But me no cares. This is pretty awesome; I'm rarely this hyped for a technology.
Funnily enough, all of the papers I would call "NoSQL papers" are very well known, yet F5 is hardly ever talked about. As if some people would not like to admit they might have been wrong ;-)
What do you mean by this? There are lots of examples of distributed databases.
Regarding the tone of the post, we are always open to feedback on how we can do better. Let us know through any means possible, including GitHub and Slack.
Under contention, a Calvin-based system will behave similarly to others which use optimistic locking schemes for Serializable isolation such as Postgres, or YB itself. There are advantages to the Calvin approach as well. For example, under Calvin, the system doesn't have to write out speculative intents to all data replicas in order to detect contention: The only writes to data storage are successful ones. The original paper only describes this briefly, but you can read about how FaunaDB has implemented it in more detail: https://fauna.com/blog/consistency-without-clocks-faunadb-tr...
It's also not a stretch to see how the protocol described in that post can be extended to support session transactions: Rather than executing a transaction per request, the transaction context is maintained for the life of the session and then dispatched to Calvin on commit. (This is in fact how we are implementing it in our SQL API feature.)
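To make the session idea above concrete, here is a minimal sketch, assuming a toy single-process stand-in for the ordered log. `ToySequencer`, `Session`, and `Txn` are hypothetical names for illustration only and are not FaunaDB's API. Reads are tracked with per-key versions, writes are buffered in the session, and nothing touches storage until the whole transaction is dispatched on commit.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class Txn:
    read_versions: Dict[str, int]  # key -> version observed during the session
    writes: Dict[str, Any]         # key -> value to install if validation passes


class ToySequencer:
    """Hypothetical stand-in for a Calvin-style log: transactions apply one at a time, in order."""

    def __init__(self) -> None:
        self.store: Dict[str, Tuple[int, Any]] = {}  # key -> (version, value)

    def read(self, key: str) -> Tuple[int, Any]:
        return self.store.get(key, (0, None))

    def submit(self, txn: Txn) -> bool:
        # Validate: every key the session read must still be at the version it saw.
        for key, seen in txn.read_versions.items():
            if self.read(key)[0] != seen:
                return False  # contention detected; storage is left untouched
        # Only successful transactions ever write to storage.
        for key, value in txn.writes.items():
            self.store[key] = (self.read(key)[0] + 1, value)
        return True


class Session:
    """Keeps the transaction context for the life of the session."""

    def __init__(self, sequencer: ToySequencer) -> None:
        self.sequencer = sequencer
        self.read_versions: Dict[str, int] = {}
        self.writes: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        if key in self.writes:  # read-your-own-writes inside the session
            return self.writes[key]
        version, value = self.sequencer.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def put(self, key: str, value: Any) -> None:
        self.writes[key] = value  # buffered locally, no intent written to storage

    def commit(self) -> bool:
        ok = self.sequencer.submit(Txn(dict(self.read_versions), dict(self.writes)))
        self.read_versions.clear()
        self.writes.clear()
        return ok
```

With two sessions that both read `x` and then write it, the first commit succeeds and the second fails validation without ever having written anything. A real system would replicate the log and validate on every replica; this sketch only shows the buffering and dispatch-on-commit shape.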
I would instead say that one of the more significant differences between Calvin and Spanner is the much stricter requirements the latter places on its hardware (i.e. clock accuracy) in order to maintain its correctness guarantees; a weakness its variants also share.
Reference to querying Cosmos with SQL: https://docs.microsoft.com/en-us/azure/cosmos-db/how-to-sql-...
1) BESPOKV: Application Tailored Scale-Out Key-Value Stores
2) ClusterOn: Building Highly Configurable and Reusable Clustered Data Services using Simple Data Nodes
DynomiteDB from Netflix made it possible to scale single-server NoSQL stores: https://github.com/Netflix/dynomite
From a technical perspective, I'd guess that the reason was that it makes local transactions trickier to do. Spanner has a philosophy of avoiding work that isn't required, and knows how to optimize transactions that don't require 2PC-ing data on different servers.
Table interleaving is also important to ensure that correlated data is colocated on the same server in the cluster.
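A rough sketch of why interleaving helps here, assuming a range-sharded store that routes each row by the first component of its key. The split points, table shapes, and function names are all made up for illustration, and for simplicity both tables share one key space. When child rows carry the parent's key as a prefix, a transaction on one customer and its orders stays on one shard, so no cross-server 2PC is needed.

```python
from bisect import bisect_right

# Hypothetical range boundaries on the leading key component -> three shards.
SPLIT_POINTS = [100, 200]


def shard_for(key):
    """Route a row to a shard by the first component of its key."""
    return bisect_right(SPLIT_POINTS, key[0])


def interleaved_keys(customer_id, order_ids):
    # Child (order) rows are prefixed with the parent (customer) key.
    return [(customer_id,)] + [(customer_id, o) for o in order_ids]


def separate_keys(customer_id, order_ids):
    # Child rows are keyed by their own id only, so they land wherever that id falls.
    return [(customer_id,)] + [(o,) for o in order_ids]


def shards_touched(keys):
    return {shard_for(k) for k in keys}


# Customer 42 with three orders: interleaved keys stay on a single shard,
# while separately keyed orders spread across all three.
print(shards_touched(interleaved_keys(42, [7, 150, 250])))
print(shards_touched(separate_keys(42, [7, 150, 250])))
```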
I'm sorry you're all offended that I'm on legacy projects right now and we're not at this point yet.