FoundationDB's Lesson: A Fast Key-Value Store Is Not Enough (voltdb.com)
94 points by jermo on April 1, 2015 | hide | past | favorite | 32 comments



The underlying organization of a database is not about what can be expressed in theory, it is about what can be expressed efficiently. Not only is there a very rich set of data structure designs that can be used with varying tradeoffs, but most sophisticated databases use an ecosystem of different, tightly interwoven data structures that smooth over the sharp corners that any single data structure has.

Efficient expressiveness follows from the relationships directly preserved in the organization. Generally from least-to-most expressive you have:

- cardinality preserving (hash tables, most KV stores)

- order preserving (LSM, btree, skiplist, space-filling curves*)

- space preserving (space decomposition e.g. quad-tree, a zoo of exotics)
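A hedged sketch of the space-filling-curve point above: interleaving coordinate bits into a Z-order (Morton) key lets a plain order-preserving structure (btree, LSM) serve coarse spatial queries. The function name and bit width here are illustrative, not any particular engine's API.

```python
def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into one Z-order key.
    Nearby (x, y) points tend to get nearby keys, so an ordinary
    sorted index can cluster spatially close data together."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits at even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits at odd positions
    return key

# Points in the same small square share high-order key bits,
# so they land near each other in the sorted keyspace.
points = [(2, 3), (3, 3), (100, 200)]
keys = sorted(morton_key(x, y) for x, y in points)
```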

Competent implementation is progressively more complex, nuanced, and sophisticated as you go down the list, so most implementations reflect the comfort level of the implementor. As you move down this stack, you can express things efficiently that will be very inefficient to express with a less expressive class of organization.

SQL was designed for databases built on order-preserving structures. While you can implement it on a KV store, it will never be as efficient as a database organized in the more expressive organization that SQL assumes.
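To make the mismatch concrete, a minimal sketch (a sorted list plus bisect stands in for an order-preserving structure; a dict stands in for a hash KV store — neither is any real engine): a range predicate costs one seek plus a scan in the first, but a full scan of every key in the second.

```python
import bisect

def range_query_ordered(sorted_keys, lo, hi):
    # Order-preserving organization: one seek, then a
    # contiguous scan of just the matching keys. O(log n + k).
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]

def range_query_hash(kv, lo, hi):
    # Cardinality-preserving organization: no order, so the same
    # predicate degenerates into touching every key. O(n).
    return sorted(k for k in kv if lo <= k <= hi)

keys = list(range(0, 1000, 7))
same = range_query_ordered(keys, 50, 70) == range_query_hash(dict.fromkeys(keys), 50, 70)
```

Both return the same rows; the difference is how much of the database each one has to read to find them.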

KV stores are popular because they are relatively simple to design and implement, not because they are expressive. It is an architectural impedance mismatch to add query functionality that tacitly assumes a more expressive organization. It will never perform well against a database actually organized for the expressiveness of the query layer.

Any database can only do a few things really well. It is inherent in the tradeoffs. You can add mediocre support for a laundry list of other "good enough" capabilities but you never want to market and position your product around those mediocre capabilities.


That's a good breakdown.

In this case though FoundationDB's KV store was order preserving. The API supported (and efficiently implemented) range reads, not just individual gets.

Implementing layered architectures always looks good 'on paper' but the details often throw up performance issues that are hard to deal with without punching some holes in the abstractions.

In this case it seems that relatively few companies had a need for a massively scalable ordered KV store, and the whole SQL layer was an attempt to bridge the product to a wider audience. It would be fascinating to hear more of the story but I suspect that will never escape now.


Regarding SQLite using FoundationDB as k/v, I got inspired by that comment to do the same thing with Redis instead: http://grisha.org/blog/2013/05/29/sqlite-db-stored-in-a-redi... (and it was quite slow too)

I'm curious though - databases typically store data in B-Trees which are blocks of equal size which works great for block storage. So isn't "block storage" essentially a key value store, where the key is the block number and the value is the block itself? That I think is the proper way of using a key/value store as the database back-end. (And that's what I did in my SQLite/Redis experiment, BTW)
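The block-device-on-KV idea is easy to sketch. Assuming a plain dict stands in for Redis, a page store keyed by block number looks like this (the class and method names are illustrative, not SQLite's actual VFS API):

```python
PAGE_SIZE = 4096

class KVBlockStore:
    """Fixed-size pages addressed by block number, the way a
    B-tree engine addresses its file. Any KV store mapping ints
    to byte blobs can play the role of the disk."""
    def __init__(self, kv=None):
        self.kv = kv if kv is not None else {}   # stand-in for Redis

    def read_block(self, n: int) -> bytes:
        # Unwritten blocks read back as zeroes, like a sparse file.
        return self.kv.get(n, b"\x00" * PAGE_SIZE)

    def write_block(self, n: int, data: bytes) -> None:
        assert len(data) == PAGE_SIZE
        self.kv[n] = data

store = KVBlockStore()
store.write_block(7, b"x" * PAGE_SIZE)
```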


I think this may slightly improve the too-fine-granularity locking, and it might make full table scans a bit more efficient, but otherwise most of what I wrote in the post applies. In fact the metadata problem has gotten worse and you might have to move even more data around.

It would help if you could push down filter predicates to run locally inside Redis, but at that point you're already more than a key-value store. I wonder if you could do this using Lua?
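A rough sketch of that pushdown idea (pure Python standing in for Redis plus a server-side Lua script): when the filter runs where the data lives, only matching rows cross the wire.

```python
class FakeServer:
    """Stand-in for a KV server that can run a pushed-down
    predicate, the way a Redis Lua script would."""
    def __init__(self, rows):
        self.rows = rows

    def scan(self):
        # No pushdown: every row is shipped to the client,
        # which then filters locally.
        return list(self.rows)

    def scan_where(self, predicate):
        # Pushdown: the predicate runs server-side, so only
        # matching rows are returned over the network.
        return [r for r in self.rows if predicate(r)]

server = FakeServer([{"id": i, "qty": i % 5} for i in range(100)])
shipped_all = server.scan()                                # 100 rows cross the wire
shipped_few = server.scan_where(lambda r: r["qty"] == 0)   # only 20 rows do
```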


Having Redis do the work is what http://thredis.org/ was about, and it was blazingly fast. I just couldn't find a use for it, so it's mostly collecting dust at the moment.


I think you could, but at this point you're butting up against the event loop assumption: that most of the work is IO. If you do compute on the edges, you then want threads, and you're re-engineering redis (Edit: I should have read grandparent's link, where he does just this).

But the core idea of pushing predicates to edges seems reasonable. At one point, I built this sql engine that coordinated queries and pushed down queries to the edges. It assumed that each edge store implemented an iterator over all its values, with optional filtering and sorting (if not implemented on the edge store, then the engine/client would filter/sort). It works great, but I haven't yet published it for other reasons.
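The contract described above can be sketched as an iterator protocol: each edge store advertises which capabilities it supports, and the engine compensates client-side for the rest. All names here are illustrative, not the unpublished engine's API.

```python
class EdgeStore:
    """An edge store exposes an iterator over its rows; filtering
    and sorting are optional capabilities the engine can fall
    back from."""
    supports_filter = False
    supports_sort = False

    def __init__(self, rows):
        self.rows = rows

    def scan(self, predicate=None, sort_key=None):
        rows = iter(self.rows)
        if self.supports_filter and predicate:
            rows = (r for r in rows if predicate(r))
        if self.supports_sort and sort_key:
            rows = iter(sorted(rows, key=sort_key))
        return rows

def engine_query(store, predicate, sort_key):
    rows = store.scan(predicate, sort_key)
    if not store.supports_filter:   # edge can't filter: do it client-side
        rows = (r for r in rows if predicate(r))
    if not store.supports_sort:     # edge can't sort: do it client-side
        rows = sorted(rows, key=sort_key)
    return list(rows)

dumb = EdgeStore([3, 1, 4, 1, 5, 9, 2, 6])
result = engine_query(dumb, lambda r: r > 2, sort_key=lambda r: r)
```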


Hey, amateur here - but has anybody tried to do a database where your edge servers literally run jit code? Like, you'd define a predicate like an OpenCL kernel, as a small ball of code taking a predetermined set of constants or per-row variables, then presumably push this as LLVM bytecode and let the edges compile it into locally appropriate loops (probably with caching). Is the problem there that it would become hard to apply optimizations that depend on awareness of data structure at a higher-than-row level?


So there are plenty of systems that compile portions of a SQL plan to bytecode (LLVM or JVM) or machine code directly. Usually, the part you compile is the SQL plan and most importantly the predicate filters.

Common operations like networking, transaction management and even index walks (except the key comparisons) are already compiled to native code, so you don't need to go all in. You just optimize the stuff that needs it.
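The flavor of the technique in miniature: compile just the hot predicate from a textual expression into a callable, once, and call it per row. Python's `compile` stands in for LLVM or JVM codegen here; this is a toy, not any real engine's planner.

```python
def compile_predicate(expr: str):
    """Turn a SQL-ish filter expression into a compiled function.
    Real engines emit LLVM IR or JVM bytecode; compiling to Python
    bytecode shows the same shape: compile once, evaluate per row."""
    code = compile(expr, "<predicate>", "eval")
    def predicate(row):
        # Column names resolve from the row dict; builtins are
        # stripped so the expression can only reference columns.
        return eval(code, {"__builtins__": {}}, row)
    return predicate

pred = compile_predicate("qty > 10 and price < 5.0")
rows = [{"qty": 12, "price": 3.0}, {"qty": 8, "price": 1.0}]
matches = [r for r in rows if pred(r)]
```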


Hmm, a previous competitor slamming them only after they're certain FoundationDB's gag order will prevent a rebuttal. Classy.


Former VoltDB developer here.

John makes some good technical arguments about why implementing transactional SQL on top of a distributed KV store (even a transactional one) is hard.

The points about metadata performance and consistency were actually new ideas to me. I already had beef with moving the data around to SQL nodes as that is an obvious waste of capacity.

But the importance of metadata in processing SQL queries in a distributed database never occurred to me even though I've lost a decent chunk of my life implementing that consistency. It's one of those things you take for granted if it's how you have always done it.


Correct me if I'm wrong, but what happened to FoundationDB smells like an acqui-hire par excellence. This article provides an interesting alternative to the "Apple bought this space age technology to keep it to themselves" narrative.

Seems like the guy knows what he's talking about as well, b/c surprise, he's working on a DB.


There's a third option: Apple had a need for a massive, distributed K/V store, and at their scale, it's cheaper to buy a company than a license.


Apple is a heavy user of Teradata and, according to this article, was the quickest to reach a petabyte. It is now two years later and they could be in the tens of petabytes given the success of the iPhone 6 and the integration with Beats.

https://gigaom.com/2013/03/27/why-apple-ebay-and-walmart-hav...

Teradata is VERY expensive, and my guess is they are also reaching scalability limits.


FoundationDB solves completely different problems than Teradata. FDB isn't an analytical store.


http://cassandra.apache.org/

I'm not sure why Apple would prefer FoundationDB to Cassandra for this usecase.


According to the Cassandra home page: "One of the largest production deployments is Apple's, with over 75,000 nodes storing over 10 PB of data."


Perhaps because FoundationDB is ACID compliant?


So there is a point here. Apple is trying to compete with Google. Google has some amazing distributed systems, including Spanner, F1, MillWheel, etc... Apple has Cassandra and other OSS/COTS software. Not to ding Cassandra, but this is a problem for Apple. We've seen repeatedly that at huge scale, it often makes sense to own (or control) the software. See Facebook and LinkedIn as well.

Now I don't think FDB (the product) is the answer, at least not in the short term. There are more problems scaling it to Apple's use case than there are working around Cassandra's lack of ACID.

So I'm convinced the value of FDB is the experience in the engineers' brains. Apple needs brains to run Cassandra, but also to figure out if Cassandra is the right long term path. Build, buy, adapt? It takes veterans to make the right call.


Apple runs Cassandra at scale today. It underpins most of iTunes Match, and last time I checked all your iCloud data was sharded and stored in Cassandra. They run one of the largest clusters in the world (at least as of the last time I spoke with Datastax). Cassandra is a pretty easy database to run and scale, and Datastax is just down the road from them to help.

My guess is that FoundationDB is replacing their Teradata installation. Better to buy the company and invest heavily in it than let it fall short of its full potential as a small startup.


Why do people think ACID is some panacea for every database problem? It doesn't save you from data loss. It doesn't help you scale. It doesn't make it easier to manage. It doesn't actually help with 99% of the problems you typically experience with a database.

A more measured, intelligent consideration of pros/cons is needed.


>> A more measured, intelligent consideration of pros/cons is needed.

Every April 1st :)


Quorum writes plus quorum reads is as ACID compliant as FoundationDB, IIRC.

The only thing I can think of is Apple has an internal team that is building a Cassandra replacement and they need more qualified engineers with experience.


Not even close. FDB did arbitrary multi-key transactions. Cassandra can only do Compare-And-Set on single keys IIRC.
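The gap in a sketch: single-key CAS can't atomically move a value between two keys, while an optimistic multi-key transaction can. The version-check scheme below is a heavy simplification of FDB-style conflict detection, with invented names, purely for illustration.

```python
class OptimisticStore:
    """Toy multi-key optimistic concurrency: each key carries a
    version, and a commit succeeds only if every key read during
    the transaction is still unchanged at commit time."""
    def __init__(self):
        self.data, self.version = {}, {}

    def txn_transfer(self, src, dst, amount):
        # Read phase: record the versions this transaction depends on.
        reads = {k: self.version.get(k, 0) for k in (src, dst)}
        a, b = self.data.get(src, 0), self.data.get(dst, 0)
        if a < amount:
            return False
        # Commit phase: abort if any read key changed underneath us.
        if any(self.version.get(k, 0) != v for k, v in reads.items()):
            return False                 # conflict: caller retries
        self.data[src], self.data[dst] = a - amount, b + amount
        for k in (src, dst):
            self.version[k] = self.version.get(k, 0) + 1
        return True

s = OptimisticStore()
s.data = {"alice": 100, "bob": 0}
ok = s.txn_transfer("alice", "bob", 30)  # both keys change atomically
```

Single-key CAS gives you the version check on one key only; there is no way to make the debit and the credit stand or fall together.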


Meh.

Yes, it's critical about the "layers" model and about the SQL implementation in particular. Most of what I write isn't this critical, but I thought there was actually an interesting point here so I wrote it down. Take it for whatever it's worth.


It seems like John had some technical points to make. It's hard for me to justify his post as "slamming" FoundationDB if his assertions are based on logic. Now, if you have a criticism of his points, you should feel free to rebut them. You certainly have not done so here.


The skillful use of logic in no way negates the slamminess.

My primary issue is that the people best able to rebut John's well-argued points are no longer able to do so.

My secondary issue is that John could have easily made the same logical arguments _before_ FoundationDB was acquired, but chose not to do so for whatever reason. This would have led to a much more enlightening debate than the one we're able to have now.


I could have easily written the same post months ago, yes. None of these thoughts are new.

It's actually precisely because FDB isn't a perceived competitor anymore that I can write this. As a vendor, I actually have much less of an agenda now. It's not like I'm worried about FDB stealing a customer. If I had posted this months ago, it would have been less credible and it would have felt tacky.

The other point is that I think there's actually something to say here. I'm not just dumping on a dead product, I'm trying to show there's a lesson to learn here about trying to bolt SQL onto things (a trend). To contrast, if NuoDB disappeared tomorrow, I could write a solid post on why they could never technically achieve what their marketing said they could, but there's no lesson there. "Don't make bad engineering choices" is too generic.


Their SQL layer architecture is indeed chatty.

If I remember correctly they were doing/planing a lot of optimizations like:

1. delaying requests a little on purpose to take more advantage of batch requests

2. fancy techniques to improve join locality (based on Akiban's previous work).

These two go hand in hand.

So in a good enough network (aka not any public cloud) it'd probably work reasonably well up to a point.
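The deliberate-delay trick in point 1 is simple to sketch: hold individual requests for a short window so one round trip carries many of them. This is illustrative only, not FDB's actual batching implementation.

```python
class MicroBatcher:
    """Coalesce individual key requests so one RPC serves many of
    them: a little added latency buys far fewer network round
    trips, which is exactly what a chatty SQL layer needs."""
    def __init__(self, fetch_many, max_batch=32):
        self.fetch_many = fetch_many   # one RPC for a whole list of keys
        self.max_batch = max_batch
        self.pending = []

    def get(self, key):
        self.pending.append(key)
        if len(self.pending) >= self.max_batch:
            return self.flush()
        return None                    # caller waits for the batch window

    def flush(self):
        batch, self.pending = self.pending, []
        return self.fetch_many(batch)

rpc_calls = []
def fetch_many(keys):
    rpc_calls.append(keys)             # one round trip per batch
    return {k: k * 2 for k in keys}

b = MicroBatcher(fetch_many, max_batch=4)
for k in range(8):
    b.get(k)                           # 8 gets, but only 2 round trips
```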


When I started VoltDB, Mike Stonebraker told me we had to be 10x faster than Oracle sitting on a RAM drive, or nobody would care. 5x wasn't interesting to him.

It seems like FDB-SQL was closer to 1x, with a much better replication story, but with huge limitations on the kinds of things you can do. (https://foundationdb.com/layers/sql/documentation/Concepts/k...)

So maybe you could push it to 2x or 3x with a few years of work, but other new systems with more SQL support and more customer traction are doing 10x and up today. It's a tough sell.


While I do agree with you, I don't think anyone used FoundationDB solely for its SQL layer, and if they did...


Right. I'm not being critical of the underlying KV tech. It seemed pretty impressive from what I know. My two points were: 1. The SQL thing wasn't gonna work the way they went about it. 2. Without SQL or some other powerful query tool, it's a less interesting product.


I think, quite the contrary, that it can be quite enough, and it should not be spoiled with an SQL engine.

In which case it is a bit silly and you get the worst of both worlds.



