This is INCREDIBLE news! FoundationDB is the greatest piece of software I’ve ever worked on or used, and an amazing primitive for anybody who’s building distributed systems.
The short version is that FDB is a massively scalable and fast transactional distributed database with some of the best testing and fault-tolerance on earth[1]. It’s in widespread production use at Apple and several other major companies.
But the really interesting part is that it provides an extremely efficient and low-level interface for any other system that needs to scalably store consistent state. At FoundationDB (the company) our initial push was to use this to write multiple different database frontends with different data models and query languages (a SQL database, a document database, etc.) which all stored their data in the same underlying system. A customer could then pick whichever one they wanted, or even pick a bunch of them and only have to worry about operating one distributed stateful thing.
But if anything, that’s too modest a vision! It’s trivial to implement the Zookeeper API on top of FoundationDB, so there’s another thing you don’t have to run. How about metadata storage for a distributed filesystem? Perfect use case. How about distributed task queues? Bring it on. How about replacing your Lucene/ElasticSearch index with something that actually scales and works? Great idea!
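To make "layer" concrete, here is a rough sketch of a tiny one - a FIFO queue stored directly in FDB keys - using the Python bindings (API version 510, matching the newly open-sourced release). All names are made up, and a real queue layer would randomize keys to avoid contention; this only shows the shape of the idea.

    import fdb
    fdb.api_version(510)
    db = fdb.open()

    queue = fdb.Subspace(('my_queue',))

    @fdb.transactional
    def enqueue(tr, value):
        # Append after the highest existing index. Concurrent enqueues will
        # conflict and retry; a production layer would randomize to avoid that.
        r = queue.range()
        last = list(tr.get_range(r.start, r.stop, limit=1, reverse=True))
        index = queue.unpack(last[0].key)[0] + 1 if last else 0
        tr[queue.pack((index,))] = value

    @fdb.transactional
    def dequeue(tr):
        # Pop the lowest-indexed item, atomically with the read that found it.
        r = queue.range()
        first = list(tr.get_range(r.start, r.stop, limit=1))
        if not first:
            return None
        del tr[first[0].key]
        return first[0].value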
And this is why this move is actually genius for Apple too. There are a hundred such layers that could be written, SHOULD be written. But Apple is a focused company, and there’s no reason they should write them all themselves. Each one that the community produces, however, will help Apple to further leverage their investment in FoundationDB. It’s really smart.
I could talk about this system for ages, and am happy to answer questions in this thread. But for now, HUGE congratulations to the FDB team at Apple and HUGE thanks to the executives and other stakeholders who made this happen.
Now I’m going to go think about what layers I want to build…
[1] Yes, yes, we ran Jepsen on it ourselves and found no problems. In fact, our everyday testing was way more brutal than Jepsen, I gave a talk about it here: https://www.youtube.com/watch?v=4fFDFbi3toc
Will said what I wanted to say, but: me too. I'm super happy about this and grateful to the team that made it happen!
(I was one of the co-founders of FoundationDB-the-company and was the architect of the product for a long time. Now that it's open source, I can rejoin the community!)
Another (non-technical) founder here - and I echo everything voidmain just said. We built a product that is unmatched in so many important ways, and it's fantastic that it's available to the world again. It will be exciting to watch a community grow around it - this is a product that can benefit hugely from open-source contributions in the form of layers that sit on top of the core KV store.
I echo what wwilson has said. I work at Snowflake Computing (we're a SQL analytics database in the cloud), and we have been using FoundationDB as our metadata store for over 4 years. It is a truly awesome product and has proven rock-solid over that time. It is a core piece of our architecture and is heavily used by all our services. Some of the layers wwilson is talking about, we've built: metadata storage, an object-mapping layer, a lock manager, a notification system. In conjunction with these layers, FoundationDB has allowed us to build features that are unique to Snowflake. Check out our blog post, "How FoundationDB powers Snowflake Metadata forward" [1]
Kudos to the FoundationDB team and Apple for open sourcing this wonderful product. We're cheering you all along! And we look forward to contributing to the open source project and community.
I am one of the designers of probably the best known metadata storage engine for a distributed filesystem, hopsfs - www.hops.io.
When I looked at FoundationDB before Apple bought you, you supported transactions - great. But we need much more to scale. Can you tell me which of the following you have:
row-level locks
partition-pruned index scans
non-serialized cross-partition transactions (that is, a transaction coordinator per DB node)
distribution-aware transactions (hints on which TC to start a transaction on)
It's somewhat hard to answer your questions because the architecture (and hence, terminology) of FoundationDB is a little different than I think you are used to. But I will give it a shot.
FoundationDB uses optimistic concurrency, so "conflict ranges" rather than "locks". Each range is a (lexicographic) interval of one or more keys read or written by a transaction. The minimum granularity is a single key.
FoundationDB doesn't have a feature for indexing per se at all. Instead indexes are represented directly in the key/value store and kept consistent with the data through transactions. The scalability of this approach is great, because index queries never have to be broadcast to all nodes, they just go to where the relevant part of the index is stored.
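As a rough illustration of what "indexes represented directly in the key/value store" means in practice (my own key layout, not a built-in feature), using the Python bindings:

    import fdb
    import fdb.tuple
    fdb.api_version(510)
    db = fdb.open()

    users = fdb.Subspace(('users',))            # ('users', user_id) -> packed name
    by_name = fdb.Subspace(('users_by_name',))  # ('users_by_name', name, user_id) -> ''

    @fdb.transactional
    def add_user(tr, user_id, name):
        # The record and its index entry are written in one transaction, so the
        # index can never drift out of sync with the data. An update would also
        # clear the old index entry here, in the same transaction.
        tr[users.pack((user_id,))] = fdb.tuple.pack((name,))
        tr[by_name.pack((name, user_id))] = b''

    @fdb.transactional
    def find_by_name(tr, name):
        # The query reads only the slice of the keyspace holding this name;
        # nothing is broadcast to other nodes.
        return [by_name.unpack(kv.key)[1] for kv in tr[by_name.range((name,))]]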
FoundationDB delivers serializable isolation and external consistency for all transactions. There's nothing particularly special about transactions that are "cross-partition"; because of our approach to indexing and data model design generally we expect the vast majority of transactions to be in that category. So rather than make single-partition transactions fast and everything else very slow, we focused on making the general case as performant as possible.
Transaction coordination is pretty different in FoundationDB than in 2PC-based systems. The job of determining which conflict ranges intersect is done by a set of internal microservices called "resolvers", which partition up the keyspace totally independently of the way it is partitioned for data storage.
Please tell me if that leaves questions unresolved for you!
Thanks for the detailed answer.
Is it actually serializable isolation - does it handle write skew anomalies (https://en.wikipedia.org/wiki/Snapshot_isolation)?
Most OCC systems I know have only snapshot isolation.
Systems that sound closest to FoundationDB's transaction model that I can think of are Omid (https://omid.incubator.apache.org/) and Phoenix (https://phoenix.apache.org/transactions.html). They both support MVCC transactions, but I think they have a single coordinator that gives out timestamps for transactions - like your "resolvers". The question is how your "resolvers" reach agreement - are they each responsible for a range (partition)? If transactions cross ranges, how do they reach agreement?
We have talked to many DB designers about including their DBs in HopsFS, but mostly it falls down on something or other. In our case, metadata is stored fully normalized - all inodes in an FS path are separate rows in a table. In your case, it would fall down on secondary indexes - which are a must. Clustered PK indexes are not enough. For HopsFS/HDFS, there are so many ways in which inodes/blocks/replicas are accessed using different protocols (not just reading/writing files or listing directories, but also listing all blocks for a datanode when handling a block report). Having said that, it's a great DB for other use cases, and it's great that it's open source.
Yes, it's really serializable isolation. The real kind, not the "doesn't exhibit any of the named anomalies in the ANSI spec" kind. We can selectively relax isolation (to snapshot) on a per-read basis (by just not creating a conflict range for that read).
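For anyone curious what "selectively relax isolation on a per-read basis" looks like in the Python bindings, a minimal, hypothetical sketch:

    import fdb
    fdb.api_version(510)
    db = fdb.open()

    @fdb.transactional
    def record_visit(tr, page_key, hot_counter_key):
        # Serializable read: a concurrent write to page_key conflicts with us
        # and forces a retry, so no write skew is possible on this key.
        page = tr[page_key]
        # Snapshot read: no conflict range is added, so contention on the hot
        # counter never aborts this transaction (we accept write skew here).
        total = tr.snapshot[hot_counter_key]
        tr[page_key] = b'visited'
        return total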
I tried to explain distributed resolution elsewhere in the thread.
I believe our approach to indices pretty much totally dominates per-partition indexing. You can easily maintain the secondary indexes you describe; I don't understand your objection.
My guess is the objection lies in "Have to manage the index myself."
Also, the main draw-back of "indices as data" in NoSQL is when you need to add a new index -- suddenly, you have to scrub all your data and add it to the new index, using some manual walk-the-data function, and you have to make sure that all operations that take place while you're doing this are also aware of the new index and its possibly incomplete state.
Certainly not impossible to do, but it sometimes feels a little bit like "I wanted a house, but I got a pile of drywall and 2x4 framing studs."
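For concreteness, the "walk the data" step is roughly this (hypothetical key layout, chunked so each transaction stays within FDB's size and time limits) - not hard, but you do write it yourself, and as noted above every writer has to start maintaining the new index before the walk begins.

    import fdb
    import fdb.tuple
    fdb.api_version(510)
    db = fdb.open()

    users = fdb.Subspace(('users',))
    by_country = fdb.Subspace(('users_by_country',))  # the new index being built

    @fdb.transactional
    def backfill_chunk(tr, begin_key, limit=100):
        # Index one bounded chunk per transaction; returns the last key touched.
        last_key = None
        for kv in tr.get_range(begin_key, users.range().stop, limit=limit):
            user_id = users.unpack(kv.key)[0]
            country = fdb.tuple.unpack(kv.value)[0]  # assumes tuple-packed values
            tr[by_country.pack((country, user_id))] = b''
            last_key = kv.key
        return last_key

    def backfill_all(db):
        cursor = users.range().start
        while True:
            last = backfill_chunk(db, cursor)
            if last is None:
                break
            cursor = last + b'\x00'  # resume just after the last indexed key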
"I wanted a house, but I got a pile of drywall and 2x4 framing studs."
This is a totally legitimate complaint about FoundationDB, which is designed specifically to be, well, a foundation rather than a house. If you try to live in just a foundation you are going to find it modestly inconvenient. (But try building a house on top of another house and you will really regret it!)
The solution is of course to use a higher level database or databases suitable to your needs which are built on FDB, and drop down to the key value level only for your stickiest problems.
Unfortunately, Apple hasn't released any such layers to the public so far. So I hope the community is up to the job of building a few :-)
I agree with voidmain's comment, as secondary indexes shouldn't be any different from the primary KV in your case. It almost seems that you're focused on a SQL/relational database architecture but storing your data denormalized anyway. An odd combination of thoughts.
Demoralized data - is that where the data sits around waiting, eager to be queried into action, while watching the data around it get used over and over again... but that time never comes, leaving it to question its very worth?
> Transaction coordination is pretty different in FoundationDB than in 2PC-based systems. The job of determining which conflict ranges intersect is done by a set of internal microservices called "resolvers", which partition up the keyspace totally independently of the way it is partitioned for data storage.
Ok, per my other question that makes sense. Similar to FaunaDB except the "resolvers" (transaction processors) are themselves partitioned within a "keyspace" (logical database) in FaunaDB for high availability and throughput. But FaunaDB transactions are also single-phase and we get huge performance benefits from it.
> best known metadata storage engine for a distributed filesystem, hopsfs
Not even close. I don't even see anything I'd call a filesystem mentioned on your web page. I missed FAST this year, and apparently you had a paper about using Hops as a building block for a non-POSIX filesystem - i.e. not a filesystem in my and many others' opinion - but it's not clear whether it has ever even been used in production anywhere let alone become "best known" in that or any other domain. I know you're proud, perhaps you have reason to be, but please.
I'm not convinced a paper with a website and some proofs of concept would be considered the "best". You're throwing a bunch of components into a distro and calling yourselves everything from "deep learning" to a file system. It's not clear what you guys are even trying to do here.
You don't need to worry about shards / partitions with FDB. Their transactions can involve any number of rows across any number of shards. It is by far the best foundation for your bespoke database.
Your talk was one of the best talks I've seen, and I keep mentioning it to people whenever they ask me about distributed systems, databases, and testing.
I'm incredibly impatient to see what the community is going to build on top of this very promising technology.
I'm not familiar with FDB, but what you say sounds almost too good to be true. Can I use it to implement the Google Datastore API? I've been trying for years to find a suitable backend so that I can leave Google land. Everything I tried either required a schema or lacked transactions or key namespaces.
If I recall correctly, the "SQL layer" you had in FDB before the Apple acquisition was a nice proof of concept, but lacked support for many features (joins in SELECT, for example). Is the SQL layer code from that time available anywhere to the public? (I'm not seeing it in the repo announced by OP.)
I used to work there. The SQL layer actually supported the majority of SQL features, including joins, etc. We had an internal Rails app that we used to dog-food it for performance monitoring, etc. I used to work on the document layer, and was sad to see it wasn't included here.
I hope this question doesn't feel too dumb, but is it possible to implement a SQL layer using SQLite's virtual table mechanism and leverage all of FoundationDB's features?
It doesn't look like Apple open-sourced the Document Layer, which is a slight bummer. But I echo what Dave said below: what we got is incredible, let's not get greedy!
Also TBH now that I don't have commercial reasons to push interop, if I write another document database on top of FDB, I doubt I'd make it Mongo compatible. That API is gnarly.
Other than that, they totally pulled off a fake-it-till-you-make-it: MongoDB 3.4 passed Jepsen a year ago, and MongoDB BI 2.0 contains their own SQL engine instead of wrapping PostgreSQL.
What specifically are you trying to avoid endorsing about the author of the LinkedIn post to which you linked? I couldn't find anything from a cursory web search.
He runs lambdaconf, and refused to disinvite a speaker who many people felt shouldn't be permitted to speak because of his historical non-technical writings.
(I've tried to keep the above as dry as possible to avoid dragging the arguments around this situation into this thread - and I suspect the phrasing of the previous comment was also intended to try and avoid that, so let's see if we can keep it that way, please)
You could try adding controversy to the author name and searching then. As mst correctly notes, I am trying to avoid reigniting said controversy while indicating my distaste.
I've forked the official SDK so that I can get extra functionality but it's quite hard to keep it updated when internal stuff changes. There is no way I can contribute.
I can't use it everywhere I want... In short, it's not open source, and that sucks.
MongoDB is AGPL or proprietary. Many companies have a policy against using AGPL licensed code. So, if you work at one of those companies, then open source MongoDB is not an option (at least for work projects), and proprietary may not be either (depending on budget etc).
FoundationDB is now licensed under Apache 2, which is a much more permissive license, so most companies' open source policies allow it.
Unless people want to change the MongoDB code they would be using, using AGPL software should be a non-issue, and there are no problems with it. People should start understanding the available licenses instead of spreading fear.
I know that multiple other companies have a similar policy (either a complete ban on using AGPL-licensed software, or special approval required to use it), although unlike Google, they don't post their internal policy publicly.
If someone works at one of these companies, what do you want to do – spend your day trying to argue to get the AGPL ban changed, or a special exception for your project; or do you just go with the non-AGPL alternative and get on with coding?
The main reason it's a problem at many of the companies which ban it is they have a lot of engineers who readily patch and combine code from disparate sources and might not always apply the right discipline to keep AGPL things appropriately separate. Bright-line rules can be quite useful.
It is true that MongoDB's AGPL contagion and compliance burden, if you don't modify it, is less than many fear. It is also true that those corporate concerns are valid. MongoDB does sell commercial licenses so that such companies can handle their unavoidable MongoDB needs, but they would tend to minimize that usage.
> How about replacing your Lucene/ElasticSearch index with something that actually scales and works?
Do you have something to back that up? To me this reads as if you're implying that Elasticsearch doesn't work or scale.
It's definitely interesting, but I'm cautious. The track record for FoundationDB and Apple has not been great here. IIRC they acquired the company and took the software offline, leaving users out in the rain?
Could this be like it happened with Cassandra at Facebook where they dropped the code and then more or less stopped contributing?
Also I haven't seen many contributions from Apple to open-source projects like Hadoop etc. in the past few years. Looking for "@apple.com" email addresses in the mailing lists doesn't yield a lot of results. I understand that this is a different team and that people might use different E-Mail addresses and so on.
In general I'm also always cautious (but open-minded) when there's lots of enthusiasm and there seems to be no downside. I'm sure FoundationDB has its dirty little secrets and it would be great to know what those are.
They are, slowly. Swift is open source, and clang is open source. They are moving parts of the Xcode IDE into open source, like sourcekitd and, more recently, clangd.
I don't think they will ever move 'secret sauce' into open source, but infrastructural things like DBs and dev tooling seems to be going in that direction.
~~How is it different from when Apple acquired the then-open-source FoundationDB (and shut down public access)? They could have just kept it open source back then.~~
EDIT: My bad, looks like FoundationDB wasn't fully open-source back then.
From what I recall (and based on some quick retro-googling), I don't believe FoundationDB was open source. One of the complaints about it on HN back in the day was that it was closed...
Unrelated to the original topic, but I had never come across that talk and it is great. I use the same basic approach to testing distributed systems (simulating all non-deterministic I/O operations) and that talk is a very good introduction to the principle.
Know of any ideas around using this for long-term time-series data? I wonder about something like OpenTSDB, but with this as the backend instead of HBase (which can be a sort of operational hell).
It's more like you would build a better Elasticsearch using Lucene to do the indexing and FoundationDB to do the storage. FoundationDB will make it fault tolerant and scalable; the other pieces will be stateless.
It would take only a few hours to wire up FoundationDB as a Lucene filesystem (Directory) implementation. A shared filesystem with a local RAM cache has been practical for a while in Lucene, and was briefly supported then deprecated in Elasticsearch. I've used Lucene on top of HDFS and S3 quite nicely.
If you have a reason to use FoundationDB over HDFS, NFS, S3, etc, then this will work well.
A Lucene+DB implementation where each term's posting list is stored natively in the key-value system was explored for Lucene+Cassandra as Solandra (https://github.com/tjake/Solandra). It was horrifically slow - not because Cassandra was slow, but because posting lists are heavily optimized, and putting them in a generalized B-tree or LSM-tree variant removes some locality and many of the possible optimizations.
I'm still holding out some hope for a hybrid implementation where posting list ranges are stored in a kv store.
I think you are on the right track. Storing every individual (term, document, ...) in the key value store will not be efficient, but you should be able to take Lucene's nice fast immutable data structure and stuff blocks of it (at the term level or below) into FDB values very efficiently. And of course you can do caching (and represent invalidation data structures in FDB), and...
FDB leaves room for a lot of creativity in optimizing higher layers. Transactions mean that you can use data structures with global invariants.
So from the Lucene perspective, the idea of a filesystem is pretty baked into the format. However, there's also the idea of a Codec which takes the logical data structures and translates to/from the filesystem. If you made a Codec that ignored the filesystem and just interacted with FDB, then that could work.
You can already tune segment sizes (a segment is a self-contained index over a subset of documents). I'd assume that the right thing to do for a first attempt is to use a Codec to write each term's entire posting list for that one segment to a single FDB key (doing similar things for the many auxiliary data structures). If it gets too big, then you should have tuned max segment size to be smaller. Do some sort of caching on the hot spots.
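A hedged sketch of the key/value shape that implies (not a real Lucene Codec, and all names here are mine): one chunked value per (segment, term), so a term's postings come back with a single range read.

    import fdb
    fdb.api_version(510)
    db = fdb.open()

    postings = fdb.Subspace(('postings',))
    CHUNK = 90000  # stay under FDB's ~100KB value size limit

    @fdb.transactional
    def write_postings(tr, segment, term, encoded_list):
        # Store the already-encoded posting list as ordered, value-sized chunks.
        # (A very large list would also need to respect the transaction size limit.)
        r = postings.range((segment, term))
        tr.clear_range(r.start, r.stop)
        for i in range(0, len(encoded_list), CHUNK):
            tr[postings.pack((segment, term, i // CHUNK))] = encoded_list[i:i + CHUNK]

    @fdb.transactional
    def read_postings(tr, segment, term):
        # One range read returns the chunks in key order; concatenate and decode.
        return b''.join(kv.value for kv in tr[postings.range((segment, term))])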
If anyone has any serious interest in trying this, my email is in my profile to discuss further.
Hmmm. I'm skeptical. A Lucene term lookup is stupidly fast. It traverses an FST, which is small and probably in memory. Traversing the posting lists themselves also needs to be smart, following skip tables, which is critical for performance.
> you should be able to take Lucene's nice fast immutable data structure and stuff blocks of it (at the term level or below) into FDB values very efficiently.
That sounds a lot like Datomic's "Storage Resource" approach, too! Would Datomic-on-FDB make sense, or is there a duplication of effort there?
Datomic's single-writer system requires conditional put (CAS) for the index and transaction log tree root pointers (mutable writes), and only eventual consistency for all other writes (immutable writes) [0].
I would go as far as saying a FoundationDB-specific Datomic may be able to drop its single-writer system due to FoundationDB's external consistency and causality guarantees [1], drop its 64-bit integer-based keys to take advantage of FoundationDB range reads [2], drop its memcached layer due to FoundationDB's distributed caching [3], use FoundationDB watches for its transactor messaging and tx-report-queue function [4], use FoundationDB snapshot reads [5] for its immutable index tree nodes, and maybe more?
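To make two of those concrete (a rough sketch against the Python bindings; key names are hypothetical): the conditional put is subsumed by serializable transactions, and a watch can stand in for the notification channel.

    import fdb
    fdb.api_version(510)
    db = fdb.open()

    ROOT = b'index_root'

    @fdb.transactional
    def swap_root(tr, new_root):
        # Reading ROOT adds a conflict range, so if another writer moves the
        # root between our read and our commit, FDB aborts and retries this
        # function - the effect of a conditional put without a CAS primitive.
        old = tr[ROOT]
        tr[ROOT] = new_root
        return old

    @fdb.transactional
    def watch_root(tr):
        # The returned future fires once this transaction commits and ROOT
        # subsequently changes - usable as a change-notification channel.
        return tr.watch(ROOT)

    # Usage: block until some other process moves the root pointer.
    watch_root(db).wait()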
Datomic is a FoundationDB layer. It just doesn’t know yet.
I wrote the original version of Solandra (which is/was Solr on Cassandra) on top of Jake's Lucene on Cassandra[1].
I can confirm it wasn't fast!
(And to be fair that wasn't the point - back then there were no distributed versions of Solr available so the idea of this was to solve the reliability/failover issue).
I wouldn't use it on a production system nowadays.
I just watched the demo of 5 machines with 2 getting unplugged; the remaining 3 can form a quorum. What happens if it were 3 and 3? Would they both form quorums?
A subset of the processes in a FoundationDB cluster have the job of maintaining coordination state (via disk Paxos).
In any partition situation, if one of the partitions contains a majority of the coordinators then it will stay live, while minority partitions become unavailable.
Nitpick: To be fully live, a partition needs a majority of the coordinators and at least one replica of each piece of data (if you don't have any replicas of something unimportant, you might be able to get some work done, but if you have coordinators and nothing else in a partition you aren't going to be making progress)
The majority is always floor(N/2) + 1, where N is the number of members. A 6-member cluster is no more fault-tolerant than a 5-member one (quorum is 4 nodes instead of 3, and it still only allows 2 nodes to fail).
The number of coordinators is separate from the number of boxes. You don't have to have a coordinator on every box.
I think you can set the number of coordinators to be even, but you never should - the fault tolerance will be strictly better if you decrease it by one.
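Spelling out the arithmetic behind that advice:

    # Majority is floor(N/2) + 1, so an even coordinator count tolerates no
    # more failures than the next smaller odd count.
    def coordinators_that_can_fail(n):
        majority = n // 2 + 1
        return n - majority

    for n in range(3, 8):
        print(n, 'coordinators ->', coordinators_that_can_fail(n), 'may fail')
    # 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2, 7 -> 3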
Apple also uses HBase for Siri, I believe. What are some of the cluster sizes that FoundationDB scales to? Could it be used to replace HBase or Hadoop?
I was in attendance at your talk and thought it was one of the best at the conference. I think Apple broke some hearts by going completely closed-source for a while, but I'm glad to see them open-sourcing a promising technology.
If scale is a function of read/writes, very large. In fact with relatively minimal (virtual) hardware it's not insane to see a cluster doing around 1M writes/second.
I was talking more about large file storage like HDFS, and the MapReduce model of bringing computation to data. HBase does the latter, and it's strongly consistent like FoundationDB, though FoundationDB provides better guarantees. As a K/V I understand what you and OP say.
How does this compare with CockroachDB? I'm planning to use CockroachDB for a project but would love to get an idea if I can get better results with FoundationDB.
They might be targeting the wrong market, hence the desperate marketing. For people who use MySQL/PostgreSQL, a compatible, slower, but distributed database probably just doesn't solve any problem. Those people need a managed solution, not a distributed one.
That presentation was really good! The simulation testing was well explained. If one wanted to get in on this exciting development and build something with FoundationDB, but has no database experience (I do know many programming languages), where would one start? If anyone could point me in the right direction, I'd greatly appreciate it.
How scalable and feasible would it be to implement a SQL layer on top of SQLite's virtual table mechanism (https://www.sqlite.org/vtab.html), redirecting reads and writes of record data to and from FoundationDB?
Long before we acquired Akiban, I prototyped a SQL layer using the (now defunct) sqlite4, which used a K/V store abstraction as its storage layer. I would guess that a virtual table implementation would be similar: easy to get working, and it would work, but the performance is never going to be amazing.
To get great performance for SQL on FoundationDB you really want an asynchronous execution engine that can take full advantage of the ability to hide latency by submitting multiple queries to the K/V store in parallel. For example if you are doing a nested loop join of tables A and B you will be reading rows of B randomly based on the foreign keys in A, but you want to be requesting hundreds or thousands of them simultaneously, not one by one.
Even our SQL Layer (derived from Akiban) only got this mostly right - its execution engine was not originally designed to be asynchronous, and we modified it to do pipelining, which gets a good amount of parallelism but still leaves something on the table, especially in small, fast queries.
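As a rough illustration of the latency-hiding pattern (Python bindings, with a hypothetical layout where each row of A holds a tuple-packed foreign key into B): issue every lookup before consuming any result, so the reads are in flight in parallel instead of costing one round trip each.

    import fdb
    import fdb.tuple
    fdb.api_version(510)
    db = fdb.open()

    table_a = fdb.Subspace(('A',))
    table_b = fdb.Subspace(('B',))

    @fdb.transactional
    def nested_loop_join(tr, a_prefix):
        # Range-read the driving rows of A; assume each value is a tuple-packed
        # foreign key into B (made-up layout for this sketch).
        fks = [fdb.tuple.unpack(kv.value)[0] for kv in tr[table_a.range((a_prefix,))]]
        # Issue ALL the B lookups first; each tr[...] returns a future, so the
        # reads are serviced concurrently by the cluster.
        futures = [tr[table_b.pack((fk,))] for fk in fks]
        # Only now do we block, and only per-value as results stream back.
        return [(fk, f) for fk, f in zip(fks, futures) if f.present()]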
@voidmain, thank you, that's very insightful and clear! I mean, I can see the disadvantage if such a SQL layer is implemented directly through SQLite's virtual tables.
Would it be possible to build a tree DB on top of it, like MonetDB/XQuery? I've always wondered why XML databases never took off; I've never seen anything else quite as powerful. The document databases du jour seem comparatively lame.
Yes, you can. You basically need a tree index, and any KV store can serve as the backing data structure. I've been writing one for bidirectional transformation of config files.
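A rough sketch of that idea in the Python bindings (my own naming): encode each node's path as a tuple key, so a whole subtree is one contiguous key range.

    import fdb
    fdb.api_version(510)
    db = fdb.open()

    tree = fdb.Subspace(('tree',))

    @fdb.transactional
    def set_node(tr, path, value):
        # path is a tuple such as ('config', 'network', 'port')
        tr[tree.pack(path)] = value

    @fdb.transactional
    def subtree(tr, path):
        # All descendants of `path` sort together, so one range read (or one
        # clear_range) covers the whole subtree.
        return [(tree.unpack(kv.key), kv.value) for kv in tr[tree.range(path)]]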
MongoDB is just a database with fewer features than a SQL database. An XML/XQuery database is fundamentally different, so I figured that if FoundationDB layers are really so powerful, they might be able to model a tree DB as well.
In some distributed databases the client just connects to some machine in the cluster and tells it what it wants to do. You pay the extra latency as it redirects these requests where they should go.
In FDB's envisioned architecture, the "client" is usually a (stateless, higher layer) database node itself! So the client encompasses the first layer of distributed database technology, connects directly to services throughout the cluster, does reads directly (1xRTT happy path) from storage replicas, etc. It simulates read-your-writes ordering within a transaction, using a pretty complex data structure. It shares a lot of code with the rest of the database.
If you wanted, you could write a "FDB API service" over the client and connect to it with a thin client, reproducing the more conventional design (but you had better have a good async RPC system!)
> but you had better have a good async RPC system!
The microservices crew with their "our database is behind a REST/Thrift/gRPC/FizzBuzzWhatnot microservice" pattern is still catching up to the significance of this statement.
This might be a dumb question (from someone used to blocking JDBC), but why is async RPC important in this case? Just trying to understand. And can gRPC not provide good async RPC?
I was referring to the trend of splitting up applications into highly distributed collections of services without addressing the fact that every point where they communicate over the network is a potential point of pathological failure (from blocking to duplicate-delivery etc). This tendency replaces highly reliable network protocols (i.e. the one you use to talk to your RDBMS) with ad hoc and frequently technically shoddy communication patterns, with minimal consideration for how it might fail in complex, distributed ways. While not always done wrong, a lot of microservice-ification efforts are quite hubristic in this area, and suffer for it over the long term.
Wouldn't layers be hard to build into the server (since you'd also have to change the client) and slow if built as a separate layer (since it would be another separate service)?
I'm not sure what you are asking, but depending on their individual performance and security needs layers are usually either (a) libraries embedded into their clients, (b) services colocated with their clients, (c) services running in a separate tier, or (d) services co-located with fdbservers. In any of these cases they use the FoundationDB client to communicate with FoundationDB.
In case (c) or (d), how can a layer leverage the distributed facilities that FDB provides?
I mean, if I have clients that connect to a "layer service" which is the one that talks to FDB, I have to manage the layer service's scalability, fault tolerance, etc. by myself.
Yes, and that's the main advantage of choosing (a) or (b). But it's not quite as hard as it sounds; since all your state is safely in fdb you "just" have to worry about load balancing a stateless service.
Got it - what would you suggest for doing something like that? A simple RPC service with a good async framework, I've read.
Like what? An RPC service on top of Twisted for Python, and similar things in other languages?
Postgres operates great as a document store, btw. You don't really need Mongo at all. And if you need to distribute because you've outgrown what you can do on a single Postgres node, you don't want to use Mongo anyway.
If you've read any of the comments or have been following the project, it should be pretty obvious that this is far from a rookie effort.
This is a game changer, not a hobby project. This is the first distributed data store that offers enough safety, and a good enough understanding of CAP-theorem trade-offs, that it can be safely used as a primary data store.
Or perhaps it's not so incredible? Maybe it wasn't such a huge hit for Apple and didn't live up to expectations, so they figure they can give it away and earn some community goodwill.