
FoundationDB Record Layer - davelester
https://www.foundationdb.org/blog/announcing-record-layer/
======
wwilson
This is very cool!

FoundationDB excites a lot of people because it's an extremely scalable and
extremely reliable distributed database that supports ACID transactions, and
which is both open-source and has Apple standing behind it. And yeah, all of
that is pretty nice.

But arguably the real power comes from the fact that it exposes a relatively
low-level data model that can then be wrapped in one or more stateless
"layers". All of these layers write to the same storage substrate, so you can
have your document database, your SQL database, your time-series database,
your consensus/coordination store, your distributed task queue, etc., etc.,
but you're only actually operating one stateful system. Your SREs will thank
you.
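Here is a minimal Python sketch of the "one stateful system, many stateless layers" idea. It assumes a toy in-memory store in place of FoundationDB. The `SharedStore` and `Layer` classes and the `doc/` and `queue/` prefixes are hypothetical. The real FDB bindings ship their own subspace, tuple-encoding, and directory helpers for dividing up the keyspace like this.

```python
class SharedStore:
    """Toy stand-in for the single ordered key-value store that every layer shares."""

    def __init__(self):
        self.data = {}  # bytes key -> bytes value

    def scan(self, prefix: bytes):
        # An ordered store would serve this as one range read; sorting is fine for a toy.
        return sorted((k, v) for k, v in self.data.items() if k.startswith(prefix))


class Layer:
    """A stateless layer: it owns no storage, only a key prefix in the shared store."""

    def __init__(self, store: SharedStore, prefix: bytes):
        self.store = store
        self.prefix = prefix

    def put(self, key: bytes, value: bytes) -> None:
        self.store.data[self.prefix + key] = value

    def get(self, key: bytes):
        return self.store.data.get(self.prefix + key)

    def items(self):
        n = len(self.prefix)
        return [(k[n:], v) for k, v in self.store.scan(self.prefix)]


store = SharedStore()
documents = Layer(store, b"doc/")      # hypothetical document layer
task_queue = Layer(store, b"queue/")   # hypothetical task-queue layer
documents.put(b"user:1", b'{"name": "bob"}')
task_queue.put(b"0001", b"send-welcome-email")
```

In FoundationDB itself, a single transaction can span both prefixes. That means one layer's writes can be made atomic with another's.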

Writing these layers to be scalable and high-performance can be challenging,
but it looks like Apple is actively doing it and willing to release the
results to the rest of us. This also suggests that their previous
open-sourcing of the MongoDB-compatible document layer wasn't a one-off fluke. All
of this is very good news for everybody who needs to run databases in the real
world.

Full disclosure: I worked on FoundationDB a long, long time ago.

~~~
wwilson
Wow, from the post:

"Together, the Record Layer and FoundationDB form the backbone of Apple's
CloudKit. We wrote a paper describing how we built the Record Layer to run at
massive scale and how CloudKit uses it."

I think this is the first time that little detail has been publicly disclosed.

~~~
Scriptor
Isn't the code on GitHub?

------
abalone
Apple low-key does some cool server projects with a Java bent. They've
contributed to Netty (well, they hired core developers).[1]

They've basically put them to work reimplementing it in Swift.[2] It's open
and out there, but not a lot of people are paying attention. While it's still
early days, I think there may come a year when, suddenly, Swift on the server
is a super serious thing and all this work they've been doing on little old
CloudKit kind of takes over the world.

Just a fun prediction, but it wouldn't be the first time Apple pulled
something like that.

I do like that Swift's non-tracing garbage collection model (automatic
reference counting) is well suited for server apps. Rust is cool too, but
maybe Swift would be a little friendlier and thus better suited to inherit
Java's mantle. I mean, can you just imagine if Apple is slowly building up
Swift to overtake Java on the server? That that's one of their long-game
master plans? I know that sounds completely crazy, but it just might work.
They do run one of the biggest data center networks in the world, so they
have a pretty good testbed and can justify a hefty R&D budget.

[1] [https://www.infoq.com/presentations/apple-netty](https://www.infoq.com/presentations/apple-netty)

[2] [https://github.com/apple/swift-nio](https://github.com/apple/swift-nio)

~~~
ascagnel_
Apple is probably hoping to run Swift on _their_ servers. I don't foresee them
putting the effort into enterprise sales and service needed to make Swift
overtake Java, though -- it hasn't really been their MO in the past.

~~~
abalone
They wouldn’t need to. They already partner with IBM for that stuff.

------
spullara
I had built a layer like this one for my startup Bagcheck called Havrobase[1]
(it was on top of HBase/Solr, here is the motivating blog post[2]) that
ultimately I put on top of MySQL/Solr and other stores. Later, when we started
Wavefront, I ported that layer to FDB and that still powers their metadata.
Really a good fit and very much like this record layer. I highly recommend
this approach for 24/7 services, as you never need to have maintenance windows
for schema upgrades and the like.

[1] [https://github.com/spullara/havrobase](https://github.com/spullara/havrobase)
[2] [https://javarants.com/havrobase-a-searchable-evolvable-entit...](https://javarants.com/havrobase-a-searchable-evolvable-entity-store-on-top-of-hbase-and-solr-d305f90a3eaa)

Initially at Wavefront we were using HBase for telemetry, Zookeeper for the
service mesh and MySQL Cluster for entity metadata. All that was moved on top
of FDB with 3 different layers that we developed.

I'm excited that this kind of database is now going to be available more
broadly and with the confidence that CloudKit is using the same technology
since to date implementing something like this was basically a DIY project.

~~~
thejerz
What were the pros and cons of using FDB over HBase?

~~~
spullara
Several things caused us to move off of HBase:

1) Operationally, HBase is a nightmare, whereas FDB is extremely easy to
operate.
2) HBase doesn't support transactions across rows, either natively or
efficiently with extensions.
3) GC makes HBase performance unpredictable, whereas FDB is written in C++.
4) HBase depends on Zookeeper, which is operationally painful to support, and
we were replacing Zookeeper with FDB too.

I don't think I will ever again use anything from the Hadoop ecosystem if I
can get away with it.

------
manigandham
Glad Apple is releasing all of this, I wonder what kickstarted it all?

The paper is rather interesting: [https://www.foundationdb.org/files/record-layer-paper.pdf](https://www.foundationdb.org/files/record-layer-paper.pdf)

~~~
mastox
recruitment?

~~~
gshack
Surely not, Apple doesn't have an issue with that.

~~~
ubershmekel
I'm sure they're good, but better is better.

------
lima
This might be the first good alternative to etcd for configuration stores that
need real-time updates.

Like Kubernetes.

Many Kubernetes scaling issues are etcd-related.

RethinkDB is dead-ish, and CockroachDB is treating their changefeeds as an
enterprise feature that requires a Kafka instance to stream to :(

~~~
PhilippGille
Is TiKV an alternative?

Short overview and maybe good to know it's becoming part of the CNCF:
https://www.cncf.io/blog/2018/08/28/cncf-to-host-tikv-in-the-sandbox/

Haven't worked with it myself yet, but maybe others can share their
experience?

There have also been some HN threads in the past, about TiDB at least.

~~~
c4pt0r
TiDB developer here. Yes, I think TiKV is an alternative to FDB. Compared to
the FDB Record Layer, TiKV aims to provide lower-level primitives, just
Get/Set/Transaction in the key-value layer, so users can build customized
distributed systems around it. The main differences between TiKV/TiDB and FDB
are:

1\. TiKV uses a Multi-Raft architecture; I think Raft provides better HA.

2\. TiKV's transaction model is inspired by Google Percolator: it's a
classical optimistic 2PC transaction model with MVCC support. I'm not an
expert on FDB, but I think different transaction models fit different
application scenarios. TiKV's transaction model is good when your workload is
mainly small transactions with a low conflict rate.
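Here is a generic Python sketch of optimistic concurrency control over a multi-versioned store, for readers unfamiliar with the approach. Each transaction reads from a snapshot version and buffers its writes. At commit, it is rejected if anything it read was overwritten in the meantime. This is a simplification. It is not Percolator's protocol, which coordinates commits with locks and a timestamp oracle, and it is not TiKV's or FDB's code. The class names are hypothetical.

The sketch shows why this model favors small transactions with a low conflict rate. The more keys a transaction reads, and the busier those keys are, the more likely its commit is to fail and need a retry.

```python
class Conflict(Exception):
    """Raised when a transaction's snapshot is stale at commit time."""


class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_version, value), oldest first
        self.clock = 0      # version of the most recent commit

    def read_at(self, key, version):
        """Return the newest value of key committed at or before `version`."""
        for v, value in reversed(self.versions.get(key, [])):
            if v <= version:
                return value
        return None

    def last_commit(self, key):
        history = self.versions.get(key)
        return history[-1][0] if history else 0

    def begin(self):
        return Txn(self, read_version=self.clock)


class Txn:
    def __init__(self, store, read_version):
        self.store = store
        self.read_version = read_version
        self.reads = set()
        self.writes = {}  # buffered until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        self.reads.add(key)
        return self.store.read_at(key, self.read_version)

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Optimistic validation: no locks were taken while the transaction ran,
        # so check now whether anything it read changed after its snapshot.
        for key in self.reads:
            if self.store.last_commit(key) > self.read_version:
                raise Conflict(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
        return self.store.clock


store = MVCCStore()
setup = store.begin()
setup.set("x", 1)
setup.commit()

a, b = store.begin(), store.begin()  # both read from snapshot version 1
a.set("x", a.get("x") + 1)
b.set("x", b.get("x") + 10)
a.commit()                           # succeeds: x becomes 2 at version 2
try:
    b.commit()                       # fails: b's read of x is now stale
except Conflict:
    pass                             # a real client would retry on a new snapshot
```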

3\. TiDB is a full-featured SQL layer on top of TiKV that aims to provide a
MySQL-compatible solution. Most TiDB users have migrated from MySQL, so the
focus of TiDB is compatibility with these legacy MySQL-based applications. For
example: how to read the MySQL binlog and replay it on TiDB in real time so
that TiDB becomes an active MySQL replica, or how to support complex SQL
queries like distributed joins or GROUP BY. Building a full-featured SQL
optimizer is a huge project.

There are some case studies:

[https://pingcap.com/success-stories/](https://pingcap.com/success-stories/)

[https://pingcap.com/success-stories/tidb-in-meituan-dianping...](https://pingcap.com/success-stories/tidb-in-meituan-dianping/)

There are some quick-start documents you can start with:

[https://pingcap.com/docs/op-guide/docker-compose/](https://pingcap.com/docs/op-guide/docker-compose/)

[https://pingcap.com/docs/v2.0/op-guide/migration/#migrate-da...](https://pingcap.com/docs/v2.0/op-guide/migration/#migrate-data-from-mysql-to-tidb)

------
ryanworl
Congrats to the team at Apple for getting this released! They have had a busy
few months with getting the document layer released, the FDB Summit, and now
the record layer.

------
jwr
Very interesting. I've been looking closely at FoundationDB as a way forward
(to replace RethinkDB and Cassandra in existing systems). It's one of the few
contenders for a really interesting take on a distributed database.

I am not sure if I will use the record layer (I've been planning to write "my
layer" myself), but it will definitely be an interesting thing to look at.

~~~
azimmerlin
Fellow RethinkDB user here. I’ve been looking at Cassandra and FoundationDB as
replacements. I’m genuinely curious— what didn’t you like about Cassandra?

~~~
jwr
Cassandra

To be honest, I don't like _anything_ about Cassandra. Beginning with the
naming: back when I was trying to learn about Cassandra, I couldn't get past
the obscure and bizarre naming (super-columns?). When I dealt with systems
using it, I never quite understood how you can keep saying that "the later
timestamp wins" and speak of consistency with a straight face: in a
distributed system, there is no such thing as a "later timestamp". Or speak of
transactions which aren't really transactions at all.
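Here is a toy Python illustration of the "later timestamp wins" problem. It assumes a generic last-write-wins register, not Cassandra's actual implementation, and uses made-up clock values. When one node's clock runs fast, a write that really happened first can carry the larger timestamp. It then silently beats a newer write.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    timestamp: float  # wall-clock time as reported by the node that took the write


def lww_merge(a: Write, b: Write) -> Write:
    """Last-write-wins: keep whichever write carries the larger timestamp."""
    return a if a.timestamp >= b.timestamp else b


SKEW_A = 5.0  # node A's clock runs 5 seconds fast; node B's clock is correct

# Real time 100.0: a client writes "v1" through node A, which stamps it 105.0.
first = Write("v1", 100.0 + SKEW_A)
# Real time 102.0: the same client, having already seen v1, writes "v2" via node B.
second = Write("v2", 102.0)

winner = lww_merge(first, second)  # keeps "v1": the newer write is silently dropped
```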

Then I read the Jepsen reports about Cassandra. Yes, Cassandra has made
progress since then, but still.

I think of Cassandra as an outdated piece of technology at this point: we can
(and do) build better distributed databases today, with better consistency
guarantees, and proper transactions in the case of FoundationDB. Cassandra was
designed for a specific use case and then outgrew its initial design, because
there was nothing else at the time. But I see no reason to stick with it any
longer.

Even now, when you need massive multi-region scalability, there is little to
choose from — if you want it to be open-source, there's pretty much only
FoundationDB left.

~~~
aseipp
FoundationDB does not support true geo-replicated multi-region distribution
the way Cassandra, Spanner, Cockroach, etc do, at least not without paying
huge latency/round trip costs. If you want to avoid that, the best you can
have is a separate failover region, and, with FoundationDB 6, you can get
closer-to-LAN latencies for failover deployments to separate regions (but only
one region) while retaining ACID semantics. You could build truly global
geo-distribution on top of it but that would have to be its own layer that
implements 2PC/Paxos or something between regions. Ultimately you have to pay
the toll somewhere in a truly consistent system like that if you want global
availability (unless you're Spanner and have incredible hardware engineering
that can be deployed across the globe).

Cassandra/Scylla are the only open source key value stores that scale
linearly by simply adding nodes, even in huge, geo-distributed settings, as
far as I know, but they are ultimately AP systems. And Scylla just has absurd
performance compared to Cassandra or FoundationDB. You just have to know what
you're getting into. (But yes, ACID transactions are a good model for
developers, and FDB's linearizable transactions and high scalability truly
make it an obvious choice for many CP systems, if you ask me.)

~~~
jwr
That is an excellent summary. There is no silver bullet and you can't have
your cake and eat it, too. The approach to multi-region that FoundationDB 6
takes suits my needs (I'm not Google) and I like the compromises they made.

Since most of what I do (or consult with) does not need massive performance,
I'd rather pick databases with compromises favoring consistency and
correctness. This is why I like what I see in FoundationDB so far.

------
bcx
I learned that basically all iMessages and contacts are stored on
FoundationDB; it's pretty great this is making it into open source. Thanks
Apple!

~~~
ryanworl
Are you using FDB at Olark?

(Saw it in your profile)

~~~
bcx
No :) Just off the shelf DBs so far. But the FoundationDB guys are HS friends
;).

------
georgewfraser
Seeing this soon after the AWS “wire compatible with Mongo” kerfuffle, it
makes me think: it would be amazing if the cloud vendors would offer a managed
FDB service. An open-source, cloud-agnostic, horizontally scalable,
document-oriented transactional database would be an incredible tool. I know AWS is
going in the opposite direction these days with proprietary “wire compatible”
services but a guy can dream...

~~~
mcintyre1994
It superficially sounds a bit like Azure's Cosmos DB: they say it'll scale
horizontally as much as you need, it's document-oriented, and it has ACID
transactions, with SQL, Mongo and graph APIs. It obviously badly lacks the
open-source and cloud-agnostic parts, though. I wonder if there's a world
where Microsoft and Apple could work together to standardise something
cloud-agnostic based on the best of both.

------
devj
A few questions:

1\. Any reason to write it in Java instead of C, C++, Rust, etc?

2\. Any reason to use Protobuf instead of Flatbuffers, Avro, etc?

3\. Can FoundationDB be used with Apache Arrow?

~~~
all0c
The Record Layer is written in Java as it was designed to fit in with an
existing stack that was already primarily Java-based. You can read more about
how CloudKit uses the Record Layer in the preprint of the Record Layer paper:
[https://www.foundationdb.org/files/record-layer-paper.pdf](https://www.foundationdb.org/files/record-layer-paper.pdf)

Excellent question regarding the choice to use Protocol Buffers. Firstly, as
mentioned in the paper released last year, CloudKit uses Protocol Buffers for
client-server intercommunication. As a result, there was already expertise
around protobuf, which is a good tie breaker when evaluating alternatives.
(Here's that paper, by the way:
[http://www.vldb.org/pvldb/vol11/p540-shraer.pdf](http://www.vldb.org/pvldb/vol11/p540-shraer.pdf))
Secondly, the Record Layer makes heavy use of two Protocol Buffer features:
descriptors, which specify the field names and types within protobuf
schemata, and dynamic messages. Descriptors are used internally within the
Record Layer to do things like schema validation. (For example, if an index is
defined on a specific field, the descriptor can be checked to validate that
that field exists in the given record type.) Likewise, dynamic messages make
it possible for applications using the Record Layer to load their schema at
run time by reading it from storage. The FDBMetaDataStore allows the user to
do exactly that (while storing the schema persistently in FoundationDB):
[https://static.javadoc.io/org.foundationdb/fdb-record-layer-...](https://static.javadoc.io/org.foundationdb/fdb-record-layer-core/2.5.37.0/com/apple/foundationdb/record/provider/foundationdb/FDBMetaDataStore.html)
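Here is a minimal sketch of the kind of check described above. It uses hypothetical `RecordTypeDescriptor` and `IndexDef` classes as stand-ins for protobuf message descriptors and Record Layer index definitions. It does not use either library's real API. The idea is only that an index definition can be rejected up front if the record type or field it names does not appear in the schema.

```python
from dataclasses import dataclass, field


@dataclass
class RecordTypeDescriptor:
    """Hypothetical stand-in for a protobuf message descriptor."""
    name: str
    fields: dict = field(default_factory=dict)  # field name -> type name


@dataclass
class IndexDef:
    """Hypothetical index definition: one indexed field on one record type."""
    name: str
    record_type: str
    field_name: str


def validate_index(index: IndexDef, schema: dict) -> None:
    """Raise ValueError if the index names a record type or field the schema lacks."""
    record_type = schema.get(index.record_type)
    if record_type is None:
        raise ValueError(f"{index.name}: unknown record type {index.record_type!r}")
    if index.field_name not in record_type.fields:
        raise ValueError(
            f"{index.name}: {index.record_type} has no field {index.field_name!r}"
        )


schema = {
    "Order": RecordTypeDescriptor("Order", {"order_id": "int64", "item_id": "int64"}),
}
validate_index(IndexDef("order_by_item", "Order", "item_id"), schema)  # passes silently
```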

No, the Record Layer's data format is not compatible with the Apache Arrow
specification.

~~~
devj
Thanks for your reply. Would be really helpful if you can share the following:

1\. Size of the CloudKit cluster and the number of RecordLayer instances. A
ratio would also be enough to get an approx. idea.

2\. How are metadata changes involving a field's data type handled?

3\. How are relationships and therefore, foreign keys handled? Are any
referential actions like cascading deletes supported?

~~~
all0c
The Record Layer doesn't currently support foreign key constraints, so foreign
keys are more of a “design pattern” than a first-class feature. For example,
in a sample schema in the repository, an “Order” message has a field called
“item_id” that points to the primary key of an “Item” message:
[https://github.com/FoundationDB/fdb-record-layer/blob/792c95...](https://github.com/FoundationDB/fdb-record-layer/blob/792c952a2e460ff00eead9900a289a9055cb9d6a/examples/src/main/proto/sample.proto#L60)
There isn't an automatic check to make sure the item exists, though, nor are
there cascading deletes. That being said, I don't think the architecture is
incompatible with that feature, so it would be a reasonable feature request.
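Since the Record Layer leaves this to the application, here is a hypothetical Python sketch of what application code could do. It refuses to save an Order whose Item doesn't exist, and it cascades Item deletes through an index of orders by item. Plain dicts stand in for record stores. `ToyRecordStore` is an illustrative name, not a Record Layer class. In FoundationDB, each method would need to run inside a single transaction so the check and the write can't be interleaved with a concurrent delete.

```python
class ToyRecordStore:
    """Hypothetical in-memory stand-in; in FDB each method would be one transaction."""

    def __init__(self):
        self.items = {}           # item_id -> Item record
        self.orders = {}          # order_id -> Order record
        self.orders_by_item = {}  # item_id -> set of order_ids (secondary index)

    def save_order(self, order_id, item_id):
        # Application-level "foreign key" check: the referenced Item must exist.
        if item_id not in self.items:
            raise KeyError(f"Item {item_id} does not exist")
        old = self.orders.get(order_id)
        if old is not None:
            # Re-pointing an existing order: remove its old index entry first.
            self.orders_by_item[old["item_id"]].discard(order_id)
        self.orders[order_id] = {"order_id": order_id, "item_id": item_id}
        self.orders_by_item.setdefault(item_id, set()).add(order_id)

    def delete_item_cascade(self, item_id):
        # Use the index to find and delete every Order referencing this Item.
        for order_id in self.orders_by_item.pop(item_id, set()):
            del self.orders[order_id]
        del self.items[item_id]


store = ToyRecordStore()
store.items[7] = {"item_id": 7, "name": "widget"}
store.save_order(1, 7)
store.save_order(2, 7)
store.delete_item_cascade(7)  # removes item 7 and orders 1 and 2
```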

There are some guidelines regarding field type changes in the schema evolution
guide: [https://foundationdb.github.io/fdb-record-layer/SchemaEvolut...](https://foundationdb.github.io/fdb-record-layer/SchemaEvolution.html#change-the-type-of-a-field-in-a-record-type)
Most data type changes are incompatible with either Protobuf's serialization
format or the FDB Tuple layer's serialization format (which the Record Layer
uses for storing secondary indexes and primary keys). The general advice for
type changes (if there is existing data in your record stores) would instead
be to introduce a new field of the new type and deprecate the old one.

------
mbesto
Has anyone ever used FoundationDB and _not_ found it successful? All I read is
"it supports RDBMS + NoSQL and can be distributed". So what use cases
_doesn't_ it solve?

~~~
ryanworl
The best way I can describe FoundationDB is it is like a file system. You can
do just about whatever you’d like with files and a file system, in theory. You
can implement just about any data model you can dream up in FDB.

But the current storage engine is not as well optimized as it could be.

It _does_ have scalability limits, although they’re not relevant for 99.9% of
use cases.

Upgrading a cluster to a new non-patch version will require a small (seconds)
amount of downtime. A mitigating factor there is upgrading _your client_
doesn’t have that limit, which is where all the interesting stuff is.

The minimum latency for a transaction is relatively high compared to systems
which acknowledge writes before syncing to disk or only after syncing to a
single disk.

I wouldn't say it doesn’t solve “use cases”. Rather, if you can live within
the limitations (which means you need to know what they are), you can reduce
the complexity and cost of designing a system for your use case by a lot.

Check out my talk from the FDB Summit for an example:
[https://youtu.be/SKcF3HPnYqg](https://youtu.be/SKcF3HPnYqg)

~~~
mbesto
Super helpful, thanks for the info.

------
pier25
So why would Apple be doing this now? Maybe preparing the terrain to enter the
cloud space and compete with Azure and AWS in a couple of years?

After all, it's no mystery Apple wants to expand their services revenue. Their
hardware revenue isn't growing as much as it used to.

------
mathnode
Does anyone know if FoundationDB is gaining ground over Cassandra at Apple?

~~~
nemothekid
I recall a couple years ago that it was rumored that Apple had bought FDB with
the intention of replacing Cassandra (and I think, at the time, Apple had the
largest Cassandra cluster ever known).

Combined with other statements in this thread, I think that may be true. I
remember reading once that iMessage used to be served by Cassandra, but now
it's served by FDB.

This is all speculation though.

~~~
seidoger
The FDB Record Layer white paper [0], section 8.1, does open with:

> 8.1 New CloudKit Capabilities

> CloudKit was initially implemented using Cassandra as the underlying storage
> engine.

So it seems this is what happened, for CloudKit at least.

[0] [https://www.foundationdb.org/files/record-layer-paper.pdf](https://www.foundationdb.org/files/record-layer-paper.pdf)

------
nschiefer
The preprint of the paper is now up on arXiv.org:
[https://arxiv.org/abs/1901.04452](https://arxiv.org/abs/1901.04452)

------
continuations
Does that mean FDB now supports secondary indexes?

If that's the case, how does FDB compare to ScyllaDB now that they both have
secondary indexes?

~~~
ryanworl
FDB does not automatically index your data, but you can write a layer (like
this one) to index your data.

In a transaction, you write a key like “users/1” with a value of “bob” and
then write another key like “users/bob/1” with no value. Then you can do a
range scan over the prefix “users/bob/” and find all the primary keys. After
that you do individual gets on those primary keys to retrieve the full
records if needed.
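The pattern above can be sketched in Python. The sketch assumes a toy in-memory ordered key-value store (`ToyOrderedKV`, hypothetical, not the FDB API) and uses the same key layout as the comment. `save_user` writes the record and its index entry together, which in FDB would be one transaction. It also clears the stale index entry when a name changes. `users_named` does the prefix scan followed by point gets.

```python
from bisect import bisect_left, insort


class ToyOrderedKV:
    """In-memory ordered key-value store standing in for FDB (not its API)."""

    def __init__(self):
        self.keys = []  # kept sorted so a prefix's keys are contiguous
        self.data = {}

    def set(self, key, value):
        if key not in self.data:
            insort(self.keys, key)
        self.data[key] = value

    def clear(self, key):
        if key in self.data:
            del self.data[key]
            self.keys.pop(bisect_left(self.keys, key))

    def get(self, key):
        return self.data.get(key)

    def scan_prefix(self, prefix):
        """Yield keys starting with prefix, in order (a range read in FDB)."""
        i = bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            yield self.keys[i]
            i += 1


def save_user(kv, user_id, name):
    """Write the record and its index entry together (one transaction in FDB)."""
    old = kv.get(f"users/{user_id}")
    if old is not None:
        kv.clear(f"users/{old}/{user_id}")  # drop the stale index entry
    kv.set(f"users/{user_id}", name)
    kv.set(f"users/{name}/{user_id}", "")


def users_named(kv, name):
    """Range-scan the index prefix, then fetch each primary record."""
    prefix = f"users/{name}/"
    ids = [key[len(prefix):] for key in kv.scan_prefix(prefix)]
    return {uid: kv.get(f"users/{uid}") for uid in ids}


kv = ToyOrderedKV()
save_user(kv, "1", "bob")
save_user(kv, "2", "alice")
save_user(kv, "3", "bob")
bobs = users_named(kv, "bob")  # {"1": "bob", "3": "bob"}
```

A real layer would usually keep index entries under their own prefix, separate from the records, so a scan over records never runs into index keys.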

The difference between the two is that FDB “secondary indexes” are just like
anything else in FDB. Namely, you update them in transactions and they are
consistent immediately. Scylla does not AFAIK have this feature.

~~~
misframer
I wrote a blog post on how to implement secondary indexes using an ordered
key-value store a couple of years ago: [https://misfra.me/2017/01/18/how-to-implement-secondary-inde...](https://misfra.me/2017/01/18/how-to-implement-secondary-indexes/)

It would work with FoundationDB, RocksDB, etc. I actually learned these
techniques when I interned at FoundationDB but have used them the most with
other K-V systems.

------
gigatexal
This is super exciting. Can’t wait to have some time this weekend to play with
it.

------
Artemis2
This is powering CloudKit. Very cool!

------
akavel
Can someone please give me an ELI5/executive summary of the benefits of
FoundationDB? Assume I know the basics of PostgreSQL and ElasticSearch. I
see some hype around it, but I can't understand what the breakthrough is. As
a helping question: can you maybe tell me who its expected users are, vs.
PSQL and ES? Or, when should I choose it over them? Also, what are its
disadvantages? (I suspect more complexity, and higher cost/worse efficiency
at small scale?) TIA!

~~~
dominotw
It's a distributed ACID k/v layer that other models can be built on top of.

So you could build something like PostgreSQL or ElasticSearch on top of
FoundationDB.

