
FoundationDB 6.0 released, featuring multi-region support - davelester
https://www.foundationdb.org/blog/foundationdb-6-0-15-released/
======
monstrado
Congrats to the team on the release. Using FoundationDB has been one of the
most rock solid NoSQL experiences I've ever had, and I've used a lot. After
having a few months to hammer my cluster with fairly low level atomic
operations, I can confidently say this thing holds up to pretty much anything
you want to throw at it. Coming from the land of HBase and DynamoDB, its
ability to automatically (and intelligently) repartition data based on write
throughput has been an administrative breakthrough for me.

Looking forward to additional use cases I can throw at this beast of a system.

Kudos to you guys!

~~~
th1nkdifferent
Could you add some details about your scale?

~~~
monstrado
Hey, I've written some comments regarding the use case in the past you can see
here:

[https://news.ycombinator.com/item?id=18305446](https://news.ycombinator.com/item?id=18305446)

------
ryanworl
To anyone who is on the fence about putting FoundationDB into production (or
at least evaluating it for their use cases), what is the number one thing you
think is missing or you're worried about?

i.e.

\- a SQL interface,

\- pre-packaged data structure libraries,

\- monitoring,

\- limitations of FoundationDB itself,

\- etc.

I'm working on a talk for the upcoming FoundationDB Summit and I'd love to
address some real-world questions or issues people have.

~~~
realreality
\- transaction size and duration limitations. I can almost understand the
limitation on large write transactions, but the same size limit applies to
read transactions. If you're doing a large range read, you may not know in
advance whether it will hit the 10MB limit and raise an exception.

\- the storage backends seem less impressive than the marketing leads you to
believe. The default memory backend is obviously too limited to use in
production, and the “ssd” backend turns out just to be built on top of the
Btree code from SQLite. Besides that, the documentation warns against using
the ssd backend on macOS. Isn’t that a bit strange, considering who owns
foundationdb??

\- while testing, I found that it was impossible to shrink a cluster. If you
add a second storage node just to test that the distributed stuff works
correctly, you can’t reduce it back to a single node without destroying the
entire database and starting over. If it’s possible to run everything on one
node, it should be possible to shrink a cluster back to a single node.

\- the storage backends have a crazy amount of write amplification (something
like 3x, according to the docs). The foundationdb folks should focus on
improving the underlying storage, for instance by building on lmdb or RocksDB
or something. For my toy app, I abstracted my data access to use either lmdb
(for local testing) or foundationdb (for production), but I ultimately ended
up just using lmdb because I didn’t want to deal with fdb’s limitations and
operational unknowns.

\- another weird fdb limitation: the best single-threaded latency you'll get
is supposedly around 1ms for small reads. The docs suggest you can achieve
much better performance by scaling the cluster and the number of clients. That
may be true, but some applications may want high single-threaded performance.
(Something like lmdb can achieve tens of thousands of reads per second.)

~~~
jared2501
On shrinking a cluster: you'll want to use fdbcli to "exclude" nodes. It
should be pretty straightforward (search the docs for the word "exclude").
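
For reference, the fdbcli flow (as I understand it from the docs; the address
here is made up) looks roughly like:

```
$ fdbcli
fdb> exclude 10.0.0.2:4500
fdb> status
```

`exclude` moves data off the listed process and returns once it's safe to take
it down; after that you can shut the process down.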

On write amplification: a factor of 3x is not actually that unusual. The
default RocksDB size amplification is 2x, and I've seen performant LSM trees
with about 3x write amplification.

On the single-threaded bottleneck: this is an inherent issue you have when you
put your database over a network connection. LMDB can do 10k/100k+ reads/sec
on a single thread since it's just doing syscalls. As soon as you need to
distribute your database across more than one machine, you need to parallelize
your work for high throughput.

~~~
ddorian43
ScyllaDB/Redis can also handle a lot of calls with a single thread/core.

~~~
ryanworl
FoundationDB single-core performance is fine. From my testing on the memory
engine (and the docs), you can expect 70k+ reads/second/core for small keys
and values. But crucially this means you must have _concurrency_ to drive
throughput.

No database can magically make your serial access pattern faster. Amdahl's law
and all that.

FoundationDB's latency for your specific workload is up to how good you are at
designing your algorithm for concurrency. If you do every step serially,
you'll be spending most of your time waiting for the network.
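
The arithmetic here is just Little's law: throughput is bounded by the number
of in-flight requests divided by per-request latency. A quick sketch (the 1ms
figure is the small-read latency mentioned above):

```python
# Back-of-the-envelope: with ~1 ms per read over the network, a serial
# client tops out near 1,000 reads/sec no matter how fast the server is.
# Throughput scales with requests kept in flight (Little's law).

latency_s = 0.001  # ~1 ms round trip

def max_throughput(concurrency, latency=latency_s):
    return concurrency / latency

serial = max_throughput(1)       # one request in flight at a time
pipelined = max_throughput(100)  # 100 outstanding futures
```

This is why the client APIs return futures: issuing many reads before waiting
on any of them is what buys you throughput.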

------
Rafuino
Shameless plug here, but if anyone wants to benchmark in-memory vs. NVMe NAND
SSD vs. NVMe Intel Optane DC SSD performance, we're looking for someone with
FoundationDB expertise to give it a shot and share their learnings with the
community. Make a request for a server by posting a new issue on our GitHub
page [1].

Basically, I'm curious to know how FDB's memory engine performs compared to
the SSD engine with a standard NAND SSD and an Intel Optane DC SSD. Something
along the lines of the throughput per core and latency results on the FDB
performance page [2].

[1]:
[https://github.com/AccelerateWithOptane/lab/issues](https://github.com/AccelerateWithOptane/lab/issues)
[2]:
[https://apple.github.io/foundationdb/performance.html](https://apple.github.io/foundationdb/performance.html)

Disclosure: I'm working at Intel and help manage our open source lab with our
friends at Packet.

~~~
whitepoplar
Only tangentially related, but are there any public Postgres benchmarks for
systems running high-CPU, high-memory, p4800x optane drives?

~~~
Rafuino
Would something like the TSBS [1] help with this? It's from TimescaleDB, but
it's built on Postgres. It has built-in high-CPU queries, but I haven't seen
high-memory before. Can you point me in the right direction? Otherwise, we've
had some Postgres people use the lab and are waiting on their decision whether
to share publicly.

[1]: [https://github.com/timescale/tsbs](https://github.com/timescale/tsbs)

------
throwaway5752
Anyone know how they're using it at Apple vs other distributed databases?

~~~
jared2501
A mate of mine is ex-Apple and said they use it for iCloud.

~~~
throwaway5752
Yes, but I understand they use Cassandra heavily
([https://www.techrepublic.com/article/apples-secret-nosql-sauce-includes-a-hefty-dose-of-cassandra/](https://www.techrepublic.com/article/apples-secret-nosql-sauce-includes-a-hefty-dose-of-cassandra/)),
and I was curious about why they use FoundationDB in some settings vs
Cassandra in others. I can imagine a few good technical reasons to use one or
the other depending on the context, but figured I'd ask in case anyone knew.

------
the_duke
So, FoundationDB is a pretty low level distributed key-value store with
transactions.

Most applications will need something higher level, like a SQL or document db
frontend which could be built on top.

I'm curious what people have started using FoundationDB for. Any interesting
stories to share?

~~~
ex3ndr
We moved to FDB for our messaging platform. We had several options:

a) Rewrite our SQL code. In our case we are using Node.js, and all the SQL
libraries are very, very slow. Even replacing one with another is enormous
work.

b) Rewrite in a new language. This was also an option, since querying Postgres
can take 1ms but parsing the response can easily take 100ms+. That trashed our
event loop and caused awful latency.

c) Rewrite for a high-performance NoSQL database.

We picked the last one. In the context of Node.js, we were able to write a
really thin layer on top of FDB that works super fast and in exactly the way
we needed.

In my previous startup we eventually ditched all SQL from our codebase too,
since SQL databases are just too slow for low-latency messaging apps. There is
no simple way to shard data, and there are always random locks around your
database (which block connections); locks can be really hard to debug. And how
do you scale a single SQL server? All of this is doable, but in FDB it was
basically free.

We migrated to FDB and got an almost 100x improvement in latency/performance.
And unlike our SQL code, which was very carefully crafted, we can do nasty
things, like "hey, let's just poll this key every 100ms and check for a new
value" or "hey, let's do it on tens of instances at the same time". In these
situations Postgres started to consume all available CPU; you can easily
overwhelm SQL with a single instance of your app. We haven't managed to do
that to FDB, at half the cost. We are often in a situation where someone
commits something with a bug and, for example, starts pulling data every
millisecond in N^2 streams, where N is the number of online users. In these
situations we can't see any impact on our platform at all, just spikes in
monitoring.

FDB is a wonderful thing: it allows you to forget about optimizing the
performance of your queries and about managing backups and replication. It
just works!

~~~
tshannon
100ms+!

Ouch. Is that just large amounts of data? Are you guys using pg.native?

~~~
ex3ndr
No, this is just, say, a list of messages, not that much data. We tried moving
to pg-native, but it didn't help. The problem was in Sequelize. But in my
internal tests, even Sequelize was the fastest library on the market.

~~~
th1nkdifferent
Looks like you're blaming Postgres for something that sounds like Sequelize's
fault. You should try prototyping parts of your application in a language that
is better supported. The last time I used Sequelize, I was disappointed at how
poorly it fared compared to other libraries like the Django ORM or SQLAlchemy.

------
ex3ndr
If anyone wants to chat about FoundationDB or ask about our (albeit limited)
experience building messaging on top of FDB, please feel free to join our
small room:
[https://app.openland.com/joinChannel/updnSlD](https://app.openland.com/joinChannel/updnSlD)

------
romed
The upgrade instructions are firmly in the lolwut category. The database is
just not available during the transition. I think they should focus on fixing
their protocol such that multiple versions can run at the same time, and
rollbacks are possible.

------
discoball
How does FDB compare to Spanner as far as the consistency model and trade-offs
go?

~~~
ryanworl
FoundationDB and Spanner both offer external consistency. Spanner does this
through synchronized clocks. FoundationDB has a similar clock called
TimeKeeper, which is not a clock per se but a counter which advances
approximately 1M times per second. Transactions are ordered based on this
timestamp.

~~~
discoball
With a Lamport clock (a counter, i.e. a logical clock) you could end up with
the following, due to the dependence on conflict resolution (optimistic MVCC)
rather than wall time (copied from another HN comment):

“it is possible for a transaction C to be aborted because it conflicts with
another transaction B, but transaction B is also aborted because it conflicts
with A (on another resolver), so C "could have" been committed. When Alec
Grieser was an intern at FoundationDB he did some simulations showing that in
horrible worst cases this inaccuracy could significantly hurt performance. But
in practice I don't think there have been a lot of complaints about it.”

~~~
ryanworl
Yes, I think it is fairly well known that in the optimistic family of
concurrency control algorithms you can get into situations where aborts are
not necessary.

[https://db.cs.cmu.edu/papers/2016/yu-sigmod2016.pdf](https://db.cs.cmu.edu/papers/2016/yu-sigmod2016.pdf)

Section 3.4 of this paper covers another example of this.
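
The scenario in the quote can be modeled in a few lines. This is a toy model,
not FoundationDB's actual resolver code, and all names and structures are
invented: each resolver owns a key shard, checks a transaction's read set
against its record of last writes, and optimistically records the writes of
any transaction it approves, even if another resolver rejects it. That last
step is what produces the false conflict:

```python
# Toy model of distributed optimistic conflict resolution, illustrating
# the "false abort" from the quote: C is aborted because of B's write,
# even though B itself was aborted by the other resolver.

class Resolver:
    def __init__(self, keys):
        self.keys = keys      # the key shard this resolver is responsible for
        self.last_write = {}  # key -> commit version of the last approved write

    def check(self, txn):
        # Reject if any key this txn read was written after its read version.
        for k in txn["reads"]:
            if k in self.keys and self.last_write.get(k, -1) > txn["read_version"]:
                return False
        # Optimistically record this txn's writes, even though another
        # resolver may still reject it; this is the source of false conflicts.
        for k in txn["writes"]:
            if k in self.keys:
                self.last_write[k] = txn["commit_version"]
        return True

r1, r2 = Resolver({"x"}), Resolver({"y"})

A = {"reads": [],    "writes": ["x"], "read_version": 0, "commit_version": 1}
B = {"reads": ["x"], "writes": ["y"], "read_version": 0, "commit_version": 2}
C = {"reads": ["y"], "writes": [],    "read_version": 0, "commit_version": 3}

results = {}
for name, txn in [("A", A), ("B", B), ("C", C)]:
    votes = [r.check(txn) for r in (r1, r2)]  # each resolver votes independently
    results[name] = all(votes)                # commit only if all approve

# A commits; B aborts (it read x, which A wrote); C aborts because r2
# recorded B's write to y, even though B never actually committed.
```

In the real system the commit versions would come from the sequencer's counter
discussed above; this toy just assigns them by hand.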

------
thefounder
Is there any high-level API/lib like we have for Google Datastore?

