
FoundationDB Summit Program Announced - davelester
https://www.foundationdb.org/blog/foundationdb-summit-program-announced/
======
ryanworl
I’m very excited to hear about the new storage engine and Apple’s record
layer! The lightning talk about backwards compatibility for rolling
upgrades should also be a great addition. Those three would make
FoundationDB a much more obvious fit for the average application.

My talk is at 10:40! If anyone is attending and would be interested in meeting
up, my email is in my profile and my Twitter handle is the same as my HN
username.

------
innagadadavida
Apple internally uses Cassandra, HBase, Riak, Hadoop/Impala, Oracle, the Siri
Search KV store, memcache, Redis, MySQL, Postgres, etc. Each of these
handles >100 TB in aggregate.

Considering these applications won’t be ported to FDB, why not develop a
translation layer? This would also drive adoption of FDB.

Having smart people work on cool things is not sufficient, you also need them
to be working on solving high impact but boring problems.

~~~
asien
>Considering these applications won’t be ported to FDB, why not develop a
translation layer? This would also drive adoption of FDB.

Writing a translation layer would be nice, though a full "drop-in" replacement
may be overkill just to drive growth.

That said the three biggest factors for adoption in my opinion are developer
experience, tooling and hosting.

If some FoundationDB enthusiasts made an elastic hosting service and some
dedicated tooling, it would help massively in competing with other NoSQL
vendors.

~~~
solarengineer
What dedicated tooling would you like to see? What would an elastic hosting
experience feel like to you?

------
monstrado
I've been having enormous success using FDB for my POC. Its ability to do
atomic mutations is honestly game-changing for our use case. Mandatory
transactions are also a lifesaver, as our previous implementation required
careful OCC (optimistic concurrency control).

~~~
dominotw
Curious what your use case is and what makes FoundationDB particularly suited
for it.

~~~
monstrado
Real-time aggregations over a stream of data where we may have multiple
servers writing a partial aggregation to the same row. With FDB I can safely
read data, merge it with my in-memory copy, and then write the final result
back. That's only for our complex aggregations, such as HyperLogLog and
T-Digests. For the easier things like COUNT I can just use the ADD mutation.
For SUM of doubles, I can use APPEND_IF_FITS to keep each partial aggregation
as a "running log" of partial sums in a single row.
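The ADD mutation described above is worth a sketch. This is a toy pure-Python model of its semantics (a plain dict stands in for the keyspace; the real FDB Python bindings expose this as `tr.add(key, param)`): the operand is a little-endian integer, and the addition happens server-side, so the client never reads the current value.

```python
import struct

# Toy model of FoundationDB's ADD atomic mutation (hypothetical in-memory
# "keyspace"). FDB encodes the operand as a little-endian integer and applies
# the addition on the server, so no client-side read is needed.
def fdb_add(store: dict, key: bytes, operand: int) -> None:
    """Apply an ADD mutation: little-endian integer addition on the value."""
    current = struct.unpack('<q', store.get(key, struct.pack('<q', 0)))[0]
    store[key] = struct.pack('<q', current + operand)

store = {}
fdb_add(store, b'counter/pageviews', 1)
fdb_add(store, b'counter/pageviews', 41)
assert struct.unpack('<q', store[b'counter/pageviews'])[0] == 42
```

Because the mutation is commutative, many writers can hit the same counter key without transaction conflicts.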

~~~
mping
That's pretty cool! You implemented HLL and TDigest on top of FDB? Or are you
storing some kind of blob and computing server-side? I did something similar
for Hadoop and MonetDB a long time ago

~~~
monstrado
Both the HLL (Algebird) and TDigest implementations we're using have a simple
way to serialize a compressed representation. So basically just reading the
row, merging the value currently stored, and writing the merged value back.

Depending on how many times you will write to the row, you could avoid having
to do a merge on write by using APPEND_IF_FITS and just merging the byte
arrays when you read.

It's nice that FDB gives you so much low level flexibility, you can do
whatever you feel fits your use case.
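A sketch of the "running log" pattern just described, under toy assumptions: each writer appends a serialized partial aggregate (a plain Python set stands in for an HLL sketch), and the reader merges all partials. Real APPEND_IF_FITS concatenates raw bytes onto the value; here the value is modeled as a sequence of length-prefixed blobs in an in-memory dict.

```python
import pickle

# Toy model of merge-on-read over appended partial aggregates. A set is a
# stand-in for any mergeable sketch (HLL, T-Digest) with a serializer.
def append_partial(store: dict, key: bytes, partial_set: set) -> None:
    blob = pickle.dumps(partial_set)
    # Length-prefix each blob so the reader can split the concatenation.
    store[key] = store.get(key, b'') + len(blob).to_bytes(4, 'big') + blob

def read_merged(store: dict, key: bytes) -> set:
    data, merged = store.get(key, b''), set()
    while data:
        n = int.from_bytes(data[:4], 'big')
        merged |= pickle.loads(data[4:4 + n])
        data = data[4 + n:]
    return merged

store = {}
append_partial(store, b'hll/users', {'a', 'b'})
append_partial(store, b'hll/users', {'b', 'c'})
assert read_merged(store, b'hll/users') == {'a', 'b', 'c'}
```

The trade-off is exactly as described: appends are cheap and conflict-free, at the cost of a merge on every read (and a periodic compaction if the log grows).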

~~~
scaleout1
Hey man, that's pretty cool, and we do exactly the same thing using Cassandra
instead of FDB. Since Cassandra doesn't support transactions at high volume
(100K TPS), we do a shuffle so that all reads/modifies/writes for the same key
happen on the same machine. It seems like with FDB you can get away without
that, since it supports transactions? My question to you is: what is the
volume your system is operating at? Also, how does it work for skew? Let's say
you need to update the HLL for a key that is heavily skewed; does your FDB
transaction unwind fast enough not to slow down the whole system?

~~~
monstrado
Great questions!

> what is the volume your system is operating at?

This varies, as our workload is dynamic in that anyone can inject a query for
the data stream at any time, but for this sake let's say 5k.

> Also, how does it work for skew?

FoundationDB does a magnificent job of automatically detecting hot ranges and
physically relocating them. However, to mitigate write skew, I use
time-bucketing techniques where part of the key is a MURMUR3 hash of the
minute_of_hour, so that a heavy write load can only affect one server for one
minute. This has helped with certain metrics.
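The time-bucketing trick above can be sketched as follows. This is a hypothetical illustration, not the commenter's actual code: MURMUR3 is not in the Python standard library, so `blake2b` stands in for it, and the key layout is invented for the example.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical sketch of time-bucketed keys: hashing minute_of_hour into the
# key prefix rotates a hot metric to a different key range (and thus, likely,
# a different storage server) each minute. blake2b stands in for MURMUR3.
def bucketed_key(metric: str, ts: datetime) -> bytes:
    minute_of_hour = ts.minute
    bucket = hashlib.blake2b(str(minute_of_hour).encode(), digest_size=4).digest()
    return bucket + b'/' + metric.encode()

t1 = datetime(2018, 11, 1, 10, 5, tzinfo=timezone.utc)
t2 = datetime(2018, 11, 1, 10, 6, tzinfo=timezone.utc)
# The same metric lands under different prefixes in consecutive minutes.
assert bucketed_key('pageviews', t1) != bucketed_key('pageviews', t2)
```

Readers then have to scan all minute buckets and merge, which fits the mergeable-sketch workload described in this thread.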

> Let's say you need to update the HLL for a key that is heavily skewed; does
> your FDB transaction unwind fast enough not to slow down the whole system?

There isn't really a concept of an HLL (or key) being heavily skewed. A key
lives on a single server (or several, depending on replication). Essentially,
when I want to merge additional HLL content into one already stored, I just
read it, deserialize it, merge it with the one I have, and then write the
result back to FDB. Because of transactions, I can ensure that nobody else is
doing the exact same thing I am. If someone were, then my (or their)
transaction would fail and retry. The retry is important because it reattempts
the same logic, except the result I get from the database would now include
the merged result from the other writer. This ensures that idempotent / atomic
operations happen as you'd expect.
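The read-merge-write-retry loop just described can be modeled in a few lines. This is a toy sketch, not real FDB client code (the real Python bindings give you this loop via `@fdb.transactional`): a per-key version counter stands in for FDB's conflict detection, and set union stands in for the HLL merge.

```python
# Toy model of an optimistic read-merge-write retry loop.
class Conflict(Exception):
    pass

store = {}  # key -> (version, value); version stands in for conflict detection

def read(key):
    return store.get(key, (0, set()))

def commit(key, seen_version, new_value):
    version, _ = store.get(key, (0, set()))
    if version != seen_version:
        raise Conflict  # someone else wrote since we read
    store[key] = (version + 1, new_value)

def merge_in(key, partial, retries=5):
    for _ in range(retries):
        version, current = read(key)
        try:
            commit(key, version, current | partial)
            return
        except Conflict:
            continue  # re-read; the next attempt merges the other writer's result
    raise RuntimeError('retry budget exhausted')

merge_in(b'hll/k', {1, 2})
merge_in(b'hll/k', {2, 3})
assert read(b'hll/k')[1] == {1, 2, 3}
```

The key point from the parent comment is the retry path: because each attempt re-reads before merging, a conflicting writer's update is folded in rather than lost.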

~~~
scaleout1
Thanks for the reply, got a few more questions for you :-)

Let's say you are counting distinct IPs used by `users` with an HLL, and you
start getting DDoSed for certain users. Since I am assuming you are not doing
a shuffle before writing to FDB, you will be locking the user, reading the
HLL, deserializing, merging, and writing back to FDB from multiple machines,
which will result in a lot of rejected transactions and retries. My question
is whether the retries unwind fast enough, or whether you will end up dropping
data on the floor as you exhaust the retry count.

~~~
monstrado
Turns out we are doing a shuffle :) - We're using Apache Flink for the
aggregation step (5-second window), which performs a merge by key before
writing the value out. So at the end of the day, we only
read/deserialize/merge/write once every 5 seconds, assuming of course we
received data for the HLL aggregation.
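That pre-shuffle can be sketched roughly like this (a hedged toy model, not the actual Flink job): events are merged by key inside a window, so the store sees one read/merge/write per key per window instead of one per event. Set union stands in for the HLL merge.

```python
from collections import defaultdict

# Rough model of a windowed merge-by-key, as a Flink window would do it.
def window_merge(events):
    """events: iterable of (key, partial_set) pairs from one window."""
    merged = defaultdict(set)
    for key, partial in events:
        merged[key] |= partial
    return merged  # one write per key goes to the store from here

out = window_merge([('k1', {1}), ('k1', {2}), ('k2', {3})])
assert out['k1'] == {1, 2} and out['k2'] == {3}
assert len(out) == 2  # two writes downstream, not three
```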

However, due to the need for HA, we might run two or three clusters in
different AZs which means we might have a few servers writing a partial
aggregation to the same row, thus, the awesomeness of FDB plays a role.

That being said, our P99 latency writing to FDB is typically very low (a few
ms). We usually do 4,000 - 5,000 transactions a second at any given time.

------
tarlinian
The program includes talks about other layers that Apple is developing. Does
anyone know if they are planning on open sourcing any of those layers in the
future?

~~~
atombender
From the description, the "record layer" talk seems like it's about an example
POC and not a real project:

> This talk will provide a developer’s perspective building a new FoundationDB
> layer by describing the design and development of a record store that can
> provide semantics similar to a relational database. This example layer will
> provide the core functionality of a structured data store such as metadata
> management, indexing, and query planning.

The JanusGraph support is a real project, though.

------
john92
Considering the idea of one database to rule them all is old, where does this
stand?

[https://www.allthingsdistributed.com/2018/06/purpose-built-d...](https://www.allthingsdistributed.com/2018/06/purpose-built-databases-in-aws.html)

~~~
CodesInChaos
FoundationDB is a key-value store that supports transactions and scans over
key ranges. On top of that, you can build higher-level abstractions, like
relational, document, or graph databases. So it's similar to the storage
engine concept many database servers use.
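A minimal sketch of the layer idea, under toy assumptions: an ordered key-value space (here a plain dict, sorted on read) with a hypothetical document encoding, one key per field under a common prefix. This is an illustration of the concept, not any real FDB layer's format.

```python
# Hedged sketch of a "layer" over an ordered key-value store: a document is
# flattened into one key per field under a common prefix, and reassembled by
# scanning that prefix (a real layer would use a single FDB range read).
store = {}

def put_doc(collection: str, doc_id: str, doc: dict) -> None:
    for field, value in doc.items():
        store[f'{collection}/{doc_id}/{field}'.encode()] = str(value).encode()

def get_doc(collection: str, doc_id: str) -> dict:
    prefix = f'{collection}/{doc_id}/'.encode()
    return {k[len(prefix):].decode(): v.decode()
            for k, v in sorted(store.items()) if k.startswith(prefix)}

put_doc('users', '42', {'name': 'ada', 'role': 'admin'})
assert get_doc('users', '42') == {'name': 'ada', 'role': 'admin'}
```

Because the keyspace is ordered and transactional, the layer's multi-key writes stay consistent, which is what makes relational, document, and graph layers feasible on one substrate.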

------
as17237
Are the talks going to be recorded?

------
coldcode
Sadly there is no Swift API for FDB.

~~~
ryanworl
Swift bindings were added recently.

[https://github.com/FoundationDB/fdb-swift-bindings](https://github.com/FoundationDB/fdb-swift-bindings)

