
Quicksilver: Configuration Distribution at Internet Scale - migueldemoura
https://blog.cloudflare.com/introducing-quicksilver-configuration-distribution-at-internet-scale/
======
mwcampbell
As I was reading about the problems this system needs to solve and the issues
with Kyoto Tycoon, I thought that LMDB might be a good foundation for a
solution. So I was gratified to find out that Quicksilver indeed uses LMDB. I
gather Cloudflare has found that LMDB is indeed as reliable as it's advertised
to be.

Looking forward to the eventual open-source release of Quicksilver.

------
PaywallBuster
I was working on something similar one year ago for a personal project.

I simply persisted the config in MySQL and synced the data to Redis. Each
server then had a local Redis replica to allow fast reads from OpenResty
(nginx + Lua scripts).
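A minimal sketch of that sync path, with sqlite3 standing in for MySQL and a plain dict standing in for the per-server Redis replica so it runs without external services (the table layout and config keys are invented for illustration):

```python
import sqlite3

# sqlite3 stands in for the MySQL source of truth in this sketch.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT, version INTEGER)")
db.executemany("INSERT INTO config VALUES (?, ?, ?)",
               [("tls.min_version", "1.2", 1), ("rate_limit", "1000", 1)])
db.commit()

replica = {}    # stands in for the local Redis replica each server reads from
last_seen = 0   # highest version already pulled into the replica

def sync():
    """Pull rows newer than last_seen into the local replica. The hot read
    path then hits only the replica, never the source of truth."""
    global last_seen
    for key, value, version in db.execute(
            "SELECT key, value, version FROM config WHERE version > ?",
            (last_seen,)):
        replica[key] = value
        last_seen = max(last_seen, version)

sync()
print(replica["rate_limit"])  # 1000
```

The design point is that reads never cross the network: the replica may lag the source by one sync interval, which is acceptable for configuration data but not for strongly consistent state.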

The project never took off, it was just an MVP. But why would someone pick
Kyoto Tycoon instead?

------
ComputerGuru
The story of how they picked a storage engine ill-suited to their needs and
then used ridiculous and unsustainable workarounds to deal with its
shortcomings (including unsafe practices like relying on rebuilding the
databases and assuming it's safe to turn off synchronous writes for KV
storage, etc.) just confirms how shoddy engineering is at CloudFlare. That is
peak technical debt.

(It’s one thing to make a wrong choice, it’s another to think you can paper
over those mistakes and they’ll go away.)

~~~
elithrar
This is an extremely unconstructive comment. It's easy to be critical with
20/20 hindsight.

More constructively: what would you have picked back in _2011_ when Cloudflare
was getting off the ground? Ideally, it needs to have a memcached-like
interface (for easy gets/puts from Lua + NGINX), still operate when
disconnected from upstreams (CDN POPs can have unreliable upstream conns), be
cheap/free in terms of CAPEX/licensing, and be optimized for (extremely)
read-heavy workloads. Strong consistency is less useful here.

"Technical debt" is only debt after the fact. Most of the time, it's the
result of a series of (likely rational) trade-offs made given the state of
your business at the time.

~~~
ComputerGuru
You're right, my comment would have been more valuable if I'd included some
alternate technologies they could have used instead, but I believe that the
criticisms still stand regardless, because the approach itself is in question.
You don't find out _once you've reached scale_ that the underpinnings of your
distributed KV platform are not actually free of global locks for writes!
That's something you verify and test for yourself before building your empire
upon it. If an alternative exists, you use it. If not, you need to build it.

To answer your question:

> what would you have picked back in _2011_ when Cloudflare was getting off
> the ground?

Not only "would have" but did pick and use (well before 2011) an abstraction
around SQLite because our team first evaluated our read vs write requirements
and found it to be an adequate option rather than going crazy trying to find a
nosql solution worthy of including on our resumes.

The Cloudflare article is somewhat skimpy on the details of their benchmark,
so these numbers are not an exact equivalent, but here's something I just
threw together: [https://github.com/mqudsi/sqlite-readers-writers](https://github.com/mqudsi/sqlite-readers-writers)

P99.9 for reads with two writers is 2ms as compared to their 1215ms, and this
is with full ACID compliance and write synchronization.
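Not the linked benchmark itself, but a minimal stdlib sketch of the property those numbers rest on: SQLite in WAL mode lets readers proceed while a writer commits (the file path and key names here are invented for illustration):

```python
import os
import sqlite3
import tempfile

# Sketch only: SQLite in WAL (write-ahead log) mode allows concurrent readers
# alongside a single writer, which is what keeps read tail latency low even
# while writes are in flight.
path = os.path.join(tempfile.mkdtemp(), "kv.db")

def connect():
    conn = sqlite3.connect(path, timeout=5)
    conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block on the writer
    conn.execute("PRAGMA synchronous=NORMAL")  # fsync the WAL only at checkpoints
    return conn

writer = connect()
writer.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
writer.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)",
               ("zone:example.com", "config-v1"))
writer.commit()

# A separate connection reads while the writer's connection stays open.
reader = connect()
row = reader.execute("SELECT v FROM kv WHERE k = ?",
                     ("zone:example.com",)).fetchone()
print(row[0])  # config-v1
```

In rollback-journal mode (SQLite's default) a writer takes an exclusive lock that blocks readers, so `journal_mode=WAL` is the setting doing the work here.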

You don't need to be an expert in _creating_ these systems, you just have to
be able to test and validate your architectural decisions before building upon
them. It takes only an hour or three to pick a library and write a similar
benchmark for any KV store you're interested in (although the exact benchmark
would have to be tweaked to match your expected needs).

(That said, LMDB is a great choice and I've recommended it here on HN
before... except they ended up replicating on top of it much of what SQLite
provides for free. Their transaction logs are almost like SQLite's WAL, which
I used in my benchmark; going with SQLite would have skipped the entire
second half of their Quicksilver solution, since they wouldn't need to
manually handle transaction logs, split payloads, and reassemble them to
avoid fragmentation.)

