
How LaunchDarkly Serves Over 4B Feature Flags Daily - aechsten
http://stackshare.io/launchdarkly/how-launchdarkly-serves-over-4-billion-feature-flags-daily
======
jessaustin
_MongoDB -- as our core application data store. It's popular to make fun of
Mongo these days, but we've found it to be a great database technology as long
as you don't store too many things in it. Anything you can count on your
fingers and toes should be fine._

I suspect this is a joke, but I'm still confused. If 4B things happen every
day, how is that finger-toe countable? Does this mean that relatively
slowly-changing things like "customer name" are stored in mongo while events
etc. are stored elsewhere?

~~~
jkodumal
I was being cheeky :)

Events are stored in DynamoDB. We use mongo for "core" data like accounts,
feature flags, etc., but none of the high-throughput stuff touches MongoDB.

~~~
koolba
> Events are stored in DynamoDB. We use mongo for "core" data like accounts,
> feature flags, etc., but none of the high-throughput stuff touches MongoDB.

Curious, why the choice of MongoDB for anything? In my mind, "core" data tends
to be very structured and to require true transactional updates, both of which
are cornerstones of a relational database ( _cough Postgres_ ). Sure, it's
possible to use MongoDB, but what was the driver for it?

~~~
jkodumal
We had a lot of operational experience with Mongo from our previous jobs at
Atlassian, and started out storing almost everything in Mongo. As we scaled
out, we migrated all of our high-volume data into other stores: analytics
into DynamoDB, searchable data into ElasticSearch, etc.

The last remaining piece is our core data, and there's no strong push to move
it out of Mongo.

~~~
kornish
So, just to clarify, you use DynamoDB and ElasticSearch as primary datastores
for those types of data and don't replicate data from Mongo into them?

~~~
jkodumal
That's mostly right. We don't use ES as a primary datastore, though.

------
jonaf
Are there any good resources for managing feature flags like this? I am
currently using Netflix Archaius and it's working great, but I find it's easy
to forget what values are set or what flags exist. Over time, it's easy to
make a mistake and deploy something broken. I'm currently working on
instituting a kanban policy to replace feature flags with static
configurations ~1-2 months after their initial deployment where possible. But
I'm thinking there's a better way to avoid this trap.

------
siliconc0w
It's a pretty questionable architecture that outsources feature flags to a
third party. You're creating a hard dependency on your critical path; for most
apps this is going to be hit multiple times on every request, and it's
something you're going to want to run locally.

~~~
jkodumal
We've thought quite a bit about how to make this work as a service. The key to
our architecture is that evaluating a feature flag for a user does not involve
a remote call. We make that work by embedding a rule evaluation engine in our
SDKs. When you request a flag, the user is compared against these rules (in
memory) and served the appropriate variation.

We then use a streaming API to serve rule changes, so when you make a change
to your dashboard, the new rules are streamed to all your backend servers
within a few hundred milliseconds.
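
To make that concrete, here's roughly what the flow looks like with a
server-side SDK (Python here; the flag key and user attributes are made up,
and initialization details vary across SDK versions):

    import ldclient
    from ldclient.config import Config

    # Starting the client opens the streaming connection and caches
    # the current flag rules in memory.
    ldclient.set_config(Config(sdk_key="YOUR_SDK_KEY"))
    client = ldclient.get()

    user = {"key": "user-123", "email": "jane@example.com"}

    # No network round trip here: the user is evaluated against the
    # in-memory rules; False is the fallback if evaluation fails.
    show_new_dashboard = client.variation("new-dashboard", user, False)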

If you need even more resiliency, you can deploy a small service in your own
infrastructure
([https://github.com/launchdarkly/ld-relay](https://github.com/launchdarkly/ld-relay))
that allows you to persist these flag configurations in Redis.
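
With the relay keeping Redis in sync, the SDK can read from that store
instead of our streaming endpoint ("daemon mode"). A sketch, again in Python;
treat the module and parameter names as illustrative, since they differ
across SDK versions:

    import ldclient
    from ldclient.config import Config
    from ldclient.feature_store import CacheConfig
    from ldclient.integrations import Redis

    # Read flag rules from the Redis instance that ld-relay keeps synced,
    # rather than connecting to LaunchDarkly directly.
    store = Redis.new_feature_store(url="redis://localhost:6379/0",
                                    prefix="launchdarkly",
                                    caching=CacheConfig.default())
    ldclient.set_config(Config(sdk_key="YOUR_SDK_KEY",
                               feature_store=store,
                               use_ldd=True))  # "LaunchDarkly daemon" mode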

~~~
rch
Your rules engine sounds interesting - definitely worth a closer look.

------
edoceo
My logs show frequent timeouts when looking up settings through their API.
Their SDK default timeout is 3s. Wouldn't want to wait longer. Maybe they're
not serving all 4B.

------
chatmasta
Is averaging 4.6k requests per second really that much?

~~~
jkodumal
I think it depends on the workload. Serving 4.6k static pages per second,
cached on a CDN, is not too difficult. However, handling an analytics workload
of 4.6k RPS is a little harder.

~~~
wsh91
Albeit doable with Lambda and Cassandra. ;)

~~~
jkodumal
I mentioned this briefly in the article, but we thought about doing something
with Lambda + API Gateway. Doing the math, though, 5k RPS pushed through API
Gateway is about $1500 daily _just_ to authenticate.
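
For reference, the back-of-the-envelope version (the ~$3.50 per million
requests is API Gateway's list price at the time, my assumption rather than a
figure from the article):

    # 5k RPS sustained for a day, priced at ~$3.50 per million requests
    requests_per_day = 5_000 * 86_400                  # = 432,000,000 requests
    cost_per_day = requests_per_day / 1_000_000 * 3.50
    print(f"${cost_per_day:,.0f} per day")             # -> $1,512 per day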

------
blibble
that's almost 512MB of 0/1 bits a day!

(not serious, I suspect they're wrapped in a 5kB HTTP request)

