
Moving persistent data out of Redis - samlambert
http://githubengineering.com/moving-persistent-data-out-of-redis/
======
antirez
Wow, I'm learning today that GitHub used Redis for persistent data, now that
they've moved away :-) Anyway, I'm very happy that Redis helped run such an
important site. From the blog post it looks like, for certain things, moving
away from Redis was hard even though they are very skilled with MySQL; that is
a good thing from the POV of Redis, since it means Redis makes certain things
easy to model. However, moving away was an important priority for them, so I
wish to know why they wanted to move away so badly, and how Redis could be
improved in order to serve users better. If Redis had been better for their
use case, I guess they could have avoided moving to MySQL. Unfortunately the
blog post is short on details in that regard, perhaps because the blog post
author(s) are too gentle to bash Redis after using it for a long time.

~~~
rthrfrd
FWIW we went through a very similar process to that documented here by Github
(~3 months ago). It was entirely due to operational reasons and nothing to do
with shortcomings in Redis itself. MySQL was the master record for 99% of our
data while Redis was the master record for the other 1% (as it happens it was
also a kind of activity stream). Having the single 'master' reference for our
data reduced complexity to a degree that it was worth running a less
computationally-efficient setup. We also have nowhere near Github's volume so
we did not have to do such significant re-architecting to make unification
possible.

Now we still use Redis for reading the activity streams and as an LRU cache
for all sorts of data, but it is populated, like all of our specialised
slave-read systems (Elasticsearch, etc.), by replicating from the MySQL log.

Hope that helps!

~~~
parthdesai
Just asking for some info,

but how do you make sure that your multiple DB systems stay in sync
(specifically interested in MySQL and Elasticsearch)?

Hope it's alright to ask you that.

~~~
rthrfrd
In the case of ES the short answer is: we don't. We have fault tolerance in
our replication system to guarantee eventual consistency instead. I would say
using ES as a consistent source of data isn't really playing to its strengths,
so we don't use it that way. The consistency you want is determined at read
time: if you need consistency then hit MySQL, but for our use case that almost
never happens, as eventual consistency is usually instantaneous enough.

Our other tool is to decouple lookup (which objects to fetch) and population
(what data to return for each object). You can mix and match, e.g. do a lookup
against an inconsistent ES but still get consistent objects by populating from
MySQL (or vice versa). As others have alluded to it depends entirely on the
requirements for the result set.
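A toy sketch of that lookup/population split (both stores are hypothetical stand-ins: one dict for the eventually-consistent search index, another for the consistent primary database):

```python
# Decoupling lookup (which ids match) from population (what data to
# return for each id). Stand-ins: a dict for the search index (e.g.
# Elasticsearch), another for the primary database (e.g. MySQL).

search_index = {"mysql": [1, 3], "redis": [2]}   # possibly stale
primary_db = {
    1: "post about mysql",
    2: "post about redis",
    3: "another mysql post",
}                                                # authoritative

def lookup(term):
    """Return matching object ids from the (possibly stale) index."""
    return search_index.get(term, [])

def populate(ids):
    """Fetch the authoritative objects for those ids from the primary
    store. Ids that no longer exist are silently dropped."""
    return [primary_db[i] for i in ids if i in primary_db]

results = populate(lookup("mysql"))
```

Mixing and matching as described is then just a matter of which store each of the two functions targets.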

------
jhgg
We've recently had to move away from Redis for persistent data storage at work
too - opting instead to write a service layer on top of Cassandra for storing
data.

Redis was tremendous in our journey up there - but one of its shortcomings is
that it isn't as easy to scale up as Cassandra if your system wasn't designed
to scale on Redis from the start (ours wasn't). Instead of re-architecting for
a Redis Cluster setup, we decided to move the component to a clustered
microservice written in Go that sits as a memory cache & write buffer in front
of Cassandra for hot, highly mutated data.
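For illustration, here is a toy sketch of such a cache-plus-write-buffer layer (in Python rather than Go, with a dict standing in for Cassandra; all names and thresholds are hypothetical):

```python
# Toy memory cache + write buffer in front of a slower backing store,
# for hot, highly mutated data. A dict stands in for Cassandra.

backing_store = {}   # stand-in for Cassandra

class WriteBuffer:
    def __init__(self, flush_at=3):
        self.cache = {}        # hot reads are served from memory
        self.dirty = set()     # keys mutated since the last flush
        self.flush_at = flush_at

    def put(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)
        if len(self.dirty) >= self.flush_at:
            self.flush()

    def get(self, key):
        # Cache miss falls through to the backing store.
        return self.cache.get(key, backing_store.get(key))

    def flush(self):
        # Only the latest value per dirty key reaches the backing store,
        # which is how rapid mutations get absorbed.
        for key in self.dirty:
            backing_store[key] = self.cache[key]
        self.dirty.clear()

buf = WriteBuffer()
buf.put("user:1:score", 10)
buf.put("user:1:score", 11)   # second write absorbed in memory
buf.flush()                   # push the latest value down
```

The point of the buffer is that N rapid mutations of one hot key cost a single backing-store write per flush interval.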

Would anyone be interested in a blog post about our struggles & journey?

~~~
cookiecaper
>Instead of re-architecting for a Redis Cluster setup, we decided to move the
component to a clustered microservice written in Go that sits as a memory
cache & write buffer in front of Cassandra for hot, highly mutated data.

Somehow setting up a Redis cluster and doing whatever you have to do to
distribute/shard your keys effectively (which afaik is not much) _does_ sound
a little more efficient than writing a clustered microservice in Go with a
Cassandra backend. Redis clustering is actually quite easy.
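For what it's worth, the key distribution Redis Cluster uses is genuinely simple: `HASH_SLOT = CRC16(key) mod 16384` (the CRC-16/XMODEM variant), with `{...}` hash tags to pin related keys to one slot. A minimal sketch of that calculation:

```python
# Redis Cluster assigns every key to one of 16384 slots:
# HASH_SLOT = CRC16(key) mod 16384, where CRC16 is the CCITT/XModem
# variant. A hash tag like {user1000} restricts hashing to the braced
# substring, so related keys land on the same node.

def crc16(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021), the variant Redis Cluster specifies."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:          # non-empty tag: hash only its contents
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

Each master in the cluster owns a range of those 16384 slots, so "sharding your keys effectively" mostly reduces to choosing hash tags for keys that must live together (e.g. for multi-key operations).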

Forgive me if I seem grumpy. My recent experiences have caused the "We had a
minor issue, so we redid everything in a Totally Cool Super-Neato New Stack
That Integrates All The Hiring Manager's Favorite Buzzwords!" perspective to
become a bit grating.

Redis is one of the few new pieces of infrastructure over the last 10 years
that's truly deserving of its position.

~~~
jhgg
My post above describes the main reason for moving from Redis - the fact that
data for inactive users doesn't need to stay in memory perpetually. :P

~~~
cookiecaper
Cool. I look forward to the post that reveals the unique properties of
Cassandra that ended up making it the most practical data store for your use
case.

I understand that Cassandra et al exist to solve real problems that someone
out there has experienced, and I seek to throw no shade on the great engineers
who make these fine products. I am, however, somewhat dubious that these niche
products are applicable in the vast majority of cases where they're deployed.
I strongly believe, and I think the data would bear this out, that when it
gets down to brass tacks, most people are integrating such specialized tools
into generic products to either a) make life at the office more exciting; b)
beef up resume points for their next job application cycle; or c) both.

Someone in our company wrote a blog post pretending to justify the move to a
niche datastore. He's very proud of it and makes several spurious, nonsensical
justifications in it. The truth is that MySQL would've been many times more
practical along all axes except the one this guy cares most about, which
involves his personal career ambitions.

This move was partially under the radar so objections couldn't be raised and
full backups were not properly arranged. It cost the company a lot of money
not only in time and infrastructure, but also in the recovery process that had
to be undertaken by real data experts (or nearest we had at the time, at
least) when the cluster was destroyed by one of his careless scripts. :)

Second nightmare, currently ongoing: shifting everything to docker/k8s, which,
for just one example among a _very_ long laundry list of complaints, only got
support for directly addressing app servers behind a load balancer _last
month, as a beta feature_ (in k8s nomenclature, that's "Version 1.5 has a beta
StatefulSets feature to make Pods in a ReplicaSet uniquely addressable from
inside the cluster! Don't forget to make a Headless Service and Persistent
Volume." Exhausted yet? Just wait.).

Why are we switching to something that lacks such basic functionality (we're
like 3 versions behind, so we can't use it)? If I told you, I'd have to kill
you, but it sure makes our resumes pretty.

I'm all for learning, experimentation, and doing things for fun. We are on
Hacker News after all. I guess I've just developed a taste for a stable
production ethos that, to co-opt a scriptural term, is not "blown about by
every wind of [tech fad]". I crave a company that makes its decisions based on
a significant and real cost-benefit analysis that shows substantial unique
benefits and sufficient maturity to a tech before jumping on the bandwagon. I
guess I just want some sanity.

As it stands, people just pretend that these justifications exist by making up
some mumbo-jumbo about "dude JavaScript on the backend is like really event-
driven, brah!"

------
activatedgeek
Can anybody here help me understand why many teams are using MySQL as a KV
store? (Uber did it recently, so I assume many others probably did too -
network effect.)

I personally love MySQL. Just want to understand what makes MySQL a great KV
store as opposed to more seemingly specialized systems like Redis?

~~~
toomuchtodo
> Just want to understand what makes MySQL a great KV store

It's not; it just happens to be good enough, which matters a lot for
operational expertise/costs/etc.

For example, you can store hundreds of millions of KV rows in an InnoDB table
and still have <1-3ms response times on queries, while having persistence
built in. Perfect is the enemy of good enough.
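A rough illustration of the pattern (SQLite standing in for MySQL/InnoDB here, and the table and key names are made up, but the shape is the same: one indexed key column, one value blob, upserts for writes):

```python
import sqlite3

# A relational table used as a KV store. SQLite stands in for
# MySQL/InnoDB; MySQL would use INSERT ... ON DUPLICATE KEY UPDATE
# instead of INSERT OR REPLACE.

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE kv (
        k TEXT PRIMARY KEY,   -- point lookups are a single index probe
        v BLOB NOT NULL
    )
""")

def put(key, value):
    db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
               (key, value))

def get(key):
    row = db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None

put("feed:alice", b"event-1")
put("feed:alice", b"event-2")   # overwrites the previous value
```

The win is operational: point lookups on a primary key stay fast at large row counts, and persistence, replication, and backups come from the same machinery already run for the rest of the data.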

~~~
activatedgeek
Interesting!

------
epberry
Cool stuff. I found it especially interesting how they removed 30% of writes
with new logic that composes some timelines out of events in other timelines.
It's a thought-provoking optimization that calls to mind graph partitioning.

For example you have 10 people in your organization with various permissions
on repos. Some people (CTO let's say) can see every repo while others might
only be able to see some repos. Or you might have consultants or open source
projects which non-employees contribute to. Then you construct a graph where
each node is a contributor, connected to other contributors by the
permissions they have on repos (or are the repos the nodes and the contributor
permissions the connections?). Finally you run a graph partitioning algorithm
where the number of partitions is the number of unique timelines you have to
write for an organization. Thinking about an organization with closer to 500
contributors I can see how this could reduce the number of timelines by 30%.
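A much simpler stand-in for full graph partitioning is just grouping contributors by identical visibility sets: anyone who can see exactly the same repos can share one materialized timeline. A toy sketch (all data made up for illustration):

```python
# Contributors who can see exactly the same set of repos can share one
# materialized timeline, so the number of timelines to write per org is
# the number of distinct visibility sets.

visibility = {
    "cto":     {"repo-a", "repo-b", "repo-c"},  # sees everything
    "dev1":    {"repo-a", "repo-b"},
    "dev2":    {"repo-a", "repo-b"},            # same view as dev1
    "contrib": {"repo-c"},                      # outside consultant
}

def group_timelines(vis):
    """Group contributors by identical visibility sets."""
    groups = {}
    for person, repos in vis.items():
        groups.setdefault(frozenset(repos), []).append(person)
    return groups

groups = group_timelines(visibility)
# 4 contributors collapse into 3 timelines: dev1 and dev2 share one.
```

In an organization where most members share a handful of permission profiles, the number of distinct sets grows far more slowly than headcount, which is where the write savings would come from.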

------
amalag
Activity streams are such a common use case. It is very interesting that
Soundcloud chose to do something different:
[https://developers.soundcloud.com/blog/roshi-a-crdt-
system-f...](https://developers.soundcloud.com/blog/roshi-a-crdt-system-for-
timestamped-events)

Assembling the inbox on demand is quite interesting. I don't quite understand
the querying and operations involved with Roshi for doing that.

------
matharmin
Working with large amounts of persistent data is hard. Limiting the
architecture to a single database system (MySQL in this case) generally makes
managing and scaling much easier, versus having to know/learn how to scale
multiple systems independently.

Even if Redis were a better fit for some of their use cases, it is just much
easier not to have an additional persistent database system to manage.

------
antman
Has Github stopped allowing free searching of code? All I get is "Must
include at least one user, organization, or repository". I think that's a
bigger problem than the speed of its streams.

~~~
avian
I believe this restriction was put in place to limit searching for
accidentally committed AWS keys.

~~~
whatismyip
AWS keys are already being detected and the account owners get notified when
they are found. Private key material was a bigger issue at one point in time
(i.e. dotfiles/.ssh/id_rsa).

------
coldcode
I wonder if this will allow better scaling of GitHub enterprise. We are
pegging our usage; if we could, we would migrate everything to Gitlab
Enterprise (which we also have), which seems to have better scalability.

~~~
sytse
How can we help you move to GitLab EE? (As you indicated, it scales to 100k
users, so that shouldn't be the problem.)

~~~
john832
Does GitLab really handle 100k users? Where is this indicated?

[https://gitlab.com/gitlab-org/gitlab-
ce/issues/26405#note_20...](https://gitlab.com/gitlab-org/gitlab-
ce/issues/26405#note_20968879)

~~~
sytse
Thanks for asking. The issue you referred to talks about users on a single
machine. Unlike GitHub you can run a cluster of application servers with
GitLab [https://about.gitlab.com/high-
availability/](https://about.gitlab.com/high-availability/)

Some of our users have 25k+ users on their cluster. We know GitLab can scale
to 100k users because we run GitLab Enterprise Edition without modifications
on GitLab.com.

GitLab.com currently has much more than 100k users and the performance leaves
much to be desired [https://gitlab.com/gitlab-
com/infrastructure/issues/947](https://gitlab.com/gitlab-
com/infrastructure/issues/947)

But we're comfortable that you can run 100k users on a cluster of machines
without much tuning.

~~~
marcinkuzminski
Can you share how many machines handle gitlab.com currently?

~~~
sytse
It is around 100 machines. Commonly it is a couple of thousand active users
per application server, but your mileage may vary based on many things.

------
rch
> We changed up how writing to and reading from Redis keys worked for [the
> organization] timeline before even thinking about MySQL ... This resulted in
> a dramatic 65% reduction of the write operations for this feature.

Interesting. Is there a comparison of overall performance between the
intermediate design (w/ Redis) and what they ended up with?

------
tschellenbach
Wonder why they didn't use Cassandra for this use case.

~~~
vacri
Last year I was setting up a trial of Cassandra for something, going through
the usual swearing at a new tool not quite working as expected (e.g. by
default picking a _random_ port for inter-node communication)... and at the
next desk over, a non-tech colleague called Cassandra kept hearing me mutter
angrily about 'cassandra' and wondered what she'd done. Whoops :)

~~~
flurdy
Do you sit close to Ezekiel as well?

Though having someone sit near me who doesn't know what I am/we are working on
would surprise me. But it does happen, especially if there are hotdesks nearby
for people from other offices to work at temporarily. I do swear loudly often,
so it's probably not a good choice to have those too near me...

------
sandGorgon
It's fairly unfortunate that they are tied to MySQL, because this is pretty
much the use case that PostgreSQL's jsonb was built for.

It has first-class support in most ORMs, and works quite well.

------
adrr
I don't know why people use Redis as an LRU cache. It's a terrible LRU cache.
Its eviction algorithm isn't true LRU; it does sampling, which may cause new
keys to get incorrectly evicted. It's also really slow, being
single-threaded.
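For context, Redis's approximated LRU samples a handful of keys per eviction (`maxmemory-samples`, default 5) and evicts the best candidate among them, rather than tracking a global recency order. A toy contrast with true LRU (a simplified model, not Redis's actual implementation):

```python
import random

random.seed(0)  # make the sampling deterministic for this sketch

# store maps key -> last-access "time"; lower means colder.
# k0 is the globally least-recently-used key.
store = {f"k{i}": i for i in range(100)}

def evict_true_lru(s):
    """Evict the globally least-recently-used key."""
    victim = min(s, key=s.get)
    del s[victim]
    return victim

def evict_sampled(s, samples=5):
    """Redis-style approximation: sample a few keys at random and evict
    the coldest of the sample. Genuinely colder keys can survive whole
    rounds simply because the sample missed them."""
    sample = random.sample(list(s), samples)
    victim = min(sample, key=s.get)
    del s[victim]
    return victim

s1, s2 = dict(store), dict(store)
true_victim = evict_true_lru(s1)     # always k0
approx_victim = evict_sampled(s2)    # coldest of 5 random keys, rarely k0
```

Raising `maxmemory-samples` makes the approximation tighter at the cost of more CPU per eviction; whether the default is "terrible" or good enough depends on the access pattern.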

~~~
zaius
What should people use?

~~~
adrr
Memcached is an LRU cache.

~~~
eriknstr
I went and searched for "memcached vs redis" because up until now I hadn't
gotten around to checking out the differences. Here are some things I found.

[http://antirez.com/news/94](http://antirez.com/news/94)

[http://stackoverflow.com/questions/10558465/memcached-vs-
red...](http://stackoverflow.com/questions/10558465/memcached-vs-redis)

[http://www.infoworld.com/article/3063161/application-
develop...](http://www.infoworld.com/article/3063161/application-
development/why-redis-beats-memcached-for-caching.html)

------
buremba
It would be nice if they could share some info about Github::KV.

------
aerialcombat
I love Redis!

