
How Discord Stores Billions of Messages Using Cassandra - jhgg
https://blog.discordapp.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7#.dzqq7q4o7
======
niftich
These kinds of write-ups offer valuable insight into a popular project's
requirements and decision-making, and are some of the most instructive
resources one can find: these show not only the kinds of challenges one has to
face at scale, but also how architectural choices are made.

It's far more valuable to understand _why_ Discord uses Cassandra than to
merely be aware they do.

Out of curiosity, did you consider HBase and Riak? Did you entertain going
fully hosted with Bigtable? If so, what criteria resulted in Cassandra winning
out?

~~~
Vishnevskiy
Let me take a stab at that.

Riak is not a good model since it's more of a blob store and we wanted to simply
range scan through messages rather than sharding blobs (Cassandra is REALLY
good at this).
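
To make the "range scan" point concrete, here is a minimal sketch of why sorted storage makes it cheap. It is not Cassandra code: the `partition` list and `range_scan` helper are hypothetical stand-ins for rows ordered by a clustering key inside one partition.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for one partition: message IDs kept in sorted order,
# the way a clustering key orders rows within a partition.
partition = [101, 105, 110, 120, 133, 140]

def range_scan(sorted_ids, start, end):
    """Return all IDs with start <= id <= end using two binary searches."""
    lo = bisect_left(sorted_ids, start)
    hi = bisect_right(sorted_ids, end)
    return sorted_ids[lo:hi]

print(range_scan(partition, 105, 133))  # [105, 110, 120, 133]
```

Because the rows are already ordered, the scan is one contiguous slice rather than a lookup per blob.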

HBase would have been fine for this model, but the open source version of
HBase has much lower adoption than Cassandra so that was a big factor. We also
don't care about consistency and HBase is a CP database, we prefer AP for this
use case. As far as using GCP's BigTable (HBase compat), we made this decision
before we moved to GCP, but we are also not fans of using platform lock-in.
While BigTable has the same API as HBase, we would hate to go to a less widely
adopted version where we would have a hard time getting community support if we
decided to leave GCP.

Hope that helps.

~~~
runeks
> As far as using GCP's BigTable (HBase compat), we made this decision before
> we moved to GCP, but we are also not fans of using platform lock-in.

Did you consider GCP Datastore as well?

It has strong consistency for a single "entity group", but eventual
consistency for queries on multiple entity groups.

So by storing data only relevant to a single user in an entity group, you can
have strongly consistent, atomic transactions on that group (albeit limited to
1 tx/s), and at the same time do global queries on all user data with eventual
consistency.

~~~
Vishnevskiy
The pricing model does not fit our needs, and that is even more locked in than
the BigTable variant.

~~~
runeks
I'm happy to hear you dropped it for non-technical reasons, since I'm asking
because I've chosen Datastore for an app because I care less about vendor
lock-in than ease of operation, and it fits my pricing model perfectly, due to
the app in question receiving (Bitcoin) payments that are charged a fee on a
per-request/payment basis.

 _Hint: if you have technical reasons for avoiding GCP Datastore I'd be very
interested in hearing about them_

~~~
threeseed
Google Cloud is the least geo-distributed provider around, which is a major
problem if your use case has requirements around (a) latency and (b) data
locality due to legal requirements.

In 2017 they will finally have datacenters in Sydney, London, Singapore,
Frankfurt etc.

~~~
manigandham
This is one area where Azure is leading with both Azure SQL and DocumentDB
supporting geo-replication.

~~~
merb
nope, since Azure is extremely expensive and you also need several accounts for
different regions, i.e. you can't create servers in Germany without a whole new
account / credits / support.

~~~
manigandham
This is about capabilities, not price. Azure Germany is the only one that
requires a different account due to German legal issues. The rest of the
datacenters are all connected from the same account.

------
jakebasile
I use Discord a fair amount, and something that annoys me about it is that
everyone has their own server.

I realize this is a key part of the product, but the way I tend to use it is
split into two modes:

- I hang out on a primary server with a few friends. We use it when we play
games together.

- I get invited to someone else's server when I join up with them in a game.

The former use case is fine but the latter annoys me. I end up having N extra
servers on my Discord client that I'll likely never use again. I get pings
from their silly bot channels (seemingly even if I turn notifications off for
that server/channel), and I show up in their member lists until I remove
myself.

I wish there was a way to accept an invite as "temporary", so that it
automatically goes away when I leave or shut down Discord. Maybe keep a
history somewhere if I want to go back (and the invite is still valid).

Aside from that, it's a great product and really cleaned up the gamer-focused
voice chat landscape. It confuses me that people will still use things like
TeamSpeak or (god help you) Ventrilo when you can get a server on Discord for
free with far better features.

Now that I posted this, I realize this has little to do with TFA. Sorry.

edit: formatting, apology

~~~
Vishnevskiy
We have plans to make using temporary sessions in games a much better
experience so look out for that in the future.

~~~
Lutin
There's also a "Grant temporary membership" option when creating the invite
that will automatically kick users when they disconnect unless a role has been
assigned to them, but having that as an option when accepted would be cool.

------
ve55
Discord seems to me like it has a very polished user experience, and it's no
surprise that users are trashing programs like Skype in favor of Discord when
it is better in every area.

Discord seems to take security seriously, as they should, but I'm curious
about their stance on privacy and openness. For example, I wonder if they
would consider:

- Allowing end-to-end encryption to be used between users for private
communications

- Allowing users to connect to Discord servers using IRC or other clients
(or, at least having an API that easily allows this)[1]

- Allowing users to have better control over their own data, such as providing
local/downloadable logs so that they can search or otherwise use logs
themselves

Discord is definitely succeeding within the gaming market, but I'm curious
what other markets they would like to take a stab at.

[1] I'm aware Discord has an API, but if I understand it correctly, normal
users cannot easily use Discord from anything other than the official Discord apps,
as this API is specifically for Discord 'bots'. I see there's a discord-irc
bridge, but not much more than that. I may be incorrect on this.

~~~
doctorpangloss
> Discord seems to take security seriously

Any app that has voice turned on whenever it detects sound by default, without
prompting the user on installation, doesn't take security seriously.

I mean, unless you expect a communications app, running in the background, to
share the conversation you're having in your room, without telling you, with
everyone in every channel, until you discover it in your user preferences.

(I'm going to assume you're going to misunderstand what the issue here is. It
listens by default, like when you install, and you're not prompted that it's
the default. Contrary to every other communications or microphone app in
existence, save for ones that are designed to spy on people).

~~~
hug
I don't think this is a "security" issue as much as it's a usability or
privacy issue, and I don't think it's an example of Discord being evil.

For a start, it's not quite "on install", but after joining your first voice
channel. The issue comes from the interaction of a series of reasonable steps
that on the whole result in an unfortunate experience for some people. The
problematic series:

* By default, Discord uses voice detection to determine when you're speaking, as opposed to push-to-talk. This feature makes perfect sense.

* By default, Discord configures itself to start up on login. This feature makes sense. (I personally immediately turn that option off, but I don't resent its inclusion.)

* When started, Discord rejoins any voice channel you were in when Discord was last exited. This feature also makes perfect sense. [Edit: Apparently this is no longer true, and Discord will only rejoin the channel if you were active within the past 5 minutes.]

Essentially the result of these design decisions in series is [edit: was] that
if you install & use Discord, and fail to manually disconnect from your voice
channel, next time you start your computer Discord will automatically join
your last channel and broadcast any loud enough audio in the same room as your
computer to the voice channel.

There are a few mitigating factors, too: Discord is pretty obviously open and
on the screen when this happens, and it does show your active voice channel,
and it does show an activity indicator when you're broadcasting.

~~~
notnight
> Essentially the result of these design decisions in series is that if you
> install & use Discord, and fail to manually disconnect from your voice
> channel, next time you start your computer Discord will automatically join
> your last channel and broadcast any loud enough audio in the same room as
> your computer to the voice channel.

It's worth noting that Discord no longer does this if you've been away from
the voice channel for more than 5 minutes. The feature was intended to
autoreconnect you when the app was restarted due to updates and such, not to
cause people to accidentally broadcast themselves on system start.

~~~
hug
Ah, excellent. I wasn't aware they'd added a timer to the channel
reconnection. That's an elegant way of solving the problem without
compromising the important part of the features.

------
maktouch
It's really interesting to see that you're using Cassandra for this. IIRC,
Cassandra was created by Facebook for their messaging, and they realized that
eventual consistency was a bad model for chat, so they moved to HBase instead.
(source: [http://highscalability.com/blog/2010/11/16/facebooks-new-
rea...](http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-
messaging-system-hbase-to-store-135.html))

The tombstone issue was really interesting! Thanks for sharing.

~~~
jjirsa
You can have strong consistency in Cassandra - ING gave a talk at Cassandra
Summit 2015 on their multi-DC strongly consistent use cases.

Cassandra lets you choose - per query - how many replicas must ack the query.
Strong consistency is just a query parameter away.
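
A minimal sketch of the arithmetic behind that claim, assuming a replication factor of 3. The function names are illustrative and not part of any driver; the point is only that a read is guaranteed to touch at least one replica that acknowledged the latest write when read acks plus write acks exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def read_overlaps_write(rf: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_acks + write_acks > rf

rf = 3
q = quorum(rf)
print(q)                               # 2
print(read_overlaps_write(rf, q, q))   # True
print(read_overlaps_write(rf, 1, 1))   # False
```

This overlap argument says nothing about concurrent writers to the same key, which is a separate question.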

~~~
nvarsj
Need to be careful about the wording "strong consistency". I dislike that
DataStax uses that in their documentation, because it's misleading and really
confuses people. There is no commit protocol in place - Cassandra is still an
AP db under the hood, so even having multiple replicas acknowledge doesn't
mean the data is consistent. For that you need Paxos or something similar.
This becomes very obvious if you are doing updates from multiple sources to
the same key.

~~~
jjirsa
You do realize that Cassandra implements Paxos and has a CAS system, right?

------
flyingramen
It is fascinating that more and more people are using Cassandra. DataStax
believes they have fixed the problems with prior guarantee claims that were
exposed by Jepsen. But there has been no official Jepsen testing since.

On the topic of looking at Scylla next, I wonder why the team did not just
start out with it to begin with. Also, are there people with experience running
both? How is the performance? And what is the state of reliability?

~~~
Vishnevskiy
The problems that Jepsen found were centered around the "transactions" feature
that Cassandra added. We don't use these and don't need them since we don't
need 100% consistency and prefer availability (for example we read at quorum
to trigger read repair, but downgrade to single node reads if we need to).
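
A minimal sketch of that "quorum first, then downgrade" read pattern. Everything here is hypothetical: `Unavailable`, `fake_read` and `read_with_downgrade` are made-up names for illustration, not a real driver API, and the fake cluster has only one of three replicas alive.

```python
class Unavailable(Exception):
    """Raised by the hypothetical read function when too few replicas respond."""

def read_with_downgrade(read_fn, key):
    """Try a QUORUM read first; fall back to ONE if quorum cannot be met."""
    try:
        return read_fn(key, "QUORUM"), "QUORUM"
    except Unavailable:
        return read_fn(key, "ONE"), "ONE"

def fake_read(key, level, alive=1, rf=3):
    """Stand-in for a cluster with `alive` of `rf` replicas reachable."""
    needed = {"ONE": 1, "QUORUM": rf // 2 + 1}[level]
    if alive < needed:
        raise Unavailable(f"{level} needs {needed} replicas, {alive} alive")
    return f"value-for-{key}"

print(read_with_downgrade(fake_read, "channel-1"))
# ('value-for-channel-1', 'ONE')
```

The trade is explicit: the fallback read may return stale data, but it returns something.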

Also ScyllaDB is a new product and it would be crazy to start off with it. We
plan to run a long-term double write experiment before we are comfortable with
using it as a primary data store.

~~~
smharris65
The Jepsen tests were not completely centered around transactions. They also had
to do with data loss when replicas go down and the pure "last-write-wins"
approach. For those wanting more info around this the original post is here:

[https://aphyr.com/posts/294-jepsen-
cassandra](https://aphyr.com/posts/294-jepsen-cassandra)

~~~
Vishnevskiy
Last-write-wins is a semantic we are okay with for this data, and dealing with
one of our conflicts is outlined in the article.
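
For readers unfamiliar with the term, a minimal sketch of last-write-wins resolution between two replicas. The `Cell` type and the tie-break on value are assumptions made for this sketch only; the essential part is that the higher write timestamp wins and the other write is silently discarded.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    value: str
    write_ts: int  # any integer timestamp works for the sketch

def lww_merge(a: Cell, b: Cell) -> Cell:
    """Keep the cell with the higher write timestamp (ties broken on value here)."""
    return max(a, b, key=lambda c: (c.write_ts, c.value))

replica_1 = Cell("hello", write_ts=100)
replica_2 = Cell("hello (edited)", write_ts=250)
print(lww_merge(replica_1, replica_2).value)  # hello (edited)
```

The merge is commutative, so replicas converge regardless of the order in which they exchange data.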

~~~
smharris65
I enjoyed your article and I do appreciate your transparency!

------
pilif
_> While Cassandra has schemas not unlike a relational database, they are
cheap to alter and do not impose any temporary performance impact_

In most relational databases, the schema is cheap to alter and does not impose
a temporary performance impact.

In fact, all of their requirements (aside from linear scalability) could also be
met with a relational database. Doing so would gain you much more flexible
access to querying for various reports and it would reduce the engineering
effort required for retrieval of data as they add more features (relational
databases are really good at being queried for arbitrary constraints).

I think people tend to dismiss relational databases a bit too quickly these
days.

~~~
threeseed
a) I'm not aware of any relational database that can alter a schema in real
time on a hot table with billions of records.

b) You were quite okay to just dismiss scalability there except that's the
most important requirement for a company such as this. People don't just
choose Cassandra lightly given how significant its tradeoffs are.

c) Most companies are offloading analytics/reporting workloads into
Hadoop/Spark and then exporting the results back to their EDW. This allows for
far more functionality and keeps your primary data store free from adhoc
internal workloads.

d) Nobody dismisses relational databases quickly. In almost all cases they are
the first choice because they are so well understood. The issue is that most
of them do have issues with linear scalability and the cost to support them
is quite prohibitive, e.g. Teradata, Oracle.

~~~
ianamartin
I sort of agree with threeseed and the GP comment, so upvotes to both of
you.

1) Altering schema is vague. It was used vaguely in the article (although,
given the clarity of the article, I suspect the authors knew exactly what they
meant). Some alterations on relational database tables are fine, even when the
tables are hot and have billions of rows. Others are not.

Add a new column: fine. Index the new column: fine. Create a new index on a
column with billions of rows: definitely not fine.

But the index plan described in the article was very specific about what they
wanted. It doesn't sound like they had to add any new indices.

2) Mostly agree here. Linear scalability is a big deal here, and it's fucking
hard to do well for most RDBMS systems. I slightly disagree, however, because
the article explicitly states that the requirements are willing to trade C for
A in CAP theorem. This is important. The hardest parts of linear scaling in
RDBMS are enforcing C. Think transactional systems that absolutely must be
consistent. Like your bank account. This isn't that, and the blog post clearly
states it. Takes a lot of pressure off the relational database when it comes
to scaling.

3) Strongly disagree. Most companies don't have the resources or manpower to
do that. It takes a lot of time and a lot of effort. Hell, most companies
don't even have an EDW. Let alone a pipeline from the OLTP server to
Spark/Hadoop to the non-existent EDW.

4) We seem to run in different circles. Almost everyone I know dismisses
relational databases without question. Mongo is the way to go. And I get
called out as the resident old fart/luddite who insists on using postgres.
Speaking of which, if the first things you think of with relational DBs are
Teradata and Oracle, we are definitely operating in different contexts.

If your opinion is that relational databases are generally well understood by
--and therefore often the first choice for--developers . . . I want to know
where you work.

Because that's not a different context from where I am.

That's a different universe.

The reality is that storing and retrieving data is a hard problem, and there's
no set answer that works for everyone in every situation. If you're building a
new product from scratch, you should go with what your team knows, provided
that the team knows enough to not put yourself in the situation where you're
just losing data in a partition scenario (well-made point in the original
article. Mongo is fine on one node. Scale it out, and you might as well write
your data to /dev/null)

Almost any datastore will serve the needs of a new product until it needs to
scale horizontally. Relational, NoSQL, Object store, whatever. When it comes
to scaling linearly, you have to take factors into account.

1) Which part of CAP theorem are you willing to sacrifice? You always have to
let go of one.

1a) If you want a CP system, you have no choice but to deal with scaling
problems of relational databases. You must have transactional guarantees for
this to work.

1b) If you need an AP system, you have choices, but the choices lean in favor
of systems like cassandra. It's just easier than setting up multi-node postgres
and doing sharding.

It's also worth pointing out that people very often dismiss vertical scaling
too soon. Take a look at Joel Spolsky's articles about infrastructure at
StackOverflow. You can do quite a lot with the available firepower of modern
technology by just buying bigger and better hardware.

I'm not suggesting that going bigger would have been the right choice for
Discord. But sometimes it can be the right choice.

If there's something I fundamentally disagree with about the article, it's
this: trying to do everything in a single data store. I think--much like what
you suggested above--that it's better to have separate systems for reading and
writing. Since the use case is definitively AP, I can't see a reason not to
have a transactional system in an RDBMS and a streaming pipeline to a
cassandra cluster for reading.

Use the right tools for the right job, is basically my point.

~~~
jjirsa
> The hardest parts of linear scaling in RDBMS are enforcing C.

The hardest part of linear scaling in RDBMS is actually doing the scaling -
it's "what do I do when I'm about to outgrow a master and need to add a bunch
of capacity", and "what do I do when the master crashes". At Crowdstrike we
would add 60-80 servers to a cassandra cluster AT A TIME, no downtime, no
application changes, no extra work on our side - just bootstrap them in, they
copy their data, and they answer queries. The tooling to do that in an RDBMS
world probably exists at FB/Tumblr/YouTube, and almost nowhere else.

> Think transactional systems that absolutely must be consistent. Like your
> bank account

Most banks use eventual consistency, with running ledgers reconciled over
time.

> It takes a lot of time and a lot of effort. Hell, most companies don't even
> have an EDW. Let alone a pipeline from the OLTP server to Spark/Hadoop to
> the non-existent EDW.

In the cassandra world, it's incredibly common to set up an extra
pseudo-datacenter, which is only queried by analytics platforms (spark et al).
Much less work, and it doesn't impact the OLTP side.

> 1a) If you want a CP system, you have no choice but to deal with scaling
> problems of relational databases. You must have transactional guarantees for
> this to work.

This is fundamentally untrue - you can query cassandra with
ConsistencyLevel:ALL and get strong consistency on every query (and
UnavailableException anytime the network is partitioned or a single node is
offline). Better still, you can read and write with ConsistencyLevel:Quorum
and get strong consistency and still tolerate a single node failure in most
common configs.
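
A minimal sketch of the availability side of that trade-off, assuming a replication factor of 3 and a simplified model where a request can be served whenever enough replicas are up. The helper names are illustrative, not a driver API.

```python
def required_replicas(level: str, rf: int) -> int:
    """Replicas that must respond for a given consistency level."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]

def can_serve(level: str, rf: int, nodes_down: int) -> bool:
    """True if enough replicas remain to satisfy the consistency level."""
    return rf - nodes_down >= required_replicas(level, rf)

rf = 3
for level in ("ALL", "QUORUM", "ONE"):
    tolerated = max(d for d in range(rf + 1) if can_serve(level, rf, d))
    print(level, "tolerates", tolerated, "replica(s) down")
# ALL tolerates 0 replica(s) down
# QUORUM tolerates 1 replica(s) down
# ONE tolerates 2 replica(s) down
```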

> Use the right tools for the right job, is basically my point.

And this is the real point, with the caveat that you need to know all the
tools in order to choose the right one.

~~~
ianamartin
The fuck are you even talking about?

1) scaling is easy . . . oh cassandra. Where you can't have C and don't care
about P.

2) Let me tell you about banks. I used to work for banks. Banks do not use
systems that are eventually consistent. Banks use systems--however old and
outmoded--that are strongly consistent. Banks do not use systems that are
eventually consistent except for ACH transfers. And that's not a database.
That's a flat file

3) There is no cassandra world that you speak of. This is utter bullshit.

4) No it's not untrue. Cow--as we call it on my team--absolutely sucks at C
when you're talking about scaling horizontally.

Make up your mind. Is this good at single node guarantees or is it good at
sharded guarantees?

Pick one.

We know for a fact that if you want CAP, you can't have all three. You can
have AP or CP, but you can't have all of them. If you're arguing that you can
have C and A, you have failed at P.

Maybe that's a thing you're willing to trade-off. But it doesn't in any way
relate to my point.

My point, if you missed it, was this: if you want strong consistency, you need
a relational database, and you need transactional guarantees.

That is hard to do, and no one does it well yet. You're just lying to people
if you say otherwise.

~~~
jjirsa
I don't know what your background is, but I'm really encouraged by the fact
that I've worked my whole career without having to deal with people that
behave like you.

> 1) scaling is easy . . . oh cassandra. Where you can't have C and don't care
> about P.

This isn't about teaching me the CAP theorem. I know the CAP theorem. I know
the tradeoffs. I've built and managed systems that handle hundreds of billions
of events a day, writing millions of writes a second into a thousand cassandra
nodes. You can have C, if you want it - you don't get transactions with
rollbacks, but that doesn't mean you don't have consistency.

> 2) Let me tell you about banks. I used to work for banks. Banks do not use
> systems that are eventually consistent

All this time, I thought ING was a bank:
[https://www.youtube.com/watch?v=EiqdX23u_Mk](https://www.youtube.com/watch?v=EiqdX23u_Mk)

Also: [http://highscalability.com/blog/2013/5/1/myth-eric-brewer-
on...](http://highscalability.com/blog/2013/5/1/myth-eric-brewer-on-why-banks-
are-base-not-acid-availability.html)

> 3) There is no cassandra world that you speak of. This is utter bullshit.

I see, lame troll or wholly clueless. Guess I'm done.

------
alfg
Love Discord. Most of my friends and I have switched over from using Mumble
and it's been great.

I run a small Mumble host [1] and I've always thought of the idea of wrapping
the Mumble client and server APIs to function like Discord/Slack as an open
source alternative. Mumble is great and all, but the UI/UX appeal of Discord
is so much better.

Keep up the great work!

Also, is this the same Stanislav of Guildwork? Ha, I remember when
Guildwork was being formed back in the FFXI days.

[1] [https://guildbit.com](https://guildbit.com)

~~~
Vishnevskiy
I am :) glad people remember me from FFXI haha

Add me on Discord if you ever wanna chat, Stanislav#7943

------
jjirsa
Wildly biased Cassandra person, but I find this very well written and
explained, and I'm especially happy that when you bumped into problems like
wide partition and tombstone memory pressure, you didn't just throw up your
hands, but you worked around it.

The wide partition memory problem should be fixed in 4.0, for what it's worth.

------
mahyarm
Discord missed an opportunity a year or two ago to become something like slack
for large companies. Hipchat's perf is horrible and slack couldn't scale to
+20k users a year ago. Managing a mattermost instance requires staff and is
more outage prone.

It's really too bad that they didn't take advantage of it, since they were
actually scalable compared to their competitors and had good voice chat. Slack
has started becoming more scalable recently, so I don't know how much the
opportunity is still there.

~~~
Vishnevskiy
We are larger than Slack

~~~
ceejayoz
In what metrics? Number of users? Number of paying users? Valuation?
Headcount?

~~~
sciurus
Seconding this question. Can you share the equivalent discord numbers for the
slack numbers at the link below?

[http://expandedramblings.com/index.php/slack-
statistics/](http://expandedramblings.com/index.php/slack-statistics/)

~~~
Vishnevskiy
We currently don't share exact metrics for all those stats, but we have shared
a few press releases and blog posts which you can easily extrapolate from. :)

------
sparrish
If you're deleting often, I recommend running a full compact (after your
repair) to free up space and rid yourself of those tombstones once and for
all. Repairs without compacts make those SSTables grow and grow. It's amazing
how much space a compact clears up.

~~~
coredog64
I had to delete a shitload of data from Cassandra recently and it required
dropping gc_grace_seconds to a very low value in order for the tombstoned
records to be dropped during compaction (this was mentioned in the article)
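
A simplified sketch of why lowering gc_grace_seconds matters: compaction may only purge a tombstone once it is older than that setting. The row layout and `compact` function below are hypothetical, and real compaction has further conditions that this toy model ignores.

```python
def compact(rows, now, gc_grace_seconds):
    """Drop tombstones older than gc_grace_seconds; keep everything else."""
    kept = []
    for row in rows:
        is_tombstone = row["deleted_at"] is not None
        if is_tombstone and now - row["deleted_at"] > gc_grace_seconds:
            continue  # old enough to purge
        kept.append(row)
    return kept

rows = [
    {"id": 1, "deleted_at": None},    # live row
    {"id": 2, "deleted_at": 1_000},   # tombstone, 9000s old at `now`
    {"id": 3, "deleted_at": 9_000},   # tombstone, 1000s old at `now`
]
now = 10_000

# With the 10-day default (864000s) nothing is purged yet.
print([r["id"] for r in compact(rows, now, gc_grace_seconds=864_000)])  # [1, 2, 3]
# With a much lower value the older tombstone is dropped.
print([r["id"] for r in compact(rows, now, gc_grace_seconds=5_000)])    # [1, 3]
```

The grace period exists so that a replica that missed the delete can still learn about it before the tombstone disappears, which is why lowering it is something to do deliberately.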

------
joaodlf
Not surprised to see other companies facing issues with Cassandra and
tombstones. Don't get me wrong, I understand the need for tombstones in a
distributed system like Cassandra... It doesn't make it any less of a pain
though :).

~~~
jjirsa
The tombstone problem described is due to misuse - probably from improper use
of prepared statements. Looks like they worked around it well.

~~~
dwenzek
> The tombstone problem described is due to misuse.

My concerns with Cassandra are precisely here: it is easy to misuse.

There are a lot of constraints on the schema (more specifically on the design
of partition & clustering keys). Each choice leads to many restrictions on
what can be requested/updated/removed; and to different issues with tombstones
and GC.

Discord's story is exactly what I experienced: a series of surprises, and
even really bad surprises in production. In both cases, the story ended with
an efficient system, but with far more engineering work and rework than
initially planned.

~~~
krinchan
This article is extremely similar to almost every other "OMG! We used
Cassandra and it was nothing like a SQL database!" article by Netflix,
Spotify, and so many more. The fact that every single one contains the same 6
or 7 self-inflicted issues is pretty funny to me. I mean, I thought we lived
in the age of the Google Dev.

Cassandra does require you to know more of its internals than most other data
stores. Unfortunately, the move to CQL and very SQL-like names for things that
are nothing at ALL like their SQL counterparts is not helping.

Also, our own personal death by tombstone: A developer who didn't even know
those existed checked in some logic that would write null into a column every
time a thing succeeded.

After that passed QA and went into production, all hell broke loose with
queries timing out everywhere. SUCH FUN.
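
Writing an explicit null to a column creates a tombstone, so one common guard is to leave unset columns out of the statement entirely. A minimal sketch, assuming a hypothetical `jobs` table; `build_insert` is an illustrative helper, and the `%s` placeholder style is just one convention.

```python
def build_insert(table, row):
    """Build a parameterized INSERT that skips columns whose value is None."""
    present = {col: val for col, val in row.items() if val is not None}
    columns = ", ".join(present)
    placeholders = ", ".join(["%s"] * len(present))
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return statement, list(present.values())

stmt, params = build_insert(
    "jobs", {"job_id": 42, "status": "ok", "error_message": None}
)
print(stmt)    # INSERT INTO jobs (job_id, status) VALUES (%s, %s)
print(params)  # [42, 'ok']
```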

------
cookiecaper
I'm one of the people who nagged you on the redis post, and particularly
expressed skepticism that such a transition would've been necessary. I haven't
read this yet, but I just want to say thanks for actually following up to that
thread and posting it. Looking forward to it!

---------

EDIT: Just read the post, and while it provides a good perspective on
Discord's rationale to introduce Cassandra in the first place and does a great
job pointing out some unexpected pitfalls, it doesn't specifically respond to
replacing Redis with Cassandra due to clustering difficulty, per the prior
thread. [0] Redis is only specifically called out as something they "didn't
want to use", which I guess is probably the most honest answer.

The bucket logic applied to Cassandra seems like it could've been applied to
redis + a traditional permanent storage backend nearly as easily. The biggest
downside here would be crossing the boundary for cold data, but that's a
pretty common thing that we know lots of ways to address, right? And Cassandra
effectively has to do the same thing anyway, it just abstracts it away.
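
For context, the "bucket logic" is the linked article's idea of grouping a channel's messages into fixed time windows. A rough sketch under stated assumptions: the 10-day window, the 22-bit shift, and both function names are illustrative guesses for this example and are not taken from this thread.

```python
BUCKET_SIZE_MS = 1000 * 60 * 60 * 24 * 10  # assumed 10-day window

def make_bucket(snowflake_id: int) -> int:
    """Map a snowflake-style ID to its time bucket.

    Assumes the bits above the low 22 hold milliseconds since a
    service-defined epoch.
    """
    timestamp_ms = snowflake_id >> 22
    return timestamp_ms // BUCKET_SIZE_MS

def make_buckets(start_id: int, end_id: int) -> range:
    """All buckets that could hold messages between two IDs, inclusive."""
    return range(make_bucket(start_id), make_bucket(end_id) + 1)

first = 0
later = (3 * BUCKET_SIZE_MS) << 22
print(list(make_buckets(first, later)))  # [0, 1, 2, 3]
```

Nothing in this mapping is specific to one datastore, which is the point being argued above.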

Again, I'm left wondering what specific value Cassandra brings to the table
that couldn't have been brought by applying equal-or-lesser effort to the
system they already had.

I also found it amusing that they're already contemplating the need to
transition to a datastore that runs on a non-garbage-collected platform.

[0]
[https://news.ycombinator.com/item?id=13368754](https://news.ycombinator.com/item?id=13368754)

~~~
Vishnevskiy
This post was about using Cassandra for message storage.

You are basically advocating for plugging 2 systems together, which out of the
box don't provide elasticity. Or we could just use Cassandra. It is a simpler
solution and Cassandra is not new tech. Aside from caching empty bucket
information we have nothing sitting in front of Cassandra. It works great and
the effort was minimal.

The Redis comment by jhgg was referring to our service which tracks read
states for each user. We might write about that later, but it's not as
interesting. The most interesting part of that experience was reusing memory in
Go to avoid thrashing the GC.

We care about seamless elasticity for our services which Redis doesn't provide
out of the box except with Redis Cluster which does not seem to be widely
adopted and forces routing to clients.

~~~
cookiecaper
Ah, I see. Thank you for clarifying that this is not the service to which jhgg
referred.

Obviously, I haven't addressed this problem in-depth and I don't really know
enough about the specifics to criticize the decision directly. It's completely
possible that Cassandra was the perfect fit here.

The previous thread was in the context of switching things up without strong
technical motivation. I said that actually, it _does_ seem easier to fix a
redis deployment than to write a microservice architecture backed by
Cassandra, and that I hope to hear more about a stable production ethos from
the tech community as a whole. There are a lot of "moving on to $Sexy_New_Toy_Z"
posts, but not a lot of "we solved a problem that affected scaling our
systems, and instead of throwing the whole component away and starting over,
we actually did the hard work of fixing and optimizing it: here's how".

To address your specific complaints.

>You are basically advocating for plugging 2 systems together, which out of
the box don't provide elasticity.

I mean, again, without getting in-depth, I'm not advocating _anything_ (I feel
like I need a "This is not technical advice; if you have problems, consult a
local technician" disclaimer :P).

However, storing lots of randomly-accessed messages and maintaining reasonable
caches are not new problems. There are lots of potential solutions here.

And while Cassandra is not "new tech" in the JavaScript-framework-is-a-
grandpa-after-6-months sense, it's certainly "new tech" in the "I'm the system
of record for irreplaceable production data" sense.

Cassandra is also among the less-used of the NoSQL datastores, putting it in a
minority of a minority for battle-tested deployments. You mention Netflix and
some other big name using it in production as part of your belief that it's
stable. This, I think, is part of the problem.

These big companies use these solutions because

a) they truly do have a scale that makes other solutions untenable (although
probably not the case with Cassandra itself);

b) they can easily afford the complex computing and labor resources needed to
run, repair, and maintain a large-scale cluster. Such burdens can be onerous
on smaller companies (esp. labor cost);

c) when they need a patch or when something starts going awry, they can pay
anyone who's willing to make the patch, their own team not excluded. Often the
BDFLs/founders of such projects end up _directly employed_ by these big
companies that adopt their tech.

"Netflix [or any other big tech name] uses it so we know it's stable" is a
giant misnomer, IMO.

None of this is to say that Cassandra isn't a good choice for this problem or
any other specific problem, because again, as a drive-by third-party I don't
know. But contrary to what the article states, it hardly seems like Cassandra
was the only thing that could've possibly fit the bill. I bet it could be done
well with a traditional SQL database (which, from the body of the post that
identifies Discord as beginning on MongoDB and planning to move to something
Cassandra-ish later on, it doesn't sound like was ever tried or considered).

It's kind of like reading an RFP that was written by a guy at a government
agency that already knew they really wanted to hire their brother-in-law's
firm. "Must have $EXTREMELY_SPECIFIC_FEATURE_X, because, well, we must! And it
just so happens that this specific formulation can only be provided by
$BIL_FIRM. What d'ya know."

>We care about seamless elasticity for our services which Redis doesn't
provide out of the box except with Redis Cluster which does not seem to be
wildly adopted and forces routing to clients.

First, you just admitted that Redis _actually does_ have that feature that
you're saying it doesn't have. "Redis Cluster" and "Redis" are the same thing.
"Redis Cluster" is part of normal redis and afaik, while it requires
additional configuration, it will automatically shard.

In any case, while I have no numbers, I would wager that Redis Cluster is more
widely used than Cassandra.

~~~
manigandham
Cassandra doesn't require all the data to live in RAM. At this scale, you need
disk-based data access.

------
beck5
Serious question: how do you back up a Cassandra database of that size? Do you
even back it up, or just rely on the sharding to prevent data loss?

~~~
Vishnevskiy
Cassandra has a snapshot command that creates a directory by hard-linking the
files that hold data (this is safe because Cassandra data files are immutable).
Then you just upload them to your backup storage. This is obviously for
recovery scenarios that are catastrophic.

Normally though since the data is replicated on 3 nodes, you can technically
lose a node completely and rebuild it from the other nodes.
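
To illustrate why linking immutable files makes a snapshot cheap and safe, here
is a minimal Python sketch. It is not Cassandra's code: the `snapshot` helper,
the directory layout, and the file name `sstable-1.db` are all hypothetical,
and it assumes hard links via `os.link` on a single filesystem.

```python
import os
import tempfile


def snapshot(data_dir, snapshot_dir):
    """Hard-link every regular file in data_dir into snapshot_dir.

    No data is copied; each link is just a second name for the same file.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


def demo():
    """Snapshot one file, delete the original, and read the snapshot copy."""
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        os.makedirs(data_dir)
        path = os.path.join(data_dir, "sstable-1.db")  # hypothetical file name
        with open(path, "w") as f:
            f.write("immutable contents")

        snap_dir = os.path.join(root, "snapshots", "backup-1")
        names = snapshot(data_dir, snap_dir)

        # Simulate the live file being removed later (e.g. after compaction).
        os.remove(path)

        # The snapshot still holds the data through its own link.
        with open(os.path.join(snap_dir, "sstable-1.db")) as f:
            return names, f.read()
```

Because the files never change after they are written, the snapshot directory
stays consistent while it is being uploaded, even if the live copies are
deleted in the meantime.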

------
Globz
I love Discord and use it on a daily basis. One of the main concerns in my
gaming group is the voice latency compared to TS, Mumble or Ventrilo, but this
is mainly due to the inability to host your own server.

One of the big missing features we would like to have in Discord is the ability
to assign special permissions to our group leaders so they can communicate over
voice chat with other group leaders in other channels (global voice chat).

When we play PVP MMOs and have 40+ users all in the same channel calling
shots, it's impossible to coordinate properly.

What we normally do is split the group in 4, so 10 players in each of 4
different channels. Each group leader calls shots independently BUT can also
communicate via voice chat with the other group leaders. Basically there's a
global voice chat for group leaders that no one else can hear but them.

Other than that Discord is amazing!

------
mastax
For a bit more information about the tombstone issue from the perspective of
the person who caused it:
[https://www.reddit.com/r/programming/comments/5oynbu/_/dcnxy...](https://www.reddit.com/r/programming/comments/5oynbu/_/dcnxy5s)

------
treenyc
I'm curious why people would use closed-source software when you can use
something like [https://riot.im](https://riot.im)

Please let me know. I may be missing something.

~~~
eikenberry
Games vs general use. By targeting gamers Discord is able to customize the
experience to make it better for games.

~~~
Zaheer
Kind of like Twitch vs Youtube even

~~~
eikenberry
Exactly.

------
jolux
Discord is great but I have intermittent performance issues with it that make
it almost unusable in comparison to Slack which never has any noticeable
latency.

~~~
Vishnevskiy
This should not be the case; we strive for Discord to be lightning fast at all
times.

Please email me at stanislav@discordapp.com; I would love more details.

~~~
jolux
I can't really give you "more details," I was trying to use Discord with some
people and every couple minutes or so the chat would freeze up and then flood
with messages. It wasn't just me, it was everyone in the room's Discord doing
it, and it really doesn't seem like a client-side bug.

------
glidek
> Having a large partition also means the data in it cannot be distributed
> around the cluster.

Why can't a large partition be distributed around the cluster?

~~~
manigandham
Cassandra uses consistent hashing. A partition is a segment of data identified
by the partition key to determine which node in the consistent hash ring owns
that data.

You can't break down partitions any further because it's just a name for the
smallest cohesive set of data owned by a hash key, so instead it's advisable
to use more partitions with data modeling rather than making them huge.
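
A rough Python sketch of the idea, not Cassandra's actual partitioner: the
`Ring` class, the node names, the use of MD5, and the `channel-42:<bucket>` key
format are all assumptions made up for illustration. One partition key always
hashes to one owner, while many smaller keys spread across the ring.

```python
import bisect
import hashlib


def token(key):
    """Map a string to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A toy consistent hash ring with a few virtual nodes per physical node."""

    def __init__(self, nodes, vnodes=8):
        self._owners = {}
        for node in nodes:
            for i in range(vnodes):
                self._owners[token(f"{node}#{i}")] = node
        self._tokens = sorted(self._owners)

    def owner(self, partition_key):
        """Return the node owning the first ring token after the key's token."""
        t = token(partition_key)
        i = bisect.bisect_right(self._tokens, t) % len(self._tokens)
        return self._owners[self._tokens[i]]


ring = Ring(["node-a", "node-b", "node-c"])

# One huge partition: every lookup for the key lands on the same owner,
# no matter how many rows are stored under it.
single = {ring.owner("channel-42") for _ in range(1000)}

# Same channel split into many bucketed partitions: the keys hash to
# different ring positions, so the data can spread over several nodes.
bucketed = {ring.owner(f"channel-42:{bucket}") for bucket in range(200)}
```

Here `single` only ever contains one node, which is why a partition cannot be
split further, whereas `bucketed` can contain several.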

------
smaili
Does anyone know what protocol/transport Discord uses? XMPP, web sockets,
JSON, etc?

~~~
b1naryth1ef
Our realtime messaging is done over Websockets using either JSON (for web/non-
native) or ETF
([http://erlang.org/doc/apps/erts/erl_ext_dist.html](http://erlang.org/doc/apps/erts/erl_ext_dist.html)).
Almost all user-based actions are sent over our HTTP API using JSON.

~~~
lobster_johnson
Since Cassandra is eventually consistent, how do clients get a consistent
sequence of messages?

Do you actually use Cassandra range queries to poll for new messages, or do
clients use some kind of queue to get notified?

Say messages A, B, C are created in that sequence. But isn't it then possible
that a client asking for new messages only gets A and C, and B only shows up a
few milliseconds later, which would be missed unless the client actually asked
for "everything since before A"? Or is that not possible?

~~~
jhgg
We have a real time distribution system written in Erlang/Elixir that keeps
the clients in sync. We do not poll for messages. That wouldn't work at our
scale :)

------
simooooo
What's an upsert?

~~~
Vishnevskiy
Update if exists, otherwise insert.
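
As a tiny illustration in Python, with a plain dict standing in for a table
(the `upsert` helper and the column names are hypothetical, not any database's
API):

```python
def upsert(table, key, **columns):
    """Update the row if the key exists, otherwise insert a new row."""
    row = table.setdefault(key, {})  # insert an empty row if the key is missing
    row.update(columns)              # then set or overwrite the given columns
    return row


messages = {}
upsert(messages, 1, author="alice", content="hello")  # key absent: insert
upsert(messages, 1, content="hello, world")           # key present: update
```

The caller never has to check whether the row already exists; the same call
covers both cases.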

------
lightedman
You're storing messages, how are you guaranteeing safety of those messages
when it looks like one can seemingly just blast through your API calls to find
messages when one isn't even on that server?

~~~
kbhn
A few seconds of research would have revealed that their API requires an
authentication token[1]

[1][https://discordapp.com/developers/docs/reference](https://discordapp.com/developers/docs/reference)

~~~
lightedman
A few seconds of pen-testing tells me their token revocation is pretty shoddy.
I'm now spying on my fiance's Discord chat.

Bravo to those that downvoted me without even bothering to use their brains
(let alone test something.)

~~~
Vishnevskiy
The normal token system revokes on password change, if you want to revoke and
have extra security we offer MFA login which has unique tokens per login. If
security is of importance to you then use MFA.

~~~
lightedman
Why are you not revoking tokens after session end across the board? Token
reuse is one of the faster-rising security breach factors nowadays.

------
marknadal
Wow! This is an incredible article. I do research and development for systems
like this at GUN, and this article nails a lot of important pieces.
Particularly their ability to jump to an old message quickly.

We built a prototype of a similar system that handled 100M+ messages a day for
about $10, 2 minute screen cast here:
[https://www.youtube.com/watch?v=x_WqBuEA7s8](https://www.youtube.com/watch?v=x_WqBuEA7s8)
. However, this was without FTS or Mentions tagging, so I want to explore some
thoughts here:

1\. The bucketing approach is what we did as well; it is quite effective.
However, a warning to outsiders: this is only effective for append-only data
(like chat apps, Twitter, etc.) and not good for data that gets a lot of
recurring updates.

2\. The more indices you add, the more expensive it gets. If you are getting
100M+ messages a day, and you then want to update the FTS index and mentions
index (user messages' index, hashtag index, etc.) you'll be doing
significantly more writes. And you'll notice that those writes are updates to
an index - this is the gotcha and will increase your cost.

3\. Our system by default backs up / replicates to S3, which is something they
mention they want to perhaps do in the future. This has huge perks to it,
including price reductions, fault tolerance, and less DevOps - which is
something they (and you) should value!
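
For readers unfamiliar with the bucketing approach in point 1, here is a
minimal Python sketch. It is neither Discord's nor GUN's actual code: the
10-day bucket width, the `(channel_id, bucket)` key shape, and all function
names are assumptions for illustration.

```python
BUCKET_MS = 10 * 24 * 60 * 60 * 1000  # assumed bucket width: 10 days in ms


def bucket_for(timestamp_ms):
    """Return the index of the time bucket a timestamp falls into."""
    return timestamp_ms // BUCKET_MS


def partition_key(channel_id, timestamp_ms):
    """Group a channel's messages into one partition per time bucket."""
    return (channel_id, bucket_for(timestamp_ms))


def buckets_between(start_ms, end_ms):
    """List every bucket that a time-range query has to visit."""
    return list(range(bucket_for(start_ms), bucket_for(end_ms) + 1))
```

Appends always go to the newest bucket, and jumping to an old message only
touches the bucket covering its timestamp. Data that is rewritten often would
keep hitting old buckets, which is why this fits append-only workloads best.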

Their backend team is amazingly small. These guys and gals seem exceptionally
talented and are making smart decisions. I'm looking forward to the future post
on FTS!

