These kinds of write-ups offer valuable insight into a popular project's requirements and decision-making, and are some of the most instructive resources one can find: these show not only the kinds of challenges one has to face at scale, but also how architectural choices are made.
It's far more valuable to understand why Discord uses Cassandra than to merely be aware they do.
Out of curiosity, did you consider HBase and Riak? Did you entertain going fully hosted with Bigtable? If so, what criteria resulted in Cassandra winning out?
Riak is not a good model since it's more of a blob store, and we wanted to simply range scan through messages rather than shard blobs (Cassandra is REALLY good at this).
HBase would have been fine for this model, but the open source version of HBase has much lower adoption than Cassandra, so that was a big factor. We also don't care about consistency, and HBase is a CP database; we prefer AP for this use case. As far as using GCP's BigTable (HBase compatible): we made this decision before we moved to GCP, but we are also not fans of platform lock-in. While BigTable has the same API as HBase, we would hate to move to a less widely adopted version where we'd have a hard time getting community support if we decided to leave GCP.
> As far as using GCP's BigTable (HBase compat), we made this decision before we moved to GCP, but we are also not fans of using platform lock-in.
Did you consider GCP Datastore as well?
It has strong consistency for a single "entity group", but eventual consistency for queries on multiple entity groups.
So by storing data only relevant to a single user in an entity group, you can have strongly consistent, atomic transactions on that group (albeit limited to 1 tx/s), and at the same time do global queries on all user data with eventual consistency.
I'm happy to hear you dropped it for non-technical reasons. I'm asking because I chose Datastore for an app where I care less about vendor lock-in than ease of operation, and it fits my pricing model perfectly: the app in question receives (Bitcoin) payments that are charged a fee on a per-request/payment basis.
Hint: if you have technical reasons for avoiding GCP Datastore I'd be very interested in hearing about them
Google Cloud is the least geo-distributed provider around. Which is a major problem if your use case has requirements around (a) latency and (b) data locality due to legal requirements.
In 2017 they will finally have datacenters in Sydney, London, Singapore, Frankfurt, etc.
Nope, since Azure is extremely expensive, and you also need separate accounts for different regions; i.e. you can't create servers in Germany without a whole new account / credits / support.
This is about capabilities, not price. Azure Germany is the only one that requires a different account due to German legal issues. The rest of the datacenters are all connected from the same account.
That's because Azure in Germany is not offered by Microsoft but by T-Systems; Microsoft just supplies the tech. For the other regions one account suffices (I'm not sure about China, where they also use a partner).
> Riak is not a good model since it's more of a blob store, and we wanted to simply range scan through messages rather than shard blobs (Cassandra is REALLY good at this).
Can you tell a little bit more, please? Range scans are done using secondary indexes (indexed by timestamp) in our system. I'm not sure I understood the part about blobs, or the things specific to Cassandra. A reply would be highly appreciated.
Cassandra uses consistent hashing. A segment of data that is addressed by a key is called a partition, found by the partition key. A partition can contain just one "row" if you only use a single column as the key, or you can create a compound key with one part dedicated to finding the partition and the rest to finding rows within that partition.
If you use a compound key (multiple rows), these rows are all stored in the same partition (which lives entirely on the single node that owns or replicates that partition in the consistent hash ring), so scanning those rows is very fast and efficient.
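A toy sketch of the idea (not Cassandra's actual internals; real rings assign token ranges rather than modular slots): hash only the partition-key part of a compound key to pick an owning node, and every row sharing that partition key lands on the same node, which is why a range scan over one partition is cheap.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]

def owner(partition_key):
    # Hash the partition key to pick an owning node (stand-in for
    # Cassandra's consistent hash ring).
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Compound key: (channel_id, message_id). channel_id is the partition
# key; message_id acts as a clustering column ordering rows within it.
rows = [("chan-42", 1), ("chan-42", 2), ("chan-42", 3), ("chan-7", 1)]

# All of chan-42's rows hash to the same node, so scanning that
# channel's messages in order touches a single node.
owners = {owner(pk) for pk, _ in rows if pk == "chan-42"}
print(owners)  # a single-element set
```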
> the open source version of HBase has much lower adoption than Cassandra so that was a big factor
Is this due to the availability of experienced developers or another factor?
PostgreSQL has a lower adoption rate than MySQL, but we chose it due to its suitability to the tasks at hand. As long as the adoption rate is not low enough to give concern about the longevity of a tool, I'm less concerned about it than other factors.
Well, relatively speaking Postgres might have lower adoption than MySQL, though I'm not too sure about that. However, if you look at the absolute numbers, Postgres has huge adoption, even if it is smaller than MySQL's. So it doesn't really matter, as chances are you will be able to find an experienced developer. Can't say the same about HBase, etc., since there are significantly fewer projects requiring it compared to MySQL/Postgres.
I agree completely. It is frustrating that no decent books have been written regarding scaling architectures/strategies with current tooling. One has to scavenge various blog posts to try and discover ideas that might help solve their growth issues. I would love to see a book that covers scaling for app servers, RDBMSes, NoSql dbs, using queues/messaging effectively, etc. Failing that, I'd like to see something like Scalers at Work (a la Coders at Work) which would interview different devs who had to solve scaling issues.
Look for local meetups - in Seattle there's "Seattle Scalability" which is great for this sort of thing (and highscalability used to be great for this, too).
Not the parent, but the linked material is more foundational than the subject matter raised in the post. There is in fact an appreciable lack of good, battle-tested, non-secret, sometimes-but-not-necessarily anecdotal public info about the part of the design process where you have a working system doing fairly okay, but you know you're inches away from a very unpleasant wall. On fire [1].
It doesn't help that distributed systems are a dark art, that many open source and free-to-use tools that developers have access to gate the HA/clustering features behind steep pricing (though I sympathize it's one of the few effective ways to make money in open source), and that expertise with scaling is very often a competitive advantage.
The first part is mainly about Erlang and the choices they made. But the last part is not at all specific to Erlang, and walks you through all the decisions you need to take to build that type of architecture.
I use Discord a fair amount, and something that annoys me about it is that everyone has their own server.
I realize this is a key part of the product, but the way I tend to use it is split into two modes:
- I hang out on a primary server with a few friends. We use it when we play games together.
- I get invited to someone else's server when I join up with them in a game.
The former use case is fine but the latter annoys me. I end up having N extra servers on my Discord client that I'll likely never use again. I get pings from their silly bot channels (seemingly even if I turn notifications off for that server/channel), and I show up in their member lists until I remove myself.
I wish there was a way to accept an invite as "temporary", so that it automatically goes away when I leave or shut down Discord. Maybe keep a history somewhere if I want to go back (and the invite is still valid).
Aside from that, it's a great product and really cleaned up the gamer-focused voice chat landscape. It confuses me that people will still use things like TeamSpeak or (god help you) Ventrilo when you can get a server on Discord for free with far better features.
Now that I posted this, I realize this has little to do with TFA. Sorry.
There's also a "Grant temporary membership" option when creating the invite that will automatically kick users when they disconnect unless a role has been assigned to them, but having that as an option when accepted would be cool.
It's possible to leave servers. I was also irritated by the exact behaviour you described, until I figured out how to leave servers. You can also mute all notifications from a server.
I convinced my friends to switch from Skype to Discord for this reason. I had a few new groups every day, and I would get calls all day long, because if someone wanted to play they would call everyone.
I made a Discord group about a month ago and everyone I know is using it. If someone new wants to play, we add them to this group, so everyone is there.
Also we're not annoyed by calls anymore, as you only have to join the voice channel, instead of calling everyone.
Discord seems to me like it has a very polished user experience, and it's no surprise that users are trashing programs like Skype in favor of Discord when it is better in every area.
Discord seems to take security seriously, as they should, but I'm curious about their stance on privacy and openness.
For example, I wonder if they would consider:
- Allowing end-to-end encryption to be used between users for private communications
- Allowing users to connect to Discord servers using IRC or other clients (or, at least having an API that easily allows this)[1]
- Allowing users to have better control over their own data, such as providing local/downloadable logs so that they can search or otherwise use logs themselves
Discord is definitely succeeding within the gaming market, but I'm curious what other markets they would like to take a stab at.
[1] I'm aware Discord has an API, but if I understand it correctly, normal users cannot easily use Discord from anything other than the official Discord apps, as this API is specifically for Discord 'bots'. I see there's a discord-irc bridge, but not much more than that. I may be incorrect on this.
- E2E/OTR encryption is something some of us are interested in, but due to the nature of our platform probably isn't going to happen anytime soon (we'd want to do it right, which requires time and effort).
- Some libraries support connecting through user accounts, and there are various third-party tools for "linking" chat rooms, incl. some client plugins for irssi and such. We don't officially support it, but it's definitely possible.
- Search is currently live on our alpha-testing client, and should be rolling out globally soon. It's also possible to save or log channels through the API fairly easily.
There's a difference between end-to-end/client-side encryption and secure/encrypted backend storage.
I don't think anyone's commented on the backend security situation (I'd hope they'd have messages encrypted at rest, but it doesn't seem that encryption has been a priority), just that they don't do E2E.
But with a chat app the "classic" behaviour, as far as I know, is to guarantee that each participant gets all the messages they ought to.
So what are those billions of messages they store in the database? Is it only very detailed cache data for the current conversation, or is it hardwired to PRISM or a commercial database? Why on earth should they store so much chat log?
Or maybe I'm just not aware of the popularity of Discord, but billions of messages makes me wonder, because as a comparison that's roughly the worldwide daily iMessage volume.
So messages are probably stored longer than needed: how and why?
The point of our service is that chat is persistent. You can scroll back through time and read all the messages you sent. Users are free to delete whatever they sent whenever they wish, but for almost everyone persistent chat history is a huge feature. Also important to note that as of the numbers we released last July, we receive around 40 million messages a day. The public stats released about iMessage suggest that 2 billion messages are sent per day.
Can users at least opt out of persistent chat history? Or define a timeframe after which messages are deleted?
You are basically confirming that your company is storing a lot of personal data without user-specific encryption. This is pretty scary, and I hope you have some improvement to this situation on your roadmap. If not, you are a "leak" away from a big problem.
Cool features are neat, but in 2016 privacy should not be seen as a secondary feature...
Thanks for the informative response. I will look into how difficult it is to connect to a server using a user account from an IRC client, as that would make the experience much nicer for users like me.
I'm curious about the logging API permissions - it seems kind of weird that I could potentially join someone's Discord server and then download logs of their conversations for the past year instantly after joining, but I suppose this is already possible by viewing history in the client?
EDIT: looking at the API on https://discordpy.readthedocs.io/en/latest/api.html, it seems you need permission for the channel logs, but that can't prevent someone from writing code to collect them manually, regardless of permissions?
Discord has a pretty in-depth permission system that allows per-channel/per-user setup.
If a server allows a user to view the message history (which basically means that when you enter the channel you can see previous messages and scroll up), then yes, that user can write a bot to save all the messages. I don't really see what the issue is here.
That to me really is one of the main reasons I prefer Discord to IRC. It's the fact that you can join a channel from any device and see past conversation. But of course, if for security reasons you don't want that, you can very easily disable message history and have it act like IRC does.
It's pretty easy to set up the discord-irc bridge, assuming you're referring to bitlbee. I already use it to have an IRC interface for Facebook, Google Hangouts, etc., so Discord was just adding a plugin and configuring the account, which took about 10 minutes total.
Any app that has voice turned on whenever it detects sound by default, without prompting the user on installation, doesn't take security seriously.
I mean, unless you expect a communications app, running in the background, to share the conversation you're having in your room, without telling you, with everyone in every channel, until you discover it in your user preferences.
(I'm going to assume you're going to misunderstand what the issue here is. It listens by default, like when you install, and you're not prompted that it's the default. Contrary to every other communications or microphone app in existence, save for ones that are designed to spy on people).
I don't think this is a "security" issue as much as it's a usability or privacy issue, and I don't think it's an example of Discord being evil.
For a start, it's not quite "on install", but after joining your first voice channel. The issue comes from the interaction of a series of reasonable steps that on the whole result in an unfortunate experience for some people. The problematic series:
* By default, Discord uses voice detection to determine when you're speaking, as opposed to push-to-talk. This feature makes perfect sense.
* By default, Discord configures itself to start up on login. This feature makes sense. (I personally immediately turn that option off, but I don't resent its inclusion.)
* When started, Discord rejoins any voice channel you were in when Discord was last exited. This feature also makes perfect sense. [Edit: Apparently this is no longer true, and Discord will only rejoin the channel if you were active within the past 5 minutes.]
Essentially the result of these design decisions in series is [edit: was] that if you install & use Discord, and fail to manually disconnect from your voice channel, next time you start your computer Discord will automatically join your last channel and broadcast any loud enough audio in the same room as your computer to the voice channel.
There are a few mitigating factors, too: Discord is pretty obviously open and on the screen when this happens, and it does show your active voice channel, and it does show an activity indicator when you're broadcasting.
> Essentially the result of these design decisions in series is that if you install & use Discord, and fail to manually disconnect from your voice channel, next time you start your computer Discord will automatically join your last channel and broadcast any loud enough audio in the same room as your computer to the voice channel.
It's worth noting that Discord no longer does this if you've been away from the voice channel for more than 5 minutes. The feature was intended to autoreconnect you when the app was restarted due to updates and such, not to cause people to accidentally broadcast themselves on system start.
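The reconnect rule as described (my sketch of the behaviour, not Discord's actual implementation) boils down to a single timestamp check: rejoin only if the user was active in the voice channel within the last five minutes.

```python
import time

REJOIN_WINDOW_SECONDS = 5 * 60

def should_rejoin(last_active_ts, now=None):
    # Rejoin the previous voice channel only if the user was active in
    # it recently, so update-driven restarts reconnect seamlessly but a
    # next-day system boot does not broadcast the room by surprise.
    now = time.time() if now is None else now
    return (now - last_active_ts) <= REJOIN_WINDOW_SECONDS

assert should_rejoin(last_active_ts=1000, now=1000 + 60)        # app restart for an update
assert not should_rejoin(last_active_ts=1000, now=1000 + 3600)  # booting up the next day
```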
Ah, excellent. I wasn't aware they'd added a timer to the channel reconnection. That's an elegant way of solving the problem without compromising the important part of the features.
The very first time, you need to very clearly UNMUTE yourself manually by pressing a button, even after you join a voice channel.
After that first time, joining a voice channel will enable your mic, and by default it also makes a clear sound. There is a feature that reconnects you if you were connected to a voice channel before you left (though it seems to be limited to 5m now).
But again, unless you 1. connect to a voice channel and 2. press the unmute button that first time, there is no listening happening...
You can verify all this on Chrome, since there, they have to specifically ask for your permission before accessing your microphone, so you know exactly when they do it. Open Discord on Chrome and play around, join a voice channel, unmute, and only then it will ask for access.
I would consider this more of a privacy issue than a security issue - but something that should nonetheless be communicated clearly to user.
Maybe these terms don't seem worth differentiating, but I see privacy as "what info do they collect" and security as "how safe do they keep my info, after they collect it".
Voice activation is the default in all popular VoIP products. You have to specifically join a voice channel, and it will display that in the bottom left if you're in one. Mumble, TeamSpeak, and especially Skype all use voice activation in their initial setup. It's the norm, people expect it, and more to the point, they want it. I hate voice activation personally, but with how Discord is used as a group-chat replacement for Skype, it makes sense as the default.
It's really interesting to see that you're using Cassandra for this. IIRC, Cassandra was created by Facebook for their messaging, and realized that eventual consistency was a bad model for chat, so they moved to HBase instead. (source: http://highscalability.com/blog/2010/11/16/facebooks-new-rea...)
The tombstone issue was really interesting ! Thanks for sharing.
They did make the original and the core model is the same, but Cassandra in 2017 is quite different from what they open sourced in features, usability and stability.
I guess everyone makes their own tradeoffs though. This has been working wonderfully for us.
Need to be careful about the wording "strong consistency". I dislike that DataStax uses that term in their documentation, because it's misleading and really confuses people. There is no commit protocol in place; Cassandra is still an AP db under the hood, so even having multiple replicas acknowledge doesn't mean the data is consistent. For that you need Paxos or something similar. This becomes very obvious if you are doing updates from multiple sources to the same key.
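A toy illustration of the multiple-sources problem: under last-write-wins (which Cassandra uses via cell timestamps), every writer gets an ack, but only the write with the latest timestamp survives; nothing arbitrates between concurrent updates.

```python
# Toy last-write-wins register. Both concurrent writers are
# "acknowledged", but whichever carries the later timestamp silently
# wins; there is no commit protocol deciding between them.
replica = {}

def write(key, value, timestamp):
    current = replica.get(key)
    if current is None or timestamp >= current[1]:
        replica[key] = (value, timestamp)
    return "ack"  # the writer gets an ack whether or not its value survives

write("balance", 100, timestamp=1000)  # initial value
write("balance", 250, timestamp=1005)  # concurrent writer B
write("balance", 175, timestamp=1003)  # concurrent writer A, slightly skewed clock

print(replica["balance"][0])  # 250: A's acknowledged write was silently lost
```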
Love Discord. Most of my friends and I have switched over from using Mumble and it's been great.
I run a small Mumble host [1] and I've always thought of the idea of wrapping the Mumble client and server APIs to function like Discord/Slack as an open source alternative. Mumble is great and all, but the UI/UX appeal of Discord is so much better.
Keep up the great work!
Also, is this the same Stanislav of Guildwork? Ha, I remember when Guildwork was being formed back in the FFXI days.
It is fascinating that more and more people are using Cassandra. DataStax believes they have fixed the problems with prior guarantee claims that were exposed by Jepsen, but there has been no official Jepsen testing since.
On the topic of looking at Scylla next, I wonder why the team didn't just start out with it to begin with. Also, are there people with experience running both? How is the performance? And what is the state of reliability?
The problems that Jepsen found were centered around the "transactions" feature that Cassandra added. We don't use these and don't need them since we don't need 100% consistency and prefer availability (for example we read at quorum to trigger read repair, but downgrade to single node reads if we need to).
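The "read at quorum, downgrade to single-node reads" pattern might look something like this sketch (a hypothetical client shape, not Discord's actual code): quorum reads trigger read repair, and when too few replicas are up, the read degrades to ONE, trading possible staleness for availability.

```python
# Sketch of "read at QUORUM, fall back to ONE" (hypothetical client
# API, not Discord's actual code).

class NotEnoughReplicas(Exception):
    pass

def read(key, replicas_up, replication_factor=3, consistency="QUORUM"):
    # QUORUM needs a majority of replicas; ONE needs just one.
    needed = replication_factor // 2 + 1 if consistency == "QUORUM" else 1
    if replicas_up < needed:
        raise NotEnoughReplicas(consistency)
    return f"value-of-{key}"

def read_with_fallback(key, replicas_up):
    try:
        # Preferred path: quorum read, which also triggers read repair.
        return read(key, replicas_up, consistency="QUORUM")
    except NotEnoughReplicas:
        # Availability over consistency: serve whatever one node has.
        return read(key, replicas_up, consistency="ONE")

print(read_with_fallback("msg:123", replicas_up=1))  # value-of-msg:123, possibly stale
```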
Also ScyllaDB is a new product and it would be crazy to start off with it. We plan to run a long-term double write experiment before we are comfortable with using it as a primary data store.
The Jepsen tests were not completely centered around transactions. They also had to do with data loss when replicas go down and the pure "last-write-wins" approach. For those wanting more info, the original post is here:
I find it fascinating that people still think Cassandra is some risky new tech - been running it in production since 2010, and the fact that people are still worried about it makes me snicker a bit.
The whole idea behind the Jepsen reports is not that people need strong consistency. It is that products should tell you precisely what they do and do not guarantee.
> While Cassandra has schemas not unlike a relational database, they are cheap to alter and do not impose any temporary performance impact
In most relational databases, the schema is cheap to alter and does not impose a temporary performance impact.
In fact, all of their requirements (aside from linear scalability) could also be met with a relational database. Doing so would gain you much more flexible access to querying for various reports, and it would reduce the engineering effort required for retrieval of data as they add more features (relational databases are really good at being queried for arbitrary constraints).
I think people tend to dismiss relational databases a bit too quickly these days.
a) I'm not aware of any relational database that can alter a schema in real time on a hot table with billions of records.
b) You were quite okay to just dismiss scalability there except that's the most important requirement for a company such as this. People don't just choose Cassandra lightly given how significant its tradeoffs are.
c) Most companies are offloading analytics/reporting workloads into Hadoop/Spark and then exporting the results back to their EDW. This allows for far more functionality and keeps your primary data store free from adhoc internal workloads.
d) Nobody dismisses relational databases quickly. In almost all cases they are the first choice because they are so well understood. The issue is that most of them do have issues with linear scalability and the cost to support them quite prohibitive e.g. Teradata, Oracle.
Regarding a), a sharded PostgreSQL (i.e. with Citus Data) can easily accommodate that workflow with just a tiny bit of extra overhead.
Re c) this strongly depends on the use case; I've seen companies use a) to avoid split-brain problems and having to manage two data pipelines, to great success at similar scales. You might find https://www.youtube.com/watch?v=NVl9_6J1G60 interesting.
I sort of agree with threeseed and the GP comment, so upvotes to both of you.
1) Altering schema is vague. It was used vaguely in the article (although, given the clarity of the article, I suspect the authors knew exactly what they meant). Some alterations on relational database tables are fine, even when hot and have billions of rows. Others are not.
Add a new column: fine. Index the new column: fine. Create a new index on a column with billions of rows: definitely not fine.
But the index plan described in the article was very specific about what they wanted. It doesn't sound like they had to add any new indices.
2) Mostly agree here. Linear scalability is a big deal here, and it's fucking hard to do well for most RDBMS systems. I slightly disagree, however, because the article explicitly states that the requirements are willing to trade C for A in CAP theorem. This is important. The hardest parts of linear scaling in RDBMS are enforcing C. Think transactional systems that absolutely must be consistent. Like your bank account. This isn't that, and the blog post clearly states it. Takes a lot of pressure off the relational database when it comes to scaling.
3) Strongly disagree. Most companies don't have the resources or manpower to do that. It takes a lot of time and a lot of effort. Hell, most companies don't even have an EDW. Let alone a pipeline from the OLTP server to Spark/Hadoop to the non-existent EDW.
4) We seem to run in different circles. Almost everyone I know dismisses relational databases without question. Mongo is the way to go. And I get called out as the resident old fart/luddite who insists on using postgres. Speaking of which, if the first things you think of with relational DBs are Teradata and Oracle, we are definitely operating in different contexts.
If your opinion is that relational databases are generally well understood by--and therefore often the first choice for--developers . . . I want to know where you work.
Because that's not a different context from where I am.
That's a different universe.
The reality is that storing and retrieving data is a hard problem, and there's no set answer that works for everyone in every situation. If you're building a new product from scratch, you should go with what your team knows, provided that the team knows enough to not put yourself in the situation where you're just losing data in a partition scenario (well-made point in the original article. Mongo is fine on one node. Scale it out, and you might as well write your data to /dev/null)
Almost any datastore will serve the needs of a new product until it needs to scale horizontally. Relational, NoSQL, Object store, whatever. When it comes to scaling linearly, you have to take factors into account.
1) Which part of CAP theorem are you willing to sacrifice? You always have to let go of one.
1a) If you want a CP system, you have no choice but to deal with scaling problems of relational databases. You must have transactional guarantees for this to work.
1b) If you need an AP system, you have choices, but the choices lean in favor of systems like Cassandra. It's just easier than setting up multi-node Postgres and doing sharding.
It's also worth pointing out that people very often dismiss vertical scaling too soon. Take a look at Joel Spolsky's articles about infrastructure at StackOverflow. You can do quite a lot with the available firepower of modern technology by just buying bigger and better hardware.
I'm not suggesting that going bigger would have been the right choice for Discord. But sometimes it can be the right choice.
If there's something I fundamentally disagree with about the article, it's this: trying to do everything in a single data store. I think--much like what you suggested above--that it's better to have separate systems for reading and writing. Since the use case is definitively AP, I can't see a reason not to have a transactional system in an RDBMS and a streaming pipeline to a cassandra cluster for reading.
Use the right tools for the right job, is basically my point.
> 4) We seem to run in different circles. Almost everyone I know dismisses relational databases without question. Mongo is the way to go. And I get called out as the resident old fart/luddite who insists on using postgres. Speaking of which, if the first things you think of with relational DBs are Teradata and Oracle, we are definitely operating in different contexts.
I suppose I run with more...sensible devs? I mean a lot of my co-workers are Millennial Hipster Rubyist types, and they'll pick a Postgres or MySQL database literally every time and never leave it with their cold dead hands. One team here even built their own queuing system on top of some Ruby and MySQL. (Please don't ask. They had...reasons but they basically reinvented Kafka.)
These same teams really try to avoid Redis, also.
Of course, these teams are writing REST APIs with very strict SLAs. Most of the time I see MongoDB and other "NoSQL" DBs used is when you have front end JS devs writing the Node backend code. >.>
> The hardest parts of linear scaling in RDBMS are enforcing C.
The hardest part of linear scaling in an RDBMS is actually doing the scaling: it's "what do I do when I'm about to outgrow a master and need to add a bunch of capacity", and "what do I do when the master crashes". At CrowdStrike we would add 60-80 servers to a Cassandra cluster AT A TIME, no downtime, no application changes, no extra work on our side: just bootstrap them in, they copy their data, and they answer queries. The tooling to do that in an RDBMS world probably exists at FB/Tumblr/YouTube, and almost nowhere else.
> Think transactional systems that absolutely must be consistent. Like your bank account
Most banks use eventual consistency, with running ledgers reconciled over time.
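The running-ledger model in a nutshell (a toy sketch, not any real bank's system): balances are derived from an append-only log of entries, and anomalies are flagged in a later reconciliation pass rather than prevented upfront.

```python
# Toy running ledger: the balance is a fold over an append-only entry
# log, reconciled after the fact instead of enforced transactionally.
ledger = []

def post(account, amount):
    # Posting never checks the balance first; e.g. an overdraft can
    # slip through when two debits race each other.
    ledger.append((account, amount))

post("alice", 100)
post("alice", -80)
post("alice", -80)

def balance(account):
    return sum(amt for acct, amt in ledger if acct == account)

def reconcile():
    # A later batch pass flags accounts that went negative for
    # follow-up (fees, reversal) instead of rejecting the write.
    return [acct for acct in {a for a, _ in ledger} if balance(acct) < 0]

print(balance("alice"))   # -60
print(reconcile())        # ['alice']
```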
> It takes a lot of time and a lot of effort. Hell, most companies don't even have an EDW. Let alone a pipeline from the OLTP server to Spark/Hadoop to the non-existent EDW.
In the Cassandra world, it's incredibly common to set up an extra pseudo-datacenter which is only queried by analytics platforms (Spark et al). Much less work, and it doesn't impact the OLTP side.
> 1a) If you want a CP system, you have no choice but to deal with scaling problems of relational databases. You must have transactional guarantees for this to work.
This is fundamentally untrue - you can query cassandra with ConsistencyLevel:ALL and get strong consistency on every query (and UnavailableException anytime the network is partitioned or a single node is offline). Better still, you can read and write with ConsistencyLevel:Quorum and get strong consistency and still tolerate a single node failure in most common configs.
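The quorum claim is just an overlap condition: with replication factor N, a read of R replicas is guaranteed to see a write of W replicas whenever R + W > N (the two sets must intersect by pigeonhole).

```python
def quorum(n):
    # Majority of n replicas.
    return n // 2 + 1

def overlaps(n, r, w):
    # A read set of size r must intersect a write set of size w when
    # r + w > n, which is what makes reads see the latest write.
    return r + w > n

N = 3
assert overlaps(N, quorum(N), quorum(N))        # QUORUM/QUORUM: consistent
assert overlaps(N, N, 1) and overlaps(N, 1, N)  # ALL on either side also works
assert not overlaps(N, 1, 1)                    # ONE/ONE: no guarantee

# And with N=3, QUORUM needs only 2 replicas per query, so a single
# node failure is tolerated while the guarantee still holds.
print(quorum(3))  # 2
```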
> Use the right tools for the right job, is basically my point.
And this is the real point, with the caveat that you need to know all the tools in order to choose the right one.
1) Scaling is easy . . . oh, Cassandra. Where you can't have C and don't care about P.
2) Let me tell you about banks. I used to work for banks. Banks use systems, however old and outmoded, that are strongly consistent. Banks do not use eventually consistent systems, except for ACH transfers. And that's not a database. That's a flat file.
3) There is no cassandra world that you speak of. This is utter bullshit.
4) No, it's not untrue. Cow, as we call it on my team, absolutely sucks at C when you're talking about scaling horizontally.
Make up your mind. Is this good at single node guarantees or is it good at sharded guarantees?
Pick one.
We know for a fact that if you want CAP, you can't have all three. You can have AP or CP, but you can't have all of them. If you're arguing that you can have C and A, you have failed at P.
Maybe that's a thing you're willing to trade-off. But it doesn't in any way relate to my point.
My point, if you missed it, was this: if you want strong consistency, you need a relational database, and you need transactional guarantees.
That is hard to do, and no one does it well yet. You're just lying to people if you say otherwise.
I don't know what your background is, but I'm really encouraged by the fact that I've worked my whole career without having to deal with people that behave like you.
> 1) scaling is easy . . . oh casandra. Where you can't have C and don't care about P.
This isn't about teaching me the CAP theorem. I know the CAP theorem. I know the tradeoffs. I've built and managed systems that handle hundreds of billions of events a day, writing millions of writes a second into a thousand Cassandra nodes. You can have C, if you want it; you don't get transactions with rollbacks, but that doesn't mean you don't have consistency.
> 2) Let me tell you about banks. I used to work for banks. Banks do not use systems that are eventually consistent
Cassandra lets you tweak the C or A trade-off on each query by setting the consistency level. So yes, the same system can provide both guarantees, depending on which you need.
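The per-query trade-off the parent describes comes down to the quorum overlap rule: a read is guaranteed to see the latest write whenever the read set and write set must intersect. A minimal sketch of that arithmetic (illustrative only, not Cassandra code):

```python
# Sketch of the read/write overlap rule behind Cassandra's tunable
# consistency levels. R = replicas contacted per read, W = replicas
# acked per write, N = replication factor.

def quorum(n: int) -> int:
    """Number of replicas a QUORUM touches at replication factor n."""
    return n // 2 + 1

def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Strong consistency holds when read and write sets must overlap."""
    return r + w > n

RF = 3
# QUORUM reads + QUORUM writes: 2 + 2 > 3, so reads see the latest write.
print(is_strongly_consistent(RF, quorum(RF), quorum(RF)))  # True
# ONE/ONE: 1 + 1 <= 3, so you've traded consistency for availability.
print(is_strongly_consistent(RF, 1, 1))                    # False
```

This is why the same cluster can serve both strongly consistent and eventually consistent workloads: the choice of R and W is made per query, not per database.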
> if you want strong consistency, you need a relational database
You can have consistency without transactions.
Overall, it really sounds like you've never used and don't know much about Cassandra, and are possibly confused about the CAP theorem.
> I think people tend to dismiss relational databases a bit too quickly these days.
In too many cases, people conflate RDBMS with MySQL, where any schema mod is time-consuming on large tables, even when adding nullable columns with no other constraints.
These people are still commenting on use cases that clearly don't apply to them? I realize the Lords of Data like to lock themselves in their Oracle-built towers, lock away your data and access, and then every once in a while issue a fiat...
But you'd think they'd read about CAP theorem and cloud architectures while they're hiding from everyone.
Wildly biased Cassandra person, but I find this very well written and explained, and I'm especially happy that when you bumped into problems like wide partition and tombstone memory pressure, you didn't just throw up your hands, but you worked around it.
The wide partition memory problem should be fixed in 4.0, for what it's worth.
Discord missed an opportunity a year or two ago to become something like Slack for large companies. Hipchat's perf is horrible and Slack couldn't scale to 20k+ users a year ago. Managing a Mattermost instance requires staff and is more outage-prone.
It's really too bad that they didn't take advantage of it, since they were actually scalable compared to their competitors and had good voice chat. Slack has started becoming more scalable recently, so I don't know how much the opportunity is still there.
I think it makes sense for Discord to stick to its gaming niche, rather than trying to do a bunch of things poorly.
The other market is a bit more saturated, with Microsoft and a few others piling on top too, whereas the gaming market was completely lacking. There were a few clients that focused much more on voice (mumble, vent, ts), but nothing quite like Discord: free, with one button to make a new server.
The only other contender I can think of is Raidcall, but that was a joke... Now there's Curse but they came too late to the market and were DoA, except the people they forced to use it by paying thousands.
We've been using Discord a bunch at our company (HearthSim). We have a server for our user community, one for our open source org and one for our company. It's superb, works so much better than Slack ever could.
Are companies a market you're serious about? There is so much focus on gaming, it's hard to be sure. I mentioned to support recently one of our prime issues as a company is being limited to a single owner per server.
PS: Are you the same Stanislav Vishnevskiy I'm thinking about? I remember working on Guildwork with you!
We currently don't share exact metrics for all those stats, but we have shared a few press releases and blog posts which you can easily extrapolate from. :)
Hipchat goes terribad in a 20k organization: things like only sometimes delivering pushes to your phone, or only sometimes loading chat room history when you open the app. I don't know how Slack performs in a similar situation.
If you're deleting often, I recommend running a full compaction (after your repair) to free up space and rid yourself of those tombstones once and for all. Repairs without compactions make those SSTables grow and grow. It's amazing how much space a compaction clears up.
I had to delete a shitload of data from Cassandra recently, and it required dropping gc_grace_seconds to a very low value in order for the tombstoned records to be dropped during compaction (this was mentioned in the article).
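The rule at play here is simple: a compaction may only drop a tombstone once gc_grace_seconds have elapsed since the deletion, so that every replica has had a chance to learn about it via repair. A toy model of that check (the default value is real; everything else is illustrative):

```python
# Illustrative model of when a compaction is allowed to purge a tombstone.
# Dropping it earlier risks "zombie" data resurfacing from a replica that
# missed the delete.

GC_GRACE_SECONDS_DEFAULT = 864_000  # Cassandra's default: 10 days

def tombstone_droppable(deleted_at: int, now: int,
                        gc_grace_seconds: int = GC_GRACE_SECONDS_DEFAULT) -> bool:
    """True if a compaction running at `now` may purge this tombstone."""
    return now - deleted_at > gc_grace_seconds

# With the default grace period, a day-old tombstone survives compaction...
print(tombstone_droppable(deleted_at=0, now=86_400))           # False
# ...but lowering gc_grace_seconds (as in the comment above) lets it go.
print(tombstone_droppable(0, 86_400, gc_grace_seconds=3_600))  # True
```

The catch, as the article notes, is that while gc_grace_seconds is lowered you must be sure repairs have propagated the deletes, or deleted data can come back from the dead.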
Not surprised to see other companies facing issues with Cassandra and tombstones. Don't get me wrong, I understand the need for tombstones in a distributed system like Cassandra... It doesn't make it any less of a pain though :).
> The tombstone problem described is due to misuse.
My concerns with Cassandra are precisely here: it is easy to misuse.
There are a lot of constraints on the schema (more specifically on the design of partition & clustering keys).
Each choice leads to many restrictions on what can be requested/updated/removed;
and to different issues with tombstones and GC.
Discord's story is exactly what I experienced: a series of surprises, and some really bad surprises in production.
In both cases, the story ended with an efficient system, but with far more engineering work and rework than initially planned.
This article is extremely similar to almost every other "OMG! We used Cassandra and it was nothing like a SQL database!" article by Netflix, Spotify, and so many more. The fact that every single one contains the same 6 or 7 self-inflicted issues is pretty funny to me. I mean, I thought we lived in the age of the Google Dev.
Cassandra does require you to know more of its internals than most other data stores. Unfortunately, the move to CQL and very SQL-like names for things that are nothing at ALL like their SQL counterparts is not helping.
Also, our own personal death by tombstone: A developer who didn't even know those existed checked in some logic that would write null into a column every time a thing succeeded.
After that passed QA and went into production, all hell broke loose with queries timing out everywhere. SUCH FUN.
> My concerns with Cassandra are precisely here: it is easy to misuse.
You get this massively scalable (to thousands of nodes and tens of millions of operations per second) database for free, and all you have to do is have your developers read about it before they use it. Is that expecting too much?
Let me ask this, then:
It's an OSS project. Let's pretend I'm a committer or on the PMC, and I'd like to fix this in a way that works for you. We need a null write to act as a delete. We need tombstones to make sure things that are meant to be deleted stay deleted. We have to have all the tombstones we scan in heap on reads to reconcile what's deleted and what's not deleted within a partition on any given slice query.
What would you want to see changed to avoid the tombstone problem? There are dozens of blog posts around that say "don't write null values if you don't want a tombstone to be created" (like this article, or [0]), but beyond that, do you expect to see errors in logs?
We've made unbound values in prepared statements no longer equivalent to null/delete [1].
What else would you expect an OSS project to do to protect you from abusing it? Serious question.
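To make the failure mode above concrete, here is a toy model (illustrative only, not Cassandra internals) of why a null write is so costly: it is stored as a delete marker, and a slice read must hold every tombstone it scans in memory to reconcile what is and isn't deleted:

```python
# Toy model: writing null is not "no write" -- it is a tombstone, and
# reads pay for every tombstone they scan within a partition.

TOMBSTONE = object()  # sentinel standing in for a delete marker

partition: dict[str, object] = {}

def write(col: str, value) -> None:
    # A null value becomes a tombstone, exactly the anti-pattern in the
    # "write null on success" bug described above.
    partition[col] = TOMBSTONE if value is None else value

def read_slice(cols):
    """Return live values plus the tombstone count the read had to hold."""
    live, tombstones_scanned = {}, 0
    for c in cols:
        v = partition.get(c)
        if v is TOMBSTONE:
            tombstones_scanned += 1  # occupies heap for the whole read
        elif v is not None:
            live[c] = v
    return live, tombstones_scanned

write("status", "ok")
write("status", None)   # the buggy "success" path: null -> tombstone
write("detail", "done")
print(read_slice(["status", "detail"]))  # ({'detail': 'done'}, 1)
```

Multiply that one tombstone by millions of "successes" and you get the query timeouts described: the data looks gone, but every read still has to step over the markers until compaction (after gc_grace_seconds) finally purges them.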
These same surprises keep being written about for Cassandra. At this point reading a few blog posts and the documentation (especially about how the data is stored) will cover all the issues you might have in production.
I'm one of the people who nagged you on the redis post, and particularly expressed skepticism that such a transition would've been necessary. I haven't read this yet, but I just want to say thanks for actually following up to that thread and posting it. Looking forward to it!
---------
EDIT: Just read the post, and while it provides a good perspective on Discord's rationale to introduce Cassandra in the first place and does a great job pointing out some unexpected pitfalls, it doesn't specifically respond to replacing Redis with Cassandra due to clustering difficulty, per the prior thread. [0] Redis is only specifically called out as something they "didn't want to use", which I guess is probably the most honest answer.
The bucket logic applied to Cassandra seems like it could've been applied to redis + a traditional permanent storage backend nearly as easily. The biggest downside here would be crossing the boundary for cold data, but that's a pretty common thing that we know lots of ways to address, right? And Cassandra effectively has to do the same thing anyway, it just abstracts it away.
Again, I'm left wondering what specific value Cassandra brings to the table that couldn't have been brought by applying equal-or-lesser effort to the system they already had.
I also found it amusing that they're already contemplating the need to transition to a datastore that runs on a non-garbage-collected platform.
This post was about using Cassandra for message storage.
You are basically advocating for plugging 2 systems together, which out of the box don't provide elasticity. Or we could just use Cassandra. It is a simpler solution and Cassandra is not new tech. Aside from caching empty bucket information we have nothing sitting in front of Cassandra. It works great and the effort was minimal.
The Redis comment by jhgg was referring to our service which tracks read states for each user. We might write about that later, but it's not as interesting. The most interesting part of that experience was reusing memory in Go to avoid thrashing the GC.
We care about seamless elasticity for our services, which Redis doesn't provide out of the box except with Redis Cluster, which does not seem to be widely adopted and forces routing onto clients.
Both Cassandra and Redis Cluster will forward queries to the correct nodes and both use client drivers that learn the topology to properly address the right nodes when querying.
Ah, I see. Thank you for clarifying that this is not the service to which jhgg referred.
Obviously, I haven't addressed this problem in-depth and I don't really know enough about the specifics to criticize the decision directly. It's completely possible that Cassandra was the perfect fit here.
The previous thread was in the context of switching things up without strong technical motivation. I said that actually, it does seem easier to fix a redis deployment than to write a microservice architecture backed by Cassandra, and that I hope to hear more about a stable production ethos from the tech community as a whole. There are a lot of "moving on to $Sexy_New_Toy_Z" posts, but not a lot of "we solved a problem that affected scaling our systems, and instead of throwing the whole component away and starting over, we actually did the hard work of fixing and optimizing it: here's how".
To address your specific complaints.
>You are basically advocating for plugging 2 systems together, which out of the box don't provide elasticity.
I mean, again, without getting in-depth, I'm not advocating anything (I feel like I need a "This is not technical advice; if you have problems, consult a local technician" disclaimer :P).
However, storing lots of randomly-accessed messages and maintaining reasonable caches are not new problems. There are lots of potential solutions here.
And while Cassandra is not "new tech" in the JavaScript-framework-is-a-grandpa-after-6-months sense, it's certainly "new tech" in the "I'm the system of record for irreplaceable production data" sense.
Cassandra is also among the less-used of the NoSQL datastores, putting it in a minority of a minority for battle-tested deployments. You mention Netflix and some other big names using it in production as part of your belief that it's stable. This, I think, is part of the problem.
These big companies use these solutions because
a) they truly do have a scale that makes other solutions untenable (although probably not the case with Cassandra itself);
b) they can easily afford the complex computing and labor resources needed to run, repair, and maintain a large-scale cluster. Such burdens can be onerous on smaller companies (esp. labor cost);
c) when they need a patch or when something starts going awry, they can pay anyone who's willing to make the patch, their own team not excluded. Often the BDFLs/founders of such projects end up directly employed by the big companies that adopt their tech.
"Netflix [or any other big tech name] uses it so we know it's stable" is a giant fallacy, IMO.
None of this is to say that Cassandra isn't a good choice for this problem or any other specific problem, because again, as a drive-by third-party I don't know. But contrary to what the article states, it hardly seems like Cassandra was the only thing that could've possibly fit the bill. I bet it could be done well with a traditional SQL database (which, from the body of the post that identifies Discord as beginning on MongoDB and planning to move to something Cassandra-ish later on, it doesn't sound like was ever tried or considered).
It's kind of like reading an RFP that was written by a guy at a government agency who already knew they really wanted to hire their brother-in-law's firm. "Must have $EXTREMELY_SPECIFIC_FEATURE_X, because, well, we must! And it just so happens that this specific formulation can only be provided by $BIL_FIRM. What d'ya know."
>We care about seamless elasticity for our services, which Redis doesn't provide out of the box except with Redis Cluster, which does not seem to be widely adopted and forces routing onto clients.
First, you just admitted that Redis actually does have the feature you're saying it doesn't have. "Redis Cluster" and "Redis" are the same thing: "Redis Cluster" is part of normal Redis and, afaik, while it requires additional configuration, it will automatically shard.
In any case, while I have no numbers, I would wager that Redis Cluster is more widely used than Cassandra.
Cassandra was literally designed for this class of problem, and redis wasn't, isn't, and never will be.
The bucketing they're doing is within a partition - they still get all data in a logical cluster, which gives them transparent linear scalability by adding nodes without having to reshard, and they get fault tolerance/HA, reasonable fsync behavior, and even cross-wan replication - things you'd never get with redis.
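The bucketing scheme the Discord post describes uses a compound partition key of (channel_id, bucket), where the bucket is a fixed time window derived from the message snowflake ID. A hedged sketch (the Discord snowflake epoch is real; the 10-day window follows the post, but treat the details as illustrative):

```python
# Sketch of time-window bucketing in a compound partition key, roughly
# as described in the Discord post: all messages in a channel within one
# 10-day window share a partition, which keeps partitions bounded while
# range scans over recent history stay contiguous.

DISCORD_EPOCH = 1_420_070_400_000      # ms since Unix epoch (Jan 1, 2015)
BUCKET_MS = 10 * 24 * 60 * 60 * 1000   # 10-day windows

def snowflake_timestamp_ms(snowflake: int) -> int:
    """A snowflake carries its creation time in the top bits (>> 22)."""
    return (snowflake >> 22) + DISCORD_EPOCH

def bucket_for(snowflake: int) -> int:
    return (snowflake_timestamp_ms(snowflake) - DISCORD_EPOCH) // BUCKET_MS

def partition_key(channel_id: int, message_id: int) -> tuple[int, int]:
    # The cluster hashes this whole tuple to place the partition on a node;
    # within the partition, messages are clustered by message_id.
    return (channel_id, bucket_for(message_id))
```

Because the bucket is computable from the message ID alone, a reader paging backwards through history knows exactly which partition to ask for next, with no lookup table and no resharding when nodes are added.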
Cassandra has a snapshot command that creates a directory of symlinks to the files that hold data (this is safe because Cassandra's data files are immutable). Then you just upload them to your backup storage. This is obviously for catastrophic recovery scenarios.
Normally, though, since the data is replicated across 3 nodes, you can lose a node completely and rebuild it from the other nodes.
Cassandra natively supports multi-cluster replication - so you can run an entirely separate cluster that also has a copy of the entire dataset (which itself has configurable replication within a cluster) which can be used as an online fully-active backup.
We run 3 geo-distributed clusters with no offline backups because of this.
I love Discord and use it on a daily basis, one of our main concern with my gaming group is the voice latency compared to TS, Mumble or Ventrilo but this is mainly due to the inability to host your own server.
One of the big missing features we would like to have in Discord is the ability to assign a special permission to our group leaders so they can communicate over voice chat with other group leaders in other channels (global voice chat).
When we play PVP MMO's and have 40+ users all in the same channel calling shots its impossible to coordinate properly.
What we normally do is split the group in 4, so 10 players in 4 different channels, and each group leader calls shots independently BUT can also communicate via voice chat with the other group leaders. Basically there's a global voice chat for group leaders that no one else can hear but them.
Discord is great but I have intermittent performance issues with it that make it almost unusable in comparison to Slack which never has any noticeable latency.
I can't really give you "more details," I was trying to use Discord with some people and every couple minutes or so the chat would freeze up and then flood with messages. It wasn't just me, it was everyone in the room's Discord doing it, and it really doesn't seem like a client-side bug.
This may or may not be your issue. But I found that the Windows Defender system would start going nuts on files related to Discord. This would cause all sorts of problems (though most of the files involved seemed to be more related to their auto-update system, which shouldn't have impacted the running process... or so I would have thought!).
This is pretty good to hear honestly. Competition is always good for the consumer. This rivalry between Discord and Slack will only make things better for everyone.
Cassandra uses consistent hashing. A partition is a segment of data identified by the partition key to determine which node in the consistent hash ring owns that data.
You can't break partitions down any further, because a partition is just the smallest cohesive set of data owned by a hash key; instead of letting them get huge, it's advisable to model your data into more, smaller partitions.
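A minimal consistent-hash ring illustrates the mechanism described above: each node owns a token, and a partition key hashes to the first node whose token is at or after the key's hash, wrapping around the ring. (This is a sketch of the general technique, not Cassandra's actual partitioner.)

```python
import bisect
import hashlib

class Ring:
    """Toy consistent-hash ring: each node owns a token range."""

    def __init__(self, nodes):
        # One token per node for simplicity; real systems use many
        # virtual nodes per physical node to smooth out ownership.
        self.tokens = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def owner(self, partition_key: str) -> str:
        h = self._hash(partition_key)
        # First token at or after the key's hash, wrapping past the end.
        i = bisect.bisect_left(self.tokens, (h, ""))
        return self.tokens[i % len(self.tokens)][1]

ring = Ring(["node-a", "node-b", "node-c"])
# The same partition key always routes to the same node:
assert ring.owner("channel:42") == ring.owner("channel:42")
```

The payoff is that adding a node only moves the keys in one token range rather than rehashing everything, which is what makes "just add nodes" elasticity possible.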
Our realtime messaging is done over Websockets using either JSON (for web/non-native) or ETF (http://erlang.org/doc/apps/erts/erl_ext_dist.html). Almost all user-based actions are sent over our HTTP API using JSON.
Since Cassandra is eventually consistent, how do clients get a consistent sequence of messages?
Do you actually use Cassandra range queries to poll for new messages, or do clients use some kind of queue to get notified?
Say messages A, B, C are created in that sequence. But isn't it then possible that a client asking for new messages only gets A and C, and B only shows up a few milliseconds later, which would be missed unless the client actually asked for "everything since before A"? Or is that not possible?
We have a real time distribution system written in Erlang/Elixir that keeps the clients in sync. We do not poll for messages. That wouldn't work at our scale :)
Wow! This is an incredible article. I do research and development for systems like this at GUN, and this article nails a lot of important pieces, particularly their ability to jump to an old message quickly.
We built a prototype of a similar system that handled 100M+ messages a day for about $10, 2 minute screen cast here: https://www.youtube.com/watch?v=x_WqBuEA7s8 . However, this was without FTS or Mentions tagging, so I want to explore some thoughts here:
1. The bucketing approach is what we did as well; it is quite effective. However, a warning to outsiders: this is only effective for append-only data (like chat apps, twitter, etc.) and not good for data that gets a lot of recurring updates.
2. The more indices you add, the more expensive it gets. If you are getting a 100M+ messages a day, and you then want to update the FTS index and mentions index (user messages' index, hashtag index, etc.) you'll be doing significantly more writes. And you'll notice that those writes are updates to an index - this is the gotcha and will increase your cost.
3. Our system by default backs up / replicates to S3, which is something they mention they want to perhaps do in the future. This has huge perks to it, including price reductions, fault tolerance, and less DevOps - which is something they (and you) should value!
Their backend team is amazingly small. These guys and gals seem exceptionally talented and are making smart decisions. I'm looking forward to the future post on FTS!
We make friends with owners of large public Discords and they had no problem with being included. The owner of that Discord even enjoyed reading the blog post.
You can also search past messages, and the blog post ends by saying the follow-up will be about search. Search is not rolled out to everyone yet, but will be next week.
Thanks for sharing. I guess I'm not quite in the target market. It seems like a very interesting evolution beyond prior closed chat systems like AIM, ICQ, etc.
They didn't do anything wrong. They just used a feature Discord provided. It was Discord who failed to anticipate a possible problem with the feature. It's not like they are blaming or publicly shaming someone.
You're storing messages; how are you guaranteeing the safety of those messages when it looks like one can just blast through your API calls to find messages on a server one isn't even a member of?
And a few additional seconds of research shows that they don't change the token unless you change your password, as opposed to having the token expire after every session.
The normal token system revokes tokens on password change; if you want per-login revocation and extra security, we offer MFA login, which has unique tokens per login. If security is of importance to you, then use MFA.