But if you've got 5 TB of data that needs to be on an SSD, then please tell me how I can get that into a single physical database.
Now you don't have to shard. More info on how we accomplish distributed transactions. https://fauna.com/blog/distributed-consistency-at-scale-span...
Partitioning the dataset isn't really novel these days (Cassandra, Riak, Mongo et al do the same of course), but what is a significant difference is that both Spanner and FaunaDB implement ACID transactions distributed across partitions. It no longer matters for application correctness what partition key you choose if you can involve any arbitrary set of records in a single transaction.
> Whenever I've heard the term "sharding" it's usually in reference to the application-level sharding described in the article.
I wanted to drop a quick clarification note here. In the article, I used the term "sharding" to refer to both application- and database-level sharding.
For anyone that's looking at sharding as an option for scaling, we're always happy to chat and help point you in the right direction. My email's ozgun @ citusdata.com
If you're looking for databases that come with built-in sharding, I'd definitely check out Citus (then again, I'm biased): https://www.citusdata.com/
if all you need is a lot of data in a single database, there's basically nothing except for money between you and your goal. JBODs full of SSDs coming into a single machine via SAS will get you into petabytes, just with commodity hardware you can order from amazon.
i'd expect IBM could sell you a mainframe that'll do it for whatever capacity you care to name.
if you're a reasonable person, you buy 6 1TB samsung SSDs and stuff them into a single 2U case and you're done.
Let's not pretend there is anything reasonable in this setup.
Some of us do need to shard for sure though (I have multi petabyte data sets).
Also, operations on such a huge data set can be really painful. Think about how to back up a DB like that safely, or how to update the engine.
Some slides (a little old, from 2014) about a huge postgres instance serving as the backend for leboncoin.fr (the main classified-ads website in France).
Basically, they bought the best hardware money could buy at the time to scale vertically; in the end they ran into some issues and started thinking about sharding this huge DB.
I have a workload that runs close to 1.2 million TPS for hours at a time and needs less than 100 millisecond response times at the 99th percentile. That uses more than one box and sits (replicated) in RAM.
However, 5TB of data really _isn't_ that much on modern SSDs. You can fit a sizable chunk of that in RAM on a decent server, so you probably _don't_ need more than one box.
"I have 5TB of data that needs to sit on an SSD" is, to be honest, a really poor performance metric. If you are genuinely speccing out hardware and a database, a better statement would be:
"I have 5TB of relational data, with a pareto distribution for access, at a peak of 100K TPS". Then we can start talking about what solves the problem.
Perhaps the title is click bait, but at the time I was meeting with a lot of users looking for someone else's problems.
5TB could still easily be single server territory. It depends more on the queries.
My point is just that some workloads are better solved with (some) vertical scaling first.
or did you mean just use a large capacity RAID setup? that will probably work fine for a lot of situations, but it can be expensive and introduce more latency for certain types of operations (though that might not matter, depending on context).
Drive capacity in a server is not limited to the size of a single drive. You can build a raid array any size you like by simply adding more drives.
If you have any questions that aren't covered in the post, happy to answer them here!
For me a major question I have as I consider sharding is what my application code will look like. Let's say I have a query like:
'select products.name from vendor inner join products on vendor.id = products.vendor where vendor.location = "USA"'
If I shard such that there are many products tables (1 per vendor), what would my query look like?
If that's too much work, then an easy preliminary step is to add the abstraction layer in your application code. That gets you most of the benefits of a proxy for the purpose of having clean application logic, and makes it easy to switch over later, but is less powerful and feature complete.
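A minimal sketch of what that application-level abstraction layer might look like, assuming hash-based routing on a tenant id. All names here are illustrative, and sqlite3 in-memory databases stand in for real shard connections:

```python
import sqlite3

# Four in-memory databases stand in for four physical shards.
SHARDS = [sqlite3.connect(":memory:") for _ in range(4)]
for shard in SHARDS:
    shard.execute("CREATE TABLE events (tenant_id INTEGER, payload TEXT)")

def shard_for(tenant_id: int) -> sqlite3.Connection:
    """Route every query for a tenant to the same shard (simple modulo hash)."""
    return SHARDS[tenant_id % len(SHARDS)]

# All data access goes through these helpers, so swapping in a proxy
# (or a distributed database) later only touches this one module.
def insert_event(tenant_id: int, payload: str) -> None:
    shard_for(tenant_id).execute(
        "INSERT INTO events VALUES (?, ?)", (tenant_id, payload))

def events_for(tenant_id: int) -> list:
    return shard_for(tenant_id).execute(
        "SELECT payload FROM events WHERE tenant_id = ?",
        (tenant_id,)).fetchall()

insert_event(7, "signup")
print(events_for(7))  # [('signup',)]
```

Naive modulo hashing makes rebalancing painful later (adding a shard remaps most tenants), which is one reason a proxy or built-in solution ends up being more powerful.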
If you use Citus, you don't have to make any changes in your application. You just need to remodel your data and define your tables' sharding column(s). Citus will take care of the rest. 
In other words, your app thinks it's talking to Postgres. Behind the covers, Citus shards the tables, routes and parallelizes queries. Citus also provides transactions, joins, and foreign keys in a distributed environment.
Almost. Over the past two years, we've been adding features to make app integration seamless. With our upcoming release, we'll get there: https://github.com/citusdata/citus/issues/595
If I understood your example query, your application serves vendors and each vendor has different products. Is that correct?
You can approach this sharding question in one of two ways.
1. Merge different product tables into one large product table and add a vendor column
2. Model product tables as "reference tables". This will replicate the product tables to all nodes in the cluster
Without knowing more about your application / table schemas, I'd recommend the first approach. I'd also be happy to chat more if you drop us a line.
To me it read like just a basic introductory post to a longer series.
How is it a benefit that you are no longer able to join data in your separate tables? Is this sentence a mistake?
How long do these sharding projects usually take? Do you know of any posts that break down the steps in more detail?
A good way to tell is by looking at your database schema. If you have a dozen tables, you'll likely migrate with about a week's effort. If your database has 250+ tables, then it will take about eight weeks.
When you're looking to shard your B2B database, you usually need to take the following steps:
1. Find tables that don't have a customer / tenant column, and add that column. Change primary and foreign key definitions to include this column. (You'll have a few tables that can't have a customer column, and these will be reference tables)
2. Backfill data into tables that didn't have customer_id / tenant_id
3. Change your application to talk to this new model. For Rails/Django, we have libraries available that make the app changes simpler (100-150 lines). For example: https://github.com/citusdata/activerecord-multi-tenant
4. Migrate your data over to a distributed database. Fortunately, online data migrations are starting to become possible with logical decoding in Postgres.
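Steps 1 and 2 can be sketched like this, under assumptions: sqlite3 stands in for Postgres, and the orders/order_items schema is hypothetical (orders already carries customer_id, order_items does not). Note that the primary/foreign key redefinition from step 1 needs real ALTER TABLE support that sqlite lacks, so only the column add and backfill are shown:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (10, 1), (11, 2);
    INSERT INTO order_items VALUES (100, 10), (101, 11);
""")

# Step 1: add the missing tenant column.
con.execute("ALTER TABLE order_items ADD COLUMN customer_id INTEGER")

# Step 2: backfill it by walking the foreign key to the parent table.
con.execute("""
    UPDATE order_items
    SET customer_id = (SELECT customer_id FROM orders
                       WHERE orders.id = order_items.order_id)
""")
print(con.execute(
    "SELECT id, customer_id FROM order_items ORDER BY id").fetchall())
# [(100, 1), (101, 2)]
```

On a large production table you'd backfill in batches rather than one UPDATE, to avoid long locks.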
If you have a B2C app, these estimates and steps will be different. In particular, you'll need to figure out how many dimensions (columns) are central to your application. From there on, you'll need to separate out the data and shard each data group separately.
There are applications that
* are mature and complex
* have 100s of tables
* serve millions of users
* have to be broken into multiple micro-services
* face developer resource constraints
You've also ignored some of the complexities, such as resharding (moving data between shards), which may significantly add to the cost of the project.
Having several shards can lower the availability of your application if it cannot handle the absence of a shard.
For example if you have 99.9% availability on your individual DBs, and if you split it up into 10 shards, availability will drop to 99% (8 hours VS 3 days of downtime a year).
To handle that, you need to add replication and automatic fail-overs, adding even more complexity.
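The availability math above, as a quick sketch, assuming independent shard failures and an application that needs every shard up to serve requests:

```python
def overall_availability(per_shard: float, shards: int) -> float:
    """Probability that all shards are up at once (independent failures)."""
    return per_shard ** shards

single = 0.999                        # 99.9% per individual DB
sharded = overall_availability(single, 10)

HOURS_PER_YEAR = 24 * 365
print(f"{sharded:.4f}")                                  # ~0.9900, i.e. ~99%
print(f"{(1 - single) * HOURS_PER_YEAR:.1f} h/yr down")  # ~8.8 h unsharded
print(f"{(1 - sharded) * HOURS_PER_YEAR:.1f} h/yr down") # ~87.2 h (~3.6 days)
```

This is exactly why the parent's fix (replication plus automatic failover per shard) is needed: it pushes each shard's availability high enough that the product across shards stays acceptable.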
* Citus and other technologies can now provide features that do a lot of the heavy lifting. Some examples are resharding, shard rebalancing, and the high availability features mentioned below.
* My estimates are for B2B (multi-tenant) apps. For those apps, we found that the steps you need to take in re-modeling your data and changing your app are fairly similar. At Citus, we used to shy away when we saw 200-300 tables. These days, complex apps and schemas have become commonplace.
* We saw dozens of complex B2B databases migrate in similar time frames. Yes, some took longer - I'm in the tech business and always an optimist. :)
I also don't want to generalize without knowing more about your setup. If you drop me a line at ozgun @ citusdata.com, happy to chat more!
We consider our app to be maturing if not mature. It is certainly complex - we integrate with dozens of partners and external APIs. We have 80 tables and 300k LoC of Rails code which runs on several TB of PostgreSQL data. We have not broken our app into multiple micro-services. Like everybody, we always feel that our developer resources are constrained.
Our data model is very similar to the CRM example in Ozgun's article: _mostly_ we have a master customer table and a wide halo of tables which are directly or transitively associated with customers. We called this the "company sharding domain". Since we allow one user to be associated with multiple accounts, we shard our user table independently: there is a smaller halo of tables in the "user sharding domain". And we have a handful of global tables for content and feature configuration in the "unsharded domain".
We kicked off our migration project from unsharded Postgres to sharded CitusCloud in early Q4 2016. We had one dev work on it solid for one quarter updating our code to be shard-ready. Then another 1.5 devs joined for a month in the final build up to the physical migration. We migrated in late Feb 2017, then consumed perhaps another 3 dev-months on follow-up activities like stamping out some distributed queries which we had unwisely neglected and updating our internal process for our brave new world.
Two years ago at another company I was tech lead on a migration of two much larger Mongo collections to sharded Mongo. That was a larger PHP application which was organized partly into microservices. That effort had a similar cost: as I recall I spent one quarter and two other devs spent about one month, and there were some post-migration follow-up costs as well.
I am confident that real world applications of significant complexity can be migrated from unsharded to sharded storage with a level of effort less than 1 year. I admit that 8 weeks feels fast but I'm sure I could have done it if we had been willing to tie up more devs.
Why were these efforts easier than you might expect? Because we didn't have to build the sharding solution itself - those came off the shelf from some great partners (shout outs to CitusData and mLabs). We just had to update our applications to be shard-smart and coordinate a sometimes complicated physical migration, derisking, and cutover process.
That said, I can imagine the work growing slowly but linearly in the number of tables, and quickly but linearly in the number of micro-services.
Definitely depends on the workload, but often the "micro service" approach (whether or not it's a true micro service in its own runtime) of sharding just one type of data/small set of related tables that you can shard by a primary key or user id or something seems like the only reasonable option for sharding. If your data is becoming unwieldy there's often a bottleneck data set that's bigger than everything else, so you don't necessarily have to shard everything all at once.