Instagram: 13 bit shard ID, 51 bit "local" ID consisting of a 41 bit timestamp in milliseconds and a 10 bit sub-millisecond ID. So this scheme supports 1024 IDs per millisecond per shard for about 69 years (2^41 ms), and 8192 shards.
Pinterest: 16 bit shard ID, 10 bit type ID (?), 36 bit local ID, 2 bits reserved. This supports 68 billion objects per shard and 65K shards, but does not encode time, so you need another field / more storage for that. Also notable is the large 10 bit type ID field, which seems to be used for only a handful of values, leaving a large chunk of bits that don't change across IDs.
In short, Instagram's scheme is more efficient, largely because it leverages the timestamp in the ID instead of type information.
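A minimal sketch of composing and splitting such an ID in Python, assuming Instagram's published layout (41-bit millisecond timestamp in the high bits, then 13-bit shard ID, then 10-bit sequence); the epoch constant and function names here are illustrative:

    import time

    EPOCH_MS = 1293840000000  # illustrative custom epoch (Jan 2011)

    def make_id(shard_id, sequence, now_ms=None):
        if now_ms is None:
            now_ms = int(time.time() * 1000)
        ts = (now_ms - EPOCH_MS) & ((1 << 41) - 1)  # 41-bit timestamp
        return (ts << 23) | ((shard_id & 0x1FFF) << 10) | (sequence & 0x3FF)

    def split_id(pk):
        ts = pk >> 23                    # ms since EPOCH_MS
        shard_id = (pk >> 10) & 0x1FFF   # 13 bits
        sequence = pk & 0x3FF            # 10 bits
        return ts, shard_id, sequence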
And even if Postgres is slower, with the money you save in license costs you can buy a few extra beefy nodes.
I've seen a different approach where you keep a version number on the row, do your read, modify in memory on the app server, then do your write like this:
UPDATE db03429.pins SET blob='<modified blob>', version=54 WHERE local_id=7075733 AND version=53
The reason you'd do this is so you can't ever have the row locked for a long period of time. A lot of people don't think about database scalability, so _even if they know_ that the code they're writing runs while a transaction is held open, they don't care that the transaction is blocking everything else trying to read the row they're working on.
This can lead to row lock bloat over time, which can cause scalability / availability issues as app servers wait longer and longer to get read (or write) locks on all the rows they care about for their current request. This is mitigated a bit if you're requiring / encouraging people to read from slaves instead of master, though.
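A minimal sketch of that read-modify-write loop, using sqlite3 so it runs anywhere; the table and column names follow the UPDATE above, and the retry-on-conflict policy is an assumption:

    import sqlite3

    def update_blob(conn, local_id, modify):
        while True:
            blob, version = conn.execute(
                "SELECT blob, version FROM pins WHERE local_id = ?",
                (local_id,)).fetchone()
            new_blob = modify(blob)  # modify in app memory, no lock held
            cur = conn.execute(
                "UPDATE pins SET blob = ?, version = ? "
                "WHERE local_id = ? AND version = ?",
                (new_blob, version + 1, local_id, version))
            conn.commit()
            if cur.rowcount == 1:    # our version still matched
                return
            # a concurrent writer won the race; re-read and try again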
These can be worked through, with discipline. You probably only need a fully consistent view on a surprisingly small number of pages.
Generally you'll want to read from the master right after a user changes something (if you have a model where user clicks a button -> you go to a different page). User changes setting on a pin, clicks 'save', you render a new page showing their updated pin. This page view should probably come from the master, or else you risk the user's change not showing up, causing confusion.
Reads from slaves are fine, as long as you're not using something that was read out of a slave as an input to a database write somewhere else (which you shouldn't be doing anyway!). If you render a page that lets a user change their profile (say site.com/edit_profile), the user data can come from a slave, but if you take _all_ the field values and blindly write those into the master, that's where you run into "time travel" bugs. You just need to find out what the user changed and only make those changes in the master.
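A hedged sketch of that "write only what changed" rule; the function and table names are made up for illustration:

    def changed_fields(shown, submitted):
        """Diff what the user was shown (possibly from a slave) against the form post."""
        return {k: v for k, v in submitted.items() if shown.get(k) != v}

    def save_profile(master_conn, user_id, shown, submitted):
        changes = changed_fields(shown, submitted)
        if not changes:
            return
        # column names come from a whitelist, never from user input
        sets = ", ".join("%s = ?" % k for k in changes)
        master_conn.execute(
            "UPDATE users SET %s WHERE id = ?" % sets,
            list(changes.values()) + [user_id])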
"MySQL is mature, stable and it just works. Not only do we use it, but it’s also used by plenty of other companies pushing even bigger scale. MySQL supports our need for ordering data requests, selecting certain ranges of data and row-level transactions. It has a hell of a lot more features, but we don’t need or use them. But, MySQL is a single box solution, hence the need to shard our data. Here’s our solution:"
What about that paragraph is not true of Postgres also?
This schemaless JSON reminds me of this FriendFeed blog post from years ago:
With this specific workload, I think MySQL will work pretty well. Two features in particular: innodb clustered index and compression.
(I work on the MySQL team.)
"Aside: I still recommend startups avoid the fancy new stuff — try really hard to just use MySQL. Trust me. I have the scars to prove it."
In many ways, Postgres is cutting edge with its features and capabilities. I see lots of updates from Postgres that add new SQL features (JSON objects); conversely, I see lots of updates from MySQL that are about durability and scalability (e.g. Galera Cluster, Percona updates, etc.).
From the use case Pinterest described in the article, I'm not sure what the additional features of Postgres would actually buy them.
It is not stuck in the stone age, yet extremely reliable.
Having worked with a number of MySQL (main fork) and Postgres installations in the past 7 years, I observed a number of crashes and corruptions with MySQL (not counting the glory days of MyISAM) while having _none_ with Postgres.
I should confess that that experience made me very biased in favour of Postgres. My default thinking about Postgres now is: if it doesn't have something MySQL has, it is either for a reason or they are taking their time to do it the _right_ way.
>or use a solution that enables native JSON storage
This would help if you needed to select or join on the individual columns, but if Pinterest don't need to do that, then this falls under the "avoid the fancy new stuff" quote from the article.
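For what it's worth, this is the kind of per-field access native JSON storage buys you; a sketch against a hypothetical pins(data jsonb) table in Postgres, via psycopg2:

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()
    # ->> extracts a JSON field as text, so it can be filtered and indexed
    cur.execute("SELECT data->>'link' FROM pins WHERE data->>'user_id' = %s",
                ("12345",))
    print(cur.fetchall())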
In practice, how often do schema migrations take place though?
But I still wonder why some people store blobs in a database rather than simple files on a file-system!?
For "small" blobs, say <256K, your database is probably faster too. I can only recall one such study (and of only NTFS vs MSSQL), and NTFS was only advantageous at >1MB blobs: http://research.microsoft.com/pubs/64525/tr-2006-45.pdf
If, say, you have the database with the blobs in it, you can restore the db and everything is back up and running. If instead you need to also restore the individual files... well, not only do you have lots of individual disk writes, you also have to make sure you still have all the files.
The takeaway is that for content < 20K you should use SQLite, and for content < 100K you can still use it, possibly with some tuning.
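If you want to check the claim on your own hardware, here is a rough micro-benchmark sketch (sizes and counts are arbitrary; results depend heavily on the OS cache):

    import os, random, sqlite3, time

    N, SIZE = 1000, 20 * 1024                  # 1000 blobs of ~20K
    blobs = [os.urandom(SIZE) for _ in range(N)]

    conn = sqlite3.connect("blobs.db")
    conn.execute("CREATE TABLE IF NOT EXISTS b (id INTEGER PRIMARY KEY, data BLOB)")
    conn.executemany("INSERT OR REPLACE INTO b VALUES (?, ?)", enumerate(blobs))
    conn.commit()

    os.makedirs("blobdir", exist_ok=True)
    for i, blob in enumerate(blobs):
        with open("blobdir/%d" % i, "wb") as f:
            f.write(blob)

    order = random.sample(range(N), N)          # same random read order for both

    t0 = time.perf_counter()
    for i in order:
        conn.execute("SELECT data FROM b WHERE id = ?", (i,)).fetchone()
    db_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    for i in order:
        with open("blobdir/%d" % i, "rb") as f:
            f.read()
    fs_s = time.perf_counter() - t0

    print("sqlite: %.3fs  filesystem: %.3fs" % (db_s, fs_s))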
The first would be to create a mapping table as described. For a relation like "links to reddit", the cardinality is such that it would probably break their sharding scheme.
So the second approach is probably the one he mentions in the article: map reduce (more generally, separate computation). My guess is that for those sorts of "reports" they are using Hadoop. They could also be leveraging things like the HyperLogLog features of Redis.
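For the HyperLogLog option, a minimal sketch with redis-py (PFADD/PFCOUNT are the actual Redis commands; the key naming is made up):

    import redis

    r = redis.Redis()
    # add each pin that links to reddit; the HLL stays ~12KB regardless of count
    r.pfadd("uniq:links:reddit.com", "pin:7075733", "pin:7075734")
    print(r.pfcount("uniq:links:reddit.com"))  # approximate distinct count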
It is the responsible thing to do; otherwise other websites' bots can spoof you.
One good idea is to open source it, so if other people can take advantage of it, they will also help you maintain it and find bugs for it.
A question: if you started this now, would you consider using Postgres-XL? AFAIK it supports similar sharding, in a more transparent manner for the developers. Any thoughts on this?
Shard ID = (241294492511762325 >> 46) & 0xFFFF = 3429
Type ID = (241294492511762325 >> 36) & 0x3FF = 1
Local ID = (241294492511762325 >> 0) & 0xFFFFFFFFF = 7075733
241294492511762325 >> 46 means shift the binary form of the number 46 bits to the right, discarding the low 46 bits.
& 0xFFFF then does a bitwise AND, keeping only the low 16 bits of what remains.
If a GUID got you into a bank account, oops.
If it gets someone access to an already public picture... not a huge deal, right?
Every request has to convert the public, non-guessable GUID into the private ID that will be used in the lookup query.
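One way that conversion could be implemented (a sketch, not necessarily what anyone here does) is a mapping table keyed by the GUID:

    import sqlite3, uuid

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE public_ids (guid TEXT PRIMARY KEY, internal_id INTEGER)")

    def publish(internal_id):
        guid = str(uuid.uuid4())  # non-guessable public handle
        conn.execute("INSERT INTO public_ids VALUES (?, ?)", (guid, internal_id))
        return guid

    def resolve(guid):
        row = conn.execute("SELECT internal_id FROM public_ids WHERE guid = ?",
                           (guid,)).fetchone()
        return row[0] if row else None

    g = publish(241294492511762325)
    assert resolve(g) == 241294492511762325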
NoSQL really is better than SQL for scaling your databases. It is just sad there is no open source NoSQL db as good as Google Bigtable.
Downvoted for what is essentially your opinion presented as a fact. If you want to make such a statement, you'll need to give more context - either in the form of proof, experience, or "other". Just something would suffice really - anything. You'll find a lot of people agreeing with you if you add such context.
Without context, it's nothing more than "Vanilla is better than chocolate - hands down."
sorry, that was bugging me...
With Cassandra, you can set replication factors, speed up the writes, and automatically shard the data without having to manage your own "mapping tables".
What he lists as his transaction use case (update if unmodified) I do right now with Cassandra and an IF clause and a timestamp...
i.e. update foo set x = 'y', last_modified_timestamp = 456789 where id = 42 if last_modified_timestamp = 12345
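Running that from the DataStax Python driver looks roughly like this (keyspace, table, and key are hypothetical; was_applied is the driver's accessor for the conditional-update result):

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("app")
    result = session.execute(
        "UPDATE foo SET x = %s, last_modified_timestamp = %s "
        "WHERE id = %s IF last_modified_timestamp = %s",
        ("y", 456789, 42, 12345))
    if not result.was_applied:
        # another writer got there first: re-read the row and retry
        pass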
Cassandra was mature by late 2011 and we were using it in production back then (billions of records).
In any case, I would have a knee-jerk reaction not to trust any database system that hit 1.0 a month before I wanted to start using it, especially for mission critical core data.
>Support asking for N number of Pins in a board in a deterministic order (such as reverse creation time or user specified ordering). Same for Pinner to likes, Pinner to Pins, etc.
This can be a pain to model in Cassandra. It would require denormalizing for every key you wanted to order on, and it generally makes updating data a pain.
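For the curious, the denormalization usually looks something like this (table and column names invented for the example): one table per query pattern, with the desired order baked into the clustering key:

    from cassandra.cluster import Cluster

    session = Cluster(["127.0.0.1"]).connect("app")
    session.execute("""
        CREATE TABLE IF NOT EXISTS pins_by_board (
            board_id   bigint,
            created_at timeuuid,
            pin_id     bigint,
            PRIMARY KEY (board_id, created_at)
        ) WITH CLUSTERING ORDER BY (created_at DESC)
    """)
    # "N pins in a board in reverse creation time" becomes one partition read:
    rows = session.execute(
        "SELECT pin_id FROM pins_by_board WHERE board_id = %s LIMIT 25", (42,))

And every write of a pin now has to hit this table too, which is the update pain the parent mentions.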