
Right on: I've used a single Postgres database on AWS to handle 1M+ concurrent users. If you're Google, sure, not gonna cut it, but for most people these things scale vertically a lot further than you'd expect (especially if, like me, you grew up in the pre-SSD days and couldn't get hundreds of gigs of RAM on a cloud instance).

Even when you do pass that point, you can often shard to achieve horizontal scalability to at least some degree, since the real heavy lifting is usually easy to break out on a per-user basis. Some apps won't permit that (if you've got cross-user joins then it's going to be a bit of a headache), but at that point you've at least earned the right to start building up a more complex stack and complicating your queries to let things grow horizontally.
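To make the per-user split concrete, here's a minimal sketch of application-level routing, assuming a fixed number of shards and an integer user id. The DSNs and the function names (`shard_for_user`, `dsn_for_user`, `shards_for_users`) are made up for illustration and don't come from any particular library.

```python
import hashlib

# Hypothetical shard list: placeholder DSNs, not real hosts.
SHARD_DSNS = [
    "postgresql://db-shard-0.example.internal/app",
    "postgresql://db-shard-1.example.internal/app",
    "postgresql://db-shard-2.example.internal/app",
    "postgresql://db-shard-3.example.internal/app",
]


def shard_for_user(user_id: int, num_shards: int = len(SHARD_DSNS)) -> int:
    """Map a user id to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() so the mapping is
    identical across processes and restarts.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def dsn_for_user(user_id: int) -> str:
    """Return the connection string for the shard that owns this user."""
    return SHARD_DSNS[shard_for_user(user_id)]


def shards_for_users(user_ids):
    """Group user ids by shard index.

    A cross-user query has to visit every shard that appears as a key
    here and merge the results in the application.
    """
    groups = {}
    for uid in user_ids:
        groups.setdefault(shard_for_user(uid), []).append(uid)
    return groups
```

Single-user queries only ever touch `dsn_for_user(user_id)`, which is the easy case. Anything spanning users goes through something like `shards_for_users` and fans out, which is where the headache starts. Note that with plain modulo hashing, changing the shard count remaps most users, so re-sharding needs its own migration plan.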

Horizontal scaling is a huge headache any way you cut it, and TBH going with something like Spanner is just as much of a headache, because you have to understand its limitations extremely well if you want it to scale. It doesn't magically make all your SQL infinitely scalable: things that are hard to shard are typically also hard to make fast on Spanner. What it's really good at is taking an app with huge traffic where a) all the hot queries would be easy to shard, b) you don't want the complexity of adding sharding logic (plus re-sharding, migration, failure handling, etc.), c) the tough-to-shard queries are infrequent enough that you don't really care if they're slow, and, I guess, d) you don't care that it's hella expensive compared to a normal Postgres or MySQL box. You still need to understand a lot more than when using a normal DB, but it can add a lot of value in those cases.




I can't even say whether or not Google benefits from Spanner, vs multiple Postgres DBs with application-level sharding. Reworking your systems to work with a horizontally-scaling DB is eerily similar to doing application-level sharding, and just because something is huge doesn't mean it's better with DB-level sharding.

The uniquely nice thing in Spanner is TrueTime, which enables the closest semblance of a multi-master DB by making tightly synchronized clocks (GPS receivers and atomic clocks) the ground truth (see the Two Generals' Problem). So you essentially don't have to worry about a regional failure causing unavailability (or inconsistency, if you choose that instead) for one DB, since those clocks are a lot more reliable than machines. But there are probably downsides.



