
Aurora is very cool but won't help you much after you vertically scale your master and still need more write capacity. With Cloud Spanner you get horizontal write scalability out of the box. Critical difference.

So if I'm understanding you, with Aurora all writes go to one master and you're constrained by the biggest instance AWS offers. Is that right?

Do you have a sense of what that limit is?

There's a pretty big price difference between Spanner and Aurora at the entry level so it's useful to explore this.

> Do you have a sense of what that limit is?

Per their pricing page[1] it looks like the largest instance available is a "db.r3.8xlarge", which is the RDS name for the "r3.8xlarge" instance type[2]: 32 vCPUs and 244 GiB of memory.

That's a hell of a lot of capacity to exhaust, especially if you're using read replicas to reduce it to only/mostly write workloads. Obviously it's possible to use more than this, but the "sheer scale" argument is a bit of a flat one.

[1] https://aws.amazon.com/rds/aurora/pricing/
[2] https://aws.amazon.com/ec2/instance-types/#r3

Wouldn't the write master be I/O-bound, rather than CPU- or memory-bound?

I disagree, the "sheer scale" argument is not flat. The fact that one can scale horizontally and the other can't is significant.

Let me present a quote to you: "640K ought to be enough for anybody."

You can disagree on that if you'd like, but note that I explicitly acknowledged the possibility of exceeding these limits. In my opinion, for most cases/workloads, it's highly unlikely that you will and designing for that from the outset is a waste of time and resources.

Yes, Aurora has a single write master, though it does have automatic write failover -- i.e. if the Aurora primary dies, one of your read replicas is promoted to the primary and reads/writes are directed to the new instance. That does constrain your primary's capabilities to the largest instance size (currently a db.r3.8xlarge).

I don't have a good idea what the upper limit is for an Aurora database setup.

How does Aurora know that the primary is dead? Automatic failover is problematic in a distributed system.

AWS uses heartbeats to detect liveness. If x heartbeats fail, the failover procedure is started. Generally that takes between 10 seconds and 5 minutes. In practice (for me) the failover has taken less than 15 seconds.
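
To make the "x heartbeats fail" rule concrete, here is a minimal sketch of a missed-heartbeat counter. It is not AWS's implementation: the class name, the threshold of 3, and the choice to count only consecutive misses are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical detector: flags the primary after `max_missed` misses in a row."""

    max_missed: int = 3  # assumed value for the "x" in the comment above
    missed: int = 0

    def record(self, heartbeat_ok: bool) -> bool:
        """Record one heartbeat result; return True when failover should start."""
        if heartbeat_ok:
            self.missed = 0  # any successful heartbeat resets the streak
        else:
            self.missed += 1
        return self.missed >= self.max_missed


monitor = HeartbeatMonitor(max_missed=3)
# One recovery in the middle, then three misses in a row.
observed = [True, False, False, True, False, False, False]
results = [monitor.record(ok) for ok in observed]
```

With a counter like this, detection time is roughly the heartbeat interval times the threshold, which is why the quoted range can be so wide.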

My concern was more around split brain. If you fail over while the write master is simply unreachable, pain results.
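
One generic way to defuse split brain when storage is shared is to fence the old primary with an epoch number. The sketch below shows that general technique only; nothing in this thread confirms it is how Aurora works, and `FencedStorage`, `promote`, and `write` are hypothetical names.

```python
class FencedStorage:
    """Toy shared storage that rejects writes carrying a stale epoch."""

    def __init__(self):
        self.epoch = 0
        self.data = {}

    def promote(self):
        """Promote a new primary: bump the epoch and hand it to that primary."""
        self.epoch += 1
        return self.epoch

    def write(self, epoch, key, value):
        """Accept the write only if it comes from the current primary."""
        if epoch != self.epoch:
            return False  # stale primary: fenced off
        self.data[key] = value
        return True


storage = FencedStorage()
old_primary = storage.promote()  # first primary holds epoch 1
new_primary = storage.promote()  # failover: replacement holds epoch 2

# The old primary was only unreachable, not dead, and tries to keep writing.
stale_ok = storage.write(old_primary, "k", "from-old")
fresh_ok = storage.write(new_primary, "k", "from-new")
```

The point of the design is that correctness no longer depends on the failure detector being right: a wrongly declared-dead primary can still run, but its writes are refused.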

Aurora's read replicas share the underlying storage that the primary uses, so AWS claims that there's no data loss on failover. They also claim -- and I've never heard anyone say they were wrong -- that Aurora failovers take less than a minute. So the pain should be limited to under a minute of lost writes, which most applications can handle (with an error). It can still be painful depending on the application.

See here for more info: https://aws.amazon.com/rds/aurora/faqs/#high-availability-an...
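
As for handling the failover window "with an error" on the application side, a common pattern is to retry failed writes with exponential backoff. This is a minimal sketch under assumptions: `TransientWriteError`, `write_with_retry`, and the delay values are made up for illustration, and a real client would catch whatever exception its database driver raises.

```python
import time


class TransientWriteError(Exception):
    """Hypothetical error raised while no primary is accepting writes."""


def write_with_retry(write, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call write() until it succeeds, doubling the wait between tries."""
    for attempt in range(attempts):
        try:
            return write()
        except TransientWriteError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Simulated primary that rejects the first two writes, then recovers.
calls = {"n": 0}


def flaky_write():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TransientWriteError("primary unavailable")
    return "committed"


delays = []  # record the requested delays instead of actually sleeping
result = write_with_retry(flaky_write, sleep=delays.append)
```

The retry budget should cover the failover time you actually observe; with the defaults above the waits add up to 7.5 seconds, so a failover near the one-minute mark would need more attempts or a larger base delay.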

Yeah, the latency on that failover isn't specified.

Do you mean the amount of time it takes to initiate a failover or the amount of time for a failover to complete?

For the former, I don't think they specify beyond "automatic".

For the latter, "service is typically restored in less than 120 seconds, and often less than 60 seconds": http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora...

That's a pretty good cutover, but as you say, they should also include the time needed to detect a failure and initiate the transition.

Amazon provides a testing methodology here: https://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora... which might be useful to explore when benchmarking the two services against each other.
