
Amazon Aurora Now Available - jeffbarr
https://aws.amazon.com/blogs/aws/now-available-amazon-aurora/
======
davidw
Interesting from a licensing point of view: the GPL does not require you to
redistribute changes that are 'internal' to your organization. You are not
redistributing the program itself in this case, you're just letting someone
access it.

For some people, this kind of end-run around the GPL is the poster child for
the AGPL:
[https://en.wikipedia.org/wiki/Affero_General_Public_License](https://en.wikipedia.org/wiki/Affero_General_Public_License)

~~~
jbandela1
Can't you get around the AGPL by just making it an _internal_ service on your
network? Basically it looks like this.

Internet --> Your API service --> AGPL program

Your API service uses the AGPL program to implement its logic. Since the FSF
holds that APIs cannot be copyrighted, Your API service implements the same API
as the AGPL program and uses a network connection to the AGPL program running
on another computer to implement it. Since the Internet user never connects
to the AGPL program, they are not entitled to the source code. You are only
required to release the source code to the users interacting with the AGPL
program over the network, so you release the source code to the
implementers of Your API service and call it a day.

Adding another layer of indirection solves a lot of problems in computer
science, even the GPL and AGPL.
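
Mechanically, the indirection layer is tiny. A minimal sketch in Python, just
to show the shape (the internal host, port, and handler are made up, and this
is not legal advice; whether the layer actually escapes the AGPL is exactly
the question):

    # Public-facing API service; Internet users only ever talk to this process.
    # It implements the same API by forwarding each request to a hypothetical
    # AGPL-licensed service reachable only on the internal network.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import urlopen

    INTERNAL_AGPL_BACKEND = "http://10.0.0.5:8080"  # made-up internal-only host

    class ApiProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            # The end user never connects to the AGPL program itself.
            with urlopen(INTERNAL_AGPL_BACKEND + self.path) as upstream:
                body = upstream.read()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8000), ApiProxy).serve_forever()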

~~~
ahachete
That is correct.

However, that only works if there really is an API service in the middle,
making your AGPL program internal. If you are exposing an AGPL program (let's
say a database) as-a-service, you cannot have an API service in between
(unless you were to re-implement the entire API yourself), and hence you are
exposing the AGPL database directly, in which case you are bound by the
license with respect to the users.

~~~
codebeaker
Thanks for posting this. We've had some arguments with our investors and their
lawyers recently about what the GPL and AGPL mean (specifically for Golang, in
our case) and how they relate to SaaS. One case in point: we have a cronspec
parser package licensed AGPL (what the hell!?) which doesn't even include a
web server or listener code of any kind… so are we in violation by using it to
parse cronspecs that our customers have entered into our SaaS API?!

Do you have any citations, references, or case law I could use as a starting
point to steal your arguments as my own?

~~~
belorn
The citations/case law to use are those that define the terms distribution,
adaptation, and derivative work.

With the GPL, anything which isn't client-side code should be fine, especially
since legal advice from the GPL's authors has said that SaaS and the GPL do
not put any additional requirements on the service provider.

The AGPL talks about using the work, and here the law, as I have read it, says
that you have to be a lawful owner in order to be permitted to copy the work
into RAM. In order to be a lawful owner of the copy, you have to be in
compliance with the license. How you do so is up to you, but the perceived
consensus seems to put that responsibility on the SaaS provider.

Since we are talking about a SaaS API, what counts as the program is sadly a
grey zone. For example, Linux-based operating systems commonly have a
command-line API, but they aren't a single program. Ask yourself (or the
lawyers) what a nontechnical layman would consider a single work and what they
would consider multiple separate programs working in unison. I suspect it
highly depends on what the API does, how data flows, and how the internal
source code is laid out.

~~~
dragonwriter
> With the GPL, anything which isn't client-side code should be fine,
> especially since legal advice from the GPL's authors has said that SaaS and
> the GPL do not put any additional requirements on the service provider.

Legal guidance from the GPL's authors is of somewhat limited utility; they
aren't _your_ lawyers, and for software where the FSF isn't the copyright
holder as well as the license author, it doesn't even have the utility of
being a documented representation from the copyright holder as to their intent
with the license they were offering.

They are also employed by an entity with a vested interest in promoting the
use of the GPL.

------
jpgvm
Mentioned this on Twitter to the Aurora program manager, but it would be
awesome to see a PostgreSQL-compatible frontend.

Still though, awesome achievement to build a competitive database engine. :)

~~~
olalonde
> but it would be awesome to see a PostgreSQL-compatible frontend

Why not just use PostgreSQL directly?

~~~
jordanthoms
Aurora has a bunch of nice features WRT automatic failover, redundancy, and
automatically scaling storage as you push more data into the DB.

~~~
josh2600
Does Aurora do anything other than detect a heartbeat failure, shoot the node
in the head and bring it up on another box WRT failover?

~~~
awgupta
That is the basic idea for such things, though distributed protocols become
complex when you have to consider split-brain, multiple concurrent failures,
etc., all of which can occur in large-scale events.

Aurora leverages Amazon RDS Multi-AZ in this area. Their implementation has
been battle-tested over many years and many, many DB instances.
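
The control loop itself is simple to sketch; all of the hard parts mentioned
above (split-brain, concurrent failures) hide inside the probe, fence, and
promote steps. An illustrative Python outline, not RDS internals (which
aren't public):

    # Heartbeat-failure pattern: declare the primary dead after several missed
    # probes, fence it, then promote the standby. The supplied callables are
    # placeholders for environment-specific behavior.
    import time
    from typing import Callable

    def monitor(probe: Callable[[], bool],
                fence: Callable[[], None],
                promote: Callable[[], None],
                interval: float = 1.0,
                missed_limit: int = 3) -> None:
        missed = 0
        while True:
            missed = 0 if probe() else missed + 1
            if missed >= missed_limit:
                # Fence ("shoot the node in the head") *before* promoting, so a
                # paused or partitioned primary can't return as a second writer.
                fence()
                promote()  # bring the database up on another box
                return
            time.sleep(interval)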

------
jMyles
I've lost track of the characteristics of the players in this space. Do
Rackspace or DigitalOcean have anything that even compares?

I've always used Rackspace as a sort of default. They seem to employ good
people, so they must be good - so the thinking goes.

But what's the real breakdown between these three? And any others I might not
know about?

~~~
aquadrop
AWS is a giant construction store like Home Depot, where you can find
everything. Rackspace is the local home improvement shop, with good service,
which was doing fine until Home Depot built its store nearby. Now people have
fewer and fewer reasons to visit it. Some folks still like it for old times'
sake, though. DigitalOcean is your buddy who works at "Hammer&Nails
Manufacturing Inc."; he can get you nice hammers and nails pretty cheap if you
ask him, but not much else.

That's how I see it :)

~~~
boulos
What do you do with your analogy now that Rackspace offers official support
for helping you with your AWS and Azure deployments? Are the general
contractors who started the local hardware store now also working a side job
at Home Depot? ;)

~~~
aquadrop
The local shop owner offers a consulting service: he goes with you to Home
Depot and helps you find the right things for your project (hoping that next
time you'll go to his shop, after you see how good he is) :)

------
yeskia
So keen for Australia to get a third availability zone so that we can use
this. We were lucky enough to be invited to the preview programme, but there
was no point in us hosting on another continent. Very exciting!

------
waivej
At that pricing, I'm curious if it makes sense to put blobs in the database
rather than S3. Any thoughts?

~~~
jewel
At $100/TB/mo? I don't think so. S3 is $30/TB/mo. Unless you have some
compelling reason to have the blobs in the database, like indexing them, you'd
be better off keeping them somewhere else.

I guess if you won't have many blobs, it may be worthwhile to save yourself
the engineering time of supporting both S3 and Aurora. In an ideal world,
there'd just be one permanent data store with a built-in caching layer, for
databases, blobs, and log files.
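
The back-of-envelope arithmetic, using the per-TB figures quoted above (prices
as of this thread; check current pricing before relying on them):

    # Extra cost of keeping blobs in Aurora instead of S3, per the
    # $100/TB/mo and $30/TB/mo figures quoted above.
    AURORA_PER_TB_MONTH = 100  # USD
    S3_PER_TB_MONTH = 30       # USD

    def monthly_premium(blob_tb: float) -> float:
        """Extra monthly cost of storing blob_tb terabytes in Aurora."""
        return blob_tb * (AURORA_PER_TB_MONTH - S3_PER_TB_MONTH)

    print(monthly_premium(5))  # 5 TB of blobs -> $350/mo extra, $4,200/yr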

~~~
toomuchtodo
> I guess if you won't have many blobs, it may be worthwhile to save yourself
> the engineering time of supporting both S3 and Aurora. In an ideal world,
> there'd just be one permanent data store with a built-in caching layer, for
> databases, blobs, and log files.

We're getting close to that. Write your data once to S3 for persistent
storage, then load it into Elasticsearch. Mostly there.
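
As a rough sketch of that pipeline (bucket, index, and endpoint names are made
up; assumes boto3 and the elasticsearch Python client, whose exact call
signatures vary by version):

    # "Write once to S3, load into Elasticsearch": S3 is the durable system of
    # record; the Elasticsearch copy is disposable and rebuildable by replay.
    import json
    import boto3
    from elasticsearch import Elasticsearch

    s3 = boto3.client("s3")
    es = Elasticsearch(["http://localhost:9200"])  # made-up endpoint

    def store_event(event_id: str, event: dict) -> None:
        s3.put_object(
            Bucket="my-event-archive",        # made-up bucket name
            Key="events/%s.json" % event_id,
            Body=json.dumps(event).encode(),
        )
        es.index(index="events", id=event_id, body=event)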

------
zaargy
No CloudFormation support yet? I don't see it in the docs.

~~~
kolev
I'm highly disappointed by Amazon for recommending CloudFormation while always
neglecting to maintain it. Aurora has been in preview for a while; adding
support in advance would have been a great thing and would have coordinated
the launches. As a person who fell into the trap of using the "best practice",
I'm always months if not years behind in my CloudFormation stack, as it takes
months for the CloudFormation engineers to add, let's say, a single attribute.

So, based on my experience, give it a quarter before you see it in
CloudFormation! I keep bugging Jeff Barr, but it seems that he doesn't have
much control over the process.
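
In the meantime, the same resources can at least be scripted directly against
the RDS API instead of waiting for CloudFormation. A rough boto3 sketch (all
identifiers and the password are placeholders, and you give up
CloudFormation's declarative rollback, so it's a stopgap, not a substitute):

    # Stopgap: create an Aurora cluster plus one instance via the RDS API
    # directly, since the CloudFormation resource types lag behind.
    import boto3

    rds = boto3.client("rds")

    rds.create_db_cluster(
        DBClusterIdentifier="my-aurora-cluster",  # placeholder
        Engine="aurora",
        MasterUsername="admin",                   # placeholder
        MasterUserPassword="change-me",           # placeholder
    )
    rds.create_db_instance(
        DBInstanceIdentifier="my-aurora-node-1",  # placeholder
        DBInstanceClass="db.r3.large",
        Engine="aurora",
        DBClusterIdentifier="my-aurora-cluster",
    )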

The whole development process at Amazon seems broken. Everything is developed
in a silo, and only once it's GA do other teams start integrating it, unless
the two products cannot operate in isolation.

P.S. I keep checking this URL [0] several times a day, and I curse several
times a day because it almost never changes!

[0]
[https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/ReleaseHistory.html](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/ReleaseHistory.html)

~~~
codeonfire
Amazon is involved in a lawsuit over CloudFormation and U.S. Patent No.
8,271,974. Perhaps they are hesitant to invest until that is cleared up.

~~~
kolev
They still update it, just very slowly, but I wasn't aware of this other
patent troll. I guess Terraform and Fugue are in danger as well.

------
wahnfrieden
> Due to Amazon Aurora’s unique storage architecture, replication lag is
> extremely low, typically between 10 ms and 20 ms.

I think this number was lower in the original announcement, but this is still
great.

In this range of latency, the most compelling thing relevant to my needs is
the possibility of using read replicas to serve read-only API calls, with
minimal risk of a write API call immediately followed by a read API call
serving stale data. It's possible to orchestrate a way to prevent that with
regular, latency-prone MySQL replication, but it carries tremendous complexity
depending on the application.

Would be very interested if anyone else has explored this idea further with
Aurora.
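
One way to sketch that orchestration: after a session writes, pin its reads to
the primary for a window comfortably above the replication lag (the 1 s window
and the session plumbing here are made up; Aurora's quoted lag is 10-20 ms):

    # Read-after-write routing: reads go to a replica unless this session
    # wrote very recently, in which case they go to the primary.
    import time

    PIN_WINDOW_SECONDS = 1.0  # arbitrary safety margin above replication lag
    _last_write = {}          # session id -> monotonic time of last write

    def record_write(session_id: str) -> None:
        _last_write[session_id] = time.monotonic()

    def endpoint_for_read(session_id: str, primary: str, replica: str) -> str:
        wrote_at = _last_write.get(session_id)
        if wrote_at is not None and time.monotonic() - wrote_at < PIN_WINDOW_SECONDS:
            return primary  # this session just wrote; avoid a stale replica read
        return replica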

------
anotherangrydev
>they verified that each Amazon Aurora instance is able to deliver on our
performance target of up to 100,000 writes and 500,000 reads per second

This bit caught my attention: does an "Amazon Aurora instance" mean one
computing instance? Or do they refer to something like your allocated share of
the overall Aurora platform? Because if they are able to achieve that
performance per machine, I'm truly amazed.

Their largest instance appears to be "32 vCPUs and 244GiB Memory"; that sounds
credible for sustaining that throughput, particularly if your whole data set
fits in RAM, but barely. Would be nice to see R/W performance on the smaller
instances.

~~~
jeffbarr
You can read the blog post that I wrote last year when we announced Aurora to
learn more about how it works:

[https://aws.amazon.com/blogs/aws/highly-scalable-mysql-compat-rds-db-engine/](https://aws.amazon.com/blogs/aws/highly-scalable-mysql-compat-rds-db-engine/)

~~~
o_____________o
Do you have any plans for Geo functionality? We're working on a massive
project that Aurora sounds perfect for, aside from the apparent lack of that
feature.

~~~
devNoise
It seems like MySQL 5.7 will start to have better GIS support, though that is
still a development release at this point. Hopefully, if Amazon keeps
maintaining compatibility, native GIS support will eventually arrive.

------
acconrad
Does anyone know how this benchmarks against Postgres and MySQL? Would be
curious to see how TPC-C runs on this platform, but I can't find any studies
from independent parties benchmarking Aurora.

~~~
boulos
It only just came out of preview, so I'd expect someone independent to test it
out in the coming weeks. Maybe / hopefully one of the Percona folks?

Edit: typo (out _of_ preview)

~~~
awgupta
For TPC-C-like benchmarks, you can run:

1) CloudHarmony: [https://github.com/cloudharmony/oltpbench](https://github.com/cloudharmony/oltpbench)

2) Percona: [https://code.launchpad.net/~perconadev/perconatools/tpcc-mysql](https://code.launchpad.net/~perconadev/perconatools/tpcc-mysql)

We've found it easier to load large datasets using CloudHarmony, but we have
run both.

I'd recommend reading through
[http://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf](http://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf).
It is oriented towards SysBench, but it will help you set up your clients with
enough network throughput to run a full test.

Generally, we find the performance comparison improves with large instances,
high-throughput workloads, or when the data set does not fit in RAM.

------
felixgallo
The AWS pages are very heavy on marketing and light on system characteristics.
It would be pretty nice to know if the characteristics of Aurora are different
from RDS; for example, RDS table size and index rebuilding were a big problem
for a client, and I suspect that Aurora would do better, but I have no reason
to believe it.

~~~
jdc0589
Aurora is not an RDS alternative. It is a new database engine available to use
on the RDS platform.

~~~
felixgallo
hopefully everyone else understood what I said.

~~~
awgupta
We've made a number of improvements relative to MySQL, for example with large
numbers of tables and the result set cache. There are some improvements on
large tables and schema changes, but quite a bit more to be done in those
areas. You can contact our PM team at aurora[dash]pm[at]amazon[dot]com to tell
us which issues matter most to you. Feedback is how we prioritize!

------
dylanz
We've been experiencing huge problems with our multi-AZ and encrypted MySQL
RDS instance this morning (it looks like a hardware issue). We're contacting
support, but are considering taking our entire application down and migrating
to Aurora. The timing on this is too ironic.

------
iofj
I wonder what this is. Did they create a proper multitenant version of MySQL,
or are they simply running MySQL without a container around it?

I'm guessing it's the latter (much easier) option, but... do you really get 5x
savings from simply not having a container?

~~~
mathnode
Amazon Aurora – New Cost-Effective MySQL-Compatible Database Engine for Amazon
RDS

* [https://aws.amazon.com/blogs/aws/highly-scalable-mysql-compat-rds-db-engine/](https://aws.amazon.com/blogs/aws/highly-scalable-mysql-compat-rds-db-engine/)

It's a MySQL-compatible engine for Amazon RDS.

Aurora Database Architecture by Amazon Web Services:

* [https://www.youtube.com/watch?v=-TbRxwcux3c&list=WL&index=5](https://www.youtube.com/watch?v=-TbRxwcux3c&list=WL&index=5)

------
elktea
Is this based on Galera?

~~~
falcolas
That's my question as well - and an important one. Using Galera cluster with
MySQL imposes several performance and usage constraints an end user needs to
be aware of.

And if it's not Galera, how did Amazon work around the constraints of multiple
writers in an ACID database, and what constraints does it impose?

~~~
mathnode
It's linked in the article; here are more links:

[https://news.ycombinator.com/item?id=9960276](https://news.ycombinator.com/item?id=9960276)

~~~
falcolas
It definitely looks like Galera (which is what powers both the MySQL and
MariaDB cluster implementations; it's a storage engine built atop InnoDB), but
it's hard to say without more information. They mention a quorum write with
automatic recovery across three nodes, but don't mention the method used:
two-phase commit, checking commits against pending transactions, etc.

It's a very complex thing to implement, and unless they have made leaps beyond
what Galera has done, for some workloads it will be fast, but for others it
will perform far worse than a standard MySQL instance.

Of course, I guess it could also be a cluster built upon NDB, but the lack of
memory constraints on the size of the data makes that less likely.

~~~
awgupta
Aurora isn't implemented based on either Galera or NDB.

~~~
falcolas
Since it sounds like you have the information on what it is based upon (if
only the principles which were used to address distributed ACID consistency),
it would be good to get this information disseminated. It's hard to trust that
it will "just work" when we have so many examples of distributed ACID not
working well.

~~~
awgupta
You can think of Aurora as a single-instance database where the lower quarter
is pushed down into a multi-tenant, scale-out storage system. Transactions,
locking, LSN generation, etc. all happen at the database node. We push log
records down to the storage tier, and Aurora storage takes responsibility for
generating data blocks from logs.

So, the ACI components of ACID are all done at the database tier using
(largely) traditional techniques. Durability is where we're using distributed
systems techniques around quorums, membership management, leases, etc., with
the important caveat that we have a head node generating LSNs, providing a
monotonic logical clock and avoiding those headaches.

Our physical read replicas receive redo log records, update cached entries,
and have read-only access to the underlying storage tier. The underlying
storage is log-structured with nondestructive writes, so we can access data
blocks older than what is current at the write master node. That's required if
a replica needs to read a slightly older version of a data block for
consistency reasons.

Make sense?
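
To make the durability piece concrete, here is an illustrative sketch of the
general quorum idea (simplified, with placeholder node counts, quorum size,
and transport; not our actual implementation):

    # The head node assigns monotonically increasing LSNs and ships each redo
    # log record to the storage nodes; a record is durable once a write quorum
    # of storage nodes has acknowledged it.
    import itertools
    from concurrent.futures import ThreadPoolExecutor

    STORAGE_NODES = ["s1", "s2", "s3", "s4", "s5", "s6"]  # illustrative
    WRITE_QUORUM = 4                                      # illustrative

    _lsn = itertools.count(1)  # head node owns the monotonic logical clock

    def send(node: str, lsn: int, record: bytes) -> bool:
        """Placeholder for shipping one log record to one storage node."""
        raise NotImplementedError

    def write_log_record(record: bytes) -> int:
        lsn = next(_lsn)
        with ThreadPoolExecutor(len(STORAGE_NODES)) as pool:
            acks = sum(pool.map(lambda node: send(node, lsn, record), STORAGE_NODES))
        if acks < WRITE_QUORUM:
            raise IOError("LSN %d: only %d acks" % (lsn, acks))
        return lsn  # durable: a write quorum of storage nodes has the record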

------
choppaface
Is 30GB of RAM the max for this service? That's rather paltry for very large
joins, which seem plausible given they claim support for up to 64TB of data.
Percona had recommended 144GB+, and moving to that from 64GB made our BI-type
queries an order of magnitude faster. Obviously YMMV, but in my experience one
needs more RAM for joins before more disk for data.

That said, the UI and system for managing replication look pretty nice. I've
never done MySQL admin directly, but I do appreciate how much of a pain point
this can be.

~~~
boulos
No; from the product page:

"You can use Amazon RDS to scale your Amazon Aurora database instance up to 32
vCPUs and 244GiB Memory. You can also add up to 15 Amazon Aurora Replicas
across three availability zones to further scale read capacity. Amazon Aurora
automatically grows storage as needed, from 10GB up to 64TB."

------
Mizza
Can anybody with any experience compare this vs RDS for me?

~~~
giaour
Aurora is an engine for RDS.

------
boulos
Offtopic a bit but I'm genuinely curious: Jeff, why did you post this at 10pm
Pacific?

~~~
kolev
That's pretty much his job and he's very passionate about it.

~~~
jeffbarr
You got it!

------
imaginenore
At these prices I would rather install MySQL on Hetzner myself at a fraction
of the cost.

~~~
notwedtm
Every time I see a comment like this, I get the feeling that the poster
doesn't understand how "the cloud" works.

Everybody knows that renting a dedicated server (or purchasing your own
servers) is generally a more cost-effective approach. Cloud computing is used
to accomplish different goals when used appropriately.

Sure, you could install MySQL on a box at Hetzner, but what happens when it
goes down? Do you feel like maintaining the box for security and updates? What
about log rotation? What if you need to migrate to a different geographical
region, or need to increase capacity?

Smaller sites might be okay with managing those things on their own, but as
you grow, being able to offload that work to Amazon (who tend to know what
they are doing more than the average Joe) is a benefit when you have other
business-logic issues to deal with.

So yes, naively getting a box at Hetzner or any other dedicated provider would
cost less upfront, but in the long run, the advantages aren't as clear cut.

