
Announcing RDS/Aurora Snapshot Export to S3 - nitesh_aws
https://aws.amazon.com/about-aws/whats-new/2020/01/announcing-amazon-relational-database-service-snapshot-export-to-s3/
======
whalesalad
This is a welcome addition. Would love to see a “restore to existing DB”
option. Sucks having to restore a backup to a new instance all the time.

~~~
otterley
(Disclaimer: I work for AWS.)

Can you tell us more about the use case? I'm not sure I understand the need
here, or how it might improve your work patterns.

Also, note that this is an export feature, not a backup/restore feature.
Aurora already has native backup/restore:
[https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Managing.Backups.html)

~~~
whalesalad
I am imagining a scenario where the instance is shut down and restored to a
point-in-time snapshot. The UI suggests this is possible based on the verbiage,
and this is an expectation of lots of other DB management tools.

At the moment, when you click 'Restore to point in time', it just takes you
through the new instance wizard with the snapshot being used as the initial
dataset. Screenshot:
[https://i.imgur.com/vBiqgJg.png](https://i.imgur.com/vBiqgJg.png) -- the
'restore to point in time' is extremely misleading.

My specific use case would be for PostgreSQL and/or MySQL; I do not use
Aurora, nor have I tried it.

I can't get the S3 export to work either. Screenshot:
[https://i.imgur.com/nabbsMr.png](https://i.imgur.com/nabbsMr.png)

I need to grab a single row out of a snapshot from our staging database. So
far I have restored the DB once and our QA team was able to reproduce a bug.
They began testing prior to me pulling the data out, so now I am re-cloning an
instance to fetch the unmolested data from that DB.

tl;dr - allow me to restore a snapshot like every other piece of software that
has a snapshotting component. The caveat is that this will likely need to be an
offline process, but as long as that is noted and clear, it is going to help
a LOT of people out.

~~~
otterley
> allow me to restore a snapshot like every other piece of software that has a
> snapshotting component. The caveat is that this will likely need to be an
> offline process, but as long as that is noted and clear, it is going to
> help a LOT of people out.

We have this in Aurora already. If you migrate to Aurora (which is
MySQL/Postgres compatible), you'll get this functionality out of the box.

~~~
whalesalad
Is there a 'clone this database, but as an aurora database, and make it a
replica that can eventually become standalone' feature?

Most of my RDS usage is for my clients, so it is not always feasible to up and
forklift them to a different database platform.

I would imagine though that AWS is not going to invest a lot of time and
energy into improving the developer ergonomics and UX of non-Aurora databases?

I can't think of anyone who would _choose_ Aurora right off the bat -- because
it's one click closer to complete vendor lock-in and away from OSS.

~~~
otterley
We have migration instructions in the Aurora documentation. See, for example:
[https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Migrating.html)
(MySQL) or
[https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Migrating.html)
(Postgres).

Again, both are compatible with the open-source versions, so you can export
your data if you need to via the standard tooling.

------
sandGorgon
Hmm... there's no restore? So this is like a one-way export.

Also, Glacier export would be nice for long-term cold storage of database
backups.

But a restore capability is essential.
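
(Not an official export target, just a workaround sketch: since these exports
land in an ordinary S3 bucket, an S3 lifecycle rule can already push them to
Glacier. Bucket name and prefix below are hypothetical.)

```shell
# Archive Parquet exports under exports/ to Glacier after 30 days
# via an S3 lifecycle rule. Bucket name and prefix are made up.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-db-exports \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-db-exports",
      "Filter": {"Prefix": "exports/"},
      "Status": "Enabled",
      "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
    }]
  }'
```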

~~~
zten
The snapshots already deliver your desired functionality. This dumps it as a
different format. In fact, it probably depends on restoring the snapshot to a
database in order to implement this format change.

I think this is a replacement for rolling your own database exports for
analytics applications with Sqoop or Spark.

~~~
jungturk
Exporting to parquet has existed for a bit for AWS RDS (via AWS Data Migration
Service), but this should make doing so more straightforward (since it doesn't
require managing any DMS compute).

AWS DMS also supports incrementals (using the change-data-capture features in
the DB).

~~~
zten
Hmm, I wonder why they didn't push people towards DMS instead? This S3 export
offering certainly commands a premium price for the privilege.

~~~
anbotero
I just wish DMS supported real-time data synchronization for utf8mb4 on
MySQL... That really destroyed my workflow with errors when I moved to utf8mb4
(a requirement), and I had to stop using DMS altogether. It's pretty much the
only thing I used from that service.

~~~
jungturk
There are a number of unfortunate shortcomings depending on what your DMS
source/target are. Another that bit us was (the lack of) support for JSON
types in MySQL sources.

Was there no option to add a transformation to your DMS routine to handle
that?

------
etaioinshrdlu
I use RDS Aurora, but I run a daily cron job that exports the entire thing
with mysqldump + gzip to S3.

I pass some flags to mysqldump to avoid locking the DB and otherwise
interfering with production. It also dumps from the reader node, not the
writer.
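
A hedged sketch of what such a cron job might look like (endpoint, user, and
bucket names are hypothetical; the flags assume InnoDB tables):

```shell
# Hypothetical crontab entry. --single-transaction gives a consistent,
# non-locking dump for InnoDB; the cluster-ro- endpoint targets Aurora's
# reader node. Inside cron, % must be escaped as \%.
0 3 * * * mysqldump \
  --host=my-cluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com \
  --user=backup --single-transaction --set-gtid-purged=OFF --all-databases \
  | gzip \
  | aws s3 cp - s3://my-backups/daily/dump-$(date +\%F).sql.gz
```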

I also clear some tables and restore it daily to a development DB.

Sadly this looks like it only supports Parquet, a rather unusual DB dump
format -- even if, I'm sure, it's way more efficient to process with modern
tools than a text SQL dump.

I just like open formats and interoperability, and AWS's RDS offerings are
always just slightly 'off'.

~~~
sudhirj
Parquet is an open format. It’s part of the Apache foundation.

Would have preferred CSV as well, though; it's easier to work with. There's
also a little-known feature called S3 Select that lets you run SQL on S3 files
and selectively retrieve data, which is what Parquet is designed for in the
first place.
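
A minimal sketch of what querying one of these Parquet exports with S3 Select
could look like (bucket, key, and column names are hypothetical; the helper
just builds the request kwargs, and the commented boto3 call shows how they'd
be used):

```python
def s3_select_parquet_params(bucket, key, sql):
    """Build kwargs for boto3's select_object_content against a Parquet object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        "InputSerialization": {"Parquet": {}},
        "OutputSerialization": {"CSV": {}},
    }

# With AWS credentials configured (names below are made up):
# import boto3
# s3 = boto3.client("s3")
# resp = s3.select_object_content(
#     **s3_select_parquet_params(
#         "my-exports", "snapshot/users/part-00000.parquet",
#         "SELECT s.id, s.email FROM S3Object s LIMIT 10"))
# data = b"".join(e["Records"]["Payload"] for e in resp["Payload"] if "Records" in e)
```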

~~~
etaioinshrdlu
Can you load that dump using any open source tools outside of AWS and recreate
the database identically? Because you can do that with mysqldump.

~~~
derision
I don't think this is meant for recreating the database, but rather for
exporting large data sets for analysis somewhere else. As far as tools go,
yes: Parquet can be parsed by any tool that knows the format. It's not too
uncommon.

------
nitesh_aws
Full disclosure - I work for AWS and my opinions are my own

~~~
saurik
For those of us who are interested, where might we find some of your hottest
opinions?

------
samokhvalov
Great news. But unfortunately, it is not the original data snapshot as the
title suggests; it's in Apache Parquet format. So this is useful for
analytical tasks only.

For operational tasks, still, the only option (besides RDS cloning) for
Postgres is pg_dump or pg_transport, losing physical layout.

------
scrollaway
Are there lightweight solutions for reading Parquet files in Python? Any time
I want to deal with Parquet in AWS Lambda I have to pull in the entire pandas
suite, which is a pain on Lambda.

~~~
cavisne
One interesting way is to use S3 Select, which can read Parquet; then you just
need a dependency on the AWS SDK.

------
nishantvyas
What happens to bandwidth saturation on the backup database/RDS host? Is it
unlimited? Capped? User-defined? I.e., would the backup impact ongoing
transactions/packet transfer?

~~~
Tobani
Snapshots would happen as normal in RDS. This is just the process of moving
them "offsite" to S3.

~~~
nishantvyas
got it. thanks.

------
giseir
Excuse my technical illiteracy, but would it make sense to export these
snapshots daily? Is there any way to consistently get the data from the
database into S3 every 24h?

~~~
giseir
Let’s say I have a database with 1TB of data. I export it daily to S3 with
this Snapshot export. Does it mean I will be adding 1TB every day to my S3
storage?

~~~
HatchedLake721
Yes

~~~
giseir
Thanks, now I understand. Then it makes sense to delete the earlier snapshot
and keep only the latest one, to avoid storing redundant data.

~~~
jungturk
You can also use the CDC features of AWS Data Migration Service to just get
incrementals (also in parquet if preferred) rather than full snapshots.

[https://aws.amazon.com/blogs/database/aws-dms-now-supports-n...](https://aws.amazon.com/blogs/database/aws-dms-now-supports-native-cdc-support/)

------
johnrob
Quick question (answer might be in docs somewhere): can you choose which
tables to export?

~~~
nitesh_aws
Yes, you can filter. More instructions in the docs.

------
cmclaughlin
Paying for the full snapshot and not just the filtered data is a bummer

------
jbverschoor
Nice! More database dumps exposed.

~~~
bdcravens
I imagine a significant number of companies rolled their own solutions to dump
to S3 already, so this is no less secure.

