

Ask HN: Explain AWS - gsmaverick

Ok, so I might be completely missing something here, but bear with me. I want to use AWS to build my service, but I am a bit perplexed when it comes to storage. If I have MySQL running, where do the database files get stored? Does each EC2 instance come with a certain amount of storage? I would be glad if someone explained this for me :)
======
delano
Every EC2 instance has about 150GB of transient disk space (the exact amount
depends on the instance type). It works like a regular disk, but once you shut
down the instance (or in the rare event the instance fails) the disk is gone
forever. To solve this problem, Amazon provides EBS volumes ("Elastic Block
Store"). These are like raw disks that you can create on the fly. You create
one, attach it to one of your instances, give it a filesystem, and then mount
it.
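
The last two steps (give it a filesystem, mount it) might look roughly like
this once the volume is attached. This is only a sketch: the device name
`/dev/sdf`, the mount point `/data`, the choice of ext4, and the helper name
`format_and_mount_commands` are all assumptions for illustration. It builds the
commands without running them, so you can review them before doing anything as
root.

```python
import shlex


def format_and_mount_commands(device="/dev/sdf", mount_point="/data", fstype="ext4"):
    """Return the commands that format a fresh volume and mount it.

    Nothing is executed here. mkfs wipes whatever is on the device, so the
    first command is only appropriate for a brand-new, empty volume.
    """
    return [
        ["mkfs", "-t", fstype, device],    # create the filesystem (new volumes only)
        ["mkdir", "-p", mount_point],      # make sure the mount point exists
        ["mount", device, mount_point],    # mount the volume
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for cmd in format_and_mount_commands():
        print(shlex.join(cmd))
```

For a volume created from a snapshot you would skip the `mkfs` step, since the
filesystem and its data are already there.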

You can run MySQL off of the transient disk ("/mnt") but you should consider
running it off of an EBS volume. One reason is that you won't lose the disk if
something happens to the instance. Another is that you can use the snapshots
feature to create instant backups of your database. Later on you can create
new EBS volumes from a snapshot (which is really cool for building a staging
environment with production data).
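
As a rough sketch of that snapshot-then-clone flow, here is what it could look
like with boto3 (the current successor to boto). The function name
`clone_volume_from_snapshot`, the device `/dev/sdg`, and the description string
are assumptions for illustration; `ec2` is expected to behave like a boto3 EC2
client, and this is not how any particular tool does it.

```python
def clone_volume_from_snapshot(ec2, volume_id, instance_id, zone, device="/dev/sdg"):
    """Snapshot a volume, create a new volume from the snapshot, and attach it.

    `ec2` should behave like a boto3 EC2 client. `zone` must be the
    availability zone of the instance the new volume is attached to.
    Returns (snapshot_id, new_volume_id).
    """
    snap = ec2.create_snapshot(VolumeId=volume_id, Description="pre-staging copy")
    snapshot_id = snap["SnapshotId"]
    # Wait for the snapshot to complete before building a volume from it.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    vol = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=zone)
    new_volume_id = vol["VolumeId"]
    # Wait until the new volume is ready to be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])

    ec2.attach_volume(VolumeId=new_volume_id, InstanceId=instance_id, Device=device)
    return snapshot_id, new_volume_id
```

With credentials configured you would pass in `boto3.client("ec2")`. Taking the
client as a parameter also makes it easy to try the flow against a stub before
pointing it at production volumes.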

This may sound a bit complicated when you're just starting out. It was for me
anyway, so after working with AWS for a couple of years I started building a
development and deployment tool for EC2. The configuration is entirely via a
Ruby DSL.

You define the properties of your environments and roles with the machines DSL
(including EBS volumes) and you script your repeatable processes (like
deployments) with the routines DSL:

<http://github.com/solutious/rudy/blob/0.7/Rudyfile>

If Python is more your style, there's boto: <http://code.google.com/p/boto/>

~~~
iamelgringo
I've found it easier to think of an EC2 instance's 150GB of 'disk' as RAM, and
Elastic Block Store as an actual hard drive. EC2's disk is there when it's
powered on, but when you turn it off, all data disappears. Elastic Block Store
is really EC2's hard drive.

Two issues that burned me a couple of months ago with EC2 and Elastic Block:

I built my server and got it running just right. I then wanted to back it up,
so I bundled it (saved the instance to S3) but forgot to register it (list the
S3 bundle as a privately available AMI). When I terminated my instance I
wasn't able to restart it, because it wasn't available in my list of
registered AMIs. I had to rebuild my server from scratch.

The second problem that I had was in rebuilding my database after my instance
terminated. Postgres keeps the log files (transactions the database has
completed) in the install folder and the data file (what the database has
stored) in the storage folder. When I terminated my instance, I lost all the
log files that were stored on the EC2 'disk'. I had the data file on Elastic
Block Store, but the log files were gone. The standard recovery process for
Postgres needs the log files.

It took me a day to figure out how to rebuild the database from just the data
files. I now have both the database and the log files on my Elastic Block
Store volume. Hopefully that will help the next time I kill my server. :) I
also now have automated backups of everything.
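
A cheap guard against repeating this is to check, before going live, that
every directory the database depends on actually resolves to somewhere on the
persistent volume. A minimal sketch follows; the helper name
`paths_outside_mount` and the example paths are hypothetical, so substitute
wherever your data and log directories really live.

```python
import os


def paths_outside_mount(paths, mount_point):
    """Return the paths that do NOT resolve to a location under mount_point.

    Symlinks are followed, so a link on the transient disk that points into
    the persistent volume counts as being on the volume.
    """
    root = os.path.realpath(mount_point)
    outside = []
    for path in paths:
        real = os.path.realpath(path)
        # commonpath equals root only when `real` is root or lives below it.
        if os.path.commonpath([root, real]) != root:
            outside.append(path)
    return outside


if __name__ == "__main__":
    # Hypothetical locations: adjust to your own layout.
    leftovers = paths_outside_mount(["/ebs/pgdata", "/usr/local/pgsql/log"], "/ebs")
    for path in leftovers:
        print("not on the persistent volume:", path)
```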

~~~
delano
Ya, getting burned by EC2 is a rite of passage. I learned my lesson with
Amazon's API tools. Dealing with IDs is cumbersome and dangerous when it comes
to production instances. Automation is a necessity.

How are you running your automated backups? Have you automated deployment too?

------
keefe
Each EC2 instance is a virtual private server which looks just like any other
Linux box. However, files on disk are part of shared drives and are not
guaranteed to persist. You have three choices: attach and pay for an Elastic
Block Store volume that IS persistent, make periodic backups to S3, or forgo
the use of the disk altogether and use SimpleDB instead of MySQL. I'd
recommend using an Elastic Block Store volume.

~~~
latortuga
I'll back this up with the addendum that SimpleDB has useful applications
where it might be a better choice than MySQL (e.g. a service that gets lots of
reads but where real-time updates aren't necessarily mission critical).

~~~
keefe
I think it's also interesting to note that once you run a cluster, you have a
distributed system, and instantaneous writes are no longer a cheap operation...
Plus, how many operations do you need to access the full data in your
database? For simple one-off access, you could propagate the changes into RAM
on each of your cluster machines (memcached) and only relatively "unimportant"
search operations would see stale data.

------
lrm242
Each EC2 instance comes with a certain amount of transient storage. It is
transient because it does not persist beyond instance termination. Amazon also
has persistent storage for your EC2 instances called Elastic Block Store. EBS
volumes persist beyond instance termination and can be attached to any running
instance (but only one at a time) within the same availability zone.
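
Those rules are easy to lose track of, so here is a toy in-memory model of
them. It makes no AWS calls, and the class and attribute names are made up for
illustration. It only encodes what is described here: local disk dies with the
instance, a volume outlives it, and a volume attaches to one running instance
at a time in its own availability zone.

```python
class Volume:
    """Stands in for an EBS volume: its contents outlive any instance."""

    def __init__(self, zone):
        self.zone = zone
        self.attached_to = None
        self.files = {}


class Instance:
    """Stands in for an EC2 instance with transient local storage."""

    def __init__(self, zone):
        self.zone = zone
        self.running = True
        self.local_disk = {}   # transient: wiped on termination
        self.volumes = []

    def attach(self, volume):
        if not self.running:
            raise ValueError("instance is not running")
        if volume.zone != self.zone:
            raise ValueError("volume is in a different availability zone")
        if volume.attached_to is not None:
            raise ValueError("volume is already attached to an instance")
        volume.attached_to = self
        self.volumes.append(volume)

    def terminate(self):
        self.running = False
        self.local_disk = {}            # transient data is gone for good
        for volume in self.volumes:
            volume.attached_to = None   # the volume itself survives
        self.volumes = []
```

For example, after `a.attach(v)` a second `b.attach(v)` raises `ValueError`
until `a.terminate()` has released the volume; `v.files` is untouched
throughout.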

If you're running MySQL on EC2, it's pretty straightforward. You get an EBS
volume for your database and back that volume up to Amazon's S3 service. S3
will then provide multi-site redundancy for your data in the event of an AWS
availability zone failure.

------
wave
Others have mostly answered your question, but if you would like to run MySQL
on Elastic Block Store (EBS), you should read the following article:

[http://developer.amazonwebservices.com/connect/entry.jspa?ex...](http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1663)

If you take the approach described in the above article, you should think
about creating a symbolic link to your EBS volume instead of changing the
MySQL configuration file.
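
A minimal sketch of the symlink idea, assuming MySQL is stopped first and that
you run it with enough privileges to move the directory. The helper name
`relocate_with_symlink` and the example paths are hypothetical; use your actual
data directory and EBS mount point.

```python
import os
import shutil


def relocate_with_symlink(original_dir, volume_dir):
    """Move original_dir onto the volume and leave a symlink at the old path.

    Returns the new location. Stop the database before calling this.
    """
    original_dir = os.path.normpath(original_dir)   # drop any trailing slash
    target = os.path.join(volume_dir, os.path.basename(original_dir))
    if os.path.exists(target):
        raise FileExistsError(target + " already exists on the volume")
    shutil.move(original_dir, target)   # works across filesystems
    os.symlink(target, original_dir)    # the old path now points at the volume
    return target


if __name__ == "__main__":
    # Hypothetical paths: adjust to your own layout.
    print(relocate_with_symlink("/var/lib/mysql", "/ebs"))
```

Because the old path still resolves, the MySQL configuration can stay as it
is. Check the ownership and permissions of the moved directory before starting
the server again.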

------
mollylynn
If you haven't done so already, check out RightScale (www.rightscale.com).
They offer a dashboard that helps users access AWS.

~~~
just_the_tip
Or you can just use Amazon's console.

<https://console.aws.amazon.com/>

------
nolanbrown23
I used Engine Yard Solo for working with AWS. It's a great service and makes
it really easy to get started with a Rails app on EC2. You've got to wrap your
head around how EC2 works, but I really love the ability to start up a brand
new server on the fly, do whatever I need to do, and then shut it down.

------
jawngee
a. Unless you like throwing money out the window, there is no need to host
your stuff on AWS. It's the most expensive of the cloud offerings. Go with
Linode or Slicehost.

b. You'll have to use EBS if you want persistent disk storage with an AWS
instance. Otherwise, when the instance terminates, it's goodbye to everything
on your hard disk.

But, seriously, reconsider your aim of hosting a 24/7 website on EC2. You're
wasting money.

~~~
delano
You can do things with EC2 you can't do anywhere else.

I've helped companies build deployment processes with EC2 that allow them to
launch a staging environment with a single command that includes a complete
copy of production data. They run their tests, then cut a release and shut
down the staging environment with another single command. That's not possible
without EC2 and EBS.

~~~
jawngee
Also, you are wrong. You can do that on Slicehost with their API as well.

In fact, with the Slicehost API, you can take a snapshot of a currently
running system, and clone a new slice from that snapshot.

~~~
delano
Can you create, attach, and take snapshots of multiple disks with Slicehost?

------
mattsmall
RightScale's paid edition has prebuilt MySQL Master-Slave configurations with
or without EBS. EBS is great if your transaction volume is average, but with
heavy disk I/O it can get prohibitively expensive. The RS Manager for MySQL
has failover and recovery tools as well. All you need to do is plug in your DB
dump.

RightScale's Developer Edition is always free, AND RightScale Website Edition
is FREE for Y Combinator startups while you're actively engaged in the
program.

