
I've been intrigued by the idea of running databases on EC2 instance storage for a long time. (You couldn't use RDS though, at least not today.) Putting your db on something also called "ephemeral storage" seems risky, but maybe not much riskier than putting it on plain HDDs. The big issue to me is that most instances don't come with much space. If you need more you have to scale up the whole instance (not a separate dimension like with EBS), and if you're already on the biggest instance type you're just out of luck. I guess it could be worthwhile to use separate tablespaces so you could have some data on instance storage and some on EBS. But so far I've gotten acceptable perf by RAIDing over gp2 EBS volumes (12 TB in my case), following the approach here: https://news.ycombinator.com/item?id=13842044



All of Reddit was Postgres on RAIDed EBS up until I left in 2011, and I think it still is today, but I kinda hope not.

It’s totally safe to use local storage if you build it right. But those RAIDed EBS volumes caused a lot of problems. In short, when one volume gets slow the whole array gets slow, because software RAID isn’t hardware RAID.
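To make the "one slow volume slows the whole array" point concrete, here is a rough toy model in Python. It assumes striping where a request touches every member and that members slow down independently; both are simplifications, the function names are made up for illustration, and it says nothing about how software and hardware RAID differ.

```python
def stripe_latency_ms(member_latencies_ms):
    """Latency of one striped request, assumed to wait on every member."""
    return max(member_latencies_ms)


def p_any_member_slow(p_slow, n_members):
    """Chance that at least one of n independent members is slow right now."""
    return 1.0 - (1.0 - p_slow) ** n_members


# Illustrative numbers only, not measurements from any real array.
print(stripe_latency_ms([1.0, 1.2, 0.9, 45.0]))  # the slowest member dominates
print(round(p_any_member_slow(0.01, 12), 3))
```

Under those assumptions, 12 volumes that are each slow 1% of the time give an array that is slow roughly 11% of the time, so adding members to a stripe makes a slow array more likely.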

The main advantage of RDS is that they take care of the mundane redundancy for you.


Hey Jeremy, I’m saying they should take care of it for me.

As a database operator, I treat safety on i3 similarly: I keep multiple hot replicas of my data so that if any one fails I’m good to go. Additionally, there isn’t any reason you couldn’t have an EBS replica of an ephemeral node.

What we typically do with i3 is mirror the data locally, replicate it, have an EBS replica, and take backups. This is probably overkill but the data needs to be both accessed quickly and secure so that’s where we are at.


I'm wondering, how do you handle failovers?

Is it automatic or manual?

On infrastructure I handled from top to bottom, I used VIPs with keepalived (only the VRRP part, with a weight linked to the success/failure of a check script).

But in AWS, I'm wondering how to do it properly, maybe DNS records with a low TTL (like 1 second).
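For anyone unfamiliar with the keepalived setup described above, here is a simplified Python model of how a check-script weight shifts VRRP priority and decides who holds the VIP. It is a sketch, not keepalived itself: the `Node` type, the node names, and the numbers are hypothetical, and real VRRP has preemption settings and tie-breaking that are ignored here.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    base_priority: int
    weight: int      # weight attached to the check script
    check_ok: bool   # did the check script succeed on its last run?


def effective_priority(node):
    """Simplified rule: a positive weight is added on success,
    a negative weight is added (i.e. lowers priority) on failure."""
    if node.weight > 0 and node.check_ok:
        return node.base_priority + node.weight
    if node.weight < 0 and not node.check_ok:
        return node.base_priority + node.weight
    return node.base_priority


def elect_master(nodes):
    """The node with the highest effective priority holds the VIP."""
    return max(nodes, key=effective_priority).name


# Hypothetical two-node pair: db1 is preferred until its check fails.
healthy = [Node("db1", 100, -30, True), Node("db2", 90, -30, True)]
degraded = [Node("db1", 100, -30, False), Node("db2", 90, -30, True)]
print(elect_master(healthy))
print(elect_master(degraded))
```

The weight has to be larger than the gap between the base priorities, otherwise a failed check never moves the VIP.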


We use instance store for Postgres and Cassandra now on i3 and i2 respectively.


I thought Reddit went to Cassandra.


Cassandra is used for some things, Postgres for the rest. Unless they went full Cassandra recently, which is possible. But for many years we ran both.


We’ve been running Postgres on i3 instances with their attached SSDs. Performance is solid and it’s cheaper too. Having up-to-date replicas becomes crucial, along with incremental backups (we use wal-e for that).

As you mentioned, it is limited by instance size, but for a DB that fits it works great and has fewer moving parts. Knowing that your entire database is essentially ephemeral raises the stakes too and forces you to take replication, backups and restore testing seriously.


We've been running databases on ephemeral drives for many years. The key is using a database with good replication and failover.

I don't think you should trust your data to a single disk, whether it's a physical device in your own datacenter or an EBS volume in AWS. Everything fails eventually.



