

How to Upgrade a Legacy Heroku Database - without downtime - ashrust
http://blog.sendhub.com/post/30041247598/how-to-upgrade-a-legacy-heroku-database

======
stickfigure
Woah. Heroku is nice, but doesn't $6400/mo seem insane for a quadruple extra
large ec2 instance? That's an impressive margin.

~~~
ashrust
We've noticed the same price differential, but if you include the cost of
paying someone to manage the db and be on call in the event of an issue, it
seems to even out.

We've moved some stuff off Heroku, but not because of cost. Most recently we
discovered they have a tight max on Scala apps, and it was cheaper for us to
put it on AWS directly rather than code around it on Heroku.

~~~
josegonzalez
That sounds wrong, but I guess you place a premium on someone being "on call"
that I am not aware of. Wouldn't it be cheaper/easier to just ensure all your
developers understand your infrastructure and can step in where necessary?

~~~
ashrust
You're right: if we manage our own db, then we need everybody to understand
how to troubleshoot across both software and hardware issues, rather than just
filing a ticket and following instructions/suggestions.

Let's say we have 5 engineers and that kind of training costs us 1 hour per
engineer per week. That's 20 hours a month, which is easily more than
$1k/month, and the cost grows as the team expands. This is before you account
for the losses of having your own engineers on call for issues, setting up
early warning systems specific to the database, etc.
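
Roughly, the back-of-envelope math looks like this (the ~$60/hour loaded rate
below is an illustrative assumption, not a figure from our books):

    # 5 engineers * 1 hr/week * ~4 weeks = 20 hours of training time per month
    # at an assumed ~$60/hr fully-loaded cost, that is already > $1k/month
    engineers=5; hours_per_week=1; weeks_per_month=4; loaded_rate=60
    hours=$(( engineers * hours_per_week * weeks_per_month ))
    echo "${hours} hours/month ~= \$$(( hours * loaded_rate ))/month"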

~~~
josegonzalez
Good point.

Hardware aside - that's largely moot on both Heroku and AWS - the 5x cost of
the server will very quickly outstrip the cost of a good developer/operations
guy. It seems like a good idea - I initially just throw money at a problem
too! - but at some point it does not make financial sense, which is what I was
getting at.

Note that we don't have every engineer on staff capable of bringing up a new
server or rebuilding a damaged MySQL replication setup, but we do have
engineers that can:

    
    
      - ssh (or attempt to ssh onto) a dead instance
      - tail a log
      - check disk space, memory usage and cpu load
      - check if something is running
      - restart something that just randomly stopped
    

All that stuff is pretty basic, and will get you 80% of the way there - the
other 20% being experience. I guess at some point you are paying for the
experience of working with a datastore, so there it makes sense.
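
For instance, a first-pass triage session usually looks something like this
(hostnames, log paths and service names here are illustrative, adjust for your
own stack):

    ssh deploy@db1.example.com          # can we even reach the box?
    tail -n 200 /var/log/syslog         # anything obviously on fire?
    df -h && free -m && uptime          # disk space, memory, load
    ps aux | grep '[p]ostgres'          # is the database process still running?
    sudo service postgresql restart     # bring it back if it just died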

As far as early warning etc., that does take time, but it's not the big deal
it appears to be.

EDIT: I can't math, and 80 + 10 = 90, not 100. Good thing I'm not a data
scientist :)

~~~
jeltz
For getting early warnings and helping your employees debug what went wrong,
there are several good monitoring tools. They won't magically solve your
problems, but they will make it much easier for non-database experts to run a
database server. For monitoring PostgreSQL I have used the excellent plugins
that are shipped with munin.

<http://munin-monitoring.org/>

EDIT: munin also ships monitoring plugins for MySQL, but since I have never
used them I cannot vouch for them. The general health monitoring of the
machines provided by munin (disk usage, CPU usage, SMART status, inode usage,
...) is also probably more valuable than the database-specific monitoring.
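
If it helps, on a stock Debian/Ubuntu munin-node install enabling the bundled
plugins is usually just a matter of letting munin-node-configure suggest and
link them (a rough sketch; paths and plugin names vary by distro and version):

    # ask munin-node which plugins apply to this box (the postgres_* ones included)
    sudo munin-node-configure --suggest
    # print the ln -s commands needed to enable them, then run them
    sudo munin-node-configure --shell | sudo sh
    sudo service munin-node restart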

------
carsongross
When you say 'cache utilization' are you referring to the postgres working
set/cache? How are you measuring that?

~~~
jaytaylor
"Cache utilization" refers to the amount of space the database is using in
memory compared to the amount of total cache available on the dedicated Heroku
database instance (as seen on their pricing page).

We are measuring it by issuing a `hr pg:info -asendhug` command, which
includes a field called "Data Size".

e.g.:

    === HEROKU_POSTGRESQL_GOLD (DATABASE_URL)
    Plan:        Ika
    Status:      available
    Data Size:   2.00 GB
    Tables:      76
    PG Version:  9.1.4
    Fork/Follow: Available
    Created:     2012-08-16 04:09 UTC
    Followers:   HEROKU_POSTGRESQL_TANGERINE
    Maintenance: not required

As for whether this is the "working set/cache", it is not clear where the
"Data Size" number comes from. This is an interesting question; I'll ask
Heroku for more information and report back.

~~~
jaytaylor
Heroku has replied:

> Cache utilization talks about the working set size, yes.
>
> The database size typically means just the result of pg_database_size() on
> your database (you can use the current_database() function as a shortcut).
> Note that this _does_ include dead tuple bloat that is regularly cleaned up
> by Postgres autovacuum, so it's more of an estimate than a hard number.
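
For reference, that should be roughly reproducible with something along these
lines (a sketch, run from anywhere with a psql connection to the database):

    # rough equivalent of the "Data Size" figure, per Heroku's explanation above
    psql "$DATABASE_URL" -c \
      "SELECT pg_size_pretty(pg_database_size(current_database()));"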

~~~
carsongross
Right, but that doesn't give you a good idea of how close your working set is
to the max memory cache, unless your entire DB is active.

I'll ping them again and ask if they've found a good way to measure this. It
seems like this is the most important number to look at when scaling a
heroku/postgres project.
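
One rough proxy I've seen used (not the same thing as the true working set,
and the threshold to worry about is debatable) is the buffer cache hit ratio
from Postgres's statistics views:

    # fraction of heap reads served from shared_buffers; a low ratio suggests
    # the working set no longer fits in memory (a proxy, not a measurement)
    psql "$DATABASE_URL" -c "
      SELECT sum(heap_blks_hit)::float /
             nullif(sum(heap_blks_hit + heap_blks_read), 0) AS heap_cache_hit_ratio
      FROM pg_statio_user_tables;"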

------
willlll
I'm glad you were able to find a good solution.

------
indiecore
Very cool. That's all I can really say, kudos for thinking on their feet,
coming up with a clever solution and then sharing it.

~~~
ashrust
We do try - and btw, we're hiring if you know any iOS engineers...

Let us know if you're in the unfortunate position of having to use it. We'd
love to know if it works out for you.

