
The Switch from Heroku to Hardware - zeeg
http://justcramer.com/2012/08/30/how-noops-works-for-sentry/
======
jacques_chester
So basically nothing has gone wrong yet, therefore it's a slam dunk case for
Doing It Yourself.

I'm sorry, but no.

You don't pay an ops guy or Heroku buckets of $$ for when things are going
well, just as you don't pay $$ for software that only handles the happy case.

You pay $$ for someone who has fixed shit that went horribly wrong and has the
scars to prove it. "That deep purple welt on my lower ego ... is where I only
had a backup script and never tested the backups. This interesting zigzag is
where I learnt about things that can go wrong with heartbeat protocols ..."
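
That untested-backups welt has a cheap partial vaccine: a script that at least verifies the newest dump exists, is recent, and isn't truncated. This is only a sketch -- the paths, file pattern, and the 25-hour freshness window are invented for illustration, and the only _real_ test is restoring the dump into a scratch database and querying it:

```python
import glob
import gzip
import os
import time


def check_latest_backup(backup_dir, max_age_hours=25):
    """Basic sanity checks on the newest .sql.gz dump in backup_dir.

    Returns the path if it passes; raises RuntimeError otherwise.
    NOT a substitute for restoring into a scratch database and
    running a sanity query -- that is the only real test of a backup.
    """
    dumps = sorted(glob.glob(os.path.join(backup_dir, "*.sql.gz")),
                   key=os.path.getmtime)
    if not dumps:
        raise RuntimeError("no backups found in %s" % backup_dir)
    latest = dumps[-1]

    # Catch a cron job that died silently: the newest dump must be fresh.
    age = time.time() - os.path.getmtime(latest)
    if age > max_age_hours * 3600:
        raise RuntimeError("latest backup is %.1f hours old" % (age / 3600))

    # Catch truncated uploads: read the whole archive so gzip checks the CRC.
    try:
        with gzip.open(latest, "rb") as f:
            while f.read(1 << 20):
                pass
    except (OSError, EOFError) as exc:
        raise RuntimeError("corrupt archive %s: %s" % (latest, exc))

    return latest
```

Run it from cron and alert on any exception; silence is the failure mode you're guarding against.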

Edit: though see below for a more nuanced discussion of reasons from OP.

~~~
zeeg
You're saying before Heroku (or the cloud) everyone paid an operations guy to
do anything?

Nope.

I don't need to suffer the same acts of terror operations people have gone
through to be able to avoid, prevent, or recover from them. I'm paying
_myself_ to be the operations guy, as well as the engineer.

~~~
jacques_chester
It's about value delivered and opportunity cost paid.

You're betting foregone engineering time against Heroku's value-as-delivered.

I don't think that the hours you will _inevitably spend_ fixing the ops
infrastructure will turn out to be very profitable.

For a large business, moving off Heroku to inhouse operations is probably
justifiable, because they can capture sufficient value from a smart ops team
to offset the potentially very high monthly bill Heroku would levy for their
business.

But for a smaller firm, Heroku so abundantly oversupplies value compared to
the bill that you will simply never be able to capture that value from
internal work at anything like the same price.

It's like saying "I should stop buying food from the supermarket, it would be
cheaper to grow my own vegetables!"

In a naive dollar-cost analysis this is true. But once you actually account
for what raising a vege patch involves -- bearing the risk of going hungry,
doing it poorly because you're new at it, and spending the hundreds of hours
of labour it will require -- letting a professional farmer do it is much
smarter.

Gains from trade are _gains_.

I have a pretty strong opinion on this given how much time I've already sunk
into similar work:
[http://chester.id.au/2012/06/27/a-not-sobrief-aside-on-reigning-in-chaos/](http://chester.id.au/2012/06/27/a-not-sobrief-aside-on-reigning-in-chaos/)

And for my secondary startup I am thinking I will just let Heroku handle it.

------
timr
_"On the limited hardware I run for getsentry.com, that is, two servers that
actually service requests (one database, one app), we’ve serviced around 25
million requests since August 1st, doing anywhere from 500k to 2 million in a
single day. That isn’t that much traffic, but what’s important is it services
those requests very quickly, and is using very little of the resources that
are dedicated to it. In the end, this means that Sentry’s revenue will grow
much more quickly than its monthly bill will."_

There's nothing in this justification that doesn't also apply to Heroku. The
cost structures just aren't significantly different at a two machine scale.
However, as people keep pointing out, the roll-your-own-cloud approach
requires that you build and maintain a bunch of infrastructure that Heroku has
already built for you, or you forego redundancy and fault tolerance that
Heroku has already built for you.

The best lines of code are the ones you don't have to write.

~~~
zeeg
I didn't note it in this post (though I did in others), but before I switched,
my Heroku bill was almost $700 (and I couldn't get it to perform well). The
current bill is far less, even with growth.

~~~
timr
Yeah, I read your original post. I've worked pretty extensively with Heroku
and with large, custom, in-house infrastructure, and I don't share your
experience.

There's an I/O penalty for working on AWS, but it's on the order of tens of
percent, not hundreds. I suspect that your original problems were related to
working set size relative to cache (since Ronin => Fugu bumps cache by over
2GB, and you said that Fugu was working well).

Heroku's largest database has a 68GB cache at an (admittedly expensive) $6,400
a month. But even so, $6,400 is a small expense for a growing web application.
A mediocre developer costs more than that. Trading off server cost for
developer cost is an asymptotically bad bet.

~~~
zeeg
Actually I think I/O was my primary bottleneck. Once I addressed that I
started hitting CPU/memory constraints on Dynos.

The database definitely wasn't _unreasonable_ at $400, but for a bootstrapped
project (especially something that's a side project for me), that was a big
consideration.

I probably would have toughed it out with Heroku if I could have gotten things
to perform better. At one point I was running 20 dynos trying to get enough
CPU for worker tasks to actually keep up, and I unfortunately couldn't solve
the bottleneck to where the cost was reasonable.

The application isn't typical (what Sentry does pushes some boundaries of SQL
storage for starters), but it was costing too much of my time to struggle with
optimizing something that really shouldn't have needed that much effort.

I definitely like the redundancy provided, and the ability to add application
servers with zero thought is a huge plus; I just couldn't justify the cost of
the service on top of the frustration and time I spent trying to scale it on
Dynos.

~~~
timr
_"Actually I think I/O was my primary bottleneck. Once I addressed that I
started hitting CPU/memory constraints on Dynos....At one point I was running
20 dynos trying to get enough CPU for worker tasks to actually keep up"_

Something doesn't make sense. If you "addressed" your I/O problem, your CPUs
were therefore all busy doing something _much, much slower_ than a disk
read/write, in software (which would have to be both obvious and unbelievably
horrible). If that's true, something pathological was going on in your code.
I'm going to assume that you would have noticed it -- swapping, for example.

So let's go back to I/O: if your database was slow, you might observe
something superficially similar to what you've described: throwing lots of
extra CPUs at the problem would result in lots of blocked request threads, and
it would _appear_ that your dynos were all pegged. The exact symptoms would
depend on
your database connection code, and your monitoring tools. But in no case would
throwing more dynos at a slow database make sense, so I'm going to assume that
you didn't do that on purpose (right?)
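
The "more dynos won't help a slow database" point can be seen with
back-of-the-envelope queueing math: once the shared database is the
bottleneck, total throughput is pinned at the database's service rate no
matter how many app workers you add. A toy model (all the millisecond figures
here are invented for illustration, not Sentry's actual numbers):

```python
def max_throughput(n_workers, app_ms_per_req, db_ms_per_req):
    """Upper bound on requests/sec through N app workers sharing one database.

    Each worker can do at most 1000/app_ms_per_req req/s of app-side work,
    but every request also needs db_ms_per_req of time on the single
    database, which caps the whole system at 1000/db_ms_per_req req/s.
    """
    app_capacity = n_workers * 1000.0 / app_ms_per_req
    db_capacity = 1000.0 / db_ms_per_req
    return min(app_capacity, db_capacity)

# Fast DB (5 ms/query): adding workers scales throughput, up to a point.
for n in (1, 5, 20):
    print(n, "workers ->", max_throughput(n, app_ms_per_req=50, db_ms_per_req=5))

# Slow DB (50 ms/query): 20 workers deliver the same 20 req/s as 1 worker.
# The extra 19 just sit holding blocked database connections.
for n in (1, 5, 20):
    print(n, "workers ->", max_throughput(n, app_ms_per_req=50, db_ms_per_req=50))
```

Real systems degrade worse than this bound suggests (connection pools
exhaust, queues grow), but the bound alone shows why the spend on 20 dynos
buys nothing if the database is the constraint.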

Given the above, I still can't meet you at the conclusion that abandoning
Heroku was the magic bullet for your problems. There's not enough information,
and it doesn't add up. My money is on one or more of the following: DB cache
misses (i.e. not enough cache); a heavy DB write load; frequent, small writes
to an indexed table; or pathological memory usage on your web nodes. And if it
turns out that the cause is due to I/O, you've only bought yourself a
_temporary respite_ by moving off Heroku. Eventually, you'll get big enough
that the problem will re-emerge, even though your homebuilt servers are 10%
faster (or whatever).

EDIT: Aha! Your comment in another thread actually explains your problem: you
_were_ swapping your web nodes by using more than 500MB RAM
(<http://news.ycombinator.com/item?id=4458657>).

~~~
zeeg
It would take me more than one blog post to describe the architecture that
powers Sentry, the various areas that have and can have bottlenecks (some more
obvious than others).

More importantly, it's been a few months since I made the switch, and I don't
remember the specifics of the order of events. I can assure you, though, that
I know a little something about a little something, and I wasn't imagining
problems.

(Replied to the wrong post originally, I fail at HN)

<http://news.ycombinator.com/item?id=4458643>

~~~
timr
_"I can assure you though that I know a little something about something, and
I wasn't imagining problems."_

Since you've made it clear in another thread that you were actually running
out of RAM on your dynos, I imagine you _were_ running into trouble. There's
no need to be snide about it.

Bottom line: you hit an arbitrary limit in the platform. If Heroku had
high-memory dynos, the calculus would be different. In the future, instead of
arguing that your homebrew system is better than "the cloud", you could just
present the actual justification for your choice.

------
kingrolo
I'm okay with the concept of trading cost/time for convenience, and I see why
that might work for some folks, but even before we get to that argument, my
experience with Heroku is that it just isn't reliable enough for client sites.
In the few months we've been evaluating it, we've had a couple of instances of
real downtime (i.e. longer than 30 minutes) and a few shorter outages (a few
minutes each). We don't get that from our Linode+Puppet sites (I taught myself
Puppet as we went along; I'm a dev rather than an ops guy, really).

~~~
taligent
Well, I guess you are lucky then, because Linode has had massive downtime in
their London and Dallas data centres in the last few days, and when I was a
customer, their Fremont data centre went down all the time.

<http://status.linode.com>

Heroku at least uses EC2, which will be far more reliable over time.

~~~
citricsquid
Not sure the London one was really "massive". I have 12 Linodes in London, all
on different machines, and only one became unavailable during the downtime a
few days ago; it was resolved very quickly.

------
josephlord
This stuff is really useful even if you don't want to move off Heroku. Just
knowing that you can is really reassuring in case they stop offering what you
need. Simple how to guides are even better.

It was the fact that Heroku offers an environment for running fairly standard
Rails + Postgres that made me pick them over the more unusual, harder-to-leave
Google platform, even though I was starting from scratch.

It's always good to have an exit route.

------
dangrossman
A bit off topic but this is a good thread to ask in:

Those of you running startups that don't colocate your own hardware, and don't
run in a cloud, where do you rent servers from these days?

Most of my stuff is at Softlayer, but their RAM pricing is killer ($25/mo/GB).

~~~
corford
It might not work for you if you're US based (and want your servers in a
datacentre you can easily fly/drive to) but I hear a lot of good things about
www.hetzner.de and the prices are fantastic. I plan to use them for the
venture I'm currently working on.

~~~
boundlessdreamz
Hetzner.de is awesome. They are cheap and reliable, which is a combination you
don't see often.

~~~
dangrossman
Unfortunately, being in Germany and using desktop-grade hardware means high
latency and (comparatively) high failure rates. That's why it's cheap.

~~~
corford
Germany isn't badly connected and any extra latency can be largely mitigated
by using a CDN for your static assets.

As for desktop-grade hardware - I haven't seen this mentioned by anyone else.
Any links to back it up?

If that's true (and I doubt it is), it's avoidable if you use their extremely
well priced colo option (which is what I'm going for).

~~~
dangrossman
> are you referring to the latency

Yes, "high latency" was referring to the latency. A CDN for static assets
doesn't mitigate the fact that the initial request, and everything dynamic is
on the wrong side of the planet for most startups' users.

> Any links to back it up?

Hetzner.de. They list the hardware in the boxes. It's desktop processors,
desktop motherboards, desktop hard drives and non-ECC RAM in all the cheap
server lines.

[http://www.hetzner.de/en/hosting/produktmatrix/rootserver-produktmatrix-ex](http://www.hetzner.de/en/hosting/produktmatrix/rootserver-produktmatrix-ex)

7200RPM hard drives, Core/Athlon processors and non-ECC RAM don't belong in
servers.

[http://arstechnica.com/business/2009/10/dram-study-turns-assumptions-about-errors-upside-down/](http://arstechnica.com/business/2009/10/dram-study-turns-assumptions-about-errors-upside-down/)

According to Google's study, you would expect about 2 memory errors per day
per server running 24/7. You need ECC RAM.

> I doubt it is

It's rude to publicly call someone a liar without evidence.

~~~
corford
I think you're overblowing the latency problem. European users interact with
US hosted startups all day long and it's not like you hear us complaining.

Re: hardware. The choice is nice to have and their EX6 and up packages are
"proper" server grade (as in Xeon and ECC). EX6 machines start at EUR69 which
is fantastic if you ask me.

Edit: as for rudeness -- chalk it up to a disagreement over what constitutes
"desktop-grade hardware" (and your omission of their higher-end hardware
offerings).

------
count
"DNS being slow (fuck it, use IPs)"

So, yeah, ops isn't hard at all, if you don't fucking take the time to do it
right.

~~~
zeeg
It takes just as much time for DNS to propagate as it does for me to deploy a
config change. It literally doesn't matter at this stage.

~~~
koko775
Run DNSMasq locally (as in, same datacenter as the computers that will be
using it) and tell it to cache. It's dead-simple to set up. Then point your
computers to resolve using it.

You can even add entries to /etc/hosts, and the computers using it as their
DNS will resolve them. Depending on how much control you have, DNSMasq will
also function as a DHCP server and TFTP server, from which you can netboot
other servers and do such nifty things as automatic reinstalls. Useful if you
have a separate, internal network and want to set internal IPs, too.
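
A minimal sketch of the kind of dnsmasq setup described above -- the upstream
resolver, address ranges, and paths are placeholders to adapt, not anything
from this thread:

```conf
# /etc/dnsmasq.conf -- local caching resolver for one datacenter.
listen-address=127.0.0.1
cache-size=1000

# Forward cache misses to an explicit upstream instead of /etc/resolv.conf.
no-resolv
server=8.8.8.8

# dnsmasq serves names from this host's /etc/hosts by default, so entries
# added there resolve for every machine pointing at this resolver.

# Optional extras mentioned above -- DHCP plus TFTP netboot:
#dhcp-range=10.0.0.50,10.0.0.150,12h
#enable-tftp
#tftp-root=/srv/tftp
#dhcp-boot=pxelinux.0
```

Point each server's /etc/resolv.conf at the box running this (with the public
resolver as a fallback entry) and lookups inside the datacenter stay local.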

~~~
gizzlon
DNSMasq is nice; it's so easy to make up your own local DNS names. Do you run
several to avoid a single point of failure, or do you just fall back to the
"real" DNS?

Even with a local DNS server, there has to be _some_ overhead, though. OTOH,
avoid premature optimization, etc.

------
davestheraves
Sounds to me like the main issue was running out of RAM on workers. Would this
not be solved by moving to another cloud provider (such as AWS) where you are
not limited to the tiny RAM provided on Heroku?

Is this not similar in strategy to choosing hardware and spinning up your own
stack?

EDIT: I'm getting at the rather sweeping statement against all cloud providers
based on a specific Heroku problem.

~~~
Xylakant
The moment you do this, you forego one of the biggest advantages of using
Heroku. As long as you're on Heroku only, you don't need to take care of
securing the underlying stack -- that is, firewall rules, OS updates, general
maintenance. The moment you spin up a single AWS instance beside it, that's
your problem. Depending on your use case, it could be a better choice to just
go all the way to dedicated hardware: the primary advantage AWS has over
dedicated hardware is flexibility. You can spin up instances depending on your
current need. If your load behavior is a flat, predictable curve, you might
just not need that -- and then real hardware is cheaper in most cases.

------
papercruncher
Not directly relevant to the blog post, but Sentry is an amazing piece of
software. We're running it on an m1.small AWS instance along with a bunch of
other stuff and it is rock solid.

------
JDavo
My company is self-hosting Sentry for about 30 Python & Java in-house
services, and it was the easiest deployment of anything that's taken me more
than an apt-get install to deploy.


