
Building a dependable hosting stack using Hetzner’s servers - prateekdayal
http://devblog.supportbee.com/2012/06/04/building-a-dependable-hosting-stack-using-hetzners-servers/
======
mootothemax
So far, I have nothing but praise for Hetzner. I've only had to contact
support once, when one of my server's hard drives was shouting out SMART
errors and looking like it was going to die shortly.

I got in touch late on Sunday night, discussed the problem with a couple of
their support staff, and by midday on Monday all was fixed, with a new hard
drive in place. Really quite incredible service, especially considering the
price.

~~~
jd
I had almost exactly the same experience. Also a hard drive that was starting
to die on us on a Sunday night and they had it replaced within the hour. We
just had to reboot and start the raid mirror again. 6 minutes downtime in
total.

Their customer support is terrific.

~~~
axx
It seems like they use really cheap HDDs. I've been a customer for a few
years, and in every server since then one hard drive has died. It's not
really a big deal when you're using RAID.

~~~
jd
They use regular consumer-grade HDDs in the cheaper servers, and enterprise-
grade in the slightly more expensive ones. For instance, their EX6 servers
have 2x3 TB high-quality hard drives (€70/month) and their EX5 servers have
2x1.5 TB consumer-grade hard drives (€56/month).

For comparison see:
[http://www.hetzner.de/en/hosting/produktmatrix/rootserver-pr...](http://www.hetzner.de/en/hosting/produktmatrix/rootserver-produktmatrix-ex)

------
alberth
I'm surprised no one has mentioned that Hetzner:

1\. Uses desktop-grade hardware (i.e. no ECC, single socket, limited
networking, etc.)

2\. Is located in Germany (i.e. high latency for your US user base).

Don't get me wrong, the pricing Hetzner provides is unbelievable.

I just wish a US-based hosting provider existed that used server-grade
components, even at 2x Hetzner's price, because it'd still be a steal.

(For those of you unaware of their pricing, you can get a Xeon E3 with 32 GB
of RAM for just 79 euros/mo.)

~~~
mootothemax
_1\. Uses desktop-grade hardware (i.e. no ECC, single socket, limited
networking, etc.)_

To be fair to them, they do offer servers with ECC for a (slightly) higher
price:

[http://www.hetzner.de/en/hosting/produktmatrix/rootserver-pr...](http://www.hetzner.de/en/hosting/produktmatrix/rootserver-produktmatrix-ex)

~~~
cmer
Does ECC _really_ make a difference in practice? Is it even worth the price? I
don't have ECC in my iMac and I'm sure it'd do quite well if I used it as a
web server...

~~~
alberth
Do you run your iMac 24/7/365, with customer data on it that's being
frequently accessed?

ECC exists to prevent data corruption so that you don't have to restart your
server.

Since I imagine you restart your iMac nearly daily, not having ECC isn't a
problem.

~~~
cmer
I actually never restart my iMac. Once every 3-4 months, I'd say. And not
because it crashed, but because I need to for software updates.

------
rmaccloy
If you have a single-rack network, now your single point of failure is the
rack switch or PDU. (This is why e.g. HDFS has rack-aware mode.)

If you have a cage, it's the datacenter (peering, power, environment, physical
security.)

Do you need to care about these things? Probably not. (But maybe you do, and
you happen to care less about price, or database write
latency/throughput/predictability, or...) Pick whatever set of tradeoffs works
for you.

~~~
Ecio78
You can have a single rack with redundant PDUs that come from two distinct
power lines (UPS, etc.). Then you can have networking devices with redundant
power supplies, or use single-power stackable devices and multiple Ethernet
connections. Same for servers: redundant power supplies or servers in some HA
configuration.

I'm not talking about Hetzner specifically, but generally.

~~~
sneak
The top-of-rack switch becomes an issue.

Setting up link failover between switches (you can't bond for 2 Gbps, IIRC,
if you are split onto two different switches) is sort of kludgy, too.

One's best bet is to just have multiple locations with low latency between
them, and then just do it all in software, and leave the n+x redundancy to BGP
routes. It's a lot cheaper and works just as well.

Note that this is how the Big Boys do it, as well - but it works for two
machines as easily as it does two million.

~~~
asharp
You can in fact bond for 2gbps if you are on two different switches, in two
completely different ways.

One way involves the use of Cisco stacking switches, allowing you to use
802.3ad across two independent 'stacked' switches. You can also use the
external PSU to provide redundant power to each switch (giving each switch
redundant PSUs and making each switch itself redundant).

The second involves the use of the linux bonding driver in balance-rr
configuration. This has a slight bug with the bridge driver in that it
sometimes won't forward ARP packets, but if you're just using it as a web head
or whatever, you don't really care about those.
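
A minimal sketch of that balance-rr setup, assuming Debian-style ifupdown and
the interface names eth0/eth1 (addresses here are placeholders):

```
# /etc/network/interfaces fragment (hypothetical)
auto bond0
iface bond0 inet static
    address 192.0.2.10
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode balance-rr
    bond-miimon 100
```

With each slave cabled to a different switch, balance-rr round-robins frames
across both links, and miimon detects a dead link every 100 ms.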

The 'big boys' do use iBGP etc. internally, but that's for a different
reason: at large scale you can't buy a switch with a large enough MAC table
(they run out of CAM), so you have routers at the top of your rack that then
interlink. You can still connect your routers with redundant switches easily
enough with VLANs and such (think router-on-a-stick).

~~~
Ecio78
Yes, I was exactly thinking about stacking two independent switches (I've
done it with Cisco 3750s, but you can do it with other brands too). The only
problem could be that with this kind of stack you're now dealing with one
"logical" system, so if the firmware is buggy or someone issues the wrong
command, you can have a single point of failure (but this could also happen
if an HA system goes wrong by itself or because of you).

------
jnorthrop
The article makes some good points and is a good starting guide to setting up
a dependable stack, but I think the author downplays the skill, cost and time
that something like Heroku can save. He states "not including developer time
ofcourse[sic]."

For those not able to afford a full-time sysadmin, that can be a significant
expense and bring in unnecessary risk.

~~~
peterwwillis
Skill: Can you read a HOWTO? <http://tldp.org/HOWTO/HOWTO-INDEX/howtos.html>

Cost: Cheaper, because you're doing the work yourself and only paying for a
VPS or two.

Time: A weekend.

If you're running a start-up and you can't hire a sysadmin, yes, managed
hosting is a good idea and will net you a reliable system for a decent price.
But if you're spinning up test/hobby projects which aren't mission-critical,
take the time to build your own stack/servers. It takes a minimal amount of
time and energy and will give you valuable experience you can use for the rest
of your career.

~~~
semanticist
Despite potentially talking myself out of work, I highly recommend this
approach.

Sysadmin is something that you can learn by doing, and any competent software
developer should be able to pick up enough knowledge to manage the kind of
simple deployment that a freshly minted startup needs.

------
moonboots
One thing to note about Hetzner, in addition to high US latency times, is the
initial setup cost. For the EX4 (core i7-2700, 16 GB ram, 6 TB HD, 49
euros/month), the one time setup was 149 euros. However, I just checked and
the setup cost for this server has dropped to 49 euros. I'm not sure if this
is promotional or permanent.

------
aangjie
I can add my experience with Hetzner. We had a RAM module that was failing,
and it got replaced once they ran the check. We did have a backup server to
take up the load in the meantime, so it wasn't a problem.

~~~
alberth
Wow, lots of people in the comments have hardware failures with Hetzner.

~~~
bluelu
I guess they just write about the failure because the handling of that
failure on Hetzner's side was just great.

There are certainly many many more which never had a failure.

------
btb
IMO a dependable stack requires a firewall in front of your servers. Sure,
you can configure software firewalls on all of your servers, but it's nice to
have an outer wall as well (defense in depth and all that). If Hetzner
started offering that, plus private VLAN support, they would have a really
killer offering.

~~~
jarito
I'm also concerned with the lack of a load balancer. I guess you could do
something with a DNS service like Cloudflare, but the lack of one seems like
a deal breaker for good uptime.

------
lis
By the way - Hetzner lowered the setup fee, it used to be 149€, now it's 49€.

------
zschallz
I had a VPS at Hetzner that I replaced my Linode with. Really liked it. For
the same price, though, you can get a really underpowered dedicated server at
kimsufi.ie through OVH with more RAM and HD space.

~~~
Wilya
Both Hetzner and Kimsufi offer dedicated servers that are really on the same
scale. The 49€ server, which is the top of the Kimsufi line and the bottom of
the Hetzner line, is virtually the same server in both cases, except one has
a 2 TB disk and 24 GB of RAM, while the other has 2x3 TB (RAID1) and 16 GB.

~~~
zschallz
I have the 14.99€ server as a development box that I access from the States.
Truth be told, the disk IO and processing speed are slower than a Linode 512,
but for 2 GB of RAM, 1 TB of HDD and the transfer, you can't really beat it.

------
babuskov
If you mostly have users in Europe, like I do, then this is a no-brainer. I
have been using it for over 2 years, and so far only two glitches: once, one
of the HDDs simply vanished from my RAID array; the other was when a key
switch burned out in the datacenter where my server was. About 30 minutes of
downtime and that was all.

I went through 5 different providers before settling on Hetzner.

------
rdl
I'd probably do DNS RR in front of the Linux-HA stuff for a load balancer.
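
DNS RR here just means publishing multiple A records for the same name; a
hypothetical BIND-style zone fragment (names and IPs are placeholders):

```
; round-robin: resolvers rotate through these answers
www   300  IN  A  192.0.2.10
www   300  IN  A  192.0.2.11
www   300  IN  A  192.0.2.12
```

A short TTL (300 s here) limits how long clients keep hitting a dead box.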

------
ilikejam
Hmmm. 'Dependable' and 'a bunch of servers on the same rack' are mutually
exclusive things.

~~~
birken
I think that is a little unfair. The physical rack structure generally
provides two things, a network switch and a power hookup, and both are among
the more reliable things that datacenters offer. In order for your
application to survive a rack failure (either a power cord unplugged or the
network switch breaking), you need to fully duplicate every necessary part of
your application on another rack, which is going to be pretty inconvenient.

Companies like Amazon and Google no doubt spend a lot of time thinking about
the physical locations of servers and how failures might affect them in terms
of uptime and data loss, but for your average small application I think it is
ok to accept very small risks that will result in downtime as opposed to
spending a massive effort or engineering around it.

I also appreciate that services like Heroku handle stuff like this for you,
but what I'd really be interested to see is a comparison of uptime between
your average dedicated machines at your average datacenter and a service like
Heroku. Because while dedicated machines have failure cases (power outage,
network switch breaks, one machine's hardware dies, hosting company has
networking issues, etc.), AWS/Heroku have them too (AWS outage, DDoS attack
against Heroku, AWS/Heroku engineer makes a mistake, etc.).

------
tmrhmd
Has anyone done any latency tests between the USA and Hetzner's data center
(Germany) and can share their numbers?

------
davyjones
Between pgpool and pgbouncer, go with pgbouncer.

~~~
prateekdayal
Why do you recommend pgbouncer? Did you face any issues with pgpool?

~~~
ibotty
I was just researching that yesterday: performance.

See, for example:
[http://www.last.fm/user/Russ/journal/2008/02/21/zd_postgres_...](http://www.last.fm/user/Russ/journal/2008/02/21/zd_postgres_connection_pools:_pgpool_vs._pgbouncer)

In short: pgpool has an old-fashioned Unix architecture (process-based),
while pgbouncer is event-based, so it's usually a bit more performant. Both
are reliable, so that shouldn't make any difference.
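
For reference, a minimal hypothetical pgbouncer.ini sketch; the database
name, ports, paths and pool size are placeholders, not a recommendation:

```
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
```

Transaction pooling gives the biggest connection savings, at the cost of not
supporting session-level features like prepared statements across queries.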

------
nirvana
I see that everything below the new EX6S has dropped by about 90 euros in
setup fee. This is great news! I think I'll buy 6!

People have brought up reliability and the fact that they're using consumer-
grade hardware. This is an issue if you have a SPOF. If you have a fully
distributed system (rare these days, for sure), it isn't much of an issue.

My current plan is to use DNS, with each box running the full stack (a web
app platform on top of Riak, with authoritative DNS on the box). So a web
request might look up example.com and get back a list of authoritative name
servers, NS1-6.exampledns.com. When the client then queries one of those auth
servers, the auth server (which is in the cluster) returns the list of other
servers in the cluster ranked by load (e.g. a multiple-A-record response for
the query). Then when the client goes to connect to the web server, it will
hit the least busy node.
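
The load-ranked response described above could be sketched like this (a
hypothetical illustration; the function name, IPs and load figures are all
invented):

```python
# Hypothetical sketch of a load-ranked multiple-A-record response:
# the authoritative server orders the cluster's IPs by current load,
# so clients connect to the least busy node first.

def rank_a_records(servers):
    """Return the cluster's IPs sorted by ascending load."""
    return [ip for ip, load in sorted(servers.items(), key=lambda kv: kv[1])]

cluster = {
    "198.51.100.1": 0.72,  # busy node
    "198.51.100.2": 0.15,  # least busy, listed first in the answer
    "198.51.100.3": 0.40,
}

print(rank_a_records(cluster))
# -> ['198.51.100.2', '198.51.100.3', '198.51.100.1']
```

In practice the real ordering is only a hint, since resolvers and clients are
free to reorder the A records they receive.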

I wonder, though: if there are 5 authoritative name servers listed in the
root for a given domain, will the root return them in the same order every
time, such that my first authoritative DNS server (the one listed first at
the domain's registrar) will get most of the DNS load? Or is there a way to
have the root name servers randomize the order of the authoritative servers
they give back to the client?

(Yes, all this will be open source, eventually. I've learned not to make
promises about when -- as soon as it's viable outside the lab.)

~~~
sirclueless
DNS load is typically fairly light, because it's just a few packets per hit.
You certainly don't need 6x redundancy, and if one server gets most of the
traffic it's probably no biggie.

There are a couple of caveats to your load balancing strategy. With enough
headroom, these probably aren't total game breakers, but you should be aware
of them. More at
[http://serverfault.com/questions/60553/why-is-dns-failover-n...](http://serverfault.com/questions/60553/why-is-dns-failover-not-recommended)

1) You shouldn't expect even or consistent load balancing across servers.
Some caching DNS servers (such as those at large ISPs) have very many
downstream consumers and won't do any randomization. If a large DNS server
sees a new order of records, it might trigger a synchronous switch of 10% of
your customer base from one server to another. This will cause spiky traffic.

2) You can't rely on any kind of sticky sessions. This may or may not be a
problem, and many load balancers drop this guarantee as well for performance
reasons, but it is certainly possible that a client may see a DNS record's
TTL expire and switch to a new IP. If you aren't prepared for that, you may
start dropping sessions.

------
MidwestMuster
So let me sum this up:

Hetzner is comparable to Heroku and AWS, except that you have to do your own
rack buildouts, private IP subnets, load balancing, redundancy zones, and CDN.

Is that right?

~~~
moonboots
Yeah, except for the cost of 1.5 dynos a month, you get a Core i7-2600, 16 GB
of RAM, and 6 TB of HD space.

