Building a dependable hosting stack using Hetzner’s servers (supportbee.com)
157 points by prateekdayal on June 4, 2012 | 62 comments

So far, I have nothing but praise for Hetzner. I've only had to contact support once, when one of my server's hard drives was shouting out SMART errors and looking like it was going to die shortly.

I got in touch late on Sunday night, discussed the problem with a couple of their support staff, and by midday on Monday, all was fixed, with a new hard drive in place. Really quite incredible service, especially considering the price.

I had almost exactly the same experience. Also a hard drive that was starting to die on us on a Sunday night, and they had it replaced within the hour. We just had to reboot and start the RAID mirror again. 6 minutes of downtime in total.

Their customer support is terrific.

It seems like they use really cheap HDDs. I've been a customer for a few years, and in every server I've had, one hard drive has died. It's not really a big deal when you're using RAID.

They use regular consumer-grade HDDs on the cheaper servers and enterprise-grade ones on the slightly more expensive ones. For instance, their EX6 servers have 2x 3TB high-quality hard drives (€70/month), and their EX5 servers have 2x 1.5TB consumer-grade hard drives (€56/month).

For comparison see: http://www.hetzner.de/en/hosting/produktmatrix/rootserver-pr...

Same here, been there a few years now and recently had 4 drives fail in 24 hours! But, as usual, fast and professional support.

Just make sure you don't have anyone on the server (client, etc) running SSH attacks on their core routers or UDP floods, because that becomes a nightmare!

I'm surprised no one has mentioned that Hetzner:

1. Uses desktop grade hardware (i.e. no ECC, single socket, limited networking, etc)

2. Is located in Germany (i.e. high latency for your US user base).

Don't get me wrong, the pricing Hetzner provides is unbelievable.

I just wish there were a US-based hosting provider that used server-grade components, even at 2x Hetzner's price, because it would still be a steal.

(For those of you unaware of their pricing, you can get a Xeon E3 with 32GB of RAM for just 79 euros/mo.)

1. Uses desktop grade hardware (i.e. no ECC, single socket, limited networking, etc)

To be fair to them, they do offer servers with ECC for a (slightly) higher price:


Does ECC really make a difference in practice? Is it even worth the price? I don't have ECC in my iMac and I'm sure it'd do quite well if I used it as a web server...


tl;dr: yes, ECC does matter, a lot more than you'd guess!

It depends on your data, if you don't mind single bit errors in your data non-ECC is fine. If your data has to be perfect then you probably need ECC.

Do you run your iMac 24/7/365 with customer data on it that is frequently accessed?

ECC exists to prevent data corruption so that you don't have to restart your server.

Since I imagine you restart your iMac nearly daily, not having ECC isn't a problem.

I actually never restart my iMac. Once every 3-4 months I'd say. And not because it crashed, because I need to for software updates.

The value of ECC memory has little to do with how long your computer runs or how frequently data is accessed. Restarting a computer won't prevent the problems that ECC corrects. ECC memory is designed to detect and repair memory corruption caused by electrical/magnetic interference or problems with the memory hardware.
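To make "detect and repair" concrete, here is a toy Hamming(7,4) code in Python. It only illustrates single-bit error correction on 4-bit values; real ECC DIMMs work on wider words with a stronger code, and the function names here are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword.

    Codeword layout by position 1..7: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # checks positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # checks positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a single bit flip at position 5
    print(hamming74_decode(word))  # ([1, 0, 1, 1], 5)
```

Note the limit: two flipped bits in the same codeword get miscorrected by this scheme, which is why memory ECC adds an extra parity bit so double errors are at least detected.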

That kind of blows my mind. No ECC by default? Is it really a server with no ECC?

You can get servers with ECC RAM. All our servers there have 16 GB of ECC RAM. I have been with them for a few years now, and they have always acted very promptly on network issues (most of the time you don't even notice them).

ECC - very nice. Last I looked at Hetzner, ECC wasn't an option.

Question (since you're the OP): how do you deal with the huge latency to Germany from the USA?

This is more of a physics issue ("speed of light") than anything else.
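A back-of-the-envelope Python sketch of that physics floor. Assumptions: great-circle distance from New York to Nuremberg (one of Hetzner's German locations; coordinates approximate), and light in fiber travelling at roughly c/1.47. It comes out to about 6,400 km and a round-trip floor of a little over 60 ms; real routes are longer and add router hops, so measured RTTs will be higher.

```python
import math

SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47  # assumed typical value for optical fiber


def great_circle_km(a, b, radius_km=6371.0):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


def min_rtt_ms(distance_km, refractive_index=FIBER_REFRACTIVE_INDEX):
    """Lower bound on round-trip time over a perfectly straight fiber path."""
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return 2 * distance_km / speed_km_s * 1000


NEW_YORK = (40.71, -74.01)  # approximate coordinates
NUREMBERG = (49.45, 11.08)  # approximate coordinates

if __name__ == "__main__":
    d = great_circle_km(NEW_YORK, NUREMBERG)
    print(f"{d:.0f} km, at least {min_rtt_ms(d):.0f} ms RTT")
```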

how do you deal with the huge latency to Germany from the USA?

In some cases the location is actually an asset. Surprisingly, not everyone lives within the US.

More helpfully, the partial solution is to use either the Rackspace or HPCloud CDNs. Both are pretty cheap and both use Akamai, which gives you PoPs everywhere that matters. In my case (Australia), Amazon doesn't have a CloudFront PoP nearby, so using Amazon means I'm stuck with either the US West Coast or (even worse for routing reasons) Singapore.

If you are big enough then you might be able to find yourself a better CDN deal, but most of the cheaper ones don't have a POP down here.

There is certainly higher latency, but you can mitigate the effect by:

1. Making fewer HTTP requests by following best practices (for example, the YSlow recommendations).
2. When you start growing, moving static assets to a CDN, etc.
3. When you grow even more, moving servers to the US :)
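Why fewer requests matters more at high latency: a Python sketch with a deliberately crude model. The 6 parallel connections per host and the 100 ms RTT are assumptions for illustration, and the model ignores bandwidth, handshakes and server time.

```python
import math


def network_wait_ms(num_requests, rtt_ms, parallel_connections=6):
    """Very rough lower bound on time spent waiting on round trips.

    Simplified model (an assumption, not a browser spec): every request
    costs exactly one round trip, requests are issued in batches of
    `parallel_connections`, and bandwidth, TCP handshakes and server
    time are ignored.
    """
    if num_requests <= 0:
        return 0.0
    rounds = math.ceil(num_requests / parallel_connections)
    return rounds * rtt_ms


if __name__ == "__main__":
    rtt = 100  # hypothetical transatlantic RTT in ms
    print(network_wait_ms(60, rtt))  # 60 separate assets
    print(network_wait_ms(12, rtt))  # same page after bundling/spriting
```

Under this model, cutting 60 requests down to 12 takes the round-trip wait from 10 rounds to 2, and the saving scales directly with RTT.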

If you have a single-rack network, now your single point of failure is the rack switch or PDU. (This is why e.g. HDFS has rack-aware mode.)

If you have a cage, it's the datacenter (peering, power, environment, physical security.)

Do you need to care about these things? Probably not. (But maybe you do, and you happen to care less about price, or database write latency/throughput/predictability, or...) Pick whatever set of tradeoffs works for you.

You can have a single rack with redundant PDUs fed from two distinct power lines (UPS, etc.). Then you can use networking devices with redundant power supplies, or single-power stackable devices with multiple Ethernet connections. Same for servers: redundant power supplies, or servers in some HA configuration.

I'm not talking about Hetzner, but in general.

The top-of-rack switch becomes an issue.

Setting up link failover between switches (you can't bond for 2 Gbps, IIRC, if you are split across two different switches) is sort of kludgy, too.

One's best bet is to just have multiple locations with low latency between them, and then just do it all in software, and leave the n+x redundancy to BGP routes. It's a lot cheaper and works just as well.

Note that this is how the Big Boys do it, as well - but it works for two machines as easily as it does two million.
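The arithmetic behind "multiple locations" versus "one rack": components you depend on in series multiply their availabilities, while independent sites in parallel only all fail together. A Python sketch; the per-component figures are hypothetical, and the big assumption is independence (a shared upstream or the same provider-wide bug breaks it).

```python
from math import prod


def series_availability(*components):
    """All components must be up (e.g. server, rack switch, PDU)."""
    return prod(components)


def parallel_availability(per_site, sites):
    """At least one of `sites` identical, independent sites must be up."""
    return 1 - (1 - per_site) ** sites


if __name__ == "__main__":
    # Hypothetical per-component availabilities, for illustration only.
    one_rack = series_availability(0.999, 0.9995, 0.9995)
    print(f"single rack: {one_rack:.5f}")
    print(f"two locations: {parallel_availability(one_rack, 2):.7f}")
```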

You can in fact bond for 2gbps if you are on two different switches, in two completely different ways.

One way involves Cisco stacking switches, which let you use 802.3ad across two independent 'stacked' switches. You can also use the external PSU to provide redundant power to each switch (giving each switch redundant PSUs, on top of the switches themselves being redundant).

The second involves the Linux bonding driver in balance-rr mode. This has a slight bug with the bridge driver in that it sometimes won't forward ARP packets, but if you're just using it as a web head or whatever, you don't really care about those.

The 'big boys' do use iBGP etc. internally, but for a different reason: at large scale you can't buy a switch with a large enough MAC table (they run out of CAM), so you have routers at the top of your rack that then interlink. You can still connect your routers with redundant switches easily enough with VLANs and such (think router on a stick).

Yes, I was thinking exactly about stacking two independent switches (I've done it with Cisco 3750s, but you can also do it with other brands). The only problem is that with this kind of stack you're now dealing with one "logical" system, so if the firmware is buggy or someone issues the wrong command, you have a single point of failure (though this could also happen if an HA system goes wrong by itself or because of you).

I thought stackable switches provided HA with minimal fuss, and I fail to see what's kludgy about that. I don't see any reason to bond gigabit connections in an age when 10G connections are readily available, although AFAIK stacking is usually done via proprietary high-speed links.

The article makes some good points and is a good starting guide to setting up a dependable stack but I think the author downplays the skill, cost and time that something like Heroku can save. He states "not including developer time ofcourse[sic]."

For those not able to afford a fulltime sys admin that can be a significant expense and bring in unnecessary risk.

Skill: Can you read a HOWTO? http://tldp.org/HOWTO/HOWTO-INDEX/howtos.html

Cost: Cheaper, because you're doing the work yourself and only paying for a VPS or two.

Time: A weekend.

If you're running a start-up and you can't hire a sysadmin, yes, managed hosting is a good idea and will net you a reliable system for a decent price. But if you're spinning up test/hobby projects which aren't mission-critical, take the time to build your own stack/servers. It takes a minimal amount of time and energy and will give you valuable experience you can use for the rest of your career.

Despite potentially talking myself out of work, I highly recommend this approach.

Sysadmin is something that you can learn by doing, and any competent software developer should be able to pick up enough knowledge to manage the kind of simple deployment that a freshly minted startup needs.

I don't think you necessarily need a fulltime sysadmin - contract sysadmins do exist (hi!) and systems like Chef can be readily understood by most developers.

If I was moving a start-up from Heroku to self-managed hosting (which could even just be Linode VMs!) I'd include time to train them on what I was doing, and why, and I'd probably stay on retainer for emergency support.

Personally, I'm also more than happy to chat to local start-ups informally and share my experience. (And if anyone in Scotland, particularly the Edinburgh area, wants to take me up on that, my email's in my profile blurb.)

Sysadmin stuff and later scaling is a big learning curve. That's the same reason I'm moving to Heroku.

One thing to note about Hetzner, in addition to high US latency, is the initial setup cost. For the EX4 (Core i7-2700, 16 GB RAM, 6 TB HD, 49 euros/month), the one-time setup fee was 149 euros. However, I just checked and the setup cost for this server has dropped to 49 euros. I'm not sure if this is promotional or permanent.
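To compare the two setup fees, spread the one-time fee over how long you keep the box. A small Python sketch using the EX4 figures above; the 12-month horizon is an assumption.

```python
def effective_monthly_cost(monthly_eur, setup_eur, months):
    """Spread a one-time setup fee over the number of months you keep the server."""
    if months <= 0:
        raise ValueError("months must be positive")
    return monthly_eur + setup_eur / months


if __name__ == "__main__":
    # EX4 at 49 EUR/month, with the old (149 EUR) and new (49 EUR) setup fee.
    for setup in (149, 49):
        print(setup, round(effective_monthly_cost(49, setup, 12), 2))
```

Over a year, that works out to roughly 61.42 versus 53.08 euros per month; the longer you keep the server, the less the setup fee matters.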

I can add my experience with Hetzner. We had a RAM module that was failing, and it was replaced once they ran the check. We had a backup server to take the load in the meantime, so it wasn't a problem.

Wow, lots of people in the comments have hardware failures with Hetzner.

I guess they just write about the failures because the way Hetzner handled them was just great.

There are certainly many many more which never had a failure.

Well, that was in about 6-7 years of operation. Please see it in that perspective.

IMO a dependable stack requires a firewall in front of your servers. Sure, you can configure software firewalls on all of your servers, but it's nice to have an outer wall as well (defense in depth and all that). If Hetzner started offering that and private VLAN support, they would have a really killer offering.

I'm also concerned about the lack of a load balancer. I guess you could do something with a DNS service like Cloudflare, but that seems like a deal breaker for good uptime.

By the way - Hetzner lowered the setup fee, it used to be 149€, now it's 49€.

I had a VPS at Hetzner I replaced my Linode with. Really liked it. For the same price, though, you can get a really underpowered dedicated server at kimsufi.ie through OVH with more ram and HD space.

Both Hetzner and Kimsufi offer dedicated servers on really the same scale. The 49€ server, which is the top of the Kimsufi line and the bottom of the Hetzner line, is virtually the same machine, except one has a 2 TB disk and 24 GB of RAM, while the other has 2x 3 TB (RAID 1) and 16 GB.

I have the 14.99€ server as a development box that I access from the States. Truth be told, the disk I/O and processing speed are slower than a Linode 512, but for 2GB of RAM, 1TB of HDD and the transfer, you can't really beat it.

What's the reliability of kimsufi like? Did you ever have any interaction with their support staff?

I called Ireland once to immediately validate my account and they seemed friendly enough. OVH supports Kimsufi boxes. The dedicated server has been up without issues since I bought it. The server is hosted somewhere in Northern France near London as far as I can tell.

The only complaint I have about it is that the relay they have between France and the US is nearly always congested during US prime time. Because of this, download speeds from my server are really slow around 8 PM EST. Otherwise it's great.

tl;dr Development box and not in EU? Do it! Production box and not in EU? Maybe go with Hetzner or someone domestic.

OVH is REALLY cheap, not very reliable, but amazingly cheap for heavy machines. We use them for processing a lot.

In what way are they unreliable?

I've been having trouble with my existing UK dedicated server provider of late and am looking to move. I could get a lot more bang for my buck with OVH/kimsufi, but wouldn't want to move if they were more unreliable than what I have now.

Well, I hear that people have different experiences with them (some people here on HN have had zero problems). I have been with them for around 8 years, and I've had up to 50 servers there of different specs, from Kimsufi-type boxes to the very large storage machines (http://www.ovh.nl/dedicated_servers/hg_2011_xxl.xml). For us, the issues are:

a) The RAID degrades on EVERY SINGLE MACHINE. We get, per machine, one degrade every month, meaning that for 2-3 days of the month the machine is unusable. We asked support; they have no clue why, and they always fix it, but it takes 2-3 days because everything is very slow while it's resyncing.

b) We have rather a lot of network issues; the network just disappears for anywhere from minutes to an hour.

The other people on HN might not notice all of this because they have no (or no good) monitoring and don't check their sites often enough. For a blog or whatever, you won't notice the downtime: if your server is not busy you won't notice the resync, and if you don't have monitoring, odds are you won't notice that ~1 hour/month network issue. But we hammer our servers really hard (two very popular sites were hosted on a cluster of them), serving millions of hits/day. People notice when suddenly the ENTIRE cluster is gone for a while. And when a machine doing 100s of hits/sec goes into resync, you'll notice. So we are only running analytics there. For the price/power it's a really good deal; hard to beat.

Also, they are HUGE and growing fast. When I started with them in 2005, they would ONLY speak French: you could not ask for support in another language, you could not pay any way other than through a French bank, and so forth. And machines and network would be down often. They have improved a lot, so my guess is they will improve more over time, but they are a monster in hosting land.

Gah! An hour a month is too much for me.
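Putting "~1 hour/month" into availability terms: a quick Python calculation, assuming an average month of 730 hours. It does not count the 2-3 days of degraded performance during RAID resyncs, which hurt but aren't strictly downtime.

```python
HOURS_PER_MONTH = 365 * 24 / 12  # 730 h, averaged over a year


def monthly_availability(downtime_hours):
    """Fraction of an average month that the service is up."""
    return 1 - downtime_hours / HOURS_PER_MONTH


def allowed_downtime_minutes(target):
    """Downtime budget per average month for an availability target like 0.999."""
    return (1 - target) * HOURS_PER_MONTH * 60


if __name__ == "__main__":
    print(f"{monthly_availability(1):.4%}")              # ~1 h/month network outage
    print(f"{allowed_downtime_minutes(0.999):.1f} min")  # budget at "three nines"
```

One hour a month is about 99.86% availability, already outside a 99.9% target, which only allows roughly 44 minutes a month.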

Thanks for replying.

If you mostly have users in Europe like I do, then this is a no-brainer. I have been using it for over 2 years, and so far there have been only two glitches: once one of the HDDs simply vanished from my RAID array, and the other time the key switch burned out in the datacenter where my server was. About 30 minutes of downtime, and that was all.

I went through 5 different providers before settling on Hetzner.

I'd probably do DNS RR in front of the linux HA stuff for a load balancer.

Hmmm. 'Dependable' and 'a bunch of servers on the same rack' are mutually exclusive things.

I think that is a little unfair. The physical rack structure generally provides two things, a networking switch and power hookup, and both are two of the more reliable things that datacenters offer. In order for your application to survive a rack failure (either power cord unplugged or network switch breaking) then you need to have fully double every necessary part of your application on another rack, which is going to be pretty inconvenient.

Companies like Amazon and Google no doubt spend a lot of time thinking about the physical locations of servers and how failures might affect them in terms of uptime and data loss, but for your average small application I think it is OK to accept very small risks that will result in downtime, as opposed to spending massive effort engineering around them.

I also appreciate that services like Heroku handle stuff like this for you, but what I'd be really interested to see is a comparison of uptime between average dedicated machines at an average datacenter and a service like Heroku. While dedicated machines have failure cases (power outage, network switch breaks, one of your machines' hardware dies, hosting company has networking issues, etc.), AWS/Heroku have them too (AWS outage, DDoS attack against Heroku, an AWS/Heroku engineer makes a mistake, etc.).

Has anyone done any latency test between USA and Hetzner's data center (Germany) and can share their numbers?

Between pgpool and pgbouncer, go with pgbouncer.

Why do you recommend pgbouncer? Did you face any issues with pgpool?

I was just researching that yesterday: performance.

see for example http://www.last.fm/user/Russ/journal/2008/02/21/zd_postgres_...

In short: pgpool uses an old-fashioned, process-based Unix architecture, while pgbouncer is event-based, so it is usually a bit more performant. Both are reliable, so that should not make any difference.
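To illustrate what "event-based" means for a pooler: a toy Python asyncio sketch in which one single-threaded event loop multiplexes many client tasks onto a small, fixed set of server connections. This is not pgbouncer's actual code; the integer "connections" and the sleep standing in for a query are hypothetical placeholders.

```python
import asyncio


async def run_clients(num_clients, pool_size, query_time=0.01):
    """Serve num_clients using only pool_size shared 'server connections'.

    Returns (results, peak): results is a list of (client_id, conn_id) and
    peak is the maximum number of connections that were in use at once.
    """
    pool = asyncio.Queue()
    for conn_id in range(pool_size):
        pool.put_nowait(conn_id)  # hypothetical stand-ins for DB connections
    in_use = 0
    peak = 0

    async def client(n):
        nonlocal in_use, peak
        conn = await pool.get()  # wait (without blocking the loop) for a free connection
        in_use += 1
        peak = max(peak, in_use)
        try:
            await asyncio.sleep(query_time)  # stand-in for running a query
            return (n, conn)
        finally:
            in_use -= 1
            pool.put_nowait(conn)  # hand the connection back to the pool

    results = await asyncio.gather(*(client(n) for n in range(num_clients)))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_clients(100, 5))
    print(len(results), "clients served, peak connections in use:", peak)
```

A process-per-connection design instead dedicates an OS process to each client, which costs more memory and context switches as client counts grow.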

Did some tests. pgbouncer was faster. Something about pgpool using DML while pgbouncer operates at a much lower level.

I see that everything below the new EX6S has dropped by about 90 euros in setup fee. This is great news! I think I'll buy 6!

People have brought up reliability and that they are using consumer grade hardware. This is an issue if you have SPOF. If you have a fully distributed system (rare these days, for sure) it isn't much of an issue.

My current plan is to use DNS, with each box being a full stack (web app platform on top of Riak, with authoritative DNS on the box). So a web request might look up example.com and get back a list of authoritative name servers, NS1-6.exampledns.com. When the client then queries one of those authoritative servers, that server is in the cluster and returns the list of other servers in the cluster ranked by load (e.g., a response with multiple A records for the query). Then, when the client connects to the web server, it will hit the least busy node.
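The "rank the cluster by load" step could look roughly like this Python sketch. The load metric, the 192.0.2.0/24 documentation addresses and the function name are hypothetical, and nothing obliges resolvers or clients to preserve the order you return.

```python
def rank_a_records(node_loads, max_records=None):
    """Return node IPs ordered least-loaded first, for use as the A-record answer.

    node_loads maps IP address -> current load (any comparable number).
    Ties are broken by the IP string so the answer is deterministic.
    """
    ranked = sorted(node_loads, key=lambda ip: (node_loads[ip], ip))
    return ranked if max_records is None else ranked[:max_records]


if __name__ == "__main__":
    loads = {"192.0.2.10": 0.82, "192.0.2.11": 0.15, "192.0.2.12": 0.40}
    print(rank_a_records(loads))  # ['192.0.2.11', '192.0.2.12', '192.0.2.10']
```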

I wonder, though: if there are 5 authoritative name servers listed in the root for a given domain, will the root return them in the same order every time, such that my first authoritative DNS server (the one listed first at the domain's registrar) gets most of the DNS load? Or is there a way to have the root name servers randomize the order of the authoritative servers they give back to the client?

(Yes, all this will be open source, eventually. I've learned not to make promises about when; as soon as it's viable outside the lab.)

DNS load is typically fairly light, because it's just a few packets per hit. You certainly don't need 6x redundancy, and if one server gets most of the traffic it's probably no biggie.

There are a couple of caveats to your load balancing strategy. With enough headroom, these probably aren't total game breakers, but you should be aware of them. More at http://serverfault.com/questions/60553/why-is-dns-failover-n...

1) You shouldn't expect even or consistent load balancing across servers. Some caching DNS servers (such as those at large ISPs) have very many downstream consumers, and they won't do any randomization. If a large DNS server sees a new order of records, it might trigger a synchronous switch of 10% of your customer base from one server to another. This will cause spiky traffic.

2) You can't rely on any kind of sticky sessions. This may or may not be a problem, and many load balancers drop this guarantee as well for performance reasons, but it is certainly possible that a client will see a DNS record's TTL expire and switch to a new IP. If you aren't prepared for that, you may start dropping sessions.
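Point 1 is easy to see in a toy Python model with hypothetical numbers: if one caching resolver fronts most of your users and clients simply take the first address, load follows that resolver's cached ordering, and a single cache refresh moves all of those users at once.

```python
def load_per_server(resolvers, servers):
    """Estimate clients per server when every client uses the first A record.

    resolvers: list of (client_count, rotation) pairs. Each caching resolver
    hands the same cached answer, rotated by `rotation`, to all its clients.
    """
    load = {s: 0 for s in servers}
    for clients, rotation in resolvers:
        first = servers[rotation % len(servers)]
        load[first] += clients
    return load


if __name__ == "__main__":
    servers = ["A", "B", "C"]
    # One big ISP resolver plus a couple of small ones (hypothetical numbers).
    before = [(100_000, 0), (500, 1), (500, 2)]
    after = [(100_000, 1), (500, 1), (500, 2)]  # big resolver's cache refreshed
    print(load_per_server(before, servers))
    print(load_per_server(after, servers))
```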

DNS doesn't play particularly nicely when you try to load balance with it. You essentially end up with end users caching particular IP addresses, and either failing when they shouldn't or causing load imbalances on particular servers that you can't seem to fix.

You probably want to have an external dns host returning two ip addresses for a haproxy or LVS cluster, which you then route into your actual web tier.

IIRC the way to get clients to round robin connect to different servers is to have your DNS server(s) return multiple IP addresses for a given domain.
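Classic round-robin DNS is just that: return all the addresses, but rotate the order on each query so that clients picking the first one spread out. A minimal Python sketch; the class name and addresses are hypothetical, and caching resolvers may reuse or reorder the answer.

```python
from collections import deque


class RoundRobinResponder:
    """Toy authoritative answer source that rotates its A records per query."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def answer(self):
        current = list(self._records)
        self._records.rotate(-1)  # the next query starts with the next address
        return current


if __name__ == "__main__":
    dns = RoundRobinResponder(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
    for _ in range(3):
        print(dns.answer())
```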

I have no idea about how authoritative name servers work, but I'm assuming it's a prioritized list. I'd probably have all your authoritative servers provide all the IP addresses in any case.

Perhaps somebody should put some scripts together and sell them? I wonder how well CloudFoundry would run on this. The biggest concern is the database, since that'd be the SPOF that is hard to handle.

So let me sum this up:

Hetzner is comparable to Heroku and AWS, except that you have to do your own rack buildouts, private IP subnets, load balancing, redundancy zones, and CDN.

Is that right?

Yeah, except for the cost of 1.5 dynos a month, you get a Core i7-2600, 16 GB of RAM, and 6 TB of HD space.
