Basic Concepts of High Availability Linux (frankgroeneveld.nl)
101 points by frenkel on Mar 23, 2014 | 24 comments

This article is pretty thin. We are currently building out a clustered application on corosync/pacemaker with postgres synchronous replication and tomcat, and I have mixed feelings about Linux HA so far. It isn't too bad to get something basic set up (a cluster with a virtual IP, for instance), but when things don't work it can be difficult to figure out why. If you are looking for a distributed filesystem, GFS2 in this stack isn't bad. However, there seem to be a lot of differences between package versions. The interactions between versions of heartbeat, corosync, pacemaker, crm, pcs, your OCF resource definition files, the STONITH resources, cluster-glue and the underlying Linux packages make problems hard to track down, and much of the information you do find on the web is out of date. I've often had to resort to IRC or the mailing list to figure things out, and even then it sometimes seems like nobody knows. The whole thing feels a little shaky at first, but with enough effort it is possible to build a solid cluster on top of it.

If you truly care about reliability, finding a reliable, fully tested distribution of these packages is the way to go. Red Hat, SUSE and LINBIT offer certified, fully tested binary distributions and support. Companies like Percona also provide support for specific HA stacks, and their expertise can save you a lot of guesswork.

The available documentation for heartbeat, corosync and pacemaker can be disappointing. I found the best way to learn is by doing: if you can afford it, cobble together a lab environment and see how many different ways you can break a system. With pacemaker, I find that how things break depends heavily on the resources you are managing!

I feel your pain. I have been maintaining a probably larger cluster with a similar stack for a few years. Right now I am working on a simulation system that will enable testing arbitrary failure scenarios for arbitrary services across corosync/pacemaker in a virtualized environment: things like pulling cables, machine death, random upticks in latency, that sort of thing. This will also help to test new designs for new data centers, and corosync/pacemaker's performance over (potentially) high-latency links such as the internet when physical site / legal jurisdiction redundancy is also required.

Since it's unrealistic to expect average service authors to grok n layers of complexity in the networking/storage/service setup fabric and the related security processes, what I've done is reduce the problem to a rigid developer workflow that neatly segregates 'platforms', 'services' and 'cloud providers'. Cloud providers are either external commercial black boxes or your own infrastructure, and among them is one that implements corosync/pacemaker, writes your cluster rules for you, segments things neatly into VLANs, dynamically reconfigures routers, disklessly provisions physical nodes, maintains a detailed hardware inventory including all serial numbers, etc. While it's impossible to replicate real hardware precisely in simulation, I came to the conclusion that I basically needed this degree of automation to have any confidence that things are working and will continue to work.
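To give a flavor of what such a simulator feeds into the virtualized environment, here is a minimal sketch of a reproducible random fault schedule. The fault names, `FaultEvent` and `make_schedule` are hypothetical; a real harness would map each event to an action such as downing a virtual interface or killing a VM.

```python
import random
from dataclasses import dataclass

# Hypothetical fault kinds mirroring the scenarios described above.
FAULT_KINDS = ("pull_cable", "kill_node", "latency_spike")


@dataclass(frozen=True)
class FaultEvent:
    at_second: float  # when to inject the fault, relative to test start
    kind: str         # one of FAULT_KINDS
    node: str         # target node name
    duration: float   # seconds before the fault is healed


def make_schedule(nodes, n_events, horizon, seed=0):
    """Generate a time-ordered list of random faults within `horizon` seconds."""
    rng = random.Random(seed)  # seeded, so a failing scenario can be replayed exactly
    events = [
        FaultEvent(
            at_second=rng.uniform(0, horizon),
            kind=rng.choice(FAULT_KINDS),
            node=rng.choice(nodes),
            duration=rng.uniform(1, 30),
        )
        for _ in range(n_events)
    ]
    return sorted(events, key=lambda e: e.at_second)
```

For example, `make_schedule(["node-a", "node-b", "node-c"], 5, horizon=600, seed=42)` yields five faults over ten minutes, and rerunning with the same seed reproduces the identical scenario.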

Here's my experience in building HA in Linux. There are two key pieces: storage replication and failure detection. Replication ensures there's a standby system with the same persistent state, ready to go; failure detection is what triggers the switch to it, since the whole point of HA is to keep operating when something fails.

For storage replication, Linux has the excellent DRBD (http://www.drbd.org/) software, which replicates disks at the block device level. This is great because any kind of disk-based system can be supported: database servers, mail servers, file servers, DNS servers, etc.
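A toy model of what synchronous block-level replication means. It is loosely inspired by DRBD's fully synchronous mode (a write completes only once both disks have it) and its tracking of out-of-sync blocks, but `BlockDevice` and `MirroredDevice` are hypothetical in-memory classes, not DRBD's API.

```python
class BlockDevice:
    """In-memory stand-in for a disk: block number -> bytes."""

    def __init__(self):
        self.blocks = {}
        self.online = True

    def write(self, block_no, data):
        if not self.online:
            raise OSError("device unreachable")
        self.blocks[block_no] = data


class MirroredDevice:
    """Toy synchronous block mirror (hypothetical, not DRBD's API)."""

    def __init__(self, local, peer):
        self.local = local
        self.peer = peer
        self.out_of_sync = set()  # blocks the peer has missed

    def write(self, block_no, data):
        """Return True once both copies hold the data, False if running degraded."""
        self.local.write(block_no, data)
        try:
            self.peer.write(block_no, data)
        except OSError:
            # Peer unreachable: keep serving from the local disk, remember the block.
            self.out_of_sync.add(block_no)
            return False
        self.out_of_sync.discard(block_no)  # both sides now agree on this block
        return True

    def resync(self):
        """Copy every missed block to the peer once it is reachable again."""
        for block_no in sorted(self.out_of_sync):
            self.peer.write(block_no, self.local.blocks[block_no])
        self.out_of_sync.clear()
```

Because replication happens below the filesystem, the application on top (database, mail, DNS) needs no changes, which is why the approach is so general.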

For failure detection, Linux has the Linux HA Heartbeat (http://www.linux-ha.org/wiki/Heartbeat). It detects failure at the machine level and ensures proper failover.
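The core of machine-level failure detection is a timeout on periodic heartbeats. A minimal sketch, assuming a single `deadtime` threshold (the name borrows Heartbeat's ha.cf setting, but the class itself is hypothetical); the clock is injectable so the logic can be tested without sleeping.

```python
import time


class HeartbeatMonitor:
    """Declare a peer dead if no heartbeat arrived within `deadtime` seconds."""

    def __init__(self, deadtime, clock=time.monotonic):
        self.deadtime = deadtime
        self.clock = clock      # monotonic, so wall-clock jumps don't cause false alarms
        self.last_seen = {}     # node name -> time of last heartbeat

    def heartbeat(self, node):
        self.last_seen[node] = self.clock()

    def dead_nodes(self):
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.deadtime)
```

Choosing `deadtime` is the usual trade-off: too short and a busy or briefly congested node gets failed over needlessly; too long and real outages last longer.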

Within a machine, there are other tools to monitor process level failure and propagate the failure to Linux HA Heartbeat.
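A minimal sketch of a process-level check on POSIX systems: sending signal 0 tests whether a PID exists without actually signalling it. `check_and_report` and its `report_failure` callback are hypothetical glue standing in for whatever hook tells the cluster manager to fail over.

```python
import os


def process_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True      # exists, but belongs to another user
    return True


def check_and_report(pid, report_failure):
    """Call report_failure(pid) if the monitored process is gone."""
    if not process_alive(pid):
        report_failure(pid)
        return False
    return True
```

Note that "the PID exists" is weaker than "the service works"; a port or request check catches hung processes that this misses.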

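BTW, STONITH ("shoot the other node in the head", i.e. fencing a node by powering it off) is a super simple way to avoid the split-brain problem when the cluster partitions.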
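The rule STONITH enforces can be sketched as "fence first, then promote". `fence` and `start_services` are hypothetical callables (for example a power-switch or IPMI call, and a resource start hook); this only illustrates the ordering, not any real fencing agent.

```python
def try_takeover(peer, fence, start_services):
    """Take over the peer's resources only after it is confirmed powered off."""
    if not fence(peer):
        # Fencing failed: the peer may still be alive and writing to shared data,
        # so starting services here could cause split-brain. Stay passive.
        return False
    start_services()
    return True
```

The hard part in practice is making `fence` itself reliable when the network between the nodes is exactly what failed, which is the point the next reply raises.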

My a-ha moment for Linux HA + DRBD came when I realized you could use them to retrofit redundancy and failover onto just about any system. (Of course, this won't protect against your replicated filesystem becoming corrupted, and current versions of DRBD don't allow you to scale out.)

Firstly, heartbeat is out of date. Secondly, STONITH is not super simple: you need redundant network paths on each node to achieve it, and you need loads of testing.

A lot of tools are mentioned in both the article and this thread. So what is the simplest and best way to get a failover virtual IP assigned to cluster members? I don't want the tool to start services; it would be enough if it stopped sending traffic to a failed node, determined by simple logic such as whether port 80 is listening. Having a lot of alternatives is good but confusing, and having powerful tools is also good, but when only simple things need to be achieved, they take a lot of time to configure and are harder to troubleshoot. I prefer "keep it simple, stupid".
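The "is port 80 listening?" check itself is only a few lines. A minimal sketch using a plain TCP connect; how you wire its result into a VIP tool (for example a check script whose exit code the daemon acts on) depends on the tool and is not shown here.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

A successful connect proves something is accepting connections, not that it serves correct responses; an HTTP request to a health URL is a stronger check if you need one.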

The Linux HA Heartbeat (http://www.linux-ha.org/wiki/Heartbeat) package supports virtual IP takeover in case of a fail over.

In a two-machine cluster, you configure two network interfaces on each machine. One interface has the real IP of the machine, serving as the private IP. The other has the virtual IP, serving as the public IP that all clients connect to. Configure the second machine the same way, with the same virtual IP, but leave that interface disabled.

When Heartbeat detects a failure on the primary machine, it promotes the standby machine to primary and enables its virtual IP interface. The second machine then broadcasts a gratuitous ARP to all machines on the subnet, claiming the virtual IP address as its own.
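To make the ARP step concrete, here is a sketch that only builds the gratuitous ARP frame (an ARP request whose sender and target IP are both the VIP) that the new primary broadcasts. The MAC and IP in the usage line are placeholders; actually sending it needs a raw socket and root privileges, which is deliberately left out.

```python
import socket
import struct


def gratuitous_arp(mac, ip):
    """Build an Ethernet frame announcing that `ip` now lives at `mac`."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our source MAC, EtherType 0x0806 (ARP).
    ethernet = b"\xff" * 6 + mac_bytes + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                      # hardware type: Ethernet
        0x0800,                 # protocol type: IPv4
        6, 4,                   # hardware / protocol address lengths
        1,                      # operation: request
        mac_bytes, ip_bytes,    # sender: our MAC claiming the VIP
        b"\x00" * 6, ip_bytes,  # target IP = sender IP is what makes it gratuitous
    )
    return ethernet + arp
```

For example, `gratuitous_arp("02:00:00:00:00:01", "192.0.2.10")` returns a 42-byte frame. Since it is a layer 2 broadcast, it only reaches hosts in the same broadcast domain, which is why VIP failover constrains where the servers can sit on the network.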

It's a pretty simple process.

I think the same problem applies to both ucarp and heartbeat: they do not check service status, if I am not wrong. Also, heartbeat is stated to be deprecated in favor of corosync, but corosync seems more complicated.

If all you want is virtual IP failover, the simplest tool I know of is ucarp as mentioned by pandemicsyn below. That's basically all it does.

Note that ucarp and most (all?) virtual IP tools rely on being able to send fake ARP to update the IP, which puts restrictions on where the servers can live on the network, switch configuration etc.

edit: If what you're using it for is HTTP traffic, HAproxy seems to be the most mature tool out there for HTTP load balancing and failover.
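The failover part of HTTP load balancing boils down to skipping backends that fail health checks. A toy round-robin model of that behavior; `HealthyRoundRobin` is hypothetical and is not how HAProxy is implemented internally.

```python
import itertools


class HealthyRoundRobin:
    """Rotate through backends, skipping any currently marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._cycle = itertools.cycle(self.backends)

    def mark(self, backend, healthy):
        """Record the latest health-check result for a backend."""
        if healthy:
            self.down.discard(backend)
        else:
            self.down.add(backend)

    def pick(self):
        # Look at most one full rotation ahead for a healthy backend.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no healthy backends")
```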

> If what you're using it for is HTTP traffic, HAproxy seems to be the most mature tool out there for HTTP load balancing and failover.

HAproxy is a reverse proxy. It can do load balancing, but on its own it doesn't give you an HA cluster, because HAproxy becomes the single point of failure in your architecture. You still need something like wackamole, vippy or ucarp to switch over from a failed machine to a standby box.
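Some rough arithmetic shows why one proxy in front of redundant backends caps your availability. A sketch assuming each proxy fails independently with the same availability, an assumption that shared switches, power or config bugs often break.

```python
HOURS_PER_YEAR = 24 * 365


def parallel_availability(single, replicas):
    """Probability that at least one of `replicas` independent nodes is up."""
    return 1 - (1 - single) ** replicas


def downtime_hours_per_year(availability):
    return (1 - availability) * HOURS_PER_YEAR
```

With 99% per box, one proxy means roughly 87.6 hours of downtime a year, while two with working failover bring that to about 0.876 hours (around 53 minutes), before counting the time the switchover itself takes.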

I personally hate this model. Layer 2 networking architectures like this need to finally die off, as they are extremely complex and difficult to troubleshoot.

For high-availability HAProxy I've found ECMP to be by far the simplest way to achieve very reliable redundancy, with a side benefit of more or less infinite horizontal scaling (depending on what routers you're talking to). I've served hundreds of gigabits with this model, and it works well. You can even scale it out globally by using anycast.

It works pretty simply. Run bgpd on your HAProxy boxes, peering with your router. Each HAProxy box advertises your VIP, and your router load-balances across them via ECMP. Should an HAProxy machine die, its BGP announcement gets withdrawn and traffic flows to the remaining proxy servers still advertising the VIP.
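A sketch of the router side: ECMP typically hashes each flow's 5-tuple to choose among the next hops announcing the VIP, so all packets of one TCP connection land on the same HAProxy box. SHA-256 here is only for illustration; real routers use their own (usually much cheaper) hash functions.

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Pick a next hop for `flow` = (src_ip, src_port, dst_ip, dst_port, proto)."""
    if not next_hops:
        raise ValueError("no next hops are advertising the VIP")
    key = "|".join(map(str, flow)).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return sorted(next_hops)[digest % len(next_hops)]  # stable for a given hop set
```

When a box withdraws its route, the hop set shrinks and plain modulo hashing can move some flows that were on healthy boxes too; some platforms offer resilient hashing to limit that reshuffle.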

The only thing left to do is set up some sort of service monitoring that can automatically take down bgpd should haproxy die or otherwise misbehave on an individual machine. Add a couple of checks to ensure there is always a "path of last resort" in case you have a bug in your app or monitoring code, and this proves to be very resilient and scalable, and is something nearly anyone can troubleshoot in a very short amount of time. It also works well in a cloud-style on-demand/devops model, as it's extremely easy to spin up additional haproxy machines and have them automatically announce the VIP via bgpd.
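One way to read the "path of last resort" rule, as a hypothetical decision function: withdraw the VIP when the local proxy is unhealthy, unless no other box is healthy either, so a bug in the check cannot withdraw the route everywhere at once. How a node learns `healthy_peers` is left open and is an assumption of this sketch.

```python
def should_announce(local_healthy, healthy_peers):
    """Decide whether this proxy keeps announcing the VIP over BGP."""
    if local_healthy:
        return True
    # Path of last resort: if every box looks unhealthy, the check itself is
    # the likelier culprit, so keep announcing rather than black-hole the VIP.
    return healthy_peers == 0
```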

I looked at ucarp; however, as far as I understand, it only checks whether a node is alive or down, not whether a particular port or service is down.

No mention of Wackamole¹ in the article, and I feel really compelled to mention it here, as it is really simple to set up an HA cluster with it. I followed this howto² a few months ago. It was really easy to configure and has been running stably ever since.



And there is also vippy, which is written in node.js and looks really good, but I haven't had the chance to try it yet.


We use Spread and Wackamole. I can't say I love them, but they certainly work with not-too-many-quirks.

Linux High Availability is certainly still used a lot within application infrastructures, especially in some of the smaller ones. However what I find more interesting are the architectures that are performing all of the availability functions in the software layer such as in the application or database code, as typically these are simpler and offer far better scalability.
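A minimal example of availability handled in the software layer: the client itself tries replicas in order instead of relying on a floating IP. `call_with_failover` and the `request` callable are hypothetical; real client libraries add timeouts, backoff and health memory on top of this.

```python
def call_with_failover(endpoints, request):
    """Return the first successful request(endpoint); raise if all fail."""
    errors = []
    for endpoint in endpoints:
        try:
            return request(endpoint)
        except OSError as exc:  # connection-level failures: try the next replica
            errors.append((endpoint, exc))
    raise RuntimeError(f"all endpoints failed: {errors}")
```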

It's not mentioned, but ucarp is handy for those times when you want to float a VIP between two boxes for a bit of redundancy but don't need anything super intelligent (e.g. where a bit of flapping is OK).
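A rough model of how CARP-style tools like ucarp decide who holds the VIP, assuming ucarp's `--advbase`/`--advskew` semantics (an advertisement every advbase + advskew/256 seconds, where the node advertising most often wins) and a dead ratio of 3 missed intervals. Treat these timer rules as assumptions to check against ucarp's own documentation; this is a simplification, not its code.

```python
def advert_interval(advbase, advskew):
    """Seconds between advertisements: base interval plus skew in 1/256 s."""
    return advbase + advskew / 256


def elect_master(nodes):
    """`nodes` maps name -> (advbase, advskew); the shortest interval wins."""
    return min(nodes, key=lambda n: (advert_interval(*nodes[n]), n))


def master_is_down(seconds_since_last_advert, advbase, dead_ratio=3):
    """A backup assumes the master is gone after ~dead_ratio missed intervals."""
    return seconds_since_last_advert > dead_ratio * advbase
```

Short intervals mean faster takeover but more flapping on a lossy link, which is why this simple model suits cases where a bit of flapping is acceptable.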


OpenVPN Access Server ships with ucarp integration out of the box, and for the most part it just works. Keepalived is another common daemon for managing floating VIPs. If I understand correctly, ucarp was originally designed to support redundant routers and firewalls, but in my experience these tools are great for moving VIPs between stateless services like load balancers and reverse proxies.

Agreed. ucarp is almost trivial to deploy:


Great overview of the basic concepts, thanks!

Server 500 error. Now that's irony.

Highly available 500 error, though! (It seems to be back up.)

Install 5 different things, or just run Erlang on all nodes.
