Hacker News new | past | comments | ask | show | jobs | submit login
Powerset's Erlang-based clustering technology is now open source (github.com/kirindave)
85 points by quellhorst on June 5, 2008 | hide | past | favorite | 11 comments

I'm the architect for Fuzed. I thought I'd explain how powerset uses it production for the main site.

Fuzed sits in front of processes that run Ruby, and are bound to PARC's parsing and grammar analysis libraries. These self-identify to the system and each report their own unique API (the services are heterogenous in both function and version).

Every time you fire a query at powerset, it is sent via JSON-RPC to the frontend of our internal fuzed instance, which then does query analysis before forwarding it to the part of the system that actually does search retrieval. For every one query fired at the powerset frontend, our fuzed cluster takes 2-3 hits, so it has to scale pretty well. So far it's performed admirably even under load. The Erlang runtime is proving to be extremely stable.

Hey. I started playing with Fuzed after the Railsconf presentation and I have to say that I'm impressed. I got a 10 node instance running in no time and it was nicely balanced. I'll be submitting some patches in the future before I fork it on Github. Major kudos to you guys for releasing it! Now with Fuzed and PoolPartyRb, there's no more excuses for not being able to scale easily.

PS: The headline of this post is a bit misleading since Fuzed is tied to Rails as well (Chassis binding)... not just Erlang.

Well, it is and it isn't. Powerset doesn't use the rails functionality at all right now.

P.S., we want more language and framework bindings. I have about half a mzscheme erlang binding done (Erlenmeyer on my github) because I'd love to see arc bindings.

What are the advantages of fuzed over other load balancing options already available? That you can have more complicated logic in the frontend to backend mapping?

This is one. Unlike a lot of existing systems that are "implicit" clusters (e.g., nginx+mongrels, HW load balancer over nginxs over mongrels, etc in that vein), Fuzed is an "explicit" cluster.

So it does things that normal routing doesn't do very well:

1. Fuzed restarts rack processes that become unresponsive or die suddenly as part of its normal operation.

2. Fuzed's dispatcher actually knows when your rack instances are busy and when they aren't. It won't route requests to a busy instance. This ends up being a very good scheduling algorithm for things like rails because of the variable return times.

3. Fuzed clusters reqire almost 0 configuration, they need to be able to do a DNS lookup to a master node, and that's pretty much it.

4. Fuzed can simultaneously host multiple versions of a site and monitor all of them individually OR in groups. Powerset uses the api-mode to host multiple versions of software as we roll out or do performance testing on new hardware or software.

5. Fuzed natively supports multiple frontends that serve out over the same hardware. Usually traditional approaches don't scale horizontally over this dimension without cloning configurations.

I could name some other features that normal clustering approaches need a lot of secondary software to achieve, but I've got a blog post coming up soon that should go over it in more detail.

Cool, thanks. So it sounds like perlbal + monit/god + management aids. A pretty compelling package.

Thanks for the information. It is interesting, but perhaps not as interesting as some of the other problems Powerset is working on.

The nagging voice at the back of my head asks if this is a case of scaling your architecture before you need to?

Well, Powerset's current frontend manages over 2^9's cores, which is even more processes to monitor and route traffic to. Fuzed manages about a 3rd of these and ensures that they are up and traffic is rerouted in the event that the C code that constitutes our NL analysis crashes or consumes too much memory (it can), or a machine drops, or a nic crashes.

Fuzed is cool for this, and a big win for Powerset, because you can scale as you go, you can very nearly just drop more hardware in with practically no global configuration. Search Engines are a special case of websites because they have to scale very broad, very early.

But Fuzed is useful and usable from one box to hundreds, and it's idea for dynamic environments like EC2 where you need to stay very light on your feet to minimize cost. That's originally why I wrote it.

I've been playing with this for a couple of days now on EC2 - I was able to spin up a minimal stack in a few hours from nothing and just keep adding nodes to make it keep being able to take more traffic - it's pretty sweet.

I've packaged a couple of EC2 AMIs (32 and 64 bit versions) that I'll make public in a day or two and a script that uses Net::SSH to automate most of the spinning process. I just finished the video for a screencast on this and I went from zero to a 9 node cluster that could serve 300 req/sec on a non-cached Rails app in development mode, all in about 12 minutes.

It really is fun to play with - now to find a reason to use it...

I have one question about Fuzed. (Turned out to be a few questions, oops.)

What happens when a Master node dies? (Or, say it's EC2 instance kicks the bucket, rare but possible. Or in the case of a non-EC2 deployment the hardware just fails.)

My limited understanding of Erlang leads me to think the unhandled death of a Master node is the death of all the nodes in the cluster.

Further, I understand this does not have to be. I remember there being some way for another node to respond to the death of another node, handle that death, and let things continue along.

Is anything like this set up in Fuzed? Would this be handled by having multiple master nodes for a cluster, and they watch each other?

Would there be any difference in setting up Fuzed to handle the death of the Master node process and the disappearance of the Master node instance?

I have a few answers for you:

When a master node dies, all the worker nodes go into a hibernation state, pinging the master's previous hostname until they can reconnect to that host, and then they reregister their resources with the master's resource_fountain.

One of the next major features for fuzed is handling the master as a single point of failure. We're exploring two options currently:

1. When a master dies the cluster can re-elect a new master on one of the machines in the cloud and everyone re-registers their assets. This process assumes that master death is relatively rare, and so minimizes the resources necessary for redundant operation at the cost of a slight time gap in service while the master is re-elected and resources are re-registered to it.

2. Clusters can create numerous masters which all maintain identical state. When one dies, another master will move forward and become the primary master. This approach requires more hardware in the cloud, but even if the master faults are common they don't allow for a gap of service.

As for the difference between process death and machine death, yes there are differences. If you're interested in how we handle it, please check out master_beater.erl (great name, huh?), which is a gen_fsm that worker nodes use to eagerly reconnect to the master. Also check out the fuzed.ap and fuzed_supervisor.erl. Erlang provides very good resources for handling this.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact