Fuzed sits in front of processes that run Ruby and are bound to PARC's parsing and grammar analysis libraries. These self-identify to the system, each reporting its own unique API (the services are heterogeneous in both function and version).
Every time you fire a query at Powerset, it is sent via JSON-RPC to the frontend of our internal fuzed instance, which performs query analysis before forwarding the query to the part of the system that does the actual search retrieval. For every query that hits the Powerset frontend, our fuzed cluster takes 2-3 hits, so it has to scale pretty well. So far it has performed admirably, even under load, and the Erlang runtime is proving to be extremely stable.
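To make the JSON-RPC hop concrete, here is a generic request of the kind a frontend might send. The method name (`query/analyze`) and parameters are hypothetical placeholders; the post doesn't show fuzed's actual wire format:

```ruby
require 'json'

# Build a generic JSON-RPC-style request body. The method name and
# params here are made up for illustration, not fuzed's real protocol.
def rpc_request(method, params, id)
  JSON.generate({ 'method' => method, 'params' => params, 'id' => id })
end

payload = rpc_request('query/analyze', [{ 'q' => 'capital of France' }], 1)
puts payload
```

The frontend would POST a body like this to the fuzed instance and match the `id` in the response to the request it sent.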
PS: The headline of this post is a bit misleading, since Fuzed is tied to Rails as well (via the Chassis binding), not just Erlang.
P.S. We want more language and framework bindings. I have about half of an mzscheme Erlang binding done (Erlenmeyer on my GitHub) because I'd love to see Arc bindings.
So it does things that normal routing doesn't do very well:
1. As part of its normal operation, Fuzed restarts Rack processes that become unresponsive or die suddenly.
2. Fuzed's dispatcher actually knows when your Rack instances are busy and when they aren't, and it won't route requests to a busy instance. This turns out to be a very good scheduling strategy for things like Rails, where response times vary widely.
3. Fuzed clusters require almost zero configuration: nodes just need to be able to do a DNS lookup of the master node, and that's pretty much it.
4. Fuzed can simultaneously host multiple versions of a site and monitor all of them individually OR in groups. Powerset uses the API mode to host multiple versions of the software as we roll out changes or run performance tests on new hardware or software.
5. Fuzed natively supports multiple frontends serving over the same hardware. Traditional approaches usually don't scale horizontally along this dimension without cloning configurations.
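Point 2 is the one that matters most for Rails-style workloads. A minimal sketch of busy-aware dispatch, assuming a simple idle/busy pool (illustrative only; fuzed's real dispatcher is written in Erlang):

```ruby
# Busy-aware dispatcher sketch: requests only go to idle workers, so a
# slow request never queues other requests behind it. This is a toy
# illustration of the idea, not fuzed's actual implementation.
class Dispatcher
  def initialize(workers)
    @idle = workers.dup   # workers waiting for a request
    @busy = []            # workers currently serving one
  end

  # Hand a request to an idle worker, or report that none is free.
  def dispatch(request)
    worker = @idle.shift
    return :no_idle_worker unless worker
    @busy << worker
    worker.call(request)
  end

  # A worker checks back in when it finishes its request.
  def release(worker)
    @busy.delete(worker)
    @idle << worker
  end
end
```

Contrast this with round-robin, where a request can land behind a five-second Rails action; checking workers back in only when they finish sidesteps the variable-response-time problem entirely.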
I could name some other features that normal clustering approaches need a lot of secondary software to achieve, but I've got a blog post coming up soon that should go over it in more detail.
The nagging voice at the back of my head asks: is this a case of scaling your architecture before you need to?
Fuzed is cool for this, and a big win for Powerset, because you can scale as you go: you can drop in more hardware with practically no global configuration. Search engines are a special case among websites because they have to scale very broadly, very early.
But Fuzed is useful and usable from one box to hundreds, and it's ideal for dynamic environments like EC2, where you need to stay very light on your feet to minimize cost. That's originally why I wrote it.
I've packaged a couple of EC2 AMIs (32- and 64-bit versions) that I'll make public in a day or two, along with a script that uses Net::SSH to automate most of the spin-up process. I just finished recording a screencast on this: I went from zero to a 9-node cluster serving 300 req/sec on a non-cached Rails app in development mode, all in about 12 minutes.
It really is fun to play with - now to find a reason to use it...
What happens when a Master node dies? (Say its EC2 instance kicks the bucket, rare but possible. Or, in a non-EC2 deployment, the hardware just fails.)
My limited understanding of Erlang leads me to think the unhandled death of a Master node is the death of all the nodes in the cluster.
Further, I understand this doesn't have to be the case. I remember there being some way for one node to respond to the death of another, handle that death, and let things continue along.
Is anything like this set up in Fuzed? Would this be handled by having multiple master nodes for a cluster, and they watch each other?
Would there be any difference in setting up Fuzed to handle the death of the Master node process and the disappearance of the Master node instance?
When a master node dies, all the worker nodes go into a hibernation state, pinging the master's previous hostname until they can reconnect to that host, at which point they re-register their resources with the master's resource_fountain.
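The hibernate-and-ping behavior can be sketched as a simple retry loop. This is a Ruby illustration of the idea only; the real version is an Erlang gen_fsm inside fuzed:

```ruby
# Sketch of a worker's reconnect loop after the master disappears.
# connect and register are injected callables standing in for the real
# network operations (hypothetical names, for illustration only).
def reconnect_loop(master_host, max_attempts, connect, register)
  max_attempts.times do
    if connect.call(master_host)  # ping the master's old hostname
      register.call               # re-register resources on success
      return :connected
    end
    # the real system would sleep between pings here
  end
  :hibernating                    # still waiting for the master
end
```

The key property is that workers never give up their own state; they simply loop until the master's hostname answers again, then re-register.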
One of the next major features for fuzed is handling the master as a single point of failure. We're exploring two options currently:
1. When a master dies the cluster can re-elect a new master on one of the machines in the cloud and everyone re-registers their assets. This process assumes that master death is relatively rare, and so minimizes the resources necessary for redundant operation at the cost of a slight time gap in service while the master is re-elected and resources are re-registered to it.
2. Clusters can run numerous masters which all maintain identical state. When one dies, another master steps forward and becomes the primary. This approach requires more hardware in the cloud, but even if master faults are common there is no gap in service.
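For option 1, the re-election step could be as simple as a deterministic pick that every surviving node computes identically, so no extra coordination round is needed. A toy illustration (not fuzed's planned algorithm), assuming every node knows the same live-node list:

```ruby
# Toy leader election: every survivor drops the dead master from the
# shared node list and independently picks the same winner, here the
# lexicographically smallest node name (illustrative only).
def elect_master(nodes, dead_master)
  (nodes - [dead_master]).min
end
```

Because every node runs the same pure function on the same inputs, they all agree on the new master without exchanging votes; the service gap is just the time to notice the death and re-register resources.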
As for the difference between process death and machine death: yes, there are differences. If you're interested in how we handle it, please check out master_beater.erl (great name, huh?), a gen_fsm that worker nodes use to eagerly reconnect to the master. Also check out fuzed.app and fuzed_supervisor.erl. Erlang provides very good facilities for handling this.