

Powerset's Erlang-based clustering technology is now open source - quellhorst
http://github.com/KirinDave/fuzed/tree/master

======
KirinDave
I'm the architect for Fuzed. I thought I'd explain how Powerset uses it in
production for the main site.

Fuzed sits in front of processes that run Ruby and are bound to PARC's
parsing and grammar-analysis libraries. These self-identify to the system and
each reports its own unique API (the services are heterogeneous in both
function and version).

Every time you fire a query at powerset, it is sent via JSON-RPC to the
frontend of our internal fuzed instance, which then does query analysis before
forwarding it to the part of the system that actually does search retrieval.
For every one query fired at the powerset frontend, our fuzed cluster takes
2-3 hits, so it has to scale pretty well. So far it's performed admirably even
under load. The Erlang runtime is proving to be extremely stable.

~~~
nickb
Hey. I started playing with Fuzed after the Railsconf presentation and I have
to say that I'm impressed. I got a 10 node instance running in no time and it
was nicely balanced. I'll be submitting some patches in the future before I
fork it on Github. Major kudos to you guys for releasing it! Now with Fuzed
and PoolPartyRb, there are no more excuses for not being able to scale easily.

PS: The headline of this post is a bit misleading since Fuzed is tied to Rails
as well (Chassis binding)... not just Erlang.

~~~
KirinDave
Well, it is and it isn't. Powerset doesn't use the Rails functionality at all
right now.

P.S., we want more language and framework bindings. I have an mzscheme Erlang
binding about half done (Erlenmeyer on my GitHub) because I'd love to see Arc
bindings.

------
schacon
I've been playing with this for a couple of days now on EC2. I was able to
spin up a minimal stack in a few hours from nothing and just keep adding nodes
to handle more traffic. It's pretty sweet.

I've packaged a couple of EC2 AMIs (32- and 64-bit versions) that I'll make
public in a day or two, along with a script that uses Net::SSH to automate
most of the spin-up process. I just finished the video for a screencast on
this: I went from zero to a 9-node cluster serving 300 req/sec on a non-cached
Rails app in development mode, all in about 12 minutes.

It really is fun to play with - now to find a reason to use it...

------
collin
I have one question about Fuzed. (Turned out to be a few questions, oops.)

What happens when a Master node dies? (Say its EC2 instance kicks the bucket,
which is rare but possible, or, in a non-EC2 deployment, the hardware just
fails.)

My limited understanding of Erlang leads me to think the unhandled death of a
Master node is the death of all the nodes in the cluster.

Further, I understand this does not have to be the case. I remember there
being some way for one node to respond to the death of another, handle that
death, and let things continue along.

Is anything like this set up in Fuzed? Would this be handled by having
multiple master nodes for a cluster, and they watch each other?

Would there be any difference in setting up Fuzed to handle the death of the
Master node process and the disappearance of the Master node instance?

~~~
KirinDave
I have a few answers for you:

When a master node dies, all the worker nodes go into a hibernation state,
pinging the master's previous hostname until they can reconnect to that host;
then they re-register their resources with the master's resource_fountain.

One of the next major features for Fuzed is eliminating the master as a single
point of failure. We're currently exploring two options:

1. When the master dies, the cluster re-elects a new master on one of the
machines in the cloud and everyone re-registers their assets. This approach
assumes master death is relatively rare, and so minimizes the resources needed
for redundant operation, at the cost of a small gap in service while the
master is re-elected and resources are re-registered to it.

2. Clusters run multiple masters that all maintain identical state. When one
dies, another steps forward and becomes the primary master. This approach
requires more hardware in the cloud, but there is no gap in service even if
master faults are common.
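For what it's worth, option 1 can be as simple as a deterministic pick among the survivors. A toy Ruby sketch (lowest node name wins, which is a common trick; this is not necessarily what Fuzed will do, and all the names below are invented):

```ruby
# Toy sketch of re-election: survivors deterministically agree on a new
# master (smallest node name), then re-register their assets with it.
def fail_over(nodes, dead_master, assets)
  survivors  = nodes - [dead_master]
  new_master = survivors.min                         # everyone computes the same answer
  registry   = survivors.flat_map { |n| assets[n] }  # assets re-registered with the new master
  [new_master, registry.sort]
end

nodes  = %w[worker2 worker1 master0 worker3]
assets = {
  'worker1' => ['ruby-api-v1'],
  'worker2' => ['ruby-api-v2'],
  'worker3' => ['ruby-api-v1'],
  'master0' => []
}
new_master, registry = fail_over(nodes, 'master0', assets)
```

Because every survivor computes the same minimum, no coordination round is needed beyond agreeing on the live-node list, which is the hard part in practice.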

As for the difference between process death and machine death: yes, there are
differences. If you're interested in how we handle it, check out
master_beater.erl (great name, huh?), a gen_fsm that worker nodes use to
eagerly reconnect to the master. Also check out fuzed.ap and
fuzed_supervisor.erl. Erlang provides very good facilities for handling this.

