

How We Made GitHub Fast: A detailed look at GitHub's new architecture - mojombo
http://github.com/blog/530-how-we-made-github-fast

======
rabbitmq
Hi, alexis here from RabbitMQ.

Yes, we implement AMQP. We also provide support for other useful things like
STOMP and HTTP Pubsubhubbub. We implement these other protocols as well as
AMQP because sometimes people don't need to use the full and awesome power of
the AMQP model.

AMQP _is_ initially hard to grok. I think the main reason for this is that
AMQP combines three things: Queues, Pubsub, and Messaging. These are Not The
Same. Queues manage data in flight as state, Pubsub routes data to consumers,
and Messaging frames it.
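A toy sketch can make that split concrete (plain Python, not real AMQP; the exact-match routing below stands in for a direct exchange):

```python
from collections import defaultdict, deque

# 1. Messaging: framing - a message is just labeled bytes.
def frame(routing_key, body):
    return {"routing_key": routing_key, "body": body}

# 2. Pubsub: routing - an "exchange" copies each message to every queue
#    whose binding key matches (here: exact match, like a direct exchange).
bindings = defaultdict(list)   # binding key -> queue names
queues = defaultdict(deque)    # queue name  -> messages in flight (state)

def bind(binding_key, queue_name):
    bindings[binding_key].append(queue_name)

def publish(msg):
    # 3. Queues: the broker holds data in flight as state until consumed.
    for queue_name in bindings[msg["routing_key"]]:
        queues[queue_name].append(msg)

bind("stock.ibm", "trader-1")
bind("stock.ibm", "audit")
publish(frame("stock.ibm", b"142.10"))
assert [m["body"] for m in queues["trader-1"]] == [b"142.10"]
```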

So yes, as someone pointed out above, it would be nice to use some but not all
of this from time to time. We are working on ways to make that super easy -
please get in touch if you can help.

Another thing that people find hard is figuring out when to use message hub
technology, and when to use a database as a hub. Using a database to queue and
manage subscriptions to data streams is generally Not A Good Idea. Here’s a
presentation I did which attempts to articulate some of the issues with this:
[http://www.rabbitmq.com/resources/RabbitMQ_Oxford_Geek_Night...](http://www.rabbitmq.com/resources/RabbitMQ_Oxford_Geek_Night.pdf)

So, for someone using AMQP or any other Pubsub tech for the first time, there
can be a 'huh, where do I start' element. But as some commenters point out, if
you look at the client libraries it may be easier to get started. We've
actually lost count of how many clients there are, so take your pick.

List of clients: <http://delicious.com/alexisrichardson/rabbitmq+client>

Getting started: <http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/>
(Python centric) and <http://www.infoq.com/articles/AMQP-RabbitMQ> (Ruby
centric)

To the commenter who said the AMQP spec is 300 pages long: you may have a
better time if you look at AMQP 0-9-1, which is much shorter at around 40
pages, mostly covering edge cases that you can ignore. The nub of AMQP can be
communicated in under a page.
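One way to see how small that nub is: the 0-9-1 general frame format is just a type octet, a channel number, a payload length, the payload, and a frame-end octet (0xCE). A sketch of that framing layer (not a full codec, and with no method or content encoding):

```python
import struct

FRAME_END = 0xCE  # every AMQP 0-9-1 frame ends with this octet

def encode_frame(frame_type, channel, payload):
    # type (1 byte) | channel (2 bytes) | size (4 bytes) | payload | 0xCE
    return struct.pack(">BHI", frame_type, channel, len(payload)) \
        + payload + bytes([FRAME_END])

def decode_frame(data):
    frame_type, channel, size = struct.unpack(">BHI", data[:7])
    assert data[7 + size] == FRAME_END, "malformed frame"
    return frame_type, channel, data[7:7 + size]

raw = encode_frame(1, 0, b"hello")  # frame type 1 is a method frame
assert decode_frame(raw) == (1, 0, b"hello")
```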

BERT and BERT-RPC look cool. But - re the comments above - I would not see
BERT-RPC as an ‘alternative’ to AMQP. The GitHub blog post talks about
PB and Thrift and JSON-RPC, all of which have been integrated with RabbitMQ.
If you want to do RPC, there is no ‘one true system’ yet. Typically we have
found that different people get value from different RPC metaphors in
different languages. Maybe BERT-RPC will get more traction than the others -
it definitely looks interesting.

I hope this is all useful or at least of passing interest. Here are some more
links that may be worth a glance:

General background: <http://www.rabbitmq.com/how.html>

AMQP and XMPP:
<http://www.igvita.com/2009/10/08/advanced-messaging-routing-with-amqp/>

Feel free to contact us directly at info at rabbitmq dot com.

Cheers,

alexis

~~~
vidarh
As it turns out, the AMQP 0-9-1 spec is only that short because the protocol
definition is split out into a separate 139-page document.

In contrast, the full STOMP spec fits on a page.

Of course they are vastly different in scope, but that is kind of the point.
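To illustrate the scope difference: a complete STOMP frame is just a command line, header lines, a blank line, a body, and a NUL terminator, so an encoder takes a few lines (a sketch of the STOMP 1.0 frame shape, ignoring header-escaping details):

```python
# A STOMP frame: COMMAND, then header lines, a blank line, body, NUL.
def stomp_frame(command, headers, body=""):
    header_block = "".join(f"{k}:{v}\n" for k, v in headers.items())
    return f"{command}\n{header_block}\n{body}\x00"

frame = stomp_frame("SEND", {"destination": "/queue/a"}, "hello queue a")
assert frame == "SEND\ndestination:/queue/a\n\nhello queue a\x00"
```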

There's a place for protocols like AMQP, there's a place for generic brokers
like RabbitMQ, but there's also a place for far simpler protocols and simpler
and/or specialized brokers. Lots of them.

For many applications, being able to customize a simple, specialized broker a
few hundred lines long is more useful than having all the extra capabilities
you'd get from AMQP or a multi-protocol broker like RabbitMQ.

That's part of the reason you'll keep seeing a proliferation of these systems:
it's trivial to implement a simple broker that can handle tens or hundreds of
millions of messages a day on modern hardware. (My last broker processed about
4-5 million messages/day using 10% of a single 2GHz Xeon core, written in
completely unoptimized Ruby that took about a day to write.) That means the
barrier to writing your own, getting something that fits your requirements
exactly and where you understand every line, is pretty low compared with
trying to find the ideal off-the-shelf solution.
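To give a sense of scale, the whole contract of a specialized in-process broker can be a couple of named queues with blocking consume. A toy Python sketch (not vidarh's Ruby broker; it deliberately ignores persistence, acks, and failure handling):

```python
import queue
import threading

class TinyBroker:
    """Named queues plus blocking publish/consume - nothing more."""

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def _queue(self, name):
        # Lazily create queues; the lock keeps creation thread-safe.
        with self._lock:
            return self._queues.setdefault(name, queue.Queue())

    def publish(self, name, message):
        self._queue(name).put(message)

    def consume(self, name, timeout=None):
        # Blocks until a message arrives (raises queue.Empty on timeout).
        return self._queue(name).get(timeout=timeout)

broker = TinyBroker()
broker.publish("jobs", b"resize image 42")
assert broker.consume("jobs") == b"resize image 42"
```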

Now, there are many cases where an off the shelf solution to this is the right
answer. The more complex your requirements are, the more critical proper
failure handling is etc., or if external requirements involve speaking a
complex protocol, the more attractive something like RabbitMQ gets.

But I doubt there will _ever_ be a "one true system" for RPC or message
exchanges, because the needs people are using RPC and message exchanges to
address are so vastly different. You shouldn't look at whether or not these
systems get widespread traction for that reason. What matters is if they are
good at meeting the needs of their specific niches.

~~~
rabbitmq
Thanks for your comment.

I could not find the 139-page document to which you refer.

There are two 0-9-1 docs. One is the spec definition for users, which, as I
said, is short and mostly edge cases you can ignore. The second is for
implementers and defines the classes and methods in more detail; it is 63
pages long. Note also that for the purposes of client codegen, the
BSD-licensed XML file in 0-9-1 is only a few pages long - because the
_surface_ of the spec is surprisingly small.

As a comparison, the definition of _core_ XMPP (a Jabber server) in
<http://www.ietf.org/rfc/rfc3920.txt> is 90 pages. BTW, the core spec covers
just IM, not pubsub.

In the case of both AMQP and XMPP the length comes from the requirement to
_interoperate_ between implementations.

You make a good point about STOMP above. We love STOMP too. There is, as you
say, a place for it - for lots of protocols. We have, however, found with
STOMP that because many behaviours are completely unspecified, it costs us a
lot more to support (finding and fixing bugs, maintaining stable behaviour
under different conditions, etc.). It is less likely that the same application
talking to two STOMP brokers will behave the same way with both -
deterministically and predictably. Maybe this is a good thing - there is more
scope for competing implementations? I don't think it's ideal. And let's not
talk about JMS in this regard.

I would not discourage people from writing their own brokers. You are among
many who have done this and people will go on doing it. But although you may
understand every line - what happens when someone else has to take over
managing your code? What if the requirements change - or the scope of use
grows? This is where products add value.

A lot of our customers have extremely simple requirements like "don't ever
lose my messages" or "broadcast to twenty different types of subscriber". So,
I don't think it's fair to make generalisations about "complex requirements".

I completely agree with you about RPC.

Cheers,

alexis

------
timf
> _We have patched our SSH daemon to perform public key lookups from our MySQL
> database_

That seems strange; that could be a PAM module at the least. If you patch
sshd, then you are burdened with keeping up with changes, etc. There's even a
module for direct MySQL: <http://sourceforge.net/projects/pam-mysql/>

~~~
noste
I don't think you can use PAM for authentication if you want to use public key
authentication (see auth2-pubkey.c in Portable OpenSSH).

~~~
timf
Thanks, sorry for the confusion. So what they needed but did not have is an
authorization (not authentication) callout after the daemon has verified the
remote user's identity (vs. the built in 'callout' of looking at a user's
authorized_keys file).
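To make the shape of such a callout concrete, here is a hypothetical sketch: look up a user's keys in a database and emit authorized_keys-style lines that force every connection through an authorization command (Gerve plays that role in the article). The schema, the sqlite3 stand-in for MySQL, and the `gerve` invocation are all invented for illustration; GitHub's actual sshd patch is not shown in the post.

```python
import sqlite3

# Hypothetical stand-in for a MySQL table of public keys.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE public_keys (username TEXT, key TEXT)")
db.execute(
    "INSERT INTO public_keys VALUES ('defunkt', 'ssh-rsa AAAAB3NzaC1yc2E')")

def authorized_keys_for(username):
    """Emit authorized_keys-style lines for one user.

    The command= option makes sshd run the given command once the key has
    authenticated - the hook where Gerve-style authorization would happen.
    """
    rows = db.execute(
        "SELECT key FROM public_keys WHERE username = ?", (username,))
    return [f'command="gerve {username}" {key}' for (key,) in rows]

assert authorized_keys_for("defunkt") == [
    'command="gerve defunkt" ssh-rsa AAAAB3NzaC1yc2E']
```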

~~~
noste
Hmm, I think this part is still about authentication as sshd cannot
authenticate the user without the keys. According to the article, GitHub does
the authorization in their Gerve script.

~~~
timf
This is all sort of pedantic, but the way I read the situation is that the
only authentication is proving that the entity on the other end possesses the
private key associated with a certain public key. The authorization part is
twofold: is key X authorized to access account Y? And then it's passed on to
Gerve for more specific authorization checks. Having implemented such things,
I am probably thinking more about the internal situation, sorry..

------
dylanz
Awesome post Tom. Smaller deployments are pretty straightforward, and complex
deployments are still pretty cookie-cutter. This is a breath of fresh air, as
it uncovers a lot of the pain points you were faced with, and the not-so-
common solutions. Thank you very much for sharing!

------
wallflower
> For our data serialization and RPC protocol we are using BERT and BERT-RPC.
> You haven’t heard of them before because they’re brand new. I invented them
> because I was not satisfied with any of the available options that I
> evaluated, and I wanted to experiment with an idea that I’ve had for a
> while.

> As much as I want to like Thrift, I just can’t. I find the entire concept
> behind IDLs and code generation abhorrent.

No (CORBA) skeletons in his closet.

Most importantly, his non-requirement:

> No need to encode binary data

<http://github.com/blog/531-introducing-bert-and-bert-rpc>

~~~
hassy
BERT supports binary data natively, you don't need to encode it like with
JSON-RPC.

~~~
wallflower
Thanks for the clarification

------
zikzikzik
Why do you use DRBD instead of the built-in mysql replication?

------
brown9-2
Ironically, github.com is now down:

_GitHub is Temporarily Offline. Either we're getting more requests right now
than we can handle or you found a page that took too long to render._

------
omouse
No diagram? :(

------
nuclear_eclipse
Off Topic: I miss your giant RSS icon... :(

