
An HTTP reverse proxy for realtime - jkarneges
http://blog.fanout.io/2013/04/09/an-http-reverse-proxy-for-realtime/
======
mcmc
This is awesome! I released something exactly like it a few years ago, hookbox
(MIT licensed):
[https://github.com/hookbox/hookbox/blob/master/docs/source/i...](https://github.com/hookbox/hookbox/blob/master/docs/source/intro.rst)

Basic idea is this: You put all your real-time stuff in a message queue (MQ)
which communicates directly with the browser. For authentication /
authorization and various other forms of permission / logging, you have the MQ
communicate with the web framework via http callbacks (Webhooks) and a
standard REST API. So the architecture is:

User <--Websocket--> MQ :: publish/subscribe

MQ --Webhooks--> PHP/Django/Servlets/etc. :: user signed on, user joined a
channel, etc.

PHP --REST--> MQ :: publish(msg), remove(user, channel), etc.

The key is to include cookie information in the callbacks from MQ -> PHP so
the callback happens in the context of the user session. Suddenly you can do
things like write a chat app in 30 lines of php + js, or a persistent time
series in 20, and it really feels magical.
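
A minimal sketch of the callback side of that architecture, in Python for brevity. It assumes the MQ forwards the browser's Cookie header with each callback; the payload shape, the `sid` cookie name, and every function and table name here are hypothetical and not taken from Hookbox's actual API.

```python
# Hypothetical sketch: none of these names come from Hookbox itself.
SESSIONS = {"abc123": "alice"}                 # session cookie value -> user
CHANNEL_MEMBERS = {"lobby": {"alice", "bob"}}  # channel -> users allowed in


def parse_cookies(header):
    """Parse a Cookie header such as 'sid=abc123; theme=dark' into a dict."""
    cookies = {}
    for part in header.split(";"):
        if "=" in part:
            name, value = part.strip().split("=", 1)
            cookies[name] = value
    return cookies


def on_subscribe(callback):
    """Handle the MQ's 'user wants to join a channel' callback.

    `callback` stands in for the HTTP request the MQ makes to the web app.
    Because it carries the browser's cookies, the web app can look up the
    same session it would use for an ordinary page request.
    """
    cookies = parse_cookies(callback.get("cookie", ""))
    user = SESSIONS.get(cookies.get("sid"))
    if user is None:
        return {"allow": False, "reason": "not signed in"}
    if user not in CHANNEL_MEMBERS.get(callback["channel"], set()):
        return {"allow": False, "reason": "not a member"}
    return {"allow": True, "user": user}
```

The point of the design is that the web framework keeps all the permission logic, while the MQ only holds connections and asks before acting.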

I actually started Hookbox almost as a statement of irony, because I was
really frustrated about the major pushback I was getting to sockets in web
browsers at the time. I'd just finished writing/submitting the initial
proposal for Websocket, and I wrote this tongue-in-cheek piece about the
mismatch between typical web development and network server programming:
[http://svwebbuilder.wordpress.com/2008/10/20/html5-websocket...](http://svwebbuilder.wordpress.com/2008/10/20/html5-websocket-and-webjneering/)

So Hookbox started as a 2-3 day project that took on a life of its own for a
while and ended up being really useful. This project was one of my smaller
open source codebases and to this day I receive tons of interest and requests
for maintenance, though I've abandoned it for years due to time.

I'm sure there's a huge market for this sort of thing. It's great to see
Pushpin, I'll definitely check it out!

------
ultimoo
This is a great product, thanks for building it.

In a previous project, we were spinning up complex pieces of infrastructure
using Chef integrated with a fronting Rails app. Realtime updates were always
tough to orchestrate, with custom RabbitMQ feedback from the Chef clients
being pushed out to Rails clients using JS.

I believe a solution like this one would come in very handy for pushing out
realtime updates for long running infrastructure requests from a distributed
system. Kudos!

------
bradgessler
Nice! We're working on something similar in Ruby EM at <http://firehose.io/>
if anybody is interested.

Glad to see more streaming REST implementations coming alive!

~~~
TheTaytay
Brad, firehose looks cool too. Are you guys using it in production? (Should I
be worried that the Travis build is failing?)

~~~
bradgessler
Yep! We pump a lot of messages through this per day. Not sure why Travis is
failing, probably a recent em-hiredis bump. I'll check it out.

------
obilgic
I've started to feel like I am falling behind with all these interesting libs,
servers, frameworks, protocols, languages, platforms recently.

Especially if you are a student trying to keep up with all this new stuff, it
is becoming really hard to decide what to learn next or what to focus on.

~~~
freshhawk
Keep an eye on what's out there, surface level among all the things that are
useful (or just interesting to you). Dive in when you have an actual problem
to solve. Don't decide before you dive in, use that surface knowledge to know
what your options are and where to start but don't make decisions with it (can
be tricky).

And as others are saying, know your fundamentals, and know your theory. Get
experience doing something real start to finish as often as possible and in
order to do that you will have to dive into things. That's how to decide what
to learn next.

~~~
aantix
Upvote.

At the startups where I've worked, things move quickly and unless you have a
personal motivation to learn something, you can't keep up.

It's best to get a cursory view of your available options (star them, bookmark
them, commit them to memory) and when the situation arises for a specific
problem to solve, you'll know of a handful of options to further investigate.

------
halayli
I've written memqueue (<https://github.com/halayli/memqueue/>) for similar
purposes.

Memqueue is a revision-based queue server with a REST API. Multiple consumers
can poll the same queue at a different pace by using revisions. A revision is
sort of a cursor that allows a consumer to specify where to poll from in the
queue. If a connection drops it's not a problem, you continue from the
revision you stopped at after you reestablish a connection. Each time a new
message arrives to the queue, the revision is bumped by one and consumers are
expected to poll from the new revision.

It also allows you to specify message & queue expiries so you don't have to
manage memory growth.
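
A toy in-memory model of the revision idea, as a rough sketch only. It is not memqueue's implementation or its REST API; the class, method names, and the `now` parameter (there to make expiry easy to follow) are all made up for illustration.

```python
import time


class RevisionQueue:
    """Toy revision-based queue: each publish bumps the revision by one."""

    def __init__(self, message_ttl=None):
        self.message_ttl = message_ttl  # seconds a message lives, or None
        self.revision = 0
        self._messages = []  # (revision, expires_at or None, body)

    def publish(self, body, now=None):
        now = time.monotonic() if now is None else now
        self.revision += 1
        expires = None if self.message_ttl is None else now + self.message_ttl
        self._messages.append((self.revision, expires, body))
        return self.revision

    def poll(self, since=0, now=None):
        """Return (current revision, bodies published after `since`)."""
        now = time.monotonic() if now is None else now
        # Drop expired messages so memory does not grow without bound.
        self._messages = [
            m for m in self._messages if m[1] is None or m[1] > now
        ]
        bodies = [body for rev, _, body in self._messages if rev > since]
        return self.revision, bodies
```

Each consumer remembers the revision returned by its last poll and passes it back as `since`, so a dropped connection costs nothing: the next poll simply resumes from the stored revision.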

------
ch0wn
I like seeing mongrel2 as part of this stack. It seems like the perfect fit
for an architecture like this and I get the feeling that it's heavily
underused.

~~~
jkarneges
Mongrel2's usage of ZeroMQ messages to manage low-level protocol was a great
influence to Pushpin's design. In fact, one of the other components in the
stack, Zurl, is basically the inverse of Mongrel2 (doing outbound HTTP instead
of inbound). Love all these little worker components.

------
scraplab
This looks really interesting. We've been using the nginx-push-stream-module
to achieve something similar, but the public facing HTTP API design isn't very
flexible, so something that lets you configure this explicitly would be great.

<https://github.com/wandenberg/nginx-push-stream-module>

~~~
jkarneges
Thanks for the comment. At my last job, we were unable to migrate an existing
API over to a certain realtime solution without breakage, and that was
motivation to create something more versatile.

------
sokrates
Personal peeve: There is still a name collision between the popular use of
"real-time" to describe "live" updates (usually as the defining aspect being
"without polling") versus the definition of "real-time", implying that events
are delivered to the customer in a guaranteed period of time. Which is
obviously not true.

~~~
j_s
Can you give an example of any 'true "real-time"' projects implementing any
aspect of HTTP? I think this ship has sailed.

~~~
peterwwillis
Yeah, for example, there are web services whose design goal is <30ms completed
request so they can guarantee their client timely data based on a SLA. If this
"real-time" project can't guarantee time of transactions, it's not "real-
time".

~~~
sctechie
How is this even possible? All it takes is a clogged router buffer 'somewhere'
along your network path and that <30ms web request is gonna get blown away. Am
I missing something here?

I'm probably just being ignorant lol. Can you point me to an example company
offering a service like this?

Thanks.

~~~
peterwwillis
I can't, but you could for example look at the financial services industry or
providers of real-time data feeds.

In general these services are not available through the public internet so
routers with clogged buffers are not usually an issue. But it depends on the
SLA.

------
ck2
I realize http is extremely well known and documented so it's relatively easy
for backend communications but I've always wondered how efficient it actually
is considering how old the standard is.

~~~
buzzkills
It's actually fairly good now that most of the improvements in 1.1, like
pipelining, compression, keep-alives and so on, can be used with browser
support. However, it isn't perfect; SPDY is better still, as it allows for
things like muxing and pre-emptive resource downloading.

------
jconley
I'm glad you built this Justin. I have wanted this functionality many, many
times in the social gaming world and never had the time/budget to build it and
instead hacked something together. I love seeing how XMPP has influenced the
design of very useful and much more pragmatic solutions to near-real-time
communications.

------
afshinmeh
That's nice, but I have a question about the mechanism.

In the document I saw that Pushpin sends a response to the client while it's
waiting for the response from the web application, right? How is that
possible? I mean, if you send a response to the client, you can't send any
more responses after that.

~~~
jkarneges
If Pushpin is told by the web application to hold a request open, then it does
not send anything to the client (at least not in the long-polling case;
streaming works differently, but I've yet to write about it). So when the
application publishes data at a later time, it is the first payload the client
sees.
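
As a rough illustration of that flow (a toy in-memory model, not Pushpin's code or wire protocol; the `hold`, `channel`, and `body` keys and all names below are hypothetical):

```python
from collections import defaultdict


class HoldingProxy:
    """Toy model of a proxy that can hold client requests open."""

    def __init__(self):
        self._held = defaultdict(list)  # channel -> ids of held requests

    def on_app_response(self, request_id, response):
        """Return the body to send to the client now, or None if held."""
        if response.get("hold"):
            # The application asked for a hold: send nothing yet.
            self._held[response["channel"]].append(request_id)
            return None
        # Ordinary response: pass it straight through.
        return response["body"]

    def publish(self, channel, body):
        """Answer every request held on `channel` with `body`."""
        released = self._held.pop(channel, [])
        return {request_id: body for request_id in released}
```

Since a held client has received nothing so far, the published body is the one and only response it gets for that request, which is why the "can't send a second response" problem does not arise in the long-polling case.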

------
jonahx
This is really cool. To make sure I understand, using it would replace the
need to use services like pusher.com? Also, if I were using, say, Sinatra with
nginx and unicorn, where would this sit in the pipeline?

~~~
jkarneges
To be clear, this is software, so comparing it to Pusher.com is a bit of an
apples to oranges comparison. Pushpin is more comparable to Socket.io,
Juggernaut, Faye, etc.

What makes Pushpin special is that you can control what the outside-facing
HTTP exchanges look like, which makes it good for implementing APIs, and may
also be useful if you're just really anal about how your client/server
interactions work. :)

The cost is that you need to design your protocol and write client code (the
other solutions already have their own special protocols and come with
corresponding JavaScript libraries ready to go). So whether or not Pushpin is
good for you depends on what level of control you're after.

In the pipeline, Pushpin goes in the very front, just behind a load balancer
(if any). The reason for this is you could put instances of Pushpin in
different geographic locations, all fronting an application in a single
location. So you want it the furthest out, closest to any users that might be
connected to it.

~~~
jonahx
Thanks!

------
TheTaytay
Cool! This sort of architecture looks great for our Rails API we're
developing. I'd be curious to see benchmarks to know how many connections I
can expect to be maintained by a single instance of pushpin.

------
babuskov
AFAICT, you can do the same with newer versions of HAProxy and nginx. Is there
any advantage of using this over those well tested and proven alternatives?

~~~
jkarneges
Nginx has the Push Stream Module, which is similar but not quite as versatile.
I don't think you could implement the "incremental counter" API described in
the Pushpin article, for example. Whether or not this matters to you comes
down to how much control you need.

I'm not aware of such functionality in HAProxy but would love to hear about it
(I'm an HAProxy fan :)).

~~~
babuskov
While "incremental counter" is a cool idea, I fail to see how it really
improves anything. You're just moving the same problem from one layer to
another. Node.js does all Pushpin does and you can write "simple chat server
in 30 lines of code" as well, without the need to set up another piece of
software.

This looks like a band-aid for developers still stuck with Python, PHP,
whatever... technology from the 2000s. I see I've got a down-vote from fanboys
already, but that's just the way HN works: it's hard to have a constructive
conversation, but it's easy to hate.

P.S. @jkarneges, this reply was not directed at you, but whoever down-voted my
simple question. I mean, how can you down-vote a question? Let's not question
anything and spread love, a la Facebook "like-button-only" style. :(

~~~
jkarneges
I've upvoted your original post to see if that helps.

The article does play up the compatibility with legacy frameworks. However,
the proxy approach itself was designed independently of this, and the
versatility turned out to be a bonus. Some background:
[http://blog.fanout.io/2013/02/10/http-grip-proxy-hold-techni...](http://blog.fanout.io/2013/02/10/http-grip-proxy-hold-technique/)

Basically, I'm positing that as a system gets larger, then moving the problem
to an outer layer is good design, even if all of your backend code is event-
driven. You can use Pushpin and Node together with a straight face. :)

------
sek
Awesome Justin,

finally something that every web developer understands immediately and really
takes the pain out of realtime for a lot of people.

------
ck2
mirror:
[http://google.com/search?q=cache:http://blog.fanout.io/2013/...](http://google.com/search?q=cache:http://blog.fanout.io/2013/04/09/an-http-reverse-proxy-for-realtime/)

If you are using wordpress and not wp-super-cache, when a page becomes popular
your server is going to have a bad time.

 _update: looks like they fixed it_

~~~
jkarneges
Thanks for the tip! I've enabled wp-super-cache now, and the page seems to be
working again.

------
annnnd
Love the idea and clean API! Is this production quality?

Also, just curious - how come qt is required?

~~~
jkarneges
Current state of code is "it probably works". We will be hardening it over the
next few weeks though to get it production ready (will be deploying it on
fanout.io, to replace our older code).

Qt, because... it's a nice C++ event-driven lib. :)

~~~
annnnd
Thanks, can't wait to try it out.

Must look into Qt then sometime. :)

------
tantalor
Does each publish_http_response_async release all pending requests or just
one?

~~~
jkarneges
It releases all requests held on the specified channel.

------
rj1107
Impressive!

