

BigPipe: Pipelining web pages for high performance - aristus
http://www.facebook.com/notes/facebook-engineering/bigpipe-pipelining-web-pages-for-high-performance/389414033919

======
angelbob
Sounds like, if they don't make their solution open, this will become the
basis for a fairly nice little framework.

It'll be interesting because this _will_ require significant rework to fit
with how most web servers work. It would be hard to implement in nginx, for
instance. Facebook probably just wrote a custom server, or heavily modified
an existing one.

~~~
WALoeIII
I think you could actually implement this with nginx + <insert evented/actor
server (tornado, rainbows, mochiweb, yaws etc.) here> + your application
quite easily. Nginx buffers the client's request (you cannot turn this off)
before passing it to the backend, but you can turn proxy_buffering off, which
stops nginx from buffering the response to the client. In this middle tier you
could instantly send the headers + loading JS and flush the buffer, then
determine the pagelets to render and either do it in-process or further
delegate to application servers with HTTP or your own protocol.
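
If you wanted to experiment, here's a minimal sketch of that middle tier using
Tornado (one of the evented servers named above). fetch_pagelet() and the
arrive() renderer are hypothetical stand-ins, not anything Facebook has
described:

    # Rough sketch: flush the skeleton immediately, then flush each
    # pagelet to the still-open response as its backend call completes.
    import asyncio
    import json

    import tornado.ioloop
    import tornado.web

    async def fetch_pagelet(name):
        await asyncio.sleep(0.5)  # pretend this is a backend/app-server call
        return name, "<p>content for %s</p>" % name

    class BigPipeHandler(tornado.web.RequestHandler):
        async def get(self):
            # 1. Send the skeleton + the pagelet-rendering JS and flush
            #    right away, before any backend work starts.
            self.write('<html><body>'
                       '<div id="nav"></div><div id="feed"></div>'
                       '<script>function arrive(id, html) {'
                       'document.getElementById(id).innerHTML = html;'
                       '}</script>')
            await self.flush()

            # 2. Query the backends in parallel; flush each pagelet as
            #    soon as it is ready, in whatever order it lands.
            pending = [asyncio.ensure_future(fetch_pagelet(n))
                       for n in ("nav", "feed")]
            for fut in asyncio.as_completed(pending):
                name, html = await fut
                self.write('<script>arrive(%s, %s);</script>'
                           % (json.dumps(name), json.dumps(html)))
                await self.flush()

            self.write('</body></html>')

    if __name__ == "__main__":
        tornado.web.Application([(r"/", BigPipeHandler)]).listen(8888)
        tornado.ioloop.IOLoop.current().start()

Behind nginx you'd still need proxy_buffering off as described, or the
incremental flushes never reach the client.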

This does kind of beg the question: why use nginx at all? It provides you with
a lot of protection against malformed requests and general fuckery. If you
need it you can push the middle layer back into nginx as a module which would
be screaming fast.

This reminds me of Heroku, who do something like this with their 'routing
mesh': client -> nginx -> routing mesh (erlang) -> thin (ruby app server). The
erlang process knows which EC2 instance has the ruby process to serve the
request and initiates the connection; it's basically a smart proxy. You could
quite easily query 6 backends simultaneously for each pagelet and pipe the
JSON out to the client, then throw in the footer as well.

I think this deserves some experimentation.

nginx + erlang + rails (serving JSON).

~~~
piotrSikora
Actually, you can implement this with nginx + client-side JavaScript to merge
"sub-pages" right now; you don't need any server-side language or framework.

Guys from Taobao (<http://www.taobao.com>) have open-sourced pretty much
everything you need to do that:

- ngx_echo (<http://github.com/agentzh/echo-nginx-module>) for asynchronous
pipelining,

- ngx_drizzle (<http://github.com/chaoslawful/drizzle-nginx-module>) for
fetching data from Drizzle/MySQL/SQLite,

- ngx_postgres (<http://labs.frickle.com/nginx_ngx_postgres/>) for fetching
data from PostgreSQL,

- ngx_rds_json (<http://github.com/agentzh/rds-json-nginx-module>) for
converting database responses into JSON.

Also, this isn't a new concept; at least a few Chinese companies I know of use
a similar rendering process.

~~~
liuliu
With all respect to the people at Taobao, this is different. In Facebook's
case the whole page (excluding CSS, JavaScript, and images) is generated
through one HTTP request. In your case, Ajax is used to fetch content through
several HTTP requests. IMHO the Facebook method is better: it avoids HTTP
request overhead and parallelizes as many steps as possible.

~~~
piotrSikora
You must have missed the ngx_echo module, because it's the part that makes
this work exactly as described in Facebook's BigPipe blog post.

I've prepared a simple proof-of-concept configuration for nginx:

<http://labs.frickle.com/misc/nginx_bigpipe.conf>

As you can see, every "sub-page" is generated individually. Using the
presented configuration, everything is chunked and flushed, so it will be sent
to the client right away. The response on the client side looks like this:

<http://labs.frickle.com/misc/nginx_bigpipe.output>

DISCLAIMER: I don't know how Taobao is using released modules internally or if
they use them in production already (but I know some portals do).

------
dkubb
This reminds me a bit of Edge Side Includes (ESI), although I realize this is
done on the client side more than the server side.

ESI got a lot of attention about 10 years ago, but then sort of fell out of
favor in the tech media/blogs. Some big companies, like Akamai, are still
using it, and the Varnish HTTP accelerator has some basic support for it.

I always liked the idea of breaking up my page into smaller segments, caching
each part independently, and assembling the page from the cache. The cache
layer could request only the parts of the page that are missing, expired, or
uncacheable, and pull everything else from a super-fast cache.
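
A toy sketch of that assembly step, assuming a plain dict as the cache and a
hypothetical render_fragment(); a real setup would use Varnish or memcached,
but the logic is the same:

    # ESI-style assembly: pull each fragment from cache, re-render only
    # the ones that are missing or expired.
    import time

    cache = {}  # fragment name -> (expires_at, html)

    def render_fragment(name):
        # Stand-in for the real (slow) render of one page segment.
        return "<div>%s rendered at %d</div>" % (name, time.time())

    def get_fragment(name, ttl=60):
        entry = cache.get(name)
        if entry and entry[0] > time.time():
            return entry[1]                       # hit: serve from cache
        html = render_fragment(name)              # miss/expired: re-render
        cache[name] = (time.time() + ttl, html)
        return html

    def assemble_page():
        # Each segment is cached and expires independently.
        return "".join(get_fragment(n) for n in ("header", "story", "footer"))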

~~~
WALoeIII
Facebook's approach seems like a really cool way around browser limitations.

ESI is really cool for caching inside your infrastructure, but it doesn't help
the client as much because they have to download the entire page again even if
only the time changed in the top bar. I've always dreamed of a way of cutting
up my HTML page to have different pieces 'cached' by the browser. A logical
next step from this style (which helps you load a client with a cold cache) is
to have the client cache each of these pagelets. On subsequent requests you
could return JS that looks in the client's HTML5 storage (and actually you
could check a cookie and recycle the JS too!), or use some other crafty
mechanism, which would reduce strain on FB's infrastructure and make it faster
for the user.
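
A purely speculative sketch of that idea: a version cookie tells the server
which pagelet revisions the client already has in localStorage, so unchanged
pagelets are never resent. Everything here (the cookie format,
PAGELET_VERSIONS, the arrive() renderer from the sketch further up) is made up
for illustration:

    import json

    PAGELET_VERSIONS = {"nav": 3, "feed": 17}  # current server-side revisions

    def pagelet_scripts(cookie_value, render):
        """cookie_value looks like 'nav:3,feed:16'; render(name) -> html."""
        client = dict(p.split(":", 1)
                      for p in cookie_value.split(",") if ":" in p)
        out = []
        for name, version in PAGELET_VERSIONS.items():
            key = json.dumps("pagelet:" + name)
            if client.get(name) == str(version):
                # Client already has this revision: render straight from
                # localStorage, nothing to resend.
                out.append("<script>arrive(%s, localStorage[%s]);</script>"
                           % (json.dumps(name), key))
            else:
                # Stale or missing: send the payload and tell the client
                # to cache it for next time.
                html = render(name)
                out.append("<script>localStorage[%s] = %s; arrive(%s, %s);"
                           "</script>" % (key, json.dumps(html),
                                          json.dumps(name), json.dumps(html)))
        return out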

~~~
wmf
I read about somebody using HTML5 client storage to cache pagelets... ah, here
it is:
[http://www.usenix.org/events/webapps10/tech/techAbstracts.ht...](http://www.usenix.org/events/webapps10/tech/techAbstracts.html#Mickens)

~~~
WALoeIII
Sweet, instapaper'd!

------
elpuri
Either the writer of the article sucks at coming up with analogies or doesn't
really understand what pipelining sequential logic means :)

------
mattmcknight
For slow-loading pages, using client-side includes with JS to load non-core
pieces of the page is a pretty common perceptual speed-up technique: if the
main content loads quickly, we can wait for the shared elements to render.

Client side includes didn't make it into HTML5. Maybe in HTML6? Check this
space in 2020.
<http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-August/015792.html>

------
yish
Wondering how this affects SEO and non-JavaScript-enabled browsers. I assume
one would still have to implement the more traditional solution as a backup
option.

~~~
snprbob86
Try Facebook in Firefox using the Web Developer Toolbar extension to disable
Javascript. You'll see that major features just silently fail to operate
correctly. I think that NoScript users and Javascript-free browsers are
basically non-existent. That doesn't mean you shouldn't design for the before-
Javascript-is-downloaded code path, but I wouldn't really worry too much.

~~~
coderdude
I wouldn't say they're non-existent because every so often you'll find some
user on here bragging about how he uses NoScript. Personally though, I don't
think that users who intentionally cripple their browsers matter.

~~~
deno
But that's a conscious choice they're making. It's like worrying about making
your website accessible for Richard Stallman.

------
xtacy
How is this different from HTTP/1.1's support for pipelining?

~~~
IgorPartola
This is done at the JavaScript level so you actually have more flexibility.
Also, the article says that they load JavaScript dynamically so that it's
executed asynchronously, as opposed to just including it in the HTML.

------
dminor
OTOH, pages have to be rebuilt on forward and back.

------
Vekz
This seems very similar to MXHR (demo at: www.mixhammer.com), but sending the
page in chunks instead of only the assets on the page.

------
tjpick
sounds like iframes again.

~~~
angelbob
Sort of. What they're describing has a lot of server-side bits going on to
make it all work.

At least, as described. Rendering most of the page immediately and then
leaving the connection open to shove more through is more server-push than the
standard ways of doing this.

~~~
tjpick
yep. I think you'd be better off with HTTP pipelining and an iframe- or ajax-
based solution. At least that's transparent in terms of existing protocols.

Of course there's a lot more server-side and client-side work going on to
support Facebook's implementation here. They've essentially invented some
crazy way of packaging multiple requests together but hiding that inside a
broken HTML document.

If server push is the destination then I might have thought that websockets
would be a cleaner way of achieving it. I don't think server-push is the
destination in this case, though.

