
Proxygen, Facebook's C++ HTTP Framework - mikeevans
https://code.facebook.com/posts/1503205539947302/introducing-proxygen-facebook-s-c-http-framework/
======
dcsommer
Hey there, I work on Proxygen at Facebook. I'm happy to answer any questions
you have about the project.

~~~
lost_name
This question is a bit naive, but outside of Facebook, can you think of what
kind of application this is well suited for?

~~~
dcsommer
Well, besides the fun of hacking around and building little HTTP servers, I
could see this being useful if you want to save money by running fewer
instances of your HTTP service. For instance, if you have a widely deployed
Python web service that isn't scaling well, you could rewrite it in C++ with
very little boilerplate using Proxygen's httpserver.

It's early stages for the proxygen open source project. Maybe further down the
road we'll provide off-the-shelf binaries, but we think the library is already
interesting enough to warrant a release.

~~~
johnbellone
Is that the inception of the project? I am more curious about the lineage. How
was the decision made to go this route instead of throwing more instances at
it?

~~~
dcsommer
The blog post goes into more detail, but Proxygen started as an effort to
write an L7 reverse proxy that could integrate deeply with FB internal
services. We pulled a lot of the non-FB-specific pieces out into this open
source release. Before that, we used hardware load balancers for this role,
which was expensive.

~~~
nbm
By expensive, I mean not just capital costs but also the costs of operating
them: they weren't as reliably configurable, health-checkable, and
instrumentable as we wanted, and Proxygen (and a later L4 load balancer) were.

Also, the previous load balancers had constraints we weren't willing to
accept: they required special connectivity to our networks, we could not use
particular combinations of options, and we had to rely on vendors to solve
problems that most of their customers were not encountering and/or able to
detect.

~~~
johnbellone
Yeah, we are hitting the same wall right now, and we've been going down the
same route using HAProxy/Chef. Our plan is to put an API in front of it and
treat it like an ELB. Are you still using hardware load balancers for SSL
termination?

~~~
nbm
The machines running Proxygen do the TLS termination.

------
b3tta
I really like your idea of the 4-part abstraction. Still, your library really
isn't "that" easy to use.

I'm currently working on my own library using libuv, http-parser, nghttp2 and
wslay, which is very similar in its use to node.js. As you might guess, an
echo server is therefore only about 15 lines of code, but about as performant
as your framework. The downside is that it's not as flexible due to the
missing "4-part abstraction" (really… an excellent idea).

That's why your release somehow saddens me: when I release my framework to
the public, it might be pretty good for cross-platform apps etc. compared to
others, but it will never ever be as popular as yours. Heck… I don't even have
10 Twitter followers.

EDIT: I wrote an example as fast as possible…
[https://github.com/lhecker/libnodecc](https://github.com/lhecker/libnodecc)

~~~
Igglyboo
Care to show us what you've got instead of just saying how much better it is
than Proxygen?

Also, Facebook is a group of people, so saying "your" doesn't really sound
right.

~~~
b3tta
I'm sorry… English is not my native tongue. But I'm learning fast. :)

In no way did I intend to say that my framework is better overall, but I do
think it's better suited for simple things, like apps.

In fact, I think I will integrate something like their "four-part
abstraction", because I really think this is a great idea.

------
beliu
dcsommer (author of proxygen) gave a great talk about this at the last
Sourcegraph open-source meetup. Here's the video:
[https://www.youtube.com/watch?v=-yxQIRl6Qic](https://www.youtube.com/watch?v=-yxQIRl6Qic)

~~~
dcsommer
I'm flattered, but I'm just _an_ author. Proxygen is the work of about a dozen
people over 4 years at Facebook.

------
thomasreggi
"You will need at least 2 GiB of memory to compile proxygen and its
dependencies." What?

~~~
port8080
fbthrift needs 2GiB of memory to compile
([https://github.com/facebook/fbthrift](https://github.com/facebook/fbthrift))
and proxygen has a dependency on fbthrift.

~~~
userbinator
That raises the question of what in fbthrift needs so much memory... is it
mainly due to heavy use of C++ features like template metaprogramming?

~~~
swah
And then we go back to minimalism, Lua, ... always in this circle.

------
rdrock
We used to run nginx + gunicorn for our REST services, and it wasn't
responding well beyond a certain point (for a given EC2 instance). We replaced
that with nginx + Lua (the OpenResty module) and saw almost a 10x improvement
in response times. Would it make sense for us to invest in something like this
and hope to see a significant performance gain? Lowering response times
further is not a big deal, but being able to get those same response times on
a lower-priced instance would definitely help. We have no real C++ skills on
the team, but we could learn or hire.

------
cthulhuology
What are the units in the table? The top row looks like the number of workers,
but the large numbers are unitless.

~~~
dcsommer
It's requests per second (averaged over a 60 second test run).

------
bsaul
Looks to me like Facebook's answer to golang?

Building simple, standalone HTTP services with good performance seems to me
to be what those two projects (proxygen and golang) are really about.

Now the question is how much faster using C++ is, and how much safer and
faster writing golang is...

~~~
Jare
I think Facebook is heavily into D, which sounds like a more appropriate, for
lack of a better word, "replacement" for golang.

~~~
nbm
There are some D advocates, users, and enthusiasts (I fall into the last
category) at Facebook, and a slowly increasing amount of D code, but the vast
majority of infrastructure projects are done in C++, and most people are still
choosing it for new projects.

------
pcunite
Excellent, I've been toying around with my own and looking at LibUV. I think
the time is right for something like this. I want to maintain state on
everything that connects to me.

------
72deluxe
Looks interesting. Does anyone have a comparison of the plethora of C++ HTTP
frameworks, such as Civetweb?

------
dimman
Facebook oughta hire some better scripters; deps.sh is of terribly low
quality. I didn't get more than a couple of lines in before I stumbled upon
this (which tells me the author has no clue :):

'start_dir=`pwd`; trap "cd $start_dir" EXIT;'...

Needless to say, the script can be dangerous: if a directory change fails,
for instance, there are no checks, and sudo make uninstall is run anyway in a
different directory than the intended one.

~~~
dcsommer
Bash isn't my expertise and I put this together pretty quickly. Please send
pull requests! Forgive my ignorance, but what's the danger of the cd'ing in
the EXIT trap? Also, I did set -e, so there's no problem of running "sudo make
uninstall" from the wrong directory, afaict.

~~~
dimman
Last msg sounded a bit harsh, sry about that. Anyway, some things to consider:

1. You don't need bash; use /bin/sh instead to be compatible with other
shells (I don't have bash, and neither do a lot of other systems after the
latest Shellshock incident). There's really no need to limit it to bash (bash
is one of many shells, but it's very commonly mistaken for "shell script").

2. The script runs in its own child shell process, so the directory it is in
when it exits is irrelevant; it doesn't affect the caller at all. Try creating
a new script that just does 'cd a_dir_that_exists' and run it from a
terminal. :)

3. Yes, set -e makes the script stop on _unhandled_ errors, so you're right:
the example I gave is wrong, and the script would stop at the failed cd
attempt.

Instead of using '|| true' (to deliberately ignore errors), the standard way
to do it is '|| :' (which doesn't fork the true binary). However, I would
really recommend taking care to handle possible errors.

------
amelius
Skimming through the article, it seems to me that this server spawns a thread
per connection, is that correct?

~~~
dcsommer
We use a very different model, actually. Since spawning OS threads is
expensive, we opted for the popular nonblocking-I/O approach. Each worker
thread (usually one per CPU core) is handed connections from the listening
socket in round-robin fashion. The worker thread runs an event loop that
processes events on the accepted sockets assigned to it.
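To make that threading model concrete, here is a minimal C++14 sketch under simplifying assumptions. The names `EventLoop`, `Acceptor` and `dispatch` are hypothetical, not Proxygen's API, and the real nonblocking socket I/O is replaced by a queue of callbacks that each worker thread drains. What it illustrates is the shape described above: a fixed pool of workers, accepted connections handed out round-robin, and all work for a connection then running on its worker's own loop, so no thread is spawned per connection.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-thread event loop: one thread drains a FIFO
// queue of callbacks. A real server would also wait on nonblocking socket
// readiness here, not only on posted callbacks.
class EventLoop {
 public:
  void post(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push_back(std::move(fn));
    }
    cv_.notify_one();
  }

  // An empty std::function is the "stop" sentinel; it is queued behind any
  // callbacks already posted, so those still run first.
  void stop() { post(nullptr); }

  void run() {
    for (;;) {
      std::function<void()> fn;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        fn = std::move(queue_.front());
        queue_.pop_front();
      }
      if (!fn) return;
      fn();  // run outside the lock so callbacks can post more work
    }
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
};

// Hypothetical acceptor: owns one worker thread (with its own event loop) per
// core and hands each accepted connection to the next worker, round-robin.
class Acceptor {
 public:
  explicit Acceptor(std::size_t numWorkers) {
    assert(numWorkers > 0);
    for (std::size_t i = 0; i < numWorkers; ++i) {
      loops_.push_back(std::make_unique<EventLoop>());
      EventLoop* loop = loops_.back().get();
      threads_.emplace_back([loop] { loop->run(); });
    }
  }

  ~Acceptor() {
    for (auto& loop : loops_) loop->stop();
    for (auto& t : threads_) t.join();
  }

  // Called only from the single accepting thread, so next_ needs no locking.
  // Returns the index of the worker chosen for this connection.
  std::size_t dispatch(int connFd, std::function<void(int)> onConnection) {
    std::size_t idx = next_;
    next_ = (next_ + 1) % loops_.size();
    loops_[idx]->post([connFd, onConnection] { onConnection(connFd); });
    return idx;
  }

 private:
  std::vector<std::unique_ptr<EventLoop>> loops_;
  std::vector<std::thread> threads_;
  std::size_t next_ = 0;
};
```

For example, dispatching eight connections through `Acceptor acceptor(4)` assigns them to workers 0, 1, 2, 3, 0, 1, 2, 3. Round-robin ignores how busy each worker currently is; it is simply a cheap, fair policy, which tends to work well when connections are numerous and similar in cost.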

~~~
amelius
What kind of programming technique did you use to implement the handling of
the protocols? Did you implement them as finite-state machines, did you use
coroutines, or did you use some other technique?

Do you think that C++ is a well-suited language for this kind of processing?
Now that this project is in a mature state, is it possible to say whether
other languages (e.g. Rust) could have helped make your implementation
simpler?

~~~
seasonedschemer
Hey, I'm a Software Engineer on Proxygen as well. Proxygen heavily relies on
folly's buffer management abstractions such as IOBuf
([https://github.com/facebook/folly/blob/master/folly/io/IOBuf...](https://github.com/facebook/folly/blob/master/folly/io/IOBuf.h))
and Cursor
([https://github.com/facebook/folly/blob/master/folly/io/Curso...](https://github.com/facebook/folly/blob/master/folly/io/Cursor.h)).
The protocol parsing implementation uses folly::io::Cursor to safely read byte
sequences across non-contiguous buffers. Errors during parsing are wrapped up
in Result types
([https://github.com/facebook/proxygen/blob/master/proxygen/li...](https://github.com/facebook/proxygen/blob/master/proxygen/lib/utils/Result.h))
which take inspiration from Rust. Such constructs simplify our implementation
to a reasonable extent and are still low-level enough to extract performance.
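As a rough illustration of those two ideas (reading across non-contiguous buffers, and returning Rust-style results instead of throwing), here is a small self-contained C++17 sketch. It does not reproduce the actual `folly::io::Cursor` or Proxygen `Result` interfaces: `Result`, `ChainCursor`, `ParseError` and `readBE16` are hypothetical names, and a `std::vector` of byte vectors stands in for a chain of `IOBuf`s.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical Rust-style Result: holds either a value (index 0) or an error
// (index 1). Index-based construction keeps it unambiguous even if T == E.
template <typename T, typename E>
class Result {
 public:
  static Result ok(T value) { return Result(std::in_place_index<0>, std::move(value)); }
  static Result err(E error) { return Result(std::in_place_index<1>, std::move(error)); }

  bool isOk() const { return v_.index() == 0; }
  const T& value() const { assert(isOk()); return std::get<0>(v_); }
  const E& error() const { assert(!isOk()); return std::get<1>(v_); }

 private:
  template <std::size_t I, typename U>
  Result(std::in_place_index_t<I> tag, U&& u) : v_(tag, std::forward<U>(u)) {}

  std::variant<T, E> v_;
};

enum class ParseError { NeedMoreData };

// Hypothetical cursor over a chain of separately allocated buffers: reads
// bytes in order across buffer boundaries without first copying the chain
// into one contiguous array.
class ChainCursor {
 public:
  using Chain = std::vector<std::vector<std::uint8_t>>;
  explicit ChainCursor(const Chain& chain) : chain_(chain) {}

  // Reads a big-endian (network byte order) 16-bit integer that may straddle
  // two buffers. On failure the cursor position is left unchanged, so the
  // caller can retry once more bytes have arrived.
  Result<std::uint16_t, ParseError> readBE16() {
    const std::size_t savedBuf = buf_;
    const std::size_t savedOff = off_;
    auto hi = readByte();
    auto lo = readByte();
    if (!hi.isOk() || !lo.isOk()) {
      buf_ = savedBuf;
      off_ = savedOff;
      return Result<std::uint16_t, ParseError>::err(ParseError::NeedMoreData);
    }
    return Result<std::uint16_t, ParseError>::ok(
        static_cast<std::uint16_t>((hi.value() << 8) | lo.value()));
  }

 private:
  Result<std::uint8_t, ParseError> readByte() {
    // Step past exhausted or empty buffers.
    while (buf_ < chain_.size() && off_ == chain_[buf_].size()) {
      ++buf_;
      off_ = 0;
    }
    if (buf_ == chain_.size()) {
      return Result<std::uint8_t, ParseError>::err(ParseError::NeedMoreData);
    }
    return Result<std::uint8_t, ParseError>::ok(chain_[buf_][off_++]);
  }

  const Chain& chain_;   // not owned; must outlive the cursor
  std::size_t buf_ = 0;  // index of the current buffer in the chain
  std::size_t off_ = 0;  // offset within the current buffer
};
```

With the chain `{0x01}`, `{}`, `{0x02, 0x03}`, the first `readBE16()` returns 0x0102 even though the two bytes sit in different buffers. A second call fails with `NeedMoreData` and leaves the cursor in place, and after a buffer `{0x04}` is appended to the chain, retrying yields 0x0304. Making "not enough bytes yet" an ordinary return value rather than an exception is what lets a nonblocking parser simply wait for the next read.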

