LibHTTP: Open Source HTTP Library in C (libhttp.org)
191 points by mabynogy 7 months ago | 62 comments



Whenever I see a C networking library, my first questions are:

1. How many security-critical CVEs has it had in the past?

2. How extensively has it been fuzzed?

3. Is this one of the rare C code bases that actually has pretty solid security (like Dovecot https://www.helpnetsecurity.com/2017/01/17/dovecot-security-... or various OpenBSD tools)?

I ask because once upon a time I just about broke my heart trying to write secure networking code in C. I inspected every line. I chose my dependencies carefully. I wrote extensive, malicious tests with tons of malformed data. I added recursion limits to prevent stack overflow. I ran everything through Electric Fence and used the best open source validation tools available in 2001. And despite months of effort for a very simple protocol, I wound up being affected by at least 6 CVEs in 15 years (http://people.canonical.com/~ubuntu-security/cve/pkg/xmlrpc-...), many of them in the third-party XML parser I used. But if somebody directly fuzzed my code, I bet they'd find at least one more issue somewhere.

There are a few people I trust to write mostly secure networking software in C. But the more time I spend fuzzing protocol parsers, the more I realize that—even though I like to think I'm unusually careful and paranoid—I'll never be one of those people.

So how does libhttp look from a security perspective? If it's truly paranoid and thoroughly fuzzed, it might be very useful for embedded work.


> (http://people.canonical.com/~ubuntu-security/cve/pkg/xmlrpc-...)

It looks like some of those CVEs are fairly recent. If code safety is still a concern with this project, you/someone might consider conversion to SaferCPlusPlus (essentially a memory-safe subset of C++). There is an "auto-conversion helper tool"[1] still in development, but it is already functional.

[1] shameless plug: https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla... (Feel free to post any questions to the github "issues" section.)


> If code safety is still a concern with this project, you/someone might consider conversion to SaferCPlusPlus (essentially a memory-safe subset of C++).

Thank you for the pointer! I haven't been a maintainer of xmlrpc-c in over a decade now, and I'm not even sure who's maintaining it or using it. The sourceforge mailing list archives seem to be down, so I have no way to contact the current maintainers.

The packages in Ubuntu which use xmlrpc-c are freeipa-client, rtorrent, opennebula, certmonger, flowgrind and cobbler-enlist. I also remember 2 or 3 commercial users from 15 years ago. If any of these people are interested, I'd consider writing a drop-in replacement in Rust that preserves the same C ABI, and spending at least a week of CPU time fuzzing it.


Architecturally, this uses threads: a master listener and some number of worker threads. You add a couple of calls to your program to initialize the web server. The workers use synchronous I/O. You can easily add server-side support for Lua or JavaScript, from which you can expose and manipulate your program's state, to the extent you can use threads in C without tearing your leg off.
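
(For anyone unfamiliar with that pattern, below is a minimal sketch of the master-listener/worker-pool shape in plain POSIX C. This is not LibHTTP's actual API, just an illustration of the architecture; the port, pool size, and ring-buffer size are arbitrary, and error handling is omitted.)

    /* Sketch only: one master thread accepts, workers do blocking I/O. */
    #include <netinet/in.h>
    #include <pthread.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define WORKERS 4
    static int queue[64], head, tail;            /* tiny ring of accepted fds */
    static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&mu);
            while (head == tail) pthread_cond_wait(&cv, &mu);
            int fd = queue[tail++ % 64];
            pthread_mutex_unlock(&mu);
            char buf[4096];                      /* synchronous I/O, as described */
            read(fd, buf, sizeof buf);           /* (real code: parse the request) */
            const char *r = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
            write(fd, r, strlen(r));
            close(fd);
        }
    }

    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in a = {0};
        a.sin_family = AF_INET;
        a.sin_port = htons(8080);                /* sin_addr stays INADDR_ANY */
        bind(lfd, (struct sockaddr *)&a, sizeof a);
        listen(lfd, 128);
        pthread_t t[WORKERS];
        for (int i = 0; i < WORKERS; i++) pthread_create(&t[i], 0, worker, 0);
        for (;;) {                               /* master listener */
            int fd = accept(lfd, 0, 0);
            pthread_mutex_lock(&mu);
            queue[head++ % 64] = fd;             /* (real code: bound the queue) */
            pthread_cond_signal(&cv);
            pthread_mutex_unlock(&mu);
        }
    }

The synchronous-worker design is what makes it easy to call into your program's state: each request runs on an ordinary thread, so normal locking rules apply (that's the "tearing your leg off" part).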

It looks much nicer than anything I’ve hand rolled in the past to add web management to various servers and daemons.


It's nice to see a good C library, since it's very easy to bind to from virtually any programming environment, which can't be said of most other languages. And the build system couldn't be simpler: a makefile with a list of source files and compile flags. All you need to do HTTP!


It's tempting to consider embedding an HTTP server in your program. But also consider wrapping your C/C++ program in a FastCGI loop and putting nginx in front of it.

It removes a lot of complexity from your code.
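
(For reference, the FastCGI loop really is tiny. A minimal sketch using the classic fcgi_stdio wrapper from libfcgi; the port, spawner, and nginx config in the comments are assumptions, not prescriptions:)

    /* Minimal FastCGI responder using libfcgi's fcgi_stdio wrapper.
     * Build:  cc app.c -lfcgi
     * Run:    e.g. spawn-fcgi -p 9000 ./app
     * nginx:  location / { include fastcgi_params; fastcgi_pass 127.0.0.1:9000; } */
    #include "fcgi_stdio.h"
    #include <stdlib.h>

    int main(void) {
        /* FCGI_Accept blocks until nginx forwards a request; each loop
         * iteration handles exactly one request, plain-CGI style. */
        while (FCGI_Accept() >= 0) {
            char *uri = getenv("REQUEST_URI");
            printf("Content-Type: text/plain\r\n\r\n");
            printf("hello from the app; you asked for %s\n", uri ? uri : "?");
        }
        return 0;
    }

TLS, timeouts, slow clients, and HTTP parsing all stay in nginx, which is the complexity reduction being claimed here.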


>putting nginx in front of it. It removes a lot of complexity from your code.

That advice is well-intentioned, but I think it's misleading. It doesn't take into account how HTTP libraries like this one (and similar ones for C++, such as Proxygen[1] and Silicon[2]) are supposed to be used.

The intended use case is to add an embedded webserver to your executable that's communicating with friendly and internal systems. That would be things like microservices and dashboards.

You don't use those libraries to create external public-facing websites that would be under attack from hostile agents. You're correct: Do not re-invent NGINX by using the C http library.

As an example of a proper use case, let's say you write an internal C program to process terabytes of image files. You think it might be nice to have a visual status of its progress/errors/throughput/etc. Instead of adding a GTK GUI to the code, you use the HTTP library to expose a "web dashboard". You can then point an employee's browser on the internal network at it to see its progress. For this use case, avoiding the HTTP library and adding NGINX to the stack makes things more complicated.
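
(To make that concrete, here's roughly what the dashboard case looks like. This sketch uses the Civetweb API, from which LibHTTP was forked; LibHTTP's renamed httplib_* calls should look similar, but treat the names and signatures here as Civetweb's, not LibHTTP's. The port and URL are made up.)

    /* Sketch: embedded "web dashboard" for a long-running batch program. */
    #include "civetweb.h"
    #include <string.h>
    #include <unistd.h>

    static volatile unsigned long processed;     /* updated by the work loop */

    static int status_handler(struct mg_connection *conn, void *cbdata) {
        (void)cbdata;
        mg_printf(conn, "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
                        "<h1>processed %lu images</h1>", processed);
        return 200;                              /* tell the server it's handled */
    }

    int main(void) {
        const char *options[] = { "listening_ports", "9001", NULL };
        struct mg_callbacks callbacks;
        memset(&callbacks, 0, sizeof callbacks);
        struct mg_context *ctx = mg_start(&callbacks, NULL, options);
        mg_set_request_handler(ctx, "/status", status_handler, NULL);

        for (;;) {                               /* stand-in for the real work */
            processed++;                         /* (real code: use an atomic) */
            sleep(1);
        }
        /* mg_stop(ctx); -- unreachable in this sketch */
    }

Point a browser on the internal network at http://host:9001/status and you have the dashboard; no GTK, no NGINX.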

Another use case, without an HTML GUI, is exposing HTTP endpoints for microservices. Again, it's internal communication between friendly agents.

tldr: HTTP libraries compiled into executables and NGINX are for different use cases.

[1] https://github.com/facebook/proxygen

[2] http://siliconframework.org/


Your described use case for a "friendly / internal" system has a limited scope in any non-trivially sized organization. It would only apply when all users of the embedded webserver already have root access to that server. This might make sense, for instance, in a small development team with a flat security hierarchy, but would otherwise be a red flag, security-wise.


>Your described use case for a "friendly / internal" system has a limited scope in any non-trivially sized organization.

Facebook is a non-trivially sized company that has lots of internal, private-facing programs with embedded HTTP connectivity. From that experience, they open sourced the Proxygen HTTP library, which is one of the links I mentioned in the previous comment.

Also, see the comment from VikingCoder which in turn links to reddit thread mentioning another big company like Google doing similar use cases with http libraries: https://news.ycombinator.com/item?id=15671936


Facebook is a mixed bag as far as being a model of good behavior on many things. The company is founded on irresponsible social principles. For how long did general internal employees have access to users' private data? This is no longer the case, though, correct? Sorry to get OT.

Maybe these libs are well-vetted, after all. My mistake. Thanks for the info.


I mostly agree with you, but I think it's worth considering defense in depth. Assuming only friendly requests could make your service a great pivot point.


Or simply talk to nginx in HTTP. Here's Glyph of Twisted's take on it (in a nutshell: why convert HTTP into an HTTP-like protocol which is neither as well-standardised nor as well-documented?): https://twistedmatrix.com/pipermail/twisted-web/2006-April/0...

You could use a very stripped-down and thus potentially very efficient HTTP parser given that most of the work will be offloaded to nginx.

Also, last time I checked, nginx didn't support request multiplexing/pipelining over FastCGI (as it does over HTTP). Instead, each HTTP request received by nginx would result in a new FastCGI connection to your application, which is obviously suboptimal.


> why convert HTTP into an HTTP-like protocol which is neither as well-standardised nor as well-documented?

I have these same feelings every time I see uWSGI being used.


The protocol is uwsgi, whereas uWSGI is the application, which doesn't have to use uwsgi :D


I think you mean WSGI is the protocol.


WSGI is the Python-level API between the web server and your application. uWSGI is one implementation of it, which also has its own network protocol (also called uwsgi).


Are there any supported FastCGI libraries these days? It seems like the official C/C++ one (fastcgi.com) has disappeared from the internet.


I tried to write one a while back, though I haven't needed it for a while: https://github.com/cjhanks/libSimpleCGI


>It removes a lot of complexity from your code.

Not to mention, Nginx implements quite a lot of features that are probably missing in LibHTTP, or implements them in a more performant way.


Yep: more secure, and battle-tested. One additional subtle advantage is that it leads you to make better architectural decisions, because it pushes you to eliminate state.


Surely this is just an added feature to an existing program.

Having it inbuilt is useful for testing.

Also, I'm not sure that doing a FastCGI loop and requiring another HTTP server to front it really is simpler!


Try embedding a web server inside your program, having it manage threads and event loops without being vulnerable to security attacks, and compare that with a simple nginx config pointing at your application; you'll get an idea of which one is simpler. Not to mention you'll need to work around all the quirks of whatever lib you use, learn its API, and deal with platform compatibility.


An embedded web server typically doesn't have to be very performant. It's just for convenient runtime access to the program's innards.

In fact in some cases it may only accept one client at a time.


Another library worth mentioning is https://github.com/h2o/h2o


h2o is nice, but I believe it requires libuv. libuv is a massive stack with LOTS of abstractions on top of abstractions that are sorta atypical of C libraries. If your code crashes because some assert in libuv fired, good luck navigating through the "typedef city" that libuv is. I wrote a little server built on top of libuv, but I could never really get it working right on Windows (which is a platform that libuv supports fairly well, I think). In the end I abandoned libuv, and all that abstraction, and just required Linux or BSD by targeting epoll/kqueue directly.
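
(For scale, this is roughly the shape of "targeting epoll directly": a bare-bones, single-threaded, Linux-only accept/echo loop. A hedged sketch, not the commenter's actual server; the port is arbitrary and error handling is omitted.)

    /* Bare-bones epoll loop: one thread, level-triggered, Linux only. */
    #include <netinet/in.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in a = {0};
        a.sin_family = AF_INET;
        a.sin_port = htons(8080);
        bind(lfd, (struct sockaddr *)&a, sizeof a);
        listen(lfd, 128);

        int ep = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
        epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

        struct epoll_event evs[64];
        for (;;) {
            int n = epoll_wait(ep, evs, 64, -1);
            for (int i = 0; i < n; i++) {
                if (evs[i].data.fd == lfd) {     /* new connection: watch it too */
                    struct epoll_event c = { .events = EPOLLIN,
                                             .data.fd = accept(lfd, 0, 0) };
                    epoll_ctl(ep, EPOLL_CTL_ADD, c.data.fd, &c);
                } else {                         /* readable client: echo, close */
                    char buf[4096];
                    ssize_t r = read(evs[i].data.fd, buf, sizeof buf);
                    if (r > 0) write(evs[i].data.fd, buf, (size_t)r);
                    close(evs[i].data.fd);       /* close() also deregisters it */
                }
            }
        }
    }

The kqueue version on BSD is the same shape, with kevent() playing the role of both epoll_ctl and epoll_wait.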


I was interested to see whether any of the Rust webservers allow C linking, but a cursory examination shows that, if they do, they at least aren't explicit about it in their docs. I thought it would be interesting because I know a number of them are looking to make use of Rust's concurrency guarantees and futures-like libraries to provide what it sounds like you want, but in a way that would possibly be easier to reason about in the case of errors.[1]

1: This of course presupposes you know, or are willing to learn, Rust. That said, given what you've explained about libuv, the total time to grok libuv's code base and the time to learn Rust may be similar, but learning Rust may pay more dividends?


libh2o does not require libuv; it has its own event loop. libuv is optional.


Then their FAQ needs to be updated:

To build H2O as a library you will need to install the following dependencies:

libuv version 1.0 or above
OpenSSL version 1.0.2 or above

As I understand it, regular h2o doesn't require libuv but libh2o does.


The H2O build produces libh2o and libh2o-evloop. The latter doesn't depend on libuv.


It's good to see more libraries filling this space. One of the foremost contenders is Libmicrohttpd (https://www.gnu.org/software/libmicrohttpd/). The only problem with that library is the LGPL license.


An absolute dream to work with: simple, and fast enough for most people. (My benchmark when using it was about 20,000 concurrent connections a second under siege, the load-testing tool.)

It's the Flask/Sinatra of the C world, though you do need to provide your own router if you want anything a little bit more complex.
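
(To illustrate the bring-your-own-router point: libmicrohttpd hands every request to one access handler, so URL dispatch is up to you. A minimal sketch against a recent libmicrohttpd; older releases return int instead of enum MHD_Result, and the /status route here is made up.)

    /* Minimal libmicrohttpd server with hand-rolled URL routing. */
    #include <microhttpd.h>
    #include <stdio.h>
    #include <string.h>

    static enum MHD_Result handle(void *cls, struct MHD_Connection *conn,
                                  const char *url, const char *method,
                                  const char *version, const char *upload_data,
                                  size_t *upload_data_size, void **req_cls) {
        (void)cls; (void)method; (void)version;
        (void)upload_data; (void)upload_data_size; (void)req_cls;
        /* The "router": pick a response body based on the URL. */
        const char *body = strcmp(url, "/status") == 0 ? "ok\n" : "hello\n";
        struct MHD_Response *r = MHD_create_response_from_buffer(
            strlen(body), (void *)body, MHD_RESPMEM_PERSISTENT);
        enum MHD_Result ret = MHD_queue_response(conn, MHD_HTTP_OK, r);
        MHD_destroy_response(r);
        return ret;
    }

    int main(void) {
        struct MHD_Daemon *d = MHD_start_daemon(MHD_USE_INTERNAL_POLLING_THREAD,
                                                8080, NULL, NULL,
                                                &handle, NULL, MHD_OPTION_END);
        if (!d) return 1;
        getchar();                               /* serve until a key is pressed */
        MHD_stop_daemon(d);
        return 0;
    }

Anything fancier than strcmp (path parameters, method dispatch) is exactly the router you end up writing yourself.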


There is also https://libwebsockets.org/ which, despite its name, works well as a regular HTTP server library.


As an end user, I see it as a feature rather than a problem. Great library either way.


Why is LGPL a problem?


Embedded platforms come to mind.


See https://curl.haxx.se/libcurl/competitors.html for a comparison of various HTTP libs (obviously through the lens of cURL)


While those are all HTTP client libraries, it would appear this project is an HTTP server library.


I'd recommend a clearer title, then. HTTP is a protocol, so making the client vs. server distinction explicit would be useful to the reader.


I wish the "about" page for projects like these had more information. For example, why was this library created and by whom? What does it offer that others don't? Etc. I want to learn more about the problem it's solving even though I have no interest in the code, and I'm sure I'm not alone.


It is sad how open source is all about sharing/learning until it comes time to link to competing/alternative projects.


There are quite a few alternatives. libmicrohttpd is a very popular one. Mongoose and Civetweb (its fork) are both good alternatives, field-tested and written in C. You also have Duda and libonion doing the same HTTP-library job, but with more features (REST, WebSockets, etc.).


I've used Mongoose in the past to embed HTTP servers in C++ programs. It would be interesting to see how this performs in comparison.

https://github.com/cesanta/mongoose

Here's me attempting to optimize REST routing by using Google's re2 regex library to reduce the list of possible regexes that might match a given URL:

https://github.com/nurettin/pwned/blob/master/server/server....

FilteredRE2

https://github.com/google/re2/blob/master/re2/filtered_re2.h


Given this is MIT-licensed instead of GPL like Mongoose, it's already leaps and bounds more useful for embedding.


Ok, GPL is a terrible calamity and I'm sorry I ever opened Notepad.


What are the advantages over Civetweb, from which this project was forked?

(Civetweb itself was forked from Mongoose when Mongoose changed its license to commercial+GPL)


HTTP/2 support?


h2o has it


Re-posting from a reddit discussion [1] about another HTTP server written in C:

I worked at a company once that had a really decent HTTP server library... that they put in every program.

You'd launch an app, and to debug it, you'd access http://localhost:9001. From there, you could go to URLs for different libraries in the app. Like, if you had a compression library, you could go to http://localhost:9001/compression. It would show stats about the recent work it had done: how long it took, how much CPU, RAM, and disk it used, the state of variables now, etc. You could click a button to get it to dump its cache, and so on.

If you were running a service on a remote machine, accessing it over HTTP to control it was just awesome. http://r2d2:9001/restart. http://r2d2:9001/quit. http://r2d2:9001/logfile.

Oh, and the services running on that remote machine would register with a system-level monitor. So, if you went to http://r2d2/services, you could see a list of links to connect to all of the running services.

...and every service registered with a global monitor for that service. So, if you knew a Potato process was running somewhere, but you weren't sure which machine it was on, you could find it by going to http://globalmonitor/Potato, and you'd see a list of machines it was running on.

Just all kinds of awesomeness were possible. Cannot recommend it enough.

And, I mean like, programs with a GUI. Like, picture a game. Except on my second monitor, I had Chrome open, talking to the game's engine. I could use things like WebSockets to stream data to the browser. Like, every time the game engine rendered a shot, I could update it (VNC-style) in the browser window. Except annotated with stats, etc. It was just the most useful way to organize different debug information.

And what was great was that when writing a library and wanting to output information, you wouldn't write it to stdout... you'd make HTML content and write to that. Want to update it? Clear the buffer and write to it again. As a user, if you ever want to read the buffer, you just browse it. Want to update it? Refresh the window. Or better yet, stream it over a websocket. Like stdout on steroids. If you need to combine the output from a few libraries in a new window, you just write a bit more HTML in your code, and you're doin' it.

It's just another example, in my mind, of the power of libraries. We all get used to thinking of frameworks (IIS, Apache) as the only way to solve a problem, that we forget to even think about putting things together in new and unexpected ways. HTTP as a library - HELL YES.

Using HTML to debug programs, live, is highly under-utilized.

[1] : https://www.reddit.com/r/programming/comments/36d190/h2o_is_...


Networking library in C? Nice. Who is afl'ing it first?


During the past twelve months, this project has had only one active contributor.


That's reasonable for a library that just supports HTTP; it's not that difficult a protocol, even though the project appears to have 3700 commits. (Do these count the ones from Mongoose, from which it forked?) Even cURL is mostly just one person, and it supports everything in the world: HTTPS, HTTP/2, FTP, Gopher, dozens more...


How should I interpret this information?


That this project has a bus factor of 1. High risk.


The risk isn't a function of developer count alone; for example, a simple and polished feature-complete library with one developer is lower risk than a beta version of a huge complex distributed system with 100 developers.


"LibHTTP is licensed under the MIT license. Therefore the library can be used in both open and closed source projects."

I really don't want to be overly negative, and I applaud people for putting in work like this... but this is a perfect example of what's wrong with BSD-esque licenses right here.

More people should at least consider GPLv3'ing their stuff. So many of the bad stories we hear are side effects of software that doesn't respect the user in the first place, with licenses that don't respect the user.

RMS was and is right.

If I may offer an alternative that respects the user: I have had great experiences with Hiawatha, which is my current standard webserver these days, even over nginx or Apache. With the added bonus of being programmed with security in mind from the start, which is something we always hear people talk about wanting, but how often do we actually see it?


I don't understand this logic.

You get free software with few restrictions on what you're allowed to do with it. In what way is that "respecting" you less than similar software with more restrictions on what you can do with it? How does someone forcing more control over your actions in any way "respect" you more?


"the library can be used in both open and closed source projects."

When that source gets put into a product that is closed, and then that product makes it to the user. How do people not understand this difference by now? Tivoization. It violates the four-freedoms principle. That's how.

To quote zAy0LfpBZLC8mAC:

"A law that allows murder is more permissive and obviously does not increase freedom. It's simply a fallacy to think that not putting any limits on what any individual can do results in maximal freedom for society at large."


This argument for the GPL only wins in a fantasy land where all software has to be released under GPL or not at all. In reality, companies reject the GPL and just write their own code or adopt and contribute to permissively licensed projects, and that is what ends up "making it to the user".

You're telling authors to take away freedoms from their own users now, to prevent a potential bogeyman from taking away freedoms of other users in the future.

You can use permissively licensed code now and forever, because that can't be taken away once released. You're refusing to use it now on the off chance that in the future, someone modifies that code to do something else and give it to other users without the code. Your initial use still hasn't changed, and your right to use that initial code hasn't changed. You are not a user of the new, modified code, so your rights have not been impacted whatsoever.


> In reality, companies reject the GPL and just write their own code

Right, so we either end up with more Free Software, or with more employment for us programmers. It's a clear win-win!


This is called the perfect-solution fallacy.

For most users the choice is not between a 50% proprietary product that uses an MIT library and a FOSS product that is 0% proprietary. The choice is between a 50% proprietary product and a 100% proprietary product, and you are trying to take away their choice of the 50% proprietary product.

Complete eradication of proprietary software is not the expected outcome of MIT licensed libraries. The goal is reduction.


I'll just leave this here. Downvotes start in 3.. 2... 1...

The long goodbye to C - http://esr.ibiblio.org/?p=7711


Could you please stop starting language flamewars? Surely you could come up with something substantive to say.

https://news.ycombinator.com/newsguidelines.html



