And yet, every single programming language/platform builds its own HTTP-handling library, usually several, of widely varying quality and feature support. Again, it would not be so bad if HTTP were a robust format where you could skip recognizing and correctly handling the half of the features you don't intend to support, but it is not: even if you don't want to accept e.g. trailers, you still have to be aware of them. We have OpenSSL, why not also have an OpenHTTP (in sans-io style)?
Does anyone know if there is anything like this for HTTP or associated RFCs?
E.g., HTTP header parameter names can carry a trailing * that changes the character encoding of the parameter value. How many implementations test this? Or tests for decoding URI paths that contain escaped / characters, to make sure they're not confused with the /s that are the actual path separators.
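For what it's worth, here's a small stdlib-only Python sketch of both cases (the header value and the path are made-up examples; the trailing * form is the extended-parameter syntax from RFC 5987/8187):

from urllib.parse import unquote, unquote_to_bytes

# Extended parameter: the trailing * switches the value to charset''percent-encoded
# form, as seen e.g. in Content-Disposition. Example value is made up.
param = "filename*=UTF-8''na%C3%AFve%20r%C3%A9sum%C3%A9.txt"
name, _, value = param.partition("=")
if name.endswith("*"):
    charset, _, encoded = value.split("'", 2)
    print(unquote_to_bytes(encoded).decode(charset))  # naïve résumé.txt

# Escaped slashes: decoding the whole path *before* splitting on "/" makes %2F
# indistinguishable from a real path separator.
path = "/repos/owner%2Fname/contents"
wrong = unquote(path).split("/")                    # ['', 'repos', 'owner', 'name', 'contents']
right = [unquote(seg) for seg in path.split("/")]   # ['', 'repos', 'owner/name', 'contents']
print(wrong, right, sep="\n")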
Where did you read this? HTTP header fields may contain MIME-encoded values using the encoding scheme outlined in rfc2047, but I haven't heard of the asterisk having any special meaning...
There are test suites for _some_ subsets of the spec, and there are implementation-specific test suites (e.g. in Chromium)... but there's not a single HTTP 1.1 all-in-one test server that you can test your client or server implementation against, over the wire.
Add to that the lack of tests for hop-by-hop networking behaviour (which is e.g. the Transfer-Encoding parts of the 1.1 spec) and you have a disaster waiting to happen.
Combine that with 206 Partial Content and, say, some byte ranges a server cannot process... and you've got a simple way to crash a lot of server implementations.
There's not a single web server implementation out there that correctly implements multiple byte range requests and responses, especially not when chunked encoding can be requested on top. Don't get me started on the ";q=x.y" quality values in headers; they are buggy everywhere, too.
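A quick probe for this (standard library only; host and path are placeholders): ask for two byte ranges and see whether the server really answers with a 206 and a multipart/byteranges body, or just ignores the Range header and sends a 200:

import http.client

# Open a plain HTTP connection to a placeholder host and request two ranges.
conn = http.client.HTTPConnection("example.com", 80, timeout=10)
conn.request("GET", "/", headers={"Range": "bytes=0-3,8-11"})
resp = conn.getresponse()
print(resp.status)                      # 206 if ranges are honoured, 200 if ignored
print(resp.getheader("Content-Type"))   # multipart/byteranges; boundary=... for multi-range
print(resp.read()[:200])                # peek at the (possibly multipart) body
conn.close()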
For my browser project, I had to build a pcap (tcpdump) based test runner that can set up temporary local networks with throttling and fragmentation behaviour, so that I have reproducible tests I can analyze later when they fail. Otherwise it would be a useless, implementation-specific network protocol test like all the others.
I think the web heavily needs a standard HTTP test suite, similar to the ACID tests back then... but for both malicious and spec-compliant HTTP payloads.
I think there currently is no such thing because writing test cases for protocols is an uphill battle. You simply don't have any constraints on how to start. Write the tests in plain text? Then how do you encode the behaviour? Write the tests in a programming language? Then how do you execute the client under test? It's not impossible to have a client/server-agnostic test library, but designing the framework is non-trivial.
That said, it depends on your goals. Writing pragmatic, limited test cases for protocols is super hard, due, as you say, to the lack of constraints.
But if your goal from the outset is to write a definitive, exhaustive test suite, then it's a far more mechanical task (much the way that writing a chess AI is hard if you want it to run on a desktop computer, but writing a program to play perfect chess only requires a basic understanding of graph search if you don't care how fast it is). Just start at the beginning of the protocol and work your way through one statement at a time, enumerate all the different ways an implementation could cock it up, and write a test for each. Of course there are still engineering decisions to be made, but you don't have to pick the perfect solution to each one. A solution is enough; you (or someone else) can always improve it later.
You mean, like Windows?
And the OS absolutely has access to the HTTP frame: it manages the process's network buffers and its whole memory mapping, it locates and loads OpenSSL at the process's startup... a process is really not a black box from the OS point of view.
Otherwise we will keep chasing bugs forever.
There is a talk by James Kettle about request smuggling with HTTP/2, but it is largely about attacks where the frontend talks HTTP/2 and then downgrades to HTTP/1.1 to talk to backend servers. That said, it does also highlight some HTTP/2-only quirks, so HTTP/2 isn't completely free of such issues either, but it's so much better than HTTP/1.1.
The problem isn’t that we don’t have a good header format. The problem is we have too many.
JSON is a terrible format. Especially for streaming data.
Here's a super simple shell script for generating invalid JSON that will blow Python's stack:
n="$(python3 -c 'import math; import sys; sys.stdout.write(str(math.floor(sys.getrecursionlimit() - 4)))')"
left="$(yes [ | head -n "$n" | tr -d '\n')"
echo "$left" | python3 -c 'import json; print(json.loads(input()))'
a) You run out of memory.
b) The connection ends.
Because JSON is terrible for streaming data.
The basics are simple, but then add cookies, HTTPS, authentication, redirects, Host headers, caching, chunked encoding, WebDAV, CORS, etc., etc. All justifiable, but all adding complexity.
Joking aside, some "features" in HTTP/1.1 are really questionable. Trailing headers? 1xx responses? Comments in chunked encoding? The headers that proxies must cut out in addition to those specified in "Connection" header except the complete list of those is specified nowhere? The methods that prohibit the request/response to have a body but again, the full list is nowhere to be found?
All these features have justifications but the end result is a protocol with rather baroque syntax and semantics.
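To make the chunked-encoding point concrete, here is a toy decoder sketch (my own, not taken from any real implementation) fed a body that uses a hex chunk size, a chunk extension and a trailer field, all of which a compliant parser has to cope with:

# Example chunked body: size in hex, optional ";ext=..." chunk extension,
# and a trailer field after the zero-size last chunk.
body = (b"4;ext=whatever\r\n"
        b"Wiki\r\n"
        b"5\r\n"
        b"pedia\r\n"
        b"0\r\n"
        b"Expires: 0\r\n"
        b"\r\n")

def dechunk(data):
    out, trailers, pos = b"", {}, 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";", 1)[0], 16)  # strip chunk extensions
        pos = line_end + 2
        if size == 0:
            break
        out += data[pos:pos + size]
        pos += size + 2                                        # skip chunk data + CRLF
    # everything between the last-chunk and the blank line is the trailer section
    for line in data[pos:].split(b"\r\n"):
        if not line:
            break
        name, _, value = line.partition(b":")
        trailers[name.strip().decode()] = value.strip().decode()
    return out, trailers

print(dechunk(body))  # (b'Wikipedia', {'Expires': '0'})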
P.S. By the way, the HTTP/1.1 specs allow a GET request to have a body in chunked encoding — guess how many existing servers support that.
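If you want to check a server yourself, a raw-socket probe along these lines (example.com is just a placeholder) shows how it reacts to a chunked GET body; per the specs the body has no defined meaning for GET, but it isn't forbidden either:

import socket

# GET request carrying a chunked body ("hello"), terminated by a zero-size chunk.
req = (b"GET / HTTP/1.1\r\n"
       b"Host: example.com\r\n"
       b"Transfer-Encoding: chunked\r\n"
       b"Connection: close\r\n"
       b"\r\n"
       b"5\r\nhello\r\n"
       b"0\r\n\r\n")

with socket.create_connection(("example.com", 80), timeout=10) as s:
    s.sendall(req)
    # print just the status line the server answers with
    print(s.recv(4096).decode(errors="replace").split("\r\n")[0])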