IMO, the reason for the initial success of HTTP (1.x) was its extreme simplicity: a text-based format, a straightforward stateless design, and the ability to implement both server and client in simple, basic code. All this meant that the protocol itself was stable, usable, and a reliable standard.
The current path is to drastically increase complexity due to the demands of the content-provider overlord(s): basically, in order to better accommodate the needs of Google (and a handful of others), we must redefine things for everyone. It's becoming a complex, over-designed protocol that is being crammed down people's throats, instead of a protocol that is embraced because it makes sense.
I'm still not over the fact that they made headers all-lowercase in HTTP/2. I know the reasons, but it's so weird to have all-lowercase headers. TBH I don't see much uptake in the community either: since HTTP/2 came out, I've barely seen lowercase headers catch on in documentation, e.g. MDN lists them HTTP/1-style: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers
As someone who works in infrastructure I disagree with this. While it’s true that applications should always have done this, many many do not and need to be dealt with. If they are third party paying customers, your only option is to figure out how to make things work, because asking customers to upgrade will piss many of them off.
The simple fact that so many applications do not treat headers as case-insensitive is one of the major things holding back HTTP/2. With HTTP/1 it was suggested but not enforced. Upgrading to HTTP/2 with forced lowercasing breaks these applications.
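To make the failure mode concrete, here's a minimal sketch in Python (hypothetical application code, not any particular library): an exact-match header lookup works against an HTTP/1.1 origin that sends "Content-Type" and silently breaks behind an HTTP/2 hop that lowercases everything, while a case-insensitive lookup survives both.

    # Naive exact-match lookup: works when an HTTP/1.1 origin sends "Content-Type",
    # silently returns None once an HTTP/2 hop delivers "content-type".
    def get_content_type_fragile(headers):
        return headers.get("Content-Type")

    # Case-insensitive lookup: correct under both versions, since header field
    # names are case-insensitive by spec.
    def get_content_type(headers):
        lowered = {name.lower(): value for name, value in headers.items()}
        return lowered.get("content-type")

    h1_headers = {"Content-Type": "application/json"}  # typical HTTP/1.1 casing
    h2_headers = {"content-type": "application/json"}  # HTTP/2 forces lowercase

    assert get_content_type_fragile(h2_headers) is None          # the breakage
    assert get_content_type(h1_headers) == "application/json"
    assert get_content_type(h2_headers) == "application/json"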
I just meant for display... :( Like it changes content-type to Content-Type, which shouldn't break anything right...?? HTTP headers are supposed to be case insensitive anyway right?
If the Chromium/Chrome browser is open source then anyone should be able to edit the source code to change the headers to whatever case they prefer and re-compile.
The new QUERY method strikes me as a really promising addition. Not being able to send a body with a GET-type request is a gnawing issue I have with HTTP
Elasticsearch uses (used to use?) the HTTP body as parameters for a GET request. IIRC the HTTP specification doesn't (or didn't?) mandate that a GET request have no body.
It still does. I don't think it violates the HTTP 1.1 specification but more that it is unspecified. It's just that a lot of http clients simply don't support doing HTTP GET with a body under the assumption that it is redundant / not needed / forbidden. Of course elasticsearch allows POST as an alternative.
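At the library level it's easy enough to send one, though; a hedged sketch in Python (the endpoint, index and query are made up, and it assumes a locally running Elasticsearch-like server):

    import requests

    # Hypothetical Elasticsearch-style endpoint; host, index and query are made up.
    url = "http://localhost:9200/my-index/_search"
    query = {"query": {"match": {"title": "http"}}}

    # requests will happily attach a body to a GET; whether every client library,
    # proxy and server in the path honors it is exactly the gamble discussed above.
    resp = requests.request("GET", url, json=query, timeout=10)
    print(resp.status_code, resp.json())

    # The POST alternative Elasticsearch accepts for clients that can't do this:
    resp = requests.post(url, json=query, timeout=10)
    print(resp.status_code)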
People used to obsess about HTTP verbs and their meaning a lot more than they do today. At least I don't seem to get dragged into debates on the virtues of PUT vs. POST, or using PATCH for a simple update API. If you use GraphQL, everything is a POST anyway. Much like SOAP back in the day. It's all just different ways to call stuff on servers. Remote procedure calls have a long history that predates all of the web.
GET requests with a body (unspecified in HTTP/1.1) reminds me of a similar case I encountered years ago: URL query params in a POST (an HTML form whose action attr contained a query string).
I feel that not obsessing about the meanings of HTTP verbs can, has, and will continue to lead to security incidents and interoperability issues between middleware. Specifications where everyone gets to pick and choose different behaviors are a nightmare.
Interestingly, though, GET with data exists in the wild, and has for many years.
I manage an HTTP library class, and a customer encountered an API that required a GET but with data (think query parameters passed as XML).
I implemented that for the customer, and then implemented the reverse in the server class. I'm not going to say it's used a lot, but it makes semantic sense.
Incidentally, the same is true for DELETE, another request that typically has no body.
This is the first I've heard of QUERY though, so look forward to reading up on that.
Nothing in the spec prevents arbitrary data in the body of a GET, but clients and proxies are implemented by lazy people who make excuses about preserving some legacy security feature or something and continue to ignore the spec.
The spec is actually pretty clear on this - do not specify a body on a GET request.
> A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.
Previously it was "SHOULD ignore the payload".
It's nothing to do with laziness or security - people are writing spec conforming software. And indeed every library I've used allows interacting with a body, even on a GET.
> The spec is actually pretty clear on this - do not specify a body on a GET request.
That's not what your quote says.
Not having defined semantics does not mean it is not supported. Just because some implementations fail to support GET with a request body does not mean all implementations should interpret it as a malformed request.
I can roll out a service with endpoints that require GET with request bodies and it would still be valid HTTP.
"out of spec" means that it is out of specification. It is literally not specified. You are doing something that is not specified. It is therefore an action that is out of specification. it is therefore out of spec.
If there was an utter ban, then it would be against specification and not compliant, not merely out of specification.
> "out of spec" means that it is out of specification. It is literally not specified.
That's not what it means at all. Being out of spec would mean the spec explicitly states that a request with a body should be rejected. If the spec does not state that a request with a body should be rejected, then you are not required to reject a request which packs a body.
No, not defined means it's not within the purview of the spec. Spec doesn't care. You can send one. Maybe it'll work, maybe it won't, maybe it'll crash, maybe it'll be rejected, maybe some proxy along the way will strip it and the server won't even get it, maybe it'll get your client banned forever.
All of these are fine, because spec doesn't care.
> If accepting body in GET is out of spec, then spec is supposed to say, GET cannot send body.
No, then it would be against spec, like HEAD with a response body.
One implementation detail about QUIC that I was surprised by was that it requires TLS. That’s great for improving the security on the public Internet but it seems like it adds complexity and CPU overhead if you’re running on something like an internal Wireguard network. Overall, though, it’s a minor complaint. I did like how they split apart the QUIC and HTTP/3 protocol from one another.
My experience as former technical lead of a major HTTP/3 deployment is that while TLS certainly adds CPU overhead, it's by far not as major as the overhead of TLS on top of TCP is.
QUIC in general is far less efficient than TCP+TLS. Very optimized implementations require about 2x the CPU for bulk data transfers; more usual ones (and ones running on operating systems that don't support UDP segmentation offload) require about 5x the CPU. Of that CPU overhead, only a small part is crypto. Most CPU time is spent in the OS networking stack processing the tiny MTU-sized packets. In an optimized implementation, where crypto has a relatively bigger impact, you might see around 30% of CPU time spent in crypto operations in a profile when using hardware-accelerated AES (ChaCha20 is worse), which means one could gain that amount of CPU back for other things in a cryptoless QUIC. In a less network-optimized deployment it would only be about 10%.
What I can understand, however, is people not wanting to deal with the complexity of issuing, deploying and rotating certificates for internal deployments that are already secured by other means like WireGuard. It can be a concern - but on the other hand, tools like the k8s cert-manager already simplify the process for those environments. And of course one needs to consider whether QUIC is the right tool for those environments anyway - plain TCP has a lot of strengths too.
What kind of profiler can tell which areas the CPU is burning cycles in?
Like, just mapping a bunch of known function and timing? Then every "ssl_" function is classified as crypto time? And every "net_" function is networking? (Or something like that?)
Networking cost will be within callstacks for the system calls that send and receive packets (sendmsg, sendmmsg, recvmsg, recvmmsg).
Crypto cost shows up in functions whose names sound like the crypto primitives being used. They typically won't have ssl_ in the name, because QUIC implementations directly make use of lower-level primitives - e.g. those exposed by libcrypto/ring/etc.
It clearly shows the networking and crypto parts (here using ChaCha20). However, don't read too much into the actual values in this graph, since the profile is nearly 2 years old, uses ChaCha20 instead of AES (much more expensive), and used loopback networking (cheaper than the real thing).
It does make sense though: our home networks aren't safe these days, when random apps on your phone and random websites in your web browser can just make requests to random hosts inside your network, possibly exploit your router and then snoop on traffic.
The precedent for that was set by HTTP/2; while the spec didn't require it (Snowden happened after that decision was made), all of the implementations did.
TLS delivers three things, Confidentiality, Integrity and Authentication. But intuitively these three things are what people expect their transport protocol to do anyway, it's actually surprising as a user that TCP doesn't really bother providing say, Integrity. "Oh, yeah, your data might be arbitrarily changed by the time it is delivered". Er, what?
BCP 78 "Pervasive Monitoring is an Attack" says the Internet should design new systems to resist such monitoring and so offering these intuitive properties for a new transport protocol made sense. And that's what QUIC is. In a BCP 78 universe it doesn't make sense to ship new protocols that will be rendered useless by the surveillance apparatus.
The OSI model is composed of layers of abstraction, each building on top of the one below. Making a transport-layer protocol depend on TLS (an application-layer protocol, ref https://en.wikipedia.org/wiki/Application_layer) is completely backwards, and makes absolutely no sense when considering the model.
The OSI model has been around for a very long while. If you have a degree in CS, or have studied networking in university - chances are you've had to learn about the OSI model. There's no reason to throw that away now, or reinvent the wheel.
> The OSI model has been around for a very long while.
Even imagining that for some reason I wasn't aware of that, why would it be relevant?
> If you have a degree in CS, or have studied networking in university - chances are you've had to learn about the OSI model.
And the waterfall software development model. A bunch of long obsolete data structures. Open Hypermedia (remember that? No? That's OK it doesn't matter any more).
What's happened here is that you've privileged a bad model that somebody probably taught you out of a textbook (hopefully while grimacing, since this is useless "information") over the real world experience of dozens of really smart people who work with actual networking and designed QUIC.
Maybe, if I'm giving you benefit of the doubt you've assumed "user" somehow means "undergraduate who is studying the OSI model" but it doesn't - billions of people use the Internet, far more than will study any degree, let alone Computer Science. And it's quite reasonable for them to expect their communications to have these three properties.
If your preferred model insists that we can't have security until the application layer then your model is wrong, just as surely as if you have a model of the Atom which assumes it's a solid ball of something (the plum pudding model, as with OSI perhaps some undergraduates were taught this model between when it was proposed and when experiments showed it's just wrong).
I have absolutely no idea what anything you just wrote means. Your ramblings make no sense whatsoever, and they lead towards no conclusions. Get over yourself.
> "What's happened here is that you've privileged a bad model that somebody probably taught you out of a textbook"
So you're dismissing the OSI model as a "bad model" that "somebody taught me", yet there's this other shiny thing developed by "dozens of really smart people" and that's clearly the way to go because those people "work with actual networking".
> "And it's quite reasonable for them to expect their communications to have these three properties."
Their communications are clearly secured and tamper-proof already today, despite not using QUIC. How do you square that?
The security happens at the application layer, using mechanisms such as TLS. Additional security mechanisms can clearly come on-top. No need to bake everything into a single transport-layer protocol when there's endless flexibility available by layering the protocols on-top of each other.
> "If your preferred model insists that we can't have security until the application layer then your model is wrong"
How about instead of throwing fits about some vague ideas of "security" you actually provide concrete examples of what it is you're talking about, so that we can have a meaningful and constructive conversation about technology?
All I was saying is that there's no reason to introduce more complexity into the transport layer, since "security" is already handled by the application layer (on a per-application basis). It's unclear whether a single "security" model fits into the transport layer, which should be as agnostic and lightweight as possible (hence its original intention - to facilitate the transfer of information between endpoints, and leave the rest to whatever consumes that information).
> All I was saying is that there's no reason to introduce more complexity into the transport layer
But there is, and I explained what that reason was. The real world is under no obligation to faithfully copy your OSI Model, on the contrary, the fact the OSI Model doesn't resemble the real world is a good reason to abandon it.
> "But there is, and I explained what that reason was."
If your reason is "security" (whatever that means, you won't define that either), then I too, explained how that is guaranteed by application-layer protocols already today - so it's unclear why changing the transport layer is needed. Again, you're not saying anything concrete and keep handwaving - I don't think you really want to (or can) have a conversation, really.
I studied networking in the university, 20 years ago. We did study OSI as well as TCP/IP and the rest of the actual Internet stack, and it was blatantly obvious that the latter doesn't really conform to the former - protocols straddling boundaries etc. For example, TCP is not a strictly transport-layer protocol, since it also handles sessions.
When asked, our prof readily admitted that the OSI model was one of those design-by-committee experiments in purity that quickly broke down IRL.
TCP/IP does not use the OSI model. Stacking layers on top of each other promotes inefficiency and poor security, and is one reason why TCP/IP won over the OSI-recommended protocols.
The advantage of merging TCP and TLS is improved performance and security/privacy (security/privacy is improved because fewer session parameters can be read or modified by third parties).
So now what is the benefit of sticking to the OSI model, so that we can evaluate the pros and cons?
It's insane to expect the opposite: that traffic on my own network will need to keep reaching out to certificate authorities outside to validate packets from one host to another.
If you don't understand why these 3 things are on top of TCP, well, never mind. I was going to say you shouldn't be designing networks, but you might already be on the QUIC steering committee.
Most committees now are a joke, existing so that some Googler middle manager makes it to junior director. Sigh.
The use of TLS for QUIC does not imply or require the use of the Web PKI which is what I assume you're thinking of by "certificate authorities outside to validate packages".
> "The use of TLS for QUIC does not imply or require the use of the Web PKI"
Handling certificate revocations (which would be needed to "ensure security") does indeed imply the use of some way to check for the revocations in a timely manner. The revocation lists themselves can be tampered with.
I think saying it "doesn't address" these threats is a bit extreme.
It significantly reduces the attack surface (Certificate Authorities vs every ISP), it makes it a lot harder for state actors to pull off those attacks deniably with a gag order, and it makes it a lot easier for an informed but non-expert consumer to pick a secure-by-default solution.
Lenovo Superfish was a local exploit; the software was installed at the factory. It could have been any sort of root kit or other client software. Once an attacker has control of your local device, lots of things are possible. It’s true that HTTPS won’t defend against local attacks, but that doesn’t really seem like a fair criticism since that is not what it is supposed to do.
The defense against compromised certificate authorities starts with platforms and browser makers. They demand that CAs implement certificate transparency logs to be included in root stores.
They also monitor CT logs, as do most large site operators. Facebook for example does not run an OS platform or browser, but has a robust CT monitoring program.
So if one of the random little CAs in the root store of your browser issues a rogue cert for “google.com”, it will be logged and seen, and that CA will risk getting kicked out of the root store. That’s what happened to Symantec, which was not a small CA.
In general it is safer and quieter for bad guys to target client devices with attacks like Pegasus, than systemic actors like entire CAs.
> So if one of the random little CAs in the root store of your browser issues a rogue cert for “google.com”, it will be logged and seen
The victim might be the only one getting a collision as governments target them (and no security researchers get the compromised site + public key), and the Superfish fiasco shows that a collision is simply ignored by the browser.
I think it makes sense, I would just love to improve ergonomics around getting certificates for internal services.
For example, I use the TP-Link Omada Wi-Fi access points and have a local hardware controller for them. The hardware controller can have a static IPv4 (ugh for no IPv6 support) but since it doesn’t support Let’s Encrypt its only way to get a certificate is to upload one via the web GUI. I can of course create my own CA and install my own cert with a long expiration date but then that would mean installing my CA on a bunch of devices from which I might access the controller.
Maybe the solution is something like having your DHCP box also be able to run an ACME server scoped just to your local domain and have that CA be trusted for your local domain by all your devices that get their IP from the DHCPv4/6 via a DHCP option.
The link to the new query method [1] intrigued me. Could this, if widely adopted, make GraphQL obsolete? (Or am I admittedly ignorant as to exactly what they each do?)
If everything works out, GraphQL could use it and queries would be cacheable by generic HTTP caches. A lot would need to happen to make that work, though. The immediate motivation is to give an alternative to people who want to put a request body on GET.
The main focus of HTTP/2 and HTTP/3 was apparently performance, and hardly any of the other fundamental issues of the protocol were addressed. It's nice to see that some work is put into something like QUERY as well, which improves HTTP as an API protocol (REST) - although the motivation appears to be performance again, and I always have mixed feelings when I see the term "idempotent" in a context that likely involves database systems. Too often, such requests do have side effects of some kind, and preventing request retries throughout the stack isn't easy.
Definitely not performance; QUERY is an improvement/hackfix. Browsers have a cap on the size of the request line/path (it used to be very small in IE7), and server software also tends to cap it as a security mitigation (something like a max header size, similar to what happens with cookies). For this reason, technologies such as Elasticsearch or GraphQL, which allow for quite large query definitions, use POST and fit the query in the body instead, which is not bound by the same limitations.
Semantically, such queries are cacheable, but with the request being a POST, intermediate edge proxies or CDNs won't cache it. This means clients pay the penalty every single time. Moreover, browsers can't have URL-based history with such solutions, as POST requests can't "go back".
So QUERY fixes this. Semantically. Now you have to build support in middle boxes for it, and browsers must support it as well.
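For a rough idea of the client side (the method name follows the draft, but the endpoint, body and any server support here are purely hypothetical), Python's http.client will pass an arbitrary method straight through:

    import http.client

    # Hypothetical query body; QUERY is meant to be safe, idempotent and cacheable,
    # so intermediaries could treat the body as part of the cache key, unlike POST.
    body = '{"filter": {"status": "active"}, "fields": ["id", "name"]}'

    conn = http.client.HTTPConnection("api.example.com", 80)
    conn.request(
        "QUERY",          # http.client does not restrict the method string
        "/contacts",
        body=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    resp = conn.getresponse()
    print(resp.status, resp.read())
    conn.close()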
As far as I know, GET and DELETE requests support a payload body as per one of the later specs, whereas an earlier version wasn't fully clear on this and people assumed it's forbidden. Unfortunately, there's still a lot of software conforming to the interpretation of the old spec, which means you cannot count on these HTTP verbs to be supported.
Given this legacy, QUERY seems like an adequate solution. However, it seems to me like it's only intended for read queries. Write queries are generally not cacheable semantically, at least not in my world. Was the naming influenced by GraphQL by any chance, which distinguishes between queries (read) and mutations (write)?
The practical effect that my coworkers keep telling me about is that the client or an intermediary like a load balancer may retry the same request until it is successful, with the expectation that the resource (records or settings in a database system in our case) is only modified once. A database query that does something like value = OLD.value + 1 cannot be safely retried. There are circumstances under which the outcome of a transaction can be unknown but it might still succeed, and the client should be prevented from doing accidental changes. With read queries this should be rare, but even then there can be side-effects affecting the overall system state. There are of course solutions to this for the application side, like the usage of idempotency keys.
GraphQL is obsolete today as it was when it was released. You have HATEOAS and with it you can take advantage of HTTP caching. No, Redis is not the same.
HTTP methods don't really have functionality inherently attached. They're only semantically meaningful. Occasionally intermediate infrastructure will treat them slightly differently, such as a firewall stripping bodies from GET requests, but that's not part of the spec.
You can even use custom methods. I have a server that responds to the SALAMI method.
I thought the most interesting bit was the privacy-focused OHTTP, which they're building a service around [0]. How this will differ from a VPN will be interesting. The gist of it is that the HTTP connections are naive and don't really record an "accurate" IP address or trace, if I understand correctly.
The idea is to compose intermediaries run by different parties, so that no single entity has access to both the unencrypted payload and client ip address / connection context. It’s a controlled form of proxying where the proxy doesn’t have access to the plaintext. Not currently designed for everyday browsing; the use cases are things like collecting telemetry.
CloudFlare's solution is significantly less secure than Tor; it's more like visiting an HTTPS website using a VPN service. You have to trust that the VPN and the website operator aren't colluding to de-anonymize you, and that they aren't both being monitored by the same third party (who can de-anonymize you using timing information).
I thought you were being facetious about what could have been a throwaway comment from a throwaway account, but then I checked their bio; this particular type of interaction is one of the things that I love about HN.
Same here, I've learned to check. I really should have explicitly pointed out who, so it was less blown off. Almost came back and did so. And yes, why HN is great.
What's this "HTTP core"? Please tell me it's just the sane parts of HTTP and without any dark corners and ambiguous specs? Please tell me it's something you can write a parser for in an hour?
We're no less than 10 years overdue for something like this.
Similar thoughts. As I read through the “why HTTP took off” explanations, I was surprised that the more obvious “it was the first thing that a plurality of players networked on and after that it was all just bandwagoning, because it’s much easier to meet people at other people’s parties than try to host your own” wasn’t enumerated.
"Ease of use" is mentioned, and I suppose this refers to HTTP being text-based, pretty readable and intuitively understandable. HTTP/2 and later aren't like this anymore, for the sake of efficiency. It's a trade-off, but at this point I don't see much value in using it for APIs anymore. If you value efficiency and clean design, there are clearly better alternatives that aren't more complicated than what HTTP has become.
By "ease of use" they probably mean that the ecosystem is so advanced, one can easily write a HTTP handler in the language/framework of their choice.
In reality through, now days it's a nightmare to implement a HTTP server/client (at least for individual developer like myself). HTTP/1 is already a struggle by itself, HTTP/2 is much worse with all the new concepts & HPACK etc it added, and then you have HTTP/3 which introduces an entire UDP transport for you to implement. There is no easy part in HTTP if you choose this route.
WebSocket has a lot less fluff, but you could as well use a raw TCP socket and send JSON payloads over it with your own "protocol", e.g. with an op attribute for the action and an error attribute in responses. Another protocol that comes to mind is Apache Thrift.
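A rough sketch of what such a homegrown framing could look like (length-prefixed JSON with an "op" field; every name here is made up):

    import json
    import socket
    import struct

    # 4-byte big-endian length prefix, then a JSON object with an "op" field
    # (and an "error" field in responses).
    def send_msg(sock, obj):
        payload = json.dumps(obj).encode()
        sock.sendall(struct.pack(">I", len(payload)) + payload)

    def recv_exact(sock, n):
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed mid-frame")
            buf += chunk
        return buf

    def recv_msg(sock):
        (length,) = struct.unpack(">I", recv_exact(sock, 4))
        return json.loads(recv_exact(sock, length))

    # Hypothetical usage against a server speaking the same framing:
    # sock = socket.create_connection(("api.internal.example", 9000))
    # send_msg(sock, {"op": "get_user", "id": 42})
    # reply = recv_msg(sock)  # e.g. {"op": "get_user", "error": None, "user": {...}}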
HTTP will never go away. The reason it's so widely used is that browsers are universal -and universally available- clients for arbitrary applications. And on top of that, you can mash multiple sources of content. The latter is also the reason that the web's security model is so awful.
Interesting, I'd say the legacy parts are the actual good things and most of the corporate giga-scale add-ons are what's bad about it, but then again, I'm sure that we get the same for C++, too…
Well, if it were in the same state as C++, that means it manages to let you get things right despite the historical baggage by embracing newer features in the standard.
As someone who's had to deal with many DDoS attacks I'm rather horrified at the thought of QUIC: dropping UDP at the network border eliminates a lot of headaches.
I'm not sure I follow. Such attacks happen because of poorly designed protocols with very asymmetric resource requirements between endpoints. My understanding is that the QUIC handshake has measures in place to prevent it from being a vector for amplification/reflection based denial-of-service attacks.
They're saying that blocking UDP entirely is a very effective (and cheap) mitigation of DDOS.
Because QUIC is built on UDP, however, you can't just block it, you have to ingest it and dedicate resources to filtering between QUIC and other UDP traffic.
I question the premise of such an approach. Denial-of-service is caused by applications that consume disproportionate resources based on untrusted user input. That’s entirely orthogonal to whether the application accepts input over UDP or TCP.
I would raise hell with my ISP/cloud vendor/network operator if they thought that it was appropriate to cut corners and block me from using UDP. That’s more likely to DoS me if it means my games or video calls (or any of a million things that legitimately use UDP) stop working or become significantly degraded.
If you've got a 10gbps server that's attracting 100gbps of UDP reflection DDoS traffic, and your host has ingress capacity for that, their easy/simple/low cost options to help are most likely null-routing your ip, which doesn't really feel like helping, or dropping udp to your ip.
If you're running a UDP service, you're going to need something more complex. maybe the host can drop all udp but port 80, but maybe the DDoSers can generate the reflection traffic to port 80 instead of random ports. If you're getting too much volumetric DDoS on your service port, you'll need to have some sort of filtering that understands your service traffic better and has enough ingress capacity. Usually that's expensive, not super flexible, and often adds a lot of latency.
Feel free to question the direction the sun rises from.
> Denial-of-service is caused by applications that consume disproportionate resources based on untrusted user input. That’s entirely orthogonal to whether the application accepts input over UDP or TCP.
It's not: UDP-based protocols can generally be misdirected (spoofed source addresses) and amplified, which allows for much easier DoSing.
> I would raise hell with my ISP/cloud vendor/network operator if they thought that it was appropriate to cut corners and block me from using UDP.
They're doing the exact opposite of cutting corners. But hey good luck using video calls when the routers are melting, I'm sure that's going to be great.
> That’s more likely to DoS me if it means my games or video calls (or any of a million things that legitimately use UDP) stop working or become significantly degraded.
Only if you operate under the misguided assumption that hole-punching is not a thing.
Hell, any NAT requires specific handling of inbound connections to perform proper translation, and "drop" is a perfectly good default translation for an unrequested inbound.
The state of proliferating sites that won't let you access them because you're not using an "approved" browser with JS and cookies enabled, and hiding behind the "security" excuse to do so. Ironic that I can't even read an article about HTTP in 2022 because of that racket.
Is it? Those who want to DDoS will always find a way, and meanwhile users with slightly odd hardware/software are being locked out. Admittedly the latter is a minority, but one of the key tenets of the Internet and what made it so successful was interoperability. This is, in some ways, even worse than (but somewhat of a following effect of) the effective browser monopoly.
Calling it "security" when it's really about "availability" is another deceptive misdirection, because the former is something that can more easily persuade the sheeple.
I don't really like to go to extremes, but fingerprinting clients and deciding access based on such should really be regarded as a moral equivalent to racial profiling.
Profiling happens all the time for entirely legitimate reasons in the real world. Racial profiling is immoral not because it is profiling; it is immoral because it is based on immutable characteristics which have no intrinsic bearing on the purpose of the profile.
To tie this back to the topic at hand, you're complaining that a service has decided your traffic resembles known patterns from bad actors, and is asking you to go through an extra step to access the content.
Are there better options? Maybe, but it's utterly asinine to compare what cloudflare is doing to racial profiling.
This is like saying no lock will stop a thief. Cost and difficulty matter, and requiring a full browser increases the challenge for an attacker enough that some people will give up and others won’t be able to send as much traffic. That’s not perfect but speaking from experience a surprising fraction of people will give up after a naive attack fails.
It’s accurate. That’s a marketing page for a DDoS prevention service so it’s not an unbiased source, and it’s especially important to remember the distinction between traffic hitting something like an edge node and actually reaching the target and causing harm. I see attacks fairly regularly (politics) but in most cases it means I see 15M block events for “GET /“ in Cloudflare’s dashboard but no actual impact on the service because they’re dropped quickly at locations around the world or, if they faked real browsers, they got a bunch of cache hits.
In other cases, people try more sophisticated attacks (e.g. posting random terms to a search page to avoid caching) and that’s more of a problem but it’s probably like 1% of the total traffic because it’s moved out of script kiddie territory into something where you need to have more skills and people don’t generally do that without a way to make money from it. One challenge with a DDoS in that regard is that it’s not subtle so your ability to wage an attack goes away relatively quickly without constant work replacing systems which are taken offline by a remote ISP.
If we weren't pretty good at stopping DDoS attacks, every major hosting provider would be offline daily. Yet, websites being inaccessible for me is fairly uncommon.
Server hosts want protection from DDoS. They hire a bouncer (i.e. Cloudflare) to keep the hordes at bay. The bouncer asks for some way to distinguish your suspicious-looking request from the start of a DDoS attack.
How is this an "externality" of cloudflare's service? If anything, it is an "externality" of the server hosts.
Does it even matter? Blame is irrelevant, I find responsibility to be a more interesting thing to discuss in virtually all circumstances. The DDoSers and spammers should cut it out (but unfortunately we can't really count on them for anything), Cloudflare should make more of an effort to preserve accessibility, and their customers should care about it too & make it clear to Cloudflare that it's a priority to them.
But if we take a step back it's clear that Cloudflare is the entity here that can have the biggest impact. Spammers and customers are diffuse, Cloudflare isn't. How much of the blame they deserve, I don't really care, but there's a problem, they're in the best position to act, so they have a responsibility to. In my opinion, that's the best mindset for solving problems.
> Blame is irrelevant, I find responsibility to be a more interesting thing to discuss
This is, to me, a bizarre notion. You're imposing some sort of moral edict upon cloudflare for providing an opt-in service to web hosts.
> they're in the best position to act
They have, and they decided that certain traffic requires additional consideration to distinguish from bad actors. Your refusal to cooperate is entirely on you.
Just because you don't like their solution doesn't mean that there is a better one. If you have better ideas, I'm sure they're all ears, as less intrusive is generally cheaper to implement.
Even though I can visit websites whose admins haven't set the permissions overly tough, I still run into enough blocks due to Cloudflare that I'm considering investing time (OK, for the last few years I've been really, really lazy; I'm happy enough to copy and paste so I can manually write a nasty comment) so that for each "bad load" the website can be added to my disallow list. It might only save the smallest amount of wasted bandwidth, but I guess it all adds up.
This plan looks even worse for privacy, and seems to rely on authenticating the unique device itself.
> Know your user is coming from an authentic device and signed application, verified by the device vendor directly.
If they manage to get wide adoption of something like this it seems like a very bad day for privacy, and a joyous day for advertisers, anyone who wants to prevent web scraping, Google and/or anyone who might want to make it hard to crawl the entire web…
I read further, and it sounds like they've designed it with privacy in mind. That's nice.
I'm still having a hard time seeing how this doesn't eventually lead to a completely locked-down internet, where users can only use approved browsers and devices.
They have a list of steps for how a request would be made, where steps 2 and 3 are:
> 2. Safari supports PATs, so it will make an API call to Apple’s Attester, asking them to attest.
> 3. The Apple attester will check various device components, confirm they are valid, and then make an API call to the Cloudflare Issuer (since Cloudflare acting as an Origin chooses to use the Cloudflare Issuer).
In a theoretical future world where 99% of site operators have set this up for protection, and 99% of users are using approved browsers, how would one do something like...
Create a competitor to Google? You'd need to crawl the web for that. Would you imagine Apple or Cloudflare would gladly let your device request millions of tokens per hour? Or would that be throttled or disallowed entirely?
Use curl (or telnet, or [any other HTTP client]) to grab a page?
Use yt-dlp to download a YouTube video?
Scrape a bunch of data for an AI project? See this article from the front page where someone scraped a bunch of car listings from KBB and trained a model to estimate car prices and found some interesting results. https://blog.aqnichol.com/2022/12/31/large-scale-vehicle-cla... - would something like that be permitted under a system like this? Or might you need to own/rent an army of authorized devices with authorized browsers to do that experiment?
> The Apple attester will check various device components, confirm they are valid
Here's the "you will be under our control" part of their scheme. Running any "unauthorised" software? Rooted/jailbroken? Certain "security" features disabled? Using third-party replacement parts? ... Social credit score too low? Too bad, you're now denied access.
I've implemented PAT on a service. A server would only want to require PAT when it wants to see if it is dealing with a human. In the context of, say a blog, you would require PAT when someone wants to make an anonymous comment or create an account.
For requests which are "reading" you just serve content, unless for some reason you only want human eyeballs to see your content.
In your proposed scenario, reading the publicly accessible contents of the web, there should be no problems. (Of course some percentage of sites will accidentally have required PAT at any time and be unscannable, but presumably they figure that out and fix it.)
Now for the good side: I, reluctantly, implemented a geolocation filter to control anonymous content additions to that service I was alluding to. I felt bad about it, but I also felt bad having to filter out content spam every day. It turned out that all my strange content spam came from one country, so I banned 143 million people from anonymous content creation for my convenience.
With PAT I can remove the national ban and let any "probably human" in.
> For requests which are "reading" you just serve content, unless for some reason you only want human eyeballs to see your content.
I was going to cite the LinkedIn case where, last I had heard, the courts had decided that scraping was legal...
Headline [0] from April:
> Court rules that data scraping is legal in LinkedIn appeal
> LinkedIn has lost its latest attempt to block companies from scraping information from its public pages, including member pages.
... but upon googling it, I found a more recent [1] ruling :/
> LinkedIn prevails in 6-year lawsuit against data scraper
> The U.S. District Court for the Northern District of California sided with LinkedIn in its six year lawsuit against a firm that scraped data ...
So that sure puts a nail into the argument I was going to make. But still, while I think your use case lines up with the spirit of this kind of system, I think the reality is that it also would be used by every single site with a signup wall to kill off the archive.ph's of the world.
> In a theoretical future world where 99% of site operators have set this up for protection, and 99% of users are using approved browsers, how would one do something like [compete with apple or google]
A: they won't, and that's the plan. Not to mention that those are now the only two players (Microsoft a late third) that can both attest you and profile you locally on the device for advertising.
> Talking about the "privacy" of what has been made publicly available makes no sense.
Yes, it does. Users often wish to be able to delete, or make private, something that was once public. For example, someone could post a picture of themselves on Twitter. A year later they are no longer comfortable with having pictures of themselves online, so they go and delete them. Despite the user deleting them, malicious scrapers will not delete them and will keep those images. Another example would be setting your real name as your Twitter name. Later you aren't comfortable using your real name, so you change it away. Scrapers may still have your real name despite you wanting it to be a secret.
> Users often wish to be able to delete, or make private, something that was once public.
People also wish to be able to do a lot of other things, but that doesn't make it right.
What becomes public history must remain immutable. Otherwise you're just going to encourage a state in which those who have the power to will destroy and rewrite the past to their advantage, to control the narrative over the population. The trendy phrase "right to forget" is effectively a "right to rewrite history".
It's interesting that you automatically call those wanting to preserve what could possibly be very important history "malicious scrapers".
> It's interesting that you automatically call those wanting to preserve what could possibly be very important history "malicious scrapers".
I am going off of twitter's view. If you store tweets locally you must listen for when they get deleted and then delete them on your end too. If a scraper is breaking twitter's rules I consider that malicious scraper.
I assume the OP means that scraping is used to collect public data, including that of individuals, which can even be linked across different websites. There’s at least a couple of services that try to connect somebody’s Instagram account to their FB/Twitter/LinkedIn etc. I assume some of those rely on scraping (+username checking), since the TOS for the APIs of those social networks probably prohibit this use case.
Yes, big datasets of user data can be created and sold. This user data can be joined across multiple sites to build up profiles on people. These datasets floating around can harm the reputation of a site.
If you think that's bad, you're not going to like the fact that they're letting the NSA design a system for "TPM-based Network Device Remote Integrity Verification":
The "new" HTTP is clearly targeted at the "approved browser".
For example, this alleged "head-of-line blocking problem" that HTTP/2 purportedly "solves" was never a problem of HTTP outside of a specific program, the graphical web browser, the type of client that tries to pull resources from different domains for a single website. Not all programs that use HTTP need to do that.
For instance I have been using HTTP/1.1 pipelining outside the browser for fast, reliable information retrieval for close to 20 years. It has always been supported by HTTP servers and it works great with the simple clients I use. I still rely on HTTP/1.1 pipelining today, on a daily basis. Never had a problem.
There are uses for pipelining besides the ones envisioned by "tech" companies, web developers and their advertiser customers.
If early hints breaks your proxy, it’s likely your proxy doesn’t handle 1xx status codes correctly. Could you tell me which proxy it is (privately if you think it necessary)? I’d like to chase the bug with them.
The big problem with pipelining in HTTP/1.x is that a response can break the pipeline partway through, and there is no way to know what the server processed. A response might, for example, mid-pipeline, be Connection: close, and that's that; did any subsequent request get processed? Who knows.
As I stated, I use HTTP/1.1 pipelining every day. I use it for a variety of information retrieval tasks, even retrieving bulk DNS data. To give an arbitrary example, sometimes I will download a website's sitemaps. This usually involves downloading a cascade of XML files. For example, there might be a main XML file called "index.xml". This file then lists hundreds more sitemap XML files, e.g., archive-2002-1.xml, archive-2002-2.xml, containing every content URL on the website beginning with some prior year all the way up to the present day. Using a real-world example, index.xml contains 246 URLs. Using HTTP/1.1 pipelining I can retrieve all of them into a single file using a single TCP connection. Then I retrieve batches of the URLs contained in that file, again over a single TCP connection. Many websites allow thousands of HTTP requests to be HTTP/1.1-pipelined over a single TCP connection, but I usually keep the batch size at around 500-1000 max. Of course I want the responses in the same order as the requests.
1337855 is the number of URLs for [domainname]. Content URLs, not Javascript, CSS or other garbage.
yy030 is a C program that filters URLs from standard input
ka is a shell alias that sets an environment variable that is read by the yy025 program to indicate an HTTP header, in this case the "Connection:" header set to "keep-alive" not "close"
(ka- sets it back to close)
nc0 is a one line shell script
yy025|nc -vv h1b 80|yy045
yy025 is a C program that accepts URLs, e.g., dozens to hundreds to thousands of URLs, on stdin and outputs customised HTTP
h1b is a HOSTS file entry containing the address of a localhost-bound forward TLS proxy
yy045 is a C program that removes chunked transfer encoding from standard input
To verify the download, I can look at the HTTP headers in file "2". I can also look at the log from the TLS proxy. I have it configured to log all HTTP requests and responses.
Is this a job for HTTP/2? It does not seem like it.
This type of pipelining using only a single TCP connection is not possible using curl or libcurl. Nor is it possible using nghttp. Look around the web and one will see people opening up dozens, maybe hundreds of TCP connections and running jobs in parallel, trying to improve speed, and often getting banned. As with the comment from the Jetty maintainer, I suspect using HTTP/2 would actually be slower for this type of transfer. It is overkill.
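For anyone who wants to see the shape of it without the yy tooling, here is a rough Python equivalent (plain HTTP, no TLS proxy; the host and paths are made up): write all the requests up front on one connection, then read the responses back in order.

    import socket

    HOST = "example.com"  # hypothetical HTTP/1.1 server with keep-alive enabled
    paths = ["/sitemap-1.xml", "/sitemap-2.xml", "/sitemap-3.xml"]  # made-up batch

    # One TCP connection; all requests are written back to back (the pipeline),
    # and the responses come back in the same order the requests were sent.
    sock = socket.create_connection((HOST, 80))
    for p in paths:
        sock.sendall(
            f"GET {p} HTTP/1.1\r\nHost: {HOST}\r\nConnection: keep-alive\r\n\r\n".encode()
        )

    # Crude read loop: real tooling must parse Content-Length / chunked encoding
    # per response to know where each one ends (the job yy045 does above).
    sock.settimeout(5)
    data = b""
    try:
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            data += chunk
    except socket.timeout:
        pass
    sock.close()
    print(len(data), "bytes across", len(paths), "pipelined responses")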
IMHO, HTTP, i.e., in the general sense, is not just for requesting webpages and resources for webpages.
I find HTTP/1.1 to be very useful. It is certainly not just for requesting webpages full of JS, CSS, images and the like. That is only one way I might use it. Perhaps HTTP/2 is the better choice for webpages. TBH, if using a "modern" graphical browser, I would be inclined to let it use HTTP/2.
Most of the time I am not using a graphical browser.
One of the many programmer memes is something along the lines of "naming is difficult." Yet programmers, individuals who are often obsessed with numbers, insist on trying to do it anyway. The results speak for themselves. This extends beyond programs. The so-called "tech" industry has produced some of the most absurd, non-descriptive business names in the history of the world.
I decided to try numbering the programs I write instead of naming them. I often use a prefix that can provide a hint.^1 For example, the yy prefix indicates it was created with flex and the nc in nc0 indicates it is a "wrapper script" for nc. If the program is one I use frequently, then I have no trouble remembering its number. In the event I forget a program number, I have a small text file that lists each yy program along with a short description of less than 35 chars.
1. But not always. I have some scripts that I use daily that are just a number. I also have a series of scripts that begin with "[", where the script [000 outputs a descriptive list of the scripts, [001, [002, etc. I am constantly experimenting, looking for easier, more pleasing short strings to type.
Each source file for a yy program is just a single .l file with a 3-char filename like 025.l, so searching through source code can be as simple as
grep whatever dir/???.l
If I put descriptions in C comments at top of each .l file I can do something like
head -5 dir/???.l
Aesthetically, I like having a directory full of files with filenames that follow a consistent pattern and are of equal length. Look at the source code for k, ngn-k or kerf. When it comes to programming, IMO, smaller is better.
They are simply constructing GET request headers "by hand" based on some XML file downloaded earlier and then sending that list of GETs via `nc`. The example is just overly confusing and uses files named 1, 2, etc.
HTTP/1.1 support is the last bastion between the web and complete corporate control. Once the megacorp browsers and man-in-the-middle companies like Cloudflare drop HTTP/1.1, we will no longer be able to host a website without the continued approval of a third-party corporation. HTTP/2 and HTTP/3 implementations require the use of CA-based TLS.
Just to preempt misunderstanding: HTTPS is great. But HTTPS only, with no option for HTTP, is very much worse than HTTP+HTTPS for human people, despite being great for for-profit companies and institutions.
> Just to preempt misunderstanding: HTTPS is great. But HTTPS only, with no option for HTTP, is very much worse than HTTP+HTTPS for human people, despite being great for for-profit companies and institutions.
Using LE is great. It's problematic that literally everyone uses it but it's better than it not existing. But using LE does not solve the problem of not being able to use plain HTTP.
Dumb question, but how is HTTP important on local, switched networks? I have a single switch and don’t fear local MitM. I was under the impression basic HTTP is mostly fine then. Other parties, even on that same switch, won’t be able to listen in (a network hub would allow this).
I don't get what you're asking. Why is HTTP/S important?? I don't know how to answer that for you. Security is important regardless of where it is at. Defense in depth, at multiple layers. I don't fear MITM on a local only HTTPS server, I might not trust other devices/traffic on my network that scoop up _everything_ it can, and being in plaintext, would expose more than I want. I trust my services and devices. I don't trust everything on the network especially devices that are not mine, or that I have no control over (set top boxes, Roku TVs, Sec Cameras/NVRs, etc). Of course I have other controls and protections in place, but I trust and WANT LOCAL HTTPS on my services in addition to those.
Point being: as long as you trust all network devices from a trusted machine to another, on a switched Ethernet network, no other device will see any of that traffic at all, on a fundamental, low OSI level. It's not even about HTTP/S at that point yet. All this is untrue for WiFi, where you will want HTTPS indeed.
I'm not advocating against HTTPS at all. I use it as much as possible. But it might actually not be necessary, locally under the right circumstances.
Why not just self-sign for that? Outside of using it to test configurations or deployments, SSL seems a lot less necessary if it's just inside a LAN and all the clients are known.
Some services don't work without HTTPS or transport security. Sometimes I want transport security on traffic for various reasons. Current mainstream browsers also refuse to connect to self-signed-certificate HTTPS sites because they're 'insecure', and they keep making it harder for you, the user, to bypass these 'protections'.
> But HTTPS only, with no option for HTTP is very much worse
Have to agree with this. Been playing with `window.addEventListener('devicemotion')` recently, that shit is https only, which means I can't debug it on a simple localhost. WTF.
Try a quick and dirty local reverse proxy like Caddy. It provides self-signed localhost TLS, trusted on the OS level (if you agree) such that the browser is none the wiser.
> Nothing prevents you from using a self signed cert.
With the various web browsers continuing to disallow access or blare warnings about "SELF SIGNED CERT", this is not true. There are a lot of _current issues_ trying to access a self-signed HTTPS site using mainstream browsers, because they know better than you do.
That is just an implementation detail. It's trivial to create your own local CA, put it into the trust store of your device, use a cert signed by it and be done with it.
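The client side of that is a one-liner in most stacks; a Python sketch for illustration (the CA bundle path and hostname are made up):

    import ssl
    import urllib.request

    # Trust a private CA for this client instead of the system store. Paths and
    # hostname are hypothetical; mint the CA and leaf cert with openssl, mkcert,
    # step-ca, cert-manager, or whatever you already have.
    ctx = ssl.create_default_context(cafile="/etc/ssl/local/my-lan-ca.pem")

    with urllib.request.urlopen("https://controller.lan/", context=ctx) as resp:
        print(resp.status, resp.headers.get("Content-Type"))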
I've been using self-signed certs on my websites for 20 years. Part of the problem is that the HTTP/3 implementations do not allow the use of self signed certs. CA based only.
That is just an implementation detail. It's trivial to create your own local CA, put it into the trust store of your device, use a cert signed by it and be done with it.
And that's fine if it's only you and some friends using it. But if I want a random person on the other side of the world to be able to search for $topic and load my website it's not gonna work.
For the last decade or so I've gotten about 1k hits per day on my self-signed HTTP+HTTPS site. Random people will click past the scare tactics of modern browsers re: self signed if the topic is already technical and the demographic understands browsers are stupid. But all these people would be unable to visit under HTTP/2 or HTTP/3 only.
HTTP/1.1 is never going away. Cloud vendors will need to support it approximately forever because so many applications are not HTTP/2 compliant, and technically not even HTTP/1.1 compliant (because of things like header casing) in a way that precludes “down casting” after ingress without a lot of hacky workarounds.
Honestly I don't think browsers will drop HTTP/1 anytime soon because it's inconvenient/impossible to do HTTPS on a local network. I mean, Chrome shows Not Secure on http connections, but not on localhost, and it doesn't mean much anyway. Which is FINE. I mean, maybe input[type=password] gets an alert on HTTP? Just a thought.
Very few people are going to be setting the compile-time options for the libraries that implement HTTP/3 to enable plain HTTP/3 (no one has for HTTP/2). So Google, MS, Apple, etc. not supporting plain HTTP/3 is a de facto standard more powerful than the spec Google/MS openwashed through the IETF.
Edit: Not worth commenting on this when I'm already getting accused of bad things. Sorry folks, but apparently wanting companies like CF regulated is too controversial.
I think you are projecting, they never mentioned KF. Cloudflare filters much more than just KF, try accessing any CF backed website from Tor. I guess you don't seem to care as long as it also filters the stuff you disagree with, though, so net neutrality is already out of the question for you lol.
Even then, I don't see what the problem is. Cloudflare simply refused to continue serving KiwiFarms as a customer, no? KF can still host elsewhere. And why should CF have the obligation to provide DDoS protection to everybody?
Not true. What Kiwi Farms has done though is help expose grooming of teenagers. Some of these people like to believe they're untouchable and their misdeeds can be hidden. They're not.
Instilling doubt or question in something is often more effective in bringing about change than immediately announcing every facet of your argument. Sometimes it's more practical to coerce change by presenting an opportunity to question an assumption people take for granted.
Maybe someone will see my comment and start down their own rabbit hole to find a conclusion. That is more ideal than immediately assuming the details of my personal assumptions and conclusions.
Meh. I think too much reliance on a single entity like CloudFlare isn't good, but your reply isn't helping at all. I'd reconsider the approach if you really care about a decentralized internet.
@userbinator sets a good example elsewhere in this thread, imo.