IMO, the reason for the initial success of HTTP (1.x) was its extreme simplicity: a text-based format, a straightforward stateless design, and the ability to implement both server and client in simple, basic code. All this meant that the protocol itself was stable, usable, and a reliable standard.
The current path is to drastically increase complexity due to the demands of the content-provider overlord(s): basically, in order to better accommodate the needs of Google (and a handful of others), we must redefine things for everyone. It's becoming a complex, over-designed protocol that is being crammed down people's throats, instead of a protocol that is embraced because it makes sense.
I'm still not over the fact that they made headers all-lowercase in HTTP/2. I know the reasons, but it's so weird to have all-lowercase headers. TBH I don't see much uptake in the community either: since HTTP/2 came out, I've barely seen lowercase headers catch on in documentation, e.g. MDN lists them HTTP/1-style: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers
As someone who works in infrastructure I disagree with this. While it’s true that applications should always have done this, many many do not and need to be dealt with. If they are third party paying customers, your only option is to figure out how to make things work, because asking customers to upgrade will piss many of them off.
The simple fact that so many applications do not treat headers as case-insensitive is one of the major things holding back HTTP/2. With HTTP/1 it was suggested but not enforced. Upgrading to HTTP/2 with forced lowercasing breaks these applications.
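To make the failure mode concrete, here's a minimal sketch in Python (hypothetical application code, not any particular library): an exact-match header lookup works against an HTTP/1.1 origin that sends "Content-Type" and silently breaks behind an HTTP/2 hop that lowercases everything, while a case-insensitive lookup survives both.

    # Naive exact-match lookup: works when an HTTP/1.1 origin sends "Content-Type",
    # silently returns None once an HTTP/2 hop delivers "content-type".
    def get_content_type_fragile(headers):
        return headers.get("Content-Type")

    # Case-insensitive lookup: correct under both versions, since header field
    # names are case-insensitive by spec.
    def get_content_type(headers):
        lowered = {name.lower(): value for name, value in headers.items()}
        return lowered.get("content-type")

    h1_headers = {"Content-Type": "application/json"}  # typical HTTP/1.1 casing
    h2_headers = {"content-type": "application/json"}  # HTTP/2 forces lowercase

    assert get_content_type_fragile(h2_headers) is None          # the breakage
    assert get_content_type(h1_headers) == "application/json"
    assert get_content_type(h2_headers) == "application/json"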
I just meant for display... :( Like it changes content-type to Content-Type, which shouldn't break anything right...?? HTTP headers are supposed to be case insensitive anyway right?
If the Chromium/Chrome browser is open source then anyone should be able to edit the source code to change the headers to whatever case they prefer and re-compile.
The new QUERY method strikes me as a really promising addition. Not being able to send a body with a GET-type request is a gnawing issue I have with HTTP
Elasticsearch uses (used to use?) the HTTP body as parameters for a GET request. IIRC the HTTP specification doesn't (or didn't?) mandate that a GET request have no body.
It still does. I don't think it violates the HTTP 1.1 specification but more that it is unspecified. It's just that a lot of http clients simply don't support doing HTTP GET with a body under the assumption that it is redundant / not needed / forbidden. Of course elasticsearch allows POST as an alternative.
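At the library level it's easy enough to send one, though; a hedged sketch in Python (the endpoint, index and query are made up, and it assumes a locally running Elasticsearch-like server):

    import requests

    # Hypothetical Elasticsearch-style endpoint; host, index and query are made up.
    url = "http://localhost:9200/my-index/_search"
    query = {"query": {"match": {"title": "http"}}}

    # requests will happily attach a body to a GET; whether every client library,
    # proxy and server in the path honors it is exactly the gamble discussed above.
    resp = requests.request("GET", url, json=query, timeout=10)
    print(resp.status_code, resp.json())

    # The POST alternative Elasticsearch accepts for clients that can't do this:
    resp = requests.post(url, json=query, timeout=10)
    print(resp.status_code)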
People used to obsess about HTTP verbs and their meaning a lot more than they do today. At least I don't seem to get dragged into debates on the virtues of PUT vs. POST, or using PATCH for a simple update API. If you use GraphQL, everything is a POST anyway. Much like SOAP back in the day. It's all just different ways to call stuff on servers. Remote procedure calls have a long history that predates all of the web.
GET requests with a body (unspecified in HTTP/1.1) reminds me of a similar case I encountered years ago: URL query params in a POST (an HTML form whose action attr contained a query string).
I feel that not obsessing about the meanings of HTTP verbs can, has, and will continue to lead to security incidents and interoperability issues between middleware. Specifications where everyone gets to pick and choose different behaviors are a nightmare.
Interestingly, though, GET with data exists in the wild, and has for many years.
I manage an HTTP library class, and a customer encountered an API that required a GET but with data (think query parameters passed as XML).
I implemented that for the customer, and then implemented the reverse in the server class. I'm not going to say it's used a lot, but it makes semantic sense.
Incidentally, the same is true for DELETE, another request that typically has no body.
This is the first I've heard of QUERY though, so look forward to reading up on that.
Nothing in the spec prevents arbitrary data in the body of a GET, but clients and proxies are implemented by lazy people who make excuses about preserving some legacy security feature or something and continue to ignore the spec.
The spec is actually pretty clear on this - do not specify a body on a GET request.
> A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.
Previously it was "SHOULD ignore the payload".
It's nothing to do with laziness or security - people are writing spec conforming software. And indeed every library I've used allows interacting with a body, even on a GET.
> The spec is actually pretty clear on this - do not specify a body on a GET request.
That's not what your quote says.
Not having defined semantics does not mean it is not supported. Just because some implementations fail to support GET with a request body does not mean all implementations should interpret it as a malformed request.
I can roll out a service with endpoints that require GET with request bodies and it would still be valid HTTP.
"out of spec" means that it is out of specification. It is literally not specified. You are doing something that is not specified. It is therefore an action that is out of specification. it is therefore out of spec.
If there was an utter ban, then it would be against specification and not compliant, not merely out of specification.
> "out of spec" means that it is out of specification. It is literally not specified.
That's not what it means at all. Being out of spec would mean the spec explicitly states that a request with a body should be rejected. If the spec does not state that a request with a body should be rejected, then you are not required to reject a request which packs a body.
No, not defined means it's not within the purview of the spec. Spec doesn't care. You can send one. Maybe it'll work, maybe it won't, maybe it'll crash, maybe it'll be rejected, maybe some proxy along the way will strip it and the server won't even get it, maybe it'll get your client banned forever.
All of these are fine, because spec doesn't care.
> If accepting body in GET is out of spec, then spec is supposed to say, GET cannot send body.
No, then it would be against spec, like HEAD with a response body.
One implementation detail about QUIC that I was surprised by was that it requires TLS. That’s great for improving the security on the public Internet but it seems like it adds complexity and CPU overhead if you’re running on something like an internal Wireguard network. Overall, though, it’s a minor complaint. I did like how they split apart the QUIC and HTTP/3 protocol from one another.
My experience as former technical lead of a major HTTP/3 deployment is that while TLS certainly adds CPU overhead, it's by far not as major as the overhead of TLS on top of TCP is.
QUIC in general is far less efficient than TCP+TLS. Very optimized implementations require about 2x the CPU for bulk data transfers; more usual ones (and ones running on operating systems that don't support UDP segmentation offload) require about 5x the CPU. Of that CPU overhead, only a small part is crypto. Most CPU time is spent in the OS networking stack processing the tiny MTU-sized packets. In an optimized implementation, where crypto has a relatively bigger impact, you might see around 30% of CPU time spent in crypto operations in a profile when using hardware-accelerated AES (ChaCha20 is worse), which means one could gain that amount of CPU back for other things in a cryptoless QUIC. In a less network-optimized deployment it would only be about 10%.
What I can understand, however, is people not wanting to deal with the complexity of issuing, deploying and rotating certificates for internal deployments that are already secured by other means like WireGuard. It can be a concern - but on the other hand, tools like the k8s cert-manager already simplify the process for those environments. And of course one needs to consider whether QUIC is the right tool for those environments anyway - plain TCP has a lot of strengths too.
What kind of profiler can tell which areas the CPU is burning cycles in?
Like, just mapping a bunch of known function and timing? Then every "ssl_" function is classified as crypto time? And every "net_" function is networking? (Or something like that?)
Networking cost will be within callstacks for the system calls that send and receive packets (sendmsg, sendmmsg, recvmsg, recvmmsg).
Crypto cost shows up in functions whose names sound like the crypto primitives being used. They typically won't have ssl_ in the name, because QUIC implementations directly make use of lower-level primitives - e.g. those exposed by libcrypto/ring/etc.
It clearly shows the networking and crypto parts (here using ChaCha20). However, don't read too much into the actual values in this graph, since the profile is nearly 2 years old, uses ChaCha20 instead of AES (much more expensive), and used loopback networking (cheaper than the real thing).
It does make sense though: our home networks aren't safe these days, when random apps on your phone and random websites in your web browser can just make requests to random hosts inside your network, possibly exploit your router and then snoop on traffic.
The precedent for that was set by HTTP/2; while the spec didn't require it (Snowden happened after that decision was made), all of the implementations did.
TLS delivers three things, Confidentiality, Integrity and Authentication. But intuitively these three things are what people expect their transport protocol to do anyway, it's actually surprising as a user that TCP doesn't really bother providing say, Integrity. "Oh, yeah, your data might be arbitrarily changed by the time it is delivered". Er, what?
BCP 78 "Pervasive Monitoring is an Attack" says the Internet should design new systems to resist such monitoring and so offering these intuitive properties for a new transport protocol made sense. And that's what QUIC is. In a BCP 78 universe it doesn't make sense to ship new protocols that will be rendered useless by the surveillance apparatus.
The OSI model is composed of layers of abstraction, each building on top of the one below. Making a transport-layer protocol depend on TLS (an application-layer protocol, ref https://en.wikipedia.org/wiki/Application_layer) is completely backwards, and makes absolutely no sense when considering the model.
The OSI model has been around for a very long while. If you have a degree in CS, or have studied networking in university - chances are you've had to learn about the OSI model. There's no reason to throw that away now, or reinvent the wheel.
> The OSI model has been around for a very long while.
Even imagining that for some reason I wasn't aware of that, why would it be relevant?
> If you have a degree in CS, or have studied networking in university - chances are you've had to learn about the OSI model.
And the waterfall software development model. A bunch of long obsolete data structures. Open Hypermedia (remember that? No? That's OK it doesn't matter any more).
What's happened here is that you've privileged a bad model that somebody probably taught you out of a textbook (hopefully while grimacing, since this is useless "information") over the real world experience of dozens of really smart people who work with actual networking and designed QUIC.
Maybe, if I'm giving you benefit of the doubt you've assumed "user" somehow means "undergraduate who is studying the OSI model" but it doesn't - billions of people use the Internet, far more than will study any degree, let alone Computer Science. And it's quite reasonable for them to expect their communications to have these three properties.
If your preferred model insists that we can't have security until the application layer then your model is wrong, just as surely as if you have a model of the Atom which assumes it's a solid ball of something (the plum pudding model, as with OSI perhaps some undergraduates were taught this model between when it was proposed and when experiments showed it's just wrong).
I have absolutely no idea what anything you just wrote means. Your ramblings make no sense whatsoever, and they lead towards no conclusions. Get over yourself.
> "What's happened here is that you've privileged a bad model that somebody probably taught you out of a textbook"
So you're dismissing the OSI model as a "bad model" that "somebody taught me", yet there's this other shiny thing developed by "dozens of really smart people" and that's clearly the way to go because those people "work with actual networking".
> "And it's quite reasonable for them to expect their communications to have these three properties."
Their communications are clearly secured and tamper-proof already today, despite not using QUIC. How do you square that?
The security happens at the application layer, using mechanisms such as TLS. Additional security mechanisms can clearly come on-top. No need to bake everything into a single transport-layer protocol when there's endless flexibility available by layering the protocols on-top of each other.
> "If your preferred model insists that we can't have security until the application layer then your model is wrong"
How about instead of throwing fits about some vague ideas of "security" you actually provide concrete examples of what it is you're talking about, so that we can have a meaningful and constructive conversation about technology?
All I was saying is that there's no reason to introduce more complexity into the transport layer, since "security" is already handled by the application layer (on a per-application basis). It's unclear whether a single "security" model fits into the transport layer, which should be as agnostic and lightweight as possible (hence its original intention - to facilitate the transfer of information between endpoints, and leave the rest to whatever consumes that information).
> All I was saying is that there's no reason to introduce more complexity into the transport layer
But there is, and I explained what that reason was. The real world is under no obligation to faithfully copy your OSI Model, on the contrary, the fact the OSI Model doesn't resemble the real world is a good reason to abandon it.
> "But there is, and I explained what that reason was."
If your reason is "security" (whatever that means, you won't define that either), then I too, explained how that is guaranteed by application-layer protocols already today - so it's unclear why changing the transport layer is needed. Again, you're not saying anything concrete and keep handwaving - I don't think you really want to (or can) have a conversation, really.
I studied networking in the university, 20 years ago. We did study OSI as well as TCP/IP and the rest of the actual Internet stack, and it was blatantly obvious that the latter doesn't really conform to the former - protocols straddling boundaries etc. For example, TCP is not a strictly transport-layer protocol, since it also handles sessions.
When asked, our prof readily admitted that the OSI model was one of those design-by-committee experiments in purity that quickly broke down IRL.
TCP/IP does not use the OSI model. Stacking layers on top of each other promotes inefficiency and poor security, and is one reason why TCP/IP won over the OSI-recommended protocols.
The advantage of merging TCP and TLS is improved performance and security/privacy (security/privacy is improved because fewer session parameters can be read or modified by third parties).
So now what is the benefit of sticking to the OSI model, so that we can evaluate the pros and cons?
It's insane to expect the opposite: that traffic on my own network will need to keep reaching out to certificate authorities outside to validate packets from one host to another.
If you don't understand why these 3 things are on top of TCP, well, never mind. I was going to say you shouldn't be designing networks, but you might already be on the QUIC steering committee.
Most committees now are a joke, existing so that some Googler middle manager makes it to junior director. Sigh.
The use of TLS for QUIC does not imply or require the use of the Web PKI which is what I assume you're thinking of by "certificate authorities outside to validate packages".
> "The use of TLS for QUIC does not imply or require the use of the Web PKI"
Handling certificate revocations (which would be needed to "ensure security") does indeed imply the use of some way to check for the revocations in a timely manner. The revocation lists themselves can be tampered with.
I think saying it "doesn't address" these threats is a bit extreme.
It significantly reduces the attack surface (Certificate Authorities vs every ISP), it makes it a lot harder for state actors to pull off those attacks deniably with a gag order, and it makes it a lot easier for an informed but non-expert consumer to pick a secure-by-default solution.
Lenovo Superfish was a local exploit; the software was installed at the factory. It could have been any sort of root kit or other client software. Once an attacker has control of your local device, lots of things are possible. It’s true that HTTPS won’t defend against local attacks, but that doesn’t really seem like a fair criticism since that is not what it is supposed to do.
The defense against compromised certificate authorities starts with platforms and browser makers. They demand that CAs implement certificate transparency logs to be included in root stores.
They also monitor CT logs, as do most large site operators. Facebook for example does not run an OS platform or browser, but has a robust CT monitoring program.
So if one of the random little CAs in the root store of your browser issues a rogue cert for “google.com”, it will be logged and seen, and that CA will risk getting kicked out of the root store. That’s what happened to Symantec, which was not a small CA.
In general it is safer and quieter for bad guys to target client devices with attacks like Pegasus, than systemic actors like entire CAs.
> So if one of the random little CAs in the root store of your browser issues a rogue cert for “google.com”, it will be logged and seen
The victim might be the only one getting a collision as governments target them (and no security researchers get the compromised site + public key), and the Superfish fiasco shows that a collision is simply ignored by the browser.
I think it makes sense, I would just love to improve ergonomics around getting certificates for internal services.
For example, I use the TP-Link Omada Wi-Fi access points and have a local hardware controller for them. The hardware controller can have a static IPv4 (ugh for no IPv6 support) but since it doesn’t support Let’s Encrypt its only way to get a certificate is to upload one via the web GUI. I can of course create my own CA and install my own cert with a long expiration date but then that would mean installing my CA on a bunch of devices from which I might access the controller.
Maybe the solution is something like having your DHCP box also be able to run an ACME server scoped just to your local domain and have that CA be trusted for your local domain by all your devices that get their IP from the DHCPv4/6 via a DHCP option.
The link to the new query method [1] intrigued me. Could this, if widely adopted, make GraphQL obsolete? (Or am I admittedly ignorant as to exactly what they each do?)
If everything works out, GraphQL could use it and queries would be cacheable by generic HTTP caches. A lot would need to happen to make that work, though. The immediate motivation is to give an alternative to people who want to put a request body on GET.
The main focus of HTTP/2 and HTTP/3 was apparently performance, and hardly any of the other fundamental issues of the protocol were addressed. It's nice to see that some work is put into something like QUERY as well, which improves HTTP as an API protocol (REST) - although the motivation appears to be performance again, and I always have mixed feelings when I see the term "idempotent" in a context that likely involves database systems. Too often, such requests do have side effects of some kind, and preventing request retries throughout the stack isn't easy.
Definitely not performance; QUERY is an improvement/hackfix. Browsers have a cap on the size of the request line/path (it used to be very small in IE7), and server software also tends to cap it as a security mitigation (something like a max header size, similar to what happens with cookies). For this reason, technologies such as Elasticsearch or GraphQL, which allow for quite large query definitions, use POST and fit the query in the body instead, which is not bound by the same limitations.
Semantically, such queries are cacheable, but with the request being a POST, intermediate edge proxies or CDNs won't cache it. This means clients pay the penalty every single time. Moreover, browsers can't have URL-based history with such solutions, as POST requests can't "go back".
So QUERY fixes this. Semantically. Now you have to build support in middle boxes for it, and browsers must support it as well.
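For a rough idea of the client side (the method name follows the draft, but the endpoint, body and any server support here are purely hypothetical), Python's http.client will pass an arbitrary method straight through:

    import http.client

    # Hypothetical query body; QUERY is meant to be safe, idempotent and cacheable,
    # so intermediaries could treat the body as part of the cache key, unlike POST.
    body = '{"filter": {"status": "active"}, "fields": ["id", "name"]}'

    conn = http.client.HTTPConnection("api.example.com", 80)
    conn.request(
        "QUERY",          # http.client does not restrict the method string
        "/contacts",
        body=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    resp = conn.getresponse()
    print(resp.status, resp.read())
    conn.close()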
As far as I know, GET and DELETE requests support a payload body as per one of the later specs, whereas an earlier version wasn't fully clear on this and people assumed it's forbidden. Unfortunately, there's still a lot of software conforming to the interpretation of the old spec, which means you cannot count on these HTTP verbs to be supported.
Given this legacy, QUERY seems like an adequate solution. However, it seems to me like it's only intended for read queries. Write queries are generally not cacheable semantically, at least not in my world. Was the naming influenced by GraphQL by any chance, which distinguishes between queries (read) and mutations (write)?
The practical effect that my coworkers keep telling me about is that the client or an intermediary like a load balancer may retry the same request until it is successful, with the expectation that the resource (records or settings in a database system in our case) is only modified once. A database query that does something like value = OLD.value + 1 cannot be safely retried. There are circumstances under which the outcome of a transaction can be unknown but it might still succeed, and the client should be prevented from doing accidental changes. With read queries this should be rare, but even then there can be side-effects affecting the overall system state. There are of course solutions to this for the application side, like the usage of idempotency keys.
GraphQL is obsolete today as it was when it was released. You have HATEOAS and with it you can take advantage of HTTP caching. No, Redis is not the same.
HTTP methods don't really have functionality inherently attached. They're only semantically meaningful. Occasionally intermediate infrastructure will treat them slightly differently, such as a firewall stripping bodies from GET requests, but that's not part of the spec.
You can even use custom methods. I have a server that responds to the SALAMI method.
I thought the most interesting bit was the privacy-focused OHTTP, which they're building a service around [0]. How this will differ from a VPN will be interesting. The gist of it is that the HTTP connections are naive and don't really record an "accurate" IP address or trace, if I understand correctly.
The idea is to compose intermediaries run by different parties, so that no single entity has access to both the unencrypted payload and client ip address / connection context. It’s a controlled form of proxying where the proxy doesn’t have access to the plaintext. Not currently designed for everyday browsing; the use cases are things like collecting telemetry.
CloudFlare's solution is significantly less secure than Tor; it's more like visiting an HTTPS website using a VPN service. You have to trust that the VPN and the website operator aren't colluding to de-anonymize you, and that they aren't both being monitored by the same third party (who can de-anonymize you using timing information).
I thought you were being facetious about what could have been a throwaway comment from a throwaway account, but then I checked their bio; this particular type of interaction is one of the things that I love about HN.
Same here, I've learned to check. I really should have explicitly pointed out who, so it was less blown off. Almost came back and did so. And yes, why HN is great.
What's this "HTTP core"? Please tell me it's just the sane parts of HTTP and without any dark corners and ambiguous specs? Please tell me it's something you can write a parser for in an hour?
We're no less than 10 years overdue for something like this.
Similar thoughts. As I read through the “why HTTP took off” explanations, I was surprised that the more obvious “it was the first thing that a plurality of players networked on and after that it was all just bandwagoning, because it’s much easier to meet people at other people’s parties than try to host your own” wasn’t enumerated.
"Ease of use" is mentioned, and I suppose this refers to HTTP being text-based, pretty readable and intuitively understandable. HTTP/2 and later aren't like this anymore, for the sake of efficiency. It's a trade-off, but at this point I don't see much value in using it for APIs anymore. If you value efficiency and clean design, there are clearly better alternatives that aren't more complicated than what HTTP has become.
By "ease of use" they probably mean that the ecosystem is so advanced, one can easily write a HTTP handler in the language/framework of their choice.
In reality through, now days it's a nightmare to implement a HTTP server/client (at least for individual developer like myself). HTTP/1 is already a struggle by itself, HTTP/2 is much worse with all the new concepts & HPACK etc it added, and then you have HTTP/3 which introduces an entire UDP transport for you to implement. There is no easy part in HTTP if you choose this route.
WebSocket has a lot less fluff, but you could as well use a raw TCP socket and send JSON payloads over it with your own "protocol", e.g. with an op attribute for the action and an error attribute in responses. Another protocol that comes to mind is Apache Thrift.
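A rough sketch of what such a homegrown framing could look like (length-prefixed JSON with an "op" field; every name here is made up):

    import json
    import socket
    import struct

    # 4-byte big-endian length prefix, then a JSON object with an "op" field
    # (and an "error" field in responses).
    def send_msg(sock, obj):
        payload = json.dumps(obj).encode()
        sock.sendall(struct.pack(">I", len(payload)) + payload)

    def recv_exact(sock, n):
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed mid-frame")
            buf += chunk
        return buf

    def recv_msg(sock):
        (length,) = struct.unpack(">I", recv_exact(sock, 4))
        return json.loads(recv_exact(sock, length))

    # Hypothetical usage against a server speaking the same framing:
    # sock = socket.create_connection(("api.internal.example", 9000))
    # send_msg(sock, {"op": "get_user", "id": 42})
    # reply = recv_msg(sock)  # e.g. {"op": "get_user", "error": None, "user": {...}}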
HTTP will never go away. The reason it's so widely used is that browsers are universal -and universally available- clients for arbitrary applications. And on top of that, you can mash multiple sources of content. The latter is also the reason that the web's security model is so awful.
Interesting, I'd say the legacy parts are the actual good things and most of the corporate giga-scale add-ons are what's bad about it, but then again, I'm sure that we get the same for C++, too…
Well, if it were in the same state as C++, that means it manages to let you get things right despite the historical baggage by embracing newer features in the standard.
As someone who's had to deal with many DDoS attacks I'm rather horrified at the thought of QUIC: dropping UDP at the network border eliminates a lot of headaches.
I'm not sure I follow. Such attacks happen because of poorly designed protocols with very asymmetric resource requirements between endpoints. My understanding is that the QUIC handshake has measures in place to prevent it from being a vector for amplification/reflection based denial-of-service attacks.
They're saying that blocking UDP entirely is a very effective (and cheap) mitigation of DDOS.
Because QUIC is built on UDP, however, you can't just block it, you have to ingest it and dedicate resources to filtering between QUIC and other UDP traffic.
I question the premise of such an approach. Denial-of-service is caused by applications that consume disproportionate resources based on untrusted user input. That’s entirely orthogonal to whether the application accepts input over UDP or TCP.
I would raise hell with my ISP/cloud vendor/network operator if they thought that it was appropriate to cut corners and block me from using UDP. That’s more likely to DoS me if it means my games or video calls (or any of a million things that legitimately use UDP) stop working or become significantly degraded.
If you've got a 10gbps server that's attracting 100gbps of UDP reflection DDoS traffic, and your host has ingress capacity for that, their easy/simple/low cost options to help are most likely null-routing your ip, which doesn't really feel like helping, or dropping udp to your ip.
If you're running a UDP service, you're going to need something more complex. maybe the host can drop all udp but port 80, but maybe the DDoSers can generate the reflection traffic to port 80 instead of random ports. If you're getting too much volumetric DDoS on your service port, you'll need to have some sort of filtering that understands your service traffic better and has enough ingress capacity. Usually that's expensive, not super flexible, and often adds a lot of latency.
Feel free to question the direction the sun rises from.
> Denial-of-service is caused by applications that consume disproportionate resources based on untrusted user input. That’s entirely orthogonal to whether the application accepts input over UDP or TCP.
It's not: UDP-based protocols can generally be misdirected (spoofed source addresses) and amplified, which allows for much easier DoSing.
> I would raise hell with my ISP/cloud vendor/network operator if they thought that it was appropriate to cut corners and block me from using UDP.
They're doing the exact opposite of cutting corners. But hey good luck using video calls when the routers are melting, I'm sure that's going to be great.
> That’s more likely to DoS me if it means my games or video calls (or any of a million things that legitimately use UDP) stop working or become significantly degraded.
Only if you operate under the misguided assumption that hole-punching is not a thing.
Hell, any NAT requires specific handling of inbound connections to perform proper translation, and "drop" is a perfectly good default translation for an unrequested inbound.
The state of proliferating sites that won't let you access them because you're not using an "approved" browser with JS and cookies enabled, and hiding behind the "security" excuse to do so. Ironic that I can't even read an article about HTTP in 2022 because of that racket.
Is it? Those who want to DDoS will always find a way, and meanwhile users with slightly odd hardware/software are being locked out. Admittedly the latter is a minority, but one of the key tenets of the Internet and what made it so successful was interoperability. This is, in some ways, even worse than (but somewhat of a following effect of) the effective browser monopoly.
Calling it "security" when it's really about "availability" is another deceptive misdirection, because the former is something that can more easily persuade the sheeple.
I don't really like to go to extremes, but fingerprinting clients and deciding access based on such should really be regarded as a moral equivalent to racial profiling.
Profiling happens all the time for entirely legitimate reasons in the real world. Racial profiling is immoral not because it is profiling; it is immoral because it is based on immutable characteristics which have no intrinsic bearing on the purpose of the profile.
To tie this back to the topic at hand, you're complaining that a service has decided your traffic resembles known patterns from bad actors, and is asking you to go through an extra step to access the content.
Are there better options? Maybe, but it's utterly asinine to compare what cloudflare is doing to racial profiling.
This is like saying no lock will stop a thief. Cost and difficulty matter, and requiring a full browser increases the challenge for an attacker enough that some people will give up and others won’t be able to send as much traffic. That’s not perfect but speaking from experience a surprising fraction of people will give up after a naive attack fails.
It’s accurate. That’s a marketing page for a DDoS prevention service so it’s not an unbiased source, and it’s especially important to remember the distinction between traffic hitting something like an edge node and actually reaching the target and causing harm. I see attacks fairly regularly (politics) but in most cases it means I see 15M block events for “GET /“ in Cloudflare’s dashboard but no actual impact on the service because they’re dropped quickly at locations around the world or, if they faked real browsers, they got a bunch of cache hits.
In other cases, people try more sophisticated attacks (e.g. posting random terms to a search page to avoid caching) and that’s more of a problem but it’s probably like 1% of the total traffic because it’s moved out of script kiddie territory into something where you need to have more skills and people don’t generally do that without a way to make money from it. One challenge with a DDoS in that regard is that it’s not subtle so your ability to wage an attack goes away relatively quickly without constant work replacing systems which are taken offline by a remote ISP.
If we weren't pretty good at stopping DDoS attacks, every major hosting provider would be offline daily. Yet, websites being inaccessible for me is fairly uncommon.
Server hosts want protection from DDoS. They hire a bouncer (i.e. Cloudflare) to keep the hordes at bay. The bouncer asks for some way to distinguish your suspicious-looking request from the start of a DDoS attack.
How is this an "externality" of cloudflare's service? If anything, it is an "externality" of the server hosts.
Does it even matter? Blame is irrelevant, I find responsibility to be a more interesting thing to discuss in virtually all circumstances. The DDoSers and spammers should cut it out (but unfortunately we can't really count on them for anything), Cloudflare should make more of an effort to preserve accessibility, and their customers should care about it too & make it clear to Cloudflare that it's a priority to them.
But if we take a step back it's clear that Cloudflare is the entity here that can have the biggest impact. Spammers and customers are diffuse, Cloudflare isn't. How much of the blame they deserve, I don't really care, but there's a problem, they're in the best position to act, so they have a responsibility to. In my opinion, that's the best mindset for solving problems.
> Blame is irrelevant, I find responsibility to be a more interesting thing to discuss
This is, to me, a bizarre notion. You're imposing some sort of moral edict upon cloudflare for providing an opt-in service to web hosts.
> they're in the best position to act
They have, and they decided that certain traffic requires additional consideration to distinguish from bad actors. Your refusal to cooperate is entirely on you.
Just because you don't like their solution doesn't mean that there is a better one. If you have better ideas, I'm sure they're all ears, as less intrusive is generally cheaper to implement.
Even though I can visit websites whose admins haven't set the permissions overly tough, I still run into enough blocks due to Cloudflare that I'm considering investing time (OK, for the last few years I've been really, really lazy; I'm happy enough to copy and paste so I can manually write a nasty comment) so that for each "bad load" the website can be added to my disallow list. It might only save the smallest amount of wasted bandwidth, but I guess it all adds up.
This plan looks even worse for privacy, and seems to rely on authenticating the unique device itself.
> Know your user is coming from an authentic device and signed application, verified by the device vendor directly.
If they manage to get wide adoption of something like this it seems like a very bad day for privacy, and a joyous day for advertisers, anyone who wants to prevent web scraping, Google and/or anyone who might want to make it hard to crawl the entire web…
I read further, and it sounds like they've designed it with privacy in mind. That's nice.
I'm still having a hard time seeing how this doesn't eventually lead to a completely locked-down internet, where users can only use approved browsers and devices.
They have a list of steps for how a request would be made, where steps 2 and 3 are:
> 2. Safari supports PATs, so it will make an API call to Apple’s Attester, asking them to attest.
> 3. The Apple attester will check various device components, confirm they are valid, and then make an API call to the Cloudflare Issuer (since Cloudflare acting as an Origin chooses to use the Cloudflare Issuer).
In a theoretical future world where 99% of site operators have set this up for protection, and 99% of users are using approved browsers, how would one do something like...
Create a competitor to Google? You'd need to crawl the web for that. Would you imagine Apple or Cloudflare would gladly let your device request millions of tokens per hour? Or would that be throttled or disallowed entirely?
Use curl (or telnet, or [any other HTTP client]) to grab a page?
Use yt-dlp to download a YouTube video?
Scrape a bunch of data for an AI project? See this article from the front page where someone scraped a bunch of car listings from KBB and trained a model to estimate car prices and found some interesting results. https://blog.aqnichol.com/2022/12/31/large-scale-vehicle-cla... - would something like that be permitted under a system like this? Or might you need to own/rent an army of authorized devices with authorized browsers to do that experiment?
> The Apple attester will check various device components, confirm they are valid
Here's the "you will be under our control" part of their scheme. Running any "unauthorised" software? Rooted/jailbroken? Certain "security" features disabled? Using third-party replacement parts? ... Social credit score too low? Too bad, you're now denied access.
I've implemented PAT on a service. A server would only want to require PAT when it wants to see if it is dealing with a human. In the context of, say a blog, you would require PAT when someone wants to make an anonymous comment or create an account.
For requests which are "reading" you just serve content, unless for some reason you only want human eyeballs to see your content.
In your proposed scenario, reading the publicly accessible contents of the web, there should be no problems. (Of course some percentage of sites will accidentally have required PAT at any time and be unscannable, but presumably they figure that out and fix it.)
Now for the good side: I, reluctantly, implemented a geolocation filter to control anonymous content additions to that service I was alluding to. I felt bad about it, but I also felt bad having to filter out content spam every day. It turned out that all my strange content spam came from one country, so I banned 143 million people from anonymous content creation for my convenience.
With PAT I can remove the national ban and let any "probably human" in.
> For requests which are "reading" you just serve content, unless for some reason you only want human eyeballs to see your content.
I was going to cite the LinkedIn case where, last I had heard, the courts had decided that scraping was legal...
Headline [0] from April:
> Court rules that data scraping is legal in LinkedIn appeal
> LinkedIn has lost its latest attempt to block companies from scraping information from its public pages, including member pages.
... but upon googling it, I found a more recent [1] ruling :/
> LinkedIn prevails in 6-year lawsuit against data scraper
> The U.S. District Court for the Northern District of California sided with LinkedIn in its six year lawsuit against a firm that scraped data ...
So that sure puts a nail into the argument I was going to make. But still, while I think your use case lines up with the spirit of this kind of system, I think the reality is that it also would be used by every single site with a signup wall to kill off the archive.ph's of the world.
> In a theoretical future world where 99% of site operators have set this up for protection, and 99% of users are using approved browsers, how would one do something like [compete with apple or google]
A: they won't, and that's the plan. Not to mention that those are now the only two players (Microsoft a late third) that can both attest you and profile you locally on the device for advertising.
> Talking about the "privacy" of what has been made publicly available makes no sense.
Yes, it does. Users often wish to be able to delete, or make private, something that was once public. For example, someone could post a picture of themselves on Twitter. A year later they are no longer comfortable with having pictures of themselves online, so they go and delete them. Despite the user deleting them, malicious scrapers will not delete them and will keep those images. Another example would be setting your real name as your Twitter name. Later you aren't comfortable using your real name, so you change it away. Scrapers may still have your real name despite you wanting it to be a secret.
> Users often wish to be able to delete, or make private, something that was once public.
People also wish to be able to do a lot of other things, but that doesn't make it right.
What becomes public history must remain immutable. Otherwise you're just going to encourage a state in which those who have the power to will destroy and rewrite the past to their advantage, to control the narrative over the population. The trendy phrase "right to forget" is effectively a "right to rewrite history".
It's interesting that you automatically call those wanting to preserve what could possibly be very important history "malicious scrapers".
> It's interesting that you automatically call those wanting to preserve what could possibly be very important history "malicious scrapers".
I am going off of twitter's view. If you store tweets locally you must listen for when they get deleted and then delete them on your end too. If a scraper is breaking twitter's rules I consider that malicious scraper.
I assume the OP means that scraping is used to collect public data, including that of individuals, which can even be linked across different websites. There’s at least a couple of services that try to connect somebody’s Instagram account to their FB/Twitter/LinkedIn etc. I assume some of those rely on scraping (+username checking), since the TOS for the APIs of those social networks probably prohibit this use case.
Yes, big datasets of user data can be created and sold. This user data can be joined across multiple sites to build up profiles on people. These datasets floating around can harm the reputation of a site.
If you think that's bad, you're not going to like the fact that they're letting the NSA design a system for "TPM-based Network Device Remote Integrity Verification":
The "new" HTTP is clearly targeted at the "approved browser".
For example, this alleged "head-of-line blocking problem" that HTTP/2 purportedly "solves" was never a problem of HTTP outside of a specific program, the graphical web browser, the type of client that tries to pull resources from different domains for a single website. Not all programs that use HTTP need to do that.
For instance I have been using HTTP/1.1 pipelining outside the browser for fast, reliable information retrieval for close to 20 years. It has always been supported by HTTP servers and it works great with the simple clients I use. I still rely on HTTP/1.1 pipelining today, on a daily basis. Never had a problem.
There are uses for pipelining besides the ones envisioned by "tech" companies, web developers and their advertiser customers.
If early hints breaks your proxy, it’s likely your proxy doesn’t handle 1xx status codes correctly. Could you tell me which proxy it is (privately if you think it necessary)? I’d like to chase the bug with them.
The big problem with pipelining in HTTP/1.x is that a response can break the pipeline partway through, and there is no way to know what the server processed. A response might, for example, mid-pipeline, be Connection: close, and that's that; did any subsequent request get processed? Who knows.
As I stated, I use HTTP/1.1 pipelining every day. I use it for a variety of information retrieval tasks, even retrieving bulk DNS data. To give an arbitrary example, sometimes I will download a website's sitemaps. This usually involves downloading a cascade of XML files. For example, there might be a main XML file called "index.xml". This file then lists hundreds more sitemap XML files, e.g., archive-2002-1.xml, archive-2002-2.xml, containing every content URL on the website beginning with some prior year all the way up to the present day. Using a real-world example, index.xml contains 246 URLs. Using HTTP/1.1 pipelining I can retrieve all of them into a single file using a single TCP connection. Then I retrieve batches of the URLs contained in that file, again over a single TCP connection. Many websites allow thousands of HTTP requests to be HTTP/1.1-pipelined over a single TCP connection, but I usually keep the batch size at around 500-1000 max. Of course I want the responses in the same order as the requests.
1337855 is the number of URLs for [domainname]. Content URLs, not Javascript, CSS or other garbage.
yy030 is a C program that filters URLs from standard input
ka is a shell alias that sets an environment variable that is read by the yy025 program to indicate an HTTP header, in this case the "Connection:" header set to "keep-alive" not "close"
(ka- sets it back to close)
nc0 is a one line shell script
yy025|nc -vv h1b 80|yy045
yy025 is a C program that accepts URLs, e.g., dozens to hundreds to thousands of URLs, on stdin and outputs customised HTTP
h1b is a HOSTS file entry containing the address of a localhost-bound forward TLS proxy
yy045 is a C program that removes chunked transfer encoding from standard input
To verify the download, I can look at the HTTP headers in file "2". I can also look at the log from the TLS proxy. I have it configured to log all HTTP requests and responses.
Is this a job for HTTP/2? It does not seem like it.
This type of pipelining using only a single TCP connection is not possible using curl or libcurl. Nor is it possible using nghttp. Look around the web and one will see people opening up dozens, maybe hundreds of TCP connections and running jobs in parallel, trying to improve speed, and often getting banned. As with the comment from the Jetty maintainer, I suspect using HTTP/2 would actually be slower for this type of transfer. It is overkill.
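For anyone who wants to see the shape of it without the yy tooling, here is a rough Python equivalent (plain HTTP, no TLS proxy; the host and paths are made up): write all the requests up front on one connection, then read the responses back in order.

    import socket

    HOST = "example.com"  # hypothetical HTTP/1.1 server with keep-alive enabled
    paths = ["/sitemap-1.xml", "/sitemap-2.xml", "/sitemap-3.xml"]  # made-up batch

    # One TCP connection; all requests are written back to back (the pipeline),
    # and the responses come back in the same order the requests were sent.
    sock = socket.create_connection((HOST, 80))
    for p in paths:
        sock.sendall(
            f"GET {p} HTTP/1.1\r\nHost: {HOST}\r\nConnection: keep-alive\r\n\r\n".encode()
        )

    # Crude read loop: real tooling must parse Content-Length / chunked encoding
    # per response to know where each one ends (the job yy045 does above).
    sock.settimeout(5)
    data = b""
    try:
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                break
            data += chunk
    except socket.timeout:
        pass
    sock.close()
    print(len(data), "bytes across", len(paths), "pipelined responses")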
IMHO, HTTP, i.e., in the general sense, is not just for requesting webpages and resources for webpages.
I find HTTP/1.1 to be very useful. It is certainly not just for requesting webpages full of JS, CSS, images and the like. That is only one way I might use it. Perhaps HTTP/2 is the better choice for webpages. TBH, if using a "modern" graphical browser, I would be inclined to let it use HTTP/2.
Most of the time I am not using a graphical browser.
One of the many programmer memes is something along the lines of "naming is difficult." Yet programmers, individuals who are often obsessed with numbers, insist on trying to do it anyway. The results speak for themselves. This extends beyond programs. The so-called "tech" industry has produced some of the most absurd, non-descriptive business names in the history of the world.
I decided to try numbering the programs I write instead of naming them. I often use a prefix that can provide a hint.^1 For example, the yy prefix indicates it was created with flex and the nc in nc0 indicates it is a "wrapper script" for nc. If the program is one I use frequently, then I have no trouble remembering its number. In the event I forget a program number, I have a small text file that lists each yy program along with a short description of less than 35 chars.
1. But not always. I have some scripts that I use daily that are just a number. I also have a series of scripts that begin with "[", where the script [000 outputs a descriptive list of the scripts, [001, [002, etc. I am constantly experimenting, looking for easier, more pleasing short strings to type.
Each source file for a yy program is just a single .l file with a 3-char filename like 025.l, so searching through source code can be as simple as
grep whatever dir/???.l
If I put descriptions in C comments at top of each .l file I can do something like
head -5 dir/???.l
Aesthetically, I like having a directory full of files with filenames that follow a consistent pattern and are of equal length. Look at the source code for k, ngn-k or kerf. When it comes to programming, IMO, smaller is better.
They are simply constructing GET request headers "by hand" based on some XML file downloaded earlier and then sending that list of GETs via `nc`. The example is just overly confusing and uses files named 1, 2, etc.
HTTP/1.1 support is the last bastion between the web and complete corporate control. Once the megacorp browsers and man-in-the-middle companies like Cloudflare drop HTTP/1.1, we will no longer be able to host a website without the continued approval of a third-party corporation. HTTP/2 and HTTP/3 implementations require the use of CA-based TLS.
Just to preempt misunderstanding: HTTPS is great. But HTTPS only, with no option for HTTP, is very much worse than HTTP+HTTPS for human people, despite being great for for-profit companies and institutions.
> Just to preempt misunderstanding: HTTPS is great. But HTTPS only, with no option for HTTP, is very much worse than HTTP+HTTPS for human people, despite being great for for-profit companies and institutions.
Using LE is great. It's problematic that literally everyone uses it but it's better than it not existing. But using LE does not solve the problem of not being able to use plain HTTP.
Dumb question, but how is HTTP important on local, switched networks? I have a single switch and don’t fear local MitM. I was under the impression basic HTTP is mostly fine then. Other parties, even on that same switch, won’t be able to listen in (a network hub would allow this).
I don't get what you're asking. Why is HTTP/S important?? I don't know how to answer that for you. Security is important regardless of where it is at. Defense in depth, at multiple layers. I don't fear MITM on a local only HTTPS server, I might not trust other devices/traffic on my network that scoop up _everything_ it can, and being in plaintext, would expose more than I want. I trust my services and devices. I don't trust everything on the network especially devices that are not mine, or that I have no control over (set top boxes, Roku TVs, Sec Cameras/NVRs, etc). Of course I have other controls and protections in place, but I trust and WANT LOCAL HTTPS on my services in addition to those.
Point being: as long as you trust all network devices from a trusted machine to another, on a switched Ethernet network, no other device will see any of that traffic at all, on a fundamental, low OSI level. It's not even about HTTP/S at that point yet. All this is untrue for WiFi, where you will want HTTPS indeed.
I'm not advocating against HTTPS at all. I use it as much as possible. But it might actually not be necessary, locally under the right circumstances.
Why not just self-sign for that? Outside of using it to test configurations or deployments, SSL seems a lot less necessary if it's just inside a LAN and all the clients are known.
Some services don't work without HTTPS or transport security. Sometimes I want transport security on traffic for various reasons. Current mainstream browsers also refuse to connect to self-signed-certificate HTTPS sites because they're 'insecure', and they keep making it harder for you, the user, to bypass these 'protections'.
> But HTTPS only, with no option for HTTP is very much worse
Have to agree with this. Been playing with `window.addEventListener('devicemotion')` recently, that shit is https only, which means I can't debug it on a simple localhost. WTF.
Try a quick and dirty local reverse proxy like Caddy. It provides self-signed localhost TLS, trusted on the OS level (if you agree) such that the browser is none the wiser.
> Nothing prevents you from using a self signed cert.
With the various web browsers continuing to disallow access or blare warnings about "SELF SIGNED CERT", this is not true. There are a lot of _current issues_ trying to access a self-signed HTTPS site using mainstream browsers, because they know better than you do.
That is just an implementation detail. It's trivial to create your own local CA, put it into the trust store of your device, use a cert signed by it and be done with it.
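The client side of that is a one-liner in most stacks; a Python sketch for illustration (the CA bundle path and hostname are made up):

    import ssl
    import urllib.request

    # Trust a private CA for this client instead of the system store. Paths and
    # hostname are hypothetical; mint the CA and leaf cert with openssl, mkcert,
    # step-ca, cert-manager, or whatever you already have.
    ctx = ssl.create_default_context(cafile="/etc/ssl/local/my-lan-ca.pem")

    with urllib.request.urlopen("https://controller.lan/", context=ctx) as resp:
        print(resp.status, resp.headers.get("Content-Type"))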
I've been using self-signed certs on my websites for 20 years. Part of the problem is that the HTTP/3 implementations do not allow the use of self signed certs. CA based only.
That is just an implementation detail. It's trivial to create your own local CA, put it into the trust store of your device, use a cert signed by it and be done with it.
And that's fine if it's only you and some friends using it. But if I want a random person on the other side of the world to be able to search for $topic and load my website it's not gonna work.
For the last decade or so I've gotten about 1k hits per day on my self-signed HTTP+HTTPS site. Random people will click past the scare tactics of modern browsers re: self signed if the topic is already technical and the demographic understands browsers are stupid. But all these people would be unable to visit under HTTP/2 or HTTP/3 only.
HTTP/1.1 is never going away. Cloud vendors will need to support it approximately forever because so many applications are not HTTP/2 compliant, and technically not even HTTP/1.1 compliant (because of things like header casing) in a way that precludes “down casting” after ingress without a lot of hacky workarounds.
Honestly I don't think browsers will drop HTTP/1 anytime soon because it's inconvenient/impossible to do HTTPS on a local network. I mean, Chrome shows Not Secure on http connections, but not on localhost, and it doesn't mean much anyway. Which is FINE. I mean, maybe input[type=password] gets an alert on HTTP? Just a thought.
Very few people are going to be setting the compile-time options for the libraries that implement HTTP/3 to enable plain HTTP/3 (no one has for HTTP/2). So Google, MS, Apple, etc. not supporting plain HTTP/3 is a de facto standard more powerful than the spec Google/MS openwashed through the IETF.
Edit: Not worth commenting on this when I'm already getting accused of bad things. Sorry folks, but apparently wanting companies like CF regulated is too controversial.
I think you are projecting, they never mentioned KF. Cloudflare filters much more than just KF, try accessing any CF backed website from Tor. I guess you don't seem to care as long as it also filters the stuff you disagree with, though, so net neutrality is already out of the question for you lol.
Even then, I don't see what the problem is. Cloudflare simply refused to continue serving KiwiFarms as a customer, no? KF can still host elsewhere. And why should CF have the obligation to provide DDoS protection to everybody?
Not true. What Kiwi Farms has done though is help expose grooming of teenagers. Some of these people like to believe they're untouchable and their misdeeds can be hidden. They're not.
Instilling doubt or question in something is often more effective in bringing about change than immediately announcing every facet of your argument. Sometimes it's more practical to coerce change by presenting an opportunity to question an assumption people take for granted.
Maybe someone will see my comment and start down their own rabbit hole to find a conclusion. That is more ideal than immediately assuming the details of my personal assumptions and conclusions.
Meh. I think too much reliance on a single entity like CloudFlare isn't good, but your reply isn't helping at all. I'd reconsider the approach if you really care about a decentralized internet.
@userbinator sets a good example elsewhere in this thread, imo.