As a Chinese user who regularly breaches the GFW, I find QUIC a godsend. Tunneling traffic over QUIC instead of TLS to breach the GFW gives much lower latency and higher throughput (if you change the congestion control). In addition, for those foreign websites not blocked by the GFW, the latency difference between QUIC and TCP-based protocols is also visible to the naked eye, as the RTT from China to the rest of the world is often high.
You won't casually breach the GFW. I would treat any advice posted publicly on the internet about how to breach the GFW as probably-malicious. They are better at networking than you are.
Beware when using a VPN to breach the GFW. Recently a Chinese netizen had to pay over 1 million yuan (>145K USD) for using a VPN [1][2]. Before this incident, only VPN service sellers were prosecuted [3]. So beware of doing this casually.
Foreigners need to worry about the new Chinese anti-espionage law instead [1]: at least 17 Japanese nationals have been recently accused of spying in China [2], and a US citizen jailed for life [3]. The German car industry is worried [4].
The law broadens the scope beyond what it originally sought to prohibit – leaks of state secrets and intelligence – to include any “documents, data, materials, or items related to national security and interests.” [1]
I’d say when I travel I rely on Google Maps. Without it, yes, it is still possible to find your way, but it is so much easier using those maps on the phone, especially in non-English areas.
Why would you want to use Google Maps in China? All the crowdsourced information wouldn't be available and the government is hostile to it. Wouldn't it be better to use whatever the Chinese competitor is?
Unless your goal is to read about Tiananmen Square in Tiananmen Square. Which just doesn't sound smart.
Obviously, it's different if you live there. But on a two-week vacation it doesn't seem worth it.
Without Google services your phone is a brick? You can use Apple services, Microsoft services, any number of other websites. People really like to dramatize the GFW.
On an Android phone, 5 years ago, it turned into a brick when I stepped out of Shenzhen airport. There is a surprising amount of chatty network stuff going on under the hood that stops working. I ended up using a VPN which fixed it. Even basic stuff like contacts no longer worked.
Plus without Gmail, you can't recover a password or in some cases authorise yourself on 3rd party services.
> I would treat any advice posted publicly on the internet about how to breach the GFW as probably-malicious. They are better at networking than you are.
The default congestion control is CUBIC, which is very slow for connections between China and the rest of the world. Google's BBR is a great improvement, and sometimes I use "brutal" congestion control, which is basically a constant speed.
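On Linux the congestion controller can be switched per socket (or system-wide via net.ipv4.tcp_congestion_control). A minimal sketch, assuming a kernel with the tcp_bbr module loaded; the fallback constant 13 is the Linux value of TCP_CONGESTION:

```python
# Minimal sketch: pick a per-socket congestion control algorithm on Linux.
# Assumes the "bbr" module is available (modprobe tcp_bbr); falls back otherwise.
import socket

TCP_CONGESTION = getattr(socket, "TCP_CONGESTION", 13)  # 13 is the Linux value

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.setsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, b"bbr")
except OSError:
    pass  # bbr not available on this kernel; the default (CUBIC) stays in effect
print(s.getsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, 16).strip(b"\x00"))
s.close()
```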
> Zscaler best practice is to block QUIC. When it's blocked, QUIC has a failsafe to fall back to TCP. This enables SSL inspection without negatively impacting user experience.
Seems corporate IT is resurging after a decade of defeat
Man, here I am reading this while fighting zScaler when connecting to our new package repository (it breaks because the inspection of the downloads takes too long). No one feels responsible for helping developers. Same with setting up containers, git, Python, and everything else that comes with its own trust store, you have to figure out everything by yourself.
It also breaks a lot of web pages by redirecting HTTP requests in order to authenticate you (CSP broken). Teams GIFs and GitHub images have been broken for months now and no one cares.
At least for me that’s the problem. When I open the redirect url manually it also fixes the problem for some time.
You can open the Teams developer tools to check this. Click the taskbar icon 7 times, then right-click it. Use dev tools for select web contents, choose experience renderer AAD. Search for GIFs in Teams and monitor the network tab.
It is the single most annoying impediment in corporate IT. And you are on your own when you need to work around the issues it causes. Is it really providing value, or is it just to feel better about security?
Sure it is. The org insists on making your life difficult, and you just want to get your work done. If they really cared about security they would prioritise fixing stuff like this, but they don't, so you know they don't really care, it's just for show and a need for control.
And if they don't really care about security, why should you?
The cloud service. I don't know what it's called exactly. It just says "Zscaler".
In terms of better solutions, I would prefer a completely different approach. Securing the endpoint instead of the network. Basically the idea of Google's "BeyondCorp".
What happens now is that people just turn off their VPN and Zscaler client to avoid issues, when they're working from a public hotspot or at home. In the office (our lab environment) we unfortunately don't have that option.
But by doing so they leave themselves much more exposed than when we didn't have Zscaler at all.
Blame viruses, malware, phishing, ransomware, etc. IT has a responsibility to keep the network secure. Google is already experimenting with no Internet access for some employees, and that might be the endgame.
This has nothing to do with security and more to do with ineffective practices done in the name of security, where nobody knows why it's done, just that it's done. Running MitM on connections breaks basic security mechanisms for some ineffective security theater. This is basically "90-day password change" 2.0.
> where nobody knows why it's done, just that it's done
Compliance. You think your IT dept wants to deploy this crap? However painful you think it is as an end user, multiply it by having to support hundreds/thousands of endpoints.
Look, I hate traffic inspection as much as the next person, but this is for security, it's just not for the security you want it to be. This is so you have an audit trail of data exfiltration and there's no way around it. You need the plaintext to do this and the whole network stack is built around making this a huge giant pain in the ass. This is one situation where soulless enterprises and users should actually be aligned. Having the ability in your OS to inspect the plaintext traffic of all incoming and outgoing traffic by forcing apps off raw sockets would be a massive win. People seem to understand how getting the plaintext for DNS requests is beneficial to the user but not HTTP for some reason.
Be happy your setup is at least opportunistic and not "block any traffic we can't get the plaintext for."
No, they really really don't. Source: I've worked in corporate IT for many years, and this kind of shit is always forced upon us just as much as it is on you guys. We hate it too.
Not the OP, but currently I work in a regulated industry (financial) where Corporate Risk and Legal depts ask for this stuff (and much more) to satisfy external auditors. The IT people hate it just as much.
I had never experienced just how much power a single dept could hold until we got acquired by a large finance enterprise and had to interact with the Risk dept.
Still, what exactly has changed in the last year that Zscaler/Netskope became prevalent? What law has changed? Can someone pinpoint it? I work for a telecom company, for example; two years ago there was no Zscaler/Netskope MITM in my requests from the corporate laptop to the Internet, today there is one. What law has changed, if any, that mandates this? If it matters, the ISP is registered in NJ.
20 years ago I was configuring VPNs on work laptops that then had all the exit traffic routed to a Bluecoat system to MITM the traffic. The difference is that zScaler is "Zero Trust" so you are actually not on a VPN anymore. It's intercepting the traffic locally and then determining what to do with it. At my current workplace we are using it to access internal services only; allowing all external traffic to exit directly.
Not in your country, but my point about compliance wasn't that a law requires it specifically (laws don't specify technical "solutions" anyway) - just that often the IT dept is compelled by other depts (eg Risk) to implement and support stuff that allows that other dept to show auditors that they are doing something rather than being negligent.
MitM can absolutely stop threats if done correctly. A properly configured Palo Alto firewall running SSL Decryption can stop a random user downloading a known zero-day package with Wildfire. Not saying MitM is an end all be all, but IMHO the more security layers you have the better.
At the end of the day, it's not your network/computer. There's always going to be some unsavvy user duped into something. If you don't like corporate IT, you're free to become a contractor and work at home.
"A properly configured Palo Alto firewall running SSL Decryption can stop a random user downloading a known zero-day package with Wildfire."
Instead, that Corp IT should have put a transparently working antivirus/malware scanner on the workstation that would prevent that download from being run at all.
?
DPS/MITM are not security layers; they are more of a privacy nightmare.
> Instead, that Corp IT should have put a transparently working antivirus/malware scanner on the workstation that would prevent that download from being run at all. ?
Sure. Then come the complaints that this slows down endpoint devices and has compatibility issues. Somebody gets the idea to do this in the network. Rinse. Repeat.
It's a knife's edge. One OS patch, or one vendor change in product roadmap, and you can be right back to endpoint security software performance and compatibility hell. Stuff has gotten better but it's still fraught with peril.
I disagree; I think you should have both, as an endpoint scanner (whether heuristics or process execution) may not catch everything (for example, malicious JavaScript from an advertisement).
Why do you care so much about your privacy while you're on company time using their computers, software, and network? If you don't like it, bring your own phone/tablet/laptop and use cellular data for your personal web browsing. FWIW, it's standard practice to exempt SSL decryption for banking, healthcare, government sites, etc.
There are genuinely valuable practices that help security, and there are practices that just break security.
The MITM practice in particular is a net negative. Rolling password resets and bad password requirements are also net negatives. Scanners that do not work as intended, that are not vetted at all and that introduce slowness and feature breakage are likely negatives as well.
Also, some places introduce predatory privacy nightmares like keyloggers and screen recorders.
You’re training users to ignore certificate errors – yes, even if you think you’re not – and you’re putting in a critical piece of infrastructure which is now able to view or forge traffic everywhere. Every vendor has a history of security vulnerabilities and you also need to put in robust administrative controls very few places are actually competent enough to implement, or now you have the risk that your security operators are one phish or act of malice away from damaging the company (better hope nobody in security is ever part of a harassment claim).
On the plus side, they’re only marginally effective at the sales points you mentioned. They’ll stop the sales guys from hitting sports betting sites, but attackers have been routinely bypassing these systems since the turn of the century so much of what you’re doing is taking on one of the most expensive challenges in the field to stop the least sophisticated attackers.
If you’re concerned about things like DLP, you should be focused on things like sandboxing and fine-grained access control long before doing SSL interception.
A competent organisation will have a root certificate trusted on all machines so you won't be ignoring certificate errors. You are right however that you are funnelling your entire corporate traffic unencrypted through a single system, break into that and you have hit the goldmine.
> A competent organisation will have a root certificate trusted on all machines so you won't be ignoring certificate errors.
You definitely need that, but ask anyone who's done it and they'll tell you that flushing out all of the different places where interception causes problems (pinned certs, protocol-level incompatibilities, etc.) is a long slog, and inevitably someone will try to solve those problems by turning off some security measures. This will inevitably include things like your help desk people trying to be helpful and not realizing that the first hit on Stack Exchange suggesting adding "-k" is not actually a good idea.
This is exacerbated by the low quality of the vendor appliances most places use to implement these policies. For example, Palo Alto will break the Windows SChannel certificate revocation check - there's still no workaround but I guarantee you won't know all of the places where that's been disabled. They also don't support the secure session renegotiation extension to TLS 1.2 (RFC 5746 from 2010), which I know because OpenSSL 3 started requiring it and I had to stop multiple teams from "solving" it with a terrible solution from the first hit on Google. Amusingly, they do correctly implement TLS 1.3 so I've been able to fix this for multiple open source projects by getting them to enable 1.3 in their CDN configuration.
Correct, this is table stakes to get SSL Decryption working for any vendor. Typically we're talking about Windows PCs joined to Active Directory that already trust the domain's CA. The firewall then gets its own CA cert issued by the domain CA, so when you go to www.facebook.com and inspect the certificate it says it is from the firewall.
Most orgs don't inspect sensitive things like banking, healthcare, government sites, etc. Also it's very common to make exceptions to get certain applications working (like Dropbox).
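You can check this from a workstation without opening the browser's certificate viewer. A small sketch using Python's ssl module: if the issuer printed is your domain/firewall CA rather than a public CA, the connection is being intercepted (www.facebook.com is just an example host; if verification fails outright, the corporate root probably isn't in Python's trust store, which is itself a hint):

```python
# Sketch: print the issuer of the certificate presented for a given host.
# Behind an intercepting proxy, the issuer will be the corporate/firewall CA
# instead of a public CA such as DigiCert.
import socket
import ssl

host = "www.facebook.com"  # example host; any inspected site will do
ctx = ssl.create_default_context()
with socket.create_connection((host, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        cert = tls.getpeercert()
        issuer = dict(pair[0] for pair in cert["issuer"])
        print(issuer.get("organizationName"), "/", issuer.get("commonName"))
```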
Yes, if you want/need to do those things, then you need to inspect user traffic. But why do you want/need to do those things in the first place? What's your threat model?
Doing this breaks the end-to-end encryption and mutual authentication that is the key benefit of modern cryptography. The security measures implemented in modern web browsers are significantly more advanced and up-to-date than what systems like Zscaler are offering, for example in terms of rejecting deprecated protocols, or enabling better and more secure protocols like QUIC. By using something like Zscaler, you're introducing a single point of failure and a high value target for hackers.
A competent org and good mitm device will have trusted internal root certs on all endpoints, so cert errors are not a problem. The proxy can be set to passthrough or block sites with cert errors (expired, invalid), so there isn't any "bad habits training" of users clicking through cert errors. Several vendors today support TLS 1.3 decryption.
I don't know what you mean by SPOF for a proxy: they are no more a SPOF than any properly redundant network hop.
A proxy doesn't break encryption. Endpoints trust the mitm.
Now, I think that someday the protocols of the web such as quic will get so locked down that the only feasible threat prevention will be heuristic analysis of network traffic, and running all threat scanning on endpoints (with some future OS that has secure methods of stopping malicious network or executables before said traffic leaves some quarantine).
Everything I wrote earlier is based on the use of Zscaler proxy at work, so it's very much about practice, not theory.
Yes, of course the Zscaler root certs have been installed on our endpoints. The problem is that the proxy is replacing the TLS certificate of the origin server with its own certificate, which makes it impossible for the browser to verify the identity of the origin server and trust the communication. The browser can only verify that it is communicating with the proxy; it can no longer verify that it is communicating with the origin server.
That's what makes Zscaler and similar solutions a SPOF. I know that Zscaler is using a distributed architecture with no hardware or network SPOF. But Zscaler is a SPOF from an organizational perspective. If you hack them, you get access to everything. That's what me and other commenters meant by SPOF in that context.
> A proxy doesn't break encryption. Endpoints trust the mitm.
I didn't write that it's breaking encryption. I wrote it's breaking end-to-end encryption and authentication. I'm sure you understand the difference.
> Now, I think that someday the protocols of the web such as quic will get so locked down that the only feasible threat prevention will be heuristic analysis of network traffic
We're already there. HTTP/3 (QUIC) already accounts for about 30% of the traffic served by Cloudflare to humans [1]. QUIC actually offers a higher level of security by encrypting more metadata than HTTP/1 and 2 (specifically the part within the TCP headers that can be leveraged by an attacker when it is in the clear).
> A competent org and good mitm device
That's the main problem. Those proxies are usually less scrutinized and have smaller engineering and security teams than major modern web browsers like Edge, Chrome, Firefox and Safari, and as a consequence have more vulnerabilities.
In general, major modern web browsers enforce stronger security requirements than Zscaler:
- For example, the following website, using a potentially insecure Diffie-Hellman key exchange over a 1024-bit group (Logjam attack), is blocked by Chrome and Firefox but not by Zscaler: https://dh1024.badssl.com/
Oof, I’ve complained about practical problems in my developer life above, but that’s even worse than I thought. I was able to reproduce dh1024 and no-sct on my work laptop with zScaler. Interestingly it blocks the revoked one by turning it into a self-signed one.
> But why do you want/need to do those things in the first place? What's your threat model?
Not everyone in a company is savvy or hard at work. Randy in accounting might spend an hour or more a day browsing the internet and scooping up ads, and be enticed to download something to help speed up their PC which turns out to be ransomware.
This assumes Randy is incompetent, but not malicious. Nothing is stopping an attacker from contacting Randy out of band, say over a phone or personal email, and then blackmailing him to get him to hand out company information. The key here is to scope down Randy's access so that no matter what kind of an employee he is, the only access Randy has is the minimum necessary and that all of his accesses to company information is logged for audit and threat intelligence purposes.
That's the problem with these MITM approaches. They open up a new security SPOF (what happens if there's an exploit on your MITM proxy that an attacker uses to gain access to the entire firehose of corporate traffic) while doing little to protect against malicious users.
I think the undertone of your comment says a lot - corporations that feel the need to MITM all traffic tend to not trust their employees (from my experience dealing with this area) - either their competence or their work ethic.
All round, full traffic inspection is generally a bad idea except for some very limited cases where there is a clearly defined need.
* Data leaks are not prevented by a MITM attack. A sufficiently determined data leaker will easily find easier or more elaborate ways to circumvent it.
* Malware scanning can be done very efficiently at the end user workstation. (But it is always done inefficiently.)
* How does domain blocking require a MITM?
* C2 scanning can be done efficiently at the end user workstation.
* Audits do not require the "full contents of communication".
Is MITM ever the answer?
Stealing a valid communication channel and committing identity theft against remote servers does in fact break basic internet security practices.
Eventually I think the endgame here is that you use your own personal BYOD device to browse the internet that is not able to connect to the corporate network.
Thanks for the link. I’ve seen it done in the defense industry. Interesting to see Google doing this for a small subset of their employees not needing Internet for their job.
I read recently that this is done for sysadmins at Google and Microsoft who have access to absolutely core services like authentication; it does make sense to keep these airgapped.
This sounds like a misunderstanding of the model. Usually these companies have facilities that allow core teams to recover if prod gets completely fucked e.g. auth is broken so we need to bypass it. Those facilities are typically on separate, dedicated networks but that doesn’t mean the people who would use them operate in that environment day to day.
Google disabling Internet access is very different from your typical company doing that. Watching a YouTube video? Intranet and not disabled. Checking your email on Gmail? Intranet and not disabled. Doing a web search? Intranet and not disabled. Clicking on a search result? Just use the search cache and it's intranet.
Blame the law. Companies are bound by it. Actually blame terrible programming practices and the reluctance to tie the long tail of software maintenance and compliance to the programmers and product managers that write them.
companies can be held liable for what people using their networks do, so they need a way to prove it's not their fault and provide the credentials of the malevolent actor.
it's like call and message logs kept by phone companies.
nobody likes to keep them but it's better than breaking the law and risking someone abusing your infrastructure.
it would also be great if my colleagues did not use the company network to check the soccer stats every morning for 4 hours straight, so the company had to put up some kind of domain blocking, which prevents me from looking up some algorithm I can't recall off the top of my head on gamedev.net because it's considered "gaming"
Looking up soccer stats is not illegal so the company doesn't have to block it.
Blocking the website instead of punishing them in their performance reviews (assuming it does impact their performance, if they're still productive why even care) is useless, they'll use their phones and still spend time on it.
> Looking up soccer stats is not illegal so the company doesn't have to block it.
it's not because it's illegal, it's because they are wasting time on the job using the company's equipment for something not work related. And usually they end up clicking everywhere on shady ads, trackers etc.
We are in Italy; there's no such thing as a performance review here. If you get hired you get paid every month (actually 13 times a year, sometimes 14) and nobody can fire you ever again.
> they'll use their phones and still spend time on it.
You can try to block the Zscaler (or Netskope) IPs on your home router.
Most of the time IT laptops default to 'normal' web behaviour if Zscaler/Netskope is not available.
QUIC is a really nice protocol as well, I find. It basically gives you an end-to-end encrypted & authenticated channel over which you can transport multiple streams in parallel, as well as datagrams. A lost packet in a given stream won't block other streams, and the overhead is quite low. Both ends of the connection can open streams as well, so it's really easy to build bidirectional communication over QUIC. You can also do things like building a VPN tunnel using the datagram mechanism, there are protocols like MASQUE that aim to standardize this. Apple is using a custom MASQUE implementation for their private relay, for example.
HTTP/3 is a protocol on top of QUIC that adds a few more really interesting things, like qpack header compression. If you e.g. send a "Content-type: text/html" header it will compress to 2 bytes, as the protocol has a static table of the most commonly used header fields (Huffman coding is used for the literal strings). I found that quite confusing when testing connections as I thought "It's impossible that I only get 2 bytes, I sent a long header string..." until I found out about this.
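A toy illustration of why an exact name:value match collapses to a byte or two; this is not the real RFC 9204 table, and the entries and byte layout are made up for the example:

```python
# Toy illustration of static-table header compression (NOT the real QPACK table
# from RFC 9204 -- indices and entries here are made up for the example).
STATIC_TABLE = [
    (":method", "GET"),
    (":status", "200"),
    ("content-type", "text/html; charset=utf-8"),
]

def encode_header(name: str, value: str) -> bytes:
    for index, (n, v) in enumerate(STATIC_TABLE):
        if n == name and v == value:
            # Whole name:value pair is in the table -> emit a one-byte index
            # (the real encoding adds a few flag bits).
            return bytes([0x80 | index])
    # Not in the table: fall back to sending the literal strings
    # (the real encoding also Huffman-codes them).
    return f"{name}: {value}\r\n".encode()

print(encode_header("content-type", "text/html; charset=utf-8"))  # one byte
print(encode_header("x-custom", "hello"))                          # literal fallback
```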
I dabbled with QUIC a few years ago and I couldn't agree more. It was pleasant to work with, and because it's UDP based, suddenly you can do NAT hole punching more easily.
Funny that you mentioned a VPN, because I made a little experimental project back then to hole-punch between two behind-the-NAT machines and deliver traffic between them over QUIC. I was able to make my own L2 and L3 bridges across the WAN, or just port forward from one natted endpoint to an endpoint behind a different NAT.
At one point I used it to L2-bridge my company's network (10.x) to my home network (192.168.x), and I was able to ping my home server from the bridging host, even though it was different networks, because it was essentially just connecting a cable between the networks. It was quite fun.
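The hole-punching step itself doesn't need anything QUIC-specific: once both sides learn each other's public address:port (via a rendezvous server or STUN), they keep firing UDP packets at each other until the NAT mappings line up, and then QUIC (or anything else) can run over that path. A bare-bones sketch of just that step, with a placeholder peer address:

```python
# Bare-bones UDP hole punch sketch. PEER_ADDR is whatever public address:port
# the other side learned for you via a rendezvous/STUN step (placeholder here).
import socket
import time

PEER_ADDR = ("203.0.113.7", 40000)  # placeholder public mapping of the peer
LOCAL_PORT = 40000

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", LOCAL_PORT))
sock.settimeout(1.0)

for _ in range(10):
    sock.sendto(b"punch", PEER_ADDR)   # outgoing packet opens a NAT mapping
    try:
        data, addr = sock.recvfrom(1500)
        print("hole punched, got", data, "from", addr)
        break                           # a QUIC session can now run over this path
    except socket.timeout:
        time.sleep(0.5)                 # keep retrying while the peer does the same
```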
I only tested one hop, e.g. A(10.x)---->B(192.x). All I had to do there was to adapt the routing tables: On A, route 192.x traffic to the "natter" tap/tun interface (I always forget which is L2), and on B, route traffic to 10.x accordingly. That's all.
For it to be routable in the entire network, you'd need to obviously mess with a lot more :-D
Well it's abandoned and experimental, and there are better ways to hole punch than what I did, e.g. using STUN and TURN. But yeah it could replace one ngrok use case, though I think ngrok does not do L2/3 bridges.
Also: I think technically this was my first Go program (https://github.com/binwiederhier/re), but that was so tiny that it doesn't really count. ;-)
I believe that qpack (like hpack) still has some sharp edges. As in... low-entropy headers are still vulnerable to leakage via a CRIME[0]-style attack on the HTTP/3 header compression.
In practice, high-entropy headers aren't vulnerable, as an attacker has to match the entire header:value line in order to see a difference in the compression ratio [1].
> And there are bit flags in the encoder instructions that carry header values that signal they should never be inserted into a dynamic table.
My understanding is that these header values have to be known to the Qpack implementation beforehand. As a result, it isn't possible for an HTTP/3 client to signal to Qpack to treat a particular custom header as a never-indexed literal.
I’ve been trying to wrap my mind around whether (and how much) QUIC would be better than TCP for video streams that rely on order-sensitive delivery of frames, especially for frames that have to be split into multiple packets (and losing one packet or receiving it out of order would lose the entire frame and any dependent frames). We used to use UDP for mpegts packets but, after switching to h264, found TCP with a reset when buffers are backlogged to be a much better option over lossy WAN uplinks (the scenario is even worse when you have B or P frames present).
The problem would be a lot easier if there were a feedback loop to the compressor where you can dynamically reduce quality/bandwidth as the connection quality deteriorates but currently using the stock raspivid (or v4l2) interface makes that a bit difficult unless you’re willing to explicitly stop and start the encoding all over again, which breaks the stream anyway.
QUIC handles the packet ordering and retransmission of lost packets before it is handed to the upper layer of the application. However, if you are just sending one channel of video packets through the pipe, it probably won’t buy you much more. Where QUIC additionally excels is being able to send and/or receive multiple streams of data across the “single” connection where each is effectively independent and does not suffer head of line blocking across all streams.
> stop and start the encoding all over again, which breaks the stream anyway.
It's a common thing I wish encoders could do - if I try to compress a frame and the result is too large to fit in my packet/window, I wish I could 'undo' and retry compression of the same frame with different options.
Sadly all hardware video compressors mutate internal state when something is compressed, so there is no way to undo.
Since it lives on top of UDP, I believe all you need is SOCK_DGRAM, right? The rest of QUIC can be in a userspace library ergonomically designed for your programming language e.g. https://github.com/quinn-rs/quinn - and can interoperate with others who have made different choices.
> Since it lives on top of UDP, I believe all you need is SOCK_DGRAM, right? The rest of QUIC can be in a userspace library ergonomically designed for your programming language e.g. (...)
I think that OP's point is that there's no standard equivalent of the BSD sockets API for writing programs that communicate over QUIC; that gap is exactly what the userspace library you've referred to fills.
A random project hosted in GitHub is not the same as a standard API.
Did they not answer that question? It uses the BSD sockets API with SOCK_DGRAM?
Right, that random project is not a standard API - it's built using a standard API. You wouldn't expect BSD sockets to have HTTP built in... so you can find third-party random projects for HTTP implemented with BSD sockets just like you can find QUIC implemented with BSD sockets.
QUIC is roughly TCP-equivalent not HTTP-equivalent, and we do have a BSD sockets API for TCP. You might be thinking of HTTP/3 rather than QUIC; HTTP/3 actually is HTTP-equivalent.
You can turn the OP's question around. Every modern OS kernel provides an efficient, shared TCP stack. It isn't normal to implement TCP separately in each application or as a userspace library, although this is done occasionally. Yet we currently expect QUIC to be implemented separately in each application, and the mechanisms which are in the OS kernel for TCP are implemented in the applications for QUIC.
So why don't we implement TCP separately in each application, the way it's done with QUIC?
Although there were some advantages to this while the protocol was experimental and being stabilised, and for compatibility when running new applications on older OSes, arguably QUIC should be moved into the OS kernel to sit alongside TCP now that it's stable. The benefit of having Chrome, Firefox et al stabilise HTTP/3 and QUIC was good, but that potentially changes when the protocol is stable but there are thousands of applications, each with their own QUIC implementation doing congestion control differently, scheduling etc, and no cooperation with each other the way the OS kernel does with TCP streams from concurrent applications. Currently we are trending towards a mix of good and poor QUIC implementations on the network (in terms of things like congestion control and packet flow timing), rather than a few good ones as happens with TCP because modern kernels all have good quality implementations of TCP.
> QUIC is roughly TCP-equivalent not HTTP-equivalent, and we do have a BSD sockets API for TCP. You might be thinking of HTTP/3 rather than QUIC; HTTP/3 actually is HTTP-equivalent.
No, I understand QUIC is a transport and HTTP/3 is the next HTTP protocol that runs over QUIC. I was saying QUIC can be userspace just like HTTP is userspace over kernel TCP API. We haven't moved HTTP handling into the kernel so what makes QUIC special?
I think it is just too early to expect every operating system to have a standard API for this. We didn't have TCP api's built-in originally either.
> What makes TCP special that we put it in the kernel? A lot more of those answers apply to QUIC than to HTTP.
I mean, that seems like GP's point. There's no profound reason TCP should be implemented in the kernel, besides trying to fit every form of IO into Unix's "everything is a file descriptor" philosophy.
If QUIC works in userspace, that's fine. If it ain't broke, don't fix it.
Huh, you think so? I'd be a bit surprised if that was the intent, because TCP obviously is in the kernel, so if their point blatantly disagrees with preexisting decisions in every OS then that's something that needs to be addressed before we can talk about QUIC.
> I was saying QUIC can be userspace just like (...)
I think you're too hung up on "can" when that's way besides OP's point. The point is that providing access to fundamental features through a standard API is of critical importance.
If QUIC is already massively adopted then there is no reason whatsoever not to provide a standard API.
If QUIC was indeed developed to support changes then there are even fewer arguments for not providing a standard API.
Isn't the point of QUIC to offer high performance and flexibility at the same time? For these requirements, a one-size-fits-all API is rarely the way to go, so individual user-space implementations make sense. Compare this to file IO: for many programs, the open/read/write/close FD API is sufficient, but if you require more throughput or control, it's better to use a lower-level kernel interface and implement the missing functionality in user-space, tailored to your particular needs.
It occurs to me that QUIC could benefit from a single kernel-level coordinator that can be plugged for cooperation - for instance, a dynamic bandwidth-throttling implementation a la https://tripmode.ch/ for slower connections where the coordinator can look at pre-encryption QUIC headers, not just the underlying (encrypted) UDP packets. So perhaps I was hasty to say that you just need SOCK_DGRAM after all!
There are dozens of libs to do it right now, but I expect ultimately distro folks will want to consolidate it all to avoid redundant dependencies, so we'll probably end up with openssl/gnutls/libressl doing it eventually and most apps using that.
Note there were talks to add it in-kernel like kTLS, but since the certificate handling is difficult that part will be offloaded to userspace as it is with kTLS -- you won't ever get an interface like connect() and forget about it.
"QUIC improves performance of connection-oriented web applications that are currently using TCP.[QUIC] is designed to obsolete TCP at the transport layer for many applications, thus earning the protocol the occasional nickname "TCP/2"."
Also taken from the wiki page:
"QUIC aims to be nearly equivalent to a TCP connection but with much-reduced latency."
Here is the only relevant bit regarding TLS:
"As most HTTP connections will demand TLS, QUIC makes the exchange of setup keys and supported protocols part of the initial handshake process. When a client opens a connection, the response packet includes the data needed for future packets to use encryption. This eliminates the need to set up the TCP connection and then negotiate the security protocol via additional packets."
[1] https://en.m.wikipedia.org/wiki/QUIC
The RFCs for QUIC (RFC 9000 and RFC 9001) mandate encryption.
Some random stackoverflow answer[1] claims there are implementations that ignore this and allow "QUIC without encryption", but I'd argue that it's not QUIC anymore -- in my opinion it'd be harmful to implement in the kernel.
Per the RFC, QUIC is literally defined as a transport protocol... Literally the first sentence of the overview is:
"QUIC is a secure general-purpose transport protocol."
More importantly, QUIC literally bakes TLS into how it works, which is a far cry from replacing it.
"The QUIC handshake combines negotiation of cryptographic and transport parameters. QUIC integrates the TLS handshake [TLS13], although using a customized framing for protecting packets."
> So why don't we implement TCP separately in each application, the way it's done with QUIC?
Because library/dependency management sucked at the time, and OSes competed for developers by trying to offer more libraries; on the other side we were less worried about EEE because cross-platform codebases were rare and usually involved extensive per-OS code with #ifdefs or the like.
I don't think we would or should put TCP in the kernel if it was being made today.
> So why don't we implement TCP separately in each application, the way it's done with QUIC?
Because TCP also does multiplexing, which must be handled in some central coordinating service. QUIC doesn't suffer from that since it offloads that to UDP.
You wouldn't use a system TLS implementation either (well, technically SChannel/NetworkTransport exist, but you're vastly better off ignoring them).
That's going to happen regardless, that's just the nature of Ethernet, or even just existing in reality. TCP has to deal with the same thing. Either way the solution is the same: acks and retransmissions.
> Did they not answer that question? It uses the BSD sockets API with SOCK_DGRAM?
No, that does not answer the question, nor is it a valid answer to the question. Being able to send UDP datagrams is obviously not the same as establishing a QUIC connection. You're missing the whole point of the importance of having a standard API to establish network connections.
> Right, that random project is not a standard API - its built using a standard API.
Again, that's irrelevant and misses the whole point. Having access to a standard API to establish QUIC connections is as fundamental as having access to a standard API to establish a TCP connection.
> so you can find third-party random projects for HTTP (...)
The whole point of specifying standard APIs is to not have to search for and rely on random third-party projects to handle a fundamental aspect of your infrastructure.
Not having a system API is the entire point of QUIC. The only reason QUIC needs to exist is because the sockets API and the system TCP stacks are too ossified to be improved. If you move that boundary then QUIC will inevitably suffer from the same ossification that TCP displays today.
No, the reason QUIC exists is that TCP is ossified at the level of middle boxes on the internet. If it had been possible to modify TCP with just some changes in the Linux, BSD and Windows kernels, it would have been done.
"""
A middlebox is a computer networking device that transforms, inspects, filters, and manipulates traffic for purposes other than packet forwarding. Examples of middleboxes include firewalls, network address translators (NATs), load balancers, and deep packet inspection (DPI) devices.
"""
I don't know about that. Without middlebox problems we might have used SCTP as a basis and upgraded it. But it's so different from TCP that I doubt we would have done it as a modification of TCP.
The alternative is that either browsers will be the only users of QUIC - or that each application is required to bring its own QUIC implementation embedded into the binary.
If ossification was bad when every router and firewall had its own TCP stack, have fun in a world where every app has its own QUIC stack.
User-space apps have a lot more avenues for timely updates than middleboxes or kernel-space implementations do though, and developers have lots of experience with it. If middleboxes actually received timely updates and bugfixes, there would be no ossification in the first place, and a lot of other experiments would have panned out much better, much sooner than QUIC has (e.g. TCP Fast Open might not have been DOA.)
There's also a lot of work on interop testing for QUIC implementations; I think new implementations are strongly encouraged to join the effort: https://interop.seemann.io/
I am not seeing the problem with every participant linking in their own QUIC implementations. The problem of ossification is there is way too much policy hidden on the kernel side of the sockets API, and vanishingly few applications are actually trying to make contact with Mars, which is the use-case for which those policies are tuned.
There are a billion timers inside the kernel, not all of which can be changed. Some of them are #defined even.
In these days when machines are very large and always have several applications running, having an external network stack in the kernel violates the end-to-end principle. All of the policy about congestion, retrying, pacing, shaping, and flow control belong inside the application.
Can you point me to an example of a timer in the kernel that is not settable/tunable that should be? My experience in looking at such things suggests that most of the #defined bits are because RFCs define the protocol that way.
As for a network stack per application: you're more than welcome to do so in a myriad of ways - linux provides many different ways to pull raw IP, or raw ethernet into userspace (e.g. xdp, tun/tap devices, dpdk, and so on). It's not like you're being forced to use the kernel stack from lack of supported alternatives.
I wouldn't. I would write down the protocol in a good and extensible way, the first time. It's no good throwing something into the world with the assumption that you can fix the protocol later.
> The alternative is that either browsers will be the only users of QUIC - or that each application is required to bring its own QUIC implementation embedded into the binary.
This is already being done with event loops (libuv) and HTTP frameworks. I don't see why this would be a huge issue. It's also a boon for security and keeping software up-to-date because it's a lot easier to patch userspace apps than it is to roll out a new kernel patch across multiple kernels and force everyone to upgrade.
> because the sockets API and the system TCP stacks are too ossified to be improved
What part of the sockets API specifically do you think is ossified? Also, that doesn't seem to have kept the kernel devs from introducing new IO APIs like io_uring.
I think the point of QUIC is 'if the implementation others are using is problematic, I can use my own, and no random middlebox will prevent me from doing so' rather than `everyone must bring their own QUIC implementation.`
There is a slight difference here. It's the difference between 'the right to do' and 'the requirement to do'.
With TCP, at the same time, you must use the system implementation and are not allowed to use a custom one, because even if the system allows it (maybe requiring root permission or something), the middlebox won't.
> Not having a system API is the entire point of QUIC. The only reason QUIC needs to exist is because the sockets API and the system TCP stacks are too ossified to be improved.
I don't think your take is correct.
The entire point of QUIC was that you could not change TCP without introducing breaking changes, not that there were system APIs for TCP.
Your point is also refuted by the fact that QUIC is built over UDP.
As far as I can tell there is no real impediment to provide a system API for QUIC.
> If you e.g. send a "Content-type: text/html" header it will compress to 2 bytes as the protocol has a Huffman table with the most commonly used header values
Reminds me of tokenization for LLMs. 48 dashes ("------------------------------------------------") is only a single token for GPT-3.5 / GPT-4 (they use the cl100k_base encoding). I suppose since that is used in Markdown. Also "professional illustration" is only two tokens despite being a long string. Whereas if you convert that to a language like Thai it is 17 tokens which sucks in some cases but I suppose tradeoffs had to be made
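If you want to check those counts yourself, the tiktoken package exposes the cl100k_base encoding directly (the counts you get depend on the installed vocabulary):

```python
# Count tokens for a few strings under the cl100k_base encoding (GPT-3.5/GPT-4).
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ["-" * 48, "professional illustration"]:
    tokens = enc.encode(text)
    print(f"{text[:20]!r:25} -> {len(tokens)} token(s)")
```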
That makes sense -- if a packet is lost, and it affected just one asset, but you're on TCP, then everything has to wait till the packet is re-requested and resent.
HTTP2 allowed multiple streams over one TCP stream, but that kinda made HoL blocking worse, because in the same scenario HTTP 1.1 would have just opened multiple TCP streams. QUIC, as GP said, basically gives you a VPN connection to the server. Open 100 streams for reliable downloads and uploads, send datagrams for a Quake match, all over one UDP 'connection' that can also reconnect even if you change IPs.
There is an API for this on Linux. It's used to checkpoint the state of a TCP connection, for example to move a live TCP connection to another machine in case you want to do zero downtime hardware replacement while keeping all TCP sockets open.
I can't wait to start implementing RTP atop QUIC so we can stop having to deal with the highly stateful SIP stack and open a media connection the same way we open any other Application layer connection.
QUIC is junk.. You people all care about raw throughput but not about multiuser friendliness. Selfish. Problem is, QUIC is UDP and so it's hard to police/shape.
I really want to play some FPS game while someone is watching/browsing the web.
Also, I really want my corpo VPN to have a bit of priority over web traffic, but no, now I cannot police it easily. TCP is good for best-effort traffic, and that's where I classify web browsing, downloading, VoD streaming. UDP is good for gaming, voice/video conferencing, VPNs (because they encapsulate stuff, you put another layer somewhere else).
I feel like that’s an ungenerous characterization. First, QUIC should contain some minimal connection info unencrypted that lets middleware do some basic traffic shaping. It’s also intentionally very careful to avoid showing too much, to avoid “smart” middleware that permanently ossifies the standard as has happened to TCP.
Second, traffic shaping on a single machine is pretty easy and most routers will prefer TCP traffic to UDP.
Finally, the correct response to overwhelm is to drop packets. This is true for TCP and UDP to trigger congestion control. Middleware has gotten way too clever by half and we have bufferbloat. To drop packets you don’t need knowledge of streams - just that you have a non-skewed distribution you apply to dropping the packets so that proportionally all traffic overwhelming you from a source gets equally likely to be dropped. This ironically improves performance and latency because well behaving protocols like TCP and QUIC will throttle back their connections and UDP protocols without throttling will just deal with elevated error rates.
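That "non-skewed distribution" is essentially what RED-style queue management does: above a threshold, each arriving packet is dropped with a probability that grows with the backlog, regardless of which flow it belongs to. A toy sketch of the idea (thresholds are arbitrary, not from any standard):

```python
# Toy random-drop queue in the spirit of RED: drop probability rises with the
# backlog and is applied uniformly to all arriving packets.
import random
from collections import deque

MIN_Q, MAX_Q = 50, 200   # below MIN_Q never drop, above MAX_Q always drop
queue = deque()

def enqueue(packet) -> bool:
    depth = len(queue)
    if depth >= MAX_Q:
        return False                       # hard drop: queue is full
    if depth > MIN_Q:
        p_drop = (depth - MIN_Q) / (MAX_Q - MIN_Q)
        if random.random() < p_drop:
            return False                   # probabilistic early drop
    queue.append(packet)
    return True
```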
So what? You're dropping packets, and they are still coming, eating BW and buckets.
Because traditionally UDP did not have any flow control, you just treat it as kind of CBR traffic, and so you just want it to leave the queues as fast as it can.
If there was a lot of TCP traffic around, you just dropped packets there and voila, congestion control kicks in and you have more room for important UDP traffic.
Now, if you start to drop UDP packets your UX drops.. packet loss in FPS games is terrible, even worse than a bit of jitter. Thank you.
But your complaint is about QUIC, not generic UDP. QUIC implements TCP-like flow control on top of UDP, designed to play well with TCP congestion control.
QUIC does play well with others. It's just implemented in the userspace QUIC library instead of the network stack.
I really don’t follow your complaint. QUIC (and other similar UDP protocols like uTP used for BitTorrent) implement congestion control. If packets get dropped, the sender starts backing off which makes you a “fair” player on the public internet.
As for gaming, that remains an unsolved problem, but QUIC being UDP based isn’t any different than TCP. It’s not like middleware boxes are trying to detect specific UDP applications and data flows to prioritize protecting gaming traffic from drops, which I think is what you’re asking for.
Now, I wish ToS/QoS were more broadly usable for traffic prioritization.
It sounds like you're using UDP vs. TCP as a proxy for ToS/QoS. At a minimum, you're still going to have a bad time with TCP streams getting encapsulated in UDP WireGuard VPN connections.
Is your complaint fundamentally that it's harder to tell the difference between games/voip and browser activity if you can't just sort TCP versus UDP?
That's true, but it's not that big of a deal and definitely doesn't make QUIC "junk". Looking at the port will do 90% of the job, and from what I can tell it's easy to look at a few bytes of a new UDP stream to see if it's QUIC.
The quick test is that the first packet is generally going to start with hex C[any nibble]00000001 and be exactly 1200 bytes long, ending with a bunch of 0s.
A better test is to see if the full header makes sense, extract the initial encryption key, and check if the rest of the packet matches the signature.
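That quick test comes down to a few byte comparisons. A heuristic sketch based on the description above (long-header form bits, version 0x00000001, and the >=1200-byte padding of Initial packets); it will not match later short-header packets of an established connection:

```python
# Heuristic check for a QUIC v1 Initial packet, per the description above:
# long header (top two bits set), version 0x00000001, and the datagram padded
# to at least 1200 bytes. Later (short-header) packets won't match this.
def looks_like_quic_initial(payload: bytes) -> bool:
    if len(payload) < 1200:
        return False
    first = payload[0]
    if first & 0xC0 != 0xC0:          # long-header form bit + fixed bit
        return False
    version = int.from_bytes(payload[1:5], "big")
    if version != 0x00000001:         # QUIC version 1
        return False
    return (first & 0x30) == 0x00     # packet type 0 = Initial

# Example: feed it the payload of a UDP datagram captured on port 443.
```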
This is correct, you can recognize Initial packets easily/reliably, and they contain the connection IDs in plaintext so that a stateful packet filter/shaper can recognize individual data flows.
Note that packet numbers (and everything else other than port/IP) are encrypted, so you can't do more than drop or delay the packets. But blocking/shaping/QoS is perfectly doable.
I want to be clear that I was talking about checking the encryption of the very first packet, which isn't secret yet.
Once keys are already established I don't see any particularly reliable test for a single packet, but as you say the connection ids are accessible so if they're in the right place in both directions then that looks good for discovering a QUIC flow.
Somewhat the point. The issue we've had is that multiple ISPs and gov have been "policing" TCP in unsavory ways. Security and QoS are just fundamentally at odds with each other.
QoS is still possible with QUIC: initial connection IDs are in plaintext, and while you can't modify packets (or even see the packet numbers) you can drop or (ugh) delay them.
On the level of entire networks serving multiple end consumers and businesses, I really hope that ISPs get bigger pipes instead of trying to shape traffic based on type. I'm fine with making traffic shaping on a local network a little harder, if it ends up biting those who oppose net neutrality (or want to use middleboxes to screw up traffic in various technical-debt-fuelled ways).
Upvoted because I think you bring up some interesting challenges, but you might consider a softer tone in the future. (Calling the OP "selfish" goes against site guidelines, and generally doesn't make people open to what you're saying.)
That "selfish" was NOT aimed at the OP. It was for the general audience who prefer all the bandwidth for themselves. We know how most people behave. They do NOT really care what others are doing. For years, I was proud of my QoS because my entire home could utilize my (not so fast) Internet and I could always do gaming, because everything was QoS'd correctly. Nothing fancy, just separating TCP vs UDP and further doing some tuning between TCP bulk vs interactive traffic. Same went for UDP: some separation for gaming/voip/interactive vs VPN (bulk). HTB is pretty decent for this.
I don’t think they’re super valid concerns though - QUIC isn’t just dumb UDP that transmits as fast as it can, it has congestion control, pacing etc. built in that’s pretty similar to certain TCP algorithms, just it’s happening at the application layer instead of the kernel/socket layer handling it. In the design of these algorithms, fairness is a pretty key design criteria.
If anything, potentially QUIC lets people try better congestion control algorithms without having to rebuild their kernels which could make the web better if anything…
Read up until the author starts confusing HTTP/1 with TCP, claiming that it's TCP's fault that we must make multiple connections to load a website.
Actually, TCP allows a continuous connection and sending as much stuff as you want -- in other words, a stateful protocol. It was HTTP/1 that was decidedly stateless.
Sessions in web sites are direct consequence of the need to keep state somewhere.
There is, though, a fundamental mismatch between TCP and the problem HTTP (any version) needs to solve: TCP is for sending a stream of bytes, reliably and in the same order they were sent. HTTP needs to transmit some amount of data, reliably.
The only aspect of TCP HTTP really wants here is reliability. The order we don’t really care about; we just want to load the whole web page and all associated data (images, scripts, style sheet, fonts…), in a way that can be reassembled at the other end. This makes it okay to send packets out of order, and doing so automatically solves head-of-line blocking.
This is a little different when we start streaming audio or video assets: for those, it is often okay for reliability to take a hit (though any gap must be detected). Losing a couple of packets may introduce glitches and artefacts, but doesn’t necessarily render the media unplayable. (This applies more to live streams & chats though. For static content most would prefer to send a buffer in advance, and go back to sacrificing ordering in order to get a perfect (though delayed) playback.)
In both use cases, TCP is not a perfect match. The only reason it’s so ubiquitous anyway is because it is so damn convenient.
That’s a different issue than what parent is talking about: HTTP definitely needs each individual resource to be ordered, what it does not need is for different resources to be ordered relative to one another, which becomes an issue when you mux multiple requests concurrently over a single connection.
> This is a little different when we start streaming audio or video assets: for those, it is often okay for reliability to take a hit (though any gap must be detected). Losing a couple of packets may introduce glitches and artifacts, but doesn’t necessarily render the media unplayable.
This is exactly what UDP is for. There is nothing wrong with TCP and UDP at the transport layer, both do their job and do it well.
The whole HTTP request/response cycle has led to a generation of developers that cannot conceive of how to handle continuous data streams, it's extraordinary.
I have seen teams of experienced seniors using websockets and then just sending requests/responses over them as every architecture choice and design pattern they were familiar with required this.
Then people project out from their view of the world and assume the problem is not with what they are doing but in the other parts of the stack they don't understand, such as blaming TCP for the problems with HTTP.
I switched from web dev to data science some years ago, and surprisingly couldn't find a streaming parallelizer for Python -- every package assumes you loaded the whole dataset in memory. Had to write my own.
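For anyone hitting the same wall: the standard library can already do a lazy, streaming fan-out if you feed a generator into multiprocessing's imap, so the dataset never has to be fully in memory. A minimal sketch (read_records and process_record are placeholders, not the parent's actual code):

```python
# Rough sketch of streaming parallel processing: records are read lazily from
# disk, fanned out to worker processes, and results consumed as they arrive.
# read_records() and process_record() are placeholders for your own code.
from multiprocessing import Pool

def read_records(path):
    with open(path) as f:
        for line in f:          # lazy: one record at a time, never the whole file
            yield line.rstrip("\n")

def process_record(record):
    return len(record)          # stand-in for real per-record work

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # imap keeps the input lazy; chunksize batches records to cut IPC overhead
        for result in pool.imap(process_record, read_records("data.txt"), chunksize=256):
            print(result)
```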
Same with video parsers and tooling frequently: they expect a whole mp4 to be there, or a whole video, in order to parse it, yet the gstreamer/ffmpeg API delivers the content as a stream of buffers that you have to process one buffer at a time.
Traditionally, ffmpeg would build the mp4 container while transcoded media is written to disk (in a single contiguous mdat box after ftyp) and then put the track description and samples in a moov at the end of the file. That's efficient because you can't precisely allocate the moov before you've processed the media (in one pass).
But when you would load the file into a <video> element, it would of course need to buffer the entire file to find the moov box needed to decode the NAL units (in case of avc1).
A simple solution was then to repackage by simply moving the moov from the end of the file to before the mdat (adjusting chunk offsets). Back in the day, that would make your video start instantly!
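Whether a file needs that repackaging is visible from the top-level box order alone. A small sketch that walks the top-level boxes (covers the common 32-bit size case plus the size==1 and size==0 special cases):

```python
# Sketch: list the top-level MP4 boxes in file order. If 'moov' comes after
# 'mdat', the file needs the "faststart" style repackaging described above.
import struct

def top_level_boxes(path):
    boxes = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)
            boxes.append(box_type.decode("latin-1"))
            if size == 0:                      # box extends to end of file
                break
            if size == 1:                      # 64-bit largesize follows
                size = struct.unpack(">Q", f.read(8))[0]
                f.seek(size - 16, 1)           # skip rest of box (16 header bytes)
            else:
                f.seek(size - 8, 1)            # skip rest of box (8 header bytes)
    return boxes

print(top_level_boxes("video.mp4"))  # e.g. ['ftyp', 'mdat', 'moov'] -> not faststart
```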
This is basically what CMAF is: the moov and ftyp get sent at the beginning (and frequently get written as an init segment) and then the rest of the stream is a continuous stream of moofs and mdats, chunked as per gstreamer/ffmpeg specifics.
I was thinking progressive MP4, with sample table in the moov. But yes, cmaf and other fragmented MP4 profiles have ftyp and moov at the front, too.
Rather than putting the media in a contiguous blob, CMAF interleaves it with moofs that hold the sample byte ranges and timing. Moreover, while this interleaving allows most of the CMAF file to be progressively streamed to disk as the media is created, it has the same catch-22 problem as the "progressive" MP4 file in that the index (sidx, in case of CMAF) cannot be written at the start of the file unless all the media it indexes has been processed.
When writing CMAF, ffmpeg will usually omit the segment index, which makes fast search painful. To insert the `sidx` (after ftyp+moov but before the moof+mdats) you need to repackage (but not re-encode).
It is possible that this is not a fault of the parser or tooling. In some cases, specifically when the video file is not targeted for streaming, the moov atom is at the end of the mp4. The moov atom is required for playback.
That's intentional, and it can be very handy. Zip files were designed so that you can make an archive self-extracting. They made it so that you could strap a self-extraction binary to the front of the archive, which - rather obviously - could never have been done if the executable code followed the archive.
But the thing is that the executable can be anything, so if what you want to do is to bundle an arbitrary application plus all its resources into a single file, all you need to do is zip up the resources and append the zipfile to the compiled executable. Then at runtime the application opens its own $0 as a zipfile. It Just Works.
Also, it makes it easier to append new files to an existing zip archive. No need to adjust an existing header (and potentially slide the whole archive around if the header size changes), just append the data and append a new footer.
I’ve found the Rust ecosystem to be very good about never assuming you have enough memory for anything and usually supporting streaming styles of widget use where possible.
ha! I was literally thinking of the libs for parsing h264/5 and mp4 in rust (so not using unsafe gstreamer/ffmpeg code) when moaning a little here.
Generally i find the rust libraries and crates to be well designed around readers and writers.
My experience that played out over the last few weeks led me to a similar belief, somewhat. For rather uninteresting reasons I decided I wanted to create mp4 videos of an animation programmatically.
The first solution suggested when googling around is to just create all the frames, save them to disk, and then let ffmpeg do its thing from there. I would have just gone with that for a one-off task, but it's a pretty bad solution if the video is long, or high res, or both. Plus, what I really wanted was to build something more "scalable/flexible".
Maybe I didn't know the right keywords to search for, but there really didn't seem to be many options for creating frames, piping them straight to an encoder, and writing just the final video file to disk. The only one I found that seemed like it could maybe do it the way I had in mind was VidGear[1] (Python). I had figured that with the popularity of streaming, and video in general on the web, there would be so much more tooling for these sorts of things.
I ended up digging way deeper into this than I had intended, and built myself something on top of Membrane[2] (Elixir)
It sounds like a misunderstanding of the MPEG concept. For an encode to be made efficiently, the encoder needs to see more than one frame of video at a time. Sure, I-frame-only encoding is possible, but it's not efficient and the result isn't really distributable. Encoding wants to see multiple frames at a time so that P and B frames can be used. Also, the way to get the best bang for the bandwidth buck is to use multipass encoding. You can't do that if all of the frames don't exist yet.
You have to remember how old the technology you are trying to use is, and then consider the power of the computers available when they were made. MPEG-2 encoding used to require a dedicated expansion card because the CPUs didn't have decent instructions for the encoding. Now, that's all native to the CPU which makes the code base archaic.
No doubt that my limited understanding of these technologies came with some naive expectations of what's possible and how it should work.
Looking into it, and working through it, part of my experience was a lack of resources at the level of abstraction that I was trying to work in. It felt like I was missing something, with video editors that power billion dollar industries on one end, directly embedding ffmpeg libs into your project and doing things in a way that requires full understanding of all the parts and how they fit together on the other end, and little to nothing in-between.
Putting a glorified powerpoint in an mp4 to distribute doesn't feel to me like it is the kind of task where the prerequisite knowledge includes what the difference between yuv420 and yuv422 is or what Annex B or AVC are.
My initial expectation was that there had to be some in-between solution. Before I set out, what I had thought would happen is that I `npm install` some module and then just create frames with node-canvas, stream them into this lib, and get an mp4 out the other end that I can send to disk or S3 as I please.* Worrying about the nitty-gritty details, like how efficient it is, how many frames it buffers, or how optimized the output is, would come later.
Going through this whole thing, I now wonder how Instagram/TikTok/Telegram and co. handle the initial rendering of their video stories/reels, because I doubt it's anywhere close to the process I ended up with.
* That's roughly how my setup works now, just not in JS. I'm sure it could be another 10x faster at least, if done differently, but for now it works and lets me continue with what I was trying to do in the first place.
This sounds like "I don't know what a wheel is, but if I chisel this square to be more efficient it might work". Sometimes, it's better to not reinvent the wheel, but just use the wheel.
Pretty much everyone serving video uses DASH or HLS so that there are many versions of the encoding at different bit rates, frame sizes, and audio settings. The player determines if it can play the streams and keeps stepping down until it finds one it can use.
Edit:
>Putting a glorified powerpoint in an mp4 to distribute doesn't feel to me like it is the kind of task where the prerequisite knowledge includes what the difference between yuv420 and yuv422 is or what Annex B or AVC are.
This is the beauty of using mature software. You don't need to know this any more. Encoders can now set the profile/level and bit depth to what is appropriate. I don't have the charts memorized for when to use what profile at what level. In the early days, the decoders were so immature that you absolutely needed to know the decoder's abilities to ensure a compatible encode was made. Now, the decoder is so mature and is even native to the CPU, that the only limitation is bandwidth.
Of course, all of this is strictly talking about the video/audio. Most people are totally unaware that you can put programming inside of an MP4 container that allows for interaction similar to DVD menus: jumping to different videos, selecting different audio tracks, etc.
> This sounds like "I don't know what a wheel is, but if I chisel this square to be more efficient it might work". Sometimes, it's better to not reinvent the wheel, but just use the wheel.
I'm not sure I can follow. This isn't specific to MP4 as far as I can tell. MP4 is what I cared about, because it's specific to my use case, but it wasn't the source of my woes. If my target had been a more adaptive or streaming friendly format, the problem would have still been to get there at all. Getting raw, code-generated bitmaps into the pipeline was the tricky part I did not find a straightforward solution for. As far as I am able to tell, settling on a different format would have left me in the exact same problem space in that regard.
The need to convert my raw bitmap from rgba to yuv420 among other things (and figuring that out first) was an implementation detail that came with the stack I chose. My surprise lies only in the fact that this was the best option I could come up with, and a simpler solution like I described (that isn't using ffmpeg-cli, manually or via spawning a process from code) wasn't readily available.
> You don't need to know this any more.
To get to the point where an encoder could take over, pick a profile, and take care of the rest was the tricky part that required me to learn what these terms meant in the first place. If you have any suggestions of how I could have gone about this in a simpler way, I would be more than happy to learn more.
Using ffmpeg as the example, you can put -f in front of -i to describe what the incoming format is, so your homebrew exporter can write to stdout and pipe into ffmpeg, which reads from stdin with '-i -'. More specifically, '-f bmp -i -' would expect the incoming data stream to be in BMP format. You can pick any format for which a codec is installed ('ffmpeg -codecs' lists them).
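A minimal sketch of that pipe-to-stdin approach from Python, assuming ffmpeg is on the PATH; raw RGB frames avoid even the BMP encoding step, and the frame "renderer" here is just a placeholder:
import subprocess

width, height, fps, n_frames = 320, 240, 30, 90

# Describe the incoming raw stream with -f/-pix_fmt/-s/-r before '-i -',
# then let the encoder produce a normal mp4 on the other side.
proc = subprocess.Popen([
    "ffmpeg", "-y",
    "-f", "rawvideo", "-pix_fmt", "rgb24",
    "-s", f"{width}x{height}", "-r", str(fps),
    "-i", "-",
    "-pix_fmt", "yuv420p", "out.mp4",
], stdin=subprocess.PIPE)

for i in range(n_frames):
    # One flat grey frame per iteration; replace with your own rendering.
    frame = bytes([i % 256]) * (width * height * 3)
    proc.stdin.write(frame)

proc.stdin.close()
proc.wait()
No frame ever touches the disk; only the finished mp4 does.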
In a way, that's good. The few hundred video encoding specialists who exist in the world have, per person, had a huge impact on the world.
Compare that to web developers, who in total have had probably a larger impact on the world, but per head it is far lower.
Part of engineering is to use the fewest people possible to have the biggest benefit for the most people. Video did that well - I suspect partly by being 'hard'.
There are many packages that can do that, like Vaex [1] and Dask [2]. I don't know your exact workflow. But CPU-bound parallelism in Python is limited to multiprocessing, which is much more expensive than the threads a typical streaming parallelizer would use outside the Python world.
I've looked into the samples and recall what the problem was: geopandas was in an experimental branch, and you had to lock yourself into dask -- plus, the geopandas code had to be rewritten completely for dask. So i wrote my own processor that applies the same function in map&reduce fashion, and keeps code compatible with jupyter notebooks -- you decorate functions to be parallelizeable, but still import them and call normally.
https://github.com/culebron/erde
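Not erde's actual API, but the general shape of such a processor is roughly the sketch below: feed chunks through a worker pool lazily instead of materializing the whole dataset (pandas' chunksize reader is just one example of a chunk source, and clean() is a placeholder function):
from multiprocessing import Pool

def stream_map(func, chunks, workers=4):
    # Unlike Pool.map, imap does not convert the input to a list first;
    # chunks are consumed as workers pick up tasks. func must be picklable
    # (i.e. defined at module top level).
    with Pool(workers) as pool:
        yield from pool.imap(func, chunks)

# Example usage with a chunked CSV reader:
# import pandas as pd
# for result in stream_map(clean, pd.read_csv("big.csv", chunksize=100_000)):
#     ...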
This is actually the old SAX vs DOM xml parsing discussion in disguise.
SAX is harder but has at least two key strongly related benefits: 1. Can handle a continuous firehose 2. Processing can start before the load is completed (because it might never be completed) so the time to first useful action can be greatly reduced.
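The Python flavour of that trade-off: xml.etree's iterparse gives you SAX-like streaming without writing handler classes (the "record" tag and handle() below are placeholders):
import xml.etree.ElementTree as ET

# Elements are handed to us as soon as they are complete, so processing can
# start before (or regardless of whether) the document ever finishes loading.
for event, elem in ET.iterparse("huge.xml", events=("end",)):
    if elem.tag == "record":
        handle(elem)      # placeholder for whatever per-record processing you need
        elem.clear()      # free the subtree we just processed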
> On an average desktop/server system the OS would automatically take care of putting whatever fits in RAM and the rest on disk.
This is not true, unless you’re referring to swap (which is a configuration of the system and may not be big enough to actually fit it either, many people run with only a small amount of swap or disable it altogether.)
You may be referring to mmap(2), which will map the on-disk dataset to a region of memory that is paged in on-demand, but somehow I doubt that’s what OP was referring to either.
If you just read() the file into memory and work on it, you’re going to be using a ton of RAM. The OS will only put “the rest on disk” if it swaps, which is a degenerate performance case, and it may not even be the dataset itself that gets swapped (the kernel may opt to swap everything else on the system to fit your dataset into RAM. All pages are equal in the eyes of the virtual memory layer, and the ranking algorithm is basically an LRU cache.)
Fair enough, it’s totally possible that’s what they meant. But the complaint of “every package assumes you loaded the whole dataset in memory” seems to imply the package just naively reads the file in. I mean, if the package was mmapping it, they probably wouldn’t have had much trouble with memory enough for it to be an issue they’ve had to complain about. Also, you may not always have the luxury of mmap()’ing, if you’re reading data from a socket (network connection, stdout from some other command, etc.)
I don’t do much python but I used to do a lot of ruby, and it was rare to see anyone mmap’ing anything, most people just did File.read(path) and called it a day. If the norm in the python ecosystem is to mmap things, then you’re probably right.
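For what it's worth, the mmap route is only a couple of lines in Python too; a sketch, with "bigfile" as a placeholder path:
import mmap

with open("bigfile", "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Pages are faulted in on demand and the kernel is free to evict them;
        # nothing here copies the whole file into the process heap.
        total = sum(mm[i] for i in range(0, len(mm), 4096))  # touch one byte per page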
That’s a really misleading thing to say. If the kernel already has the thing you’re read()’ing cached, then yes the kernel can skip the disk read as an optimization. But by reading, you’re taking those bytes and putting a copy of them in the process’s heap space, which makes it no longer just a “cache”. You’re now “using memory”.
read() is not mmap(). You can’t just say “oh I’ll read the file in and the OS will take care of it”. It doesn’t work that way.
> If the kernel already has the thing you’re read()’ing cached
which it would do, if you have just downloaded the file.
> But by reading, you’re taking those bytes and putting a copy of them in the process’s heap space
i mean, you just downloaded it from the network, so unless you think mmap() can hold buffers on the network card, there's definitely going to be a copy going on. now it's downloaded, you don't need to do it again, so we're only talking the one copy here.
> You can’t just say “oh I’ll read the file in and the OS will take care of it”. It doesn’t work that way.
i can and do. and you've already explained swap sufficiently that i believe you know you can do exactly that also.
Please keep in mind the context of the discussion. A prior poster made a claim that they can read a file into memory, and that it won’t actually use any additional memory because the kernel will “automatically take care” of it somehow. This is plainly false.
You come in and say something to the effect of “but it may not have to read from disk because it’s cached”, which… has nothing to do with what was being discussed. We’re not talking about whether it incurs a disk read, we’re talking about whether it will run your system out of memory trying to load it into RAM.
> i mean, you just downloaded it from the network, so unless you think mmap() can hold buffers on the network card, there's definitely going to be a copy going on. now it's downloaded, you don't need to do it again, so we're only talking the one copy here.
What in god’s holy name are you blathering about? If I “just downloaded it from the network”, it’s on-disk. If I mmap() the disk contents, there’s no copy going on, it’s “mapped” to disk. If I read() the contents, which is what you said I should do, then another copy of the data is now sitting in a buffer in my process’s heap. This extra copy is now "using" memory, and if I keep doing this, I will run the system out of RAM. This is characteristically different from mmap(), where a region of memory maps to a file on-disk, and contents are faulted into memory as I read them. The reason this is an extremely important distinction, is that in the mmap scenario, the kernel is free to free the read-in pages any time it wants, and they will be faulted back again in if I try to read them again. Contrast this with using read(), which makes it so the kernel can't free the pages, because they're buffers in my process's heap, and are not considered file-backed from the kernel's perspective.
> i can and do. and you've already explained swap sufficiently that i believe you know you can do exactly that also.
Swap is disabled on my system. Even if it wasn’t, I’d only have so much of it. Even if I had a ton of it, read()’ing 100GB of data and relying on swap to save me is going to grind the rest of the system to a halt as the kernel tries to make room for it (because the data is in my heap, and thus isn’t file-backed, so the kernel can’t just free the pages and read them back from the file I read it from.) read() is not mmap(). Please don’t conflate them.
Yep I do the same. If I have a server with hundreds of GB or even TB of RAM (not uncommon these days) I'm not setting up swap. If you're exhausting that much RAM, swap is only going to delay the inevitable. Fix your program.
> A prior poster made a claim that they can read a file into memory, and that it won’t actually use any additional memory because the kernel will “automatically take care” of it somehow. This is plainly false.
Nobody made that claim except in your head.
Why don't you read it now:
Just curious: why would you not load the entire dataset "into memory" ("into memory" from a Python perspective)?
Look carefully: There's no mention of the word file. For all you or I know the programmer is imagining something like this:
>>> data=loaddata("https://...")
Or perhaps it's an S3 bucket. There is no file, only the data set. That's more or less exactly what I do.
On an average desktop/server system the OS would automatically take care of putting whatever fits in RAM and the rest on disk.
You know exactly that this is what is meant by swap: we just confirmed that. And you know it is enabled on every average desktop/server system, because you
> Swap is disabled on my system
are the sort of person who disables the average configuration! Can you not see you aren't arguing with anything but your own fantasies?
> If I “just downloaded it from the network”, it’s on-disk.
That's nonsense. It's in ram. That's the block cache you were just talking about.
> If I mmap() the disk contents, there’s no copy going on, it’s “mapped” to disk
Every word of that is nonsense. The disk is attached to a serial bus. Even if you're using fancy nvme "disks" there's a queue that operates in a (reasonably) serial fashion. The reason mmap() is referred to zero-copy is because it can reuse the block cache if it has been recently downloaded -- but if the data is paged out, there is absolutely a copy and it's more expensive than just read() by a long way.
> Even if it wasn’t, I’d only have so much of it. Even if I had a ton of it, read()’ing 100GB of data and relying on swap to save me is going to grind the rest of the system to a halt as the kernel tries to make room for it
You only have so much storage, this is life, but I can tell you as someone who does operate a 1tb of ram machine that downloads 300gb of logfiles every day, read() and write() work just fine -- I just can't speak for python (or why python people don't do it) because i don't like python.
You're basically just gish galloping at this point and there's no need to respond to you any more. All your points about swap are irrelevant to the discussion. All your points about disk cache are irrelevant to the discussion. You have a very, very, very incorrect understanding of how operating system kernels work if you think mmap() is just a less efficient read():
> Every word of that is nonsense. The disk is attached to a serial bus. Even if you're using fancy nvme "disks" there's a queue that operates in a (reasonably) serial fashion. The reason mmap() is referred to zero-copy is because it can reuse the block cache if it has been recently downloaded -- but if the data is paged out, there is absolutely a copy and it's more expensive than just read() by a long way.
Please do a basic web search for what "virtual memory" means. You seem to think that handwavey nonsense about disk cache means that read() doesn't copy the data into your working set. You should look at the manpage for read() and maybe ponder why it requires you to pass your own buffer. This buffer would have to be something you've malloc()'d ahead of time. Hence why you're using more memory by using read() than you would using mmap()
> You only have so much storage, this is life, but I can tell you as someone who does operate a 1tb of ram machine that downloads 300gb of logfiles every day, read() and write() work just fine -- I just can't speak for python (or why python people don't do it) because i don't like python.
You should definitely learn what the mmap syscall does and why it exists. You really don't need to use 300gb of RAM to read a 300gb log file. You should probably look up how text editors like vim actually work, and why you can seek to the end of a 300gb log file without vim taking up a bunch of RAM. Maybe you've never been curious about this before, I dunno.
Try making a 20gb file called "bigfile", and run this C program:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
char *mmap_buffer;
int fd;
struct stat sb;
ssize_t s;
fd = open("./bigfile", O_RDONLY);
fstat(fd, &sb); // get the size
// do the mmap
mmap_buffer = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
uint8_t sum = 0;
// Ensure the whole buffer is touched by doing a dumb math operation on every byte
for (size_t i = 0; i < sb.st_size; ++i) {
sum += mmap_buffer[i]; // overflow is fine, sorta a dumb checksum
}
fprintf(stderr, "done! sum=%d\n", sum);
sleep(1000);
}
And wait for "done!" to show up in stderr. It will sleep 1000 seconds waiting for you to ctrl+c at this point. At this point we will have (1) mmap'd the entire file into memory, and (2) read the whole thing sequentially, adding each byte to a `sum` variable (with expected overflow.)
While it's sleeping, check /proc/<pid>/status and take note of the memory stats. You'll see that VmSize is as big as the file you read in (for me, bigfile is more than 20GB):
VmSize: 20974160 kB
But the actual resident set is 968kb:
VmRSS: 968 kB
So, my program is using 968 kB even though it has a 20GB in-memory buffer that just read the whole file in! My system only has 16GB of RAM and swap is disabled.
How is this possible? Because mmap lets you do this. The kernel will read in pages from bigfile on demand, but is also free to free them at any point. There is no controversy here, every modern operating system has supported this for decades.
Compare this to a similar program using read():
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
int fd;
struct stat sb;
ssize_t s;
char *read_buffer;
fd = open("./bigfile", O_RDONLY);
fstat(fd, &sb); // get the size
// do the read
read_buffer = malloc(sb.st_size);
read(fd, read_buffer, sb.st_size);
uint8_t sum = 0;
// Ensure the whole buffer is touched by doing a dumb math operation on every byte
for (size_t i = 0; i < sb.st_size; ++i) {
sum += read_buffer[i]; // overflow is fine, sorta a dumb checksum
}
fprintf(stderr, "done! sum=%d\n", sum);
sleep(1000);
}
And do the same thing (in this instance I shrunk bigfile to 2 GB because I don't have enough physical RAM to do this with 20GB.) You'll see this in /proc/<pid>/status:
Oops! I'm using 2 GB of resident set size. If I were to do this with a file that's bigger than RAM, I'd get OOM-killed.
This is why you shouldn't read() in large datasets. Your strategy is to have servers with 1TB of ram and massive amounts of swap, and I'm telling you you don't need this to process this big of files. mmap() does so without requiring things to be read into RAM ahead of time.
Oh, and guess what: Take out the `sleep(1000)`, and the mmap version is faster than the read() version:
$ time ./mmap
done! sum=225
./mmap 6.89s user 0.28s system 99% cpu 7.180 total
$ time ./read
done! sum=246
./read 6.86s user 1.72s system 99% cpu 8.612 total
Why is it faster? Because we don't have to needlessly copy the data into the process's heap. We can just read the blocks directly from the mmap'd address space, and let page faults read them in for us.
(Edit: because I noticed this too, and it got me curious, the reason for the incorrect result is that I didn't check the result of the read() call, and it was actually reading fewer bytes, by a small handful. read() is allowed to do this, and it's up to callers to call it again. It was reading 2147479552 bytes when the file size was 2147483648 bytes. If anything this should have made the read implementation faster, but mmap still wins even though it read more bytes. A fixed version follows, and now produces the same "sum" as the mmap):
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
int fd;
struct stat sb;
ssize_t s;
ssize_t read_result;
ssize_t bytes_read;
char *read_buffer;
fd = open("./bigfile", O_RDONLY);
fstat(fd, &sb); // get the size
// do the read
read_buffer = malloc(sb.st_size);
bytes_read = 0;
do {
read_result = read(fd, read_buffer + bytes_read, sb.st_size - bytes_read);
if (read_result > 0) // guard against read() returning -1 on error
bytes_read += read_result;
} while (read_result > 0);
uint8_t sum = 0;
// Ensure the whole buffer is touched by doing a dumb math operation on every byte
for (size_t i = 0; i < sb.st_size; ++i) {
sum += read_buffer[i]; // overflow is fine, sorta a dumb checksum
}
fprintf(stderr, "done! sum=%d", sum);
}
Depending on the source of the data, that is not as good as an actual streaming implementation. That is, if the data is coming from a network API, waiting for it to be "in memory" before processing it still means that you have to store the whole stream on the local machine before you even start. Even if we assume that you are storing it on disk and mmapping it into memory that's still not a good idea for many use cases.
Not to mention, if the code is not explicitly designed to work with a streaming approach, even for local data, might mean that early steps accidentally end up touching the whole data (e.g. they look for a closing } in something like a 10GB json document) in unexpected places, costing orders of magnitude more than they should.
JSON discourages but does not forbid duplicate keys. In case of duplicate keys browsers generally let the last instance win. So if you want to be compatible with that, you always must read the whole document.
Example: I'm querying a database, producing a file and storing it in object storage (S3). The dataset is 100 gigabytes in size. I should not require 100 gigabytes of memory or disk space to handle this single operation. It would be slower to write it to disk first.
What is the best approach to handling a continuous stream of data? Is it just 'buffer till you have what you need and pass it off to be processed' approach? And then keep reading until end of stream/forever.
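Roughly, yes. For a delimiter-based protocol the loop usually looks something like the sketch below: keep a buffer of leftover bytes, hand off complete records as soon as they appear, and carry the partial tail into the next recv (sock is any connected TCP socket; newline is just an example delimiter):
def records(sock, delim=b"\n"):
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:            # peer closed the connection
            if buf:
                yield buf        # flush whatever partial record remains
            return
        buf += chunk
        while delim in buf:
            record, buf = buf.split(delim, 1)
            yield record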
I think what they mean is that most people think that only the following is possible on 1 connection
1. Send Request 1
2. Wait for Response 1
3. Send Request 2
4. Wait for Response 2
while you can do
1. Send Request 1, move on
2. Send Request 2, move on
and have an other process/routine handling response
and potentially you can even have Requests that are sent without needing a Response, "user is typing" in XMPP for example.
and, even wilder for people using only HTTP, you can receive a Response without a Request! (i.e. you don't need to implement a GET /messages request; the server can just send you messages directly)
> Sounds like a worse way of writing async requests,
It's just how it works under the hood. This complexity is quickly abstracted away, and it's actually how a lot of async requests are implemented; it's just that here it all happens on one TCP connection.
> , while the last part is basically what websockets seem to be intended for
yes, I was specifically answering that:
> I have seen teams of experienced seniors using websockets and then just sending requests/responses over them as every architecture choice and design pattern they were familiar with required this.
i.e. people using websockets like normal HTTP requests.
In the case of websockets that is already handled for you.
I think GP talks about how to think about communication (client server).
Stateless request/response cycles are much simpler to reason about because they synchronize messages by default. The state is reflected in this back-and-forth communication. Even when you do it via JS: the request data is literally in the scope of the response callback.
If you have bi-directional, asynchronous communication, then reading and writing messages is separate. You have to maintain state and go out of your way to connect incoming and outgoing messages semantically. That's not necessarily what you should be doing though.
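A sketch of what that bookkeeping can look like with asyncio: tag each outgoing message with an id, park a future, and let a single reader task resolve it when the matching reply (or an unsolicited push) arrives. The ws object is assumed to be any websocket-like client with async send()/recv() methods:
import asyncio, itertools, json

class Correlator:
    def __init__(self, ws):
        self.ws = ws                      # assumed: async send()/recv()
        self.pending = {}                 # message id -> Future
        self.ids = itertools.count()

    async def call(self, payload):
        msg_id = next(self.ids)
        fut = asyncio.get_running_loop().create_future()
        self.pending[msg_id] = fut
        await self.ws.send(json.dumps({"id": msg_id, "payload": payload}))
        return await fut                  # resolved by reader() when the reply arrives

    async def reader(self):
        while True:
            msg = json.loads(await self.ws.recv())
            fut = self.pending.pop(msg.get("id"), None)
            if fut is not None:
                fut.set_result(msg.get("payload"))
            # else: unsolicited server push ("user is typing", etc.)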
I know right. The kids these days. Most of them never learned to solder either so how can they assemble their own computers from ics? I caught one of my tech leads using a jet burner lighter and just scorching the whole board. And forget reading core dumps in hex!!! An intern was just putting the hex dump in the chat chibi and asking it what went wrong. Get off my lawn already.
If I recall correctly, in TCP a packet loss will cause a significant slowdown of the whole data stream, since that packet needs to be retransmitted and you generally end up with a stalled data stream until that happens (even though TCP can continue to send more data in the meantime, so it's more of a temporary hang than a hard block). Thus, if you are sending multiple different data streams over the same TCP connection, a packet loss will temporarily hang the processing of all of them. A constraint that QUIC doesn't have.
A somewhat decent analogy of thinking about TCP is a single lane queue with a security guy who's ordering people around and making sure the queue is orderly and doesn't overflow. He has pretty much no respect for families, groups or overall efficiency. He only cares about the queue itself.
> I suppose it's easier since it's all in userland.
I doubt applications would provide more reliable information to the QUIC library than they would to the kernel.
The main difference as I understand it is that QUIC allows multiple separate TCP-like streams to exist on the same negotiated and encrypted connection. It's not fundamentally different from simply establishing multiple TLS over TCP connections with separate windows, it just allows you to establish the equivalent of multiple TLS + TCP connections with a single handshake.
The key point is that if I want to load a webpage that needs to download 100 different files, I don't care about the order of those 100 files, I just want them all downloaded. TCP makes you specify an order, which means that if one packet gets lost, you have to wait for it. QUIC lets you say "here are the 100 things I want", which means that if you lose a packet, that only stalls 1 of the things, so the other 99 can continue.
QUIC improves on TCP stream windowing/reassembly in two ways:
- as noted by another commenter, you can send multiple streams in a single connection, so a lost packet for one stream does not necessarily affect others.
- QUIC sends ACK as a bitmap instead of a single number (TCP only sends the number of the highest continuous stream byte seen). Senders can see that one packet was lost, but that the packets following were successfully received. For long/fat pipes this prevents the sender from re-transmitting packets following the single lost packet.
TCP is a streaming protocol. You can build whatever multiplexing scheme on top like h2 does, but you simply can’t escape TCP head of the line blocking, as it’s just a single undelimited stream underneath.
As an aside, I only truly grasped the stream nature of TCP when I started dissecting and reassembling packets. The first ever reassembly program I wrote was hopelessly wrong because I was treating TCP packet boundary as data packet boundary, but in fact higher level protocol (HTTP/RTMP/etc.) data packet boundaries have nothing to do with TCP packet boundaries at all, it’s a single continuous stream that your higher level protocol has to delimit on its own.
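For illustration, a reassembly loop for a toy length-prefixed protocol (4-byte big-endian length, then payload). HTTP/RTMP each define their own framing; the point is just that the TCP byte stream carries no boundaries of its own, so this layer is always yours to write:
import struct

def messages(sock):
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return
        buf += chunk
        while len(buf) >= 4:
            (length,) = struct.unpack("!I", buf[:4])
            if len(buf) < 4 + length:
                break                     # wait for the rest of this message
            payload, buf = buf[4:4 + length], buf[4 + length:]
            yield payload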
It's common in restricted environments. Egress for 80/443 allowed and DNS must use local recursive DNS servers. Those internal DNS servers probably pass through a few SIEM and other security devices and are permitted out, usually to minimize data exfiltration. Though in those cases 80 and 443 are often using a MITM proxy as well for deep packet inspection. There are both commercial and open source MITM proxies. Fans of HTTP/3 and QUIC would not like most of the MITM proxies as they would negotiate a specific protocol with the destination server and it may not involve QUIC.
I worked in an environment with a similar setup. The first step for all devices allowed to connect to the network was to install the company's custom CA root certificate. There are a lot of sharp edges in such a setup (like trying to get Charles or other debugging proxies to work reliably). But in highly sensitive environments it would seem the policy is to MITM every packet that passes through the network.
I wasn't involved but another team did some experimenting with HTTP/2 (which at the time was still very early in its rollout) and they were struggling with the network team to get it all working. Once they did get it to work it actually resulted in slightly less performant load times for our site and the work was de-prioritized. I recall (maybe incorrectly) that it was due to the inefficiency of forcing all site resources through a single connection. We got better results when the resources were served from multiple domains and the browser could keep open multiple connections. But I wasn't directly involved so I only overheard things and my memory might be fuzzy on the details of what I overheard.
Funnily enough, we did have to keep a backup network without the full snooping proxy for things like IoT test devices (including some smart speakers and TVs) since installing certs on those devices was sometimes impossible. I assume they were still proxying as much as they could reasonably manage.
And this was a principled stance. HTTP/1 was for serving web pages, and web pages are documents. Then we started generating web pages from databases, which was pretty useful, but also started the slippery slope to us hijacking the web and turning it into a network of applications running on a nested operating system.
That's a slightly different question (ie. how do we pragmatically tunnel our data between applications) but yes, that evolutionary race to the bottom (or port 80/443 as you say) also sucks.
I could not find that claim in the article, but perhaps you are referring to an earlier version of the article that the author then updated? The performance problem with TCP is that it imposes a strict ordering of the data: the order in which clients receive data is the same as the order in which servers send it. So if some packet fails to reach the client, that packet has to be retransmitted before subsequent data can be delivered (notwithstanding the relatively few non-ACKed packets that are allowed to be in flight). I think this problem is what the author is referring to. And it would be nigh impossible to get all the network equipment that sits between a client and a server upgraded to a more latency-friendly version of TCP.
The author obviously knows this. As others have noted, even though HTTP over TCP supports multiple requests they must be accomplished consecutively unless you use multiple TCP connections.
Growing from 0 to ~27% of traffic in 2 years is great, but the graph from TFA shows that it was not a gradual increase but rather two big jumps in adoption:
- One in mid 2021, to ~20%.
- A second one in July 2022, to ~29%.
Since then the traffic share has been pretty flat. I don't think that counts as "eating the world" at all tbh.
Is the graph showing the number of requests served by HTTP/3, or the number of individual hosts that support it? I believe it's the former, and it's explainable by the fact that most people these days visit only the same handful of siloes: Google, Amazon, Facebook, Twitter and Youtube probably make 90+% of all anonymised requests seen by Mozilla. But I doubt even 20% of global webservers are HTTP/2 ready, let alone HTTP/3.
So HTTP/3 is eating the world, if by world you only count FAANG websites, which sadly seems to be the case.
Remains to be seen how many websites are hidden behind tech's favorite MITM, Cloudflare, which might make figures a little harder to discern.
Right, and those big jumps are more to do with large Cloud providers or proxy providers switching it on, and less to do with web applications themselves switching it on.
It's great that the upgrade path for the end clients is so easy, but it doesn't reflect that the environment for web application developers has shifted towards HTTP/3
Interestingly (or rather not surprising at all?) http3 is eating away at the http2 traffic, while http1 just stays on a very slow decline.
Since http1 webservers are usually stuck there for a (more or less) good reason, they won't be converting to http3 anytime soon, while already more modern webservers are easier to upgrade from http2 to 3.
> Interestingly (or rather not surprising at all?) http3 is eating away at the http2 traffic, while http1 just stays on a very slow decline.
HTTP1: the OG, meant for a much smaller individual scale. So great even today for smaller websites. Where smaller right now, with modern hardware, can actually still mean hundreds of requests per second and millions per day...
HTTP2/HTTP3: BigCorp optimization primarily for problems they're facing and not that many others are.
So viewed through this lens it's clear what's happening.
BigCorps are just moving from HTTP2 to HTTP3 and mom and pops don't give a crap, HTTP1 is good enough.
HTTP2 is useful even if you just want to show multiple images on a page. For HTTP1 you have to add all of your images to a sprite sheet to avoid sending a lot of requests.
Apache2 still doesn't support HTTP/3 (though I hear nginx recently added it). So yeah, the increasing adoption numbers seem to be major websites like Google and YouTube, and major CDNs like Cloudflare and Cloudfront.
QUIC's promise is fantastic, and latency-wise it's great. And probably that's what matters the most for the web.
However I have run into issues with it for high-throughput use-cases. Since QUIC is UDP based and runs in user-space, it ends up being more CPU bound than TCP, where processing often ends up being done in the kernel, or even hardware.
In testing in a CPU constrained environment, QUIC (and other UDP-based protocols like tsunami) capped out at ~400Mbps, CPU pegged at 100%. Whereas TCP+TLS on the same hardware could push 3+Gbps.
It'll be interesting to see how it plays out, since a goal of QUIC is to be an evolving spec that doesn't get frozen in time, yet baking in to the kernel/hardware might negate that.
Luckily, there are ways to reduce syscalls (like Generic Segmentation Offload and other tricks[1]). But I agree that not having things run in the kernel makes it more challenging for high-throughput scenarios.
I can't agree with author about it eating the world though. It seems like only internet giants can afford implementing and supporting protocol this complex, and they're the only ones who will get a measurable benefit from it. It is an upgrade for sure, but an expensive one.
One thing where the semantics are not the same between HTTP/1.1 and HTTP/2/3(?) is the `Host` header, which is often (always?) gone in the latter in favor of the `:authority` pseudo-header.
Apps/scripts may rely on `Host`, but the underlying HTTP/2 server software might not normalize/expose it in a backwards-compatible way (e.g. Hyper in Rust).
Technically I guess it’s the browsers fault it doesn’t set it.
I would be curious to know if there are other such discrepancies that apps might run into when enabling HTTP/2/3.
You'll be fine, there will be http2 for the next couple of decades at least. By then you'll be saying "Siri set up nginx for reverse proxying my new email server and open-facebook federated instance, oh and turn on the coffee maker, i'm gonna grab another 15 minutes of sleepy time"
Seeing the same thing both in Firefox and Chrome, every response from blog.apnic.net comes over HTTP/2, none of the HTTP/3 ones are from any apnic.net requests.
Strange that a registry, who should have really good idea about what kind of infrastructure they run, would get which protocol they're using wrong.
Since QUIC is UDP based, there are performance issues that become apparent at higher rates. In TCP, a single write can write a large amount of data. You can even use sendfile to directly send the contents of a file to a TCP socket, all in kernel space. In UDP, you need to write out data in chunks, and you need to be aware of the link MTU as IP fragments can be very bad news at higher rates. This means an app needs to context switch with the kernel multiple times to send a large amount of data, whereas only a single context switch is required for TCP. There are newish system calls such as sendmmsg which alleviate the strain a bit, but they are not as simple as a stream oriented interface.
The QUIC implementations are primarily in user space, so you end up with multiple QUIC implementations in whatever language on any given system.
Hopefully Linux will add a QUIC kernel implementation that will bypass and overcome the traditional UDP shortcomings in the current stack.
Not necessarily. With GSO, you can send near 64K datagrams, which will then get split into MTU sized datagrams in the driver or on the card. But if you're sending a 1G file, that's a lot of 64K writes.
And then you have to consider what is seen on the other end. Is GRO configured on all the endpoints these datagrams are going to? They won't necessarily see those near 64K datagrams, but lots of smaller ones.
I'm extremely sceptical of anything proposed by Google especially because they are building some truly evil stuff lately (like FLoC/Topics and WEI). I really view them as the enemy of the free internet now.
But QUIC doesn't really seem to have their dirtbag shadow hanging over it. Perhaps I should try to turn it on.
Google just ignored anything Windows needed in HTTP and included everything they needed — to the point that the compression in HTTP/3 is basically tuned for www.google.com and no one else.
Frankly, I am a bit disappointed with the article. I fail to follow the argument that an encrypted header makes it easier to adopt HTTP/3 & QUIC because middleboxes cannot see the header. With HTTP/1.1 & TCP, the middleboxes should not be changing the TCP packets anyway, no?
Also, the author does not point out that QUIC builds on UDP instead of directly building on IP like TCP does.
Even though they arguably shouldn't, some middleboxes assume that protocols don't change. They have made it hard for protocols such as TCP and TLS to evolve without breaking things.
Similarly, middleboxes have made it unviable to deploy protocols that aren't TCP or UDP based.
I did not know that protocol ossification was a such a thing. Thanks for the link, it's an interesting article.
It says that middleboxes near the edge are more likely to be the cause of ossification. Are there any stats about that? Such as some manufacturers or software companies "infringing" on the end-to-end principle more often than others?
Not sure about stats but if you hang out in networking forums, you'll see netops complaining about bad implementations of protocols in their networking gear forever. This has been a huge problem in several protocols, everything from SCTP to IPSec to UDP RTSP.
Will the spread of QUIC improve low latency services such as video game servers? Last I checked they were using either gRPC or homecooked protocol over UDP.
Not really. Video game servers definitely don't want session-oriented protocols, and don't really care about dropped packets. Lost the packet with the player position? It comes at a regular interval, so we'll get a newer, more up-to-date one anyway. All they care about is the best latency possible. As for the non-time-sensitive stuff in games (scoreboard, nicknames, etc.), there is no real benefit there either. It's not time sensitive, the connection gets opened once at the start and that's it.
Where QUIC really shines is for fast connections and many streams, i.e. the web, where a page will need 500 resources spread across 100 hosts. In this case, session establishment speed and parallel streams are paramount.
Not a whole lot, but the connection identifier will probably make it much more seamless when your phone switches from wifi to mobile networks. Right now, that's somewhat disruptive; with QUIC, you might not even notice.
I was thinking the same thing. One item I noticed: "H/3 Adoption Grows Rapidly" is paired with a graph showing that adoption has been absolutely flat over the last year.
I wouldn't expect HN to gain much by moving to HTTP/2 or HTTP/3. Loading this page with a cold cache is just 60kB, divided as 54kB of HTML, 2kB of CSS, and 2kB of JS, plus three tiny images. And on follow-up pageviews you only need new HTML.
If the whole web was like this then we wouldn't need new versions of HTTP, but of course it isn't.
But would it gain something? I'm wondering if HTTP/3 could be enabled in one of my websites, which is even much smaller, and which has no blocking requests. I don't mind if the gains are small, as long as the impact is not negative for most visitors. I'm mostly concerned about some of my visitors visiting the website through an unreliable connection (mobile).
It should gain a small amount; it is a more efficient protocol, with better compression and including 0-RTT for repeat visitors. But I doubt it would be noticeable on such a light site.
The main thing it would get is 2x faster loading on high latency networks (because the TLS handshake happens with the same packets as the QUIC handshake).
Yes, I use this site as the paragon of page weight. When other sites I visit are routinely throwing 200MB (MB!) pages at me, this site is throwing 60KB, of which 54KB is content! It's the benchmark I measure all my design against.
I have exactly the opposite view! HN loads fast, but the TTFB is quite slow compared to HTTP/3 websites. On blogs that use HTTP/3, sometimes I see no loading time at all; it's instantaneous. On HN, just checking the dev tools, the TCP+TLS handshake takes longer than the DNS request plus loading the page data. I think HN would really benefit from using HTTP/3.
This made me realize that I don't really understand basic concepts.
For instance, when connecting to a webserver on a given port, is it up to the kernel to take packets of data and assemble them into ... streams (? - not sure about the correct term here) that the server can then read and process? For some time I had the incorrect idea that a connection to a webserver would ultimately get its own port on the server side. In reality the only port that is randomly assigned is on the client side (from what I understand).
Based on the above, when a server does not use threading to process multiple incoming streams (?), the kernel is, I guess, buffering up data from multiple clients while waiting for the server process to read and process the one stream the kernel has assembled for it?
Or I am still completely a dunce about how this works and the above is still wildly incorrect. I would like to better understand but honestly I don't know where to start.
On a very low level even on a threaded server that can receive and process multiple HTTP requests how exactly is the incoming data on the single server socket turned into something that each thread uses for a particular client? Especially in the context of something like http smuggling... I just don't understand the fundamental mechanism of multiple tcp connects on a single server port are turned into individual http requests... With http smuggling it makes it sound like all of the data is actually just a single stream. But if that is the case, how does threaded processing of http requests even happen?
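Roughly: the kernel demultiplexes incoming TCP segments by the (source IP, source port, destination IP, destination port) tuple, and accept() hands your process a brand-new socket (a new file descriptor) per client connection, even though every client targeted the same listening port. Each per-connection socket is its own reassembled byte stream, which is what a thread then reads. A minimal threaded sketch:
import socket, threading

def handle(conn, addr):
    # conn is a per-connection socket: the kernel only delivers bytes
    # belonging to this one client's stream here.
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)            # trivial echo

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("0.0.0.0", 8080))
srv.listen()
while True:
    conn, addr = srv.accept()             # one new socket per accepted client
    threading.Thread(target=handle, args=(conn, addr), daemon=True).start()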
This “depends” on a lot of factors. Under Windows, the HTTP stream is decoded by the kernel! Even when this feature is turned off, there’s still a system service for enabling port sharing so that multiple services can listen on the same port (but different binding paths).
The opposite end of the spectrum is “user mode networking” where the kernel is completely bypassed and the server process accepts data directly from the NIC. This is rare but used in high performance systems that need to process hundreds of gigabits.
Simple HTTP/1.1 would've been perfectly adequate if the web wasn't collapsing under its own weight made up of megabytes of JavaScript firing off tens of requests a second.
I'm not sure about that. I noticed that some websites (even simple blogs) felt instantaneous when loading (after pressing enter or clicking a link), much faster than other websites. So I opened the dev tools, and the first request was HTTP/3. But all resources were still loaded with HTTP/2! HTTP/3 really brings something non-negligible. Those accumulating TCP/TLS handshakes (plus high WiFi latency) are really a burden and degrade the user experience even when you have a fast internet connection.
I just recently found out that even if a server supports up to HTTP/3, it's still up to the browser to decide which protocol to use, even when the browser supports HTTP/3 too. It was disheartening to find out that you have no way of forcing the browser to use HTTP/2 or 3, especially if you have features that only worked on HTTP/3 and were broken on HTTP/2. I guess I should have just fixed the implementation on HTTP/2.
You don't have control because the browser might not support HTTP/3 at all. It's up to browser developers to decide when their support is mature enough to use by default. There's no other way of doing it.
> This means that metadata, such as [...] connection-close signals, which were visible to [...] all middleboxes in TCP, are now only available to the client and server in QUIC.
Do I understand that correctly, that middelboxes are not supposed to know whether or not a connection still exists? Then how exactly is NAT going to work?
But then the "close signal" is in practical terms not encrypted at all: You close a connection by not sending data for 30 seconds. That's a signal that's plainly visible to network intermediaries. So then why not eliminate one risk of bad middlebox heuristics and expose the actual flag?
Not sending an explicit signal prevents certain injection attacks, but I'm not sure how much that would matter in practice. They would have to be some sort of off-path attack so the attacker can't just end the connection by blocking it.
I get that major participants have switched but what's the developer experience like?
It's a while since I read up on this, but previously it seemed relatively hard to get set up for any local usage of HTTP/3 - has that changed recently?
Realistically, HTTP/3 growth is likely largely thanks to an epic ton of the web being delivered through Cloudflare for the last mile; when they enabled HTTP/3... yeah, that's a lot of sites suddenly being served over QUIC.
It looks bizarre to me to see ppl complaining "if there's no Google services my phone's gonna be a brick when traveling in China", while in so many posts in HN, people are digging substitutes for evil Google. (Yeah I know they might not be the same group of people)
It's incredible that we have to witness this farce where it's pretended that, thanks to the tech giants, the web is becoming faster. The average web latency is terrible, and that's because of the many useless framework/tracking layers imposed by those same tech giants. And HTTP/3 is a terrible protocol: for minor improvements that are interesting only to the narrow, data-driven mentality of somebody in some Google office, the web lost its openness and simplicity. Google ruined the Internet piece by piece: with AdSense and ad-driven SERP results, with complexity, by buying and destroying protocols one after the other, and by providing abandonware-like free services that stop any potential competition and improvement.
HTTP/3 is becoming more popular because Microsoft and Google white-washed google's QUIC through the IETF. It's popular because what they call "HTTP/3" is perfectly designed for meeting the needs of a multi-national corporation serving web content to millions of people. But it's only megacorps using it for their walled gardens.
It's a terrible protocol for the human person web. You can't even host a visitable website for people who use HTTP/3 without getting the permission of a third-party corporation to temporarily lend you a TLS cert. Once Chrome drops HTTP/1.1 support, say goodbye to just hosting a website as a person, free from any chains of corporate ass-covering. While LetsEncrypt is benign now, it won't always stay that way as more and more people use it. Just like with dot Org, it'll go bad and there'll be nowhere to run.
TLS-only HTTP/3 is just as bad for individual autonomy as web attestation, in its own way.
Your beef is with Google and the Chrome team, then. QUIC itself is a good protocol, and at the library level you can just tell it to trust any cert it sees.
Google doesn't need to enable QUIC to disable self-signed certs in Chrome.
If you want a domain name, it was never possible to host your own webpage without involving someone else who owns the domain/name server side of things
But you don't need a domain name to host a webpage. It can be served over the IP address. You don't need a public IP either, a page can be for the local net.
But indeed, if you want a traditional webpage that is accessible over the net and has a URL that's possible to remember, then yes, you need a domain, and for that you need (at some level, even if you're a registrar) the entity who runs the TLD.
Just to be pedantic, you only need a registrar to globally register the domain name and associate it with DNS records. You could choose to point your system at a locally-controlled DNS server, or edit the local /etc/hosts file, to use user-friendly names without depending on registering the domain with any authority.
If HTTP/1.1 is deprecated from all browsers, and HTTP/3 eventually becomes the only way to view a web page, then it will be impossible to host a localnet web page (ex. a wifi router). The people pushing this standard through don't make routers, so they don't give a shit, and everyone on the planet will just be screwed. This is what happens when we let 1 or 2 companies run the internet for us
Routers today are already dealing with this problem because Chrome throws major security warnings for any unencrypted HTTP traffic. The current solutions I've seen are to use things like Let's Encrypt/ACME certificates for wildcard sub-domains *.routercompanylocal.tld and a bit of secure handshake magic to register temporary A/AAAA records for link-local IP addresses (DNS has no problem advertising link-local addresses) and pass the private parts of the certificate down to the local hardware. Several major consumer routers I've seen will auto-route users to something like https://some-guid.theircompanyrouterlocal.tld and everything just works including a CA chain usually up to Let's Encrypt.
Doing Let's Encrypt/ACME for random localnet web pages is getting easier all the time and anyone can use that wildcard domain loophole if they want to build their own secure bootstrap protocols for localnet. It would be great if the ACME protocol more directly supported it than through the wildcard domain name loopholes currently in use, and that may come with time/demand. I imagine there are a lot of hard security questions that would need to be answered before a generally available "localnet ACME" could be devised (obviously every router manufacturer is currently keeping their secure handshakes proprietary because they can't afford to leak those certificates to would be MITM attacks), but I'm sure a lot of smart minds exist to build it given enough time and priority.
for routers there's a simple and easy workaround. Let's say your router answers on "router.lan". All they would have to do is redirect this name via the router's DNS resolver to, say, 192-168-0-1.company.com, which would be an officially-registered domain that resolves back to ... wait for it... 192.168.0.1!
If you control company.com you can run wildcard DNS for any amount of "private" IP addresses, complete with an official and valid trusted certificate. For an internal IP address. Problem solved.
(and no, this is not theoretical, there were appliances some 10+ years ago that did exactly that...)
Yeah, that is mostly what I was describing. There's some rough security benefit to using more transient A/AAAA records under a GUID or other pseudo-random DNS name than a DNS zone that just encodes common link local addresses. There are definite security benefits to a company using mycompanyrouters.com instead of their home company.com (XSS scripting attacks and the like). So some things have changed over 10+ years, but yes the basic principles at this point are pretty old and certainly working in practice.
Pardon my ignorance. If we are to get very technical, rather than speaking of average people: surely you can self-sign a cert and set up your own CA chain on the computers in your local network?
Or is there something else that prevents you from hosting HTTP/3 locally?
As far as I know, browsers don't allow self-signed certificates on HTTP/3. This was mentioned by people in comments here, and quick google seems to confirm.
You cannot use a certificate that was not signed by a trusted CA, but nothing keeps you from creating your own CA, making it trusted locally, and using it to sign your cert
That is precisely the problem. Most proprietary systems don't let you touch the trust store at all. Even "open" platforms like Android have been locking down the ability to do anything to the trust store.[1]
With that said, if we assume the user is only using Google Chrome and not an alternative browser, then typing "thisisunsafe" on the TLS error page should let one elide trust store modifications entirely. I cannot guarantee this is the case for HTTP/3 since the reverse proxies I deal with still use HTTP/2.
One might then ask: why not just let the user click a button in the browser to see the page, without jumping through all those hoops? How does increased human toil make it better? (Spoiler: it doesn't)
Sci-Hub often needs to be accessed via a random IP and/or without a corporate-blessed TLS cert. The same goes for many counter-establishment sites across the world (China, Iran, ...).
The premise of the Internet was distributed dissemination of information for the mass public. There is a real fear that we are walking through practical one-way doors, ever increasing the barrier of access to disruptive counter-corporate/counter-state information.
It doesn't take a huge leap to relate these concerns to America's future political discourse.
Security and accessibility/simplicity are almost always at odds with each other. It's a tradeoff that needs to be made. You are entitled to dislike the current trend and prefer making security optional. But you can't possibly be surprised if most people are happy to prioritize their privacy and security over "the barrier of access to disruptive counter-corporate/counter-state information".
HTTP/1 is on a deprecation path and HTTP/3 requires TLS, which would mean getting the blessing of a trusted (typically corporate) CA every 90 days to continue letting random people access my website.
In the US, states recently passed anti-abortion laws which also banned aiding and abetting people seeking the procedure. That would cover domain names and certs if any relevant tech companies had headquartered in those states - or if passed as federal law.
Trans rights are actively heading in that direction, and supporters are the very same that lambasted NYT and others as "fake news" that needed to be banned while pushing narratives of contrived electoral processes.
Fear of political regression is real in America, without even looking internationally.
Societal and technical systems evolve together. With the deprecation of HTTP/1, future cheap middleware boxes will likely enforce HTTP/3 in practice and consolidate the tech landscape around a system that is far more amenable to authoritarian control than the prior generation of protocols.
It's fair and valid to call out such scenarios when discussing international technical standards. These protocols and the consequences will be around for decades in an ever evolving world.
The "most" in your strawman here is just companies like Google who want to a) bend to those who want to DRM the entire web b) hide and lock away their tracking traffic from those being tracked c) make ad blocking impossible.
I have HTTP/3 on my local network using Traefik + Let's Encrypt and an internal zone. I didn't actually go out of my way to set up HTTP/3; it was pretty much just a few extra lines of configuration, so why not?
But I assume you have an officially-registered domain for that? That's the main issue people are having, that without an official domain (i.e. with only "foo.local" or whatever) it's hard to use HTTP/3
AFAIK Let's Encrypt won't sign certificates for internal domains?
I know, I run two domains at home on a Raspberry Pi, I know that it's easy, but I also don't have an issue with paying 10€/year for a domain name. I guess this is the thing people are angry about, that by buying a domain you're funding the very corporate greed that will one day destroy the internet, or something...
GNUnet is a network protocol stack for building secure, distributed, and privacy-preserving applications.
With strong roots in academic research, our goal is to replace the old insecure Internet protocol stack.
The GNU Name System (GNS) is a fully decentralized replacement for the Domain Name System (DNS). Instead of using a hierarchy, GNS uses a directed graph. Naming conventions are similar to DNS, but queries and replies are private even with respect to peers providing the answers. The integrity of records and privacy of look-ups are cryptographically secured.
That's not a great analogy. Your registrar is bound by a strict contract, in some countries it may even be telecom legislation, and your domain is legally yours (again, within contract bounds). While they need to delegate it to you, they cannot arbitrarily suddenly give it to someone else.
BBC.co.uk belongs to the Beeb; anything else would be considered an attack on Internet infrastructure and treated as such. You cannot compare that with the power Google has over Chrome. Chrome is theirs to do with as they wish.
This was an HTTP/2 issue as well. IMO it was a big miss to not specify some kind of non-signed mode with opportunistic encryption. That is, if you set up an HTTP/2 (or /3) connection to a server that did not have a signed cert, it would do an ECDH exchange and provide guarantees that nobody was passively snooping on the traffic. Of course this would not protect against an active MITM but would be better than falling back to pure plain text.
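Roughly, the key agreement in such an opportunistic mode (not part of any actual HTTP/2 or HTTP/3 spec, just a sketch of the idea) would be a plain unauthenticated ECDH, which is exactly why it only stops passive snooping:

    # Unauthenticated (opportunistic) key agreement sketch: both sides derive the
    # same symmetric key without any certificate, so a passive observer learns
    # nothing, but an active MITM could still sit in the middle.
    from cryptography.hazmat.primitives.asymmetric import ec
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF
    from cryptography.hazmat.primitives import hashes

    client_priv = ec.generate_private_key(ec.SECP256R1())
    server_priv = ec.generate_private_key(ec.SECP256R1())

    # Each side would send only its public key in the hypothetical unsigned handshake.
    client_shared = client_priv.exchange(ec.ECDH(), server_priv.public_key())
    server_shared = server_priv.exchange(ec.ECDH(), client_priv.public_key())
    assert client_shared == server_shared

    session_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                       info=b"opportunistic-http").derive(client_shared)
    print(session_key.hex())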
As opposed to decades of stonewalling and sluggish progress just because a few big corporations didn't want to have a harder to manage IT system?
Literally most of the complaints against http3 are from corporate network engineers that now have a harder time controlling their networks. And more importantly, harder to implement mitm and monitoring. Which sure, that sucks for them but that's a massive upside for the vast majority of other internet users.
You're against HTTP/3 because it makes security mandatory?
Have you been completely asleep at the wheel over the past 20 years as ISPs around the world have started injecting ads in people's unencrypted HTTP traffic?
That's great that you want to host a simple website that other people around the world can visit using simple HTTP. Maybe if your website is completely benign and harmless, that's not unreasonable.
But a lot of us want to share information that's important, or sensitive - and we want sites where people can interact privately. I don't want the content of my site manipulated by third-parties along the way, and I don't want them snooping on traffic as people interact with my site.
Yes, because it's mandatory. If the HTTP/3 implementations allowed self-signed certs it'd be okay. But they don't. Or if HTTP/3 allowed the option of connections without TLS but defaulted to TLS, that'd be okay. But it doesn't.
Yeah, it's difficult for me to always add the qualifier: "HTTP/3 allows self-signed certs, but no implementation that exists in any browser allows self-signed certs".
Plenty of browsers allow self-signed certs—Firefox and Safari, to the best of my knowledge, treat HTTP/3 certs exactly the same as they treat HTTP/2 and HTTP/1.1 certs. Chrome has taken the position that it will no longer allow self-signed root CAs for HTTP/3 websites, to prevent the kind of SSL interception where companies intercept all of your traffic. For personal use, you can always whitelist an individual certificate using a CLI flag without trusting a root CA.
My testing in the past seems to indicate you're wrong. Firefox does not support setting up HTTP/3 connections without CA-signed TLS, unless they've changed it since version 115 ESR. The neqo lib they use for HTTP/3 does technically allow it, but unless you compile it yourself with all the required flags, Firefox ships with neqo requiring CA-signed TLS and with no support for self-signed certs when setting up HTTP/3 connections.
I'd love to be wrong or shown a newer version that does allow these things. It'd be a huge load off my mind.
> Chrome does not. But that choice is orthogonal to protocol.
Which means HTTP/3 de facto doesn't support self-signed certificates. Once Chrome disables HTTP/1.1 and HTTP/2, which it will at some point in the name of security or performance, you'll only be able to exist on the web with a CA-signed certificate.
"A self-signed certificate is one that is not signed by a CA at all – neither private nor public. In this case, the certificate is signed with its own private key, instead of requesting it from a public or a private CA."
This definition sounds right to me. Do you disagree with it?
I get what you're saying, that you can set up a certificate yourself. But you can't accept a certificate someone else set up. (in Chrome)
That's definitely the wrong definition. A self-signed certificate is a certificate that is signed by the private key corresponding to the public key contained in that certificate - in other words it's signed by itself.
In fact, every CA root certificate is by necessity self-signed, and therefore signed by a CA (i.e. itself).
I don't think they were talking about certificates that were also CAs. It's not wrong, it's insufficiently precise.
Anyway, your definition is clearer, but it still supports the point I was trying to make. You don't "add your CA" in order to use self-signed certs. You do that to use a private CA that will sign all your certs. And doing so only allows you to use websites you signed, not websites other people signed. It would be a terrible idea to add anyone else's CA, and you can't easily use your CA to slap your signature onto websites other people signed. Adding your own CA is a completely different situation from trying to visit a self-signed site.
It's an implementation limit that at least Chrome (and possibly other browsers) are enforcing for QUIC connections. The TLS certificate must be signed by a trusted CA. [1]
There do appear to be CLI flags to change the behavior, but I don't see how you'd do that on mobile or embedded browsers.
That is true. But an actual self-signed certificate is not signed by any CA, so it still can't be used with QUIC connections (with specific clients such as Chrome, anyway.)
RFC 9114 §3.1 ¶2 [0] requires TLS certificates, but I imagine you can easily modify existing implementations to remove this restriction. And even if the spec didn't require it I'd expect all major implementors (web browsers) to still impose this. I imagine something like curl has an opt-out for this (-k?).
Note that the spec essentially says "if verification fails don't trust this connection". What not trusting means is up to the application. For browsers that's probably equivalent to not allowing the connection at all.
> Have you been completely asleep at the wheel over the past 20 years as ISPs around the world have started injecting ads in people's unencrypted HTTP traffic?
Now finally those that forced HTTPS on everybody have a monopoly on that :-) (joking, but only in part...)
The crux of people's objections is that to bring up a "secure" site with this mechanism, you need the blessing of an unaffiliated 3rd party.
And yea, for the last 20 years ISP's have been doing stupid stuff with people's connections, but companies have been trying harder and harder to lock down the internet and put the genie back in the bottle.
(This is the core of the objection to PassKeys, FWIW)
If HTTP/3 let users run their own secure site, without any 3rd party parts, then we are good. Why not a trust-on-first-use mechanism, but with a clear UX explaining the implications of accepting the certificates?
> Have you been completely asleep at the wheel over the past 20 years as ISPs around the world have started injecting ads in people's unencrypted HTTP traffic?
ISPs are in my local jurisdiction, under regulators I may have voted for, and I have a contract with them which I could get checked by the courts if it's important enough to me. And I've voted with my feet by switching to a different ISP when the previous one did even a hint of nefarious DNS things (they never injected anything into HTTP, FWIW, "merely" NXDOMAIN hijacking).
I can't say the same about google, cloudflare, let's encrypt and so on.
Trading a cacophony of good and bad local ISPs for a handful of quite US-centric, dubious companies is hardly an improvement.
Also, it's a general sign that things are in a low-trust equilibrium when all this extra complexity is needed. HTTP is perfectly fine if you live in a nicer pocket.
I could. But that wouldn't help make my website visitable for anyone. I'm not trying to access my own hosted site. I want a random person on the other side of the world who I've never met and will never meet to be able to visit my website. That's the point of public websites.
Generating my own root cert and using it does not help with this at all since no one else will have it installed nor install it unless I'm already good friends with them. Installing a root cert in the various trust stores is no simple task either.
The easier option of a plain self-signed cert with no root CA also doesn't help, because no current browser implementation of HTTP/3 accepts self-signed certs.
I mean you have to do that with https now if you don't go with one of the big guys? How is that any more dangerous to the end user than just going to an http site that they don't know personally? The risks are the same as far as malware.
>How is that any more dangerous to the end user than just going to an http site that they don't know personally?
Installing a random root cert to your trust store is a very dangerous thing to do, but you're looking at this from the wrong end. I'm not talking about the dangers of a random user installing a random TLS root cert that I send them over, what, email? That's just not going to happen. It shouldn't happen.
I'm talking about the ability for human persons to host visitable websites without corporate oversight and control. With HTTP/1.1 I can just host a website from my IP and it's good. Anyone can visit. With HTTP/3 before anyone can visit my website I first have to go ask permission every ~90 days from a third party corporation. And if I piss them off (or people who put pressure on them in certain political regions), say by hosting a pdf copy of a scientific journal article, or a list of local abortion clinics, they can revoke my ability to host a visitable website.
With HTTP/3, as a human person, the cure is worse than the "disease".
I think they meant, can't a visitor just ignore that your cert is self-signed and view the page anyway, _without_ adding it to the trust store. Firefox, at least, has this as the default "ignore the cert error" action.
Setting up a root cert in a trust store and accepting a non-root self-sign cert are very different things.
That said, "can't a visitor just ignore that your cert is self-signed and view the page anyway,"
No. They cannot. Not when it's HTTP/3. Even if the browser ostensibly supports it the HTTP/3 connection will fail because the HTTP/3 lib Firefox uses does not allow it.
Would that actually help your cause, since it would push people to build and distribute their own user agents that accept self-signed certs/CAs in a user-driven community without big-corp oversight?
You'd have to get all the major browsers to trust that CA, it's not possible to do that without "corporate oversight".
That's the point the other poster is making, it adds a new level of control that becomes the de facto only way to do things if http 1.1 ever gets deprecated.
I have no opinion on the likelihood of any such deprecation but I fully understand the concern.
That's mainly a problem with the browsers, no? Not saying it isn't an issue (obviously the big ones are going to drive a lot of this technology), but you could still use something like curl or whatever.
This is literally the same as HTTP, except that browsers don't (yet, by default) put up a scary warning for plain HTTP. But with a self-signed cert you get protection from passive attackers, and once you press yes the first time, the client can verify whether someone else later tries to hijack your connection.
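A trust-on-first-use check is essentially certificate pinning done by the client. A rough sketch of what a user agent could do under the hood (the host name and pin file are made up):

    # Trust-on-first-use sketch: remember the server cert's fingerprint the first
    # time, and complain loudly if it ever changes (like SSH known_hosts).
    import hashlib, os, ssl

    HOST, PORT = "myselfhosted.example", 443      # placeholder host
    PIN_FILE = "known_cert.sha256"                # placeholder pin store

    pem = ssl.get_server_certificate((HOST, PORT))
    fingerprint = hashlib.sha256(pem.encode()).hexdigest()

    if not os.path.exists(PIN_FILE):
        # First visit: the user clicks "accept" and we pin this cert.
        with open(PIN_FILE, "w") as f:
            f.write(fingerprint)
        print("pinned", fingerprint)
    elif open(PIN_FILE).read().strip() != fingerprint:
        raise SystemExit("certificate changed since first visit - possible MITM")
    else:
        print("certificate matches the pinned one")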
I think almost all protocols should have always-on encryption. You can choose to validate the other end or not but it is simpler and safer to only have the encrypted option in the protocol.
FWIW I have HTTPS-only mode enabled and I would prefer to be notified of insecure connections. To me a self-signed cert is actually better than HTTP.
I'm sure it will be a while until HTTPS-only is the default, but it seems clear that browsers are moving in that direction.
> It's a terrible protocol for the human person web. You can't even host a visitable website for people who use HTTP/3 without getting the permission of a third party corporation to temporarily lend you a TLS cert.
Um, so many parts of web hosting require willing actions of corporations. Network, servers, etc. Why are you singling out just one part?
For someone who complains about web stuff so much, it's weird that you don't have a better understanding of the technology. No amount of pipelining can fix HTTP/1's request-per-connection model enough to make it competitive with HTTP/2.
I would say I have a pretty good understanding of Ethernet, IP, TCP, and also HTTP, having written implementations of that full stack several times -- including pretty sophisticated ultra-low-latency tricks.
HTTP/2 implements multiplexing on top of TCP, which is just plain stupid. TCP is not meant to be used for this, and doing this is actively harmful; the correct thing to do is to establish multiple TCP connections. That sort of thing is not surprising since most of web and cloud tech is all about re-implementing low-level components on top of inadequate high-level abstractions.
The main reason to use HTTP/2 is simply that when a server provides both 1 and 2, the backend quality will typically be much better on 2.
Imagine you are holding a 200 requests/page bag of excrement and it’s generating too much load on all systems involved.
What do we do? Empty the bag? No, we do not. That doesn’t scale.
We create the most elaborate, over the top efficient bag-of-excrement-conveyer-belt you ever came across. We’ll move these babies in no time flat.
Don’t worry about side-loading 24 fonts in 3 formats and whatnot. Especially do not worry about having to load three different trackers to “map” your “customer journeys”.
No sir, your customer journeys are fully mapped, enterprise-ready and surely to be ignored by everyone. We got you covered with QUIC. Reimagining “data transport”, for your benefit.
Edit: Of course I didn’t read the article, but now I did and it’s somehow worse than I thought. “Why TCP is not optimal for today’s web”: no answers. There are literally no answers here. Other than, TCP is old. Old things are difficult.
Let's put it this way, imagine going to the doctor, complaining about how you’re putting on a little extra weight, your legs are getting tired, and you’re short of breath. The doctor, instead of suggesting the radical idea of a diet or exercise, goes, “Aha! What you need, my friend, is a super-fast, mechanized, all-terrain wheelchair that can drift corners at 60 mph. That’ll get you to the fridge and back in record time, ensuring your problematic lifestyle remains unscathed by reality!”
When life gives you lemons, don’t make lemonade. Construct an intricate, high-speed, international lemon delivery system, ensuring everyone gets to share in your citrusy misery, QUICly.
So here’s to QUIC, the glorious facilitator of our cybernetic decadence, ensuring that the digital freight train of unnecessary features, exhaustive trackers, and bandwidth-gulping page elements gets to you so fast, you won’t have time to realize you didn’t need any of it in the first place.
Buckle up, folks, because the future of web browsing is here and it’s screaming down the information superhighway like a minivan packed with screaming toddlers, on fire, yet somehow, inexplicably, punctual.
While I partially sympathize and agree with your pessimism, you underestimate the impact of regular users disengaging from various places of the internet exactly because it gets more and more meaningless, watered down, and doesn't provide any value over super brief validation.
And while that super brief validation is still a very strong hook for metric tons of people out there, people do start to disengage. The sense of novelty and wonder is wearing off for many. I'm seeing it among my acquaintances at least (and they are not the least bit technical and don't want to be).
That the future of the internet looks very grim is unequivocally true though. But there's a lot that can and will be done still.
> While I partially sympathize and agree with your pessimism, you underestimate the impact of regular users disengaging from various places of the internet exactly because it gets more and more meaningless, watered down, and doesn't provide any value over super brief validation.
Thank fuck. Burn it down. Take us all the way back to vBulletin, MSN Messenger and Limewire. I have seen the future and it sucks. The past sucked too but at least my grandparents weren't getting radicalized by HillaryForJail2024 on Facebook.
I was going to write something along the lines how happy I am that someone solved the problem of the time to first byte when at least two javascript frameworks and three telemetry packages, each including a dozen of resources, are required for just about any page. But you put it so much more eloquently.
In case you do not know the answer yet (it sounds like you do, but there is a tinge of rant there), let me expand on what was said in the article:
TCP/IP was built for the 80s/90s, when
- Amount of data transferred was low
- Data access patterns were different
- Most TCP/IP was in university networks
- There were no consumer users
- Everyone on the Internet could be trusted
Today we have
- 3B more people using the Internet
- 5B people living in authoritarian regimes where the government wants to jail them for saying the wrong thing on the Internet
- Mobile users
- Wireless users (subject to different packet loss conditions)
- You can stream 4k movies in your pocket
- You can search and find any piece of information online under one second
TCP/IP can deliver low latencies and high bandwidth on mobile links, but QUIC and HTTP/3 do it much better. Plus it's always encrypted, making all kinds of middlebox attacks harder.
And you put those two statements together and you're back to the 5B number. I don't really care if your system of choosing leaders is democratic elections if at the same time you're burning books.
No, if you include anybody in the world, you are back at 7B.
The point was that you contradicted yourself: if 3B people use the internet (as you said) only 3B people are in danger of being spied through the internet.
> I don't really care if your system of choosing leaders is democratic elections
Unfortunately the dictionary does.
And one does not imply the other. I mean, it's one thing to have cameras in the streets for (alleged) safety purposes and another thing entirely to be hanged from a crane if you're gay.
Let's not pretend that everything is the same everywhere just because we don't like or agree with a particular aspect of what happens in our own country.
I think you meant to reply to someone else; I said nothing about the internet, only about the math of
"5B people live under authoritarian regimes."
"Actually it's only 1.9B if you use my definition of authoritarian, the rest are just authoritarian behaviors."
Okay so to everyone but you 5B live under authoritarian regimes then -- "Not technically authoritarianism" is not something you should feel the need to say about countries you're arguing aren't authoritarian.
> another thing entirely to be hanged from a crane if you're gay
I know, right? We're so much more civilized. We just lock them up if they do gay stuff in public, require sexual education to teach that homosexuality is an "unacceptable lifestyle choice", define consensual sex between two homosexual teenagers as rape, and have a standing legal theory that people just can't help it if they are thrown into a blind rage and attack a gay person for being gay, which in 2023 is still a legal defense in 33 states. And that's just the gays; the political punching bag du jour is trans people and they get it even worse.
While trimming fat from excessively complex web pages would be nice, HTTP over TLS over TCP has some very clear fundamental inefficiencies that need to be addressed sooner or later.
The biggest one is that there is just no reason to negotiate multiple encrypted sessions between the same client and server just to achieve multiple parallel reliable streams of data. But since TLS is running over TCP, and a TCP connection has a single stream of data (per direction) you are forced to negotiate multiple TLS sessions if you want multiple parallel streams, which needlessly uses server resources and needlessly uses network resources.
Additionally, because TCP and TLS are separate protocols, TCP+TLS connection negotiation is needlessly chatty: the TLS negotiation packets could very well serve as SYN/SYN-ACK/ACK on their own, but the design of TCP stacks prevents this.
I believe it would be theoretically possible to simply set the SYN flags on the Client Hello and Server Hello TLS packets to achieve the TCP handshake at the same time as the TLS handshake, but I don't think any common TCP/IP stack implementation would actually allow that (you could probably do it with TCP Fast Open for the Client Hello, but I don't think there is any way to send the Server Hello payload in a SYN-ACK packet with the default stacks on Linux, Windows, or any of the BSDs).
And, since you'd still need to do this multiple times in order to have multiple independent data streams, it's probably not worth the change compared to solving both problems in a new transport protocol (the way QUIC has).
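Putting rough numbers on the chattiness, assuming TLS 1.3 and ignoring slow start and packet loss (so this is only a back-of-the-envelope model):

    # Back-of-the-envelope round trips before the first HTTP response byte arrives.
    rtt_ms = 150  # e.g. a mobile or intercontinental link

    tcp_handshake   = 1  # SYN / SYN-ACK / ACK
    tls13_handshake = 1  # ClientHello / ServerHello..Finished (TLS 1.3)
    http_request    = 1  # GET / response

    tcp_tls   = (tcp_handshake + tls13_handshake + http_request) * rtt_ms
    quic_1rtt = (1 + http_request) * rtt_ms   # QUIC combines transport + crypto handshake
    quic_0rtt = http_request * rtt_ms         # 0-RTT resumption to a known server

    print(f"TCP+TLS1.3: {tcp_tls} ms, QUIC: {quic_1rtt} ms, QUIC 0-RTT: {quic_0rtt} ms")
    # -> TCP+TLS1.3: 450 ms, QUIC: 300 ms, QUIC 0-RTT: 150 ms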
The right way would probably be to implement a real TCP replacement. Look at SCTP for inspiration. There are certainly things that could be improved and would be even more useful than multiplexing, such as working congestion control and multipath.
I understand that HTTP is the new TCP, but I don't have to like it.
Sort of? Except instead of being implemented on top of IP, like TCP and UDP are, it's implemented on top of UDP, mainly so that existing middleboxes don't have to be updated. Either a TCP replacement or a major version update to TCP would be the Right Thing, but we can't have good things.
Is there anything in the design of UDP that constrains what QUIC would have wanted to do? If it used a different ID in the IP packet, would it really change anything about the UDP headers?
I would say UDP is such a simplistic protocol you can actually see it more as a protocol-building framework than a layer in-and-of itself.
I'm not sure what you mean. What does UDP have to do with acceptable service? If you want to prioritize certain kinds of traffic, you either do deep packet inspection, or you trust IP-level QoS info (which you probably shouldn't). There is no other way. No one sane prioritizes UDP in general; you prioritize things like VoIP, for which you need to do DPI anyway.
And separating data connections from encryption has in fact proven to be a bad idea over those 40 years.
What a contrast HN provides. The top post is some really interesting detail about the QUIC protocol, the things it can do and the efficiencies it enables. Immediately followed by this rant about web pages being too big these days.
QUIC is really cool technology. It can solve a wide range of problems that aren't necessarily HTTP based.
However, HTTP/3 is using QUIC as a patch over badly designed websites and services.
QUIC doesn't solve the problem, it provides another path around the source of the problems. This blog was written by Akamai, a CDN that profits from the problem of "my website is too fat to load quickly anymore". I don't blame them, they should seize the opportunity when they can, but their view is obviously biased.
QUIC makes most sense for sites that can optimise it, like the Googles, Cloudflares, and Akamais of the world. For the average website, it's just another technology decision you now need to make when setting up a website.
> However, HTTP/3 is using QUIC as a patch over badly designed websites and services.
Most "modern bloated" SPAs are tiny. The default MSS for most TCP implementations is 1460 bytes, that's 1.42 kB max per packet. Just the TCP handshake packets themselves (3 * 1460) can hold almost as much data as it takes to get a standard SPA. This says nothing about the TLS handshake itself which adds another 3 packets to connection establishment. Most SPAs send small and receive small payloads; a compressed JSON exchange can easily fit in a single packet (1.42 kB.)
The actual amount of bandwidth on the wire between a server-side rendered "fast" app and a "modern bloated" SPA isn't very different, the difference is in how they send and receive data. SSR pages send a thin request (HTTP GET) and receive a large payload; generally the payload is much larger than the connection establishment overhead, so they make good use of their connection. On the other hand, a naive SPA will involve opening and closing tens or even hundreds of connections which will lead to a huge waste (6 packets per TLS+TCP connection) of overhead. QUIC makes it possible to have small connection-oriented request/response semantics on the internet with minimal fuss.
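To make the overhead argument concrete with some rough numbers (a simplification that ignores ACKs, header bytes, and congestion control):

    import math

    MSS = 1460                     # typical TCP payload per packet, bytes
    SETUP = 3 + 3                  # ~3 packets TCP handshake + ~3 packets TLS handshake

    def packets(payload_bytes: int) -> int:
        return math.ceil(payload_bytes / MSS)

    json_reply = 1_200             # a small compressed JSON response fits one packet
    spa_shell  = 21 * 1024         # ballpark size of a small SPA shell

    for name, size in [("JSON reply", json_reply), ("SPA shell", spa_shell)]:
        data = packets(size)
        print(f"{name}: {data} data packets, {SETUP} setup packets "
              f"({SETUP / (SETUP + data):.0%} of the total) "
              f"if it needs its own TCP+TLS connection")

For a one-packet JSON reply the connection setup is most of the traffic, which is exactly why re-opening connections per request hurts so much.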
That said the problem with rants like GP's comment is that they don't serve to illuminate but serve to push on us, the readers, the author's agitated emotional state. Instead of having a discussion about bandwidth and connection overheads, or even trying to frame the discussion in terms of what "waste" really is, we get emotional content that begs us to either agree with an upvote or disagree with a downvote with nothing of substance to discuss. Any discussion website is only as strong as its weakest content and if content like this continues to get lots of upvotes shrug
> Just the TCP handshake packets themselves (3 * 1460) can hold almost as much data as it takes to get a standard SPA.
I have no idea in what reality you live where you encounter SPAs of a few Kbs. Don’t even know what to say if that’s your experience.
> That said the problem with rants like GP's comment is that they don't serve to illuminate but serve to push on us, the readers, the author's agitated emotional state.
You are absolutely right there. I will try to refrain from these “frustration posts” in the future. Sorry about that.
> I have no idea in what reality you live where you encounter SPAs of a few Kbs. Don’t even know what to say if that’s your experience.
Let's look at a SPA version of HN [1] and compare it against base HN. This base SPA page fetches 21 KiB of data. The base version of HN fetches 36 KiB of data. Most SPAs are much less information dense and have a lot more CSS and media than non-SPA websites. In a world without SPAs these sites would not be leaner, they would instead just insert tons of CSS style links and img tags to fetch the media themselves, resulting in the same amount of bandwidth.
If your argument is that websites should be leaner regardless, well in my experience most users would rather put up with a slow good-looking page than a fast spartan one. It's a hallmark of a nerd that's willing to cut corners just to get access to the information they want.
Although HN is a very bad example of an “average” anything, it’s still a good catch. Thanks for reminding me. SPAs can definitely be leaner than regular websites. Never disagreed on that one to be honest. SPA or not is just a delivery method. Clean SPA or clean “regular” site, it’s all good.
Again I was just venting and that sucks. Yelling at the cloud(s).
To be honest, I share your aesthetic principles, but other than the other nerds in my life nobody else does shrug. I remember the '90s pop-up hell and hated that too. I just spend my time on the net in places that I enjoy.
This comment has a real grain of truth to it: reducing request load on your users has way more bang for the buck than changing protocols.
At my company we did a test of HTTP/2 and HTTP/3 versus HTTP/1.1 in South America on a 4G connection. We found that Netflix loaded in less than 2.5 seconds regardless of which protocols the client supported.
We determined that if we enabled HTTP/2 or HTTP/3, we would save perhaps a second or two of load time, but that if we reduced the request load we could save on the order of tens of seconds.
Joking aside, we should be happy for any improvement that can be obtained. A single second saved for a single connection on its own is insignificant, but dozens or hundreds of connections for a single user, multiplied by the thousands or possibly millions of users, that's a huge amount of time saved.
Sure, yeah, it's all sunshine and roses for the client, but I'm a DevOps engineer. It is very difficult on the server side.
You can enable it on CDNs pretty easily these days, but enabling it on the API side is more difficult. The nginx ingress controller for Kubernetes doesn't support HTTP/3 yet. It supports HTTP/2, but I've heard that you can get into trouble with timeouts or something if you don't tune HTTP/2 right. It does look like it's doable, though, certainly more doable than HTTP/3.
HTTP/2 is a moderate lift from a systems administration standpoint, but it's also a risk. It means changing the infrastructure enough that outages can occur, so I'd rather do this lift when things are more stable and the holiday season is over.
Making the network more efficient
enbiggens the amount of excrement
you can put in that bag.
Efficiency bad!!
?
> Edit: Of course I didn’t read the article, but now I did and it’s somehow worse than I thought. “Why TCP is not optimal for today’s web”: no answers. There are literally no answers here. Other than, TCP is old. Old things are difficult.
I think you just didn't read the article period. That QUIC hides a lot of metadata that TCP doesn't was covered, for example, and there's stuff about multipath, etc. And clearly, yes, TCP is old, and a lot of the extensions done to it over the years (e.g., window scaling, SACK) are great but still not perfect (especially window scaling) and it's getting harder to improve the whole thing, and then there's congestion control. But TCP was never going to be replaced by SCTP because SCTP was a new protocol number above IP and that meant lots of middleboxes dropped it, so any new thing would have to be run over UDP, and it would have to be popular in the big tech set, else it wouldn't get deployed either.
“Efficiency” for whom and for what? Think deep before you all bow down before our overlords and ditch clear and simple protocols for “efficiency”.
I know what QUIC is for and I know its strengths. I just want the web to be simple and accessible. It's great that Netflix and friends can stream content efficiently. It's another thing to push this onto the "open web" and declare it The Protocol.
That's actually an excellent idea you've got here. Let's make chip companies design weight-loss chips, where Electron apps are forced to run on a purpose-built slow core, like your doctor prescribing going to the gym.
> Imagine you are holding a 200 requests/page bag of excrement and it’s generating too much load on all systems involved.
Sorry, this is a purely garbage comment. A "web 1.0" page with a photo gallery of a few hundred thumbnails can take a dozen seconds or more to load on my phone over 5G. Why? Because browsers limit you to six HTTP/1.1 connections, each downloading a single image at a time, requested serially one after the other. A few big images? The rest of the page stops loading (regardless of the importance of the assets being loaded) until those images are done. It has nothing to do with how much bandwidth I have; it has everything to do with the limitations of HTTP/1.1 and TCP as protocols for downloading multiple files.
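The back-of-the-envelope arithmetic behind that, assuming one outstanding request per connection and ignoring transfer time (so it's a lower bound):

    import math

    thumbnails = 300          # images on the gallery page
    connections = 6           # browser's per-host HTTP/1.1 connection limit
    rtt_s = 0.1               # 100 ms round trip on a decent mobile link

    # Each connection handles its share of the images strictly one after the other.
    rounds = math.ceil(thumbnails / connections)
    print(f"at least {rounds * rtt_s:.1f} s of pure request latency")   # -> 5.0 s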
For literally decades, we've been jumping through hoops to avoid these problems, even on "web 1.0" sites. In 2006 when I was building web pages at a cookie cutter company, rounded corners were done with "sliding doors" because loading each corner as its own image was too slow (and border-radius hadn't been invented). Multitudes of tools to build "sprite sheets" of images that are transformed and cropped to avoid loading lots of little icons, like the file type icons on web directories of old. The "old web" sites that HN adores tend to run astoundingly badly on HTTP/1.1.
Not only do H2 and H3 fundamentally solve these _generational problems_, they've made the initial loads far faster by reducing the overhead of TLS handshakes (yet another sticking point of TLS holdouts!) and improved user privacy by encrypting more data. H2 and H3 would _absolutely_ have been welcomed with open arms fifteen years ago, long before the age of "24 fonts in 3 formats" because the problems were still present back then, regardless of whether you'd like to pretend they didn't.
We should be celebrating that these protocols made _the whole internet_ faster, not just the bloated ones that you're upset about.
> A bunch of images could load on a single TCP connection just fine.
"Just fine" is subjective here. Of course they could load, but with certain performance characteristics.
> That has absolutely no bearing on TCP
It does, actually. Every packet requires a full round trip done sequentially. H3 significantly relaxes this because it has an understanding of what's actually being transmitted. An unreliable connection will inherently be far slower over TCP than over H3 because any delayed or lost packets or acks need to be retransmitted before the next packet goes out. H3 makes intermittent packet loss affect fewer files being transferred and reduces the impact of latency on a packet-by-packet basis.
The vast majority of Internet users aren't on highly reliable broadband. H3 is a win for all of those people.
This is what always happens, because it's not the same people/teams/companies filling the bags and building the conveyor belts, and because it's easier that way.
Another reason is that people are reluctant to give up any sort of convenience or vanity (a few nicer pixels in their website design) just to save someone else's resources (bandwidth/data/latency of server and user). For the same reason, people drive SUVs in cities: it's a horrible misfit, but it makes them feel good about very marginal benefits.
I think the problems of TCP (especially in a network with high latency spikes, non-negligible packet drop rates, or when roaming through multiple networks while driving 160 km/h on the Autobahn) are pretty obvious, even if you leave the security/encryption aspect out of the picture... But maybe that's just me.
Hmm, what is "today's web". Surveillance and advertising.
Let's be reasonable. Not every web user is exactly the same. Some might have different needs than others. People might use the web in different ways. What's optimal for a so-called "tech" company or a CDN might not be "optimal" for every web user. The respective interests of each do not always align on every issue.
"Over time, there have been many efforts to update TCP and resolve some of its inefficiencies - TCP still loads webpages as if they were single files instead of a collection of hundreds of individual files."
To be fair, not all web pages are "collections of hundreds of individual files", besides the file/path that the web user actually requested, that the web user actually wants to see. For example, I use TCP clients and a browser (HTML reader) and I only (down)load^1 a single file, using HTTP/1.x. This allows me to read the webpage. Most of the time when I am using the web, that's all I'm trying to do. For example, I access all sites submitted to HN this way. I can read them as text just the same as someone using HTTP/3. Hence I can discuss this blog post in this comment using HTTP/1.1 and someone else can discuss the same blog post using HTTP/3. The text that we are discussing is contained in a single file.
So what are these "collections of hundreds of individual files". Well, they might come from other sites, i.e., other domains, other host IP addresses. Chances are, they are being used for tracking, advertising or other commercial purposes. And I am ignoring all the HTTP requests that do not retrieve files, e.g., behavioural tracking, telemetry, (incorrectly) detecting whether an internet connection exists, etc. HTTP/3 seems to downplay the concept of requests made by a web user, the idea of user agency. It allows servers to send files to the web user that the web user never requested. Surely these will only be files the user did not know they actually wanted. For example, ads. This is called something like "server push".
IMHO, the best websites are not comprised of pages that each contain hundreds of individual files, sourced from servers to which I never indicated an intent to connect. The best websites are, IMHO, ones with hundreds of pages where each is a single file containing only the information I am interested in, nothing else. No tracking, no ads, no manipulative Javascripts, no annoying graphical layouts, no BS. HTTP/1.1 provides an elegant, efficient method to request those pages in sequence. It's called pipelining. Multiple HTTP requests sent over a single TCP connection. No multiplexing. The files come from a single source, in the same order as they were requested. Been using this for over 20 years. (If the web user wants "interactive" webpages, filled with behavioural tracking and advertising, then this is not optimal. But not every web user wants that. For non-interactive, 100% user-controlled information retrieval HTTP/1.1 is adequate.)
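As a crude illustration of what pipelining looks like on the wire (example.com and the paths are placeholders, and it assumes a plain-HTTP server that honours keep-alive):

    import socket

    # Minimal HTTP/1.1 pipelining sketch: several requests sent back-to-back
    # over one TCP connection, no multiplexing, responses returned in order.
    host = "example.com"
    paths = ["/", "/about.html", "/contact.html"]   # hypothetical pages

    s = socket.create_connection((host, 80))
    for p in paths:
        s.sendall(f"GET {p} HTTP/1.1\r\nHost: {host}\r\nConnection: keep-alive\r\n\r\n".encode())

    # Read whatever comes back; responses arrive in the same order as the requests.
    buf = b""
    while len(buf) < 65536:
        chunk = s.recv(4096)
        if not chunk:
            break
        buf += chunk
    s.close()
    print(buf.decode(errors="replace")[:200])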
Not every web user is interested in a commercialised web where the protocols are optimised for tracking and programmatic advertising. The early web had no such complexity nor annoyances. HTTP/3 can co-exist with other, simpler protocols, designed by academics not advertising service companies or CDNs. The other protocols may or may not be as suitable for advertising and commercial use.
HN changed the title from "Why HTTP/3 is eating the world" to "HTTP/3 adoption is growing rapidly". Perhaps HTTP/3 is being hyped. If it is a protocol "optimised" for commercial uses that benefit the company who authored the specification, this would not be surprising.
Interestingly, the commentary in this thread mainly focuses not on HTTP/3 but on QUIC. QUIC reminds me of CurveCP, an earlier UDP-based TCP replacement. I have run HTTP/1.1 over CurveCP on the home network. In theory, shouldn't it be possible to use any protocol with QUIC, via ALPN? Something like ...
1. The term "load" is interesting. It's more than downloading. It's downloading code plus running it, automatically, without any user input. There is no opportunity to pause after the download step to choose whether or not to run someoone else's code. That's "today's web". If one uses a "modern" web browser issued by an advertising-supported entity. The browser I use is not issued by an advertising-supported company, AFAIK it's originally written by university staff and students; it does not auto-load resources and it does not run Javascript. There is no waiting for a site to "load".
The reason to put encryption at the bottom of the stack is that it helps with Hyrum's Law. Part of the reason TLS is so hard to change is that everyone can see all the data, and therefore anyone on the path your packet takes might make decisions based on it. That code will break if you try to update anything (even if the thing they are observing is something they shouldn't have been observing). By encrypting everything possible, you remove the ability of everyone in the middle to see or depend on any details of the higher layers.
Enjoy debugging things when everything is encrypted... and then your certificate provider goes down (or removes you because they don't like you) and you can't even connect...
Look at the ones pushing this stuff, who they work for, and what interests they have. It's easy to see why a lot of things are the way they are, when you realise their true purpose.
> 2) Encryption is needed in the bottom stack! WHY?
One reason is that hardware (NICs) can offload encryption more easily when it's closer to the lowest end-to-end layer (i.e., no lower than IP).
So IPsec and QUIC are easy to offload, but TLS-over-TCP less so. It especially helps that there is no need to buffer up part of the TCP stream so as to get whole TLS records to decrypt.
I hate QUIC. I don't like that now there's an implementation of TCP in user space, and I find binary protocols nauseating, particularly when 90% of their value can be achieved using gzip over the wire. Not to mention corporate firewalls generally block UDP by default. UDP? Really?
The design makes a lot of sense if you're talking about a remote client because now you can update the TCP stack without regard to when the user updates Android or whatever, but like GraphQL I feel like it's a technology that bends over backwards for the client. Maybe that's necessary and I get that, but whenever possible for other services that don't need to be sensitive to rural users I prefer to use things that make sense. REST and HTTP 1.1 over TCP continue to make sense.