> the benefit of HTTP/2 is loading smaller things earlier
No, the benefit of HTTP/2 is the lack of head-of-line blocking. Head-of-line blocking is easiest to see when big things block small things, but that's not what it is: it's when something fails to make progress because something else is being waited on.
Imagine a multimedia container file-format where you can't interleave audio frames with video frames, but rather need to put the whole audio track first, or the whole video track first. This format would be unsuited to streaming, because downloading the first chunk of the file would only get you some of one track, rather than useful (if smaller) amounts of all the tracks required for playback. Note that this is true no matter which way you order the tracks within the file—whether the audio (smaller) or video (larger) track comes first, it's still blocking the progress of the other track.
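To make the analogy concrete, here's a toy sketch (in Python, not any real container format) comparing a "sequential" layout, where one whole track precedes the other, against an interleaved layout, and looking at what a small download prefix gets you in each case:

```python
# Toy illustration: what does downloading only a prefix of the file yield
# under each layout? Track names and frame labels are made up.

def sequential(audio, video):
    # Whole audio track first, then whole video track.
    return audio + video

def interleave(audio, video):
    # Round-robin frames from each track; an exhausted track just drops out.
    out = []
    for i in range(max(len(audio), len(video))):
        if i < len(audio):
            out.append(audio[i])
        if i < len(video):
            out.append(video[i])
    return out

audio = [f"a{i}" for i in range(4)]
video = [f"v{i}" for i in range(4)]

prefix = 4  # pretend we've only downloaded the first 4 frames
print(sequential(audio, video)[:prefix])  # ['a0', 'a1', 'a2', 'a3'] — no video at all yet
print(interleave(audio, video)[:prefix])  # ['a0', 'v0', 'a1', 'v1'] — both tracks progress
```

The sequential prefix contains only audio; the interleaved prefix contains useful amounts of both tracks, which is exactly what a streaming player needs.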
HTTP/2 is like a streaming multimedia container format: it interleaves the progress of the things it loads, allowing them to be loaded concurrently.
This doesn't just mean that small things requested later can be prioritized over large things requested earlier (though it does mean that). It also means, for example, that if you load N small JavaScript files that each require a compute-intensive step to parse and load (GPU compute shaders, say), you don't have to wait for the compute-heavy load of the previous files to complete before you begin downloading the next ones; you can concurrently download, parse, and load all such script files at once. Insofar as they don't express interdependencies, this is a highly-parallelizable process, much like serving independent HTTP requests is a highly-parallelizable process for a web server.
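A hypothetical sketch of that pipeline shape, with downloads simulated as async I/O and the compute-heavy parse step pushed off the event loop so it can't stall other flows (the file names and delays are invented for the illustration):

```python
# Sketch: with multiplexed flows, each script's download can start without
# waiting for earlier scripts' compute-heavy parse step to finish.
import asyncio

async def download(name):
    await asyncio.sleep(0.01)      # stands in for network I/O on one HTTP/2 flow
    return f"source of {name}"

def parse(source):
    return f"parsed({source})"     # stands in for a compute-heavy parse/load step

async def fetch_and_load(name):
    src = await download(name)
    # Run the parse in a thread so it doesn't block the other downloads.
    return await asyncio.to_thread(parse, src)

async def main():
    names = [f"shader{i}.js" for i in range(3)]
    # All three fetch+parse pipelines run concurrently.
    return await asyncio.gather(*(fetch_and_load(n) for n in names))

results = asyncio.run(main())
print(results)
```

With HTTP/1.1-style serialization, each `download` would have to wait for the previous file's entire download; here every pipeline makes progress independently, exactly as with independent flows on one connection.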
One benefit of HTTP/2's lack of head-of-line blocking, that would be more talked-about if we had never developed websockets, is that with HTTP/2, you get a benefit very much like websockets, just using regular HTTP primitives. You can request a Server-Sent Events (SSE) stream as one flow muxed into your HTTP/2 connection, and receive timely updates on it, no matter what else is being muxed into the connection at the same time. Together with the ability to make normal API requests as other flows over the same connection, this does everything most people want websockets for. So the use-case where websockets are the best solution shrinks dramatically (down to when you need a time-linearized, stateful, connection-oriented protocol over HTTP.)
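The SSE wire format itself is trivially simple, which is part of the appeal: events are blocks of `field: value` lines separated by blank lines. A minimal parser sketch (covering only the common `event`/`data` fields, not the full spec's `id`, `retry`, or comment handling):

```python
# Minimal Server-Sent Events parser. Each event is a block of "field: value"
# lines terminated by a blank line; multiple data: lines join with newlines.
def parse_sse(text):
    events, current = [], {}
    for line in text.splitlines():
        if line == "":                              # blank line dispatches the event
            if current:
                events.append(current)
                current = {}
        elif line.startswith("data:"):
            chunk = line[len("data:"):].lstrip(" ")
            current["data"] = current.get("data", "")
            current["data"] += ("\n" if current["data"] else "") + chunk
        elif line.startswith("event:"):
            current["event"] = line[len("event:"):].lstrip(" ")
    return events

stream = "event: tick\ndata: 1\n\ndata: hello\ndata: world\n\n"
print(parse_sse(stream))  # [{'event': 'tick', 'data': '1'}, {'data': 'hello\nworld'}]
```

Feed this incrementally from one muxed HTTP/2 flow and you have the "timely server push" half of what websockets are usually reached for.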
> new protocol that sets in stone some heuristic
Note that there's actually no explicit specification of the order in which HTTP/{2,3} flows should be delivered. What I'm calling "content-oblivious round-robin chunk scheduling" is just the simplest-to-implement strategy that could possibly satisfy HTTP/2's non-head-of-line-blocking semantics (and so likely the strategy used by many web servers, save for the ones that have been highly optimized at this layer). But both clients and servers are free to schedule the chunks of HTTP flows onto the socket however they like. (They could even impose a flow concurrency cap, simulating browsers' HTTP/1.1 connection limit and starving flows of progress. That would make the implementation a non-conformant HTTP/{2,3} peer, but it would still work, since what progress "should" be made is unknowable to the other side.)
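The round-robin strategy is simple enough to sketch in a few lines: take one chunk from each open flow in turn, with no regard for what the flows contain (flow names and chunk size below are made up for the illustration):

```python
# Sketch of content-oblivious round-robin chunk scheduling: each open flow
# gets one fixed-size chunk per round, regardless of flow size or content.
from collections import deque

def schedule(flows, chunk=4):
    """flows: dict of flow-id -> bytes payload. Yields (flow_id, chunk) in send order."""
    queue = deque(flows.items())
    while queue:
        fid, payload = queue.popleft()
        yield fid, payload[:chunk]
        if payload[chunk:]:                # flow still has data: requeue for next round
            queue.append((fid, payload[chunk:]))

flows = {"big": b"x" * 10, "small": b"ok"}
order = [fid for fid, _ in schedule(flows)]
print(order)  # ['big', 'small', 'big', 'big'] — 'small' completes in the first round
```

Note that the small flow finishes after one round even though a much larger flow was opened first, which is the non-head-of-line-blocking property in miniature; smarter servers layer prioritization on top, but nothing in the spec mandates any particular order.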
It's a bit like saying an OS or VM has a "soft real-time guarantee" for processes. Exactly how does the OS scheduler choose what process will run next on each core? Doesn't really matter. It only matters that processes don't break their "SLA" in terms of how long they go without being scheduled.