A major pain point for me in using HTTP range headers is that they are essentially incompatible with HTTP compression.
This boils down to an ambiguity found in the early days of HTTP: when you ask for a byte range with compression enabled, are you referring to a range in the uncompressed or compressed stream?
The only reasonable answer, in my opinion: a range of bytes in the uncompressed version, which is then compressed on-the-fly to save bandwidth.
I suspect the controversial point was that originally HTTP gzip encoding was applied by pre-compressing the static files. On-the-fly encoding was probably too computationally expensive at the time.
Anyway, the result of this choice made decades ago is that it currently prevents effective use of ranged HTTP requests. For example, chunks of a SQLite database can very likely be compressed significantly, but both HTTP servers and CDNs refuse to compress "206 Partial Content" replies.
Interestingly, another use case of HTTP byte ranges is video streaming, which is not affected by this problem since video is _already_ compressed and there is no redundancy to remove anymore.
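To make the behaviour concrete, here's a minimal sketch (the URL is a placeholder) of what a ranged fetch looks like and where the compression silently disappears:

```ts
// Minimal sketch (placeholder URL): request a byte range and check whether
// the server compressed the partial response. Browsers advertise gzip/br via
// Accept-Encoding automatically; in practice the 206 reply typically comes
// back with no Content-Encoding, i.e. uncompressed.
const res = await fetch("https://example.com/data.sqlite3", {
  headers: { Range: "bytes=0-65535" },   // first 64 KiB of the file
});

console.log(res.status);                          // typically 206 Partial Content
console.log(res.headers.get("content-range"));    // e.g. "bytes 0-65535/104857600"
console.log(res.headers.get("content-encoding")); // typically null: no compression applied
```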
> This boils down to an ambiguity found in the early days of HTTP
The RFC is clear on this: a different Transfer-Encoding applies only to the transfer and does not affect the identity of the resource, while a different Content-Encoding does affect the identity of the resource.
It's not clear who did it first, but either browsers or web servers started using the Content-Encoding header (which means "this file is always compressed", like a tar.gz, and the user agent is not meant to uncompress it) with the meaning of Transfer-Encoding (which means the file itself is not compressed, compression is only applied for the sake of the transfer, and the user agent needs to uncompress it first). This is a violation of the HTTP spec.
This fuckup resulted in big confusion and now requires an annoying number of workarounds:
- E-Tag re-generation [1]
- Ranges and Compression are not usable together
- Browsers' "Resume Download" feature removed
- RFC-compliant behavior being treated as a bug [2]
- Some corporate proxies are still unpacking tarballs on the fly, breaking checksum verification
Contrast this with email, which has its encodings correctly sorted out while using the same RFC 822 encoding technique as HTTP.
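For what it's worth, the spec-compliant use of Content-Encoding looks roughly like the sketch below (file names and ETag values are made up): the pre-compressed file is a different representation of the resource, so it carries its own ETag, which is exactly the "E-Tag re-generation" workaround listed above.

```ts
// Sketch of spec-compliant Content-Encoding in a tiny Node server (file
// names and ETag values are made up). The pre-compressed file is a distinct
// representation, so it gets its own ETag, plus Vary so caches keep the two
// representations apart.
import { createServer } from "node:http";
import { readFileSync } from "node:fs";

const identity = readFileSync("data.json");     // original representation
const gzipped = readFileSync("data.json.gz");   // pre-compressed representation

createServer((req, res) => {
  const acceptsGzip = /\bgzip\b/.test(req.headers["accept-encoding"] ?? "");
  if (acceptsGzip) {
    res.writeHead(200, {
      "Content-Type": "application/json",
      "Content-Encoding": "gzip",
      "ETag": '"data-v1-gzip"',   // different representation => different ETag
      "Vary": "Accept-Encoding",
    });
    res.end(gzipped);
  } else {
    res.writeHead(200, {
      "Content-Type": "application/json",
      "ETag": '"data-v1"',
      "Vary": "Accept-Encoding",
    });
    res.end(identity);
  }
}).listen(8080);
```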
I think another issue with compressing partial content is that the compressed responses cannot be cached efficiently - the response would have to be compressed on-the-fly for every range requested. And while compression is not as computationally expensive as it used to be, it does still add overhead, which could be more than the bandwidth overhead of uncompressed data.
There should be a workaround for this using a custom pre-compression scheme, instead of relying on HTTP for the compression. Blocks within the file will have to be compressed separately, and you'll need some kind of index mapping uncompressed block offsets to compressed block offsets.
Unfortunately there doesn't seem to be a common off-the-shelf compression format that does this, and it means you can't just use a standard SQLite file anymore. But it's definitely possible.
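A rough sketch of what that could look like in the browser, assuming a hypothetical index format where each block is an independent gzip member:

```ts
// Hypothetical index: the file is split into fixed-size uncompressed blocks,
// each gzipped independently; offsets[i] is where compressed block i starts
// in the compressed file (with a final entry for the total size).
type BlockIndex = { blockSize: number; offsets: number[] };

async function readUncompressed(
  url: string, index: BlockIndex, offset: number, length: number,
): Promise<Uint8Array> {
  const first = Math.floor(offset / index.blockSize);
  const last = Math.floor((offset + length - 1) / index.blockSize);

  const chunks: Uint8Array[] = [];
  for (let b = first; b <= last; b++) {
    // Fetch only the compressed bytes of this block...
    const res = await fetch(url, {
      headers: { Range: `bytes=${index.offsets[b]}-${index.offsets[b + 1] - 1}` },
    });
    // ...and decompress it on its own, since each block is a complete gzip member.
    const plain = await new Response(
      res.body!.pipeThrough(new DecompressionStream("gzip")),
    ).arrayBuffer();
    chunks.push(new Uint8Array(plain));
  }

  // Stitch the blocks together and cut out the requested slice.
  const joined = new Uint8Array(chunks.reduce((n, c) => n + c.length, 0));
  let pos = 0;
  for (const c of chunks) { joined.set(c, pos); pos += c.length; }
  const start = offset - first * index.blockSize;
  return joined.slice(start, start + length);
}
```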
I only partially buy the caching argument. It applies to any ranged request, not just compressed ones, and there are plenty of ways for origin servers to describe what to cache and how. The origin may explicitly allow caching only when access patterns are likely to repeat.
SQLite isn't compressed. There is a commercial offering from the creators of SQLite that adds compression, but it's rarely seen in the wild, to be honest.
SQLite isn't compressed by default, but there are extensions like https://phiresky.github.io/blog/2022/sqlite-zstd/ that offer compression. It would be interesting to see how well that extension works with this project.
Good note. Byte-range requests are supported by default for static content on all web servers, so to your point, this won't work if you've explicitly turned it off.
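If you're unsure whether a given host or CDN still honours them, a quick check looks something like this (the URL is a placeholder):

```ts
// Quick check whether a host/CDN serves byte ranges for a file (placeholder URL).
const url = "https://example.com/data.sqlite3";

const head = await fetch(url, { method: "HEAD" });
console.log(head.headers.get("accept-ranges")); // "bytes" if range requests are advertised

const ranged = await fetch(url, { headers: { Range: "bytes=0-0" } });
console.log(ranged.status); // 206 if the Range header was honoured, 200 if it was ignored
```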
It's funny, I was wondering whether this replicated Simon Willison's work on Datasette [1], but I guess I got things confused because Simon himself was the first to comment on that older post about how clever this was.
There's a web browser version of Datasette called datasette-lite, which runs on Python ported to WASM with Pyodide and can load SQLite databases.
I grafted the enhanced lazyFile implementation from Emscripten, and then from this implementation, onto datasette-lite relatively recently as a curious test. Threw in an 18GB CSV from CA's unclaimed property records here.
You can run queries through datasette-lite if you URL-hack into it and just get to the query dialog; browsing is kind of a dud at the moment since Datasette runs a count(*), which downloads everything.
The author mentioned how running a web server risks it going down because they forgot to pay or something, and then uses "free" services instead.
Unlike paying, which you can easily set to auto-pay, free services can go up and down whenever they wish; in fact, it's probably in their terms of service. The more generous the free service, the more likely it'll get cut down and become unusable later, or just more expensive than alternatives, like the Heroku free tier.
Paying money means there's an actual incentive and legal contract for the company to provide the service you paid for.
Static sites benefit from being low-maintenance and highly-portable. You can easily switch from generous free services (Cloudflare Pages, Netlify, Vercel, Firebase, etc.) to highly-available paid services (Amazon S3, Cloudflare R2, Backblaze B2, Google Cloud Storage, etc.) Keeping a VM running and updated is more work (and sometimes higher cost).
In theory it should be possible to write back to static storage, but you’d have to deal with authentication and file management, caching writes, consistency/atomicity, etc. It’d be a lot of work.
To clarify, Seafowl itself can't be hosted statically (it's a Rust server-side application), but it works well for statically hosted pages. It's basically designed to run analytical SQL queries over HTTP, with SQL query results cached by a CDN/Varnish. The web page user downloads just the query result rather than the required fragments of the database (which, if you're running aggregation queries, might mean scanning through a large part of it).
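The general pattern (this is an illustrative sketch, not Seafowl's actual API; the endpoint path and header name below are made up) is to make query results CDN-cacheable by deriving the request URL from a hash of the SQL text:

```ts
// Illustrative sketch only: the /q/<hash> path and X-SQL-Query header are
// made-up names, not Seafowl's real API. The point is that identical SQL
// always maps to the same URL, so a CDN can cache the result.
async function cachedQuery(baseUrl: string, sql: string): Promise<unknown> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(sql));
  const hash = [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");

  const res = await fetch(`${baseUrl}/q/${hash}`, {
    headers: { "X-SQL-Query": sql },  // hypothetical header carrying the query text
  });
  return res.json();
}
```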
This is a cool project, and really well presented.
I've looked at doing something similar in the past, trying to query a large dataset through ranged lookups on a static file server. I ran into issues with some hosts not supporting ranged lookups, or with the CDNs I was using only fetching from the origin in 8MB blocks, which meant quite a lot of latency when doing lots of small reads across a massive file.
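One mitigation for that kind of latency (a sketch with a made-up chunk size, single file, no eviction) is to align and cache reads on large fixed-size chunks, so lots of small scattered reads turn into far fewer CDN fetches:

```ts
// Sketch: coalesce small reads into aligned 1 MiB chunk fetches and cache
// the chunks. Assumes a single file and that each read fits inside one
// chunk; a real implementation would handle reads spanning chunks and
// evict old entries.
const CHUNK = 1 << 20;                                 // 1 MiB per fetch (tunable)
const cache = new Map<number, Promise<Uint8Array>>();  // chunk index -> chunk bytes

async function read(url: string, offset: number, length: number): Promise<Uint8Array> {
  const idx = Math.floor(offset / CHUNK);
  if (!cache.has(idx)) {
    cache.set(idx, fetch(url, {
      headers: { Range: `bytes=${idx * CHUNK}-${(idx + 1) * CHUNK - 1}` },
    }).then(async (r) => new Uint8Array(await r.arrayBuffer())));
  }
  const chunk = await cache.get(idx)!;
  const start = offset - idx * CHUNK;
  return chunk.slice(start, start + length);
}
```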
It would have been interesting to find out a bit more about these topics, and see a bit more on performance.
This could potentially become very interesting: if all these pages are pushed to the edge via a CDN, then browsers can essentially have something like GraphQL built on top of SQLite on the client side.
Reminder that Mozilla killed off SQLite in the browser, to push indexedDB, a much more limited database that they then went and implemented in Firefox with... SQLite.
That's a bit misleading. IndexedDB is a standard maintained by the World Wide Web Consortium (W3C) and is supported by all major browsers. Web SQL Database was a prior API developed by Apple. But Firefox refused to add support for it and argued against it becoming a standard because it would codify the quirks of SQLite. It was thus deprecated in favor of IndexedDB.
Standards usually have holes, so it is necessary to ask: what happens when a competing implementation hits a hole in the current standard? In that case it is better to have a reference implementation for the competing implementation to match, so that programs which depend on the standard continue to work.
Therefore, tying a standard to a reference implementation, especially one as reputed as SQLite, might not have been a bad outcome overall.
It's an open source project; they could've forked it and got their wish to keep the API unaffected by any changes to SQLite. This blog post [1] gives us all the insight needed to know this was going to go badly. The bit about whiteboarding a replacement: is that better or worse than believing JavaScript is really a replacement for SQL? Or that IndexedDB is somehow elegant? I can't tell; the whole thing makes me cringe.
SQLite is a massive library, with a lot of functionality and optional extensions, and behaviour is added or changed over versions. Codifying all the behaviour into a standard would be a massive effort. And then you're either stuck on that version of SQLite, or your standard depends on what an external project is doing with new releases - which is really not what you want for a web standard.
Alternatively, you need to pick a small stable subset of SQLite functionality to expose. But that would mean you can't just use plain SQLite anymore - you either need to modify it significantly, or need to wrap it somehow. Which either way adds a lot of overhead again.
This is SQLite compiled to WebAssembly along with a storage layer ("virtual filesystem") that fetches database pages from a web server. So you have the full power of SQLite, and the page doesn't have to download unnecessary data from the server (like the entire dataset just to show you one row). It's a super nifty hack. SQLite built into the browser is somewhat orthogonal to this concern; it didn't allow you to store database pages on the server, and this one doesn't let you write to it.
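The core idea boils down to something like the sketch below (not the project's actual code): every (offset, length) read that SQLite's virtual filesystem layer issues maps directly onto an HTTP Range request against the static database file.

```ts
// Sketch, not the project's actual code: answer a SQLite page read with an
// HTTP Range request. Page numbers are 1-based in the SQLite file format,
// and pageSize is the database's page size (e.g. 4096).
async function readDatabasePage(
  dbUrl: string, pageNumber: number, pageSize: number,
): Promise<Uint8Array> {
  const offset = (pageNumber - 1) * pageSize;
  const res = await fetch(dbUrl, {
    headers: { Range: `bytes=${offset}-${offset + pageSize - 1}` },
  });
  if (res.status !== 206) throw new Error("server did not honour the Range request");
  return new Uint8Array(await res.arrayBuffer());
}
```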
I hate indexedDB as much as almost anyone, but this is not an accurate characterization of what happened. People wanted to ship 'just SQLite' and magically anoint an open source library as a standard. There are lots of problems with that.
Still, it's fair to ask: given the world we have now, where there's kinda just "nothing" / broken and crappy implementations, would the alternative world in which SQLite was anointed have been better head-to-head? I think maybe yes, it would have.
The new origin-private filesystem allows random-access reads and writes. This should make it possible to use WASM SQLite in the browser with local persistence - everything that WebSQL had.
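For example (a sketch; the file name is a placeholder, and createSyncAccessHandle() is only available inside a dedicated Web Worker):

```ts
// Sketch of OPFS random access from inside a dedicated Web Worker
// (the file name is a placeholder).
const root = await navigator.storage.getDirectory();
const fileHandle = await root.getFileHandle("db.sqlite3", { create: true });
const access = await fileHandle.createSyncAccessHandle();

// Write 4 KiB at byte offset 8192, then read it back, all synchronously.
const page = new Uint8Array(4096);
access.write(page, { at: 8192 });

const buf = new Uint8Array(4096);
access.read(buf, { at: 8192 });

access.flush();
access.close();
```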
A major advantage over WebSQL is that the developer is in complete control. The developer can pick the latest version of SQLite, enable any custom extensions (standard or additional ones), or use SQLCipher for encryption, instead of just using whatever the browser provides.
The only real disadvantage is the download size of the wasm code, but it's small enough that it won't be a blocker for sizeable interactive web apps. Just don't do this for your static blog.