I was wondering how this ran queries against SQLite files in an S3 bucket. Here's the answer - it's using some very clever Python ctypes code to implement that HTTP Range header trick from last year:
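For the curious, the core of the trick is tiny. Here's a hedged sketch (the function names are mine, not the actual sqlite-s3-query API): every read(offset, length) that SQLite's file layer performs gets translated into one HTTP request for just those bytes.

```python
import urllib.request

def range_header(offset, length):
    # HTTP Range uses inclusive byte positions: bytes=start-end
    return {"Range": f"bytes={offset}-{offset + length - 1}"}

def read_remote_bytes(url, offset, length):
    # Fetch only the requested slice of the remote SQLite file,
    # standing in for what the ctypes VFS hook does per read.
    req = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

So e.g. reading the 100-byte SQLite header at the start of the file becomes a single `bytes=0-99` request; S3 (and most static hosts/CDNs) honor these out of the box.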
This is great. Servers have been the weak link in the open vector tiles stack for a good while and an out-of-the-box recipe like this will make a real difference.
(For those running on bare metal, I knocked this up a few months ago - https://github.com/systemed/mod_mbtiles - which is an Apache module that does pretty much the same thing.)
This is pretty cool! Didn't know you could query SQLite databases on S3. I recently tried out PMTiles[1] to do something similar, but it requires an extra processing step. Would anybody know the pros and cons of this vs PMTiles?
COMTiles[1] also looks pretty good, a bit more optimized than PMTiles.
The whole cloud-optimized vector tile/data space is next after COGs (cloud-optimized GeoTIFFs). Point clouds are also getting the index-range-request treatment as COPC[2].
The COMTiles design has a very neat idea for batching requests directly on the client thanks to its ordered layout. This should also be possible in PMTiles spec v3, which introduces an optional Hilbert ordering on disk, as well as index compression and optimizations for run-length encoding of repetitive tiles. I'm hoping to have this ready and implemented in clients by the FOSS4G conference (late August).
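The point of the Hilbert ordering is that tiles near each other in space end up near each other in the file, so a client can merge their reads into fewer range requests. A minimal sketch — this is the textbook xy→d Hilbert mapping, not the exact PMTiles v3 tile-ID encoding (which, as I understand it, also offsets by the tile counts of lower zoom levels):

```python
def hilbert_id(z, x, y):
    # Position of tile (x, y) along the Hilbert curve at zoom z.
    n = 1 << z          # tiles per axis at this zoom
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/flip the quadrant so the curve stays continuous
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s >>= 1
    return d
```

At zoom 1 this walks (0,0) → (0,1) → (1,1) → (1,0), i.e. consecutive IDs are always spatially adjacent tiles, which is exactly what makes batched range requests effective.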
The end goal of PMTiles is similar to SQLite-over-HTTP. SQLite is designed for access on local disk, and a single query might issue many reads, each incurring an HTTP round trip (which can take hundreds of milliseconds). PMTiles is designed from the ground up for HTTP, so it attempts to keep any tile fetch to three requests or fewer.
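The "three requests or fewer" bound falls out of the layout: one read for the root directory, at most one for a leaf directory, one for the tile bytes. Here's a toy in-memory model of that shape (this mimics the structure, not the real PMTiles binary format — every `read_range` call stands in for one HTTP range request):

```python
class ToyArchive:
    def __init__(self, tiles):
        # tiles: dict of tile_id -> bytes; lay tile data out contiguously
        self.data = b""
        self.leaf = []  # (tile_id, offset, length) entries
        for tid in sorted(tiles):
            blob = tiles[tid]
            self.leaf.append((tid, len(self.data), len(blob)))
            self.data += blob
        self.requests = 0

    def read_range(self, offset, length):
        self.requests += 1  # one simulated HTTP range request
        return self.data[offset:offset + length]

    def get_tile(self, tid):
        self.requests += 1  # request 1: header + root directory (simulated)
        self.requests += 1  # request 2: leaf directory (simulated)
        for t, off, ln in self.leaf:
            if t == tid:
                return self.read_range(off, ln)  # request 3: tile bytes
        return None
```

Compare that fixed bound with SQLite's B-tree traversal, where the number of page reads per query depends on index depth and isn't bounded by the format.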
The extra conversion step is something I'm hoping to get rid of by building PMTiles writers into tiling tools directly.
Another bonus of writing PMTiles directly is that it will cut the time to generate vector tiles for the planet from ~40 down to ~30 minutes on a 64-core EC2 instance, since the PMTiles format is much simpler and more efficient to write than SQLite. It should also shave the output file size a bit from the 80 GB that MBTiles uses.
Though if you have enough users pinging the server you'll probably end up with a bunch of operations happening, which do have fees. It may be better cost-wise to just put this SQLite database on D1 (especially if it gets read-only replication at some point) and set up a Worker to return tile data in one HTTP request. That way the infrastructure is still serverless, but doesn't require N requests per tile, just 1.
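The server-side lookup is a single indexed query against the standard MBTiles schema, so the Worker (or any server) can answer each tile URL with one round trip. A sketch in Python with an in-memory database standing in for D1 — note MBTiles stores rows in TMS order, so the y coordinate is flipped relative to XYZ URLs:

```python
import sqlite3

# Minimal MBTiles-shaped table with one fake tile at z/x/y = 0/0/0.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
           "tile_row INTEGER, tile_data BLOB)")
db.execute("INSERT INTO tiles VALUES (?, ?, ?, ?)",
           (0, 0, 0, b"\x1f\x8b fake tile"))

def get_tile(z, x, y):
    tms_row = (2 ** z - 1) - y  # XYZ -> TMS row flip
    row = db.execute(
        "SELECT tile_data FROM tiles WHERE zoom_level=? AND tile_column=? "
        "AND tile_row=?", (z, x, tms_row)).fetchone()
    return row[0] if row else None
```

With an index on (zoom_level, tile_column, tile_row) — which tiling tools normally create — that's one B-tree lookup per request instead of N range requests from the client.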
I'm not sure! I would imagine it's planned though, scaling to that size seems important for a database product. I'm not on the D1 or R2 teams so I'm not intimately familiar with their products.
We’ve been experimenting with R2 for serving tiles from an 80 GB PMTiles archive. Unfortunately, the latency for the up-to-three byte-range requests issued from a Worker is 5-10 seconds, so it's not practical for real usage yet. Can we expect that to improve as R2 gets closer to general availability?
https://github.com/michalc/sqlite-s3-query/blob/main/sqlite_...
Here's the inspiration for that: https://github.com/phiresky/sql.js-httpvfs