Show HN: mbtiles-s3-server – serve vector map tiles directly from mbtiles on S3

simonw · on July 10, 2022

I was wondering how this ran queries against SQLite files in an S3 bucket. Here's the answer - it's using some very clever Python ctypes code to implement that HTTP Range header trick from last year:

https://github.com/michalc/sqlite-s3-query/blob/main/sqlite_...

Here's the inspiration for that: https://github.com/phiresky/sql.js-httpvfs

Doctor_Fegg · on July 10, 2022

This is great. Servers have been the weak link in the open vector tiles stack for a good while and an out-of-the-box recipe like this will make a real difference.

(For those running on bare metal, I knocked this up a few months ago - https://github.com/systemed/mod_mbtiles - which is an Apache module that does pretty much the same thing.)

durkie · on July 10, 2022

I've got an nginx version here too: https://github.com/durkie/ngx_http_mbtiles_module

jtmiclat · on July 10, 2022

This is pretty cool! Didn't know you can query SQLite databases from s3. I recently tried out PMTiles[1] to do something similar but it requires an extra processing step. Would anybody know the pros and cons of this vs PMTtiles?

[1] https://github.com/protomaps/PMTiles

xemoka · on July 10, 2022

Com-tiles[1] also looks pretty good, a bit more optimized than pmtiles.

The whole cloud optimized vector tile/data space is next after COGs (cloud optimized geotiffs). Point-clouds are also getting the index-range-request treatment as COPC[2].

[1] https://github.com/mactrem/com-tiles.

[2] https://copc.io/

bdon · on July 10, 2022

The COMTiles design has a very neat idea for batching requests directly on the client due to its ordered layout; this should also be possible in PMTiles spec v3 which introduces an optional Hilbert ordering on disk, as well as index compression and optimizations for run-length-encoding of repetitive tiles. I'm hoping to have this ready and implemented in clients by the FOSS4G conference (late August)

bdon · on July 10, 2022

The end goal of PMTiles is similar to SQLite-over-HTTP: SQLite is designed for access on local disk, and might make many requests for a single query, each incurring a HTTP roundtrip (could take hundreds of milliseconds). PMTiles is designed from the ground up for HTTP, so attempts to minimize the # of requests to 3 or less for any tile.

The extra conversion step is something I'm hoping to get rid of by building PMTiles writers into tiling tools directly.

zylepe · on July 10, 2022

You can track progress for planetiler producing pmtiles output here: https://github.com/onthegomap/planetiler/issues/98.

Another bonus of writing pmtiles directly is that it will cut the time to generate vector tiles for the planet from ~40 to ~30 minutes on a 64-core ec2 instance since pmtiles format is much simpler and more efficient to write than SQLite. It should also cut the output file size a bit from the 80gb that mbtiles uses.

londons_explore · on July 10, 2022

I wouldn't recommend using any sqlite database direct on s3... You'll end up with a pretty high bill from all the operations.

toomuchtodo · on July 10, 2022

Do range requests against s3 work if you’re fronting with Cloudflare and aggressively caching? Could be more economical.

elithrar · on July 10, 2022

It should work directly against R2 - our S3 compatible object storage) w/ no egress fees: https://developers.cloudflare.com/r2/

Makes experimenting with things like this much cheaper.

(I work for CF)

piperswe · on July 10, 2022

Though if you have enough users pinging the server you'll probably end up with a bunch of operations happening, which do have fees. It may be better cost-wise to just put this SQLite database on D1 (especially if it gets read-only replication at some point) and set up a Worker to return tile data in one HTTP request. That way the infrastructure is still serverless, but doesn't require N requests per tile, just 1.

(I also work for CF)

simonw · on July 10, 2022

Can D1 handle a 92GB SQLite database file?

piperswe · on July 10, 2022

I'm not sure! I would imagine it's planned though, scaling to that size seems important for a database product. I'm not on the D1 or R2 teams so I'm not intimately familiar with their products.

toomuchtodo · on July 10, 2022

Thank you Cloudflare folks for contributing, extremely helpful.

zylepe · on July 10, 2022

We’ve been experimenting with R2 for serving tiles from an 80gb pmtiles archive, unfortunately the latency on up to 3 byte range requests from a worker is 5-10 seconds so it’s not practical for real usage yet. Can we expect that to improve as R2 gets closer to general availability?

killingtime74 · on July 10, 2022

I also just learned this. Apparently range header works with S3 https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjec...

dacort · on July 11, 2022

A while ago I wrote an Athena connector for querying SQLite in S3. ;) https://github.com/dacort/athena-sqlite Uses a similar approach.

th0ma5 · on July 10, 2022

Byte range queries against S3 are the core mechanism to search the Common Crawl and has been for years.