Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: mbtiles-s3-server – serve vector map tiles directly from mbtiles on S3 (github.com/uktrade)
72 points by michalc on July 10, 2022 | hide | past | favorite | 19 comments



I was wondering how this ran queries against SQLite files in an S3 bucket. Here's the answer - it's using some very clever Python ctypes code to implement that HTTP Range header trick from last year:

https://github.com/michalc/sqlite-s3-query/blob/main/sqlite_...

Here's the inspiration for that: https://github.com/phiresky/sql.js-httpvfs


This is great. Servers have been the weak link in the open vector tiles stack for a good while and an out-of-the-box recipe like this will make a real difference.

(For those running on bare metal, I knocked this up a few months ago - https://github.com/systemed/mod_mbtiles - which is an Apache module that does pretty much the same thing.)


I've got an nginx version here too: https://github.com/durkie/ngx_http_mbtiles_module


This is pretty cool! Didn't know you can query SQLite databases from s3. I recently tried out PMTiles[1] to do something similar but it requires an extra processing step. Would anybody know the pros and cons of this vs PMTtiles?

[1] https://github.com/protomaps/PMTiles


Com-tiles[1] also looks pretty good, a bit more optimized than pmtiles.

The whole cloud optimized vector tile/data space is next after COGs (cloud optimized geotiffs). Point-clouds are also getting the index-range-request treatment as COPC[2].

[1] https://github.com/mactrem/com-tiles.

[2] https://copc.io/


The COMTiles design has a very neat idea for batching requests directly on the client due to its ordered layout; this should also be possible in PMTiles spec v3 which introduces an optional Hilbert ordering on disk, as well as index compression and optimizations for run-length-encoding of repetitive tiles. I'm hoping to have this ready and implemented in clients by the FOSS4G conference (late August)


The end goal of PMTiles is similar to SQLite-over-HTTP: SQLite is designed for access on local disk, and might make many requests for a single query, each incurring a HTTP roundtrip (could take hundreds of milliseconds). PMTiles is designed from the ground up for HTTP, so attempts to minimize the # of requests to 3 or less for any tile.

The extra conversion step is something I'm hoping to get rid of by building PMTiles writers into tiling tools directly.


You can track progress for planetiler producing pmtiles output here: https://github.com/onthegomap/planetiler/issues/98.

Another bonus of writing pmtiles directly is that it will cut the time to generate vector tiles for the planet from ~40 to ~30 minutes on a 64-core ec2 instance since pmtiles format is much simpler and more efficient to write than SQLite. It should also cut the output file size a bit from the 80gb that mbtiles uses.


I wouldn't recommend using any sqlite database direct on s3... You'll end up with a pretty high bill from all the operations.


Do range requests against s3 work if you’re fronting with Cloudflare and aggressively caching? Could be more economical.


It should work directly against R2 - our S3 compatible object storage) w/ no egress fees: https://developers.cloudflare.com/r2/

Makes experimenting with things like this much cheaper.

(I work for CF)


Though if you have enough users pinging the server you'll probably end up with a bunch of operations happening, which do have fees. It may be better cost-wise to just put this SQLite database on D1 (especially if it gets read-only replication at some point) and set up a Worker to return tile data in one HTTP request. That way the infrastructure is still serverless, but doesn't require N requests per tile, just 1.

(I also work for CF)


Can D1 handle a 92GB SQLite database file?


I'm not sure! I would imagine it's planned though, scaling to that size seems important for a database product. I'm not on the D1 or R2 teams so I'm not intimately familiar with their products.


Thank you Cloudflare folks for contributing, extremely helpful.


We’ve been experimenting with R2 for serving tiles from an 80gb pmtiles archive, unfortunately the latency on up to 3 byte range requests from a worker is 5-10 seconds so it’s not practical for real usage yet. Can we expect that to improve as R2 gets closer to general availability?


I also just learned this. Apparently range header works with S3 https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjec...


A while ago I wrote an Athena connector for querying SQLite in S3. ;) https://github.com/dacort/athena-sqlite Uses a similar approach.


Byte range queries against S3 are the core mechanism to search the Common Crawl and has been for years.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: