https://fivethirtyeight.datasettes.com/ contains SQLite versions of every CSV published by the FiveThirtyEight crew to https://github.com/fivethirtyeight/data
I built a separate tool on top of pandas which can mass-convert CSVs to SQLite - https://github.com/simonw/csvs-to-sqlite - so building the database for https://fivethirtyeight.datasettes.com/ is now a one-liner:
csvs-to-sqlite ~/fivethirtyeight/data fivethirtyeight.db
Both of these demos are served through Cloudflare - since the URL to each database incorporates a sha256 hash of the database contents, I can send a far-future cache expiry header on everything.
Small nitpick: It incorporates a SHA-256 prefix, i.e. a greatly reduced SHA-256 hash, so the cryptographic integrity guarantees no longer hold.
> send a far-future cache expiry header on everything
If you really want to do that, you should include the full SHA-256 hash in the URL. If you want to shorten the URL, use a more compact encoding (e.g. Base64 or Ascii85, hex).
Alternatively, make it optional (but I admit this probably defeats the purpose): Short URLs without hash and short expiration, long URLs with hash and eternal expiration.
Not as fancy as OP's, but runs great in production since last June :-)
XMYSQL : One command to generate REST APIs for any MySql
> 20+ releases to npm in last 2 weeks.
> Have added 'a ton' of features since last shown on HN.
There's a hard limit of 1,000 rows allowed to be returned at any one time. This is because the overhead of serializing more JSON that than could potentially lock up the server. If you try to pull more than that back you get a warning that the data has been truncated, e.g. https://fivethirtyeight.datasettes.com/fivethirtyeight-2628d... - in the JSON that shows up as "truncated": true
If that happens to you using raw SQL, it's up to you to implement offset/limit pagination.
If you are looking at a view, the tool implements offset/limit pagination for you e.g. https://parlgov.datasettes.com/parlgov-25f9855/view_election...
For iterating through a raw table, a more efficient mechanism is used: cursor-based pagination. The table is ordered by its primary key (or its rowid if no primary key was specified) and the "next" links say "get the next 100 items where the primary key is greater than this value" - e.g. https://parlgov.datasettes.com/parlgov-25f9855/cabinet_party...
This means even a table with millions of records can be efficiently paginated through - unlike offset/limit which tends to get slow after a few hundred thousand records.
The thing that excites me is being able to take almost any dataset, package it up into a simple, well-understood format and then deploy that in a read-only, highly scalable way, taking advantage of modern containerization systems.
Read-only SQLite is /perfect/ for this use-case. It's fast, widely supported, has a great feature-set and can be distributed as a single file.
My favourite feature of Datasette isn't the web UI, it's the publish command:
datasette publish now ~/path-to-my/sqlite.db
I have a sqlite DB with historical Visa data, and I want to expose it as an API (like your software does), but with API key. Not a python user here, can you tell me about the following?
1. Is it possible to put some security behind the API? So only queries with a key can access it?
2. How would I host this API? Where would the SQLite DB reside?
3. I understand the API is read-only, which is great and works for my use case! Can I also do read-writes to the database from a separate application that will keep updating the database on a nightly basis?
I understand the benefits of Immutable pattern, but it'll be nice if datasettle could provide interface with single non-frequent writer(for example, most FRED data is updated seasonally, some of them updates daily) so the pipeline pulling data from some source could be automated
Because the data is meant to be immutable, Datasette works extremely well with any caching proxy. Cloudflare just happens to have the most generous free tier, but self-hosted Varnish or Fastly would work just as well as Cloudflare, if not better.