
Show HN: Datasette Publish – Turn CSVs into a SQLite-Backed JSON API - simonw
https://publish.datasettes.com/
======
simonw
Way more information plus an animated screenshot showing how to use it on my
blog: [https://simonwillison.net/2018/Jan/17/datasette-
publish/](https://simonwillison.net/2018/Jan/17/datasette-publish/)

This is built using the brand new Zeit Now API, which makes it easy to write
software that in turn deploys other software:
[https://zeit.co/blog/api-2](https://zeit.co/blog/api-2)

~~~
o1lab
Xmysql: one command to generate an API for any MySQL database. Generates the
API in the blink of an eye (literally).

[https://github.com/o1lab/xmysql](https://github.com/o1lab/xmysql)

It has some really nice APIs for group by and join, plus an intuitive way to
express 'where', 'order by' and other MySQL clauses.

(shameless plug)

------
simonw
Ryan Pitts on Twitter used this to build an API for UFO sightings in
Washington state and the North West:
[https://twitter.com/ryanpitts/status/953728651945627648](https://twitter.com/ryanpitts/status/953728651945627648)

Here's his datasette: [https://datasette-cmwqbwgnqe.now.sh/csv-
data-5284def/nuforc_...](https://datasette-cmwqbwgnqe.now.sh/csv-
data-5284def/nuforc_reports_WA)

And here's an example SQL query showing the cities with the most UFO
sightings: [https://datasette-roborbllfq.now.sh/csv-
data-b1429ef?sql=sel...](https://datasette-roborbllfq.now.sh/csv-
data-b1429ef?sql=select+%22City%22%2C+count%28%2A%29+as+%22count%22+from+convertcsv++group+by+%22City%22+order+by+%22count%22+desc+limit+100)
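For anyone trying to read that truncated link: the `sql` parameter is just URL-encoded SQL, so it can be recovered with the Python standard library. The encoded string below is copied from the URL above:

```python
from urllib.parse import unquote_plus

# The URL-encoded "sql" parameter from the query link above
encoded = (
    "select+%22City%22%2C+count%28%2A%29+as+%22count%22+from+convertcsv"
    "++group+by+%22City%22+order+by+%22count%22+desc+limit+100"
)

# unquote_plus turns "+" into spaces and decodes the %XX escapes,
# revealing a plain GROUP BY query over the uploaded CSV table
sql = unquote_plus(encoded)
print(sql)
```

The decoded query simply counts sightings per City, ordered descending with a limit of 100.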

------
dmeeker
Related option for those who are less focused on the JSON API and instead want
to make their datasets public and queryable: data.world added SQL support a
little while back, joining the original SPARQL. One big advantage there is
that folks can write queries that join your datasets to other open datasets on
data.world.

------
dgudkov
This can be a good solution to use with JS charting libraries that pull
CSV/JSON data from an external data source. Add dynamic chart updates on
source data updates and you get cheap and easy dynamic web charts.

------
edraferi
Original discussion around Datasette's launch back in November 2017:

[https://news.ycombinator.com/item?id=15691409](https://news.ycombinator.com/item?id=15691409)

------
elephant_burger
This is amazing. Great idea, I will try it. I've also been using Zeit Now and
love it too.

------
pwaai
I've been seeing a lot of these "convert X into a JSON API with query support"
tools... but what would be the use cases for this type of tool? Who needs this
type of solution?

~~~
simonw
Datasette is specifically designed for publishing static data - it doesn't
handle UPDATE/INSERT traffic at all.

This is a good fit for any time you want to share some static data with the
world: instead of posting a CSV file, you can share your data as a Datasette
instance with a browsable web UI and a JSON API.

The three use-cases I'm personally most interested in for this are:

Data journalism: newspapers collect data about e.g. police shootings
[https://vice-police-shootings.now.sh/](https://vice-police-shootings.now.sh/)
or actions under the antiquities act
[https://fivethirtyeight.datasettes.com/fivethirtyeight-2628d...](https://fivethirtyeight.datasettes.com/fivethirtyeight-2628db9/antiquities-
act%2Factions_under_antiquities_act) - I want to make it as easy as possible
to publish the data beneath the stories.

Government open data: cities like San Francisco publish huge amounts of
interesting data. I used San Francisco's 190,000-line CSV of trees in the city
to build [https://sf-tree-search.now.sh/](https://sf-tree-search.now.sh/)

Cultural institutions: museums, libraries and art galleries have the
collection and sharing of data as part of their mandates. I want to make it as
easy as possible to build things like the London Science Museum's collection
API: [https://group.sciencemuseum.org.uk/about-
us/collection/using...](https://group.sciencemuseum.org.uk/about-
us/collection/using-our-collection-api/)

~~~
toomuchtodo
Why not convert the data from csv to json and serve the json blob from your
object store or CDN (if this solution is immutable)? No need for the API
interface, just gzip the json blob and have the client retrieve it entirely.

No need for an app, app server, and the json can be cached with the rest of
your content. Unless I’m missing something?
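The blob-on-a-CDN approach suggested here is easy to sketch with the standard library (the rows are made-up placeholder data standing in for a converted CSV):

```python
import gzip
import json

# Hypothetical rows that would have come out of a CSV
rows = [
    {"city": "Seattle", "sightings": 521},
    {"city": "Spokane", "sightings": 188},
]

# Serialize once, gzip once, then serve the bytes from an object
# store or CDN with Content-Encoding: gzip
blob = json.dumps(rows).encode("utf-8")
compressed = gzip.compress(blob)

# The client fetches the whole blob and decompresses it locally
assert json.loads(gzip.decompress(compressed)) == rows
```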

~~~
simonw
Size. The San Francisco trees database is around 50MB, and I've successfully
tested Datasette with SQLite databases that are multiple gigabytes in size.

~~~
lozzo
This was a key answer to a good question. I think you should emphasize that
part (handling large data sets) as a key aspect of Datasette.

------
WaxProlix
Is this a managed hadoop/hive cluster with some sort of field heuristics baked
in? Looks cool and very useful for a certain set of users.

~~~
simonw
Nope, it's much simpler than that. Each CSV upload is converted into a
SQLite database. Then I deploy a static, read-only copy of that database to a
Zeit Now hosting account and fire up my "datasette" command-line tool (written
in Python 3) which opens the SQLite database and starts serving up queries
from it.

[https://github.com/simonw/datasette](https://github.com/simonw/datasette)

The magic here is that the SQLite database is opened in read-only mode, which
means I don't have to worry about concurrent access or write activity. SQLite
is screamingly fast if you use it in this context.
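That CSV-to-SQLite pipeline can be sketched with just the Python standard library. This is a minimal illustration, not the actual Datasette Publish code: the table name and file paths are placeholders, and the real service serves queries through Datasette rather than raw sqlite3:

```python
import csv
import sqlite3

def csv_to_sqlite(csv_path, db_path, table="data"):
    """Load a CSV file into a single-table SQLite database."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        headers = next(reader)
        columns = ", ".join(f'"{h}"' for h in headers)
        placeholders = ", ".join("?" for _ in headers)
        conn = sqlite3.connect(db_path)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
        conn.commit()
        conn.close()

def open_read_only(db_path):
    """Open the finished database read-only: no locking or write concerns."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
```

Opening with `mode=ro` via a SQLite URI is what makes the "no concurrent write worries" property hold: any attempted write on that connection raises an error.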

Because the data is read-only, if you ever DO need to scale to handle more
traffic you can do so by firing up additional instances, each with their own
copy of the SQLite database file.

~~~
brandonlipman
This is awesome. Could this be extended so that new CSV uploads can be added
at a later date and deduplicated?

~~~
simonw
Zeit Now deployments are immutable, but I can have an "edit" feature which
retrieves the data used for a deployment, lets you modify it (add new CSV
files for example) and then deploys a brand new instance with the new changes.
Zeit supports URL aliases (which I need to expose in Datasette Publish) so you
can point an alias at the fresh deployment.

Stuff like CSV de-duplication is out of scope for Datasette Publish - it's
something you would need to do with other tools (e.g. Excel) before uploading
the data.

------
m3kw9
Now the bottleneck to publishing data is getting the API and DB hosted
somewhere, because that part is still not trivial for many.

~~~
simonw
That's exactly what Datasette Publish solves: it's a tool which you upload
CSVs to and it deploys the application on the internet for you, for free.

Under the hood it's taking advantage of the incredibly generous free hosting
tier offered by Zeit: [https://zeit.co/pricing](https://zeit.co/pricing)

As a user though, you don't need to worry about this at all. Sign in, verify
your email address (which signs you up for an account with Zeit), upload your
CSVs and Datasette Publish will deploy the application with your data and give
you back a URL hosted on the public internet.

