Hacker News
Shot-scraper: Automating screenshots for documentation (simonwillison.net)
63 points by todsacerdoti on Oct 15, 2022 | 15 comments

I just released shot-scraper 1.0, which promises a stable CLI (until 2.0) and introduces a couple of small new features:

- https://github.com/simonw/shot-scraper/releases/tag/1.0

I had a colleague (thanks Sanjay!) make a similar tool at a hackfest (you can see the source code here: https://github.com/FusionAuth/fusionauth-site/blob/master/sr... ).

I've modified it a bit to take a URL, but haven't yet set it up to read a config file, which would make taking a large number of screenshots easy.
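For the config-file idea, here is a minimal Python sketch that reads a JSON list of shots and turns each one into a `shot-scraper` command line. The config shape (`url`, `output`, `width`, `height`, `selector` keys) and the name `build_commands` are hypothetical; the `-o`, `--width`, `--height` and `--selector` options are shot-scraper's own. shot-scraper also ships a `multi` command that reads a YAML file of shots, which may make a hand-rolled loop like this unnecessary.

```python
import json


def build_commands(config_text):
    """Turn a JSON list of shot definitions into shot-scraper argv lists.

    Hypothetical config shape: [{"url": ..., "output": ..., "width": 1280,
    "height": 800, "selector": "#login-form"}, ...]; only "url" and
    "output" are required.
    """
    commands = []
    for shot in json.loads(config_text):
        argv = ["shot-scraper", shot["url"], "-o", shot["output"]]
        # Optional viewport size and CSS selector map onto shot-scraper options.
        if "width" in shot:
            argv += ["--width", str(shot["width"])]
        if "height" in shot:
            argv += ["--height", str(shot["height"])]
        if "selector" in shot:
            argv += ["--selector", shot["selector"]]
        commands.append(argv)
    return commands
```

Each argv list can then be run with `subprocess.run(argv, check=True)`, so a failed screenshot stops the batch instead of silently leaving a stale image in the docs.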

We do outline certain fields or other areas in the docs to highlight a point, which has made me hesitant to automate the screenshots. However, it looks like I could use ImageMagick to automatically put a red box or similar on an image (with a `-draw` command).
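A hedged sketch of the red-box idea: a Python helper that builds an ImageMagick command outlining one rectangle. It assumes ImageMagick 7's `magick` entry point (ImageMagick 6 uses `convert` with the same options); the function name and defaults are illustrative. `-fill none` keeps the box hollow so the highlighted field stays readable.

```python
def highlight_box_command(src, dest, x0, y0, x1, y1, color="red", stroke_width=3):
    """Build an ImageMagick argv that outlines the rectangle (x0,y0)-(x1,y1).

    Assumes ImageMagick 7 ("magick"); swap in "convert" for ImageMagick 6.
    """
    if x1 <= x0 or y1 <= y0:
        raise ValueError("expected x0 < x1 and y0 < y1")
    return [
        "magick", src,
        "-fill", "none",                 # hollow box: don't paint over the field
        "-stroke", color,
        "-strokewidth", str(stroke_width),
        "-draw", f"rectangle {x0},{y0} {x1},{y1}",
        dest,
    ]
```

The resulting list can be passed straight to `subprocess.run(cmd, check=True)`.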

We have a ton of screenshots (600+) throughout our docs, and a way to initialize our product to a known state, so the pieces are all there.

One of these days it'll be worthwhile to do this.

In the iommi docs (https://docs.iommi.rocks) I went another direction: I save the actual HTML page for each test and show it in an iframe. I think this is a superior approach for docs: you can inspect element on the result!
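The iframe approach could look something like this Python sketch: write each test's rendered HTML to a file, then emit an `<iframe>` pointing at it for the docs page. This is not iommi's actual implementation; `save_snapshot`, `iframe_tag` and the file layout are hypothetical.

```python
import html
from pathlib import Path


def save_snapshot(page_html, out_dir, name):
    """Write a test's rendered page to out_dir/name.html and return the path."""
    path = Path(out_dir) / f"{name}.html"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(page_html, encoding="utf-8")
    return path


def iframe_tag(src, title, height=300):
    """Return an <iframe> that embeds a saved snapshot in a docs page."""
    # Escape attribute values so quotes in titles or paths can't break the tag.
    return (
        f'<iframe src="{html.escape(src, quote=True)}" '
        f'title="{html.escape(title, quote=True)}" '
        f'style="width: 100%; height: {int(height)}px; border: 1px solid #ccc">'
        "</iframe>"
    )
```

Because the reader gets real HTML rather than a bitmap, browser dev tools work on the embedded result.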

shot-scraper, combined with ImageMagick's compare [1], would be a great start for automated QA that checks the visual stability of a website/webapp.

[1]: https://imagemagick.org/Usage/compare/

I use compare as part of our deploy pipelines, with Puppeteer as the driver. It works great as a simple, fast way to confirm we're "still up".
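ImageMagick's `compare -metric AE` reports the absolute error, i.e. the number of pixels that differ (optionally with `-fuzz` to ignore small color shifts). To show what that check amounts to, here is a pure-Python sketch that counts differing pixels and applies a tolerance. It is a simplification: images are assumed to be already decoded into nested lists of RGB tuples, and the per-channel `fuzz` threshold is a stand-in for ImageMagick's color-distance fuzz. In a real pipeline you would call `compare` itself.

```python
def absolute_error(baseline, current, fuzz=0):
    """Count pixels that differ between two same-sized images.

    Images are lists of rows, each row a list of (r, g, b) tuples. A pixel
    counts as changed when any channel differs by more than `fuzz`
    (a simplified stand-in for ImageMagick's -fuzz).
    """
    if len(baseline) != len(current) or any(
        len(a) != len(b) for a, b in zip(baseline, current)
    ):
        raise ValueError("images must have the same dimensions")
    changed = 0
    for row_a, row_b in zip(baseline, current):
        for px_a, px_b in zip(row_a, row_b):
            if any(abs(ca - cb) > fuzz for ca, cb in zip(px_a, px_b)):
                changed += 1
    return changed


def visually_stable(baseline, current, fuzz=0, max_changed_fraction=0.0):
    """Return True if the fraction of changed pixels is within tolerance."""
    total = sum(len(row) for row in baseline)
    if total == 0:
        return True
    return absolute_error(baseline, current, fuzz) / total <= max_changed_fraction
```

A small non-zero `max_changed_fraction` keeps things like anti-aliasing noise from failing a deploy while still catching a blank or broken page.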

I've probably said this before, but I don't get Datasette. I know I'm not the target audience (as I don't have large datasets), but whenever I've used it, it just gave me a table. Is it meant to be Looker for SQLite? I'm not quite sure what the killer app is.

I have trouble answering this question myself, and I created it!

The problem I have is that it can be applied to too many different problems.

I personally have used it for the following (a truncated summary):

- Publishing data online to allow other people to explore it, for example https://scotrail.datasette.io and https://russian-ira-facebook-ads.datasettes.com/

- Building websites, by combining it with custom templates. https://datasette.io and https://www.niche-museums.com and https://til.simonwillison.net are three examples

- Building my own combined search engine over a bunch of different data. https://github-to-sqlite.dogsheep.net is this for my GitHub issues and commits and issue comments across 100+ projects

- Similarly, building a code search engine across multiple repos (partly to demonstrate how far you can go with custom plugins): https://ripgrep.datasette.io

- Any time I have a CSV file I open it in the Datasette Desktop macOS app first to start exploring it: https://datasette.io/desktop

- As a prototyping tool. It's the fastest way I know of to get from some data files (CSV or JSON) to a working JSON API - and a GraphQL API too using this plugin: https://datasette.io/plugins/datasette-graphql

- Messing around with geospatial data - here's a write-up of my favourite experiment with that so far: https://simonwillison.net/2021/Jan/24/drawing-shapes-spatial...
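The prototyping point (data files to a JSON API) boils down to turning query results into JSON objects keyed by column name. Here is a minimal stdlib sketch of that step, roughly the shape Datasette returns for a table's `.json` URL with `?_shape=array`. `rows_as_json` is a hypothetical name; Datasette itself adds pagination, filtering and much more on top.

```python
import json
import sqlite3


def rows_as_json(conn: sqlite3.Connection, sql, params=()):
    """Run a query and return the rows as a JSON array of objects."""
    cursor = conn.execute(sql, params)
    # cursor.description holds one tuple per result column; item 0 is its name.
    columns = [col[0] for col in cursor.description]
    return json.dumps([dict(zip(columns, row)) for row in cursor.fetchall()])
```

Using `?` placeholders with `params` keeps user-supplied values out of the SQL string.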

This is a bewilderingly wide array of things! And I keep on finding new problems I can apply it to.

Of course, if all you have is a hammer, everything looks like a nail. But thanks to the plugin system (and the amazing flexibility of SQLite under the hood) I can reshape my hammer to do all sorts of other things!

Maybe it's more of a sonic screwdriver.

I've been trying to capture some of this at https://datasette.io/for

This is one of my biggest marketing challenges for the project though. If someone asks you for an elevator pitch you need to do better than spending 15 minutes talking through a wide ranging bulleted list!

Hi Simon! That's a good list, thank you.

I think my problem with understanding here is twofold: First, I visited the links above, but I don't understand how you get to that end result from downloading Datasette. Is it a "base layer" and then I'm meant to mostly download plugins for the rest of the functionality? I'm not sure if it's meant to be barebones or if I just didn't find the included batteries when I tried it.

The second issue I have is, what exactly is the use case? I don't know if it's meant to be a Django Admin for SQLite, or a Looker for SQLite, or something else? I know it's very powerful and flexible, but that's a curse in this case, as I can't see a simple sentence of "think of it as X for Y" so I can make it fit into my mental model of other stuff.

As a use case, I tried to make it show a trend of my blood test results (I have about a dozen rows in a spreadsheet), but I had a hard time graphing stuff, and I gave up. It seemed to me back then that this probably should have been standard, but I remember having to look for plugins and them not being very easy to configure, so I'm just left with confusion :/

This is really useful feedback, thank you. As you can see, I've been struggling with this problem (how to succinctly explain the project) for a very long time at this point.

Love that first question. I need a much better "getting started" flow - right now you can start using a static demo or head to the tutorial, but something more instant than that would be much better.

I'm building out the SaaS version of Datasette at the moment, maybe a free preview of that would help?

I've tried out a whole bunch of "X for Y" things in my head, none of which have quite worked - the problem is that there are too many different facets of it to capture in a single comparison.

Here are a couple:

"Snowflake for small data". Data warehouses are big and expensive and complicated. If your data fits under 10GB (which is true for the vast majority of individuals and companies) you don't need a data warehouse: you need Datasette running on your laptop.

"WordPress for structured data". WordPress is a decent CMS with a plugin system that means you can apply it to any publishing problem. Datasette aims to be a decent SQL explorer and API server with a plugin system that ideally means it can be expanded to handle any form of data visualization, analysis or API-driven integration you could want to handle.

That blood test example is really useful. I've used https://datasette.io/plugins/datasette-vega for that kind of thing but you're right - it really needs to be promoted to a first-class feature at this point. (I'd also like to make it a lot more powerful - it's one of the oldest plugins.)
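For the blood-test trend, the chart itself is small once it's expressed as a Vega-Lite spec (Vega-Lite is part of the Vega family that datasette-vega builds on). The Python sketch below builds a line-chart spec from rows of data. The function and the column names in the usage are hypothetical, and this is plain Vega-Lite rather than datasette-vega's own configuration format.

```python
def trend_chart_spec(rows, x_field, y_field, title=None):
    """Build a minimal Vega-Lite line-chart spec for values over time.

    `rows` is a list of dicts, e.g. query results keyed by column name.
    """
    spec = {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        # Draw points on the line so a dozen readings stay individually visible.
        "mark": {"type": "line", "point": True},
        "encoding": {
            "x": {"field": x_field, "type": "temporal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }
    if title:
        spec["title"] = title
    return spec
```

`json.dumps(trend_chart_spec(rows, "taken_on", "ldl"))` produces a spec that can be pasted into the online Vega editor to check the result.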

> I'm building out the SaaS version of Datasette at the moment, maybe a free preview of that would help?

Maybe, though my issue hasn't been so much installing it as getting it to do useful things.

Your use cases sound great, thanks. "Snowflake for small data" especially resonates: I did want something like a simple Power BI for my small SQLite database. If I could somehow create a dashboard, or some views, or something, and publish them along with my database, that would be amazing.

Plugins sound good too, but when I download Datasette I already have something in mind, so I want it to come with the most common things I might need, along with examples of how to use them.

Vega was the plugin I found as well, but I remember it took me too long to figure out how to make a simple graph, so I quit at that point. If it were built-in, along with a bunch of other plugins, and I had an example of how I could create some simple graphs, I would have gotten much farther along, I think.

OK, I really like how you're framing this in terms of the first-run experience.

Running "brew/pip install datasette" isn't much help if you don't have a SQLite database already. So now you need to learn to use "sqlite-utils" or similar too to create one... by which point I could absolutely see most people having dropped off already.

I had a feature idea relating to this a while back which I should bump up the list: https://github.com/simonw/datasette/issues/1160

I opened a new issue based on this conversation here: https://github.com/simonw/datasette/issues/1845

Ah yes, that was another one of my problems. I had a CSV and had to figure out a way to convert that into a DB. I could have created a table and written an import script, but I think I did end up using `csvs-to-sqlite`. You're right, if Datasette could import my data from the CSV, even if the schema wasn't very reasonable, that would have gone a long way towards helping my first-run experience.
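Tools like `csvs-to-sqlite` and `sqlite-utils` already handle this, but the core of a "just import my CSV" first-run step is small. Here is a minimal stdlib Python sketch that infers INTEGER, REAL or TEXT per column and loads the rows into SQLite. The function names are hypothetical, empty cells become NULL, and it assumes a well-formed CSV with a header row.

```python
import csv
import io


def quote_ident(name):
    """Quote a SQLite identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def infer_type(values):
    """Pick INTEGER, REAL or TEXT for a column from its non-empty values."""
    kind = None
    for value in values:
        if value == "":
            continue  # empty cells don't vote; they are stored as NULL
        try:
            int(value)
            kind = kind or "INTEGER"  # don't downgrade an existing REAL
            continue
        except ValueError:
            pass
        try:
            float(value)
            kind = "REAL"
        except ValueError:
            return "TEXT"  # one non-numeric value makes the column TEXT
    return kind or "TEXT"  # all-empty columns default to TEXT


def csv_to_sqlite(conn, table, csv_text):
    """Create `table` from CSV text with a header row; return the inferred types."""
    header, *rows = list(csv.reader(io.StringIO(csv_text)))
    types = [infer_type([row[i] for row in rows]) for i in range(len(header))]
    columns = ", ".join(f"{quote_ident(n)} {t}" for n, t in zip(header, types))
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    convert = {"INTEGER": int, "REAL": float, "TEXT": str}
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})",
        [[None if v == "" else convert[t](v) for v, t in zip(row, types)] for row in rows],
    )
    conn.commit()
    return dict(zip(header, types))
```

Pass it a connection from `sqlite3.connect("data.db")` and the resulting file can be opened in Datasette straight away.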

Generally, the first-run experience is really important, because you may be getting lots of people in the funnel, but if you're losing them to complexity before they can get to the "this is nice" moment, it's for nothing.

Even something like finding plugins can be hard for first-timers, and you unfortunately don't hear about all the users who drop out, because they just stop using the software and never talk to you.

Have you thought about "Data-Intelligence-as-a-Service"? (a made-up mishmash of "Data Access" and "Business Intelligence")

I had not, that's an interesting angle - thanks!

"Datasette: an open source data warehouse you can run on your laptop.", or "like Snowflake, but for chubby data."

(I'll see myself out)
