Show HN: SQL workbench in the browser (sql-workbench.com)
117 points by tobilg 10 months ago | hide | past | favorite | 24 comments



I’ve been working on something similar to this, but using the WASM build of SQLite, which is amazing. But then I started implementing a small file browser for the Origin Private File System (OPFS) API, just so I could manage the database files when debugging, you see, and I got so thoroughly sidetracked that I only noticed once I had finished the multi-modal CSV file editor, which could show CSV files as sortable tables, using a streaming, web-worker-based parser no less. Shortly after, I found the whole thing way too bothersome and forgot about it completely until now.

It’s a dangerous business, creating side projects. Before you know it, you build streaming parsers :)


My version of this kind of thing uses the WASM build of Python, which includes a WASM build of SQLite, to run my Datasette server-side web application entirely in the browser. Here's a SQL query executed against a Parquet file of AWS edge locations: https://lite.datasette.io/?parquet=https://raw.githubusercon...


The SQL Workbench is built upon DuckDB WASM, Perspective.js and React.

It supports querying remote and local data (Parquet, CSV and JSON), data visualizations, and sharing multiple queries via URL.
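For illustration, a remote file can be queried by referencing its URL directly, and dropped-in local files by name (the file names and URL below are hypothetical, assuming the remote host sends CORS headers):

```sql
-- Query a remote Parquet file over HTTP (URL is illustrative)
SELECT passenger_count, count(*) AS trips
FROM 'https://example.com/data/taxi.parquet'
GROUP BY passenger_count;

-- A local file dragged into the browser can be queried by its name
SELECT * FROM 'my_upload.csv' LIMIT 10;
```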

There's also a tutorial blog post at https://tobilg.com/using-duckdb-wasm-for-in-browser-data-eng... that explains common usage patterns.

Happy to answer any questions!


Perspective is an awesome table library, very powerful and fast. I get the sense it's mostly used for internal finance applications and not much for web apps.


I integrated Perspective.js into Datasette as a plugin, as I was dealing with a larger number of rows.

It's not bad. The GPS-based map isn't as great as Kepler.gl, and for some reason Perspective doesn't work so well in corporate offices that may be using remote browser instances... I had some issues.


This is great, I've been meaning to build something similar for some time. I tried to run the queries from the tutorial but hit lots of CORS errors loading the datasets, is there any way that you have found to work around those?


I'm sorry, I think I fixed this. Can you check? Thanks for letting me know!


Sorry, it seems to still be happening, running:

    SELECT count(*) FROM 'https://data.quacking.cloud/nyc-taxi-data/yellow_tripdata_2023-01.parquet';
Shows the error:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://data.quacking.cloud/nyc-taxi-data/yellow_tripdata_20.... (Reason: CORS header ‘Access-Control-Allow-Origin’ missing). Status code: 200.

I get the same issue in both Firefox and Brave.

edit: actually, accessing that file directly gives me an Access Denied error



Yep, looking good now, thanks! Other datasets that don't have CORS headers still don't work, e.g. https://sql-workbench.com/#config=%7B%22plugin%22%3A%22Datag... but I'm guessing that's an inherent limitation of running this without a backend.


Yes, unfortunately if the "foreign" sources don't support CORS, you'd have to use a CORS proxy... If you want to self-host, there's one at https://github.com/Zibri/cloudflare-cors-anywhere that can be deployed to Cloudflare Workers (the code is a bit messy though).

GitHub supports CORS for raw data, for example; that's why I put it in the sample queries.
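With such a proxy deployed, a query would prefix the remote URL with the proxy endpoint, roughly like this (both hostnames below are hypothetical):

```sql
-- Route the request through a CORS proxy (proxy URL is illustrative)
SELECT count(*)
FROM 'https://my-cors-proxy.example.workers.dev/?https://data.example.com/events.parquet';
```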


This is interesting. Any chance you will open source it?


Eventually, at a later point in time! I'm currently integrating AI-based generation of queries…


Nice, I was working on something similar.

Using meta queries to fill my system prompt with the entire schema of the db in a condensed format.
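As a sketch, such a meta query could use the standard information_schema (which DuckDB exposes) to emit one condensed line per table; the exact formatting is an assumption on my part:

```sql
-- One condensed line per table: name plus "column:type" pairs
SELECT table_name,
       string_agg(column_name || ':' || data_type, ', ') AS columns
FROM information_schema.columns
GROUP BY table_name;
```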

It was working quite well!

Would love to hear about your approach.


Cool, my approach is basically the same: automatically put the schema in the system prompt, take the user prompt from the UI, and render the resulting SQL back to the UI.

The question is more where to host this "on the cheap", because this is a free service and I can't just spend hundreds of dollars a month to keep it running... Do you have any recommendations?


If you're open sourcing it just include a docker-compose in the root directory. I'm sure most people would like to self-host.

In general I don't think this would eat too many resources, just throw it on a VPS using systemd.


This is a relatively similar architecture to what we are building at Evidence.dev (an open-source data viz framework).

Architecture: https://evidence.dev/blog/why-we-built-usql/

1. Query SQL databases, APIs, or local data (e.g. CSV)

2. Compile all the data sources into Parquet files

3. Run DuckDB-WASM in the browser, which allows you to aggregate across sources

4. Users write code in DuckDB SQL and Markdown, enriched with viz components (built in Svelte)

Some things that we have learned in the process:

- It can be pretty performant up to about 20M rows of data in the Parquet files.

- Above a certain level, for speed, it's helpful to sort your data in your parquet files to take advantage of DuckDB's predicate pushdown

- It's helpful to map DB types into a smaller set of Arrow types when you convert to Parquet - otherwise you have to consider a lot of different cases in the browser when you render in JS

- DuckDB-WASM still has some rough edges, though is improving fast
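The sorting point above can be sketched in DuckDB SQL; the `events` table and `event_date` column are hypothetical stand-ins for whatever column you filter on most:

```sql
-- Write the Parquet file sorted by the most-filtered column, so DuckDB
-- can skip row groups via min/max statistics (predicate pushdown)
COPY (SELECT * FROM events ORDER BY event_date)
TO 'events_sorted.parquet' (FORMAT PARQUET);
```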


I love Evidence.dev, it’s a great tool!


We're really excited to see all the stuff built on DuckDB-WASM too!

I love how fast your workbench runs, and how effortlessly it renders 60k rows in the browser.

Some UX feedback, if you're open to it:

- It was initially unintuitive to me that I needed to highlight a whole SQL statement for Ctrl-Enter to run it. A hint that this is the correct shortcut would also help!

- I dragged in a CSV file, and the name was too long to show up in your table explorer, so I couldn't tell what the table had been called (it turned out I needed to `select * from table_name.csv`; the .csv postfix was unexpected)

- The CSV file had headers, but they were not auto-detected; it would be good to be able to configure this


Thanks for the feedback! Regarding the run-query shortcut: if you open the page, there's a comment banner at the top of the editor that mentions it. It's not the first time I've heard/read that people overlooked it, though.

What would be a more intuitive way from your POV?

I'll look into the horizontal scrolling/CSV header issues, thanks!


Hi Tobi; Hi Archie, we all know each other from Twitter.

Check out what Postico does: it highlights the statement under the cursor, and that's what gets run by default. Can't resist imposing this subtle SQL UX on everyone because it's been my favorite for like 10 years now.


Thanks Nico, a small world :-)

I'll look into this as a fallback method. I don't like Monaco Editor's handling of selections and cursors (it's quite complicated), but this should be possible to implement…


Aha, I did not notice that prompt. By default, my brain is trained to ignore things that are commented out in code.

I think I would execute the SQL statement that the cursor is within.

I.e. anything up to the next ;


Thanks, I'll look into that!



