Show HN: Query Your Sheets with SheetSQL (sheetsql.io)
128 points by tarasyarema 9 months ago | hide | past | favorite | 71 comments
Hello HN!

I've developed a tool named SheetSQL that lets you query, join, and export Google Sheets, and schedule queries against them, through a straightforward SQL interface in the browser.

This tool is a simple, first iteration of the idea, so I'm eager to receive feedback on it. You can contact me at taras [at] sheetsql.io :D. I'd love to find out if it could be useful for folks out there!

As someone working in fintech, I constantly deal with sheets and often find it challenging to perform seemingly simple tasks like JOINs natively. However, I'm familiar with SQL, which inspired the creation of SheetSQL. It's designed to assist those who use sheets daily and have some SQL knowledge, making operations across multiple sheets and worksheets as easy as if they were interacting with a Postgres database.

For those interested in the technical details, the engine powering the queries in the background is DuckDB. Therefore, you can expect support for all syntax from the latest version of DuckDB :)
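To give a flavor of what "sheets as tables" means in practice, here's a sketch of a cross-sheet join (the sheet and column names are hypothetical, purely to illustrate the ergonomics):

```sql
-- Hypothetical sheet/worksheet names: each sheet is exposed
-- as a table you can join, much like in Postgres.
SELECT o.customer, c.segment, SUM(o.amount) AS total
FROM "Orders2024" AS o
JOIN "CRM!Customers" AS c ON c.customer = o.customer
GROUP BY o.customer, c.segment
ORDER BY total DESC;
```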

Cheers,




Very cool! I built a company around SQL and Google Sheets (https://www.seekwell.io/, acquired by ThoughtSpot in 2022). Plenty of opportunity there.

You get a lot of this for free in duckdb. e.g. if you run the query below from the duckdb CLI, it'll query the sheet[0]:

    SELECT *
    FROM read_csv_auto('https://docs.google.com/spreadsheets/export?format=csv&id=1GuEPkwjdICgJ31Ji3iUoarirZNDbPxQj_kf7fd4h4Ro', normalize_names=True);

We use duckdb under the hood at my new company (https://www.definite.app/), so we get this same sort of functionality for free.

0 - https://twitter.com/thisritchie/status/1767922982046015840


Yeah, the potential DuckDB offers for operating on data this easily is actually super nice. With a bit of tuning on the auth side it really is that easy.

I think it is a pain point for some people, and it's great to see other teams working on similar products and being based on the same tech.


Don’t you also get this functionality from a combination of BigQuery, Connected Sheets and ImportRange nowadays?


You could probably do something similar; the thing is that DuckDB is much more lightweight. But at the same time, if you end up with billions of data points, BQ is there to scale for sure, haha.


Nice! I built https://sql-workbench.com, which is also based on DuckDB WASM and can query Google Sheets directly from the browser.

Here's an example for querying a Google Sheet:

https://sql-workbench.com/#queries=v0,SELECT-*-FROM-read_csv...


Indeed the tool is great, and the performance is amazing in the browser. I believe DuckDB has a ton of potential!

The same query in SheetSQL:

https://ibb.co/dDdZ0vC


Readers may also enjoy Steampipe [1], an open source CLI to live query Google Sheets [2] and 140+ other services with SQL (e.g. AWS, GitHub, etc). It uses Postgres Foreign Data Wrappers under the hood and supports joins etc across the services. (Disclaimer - I'm a lead on the project.)

1 - https://github.com/turbot/steampipe 2 - https://github.com/turbot/steampipe-plugin-googlesheets


Oh wow, I did not know about this, thanks for adding it. Curious: how do you handle live updates of the data upstream? Does it refetch the data each time?


Steampipe does live queries against the API endpoint and stores results in an in-memory cache. You can modify the cache expiration to adjust that behavior across queries [1]. If you want to do change data capture to act on results then check out the query trigger [2] in our Flowpipe open source project [3].

1 - https://steampipe.io/docs/guides/caching 2 - https://flowpipe.io/docs/flowpipe-hcl/trigger/query 3 - https://github.com/turbot/flowpipe


Very interesting. Seems conceptually (and implementation-wise, too) quite similar to https://trino.io/docs/current/connector/googlesheets.html.

Where are you storing the structure of the sheet though? Is it re-evaluated on each query? I am thinking about cases where a column header is renamed or the data types in a column are changed?


Oh, that's actually cool and thanks for the reference.

It uses a temporary cache that reuses the latest data from the sheet (checking whether it was updated), which is why subsequent queries should be much faster than the initial one.

Now, about the headers: it's indeed an issue I need to figure out, as some sheets have weird structures, and it would be nice to find an easy, UX-friendly way to preview the sheet data and say "select from this range" or "these are the headers".


Google Sheets has an extensibility system: you write a React app that renders as a sidebar inside the sheets UI. This exposes a few additional API endpoints vs the vanilla sheets API. For example, you can determine the selected range, when a user has edited a value, changed sheets, etc. We use this in our add-on to help people select a range. Sounds like it might help for you, too!

The extensibility system also lets you show a modal dialog, which we use for previewing API calls to third party HTTP servers -- sounds like a similar thing could work well for you, too.

You could also look at supporting named ranges if you don't already, so people can refer to a range as `FinanceData` instead of `Sheet 1!A3:F90`
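In SQL terms, a named range would read quite naturally as a table name (hypothetical, assuming SheetSQL exposed named ranges this way):

```sql
-- instead of SELECT * FROM "Sheet 1!A3:F90"
SELECT * FROM FinanceData;
```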


> it would be nice to find an easy, UX-friendly way to preview the sheet data and say "select from this range" or "these are the headers".

Indeed. Defining the UX when the only UI you have is a SQL query is a bit challenging.

Very impressive project indeed and solves a real problem a lot of people and companies have. I wish you luck.


Will keep the site posted on this, but it's definitely one of the things that worries me the most right now.

Thanks for the comment!!


I've used Google Sheets as an interface at every company I've worked at, and in every case it's made me a hero to all the non-tech employees. Finally, a UI they understand!

Often, previous tech people had to build internal CRUD apps with Byzantine interfaces that trafficked in JSON or YAML files.

But being able to make changes to our app, or import/export data, and all via making changes to a google sheet—this feels like magic.


How do you deal with the validation problems that crop up, the data entry problems that crop up, and the impedance mismatch between a traditional datastore and the sheet?


Google Sheets actually has a pretty robust data validation setup (up to regular expressions), and most business users are fairly good at handling detailed instructions.

But yes, for the inevitable times when they get it wrong, you just need some robust alerting system. Having it email/text me when anything goes wrong turned out to still be easier than writing our own CRUD app or fancy interfaces.
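For reference, the regex-based validation mentioned above can be set up in the Sheets UI as a custom-formula rule; a minimal sketch that requires column A values to be all digits (the range and pattern are just examples):

```
=REGEXMATCH(TO_TEXT(A2), "^\d+$")
```

Applied as a data validation rule over a range, this rejects (or warns on) any entry that doesn't match the pattern.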


I am really glad that worked! I am pretty paranoid about the downstream breakage of things but I probably just need to let go a little more.


Nice, actually I think the best target for this would be people who power-use sheets and have some SQL knowledge. I see teams that would eventually develop tools or use data lakes etc. instead of sheets, but as a first tool to quickly operate over the initial startup-sheet drama, I think it could be powerful :D


It's almost always required. The API for Sheets and its documentation isn't great. When I dug into Sheets to build my receipts tracker, I found myself having to translate their HTTP REST API docs into Golang, which was difficult since the Golang client has special objects for many kinds of requests you'd make in Sheets.


If someone wants to try the PRO version, here's a promo code to use it for free for the first month: "PGRAHAM". It's limited to the first 50 uses :D


There is a similar open-source project for Google Sheets: https://github.com/0x6b/libgsqlite (a SQLite extension), which is a fork of https://github.com/x2bool/xlite (same idea but for Microsoft Excel spreadsheets).


Very cool! Here's a pattern I've used in the past for integrating with Google Sheets using rclone[0] and PowerShell[1]:

    rclone config # set up a new config with the "drive" backend
    rclone copy drive:file.xlsx . # copy the file from the remote to local storage
    $Data = Import-Excel file.xlsx # using the ImportExcel[2] module, read the file as if it were a CSV
    $MyData = $Data | Where-Object { $_.ColumnName -eq "MyColumn" -and $_.MyOtherColumn -match "\d+" }

[0]: https://rclone.org

[1]: https://learn.microsoft.com/powershell

[2]: https://github.com/dfinke/ImportExcel


I’ve solved a similar set of problems in my current role by piping all of our important sheets into BigQuery, saving/executing the SQL logic there, then making the result available as views that can be sent back to Sheets through Connected Sheets (or practically anywhere else eg Tableau via native connectors to BQ). For small datasets that fit in spreadsheets, the storage and query usage in BQ is practically free.

The trick (and I assume you’ve found ways around this) is dealing with the flaky Sheets API that throws a 500 error what feels like 1% of the time, and also deciding when/where/how to enforce the SQL data types and column headers. I made a config layer for the latter... the interface to which is also a Sheet, lol.
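For anyone wanting to replicate the Sheets-to-BigQuery side of this, a sketch of the external-table DDL (the dataset, table, columns, and URL are placeholders; check the BigQuery docs for the exact options your setup needs):

```sql
-- Expose a Google Sheet as a queryable BigQuery external table.
CREATE EXTERNAL TABLE my_dataset.finance_sheet (
  status STRING,
  amount FLOAT64
)
OPTIONS (
  format = 'GOOGLE_SHEETS',
  uris = ['https://docs.google.com/spreadsheets/d/<sheet-id>'],
  skip_leading_rows = 1  -- skip the header row
);
```

Views built on top of such tables can then be pushed back to Sheets via Connected Sheets, as described above.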


I've been using Google Sheets as a data store for some of my latest projects, and it's really (mostly) delightful.

https://www.wheremoneygo.com barely has a database, and besides protecting the Google access keys it has minimal privacy/security risk.

The main risk, and something I've only seen once, is a corrupted database on the Google Sheets side. Since I don't want to keep backups of people's sheets, if something goes wrong on the Google side, the entire database could be lost.


Very interesting, good to see such use cases around Google Sheets; much needed. I'm building an open-source Reverse ETL tool to run SQL queries from a data warehouse and send the results to Google Sheets and many other destinations. (https://github.com/Multiwoven/multiwoven)

All the best :)


Cool! How do you deal with irregularly shaped data? For instance if I have two levels of headers, the first of which spans multiple columns via merged cells?


As I replied to @hashhar, the header thing is indeed an issue I need to figure out, as some sheets have weird structures, and it would be nice to find an easy, UX-friendly way to preview the sheet data and say "select from this range" or "these are the headers".


Note though, I think vertically merged cells should probably not be an issue, but I'd need to QA that, haha.


This would be interesting to use not as an external UI but rather completely inside of Sheets. Maybe as a function or something like that.


Google Sheets already has a built-in QUERY function that'll do basic SQL.
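For anyone who hasn't used it, a minimal example of that QUERY function (the range and column letters are hypothetical):

```
=QUERY(Data!A1:C100, "select A, sum(C) where B = 'Complete' group by A", 1)
```

Note that it uses column letters (A, B, C) rather than header names, and its query language is only SQL-like, not full SQL.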


It is better than messing around with VLOOKUP (what isn't?), but only a tiny subset of SQL is supported (with dedicated syntax). It also claims to support actual human-readable column names, but as far as I know that doesn't work and you have to use column labels. Also, no joins.

Still, if it were to be extended to a full query language with good ergonomic it would be amazing.


Indeed, you can run queries in a sheet, but

1. It's writing a query in a single-row cell, which is hard from a UX perspective.

2. SheetSQL offers a way to cross-join sheets as if they were tables, even across worksheets, which is not supported natively.

3. Full DuckDB SQL support, so you can use advanced features like PIVOT or window functions.


This is not exactly true: QUERY's pseudo-SQL is also available as a separate API not listed in the Sheets docs: https://developers.google.com/chart/interactive/docs/queryla... I used it as a DB for Cloudflare Workers; the only hard part is Google Auth.


Yeah, actually I'm thinking of making an extension that you could use from the sheet page. Do you think that would make sense?


Can do this for free via GCP BigQuery console.


Is this something that can be done without being inside Google? And it wouldn't be free in the end, right? Since BQ isn't free in the end either...


Needs more docs and more examples. Either I'm dim or I'm getting the syntax on a simple SELECT..WHERE wrong.


Could you email me maybe some of the issues you encountered so I could help and find a good way to add the docs in the page?


I'd rather do it here so everyone can join in.

I'm now getting a quota error despite not using it all day:

> Could not run query

> Error: Monthly free usage of 100 exceeded,

It's saying I've used 188 out of 100 which is odd. I haven't used 188 - more like 15.

Earlier I was trying a query like:

> SELECT * from <table> where Status="Complete";

And it was complaining that Complete wasn't a valid column... (rather than Status...) I also had weird issues trying to replace * with an actual column name. The autocomplete went crazy.

Not sure if it makes a difference but I'm a stubborn Firefox user.


Ah, I see, could you try now? The quota should be fixed, my bad there, had a bug.

Now, about the query: DuckDB works in the following way; the query you sent should be written like

> SELECT * from <table> where "Status" = 'Complete';

This is because `"` is used for table and column names, and `'` is used for raw strings. See more here: https://duckdb.org/docs/sql/introduction#querying-a-table which might be a bit different from other SQL database engines.

And finally, regarding the autocomplete: it indeed tries to autocomplete in too verbose a way, which is something I'll try to improve.

Thanks for reaching out, and hopefully this can help you try it out!


Yes. Obvious in hindsight but - this underlines my point that a few examples would make life a lot easier for people casually trying it out.


Totally. In fact, I think it would be nice to have a sort of interactive example based on one of your sheets, definitely! Thanks for the feedback.


I get a certificate error (invalid authority). When I proceed: FortiGuard Intrusion Prevention - Access Blocked.


Talk to your corporate IT. They are not mitm-ing you transparently enough.


It's strange. Never had this before on any other site. It's not even corporate IT, it's just the building's internet connection.


Could you reach out to taras@sheetsql.io with a screenshot or something, so I could have a look at what could be going on?


Yes, I reached out.


Fixed


That's pretty cool. So do they load your sheet into their database from which you write SQL?


No, it's done dynamically over the data; there's no loading or anything. In fact, it streams the queries with the power of DuckDB :D


Looks great!

Is there a rate limit?

Will changes to the sheet instantly go live? Or is clearing the cache a manual process?


I'm bound by the GCP Sheets basic read quota limit, which is something I need to investigate whether I can raise.

Also, right now, once you use a sheet it caches that version so that subsequent queries are faster, and in theory, if you change the sheet, it will check for that and use the live data if it was updated in the meantime.

Nevertheless, it might take a minute or so sometimes.


This looks awesome, nice work!


Very cool! I am often frustrated by the lack of SQL features in Google Sheets.


Can you tell me why this is better than using Power Query in Excel?


I guess this is more focused on Google Sheets and its limitations, i.e. people who basically don't work with Excel files.


great work on the first iteration!


Thanks! It's my first time deploying a project like this on my own, so I'm nervous to see how users use it, or whether they do at all, heh.


Is the site MKDocs?

edit: looks like sphinx


:D Actually completely accidental, as I'm using Chakra UI for the frontend part.


I would make the mid tier limited to 10k (100k?) queries and add one more plan.


That is an interesting point you're raising. I think right now the point of the paid plan is the scheduled queries, which are the most compute-expensive; that's why the pricing is like this.

Now, I do believe I may just increase the free allowance, and if I see people who want to power-use normal queries and also have a few scheduled ones, maybe add a mid-tier plan (or even on-demand pricing) that allows more queries and a few schedules.

And eventually make the PRO plan a real PRO plan.


Why's that? I'm terrible at making decisions on pricing, so I'm curious about people's thought processes.


There's some weird nudging psychology that says that when people see an expensive and a free option, the mid tier seems even more appealing, which makes it easier to get them paying.

Honestly, I wouldn't worry about it. If your tool is useful to people, I think your query limit on the free-tier will be enough.


Probably some kind of psychological pricing strategy.

However, charging for queries would be like ChatGPT charging in tokens — yes, it's understandable, but non-intuitive. And very product-, not user-driven.

I'd rather keep it simple in terms that a user can understand. Really like how you set it up right now — makes it easy to say yes.

The _enterprise_ game is the one where you will make the real money: Teams, Management, Integrations, Access Rights, Billing cycles, Onboarding, etc.


There will be that one user who makes a gazillion queries just because they are unlimited.

Or the second tier should still be capped. But that looks wrong from a marketing point of view.


You can just import your CSV file to SQLite without any limits, e.g.:

    sqlite3 :memory: -cmd '.import -csv file.csv table'\
    'SELECT col, COUNT(*), AVG(total) FROM table GROUP BY col'

More: https://www.sqlitetutorial.net/sqlite-import-csv/
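The same import can also be scripted with Python's standard library, which makes it easy to control the column types yourself. A minimal sketch (the inline CSV stands in for the `file.csv` above and is purely illustrative):

```python
import csv
import io
import sqlite3

# Hypothetical inline data standing in for file.csv.
raw = "col,total\na,10\na,20\nb,5\n"

# Parse the CSV into a list of dicts keyed by header name.
rows = list(csv.DictReader(io.StringIO(raw)))

# Create an in-memory table with explicit column types,
# then bulk-insert using named parameters matching the headers.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col TEXT, total REAL)")
con.executemany("INSERT INTO t VALUES (:col, :total)", rows)

# Run the same kind of aggregate query as the CLI example.
result = con.execute(
    "SELECT col, COUNT(*), AVG(total) FROM t GROUP BY col ORDER BY col"
).fetchall()
print(result)  # [('a', 2, 15.0), ('b', 1, 5.0)]
```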


The problem is you need to create a table with the correct data types first, which can be time-consuming. Additionally, the CLI can be a bit cumbersome to use, especially if it involves querying multiple tables or using joins. I don't want to point to the infamous "dropbox" comment [1], but it's important to understand that many folks just want a friendly interface where they can put in a URL/file and start querying.

This is the exact reason I built TextQuery [2], which solves a similar problem as OP's. I just wanted a simple app where I could drop a CSV and start SQL querying it, without creating schemas or writing code/commands.

[1]: https://news.ycombinator.com/item?id=8863

[2]: https://textquery.app/


In the link I posted there is a GUI way of importing CSV files using the open-source SQLiteStudio: https://sqlitestudio.pl. Your TextQuery also looks nice, although it's a pity it isn't open source.


It is indeed an option; I guess the problem with this is that it's static, and you need to re-create the tables each time.

Someone more techie could probably make it work, as you could dump the data to SQLite and then query it. But if you want it dynamic, and from Sheets in real time, it would probably be harder to handle this way.



