Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: SpreadsheetDB – A database that you can query with spreadsheets (spreadsheetdb.io)
144 points by Clex on March 28, 2017 | hide | past | favorite | 42 comments



It's a little known fact (is it?) that you can use a SQL like query language on Google Sheets.

You can query across different tabs on a single sheet, or query a totally different sheet!

https://developers.google.com/chart/interactive/docs/queryla...

https://support.google.com/docs/answer/3093343?hl=en


This feature makes me think, "When all you have is SQL, everything looks like a relational database."

I recently confused some co-workers when they examined my Google Sheets query - not knowing Excel very well, I just wrote SQL-ish. Now they have to live with my SQL-ish spreadsheet or figure out how to redo it with Excel functions.


How about "when all data is fundamentally structured, a Structured Query Language will be highly useful in many cases?"

Seriously, we live in a universe where everything is composed of lots of smaller things that exist in relation to each other. Relational Data isn't so much a database technology as it is fundamental information theory.


Let me preface this with: I'm a huge proponent of SQL when appropriate. I understand why Cassandra et al were required when generating dynamic live newsfeeds for each user in an ecosystem when the RDBMS query plan was pretty much designed in a naive non-uber-scalable-fashion.

That being said - let's take a field which changes fairly rapidly, all things considered - medicine. You have two forms of changes which can occur within your data model -- (a) definition changes where your schema stays the same effectively and (b) paradigm changes. Let's say you're with a patient and need to code in a ICD[1] value for whatever malady he has. A crude analogy would be a comparison between the Dewey decimal system and the ICD. As the ICD evolves, new codes might be added but the fundamental structure remains the same. So, a new database paradigm TotallyAwesomeRuleBasedImmutableDB pops up, and you just use the pre-existing code for "Databases" (say 20) and appropriate the next available integer in that set for TARBIdb(lets say, 73). No problems there. Essentially, we just inserting a new value (73, or added a new "smaller thing") into the set of Databases to universe U_1 by without violating any invariant properties. Consider (a) a non-issue.

The problem occurs when you need to address the issue of your patient being an alcoholic. Previously, this was referred to as trunksucht (loosely translated as: "dipsomania"). This would fall under the 305, 303, 291, or 980 codes depending on if you were say, a non-drinker with no psychological cravings for alcohol who pledged for a frat, drank too much and developed alcohol poisoning as a result of that. If you were a daily drinker exhibiting physical withdrawal symptoms, you would be coded from an entirely separate section. From the ICD-9 to ICD-10, the WHO reclassified this entirely (such that either patient would be coded in to an entirely new category 'F10').

This breaks a lot of things, from the front-end GUI which would do validation with a regexp [0-9]{,3}\.[0-9]* or some such which is trivial enough I suppose, to the stored procedures which then have to get vendor approved schema changes, all of which will be rolled up and sent to your IT guys who have to go through tedious QA to make sure everything works. (And I mean literally every modal dialog, every validation, every external XML-RPC call to bill out your patients insurance company, Universe U_2).

I've read Codd's paper, the entire ANSI SQL11 standard, and have a fair grasp of both major vendors. There is nothing in there about gracefully dealing with these situations. I suppose one could argue that an RDBMS' purpose is solely to act as a data-store while retaining availability and data integrity -- that is, U_1 is all your Oracle instance cares about (manage your tablespace's set of schema objects and the interaction between U_1 <--> U_2 is completely out of the scope of it's set of responsibilities). Maybe so, but when Oracle added the XMLType with full XPath selector capabilities in 9i (give or take?) they certainly co-opted the responsibility to deal with changes of a structural (that is, "of type 'b'") nature. Dynamically typing, weakening of type systems, writing to an 'duck typing'), serializing via the Protobufs methodology, etc are all just band-aids; we need a surgeon to go in there with some surgical sutures.

[1] https://en.wikipedia.org/wiki/ICD-10


Relational calculus is pretty fundamental. A better quote might be "Every application that uses complex data has buggy hand coded relational database plans it in."


If you work with data, you should learn basic SQL. It's a lot more useful and safe than a tangle of Excel functions.

The relational model was a real breakthrough. It really does describe, generically, any data structure you may have. Folks who adopted NoSQL solutions are now learning this the hard way


Also little known, extremely powerful feature of Google Apps is full-on scripting: https://developers.google.com/apps-script/overview

With urlFetchApp you can use it as an API runner: https://developers.google.com/apps-script/reference/url-fetc...


It's a little known fact (is it?) that there are database functions in Excel! You can calculate simple aggregates like count and sum of records subject to some constraints. Example: https://support.office.com/en-us/article/DAVERAGE-function-a...

These functions were available since Excel 2.0 but many people don't know they exist


Excel's filtering syntax (e.g. SUMIF) and this function are obtuse, difficult to use, and very simplistic just compared to a WHERE clause on a sql statement. And this ignores other very powerful sql features like pivot and group by that are exceedingly difficult for excel to produce.


You just described pivot tables, which are well-supported by Excel and Google Sheets.


Sumifs and countifs are far more powerful (see my other comment) I no longer use d functions since these where added in office 2013


High five to you for adopting (did you?) my (is it?) quasi-half-assertion.


You could also query Excel with SQL; the ODBC driver is part of Windows. http://stackoverflow.com/questions/15844633/using-excel-as-a... gives an example of how to use. Not the best database in the world, but sometimes "SQL can save ya".


I often wind up needing kind of the opposite of this: a spreadsheet as a view on a complex SQL query I've written, so that folks can manipulate the results themselves.

A quick way I've set this up with Google Sheets is via IMPORTDATA(), combined with a url that dumps a csv of my query results. That way people can do all sorts of analysis and manipulation in a different tab while being able to refresh the underlying data. https://support.google.com/docs/answer/3093335?hl=en


Wow. This is what I've wanted for years. Amazing you can even write a query into a URL string.


You might like (love!) this: "Programming in the URL string"

http://apenwarr.ca/log/?m=201212#18

http://afterquery.appspot.com/help


And BigQuery!


Other posters have mentioned Microsoft's Excel 20-year-old ODBC support [0]: you can connect any database as a source of data, then create pivot tables for live ad-hoc transforms and summarizations of a database. This is a separate feature to connecting to an Excel spreadsheet using ODBC.

I showed this to my manager and he immediately demanded I explain to the rest of the team how this black magic worked…

[0]: https://support.office.com/en-gb/article/Overview-of-connect...


ODBC is over 20 years old.

I was using ODBC from MS Access to query Oracle databases in the early 90s - both ODBC and (preferably) pass-through queries. The entire system included then generating (recursive) reports in Access, passing the data to Excel to integrate with other data and input, opening and updating a Powerpoint file with linked tables, saving the PPT and returning to Excel for the next in the series.

All driven by VBA. All in 1995, maybe 1994 too.

I had to restart Windows after each run of 20 or so, and had two computers as one was always tied up running jobs. Kludgy but at least I wan't writing COBOL.

Here is some 1994 PR: http://archives.cpajournal.com/old/16531700.htm


Was going to post about this. The ODBC support is actually pretty awesome, but it is a pain in the ass to get working on some databases. If you want to connect to PostgreSQL, for example, go through these steps and enter the bowels of Windows:

http://www.dashbay.com/2011/03/working-with-postgres-on-wind...

You can also use the ODBC in LibreOffice Calc to similar effect. You can also connect to the database with MS Access and LibreOffice Base. In my opinion, it is much easier to make an intuitive interface for a non-tech user. SpreadsheetDB seems to want curl and JSON requests, which is never going to be used by a non-tech.


The idea is very nice, but I don't like the fact that it is just an API for calculations. That kind of defeats the purpose of a spreadsheet format.

At first I thought it was an actual spreadsheet application, or an Excel plugin or whatever.

I tried multiple times to create a spreadsheet app that would speak to JSON APIs easily from within cells. The problem is that writing a spreadsheet app from scratch is not easy (one of my tries: http://sheets.alhur.es/), but when that is done it is not hard to integrate it directly with PouchDB, CouchDB -- or any other database or application, actually, with a custom integration.


You might want to look at https://github.com/handsontable/handsontable for the front end. You could probably mash it up with this to build what you describe.


Cool idea, but that logo just reads as "preadsheet", which, is not exactly positive.


Good call out. I just want to add that the issue is exacerbated on a mobile phone. iPhone 6, portrait mode, the "S" and "preadsheet" are even on different lines!

The linewrap goes away when I rotate to landscape mode.


Thanks for the feedback, that will get fixed soon.


Also, you mis-spelled "guarantee" under "99% uptime"...


Fixed, thank you.


I'd love to see this concept taken a step further and be able to write back changes (within reason, of course) to a SQL database.

The piece I'd most love to see is some form of editable Pivot Table, where the changes get written back to the appropriate place in the database. For example:

    CREATE TABLE Balances(Account varchar, Scenario varchar, Amount numeric, primary key(Account,Scenario));
    -- Pretend that this is pre-populated with all Account and Scenario combinations
Using this schema, I would want to get a Pivot Table out of this where Account makes up the row labels, Scenario forms the column labels, and the Amount values are the data. Where the magic happens is that I want to be able to make changes to the amounts and have it write those back into the database with the appropriate UPDATE statements.


Be prepared for a bill from your DBA for his/her blood pressure medication then... or the accounts team when they realise that a missing conversion of a string to a numeric caused you to zero out every customer's account balance... ;) (Based on real world experience with my clients over 3 decades)

Even 20 years ago good old Microsoft with their MSQuery capability in Excel stopped you from doing this every way they could.


Depending on how it's done it can be perfectly safe.

I've actually built a thing, in Lazarus of all things, to do this very thing. Unfortunately it's very specific to my use-case, but it is "safe" in the sense that it doesn't screw things up too badly. My particular use case is actually very spreadsheet-like, but I also want some things that are a lot easier to do in SQL than in Excel.

My use case is to have a tool to help me evaluate monthly financial statements for a smallish organization, enter my own adjustments, and come up with a final statement.

It's spreadsheet-like in the sense that I enter everything into a grid of rows and columns. The rows are accounts (in the income statement sense, not customer accounts), columns are a "type" (e.g., actuals, my adjustments, adjustments made by the auditor, adjusted actuals, budget, budget vs actuals, etc). Some accounts and some types are calculated, thus read-only in the table. There's also a drop-down at the top to select a period.

Basically, it's everything that a basic Excel Pivot Table does, but database-backed.

My Lazarus program lets me fill in that table however I want, when I hit save it persists it all to the database and runs a stored procedure to recalculate all of the calculated values. The database makes sure that I don't try to put a string in a numeric field, and it (PostgreSQL) is smart enough not to convert strings to zeroes.

The thing that makes this better than a spreadsheet is that I can just add a new account/scenario/whatever and run a stored procedure to insert the missing rows in the data table. When I did a similar thing with Excel, it was easy enough to insert rows. The hard part was inserting all of the rows everywhere they needed to go (I had a tab for budget, a tab for actuals, a tab for adjustments, etc) and updating all of the formulas.


I created something simpler but based on Google Docs and just retrofitted promises into it: https://github.com/franciscop/drive-db


Cool idea. We've built a spreadsheet-powered web platform (Cloudstitch - YC S15) that lets you publish web forms and widgets that rely on spreadsheets as a backend.

Several folks use us just for the spreadsheet API -- as this service seems oriented. We provide a simple, common API that can talk to both GSheets and MS Excel 365, support various forms of row-level access control, and Ruby on Rails-like join syntax. Would love to hear feedback.


I was an Excel expert before becoming a sql expert. Following this I changed the way I created large excel models, now 90% of my data is in Excel tables. The main benefit of these is they are much easier to use and manage than range names and the ranges automatically change their size to contain the data. Wish ms would add unique row ids (as we'll as the unique column ids they have) is a bit cumbersome having to setup index match everywhere to link rows in different tables. It's amazing what you can do with sumifs and countifs these are far more powerful then the sumif - very similar to sql where clause, with multiple criteria such as <= , works with text and numbers. I used to use tons of pivot tables but no longer, they are too unstable and confusing and very hard to format nicely.


Any good resources you used to learn this (to replace pivot tables)?


pretty cool - I have been a fan of tools like Airtable.com (IMHO the best cloud spreadsheet I have seen). In my own startup, I have hacked together a version of a cloud spreadsheet using Handsontable + reactjs + flask api.

Can you talk about the stack you are using ? really interested to hear about it.


Thank you for the kind words!

Airtable is really neat indeed. But SpreadsheetDB aims to be more a database with an alternative paradigm than an actual spreadsheet editor.

Regarding the stack, it is made with Go and Postgres.


Not sure what pain problems this solves. You can already programmatically access and manipulate Google Sheets.

And no, you are not going to run in to their upper bound limitations on rows and columns often.


Aka Access?


SSSh - we like reinventing the wheel in web based form!


Or Excel! It's been able to backend to various RDBMSes for at least twenty years now. Ripe for disruption, no doubt.


Somewhat related. I've been using AirTable (https://airtable.com/) to manage my consulting startups clients and even track hours. Absolutely loving it.

You define tables and scheme in their beautiful excel like web interface. AirTable then automagically generates a RESTful API you can build on-top of. They even have a great Node.js ORM[1] library. Finally, you can export public embeddable forms for inserting data into tables.

I'd love to see somebody create a React based front-end that pulls data and schema from AirTable and automatically creates a CRUD interface like Rails does. All inclusive authentication (Twitter, Facebook, GitHub) and user level permissions.

[1] - https://github.com/airtable/airtable.js


There's some related research out of UIUC that's pretty neat. http://dataspread.github.io/




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: