I looked around and Google has a patent on it, and seemingly no public implementation. This makes me super sad because I honestly considered Sets one of the most useful exploration and ideation tools ever created.
BigQuery in Sheets, cool I guess? I want Sets back.
Here is Google's Patent:
I put in: fat obese porcine
Stick it on a website and then the rest of the world can use it too!
The way the paradigm and learning curve work just seems to work for a lot of people. I reckon it's still the most popular platform for coding, data engineering, analytics & data science... using liberal definitions for all those terms.
It also seems like something that's in Google's wheelhouse.
> "All enterprise software competes with Excel.
> All productivity software competes with emailing things to yourself."
You'd think we would have come up with better answers. I'm constantly looking for ways to better manage bug tracking, todo lists, and even managing job searches and professional networking, because I'm still not really happy with anything I've found.
As I bounce back and forth between different tools, I always land back on just using Excel or Google Sheets. I'm never entirely happy with those either, but I find that they're no worse at the job than the dedicated tools.
It's a sort of tribute to users-know-best and an "I made it so I understand it" mentality. Spreadsheets are a rare example where people make their own tools in a modern work environment, where most tools are highly politicised.
For me, I'd like to see spreadsheets improve at the tasks they're doing, not hand over to some cookie-cut app procured by management.
re: spreadsheets and recursive publics ;)
Actually, I should phrase it a different way. For me, Notion competes with OneNote, and for my personal use they're somewhat comparable. Notion doesn't take the spot of Excel/Sheets in my toolkit because Excel/Sheets can be gradually automated out of initially fuzzy requirements.
My main issue is that the specialized tools always impose some sort of opinion on how ____ should work that is close to mine but different in impactful ways. Case in point, JIRA. With how infinitely customizable the columns are, it might as well be a spreadsheet with links to email threads.
That's selling JIRA short on how much the UX tries to do for you, but I'm intentionally making a point about how much overhead there is in paperwork generated by the fact that you're using JIRA. When I compare how much paperwork JIRA saves me vs. how much paperwork JIRA generates, I often find, even as a PM, that I come out behind. As a developer, I pretty much never feel like I'm coming out ahead on time.
I don't mean to pick on JIRA specifically because every bug tracker I've used has this problem, but JIRA was the tool I had to use for work for the longest time, so I wanted to stick to what I experienced firsthand.
I always think "people built cathedrals and pyramids without all this fancy software, it isn't the key to getting things done".
On a bit of thought, I find it useful to organize and break down things and maintain lists (the daily/weekly list habit works best for me currently). This is different than emailing things to myself because I maintain an actively curated list that's hopefully logically organized in big and small tasks.
On the enterprise software front, I am a Linux-friendly software engineer, so command-line tools are my best friend. I am not sure Excel can ever beat the productivity of vim + Linux tools, let alone other specialized tools like SQL. One thing to note is that a lot of the time I deal with production data, so it's a huge effort to actually move data to Excel.
And I have found keeping a todo.txt file on my desktop is still better than any productivity software. Although, these days, it's a symlink to a cloud-storage folder on my various machines.
I've done some tech consulting work and we found a way to deal with this, even at a "traditional" SME level.
The answer:
Enter Excel (not CSV) -> decode the data with SheetJS into a web app + web frontend for "make the stuff happen" -> export to Excel with SheetJS (not CSV).
My 2 cents: everyone uses Excel because everyone uses Excel. The workflow above doesn't raise any barrier for the client, and he's happy with the time gains.
Ideally I'd like a Python package that I can point at an .xlsx file and it would give me back an object graph that replicates the values and calculations of the workbook.
And really I don't mind as some of those users are really good at it.
There is a lot of painful Excel abuse in business.
I didn't understand why finance liked them so much and I just looked upon finance groups as boring old finance nerds.
Man was I wrong... I love Sheets and spreadsheets in general.
I've literally spent hours in Sheets running financial analysis of our operational stack and saved thousands and thousands of dollars a month by analyzing our hardware spend.
I don't think it happens as much any more but the same analogy could be used for shell scripts. You may have a process that a couple shell scripts are useful for, but they can also metastasize to the point where only the original developer (if even) can maintain them.
Similar to MS Access or other low code environments, it is great until it is absolutely unacceptable due to system limitations.
Spreadsheets are GUIs for people who prefer that to shells.
I didn't say anything about that.
You said spreadsheets are like shells for finance people. I was just pointing out that shells for finance people exist and that most prefer not to use them.
Notably, the Data Connector has a row limit of 10,000 results, while the Connected Sheets claim to surpass the regular row limits of Google Sheets in general.
As a disclaimer, I'm an engineer on the BigQuery team (but not BI Engine specifically).
Idiomatic BigQuery will make use of partitioning, so that a large dataset will span multiple tables in a way that you only read the tables of interest. (E.g. the Google Analytics integration partitions by date, so reporting on 1 month of data will only read 30 tables out of the 1,500 you might have.)
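To make the "30 tables out of 1,500" point concrete, here's a rough Python sketch of how a one-month report maps onto daily tables. The `ga_sessions_` prefix, the `YYYYMMDD` suffix format, and the dates are assumptions for illustration only; check your own export for the real naming.

```python
from datetime import date, timedelta


def daily_tables(prefix: str, start: date, end: date) -> list[str]:
    """Return one table name per day in [start, end], e.g. prefix + '20240401'."""
    days = (end - start).days + 1
    return [
        prefix + (start + timedelta(days=i)).strftime("%Y%m%d")
        for i in range(days)
    ]


# Assumed naming scheme: one table per day with a YYYYMMDD suffix.
tables = daily_tables("ga_sessions_", date(2024, 4, 1), date(2024, 4, 30))
print(len(tables), tables[0], tables[-1])
```

A report over April only ever names these 30 tables, so everything outside that range is never read.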
Especially with column-based querying and partitioning.
I can't wait to see how Google's flavor of it for Sheets works.
It's officially called "Get & Transform" now, but when it came out on Excel 2010 it was called PowerQuery, and it still gets referenced that way everywhere (including the official help docs).
We've also built a full SQL and Python IDE with a shared code repository, but solving the simple problem of "write SQL -> get data in a Sheet" was where we started.
Sheet takes time to load but still works fine!
Why that over a DB?
I really hope that this announcement will herald a renewed interest on their side in improving Sheets as a whole.
And it was a bit of a shocker writing ES5 JS again.
This announcement seems like an expansion on what's above (e.g. pivot tables). I've tested what's above for my org and there's a lot of promise. (My only criticism is that formulas created in the query don't work automatically, since they are returned as Text cells; the workaround is to cast them as Automatic.)
A job that should be less than an hour took over 6 hours due to various annoying datatype issues.
Another interesting thing is that BigQuery doesn't have full-text search, so you need to go back into Postgres or use Elasticsearch for FTS, which is beyond annoying.
So as an in-house solution, ClickHouse most probably would be the fastest option (if your use case suits OLAP requirements).
For clouds / PaaS - it's hard to compare directly. Do you know how many servers will process your BigQuery request?
AFAIK BigQuery usually shows a bit higher performance than a single mid-level ClickHouse server (but you can also have a cluster of ClickHouse servers).
Otherwise it's really difficult to compare without knowing how much data you have, how often you query, etc.
The answer depends massively on the intended use case.
If you do a "SELECT field1, field2 FROM AAA" you'll pay only for the total size of field1 and field2 rows.
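As a back-of-the-envelope illustration of that column-based accounting, here's a small Python sketch. The column sizes are made-up numbers for a hypothetical table `AAA`, and `bytes_scanned` is just a helper for this example, not a BigQuery API.

```python
# Hypothetical per-column storage sizes for table AAA, in bytes.
COLUMN_BYTES = {
    "field1": 2 * 10**9,
    "field2": 3 * 10**9,
    "field3": 95 * 10**9,
}


def bytes_scanned(columns, column_bytes=COLUMN_BYTES):
    """Sum the sizes of the referenced columns; ['*'] means every column."""
    if columns == ["*"]:
        columns = list(column_bytes)
    return sum(column_bytes[name] for name in columns)


narrow = bytes_scanned(["field1", "field2"])  # SELECT field1, field2 FROM AAA
wide = bytes_scanned(["*"])                   # SELECT * FROM AAA
print(narrow, wide, narrow / wide)
```

With these assumed sizes the two-column query touches a small fraction of what `SELECT *` would, which is the whole argument for naming your columns.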
So you usually want to use BigQuery in situations where you don't need to query data all the time, but rather a fixed number of times a day.
Now about performance: BigQuery queues queries, so you don't have a guaranteed time. It can take a couple of seconds before your query starts running, it can take longer. If you need something that responds in < xxx ms, BigQuery is not it.
But the queries themselves are fast. If you need to query across petabytes of data, a simple BigQuery query will gladly run on however many dozens or hundreds of instances it needs, at no additional cost to you (since you only pay by the size of the data queried).
It's really a great example of serverless. You can run your query across 100 instances, but you only use those instances for a few seconds.
> SELECT * FROM AAA
> "SELECT field1, field2 FROM AAA" you'll pay only for the total size of field1 and field2 rows.
Another example is materialized view creation. It's common for these to scan large quantities of data to compute aggregates.
A where clause still searches the entire column, unless it is conditioned on the partition column.
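Here's a toy Python model of that difference, purely as a sketch: the table layout, the `query` helper, and the data are all invented for illustration and say nothing about how BigQuery is actually implemented.

```python
# Toy table: partition key (a date string) -> list of (user, amount) rows.
TABLE = {
    "2024-04-01": [("ann", 10), ("bob", 5)],
    "2024-04-02": [("ann", 7), ("cat", 3), ("bob", 1)],
    "2024-04-03": [("cat", 8)],
}


def query(table, predicate, partitions=None):
    """Return (matching_rows, rows_scanned).

    With partitions=None every partition is read, even if the predicate
    matches few rows. Passing partition keys skips the others entirely.
    """
    keys = list(table) if partitions is None else [k for k in partitions if k in table]
    hits, scanned = [], 0
    for key in keys:
        for row in table[key]:
            scanned += 1
            if predicate(row):
                hits.append(row)
    return hits, scanned


def is_ann(row):
    return row[0] == "ann"


# Filter on an ordinary column only: every row is still scanned.
all_hits, full_scan = query(TABLE, is_ann)
# Same filter plus a condition on the partition key: one partition is scanned.
day_hits, pruned_scan = query(TABLE, is_ann, partitions=["2024-04-02"])
print(full_scan, pruned_scan)
```

The predicate on `user` alone doesn't reduce the rows read; only the partition condition does.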
"Partition your table and create clustered columns and filter on those"
I largely ignore Google products announcements these days. Why bother getting excited about things that won't last?
There's also a decent argument to be made that Sheets is an enterprise product too. It certainly is for enterprise customers.
Disclosure: I work in Google Apps, but not on Sheets or Docs.
Well clearly not...
In case anyone’s looking for it, there’s the buried lede. Whenever Google announces something like this, don’t hold your breath. Remember that robo-calling assistant they demoed awhile back? I remember that should’ve gone live within a few months, too.