Hacker News new | past | comments | ask | show | jobs | submit login
Visualize data instantly with machine learning in Google Sheets (blog.google)
508 points by pmcpinto on June 2, 2017 | hide | past | web | favorite | 104 comments

When I worked for SAP back in 2007 (I was a fresh grad at the time), I was working in the business intelligence (reporting, analytics, and data warehousing) group and noticed how cumbersome it was for organizations to simply create and view reports (we're talking millions of dollars). I once said to my boss "you realize that in the future we'll simply just write 'show me a line graph for sales in the northeast'".

And so here we are now.

I'm currently evaluating BI vendors for my company and just about every major contender has this functionality or is developing it. Read the Gartner report from 2017 and they basically just come out and say that this will be the new standard in BI.

The much harder problem is data management and preparation. Anyone with half a brain and a decent visualization tool can create basic graphs - but that doesn't mean they should, especially if the organization doesn't have good data management processes in place.

Issues like data governance, data prep, and data modelling are the major pain points for me. And honestly, developing a useful BI solution is more about culture change than it is technology. If a company has poor data governance, it doesn't matter how whiz-bang their technology is, they're still not going to get useful insight from their numbers.

And data management / preparation is where major mistakes are made - get a join wrong and you may easily be missing data or double counting something just obscure enough to go unnoticed like "ancillary sales".

There is something potentially harmful, or perhaps that needs addressing, about end-user tools growing in expressive power. A good friend who does statistical genetics work once told me "but I don't want every user running their own regressions and drawing nonsensical conclusions from badly prepared data!"

Yep. Wake me when an AI can tell me "hey, that monthly trending conversion report you asked me to pull...yeah, you're missing two days worth of data when tracking broke, so it will just make your numbers look lower when rolled-up monthly and be hard to notice."

BTW, that is also the reason I not only set alerts, but review data at a daily level when pulling any rolled-up reports of significance.

Why can't AI tell you you're missing data?

For most use cases, you don't even need AI. Once you reach a certain scale, there are certainly ML-based solutions available: https://medium.com/netflix-techblog/rad-outlier-detection-on...

That ship sailed somewhere around Excel 95...

Just out of curiousity, which one did you go with?

We're in the middle of doing this ourselves and would like to hear a random HNers thoughts on the subject.

I'm at a ~100 person company. We've been using Mode for about six months and like it.

Microsoft PowerBI

The problem is that this approach requires well structured data and most corporations have data all over the place, in legacy systems or some half-assed backed solution that kind of works. So even if you provide them with some state of the art platform that pulls graphs out of thin air they'd still need shitloads of consulting and custom made tinkering to make it work.

The other problem is that most executives don't even know what kind of graphs/reports they need. Sure, they can have the classic sales by region and stuff, but the really important things are hidden and require a lot of digging and asking to get to.

So many of our clients (I'm still working in the BI space) are still using Excel workbooks and sending them to each other over email. I think it will take quite some time before we see any major improvements in this area. But I agree with you. Some of the stuff I see makes me cringe.

Excel is the appropriate tool for one-off adhoc analysis.

When something needs to be regularly produced, then and only then should it be moved to a formal BI solution.

People have a lot of questions and Excel is an approachable tool for small analysis that users can self-serve with.

OMG you are so like 5 days ago..

everybody knows the correct answer is jupyter notebooks, and totes everyone like uses them, that's what it said on my facebook the other day.

wow forgot to add /s here, clearly lol

I doubt it.

NLP interfaces in reporting work for simple things, when you want to start doing more advanced things they become incredibly verbose and inconsistent.

That's where a well designed UI really shines...and will continue to do so.

And I'd argue that a well-designed UI only goes so far, and that eventually you need the expressiveness of some sort of programming language. This suggests a hierarchy of user interfaces for complex tasks:

structured language > purpose-specific UI > natural language (processing)

Old adventure games were really illuminating with respect to limitations of natural language interfaces. Rather than allowing for the expressiveness of natural language, you simply had the feeling of groping around like a blind child at an easter egg hunt for the few valid operations that existed.

I feel like this using GUI tools for data tasks. It's like being forced to communicate by banging rocks together compared to, say, the expressive power of R or Python.

I wonder if natural language interfaces could leapfrog GUIs if we all up and learned something like Lojban...

"Show me country & city as rows, year as a column, sum the amount of population for country/city combination, show the YoY as a column and the subtotal for city".

Really? That's 3 clicks a way with a well-designed UI.

And what if I have those 3 fields in different tables? Even more verbosity...

P.S: My job has nothing to do with designing UIs, just to dispell any doubt :)

Watching the gif on that page, it actually seems slower to use. Instead of just drag selecting the table, I have to write out all the header text now by hand.

Yeah, this. It's cool but you probably have to match some grammar and really it's just a long text format for some keyboard shortcuts a pro could do.

That being said, it's a start. SQL as natural language.

It will all be voice driven soon so natural language has an advantage over using hands.

> we'll simply just write 'show me a line graph'

How about we just 'say' it ? At one hackathon, I had the idea of making an Alexa skill that would translate simple voice commands like 'graph column A' into matplotlib functions and then show the graphs. Teammates weren't too excited about spreadsheets though.

Wouldn't that make "mechanical keyboards for everyone in the open floorplan" seem like a smart decision?

I think that depends... open floorplans already have plenty of ambient noise usually. Chatter, people on phones and web calls. If the voice input picked up in a small range and people got in the habit of kinda semi-whispering their occasional commands... I doubt it'd be a massive difference.

But I'm assuming lots of things, I can also envision your scenario playing out to great annoyance.


You don't necessarily need to make noise to speak to computers.

Just need to bypass the mouth and jack-in directly to the speech center in the brain.

At that point it would be tempting to bypass the screen, the eyes and the brain as well ;-)

brain was bypassed long time ago...

OMG, Google stole your idea. You should sue.

I played around with this the other day. I have a spreadsheet with a bunch of columns. It wasn't immediately obvious how to use the explore feature intuitively. It graphed data but not really the ones I wanted. I was also hampered by it using only about 200 pixels on the right side of the screen.

I started typing in a question but it couldn't guess what I was interested in. YMMV. Perhaps with a fairly simple spreadsheet you can intuit things? Back 10 years ago I built a google spreadsheet competitor called Numbler (well, I didn't know if was a competitor, google sheets came out a couple of months later). But one of the things I learned is that people use spreadsheets for just about everything, and it can be in the wierdest format.

"people use spreadsheets for just about everything"

That is so true. I used to do consulting for small businesses and was amazed at how non-programmers used Excel to solve so many problems outside of finance and accounting. The CRM/HR solutions were very common and interesting (e.g. Lead/prospect management, sales, timesheets, vacation schedules, etc.).

Can we talk about getting data into Google Sheets? Is there a standard way to build a pipe from, say, a reporting database to dump aggregates into Google Sheets?

I built a private Add-on for my company that surfaces specific aggregates as Sheets functions (i.e. getSalesByDay(...)) and I have found so many bugs with that whole ecosystem. Deploys are completely manual and require copy-paste, you can't reliably tell what version is being invoked in a sheet, invisible cell-level caching that caches error state, concurrency limits that are too low and impossible to work around, and more. It all kinda sorta works but Google doesn't make it easy.

Yes, so much this. I've encountered this same issue.

For our non-technical employees I write an endpoint that gives them a CSV with the most current info (often with a super simple front-end that they can use to query for specific date ranges or with filters). They download this CSV and upload it into their sheet manually to update the underlying data when they need new info.

It would be so great if you could just say "here's a URL; keep my data fresh from this source" and it would automatically do it.

Does the IMPORTDATA function work for that?

It does. I use heroku dataclips to get data out of a database and import it into google sheets using importdata. It's really simple and for my use case incredibly useful.

I never had a problem with the Sheets V4 API that came out last May, how big of a database are you talking here?

The add on doesn't talk to the reporting database directly, Google's Apps Script JDBC service doesn't support Postgres/Redshift. The add on gets data from a REST API. I then have an etl job that precalculates the aggregates and stashes them in Redis that the API then reads from.

See https://www.blockspring.com. The add-on has prebuilt connectors for both postgres and mysql. Can also build your own connectors https://open.blockspring.com/blocks/new.

Ive used gspread for Python without any issues, except for its lack of support for font/cell formatting (highlights, strikethrought...etc)

what kind of DB was the reporting database if you don't mind me asking? Seems like translating the data into Google Sheets is unnecessary if you're just trying to graph/chart it, although as I say that I'm realizing there may not be a tool that could just graph it directly that's easy enough to use/cheap enough.

The reporting DB in my case is Redshift. The business use case isn't exactly charting. My goal is to enable a business user to reference certain types of important aggregates in Google Sheets. By enabling this, business users can then compose tabular views of data that are rich, dense, and always correct and up to date. It also allows business users to very quickly prototype dashboards and make detailed specs for reports before an analyst or engineer has to get involved.

This is one tool in our tool belt for communicating with data. We also have an in house SQL-in-chart-out platform that everyone has access to and we have Looker for deeper exploratory work.

I use Zapier. It works. Sounds as if its not even close to the depth of feature set you need, but for basic sync its beautiful.

Cool stuff - someone else mentioned Thoughtspot - that was my initial thought as well. Very similar idea.

I wish that Google would take the same sort of "embed" idea further in G-Suite. I find it amazing that I can't (as far I know) reference slides from another deck in Google Slides. The use case would be putting together a series of "core" slides that are updated across your organization as they change. Given the web nature of G-Suite, this, to me, would seem like a no brainer.

Also, inserting charts from Google Sheets into Google Presentations looks pretty terrible. I often revert to Excel because the charting is fair superior imho (though just as challenging to wrangle).

They are solving a problem that doesn't really exist, the challenge is not the last step of a data report, it's the steps involved in the beginning, getting good data in, formatting, joining multiple sources, automation, dealing with junk data, procedures,etc.

I don't understand the example, what's the difference between typing "Show me a line graph" and clicking a button in excel that does the same thing.

Did you try it? Yes, getting good data is a challenge but that's not one they can solve in Sheets. Making a graph is also not always one button, unless you have very simple data.

I found it immediately useful. They've solved some UX and discovery issues around creating charts. And it's not _just_ charts... they're answering questions and identifying trends within the data. e.g. I threw some pretty basic data at this and it told me “Flights” contains a yearly cycle: “Flights” increases until May 1, decreases until October 1, and increases until December 1.

That's pretty handy. YMMV, but this is awesome.

Tableau did some research in this area, and found NLP queries were much faster:


To be honest no, I don't use sheets unless it's for something quick and dirty, it just doesn't compare to excel. In terms of machine learning I have tried IBM Watson with sets of data, interesting concept but I found myself still juggling the data to get what I wanted. I currently use Tableau.

THIS. As someone always dealing with data, I agree that getting to the point of a regression plot is where all the work is at.

Mandatory shameless plug - Google Sheets integrates with Google BigQuery.

You can:

- Query data in Google Sheets from BigQuery

- Create virtual views in BigQuery that are powered by Google Sheets

- One-click export data from BigQuery to Google Sheets (< 20k rows or so)

- Using AppScript, build dashboards and reports in Sheets that query BigQuery for results.

(work at G)

I tried to rely on your first bullet point and randomly end up with BigQuery surpassing some API treshold towards Drive (paying customer). Have to carefully manage doing copy-queries over to BQ, which is a pain, but better than nothing I guess.

Thanks for the feedback! I'd love to get more detail on your experiences so that we can iterate and improve and would be eternally thankful if you emailed me.

Oh, wow. I love where this is headed. Spreadsheets are one of the most abused products in a normal business--used for everything, and then some poor excel jockey ends up being forced to create a semblance of order from the chaos.

That's because spreadsheets are an excellent tool for prototyping, and developers have shitty tools for building robust products from a prototype.

If developers had a quick way to create properly engineered applications from a working workflow involving a spreadsheet, it would be easy to define most company processes on Excel or Google Docs and turn them into solid software.

But building an application that is functionally equal to the spreadsheet, just with robust engineering practices, typically involves several-months-long projects with many developers and managers, which is expensive.

I once reverse engineered a spreadsheet that had cost the best part of a million pounds to develop - I made the relevant manager very unhappy by pointing out that significant chunks of the beast did absolutely nothing.

I did come to really like the "Trace Dependents" and "Trace Precedents" features in Excel! :-)

Yeah... the worst part of spreadsheet is that it's hard to document their structure and purpose. You may try to explain it with cell comments and blocks of color, but there's no easy way to signal the entry point or see an overview of how the different parts work together.

At least they are functional reactive, so it's easy to trace the evaluation of expressions.

Quickbase is the usual suspect for workflow via a spreadsheet, with functionality that's broader than say Airtable.

Google's "app maker" looks like their attempt to do something similar. https://developers.google.com/appmaker/ Though they seem to have left out the 'start from a spreadsheet' part.

>But building an application that is functionally equal to the spreadsheet

can you explain this to me? i've never used spreadsheets extensively so i'm clearly ignorant but my impression was that the only thing you could really (i know you can bend the rules etc.) is data transformation. when you say application is what you mean? i.e an ETL app or do you mean more than that?

Excel can pretty much do anything you'd want in an application, though not in a particularly maintainable way. GUIs? Yes. Data updates triggered by GUI actions or cell entry changes? Yes. Interacting with external resources? Yes, that too. It can do a lot more than data transformation (except in the sense that all computation can be modelled as data transformation between inputs and outputs, in which case, yes, that's all Excel can do but all any other app does, too.)

The more that's been done in Excel, the more pain it will be to maintain it in Excel or port it to a non-Excel platform, because Excel is seductive for write-once, but horrible for maintenance, and the people in an org who have it as their main tool are unlikely to be either cognizant of the problem or devoted to spending effort mitigating it.

Excel in business / finance is a great mish-mash of ETL, modelling (from simple statistics to pretty advanced optimization), graphics and dashboarding. With too little structure and documentation to be good for you. A big plus is that every step is (at least without VBA) user transparent. From Excel to application would mean specs by non-sexually writers and continuous exceptions on specs when thereto unknown use cases of pretty much the same spreadsheet are found. (Since you don't exist unless you run your own spreadsheet, users often branche from the sheets of others.)

Someone needs to create Spreadsheets Anonymous and share the crazy stuff people do in Excel, the best one I've seen is a full GUI wizard (back, next etc.) in VBA.

I once worked at a large company (70K employees) in which the check request software (that is, if you needed to pay a vendor with a written check) was written in VBA embedded in Excel, using Outlook (the client, not Exchange) as Workflow.

You opened the "spreadsheet", which hid Excel itself and showed a form. You filled it out, clicked Send and it used your local Outlook client to email a copy of itself to your boss, who then opened it and was presented with another form for approval, etc, up through finance and finally to the person who wrote the check.

It was amazingly ugly and fragile, and of course, there was no way for anyone to see the state of the system (because the 'state' was distributed among the entire workforce's Exchange inboxes!).

Want to know the status of your check request? No problem, simply call everyone who might have been in the approval chain and ask them to search their inboxes. Hope you find the most current person!

(updated to correct the name of the software)

God, I work in an environment like that. We are a non-profit, and our primary CRM is The Raiser's Edge. However, data is never manipulated within The Raiser's Edge. Instead every process involves exporting data from The Raiser's Edge, manipulating it with an MS Office application, and then re-importing it.

When I started, almost every process was based around Access, using queries made using the query builder. With some basic scripting I managed to cut a couple of man-months of work down into about an hour or two of work a month. The entire thing is more fragile than a stack of cards though.

At this point I could literally write a 200 page dissertation on all of the reasons why this is a bad idea. But it works, and it is cheap. So there you go.

In the late 90s I worked for a company that ran their manufacturing logistics using Lotus Notes.

An excel form / outlook workflow would have been an improvement.

There are lots of great example replies here.

I miss using machinations in MS Access back in the day. Now with Google Sheets, I am left wonting for a tool that enforces primary keys and is /easy/ to set up a view or query.

I worked in a department at one of the largest tech companies that only developed custom, complex Excel plugins. It wasn't my cup of tea but there's definitely a lot of expertise out there.

I'm at a 250 EE company with very small ops budget and I built a stable and modular account planning and CRM system (when we had 150 EEs) that uses structured + validated + protected Google Sheets for user inputs and some viz, Alteryx as the ETL, and Tableau (Reader) as the visualization. Took half a junior person and half a senior person to maintain it for 100 clients and 200 users. It has served well as a prototype for our Salesforce installation this year, which has enabled us to decommission half of the modules in the Google Sheets system. If we had gone straight from nothing to Salesforce it would have induced shock in most of our employees, and our data and processes would not have been ready to just drop into Salesforce. In my mind, that is the best use case for a spreadsheet 'app' ... build it from the start with the goal of it being a transitional tool to a 3rd party app, nudging users along the way to professional processes.

Well charts is a good addon but just wanted to understand how they are able to do this ... i mean Machine Learning part , for example if somebody asks "Show me sales of X product in last year" , from machine learning perspective how this gets interpreted in actual SQL query ..

Presumably some ad engineers got sick of producing arbitrary customer reports, took the corpus of all customer requests, and tied it to the sheets that generated those reports. And a ton of glue code to make it all work.

I wonder if it's actually machine learning or if they just implemented a query language that tries to guess what you mean.

The definition of machine learning that I use is: an algorithm which improves its performance through experience.

So, if charts doesn't get better the more you use it, it's not machine learning.

If I had to guess I'd say the machine learning part is analyzing a sentence ("Show me a line graph for x, y, and z") and determining what exactly they want to see.

Seems similar to what Wolfram Alpha does, but restricted to a smaller data set.

Kind of goes in the direction of what Thoughtspot (https://www.thoughtspot.com/) is doing (https://www.youtube.com/watch?v=D-y_EjFsDuk)

As far as I care, Google Sheets has beaten several BI companies towards "Natural Language Querying" by being free and accessible to everyone.

Looks like a good idea. But, where does it get its data from?

Does it perform NLP on company documents?

Honestly I don't know, I do not work for Thoughtspot.

My assumption is that it works on metadata coming from a relational schema with a rule parser on top.

Honestly I don't see how it would work well enough if it was based on NLP of unstructured data.

This is very neat.

But the best, still-mostly-hidden feature I've found recently is App Scripting and especially the ability to do a UrlFetch.

I use it as an "API Runner" to run various batch jobs against APIs.


I'm wondering how Microsoft is responding to this. Do they expect their current Excel dominance to continue despite competitors constantly catching up to feature parity and even extra goodies, like this one?

Actually, this has been available in Power BI for a while. I just went to a meetup last night and saw it in action.

It's fairly trivial to get this to work with well-formatted data.

I'm currently evaluating BI solutions for my company and just about every single one has something like this.

They are betting heavily on Power BI: https://powerbi.microsoft.com/en-us/

You don't shop for Excel, you shop for Office. The contracts around O365 are pretty onerous to get out of if you're an EA customer, and the process of going away is full of pitfalls and surprises.

Google used to say that everyone who did an RFP chose Google Apps/GSuite. The problem is that nobody does that!

Yes, Power BI has had this for a couple years in the form of "Q&A"


I wonder if we will see more software including query based input like their charts, and what sort of speed improvement we could see? At first I was not excited to type something where I could click a couple buttons, but then I recognized the other enhancements such as applying a filter right away.

I'm not convinced it's better just because it has machine learning on the back end, but if excel would learn how I want my graphs made from how I manually adjust the graphs (adding axis labels and a title, color preferences, never a 3d bar or pie chart), that'd be a nice enhancement. I'm sure there's a setting, but I haven't searched for it.

I thought Explore in google sheets has had this feature for awhile? I remember it suggesting visualizations in sheets a few months ago.

I've been using it for a while, too. I'm guessing it's been available in Enterprise/g suite, but is now rolling out to all users.

From my experience the public versions of g suite run a more bleeding edge version and they introduce stabilized features into g suite.

This is similar to statsbot.co 's Slack bot

You would message the bot something like "sessions for this month", and it would send back a graph.

wonder if you could make a similar bot with google sheets if they provide an api

https://statsbot.co/slack https://medium.com/slack-developer-blog/bots-you-can-count-o...

There's a lot of basic stuff like column titles, moving columns about, filtering, search that I found had a quite a learning curve with sheets. I built and use this instead. Bell+Cat https://bellpluscat.com

I love your "Made with and ️ on the 3rd rock from the sun." tag.

Thanks, its a little sad how I survive on taco's and coffee :)

does anyone know how this kind of stuff gets built ? I'm considering a spreadsheet-y internal admin dashboard for my startup. I was looking at https://github.com/JoshData/jot to be able to sync stuff on the client side to the server.

has anyone worked on something like this ? the big challenge is synchronization - between server and multiple clients - while being able to offload a lot of computations on to the client.

I wonder how is the security built ? if i maliciously change the formulas in my browser.. will the backend datastore still accept the data ?

The big challenge is implementing the spreadsheet UI.

There's a React component for that. React-datasheet or something like that.

I know there's a React component. There are many. They don't come even closer to the functionalities of an actual Excel-like spreadsheet.

Implementing 70% of the most basic of all functionality of Excel is easy. Implementing 90% is insanely difficulty.

I would say Handsontable reaches the mark of 78%.

I've tried twice, one with React + Mori[0] and one in Cyclejs[1]. In the first I may have reached 82%, in the second probably 85%. But in both cases the latest features I added started to conflict with the oldest ones and I abandoned everything.

[0]: https://github.com/fiatjaf/react-microspreadsheet [1]: http://sheets.alhur.es/

Actually aggrid or handsontable are the commercially available spreadsheet browser plugins.

We use HOT, but have been considering aggrid. Because of these tools, that's not the hard part any more..It's the synchronization and computation logic.

This sort of thing it's easy to get to 80% but good luck getting that last 20% without formalisms. Might be useful for getting a quick feel for a data set to confirm some intuitions, but not really useful beyond that.

Is there any way yet to sort values in a pivot table? Kind of ridiculous that we still need to resort to the query function...

The next major of Tableau should be implementing NLP in similar fashion.

Wow amazing, when will this be available as a javascript library?

still no tables. How hard is it?

404 for me on the link.

how many corporate secretes will be leaked into this?

Leaked into what? If you are using Google Spreadsheets your "secrets" are already on Google servers and are "leaked" long time ago.

While technically true, this is the first time I'm actually thinking "maybe I'll start using Sheets". I hate Google and giving my data away, but lately I often feel that I'm missing pleasure of easy use by avoiding all these hateful botnets. So, eh, maybe I just die before it gets actually massively harmful, and then fuck it.

We need open source implementations for stuff like that.

We have open source implementations for nearly everything that is provided commercially. The problem is that no company wants to use said implementations.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact