After the past few years working with spreadsheet power users, here's how I like to think about them:
1. Most spreadsheets are just used a lists/trackers .
2. Some spreadsheets are very calculation heavy, and are better understood as complex software projects (usually modeling something; s/o to those PE mfs) than as anything else.
3. Spreadsheets make the transition between (simple list/tracker) and (complex software project very fluid. This flexibility usually means that the (complex software project) that is created is buggy as hell.
Spreadsheets give not-super-technical users an incredible visibility into their data. Spreadsheets give not-super-technical users a way to program data transformations in what I would argue is the most generally intuitive way that exists. Spreadsheets give not-super-technical users tools to build software without introducing them to proper software development methods. Ya know, like maybe a test. Or a code review. Or no global variables. Etc.
If you want to see how we're attacking the spreadsheet problem, check us out. Feedback highly appreciated! 
I don't think spreadsheets are a good way to program data transformations at all. Data transformations are inherently a pipeline/function abstraction, and even non-technical users understand that. Spreadsheets don't offer a good way to decompose a data transformation process and inspect each step for correctness. This alone makes them prone to calculation errors. Also, tables are an afterthought to spreadsheets, therefore even trivial joins are always a pain in Excel.
I have a good practical experience with building visual pipeline-based data transformation tools  for non-technical users and they have been a success.
But it is an accessible way. Your average Excel user can perform any number of useful transformations with simple formulas, and then throw that into a pivot table for useful aggregations (which is also a transformation).
> tables are an afterthought to spreadsheets
Excel power users have been solving this with VLOOKUPs for decades.
> therefore even trivial joins are always a pain in Excel
Trivial to a trained user. Excel has more advanced data modeling capabilities, which includes implicit joins when creating PivotTables from tables with relationships.
In a similar fashion you could've posted a link to running an SQL join query from Excel (which is surely doable for a trained user). Technically it's possible, but it would be wrong to pretend like the join is done in Excel.
Power Query is an Excel add-on, and has been around much longer than Power BI and has nothing to do with it. This has been possible for almost a decade, since Excel integrated a robust in-memory columnar database capable of dealing with millions of rows.
> it would be wrong to pretend like the join is done in Excel
The join is done in Excel, I don't need to pretend. That you think Excel is just a bunch of sheets and cells is your own mistaken mental model of what the program is and can do.
Here's another reference:
"Now that Excel has a built-in Data Model"
Something that is as simple as this: https://www.youtube.com/watch?v=RYCtoRTEk84
Here's a video of the process:
Here's another video:
They absolutely give you that ability; you break the transformation across many cells. Spreadsheets are probably the most successful functional reactive programming paradigm set of languages out there.
In theory, yes. But the reality, nobody does that unless an error is apparent and needs to be investigated. The vast majority of calculations in Excel spreadsheets are done using cumbersome, multi-line, unreadable formulas that try to pack as much logic as possible into one expression. Such obscure formulas are the main reason why lots of business-critical Excel spreadsheets contain grave errors, many of which are are never discovered.
Joins are great and relational databases are super useful but these peoples' problems are 99% solved by "store a list, maybe do some arithmetic."
The downside is that something useful you make keeps getting extended and added on to. It is hard to identify when you should switch and do that extra work to move it. And then, you know, tech debt forevs
Imo that thinking is wrong because it makes this assumption that doing anything at all with a gui is easy and frictionless. I don't know how to do anything in excel. If I have to look up how to make a plot in excel, I will have to find some article on the internet of how to do that in my particular excel version. Excel is tough software to use that only becomes easier after you spend time with it (hint hint, just like programming). If I don't know how to make a plot in R, I might also have to find an article. It's no different, no harder, usually easier in fact, to just do the job in R. All these common tasks in R or excel are all 5 min articles worth of information you have to cram in your head.
It's not hard to code, children do it. I'd love to see spreadsheet power users realize that they could be working with much simpler things like parsing plain text files with a few lines of code rather than trying to open some macro heavy spreadsheet on the shared network drive that takes 5 mins just to open on the work issued workstation. I'd guess it would be up to business programs and accounting programs and all these other college majors to actually teach classes in python and R, rather than what they do now which is teach classes in excel. Imo the entrenchment in excel is rooted in ignorance to other (often simpler) options available with python or R, than in any actual critical proprietary features offered by excel. Microsoft is probably pretty happy that their software is so baked into these different academic curriculums, keeping it alive in industry going forward for an entire working persons career after college.
My mother self-learnt excel, first as a to-do list, then a money tracker, then she learned some simple equations to keep track of weekly spending.
How in R would someone do that? With a nice graphical view? I agree Excel has many many issues, but people use it because, in my experience, it super easy to use and in particular let's you easily mix data and code.
At least in germany children are often forced to learn excel in school because the official curriculum includes "office software" which is an euphemism for "microsoft product training". This has certainly not developed from necessity. Even someone who thinks school should only be a preperation for the job market would agree that employees who are able to help themselves are better than ones who know how to use a specific version of Excel.
Every time a business user sits at a computer, are they going to spend 10 minutes getting it done, or 30 minutes trying to figure out how to get pip env configured so they can maybe eventually get it done? If by some miracle they are supernaturally gifted at programming and succeed after some weeks, they now have:
- a nonstandard file format that will eventually break
- no real time feedback of formatting or calculation as they input/manipulate data
- no charting
- no GUI
- no "undo" button
- no import wizards
- no support (but I guess someone light years ahead of any dev talent I have ever seen wouldn't need it)
Microsoft (and I am no fan) has the best tool for data manipulation that exists for most of humanity. Excepting the awful ribbon UI redesign, you could take any Excel 97 user, put them in front of the latest M365 version, and they could get the same work accomplished in the same amount of time.
You cannot say the same thing about almost anything else in the tech industry. Excel is terrible in a lot of ways. But it's a mostly reliable tool, has worked approximately the same way for 25 years, so people are going to use it and teach it. There's no point in wasting someone's valuable time with the horrifying fragility of the python/R/whatever ecosystem.
Sigh... "Microsoft has". Time was, spreadsheets was a generic category of software, made by lots of different software makers. Nowadays, it feels like people not only don't know this, but can hardly even conceive of the possibility that "spreadsheet" could mean something other than "Excel". At least I think that more people are still able to think of, say, word processors or presentation software as not necessarily Microsoft products; the spreadsheets battle seems to have been their most crushing victory in the "Office" or "productivity software" wars.
It's sad not only for the IT world in general or from a free-market perspective, but for spreadsheets in specific, too: Who knows what stuff like Quattro Pro or Lotus Improv would be capable of now, if they were still around?
Except instead of trying to be the C++ of Spredsheets, they target Go like capabilities, and then wonder why business keeps adopting Excel.
It is hard to code, and average children don't do it. The fact that exceptional children do it more than they do other adult things is because coding doesn't require a lot of expensive equipment and you can afford to do it as a kid.
Consider the quadratic formula. It's trivial to write this code. That's not a hard problem to teach anyone to solve with code. Or calculating compound interest. Or solving payroll and basic accounting.
All these things are done with spreadsheets as the abstraction instead of code because of the myth that "code is hard." Code is not hard, it's just we use it to solve problems that are so hard we can't do it with spreadsheets.
Spreadsheets are hard too.
My wife knows R and Python. 90% of the pain of any new project is in environments and dependencies. Some weird error message shows up because there are multiple python executables on her machine or it isn't speaking with jupityer properly and we are on a wild goose chase of googling and command lining.
For excel, you open it and run.
Even if you know how to code, setting up a coding environment has enough "wtf does this mean" roadblocks for nonexperts that it becomes a terrible experience.
I think a paradox of programming is that we know it's easy, yet we also know that a certain fraction of people -- perhaps a majority -- will never grasp how to do it effectively. And the ones who can do it well enough to get paid for it, aren't sticking around in low paying administrative jobs.
And coding well enough to really replace a spreadsheet without making something even worse is yet a higher level of programming skill.
Given the virtually ubiquitous knowledge that programming skill is of market value, I think the existing proportion of people who can program is indicative of how many people are potentially capable of it. I don't know a single manager or engineer who doesn't think that learning a little bit of Python would be useful.
Excel is a glorious tool which welcomes all, the savvy and the unskilled but imaginative newbies alike. There is something about all those little cells that presents an itch everyone wants to scratch, and you just know that for some that scratching is going to produce something akin to a spreadsheet version of gangrenous melanoma.
On the other hand, some people are happy to say that the complexity of sophisticated Excel models means they would have been better off built in a code interface. Ha ha ha!
Even with the right mix of functional and object-oriented code and suitable documentation and version control applied to code, I respectfully disagree.
Data handling capabilities could benefit from skill in all of these tools: error checking (including analytical review); pen & paper; calculators; databases; spreadsheets; math & stat techniques; Word processors (& clear logical explanations).
Together, they form a rich menagerie of tools which will probably all be around until we have an AI that follows us around and we just tell it what we want. Actually, even then, it would be good to know them so that we can grasp the underlying logic and frame the concepts leading to the actions that we ask the AI to assist with...
I wish there were simple-as-Excel frontends for databases where a normal user could input his data like in preformated Excel table without having to deal with the database mechanics.
This is exactly what we are attempting to solve at nocodb : https://github.com/nocodb/nocodb
Which is, nocodb gives you a google-drive like collaborative spreadsheets on your existing databases (MySQL, Postgres etc)
And the original problem in article could be countered with nocodb as we keep audit of all changes done to the database.
There are several products that will let you populate databases in a table view and easily import CSV files into a database. The problem is with almost everything else.
Databases are designed to address a different set of problems. The design of database applications reflects the needs of people who need to maintain a database and make sense in that context. Unfortunately, those design decisions do not make sense in the context of people who use spreadsheets. Doubly unfortunate, a spreadsheet makes a better database than a database makes a better spreadsheet. If anything, the shortcomings of spreadsheets implementation dependent in most cases while the shortcomings of databases are due to the domain they are designed to address.
First example: creating a table is more involved than creating a new file and being presented with a blank table. First, a database is created. Then a table is created within that database. Then columns are created with each column, as a minimum, having a type. Only then can the database be populated.
Second example: doing a quick calculation is more involved than finding an empty cell and inserting a formula. The only way I can think of doing a calculation within a database involves performing a query. If the calculations are performed within a row, a new column can be created to contain the results. If it makes sense to store a result for each row, but it pulls or consolidates data from other rows, the complexity of the query is going to be interesting. If the calculation involves several rows, there is no obvious place to store the result within the table. There are several ways to deal with that, yet none of them are particularly convenient. If you need to perform a calculation on arbitrarily selected data, well, good luck with performing the query. Oh, and automatic recalculate ...
Third example: putting the data into a presentable form is less convenient with databases. It certainly isn't done in the table view since that is just a raw presentation of the data. A form is useful here, but it is a separate step and it isn't going to address all of the functionality found in spreadsheets.
What's the easiest way to generate a CRUD form?
We need a ubiquitous HyperCard equivalent that talks to the Real Database(TM) of your choice on the backend -- sqlite, postgresql, mysql/mariadb, Oracle, MS-SQL, whatever -- with easy functions for "get a sequence number", "get a unique identifier cookie" and "do this block atomically".
I bet there's a few out there right now.
Horrors surfaced when we analyzed the data: unintended duplications because ids and uniqueness checking was not a thing, shifted columns, undocumented column "formats"... In the end we parsed the whole mess into a DB.
I don't see how you could implement a frontend for a database simpler than Access. It was as easy as Excel, with relationships. But reasoning a schema is harder than stuffing all things in a 2D table, and querying a multitable is harder than filtering a few columns.
If we could provide a properly designed database which has already defined datatypes but that would be as easy to fill for a standard user as a preformated Excel table that would be great. You're right, Access has this simplicity, I just wish such tools would be available out-of-the-box for other DB engine.
For the cells with multiple values, I agree that is a terrible idea.
All this adds up to a bunch of errors when you're trying to input and manipulate data, because what you've entered is violating some rule someplace in the database.
It would be awesome if someone could build an interface that gives users the flexibility of Excel, and communicated the issues with the data they're trying to write in a really intuitive way. (e.g. syntax highlighting for data with issues, mouseovers / tooltips, dropdowns & autocompletes for foreign keys, etc)
A database that’s 7 years old.
And also free with whatever package you're already paying for :-)
We lost them.
This is only a failure story for every software development route that involves procurement, which is every route except Excel. Those routes tracked, within the only timescale that mattered, precisely zero patients.
You want to give smart users DB and DVCS tooling? I'm right behind you. Start asking why they don't have those things already.
If you have not listened to the series I highly recommend it. His episode "LaLa Land: Galileo’s Warning" is by far one of my favorite pieces of media. In brief it is why redundant tightly coupled fail safes will often lead to cascading failures.
I also adore the specific episode mentioned by Jaepa. It clearly describes why "hey this bandaid helps us, so let's do more of it" can actually cause problems, vs helping us be more safe.
It seems like just adding the ability to spread a calculation out over multiple lines and add some indentation would make the bugs everyone complains about go down by... a lot.
A3: IF(<boolean>, <result if true>, <result if false>)
A3: IF(B3, C3, D3)
In fact, people who are aware of Alt-enter produce buggier code: they end up writing longer formulas, with fewer intermediate results displayed, and have less visibility of the functioning of their spreadsheets.
Write simpler formulas.
Excel's formula language seems deliberately designed to prevent that.
Sometimes I'll get these spreadsheets with byzantine formulas that I have to copy it to a text editor and format myself to make sense of all the parenthesis.
The only disadvantage is if someone else isn't expecting the formulas to be like this, then gets confused when they can only see the first line.
Having the compose box fit itself to the formula size or give other indication that there is more to see is still a head-scratcher why they didn't do it.
smallpox had no animal reservoirs
Most viruses (and other diseases) have animal reservoirs, and that's why few diseases which ever get widespread, are ever eradicated, and it's nothing to do with spreadsheets.
Now if he wanted to make a point about data, it would be good to mention that we have very little data on, say, how many species have which of these diseases, and how that's changing over time. But that wouldn't have much to do with spreadsheets either.
I don't think that part of the story had anything to do with spreadsheets. It just illustrated the value of (timely) data, especially when fighting a virus and specifically in distributing scarce vaccine doses.
A virus with animal reservoirs could similarly be controlled based on monitoring where outbreaks occur and taking some action. That could be distributing vaccines, limiting access to the animals or taking some action to reduce the animal reservoir.
From my experience these monstrosities are usually created by two reasons: Push from business side and a data team too slow to react. Thus the analysts have to create the first version that seems to be OK and smart but eventually evolved into a monstrosity.
Why? Because we don't have access to any databases. I dread to think of the requests and committees I would need to find and traverse to get a database set up for my team, let alone a system that I can write and run code on. Yet we can guarantee that every internal and external user has access to Excel.
Unless you are working for a software firm, or are part of a dedicated analytics team with access to suitable systems, large companies and government pretty much have no chance of being able to roll your own database.
Don't suggest Access. Never suggest Access.
Can you try sqlite? It's usually enough for simple things I think.
So Excel > PostrGRE? :)
I wonder what kind of data did you use and why it had to be stored in so many columns. This approach would probably kill a "real" database too.
With a relational database this would be kind of insane, as it's much easier to normalize data into separate tables, then just join up them up when you need to.
Because of that, chances are the natural format for a data item taking 16,384 columns in excel likely takes fewer than 16,000 columns in PostreSQL.
Spreadsheets allow the user to see each intermediate state in their application, as well as each iteration over the input data set, laid out in a 2-dimensional grid. There just isn't any substitute I've seen that is as accessible to the computing "layperson."
It allows people to become programmers by making them learn this clumsy tough gui that isn't readily apparent how anything works, and changes between versions. I don't think the learning curve is any easier than what it would take to make a person actually into a programmer with a language like python or R. A two hour tutorial in either language is probably all you need to do 90% of what people use excel for. Why not just skip the abstraction and learn how to actually program if you have to spend time learning how to faux program in excel anyway?
To do that in Python, I'd have to first choose a format to enter my numbers (I'd probably use Excel, then export CSV). Then I have to install Python, pip install a graphing library, import the CSV, then plot. I have to go through the different types of plots one by one.
I've taught python and graphing, you need much more than a 2 hour lesson before people can be left alone without help I'm afraid, if they have zero programming experience.
I don't claim Excel is good programming practice, or scalable, but it is much easier than Python or R. The best future thing (I think) would be an excel like interface which would show you, and let you edit, Python that did what you'd just chosen in the GUI.
It's only after trying to convert an excel sheet to [insert your favorite language] that they realize it would take them 6 month with a team of 5 to replicate what a single accountant did in a week.
Excel is the most widely used programming environment on earth because it is so good at it. That said, I think the fact that we haven't made anything better speaks to the change in philosophy our industry has undergone, from one where we wanted to enable users with computing to one where we want to herd and farm them for money.
 yes, both 'sophisticated' and 'application' are defined pretty loosely here.
( Any organization that has a smallest reason to care about their data should remove save button from excel and start educating their personnel. Using excel in any important role should be seen as making the eventual mistakes on purpose and someone should be kept responsible.)
Why do they need to be webapps? If you've done it with Excel so far, you're obviously on a LAN, so why not traditional client/server native apps?
The "years" of work are typically not for implementing the spreadsheet, but to define what it does. The requirements evolve over many versions.
Of course it's fast to re-implement it after all the work to define the requirement is done, but it's only possible because Excel allowed all those prototypes.
I fully agree that Excel is great for quick drafting, visualizing data quickly, and prototyping. But it should be left there. Anything you do that lasts even overnight and has any significant numbers in it should be done with something else than Excel.
So prototypes are done in Excel because it’s the fastest and cheapest way to do it, and they don’t get redone in something else afterwards for the reasons above.
Done by a domain expert in their field. If you had to create the project from scratch without aid of an existing Excel file to clone, you would not be able to do it even remotely as quickly as the domain expert, if at all.
You're not seeing the process and effort that went into it, which can be considerable.
It's a two-pronged attack of using tools that you have figured out as you go (not everyone is an educated, practiced, professional programmer, or even good at Excel) and a sunk cost fallacy/resistance to changing what "works".
The thing about spreadsheets is you don't have to commission bespoke software. And just the value of that swamps everything else in the vast majority of cases.
In small businesses I doubt the success rate is anywhere near 50% if a non-technical person recognizes a business need for a software solution, hires someone to build it for them, and then it actually reaches a useful conclusion. But if you are numerate at all you can drive a spreadsheet. Not having to swim with the sharks for months or years while you hemorrhage cash is hard to appreciate for the sharks (us, the people who build technical things). Most small businesses and many large ones don't have anyone who can manage even contracting something like this out.
If things have gone on long enough that the problem is stable, clearly defined, and the costs of 3rd order effects like lack of revision control are losing enough money to justify the risk of a custom software project, sure, it needs to be done. But for many companies that risk will always be bigger than the ones that come with spreadsheets.
ERP systems are supposedly finished software that you deploy and tweak for particular business needs. The success rate on deploying those is waaaay below 100%, and the expenses are huge. If your company or department is non-technical the time for custom business software is usually when your hair is on fire and it's the only option.
I worked at a financial software firm written in python that used Excel behind the scenes to run calculations and modelling.
That's from my worst nightmares :(
No single application has been as widely adopted by as wide a number of people for such a wide range of uses as the spreadsheet. Love them or hate them, but the spreadsheet metaphor is at once highly useful, highly adaptable, and highly usable requiring minimal support to extract needed value.
The spreadsheet, with all its questionable glory, is here to stay. From the first Visicalc to Excel 2300 I don't think we'll see the end of spreadsheets for a long time coming.
* flexibility (throw data anywhere and link it to each other, make edits, write notes),
* power (formulas, etc), and
* familiarity (the most underrated factor) of spreadsheets.
Nothing else allows the non-tech user to feel empowered like spreadsheets do.
Spreadsheet evolution has been slow though. Google Sheets added cloud + collaboration 10 years ago. We (https://coefficient.io) are adding the layer of connectivity to sheets so they can remain in sync with the actual sources of data (Cloud apps like Salesforce, DBs, BI tools, etc) so sheets actually become "live" (even though they have been in the cloud for a while) and to reduce manual work and increase trust/accuracy. There is so much more that can be done to leverage this largest software platform that is out there.
These were not small companies either, ~$800M in yearly revenue for one, and ~$150M for the other.
Then, afterwards, the end value would be massaged to what felt right. And yet, these sheets were seen as part of the "secret sauce" of the business.
It's one of the reasons I really wanted out.
Programmers have a reputation of being arrogant assholes, but I think this push-back and ridiculing other industries of using excel for stuff like this is completely justified. Excel spreadsheets let these people FEEL productive and like masters of their own fate with a bunch of numbers neatly encapsulated in their own little cells in a table, but their actual usefulness is questionable. For construction, it gives a rough feel for a project, but a lot of it is smoke and mirrors.
Jesus... want to start a competitor to these guys!? Email me
I also have seen Access applications with more features than a common Salesforce CRMs. I have seen it as a navigation system that could bring you to the nearest partners or shops that provide necessary parts. Yes, there is code of doom behind it and if those maintainers leave the company, problems arise. But the feature set was often extremely hard to beat.
And I am not sure how to respond to non-technical users about better alternatives. They use it as a front end for SQL manipulation and it isn't easy to come up with something better for that target group.
Even worse is SharePoint. It is a complete abomination, it can easily beat most horror novels. But the workflow engines are just extremely practical for corporate processes. Only recently some alternatives came to the market... I just wish I could kill it...
Excel has to contend with nearly 40 years of backwards compatibility (MultiPlan, the predecessor to Excel, was released in 1982) and a deep userbase that literally has decades of experience and muscle memory with the software. The Symbolic Link "SYLK" file format introduced in MultiPlan is still supported in recent versions of Excel, leading to the infamous CSV "ID" issue.
Many of our users still run very old versions of Excel and Windows (e.g. Excel 5.0 on Windows 95) because a change in a future version of Excel caused problems or gave different results.
I'd say it allows the same for technical users as well since it helps bridge communication/knowledge gaps and move things along. It's typically not the solution but often plays a critical supporting role.
Spreadsheets aren't 100% reliable for use cases where you need to collaborate and share immutable health records. Especially during a time of global emergency when tensions are heightened.
Spreadsheets don't impose validation, schema correctness, constraints, etc. and can amplify human errors. They can also inject errors of their own (eg. turning March1 the gene  into a date) when they're simply trying to be helpful.
How do multiple people manage a spreadsheet? How do you safely merge spreadsheets? How do you keep records from being duplicated or replaced? How do you do double bookkeeping? Can you atomically identify individual records?
This article is saying that we have to realize spreadsheets can be a source of scientific error simply because of their design intentions and ergonomics.
The closing remarks estimate that 1,500 people died as a result of spreadsheet error. That's remarkable.
Spreadsheet mistakes have been directly attributed not only to thousands of deaths but billions of dollars of mistakenly wasted money. Heck, in the UK alone, they have attributed billions of pounds and thousands of deaths specifically to spreadsheet errors and flaws, and that's not even counting the COVID-related issues from the article. Check this scary article: https://theconversation.com/excel-errors-the-uk-government-h...
There have been hordes of purpose-built apps that have aimed to replace the common spreadsheet for any number of tasks from inventory management to financial planning to even tracking nuclear weapons (true story: our nation's nuclear weapons stockpile is tracked in a spreadsheet). And yet, the spreadsheet is still here. The spreadsheet is the cockroach of all apps. At once utterly adaptable and seemingly indefeatable.
The spreadsheet is a horror show with lack of control, schema, and collaborative management. We surely can do better. But we haven't. Google and Microsoft have the best developers in the world who can do anything. And they have produced more spreadsheets. But spreadsheets are the bane of our existence! And yet they are still here.
The spreadsheet has survived from the mainframe era to the cloud-based era in pretty much the same form with enhancements. The spreadsheet is not going away, even with all the complaints in the article. The points are well made and well founded. But unfortunately they miss the point. 20 years from now we'll still be cursing the tyranny of the spreadsheets because of their utility.
I close with a Haiku:
| Rows and cells, alas |
| What makes your simple structure |
| So very useful? |
That's exactly what's being done. I work in mechanical engineering and Excel is the primary calculation tool all over the industry.
I bet excel has some excellent features around this, but they are not front and center in the UI.
Conventional spreadsheets used on-screen cells to store all data, formulas, and notes. Improv separated these concepts and used the cells only for input and output data. Formulas, macros and other objects existed outside the cells, to simplify editing and reduce errors. Improv used named ranges for all formulas, as opposed to cell addresses.
Like most of the software on the NeXT platform, it was ahead of its time.
yeah, it's called IF
We might even argue that spreadsheets were already used in 1295 BCE, as shown in Fig. 1.1 of .
Then you provide a spreadsheet export or direct connection that can be hooked into another spreadsheet with all the formulas, pivot tables and so on.
This way the core data is safeguarded and kept clean while not requiring constant development for every report requested.
I don’t think I’ve ever had a time when I wanted to use a spreadsheet for anything moderately complicated where gsheets was up to snuff. That’s not to say that excel doesn’t have problems or does things easily, but it doesn’t lead to quite the same level of frustration and hair-pulling, and indeed it’s usually possible to achieve things in excel when they are impossible in gsheets. Indeed that app even struggles at the ‘easy’ task of being able to fiddle around with table formatting. Its sole advantage is the collaboration. It can be used as a place to dump small amounts of varied tabular data to share but not really for calculation or analysis.
With Excel you have to ensure the file is there, and readable, not tampered with, etc.
This allowed us to create better multi-workbook "apps" that have a bit more modularity - you could even embed sheets into a webpage as a view-only client.
In the end, Excel is good for us only for interoperability with external groups.
Google Sheet excels at many other tasks, but it's not a replacement for specialized tool (and it should not).
In the end I put on my robe and wizard hat and used matplotlib. (╯°□°）╯︵ ┻━┻
Plotly is my current default when I'm working with Jupyter, though. Sane defaults are nice.
Not just sheet sizes, but Sheets feels much slower and has a vastly weaker toolset. Just the fact that the browser takes precedence for keyboard commands from the web app makes the usability suffer massively.
It will be extremely funny to see how Google does this whole "rewrite all our office products to use <canvas> instead of the DOM" project over the next couple quarters... he says, laughing, shaking his head at the wasted effort... Caesar wept for there were no more worlds to conquer
Data integrity is definitely a hard problem.
Even now the limit is ~1million rows, meaning that if you have even a very simple table but lots of rows then you have to pay more for Access.
The situation is something of a general problem of a data-driven Society. One group of people want information available with one interface and spends resources only in putting it in a format suitable - to the detriment of others who/want need it. With spreadsheets in particular, naturally you have a data-integrity but that's still an instance of the broader problem.
The good part of this not-invented-here mindset is that it's also how technology moves forward. "I need this new system to be mine/fit my way of organizing" is also often another way of creating a "this never existed before" product or outcome.
These storage and organization tools and silos are born as subjective (context-fitting) design processes. So they tend to get locked up behind depth-oriented (subjective) processes, and eventually frozen by stabilizer groups.
Stabilizer groups follow up to use the tools and promote the idea of keeping the data siloed the way it is. They don't like change at work because change breaks their preferred psychological processes, causes them to have to re-build their perceptual frameworks, and because they're humans, this makes them do stupid things at work, like lash out or become passive-aggressive or detonate their new diet plan or become late for a baseball game. They will blame all of this on "open data" or whatever it is that caused their stable workflow to destabilize.
There are lots of solutions to this, and Excel by itself can be seen as an attempt at a solution to the problem...also the problem can be moderated by the system of energy surrounding the data valuation and access to the data.
And "just don't silo stuff, guys" seems to be the proposal with the worst track record so far.
Spreadsheets have many advantages, they are easy to edit, copy around, and quickly filter and do simple calculations (like find how many students are in a class if size smaller than 10, for example). I can't think of a sensible alternative given current tools -- we should make better tools of course.
Excel excels at creating spreadsheets. Spreadsheets are pretty reports used to report tabular data. Excel is capable of so much more, but for eveything beyond spreadsheets it's medicore at best and its complexity often makes it a liability.
It's a bad calculator. It's a bad database. It's a bad medium for transferring data. It's a bad front-end. It's a bad notetaking app. If your use case goes beyond making tabular data look pretty, Excel is probably bad at it.
People use Excel for everything because they're comfortable with it. But other solutions are often much better and I'm so tired of cleaning up all the issues Excel has caused (and continues to cause) me.
It's also something that most people in most businesses have installed and as such, can be incredibly helpful for sharing data, models, etc. The collaborative editing in Office 365 adds a new dimension in that (finally) we don't have to email round documents and implement half-cocked versioning any more (though plenty of people still do), and multiple people can make changes at the same time.
If you want to slag off any of the applications in Microsoft's Office suite then the correct answers (and I don't think this is particularly controversial) are Outlook and Teams. If you want real productivity destroyers and daily pain, look no further.
Excel? It's great.
Yes, Excel is a very powerful tool and I would never argue otherwise. I definitely wouldn't argue against using it altogether. But I have come to loathe it because of how often it is misused.
Sure, any tool can be misused. But in my experience Excel gets misused more than anything else combined. Excel's accessibility and ubiquity is part of what makes it so powerful... and what compels people to use it where other tools are much better suited.
Don't get me wrong. It's okay to use a suboptimal tool... to a point. Users should be aware of a tool's limitations and seek out better solutions when it's clear those limitations will cause problems.
Excel is not a bad tool. It's so popular because it really is a good tool. It's just not the ultimate tool.
p.s., Google Sheets is one of the more underrated cloud apps of all time.
I've noticed there different ways people use excel: as a dashboard, to store data, to modify data.
IMO storing data is the worst use of excel, especially when it's done in a non 'tidy' fashion , with highlights and formatting to encode data.
This is from my experience as an R and python user in science.
Most Excel tables I've seen aren't just data and presentation. It's computing the data as well as presenting it.
And how is the data computed?
* Applying formula to a row/column range to get another range
* Applying formula to a range to get a single-cell output out of it
Obviously, I am not talking about clustering and performance here, but a computational paradigm.
Yes. Emphasis on "poor man's". Excel is mediocre at many things, but it's highly accessible so people use it anyways. That can be good or bad depending on the situation.
Need to quickly crunch numbers for a small dataset? Excel is perfect. You can probably put something together quickly to get what you want.
Need to maintain a contact tracing database? Not so great. Excel scales poorly and the low implementation cost isn't worth the limitations and risks.
One wonders what would happen if you could extend lisp to do reactive programming, and then extend it a bit further to load an excel workbook into an abstract syntax tree.
I can write python, but I can get the answer out of excel in a fraction of the time for most use cases.
As an added bonus, excel documents are far more ubiquitous and I can share them with clients, and they can see my workings. Python isn’t very transparent and business users can’t usually check the logic.
Excel is also much more interactive and allows for much more “discovery” and playing with the data.
What I would say though, is most power users don’t know about functionality in excel like PowerQuery, relational data models and DAX that actually do turn it into a serious and repeatable tool.
(I’m not arguing that Python isn’t better for many use cases, and excel definitely isn’t right for most critical applications, but I also think Python is much slower for ad-hoc data analysis so both have different places in the market. Anyone think python is faster? I’m down for a race!).
How? Do they check that every cell in a column actually has the same formula? Do they check data format everywhere?
Python script is something that is possible to be actually reviewed, and results - reproduced, and "formula's" there are actually readable.
But for some stupid reason excel files are still shared over email.
I would hope that we would be generally wiser than that here.
I actually have a lot of empathy for people who forced to deal with all those problems, otherwise I just wouldn't care.
They just have to check the top cell as most of my formulas are array formulas.
You don’t have to drag a formula down - that’s a common misconception in the latest versions. If you do =A1 + B1 and want to apply it to the 1000 cells below you just write = A1:A1000 + B2:B1000.
That’s still not that readable though, so I’ll apply those cells two named ranges “Sales” and “Taxes”.
Then the formula is = Sales + Taxes once and that will populate the whole column of data.
Then there’s M Code and PowerQuery which literally allows you to review the data cleaning line by line and even see the data state at any intermediary step. It also has
> Python script is something that is possible to be actually reviewed
The problem for me is that, as someone who works in consulting, it can’t be reviewed by my boss or a client, neither of whom can program. But they can review a tidy excel sheet.
And then they can’t edit it either, so if I go on holiday and I’ve built some sort of model nobody else can make progress until I come back, or if I move projects I’m also stuck maintaining the model on the old one.
I work as a banker, and what you do is one of the very first things new employees are taught not to do in excel. It's an amazing solution for the person that built it, but a terrible one for anyone looking to check the work.
Can you point me to the 'best practice' guide you are referring to? What authority on excel standards said this?
Personally I tried to find articles saying it's not best practice by typing in "dont use named ranges best practice" or "named ranges in excel are bad" into google, but it mostly brings up articles stating that using named ranges is best practice and improves readability!
> How could someone looking at it know what Sales and Tax actually are?
If you really want to use that example, you would click in the formula bar and it will highlight the ranges, and colour code them automatically.
If it's on a different sheet, you just hit ctrl + g and type in the name, and it will take you directly to the cell it's linked to (which usually in my case, is linked to a sheet that contains all my model's assumptions in one place, each one with a named range describing what it is). It's much easier and quicker than going to =Assumptions!G52.
> I work as a banker, and what you do is one of the very first things new employees are taught not to do in excel.
Seems like a silly thing to teach people IMO. In my experience it makes formulas much more readable (both writer and reader), makes it much faster to build models, and cuts down errors substantially.
I personally find that formulas are much easier to review, because the named ranges provide some intent. If someone writes =(A2Assumptions!92)/Assumptions!91 I've got to really unpick it to work out if it's right, but if someone labels it =(A2Miles_Per_Hour)/Average_Miles_Per_Vehicle then I can see that the formula is wrong almost instantly.
Additionally if I want to write another formula using those values, I can just type it straight into the formula bar without having to go and click on the right cell reference in another sheet.
Well investment bankers tend to be pretty darn good at excel, and the banks I've been at would scorn you for doing it, as well as all the standardized training given out to fresh recruits to the industry. We had a buy-side deal recently fall through partially because the (fairly sophisticated) model the company selling itself used was absolutely unreadable and unaccountable. They did exactly what you said (naming), including with their assumptions. It was 20 sheets of unpenetrable mass, and we were all turned off by the fact that you couldn't follow it whatsoever, and was basically unaccountable.
>If you really want to use that example, you would click in the formula bar and it will highlight the ranges, and colour code them automatically.
If it's on a different sheet, you just hit ctrl + g and type in the name, and it will take you directly to the cell it's linked to (which usually in my case, is linked to a sheet that contains all my model's assumptions in one place, each one with a named range describing what it is). It's much easier and quicker than going to =Assumptions!G52.
This how I can tell you've never seriously worked with excel, because that's MUCH slower than using the native/macabacus auditing tools. Adds up over thousands of times. And what if Sales isn't just a simple cell in another sheet, but is actually tied to a named formula itself, tied to a named formula itself, tied to another worksheet (with named formulas in them!). That's a deep rabbit hole and you to be going down with little transparency, where you're having to look up the formula manager to find whatever the fuck the named variables are actually referring to (what if someone follows your advice and names the tax rate as "Tax" and now ctrl-g doesn't work!)
>Seems like a silly thing to teach people IMO. In my experience it makes formulas much more readable (both writer and reader), makes it much faster to build models, and cuts down errors substantially.
It makes formulas much easier to read, but with zero accountability, and considering someone will very likely be looking at the excel at some point, with no idea what you did, they have to check each and every one out to make sure it's not bullshit. Plus, it doesn't really make modeling faster assuming you've built out your source numbers/assumptions well and use best practices (like A24+A25+A26 instead of A25+A26+A24).
>I personally find that formulas are much easier to review, because the named ranges provide some intent. If someone writes =(A2Assumptions!92)/Assumptions!91 I've got to really unpick it to work out if it's right, but if someone labels it =(A2Miles_Per_Hour)/Average_Miles_Per_Vehicle then I can see that the formula is wrong almost instantly.
Use tracing, and if the assumption tab is well built out it really isn't faster, at all. +alt w,n
>Additionally if I want to write another formula using those values, I can just type it straight into the formula bar without having to go and click on the right cell reference in another sheet.
Use multiple windows, makes life much easier.
> This is how I can tell you have never seriously worked in excel.
Ah, so the style guide and best practice is based on “Trust me - I rock and you suck”.
I can guarantee that I do use excel seriously, and my personal experience is the exact opposite. You wouldn’t see beautiful code with variable names like A2, and it’s exactly the same for me with excel.
Besides - a feature you can turn off in about 5 minutes stopped you from doing a deal?! Why not just explode the named ranges out? I suspect the issue here is an overly complex model rather than named ranges - I’ve seen lots of these without named ranges too and they are even less manageable in that state!
Every year freshly minted MBA's would begin the training and immediately become aghast when the trainers told them to never, ever, use named-ranges in formulas. Not only had the trainees been explicitly taught to use named-ranged in their MBA programs, but any idiot could plainly see that [=enterprise_value/ebitda] was better in every way than [=C13/F22]. More expressive, more readable, easier to spot errors, etc.
The trainers would argue that in an MBA program you build your models, submit them, and move on to the next assignment, so you don't get a view of the longer-term problems that arise from named-ranges. What does [=ebitda] actually tell you? Is that the last quarter's actual number, is it the current estimate of the next fiscal quarter, is it a 12-month blended forward estimate?
And as you try to solve those questions with [=ebitda_est_next_fiscal_qtr] you often end up with two more problems: an unambiguous name for you can unambiguously mean something else to someone else and as you update your model over time, if you forget to update your named-range references, you have formulas that look right, but are wrong, e.g., [=ebitda_est_next_fiscal_qtr] now refers to an actual reported ebitda number, not next quarter's estimate.
[=C47] while not telling you much at all, is unambiguous. It doesn't look "right" or "wrong". It can't mislead you. If you want to know what it is, go look at cell C47. The new MBA's would argue, "Wait a second, C47 could now point to the wrong data as well, and it could be mislabeled with a stale row or column header. And the trainers would reply, "Exactly, and when you go to see what C47 represents you will have the context to recognize those errors and fix them."
Also, very cool experience at Goldman!
I wonder if there could be made some middleground language, visual enough to not being scary for spreadsheet people, and debuggable and readable enough to not become a mess when model becomes big. Jupyter comes to mind, but it looks like it didn't get any traction outside data science.
If you need to treat one row differently as a special case that is easy in excel but can be a lot of work with programming, both for technical reasons (you need to handle the special case with special code) and for silly reasons (in excel you click the row which is special; in a program you must describe how to find the row.)
Certainly there is an argument that one shouldn’t have such special cases as they make maintainability a nightmare but sometimes it matters a lot more to have something now rather than something later.
In excel, the data is the most important thing and in the real world it often is too. In a program, the data (especially the intermediate data) is quite invisible. Power users may spot errors because intermediate values look wrong, rather than by carefully reading the formulas. Though if the values look right or there are a lot of them, errors may not be spotted.
Excel is very ad-hoc which is the problem people complain about but also it’s power in dealing with the changing real world and anything exploratory. Once it is working and stable in excel, it is more wise to do it in python.
It’s not about every problem looking like a nail, it’s that, nobody had to pay for this fork, somehow everyone has it and it’s such a great fork that it can do 90% of the problems you throw at it.
Yea sure some problems would be better solved with specialized tools, (eg knife, spoon, etc) but those require training and budget. No one has those. Everyone has a fork.
Actually, everyone had to pay for that fork. Excel is a paid product, and isn't even included in the cost of Windows. It is, however, bundled with MS Office, which includes Word, which is even more of a must have at almost every single company/org.
The main point I'm making is that no one has to go to their manager and justify paying for it. It's already bought and paid for, and because of its ubiquity most people have a pretty good idea of how to use it.
Spreadsheets are like a Swiss Army Knife.
I have used them for
- Templates for Printing
- Data Analysis
- Financial Tracking
- Report Generation
- Calculation sheet for engineering stuff
and a multitude more uses that I cannot even remember now.
They are not a hammer, though that is one form in which they can work.
Ultimately they are a tool, and a damn fine tool at that.
Their potential is limited only by the creativity of the user using them.
But, nothing beats the availability (both of sharing and editing) of an open Google Sheet.
Its spreadsheets all the way.
It enables you to load CSVs and write SQL on those CSVs.
The main benefit is that it is very fast when compared to Excel or Python.
Excel cannot really handle 1GB CSV file. With Python, just loading 1GB file and iterating through every line probably already takes more than a minutes.
Superintendent pushes most logic to C and Sqlite, so it's super fast.
Here's the app: https://superintendent.app
But if you're running things in prod via excel you haven't actually thought enough about it to call it prod.
As for reasons you're outgrowing a spreadsheet; access and permissions, friendlier error messages or a better "wizard-like" process, formalizing input data, outgrowing data size, smarter caching for computing values, protection or clarity around modifying "constant" or "magic" numbers. Many of those you could implement in a spreadsheet, but its often more complex than using something else. That something else probably varies with which problem you're trying to solve.
Lots of data in business isn't generated by another program, it's built and maintained by users.
I see people move from a filesystem based workflow to a database and while that may be faster and more efficient you lose those tools to look at and tweak data. You either need to write those or teach them how to navigate the database. People rightfully hate black-boxes as soon as stuff stops working as expected.
Personally, I love the Linux command line and will pipe stuff in and out or stash things in flat files. I may move the data back into a spreadsheet to get more visibility or look for trends.
I could just open it in a text editor I suppose, but then I have to deal with escaping, and moving around cells, and that sounds painful.
But things really are hopeless.
Even Microsoft can't control the monster they unleashed on the world. Execs and managers using Excel instead of Planner for project tracking ("Spreadsheets are for manipulating numbers, not word processing!"). Used in that way, Excel isn't so fragile, but why is it being used at all? For f*cks sake, hasn't anyone ever heard of Word tables?
Honestly, anyone who uses Excel as a database should be exiled to... someplace where they can't do any more damage. Maybe connectivity-poor rural Arkansas. That's technological malpractice. It's indefensible, and an embarrassment to the whole tech sector that anyone is trying to excuse it.
There is simply no point complaining about Excel unless you have a viable alternative, and that alternative is not coming from traditional IT depts.
Specifically, iTunes (especially the older interface from iTunes 5 to 10) allowed for Playlist Folders and Smart Playlists.
Workbooks can have several sheets, but not in a nested structure. Same with SQL databases and tables. If I want to organise data (e.g. Country > Social group > Friend name/email/phone number) then it's easy to do in iTunes but difficult with typical Contacts apps.
Web browser bookmarks are pretty good, but the end object has only 2 fields (name and URL). So what I do is I have a fake-radio track object in iTunes, with all the extra tags. Double-clicking it loads a page on localhost, which launches a PHP script that then opens a URL in my browser (for the friend list) or document.
You’re not paying attention to the space but recalculating notebooks are gaining popularity.
I wouldn't use observable, as you are tied to their web service, and I've had too many companies close to use such a service for important data. Pluto is promising, but (I believe?) Only supports Julia, which is growing but still not that popular, or known.
On a humorous side note ... competitive excel
Didn’t he say this backward? I feel like I’m about to lose my mind
For scripting a thing or two it's not terrible, but back then it wasn't so easy to split out the code for version control. Some formula for pricing this or that is gonna work in whatever language anyway and doesn't change a whole lot though.
It was when I started to pull FIX connections into it that I thought maybe it's time to do things properly. Then you're suddenly in need of understanding how the execution model works.
Also as soon as you need to follow a schema Excel is too soft. It's trivial for someone to just add an extra bit of info on a row that throws everything.
I guess F# like is modern enough.