This is a very interesting article, but I do find it very one-sided. The problem isn't Excel, it's those who use it from a foundation of ignorance or carelessness. No doubt, human error does creep into every human activity. Even so-called experts fumble. It happens to all of us. Blaming the tool or the tool maker is pure nonsense.
I have used the tools mentioned in this article, Excel and Powerpoint, extensively. And I have used them for engineering and business purposes. And, yes, I have made mistakes that have cost me money. Funny enough, never when used for financial applications but rather for engineering.
Over time we developed a rather complex (probably a nearly a hundred columns and two-hundred rows with data and formulas) Excel workbook used to calculate timing parameters and counter reset values for a complex timing and synchronization circuit inside an FPGA. After one crank of the wheel this spreadsheet would output a table of hex values that would get plugged into Verilog code to setup the circuit. Everything worked as expected, or so we thought.
Then, a few weeks later, as the code was being run through test-benches (unit tests) we discovered that a corner case was broken. Being that it was hardware we immediately thought that the problem was the circuit. The code was checked and re-checked, simulated and re-simulated. After about six months of hard debugging work we discovered that the Excel spreadsheet was the culprit. We had used the ROUND() function in a calculation, when, in reality, we were supposed to use ROUNDUP(). Only one of hundreds of formulas contained this mistake. This cost us six months of engineering time and, therefore, lots of money.
Having suffered through something like that I still could not bring myself to blame Excel for the problem. We made a mistake. No, in this case, I made a mistake. And that was that. Blaming Excel or Microsoft would have been nothing less than moronic.
I have used Excel to design polyphase FIR filters, maintain state machine code, produce complex lookup table initialization values for a variety of languages, prepare data for EDA tools, process sensor data and even generate G-code to machine bolt patterns on a Haas CNC mill.
Oh, yes, also invoices, purchase orders and financial analysis.
It's a great tool. And, if you use VBA you can really make it do some amazing stuff.
Powerpoint is another interesting tool. Of course, the most visible application is to create slides for presentations. However, you can also access its automation layer via VBA. And, yes, Visual Basic isn't the most elegant of programming languages, but, who cares, this is about getting a job done, not about authoring monuments to computer science. Anyhow, the most interesting application I found for this was the creation of a full simulation of the physical control panel for a device. The simulation included buttons and even a pretty good simulation of a dot matrix display. It was used during a training presentation in lieu of having the actual physical device there. The menu code was pulled out of the state machine that ran the menu on the actual device (written in C but easily translated to VB). It was really cool.
What surprises me is that financial folks, as related by the article, don't seem to have a system in place to create "checks and balances" on spreadsheets that might actually be used for very serious financial decisions. That's odd, but I can see it happening. Startup opportunity? Maybe.
A compelling if painful story. You must have some thoughts about what you could have done to track down the spreadsheet problem sooner – what are they? Also, if you could change Excel in any way you chose, what might be ways for it to prevent or detect such problems?
I'd like to ask about a technicality as well, if you don't mind. You said that ROUNDUP was used in hundreds of formulas and ROUND in only one. How many of these formulas were identical except for relative cell references, as opposed to doing logically distinct things? By "identical except for relative cell references" I mean the kind of formulas you get by selecting a cell and telling Excel to copy its formula across a range -- Excel sees these as distinct formulas, but really they're computing the same thing on different inputs; in APL it would be a single expression. I'm wondering if the bad ROUND guy was identical to some other formula in the sheet, or whether it was a fundamentally new formula that someone typed in and forgot to use ROUNDUP. Really, I guess I'm just wondering how the bad formula happened to get entered.
The reason for that last question is that I'm interested in what a spreadsheet would be like that recognized common formulas across a range and treated them as a single data-parallel computation, rather than a bunch of separate cells that happen to have relatively identical formulas. Clearly this would have performance advantages (parallelism and data locality). But it would also seem to have a better chance at preventing errors, since it would treat "computation X across an array of N cells" as a single thing, rather than N relative copies of X that might get altered separately. Curious to hear your thoughts on that general idea too.
Not many people on HN mention APL. I programmed in APL professionally for almost ten years and know (knew?) the language well.
It's too late for me to go dig-up this old spreadsheet. Maybe I'll do it tomorrow. Excel does warn you when there are discrepancies between adjoining cells. If I remember correctly this was a cell that was copied down and used on a row. Thinking back I think the formula was entered once but replicated Dow a range of rows.
The problem with the APL comparison is that the language can become less elegant if the operations to be applied across the rows of a matrix are somewhat varied. I'm typing on an iPad right now so I can't even try to give you an APL example. Maybe tomorrow from my PC.
(I remembered your APL experience from some previous HN comment or other.) You're right that spreadsheets tend to be more heterogeneous than a chain of array operators. I'm not suggesting that APL as a language should be integrated into spreadsheets, but rather that there is a way to implement spreadsheet language (formulas and cells) that partially approaches how APL treats computation. This is possible because there is a lot of implicitly array-oriented computation in typical spreadsheets – that's what you're "declaring" when you drag a formula across a range of cells. What would a spreadsheet system be like if it really exploited this? The trick is to do it without breaking too much with the well-established mental model of Excel users.
Excel does have array formulas. Every time I used them I have to look up usage. From memory they work well for matrix transforms and such things. Again, my memory fails me because It's probably been five or more years since I've touched them.
I have a busy day today so I have to stay off HN. I'll look into some of the items requiring deeper answers later tonight or tomorrow.
"Excel sees these as distinct formulas, but really they're computing the same thing on different inputs; in APL it would be a single expression."
Excel notices when there is one formula that breaks a pattern, and puts a warning sign on that call (my Office 2003 version of Excel does). Now it presumably misses some cases, but the standard 'drag a formula, change one of them' case, Excel saves you from.
They always have checks and balances, and you are quite right in wondering about it.
Take the usual life cycle of a young financial analyst in one of these firms (not even doing quant stuff)
He makes model, it is sent to his superior who checks it and tells him the out put is wrong, the PE ratios he has calculated for the firm are through the roof, at something liek 18x while the rest of the industry is always at around 3x. (numbers are fictional).
Check your numbers.
Further, the lowly analyst has also made a million errors in even reaching this stage - from incorrect formatting, to incorrect formulas, incorrect linking, to the dreaded ref error.
They do this stuff constantly, and the ones who can do it in their sleep and not die of exhaustion to the brutal hours are the ones who get through to the next level.
People have their fingers burnt regularly, and every day. Bad models still manage to slip through, but are more often than not, caught, or handled.
So people in the industry are frightened witless that something like a bad model may escape into being published.
It took several sequences of failure for the whale to amass the positions he did, and it is an uncommon sequence. Which is why its such a failure on the industry's part.
You've hit on exactly why mistakes in excel spreadsheets don't get corrected. Evveryone checks the output, many people check the results of intermediate calculations, but in most cases people don't check the code if those are ok.
Everything goes well until either the data changes enough to make the error in the code matter, or some other code is added which relies on assumptions about how the broken code works, and the sheet starts pumping out garbage.
You decline to blame the tool because you made a mistake.
But the most important thing you need to know about tools is this: tools are not neutral. Every tool promotes working in certain ways, and discourages working in other ways. All programming tools can perform the same calculations; its their effect on the way we work (and the way we interact with the other humans we work with) that changes between them.
Excel IS an amazing tool, but it has many extremely serious faults which have been mentioned on this thread. Some of these can be solved by adopting specific working practices for making spreadsheets, but sometimes this is like trying to nail jelly to the ceiling -- the things that Excel makes easy are often the things that you want to 'ban' people from doing (things like forking a sheet without passing the changes back).
One enormous problem is that it's difficult to track progressive changes to the code (formulas or macros) in a spreadsheet. What you normally save is a binary blob which is impossible to diff.
Another one is that there is no straightforward way to apply the same code to different data. If you copy your sheet, you now have two versions of the code which can diverge. So people add more and more data to the same sheet, and add more and more ways of selecting the inputs you want, whereas a program in a conventional language is separate from its input and can be applied to any input.
A third problem is that while it's easy to call external code from Excel, running Excel as a library and calling it FROM external code is extremely complicated and arcane. (Just look at the number of questions on programming help sites asking how to do straightforward things with VSTO if you don't believe me.) So rather than make nice apps which interact with Excel, people make 'applications' inside Excel which are really just large, complicated spreadsheets with weird dependencies. Since Excel and VBA weren't really designed to do this a lot of these work very badly.
Some of these things can be fixed for the spreadsheet model, although for a mix of technical and commercial reasons they might not be able to be fixed within Microsoft Excel. However to appreciate why they need to be fixed, you have to understand that programmers don't use things like version control because they love bureaucracy and due process. (Most programmers HATE bureaucracy and due process, and almost all of them resented the overhead of version control and code review until they very quickly discovered how huge the payoff is when things go wrong.) Likewise, we don't use sophisticated languages and study theory because we want to 'author monuments to computer science'.
These things have huge, and immediate practical advantages. The programmers' task is to make people understand why they benefit if they do blame their tools, and start seeking better ones.