Hacker News new | past | comments | ask | show | jobs | submit login
The Importance of Excel (baselinescenario.com)
285 points by DavidChouinard on Feb 10, 2013 | hide | past | favorite | 203 comments

One of the amusing/annoying things I've learned when working with business/finance people is how the spreadsheet seems for them to be a freeform tableau with which to conveniently display an assortment of calculations, as opposed to a structured data format.

For example, I'm inclined to list financial data in this somewhat-normalized format in an Excel spreadsheet:

        Apples	1/10/2013	40	$50
	Oranges	1/12/2013	12	$200
	Apples	1/15/2013	30	$80
	Oranges	2/1/2013	10	$40
	Pears	2/2/2013	50	$100
	Pears	2/9/2013	20	$40
However, people I've partnered with, and who most definitely (I think) have a better grounding in financial math than I do, might structure their spreadsheet like this:

    Apples  1/10/2013   40  $50     Oranges 2/1/2013    10  $40
    Apples  1/15/2013   30  $80     Oranges 1/12/2013   12  $200
    Total Apples:       70  $130    Total Oranges:      22  $240

                                Pears   2/2/2013    50  $100
                                Pears   2/9/2013    20  $40
                                Total Pears:        70  $140

(you can imagine the bespoke text-formatting/cell-coloring that also ends up as part of the spreadsheet)

While I understand that their priority is to not care about data processing...not only is this format extremely annoying to machine parse, but it seems unwieldy for any professional use case. When you want to add more rows to each category, you're going to run into layout problems, for one thing. And at some point, your wrists are going to get a nice dose of carpal tunnel syndrome from all the clicking-to-navigate that you end up doing.

Interestingly, Apple's Numbers tries to break people of this (perhaps unsuccessfully) by decoupling the 'sheet' with the column/row/cells grids. You can have multiple cell tables on a single page.

When I first saw this functionality I was really excited by it as a way to keep data consistent and independent of the layout. In reality I think the tooling around creating/moving/linking the cell groups is a bit awkward to use. Perhaps some day it will get there, or someone else will pick up the idea and run with it some more.

Lotus Improv did this first. Excel has pivot table support that supports this use case too if people know how to use it.

You miss the point, both Lotus Improv and Excel are an (pseudo-)endless wall of cells.

Numbers puts bounded tables on a page[0][1] instead of fitting the pages (and everything, really) into an endless table dating from Multiplan, thus solving the "hacking the cells to implement layout" problem. In this example[2] a table is actually selected, and allows for south, east and south-east extension.

[0]: http://www.file-extensions.org/imgs/app-picture/3615/iwork-n...

[1]: http://maymay.net/blog/wp-content/uploads/2008/09/example-bu...

[2]: http://farm2.static.flickr.com/1400/1050927588_8765bb65a6.jp...

I see the latter case, and far worse, at my work every day as a mechanical engineer. What it boils down to I think is, is that this "annoying" implementation has worked fine for them for many years, and our company at least has little competition and thus little incentive to develop more efficient methods. Seeing this is painful to anyone who's done modern database processing, and I'm making inroads with my manager at improving this state of affairs, but some guys don't know any better and hate being told what to do. One coworker prefers to punch in hundreds of values into his Casio calculator at a time instead of dragging a row in Excel. He knows how to do this in Excel, I've seen it in spreadsheets he's made, but for that particular usage case an old habit dies hard.

I faced this before. I have a solution: Open up PHPMyAdmin, show them how it looks like in a proper database.

Then show them how to lock panes in Excel. Problem pretty much solved.

I think danso's problem would also be solved by showing them pivot tables in Excel.

sumifs || dsum an alternative too.

sumproduct rules my world when it comes to this.

Regular person here who learned to code using VBA with Excel and Access to build spreadsheets and "dashboards" (oh the suits love "dashboards"!!) to track sales for what grew to be a half-billion dollar book of insurance business. The problem with all the "professionals" is that they take forfuckingever to get anything useful and usable to you when you're trying to run an actual business. Plus, you have to sit on endless conference calls trying to explain to some "professional" why you want to see the data the way you want to see it, not the way that's easiest for them to program. And building out the specs. And following their painful changelog process. And waiting some more. And know they're getting paid more than you even though a lot of the time they leave at noon on Friday.

If you want people to use your "professional" code, learn something about how to get it done now. The people who are making it happen right now don't have time for all this. Not everything needs to be engineered - not even the small chunks of the global financial system I have dealt with.

"Not everything needs to be engineered".

This is true. However, in a professional environment, it is reasonable to expect that a professional level of attention and diligence is being applied to most tasks. With a half-billion dollar spreadsheet, it's not going to take much of a rounding error to make that software engineer's salary look insignificant.

You draw a distinction between "professionals" and people who make spreadsheets. Why is that a good distinction? Shouldn't those people have some professional standards too? What are you doing if you don't have time to do a good job on something? Heck, why is it okay for you Excel programmers to not have the same set of standards that the rest of us follow?

Following all these rules of procedure and specs and change requests isn't any more fun from the other side of the table. The point of that stuff is, if the product does not conform substantially to those specifications, the person coding it is (usually) liable. Reimagine your spreadsheet-writing life with the idea that you could be sued into oblivion for any losses incurred by your code, and you might develop an all-new appreciation for testing and specifications.

But you know, maybe it's your money on those spreadsheets.

> The point of that stuff is, if the product does not conform substantially to those specifications, the person coding it is (usually) liable.

Can you point out a single case where a programmer was held personally liable for losses incurred by their code? Just one?

Libility is usually limited to willful wrongdoing and negligence, and when it involves work done by an employee, it's the employer not the employee that's held liable.

I don't think he means liable in the literal sense but in a figurative way: that permanent employee might lose out on a bonus, raise or future greenfield projects if they screw up. A consultant or contractor might not get their contract renewed or told to sod off, depending on the severity.

But you're right about the financial liability: I've never heard it happen; but what with Murphy's Law I'm still retaining a Liability insurance just in case it should ever happen to me.

This sounds like the conversation from the day before the Agile Manifesto was published.

Why exactly should you be afraid when a turing complete language is used to do computation by people who do not understand more complicated languages?

I do not see the point of the article.

In fact, we should instead celebrate that excel is used for so many things that can be better automated - just think about the business opportunities!

EDIT: and yes I read the article, and it specifically points to a coding mistake. When I work with excel (sometimes I still do!) I create references to the cells holding the formulas, and feed them some known input and compare the results to what I expect - if the computations do not match the formula may have been overwritten so I print a "ERROR DETECTED" in a special "TEST" field next to the cells holding values.

Congratulations - you have a test suite!

And that's just one way. Sometimes when I have to do a different implementation I just keep the previous implementations of the same algorithm somewhere on the sheet, use the same input and substract the results - any non zero value should get you worried enough to display an error message. This is interesting for new edge cases that your test suite could miss, and especially useful if the initial algorithm was vetted in some way.

Congratulations - you have added regression !

All this is easy to do, just a copy/paste special away. Even better- you can teach that process to non coders and they will see how useful it is, because anyone who has used excel for non trivial things has been exposed to such problems and is weary of excel limits.

The tool is not the problem. Software development is a process, and if you do a poor implementation, without tests, without regression testing, guess what it will bite you - but it would be the same in C, perl or python if you try to do without at least tests.

TLDR : There are many ways to make excel more robust. Maybe the "experts" are not really experts if they have never been exposed to copy-paste induced errors and never had to take care of such errors.

> I do not see the point of the article.

Did you read it? The problem is that for all its power, quirks about Excel as a product make it very easy to make entirely hidden errors. Too much magic in the hands of unsophisticated users can lead to trouble, but in Excel's case, even experts can make mistakes and not notice.

And when components of the global financial system are exposed to such risk, that gets problematic indeed, as the article details persuasively.

The article doesn't do this persuasively.

I'v seen the process, and I know of what the process is in one of the Big Firms. There is almost always someone above you to check your work, because the drudgery and pain of a model is farmed of to an analyst. The associate/seniors take a look at the work, and finally there is almost always a sanity check which is not trivial.

All of the major finance firms have lots of people who've burnt their fingers with excel mistakes.

They know that their analysts and their associates can and will break something. (Heck the analysts do so regularly, I'm sure there are several analysts freaking out over REF errors right now)

The London Whale wasn't just because mistakes in excel. There were several things that broke here.

The assumption that the users are unsohpisticated in the use of excel is a bit naiive. If anything, the most arcane applications and commands of excel will likely be best known in the major banking firms.

There were several things that broke with the London Whale. And spreadsheets are not to blame for things like LIBOR fixing, or late trading on mutual funds or any of the other fraudulent practices of big banking.

But still, spreadsheets are full of normal human errors, and based on flawed assumptions, with little oversight, and they control huge amounts of money in novel instruments.

From EuSpRIG:

> "This market grew very quickly due to the ease with which it was possible to design and promulgate opaque financial instruments based on large spreadsheets with critical flaws in their key assumptions." Grenville J. Croll, 2009 [http://arxiv.org/abs/0908.4420]*

> "Complex financial instruments such as CDO’s are implemented within, and valued by, large and complex spreadsheets. CDO’s and other credit derivatives played a leading role in collapse of the global financial system”. Grenville Croll [http://arxiv.org/abs/0908.4420]*

> “Spreadsheets have been shown to be fallible, yet they underpin the operation of the financial system. If the uncontrolled use of spreadsheets continues to occur in highly leveraged markets and companies, it is only a matter of time before another ‘Black Swan’ event occurs [Taleb, 2001], causing catastrophic loss. It is completely within the realms of possibility that a single, large, complex but erroneous spreadsheet could directly cause the accidental loss of a corporation or institution, significantly damaging the City of London’s reputation”. Grenville J. Croll, 2009 [http://arxiv.org/abs/0709.4063]*

etc etc.

Because people make errors. This is well documented. For simple tasks people have an error rate of about 0.5%, but for more complex tasks people have an error rate of about 5%

So, when people are creating their own spreadsheets to get stuff done, and using that for multi-million dollar transactions, and not getting the software audited or approved, they're introducing risk to the company.

>> For simple tasks people have an error rate of about 0.5%, but for more complex tasks people have an error rate of about 5%

Can you cite the source for this data? I would like to read more.

Panko has a paper here (http://panko.shidler.hawaii.edu/My%20Publications/Whatknow.h...) which mentions it in the Introduction to Errors section, under the "consistent with other human error data" section.

> Consistent with Other Human Error Data

> When most people look at Tables 1 2, and 3, their first reaction is that such high error rates are impossible. In fact, they are not only possible. They are entirely consistent with data on human error rates from other work domains. The Human Error Website (Panko, 2005a) presents data from a number of empirical studies. Broadly speaking, when humans do simple mechanical tasks, such as typing, they make undetected errors in about 0.5% of all actions. When they do more complex logical activities, such as writing programs, the error rate rises to about 5%. These are not hard and fast numbers, because how finely one defines reported "action" will affect the error rate. However, the logical tasks used in these studies generally had about the same scope as the creation of a formula in a spreadsheet.

He has cites at his website too.


Traders don't care. Traders are the profit-generating people. They are the rockstars in banks. Programmers are tools to help them achieve their goals. When traders want something that helps them make money, they want it now. Try explaining code maintainability to them. If they don't like you, you're gone and another keen programmer with an interest in finance will take your place.

I find that most people that program excel are most likely quants that just want to get the job done, do something to allow the bank to generate bigger profits.

Disclaimer: I work in one of these banks.

This completely makes sense and probably applies to any industry (aside from companies whose main purpose is to program, of course)...but do you think there is any possibility that someone with a well-developed, bespoke software package that can be tweaked, tested, hypothesized-with in a much more flexible manner would have a significant advantage than someone armed with an Excel spreadsheet?

And if so, is the reason why no one has gone for the more bespoke route is because no one who programs is in charge of these companies, and thus does not even know that they could benefit from such an approach?

The only parallel I have some experience is in the publishing industry. No one thinks that the technical details of the CMS is important, as long as stories are visible on the Internet. But if a clunky CMS requires you to hire 10 to 20 people just to move pieces around on the frontpage (this is on top of the backend that automatically adds stories to the system and to their proper categories), then that is not a trivial cost.

And I'm sure The Verge, with its custom CMS (in Rails, it seems) has some clunkiness to deal with, but they've been able to produce varied and interesting formats for their stories on a much more everyday basis than even the New York Times (I'm not counting interactive graphics/multimedia, I mean the general story form).

Joshua Topolsky mentioned that one of the reasons for partnering with Vox Media was the quality of their CMS.

> If they don't like you, you're gone and another keen programmer with an interest in finance will take your place.

Where are they getting all this talent from? I assume they need atleast a certain degree of expertise which is incredibly difficult to get out of people right now...

Have you heard a bunch of people recite the phrase "Good programmers don't care about money" ?

Turns out, it isn't a perfect description of reality.

Graduates from universities. They apply in droves. (guilty as charged). I suspect this applies to US and UK, although I'm based in the UK myself.

The fact is that they called the trade 'financial engineering' but as a matter of fact, it is a lot of finance but very very little engineering....

The "engineering" in financial engineering does not refer to engineering as a technical profession. Rather, it refers to engineering in the sense of "engineered lumber."

Engineered lumber is when you glue a bunch of different woods together. The resulting product has different properties from any of the original woods. It is more profitable for the lumber company because it can be sold at a price premium.

Similarly, financial engineering involves packaging up multiple financial products into a more complicated product. The product can then be sold at a higher price to gullible fools, aka "customers."

I think I will never forget the time when I used to fix their Excel spreadsheets of all kinds of inane problems.

Say what you will about Microsoft, but Excel is amazing.

I'm no genius, but during my corp years in $LARGE_PHONE_COMPANY the excel macros I was able to write, interfacing with SAP-HR/BW and HP's Quality Center, was a great time-saver and made a lot of things very stream line and less error-prone. All the COM objects that are available to VBscript create crazy possibilities that, really, you could start a whole company just writing macros to stream-line work for other corps.

I really should look into that kind of thing in Open/LibreOffice now that I'm more at home in Linux.

you could start a whole company just writing macros to stream-line work for other corps

Oh, quite a lot of companies started that way. Many of them still do it exactly like that.

Worse: for a while the trend was clearly "all these Excel sheets are terrible, let's replace 'em with <SHINY_NEW_CENTRALISED_WEB_SYSTEM>!", but now established web-based software products are busy touting their wonderful Excel add-ins that seamlessly push data back-and-forth from spreadsheets.

Developer tools have always been Microsoft's sweet spot. Excel just happens to be targeted at non-developers.

"I really should look into that kind of thing in Open/LibreOffice now that I'm more at home in Linux."

You can start with that now, and be done with it by tea time...

{not sure if sarcasm detected}

Since LibreOffice embeds macro writing in Python, the environment is pretty rich - even compared to the mighty Excel VBA...

Raymond Panko has some interesting writing about spreadsheet errors.


The European Spreadsheet Risks Group has some 'horror stories' (http://www.eusprig.org/stories.htm)

It's an interesting field of research, and you could probably make a bit of money if you can audit unintentional human error; deliberate human deceit; or software errors.

While you're at it, I think the piece: A critical review of the literature on spreadsheet errors by Powell, Baker, and Lawson is a valuable read as well. Here's a free copy of it: http://mba.tuck.dartmouth.edu/spreadsheet/product_pubs_files...

I'm really surprised something like EuSpRiG exists. Then again I'm really happy it does.

The fault is not in Excel, which is an excellent piece of software. All the problems identified are the fault of poor math.

After the London Whale trade blew up, the Model Review Group discovered that the model had not been automated and found several other errors. Most spectacularly,

“After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR . . .”

It's not like programmers are immune from making such errors. You can put the wrong equation into a page of C or Java as you can onto an Excel spreadsheet.

> It's not like programmers are immune from making such errors. You can put the wrong equation into a page of C or Java as you can onto an Excel spreadsheet.

The difference is that programmers know they will make such errors and thus have developed tools, techniques and a culture that acknowledges the necessity for testing to find and eliminate bugs.

In contrast, I'd wager that less than 0.01% of financial Excel sheets has automated tests, and less than 0.1% has a written test plan. If you confront your typical "work-hard-party-hard" hotshot trader with demands for such, the response is most likely "Testing? I wrote it, I know it works, do you think I'm stupid?"

"The difference is that programmers know they will make such errors and thus have developed tools, techniques and a culture that acknowledges the necessity for testing to find and eliminate bugs."

LOL, seriously? How much software do you use every day that doesn't have bugs? Especially software that only gets used every now and then?

Yes, and that's despite all those efforts to reduce bugs. Imagine all the bugs found by tests going into production...

Or were you jsut put off by the phrase "eliminate bugs"? That wasn't meant to imply eliminating all bugs, just those the tests have found. And we all know they don't catch 'em all.

OK, put that way it makes sense and I agree with you.

> It's not like programmers are immune from making such errors.

Too true. It wasn't that long ago that the programmers at The Knight Capital Group nearly sent the company broke with their $430 million programming glitch.


Except that wasn't a programming glitch, it was an operations glitch. They accidentally deployed a test trading partner for their real trading app to the real world.

So when I accidentally enter the wrong rate into the spread sheet everyone calls me an idiot and tells me I should leave the programming to people who know what they are doing.

But when a programmer I accidentally promotes test code to production, it's just an accident.

That is exactly the point the OP that I replied to was trying to make.

When someone else fucks they are stupid and they don’t know what they are doing, but when the coders fuck up it's always someone else that is at fault. Please give me a break!

Given that most spreadsheets only have one enrivonment (PROD), this class of error isn't likely. However, as the grandparent says, this means testing is equally unlikely.

> They accidentally deployed a test trading partner for their real trading app to the real world.

Do you have a citation for this? I saw a lot of speculation at the time but nothing that was ever corroborated.

I just found the Nanex analysis (http://www.nanex.net/aqck2/3525.html ) very convincing.

Yes, so excellent that it lets me add a date to USD and format it as a percentage, as I copy and paste that one formula into every row of data.

There is also the occasional - ctrl c, alt es v. And then realize you copied date values along with your data.

Excel is great, until that moment that you do something in haste over a large set of heterogenous data and cells.

And if you are doing finance, a world of pain awaits you.

edit: One other gripe about it, is if you use a macro, you can never use the easy ctrl-z out of whatever mistake you've just made.

Backups for a new analyst are few and far between. Backups for me a short while into my job could happen hourly.

Well, I didn't say it was strongly typed :-) But you make a good point.

Excel is fantastic if you picture it as the World's Best Visual REPL. There's no doubt that it's overused, but it does occupy a very particular sweet spot. In fact, I think one of the best use cases for something like Light Table is to act as a spreadsheet killer.

We software guys complain about this all the time, but Excel completely permeates the corporate world because it:

- is universally available (who in an office building DOESN'T have an Office license?)

- can be "Programmable" to the extent that it needs to be. You don't have to start with code. Formulas and conditionals are great for most things. People usually ease into Macros gently.

- produces a format that is sharable. People "fork" Excel spreadsheets all the time. It's not pretty but it works.

- gets the job done. You want data entry with some calculations and maybe a few if-then rules here and there? What's better than Excel?

In a corporate environment, often the best way to get things done is to circumvent the official software and just write something that works. When we do it, we call it an elegant hack, but when guys in suits who went clubbing last night do it, we call it a terrible, amateur travesty that should be replaced by PROPER code as soon as possible.

And you know what, eventually that happens. Very rarely, an incredibly useful Excel spreadsheet will be replaced by an even more useful (and reliable) piece of custom software that also adds tons of value to both the users and the organization. But I've worked in corporate consulting for years; can you guess how often this happens? I'd wager it's less than 40%.

No, what typically happens is that an analyst or software dev notices someone's cool spreadsheet and says "hey, I can make something that does this job, but it'll be a LOT faster and I'll put the data up in the cloud and multiple people can access it at once and..."

And that sounds great, so they get a little budget and a project is born. Most of us who have been there and done that know what happens next: higher-level stakeholders get involved, broader objectives get defined, more team members are brought on, timetables are established, results are "metricized", paradigms are going to be shifted, etc.

Rarely does a piece of software escape from this process that is as genuinely useful as the spreadsheet which spawns it. Often, rather, it gets delivered 6 months late. It crashes all the time. What used to be one simple input field is now a whole page with checkboxes you have to check and sub-objects you have to define. The end result might look a little prettier but that cool Infragistics graph is locked inside the program and can't be shared like the old Excel report because no one hooked up the "export feature". People are getting upset. Everyone hates this program. But we have to use it, it's mandated by corporate.

Meanwhile, a talented new guy comes on the team and notices what a bloated piece of crap this software is. He wonders why no one has written a little Excel sheet to get around it and REALLY get some work done...

I know I'm being cynical. And look, I GET that rogue spreadsheets can turn into productivity-damaging unseen business risks. But until the corporate "software project" culture understands why it happens and why people are often far happier with their clunky spreadsheet than with your shiny WPF app or web page, I don't think this problem is going to go away.

I've developed an ERP that replaced a set of buggy Excel sheets and nowadays processes tens of millions of euros every year. Employees of the customer organization mostly like it, despite that it has it's problems like any organically grown piece of software has.

However, the problem with custom-made ERP software (or how they are typically developed) is that adapting them to new processes and ideas is typically slower and requires "real" programmers, where as systems like Excel that empower people without programming experience to model their processes, is much more dynamic and adaptable to needs of the business.

This is something that I'm currently working on.

That's not a bad thing in high-value areas, though. This is the exact problem that the article points out - people without the skills and expertise to design and build robust, fail-obvious software are moving around sums of money that can be large enough to crash the global economy.

The problem isn't Excel - it's lack of standards and precision in financial and business software. If you build a new ERP that replaces Excel, but lets users modify the algorithms, then you will have the exact same issue as you'd have with Excel. The slower adaptation to new processes and ideas is a good thing as it means that these new processes are going to be specced correctly and hopefully tested to some degree. It's not okay for a financial institution to save a bit of time and money on their software and put client's life savings and public tax money at risk.

Of course, this is all very idealistic, and I personally have no doubt that Excel spreadsheet hacks are here to stay. Idealism will always fall down if a few bucks can be made.

"The slower adaptation to new processes and ideas is a good thing as it means that these new processes are going to be specced correctly and hopefully tested to some degree"

I disagree. Most businesses need more agility and experimentation, not less. There is a huge spectrum of businesses. It would be ridiculous to say to a startup that you need to experiment less and put safeguards to a place, so that you don't make huge losses of revenue. There are lot of traditional businesses in which downside can be guarded easily (retail, many sales organizations) but finding better ways to run the machine can give significant edge.

Most ERPs and Excel-sheet apps don't model automatic stock trading, and don't move billions of dollars without safeguards in place. Rather, they model and assist in existing human processes and the sad fact is that the ERP software is the stumbling point for rapid experimentation.

That is why Excel is so popular.

I don't see any reason why you can't have the agility of Excel with a few of the nice tools that modern IDEs have to make it easier to debug or secure a spreadsheet "app". Excel actually has a lot of features for preventing issues such as formula debugging, input validation and named ranges that make formulas more readable, but the UI is terrible and it's not very discoverable. The problem with Excel seems to be primarily that the MS desktop monopoly made it very hard for people to market better UIs for the "quick financial model" use case. The only way around that was either in B2B sales (where the economics forced you into building big ERP systems where management locked down the processes) or over the web, where we've had to wait for browser technology to reach a sufficient level of power before such a system could be built. I think since IE9, we're now at that point. It wouldn't surprise me if we now start to see a multitude of apps chip away at Excel's dominance in each of its use cases (like trello is doing for the 'lists of stuff' use case).

We (Fivetran) are doing exactly this, bridging the gap between spreadsheets and coding and making real algorithms accessible to non-programmers. The primary thing that makes spreadsheets more approachable is the live-updating, and this feature can be separated from the grid-of-cells model with a cleverly designed language. We'll be starting a private beta in the next few weeks, stay tuned.

This sounds interesting. Would be interested in the user interface of how this turns out.

"they model and assist in existing human processes"

Isn't that the point though - the "RP" in "ERP" is "Resource Planning" so they really are mainly focused on process oriented manufacturing or services.

If you don't need the "RP" bit of an ERP then aren't you really just left with a financials package?

I have worked with some immensely powerful and complicated spreadsheets with entire worksheets nearly full of formulas and 10's of thousands of lines of VBA code. This includes spreadsheet driven systems trading 50-100 million dollars daily as black box systems.

Nearly always, these started as "prototype" systems that became so useful, so quickly that they end up as "production" systems.

This is the power of Excel. But...

When spreadsheets get this large and complicated they are incredibly fragile and become hugely difficult to maintain and bug fix. It is just too easy to make an accidental change and not know your spreadsheet is broken. The VaR example given in the OP is just typical. Sure, you can protect sheets but as a developer/maintainer you need access and then you can very easily break things, especially because version control with spreadsheets can be next to impossible.

The moment you start taking your spreadsheets seriously, as a business process, is the moment you need to consider very seriously recoding them as a bespoke system using conventional software development.

Depends. At my old job, they had a ton of spreadsheets in accounting. Most of them required manual data entry of stuff that we had in the database. Most spreadsheets took 1 ~ 2 days of manual data entry by a single person... per month! Even more when it was time to close a year or a quarter.

Needless to say, when upper management turned over a bit, there was a big push to get rid of all of this wasted time. (I know of at least one person from Accounting that quit because of the mandatory evenings and weekends just to get necessary things done)

Not to mention the addition of complex rules with little to no documentation. Sometimes these spreadsheets are created by someone who knows Excel macros, then they move on, leaving others that don't know anything about it to run it. Maybe new requirements come down, and they poke around it at it to make it seem like things are ok, but they miss an edge case... etc. Not that this doesn't happen in the programming world too, but in the programming world we try to tackle these issues as part of honing our craft. In (e.g.) the accounting world, it's just a necessary evil to get done and over with.

Strange... Excel talks to everything, couldn't they use ODBC, or at least NSLOOKUP from an imported report? Rarely a reason to manually re-enter, even with Excel.

> Rarely a reason to manually re-enter, even with Excel.

I wish it was so. People using excel can get very — ahem — creative, of which I've encountered many a spreadsheet designed by some minion of hell, including (but not limited to):

- a GIS implemented with cells, one cell per bitmap image block, and excel vector drawing features, with atrocious macros doing things

- an insurance broker contract and customer database, whose records were separated by fuzzy formatting (like cell border colors and width), full of varying labels (typos and inconsistencies), and without specific cell placement for data, notably would-be primary keys.

Every single one of them should be locked down in a digital safe, guarded night and day, only to serve as last-stand honeypot tactical weaponry on sufficiently smart cyberwar adversaries, who, once spoiled by the guarding words of "Abandon all sanity, ye who enter here", would quickly have their mind cower in fear back into reptilian neurologic territory[0]. The Snow Crash noise was probably one of those file's raw data.

[0]: http://en.wikipedia.org/wiki/The_Funniest_Joke_in_the_World

I wish I could upvote this more times.

Except the people managing these spreadsheets where not programmers and had probably never heard of ODBC. People outside the process generally suggest a rewrite over trying to maintain Excel spreadsheets which is generally a bad idea.

1. Accounting people were non-programmers, so they probably didn't know what ODBC was/is.

2. Rewriting them found errors in the Excel spreadsheets in some cases.

3. They weren't a Microsoft shop, so most/all of the experience was with an Open Source stack.

4. Most of the functionality was implemented as a series of views in the database, with a layer of reports on top of it.

5. It was easier to just pull them into our existing system as additional reports, and just have an option to export the report to Excel (which was already baked in functionality).

You would need to add a new user account for the database from the dba who is in another department. Something like that would require communicating with your manager who, as retric says, is non technical and probably won't buy in.

An opportunity presents itself to plug a product I helped write version 0.1 of: http://www.glbsoft.com

If your users spend all their time in Excel, why not connect it to the enterprise back end?

Excel is like dynamically typed programming languages.

It gives fast positive feedback.

It gives very slow, often quite spectacular, negative feedback.

Are you sure that you understand the meaning of positive and negative feedback? Positive feedback leads to runaway (it adds); negative feedback is self limiting (it subtracts).

Feedback in the sense of psychology, not systems theory.

I totally agree. I work for a finance team in a top tech company. As the place is full of engineers, all the time someone tries to replace Excel tool with web based dashboards. The result is that end users are complaining, that they lost all the flexibility and can't easily get the data they want.

Making nice charts and dashboards is not enough. People need the ability to process the data.

My company created this: http://www.synapseinformation.com - and is currently used by a leading UK High Street Bank (case study at the site) - it solves the multi-user data sharing problem for Excel - as well as providing automated data integrity in Excel Spreadsheets - we are looking for more users for this - comments/feedback/questions about it would be welcome ...

Excel and similar tools are never going to go away. It's just got too much utility for that. The best thing we can do is provide spreadsheet users with the same quality of tools as we ourselves are used to.

I know a startup in the Netherlands working on just that, and it's a golden business opportunity with a large untapped market!




I always produce Excel outputs from my applications and it works great, stops people from pestering me for stuff like different sorting options.

The problem comes when people start to view their database as the "point of truth" for some data and then want to suck their data back into the database because the sheet is "more upto date".

This of course is a hell of a problem when different people have different versions of said spreadsheet and want to somehow merge them all back together.

One advantage of macro laden spreadsheets though is that they are often much better for gathering actual requirements than some word document put together by committee.

I use Excel to bulk edit TFS (don't ask) work items. It works by having a plugin that lets you refresh and publish your changes back to the data source. Something like this for regular data would be ideal.

For those who might ask, it's actually useful when you have a series of work items that you want to rapidly change, like the status of a project, deadline, etc. It's a lot less time consuming to add a new row to a sheet rather than having to go through a series of menus and windows to get to where you need.

In a corporate environment, often the best way to get things done is to circumvent the official software and just write something that works. When we do it, we call it an elegant hack, but when guys in suits who went clubbing last night do it, we call it a terrible, amateur travesty that should be replaced by PROPER code as soon as possible....Meanwhile, a talented new guy comes on the team and notices what a bloated piece of crap this software is. He wonders why no one has written a little Excel sheet to get around it and REALLY get some work done...

Imagine Excel with a complete audit trail of all keystrokes and spreadsheet interactions. Quite a few problems would be solved.

Many firms have absolutely gargantuan spreadsheets and workbooks.

I suspect that individual files would become humongous with those features added.

If you store the commands from the beginning, you can recreate the state at any point in time.

Space gets cheaper every day.

The idea does have merit. On the average case it should be able to work.

A few things may be an issue with implementation, one of which would be excel macros. As it stands excel undo cannot regress past the last use of a macro. I suspect a state list may encounter a similar roadblock.

As it stands excel can be very very slow on large sets of data. On a sufficiently complex and clean model, I've seen load times and computer slow downs which begin to make Tokyo traffic seem mild, if it doesn't just hang.

This would make it easier to blame the person who "made the mistake" of mistyping something in a spreadsheet that is error-prone by nature and used in a process without QA. Exactly what problem would that solve?

It's a foot in the door toward QA and detection of common mistakes.

Google docs spreadsheets _almost_ have that - it keeps a history but its hard to do a diff with previous versions

It doesn't keep nearly enough history for even basic usage.

You have to click the "Show more detailed revisions" button.

Still does not prevent revision pruning. For personal use, this is very rarely a problem. For anybody with Google Apps for Education or Business, it's a pretty frequent issue with (AFAIK) no real or good solution:


As I understand it, Hypernumbers (the cloud spreadsheet) offers this.

Regarding the programmable part, have you tried the new JavaScript based office applications in office 2013? I haven't but heard good things about it

The thing that leaps out at me from this thread is how high-status spreadsheets are socially—many spreadsheet users have high social/business status; and how high-status they are economically—huge value flows through Excel; yet how low-status they are technically—many programmers hate them, and they have received surprisingly little serious research attention. There is an enormous market inefficiency in that gap. The question is how to exploit it.

There is an enormous market inefficiency in that gap. The question is how to exploit it.

Finance guy here (hobbist programmer). The more I think about this, the more I realize the way to exploit this is through well designed and _specialized_ software which doesn't have to evolve around spreadsheet at all. You have to look at the work flow of the professionals using (abusin) Excel.

I give you an example: audit. Now, the final product of an audit review is what? An MS Word report, that's what they sell the client. How efficient is this? Barely efficient, consider that:

1. There is no version control. 2. Tables and pictures look horrible and are a pain in the ass to format (oh latex) 3. The doc is edited by various people: people paste in the report the wrong stuff --> client gets crappy drafted doc --> bad image 4. The document "links" to different workpapers (Excel sheets with the calculations) that you have to go and pick up, you can't click and get the doc like it was 2013, you need to search it like in the 80's 5. The senior has to review it all, guess how easy it is to do this navigating tons of MS docs and spreadsheets

Then you get to the spreadsheets... it's another mess:

1. No audit trail 2. No workflow of how you go from the original data to the calculation. This is very important, you need to tie to the original source of data ALWAYS 3. You use crappy templates 10 years old with things nobody knows anymore 4. You can't read well .txt files and you can't process big amounts of data (what if you want to test for frauds in petty cash in a 20 billion multinational? For example)

I mean, there are tons of inefficiencies, the points are 2:

1. How you sell your shiny new perfect product to people used to abuse data in this way? 2. How do you build such a product from a programmer background? And here I have my rant about ideas not being useful, they are if they come with lots of market insight focus.

And that's about it. I think, coming to your point: yes, there are inefficiencies, no, you will not get there if you don't have insight into the specific profession you are trying to provide tools to.

Excel and Word do have a lot of these "paper trail" functionalities you talk about; in fact, Excel has amazing database support thanks to ODBC as well -- you can easily separate your business logic from your data by using Access and Excel together.

So how come users still manually set the font size for their headings instead of using styles; how come they ignore Track Changes; why do they copy spreadsheets instead of enable the Share Worksheets functionality -- and so on -- well, it's simple: they don't know about them. Most Excel/Word users are self taught or are given a very rudimentary training course by their peers or one of those cash-generating "Certificate Farms" you pay out the nose for.

I definitely agree that to solve the Excel woes you're going to have to write domain-specific software; Excel's already the Swiss army knife.

Working in the aerospace industry for a number of years, I've seen and created all types of crazy spreadsheets to design aircraft and ensure they are safe to fly.

The good thing about the engineering industry though is that all those spreadsheets get checked by someone more senior than you.

The problem is though, thanks to the increased usage of finite element analysis and more detailed models, the amount of data being pushed around in spreadsheets has grown exponentially.

Far beyond the ability of someone to reasonably check it all.

My wife was analyst in a company that was took over by international corporation. It was hell of two mismatched systems (SAP and internal oracle based system).

They "solved" it so: each morning one person imported the data from both systems into huge excel spreadsheet (file had almost 100 MB), then she send the file to person that know how to correct all the duplicated and missing data in a few columns (it wasn't turing-complete process, it was heuristic, cause systems used different subdivisions and codes for same things, some data required to be sure how to integrate them was missing, and some data was changing without reflection in the systems - like the ingredients used to make one unit of product could change every day, and NOBODY KEPT RECORD of when and how it changed, only some people in production knew how they make the product today).

Then that person send the file to the next person, that knew how to fill the next few details. And so on, close to 15:00 file was supposed to be ready to send to the management. Sometimes it took till 20:00 (hours after 16:00 wasn't paid). When somebody important was on leave, nobody knew how to fill the details.

That spreadsheets had almost no formulas in them, cause people "cut and pasted as values" all data after calculating it (my wife was told it's to "protect the data from accidential changes"). I think it was to prevent people from seeing the errors.

Everybody knew that the data is all wrong. Nobody knew how to even get the correct input for most fields, it was all fuzzy (for example - one person in production knew which labels are bought in 100s, 500s, 1000s, etc. He was sent the file to fill that details, nobody could check that, and there was always some contradictions in the last pages of the spreadsheet. Most of the work went into massaging it untill it seemed OK. The file also crashed half the time, even on computers with 4GB of ram and 64bit windows.

I wonder, how they managed to stay in business (they are still around, and have quite big marketshare).

They stay in business because their competitors' databases are in worse shape. Or because the data wasn't as important as they thought it was.

Verizon's billing system is the same way. Is it bad? Yes. Is it worse than AT&T's? Probably not. Are people going to quit Verizon because of billing errors? Well, I did, but most people wouldn't.

As far as I am aware, the business world has been using Excel for a lot of things for quite some time now, why is it suddenly something we should all be worried about? Excel in capable hands is a strong and multi-purpose software application that is proven to work. It's not merely just a spreadsheet application, Excel allows spreadsheets to be shared around and viewed by people who don't even own Excel.

Should we be worried about driving cars and flying in planes because they're so complicated internally errors are easily hidden? A bad mechanic will always blame his tools. There is nothing wrong with Excel and just because errors are easily hidden doesn't make it a bad program to use. I'm sure error rates are pretty low anyway.

Compare the amount if quality control that goes into your car vs what goes into your sprewdshr.

Posted yesterday, although it attracted no comment: http://news.ycombinator.com/item?id=5193999

And yours didn't have the # appended at the end of the URL.

Is that a new elaborated form of Karma whoring?

"Mine" ?? I didn't submit one. I'm just pointing out that this is a duplicate. Sometimes people wonder why they don't get any comments, and it might be useful to know that many people may have read it recently.

Excel has far worse problems in the financial sector than this. Among other things Excel's thread-safety is flakey, which when you're importing a realtime price feed into excel turns out to be fairly important.

Most of the time your spreadsheet even if it has perfect logic will fine, until some weird race condition gets hit the price being loaded gets transposed and you model gives you a wildly wrong answer which you then trade on.

Dont blame Excel. Those people were running a scam. And from my experience with people who run banks, its actually not even a little bit out of the ordinary to have completely bullshit calculations spitting out numbers that look good.

The problem is that economics is not a science, and finances are designed to enable power plays. We need to apply real science and technology to problems rather than letting a bunch of suits run around scamming everyone.

there’s no easy way to test spreadsheets

Sure there is... you set up a spreadsheet with a known test data set, and make sure your formulas compute the expected results. Just like testing any other software.

Correctness is not enough. (http://arxiv.org/abs/0808.2045)

> The usual aim of spreadsheet audit is to verify correctness. There are two problems with this: first, it is often difficult to tell whether the spreadsheets in question are correct, and second, even if they are, they may still give the wrong results.

You're assuming that people know what model to use; use the correct model; implement it correctly; and use the correct inputs for that model.

Since most of these ad-hoc spreadsheets don't have any kind of specification, let alone a formal spec, it's really hard to know what they're supposed to do, and that makes it impossible to know if they're actually doing it.

And that's just one spreadsheet. Many spreadsheets are linked - you take data from some other unauditable spreadsheet for an input.

After all of that you still have users manipulating the spreadsheet. We know that users will make errors. We can minimise the errors using excellent design (excellent design isn't always used for expensive customer facing planned websites; what hope is there for semi-secret ad-hoc internal quick and dirty spreadsheets?) but even with clear labels we know that users will make errors, and that some of those errors will trickle through to future versions of this spreadsheet, and maybe other spreadsheets that depend on this one.

If you can automate (or partly automate) spreadsheet testing there's probably money to be made.

How many test cases are you able to test this way? How do you measure your test coverage?

Make a spreadsheet to count how many formulas are tested

...then it's spreadsheets all the way down.

Spreadsheet verification and type systems have been studied extensively by the rather terrifyingly productive Martin Erwig. See http://web.engr.oregonstate.edu/~erwig/papers/abstracts.html, and scroll down the page.

"Software Engineering for Spreadsheets" "Automatic Detection of Dimension Errors in Spreadsheets" "Type Inference for Spreadsheets"


I used to build structured credit risk models at JPMorgan and other banks. A few observations:

- Excel models are usually built on a pre-existing template, probably created by someone else. Any errors/hidden assumptions carries through. This is he biggest source of errors.

- Any large model usually has a tab devoted to reference tables and assumptions based on external sources, a la dependency injection. Usually you have to use static numbers since you need to email the model around (you don't want your boss seeing a bunch of #NAs due to missing links. Keeping this data updated is another huge source of errors.

- A sanity check is your best defense. I'm positive that every CDO model I've ever built has contained at least one error, just like every large software application has at least one bug. The difference is that models have fewer output parameters, which allows you spot incongruities more easily.

- People who use big Excel models are not software engineers and will never be. In my opinion, solutions that implement concepts like testing, diffs, commits, etc are a waste of time, because it creates extra work and learning curve.

A lot of work goes into engineering native code Excel add-ins that supply realtime market data in complex ways to spreadsheets. The Bloomberg Excel add-in is almost an entire application unto itself. On top of that, there is an entire team of people whose only job is to help customers develop custom spreadsheets using the data. On top of that, even more tight integration is available, allowing live charts and other components to be embedded into the spreadsheets. There are many who develop really complex "apps" in Excel and would not consider themselves programmers.

How do multiple people use such spreadsheets? Are they opened from a share drive?

As far as the Bloomberg Excel plug-in goes, as long as you open the spreadsheet on a desktop with the Terminal installed, it will be displaying realtime data in the context of the logged in user (delayed/realtime, exchange permissions, etc.). If the spreadsheet is opened on a PC without the plug-in, it will just be missing any data that is supposed to be supplied from the plug-in. So users share spreadsheets as they would any other Excel spreadsheet.

As far as the origin of spreadsheets that Bloomberg helps create, they usually fall into one of two categories:

- There is a gallery of great complex examples that have already been created and designed (probably initially for one client / company) that are available to launch from within the Bloomberg Terminal w/ screenshots, descriptions, etc. They can be searched and when a user finds one they like, it is downloaded and opened in Excel, allowing them to further customize it.

- The user contacts the Analytics desk (chat/phone) and is either trying to integrate live market data into an existing spreadsheet or needs to build something complicated from scratch and does not know how to go about doing it. The team specializing in Excel spreadsheet builds takes over and works with the person to build what they need. In many ways, it's like a small consulting gig -- a service provided to Terminal subscribers.

Once the user has the spreadsheet via either method, they are free to share it internally like any other spreadsheet.

I personally am not fan of Excel, even though you can do a lot in it, I prefer other programming languages. But I can see its huge popularity among the "cool" office workers. Learning the functions in Excel its pretty easy and powerful at the same time.

Take my uncle for example. He is mediocre office worker, selling lawn-mowers for living. He took the time and learned some advance excel functions and created few cool spreadsheets for his office, and everyone loves it. His reputation grown a lot with bosses and e is known as the guy with solutions.

At one time, the Excel team at Microsoft were considered some of the sharpest programmers at Microsoft.

still are.

This is why I strongly believe that regular people should not learn how to code, despite all the "you should learn how to code" articles.

Excel is a really good example of how easy it is to get up and running, and how easy it is to get yourself into a lot of trouble. You might not need to program per se, but you do need to know how to DEBUG and TEST your programs as well. This is something that most people are loathe to do. It requires a great deal of discipline, and even though in finance you're supposed to test spreadsheets during internal auditing, most auditors aren't trained in proper testing/coding, so it ends up being ad hoc. In fact, it's scary how much intermediate financial statements are rubber-stamped/robo-signed by people in the finance, usually because they are pressed for time.

I generally agree with you, I really do (I code, myself, professionally, or, well, have until very recently) – but I think that your wish that everyone learn to code is missing a few things.

First, what people are doing in Excel is coding.

Second, they do it because Excel is, thus far, the best environment available for lowering the (many) barriers of entry that are perceived as blocking one from learning to "program".

Therefore, in my never to be humble opinion, we'd also be wise to try to figure out a way of helping build in testing methodologies into excel. I think it wouldn't have to be that hard. Imagine a testing mode or a test overlay that you could toggle on and off, allowing you to identify input and output cells, setting test data and expected output, etc.

The crucial thing is that the best work on this front would have to come from Microsoft, if it has to be in Excel. One possible fix would be a testing environment that reads .xlsx files – build something great, sell it to managers and MSFT will either buy it or, more likely, imitate it.

> You might not need to program per se, but you do need to know how to DEBUG and TEST your programs as well. This is something that most people are loathe to do. It requires a great deal of discipline

It requires first and foremost the realizations that bugs in code are frequent, normal and to be expected, no matter how smart and well-prepared you are.

And that's one thing you learn when you learn to code. The first programmers were quite shocked when they found out.

What do you mean exactly by "regular people"?

Thank god that "regular people" are smart enough to not care what you think.

Spreadsheet error commission and detection rates are almost identical to those in programming, mathematics, and writing. And professionals make about the same number of errors as novices. What is different in spreadsheet development is the amount of testing. At companies like Microsoft, 40% of development time is spent in testing, and their usually is one tester for each Dev. In spreadsheet development, professional testing is rare. Here is a link on human and spreadsheet error. https://dl.dropbox.com/u/20338447/WhyThinkingIsBadPublicCopy...

From the interesting submitted article, "After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR . . ."

This astounding story is a really nasty example of why "always comment your code" is a good idea--but only if someone else with knowledge of business logic reviews the code.

Good luck commenting your Excel formulas...

Right-click on the cell and select Insert Comment.

But this is really the nub of the problem. By default (and most of the time), Excel hides the formulae and comments - requiring clicking into cells to figure out what's going on. By only exposing a glimpse of the whole model at a time, it's very difficult to get an overview of how the model interacts.

I don't understand what you've said here.

Excel and Bloomberg are the only reasons our FinTech shop has a few Windows boxes in our office--the rest are Linux. If MS Office becomes available on Linux, as rumored, we will have even fewer.

Quite simply, the financial world runs on Excel. For this reason we make sure all our APIs work just as well when being called from VBA as they do from Python, JavaScript, Perl, etc. It's Excel's world, we just live in it.

Many years ago I did a short contract at a FX trading bank and my job was to create a trading report.

While I was there I found out why the report was needed.

The bank had managed to misplace a couple of million dollars in a missing trade.

They did an audit, trying and find out what happened to the missing money and came to the conclusion someone had accidentally deleted a row from the spread sheet being used to record the trades.

Very interesting article, but I belive one of the main assumpion is "Excel makes it easy to mess it up" because it's so easy to use that most non-technical people can use it, in a non-professional, careless way.

My question is : would the error in the model have been caught if the model would have been written in a more technical mean, say an R script ?

Excel does not give any feedback if the "error" is conceptual, but so it happens with more technical programming languages and environments. Excel puts out "errors" if you write a wrong formula, the equivalent of a syntax error for a programming language.

On the other hand, one would think the "on your face" presentation and ease of use makes it easy for other (non technical, but skilled in finance/math) people to review the numbers and calculations.

I agree with the author that I the fault lies in too many people trusting blindly the model, without reviewing the numbers/calculations, but I'm not sure Excel can be pointed out as the culprit.

I wanted to say "a language with units of measure wouldn't have allowed this", but in this case it would have... I guess sometimes even the most restricted tools can't save you :)

Have you ever worked outside of a start up with engineers at a fortune 500 company? Excel is crack for engineers.

Excel is an easy-to-use functional programming language.

Reminds of this paper by Simon Peyton-Jones at al.:

Improving the world's most popular functional language: user-defined functions in Excel


That was kind of a PR gimmick. Excel is declarative, but misses a lot of functional. For example, first class functions.

Can you imagine a spreadsheet with first-class functions, and possibly other functional features? What would such a thing look like?

Strictly speaking, Excel doesn't have functions at all, let alone first-class, since you can't define a new function using only cells and formulas. You have to go to VBA.

Like this: http://xllfunctional.codeplex.com/. Not easy to use at all, but it is possible. Barely.


I think thirsteh was talking about the big models that operate at the spreadsheet layer, without VBA. VBA is imperative, but when you chain worksheet functions together, you have a more explicit, more visible way of manipulating data the way you might with a functional programming language.

Indeed. Some people have even used e.g. Haskell to provide more "real" logic, but with Excel as the user interface:



These are cool! Thank you!

The popularity of Excel speaks to what most common enterprise software lacks- flexibility and programmability. It's hard to construct what-if scenarios in most software.

In some ways, what has been the evolution of Excel in our company has been to switch to Google Spreadsheets, where you get the audit history, sharing, and version control solved, you can build web forms in less than five minutes to have people submit data, you get a decent amount of programability and web service API.

A second thought that occurs to me is why it is so important for business software to be flexible, is that it is important for the business to be flexible. If there is no way for a trader or an analyst to create a new model, or a new process, the current models and processes ossify. Once you go to SAP, your process starts to flow like sap.

I think a big problem is reviewing, debugging and editing spreadsheets since they mostly refer to random cell-addresses instead of well thought through variable names and formulas can be hidden anywhere in the massive grid of cells. Which error would you spot by simply skimming through the code: ´E64 / B2´, or ´rateDifference / rateAverage´.

Most of the time you wouldn't even see the code, just a bunch of numbers on top. Being able to switch back and forth between the code and the resulting data would be a huge help. I like what Bret Victor said in his learnable programming series where he tries to build on this principle http://worrydream.com/#!/LearnableProgramming (Great read btw if you haven't read it already.)

    > One of the all-time most popular programming models is the spreadsheet. A
    > spreadsheet is the dual of a conventional programming language -- a language
    > shows all the code, but hides the data. A spreadsheet shows all the data, but
    > hides the code. Some people believe that spreadsheets are popular because of
    > their two-dimensional grid, but that's a minor factor. Spreadsheets rule because
    > they show the data.

I'm not a huge TDD person, but do appreciate the value of tests around software. This may be a dumb question, but are there any testing tools for Excel? I don't think people would write tests before adding/modifying data and such, but... are there any testing tools that allow you to ensure that your macros and data do what you're expecting, even after changes?

I do not by any means write enough tests for my code, but I have some projects where I've got tests around parts of code, and occasionally I catch regressions when I make changes because of a run of the tests (if I had more tests I'd probably catch more over time).

I've never seen anyone writing custom functionality in Excel that even understood the idea of tests or test/sample data. That is probably a bigger issue with Excel - it encourages people to just throw stuff together and 'start hacking' - with live data, often the only copy of said data - and not understand the potential consequences. Of course, you've usually got a backup of yesterday's data because it's sitting in your inbox, because often Excel files are just emailed around to people, which perhaps encourages yet another sense of 'security' around the process.

It sounds to me like the problems here are very much like the problems with SW development in the early 1990s. There is a social problem: lack of an engineering professionalism and lack of interest by management in fostering such professionalism AND There is a technical problem: lack of the equivalent of a source code control system.

One way to at least start to approach the problem would be to put the equivalent of a SCCS into Excel. You'd be expected to checkin and checkout spreadsheets, changes would be tracked, you could diff against older versions, etc.

The analogy is not perfect, not least because many spreadsheets consist of a logic part, written by one person, and a data part, filled in by another. Even so, forcing a checkin/checkout model would provide an audit trail and, as has been said, storage is cheap.

Perhaps the second easy modification that could be made would be to make it vastly more visually obvious which cells are locked, and which cells are logic. It should be possible to look at a spreadsheet and just see the logic portion separate from the data portion. Imposing this by visualization would at least make it less likely that spreadsheets get checked into the SCCS without appropriate locking.

I could add more, but in all cases it seems to me the idea is: look at the problems we programmers had in the 90s, and how we solved them, and do likewise. I've suggested SCCS and syntax coloring, but one could go to talk about, for example, code commenting, or making it natural for people to name cells (make that the thing that's easy to do, and make the cell names easily visible) so that faffing around with B$3 becomes a weird thing that normal people would not bother with --- just like in coding I don't tell the compiler to use register EAX, I just use a variable name.

Have moved on to other software pastures now, but spent many years building large-scale LOB applications whose sole presentation layer was Excel. Seamless federated data sources, powerful multi-dimensional calculations, easy (re)formatting and charting, and very, very fast delivery, usually around 5-10% of that of some "killer app" from a team of developers the client couldn't afford. And around 5-10% training time, too, because say what you want about their idiocy level, users GET Excel, quickly. It is ultra-intuitive, to the extent that any technology/business tool can be.

Those who sanctioned trading on VaR models built on pasted data were obviously not qualified to hold their jobs. Neither are those who over-glorify or over-promote the virtues of "it ain't real unless programmers built it", and I say that as a programmer. I work with other UI tools now, but I have yet to work with a better or faster-adopted one. The ubiquity of Excel in the workplace is well earned; the poverty of its many implementations is no different from all the other ubiquitous human poverties.

I had the misfortune of working for a firm where there were lots of actuaries.

Every day the team leader would ask a junior to run his pricing model for a structured product.

The junior would duly press F9 and the macro would commence, with the screen flashing away like a strobe light.

24 hours later the macro would stop and the boss would return to find out the answer, which was usually wrong by an order of magnitude.

How did you know the answer was wrong? If it was obvious to you, why wasn't it obvious to them?

As a DBA/Developer working in the Property/Asset Management Industry for decades I can tell you that Excel will NEVER go away so long as the executives are making the decisions. When it comes to the typical executive and even high level accountant/controller/comptroller, the knowledge level and understanding of things computer related is highly restricted outside of Excel, PowerPoint And Outlook with Excel being the big daddy app among this group.

In my experience the typical executive user would prefer spending hours or even days working on a spreadsheet then spending even a few hours learning how to properly use some other task specific software regardless of whether or not it’s better suited for the task at hand. It’s not so much that the executives don’t understand the issues with using Excel so much as how Excel lets them do what they need and more importantly, want to do and all without having to learn a new system. The Excel model (how it works using rows & columns ) is intuitive to one with an accounting mind. I’ve had executives ask me before why can’t our really expensive <fill in the blank> Enterprise Ready Application Do X when Excel can do the same thing and easily? This is especially true when dealing with what I term JCL, Judgment Call Logic. Unlike programming structures (If/Then, CASE, etc) , JCL is perspective based and results in a different value based on who is assessing it. This is why its “judgment” logic and not natural logic. Excel users don’t understand how you can’t code JCL into software.

That said there is certainly a need in the software world to do things better, smarter. The world of software engineering needs more byte artisans and less bean counters. Too often short term decisions are made with negative long terms effects, all done in order to meet some artificial deadline that has more to do with someone getting a commission for meeting a deadline then for delivering something of quality.

There has been an annual conference in London for quite a number of years all about spreadsheet mistakes and how to avoid disaster. A few years ago, some entrepreneurs from resolversystems.com built a better spreadsheet named resolver-one specifically to address the problems of the financial services industry. Essentially what it does is write code while you build a spreadsheet model. Then, when you recalculate, it runs the code. Once a model is built, it can be passed on to professional software developers to QA the code, document it, refactor it and build proper unit tests. However the spreadsheet model still appears to end users just like a normal spreadsheet model. I think this is the real solution to this problem. Use Excel to play around with numbers and do one-off analyses, but for serious business rely on a tool like Resolver One.

I think the main issue has always been that "The Business" -- that nebulous, supposedly all-knowing corporate entity -- never wanted their employees to rely on a spreadsheets stuck on somebody's shared network drive. The issue has never been "formalisation" -- we already have tools for that, though they could surely do with improvement -- but the ad hoc nature that is, over time, festooned with more and more features or blessed as the Official way of doing something -- and when that happens, it's not going anywhere; it's too late to change.

Resolver One was a great idea, but it has been discontinued:

"But while many people did buy Resolver One, we struggled to turn it into a commercial success. The vast majority of people are happy with either Microsoft Excel, or one of the free alternatives available from Google and the like. Since mid-2011, all of our development focus has been on a new product, PythonAnywhere."


That's the annual conference of the European Spreadsheet Risk Interest Group, always noted for the lively debates between academic researchers and hardened practitioners.


University of Greenwich, London, on the 4th & 5th July 2013

Excel is the best tool non programmers have in order to write code, that's why it wins. This is one of the problems startups should focus on in order to change the world, because making non programmers more able to program really changes the world, as Excel did, but is possible to improve over it of course.

I work on Wall st. As many others have already pointed out, excel is the bread and butter of many in investment banking and finance. The power and flexibility of excel cannot be underestimated. At the same time, with all the new regulations/Dodd Frank blah blah, there is a lot of discussions about adding "control points" around these mega spreadsheets/macros/VBAs that are used by Traders. I personally work with traders who use an Excel based addin to push trades into downstream systems.

Few days ago, I built a very simple VBA utility that queries an oracle database to give me relevant transaction information. They love it and I am the superstar already. Reason: this requirement was discussed with IT teams, sent to priority queue for months. After building this macro which took me 2 hours, traders got what they wanted without going through an IT budget.

Check out the concept of 'enterprise spreadsheet management' as a different way of thinking about the issue of Excel in critical business environments.

This perspective recognises that nothing beats Excel as a source of financial innovation and 'can-do'. The problem is that few want to recognize this essential role. The large software vendors need to demonize the issue to sell licences to their replacement solutions. The result is denial at every level.

Much better to appreciate the core strategic value that Excel offers and not confuse it with the essentially tactical value that an individual spreadsheet offers. (Though a tactical spreadsheet can be valuably tactical for 10 years).

The answer is transparency. Once one buys in to wanting to know what is going on then enterprise spreadsheet management can really help.

This is a very interesting article, but I do find it very one-sided. The problem isn't Excel, it's those who use it from a foundation of ignorance or carelessness. No doubt, human error does creep into every human activity. Even so-called experts fumble. It happens to all of us. Blaming the tool or the tool maker is pure nonsense.

I have used the tools mentioned in this article, Excel and Powerpoint, extensively. And I have used them for engineering and business purposes. And, yes, I have made mistakes that have cost me money. Funny enough, never when used for financial applications but rather for engineering.

Over time we developed a rather complex (probably a nearly a hundred columns and two-hundred rows with data and formulas) Excel workbook used to calculate timing parameters and counter reset values for a complex timing and synchronization circuit inside an FPGA. After one crank of the wheel this spreadsheet would output a table of hex values that would get plugged into Verilog code to setup the circuit. Everything worked as expected, or so we thought.

Then, a few weeks later, as the code was being run through test-benches (unit tests) we discovered that a corner case was broken. Being that it was hardware we immediately thought that the problem was the circuit. The code was checked and re-checked, simulated and re-simulated. After about six months of hard debugging work we discovered that the Excel spreadsheet was the culprit. We had used the ROUND() function in a calculation, when, in reality, we were supposed to use ROUNDUP(). Only one of hundreds of formulas contained this mistake. This cost us six months of engineering time and, therefore, lots of money.

Having suffered through something like that I still could not bring myself to blame Excel for the problem. We made a mistake. No, in this case, I made a mistake. And that was that. Blaming Excel or Microsoft would have been nothing less than moronic.

I have used Excel to design polyphase FIR filters, maintain state machine code, produce complex lookup table initialization values for a variety of languages, prepare data for EDA tools, process sensor data and even generate G-code to machine bolt patterns on a Haas CNC mill.

Oh, yes, also invoices, purchase orders and financial analysis.

It's a great tool. And, if you use VBA you can really make it do some amazing stuff.

Powerpoint is another interesting tool. Of course, the most visible application is to create slides for presentations. However, you can also access its automation layer via VBA. And, yes, Visual Basic isn't the most elegant of programming languages, but, who cares, this is about getting a job done, not about authoring monuments to computer science. Anyhow, the most interesting application I found for this was the creation of a full simulation of the physical control panel for a device. The simulation included buttons and even a pretty good simulation of a dot matrix display. It was used during a training presentation in lieu of having the actual physical device there. The menu code was pulled out of the state machine that ran the menu on the actual device (written in C but easily translated to VB). It was really cool.

What surprises me is that financial folks, as related by the article, don't seem to have a system in place to create "checks and balances" on spreadsheets that might actually be used for very serious financial decisions. That's odd, but I can see it happening. Startup opportunity? Maybe.

A compelling if painful story. You must have some thoughts about what you could have done to track down the spreadsheet problem sooner – what are they? Also, if you could change Excel in any way you chose, what might be ways for it to prevent or detect such problems?

I'd like to ask about a technicality as well, if you don't mind. You said that ROUNDUP was used in hundreds of formulas and ROUND in only one. How many of these formulas were identical except for relative cell references, as opposed to doing logically distinct things? By "identical except for relative cell references" I mean the kind of formulas you get by selecting a cell and telling Excel to copy its formula across a range -- Excel sees these as distinct formulas, but really they're computing the same thing on different inputs; in APL it would be a single expression. I'm wondering if the bad ROUND guy was identical to some other formula in the sheet, or whether it was a fundamentally new formula that someone typed in and forgot to use ROUNDUP. Really, I guess I'm just wondering how the bad formula happened to get entered.

The reason for that last question is that I'm interested in what a spreadsheet would be like that recognized common formulas across a range and treated them as a single data-parallel computation, rather than a bunch of separate cells that happen to have relatively identical formulas. Clearly this would have performance advantages (parallelism and data locality). But it would also seem to have a better chance at preventing errors, since it would treat "computation X across an array of N cells" as a single thing, rather than N relative copies of X that might get altered separately. Curious to hear your thoughts on that general idea too.

Not many people on HN mention APL. I programmed in APL professionally for almost ten years and know (knew?) the language well.

It's too late for me to go dig-up this old spreadsheet. Maybe I'll do it tomorrow. Excel does warn you when there are discrepancies between adjoining cells. If I remember correctly this was a cell that was copied down and used on a row. Thinking back I think the formula was entered once but replicated Dow a range of rows.

The problem with the APL comparison is that the language can become less elegant if the operations to be applied across the rows of a matrix are somewhat varied. I'm typing on an iPad right now so I can't even try to give you an APL example. Maybe tomorrow from my PC.

(I remembered your APL experience from some previous HN comment or other.) You're right that spreadsheets tend to be more heterogeneous than a chain of array operators. I'm not suggesting that APL as a language should be integrated into spreadsheets, but rather that there is a way to implement spreadsheet language (formulas and cells) that partially approaches how APL treats computation. This is possible because there is a lot of implicitly array-oriented computation in typical spreadsheets – that's what you're "declaring" when you drag a formula across a range of cells. What would a spreadsheet system be like if it really exploited this? The trick is to do it without breaking too much with the well-established mental model of Excel users.

Excel does have array formulas. Every time I used them I have to look up usage. From memory they work well for matrix transforms and such things. Again, my memory fails me because It's probably been five or more years since I've touched them.

I have a busy day today so I have to stay off HN. I'll look into some of the items requiring deeper answers later tonight or tomorrow.

"Excel sees these as distinct formulas, but really they're computing the same thing on different inputs; in APL it would be a single expression."

Excel notices when there is one formula that breaks a pattern, and puts a warning sign on that call (my Office 2003 version of Excel does). Now it presumably misses some cases, but the standard 'drag a formula, change one of them' case, Excel saves you from.

They always have checks and balances, and you are quite right in wondering about it.

Take the usual life cycle of a young financial analyst in one of these firms (not even doing quant stuff)

He makes model, it is sent to his superior who checks it and tells him the out put is wrong, the PE ratios he has calculated for the firm are through the roof, at something liek 18x while the rest of the industry is always at around 3x. (numbers are fictional).

Check your numbers.

Further, the lowly analyst has also made a million errors in even reaching this stage - from incorrect formatting, to incorrect formulas, incorrect linking, to the dreaded ref error.

They do this stuff constantly, and the ones who can do it in their sleep and not die of exhaustion to the brutal hours are the ones who get through to the next level.

People have their fingers burnt regularly, and every day. Bad models still manage to slip through, but are more often than not, caught, or handled.

So people in the industry are frightened witless that something like a bad model may escape into being published.

It took several sequences of failure for the whale to amass the positions he did, and it is an uncommon sequence. Which is why its such a failure on the industry's part.

You've hit on exactly why mistakes in excel spreadsheets don't get corrected. Evveryone checks the output, many people check the results of intermediate calculations, but in most cases people don't check the code if those are ok.

Everything goes well until either the data changes enough to make the error in the code matter, or some other code is added which relies on assumptions about how the broken code works, and the sheet starts pumping out garbage.

You decline to blame the tool because you made a mistake.

But the most important thing you need to know about tools is this: tools are not neutral. Every tool promotes working in certain ways, and discourages working in other ways. All programming tools can perform the same calculations; its their effect on the way we work (and the way we interact with the other humans we work with) that changes between them.

Excel IS an amazing tool, but it has many extremely serious faults which have been mentioned on this thread. Some of these can be solved by adopting specific working practices for making spreadsheets, but sometimes this is like trying to nail jelly to the ceiling -- the things that Excel makes easy are often the things that you want to 'ban' people from doing (things like forking a sheet without passing the changes back).

One enormous problem is that it's difficult to track progressive changes to the code (formulas or macros) in a spreadsheet. What you normally save is a binary blob which is impossible to diff.

Another one is that there is no straightforward way to apply the same code to different data. If you copy your sheet, you now have two versions of the code which can diverge. So people add more and more data to the same sheet, and add more and more ways of selecting the inputs you want, whereas a program in a conventional language is separate from its input and can be applied to any input.

A third problem is that while it's easy to call external code from Excel, running Excel as a library and calling it FROM external code is extremely complicated and arcane. (Just look at the number of questions on programming help sites asking how to do straightforward things with VSTO if you don't believe me.) So rather than make nice apps which interact with Excel, people make 'applications' inside Excel which are really just large, complicated spreadsheets with weird dependencies. Since Excel and VBA weren't really designed to do this a lot of these work very badly.

Some of these things can be fixed for the spreadsheet model, although for a mix of technical and commercial reasons they might not be able to be fixed within Microsoft Excel. However to appreciate why they need to be fixed, you have to understand that programmers don't use things like version control because they love bureaucracy and due process. (Most programmers HATE bureaucracy and due process, and almost all of them resented the overhead of version control and code review until they very quickly discovered how huge the payoff is when things go wrong.) Likewise, we don't use sophisticated languages and study theory because we want to 'author monuments to computer science'.

These things have huge, and immediate practical advantages. The programmers' task is to make people understand why they benefit if they do blame their tools, and start seeking better ones.

The problem isn't Excel.

The problem is that those who are developing complex Excel models are not necessarily taking into account the need for solid design, rigorous testing and model validation. The quant generally building an Excel model is generally thinking "how can I get this model built and running as quickly as possible?" Whereas, a software developer who is used to bugs, and the negative impact they cause, might take a more guarded and gradual pace - with the realization of the importance of testing.

Neither Excel nor the quants that use it in a fast, ad-hoc basis are going away. And so I believe the solution involves application of good design, QA and validation to the Excel modeling process.

Models which are at best a charicature of some reality as a sole pretense for actions are the problem and distract from the issue of what sorts of controls are appropriate in finance like position limits. Allowing banks to create their own risk models shows whose pockets regulators are in - the Banks pockets. So we talk about Excel and models and don't notice models are distractions that allow banks to do what they please until a trader blows up, when then they can blame the models instead of a lack of controls they don't want in the first place. Get real.

I have a probably untestable hypothesis that, were you to come up with some measure of units of software output, Excel would be the most-used program in the business world.

Outlook would like to have a word with you...

And Freecell would like to have a word with both of you...

The scary reality is most of Wall Street and The City is built on Excel. It's universal, and non-CS quants can understand it. Of course QA on these sheets is near impossible.

I make a good living replacing Excel for small businesses.


I just finished a 5 week long project revolving around replacing an Excel sheet filled with a bus load of macros. Now the client has a web based solution that can use the excel sheet as an optional data import medium, but not be beholden to it.

What do you replace Excel with?

I build custom apps based heavily on what is described here - http://onsmalltalk.com/sandstonedb-simple-activerecord-style... (more towards the bottom).


How did you attract your first clients doing this?

Entirely word of mouth.

One limitation of the spreadsheet approach to rapid application development is that, in typical use* your code and data tend to be tied together. This creates a testing problem where you tend to confirm results in just one scenerio. I prefer to think of spreadhseets as electronic squared paper.

*while you can structure things to pull data in from other sources, in my experience people evolve a spreadsheet around existing data.

It isn't bad when they push the limits with Excel. That is normal, people tend to push the limits of the technologies they are comfortable with and use often.

What is bad is when they think Excel is a superior platform than some other more suitable technology simply because they have seen what it can do. I can excavate tunnels with spoons, but there is a reason we use TBMs for big projects.

Aren't there any tools to generate documentation from an existing excel mixture of formulae and scripts?

This might be a great opportunity for new product.

Excel is, simply, the most successful object oriented programming interface. Certainly a lot easier to work with than C++ or Java. That's also its curse; any idjit can pick it up, start typing and mousing, and end up with a couple meg of executable that runs just fine and generates extremely dubious output.

I've always wondered why someone hasn't done a Excel->C# or Excel->Java cross-compiler.

Because nobody will pay for it. Seriously though, as many others have pointed out, you need human beings that knows something about software engineering in the process. My clients won't spend time specing out what they want. Often they don't understand what they want in sufficient detail to do that. Creating correct and useful software is not a linear process. If you give a trader a spreadsheet he will look at it and tell you "this is right", "that is wrong", "and, oh yeah, I forgot to tell you I also wanted X". My technique is to use Excel to tease the spec out of the domain expert and keep pushing the business logic into C++. http://xll.codeplex.com provides a great way to do this. Calling C++ from C# and Java is a much simpler problem to solve if that is one of the business requirements.

Hey Protomyth. Great thought. We are building a Java cross-compiler right now. We will be launching in a few weeks. Check it out fivetran.com

The most egregious Excel abuse I have seen was an application that had a "database" embedded in it, i.e. a hidden sheet containing lines of data that was added to, and (linearly!) searched by, macros hanging off buttons in the "app" part of the xls.

This is not the "most egregious Excel abuse", but rather standard opreating procedure for most Excel applications I've seen in my career.

If you doubt the power of Excel, try to do the same thing in Numbers... I'll just wait here.

I think this sort of thing expresses the wrong attitude. It's like saying "I can do anything in assembly, so I have no interest in your fancy 'compilers' and 'scripting languages' and 'programmer methodologies'".

The issue is NOT Excel vs Numbers, just like software engineering is not a fight about whether Java is better than Ruby. The issue IS how can we use what we do know about making professional programming safer and more productive (limited though it may be) to make amateur programming in Excel safer and more productive.

And yeah, some of that WILL involve modifying the current wild and crazy world where anything goes for a world of more discipline. But you are not helping anyone if you are the guy screaming "keep your stupid for loops and your structured basic blocks --- I want the power to use goto whenever I want for whatever reason I want".

MS Excel is the largest Database in the world ;-)

I saw a power user who would put reference values in the lower cells and would change the font color to white to hide the values.

Last year they had this awesome Excel and financial modelling world championships called Modeloff. I bet you they don't make many mistakes. www.modeloff.com

So I have read all the comments and POV in this discussion and have resisted posting this link. But there is only one way to solve this problem. Accept that Excel and Spreadsheets are here to stay, you are never going to impose the controls the IT guys would like the world to live by and put in place a technology that transforms Excel into a true, secure, Enterprise application. Here is the link: http://vimeo.com/user13359040/review/48949438/7d482fb814

Wherefore art thou, Lotus Improv?

I'm not normally the type to nitpick over minor grammar/semantic matters of no real import, but since this particular one touches on a key line from Shakespeare, I'm going to point out that "wherefore" in Elizabethan English means "why" and not "where". So when Juliet says "wherefore are thou Romeo", she's lamenting, "why are you Romeo" (i.e., why were you born a Montague), not wondering where he is.

Where art thou, Lotus Improv?

Excel is an incredible product, especially if you look at its UI.

- Performance is astounding. Sorting thousands of rows by color? No problem.

- The formula Editor. I edit the cell which holds the formula and it highlights, in color, the data ranges that provide the input for the formula. I can copy the cell, which moves the formula. I can, through drag and drop, expand the data area for input.

- All the little things. The Sum, etc helpers in the status bar. Autoexpand of number ranges.

If you sat down now and tried to replicate it, you would tear your hair out. Best piece of software ever done by MS.

That being said: Powerpoint is the exact inverse. What an amazing piece of sh*t.

What makes you think it should be hard to sort thousands of rows by color?

The cost of an ERP is the main issue to stop the SMBs from using excel.

"The world runs on Excel, and we should be afraid"

Ah. No SQL. No Java. No C#. No webapps. No JavaScript. No IT economy (Apple, eBay, Google, FaceBook, Oracle, SAP, etc.: they're all running on spreadsheets producing what? Spreadsheets of course).

Corporate drones use Excel. The corporate world represent less than % of the GDP. Hence:

"The world runs on Excel, and we should be afraid".

Perfect linkbait title. Well done. I'm sure the spike of trafic thanks to HN has been noticeable.

The Corporate Word does represent a smallish % of GDP. But the impact of 1) Government Departments, 2) Banks, 3) Infrastructure Providers (eg Telco's) are felt by everyone in the economy.

Quick Thought Experiment: which would be more disruptive - every YC company shutting down for a day (as-in all their services going offline and completely unavailable) or $LARGEST_BANK or $LARGEST_TELCO in your country shutting down for the same 24 hours?

>Perfect linkbait title. Well done. I'm sure the spike of trafic thanks to HN has been noticeable.

Simon and James are both best selling authors and have been blogging about economics and finance for years. Simon was the economic head hauncho at the IMF and is a professor at MIT.

I'm sure they are concerned about the traffic spike from HN...

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact