For example, I'm inclined to list financial data in this somewhat-normalized format in an Excel spreadsheet:
Apples 1/10/2013 40 $50
Oranges 1/12/2013 12 $200
Apples 1/15/2013 30 $80
Oranges 2/1/2013 10 $40
Pears 2/2/2013 50 $100
Pears 2/9/2013 20 $40
Apples 1/10/2013 40 $50 Oranges 2/1/2013 10 $40
Apples 1/15/2013 30 $80 Oranges 1/12/2013 12 $200
Total Apples: 70 $130 Total Oranges: 22 $240
Pears 2/2/2013 50 $100
Pears 2/9/2013 20 $40
Total Pears: 70 $140
While I understand that their priority is to not care about data processing...not only is this format extremely annoying to machine parse, but it seems unwieldy for any professional use case. When you want to add more rows to each category, you're going to run into layout problems, for one thing. And at some point, your wrists are going to get a nice dose of carpal tunnel syndrome from all the clicking-to-navigate that you end up doing.
When I first saw this functionality I was really excited by it as a way to keep data consistent and independent of the layout. In reality I think the tooling around creating/moving/linking the cell groups is a bit awkward to use. Perhaps some day it will get there, or someone else will pick up the idea and run with it some more.
Numbers puts bounded tables on a page instead of fitting the pages (and everything, really) into an endless table dating from Multiplan, thus solving the "hacking the cells to implement layout" problem. In this example a table is actually selected, and allows for south, east and south-east extension.
Then show them how to lock panes in Excel. Problem pretty much solved.
If you want people to use your "professional" code, learn something about how to get it done now. The people who are making it happen right now don't have time for all this. Not everything needs to be engineered - not even the small chunks of the global financial system I have dealt with.
This is true. However, in a professional environment, it is reasonable to expect that a professional level of attention and diligence is being applied to most tasks. With a half-billion dollar spreadsheet, it's not going to take much of a rounding error to make that software engineer's salary look insignificant.
You draw a distinction between "professionals" and people who make spreadsheets. Why is that a good distinction? Shouldn't those people have some professional standards too? What are you doing if you don't have time to do a good job on something? Heck, why is it okay for you Excel programmers to not have the same set of standards that the rest of us follow?
Following all these rules of procedure and specs and change requests isn't any more fun from the other side of the table. The point of that stuff is, if the product does not conform substantially to those specifications, the person coding it is (usually) liable. Reimagine your spreadsheet-writing life with the idea that you could be sued into oblivion for any losses incurred by your code, and you might develop an all-new appreciation for testing and specifications.
But you know, maybe it's your money on those spreadsheets.
Can you point out a single case where a programmer was held personally liable for losses incurred by their code? Just one?
Libility is usually limited to willful wrongdoing and negligence, and when it involves work done by an employee, it's the employer not the employee that's held liable.
But you're right about the financial liability: I've never heard it happen; but what with Murphy's Law I'm still retaining a Liability insurance just in case it should ever happen to me.
I do not see the point of the article.
In fact, we should instead celebrate that excel is used for so many things that can be better automated - just think about the business opportunities!
EDIT: and yes I read the article, and it specifically points to a coding mistake. When I work with excel (sometimes I still do!) I create references to the cells holding the formulas, and feed them some known input and compare the results to what I expect - if the computations do not match the formula may have been overwritten so I print a "ERROR DETECTED" in a special "TEST" field next to the cells holding values.
Congratulations - you have a test suite!
And that's just one way. Sometimes when I have to do a different implementation I just keep the previous implementations of the same algorithm somewhere on the sheet, use the same input and substract the results - any non zero value should get you worried enough to display an error message. This is interesting for new edge cases that your test suite could miss, and especially useful if the initial algorithm was vetted in some way.
Congratulations - you have added regression !
All this is easy to do, just a copy/paste special away. Even better- you can teach that process to non coders and they will see how useful it is, because anyone who has used excel for non trivial things has been exposed to such problems and is weary of excel limits.
The tool is not the problem. Software development is a process, and if you do a poor implementation, without tests, without regression testing, guess what it will bite you - but it would be the same in C, perl or python if you try to do without at least tests.
TLDR : There are many ways to make excel more robust. Maybe the "experts" are not really experts if they have never been exposed to copy-paste induced errors and never had to take care of such errors.
Did you read it? The problem is that for all its power, quirks about Excel as a product make it very easy to make entirely hidden errors. Too much magic in the hands of unsophisticated users can lead to trouble, but in Excel's case, even experts can make mistakes and not notice.
And when components of the global financial system are exposed to such risk, that gets problematic indeed, as the article details persuasively.
I'v seen the process, and I know of what the process is in one of the Big Firms. There is almost always someone above you to check your work, because the drudgery and pain of a model is farmed of to an analyst. The associate/seniors take a look at the work, and finally there is almost always a sanity check which is not trivial.
All of the major finance firms have lots of people who've burnt their fingers with excel mistakes.
They know that their analysts and their associates can and will break something. (Heck the analysts do so regularly, I'm sure there are several analysts freaking out over REF errors right now)
The London Whale wasn't just because mistakes in excel. There were several things that broke here.
The assumption that the users are unsohpisticated in the use of excel is a bit naiive. If anything, the most arcane applications and commands of excel will likely be best known in the major banking firms.
But still, spreadsheets are full of normal human errors, and based on flawed assumptions, with little oversight, and they control huge amounts of money in novel instruments.
> "This market grew very quickly due to the ease with which it was possible to design and promulgate opaque financial instruments based on large spreadsheets with critical flaws in their key assumptions." Grenville J. Croll, 2009 [http://arxiv.org/abs/0908.4420]*
> "Complex financial instruments such as CDO’s are implemented within, and valued by, large and complex spreadsheets. CDO’s and other credit derivatives played a leading role in collapse of the global financial system”. Grenville Croll [http://arxiv.org/abs/0908.4420]*
> “Spreadsheets have been shown to be fallible, yet they underpin the operation of the financial system. If the uncontrolled use of spreadsheets continues to occur in highly leveraged markets and companies, it is only a matter of time before another ‘Black Swan’ event occurs [Taleb, 2001], causing catastrophic loss. It is completely within the realms of possibility that a single, large, complex but erroneous spreadsheet could directly cause the accidental loss of a corporation or institution, significantly damaging the City of London’s reputation”. Grenville J. Croll, 2009 [http://arxiv.org/abs/0709.4063]*
So, when people are creating their own spreadsheets to get stuff done, and using that for multi-million dollar transactions, and not getting the software audited or approved, they're introducing risk to the company.
Can you cite the source for this data? I would like to read more.
> Consistent with Other Human Error Data
> When most people look at Tables 1 2, and 3, their first reaction is that such high error rates are impossible. In fact, they are not only possible. They are entirely consistent with data on human error rates from other work domains. The Human Error Website (Panko, 2005a) presents data from a number of empirical studies. Broadly speaking, when humans do simple mechanical tasks, such as typing, they make undetected errors in about 0.5% of all actions. When they do more complex logical activities, such as writing programs, the error rate rises to about 5%. These are not hard and fast numbers, because how finely one defines reported "action" will affect the error rate. However, the logical tasks used in these studies generally had about the same scope as the creation of a formula in a spreadsheet.
He has cites at his website too.
I find that most people that program excel are most likely quants that just want to get the job done, do something to allow the bank to generate bigger profits.
Disclaimer: I work in one of these banks.
And if so, is the reason why no one has gone for the more bespoke route is because no one who programs is in charge of these companies, and thus does not even know that they could benefit from such an approach?
The only parallel I have some experience is in the publishing industry. No one thinks that the technical details of the CMS is important, as long as stories are visible on the Internet. But if a clunky CMS requires you to hire 10 to 20 people just to move pieces around on the frontpage (this is on top of the backend that automatically adds stories to the system and to their proper categories), then that is not a trivial cost.
And I'm sure The Verge, with its custom CMS (in Rails, it seems) has some clunkiness to deal with, but they've been able to produce varied and interesting formats for their stories on a much more everyday basis than even the New York Times (I'm not counting interactive graphics/multimedia, I mean the general story form).
Where are they getting all this talent from? I assume they need atleast a certain degree of expertise which is incredibly difficult to get out of people right now...
Turns out, it isn't a perfect description of reality.
Engineered lumber is when you glue a bunch of different woods together. The resulting product has different properties from any of the original woods. It is more profitable for the lumber company because it can be sold at a price premium.
Similarly, financial engineering involves packaging up multiple financial products into a more complicated product. The product can then be sold at a higher price to gullible fools, aka "customers."
- Performance is astounding. Sorting thousands of rows by color? No problem.
- The formula Editor. I edit the cell which holds the formula and it highlights, in color, the data ranges that provide the input for the formula. I can copy the cell, which moves the formula. I can, through drag and drop, expand the data area for input.
- All the little things. The Sum, etc helpers in the status bar. Autoexpand of number ranges.
If you sat down now and tried to replicate it, you would tear your hair out. Best piece of software ever done by MS.
That being said: Powerpoint is the exact inverse. What an amazing piece of sh*t.
I'm no genius, but during my corp years in $LARGE_PHONE_COMPANY the excel macros I was able to write, interfacing with SAP-HR/BW and HP's Quality Center, was a great time-saver and made a lot of things very stream line and less error-prone. All the COM objects that are available to VBscript create crazy possibilities that, really, you could start a whole company just writing macros to stream-line work for other corps.
I really should look into that kind of thing in Open/LibreOffice now that I'm more at home in Linux.
Oh, quite a lot of companies started that way. Many of them still do it exactly like that.
Worse: for a while the trend was clearly "all these Excel sheets are terrible, let's replace 'em with <SHINY_NEW_CENTRALISED_WEB_SYSTEM>!", but now established web-based software products are busy touting their wonderful Excel add-ins that seamlessly push data back-and-forth from spreadsheets.
You can start with that now, and be done with it by tea time...
Since LibreOffice embeds macro writing in Python, the environment is pretty rich - even compared to the mighty Excel VBA...
The European Spreadsheet Risks Group has some 'horror stories' (http://www.eusprig.org/stories.htm)
It's an interesting field of research, and you could probably make a bit of money if you can audit unintentional human error; deliberate human deceit; or software errors.
After the London Whale trade blew up, the Model Review Group discovered that the model had not been automated and found several other errors. Most spectacularly,
“After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR . . .”
It's not like programmers are immune from making such errors. You can put the wrong equation into a page of C or Java as you can onto an Excel spreadsheet.
The difference is that programmers know they will make such errors and thus have developed tools, techniques and a culture that acknowledges the necessity for testing to find and eliminate bugs.
In contrast, I'd wager that less than 0.01% of financial Excel sheets has automated tests, and less than 0.1% has a written test plan. If you confront your typical "work-hard-party-hard" hotshot trader with demands for such, the response is most likely "Testing? I wrote it, I know it works, do you think I'm stupid?"
LOL, seriously? How much software do you use every day that doesn't have bugs? Especially software that only gets used every now and then?
Or were you jsut put off by the phrase "eliminate bugs"? That wasn't meant to imply eliminating all bugs, just those the tests have found. And we all know they don't catch 'em all.
Too true. It wasn't that long ago that the programmers at The Knight Capital Group nearly sent the company broke with their $430 million programming glitch.
But when a programmer I accidentally promotes test code to production, it's just an accident.
That is exactly the point the OP that I replied to was trying to make.
When someone else fucks they are stupid and they don’t know what they are doing, but when the coders fuck up it's always someone else that is at fault. Please give me a break!
Do you have a citation for this? I saw a lot of speculation at the time but nothing that was ever corroborated.
Excel is great, until that moment that you do something in haste over a large set of heterogenous data and cells.
And if you are doing finance, a world of pain awaits you.
edit: One other gripe about it, is if you use a macro, you can never use the easy ctrl-z out of whatever mistake you've just made.
Backups for a new analyst are few and far between. Backups for me a short while into my job could happen hourly.
- is universally available (who in an office building DOESN'T have an Office license?)
- can be "Programmable" to the extent that it needs to be. You don't have to start with code. Formulas and conditionals are great for most things. People usually ease into Macros gently.
- produces a format that is sharable. People "fork" Excel spreadsheets all the time. It's not pretty but it works.
- gets the job done. You want data entry with some calculations and maybe a few if-then rules here and there? What's better than Excel?
In a corporate environment, often the best way to get things done is to circumvent the official software and just write something that works. When we do it, we call it an elegant hack, but when guys in suits who went clubbing last night do it, we call it a terrible, amateur travesty that should be replaced by PROPER code as soon as possible.
And you know what, eventually that happens. Very rarely, an incredibly useful Excel spreadsheet will be replaced by an even more useful (and reliable) piece of custom software that also adds tons of value to both the users and the organization. But I've worked in corporate consulting for years; can you guess how often this happens? I'd wager it's less than 40%.
No, what typically happens is that an analyst or software dev notices someone's cool spreadsheet and says "hey, I can make something that does this job, but it'll be a LOT faster and I'll put the data up in the cloud and multiple people can access it at once and..."
And that sounds great, so they get a little budget and a project is born. Most of us who have been there and done that know what happens next: higher-level stakeholders get involved, broader objectives get defined, more team members are brought on, timetables are established, results are "metricized", paradigms are going to be shifted, etc.
Rarely does a piece of software escape from this process that is as genuinely useful as the spreadsheet which spawns it. Often, rather, it gets delivered 6 months late. It crashes all the time. What used to be one simple input field is now a whole page with checkboxes you have to check and sub-objects you have to define. The end result might look a little prettier but that cool Infragistics graph is locked inside the program and can't be shared like the old Excel report because no one hooked up the "export feature". People are getting upset. Everyone hates this program. But we have to use it, it's mandated by corporate.
Meanwhile, a talented new guy comes on the team and notices what a bloated piece of crap this software is. He wonders why no one has written a little Excel sheet to get around it and REALLY get some work done...
I know I'm being cynical. And look, I GET that rogue spreadsheets can turn into productivity-damaging unseen business risks. But until the corporate "software project" culture understands why it happens and why people are often far happier with their clunky spreadsheet than with your shiny WPF app or web page, I don't think this problem is going to go away.
However, the problem with custom-made ERP
software (or how they are typically developed) is that adapting them to new processes and ideas is typically slower and requires "real" programmers, where as systems like Excel that empower people without programming experience to model their processes, is much more dynamic and adaptable to needs of the business.
This is something that I'm currently working on.
The problem isn't Excel - it's lack of standards and precision in financial and business software. If you build a new ERP that replaces Excel, but lets users modify the algorithms, then you will have the exact same issue as you'd have with Excel. The slower adaptation to new processes and ideas is a good thing as it means that these new processes are going to be specced correctly and hopefully tested to some degree. It's not okay for a financial institution to save a bit of time and money on their software and put client's life savings and public tax money at risk.
Of course, this is all very idealistic, and I personally have no doubt that Excel spreadsheet hacks are here to stay. Idealism will always fall down if a few bucks can be made.
I disagree. Most businesses need more agility and experimentation, not less. There is a huge spectrum of businesses. It would be ridiculous to say to a startup that you need to experiment less and put safeguards to a place, so that you don't make huge losses of revenue. There are lot of traditional businesses in which downside can be guarded easily (retail, many sales organizations) but finding better ways to run the machine can give significant edge.
Most ERPs and Excel-sheet apps don't model automatic stock trading, and don't move billions of dollars without safeguards in place. Rather, they model and assist in existing human processes and the sad fact is that the ERP software is the stumbling point for rapid experimentation.
That is why Excel is so popular.
Isn't that the point though - the "RP" in "ERP" is "Resource Planning" so they really are mainly focused on process oriented manufacturing or services.
If you don't need the "RP" bit of an ERP then aren't you really just left with a financials package?
Nearly always, these started as "prototype" systems that became so useful, so quickly that they end up as "production" systems.
This is the power of Excel. But...
When spreadsheets get this large and complicated they are incredibly fragile and become hugely difficult to maintain and bug fix. It is just too easy to make an accidental change and not know your spreadsheet is broken. The VaR example given in the OP is just typical. Sure, you can protect sheets but as a developer/maintainer you need access and then you can very easily break things, especially because version control with spreadsheets can be next to impossible.
The moment you start taking your spreadsheets seriously, as a business process, is the moment you need to consider very seriously recoding them as a bespoke system using conventional software development.
Needless to say, when upper management turned over a bit, there was a big push to get rid of all of this wasted time. (I know of at least one person from Accounting that quit because of the mandatory evenings and weekends just to get necessary things done)
Not to mention the addition of complex rules with little to no documentation. Sometimes these spreadsheets are created by someone who knows Excel macros, then they move on, leaving others that don't know anything about it to run it. Maybe new requirements come down, and they poke around it at it to make it seem like things are ok, but they miss an edge case... etc. Not that this doesn't happen in the programming world too, but in the programming world we try to tackle these issues as part of honing our craft. In (e.g.) the accounting world, it's just a necessary evil to get done and over with.
I wish it was so. People using excel can get very — ahem — creative, of which I've encountered many a spreadsheet designed by some minion of hell, including (but not limited to):
- a GIS implemented with cells, one cell per bitmap image block, and excel vector drawing features, with atrocious macros doing things
- an insurance broker contract and customer database, whose records were separated by fuzzy formatting (like cell border colors and width), full of varying labels (typos and inconsistencies), and without specific cell placement for data, notably would-be primary keys.
Every single one of them should be locked down in a digital safe, guarded night and day, only to serve as last-stand honeypot tactical weaponry on sufficiently smart cyberwar adversaries, who, once spoiled by the guarding words of "Abandon all sanity, ye who enter here", would quickly have their mind cower in fear back into reptilian neurologic territory. The Snow Crash noise was probably one of those file's raw data.
2. Rewriting them found errors in the Excel spreadsheets in some cases.
3. They weren't a Microsoft shop, so most/all of the experience was with an Open Source stack.
4. Most of the functionality was implemented as a series of views in the database, with a layer of reports on top of it.
5. It was easier to just pull them into our existing system as additional reports, and just have an option to export the report to Excel (which was already baked in functionality).
If your users spend all their time in Excel, why not connect it to the enterprise back end?
It gives fast positive feedback.
It gives very slow, often quite spectacular, negative feedback.
Making nice charts and dashboards is not enough. People need the ability to process the data.
I know a startup in the Netherlands working on just that, and it's a golden business opportunity with a large untapped market!
The problem comes when people start to view their database as the "point of truth" for some data and then want to suck their data back into the database because the sheet is "more upto date".
This of course is a hell of a problem when different people have different versions of said spreadsheet and want to somehow merge them all back together.
One advantage of macro laden spreadsheets though is that they are often much better for gathering actual requirements than some word document put together by committee.
I suspect that individual files would become humongous with those features added.
Space gets cheaper every day.
A few things may be an issue with implementation, one of which would be excel macros. As it stands excel undo cannot regress past the last use of a macro. I suspect a state list may encounter a similar roadblock.
As it stands excel can be very very slow on large sets of data.
On a sufficiently complex and clean model, I've seen load times and computer slow downs which begin to make Tokyo traffic seem mild, if it doesn't just hang.
They "solved" it so: each morning one person imported the data from both systems into huge excel spreadsheet (file had almost 100 MB), then she send the file to person that know how to correct all the duplicated and missing data in a few columns (it wasn't turing-complete process, it was heuristic, cause systems used different subdivisions and codes for same things, some data required to be sure how to integrate them was missing, and some data was changing without reflection in the systems - like the ingredients used to make one unit of product could change every day, and NOBODY KEPT RECORD of when and how it changed, only some people in production knew how they make the product today).
Then that person send the file to the next person, that knew how to fill the next few details. And so on, close to 15:00 file was supposed to be ready to send to the management. Sometimes it took till 20:00 (hours after 16:00 wasn't paid). When somebody important was on leave, nobody knew how to fill the details.
That spreadsheets had almost no formulas in them, cause people "cut and pasted as values" all data after calculating it (my wife was told it's to "protect the data from accidential changes"). I think it was to prevent people from seeing the errors.
Everybody knew that the data is all wrong. Nobody knew how to even get the correct input for most fields, it was all fuzzy (for example - one person in production knew which labels are bought in 100s, 500s, 1000s, etc. He was sent the file to fill that details, nobody could check that, and there was always some contradictions in the last pages of the spreadsheet. Most of the work went into massaging it untill it seemed OK. The file also crashed half the time, even on computers with 4GB of ram and 64bit windows.
I wonder, how they managed to stay in business (they are still around, and have quite big marketshare).
Verizon's billing system is the same way. Is it bad? Yes. Is it worse than AT&T's? Probably not. Are people going to quit Verizon because of billing errors? Well, I did, but most people wouldn't.
Finance guy here (hobbist programmer). The more I think about this, the more I realize the way to exploit this is through well designed and _specialized_ software which doesn't have to evolve around spreadsheet at all. You have to look at the work flow of the professionals using (abusin) Excel.
I give you an example: audit. Now, the final product of an audit review is what? An MS Word report, that's what they sell the client. How efficient is this? Barely efficient, consider that:
1. There is no version control.
2. Tables and pictures look horrible and are a pain in the ass to format (oh latex)
3. The doc is edited by various people: people paste in the report the wrong stuff --> client gets crappy drafted doc --> bad image
4. The document "links" to different workpapers (Excel sheets with the calculations) that you have to go and pick up, you can't click and get the doc like it was 2013, you need to search it like in the 80's
5. The senior has to review it all, guess how easy it is to do this navigating tons of MS docs and spreadsheets
Then you get to the spreadsheets... it's another mess:
1. No audit trail
2. No workflow of how you go from the original data to the calculation. This is very important, you need to tie to the original source of data ALWAYS
3. You use crappy templates 10 years old with things nobody knows anymore
4. You can't read well .txt files and you can't process big amounts of data (what if you want to test for frauds in petty cash in a 20 billion multinational? For example)
I mean, there are tons of inefficiencies, the points are 2:
1. How you sell your shiny new perfect product to people used to abuse data in this way?
2. How do you build such a product from a programmer background? And here I have my rant about ideas not being useful, they are if they come with lots of market insight focus.
And that's about it. I think, coming to your point: yes, there are inefficiencies, no, you will not get there if you don't have insight into the specific profession you are trying to provide tools to.
So how come users still manually set the font size for their headings instead of using styles; how come they ignore Track Changes; why do they copy spreadsheets instead of enable the Share Worksheets functionality -- and so on -- well, it's simple: they don't know about them. Most Excel/Word users are self taught or are given a very rudimentary training course by their peers or one of those cash-generating "Certificate Farms" you pay out the nose for.
I definitely agree that to solve the Excel woes you're going to have to write domain-specific software; Excel's already the Swiss army knife.
Should we be worried about driving cars and flying in planes because they're so complicated internally errors are easily hidden? A bad mechanic will always blame his tools. There is nothing wrong with Excel and just because errors are easily hidden doesn't make it a bad program to use. I'm sure error rates are pretty low anyway.
The good thing about the engineering industry though is that all those spreadsheets get checked by someone more senior than you.
The problem is though, thanks to the increased usage of finite element analysis and more detailed models, the amount of data being pushed around in spreadsheets has grown exponentially.
Far beyond the ability of someone to reasonably check it all.
Is that a new elaborated form of Karma whoring?
Most of the time your spreadsheet even if it has perfect logic will fine, until some weird race condition gets hit the price being loaded gets transposed and you model gives you a wildly wrong answer which you then trade on.
The problem is that economics is not a science, and finances are designed to enable power plays. We need to apply real science and technology to problems rather than letting a bunch of suits run around scamming everyone.
Sure there is... you set up a spreadsheet with a known test data set, and make sure your formulas compute the expected results. Just like testing any other software.
> The usual aim of spreadsheet audit is to verify correctness. There are two problems with this: first, it is often difficult to tell whether the spreadsheets in question are correct, and second, even if they are, they may still give the wrong results.
You're assuming that people know what model to use; use the correct model; implement it correctly; and use the correct inputs for that model.
Since most of these ad-hoc spreadsheets don't have any kind of specification, let alone a formal spec, it's really hard to know what they're supposed to do, and that makes it impossible to know if they're actually doing it.
And that's just one spreadsheet. Many spreadsheets are linked - you take data from some other unauditable spreadsheet for an input.
After all of that you still have users manipulating the spreadsheet. We know that users will make errors. We can minimise the errors using excellent design (excellent design isn't always used for expensive customer facing planned websites; what hope is there for semi-secret ad-hoc internal quick and dirty spreadsheets?) but even with clear labels we know that users will make errors, and that some of those errors will trickle through to future versions of this spreadsheet, and maybe other spreadsheets that depend on this one.
If you can automate (or partly automate) spreadsheet testing there's probably money to be made.
"Software Engineering for Spreadsheets"
"Automatic Detection of Dimension Errors in Spreadsheets"
"Type Inference for Spreadsheets"
- Excel models are usually built on a pre-existing template, probably created by someone else. Any errors/hidden assumptions carries through. This is he biggest source of errors.
- Any large model usually has a tab devoted to reference tables and assumptions based on external sources, a la dependency injection. Usually you have to use static numbers since you need to email the model around (you don't want your boss seeing a bunch of #NAs due to missing links. Keeping this data updated is another huge source of errors.
- A sanity check is your best defense. I'm positive that every CDO model I've ever built has contained at least one error, just like every large software application has at least one bug. The difference is that models have fewer output parameters, which allows you spot incongruities more easily.
- People who use big Excel models are not software engineers and will never be. In my opinion, solutions that implement concepts like testing, diffs, commits, etc are a waste of time, because it creates extra work and learning curve.
As far as the origin of spreadsheets that Bloomberg helps create, they usually fall into one of two categories:
- There is a gallery of great complex examples that have already been created and designed (probably initially for one client / company) that are available to launch from within the Bloomberg Terminal w/ screenshots, descriptions, etc. They can be searched and when a user finds one they like, it is downloaded and opened in Excel, allowing them to further customize it.
- The user contacts the Analytics desk (chat/phone) and is either trying to integrate live market data into an existing spreadsheet or needs to build something complicated from scratch and does not know how to go about doing it. The team specializing in Excel spreadsheet builds takes over and works with the person to build what they need. In many ways, it's like a small consulting gig -- a service provided to Terminal subscribers.
Once the user has the spreadsheet via either method, they are free to share it internally like any other spreadsheet.
Take my uncle for example. He is mediocre office worker, selling lawn-mowers for living. He took the time and learned some advance excel functions and created few cool spreadsheets for his office, and everyone loves it. His reputation grown a lot with bosses and e is known as the guy with solutions.
Excel is a really good example of how easy it is to get up and running, and how easy it is to get yourself into a lot of trouble. You might not need to program per se, but you do need to know how to DEBUG and TEST your programs as well. This is something that most people are loathe to do. It requires a great deal of discipline, and even though in finance you're supposed to test spreadsheets during internal auditing, most auditors aren't trained in proper testing/coding, so it ends up being ad hoc. In fact, it's scary how much intermediate financial statements are rubber-stamped/robo-signed by people in the finance, usually because they are pressed for time.
First, what people are doing in Excel is coding.
Second, they do it because Excel is, thus far, the best environment available for lowering the (many) barriers of entry that are perceived as blocking one from learning to "program".
Therefore, in my never to be humble opinion, we'd also be wise to try to figure out a way of helping build in testing methodologies into excel. I think it wouldn't have to be that hard. Imagine a testing mode or a test overlay that you could toggle on and off, allowing you to identify input and output cells, setting test data and expected output, etc.
The crucial thing is that the best work on this front would have to come from Microsoft, if it has to be in Excel. One possible fix would be a testing environment that reads .xlsx files – build something great, sell it to managers and MSFT will either buy it or, more likely, imitate it.
It requires first and foremost the realizations that bugs in code are frequent, normal and to be expected, no matter how smart and well-prepared you are.
And that's one thing you learn when you learn to code. The first programmers were quite shocked when they found out.
This astounding story is a really nasty example of why "always comment your code" is a good idea--but only if someone else with knowledge of business logic reviews the code.
While I was there I found out why the report was needed.
The bank had managed to misplace a couple of million dollars in a missing trade.
They did an audit, trying and find out what happened to the missing money and came to the conclusion someone had accidentally deleted a row from the spread sheet being used to record the trades.
My question is : would the error in the model have been caught if the model would have been written in a more technical mean, say an R script ?
Excel does not give any feedback if the "error" is conceptual, but so it happens with more technical programming languages and environments. Excel puts out "errors" if you write a wrong formula, the equivalent of a syntax error for a programming language.
On the other hand, one would think the "on your face" presentation and ease of use makes it easy for other (non technical, but skilled in finance/math) people to review the numbers and calculations.
I agree with the author that I the fault lies in too many people trusting blindly the model, without reviewing the numbers/calculations, but I'm not sure Excel can be pointed out as the culprit.
Improving the world's most popular functional language: user-defined functions in Excel
Strictly speaking, Excel doesn't have functions at all, let alone first-class, since you can't define a new function using only cells and formulas. You have to go to VBA.
In some ways, what has been the evolution of Excel in our company has been to switch to Google Spreadsheets, where you get the audit history, sharing, and version control solved, you can build web forms in less than five minutes to have people submit data, you get a decent amount of programability and web service API.
Most of the time you wouldn't even see the code, just a bunch of numbers on top. Being able to switch back and forth between the code and the resulting data would be a huge help. I like what Bret Victor said in his learnable programming series where he tries to build on this principle http://worrydream.com/#!/LearnableProgramming (Great read btw if you haven't read it already.)
> One of the all-time most popular programming models is the spreadsheet. A
> spreadsheet is the dual of a conventional programming language -- a language
> shows all the code, but hides the data. A spreadsheet shows all the data, but
> hides the code. Some people believe that spreadsheets are popular because of
> their two-dimensional grid, but that's a minor factor. Spreadsheets rule because
> they show the data.
I do not by any means write enough tests for my code, but I have some projects where I've got tests around parts of code, and occasionally I catch regressions when I make changes because of a run of the tests (if I had more tests I'd probably catch more over time).
I've never seen anyone writing custom functionality in Excel that even understood the idea of tests or test/sample data. That is probably a bigger issue with Excel - it encourages people to just throw stuff together and 'start hacking' - with live data, often the only copy of said data - and not understand the potential consequences. Of course, you've usually got a backup of yesterday's data because it's sitting in your inbox, because often Excel files are just emailed around to people, which perhaps encourages yet another sense of 'security' around the process.
One way to at least start to approach the problem would be to put the equivalent of a SCCS into Excel. You'd be expected to checkin and checkout spreadsheets, changes would be tracked, you could diff against older versions, etc.
The analogy is not perfect, not least because many spreadsheets consist of a logic part, written by one person, and a data part, filled in by another. Even so, forcing a checkin/checkout model would provide an audit trail and, as has been said, storage is cheap.
Perhaps the second easy modification that could be made would be to make it vastly more visually obvious which cells are locked, and which cells are logic. It should be possible to look at a spreadsheet and just see the logic portion separate from the data portion. Imposing this by visualization would at least make it less likely that spreadsheets get checked into the SCCS without appropriate locking.
I could add more, but in all cases it seems to me the idea is: look at the problems we programmers had in the 90s, and how we solved them, and do likewise. I've suggested SCCS and syntax coloring, but one could go to talk about, for example, code commenting, or making it natural for people to name cells (make that the thing that's easy to do, and make the cell names easily visible) so that faffing around with B$3 becomes a weird thing that normal people would not bother with --- just like in coding I don't tell the compiler to use register EAX, I just use a variable name.
Those who sanctioned trading on VaR models built on pasted data were obviously not qualified to hold their jobs. Neither are those who over-glorify or over-promote the virtues of "it ain't real unless programmers built it", and I say that as a programmer. I work with other UI tools now, but I have yet to work with a better or faster-adopted one. The ubiquity of Excel in the workplace is well earned; the poverty of its many implementations is no different from all the other ubiquitous human poverties.
Every day the team leader would ask a junior to run his pricing model for a structured product.
The junior would duly press F9 and the macro would commence, with the screen flashing away like a strobe light.
24 hours later the macro would stop and the boss would return to find out the answer, which was usually wrong by an order of magnitude.
In my experience the typical executive user would prefer spending hours or even days working on a spreadsheet then spending even a few hours learning how to properly use some other task specific software regardless of whether or not it’s better suited for the task at hand. It’s not so much that the executives don’t understand the issues with using Excel so much as how Excel lets them do what they need and more importantly, want to do and all without having to learn a new system. The Excel model (how it works using rows & columns ) is intuitive to one with an accounting mind. I’ve had executives ask me before why can’t our really expensive <fill in the blank> Enterprise Ready Application Do X when Excel can do the same thing and easily? This is especially true when dealing with what I term JCL, Judgment Call Logic. Unlike programming structures (If/Then, CASE, etc) , JCL is perspective based and results in a different value based on who is assessing it. This is why its “judgment” logic and not natural logic. Excel users don’t understand how you can’t code JCL into software.
That said there is certainly a need in the software world to do things better, smarter. The world of software engineering needs more byte artisans and less bean counters. Too often short term decisions are made with negative long terms effects, all done in order to meet some artificial deadline that has more to do with someone getting a commission for meeting a deadline then for delivering something of quality.
"But while many people did buy Resolver One, we struggled to turn it into a commercial success. The vast majority of people are happy with either Microsoft Excel, or one of the free alternatives available from Google and the like. Since mid-2011, all of our development focus has been on a new product, PythonAnywhere."
University of Greenwich, London, on the 4th & 5th July 2013
Few days ago, I built a very simple VBA utility that queries an oracle database to give me relevant transaction information. They love it and I am the superstar already. Reason: this requirement was discussed with IT teams, sent to priority queue for months. After building this macro which took me 2 hours, traders got what they wanted without going through an IT budget.
This perspective recognises that nothing beats Excel as a source of financial innovation and 'can-do'. The problem is that few want to recognize this essential role. The large software vendors need to demonize the issue to sell licences to their replacement solutions. The result is denial at every level.
Much better to appreciate the core strategic value that Excel offers and not confuse it with the essentially tactical value that an individual spreadsheet offers. (Though a tactical spreadsheet can be valuably tactical for 10 years).
The answer is transparency. Once one buys in to wanting to know what is going on then enterprise spreadsheet management can really help.
I have used the tools mentioned in this article, Excel and Powerpoint, extensively. And I have used them for engineering and business purposes. And, yes, I have made mistakes that have cost me money. Funny enough, never when used for financial applications but rather for engineering.
Over time we developed a rather complex (probably a nearly a hundred columns and two-hundred rows with data and formulas) Excel workbook used to calculate timing parameters and counter reset values for a complex timing and synchronization circuit inside an FPGA. After one crank of the wheel this spreadsheet would output a table of hex values that would get plugged into Verilog code to setup the circuit. Everything worked as expected, or so we thought.
Then, a few weeks later, as the code was being run through test-benches (unit tests) we discovered that a corner case was broken. Being that it was hardware we immediately thought that the problem was the circuit. The code was checked and re-checked, simulated and re-simulated. After about six months of hard debugging work we discovered that the Excel spreadsheet was the culprit. We had used the ROUND() function in a calculation, when, in reality, we were supposed to use ROUNDUP(). Only one of hundreds of formulas contained this mistake. This cost us six months of engineering time and, therefore, lots of money.
Having suffered through something like that I still could not bring myself to blame Excel for the problem. We made a mistake. No, in this case, I made a mistake. And that was that. Blaming Excel or Microsoft would have been nothing less than moronic.
I have used Excel to design polyphase FIR filters, maintain state machine code, produce complex lookup table initialization values for a variety of languages, prepare data for EDA tools, process sensor data and even generate G-code to machine bolt patterns on a Haas CNC mill.
Oh, yes, also invoices, purchase orders and financial analysis.
It's a great tool. And, if you use VBA you can really make it do some amazing stuff.
Powerpoint is another interesting tool. Of course, the most visible application is to create slides for presentations. However, you can also access its automation layer via VBA. And, yes, Visual Basic isn't the most elegant of programming languages, but, who cares, this is about getting a job done, not about authoring monuments to computer science. Anyhow, the most interesting application I found for this was the creation of a full simulation of the physical control panel for a device. The simulation included buttons and even a pretty good simulation of a dot matrix display. It was used during a training presentation in lieu of having the actual physical device there. The menu code was pulled out of the state machine that ran the menu on the actual device (written in C but easily translated to VB). It was really cool.
What surprises me is that financial folks, as related by the article, don't seem to have a system in place to create "checks and balances" on spreadsheets that might actually be used for very serious financial decisions. That's odd, but I can see it happening. Startup opportunity? Maybe.
I'd like to ask about a technicality as well, if you don't mind. You said that ROUNDUP was used in hundreds of formulas and ROUND in only one. How many of these formulas were identical except for relative cell references, as opposed to doing logically distinct things? By "identical except for relative cell references" I mean the kind of formulas you get by selecting a cell and telling Excel to copy its formula across a range -- Excel sees these as distinct formulas, but really they're computing the same thing on different inputs; in APL it would be a single expression. I'm wondering if the bad ROUND guy was identical to some other formula in the sheet, or whether it was a fundamentally new formula that someone typed in and forgot to use ROUNDUP. Really, I guess I'm just wondering how the bad formula happened to get entered.
The reason for that last question is that I'm interested in what a spreadsheet would be like that recognized common formulas across a range and treated them as a single data-parallel computation, rather than a bunch of separate cells that happen to have relatively identical formulas. Clearly this would have performance advantages (parallelism and data locality). But it would also seem to have a better chance at preventing errors, since it would treat "computation X across an array of N cells" as a single thing, rather than N relative copies of X that might get altered separately. Curious to hear your thoughts on that general idea too.
It's too late for me to go dig-up this old spreadsheet. Maybe I'll do it tomorrow. Excel does warn you when there are discrepancies between adjoining cells. If I remember correctly this was a cell that was copied down and used on a row. Thinking back I think the formula was entered once but replicated Dow a range of rows.
The problem with the APL comparison is that the language can become less elegant if the operations to be applied across the rows of a matrix are somewhat varied. I'm typing on an iPad right now so I can't even try to give you an APL example. Maybe tomorrow from my PC.
I have a busy day today so I have to stay off HN. I'll look into some of the items requiring deeper answers later tonight or tomorrow.
Excel notices when there is one formula that breaks a pattern, and puts a warning sign on that call (my Office 2003 version of Excel does). Now it presumably misses some cases, but the standard 'drag a formula, change one of them' case, Excel saves you from.
Take the usual life cycle of a young financial analyst in one of these firms (not even doing quant stuff)
He makes model, it is sent to his superior who checks it and tells him the out put is wrong, the PE ratios he has calculated for the firm are through the roof, at something liek 18x while the rest of the industry is always at around 3x. (numbers are fictional).
Check your numbers.
Further, the lowly analyst has also made a million errors in even reaching this stage - from incorrect formatting, to incorrect formulas, incorrect linking, to the dreaded ref error.
They do this stuff constantly, and the ones who can do it in their sleep and not die of exhaustion to the brutal hours are the ones who get through to the next level.
People have their fingers burnt regularly, and every day. Bad models still manage to slip through, but are more often than not, caught, or handled.
So people in the industry are frightened witless that something like a bad model may escape into being published.
It took several sequences of failure for the whale to amass the positions he did, and it is an uncommon sequence. Which is why its such a failure on the industry's part.
Everything goes well until either the data changes enough to make the error in the code matter, or some other code is added which relies on assumptions about how the broken code works, and the sheet starts pumping out garbage.
But the most important thing you need to know about tools is this: tools are not neutral. Every tool promotes working in certain ways, and discourages working in other ways. All programming tools can perform the same calculations; its their effect on the way we work (and the way we interact with the other humans we work with) that changes between them.
Excel IS an amazing tool, but it has many extremely serious faults which have been mentioned on this thread. Some of these can be solved by adopting specific working practices for making spreadsheets, but sometimes this is like trying to nail jelly to the ceiling -- the things that Excel makes easy are often the things that you want to 'ban' people from doing (things like forking a sheet without passing the changes back).
One enormous problem is that it's difficult to track progressive changes to the code (formulas or macros) in a spreadsheet. What you normally save is a binary blob which is impossible to diff.
Another one is that there is no straightforward way to apply the same code to different data. If you copy your sheet, you now have two versions of the code which can diverge. So people add more and more data to the same sheet, and add more and more ways of selecting the inputs you want, whereas a program in a conventional language is separate from its input and can be applied to any input.
A third problem is that while it's easy to call external code from Excel, running Excel as a library and calling it FROM external code is extremely complicated and arcane. (Just look at the number of questions on programming help sites asking how to do straightforward things with VSTO if you don't believe me.) So rather than make nice apps which interact with Excel, people make 'applications' inside Excel which are really just large, complicated spreadsheets with weird dependencies. Since Excel and VBA weren't really designed to do this a lot of these work very badly.
Some of these things can be fixed for the spreadsheet model, although for a mix of technical and commercial reasons they might not be able to be fixed within Microsoft Excel. However to appreciate why they need to be fixed, you have to understand that programmers don't use things like version control because they love bureaucracy and due process. (Most programmers HATE bureaucracy and due process, and almost all of them resented the overhead of version control and code review until they very quickly discovered how huge the payoff is when things go wrong.) Likewise, we don't use sophisticated languages and study theory because we want to 'author monuments to computer science'.
These things have huge, and immediate practical advantages. The programmers' task is to make people understand why they benefit if they do blame their tools, and start seeking better ones.
The problem is that those who are developing complex Excel models are not necessarily taking into account the need for solid design, rigorous testing and model validation. The quant generally building an Excel model is generally thinking "how can I get this model built and running as quickly as possible?" Whereas, a software developer who is used to bugs, and the negative impact they cause, might take a more guarded and gradual pace - with the realization of the importance of testing.
Neither Excel nor the quants that use it in a fast, ad-hoc basis are going away. And so I believe the solution involves application of good design, QA and validation to the Excel modeling process.
Outlook would like to have a word with you...
I just finished a 5 week long project revolving around replacing an Excel
sheet filled with a bus load of macros. Now the client has a web based solution that can use the excel sheet as an optional data import medium, but not be beholden to it.
*while you can structure things to pull data in from other sources, in my experience people evolve a spreadsheet around existing data.
What is bad is when they think Excel is a superior platform than some other more suitable technology simply because they have seen what it can do. I can excavate tunnels with spoons, but there is a reason we use TBMs for big projects.
This might be a great opportunity for new product.
The issue is NOT Excel vs Numbers, just like software engineering is not a fight about whether Java is better than Ruby.
The issue IS how can we use what we do know about making professional programming safer and more productive (limited though it may be) to make amateur programming in Excel safer and more productive.
And yeah, some of that WILL involve modifying the current wild and crazy world where anything goes for a world of more discipline. But you are not helping anyone if you are the guy screaming "keep your stupid for loops and your structured basic blocks --- I want the power to use goto whenever I want for whatever reason I want".
I saw a power user who would put reference values in the lower cells and would change the font color to white to hide the values.
Corporate drones use Excel. The corporate world represent less than % of the GDP. Hence:
"The world runs on Excel, and we should be afraid".
Perfect linkbait title. Well done. I'm sure the spike of trafic thanks to HN has been noticeable.
Quick Thought Experiment: which would be more disruptive - every YC company shutting down for a day (as-in all their services going offline and completely unavailable) or $LARGEST_BANK or $LARGEST_TELCO in your country shutting down for the same 24 hours?
Simon and James are both best selling authors and have been blogging about economics and finance for years. Simon was the economic head hauncho at the IMF and is a professor at MIT.
I'm sure they are concerned about the traffic spike from HN...