
You shouldn’t use a spreadsheet for important work - ot
http://lemire.me/blog/archives/2014/05/23/you-shouldnt-use-a-spreadsheet-for-important-work-i-mean-it/
======
seanstickle
Felienne Hermans has done some very interesting work examining how to refactor
spreadsheets.
([http://www.felienne.com/publications](http://www.felienne.com/publications))

Her PhD dissertation on the subject "Analyzing and Visualizing Spreadsheets"
([http://www.felienne.com/archives/2534](http://www.felienne.com/archives/2534))
is particularly detailed.

Much though some people might think that spreadsheets are not for serious
work, they're being used for that right now, and that is unlikely to change.
Better to figure out how to improve the tooling, rather than undertake the
quixotic quest to get business people to abandon their perfectly viable
programming tool.

There is an excellent video with her on InfoQ, explaining her spreadsheet
formula refactoring tool Bumblebee (and the F# code behind it):
[http://www.infoq.com/presentations/spreadsheet-
refactoring](http://www.infoq.com/presentations/spreadsheet-refactoring)

~~~
dragonwriter
> Much though some people might think that spreadsheets are not for serious
> work, they're being used for that right now, and that is unlikely to change.
> Better to figure out how to improve the tooling, rather than undertake the
> quixotic quest to get business people to abandon their perfectly viable
> programming tool.

Its largely used by business people because they (1) have inadequate IT
support to have things developed using proper tooling, and (2) have been
forbidden from using better tooling themselves by IT. Since both of these are
direct products of IT policy, I don't think its a quixotic quest to fix it --
however, trying to fix it by trying to convince them to just give it up is the
wrong approach. The people who need to be convinced to change the situation
are CIOs.

~~~
untog
A lot of people who have access to Excel also have access to Access - and a
lot of what they do would be better served in database than a spreadsheet. Yet
they don't use one - there must be a reason for that, and I don't think it's
IT restrictions.

~~~
andrewfong
This little tidbit from Chris Granger seems relevant:

"Excel is inherently observable since it doesn't have any hidden state and all
values are there for you to see and manipulate. It's also direct. You change
values in the grid, drag drop things, do calculations on selections, and so
on. And it manages to sidestep a lot of incidental complexity; spreadsheets
are timeless, without setup, and don't even have a notion of being run."

[http://www.chris-granger.com/2014/03/27/toward-a-better-
prog...](http://www.chris-granger.com/2014/03/27/toward-a-better-programming)

~~~
frandroid
Saying Excel doesn't have any hidden state is stretching things a bit--your
formulas don't show up until you are directly on them, so mistakes (formula
results that got turned into static data, for example) are hidden until you
put your cursor directly onto them.

~~~
rahimnathwani
"formula results that got turned into static data"

This hints at one of the sources of problems with Excel. One good approach to
using Excel is to always be in one of two modes: changing the structure
(equivalent to coding) and entering values (equivalent to using software). Of
course, at the early stages of designing something you will be entering lots
of dummy values (e.g. ARPU) to check that your formulae work.

However, many people don't distinguish between collections (rows, columns or
other contiguous areas) of cells which _should_ contain values, and those
which should contain formulae. I, like many, prefer to indicate input cells
with a yellow or orange background, so that I know everything else is a
formula.

You can do even better:

\- If you're done designing a spreadsheet, and expect to use it for a
production process, then mark the input cells as unlocked, and protect all the
sheets. Then the end user won't be able to mess up the formulae.

\- Always write formulae in a way that they can be copied across or down
whilst pointing to the right places. This can be achieved through use of one
or more $ signs to fix a reference. I've seen a fair number of spreadsheets
where there are a large number of similar formulae, but they have been
entered/adjusted manually. This is fine only if you never make mistakes, and
if no one else needs to change the formulae later or verify they are correct.

My point is that "formula results that got turned into static data" can be
avoided with just a few easy rules. However, most people aren't taught these
rules :(

~~~
dragonwriter
> Always write formulae in a way that they can be copied across or down whilst
> pointing to the right places. This can be achieved through use of one or
> more $ signs to fix a reference.

Except if you are using modern version of Excel, don't general use $
references for this, go one step further and use names (for fixed individual
datapoints where you would use two $s) or named tables with named columns (for
the most common use of single $ references.) This is more self-explanatory and
less error-prone.

------
piokoch
Very often Excel is a "programmer in a hostile environment" last hope. By
programmer I mean someone who works in sales, bussiness analysis, finances,
etc. but knows how to write programs.

In a typical non-IT company someone who needs to automate something has to
wait for IT department to purchase software which would do the task. And very
often it turns out that in software purchase process the purpose of that
software is lost, requirments are comming from a wrong person and at the end
that software is pretty useless.

It is not that someone can just download Python or Ruby or Java and start
coding. No, no, company IT would never allow for this because they fear about
security, patents, licences, etc. (and it does not matter that these doubts
are not justified, very often it is just forbidden and that's all).

But chances are that such company has MS Office. Excel is just an interface to
a better or worst programming languge. That's just better then nothing.

------
panarky
JP Morgan's $6.2 billion "London Whale" trading loss:

    
    
      An unnamed overworked staffer in London copied and pasted the
      wrong figures into an Excel spreadsheet, throwing the firm's
      risk models out of whack.
    

[http://finance.fortune.cnn.com/2013/01/18/jpmorgan-london-
wh...](http://finance.fortune.cnn.com/2013/01/18/jpmorgan-london-whale-
investigation/)

Spreadsheet errors are reaching epidemic proportions globally. What we need is
a transnational organization to contain the threat with research, best
practices and conferences!

[http://www.eusprig.org/](http://www.eusprig.org/)

Love their compendium of horror stories. Did you know the US Securities and
Exchange Commission has weak accounting because they rely on spreadsheets?

[http://www.eusprig.org/horror-stories.htm](http://www.eusprig.org/horror-
stories.htm)

    
    
      As we have reported in prior audits, SEC's general ledger system
      and certain software applications and configurations are not
      designed to provide the accurate, complete, and timely
      transaction-level data needed to accumulate and readily report
      reliable financial information...
    
      Many of the agency’s financial reporting processes are still
      manual in nature and reliant on spreadsheets and databases to both
      initiate transactions and perform key control functions.

~~~
twistedpair
I used to work for Deutsche Bank. All financially modeling ran through Excel,
though admittedly a massive C++ library that is a big Excel macro plugin.
These sheets get massive and there are a few clusters of blades that do
nothing but run Excel macros day and night.

But why is this madness? It _works for them_. The iBanking world is the
madness. You can put in place effective change controls and auditing with any
system. However, if people just _want it done_ and don't care how, regardless
of whether these calculations were in Python, Mathematica or Fortran, garbage
in will still be garbage out. So, let's not wantonly scapegoat Excel for a
process problem here.

~~~
dworin
There is actually a business reason why this is the case. The general
assumption in most investment banks is that either the market or the
regulations will change so quickly that by the time you've written software
the "right" way, either the market or the government has already moved on.
Excel lets you move fast, and make changes quickly, even if it results in a
bloated mess later on. Like you say, Excel isn't the issue, it's the nature of
the business.

~~~
orbifold
I would still expect that with their profit margins they would all have come
up with custom solutions by now.

~~~
jmeister
They all do in varying degrees - e.g. Goldman's Slang/SecDB -and are investing
in well-engineered valuation/risk systems that work across various trading
units due to regulatory pressures as well as to reduce manpower/maintenance
costs.

------
jasode
(Since the blog's website is not responding, I had to read the article from
google's cache[1])

The author has well-intentioned advice about avoiding MS Excel but it's
misguided. The criticism fails to accommodate _the reason why_ MS Excel was
used. MS Excel is the lingua franca of _non-programmers_. Thomas Piketty is a
trained economist, not a programmer. It's not realistic to expect Mr. Piketty
to set aside months (years) of his time to master C++/Python/Scala/Fortran/etc
to avoid using Excel. It's more realistic for an economist to use MS Excel to
back his thesis than for a Python programmer to write a bestselling book about
economics.

If we then tweak the advice to be, " _if Piketty is not a programmer, he
should have hired a compsci graduate student as a programmer_ ", well...
you've only shifted the (human source of) errors somewhere else. Plenty of
examples where software _not written in Excel_ had fatal errors: Therac-25[2],
Mars Climate Orbiter[3]

Lastly, some of Piketty's errors were _transcription errors_. In other words,
GIGO (Garbage In Garbage Out). Therefore, using Python while there were GIGO
issues isn't going to solve the data problem.

[1][http://webcache.googleusercontent.com/search?q=cache:1r99Ioj...](http://webcache.googleusercontent.com/search?q=cache:1r99Iojv98gJ:lemire.me/blog/archives/2014/05/23/you-
shouldnt-use-a-spreadsheet-for-important-work-i-mean-
it/+&cd=2&hl=en&ct=clnk&gl=us)

[2][http://en.wikipedia.org/wiki/Therac-25](http://en.wikipedia.org/wiki/Therac-25)

[3][http://en.wikipedia.org/wiki/Mars_Climate_Orbiter#Cause_of_f...](http://en.wikipedia.org/wiki/Mars_Climate_Orbiter#Cause_of_failure)

~~~
adrusi
Economists are expected to know how to program, and they do. They just have a
tendency to reserve writing programs for complex models and prefer
spreadsheets for what they consider "simple" data analysis.

On the point about transcription errors: Using python _could_ have avoided
some transcription errors. If the error was in copying a single constant from
a source into the program, then in python it might be buried as an unlabeled
magic number deep in the source tree, but it also might be labeled as a top
level constant identifier. In the former case it would have the same problem
as in a spreadsheet, but in the latter, a typo would likely be caught by
someone reading over the code at some point. Or if some of the data came in a
file format that excel couldn't automatically import, maybe he could have used
a python library instead of manually transcribing 50 data points. It's hard to
think of a situation where using python would have lead to _more_
transcription errors.

~~~
jasode
I don't know where this sentiment that economists (who are not trained as
programmers) are expected to know computer programming languages comes from.

Both Harvard and Stanford economics degrees _do not_ list Computer Science 101
as a core requirement to graduate.

Compare Stanford's requirements for Economics[1] with Electrical
Engineering[2]. The engineering major has core classes including computing and
programming. The economics major does not. (Whether or not economics studies
_should_ include it (and therefore Stanford is "wrong" in leaving it out) is a
separate conversation.)

Since Thomas Piketty is from France, is there a tradition of European schools
requiring computer programming language courses in their training?

As for the Python vs Excel comparisons, I say people are _overestimating_
Python's syntax to avoid errors and _underestimating_ Excel's visual spatial
grid of cells & near-universal collaboration to also avoid certain classes of
errors. For non-programmers, the rows-columns grid displayed _visually at all
times_ is almost a perfect 1-to-1 correspondence to their mental model. For
Python loops, it takes programmer training to mentally "unroll" loops and see
how data is munged. This visual "blindness" in programming languages ends up
creating its own class of errors.

I've done programming in MS Excel + Excel macros/VBA, MS Access, Oracle
PL/SQL, MS Transact-SQL, C++ Qt, C# ADO.NET/LINQ, Python,etc and I'm not
convinced the syntax of programming languages leads to reduced errors. It's
just a different class of errors.

[1][http://economics.stanford.edu/undergraduate/economics-
major-...](http://economics.stanford.edu/undergraduate/economics-major-
requirements)

[2][http://exploredegrees.stanford.edu/schoolofengineering/elect...](http://exploredegrees.stanford.edu/schoolofengineering/electricalengineering/#bachelorstext)

------
thatthatis
Spreadsheets ARE software, anyone arguing differently has insufficient
experience in at least one of the fields to have an informed opinion. I've
worked professionally in both, and the argument in this article is bunk.

Spreadsheets are a different kind of software with advantages and
disadvantages compared to compiled/scripted languages.

How do you treat a spreadsheet as real software:

* One operation per cell

* Inputs, processing, and display separated into distinct sheets

* Every formula checked twice (or three times)

* Write row and column check sums

* Treat row and column locking ($F4 or F$4) as the footguns they are.

* Name constants

Testing spreadsheets requires a different set of tools and behaviors, but to
say that "quick and dirty" spreadsheets are less durable than "well engineered
software" is really to say that "quick and dirty analysis" is less durable
than "well engineered analysis."

Spreadsheets are remarkably easy to learn to use and to use correctly compared
to scripted/compiled software. Not the least benefit being that they force
abstraction in a way that is extremely intuitive to most people -- spacial
context with the labels and state simultaneously displayed leads to much much
lower context loading cost (for the human brain) than keeping all the abstract
variables in your head when writing code.

Mr. Lemire seems to treat the tool as a toy then claim it is only good as a
toy. Flub and not Flub languages alike can be run quite easily without unit
tests, in which case they are no more durable to errors than excel (and
arguably less as they don't display intermediate state).

~~~
dragonwriter
> Spreadsheets are remarkably easy to learn to use and to use correctly
> compared to scripted/compiled software.

I don't think this is true. IME, people who aren't primarily
programmers/developers and have been using Excel for years tend to have worse
practices developing Excel tools than the same people have with much less time
learning, e.g., Python. Excel is much easier to learn to use to get some kind
of result, perhaps, but not easier to learn to use _correctly_. (Part of it, I
think, is because the cost of incorrect use is somewhat more deferred, which
is also part of what makes it more attractive to use initially in many cases.)

~~~
thatthatis
I largely agree with you.

Perhaps it is a difference in semantics.

It is very easy to learn to use correctly, but if you aren't taught by someone
who uses it correctly (either self taught or otherwise) you're not likely to
discover the best practices.

Easy to learn correctly != obvious way is correct != commonly learned
correctly.

------
barrkel
Billions are made in the finance industry from Excel spreadsheets. Whole funds
have been run from a single spreadsheet making buy / sell calculations, with a
whole IT infrastructure designed to automate the execution of that single
spreadsheet file.

Sure, it may be stupid, but it's democratized programming. That's why
spreadsheets work.

~~~
jimmcslim
I remember working on a system at an investment bank in London before the GFC
that valued exotic derivative portfolios on a grid of machines all running an
instance of Excel; each trade was represented as a separate spreadsheet with
well known locations for market data to be inserted.... madness!

~~~
angersock
Wow. That's...I'm not even mad. Excel as an actor framework?

------
jimmcslim
There have been a number of efforts to build a better spreadsheet. One that I
was quite familiar with (I bought a license!) was Resolver One [1] that was
trying to build a spreadsheet with support for Python at the cell-level.
Unfortunately it didn't take off, and the team moved onto Python Anywhere [2].
Excel clearly has strong network effects!

I loved this article [3] from 2008, that argues that Excel and VBA were
responsible for the credit crunch, that later possibly lead to
Reinhart/Rogoff's flawed research using Excel... its spreadsheets all the way
down!

[1] [http://www.resolversystems.com/products/resolver-
one/](http://www.resolversystems.com/products/resolver-one/) (but link appears
to be dead [2]
[https://www.pythonanywhere.com](https://www.pythonanywhere.com) [3]
[http://www.theregister.co.uk/2008/01/21/vba_office_victory/](http://www.theregister.co.uk/2008/01/21/vba_office_victory/)

~~~
maxerickson
_I loved this article [3] from 2008, that argues that Excel and VBA were
responsible for the credit crunch_

That's a rather aggressive phrasing of what the article says, that many of the
calculations involved in the credit crunch were done in Excel/VBA.

~~~
jimmcslim
Yes, I agree its a bit aggressive. Ultimately it was human nature...
spreadsheets don't cause apocalyptic financial crises, overzealous capitalists
with spreadsheets cause apocalyptic financial crises!

The author was (is?) a recruiter for quants and financial developers so he
certainly had a unique perspective.

~~~
collyw
Is it just me, or does the financial system seem to be designed in precisely
the opposite way of how we try to design good software? One huge intertwined
web, where one small thing indirectly affects thousands of other things. Its
seems to be the equivalent of a "big ball of mud" type piece of software.

~~~
seabee
You are making the mistake of thinking the financial system was designed in
any meaningful way!

------
filearts
The beauty of Excel is its ability to do many things 'well enough' in a way
that is accessible to a lot of people.

My dad put together his architectural plans for a new house in Excel because
it is a tool with which he felt comfortable. Some people do calendars in Excel
while others try and recreate a full General Ledger system. Because it works
well enough, people don't see the need to invest the time in learning a new
application and instead invest their time in pushing the limits of the tool.

As someone who builds financial models and who audits those built by others
(and is a competent programmer by night), I think that the key source of risk
in Excel models is that the tool has no knowledge of intentions. What I mean
by this is that if I'm building a cash flow model or if I'm doing a pixel
drawing, Excel doesn't care; as users, we are forced to create our own
structure and build in our own checks and balances. If I make a balance sheet
in Excel that doesn't balance, Clippy won't show up and let me know that
things are broken.

I've often thought that it would be really amazing if a semantic layer could
be built that uses Excel as the calculation backend. This sort of tool could
understand the sorts of concepts of financial statements, projections, time-
series and other concepts that often show up in financial models. It would
have a built-in understanding of the domain-specific models that would let it
leverage that understanding to reduce risk in the building of financial
models. If I told it that I wanted to add a revenue stream that is tied to the
output of production, the tool would connect the dots between the production
schedule, any inflation and/or foreign exchange assumptions and would feed
changes in working capital according to the associated collection terms,
etc...

Before I get too carried away, the point is that this type of semantic layer
would be much better at preventing and detecting anomalies and potential
errors in the development of a high-risk financial model. Does anyone have
experience with any such tools?

~~~
taylodl
That's why I always thought it'd be a good idea for Microsoft to expose such a
semantic layer via a .NET API. When you can program an Excel sheet using the
.NET languages and have access to the broader .NET libraries then all kinds of
interesting possibilities start to open up. The fact is business users _love_
Excel and a great number of CRUD applications could be easily built on top of
Excel if only we had a reasonable programming environment (keep the VBA, you
could create a VBA implementation on top of .NET). That would be win/win for
everyone!

~~~
daigoba66
I always got the feeling the Office team and .NET team never got along well
considering the completely lack of cohesion between _any_ of their respective
products.

------
opium_tea
In my experience most civil engineering organisations use spreadsheets for the
majority of design calculations. Some large organisations even still insist on
engineers writing out calcs by hand. For example, a calculation that
determines whether or not a retaining wall is of sufficient size more often
than not will be completed in excel. Ditto with the calculation that checks
the weight bearing capability of a column or beam in a large building.

Bespoke tools will be bought for tasks too complex or important for excel -
finite element analysis, problems involving non-linear springs etc. Some time-
consuming, repetitive tasks may also be deemed worthy of more automated tools,
but on the whole the engineering industry is very much in the dark ages when
it comes to modern software approaches. I often wonder what you'd end up with
if you introduced a team of computer scientists into a civil/structural
engineering company and told them to assist with analysis. I imagine you'd get
some pretty innovative approaches to concept screening/design/cost-optimising
etc.

~~~
sanarothe
This is what I'm hoping to do with a software background going to more
traditional engineering.

It's just a little offputting when people talk about the tools they (have) to
use -- When I first started, I would never have expected Excel to be so
prominent, and Matlab so absent.

~~~
NamTaf
Because not every engineer is a programmer. Especially not every non-CS/EE
engineer. _Especially_ not every non-CS/EE engineer over ~30.

Yes, they've been exposed to programming at some level, but they are not
_programmers_. Even as a late-20s mech engineer, I am far above most of my
peers when it comes to slinging code and I'm a rank amateur. Many wouldn't
even want to touch it.

You're mistakingly assuming that the path of least resistance is "teach non-
CS/EE engineer to program because they are technically minded and can easily
pick it up" rather than "adapt mathematical process to suit excel".

------
notacoward
Also, you shouldn't use an unstable-at-load web server for important writing.

~~~
nkozyra
Ironically, it's Spreadsheet Server 2014.

------
thomasbachem
Site seems to be offline:
[http://webcache.googleusercontent.com/search?q=cache:http://...](http://webcache.googleusercontent.com/search?q=cache:http://lemire.me/blog/archives/2014/05/23/you-
shouldnt-use-a-spreadsheet-for-important-work-i-mean-it/&es_sm=91&strip=1)

------
eggoa
_Yet spreadsheets do not allow testing._

This is an absurd claim. Any good spreadsheet will contain multiple self-
tests, either in live formulas or via macros.

~~~
Noughmad
"good spreadsheet"

I think the biggest problem is the small crossection of these two terms.

~~~
walshemj
there is a cross section who knew :-)

In the past a company y that I worked for went down and one of the factors was
some wacky spreadsheet in our finance workflow :-(

------
Pxtl
Honestly, I don't have a problem with the fact that Excel is used in so many
cases - programming for non-programmers is important.

My problem is that Excel is _terrible_. Its formula system is painful and the
formulas are invisible.

It's not that a light user DB/spreadsheet program is a bad idea, it's that
Excel made a lot of terrible decisions 20 years ago and now they're married to
them.

~~~
mistermann
> Excel made a lot of terrible decisions 20 years ago and now they're married
> to them

I can't really think of any off the top of my head, would be interested if you
could share one or two that they're permanently married to.

However, I could make a very long list of easy to implement fixes & features
they could add that would offer different ways of doing things than the
braindead way you have to do things now, while still maintaining backwards
compatibility. My conspiratorial theory is that MS is very well aware of all
these things, and they will be released only if a legitimate competitor
appeared on the scene, or slowly over time to encourage upgrades.

~~~
Pxtl
Little things like how terrible and confusing their formula language is.
Anything beyond simple arithmetic requires two Googles per cell.

Now, replacing it with Python isn't better - any language that uses double-
equals-signs is not fit for a layman-programming-platform.

~~~
JasonFruit
Your confidence in the layman is inspiring. Why is == more difficult to
understand than parentheses for function calls, for example?

~~~
Pxtl
Because highschool math acquainted them with the notion that a function call
looks like

    
    
      afunctioncall(argument)
    

thanks to sin(x) and cos(x) and log(x). Some people will even have vague
notions of functions with multiple variable inputs like f(x,y).

Basically, highschool math introduced a lot of syntax that users will be
familiar with - we're universally okay with the typical +-/* operators, for
example, and ^ isn't a stretch, neither is f(). AND/OR/NOT are introductory
Boolean algebra and while they're not universally taught in high-school, I
find many scientists are acquainted with them (although languages using || &&
! syntax are a good way to lose them). Excel's lack of boolean operators
(booleans operators are functions in excel) is disappointing.

I've always found that SQL gives a good minimal arithmetic/boolean-logic
toolkit for laymen. It's a good model to follow.

~~~
thaumasiotes
If you think it's normal to write sin(x), cos(x), or log(x), you've been away
from math for too long. Those would nearly always be written much more
conveniently as sin x, cos x, and log x. You see parentheses when you have
complicated arguments, but using parens in sin(3x+4) doesn't really differ
conceptually from using them in 5 · (3x+4), and nobody thinks you need parens
for multiplication -- compare 5 · 3x or just a simple statement like 15 · 4 =
60.

f(x) is different; you're right about that one.

------
wikwocket
Let me get this straight. It is hard to review, test and maintain code in
Excel workbooks, so we should instead _write custom software_ for these
tasks?!

Custom software is, to a non-hacker, the _definition_ of "difficult to review,
test, and maintain." You may as well suggest they input their data into a
mystery box with some pixies inside who will crunch the numbers for them. :)

Remember, Excel (and spreadsheets in general) ARE custom software, written for
this exact need, and with decades of polish and bug-fixing so that even non-
wizards can use them.

I agree that we should not fly the space shuttle from an Excel macro. But I
feel that suggesting economists should write software instead of use
spreadsheets is not only throwing out the baby with the bathwater, but also
throwing out the bathtub and the whole bathroom too. Specialty tools exist for
this very use case. Let's discuss how to improve them instead of indulging our
hacker instinct and reinventing a hammer every time we need to drive a nail.

~~~
wtbob
> It is hard to review, test and maintain code in Excel workbooks, so we
> should instead write custom software for these tasks?!

Well, maybe before we attempt to send men with guns to further deprive people
of their property and liberty (which is, of course, the end goal of Piketty's
work and advocacy of a global wealth task) we should determine whether or not
his results are, y'know, _true_?

I know, I know, truth is such an old-fashioned concept.

~~~
was_hellbanned
What a stunningly off-topic, irrelevant comment.

~~~
wtbob
Ummm, my comment was on-topic, since if Piketty's arguments for wealth-
confiscation depend upon his research; if his research is wrong, then his
conclusions are likewise unsupported.

And yes, when writing software we should use the appropriate tools for the
task. If we're about to advocate the confiscation under force of a significant
proportion of the world's wealth, then maybe--just maybe--we should use more
complex, more expensive but more complex software development methods than
'bing into Excel and fiddle with the formulæ until they feel right.'

------
laurencei
I'm pretty sure patio11 says on one of his blogs that anywhere anyone is using
an excel file for important work is a SaaS waiting to be born.

edit: found it:

"My favorite symptom of an unmet need for software is any Excel spreadsheet
which is ever updated by one employee, sent to a second employee, updated, and
then sent back. Every time that happens a SaaS angel gets its wings."

[https://training.kalzumeus.com/newsletters/archive/validatin...](https://training.kalzumeus.com/newsletters/archive/validating_product_ideas)

------
danielweber
It makes programmers' heads spin, but the business world is totally used to
doing all its stuff in Excel.

If Excel code is hard to audit, that's means someone could write a tool to
show all the calculations being performed to get to a result. I'm giving this
idea away for free because I'm pretty sure someone has already done it.

Writing a new program from scratch has issues, too.

~~~
talkingquickly
I used to work in business modelling and a part of that was creating and
auditing spreadsheets so complex they fell into the "this should never be a
spreadsheet" category. There are several proprietary excel addons which are
designed to do exactly this. Unhelpfully I can't remember the name of the ones
I used but if you Google "Excel Auditing Tools" you get quite a few.

~~~
filearts
A popular addon is the ACE toolkit. I'm sure that there are many more that may
simply not be available for public consumption.

I do financial modelling and our firm as developed its own formula auditing
(and Excel Swiss Army knife) tool. For example, we can analyze a block of
cells and have a visual overlay of which cells contain the same formulas,
which cells contain formulas of different types (external file, external
sheet, embedded constant, constant, etc..).

This lets us really optimize our time while giving is more confidence that we
have properly reviewed an entire spreadsheet.

------
mgkimsal
Biggest issue I've had with excel and spreadsheets in general has been "put
this on the web". People spend hours/days/weeks building a spreadsheet tool
that does all their calculations, then say "put this on the web" to me, and I
can't do it. Not in their budget. To them, it should just be some sort of
magic, but... I don't know of tools to just 'put' something like that on the
web. They want people to be able to interact with it, but 'not download it'.
They won't use Google spreadsheets or any office365 online stuff.

Maybe there are some tools that can take an xlsx file and make it secure,
interactive and loggable and are also free (or nearly so?) Am I missing
something?

------
kubiiii
Maybe the problem with Excel is that errors are more likely to stay under the
radar. You can refer to the wrong cell, extend formulas with relative
reference where it should be absolute (or the opposite) whithout noticing.
When programming logical errors will more often lead to fatal errors upon
running the program.

If anything, it's more a matter of using Excel correctly. Tools like slate for
excel make it easier to audit spreadsheet especially other people's.

------
ghshephard
I'll buy that - the fact that the pretty essential vlookup function _defaults_
to guessing when it can't find a value, rather than defaulting to "Couldn't
find the value you were looking to" \- has been the source of endless errors.

Explicit really reduces the number of errors you run into - there's a lot of
implicit going on in a spreadsheet.

~~~
filearts
I think you'll find that VLOOKUP is not used that much by people deeply
invested in the modelling field. It is dangerous in the same way as referring
to cell addresses in macros. (This is where named ranges are particularly
useful)

As soon as the structure of your lookup table changes, your VLOOKUP formulas
risk being invalidated. After being burned a bunch of times, I've switched to
INDEX or a INDEX/MATCH to accomplish the same sort of thing. With INDEX/MATCH,
you are required to be explicit in selecting both the range of the data and of
the key.

------
jackbravo
The errors are not produced by using excel, according at lest to the financial
times. But to manipulation of data.

[http://www.ft.com/cms/s/2/e1f343ca-e281-11e3-89fd-00144feabd...](http://www.ft.com/cms/s/2/e1f343ca-e281-11e3-89fd-00144feabdc0.html)

Other people have praised the statistical work done on the book:

\- [http://www.nytimes.com/2014/03/24/opinion/krugman-wealth-
ove...](http://www.nytimes.com/2014/03/24/opinion/krugman-wealth-over-
work.html?_r=1) \-
[http://www.telegraph.co.uk/culture/books/bookreviews/1081616...](http://www.telegraph.co.uk/culture/books/bookreviews/10816161/Capital-
in-the-Twenty-First-Century-by-Thomas-Piketty-review.html)

So given the controversial nature of the subject, I wonder if we could point
to an objective analysis of the work.

------
chrisBob
Error prone? I haven't found an IDE yet with syntax highlighting as nice as
Excel's when I am punching in a formula that references a few cells.

------
DanBC
People here don't seem to inderstand wuite why people use spreadsheets.

I knew someone who ised the spreadsheet supplied with MS Works (a truly
_hateful_ piece of software) to create fax cover sheets and notice signs and
printed letters.

The spreadsheet was also used to creat Bills of material for electronic
subcontracting. These would be printed out and then only the paper copy was
relevant. (This worked better than you might think - that paperwork system had
been refined over years and all the kinks ironed out. Everyone knew if a part
had been ordered, arrived, booked in, kitted, issued to the shop floor, paid
for by us and paid for as a final product by the customer. Moving from that
paperwork system to sage line 100 was painful and produced a lot of confusion
and kludges.

So, people like spreadsheets because they can get stuff done. To you it might
seem like they're hammering in nails with the handle of a screwdriver, but the
answer is to make better documentation and better software.

(Also I see people using screwdrivers and other handtools and I wonder why
noone ever told them how to do it properly.)

------
sitkack
The problem with spreadsheets to counter was Chris Granger is saying, is that
they are NOT observable from a software maintenance perspective, the code is
invisible but the entire heap and the intermediate values are visible.

I cannot easily swap out one set of data for another, making it hard to
confirm my models are running correctly.

How do I diff a spreadsheet?

How do I refactor a spreadsheet?

------
joe_the_user
Just a default note that the whole "he got stuff wrong" claim itself might not
stand up to scrutiny.

[http://krugman.blogs.nytimes.com/2014/05/24/is-piketty-
all-w...](http://krugman.blogs.nytimes.com/2014/05/24/is-piketty-all-
wrong/?_php=true&_type=blogs&_r=0)

Edit: Also regarding the Rogoff data. Sure, those findings really were crap
but surprise, surprise, it was crap that policy makers were oh so eager to
hear. One might blame the falsity on the dumb use of spreadsheet but it seems
more likely it was a case of "let's fudge data in a crap fashion till we come
up with what people want to hear". Then when someone points out how it's crap,
we always blame the spreadsheets.

~~~
briandh
> it seems more likely it was a case of "let's fudge data in a crap fashion
> till we come up with what people want to hear"

That is a pretty bold claim.

~~~
joe_the_user
Rogoff has a pretty long history of publishing what well-heeled people
apparently want to hear, it's amazing that he was able to hide behind the
"this was just a spreadsheet error" point.

The problem is I've mostly observed his chameleon act in radio interviews pre
and post crisis. The pre-crisis interview, he claimed their was no problem and
the post-crisis interview, he talked about how angry he was against the folks
who claimed there was no problem, so unfortunately I can't provide a link (the
files probably wouldn't be online anymore even if I took the trouble to find
the link - smart shills do things by voice, it seems).

------
neotrinity
having worked for Investment Banking clients I can see what the author
intended to address.

I have seen a lot of abuse/ over use of excel/VBA

I once had to write CRUD front-end with Excel/VBA activeX components
retrieving data from Sybase / SQL Server databases.

Which could have been easily engineered as a simple web-app. "But NO .. IT HAD
TO BE A FUCKING SPREADSHEET"

~~~
collyw
Welcome to my world. We are using excel for everything here, because basically
it means the user doesn't have to think. That means there are countless
errors, and I get to be a data entry clerk rather than a software developer.

------
felienne
I wrote an answer to this on my blog
[http://www.felienne.com/archives/3355](http://www.felienne.com/archives/3355)

and @seansickle Thanks for linking to my stuff!

------
pasbesoin
I've observed disasters created in spreadsheets. I've also observed -- and
occasionally created -- robust models and calculations.

One flipside is that, if you know what you're doing, it's very easy to created
orthogonal calculations and comparisons in order to check your work. And, if
something doesn't compare or look right, to track back step by step and
through all the precedents to find the fault(s).

Databases can also create and hide problems, particularly if they are not
understood and/or designed and set up properly. And some problems thusly
created can be rather opaque, particularly to those without a good
understanding of or access to the underlying design and the principles upon
which it rests.

As with many things, ultimately I found that the problems lay not with
spreadsheets, per se, but with the people using them and with the
organizations that tasked those people without consideration for their
capabilities (and limits thereof) and without adequate resources to do the job
right.

Personally, there has been a time or three where -- in significant measure due
to such resource constraints -- I would have been sunk without the flexibility
that a spreadsheet -- often in combination with some programming and database
work -- provided me. For example, I could quickly and programmatically deal
with the 95% that was clean enough, and then manually go through and figure
out and adjust the crap.

Ideal? Perhaps not. But then, "business" seldom is. (Nor "life", for that
matter.)

------
ianamartin
Meh, use the right tool for the right job. Sometimes that tool is a
spreadsheet. My biggest complaint about Excel is that it fails gracefully when
I would prefer that it pitch a fit.

------
endlessvoid94
This is all fine, but he doesn't suggest an alternative. Which, of course,
there aren't any. Which is the reason it's a problem.

~~~
kitsune_
Don't social scientists and economists spend a huge amount of time learning
statistical software such as SPSS or Stata or Gauss? Why bother with Excel?

~~~
dagw
Not in my experience (economics). An econometric class used Stata and I know a
couple of statistics classes used SPSS, but beyond that it was use whatever
you want, just get the job done. And for most people "whatever you want" was
Excel, with a few people using Matlab. I never saw anyone using Stata when
they didn't have to.

Admittedly one reason for this was that all student machines throughout the
university had Excel installed and all the students had Excel on their
laptops, while Stata was only available in a couple computer labs. And it's
pretty much the same once you get out to the working world. No matter where go
in the world or what office you walk into, if you sit down in front on windows
machine, it will have Excel. Convincing your boss to buy you Stata or SPSS
just so you can do a couple of quick statistical tests that you are perfectly
capable of doing in Excel is just too much of a hassle most of the time.

------
bjoerns
"The fact that they use spreadsheets suggests that they spend very little time
worrying about accuracy." I don't get it. Why does the fact that a person uses
a particular piece of software suggest that they don't care about accuracy?
Does this mean that just because I use R or Python I worry more about
accuracy? Really? I think it has more to do with loss of control - _real_
developers hate Excel because it gives _business_ people something they can
work with straight away. Every time there's a bug which could remotely be
related to spreadsheets, people start another wave of Excel-hate. As if this
was entirely impossible in a _real_ programming language. Don't get me wrong,
spreadsheets are not perfect by any means. But neither is anything else in
this world. It purely comes down to choosing the most suited tool for a given
problem. And in Piketty's case there's nothing wrong with choosing Excel.

~~~
aidenn0
Did you read the article? The author spends most of it attempting to justify
that exact point. You may or may not agree with their arguments, but saying "I
don't get it" without addressing any of those arguments isn't a useful
response.

------
catwork
The problem with spreadsheets is not a simple issue of good vs. bad.
Spreadsheets have an extremely flexible and well accepted if not intuitive
interface. The problem is that they do not provide visibility into what
processing is occurring, require a lot of error-prone manual manipulation, and
are difficult to audit. As such, they are not really designed for testing and
disciplined business processes that ensure accuracy and data integrity. You
can't have complete flexibility as well as rigid controls.

One solution is to recognize when a given spreadsheets usage has increased to
a point where it - or some portion of its functionality - would be better
embedded in an application. Another is to have some users adopt a
environment/language like R which addresses the short-comings listed above at
the expense of being more complicated and less user-friendly. But there is no
simple solution that is going to result in spreadsheets disappearing from use.

------
nilsimsa
We use a spreadsheet to do gantt charts at work. Although Microsoft project is
available, not everyone has installed it or knows how to use it. Everyone
knows how to do basic Microsoft Excel. These schedules are only to a week
granularity so we use one column for each week and each row becomes a task.

------
polskibus
"… but I will not use Microsoft Excel to run a bank "

obviously you never worked in a big bank. Most traders use Excel a lot, if not
in every moment of their work. It depends on the bank of course but in some
banks even the trades are made from Excel spreadsheets.

------
davidgerard
Pretty much nobody will be getting _right onto_ fixing this one, you can rest
assured.

Those millions and billions of pounds and dollars and euros swishing back and
forth in the financial centres? Yeah, Excel macros. BE AFRAID.

------
rurban
Same problem for any graphical programming environment which doesn't provide
proper software tools, like diff's for updates, tests, coverage, debugging.
But nevertheless they are used and recommended very successfully.

Matlab Simulink is a very similar environment used professionally to a much
higher degree. There are extensions in some toolbox for graphical diffs, but
who uses it? Who can review the diffs in a version control system?

And those systems are pretty hard to debug also. In a 10Khz loop you cannot
step through or set breakpoints and hardly do printf debugging.

------
dccoolgai
Data in arrays, exports to CSV, facile API... what's not to love?

~~~
protomyth
I remember on job where I wrote a lot of tools to load a database by reading
an excel spreadsheet from perl[1]. It was amazingly fun.

1) perl does have some good libraries to read Excel files without exporting

~~~
macintux
Perl also has nice libraries for creating Excel files. Used that at my last
job to generate reports from MySQL.

~~~
protomyth
Yep, it is amazing how much b_tching about crappy reports ends when you just
give the raw data to a customer and let them pivot table to their heart's
desire.

------
mathattack
What's scary here is how much of Finance is based on Excel. Tons and tons of
spreadsheets. Pretty much every bank's risk measurements go through a
spreadsheet at some point. So do most trades. High frequency may get executed
in C++, but somewhere along the way it gets aggregated in Excel.

The only thing worse than being able to see all these convoluted spreadsheets,
is not being able to see the code behind non-spreadsheet systems.

------
habosa
I think tools like MATLAB could bridge the gap between spreadsheets and
general-purpose programming/data analysis. It has the huge standard library
Excel users want, a REPL for prototyping, visual data inspection, and the
ability to very quickly create charts and figures. The scripts are also easy
to share and it doesn't involve use of the command line unless you want it to.

Anyone want to teach "MATLAB for Bankers"?

------
cratermoon
Bahaha. I know of multi-million complex international 3-party deals in the oil
and gas business that were worked out on a spreadsheet. Insurance actuaries I
know of refuse to move away from their spreadsheets that do business-critical
risk table calculations.

It's idiocy, but I've seen the poor spreadsheet program warped into even worse
uses, but not for business-critical stuff.

------
rlvesco7
While spreadsheets are sub-optimal, the work was OPEN so people could find
those bugs. We need to stop beating scientists who share their data because
currently there is little incentive for them to do so.

Most scientists do not make their code and resources easily available. I'd
prefer an open excel sheet to a closed python analysis which is only slightly
less likely to be wrong.

~~~
welterde
Some groups however do share their code [1] (I am not really aware of similar
sites for fields other than astrophysics, although there is plenty of code out
there).

[1]
[http://asterisk.apod.com/wp/?page_id=12](http://asterisk.apod.com/wp/?page_id=12)

------
bhartzer
Excel can only handle a certain amount of records. I've personally found it
hard to deal with more than 100,000 rows in Excel.

~~~
morganherlocker
The limit went up considerably with the introduction of 64-bit Excel. Also,
there is a plugin called "Power Query" that allows the actual data to be
stored in MSSQL. I have seen this work on tables with 100s of millions of
records without a hitch.

------
pronoiac
I've done a bit of work building a complex Excel spreadsheet, because that was
the job - and what I wanted afterwards was an automated way to turn it into a
Python script. Collaborating in Excel is tricky - it was a large enough
spreadsheet that 32-bit Excel would sometimes crash, and version tracking was
entirely manual.

------
n00b101
I've worked for many large financial institutions as a quant and am intimately
familiar with use of spreadsheets in critical functions. I believe this is a
multi-billion dollar opportunity. If someone is interested in start-up
ventures in this space, I would be very interested in discussing further.

~~~
Iftheshoefits
How would you overcome the very large inertial barrier that exists in that
world to using something new and untested? The world of finance and investment
banking is a world of networking and extreme resistance to change. There's a
reason Bloomberg's suite of products sits as pretty as it does, and unless
some great upheaval occurs will continue to do so for many years (it's not
because the products are easy to use, developer friendly, or even good tools
necessarily).

Also, you should put some contact info in your profile, or provide it in your
comment, if you want to be contacted for further discussion.

~~~
n00b101
> How would you overcome the very large inertial barrier that exists in that
> world to using something new and untested?

I've already successfully done this. The sales cycle can admittedly be long,
but it's not impossible.

You have to create a product that addresses pain points for business users. It
has to be a compelling product and it has to solve problems for business
decision makers. Initially, you want to sell to the business, not to the IT
department. It turns out that spreadsheets are mainly used by business users
and subject matter experts, instead of IT.

The next issue is price. If you are trying to sell something for (say) $1
million to a big corporation, then it is quite a a long sales cycle (but still
achievable). My personal take on this is you do want to eventually be charging
several million dollars per corporate client, but you also should have a low
cost "entry level" version of the product. The cost has to be low enough that
a business power user could charge it to his Corporate American Express
without being noticed (say $1,000) and download and install your application
(he will need admin rights, which he might in fact have).

> The world of finance and investment banking is a world of networking and
> extreme resistance to change.

Yes there is a lot of networking that goes on. This can really work in your
favor, because there is a sort of herd mentality. Once people start talking to
each other about your product, customers will want to talk to you. If you
close a large deal, news will quickly spread and other large deals can follow.

I would not agree that there is "extreme resistance to change." In my
experience, banks, hedge funds, etc, are often early adopters and one of the
largest users of high technology.

> There's a reason Bloomberg's suite of products sits as pretty as it does

I have Bloomberg in my cross-hairs, next to Excel.

> and unless some great upheaval occurs will continue to do so for many years

You are correct that change often requires a catalyst. But I think that great
upheavals are happening everyday, but they are hidden behind closed doors,
occurring deep inside of companies and departments. In my experience, it is
precisely during those times of upheaval that you have the best chance of
introducing a new, disruptive technology into an organization. There are many
kinds of upheavals that happen. (mergers, divestitures, running into Excel's
maximum row limit, etc) and they can all be catalysts for change.

I've put my email address in profile now, thanks for the tip.

------
eddyparkinson
The process NOT the tool.

Yes spreadsheets have errors, but the only method I know of that is able to
get error rates down to near zero is due process. There are good processes
that reduce error rates to near zero, but there are no tools that reduce error
rates to near zero. The problem is the process, not the tool.

~~~
dragonwriter
> The process NOT the tool.

But the tool constrains the process, which is why you should choose tools by
appropriateness to the process they are going to be used in.

------
ganeumann
Although I agree with the author, using Piketty as the example seems counter
to his argument: the FT found the errors in the spreadsheet in a few weeks.
They were obviously not very difficult to find. Sounds like Piketty's problem
was a lack of peer-review, not the tools he used.

------
kristianp
Interesting that the article on wealth concentration (linked from the
article), doesn't mention Piketty (yet). It needs some work.

[http://en.wikipedia.org/wiki/Wealth_concentration](http://en.wikipedia.org/wiki/Wealth_concentration)

------
_pmf_
A thought experiment: is it advisable to save mission critical, structured
information as a loose collection of schemaless text files?

If it is not advisable, why is it accepted for software engineering
professionals to manage their source code this way?

~~~
octo_t
well a programming language with a specification has a pretty well-defined
schema.

The E?-BNF provides a syntactical schema for the code and the rest of
specification provides the semantic schema.

------
bhewes
I look forward to the day when F# is a first class citizen in Excel. That will
make my life much easier.

------
brianbarker
TLDR: Use the right tool for the right job.

------
pinaceae
Wrong end of the correlation/causation pattern.

Human beings make mistakes.

A lot of humans use Excel.

Hence a lot of errors in Excel.

Would all those humans program their logic in Python - we'd have a lot of
wrong calculations in Python code and an article stating to not use Python.

~~~
saosebastiao
I am human, therefore there is no difference between coding in C and coding in
Rust/Cyclone?

I don't think it works that way. The surface area where you are permitted to
make mistakes is completely different in language X compared to excel. In
Excel, `="a"+4` is a cell error but a valid spreadsheet. In python it is an
invalid program, throwing a runtime error. In haskell, it doesn't even
compile.

~~~
jasode
It's an oversimplification to assume the hypothetical Python implementation
would have less errors. We don't know that. Some type of math errors would
overlap between Excel and Python. Some types of errors would be easier to
stumble into in Python than Excel.

For example, an inexperienced programmer in Python might use floating point
instead of Decimal data type to add currency amounts. His programming loop to
sum the amounts would be _incorrect_. In MS Excel, adding currency in cells
correctly down to the penny is a no brainer. Sure, the Python programmer can
be taught "best practices" to avoid this type of error but the point is that
while Python helps eliminates some errors, it also creates new ones.

Another example is data munging. A programmer might write some Python to slurp
a data file (exported from mainframe or whatever) and do some financial
calculation on it. Load the values into an array some other memory structure.
The problem is that the memory contents are "hidden" from sight unless the
programmer uses a visual debugger or prints out all the values to inspect. The
Python programmer may not notice that some values are misaligned or garbage.
With Excel, the non-programmer imports the data file and he immediately
scrolls through the worksheet as a sanity check. His eyeballs notice that the
source data is dirty. Again, the Python "problem" can be corrected with best
practices but the point still remains: different tools create different
problems.

Lastly, Excel spreadsheets are easily emailed among dozens of people. You
can't send ???.py programs to everyone because you can't expect all the
Windows users to have the Python runtime. Spreadsheets are even sent to
iPhones and tablets and you definitely can't expect easy Python deployment
there. With less coworkers examining the ???.py file, it has potential for
_more errors_ compared to a scenario where everybody can participate in
questioning the spreadsheet's numbers.

Our intuition says that _on balance_ , the hypothetical Python/Haskell
solution should have less errors than MS Excel but I can't confidently say
I've seen any definitive proof of that.

~~~
collyw
>The problem is that the memory contents are "hidden" from sight unless the
programmer uses a visual debugger or prints out all the values to inspect.

I think that's what most people do when they are programming.

------
mantrax5
We've done business with pen and paper for thousands of years.

I think we can handle spreadsheets. While I'm a programmer and often find it
easy to whip up a program to do what many would do in a spreadsheet, I don't
have the arrogance to tell people to not use spreadsheets for serious work.

On the contrary, I encourage it. It's a wonderful tool, and, in the end,
superior to pen and paper.

