
Ask HN: Do you feel Excel is dangerously overused? - Dwolb
I'm an MBA and I took a data analytics course where we used Python and Pandas to manipulate decently-sized datasets. Now I'm taking an optimization course and we use only Excel. However, I'm less confident in my Excel models than I ever was in my pandas-based models.

I feel like we're stretching Excel further than it really should go because it hides a lot of complexity and doesn't allow for easy testing. Couple this with the fact there doesn't appear to be a large open source movement around Excel models, so fewer people can verify their correctness.

Here's a Reddit thread that inspired me to ask this question: http://www.reddit.com/r/finance/comments/35esdl/inside_the_johns_hopkins_finance_class_thats/
======
dzdt
The problem is that the surrounding tooling is underdeveloped. Spreadsheets
give the most widely accessible programming language available today, with a
feature set that makes it an order of magnitude faster for making solutions to
certain problems than any other language. Terrific!

But do you have source control? Diff? Three-way-diff? Unit tests? Code review?
Any process at all?

Any programming language lets you hack up a solution for one-off problems; for
most languages there are tools and an ecosystem to help "production" code meet
higher standards. The problem isn't that Excel is commonly used, it is that
there is no ecosystem to support its "production" use.

~~~
brudgers
In fairness, the ecosystem around Excel is more than 30 years old and the
community's expectations reflect that. There's nothing stopping someone from
writing a test harness for Excel files...it's all automatable. There's nothing
preventing a person from storing Excel files in version control or running a
diff on the underlying XML.

The reason it isn't done is because by the time end users care about those
things enough to get around to implementing them, they're probably ready for a
process that already addresses those problems with baked in
functionality...e.g. an RDBMS and application layer. When there's a business
case for something better than Excel there are products and services and
consultants that serve that business case.
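
The point above about diffing the underlying XML can be sketched roughly as
follows. This is a minimal illustration, not an established tool: it assumes
the worksheet lives at xl/worksheets/sheet1.xml (the usual location in files
written by Excel, but not guaranteed), and the helper names and the fake
workbooks are made up so the example runs without any real .xlsx file.

```python
import difflib
import io
import zipfile

# Assumption: the sheet of interest is stored under this archive path.
SHEET = "xl/worksheets/sheet1.xml"


def sheet_xml_lines(xlsx_bytes, member=SHEET):
    """Return one worksheet's XML from an in-memory .xlsx archive, as lines."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        text = archive.read(member).decode("utf-8")
    # Worksheet XML is often one long line, so split at tag boundaries
    # to get a readable line-based diff.
    return text.replace("><", ">\n<").splitlines()


def diff_sheets(old_bytes, new_bytes, member=SHEET):
    """Unified diff of the same worksheet in two versions of a workbook."""
    return list(difflib.unified_diff(
        sheet_xml_lines(old_bytes, member),
        sheet_xml_lines(new_bytes, member),
        fromfile="old/" + member,
        tofile="new/" + member,
        lineterm="",
    ))


def make_fake_workbook(sheet_xml):
    """Hypothetical stand-in: a zip holding only a worksheet part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(SHEET, sheet_xml)
    return buffer.getvalue()


old = make_fake_workbook(
    '<sheetData><row r="1"><c r="A1"><f>SUM(B1:B9)</f></c></row></sheetData>')
new = make_fake_workbook(
    '<sheetData><row r="1"><c r="A1"><f>SUM(B1:B8)</f></c></row></sheetData>')

changes = diff_sheets(old, new)
print("\n".join(changes))
```

With real files you would read the bytes from disk instead of building fake
archives. The diff then shows a changed formula as an ordinary removed and
added line, which is the kind of review a plain version-control diff of the
binary file cannot give you.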

------
zhte415
Excel is used as a RDBMS. It is used as an analytics system. It is used as a
workflow management tool. It is used as a templating system - I've even seen
it used to print credit card statements with z/OS on the backend (so it is not
like developers or budget are lacking). It is really easy to use, and everyone
uses it. It can easily conform to the Agile method for creating dashboards and
reports.

If you're concerned, perhaps write (or get someone to write) modules for it
that you can import. A lot of proprietary stuff gets done in C++ modules, from
investment to HR. These modules are pretty easy to break down complexity and
testing, and are defacto in my industry (banking).

Excel is, simply, not going away.

------
crdb
Yes. Yes. Yes! Omg. Yes.

The worst I have seen was a team of several hundred people who, on $200 half-
broken laptops, were running - in parallel - 500MB Excel files for hours every
day, which basically did a few linear operations on about 30 variables, before
joining the results in a superfile itself taking up to 20 minutes just to open
on a gaming-specced desktop.

In another case, I broke down an Excel model with over 50 tabs and discovered
that there were only 2 input variables. Unfortunately the CFO of that business
decided to go for the ostrich strategy (after all the monster model impressed
investors).

Every business I have ever worked in had some convoluted, cumbersome, buggy,
opaque, human-dependent Excel processes. My co-founder and I finance our
startup replacing these with tight scripts running on AWS (amongst other
things) - it's low hanging fruit. We approach these things as a black box,
figure out desired behaviour, and rebuild from scratch.

Excel wins all the time because it's a UI that managers are familiar with, and
they are often either the consumers, or want to audit your data processes.
It's got a LOT of tooling that lets non-technical users do things they
shouldn't.

If you thought Python Pandas was an improvement, try learning about the
relational model and how to use a decent relational database (I recommend
PostgreSQL, whose error messages will teach you a few things and which has a
saner type system where you CAN compare two different integer types without
getting NULL). You'd be amazed how far you can go with just SQL.

You should also do so from a UNIX OS, and from the command line (using psql,
editing a file then running \i blah.sql). It will take you a few hours of pain
but will be worth it in added productivity and transparency over any point and
click alternatives.

What you're touching on is the power of abstraction, functional vs imperative
programming, and particularly how proving programs can be enormously
productive. SQL is powerful for this because it is declarative and very close
to mathematics (Codd brought set theory to databases).

The next steps are category and type theory and then the
Haskell/Mercury/Idris/Coq rabbit hole... an unpopular one because, as someone
once told me, "businesses are resilient to application errors"...

~~~
Nicholas_C
>The worst I have seen was a team of several hundred people who, on $200 half-
broken laptops, were running - in parallel - 500MB Excel files for hours every
day, which basically did a few linear operations on about 30 variables, before
joining the results in a superfile itself taking up to 20 minutes just to open
on a gaming-specced desktop.

This is my nightmare. I work on a sharedrive with large Excel files that conk
out Excel. We are working on building a tool to do this work but it is slow
going.

~~~
jfjeschke
What does the tool do exactly?

~~~
Nicholas_C
It's a custom developed planning/forecasting tool. It's a P x Q thing, but the
P and Q are variable as to the type and mix of the Ps and Qs and need to be
combined/sliced into many different ways for analysis/planning/forecasting.
The team actually developing it has been very challenged.
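
As a rough sketch of the kind of P x Q slicing described here, and of the
earlier suggestion to see how far plain SQL goes: the example below assumes P
and Q mean price and quantity (the comment does not say), uses Python's
built-in sqlite3 rather than PostgreSQL so that it is self-contained, and
invents the table, columns and numbers purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (region, product) with a price and a quantity.
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, price REAL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("north", "widget", 2.0, 10),
        ("north", "gadget", 5.0, 4),
        ("south", "widget", 2.0, 7),
        ("south", "gadget", 5.0, 1),
    ],
)


def revenue_by(column):
    """Total price * quantity grouped by one dimension."""
    # The column name is interpolated into the SQL text, so only allow known names.
    if column not in {"region", "product"}:
        raise ValueError("unknown dimension: " + column)
    rows = conn.execute(
        f"SELECT {column}, SUM(price * quantity) FROM sales "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return dict(rows.fetchall())


by_region = revenue_by("region")
by_product = revenue_by("product")
print(by_region)
print(by_product)
```

Each new way of slicing is one more GROUP BY over the same table, rather than
another tab of copied formulas.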

~~~
crdb
You're not _really_ challenged until the growth rate of their Excel monster
outpaces the rate at which you automate the existing stuff... thankfully Excel
has a 1 million row limit, and it always takes them time to figure out how to
shard an Excel database :P

~~~
Nicholas_C
Well in this case it's actually mine and the rest of the finance group's Excel
monsters (plural) that are being automated. The devs are doing yeoman's work.

------
stevep98
A friend of mine told me a story about excel abuse in her company. She's in
HR, and it was annual review time. They were passing around a spreadsheet with
'employee name' and 'raise %'. Each manager was supposed to enter the raise
for their employees.

So, a few days later, the spreadsheet came back, and the HR director was ready
to process all these raises into payroll. My friend took a look at the sheet,
and nothing made sense. Low performers were getting high raises, and vice
versa.

After some investigation it turned out that some of the managers had sorted
the spreadsheet so that they could find their employees better. But someone
had screwed up, and only selected the name column. So, everything got messed
up.

What was amazing was that even after this was pointed out, the HR director was
still pushing for the raises to go through as-is, because it would take too
much time to do everything again.
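
The failure in this story is easy to reproduce outside Excel. A small sketch
in Python, with invented names and percentages: sorting one column on its own
detaches each raise from its employee, while sorting whole rows keeps them
together.

```python
# Made-up data: (employee name, raise %)
rows = [
    ("Dana", 1.0),
    ("Alex", 5.0),
    ("Casey", 3.0),
]

# What apparently happened: only the name column was sorted,
# while the raise column stayed in its original order.
names_only_sorted = sorted(name for name, _ in rows)
raises_untouched = [raise_pct for _, raise_pct in rows]
broken = list(zip(names_only_sorted, raises_untouched))

# Sorting whole records keeps each raise attached to its employee.
intact = sorted(rows, key=lambda row: row[0])

print(broken)
print(intact)
```

Nothing in the broken result looks wrong at a glance, which is why it took
someone who knew the employees to notice.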

------
lvspiff
Many people here have already echoed the overuse of Excel as a RDBMS - I have
noticed this as well. The problem however (at least in my area - education)
has come from the DB admins not wanting anyone to access "their" data due to
security and speed concerns. Their solution - give someone CSV output from the
table rather than allow them read-only access to the data. Problem is you get
CSV files saved to Excel files and you end up with thousands of Excel files
floating around between people all based off the same sets of data - that if
given a better method of direct access could be more accurate and effective.
The lack of a direct connection to the database of record is what I have seen
causes much of the overuse of Excel and if more tools (i.e. Tableau) were
implemented to make those connections I think (again in my area at least) the
use of Excel would dramatically decrease.

------
brudgers
People have "voted with their feet". That is why they use Excel...and before
it Lotus 123...speaking of which my experience of editing letters that the VP
of Engineering wrote in 123 is illustrative of why people use Excel for so
many things: when faced with a problem Excel is cheap and available (and by
cheap I mean the purchase is already a sunk cost). The VP of Engineering
wasn't budgeted for a copy of Word Perfect and had he been he probably wasn't
going to have the time to learn how to use it before the letter needed to be
out.

There is, at least in my mind, perhaps a lesson for a prospective MBA here.
People are not stupid and when everyone is doing something that isn't ideal,
it may be because it is good enough or even the best alternative in practice.
Part of the reason there isn't a large open source community around Excel is
that good professional support is ubiquitous and Excel has commercial
grade documentation in such a diversity of forms that there is something close
to any user's particular needs or expectations.

[http://www.dummies.com/how-to/computers-software/Software-
Fo...](http://www.dummies.com/how-to/computers-software/Software-For-
Seniors/Using-Office-2010/Working-with-Excel.html)

~~~
rskar
Adding to brudgers' comment:

Rather than "Excel dangerously overused", I feel that technology is generally
overused in an undisciplined, unmindful, inefficient way. Excel is often
presented as the exemplar of such overuse.

I think that's because people who are motivated to come up with a solution to
their common or more pressing problems of the day will naturally reach for the
immediately accessible tools, and the more familiar the tool the better. In
the usual corporate environment, Excel is already available for use.

Such people will be subjected to annual reviews that will not explicitly
include any acquired or proven computer savviness. Instead they will be
measured against the expectations of their job roles, which is likely not
"applications developer". If they manage to create custom tools (via Excel or
the many other manifestations of Microsoft Office) that have empirically
demonstrated their worth to somehow become part of the production environment,
then so much the better as far as the department or business is concerned (at
least at first).

These people are not looking to become programmers. And besides, corporate
desktops have gotten tremendously locked down since the late 1990's. It would
take a fair bit of effort to make a business case to allow for some other
computing environment (or tool) to be put on a corporate device. Which means
experimentation would have to be done at home on their own time. Has it been
mentioned that these people are not looking to become programmers?

We should put this sort of situation in perspective, and celebrate that there
were and are folks willing to really learn a tool sufficiently as part of
their problem solving in becoming more effective in their job roles. At some
point of course the home-grown Excel-based solution will hit its limitations,
which is where we as the truly expert computing professional types step in and
get to work. If you really understand Excel and VBA and COM, you can really
tease out quite a bit of the user story and business logic out of the Excel
"prototype" and come up with the more elegant solution.

And we can do this because we should be well versed at using this sort of
technology in a more disciplined mindful efficient way.

~~~
collyw
The problem is that prototypes hardly ever remain prototypes and get hacked
and hacked until they are a true nightmare.

------
kfk
The point is that many BI enterprise solutions are mere data extractors with
very little intelligence built into them. That means that people have to
download big chunks of data and run models in Excel. No, you won't convince
people to move their models away from Excel, but you can/should build an
analytical layer on top of your BI warehouse solution so that people will use
Excel less. Unfortunately, big BI stuff is very inflexible and building good
models (aka: dashboards) can take months or even years. Excel is the de facto
answer to that inflexibility; solve that and you will see far fewer
spreadsheets moving around.

------
JamesBell
I've participated in several assessments of companies and departments. An
inevitable question in the interview process for department heads is: "tell me
about your Excel addiction". This is a sign that their BI tools have failed
them and frequently that they have neglected to invest in reporting. The next
level down is that they are performing operations necessities with Excel. The
last level is that they have no operations software and it's all done in
Excel.

------
panglott
Excel is a great tool. It's great for keeping track of small or ad hoc
collections of data. The problem is that it doesn't "grow" naturally with its
needs.

A small business or project can start with Excel, then migrate to Access, then
migrate to SQL Server for ACID compliance. But each migration step is not
trivial.

I keep thinking that Kexi would be a good way to go rather than Access, but it
is Linux-only.

------
floppydisk
Programming is hard, period. You have to visualize the data in your head,
write code to operate on it, and iteratively keep improving it or tracking
down wonky bugs until you get "the right" answer. It becomes time consuming,
tedious at times, and can lead you down rabbit holes for perpetuity. In fact,
it requires a certain way of thinking that not everyone possesses or has the
desire to develop the skills in.

Excel, on the other hand, lets you bypass a lot of these issues with
programming and get to results faster in some situations. First, it is
completely visual. I don't have to visualize the data, it's right there in
front of me. This reduces the cognitive load on the worker. Second, I can
develop iteratively and see the results immediately. Yes, you can do this in
programming, but any changes require a recompile-->execute-->look at the data
process that Excel includes intuitively. Third, Excel is accessible. People
don't need to memorize syntax or commands, they can simply go to the formula
finder and start typing in what they're looking for and excel will load the
formula for them.

Can you build amazing Excel programs/spreadsheets? Yes. Can you build wonky
spreadsheets that would make Bill Gates cry crocodile tears? Yes. Can you
build amazing programs? Yes. Can you build convoluted programs that would send
the flying spaghetti monster running in terror? Yes. There are two different
problems conflated here. One problem is how do I answer a question. In some
situations, especially small data ones, excel will provide a faster and more
intuitive way of getting to the answer that is easily understood by the widest
audience. The other problem is how do I build something that fulfills the
requirements and isn't a mess. I think a better question might be what are
Excel best practices that keep spreadsheets from becoming a mess.

------
EvanPlaice
Yes, it's a (reasonably) intuitive front-end for a lousy data format.

Xlsx is supposed to be an 'open' format but it's overly complex and a PITA to
parse. Like most things produced by Microsoft, there are a ton of unnecessary
and overly-complex features added on that have little/nothing to do with
data/calculations.

External tooling sucks (because documentation of the format sucks). The
additions that are supposed to support networking suck. The security features
that are supposed to ensure read-only access suck. XLS files are
unnecessarily large and not conducive to transmission over the web. If
there was some way to reasonably decouple the data from all the formatting,
object, validation, security, and scripting overhead it wouldn't be so bad to
work with.

Unfortunately, it's the most complex data structure manipulation format that
'business types' can reasonably understand so it gets used for anything and
everything that deals with data and/or calculation.

------
murbard2
Excel can be a great prototyping tool. I'm a statistician/programmer, and yet,
I sometimes like to explore some ideas quickly in excel. It has a reactivity
that many programming languages lack... it's very nice to change a parameter
and see an entire computational graph update and a chart change in real time.
The optimizer is actually decent and allows you to fit a lot of models. Excel
is actually really cool.

BUT

using excel for ANY business critical purpose is insanity. Some desks in some
banks rely on crappy old buggy spreadsheets to track the p&l of some large
transactions, it's horrible. Academics use excel models and have been bitten
because the models they produce are very hard to audit and to check for bugs.

So by all means, use excel to play around with data if you would like to, and
when you get a feeling of what you want, build a clean, well documented script
with python tied to a database.

------
DanBC
You may be interested in the European Spreadsheet Risks Interest Group which talks a
lot about the problems of error checking and auditing spreadsheets.

[http://www.eusprig.org/](http://www.eusprig.org/)

Ray Panko also has some excellent work on this.

[http://panko.shidler.hawaii.edu/SSR/](http://panko.shidler.hawaii.edu/SSR/)

Personally: yes, spreadsheet use is scary. They're error prone; hard to audit;
overly trusted. One person will create a very complex spreadsheet. If that
person moves on the company is stuck with a mostly opaque blob created by
someone who hasn't had any training in creating readable maintainable code.
There are risks for sensitive data to be inadvertently distributed to the
wrong people.

------
nemexy
The issue with Excel is that it is extremely easy to use and is widely
popular, especially for new companies as it is available to anyone with an
Office plan, which is available on almost any Windows computer with very few
exceptions. So when it comes to that a new business will rarely use something
more complex and more expensive in the short term, they just prefer to use the
Microsoft Office programs.

This is something I have seen a lot, including in the hotel/restaurant niche,
where I work. All of them use it exactly because it is easy and is insanely
powerful even for newbies. On the other hand, at a certain point they start to
learn about its drawbacks, and an application that does properly what they did
in Excel is a very easy sell.

Oh, about that... Pandas is great!!!

------
glennp
Personally, I think Excel models are overly relied upon (which pretty much
amounts to the same thing).

Excel provides a great tool, my issue starts when too many Excel spreadsheets
are being used to provide decision support, with little or no control / audit
of the logic in these sheets.

------
shaftway
I spent 2 years at a major bank converting processes from Excel sheets to a
database with a strong distributed processing platform around it. They were
doing risk analysis of tens of billions of dollars in CDO swaps (hey, it was
2007, these things were like printing your own money). That's where I learned
about how much you can abuse Excel. FTP CSV output to servers? No problem.
Copy data out to custom XML and import it back? Sure. Shell out to command
line quant programs to run simulations? Fuhgeddaboudit.

So much abuse.

------
jfjeschke
I think you're correct that it's overused, but the jump from using a
spreadsheet to programming even the simplest macros is a big one for a large
portion of those who use Excel, much less using an RDBMS or programming in
Python. Even trying the macro recorder makes most users assume 'that's not
meant for me'. If the requirements of the job include building a model that can
be used and manipulated by anyone, then Excel is the only tool for the job;
anything else requires more training.

------
MalcolmDiggs
I think excel has a lot of good features for small samples. But when it comes
to larger datasets, the macro and scripting capabilities (in Visual BASIC) are
just too slow to be practical. So in that way, yes it's overused for large
datasets, there are many many better/faster toolsets out there.

------
cafard
Excel is fine for what it is intended to do. However, too many people use it as a
database.

------
koz1000
A wise man once told me: all project management eventually becomes an Excel
spreadsheet.

------
Simulacra
35,000 line spreadsheet? Gosh, I couldn't imagine how that could be a bad
thing...

