

Make up your own rules of probability - wallflower
http://www.johndcook.com/blog/2009/09/18/make-up-your-own-rules-of-probability/

======
dmlorenzetti
_These [incorrect] rules are not explicitly stated in the methods; we inferred
them... from formulae embedded in Excel files..._

One of the things that most puts me off from using spreadsheets is the
difficulty of validating calculations. The formulae themselves are not
visible, so it takes a lot of mousing around to convince yourself the cells do
what you think they should. For example, I've seen problems where a long
"vector" calculation is correct in the upper reaches, and wrong down below,
due to failure to completely "fill down" the calculation.

I wonder how many of the spreadsheets studied by Baggerly and Coombes
expressed input errors, rather than errors of understanding?

~~~
MichaelSalib
_One of the things that most puts me off from using spreadsheets is the
difficulty of validating calculations._

A very good friend of mine worked for a major aerospace company that used
Excel spreadsheets for almost all engineering. I'd say the biggest problem is
that if you're doing important calculations in Excel, then you're completely
ignorant of the last 40 years of research into defect reduction and detection
in computer science. That ignorance is not completely unintentional either:
there were a lot of people who literally pride themselves on not being like
the dumb computer folks.

Plus, if spreadsheets are your tool, you likely have very bad tools. So when
you make a spreadsheet to run the power budget for satellite A, the obvious
thing to do when satellite B comes up for development is to copy the
spreadsheet containing the analysis from sat A, change the filename and bury
it somewhere on the shared network drive. Then, if you find a defect in the
spreadsheet, you have no way of tracking which copies have that same defect
because they're scattered all over the shared drive.

In any event, my friend found tons of bugs in spreadsheets, many of which were
actually important. Probably the best bug involved a power budget for
satellites that inverted electrical current so instead of comparing current to
a max theoretical current value, it compared 1/current to the max value.

~~~
mturmon
I can verify that this practice is still common. I have been amazed at the
size and complexity of some spreadsheets in aerospace applications.

It's so clearly the wrong thing, and yet the authors of these beasts are
always really pleased with their efforts.

Besides the "old shoe" feeling of Excel, I think there's something about the
automatic updating done by the spreadsheet that users find gratifying. You
alter a cell and the program launches into a bunch of updates automatically.
It's like seeing the array of blinking lights.

------
johndcook
The problems pointed out in this post have turned into a major issue. A
clinical trial was designed to test cancer treatments recommended by the
erroneous analysis. The trial was stopped (and restarted and stopped again)
and now the National Cancer Institute is conducting an investigation.

------
brucebadeau
Part 1: I'm a financial model builder since Excel came out on the MAC in 1987
(I’m now however a PC and 63 years old www.fser.net). Some of the inaccuracies
are caught using accounting articulation points: Assets = Liabilities +
Shareholders' Equity... if out of balance, something is wrong... What is wrong
is a bit tricky and takes thorough understanding of how Income Statements,
Balance Sheets and Cash Flow Statements interact. Unfortunately, articulation
points don't catch algorithmic mistakes. My models have some very complex
algorithms built over the years and the best thing to do is graph each line
“quickly”, looking for breaks or unusual patterns in each line item. My models
are regularly 3,000 to 5,000 lines long - not because I like complexity, but
because I have lots of ways of expressing revenues and expenses. I can model
most business in a few hours. Recently, I built a steady state feasibility
model for growing, harvesting and extracting lipids from algae using
heterotrophic and autotrophic systems. I did make a graphing tool that would
allow me to capture the relationship between any input and output as I
incremented an input 10 times. In most cases the output was basically linear
as the model was basically a linear calculator… I would suggest any builder of
engineering models to implement an incrimination graphical recording system as
an error checker.

------
l3amm
These kinds of things are why I hate papers with undetailed methods sections.
Being too explicit in the methods section is obviously unnecessary (e.g.
restating Bayes' theorem), however your assumptions and probablistic methods
should at least be detailed in supplemental notes.

~~~
krallja
Some of the errors are in applications of methods that I don't feel should be
documented. I would be worried if I saw someone explaining what they did to
find P(AB), even if their method were correct. Of course, I would be extremely
worried if they claimed it was max(P(A), P(B)), as the original article
quotes!

------
brucebadeau
Part 2 To underscore the other knowledgeable experts on this bog, finding the
error before you replicate is very tough, indeed. It takes time, several sets
of devoted eyes to methodically question each algorithm, which unfortunately
are written in cells, which are hard to edit. One thing I learned was to use
spaces in a cells code… Spaces inside a huge concatenated IF Statement (the
type I love to make), have no effect on calculation, but a huge effect on
legibility and understanding… Remember, Excel 2007 allows something like 7,000
characters in a cell, so spaces can work… Use Alt Tab to move code to the next
line… give yourself a clear line and line up the code so that similar code
components are in columns… this really helps when you come back a year later
and have no comments to remind you what you wrote… also if you use VBA – to
calculate and fill in a cell, be sure you clean that code and leave some
narratives as to what you are doing…

------
brucebadeau
Part 3 A few weeks ago I wrote a simple fuel saving device feasibility
calculator. I whipped off the Excel version in a couple of hours…. Then, I
tried to do all the calculations inside controls using VBA code, placing
output in text boxes… OMG … this “simple” experiment turned into a long
nightmare… What we take for granted with Excel – i.e. dependent relationships
being automatically programmed is NOT done in VBA… I had to place code in
every dependent control (SpinButtons) to calculate several text boxes and to
carefully format them both numerically and graphically … like Green Boxes for
Positive Numbers and Pink for Negative Numbers… Now I should have call up a
module and avoided duplication, bu nonetheless, this was a tough edit…
Spinning numbers in and seeing if every text box that should have registered
movement actually did…

------
brucebadeau
Part 4 Why did I do this excruciating experiment? Well, I wanted to show
clients how I can make Excel look like a bullet proof application with output
on top of sculptured, raised backgrounds… i.e. the look and feel of make-your-
own Dashboards… Bottom line – I don’t think clients will pay for this
expertise… You can make you own Excel Dash Boards 9 as I have done many times)
without resorting to the extremes I recently undertook… …So having confessed
my inexperience about the difficulty in making non- Excel calculators, I thank
my luck stars that we have Excel… without out a doubt the most flexible and
beautiful numerical authoring tool on the planet offset by the extreme
difficulty in detecting algorithmic mistakes. Isn’t life a trade off, anyway?
Thanks for reading this … Bruce Badeau bruce@fser.net

------
jacques_chester
Spreadsheets are a fascinating blight on the face of science and business.
They offer so much convenience and speed, but amplify the risk of errors
quickly.

There is actually a research group _devoted_ to studying spreadsheet risks:
<http://www.eusprig.org/>

