
Gene name errors can be introduced inadvertently when using Excel in bioinformatics - soundsop
http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15214961
======
jherdman
I can't get into specifics per confidentiality at my job, but I'll tell you
this: if you had seen some of the things I have seen done with Excel in the
financial sector, you'd cry and go into a fit of rage.

~~~
ConradHex
Whenever I think about the financial sector at all, I cry and/or go into a fit
of rage already. So it's probably for the best that I don't know.

------
mixmax
A few years ago I had an office at an incubator, and my neighbour company made
a better spreadsheet. He had worked in the financial sector before, and had
conducted research that showed that around 90% (I don't remember the exact
number) of Excel files in the financial sector had mistakes in them. Scary
stuff...

~~~
trapper
If you are doing any sort of data analysis you really need to avoid
spreadsheets.

Unfortunately, most data analysis is just as bad. I have worked with SAS & R
code, thousands of lines long, all without a single test. Basically a big
series of hacks. Mostly, statisticians I have worked with have no _concept_ of
testing, or even why it's necessary.

I shudder to think of the gross error in research results from all I have
seen!

~~~
time_management
Trying to do real programming in SAS is a nightmare. The language sucks. Also,
it's proprietary and expensive.

It's surprising that statisticians would fail to understand the importance of
testing, given that a lot of the root issues in statistics (e.g. the
Bayesian/frequentist dispute over "what is a probability?") are
epistemological.

~~~
thwarted
One of my undergrad classes was for SAS, and they mixed MIS, CS, math, and
statistics people in this class. It took weeks for those on the CS track to
get that when your code says "proc average", you're _calling_ a procedure, not
defining a function. We had no idea where the code was supposed to go. There
was a serious disconnect between the students and the professor in that class,
who, I later realized, would have just needed to say "this calls the procedure
average" and things would have been fine. We had no idea, initially, how we
were getting results.

Of course, we had to run all this on VMS and print it out on a fan-fold line
printer, and it wasn't very interactive. SAS products seem to be more visual
and interactive now.

------
ars
This paper is from 2004. Any updates? Did Microsoft add an option to disable
the conversion? Are public databases cleaner?

------
spolsky
Excel has a cell format called "TEXT" that prevents all of these problems. If
you plan to use a range of your spreadsheet as text (and not as numbers or
dates), format it as Text and Excel will never misinterpret what you type.

You can also start the cell entry with a single quote to make a single cell
store text.
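
The same keep-it-as-text idea applies outside Excel. As a rough sketch in Python (the sample data and variable names are made up for illustration; the identifiers are the kind that eager auto-conversion tends to mangle), the standard `csv` module returns every field as a string, so nothing gets reinterpreted unless you convert it yourself:

```python
import csv
import io

# Hypothetical two-column export, used only for illustration.
raw = "gene,count\nSEPT2,12\n2310009E13,7\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# csv leaves every field as str, so the identifiers survive unchanged.
print([r["gene"] for r in rows])  # ['SEPT2', '2310009E13']

# An eager numeric conversion would silently change the identifier.
print(float("2310009E13"))  # 2.310009e+19
```

Converting a column to numbers is then an explicit step rather than a default.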

------
biohacker42
Excel shouldn't be used for science. Or for that matter anything important and
numerically intensive.

But there's silly people everywhere, even in science.

------
jodrellblank
This is the sort of error that worries me a lot about "programmable" business
systems.

That people who don't know the domain and tools well enough, and don't have
the pernickety edge-case worrying mindset will introduce errors like this.
Writing a report that doesn't take into account a new database table, editing
permissions and leaving a big hole in them, etc.

What I'm not sure about though, is whether it really matters. Whether the
tradeoff with only allowing certain people to make changes saves enough
trouble that it's worth the limitations. Businesses with people are very good
at surviving errors, after all...

------
albertcardona
Beyond the floating-point problems, there's also the automatic name
capitalization, the automatic replacement of "teh" with "the", the automatic
conversions of numbers into dates, and many more.

Everyday practice gets me more and more annoyed at all the automatisms that
office software packages come with. And they are nearly all on by default! It
takes a skilled person about 10 minutes to disable nearly all automatisms in
MS Word or in OpenOffice Writer, spread across several menus. In MS Excel, I
don't even know where to look myself (having switched to Gnumeric long ago).

We would all appreciate one single button that says: "Don't do anything unless
I told you to."

~~~
jgrahamc
"Don't do anything unless I told you to."

For a moment there you made me have a bad flashback to Microsoft Frontpage
circa 1997.

------
time_management
I worked at a hedge fund where one of the phone-screen brainteasers was "How
many zeros are at the end of 35!?"

The very not-smart people would try to use Excel for this, and get hilariously
(and obviously) incorrect results due to rounding errors.

~~~
sireat
Took me a while to figure out where the "extra" 8th zero came from... namely
25, which contributes two factors of 5. That there are at least 7 zeros is
evident, since a 5 and a 2 are needed for each 0, and 5s are scarcer than 2s
in 35!
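
The counting argument above can be sketched in Python. The function name `trailing_zeros_of_factorial` is made up for illustration. Each trailing zero needs one factor of 5 paired with a 2, and 2s are plentiful, so it is enough to count the 5s, with multiples of 25 counted twice:

```python
import math


def trailing_zeros_of_factorial(n: int) -> int:
    """Count trailing zeros of n! by counting factors of 5 in 1..n."""
    count = 0
    power = 5
    while power <= n:
        count += n // power  # multiples of 5, then 25, then 125, ...
        power *= 5
    return count


# 35 // 5 = 7 multiples of 5, plus 35 // 25 = 1 extra factor from 25.
print(trailing_zeros_of_factorial(35))  # 8

# Cross-check against the exact integer value.
digits = str(math.factorial(35))
print(len(digits) - len(digits.rstrip("0")))  # 8
```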

~~~
bd
Aha, so that's the elegant way this was supposed to be solved (in your head
and on the spot).

~~~
time_management
It's basically an honesty test, since the interviewer explicitly says not to
use paper or a calculator (which would include Excel). Quants are given harder
questions, and I doubt that a trader would be dinged for not knowing the
correct approach, since it's not a brainteaser that would correlate highly to
trading acumen. But the Excel answer (14, I believe) is a very wrong one.

~~~
bd
Interesting, 35! according to various tools:

    10333147966386100000000000000000000000000 (Open Office Calc)
    10333147966386100000000000000000000000000 (Excel)
    10333147966386144000000000000000000000000 (Gnumeric)
    10333147966386145000000000000000000000000 (Windows Calculator)
    10333147966386144929666651337523200000000 (Python)

~~~
time_management
Python is correct. It's promoting to bignum, while the others are using
floats, presumably, and passing them off as integers. I think they ought to
display 1.0333147966386144e40.
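
A quick way to see the difference, as a sketch (variable names are just for illustration): Python's `math.factorial` returns an exact arbitrary-precision integer, while converting that value to a 64-bit float keeps only about 15 to 17 significant digits.

```python
import math

exact = math.factorial(35)   # arbitrary-precision int, no rounding
approx = float(exact)        # IEEE 754 double, 53-bit significand

print(exact)                 # 10333147966386144929666651337523200000000
print(f"{approx:.14e}")      # 15 significant digits: 1.03331479663861e+40

# Round-tripping through a float loses the low-order digits.
print(int(approx) == exact)  # False
```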

Your average wannabe-banker/Wharton undergrad has never heard of Python,
however, but has used Excel. Quants are familiar with Python, but they get
harder problems.

Most quants don't, however, get to use Python. For some inexplicable reason, a
lot of them are mired in C++, of all languages, and tend to be
poor-to-mediocre programmers. There are exceptions, though; the fund I worked
at used an FP language and had excellent programmers.

~~~
bd
_I think they ought to display 1.0333147966386144e40._

Actually, they did show the result by default as floats in scientific
notation, it was me who changed number display options for easier visual
comparison.

I assumed quants preferred C++ because of performance. When I occasionally
have to go back from Python to C/C++, I'm surprised how much faster it is.

