
Should biologists study computer science? - terpua
http://arstechnica.com/science/news/2009/07/should-biologists-study-computer-science.ars
======
buckwild
In case you guys are wondering about current college curriculum, I just
graduated with a BS in biotechnology-bioinformatics so I can provide a little
insight.

My core classes consisted mainly of biology, chemistry, organic chemistry and
physics. Other classes that my degree required were advanced mathematics (with
biological applications), advanced statistics, and computer science.

About the computer science portion of my degree: I was required to learn C,
C++, discrete math, Perl, and data structure/algorithm design. I chose to take
machine language as an elective.

What I've learned in industry: The CS foundation I built in college was
critical. Although Perl is widely used where I work, languages like R and C
are used more often (for my particular projects). I've also learned that my
job is to bridge the gap between biologists and computer scientists.

Biologists say what they want to get -> statisticians/mathematicians think up
a procedure -> I make sure the formulas make sense for the subject at hand and
program it in Perl or whatever -> CS people optimize it and do their magic to
make it run super fast -> everyone checks it to make sure it's okay -> the
statisticians analyze the results and feed them back to the biologists.

My point being, I always think everyone should learn more math, but the
industry has found a way around everyone needing to learn everything (jack of
all trades, master of none): have experts work together towards a common goal
(an Ocean's 11 type set-up). Everyone has something special to offer.
Personally, I think the current set-up is working fine. Although everyone
should learn more advanced math (or biologists should learn more CS), not
everyone is willing and/or capable.

I hope this was helpful.

~~~
biohacker42
Upvoted for comparing science work to oceans 11.

------
biohacker42
I'm conflicted about this. A little knowledge can be far more dangerous than
no knowledge. I have seen things... things I can't unsee. Things done to
software by biology and chemistry Ph.D.s that still give me nightmares.

But make no mistake, modern science is neck deep in serious computering. Not
being computer literate is almost as bad as being just plain illiterate.

So here's what I think about this. Every scientist who's not a physicist,
mathematician, or computer scientist needs to study more math, more stats, and
more CS.

In fact I would go so far as to say everyone needs the equivalent of an
associate degree in cs to get a Ph.D. in anything. For mathematicians and
physicists this would happen almost without any extra effort, for biologists
it might be quite a bit of extra effort, but well worth it.

~~~
davi
"A little knowledge can be far more dangerous then no knowledge. ... Not being
computer literate is almost as bad as being just plain illiterate."

Need examples before I can know whether I agree or strenuously disagree.

~~~
biohacker42
Vis-à-vis a little knowledge:

One particular Ph.D. whom I worked with had created a much-lauded database. It
was done in MS Access. It stored natural numbers. Each sample contained a few
gigs of natural numbers. Numbers could be anywhere from 0 to 99999. _So each
entry contained all possible numbers in increments of 0.02, with Null in all
places where the sample did not contain that number._

This is what happens when people have never heard of foreign keys, a
one-to-many relationship, or a many-to-many relationship.

This could have been done so that it was a lot smaller and faster to search.
The placeholder could have been a negative int, instead of Null, so that it
would evaluate to something other than Null in searches. An infinity of things
could have been done better.
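For what it's worth, here is a minimal sketch of what a one-to-many layout
could look like, using Python's built-in sqlite3 module rather than MS Access.
The table names, column names and sample values are all made up for
illustration; the point is only that each sample stores rows for the numbers
it actually contains, so no Null placeholders are needed at all.

```python
import sqlite3

# Hypothetical schema: every name and value below is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE sample (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per number a sample actually contains (one-to-many).
    CREATE TABLE reading (
        sample_id INTEGER NOT NULL REFERENCES sample(id),
        value     INTEGER NOT NULL
    );
    CREATE INDEX reading_value ON reading(value);
""")

conn.executemany(
    "INSERT INTO sample (id, name) VALUES (?, ?)",
    [(1, "sample A"), (2, "sample B")],
)
conn.executemany(
    "INSERT INTO reading (sample_id, value) VALUES (?, ?)",
    [(1, 2), (1, 1750), (2, 1750), (2, 99999)],
)
conn.commit()


def samples_containing(value):
    """Return the names of all samples that contain the given number."""
    rows = conn.execute(
        "SELECT s.name FROM sample s "
        "JOIN reading r ON r.sample_id = s.id "
        "WHERE r.value = ? ORDER BY s.name",
        (value,),
    )
    return [name for (name,) in rows]
```

With this layout, `samples_containing(1750)` is a single indexed lookup, and a
sample that lacks a number simply has no row for it.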

But when you have a Ph.D. and you legitimately think of yourself as brilliant,
AND you are used to doing a lot of HARD work, then heroics with MS Access just
seem natural, right; that's what hard science is, damn it. And this is why a
little knowledge can be worse than no knowledge.

If that guy had instead hired almost anyone else to design the database, he
would have been better off. It's hard to imagine anyone who could have screwed
up worse AND had the tenacity to stick with it.

PhDs are not just smart, but they also are used to working hard, so when things
get hard they don't quickly perceive that as a signal to try something else.

As to being computer illiterate:

A field biologist might get away with it, but anyone working in a lab will
sooner or later measure something with an instrument entirely controlled by
and accessed through a computer.

Let me say that again, a computer is the gatekeeper.

And it's not enough to simply know how to use the GUI/API, whatever. You also
have to know at least something about the algorithms involved in the analysis.
_Because a lot of analysis of things that can't be seen with the naked eye is
actually statistical inference of what's there._

Let me say that again, a whole lot of stuff is not measured in the way a lay
person thinks of measuring, _it is inferred using fancy math_.

But make your math a bit too fancy and you're just making stuff up now. Use a
second order polynomial and you're good, use a 3rd or higher order and you can
see anything you want. There's a reason we use cubic B-splines for computer
graphics: it's because we can fit them to any shape we want!

And oh yeah, I've seen 3rd order polynomials used in science, and no, they were
not used correctly.
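A rough illustration of the "too fancy" problem, as a sketch rather than a
reconstruction of any real analysis: the data below are invented (a straight
line y = 2x + 1 plus small fixed offsets), and the degree-5 interpolating
polynomial is deliberately more extreme than the 3rd order case above, just to
make the effect obvious. The helper names `fit_line` and
`interpolating_polynomial` are made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a function x -> y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def interpolating_polynomial(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1 through every data point."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


# Invented data: the "true" relationship is y = 2x + 1, plus fixed noise.
xs = [0, 1, 2, 3, 4, 5]
noise = [0.3, -0.4, 0.5, -0.3, 0.4, -0.5]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

line = fit_line(xs, ys)
poly = interpolating_polynomial(xs, ys)

# The polynomial matches every data point exactly; the line does not.
worst_poly_residual = max(abs(poly(x) - y) for x, y in zip(xs, ys))
worst_line_residual = max(abs(line(x) - y) for x, y in zip(xs, ys))

# One step outside the data, the true value would be 2 * 6 + 1 = 13.
line_at_6 = line(6)
poly_at_6 = poly(6)
```

The polynomial "fits" perfectly, with zero residual at every point, yet one
step past the data it lands far from 13 (it actually goes negative) while the
plain line stays close. A perfect fit tells you the model is flexible, not
that it is right.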

~~~
davi
"So each entry contained all possible numbers in increments of 0.02, with Null
in all places where the sample did not contain that number."

OK. That is heinous. It makes me cringe. And if the person who made it thought
that it was an example of brilliance, that's cocksure ignorance coupled with a
needy ego.

___but___

There are a lot of biologists out there who run computers at the level of
how a business person deals with Microsoft Word: what the computer can do
equals what paths are available through the GUI.

In the example you give, the biologist got the computer to do something it
couldn't do before, something that made life in his domain easier, and
presumably made science possible that would otherwise have been impossible.

Many biologists don't have the money or time to hire a consultant to do it
right.

If a heinous kludge lets you do biology that you otherwise couldn't do, then
by god that kludge has merit. You can make fun of the person for being proud
of the kludge -- but you should admire them for their willingness and ability
to make foreign tools do something new.

My work spans biology and CS, and unlike 99% of the other biologists I know, I
have experienced what a well managed project feels like -- version control, a
build process, bug tracking, etc. To me, The Mythical Man-Month is not a novel
concept; it's a given. The level of ignorance among biologists about how to
get computers to do useful work in novel ways is stunning, and the biologists
_don't know how ignorant they are_.

Again: ___but___

The computer scientists these biologists hire are often so averse to kludging
their way forward, that the result is stasis. Adherence to notions of
architectural purity, reusability, the 'right' libraries or platform, all
result in long iteration cycles where the biologists get no feedback as to
whether a given line of inquiry is promising. And the biologists don't get
what's happening -- they can't say, "No, don't do a 'proper' object
model and code a 'proper' solution yourself; instead, write a Perl script to
munge this other tool's backend XML to achieve a similar effect, so I can find
out whether or not that functionality will be useful; and if it is useful,
maybe then we can do it 'properly'." The biologist doesn't know what Perl is,
and doesn't know what XML is.

And anyway, this approach is likely to be anathema to a good computer
scientist. People don't get into computer science to make ugly kludges; they
get into it to make things of beauty. Reusable libraries. Infrastructure.
Clean GUIs. Excellent data structures. Etc.

So there's this huge tension between the biologist PI and the computer
scientist he hires to build stuff, and neither one really speaks the other's
language.

If a biologist knows a bit of computer science -- say, enough to understand
his own limitations and ignorance, enough to communicate effectively with
computer scientist employees or collaborators -- he can be tremendously
effective. If he can write a little bit of Perl and munge his own XML to find
out if an approach is promising, he can show his computer scientist employee
the kludge, and say, "Make this better."

So, a little bit of CS can be a very good thing. I suspect I'm violently
agreeing with you. As you can tell, the topic is near & dear to my current
life's work. :)

~~~
biohacker42
Indeed I think we are in agreement.

Definitely in the need for enough cs to understand your own limitations.
That's actually quite a bit of cs, but then again obtaining a Ph.D. can take a
while, so I'm sure a little more time for extra study can be found :)

And taking a look at your work... Are you really trying to create a high
throughput electron microscopy workflow? Whoa! If you make electron microscopy
even close to high throughput that would be awesome!

But how much of the brain structure are you able to preserve during sample
prep? For that matter, how quickly does brain structure degrade after death?
Minutes, hours, days, weeks, months? Are you imaging the sections in a light
microscope first? Correlating the light and electron results of the same slice
in software? Freezing the sample in liquid nitrogen so as to freeze it faster
than ice crystals can form? Man, science is fun, too bad it's not a good way to
make a living.

~~~
davi
"Are you really trying to create a high throughput electron microscopy
workflow?"

Yes. We've increased acquisition rates by a factor of ~15-20 over what
commercially available TEM systems offer. We now have tens of
terabytes of image data in the can, and when I'm not reading HN (argh!) I'm
working on collating & analyzing these data.

Light level microscopy preceded embedding of the material for EM, the anatomy
is correlated between the two modalities, and the sample preparation method
(this is of mouse brain) is by perfusion with an aldehyde mixture, so there is
essentially no deterioration of the ultrastructure.

So, I wonder what you do -- no home page in your profile -- but I do remember
what it's like to make a living. I like biology better. :)

~~~
biohacker42
20x? That's awesome!

I used to work in a bioinformatics startup - tons of fun with cutting edge
science. But the startup tanked, and I moved on to a high paying corporate
software engineering job. But I hope to be back in bioinformatics before long.

~~~
davi
Thanks. :) It's been a long road, hope to get a paper out in the next ~6
months or so. Shoot me an email (address on my home page) if you ever want to
be in touch outside of HN.

------
tom_b
This article is timely for me, having recently started a new job in
bioinformatics. Specifically, building a centralized database (warehouse) for
a variety of cancer research study data.

I'm coming from the opposite direction - a computer science background to the
biology. A huge challenge for me is rapidly learning enough of the biostats
and process to understand how to allow researchers to leverage having all this
data in one place, easily accessible, and with a front-end that makes "sense"
to those MD/PhD types. A starting point is understanding what type of
questions researchers can ask now that they have all the different data in one
spot.

Fred Brooks said something like "computer scientists are toolsmiths." We build
tools for user needs that simplify and strengthen the user's work. This
requires the ability to somehow understand the user's needs, communicate with
them effectively, and implement usable tools for them.

I sometimes feel like it is a failure on our part as builders to make it
necessary for people who need software tools to build their own. I'm more than
happy for other fields to add more CS type education to their required
courses, but I'd rather be able to give researchers tools so that they stay on
their critical path, rather than having to learn enough to hack together their
own full solution.

------
jacquesm
This is a bit like the welder and the diver question: is it easier to teach a
welder how to dive or a diver how to weld?

For divers and welders the answer appears to be that it is easier to teach
welders to dive than the reverse, even if both are far from trivial
activities.

For biologists and computer scientists the answer is probably that it is
easier to teach programmers to do biology than the reverse.

(good) Programmers have something universal about the way they apply
themselves to problems and that way generalizes to problems in a different
domain.

~~~
queensnake
I admit I don't have much to go on, but it seems like biology is a full load
of study, whereas at least undergraduate 'computer science', i.e. programming, can
be picked up by a smart person almost incidentally. I've seen that done, and I
think it /is/ done more often than the other way around. As for writing /good/
code, that can come from practice and aesthetics. But knowing biology (or
another science) takes real study.

~~~
plinkplonk
I _think_ (please correct me if I am wrong, always glad to learn) that one key
difference would be that learning computer science would (mostly) need books,
the internet and the time and willingness to buckle down, while a proper
study of biology would involve serious _lab_ work, with a need for costly
equipment and instructions.

------
stevenbedrick
Well, as a reformed biologist and current informatician, I certainly think
that biologists should study CS. However, even more important than studying
CS, they absolutely NEED to learn how to program. I've seen lab scientists use
extremely convoluted and error-prone workflows to conduct their analyses and
experiments- workflows that, if they knew just a little bit of Python, would
have been much simpler. I'm actually teaching a class in the fall on "utility
scripting" to a mix of molecular biology PhD students and informatics master's
students for just this reason.

Regarding the age-old question of "should biologists learn CS or should CS
people learn biology", I'm firmly in the camp of biologists learning to do
their own CS, or at least learning enough CS to productively work with CS
people. A little bit of CS really goes a long way towards improving a
biologist's workflow. A little bit of biology, however, is almost completely
useless for a CS person who wants to get involved in lab science. It really
takes a surprising amount of domain knowledge to be productive in a
laboratory, or even to understand the nitty-gritty details of an experiment at
a deep enough level to write or modify an existing bioinformatics tool.

~~~
boblol123
Haha, you have the opposite opinion to mine, but you also pretty much have the
exact opposite experiences too. Maybe we both know a lot in our "main"
subjects and think all the underlying knowledge/related material is
required to be useful, but in reality you can just pick up the knowledge you
need on its own without any background knowledge of how it all works. You'll
be confused when that stuff is mentioned or brought up, but if you stay in
your niche you'll be fine.

------
stevenbedrick
One aspect of the article that I haven't seen much discussion of is the second
part- about representing biological processes using an algebraic notation.
While this might be really helpful for computational biology, it strikes me as
a lousy idea for general work, because it presents an overly reductionist view
of what's going on. Biomolecular pathways are almost never as simple as they
seem at first, and they always interact in weird and complex ways. Presenting
them as a big, gnarly, nasty diagram communicates this to readers...
explaining them using nice neat equations makes the whole thing seem both
simpler and better-understood than it probably really is.

------
jganetsk
EVERYONE should study Computer Science. The questions are, how much and what
parts?

------
ken
The article is about computers as "part of biological research". I _wish_ it
had been about the real place I want biologists: designing the computer
systems themselves.

Biological systems have scaling and reliability that we computer scientists
only dream about. (Can you name a self-repairing computer system that runs for
80 years?) I want computers with the kind of systems thinking that biological
systems have, not just more x86 cores on a single chip.

The only biologist I know who switched to designing computer systems is Alan
Kay. I think we could do with a few more like him.

------
carterschonwald
Short answer: yes. Otherwise how can anything useful or meaningful be
effectively done with the huge volumes of data that biologists now quite
frequently have? They should also work on their statistics background so that
they can do more sophisticated model / hypothesis testing, but that's a whole
separate issue that gets into the matter of education and community incentives,
and this is not the appropriate forum for that latter topic.

~~~
boblol123
A computer scientist can effectively analyse large volumes of biological data,
I'm not convinced a biologist could do the same because there is just so much
computer science related to visualisation, graphics programming and data
modelling and their prerequisites.

A person who did 1/2 and 1/2 would likely not have enough knowledge or
experience to do either the biology or cs side particularly well.

Not to mention, there are very few people I know who are good at both biology
and cs.

~~~
gnaritas
I know of one good example, Alan Kay, the inventor of Smalltalk.

~~~
boblol123
I briefly looked it up; molecular biology would be a good fit, though it's
pretty much chemistry. It entirely depends on which areas are studied in
biology.

~~~
gnaritas
Sure, but what's more interesting is that he credits what he knows about
biology as the inspiration for Smalltalk and its particular flavor of OO.

------
tome
How about having biologists work side-by-side with experts in data analysis
and statistics, rather than requiring the scientists to be experts in all
fields?

~~~
Agathos
Often leads to the statisticians giving the biologists tutorials in statistics
and the biologists giving the statisticians tutorials in biology. Which is
okay, but then you have to wonder if more formal study up front would have
been more efficient. Which brings us back to this topic.

------
pchristensen
Useful links for those interested in this crossover:

Great Principles of Computing:
<http://cs.gmu.edu/cne/pjd/GP/GP-site/welcome.html>

90 min talk by Peter Denning about Great Principles:
<http://www.youtube.com/watch?v=5a_pO3NYJl0>

------
TrevorJ
Corollary: should computer scientists study biology? _Yes_

~~~
jacquesm
Agreed! If you can get your hands on a copy of a university level textbook in
genetics that will be some of the best time you could invest in learning
something about another field unrelated to the one you are currently in.

~~~
TrevorJ
From the 10,000 foot level I think it _is_ related, and will become more and
more related as time goes by. DNA seems to be a pretty robust information
encoding and executing system. There's a lot we could learn from it.

~~~
jacquesm
Absolutely.

The first time I read what a ribosome does my immediate thought was CPU/Turing
machine. There are so many analogies it is scary.
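To make the analogy concrete, here is a toy sketch: the mRNA is the tape, the
read head advances three symbols at a time, a lookup table maps each codon to
an output, and a stop codon halts the machine. It assumes a deliberately
partial codon table and ignores essentially all of the real biology
(initiation factors, tRNAs, reading-frame subtleties), so treat it as an
illustration of the analogy only. The function name `translate` and the
example sequence are made up.

```python
# Partial codon table (standard genetic code); None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met",  # also the start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def translate(mrna):
    """Read codons from the first AUG until a stop codon or end of tape."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon: the machine never begins
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        # Raises KeyError for codons missing from this partial table.
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:
            break  # stop codon: halt
        peptide.append(amino_acid)
    return peptide


peptide = translate("GGAUGUUUGGCGCUUAAGG")
```

Here the leading `GG` is skipped, the head reads `AUG UUU GGC GCU`, and `UAA`
halts it, giving Met, Phe, Gly, Ala.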

Nanotechnology is here to stay, it's called life.

~~~
TrevorJ
Not sure why you were getting downmodded - that's an interesting observation.

~~~
jacquesm
Some people seem to have a way to express their disagreement with the 'down'
vote instead of saying what is on their minds. It comes with the territory it
seems.

------
tybris
Probably both could do with more math.

