

Ask HN: computational chem/bio - quantize

Fellow HN'ers:

I'm in university right now, and I'm looking to work with some professors in the computational sciences.  I'm wondering if you guys know of any cool intro projects I could hack on before approaching professors.  I'm really just looking for suggestions, because I can't think of a good place to start.  Thanks!
======
bendmorris
It's such a big area that it's hard to know where to start. Have you taken an
intro class or something? That might be the best way to figure out what's out
there.

I'm not sure what your computational background is but if you're not familiar
with the basics like sequence alignment I suggest you check out a book. I'd
recommend An Introduction to Bioinformatics Algorithms by Neil C. Jones and
Pavel A. Pevzner
([http://www.amazon.com/Introduction-Bioinformatics-Algorithms...](http://www.amazon.com/Introduction-Bioinformatics-Algorithms-Computational-Molecular/dp/0262101068)).
In addition to describing the
algorithmic techniques, it gives synopses of current research issues and big
names in the field. There are also hundreds of sample problems - some of which
are active research problems - and no answer key (because often these types of
problems don't have clear "best" answers). It's kind of like being taught how
to swim by being thrown in a lake. If you survive, you're better off for it.
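
For a concrete sense of what "the basics like sequence alignment" look like in code, here is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch approach). The scoring values (match=1, mismatch=-1, gap=-1) and the function name are assumptions chosen for illustration, not something taken from the book; real tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell is the best of substitute, delete, insert.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
print(result)
```

The table is O(len(a) * len(b)) in time and memory, which is fine for short sequences and a good reason to read about the faster heuristics afterwards.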

Also, before approaching professors it helps to have read and understood their
description of their research interests and maybe to have read a few of their
recent papers. This will give you an idea of whether they're doing something
that you might be interested in or not.

------
dalke
And then there's quantum mechanics, molecular modeling, microarray analysis,
protein folding, drug design, structure visualization, and still more.

If you want good initial projects, try one of these: 1) make a structure
viewer (to start, read a PDB file and draw colored spheres in some sort of
UI), 2) implement a sequence alignment, then use it to infer the family tree from
a set of related sequences, 3) build a web site which lets someone do full-
text or other complex searches of UniProtKB/Swiss-Prot. When you display a
record, also render the sequence annotations graphically.
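
As a rough starting point for project 1, the "read a PDB file" step might look like the sketch below. It assumes the standard fixed-column layout of ATOM/HETATM records and pulls out only the fields a sphere renderer would need (coordinates, plus the element for coloring). The names `Atom`, `parse_pdb_atoms` and `make_atom_line` are made up for this example, and `make_atom_line` exists only to build sample input; real files have more record types and edge cases than this handles.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_pdb_atoms(text):
    """Parse ATOM/HETATM records from PDB-format text using fixed columns."""
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append(Atom(
            serial=int(line[6:11]),         # columns 7-11
            name=line[12:16].strip(),       # columns 13-16
            res_name=line[17:20].strip(),   # columns 18-20
            chain=line[21],                 # column 22
            res_seq=int(line[22:26]),       # columns 23-26
            x=float(line[30:38]),           # columns 31-38
            y=float(line[38:46]),           # columns 39-46
            z=float(line[46:54]),           # columns 47-54
            element=line[76:78].strip(),    # columns 77-78
        ))
    return atoms


def make_atom_line(record, serial, name, res, chain, seq, x, y, z, elem):
    """Hypothetical helper: format one record so the sample has exact columns."""
    return (
        f"{record:<6}{serial:>5} {name:<4} {res:>3} {chain}{seq:>4}{'':4}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10}{elem:>2}"
    )


sample = "\n".join([
    "REMARK made-up coordinates, for illustration only",
    make_atom_line("ATOM", 1, " N", "MET", "A", 1, 27.34, 24.43, 2.614, "N"),
    make_atom_line("ATOM", 2, " CA", "MET", "A", 1, 26.266, 25.413, 2.842, "C"),
    make_atom_line("HETATM", 3, " O", "HOH", "A", 101, 10.0, -5.25, 3.125, "O"),
])

for atom in parse_pdb_atoms(sample):
    print(atom.element, atom.x, atom.y, atom.z)
```

From there, drawing is a matter of mapping `element` to a color and radius and handing the coordinates to whatever 3D toolkit you pick.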

------
kwellman
I have a masters in computational biology. I did some work on flux balance
analysis (FBA), which is a pretty interesting topic.

FBA simulates the metabolic system using linear programming, which is a
technique used most extensively in economics, but applicable to biology.
Basically, linear programming is a mathematical method for finding the maximum
(or minimum) of a linear function subject to a set of linear constraints.

The assumption behind applying FBA to metabolic systems is that the cell acts
to maximize its growth based on what is available in the environment and the
stoichiometry of the metabolic reactions that can occur in the cell (i.e., the
constraints of the system).
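
Written out, the optimization problem described above is usually stated roughly as follows. The symbols are my own labels, not from a specific paper: S is the m-by-n stoichiometric matrix (m metabolites, n reactions), v is the vector of reaction fluxes, c picks out the objective (typically a single biomass/growth reaction), and l and u are lower and upper bounds on each flux set by the environment and by reaction reversibility.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
                        & l \le v \le u .
\end{aligned}
```

The constraint S v = 0 is the steady-state assumption: every internal metabolite is produced exactly as fast as it is consumed.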

It is surprisingly accurate at predicting things like the consequences of
metabolic gene knockouts, and has been applied to identify potential drug
targets.

The most renowned researcher in this area is Bernhard O. Palsson. His
group[1] has created computational models of different organisms that can be
used to perform FBA and test things such as the outcome of gene
knockouts. His models are available to download.

There are linear programming libraries available for Linux. I used lpsolve,
which has Python bindings. As a starter project you could do something like
identify the essential genes in an organism like E. coli.
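
To make the knockout idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: the four-reaction network, the flux bounds, and the one-gene-per-reaction mapping are invented, and instead of lpsolve it uses a brute-force vertex enumeration written with the standard library only, which works for a handful of reactions and nothing bigger. For a real model you would hand the same matrices to a proper LP solver.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(A, b):
    """Solve the square system A x = b exactly; return None if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def maximize(c, A_ub, b_ub):
    """Maximize c.x subject to A_ub x <= b_ub by checking every vertex.

    Assumes the feasible region is bounded, so an optimum sits at a vertex.
    """
    n = len(c)
    best_val, best_x = None, None
    for rows in combinations(range(len(A_ub)), n):
        x = solve_square([A_ub[r] for r in rows], [b_ub[r] for r in rows])
        if x is None:
            continue
        feasible = all(
            sum(a * xi for a, xi in zip(row, x)) <= rhs
            for row, rhs in zip(A_ub, b_ub)
        )
        if not feasible:
            continue
        val = sum(ci * xi for ci, xi in zip(c, x))
        if best_val is None or val > best_val:
            best_val, best_x = val, x
    return best_val, best_x


def fba(S, lower, upper, objective):
    """Maximize objective.v subject to S v = 0 and lower <= v <= upper."""
    n = len(lower)
    A, b = [], []
    for row in S:  # steady state, written as two inequalities per metabolite
        A.append(list(row))
        b.append(0)
        A.append([-v for v in row])
        b.append(0)
    for j in range(n):  # flux bounds
        unit = [1 if k == j else 0 for k in range(n)]
        A.append(unit)
        b.append(upper[j])
        A.append([-u for u in unit])
        b.append(-lower[j])
    return maximize(objective, A, b)


# Invented toy network: uptake -> A; A -> B by two parallel routes; B -> biomass.
REACTIONS = ["uptake", "route_1", "route_2", "biomass"]
S = [
    [1, -1, -1, 0],  # metabolite A
    [0, 1, 1, -1],   # metabolite B
]
LOWER = [0, 0, 0, 0]
UPPER = [10, 6, 4, 100]
OBJECTIVE = [0, 0, 0, 1]  # growth = flux through the biomass reaction
GENES = {"gene_uptake": 0, "gene_route_1": 1, "gene_route_2": 2}  # hypothetical


def growth_after_knockout(reaction_index):
    """Simulate a knockout by forcing one reaction's flux to zero."""
    lower, upper = list(LOWER), list(UPPER)
    lower[reaction_index] = upper[reaction_index] = 0
    growth, _ = fba(S, lower, upper, OBJECTIVE)
    return growth


wild_type_growth, wild_type_fluxes = fba(S, LOWER, UPPER, OBJECTIVE)
essential = [g for g, j in GENES.items() if growth_after_knockout(j) == 0]
print("wild-type growth:", wild_type_growth)
print("essential genes:", essential)
```

In this toy case only the uptake gene comes out as essential, because the two parallel routes can cover for each other at reduced growth. Real models map genes to reactions through more complicated rules (isozymes, complexes), so the knockout step is where most of the extra work would go.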

I'd be happy to help. My homepage (in my profile) has my email address.

[1] <http://gcrg.ucsd.edu/>

------
falsestprophet
Don't wait to approach professors. Do it tomorrow. Each of them will have
10,000 ideas and will be glad to help you work on one of them.

~~~
hga
Agreed 100%, that's what worked for me. "I've got this molecule here... [he
then described it a bit, showed me the physical metal model he'd made of it
(from a more permanent variety of the plastic models people use in organic
classes)], and the current approaches are inadequate..." and then he lent me a
book that would get me started on learning what I needed to know to approach
the problem.

Basically, if you can largely offer a "fire and forget" proposition, i.e. you
need a problem and pointers to learning but not a lot of hand holding, you
should be able to find a professor who has one or more things he'd like to
investigate but that he doesn't have the resources of one sort or another to
do, and that you can offer your particular talents towards solving.

------
bbgm
Any specific types of projects you are interested in? Like bendmorris says,
this is a vast field. Some pointers that might help:

Genomics:

Take a look at work that Mike Schatz is doing at CSHL:
<http://schatzlab.cshl.edu/>

Or you can check out this recent webcast by C. Titus Brown:
<http://oreillynet.com/pub/e/1784>

On the comp chem side, check out Rajarshi Guha's blog. You don't get much
better on the cheminformatics side of things: <http://blog.rguha.net/>

There are a lot of folks doing some very interesting research in genomics,
proteomics, cheminformatics, metagenomics, etc. You might also want to ask
this question at BioStar: <http://biostar.stackexchange.com>

~~~
quantize
Thanks a lot for the links!

------
papaf
From your post I can't tell if you come from a biochemistry or a computational
background.

If you have a biochemistry background: I'd recommend that you use your
strengths, and maybe concentrate on something that will help in the wetlab.
The biological background required for algorithmic problems such as the
statistics of protein and DNA sequences is a well-trodden path for people who
know computer science. However, we know nothing about the wetlab and most of
the software I see there is ugly, closed and expensive.

If you have a computer background: One suggestion is to install R,
Bioconductor (bioconductor.org), and start playing with microarray data in the
form of CEL files (<http://www.ncbi.nlm.nih.gov/gds/>).

~~~
quantize
I have a strange background, more of a CS background than bio, so this is a
great suggestion. Thanks!

