
In a First, an Entire Organism Is Simulated by Software - donohoe
http://www.nytimes.com/2012/07/21/science/in-a-first-an-entire-organism-is-simulated-by-software.html?hp
======
alexholehouse
This is very exciting progress, but let's remember that there is a rather
important caveat with this sort of thing:

 _"For their computer simulation, the researchers had the advantage of
extensive scientific literature on the bacterium. They were able to use data
taken from more than 900 scientific papers to validate the accuracy of their
software model."_

This is not (necessarily) a model of _M. genitalium_ \- it's a model of our
understanding of _M. genitalium_, and as such it incorporates everything that
current technology in the biological sciences allows us to look at. It takes a
huge amount of data from many sources and tries to bring it together on a
scale not previously attempted. However, that data may have significant flaws
and biases, and ultimately it reflects only what we're good at looking at
(technologically/scientifically speaking). It's awesome, and 100% the right
direction for the field, but equally it is not a "synthetic life form being
simulated" so much as "a very, very complicated model which uses huge amounts
of multidimensional data to try to replicate the behavior seen in that data".

~~~
kens
That's true of any simulation or model - it's only as good as the data that
goes into it.

The exciting thing is that with this model, they can now rapidly iterate
between the behavior of the model and the behavior of the real organism, see
where the gaps in knowledge are, and work to fill them in.

(I'm happy to see _M. genitalium_ getting attention, since I made a Java
genome display for it 14 years ago: <http://www.righto.com/java/genome/MG.html>)

~~~
pbhjpbhj
> _That's true of any simulation or model - it's only as good as the data
> that goes into it._ //

Models can be predictive; indeed, that's one of the main reasons to create a
model: to observe behaviour that otherwise might not have been expected.

A theory in physics, for example, is a model with proven predictive power:
the model [theory] 'knows' some aspects of what can be observed before those
things can be confirmed empirically.

But as you indicate, new empirical data can show flaws in a model just as it
can in a scientific theory.

------
gliese1337

        Currently it takes about 9 to 10 hours of computer time to simulate a single division of the smallest cell — about the same time the cell takes to divide in its natural environment.
    

So, it's a realtime simulation! Cool, although wholly incidental.

It's hard to tell from the article exactly how fine-grained the simulation
actually was. Was it actually at the level of tracking individual molecules,
or did each "cell object" just keep track of, e.g., concentrations of
different types of molecules?

Also, I wonder if this will have any long-term impact on things like the Open
Worm project (<http://www.openworm.org>).

~~~
jkarr
We used a combination of tracking concentrations and individual molecules.
For specific macromolecules like DNA polymerase, RNA polymerase, ribosomes,
etc., we kept track of the position of each individual molecule. For other
things like glucose, water, etc. with very high copy numbers, we just kept
track of the copy number.

I agree modelling entire organisms will ultimately require large
collaborations, potentially through projects like Open Worm.

------
apl
<http://wholecell.stanford.edu/>

Source code and training data available online; written, of course, in MATLAB.
Very refreshing, and I'm looking forward to dissecting this first-hand.

~~~
joe_the_user
Wow,

Having the code in Matlab seems like a disaster as far as ever making this or
similar approaches modular, and thus usable by others, goes. I do know from
miserable experience that Matlab is indeed what biologists generally use, but
if biology is ever going to interface with larger-scale software construction,
it seems like it's going to have to change its standard operations a bit.

Edit: And this isn't saying Matlab is generically "horrible". It is great at
what it does but horrible from my perspective, as a programmer whose task
usually is putting pieces of software together.

~~~
apl

      > the code in Matlab seems like a disaster as far as ever
      > making this or similar approaches modular and so usable-
      > by-others goes.
    

As always, it depends. I've seen very well maintained MATLAB code bases, and
I've seen the opposite (with the latter greatly outweighing the former). We
should give these guys the benefit of the doubt. Somewhere Karr mentions
Hudson CI, so they don't seem fully removed from good practices. Interfacing
with MATLAB from C is reasonable.

In an ideal world, this would be a NumPy/SciPy prestige project, but neither
the community nor Python itself is quite there yet.

~~~
nagrom
I'm confused. In what way is Python not there? If one specifies, e.g., Python
2.7, what do you consider to be the shortcomings?

I'd also like to know how you think numpy/scipy are short compared to Matlab.
I've not run into their limitations yet.

~~~
lazyjeff
I think the main thing is libraries. I was once looking for graphical-model
software that supported DBNs, and could only find Matlab implementations.
There was a Python wrapper for one of the Matlab libraries, but it was not
widely supported.

------
Cieplak
I would love to read the article, but $31.50 is a bit steep.

[http://www.sciencedirect.com/science/article/pii/S0092867412...](http://www.sciencedirect.com/science/article/pii/S0092867412007763)

Thanks, Elsevier.

~~~
gwern
<http://dl.dropbox.com/u/85192141/2012-karr.pdf>

~~~
Cieplak
Cheers!

------
joe_the_user
My vague Googling says that a human being has 50-100 trillion cells. So
Moore's law would suggest a full human simulation might be 80 years away, if
it keeps up. I wonder if an organ could be simulated by taking a smaller
selection of cells, determining their characteristic interactions, and
extrapolating from there.
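As a rough sanity check of that 80-year figure (my arithmetic, assuming ~10^14 cells, compute cost scaling linearly with cell count, and a Moore's-law doubling every 18-24 months):

```python
import math

cells = 1e14                      # ~50-100 trillion cells, rounded up
doublings = math.log2(cells)      # compute doublings needed vs. one cell
years_low = doublings * 1.5       # 18-month doubling period
years_high = doublings * 2.0      # 24-month doubling period

print(round(doublings, 1), round(years_low), round(years_high))  # ~46.5, ~70, ~93
```

So "about 80 years" sits right in the middle of the 70-93 year range those assumptions give.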

A further question with any project for full-organism simulation is how many
lines of code are going to be produced, and what the process of maintaining
that code would look like.

~~~
praxulus
On the one hand, a human cell is many times more complex than the cell
simulated here (you could say that each of the hundreds of organelles in a
human cell is closer in size and complexity to a bacterium than to the whole
cell). On the other hand, you could throw a datacenter at the problem and get
quite a leap over the 128-machine cluster used by the researchers.

You're probably right about the hierarchical simulation though. I have trouble
believing that it's actually useful to model an entire body at a macro-
molecular level. It would be like modeling electrons in circuit design
software, rather than using the abstractions of voltage and current.

~~~
joe_the_user
It will be interesting to see what's possible _when or if_ we have
supercomputers an order of magnitude or two more powerful than the present
ones. The problem of producing software of similarly larger size is naturally
daunting.

I think voltage etc. is an airtight abstraction mostly because each electron
is guaranteed to be both simple and identical to every other. Cells are both
complex and distinct from each other (based on genetics, internal physiology,
and so forth), so a macro-configuration of cells would seem to be a leakier
abstraction.

Roger J. Williams's classic text _Biochemical Individuality_ describes how
much the parameters of even very basic physiological functions vary from
person to person. So unlike a chip, which starts with simpler building blocks
and is designed to depend on discrete inputs as much as possible, the
simulation of an organism may not have a better solution than a bottom-up
design with perhaps a variety of clever shortcuts.

~~~
kens
I've been looking a lot at the 6502 processor simulation
(<http://visual6502.org>) and it's interesting to consider the different
levels of abstraction possible for chip simulation. The current simulator
simulates abstract on/off transistors, which is sufficient for almost
everything, but it doesn't exactly handle some unsupported opcodes that put
conflicting signals on the bus. For that, you'd need to simulate actual
voltage levels. If the transistors were very small, the voltage abstraction
would break down because each electron starts to count. The simulator also
ignores propagation delays. Moving up the hierarchy, people have gate-level
6502 simulations, which are tricky to implement exactly since the 6502 uses a
lot of pass transistors and stored charge, rather than strictly Boolean logic.
And then the typical CPU simulator runs at the register level, which is a lot
simpler, but often gets the corner cases wrong (e.g. decimal arithmetic with
invalid inputs).

The point is that circuits can be simulated at many different levels of
detail, with low-level simulations more likely to get things exactly right,
but with high-level simulations much faster, easier to write, and easier to
understand.
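To make the abstraction-level point concrete, here's a toy comparison (my own illustration, unrelated to the visual6502 codebase) of one NAND gate modeled at the gate level versus at the switch level, where transistors are ideal on/off switches:

```python
def nand_gate_level(a: bool, b: bool) -> bool:
    # Gate level: pure Boolean logic, no transistors at all.
    return not (a and b)

def nand_switch_level(a: bool, b: bool) -> bool:
    # Switch level: a CMOS NAND's two series NMOS transistors pull the
    # output low only when both inputs are high; a parallel pair of PMOS
    # transistors pulls it high otherwise. Voltage levels, propagation
    # delay, and stored charge are all abstracted away.
    nmos_path_conducts = a and b
    pmos_path_conducts = (not a) or (not b)
    assert nmos_path_conducts != pmos_path_conducts  # no contention here
    return not nmos_path_conducts

# The two abstraction levels agree on every input...
for a in (False, True):
    for b in (False, True):
        assert nand_gate_level(a, b) == nand_switch_level(a, b)
```

...but only the switch level could even represent contention (both paths conducting at once), which is exactly the corner the "conflicting signals on the bus" opcodes exploit.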

Likewise, it will be interesting to see with cell simulations how much
complexity can be abstracted away and still have a useful simulation. To get
all the protein interactions right for example, you'd need to simulate
individual atoms, which is insanely slow. So to simulate a cell, you're
probably running at the level of protein and chemical concentrations and known
interactions, which is faster but introduces error. For instance, how much do
local concentrations matter? And to simulate a multicellular organism, you're
probably going to make the cells fairly abstract.

Personally, I think the key area for biology is going to be dealing with cell
state. Cells hold state in a lot of different ways over many time frames (eg
epigenetics), and I think computer scientists have a lot to offer biologists
in understanding state. Someone else mentioned the few hundred cell types in
the human body, but the internal state makes a huge difference. (Not to
mention distributed state, such as how the brain stores information.)

~~~
coopdog
I wonder if it would be possible (in the not-so-distant future) to simulate
an entire human, but 'cache' the results of parts of the simulation. Let the
program decide where a certain interaction has been calculated often enough,
and create its own level of abstraction.

Sort of like how image-compression algorithms store an abstraction of the
image for large, uniform areas and get more granular where there is
complexity, but applied to simulation instead.
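A toy version of that caching idea (entirely my own sketch): memoize expensive interaction results keyed on coarsened inputs, so the coarsening itself becomes the self-chosen level of abstraction:

```python
from functools import lru_cache

calls = 0  # count how often the "expensive" computation actually runs

@lru_cache(maxsize=None)
def interaction_rate(species_a: str, species_b: str, temp_band: int) -> float:
    """Stand-in for an expensive fine-grained simulation step."""
    global calls
    calls += 1
    return 0.42  # placeholder result

# The coarse-grained key ("temp_band" instead of an exact temperature) means
# nearby conditions hit the same cache entry instead of re-simulating.
r1 = interaction_rate("enzyme", "substrate", temp_band=37)
r2 = interaction_rate("enzyme", "substrate", temp_band=37)  # cache hit

print(r1 == r2, calls)  # True 1
```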

------
siavosh
As an undergrad, I did some research work in a bio-computation lab trying to
model the basal ganglia (a part of your brain that helps you move and learn).
I left shaking my head at how much guesswork, "fine-tuning", and hand-waving
there was in the field. The biologists didn't appreciate how much crappy code
can deceive, and the engineers didn't appreciate the mind-boggling complexity
and non-imperative world of biological systems.

I can't even imagine how much parameter tuning and hacks went into a model of
such staggering complexity. Paraphrasing one of my old academic advisors, the
curse of models is that you can always make them look good.

------
pbw
<http://www.stanford.edu/~jkarr/research.html> look for "more information"
links to animations, etc.

<http://wholecell.stanford.edu/> contains a link to the code (written in
matlab)

------
drcode
I think this news is huge: I suspect they probably took a lot of shortcuts and
aren't simulating everything at the level of individual atoms. However, this
type of system could be refined and perhaps give a 99% accurate simulation of
a cell.

Imagine if you could get the DNA from a cancer cell in a human patient, as
well as the DNA of a normal cell, and then test the effect of a million
different randomly-generated molecules until you find one that kills the
cancer cell, but not the normal cell.

If you could scale the performance of this type of system and allow it to
simulate eukaryotic cells (much more difficult), it might let you cure most
cancers!

------
Cieplak
Any clue what software/technology they are using? I'm guessing Java.

~~~
tosseraccount
It appears to be a hodgepodge of technologies, but mostly Matlab (.m) ...

I downloaded the code zip file, unzipped it to a "WholeCell" directory, and
counted the files by extension:

find WholeCell -type f | awk -F. '{print $NF}' | sort | uniq -c | sort -n

...

    
    
          2 java
          2 lib
          2 log
          2 mexa64
          2 mexw32
          2 pdf
          2 swf
          2 vbs
          2 xlsx
          3 exe
          3 fsa
          3 mexw64
          4 jpg
          4 svg
          5 desktop
          5 license
          5 sql
          5 TXT
          6 col
          6 sh
          6 tmpl
          6 tpl
          7 ico
          7 xml
          8 bat
          8 pl
          8 z
          9 json
         15 dot
         16 gif
         18 map
         19 css
         23 dll
         24 dat
         25 p
         52 mat
         62 txt
        114 jar
        238 png
        277 js
        427 php
        531 m
       1096 html

~~~
apl
A lot of that is incidental; see .php, .html, .js, and so on. (Apparently
there are some web-based tools that he mentions; I'm not sure, though.) The
bulk amounts to MATLAB, plus Java libraries to remedy MATLAB's functionality
gaps.

------
Achshar
Can anyone explain what they mean by "simulate"? To what depth? Do they
define different types of cells and their functions and then let them work?
Or molecules? Atoms? Particles? I strongly believe a "proper" simulation is
impossible because we don't even know the bottom of the barrel, so the best
we can do is define a building block and how it behaves and watch it grow
from there. Although the coolest thing would be to define the quarks and the
four forces and _then_ let it organically grow into some kind of matter over
time. We can also control time in a simulation, which will basically confirm
our understanding if the outcome of the simulation is similar to the real
world.

~~~
JackC
Just based on the Times article, I think the idea is, define a simulation of a
cell and subject it to a bunch (100s) of simulated experiments, based on real
experiments that have been run on real versions of the cell. If you get the
same results from your simulated cell that other researchers got from watching
real cells, then your simulation is potentially accurate enough to run _new_
experiments on that will generate useful information about the actual
organism. The fact that it predicts the outcome of previous experiments we
were interested in suggests (hopefully) that it has sufficient resolution to
predict new stuff we're interested in.

So in this case, it sounds like they're simulating the interaction of genes
and molecules, since they think that's sufficient to model cell behavior
(and/or it's the best we can do). But it doesn't really matter what technical
level of detail they went to -- the only useful definition of a "proper"
simulation is whether it behaves the same as the real thing _in the context
you care about_. For example, this simulation would be totally insufficient if
I wanted to model a hydrogen bomb -- but totally excessive if I wanted to
model gravitational forces on independent objects in space. If it's good
enough to tell us _anything_ new about the actual cell, that'll be pretty
cool.
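The validation loop described above, reduced to a sketch (the experiments and outcomes here are made up for illustration):

```python
def run_simulated_experiment(model: dict, experiment: str) -> str:
    # Stand-in for actually running the whole-cell model; here the model
    # is just a table of predicted outcomes.
    return model[experiment]

# Outcomes published from real experiments on the real organism.
published = {
    "knock_out_gene_A": "nonviable",
    "limit_glucose": "slow_growth",
    "double_nutrients": "faster_division",
}

# The simulated cell's predictions for the same experiments.
model_predictions = {
    "knock_out_gene_A": "nonviable",
    "limit_glucose": "slow_growth",
    "double_nutrients": "faster_division",
}

agreements = sum(
    run_simulated_experiment(model_predictions, exp) == outcome
    for exp, outcome in published.items()
)
print(f"{agreements}/{len(published)} published results reproduced")
```

Scale the table up to the ~900 papers mentioned in the article and you have the basic shape of the validation: agreement on known results is what licenses trusting the model on new ones.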

------
SoftwareMaven
I would love to see this software on Github and to watch the organism "evolve"
in different forks into different virtual organisms. A very meta-biological-
software thing.

~~~
nagrom
Well, the comment above yours says that the whole thing is available:
<http://news.ycombinator.com/item?id=4272721>. There's nothing to stop you
from creating a GitHub project with all that information yourself.

In my experience, physics researchers are not familiar with GitHub or similar
OSS hosting sites. The omission of a GitHub project is likely due to
ignorance or apathy rather than disapproval. I'd be happy (delighted!) for
you to take my own physics simulations and play with them, for example.

------
mtinkerhess
_In designing their model, the scientists chose an approach called object-
oriented programming, which parallels the design of modern software systems.
Software designers organize their programs in modules, which communicate with
one another by passing data and instructions back and forth.

Similarly, the simulated bacterium is a series of modules that mimic the
various functions of the cell._

Pretty cool that a relatively mass-market outlet like the Times thought it was
worth mentioning OOP. Even as a software developer, it's a stretch for me to
visualize what it means to simulate an organism; what a difficult job for the
Times to distill it down into a couple of paragraphs for a lay audience.

------
paulsutter
“The model presented by the authors ... should be commended for its audacity
alone"

Audacity, yes, we do need more of that. pg says that tenacity is the single
most important trait of an entrepreneur. Maybe audacity is a multiplier that
brings us the big innovations.

~~~
dllthomas
audacity + tenacity => high variance

------
lectrick
It looks like they have modeled the subprocesses of the cell and run a
simulation of an assembly of those, instead of actually running (for example)
a physics simulation of the entire organism at the atomic level. FYI

~~~
reitzensteinm
There are 7 * 10^9 carbon atoms in _E. coli_ [1], so we're probably a decade
or two away from being able to do any kind of simulation at the atomic level.

Though it's probably getting close to the point where getting the right
algorithms is the bottleneck; if you could somehow bring the techniques and
optimizations we'll inevitably learn over the next 100 years back to today, a
half decent simulation would probably already be possible on a cluster of
commodity hardware.

Unless the techniques revolve around a new method of computation, of course -
maybe memristor logic helps with that kind of thing.

[1]
[http://bionumbers.hms.harvard.edu/bionumber.aspx?s=y&id=...](http://bionumbers.hms.harvard.edu/bionumber.aspx?s=y&id=103010&ver=0)

------
dailo10
While reading this, the Matrix crossed my mind:

 _What if we're just living in a giant simulation? What level of computing
power would it take to run the planet Earth?_

------
usefulcat
“The major modeling insight we had a few years ago was to break up the
functionality of the cell into subgroups which we could model individually,
each with its own mathematics, and then to integrate these sub-models together
into a whole,” Dr. Covert said. “It turned out to be a very exciting idea.”

I gotta say, if I were going to be working on that code, that statement would
make me rather uncomfortable.
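Dr. Covert's description suggests an integration loop along these lines (a heavily simplified sketch; the sub-models, numbers, and fixed-order scheme are my illustration, not the paper's actual algorithm):

```python
# Shared cell state that every sub-model reads and writes.
state = {"ATP": 1000, "protein": 50, "dna_replicated_bp": 0}

def metabolism(s):
    s["ATP"] += 100                 # stand-in for e.g. a flux-balance step

def transcription_translation(s):
    if s["ATP"] >= 60:
        s["ATP"] -= 60              # synthesis consumes energy
        s["protein"] += 1

def dna_replication(s):
    if s["ATP"] >= 30:
        s["ATP"] -= 30
        s["dna_replicated_bp"] += 500

# Each sub-model has "its own mathematics" internally; integration just
# means advancing them all against the shared state on a common timestep.
sub_models = [metabolism, transcription_translation, dna_replication]

for _second in range(10):           # ten simulated one-second steps
    for sub_model in sub_models:
        sub_model(state)

print(state)  # {'ATP': 1100, 'protein': 60, 'dna_replicated_bp': 5000}
```

The discomfort above is understandable: the coupling through shared state is exactly where such designs get hard to reason about.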

------
technotony
This is a great example of a field that will be massively impacted by Moore's
law. In 10 years' time you will have the equivalent processing power in just
one computer, meaning hobbyists are going to be able to run these kinds of
experiments. Then things are going to get really exciting!!

------
sgoranson
So if we ran this simulation for 4 billion years would it pass the Turing
test?

------
praptak
There is a patent on that: <http://www.google.com/patents/US5621671> "Digital
simulation of organismal growth." Draw your own conclusions.

~~~
nosse
It's only for a _Drosophila_ embryo. <https://en.wikipedia.org/wiki/Drosophilia>

Owned by U.S. Navy, issued 1987 so it's free now.

------
chatmasta
Who else thought this said orgasm?

~~~
daveman
Women have been able to simulate those for a while now.

------
dedward
Anyone else recall a Greg Egan story along these lines? Can't recall the name.

~~~
RobertKohr
Yep, _Permutation City_ \- the human brain was scanned at super-high
resolution (able to snapshot the current state of all cells). It could then
be run in a simulated world with simulated sensory inputs, and of course
multiple copies could be run.

I am really just scratching the surface of what was covered, but it was a
great book and I would recommend it for anyone interested in simulated life.

------
martindale
Does it emulate mitosis? Can we artificially emulate mutations?

------
ef4
Basic biology research is really exploding. Very exciting stuff.

------
tpowell
We need Digg back for comments on stories like this...

