
In Newly Created Life-Form, a Major Mystery - DiabloD3
https://www.quantamagazine.org/20160324-in-newly-created-life-form-a-major-mystery/
======
davnn
TL;DR: Venter and his collaborators originally set out to design a stripped-
down genome based on what scientists knew about biology. ... With the right
tools finally in hand, the researchers designed a set of genetic blueprints
for their minimal cell and then tried to build them. Yet “not one design
worked," ... So the team took a different and more labor-intensive tack,
replacing the design approach with trial and error. They disrupted M.
mycoides’ genes, determining which were essential for the bacteria to survive.
... Venter is careful to avoid calling syn3.0 a universal minimal cell. If he
had done the same set of experiments with a different microbe, he points out,
he would have ended up with a different set of genes. ... In fact, there’s no
single set of genes that all living things need in order to exist. ... They
found that not a single gene is shared across all of life. “There are
different ways to have a core set of instructions,” ... Venter’s minimal cell
is a product not just of its environment, but of the entirety of the history
of life on Earth. ... He and others are trying to make more basic life-forms
that are representative of these earlier stages of evolution. ... Some
scientists say that this type of bottom-up approach is necessary in order to
truly understand life’s essence. “If we are ever to understand even the
simplest living organism, we have to be able to design and synthesize one from
scratch,” ... “We are still far from this goal.”

As @sixQuarks has already written, finding the minimal number of genes when
there are 175 unknown ones and you don't know anything about their
dependencies and relationships seems to be pretty much impossible.

> In fact, there’s no single set of genes that all living things need in order
> to exist. ... They found that not a single gene is shared across all of
> life.

That's the most interesting point to me; I deeply believed that organisms
share the same basic set of genes.

~~~
dnautics
I worked in the synthetic biology lab (on a different project) while the early
stages of Syn3.0 were being done, and have also paid several visits to chat
with them since then. [Proof: first author on
[http://dx.doi.org/10.3390/ijms16012020](http://dx.doi.org/10.3390/ijms16012020)]
I was a fly on the wall for most of the group meetings and even contributed
some unpublished results (there were transposons that were causing problems by
shuffling the DNA in the yeast and I hypothesized the orientation that
causes the issue and discovered the yeast genes responsible for this process)

Firstly what constitutes "minimal" depends on what you feed the organism.
There's a set of bacteria called phytoplasma which are plant parasites that
are missing genes to make nucleotides (even mycoplasmas have those) and
they've adapted by sucking nucleotides from their hosts.

Some insider information: Most of the mystery essential genes are vague cell
wall proteins. Probably what is going on is that if you knock out too many of
these genes, you lose cell wall turgidity and the cell becomes nonviable. So
what is important is not so much which of these genes you have, but how many
of them you have.

Second insider information, for fun: The "hypothetical minimal genome" (syn2.0
is referred to as HMG in the paper) was actually a backronym because we called
it the "hail mary genome" but decided that was inappropriate for publication.

~~~
whoopdedo
> Most of the mystery essential genes are vague cell wall proteins. Probably
> what is going on is that if you knock out too many of these genes, you lose
> cell wall turgidity and the cell becomes nonviable. So what is important is
> not so much which of these genes you have, but how many of them you have.

Sounds like dependency hell. Gene A only works in the presence of Gene B which
only works when there's Gene C which needs Gene A. Knock any one of them out
and everything breaks.

~~~
rosser
> _Sounds like dependency hell._

Random mutations are like that. It's almost like no-one was planning out how
these things should work, and whatever did work stuck.

Weird, huh?

~~~
tadfisher
In other words, God writes spaghetti code.

~~~
Eerie
Praise his noodly appendage!

------
no_flags
Interesting work, but it reminds me of the famous "Could a biologist fix a
radio?" paper [1]. The paper imagines biologists trying to determine what
parts of a radio are essential using a similar trial and error technique. As
you can imagine, this technique would lead to many erroneous conclusions,
especially when paired with the "publish or perish" and "gold rush"
mentalities so prevalent in academia. It makes you wonder if we are trying to
understand biological systems with a fundamentally wrong approach.

[1]
[https://www.cmu.edu/biolphys/deserno/pdf/can_a_biologist_fix...](https://www.cmu.edu/biolphys/deserno/pdf/can_a_biologist_fix_a_radio.pdf)

~~~
roadnottaken
> It makes you wonder if we are trying to understand biological systems with a
> fundamentally wrong approach.

Got a better idea?

~~~
no_flags
The last couple pages of the aforementioned paper give some thoughts on that.
The author suggests an approach more like engineering that is focused on
formalized mathematical descriptions of biological systems.

~~~
scott_s
The submitted article talks about attempts to _design_ a minimal set of genes,
but they kept failing. The difficulty with your comment is it sounds like
you're dismissing the work; I'm sure they are aware of that paper and its
lessons. They tried working in that direction and failed.

------
sixQuarks
I'm afraid it's not the individual genes themselves that are important to
life, but a specific combination of those genes. Further complicating the
problem is that we have no idea how many genes comprise an "essential group".
Is it a combination of 2 genes? 3, 5, 10?

When you're talking about 175 unknown genes, the combination of all of these
is a huge number. It's like finding a needle in a haystack the size of the
solar system.

I don't think this brute force approach is going to work, we need a different
way to figure this out, but I'm confident that once figured out, it will seem
simple looking back on it.

~~~
smaili
> I don't think this brute force approach is going to work, we need a
> different way to figure this out, but I'm confident that once figured out,
> it will seem simple looking back on it.

Why not? Couldn't we use machines to automate this and analyze the results?

~~~
alister
They got it down to 473 genes, but let's say that minimal life is some 20 of
those genes. A brute force approach would require trying 10^34
combinations[1]. And that's already with the simplifying assumption that 20
genes is the minimal set rather than 19 or 106 or some other subset.

[1]
[http://www.wolframalpha.com/input/?i=473+choose+20](http://www.wolframalpha.com/input/?i=473+choose+20)
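
For anyone who wants to check that figure without Wolfram Alpha, here is a
quick sketch in Python. The 473 and the 20 come from the comment above; the
second count (every possible subset, with no fixed size) is an added
illustration, not something from the article.

```python
import math

# Number of ways to pick a 20-gene subset out of 473 genes.
subsets_of_20 = math.comb(473, 20)
print(f"{subsets_of_20:.2e}")

# Dropping the "exactly 20" assumption: every possible subset of 473 genes.
all_subsets = 2 ** 473
print(f"{all_subsets:.2e}")
```

The first number lands between 10^34 and 10^35, in line with the estimate
above. The second is vastly larger, which is the point of the "19 or 106 or
some other subset" remark.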

------
daemonk
I wonder how they removed the genes? I am not an expert in microbiology. I
work with mainly eukaryotes, but I am guessing intergenic distances are not a
huge factor here? What about spatial arrangement of the genes? Did they
generally leave that alone?

Genes are not independent functional modules. Their placement and arrangement
on the genome matters. Did they only mess with coding features (genes)? Or did
they also mess with other genomic features? Or do such things just not matter
with bacteria?

~~~
maxerickson
If I understand correctly, they built the technology to synthesize arbitrary
genomes. So they remove genes by omitting them from the synthesis.

~~~
chubot
The article says they tried the additive/synthesis approach first, which
didn't work. And then they tried the subtractive approach, starting with M.
mycoides. And in that case they were left with a lot of "unexplained" genes.

But he reminds us that the subtractive process entirely depends on the
starting cell.

I would like to hear more about the difference between additive and
subtractive methods -- it wasn't entirely clear from the article.

~~~
alextheparrot
Additive method: Synthesize genomes with the genes you expect to be necessary,
pretty much an optimization problem on the number of genes where you
continually try to remove genes that were in previous iterations.

Subtraction method: After reading a bit of the paper, they utilized Tn5, which
causes a DNA sequence to be arbitrarily inserted into the DNA (Random
locations). This randomly disrupts genes, causing them to likely not function
correctly.

Here's the logic: If all genes were necessary we would expect no cells to live
that had mutations.

If no genes were necessary we would expect all genes to have mutations at
about an equal rate relative to the space they occupy. (Assuming no bias by
the Tn5, but that's a nuance)

What they instead found was that some cells grew, but there were certain genes
that were not mutated, meaning that they are likely necessary.
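
The logic above can be sketched as a toy simulation. This is not the paper's
pipeline: the gene names, the 100 bp gene lengths, the one-insertion-per-cell
rule and the choice of which genes are "truly" essential are all assumptions
made up for illustration.

```python
import random

def simulate_insertions(gene_bounds, essential, n_cells, rng):
    """Return the insertion sites found in surviving cells.

    gene_bounds: {name: (start, end)} half-open intervals on a toy genome
    essential:   set of gene names whose disruption kills the cell
    """
    genome_len = max(end for _, end in gene_bounds.values())
    surviving_hits = []
    for _ in range(n_cells):
        site = rng.randrange(genome_len)  # one random insertion per cell
        hit = next((g for g, (s, e) in gene_bounds.items() if s <= site < e), None)
        if hit in essential:
            continue  # the cell dies, so this insertion is never observed
        surviving_hits.append(site)
    return surviving_hits

def genes_without_hits(gene_bounds, hits):
    """Genes in which no surviving cell carries an insertion."""
    return {
        g for g, (s, e) in gene_bounds.items()
        if not any(s <= h < e for h in hits)
    }

# Toy genome: four hypothetical genes of 100 bp each, no intergenic space.
genes = {"geneA": (0, 100), "geneB": (100, 200), "geneC": (200, 300), "geneD": (300, 400)}
truly_essential = {"geneB", "geneD"}

hits = simulate_insertions(genes, truly_essential, n_cells=2000, rng=random.Random(0))
candidates = genes_without_hits(genes, hits)
print(sorted(candidates))
```

With enough cells, the genes that never show an insertion among survivors are
exactly the ones marked essential. With too few cells, a non-essential gene
can also come up empty by chance, which is one reason coverage matters.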

They also did a few other things, such as studying the growth of the cell
(slow-growing but still viable was classified as "quasi-essential") and
implanting their minimal genome in another species, which failed, giving
evidence that a subset of genes sufficient for survival in one cell is not
necessarily sufficient in all cells.

------
kazinator
That's like forking someone's program, but not understanding two thirds of the
code. (What's the big deal? It happens!)

If you take someone's 4000-5000 line program and whittle it down to 473 lines
which are still somehow useful, "newly created" doesn't apply in full honesty,
let alone if you don't know what a third of those lines do.

~~~
fnovd
In this analogy, they're not looking for the code to "work". They simply want
it to compile.

~~~
majkinetor
That is the wrong analogy. Compiling the code doesn't mean it will run.

~~~
oldmanjay
Maybe life is written in Haskell

------
darawk
This is great. This is the proper engineering approach to understanding life.
Whittle it down to the minimal reproducing test-case, and poke it with a stick
until you understand what all the parts do.

This is just excellent science. It seems like it should be very easy to get
these unknown genes to reveal their function now. Very exciting times.

------
sevenless
I find it fascinating we still don't really know how life works. I have a
hunch we are going to find the large scale 3D structure of the chromosome is a
big deal, and these genes regulate it. There aren't many good tools to study
chromosome structure and it's quite possible there's a whole layer of
information we've missed so far.

~~~
roye
Actually there are new approaches to study exactly that, and they are
developing rapidly, cf:
[https://en.wikipedia.org/wiki/Chromosome_conformation_captur...](https://en.wikipedia.org/wiki/Chromosome_conformation_capture)
Basically, this problem, along with a long list of other applications is being
attacked by deep sequencing. The way it works is that you apply a treatment to
DNA that 'glues' 3D contacts in place and covers the glued segment, apply
restriction enzymes to cut out what's not covered by glue, get rid of the
glue, sequence all that's left, and then map the remaining sequenced fragments
to the reference genome. The output is the relative tendency of different
regions to come into contact with each other.

~~~
sevenless
> The 3-D organization of the genome can also be analyzed via
> eigendecomposition of the contact matrix.

Love it when linear algebra pops up in unexpected places!
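
A minimal sketch of the idea, using a made-up 6x6 contact matrix and plain
power iteration instead of a full eigendecomposition. Real analyses normalise
the counts first (for example for genomic distance), which is skipped here;
the "A"/"B" labels and all numbers are assumptions for illustration.

```python
import math

def leading_eigenvector(matrix, iterations=50):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    n = len(matrix)
    v = [1.0] + [0.0] * (n - 1)  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy contact matrix (made-up counts): bins 0-2 touch each other often,
# bins 3-5 touch each other often, and the two groups rarely meet.
contacts = [[10 if (i < 3) == (j < 3) else 2 for j in range(6)] for i in range(6)]

# Centre the matrix so the overall contact level does not dominate.
mean = sum(map(sum, contacts)) / 36
centred = [[c - mean for c in row] for row in contacts]

vec = leading_eigenvector(centred)
compartments = ["A" if x > 0 else "B" for x in vec]
print(compartments)
```

The sign of each entry in the leading eigenvector splits the bins into the two
groups that preferentially contact each other.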

------
fauigerzigerk
Forgive my complete ignorance, but how do they even count genes? Given a very
long string of base pairs, how is it possible to know where one gene ends and
the next one begins? Do genes overlap?

~~~
ceejayoz
[https://en.wikipedia.org/wiki/Start_codon](https://en.wikipedia.org/wiki/Start_codon)
and
[https://en.wikipedia.org/wiki/Stop_codon](https://en.wikipedia.org/wiki/Stop_codon)
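
To make that concrete, here is a naive open-reading-frame scan. It is only a
sketch: it looks at the forward strand, uses the standard start and stop
codons, and ignores everything a real gene finder considers (the reverse
strand, alternative start codons, promoters, overlapping genes). Mycoplasmas
also read TGA as tryptophan rather than stop, so for them `STOPS` would need
adjusting. The function name and the `min_codons` cutoff are made up.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Yield (start, end) of open reading frames on the forward strand.

    An ORF here is ATG, then whole codons, up to and including the first
    in-frame stop codon. `end` is exclusive.
    """
    dna = dna.upper()
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == START:
                # Walk codon by codon until an in-frame stop appears.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOPS:
                        if (j + 3 - i) // 3 >= min_codons:
                            yield (i, j + 3)
                        i = j  # resume scanning after this ORF
                        break
            i += 3

print(list(find_orfs("CCATGAAATTTTAGGG")))  # [(2, 14)]
```

Counting genes then amounts to counting the ORFs that survive further
filtering, and yes, ORFs in different frames or on opposite strands can
overlap.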

------
haberman
How much information is 473 genes? How many bytes does this represent, in a
compact but "raw" encoding? What about if you compressed it with gzip?

Also what are the raw/compressed sizes for a human genome?

I have wondered this for a long time but never seem to find a concrete answer.

~~~
gberger
From [1], "Mycoplasma genitalium: it has only about 480 genes in its genome of
580 070 nucleotide pairs." So on average, each gene is about 1200 nucleotide pairs.

A nucleotide is one of [A,T,C,G]. So that means it encodes 2 bits of
information.

473 * 1200 * 2 = 1 135 200 bits, or 141.9 kilobytes.

Of course, this is coming from a software developer using numbers from Google,
so I might be wildly wrong.

[1] "Molecular biology of the cell" by Bruce Alberts
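
The same arithmetic as a runnable sketch, plus a rough look at the gzip
question. The sequence here is uniformly random bases, not a real genome, so
the compressed size only shows what deflate does when there is no structure
to exploit. The helper name `pack_2bit` is made up.

```python
import random
import zlib

BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}  # 2 bits per base

def pack_2bit(seq):
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # pad a short final chunk with zeros
        out.append(byte)
    return bytes(out)

# Figures from the comment above: 473 genes at roughly 1200 base pairs each.
n_bases = 473 * 1200
print(n_bases * 2 / 8, "bytes at 2 bits per base")

# Stand-in sequence: uniformly random bases, NOT a real genome.
rng = random.Random(0)
seq = "".join(rng.choice(BASES) for _ in range(n_bases))

packed = pack_2bit(seq)
print(len(packed), "bytes packed")
print(len(zlib.compress(seq.encode("ascii"), 9)), "bytes as deflate-compressed text")
```

Random bases carry a full 2 bits each, so a general-purpose compressor cannot
beat the packed size on this input. Real genomes have repeats and skewed base
composition, so they should compress somewhat better, but I have no measured
numbers for that. By the same 2-bits-per-base arithmetic, a human genome of
roughly 3.1 billion base pairs comes to roughly 775 MB.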

~~~
haberman
That's a great start at least! Posting something that is wildly wrong is the
best way to get someone who knows more to interject with corrected
information, so hopefully someone with expertise can confirm or deny.

~~~
gberger
Cunningham's Law

------
meursault334
Article summary and Article
[http://www.cba.mit.edu/docs/papers/16.04.minimal.pdf](http://www.cba.mit.edu/docs/papers/16.04.minimal.pdf)

edit: appears to be the full article (scroll down past summary)

~~~
kens
I encourage people to read the one-page summary and at least the section
headings in the full article.

To answer a few questions that have come up:

How did they synthesize the genome? By creating short sequences of DNA and
joining them together step by step to make larger sequences. They chemically
synthesized short DNA sequences and assembled them into 1.4 thousand base pair
(kbp) fragments. Five fragments were assembled into 7 kbp cassettes, which
were assembled in yeast to generate 1/8 chunks of the genome, and these chunks
were assembled in yeast to create the full genome. (See figure 2.) To try out
deletions, they could replace a 1/8 chunk, rather than synthesizing the whole
genome.

Why can't they delete all the non-essential genes? Bacteria have a lot of
redundant genes, where two different genes provide the same essential
function. Just like redundant disks, you can remove one, but not all of them.
And you'll get different minimal genomes depending on which gene you keep.
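
A toy sketch of that last point: greedily delete genes while every required
function is still covered, and the result depends on the order of deletion.
The gene names, functions and the greedy strategy are all hypothetical, not
how the team actually proceeded.

```python
def minimise(genes, required, removal_order):
    """Greedily drop genes while every required function stays covered.

    genes:    {gene_name: set of functions it provides}
    required: set of functions the cell cannot live without
    """
    kept = dict(genes)
    for name in removal_order:
        trial = {g: f for g, f in kept.items() if g != name}
        covered = set().union(*trial.values()) if trial else set()
        if required <= covered:
            kept = trial  # still viable without this gene
    return set(kept)

# Hypothetical genes: g1 and g2 are redundant copies of the same function.
genes = {
    "g1": {"transport"},
    "g2": {"transport"},
    "g3": {"replication"},
    "g4": {"motility"},  # not required in this toy medium
}
required = {"transport", "replication"}

print(sorted(minimise(genes, required, ["g1", "g2", "g3", "g4"])))  # ['g2', 'g3']
print(sorted(minimise(genes, required, ["g2", "g1", "g3", "g4"])))  # ['g1', 'g3']
```

Both results are minimal in the sense that no further gene can be removed, yet
they are different genomes.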

One interesting thing is that because the growth medium provides almost all
the necessary nutrients, they could remove a lot of the metabolic genes, but
needed to keep a lot of genes to transport molecules across the cell membrane.
You can imagine a minimal cell constructed the opposite way.

The article doesn't mention that in their first synthetic bacterium (2010),
they encoded text (actually HTML!) in the DNA to provide a secret watermark.
See [http://www.righto.com/2010/06/using-arc-to-decode-venters-se...](http://www.righto.com/2010/06/using-arc-to-decode-venters-secret-dna.html)

------
misiti3780
Has anyone else here read Nick Lane's new book - The Vital Question - I just
finished it and this article really seems like a continuation of some of his
theories he talks about.

I found out about the book from HN, and have since bought every single other book
by him, almost done with Life Ascending now, which is also amazing

~~~
kens
Yes. I read The Vital Question after hearing about it on HN a few weeks ago
and found it very interesting (although less so nearer the end, as it seemed
to get more speculative). To oversimplify, mitochondria are the key to
understanding life and eukaryotes. (I should study electron transport more
closely.)

------
coldcode
If I were a biology/biochem/genetics/etc type of student today this is exactly
what I would love to work on. Perhaps someday we will actually understand how
life works. That's both exciting and incredibly scary.

~~~
pluma
I don't know. The definition of "life" seems so arbitrary and nebulous. It
seems really more like we're trying to attribute properties based on a wishful
interpretation of emergent behaviour that ultimately boils down to extremely
convoluted chemical reactions.

Trying to produce genuinely novel and distinct biological lifeforms sounds
cool enough on its own though.

~~~
sevenless
Yes, the boundary between life and non-life is not sharp, but to claim it
follows that life is hard to define would be the continuum fallacy
([https://en.wikipedia.org/wiki/Continuum_fallacy](https://en.wikipedia.org/wiki/Continuum_fallacy))

Manufacturing a living cell from purely chemical ingredients is a concrete
enough goal.

And almost certainly it lies far, far off in the future.

------
callesgg
If the cell had a log it would be filled to the brim with warnings and errors
:)

Think of it like: just remove files from the OS until it won't start :)

------
otto_ortega
"The function of 79 genes is a complete mystery.'We don't know what they
provide or why they are essential for life — maybe they are doing something
more subtle, something obviously not appreciated yet in biology' "

It is the activation key God puts on each living being... Ain't gonna work
without it. =P

~~~
pluma
So you're proposing the creation of the Church of Genuine Advantage?

------
drabiega
My take-away: Work like this is somewhat akin to attempting to determine the
specification of Intercal through reverse engineering given a working program
and a compiler.

~~~
dekhn
One of the things that originally attracted me to biology was the idea that it
was like reverse engineering a computer system with no manual.

After a while that got old.

------
phieromnimon
So it's like a unikernel of life?

~~~
IncRnd
It's closer to a microkernel.

------
k26dr
Anybody else find it suspicious that 2 quanta magazine stories were the top
two stories on HN today?

------
rgtk
That should be us.

------
LionessLover
The actual study can be accessed for free on "The Pirate-Bay of Science
publications" Sci-Hub:

[http://science.sciencemag.org.sci-hub.cc/content/351/6280/aa...](http://science.sciencemag.org.sci-hub.cc/content/351/6280/aad6253)

About Sci-Hub: [https://en.wikipedia.org/wiki/Sci-Hub](https://en.wikipedia.org/wiki/Sci-Hub)

------
peter303
Four-month-old news. I will see Dr Venter in Aspen Saturday.

