
Craig Venter: 'We Have Learned Nothing from the Genome' - tokenadult
http://www.spiegel.de/international/world/0,1518,709174,00.html
======
michaelkeenan
I was curious about the disagreement over his predictions from 1998, so I
looked it up. I don't think it reflects well on him.

Spiegel interview 2010:

 _SPIEGEL:_ The genome project has been called the Manhattan Project or Moon
Landing of its era. It has also been said that knowledge of the genes will
change the future of humanity and become a "main driver of the world economy."

 _Venter:_ Who said that? I didn't. That was the people at the consortium.

 _SPIEGEL:_ You're wrong. You made all those statements in an interview with
DER SPIEGEL in 1998.

 _Venter:_ Really? Those are Francis Collins' lines. So I may have said that
that's how he describes it. I, on the other hand, have always said, "This is a
race from the starting line to the finish."

Spiegel interview 1998 (translated to German by Spiegel and back to English
with Google Translate[1]):

 _SPIEGEL:_ What will knowledge of the genetic material bring?

 _Venter:_ The end of ignorance, a completely new understanding of the human
body, and a revolution in medicine. What we are planning is on the order of
the Manhattan Project or the moon flights. The decoding of the genome will
change humanity's self-image.

[1]
[http://translate.googleusercontent.com/translate_c?hl=en&...](http://translate.googleusercontent.com/translate_c?hl=en&ie=UTF-8&sl=auto&tl=en&u=http://www.spiegel.de/spiegel/print/d-7972392.html&prev=_t&rurl=translate.google.com&twu=1&usg=ALkJrhhtyOWCfoxOrojkUDuXwu8uuAG5-g)

~~~
rwmj
According to this excellent book I read:

[http://www.amazon.co.uk/Backroom-Boys-Secret-Return-
British/...](http://www.amazon.co.uk/Backroom-Boys-Secret-Return-
British/dp/0571214967)

Venter also wanted to patent the whole human genome once he'd sequenced it. It
was only by a huge and expensive effort by the Wellcome charity that the
genome was sequenced and made freely available. If you enjoy reading
..AGAAACCATCAGCACA.. then:

<http://www.gutenberg.org/etext/3501>

This is also discussed in the Wikipedia page:

<http://en.wikipedia.org/wiki/Human_Genome_Project>

Anyway, the book I linked to is great fun.

~~~
ben1040
_Venter also wanted to patent the whole human genome once he'd sequenced it.
It was only by a huge and expensive effort by the Wellcome charity that the
genome was sequenced and made freely available_

Not only Wellcome (although their Sanger Centre was a major player), but a
whole worldwide bucket of public money.

It also bears mentioning that on March 14, 2000 Bill Clinton and Tony
Blair issued a joint statement saying that the results of genomic sequencing
shouldn't be patented. The biotech sector lost $40bn in value as a result[1],
as it became evident the patent business model was dead.

[1][http://books.google.com/books?id=FOk3LE70l1EC&lpg=PA284&...](http://books.google.com/books?id=FOk3LE70l1EC&lpg=PA284&ots=CvGQ1E6YuC&dq=celera%20stock%20crash%20march%2014%202000&pg=PA284#v=onepage&q=celera%20stock%20crash%20march%2014%202000&f=false)

------
GiraffeNecktie
As I pointed out the first time this article was posted
(<http://news.ycombinator.com/item?id=1559056>), the article title is a
truncated and distorted version of the actual quote: "We have learned nothing
from the genome other than probabilities."

------
carbocation
I would not have expected such a misunderstanding of the fruits of the
knowledge of the human reference sequence from J Craig Venter. Frankly, it
sounds like sour grapes. For some reason, I feel inclined to try to explain
some of the technical reasons why the public project's approach was necessary
for the completion of the human genome.

There is broad agreement (perhaps "universal - 1") that Venter's sequencing
approach would never have been able to achieve the level of completeness that
we have now thanks to the Human Genome Project. This is because of duplicated
DNA in the human genome combined with the short read lengths his shotgun
approach relied on.

Let's try this with colors. Say you have a terrible sequencer that only reads
7bp at a time, max. You come across the following 2 strands of DNA:

GREEN _gattaca_ GREEN

RED _gattaca_ RED

They're on 2 entirely different chromosomes, but you don't know this since
there exists no reference sequence - you're mapping _de novo_ , remember?
Good. So you get readouts like "GREEN _gattac_ " and " _gattaca_ " and "
_attaca_ RED" - but you don't have any reads >7bp, so you never see what is on
_both_ sides of gattaca at once.

Thus, since you can never span gattaca, you will never know if your DNA in
this region should be GREEN _gattaca_ RED + RED _gattaca_ GREEN, or RED
_gattaca_ RED + GREEN _gattaca_ GREEN. For _de novo_ assembly with short
reads, this type of shortcoming is deadly.
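
The ambiguity can be sketched in a few lines of Python (a toy illustration, not a real assembler; the "colors", the 7bp limit, and the read-generation function are all made up to match the example above):

```python
# Toy illustration of the repeat problem: with error-free reads no longer
# than the repeat, two different genomes yield exactly the same set of
# reads, so no assembler could possibly tell them apart.

def shotgun_reads(seq, k):
    """Every k-length substring of a sequence (idealized short reads)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

REPEAT = "gattaca"
K = 7  # our terrible sequencer's maximum read length

# Two candidate genomes that differ only in how the flanking "colors"
# pair up across the repeat.
genome_a = ["GREEN" + REPEAT + "GREEN", "RED" + REPEAT + "RED"]
genome_b = ["GREEN" + REPEAT + "RED", "RED" + REPEAT + "GREEN"]

reads_a = sorted(r for chrom in genome_a for r in shotgun_reads(chrom, K))
reads_b = sorted(r for chrom in genome_b for r in shotgun_reads(chrom, K))

print(reads_a == reads_b)  # True: the read sets are indistinguishable
```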

Because the public consortium / Human Genome Project used much longer read
lengths and a method involving bacterial artificial chromosomes, they were
able to map the more difficult regions of the genome having repeats.

 _Edit_ I should have addressed the public consortium's approach more
thoroughly, though to be clear I am less familiar with the approach and have
never used it myself. Essentially, you chop up the 3 gigabase genome into a
bunch of 150 kilobase segments (in this case, bacterial artificial
chromosomes). You can map these back to their source chromosomes, of which we
already knew we had 23 pairs. You then chop these BACs up into smaller pieces and
sequence. In the worst case, you might be unsure of where some of your reads
from each BAC map within the 150kb window that they come from, but hey, at
least you know which chromosome and more or less which portion of the
chromosome they come from. In other words, in the situation described above,
the RED and the GREEN never co-occur, since they are on different BACs and are
not sequenced together. Thus you know that RED/RED and GREEN/GREEN are the
right pairings (and you know which belongs on which chromosome).
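
A toy continuation of the color example shows why this helps (again just a sketch of the idea, not the consortium's actual pipeline; each short string stands in for a whole ~150kb BAC):

```python
# Sketch of the hierarchical (BAC-based) fix: because each BAC is
# sequenced in its own pot, reads from the two repeat copies are never
# mixed, and the ambiguity disappears. Toy scale, with 7bp reads.

def shotgun_reads(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

REPEAT, K = "gattaca", 7

# Whole-genome shotgun: reads from both chromosomes land in one pot.
wgs_a = sorted(r for c in ("GREEN" + REPEAT + "GREEN", "RED" + REPEAT + "RED")
               for r in shotgun_reads(c, K))
wgs_b = sorted(r for c in ("GREEN" + REPEAT + "RED", "RED" + REPEAT + "GREEN")
               for r in shotgun_reads(c, K))
print(wgs_a == wgs_b)  # True: one pot, indistinguishable

# Hierarchical: each "BAC" is sequenced separately, one pot per BAC.
bacs_a = [sorted(shotgun_reads(c, K))
          for c in ("GREEN" + REPEAT + "GREEN", "RED" + REPEAT + "RED")]
bacs_b = [sorted(shotgun_reads(c, K))
          for c in ("GREEN" + REPEAT + "RED", "RED" + REPEAT + "GREEN")]
print(bacs_a == bacs_b)  # False: the per-BAC read sets now differ
```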

~~~
dasht
carbocation:

Can you say a word or two about the current view of short reads for re-
sequencing? I worked for a time on software for a particular lab to do
alignment of short(-ish) reads like 13-bp+gap-of-100-200+7bp+gap-
of-5-8+7bp+gap-of-100-200+13-bp. The idea was that you give me millions or
more of those reads, and a reference like the human genome, and I give you
back all of the alignments including partial matches (a few bp off, here and
there).

Interesting programming problem. These aren't the actual bp counts and gap
sizes I was working with - they were in that range. I got concerned that the
reads we were working with were too ambiguous -- similar to the problem you
describe with de novo. I was having trouble doing the combinatorics to prove
myself wrong and started asking: hey, where's the mathematical analysis that
shows reads of this size can possibly work? Nobody could produce it and this
led to some internal strife.

That was a few years back. What's current thinking on what kind of reads
plausibly work for re-sequencing? (If you happen to know. And where can I read
something about the math that proves it.)

~~~
aheilbut
It's more an empirical question than a theoretical one, and short-read
sequencing wouldn't be as popular as it is if it didn't work out most of the
time.

I don't have all the exact numbers handy, but 76bp will uniquely map about
80-90% of potential reads, just with single ends. A high proportion of reads
can be mapped uniquely even down to 25 or 30bp. There are some 'mappability'
tracks available in the UCSC browser if you're curious about a particular
region.

[http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgE...](http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeMapability/)
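
For intuition, here's roughly what a mappability calculation looks like at toy scale (my own sketch, not the actual UCSC pipeline; the reference string is made up):

```python
# Toy "mappability" calculation: for read length k, what fraction of
# positions in a reference yield a read that occurs exactly once in the
# whole reference? Longer reads span repeats, so mappability rises with k.
from collections import Counter

def mappability(reference, k):
    kmers = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    counts = Counter(kmers)
    unique = sum(1 for km in kmers if counts[km] == 1)
    return unique / len(kmers)

# A made-up reference whose first 8 bases repeat at the end.
ref = "ACGTACGTTTGCAACGTACGT"
for k in (4, 8, 12):
    print(k, round(mappability(ref, k), 2))
# 4 0.44   (short reads keep landing in the repeat)
# 8 0.86
# 12 1.0   (long enough to span it)
```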

Of course there are repetitive and low complexity regions that can't be
sequenced at all with these read lengths, or which haven't yet been properly
sequenced or assembled at all even with much longer reads.

I'm not aware of a paper that has fully explicated how read-length and the
structure of the reads affects mappability (especially with pairs or reads
with multiple gaps), though it's certainly something that all the people
building the instruments (and the mapping algorithms) have thought about
quite a lot.

------
igravious
Oh how I would love to see both Craig Venter and Stephen Wolfram together, in
the flesh, in the same room, at the same time.

I would love to see whether it's possible for both their egos to occupy the
same portion of space-time for longer than 20 attoseconds.

~~~
jacquesm
I think you've just come up with a new kind of high-explosive.

------
adulau
An interesting point of view, especially since it somewhat opposes the
perspective of Sergey Brin described in this article:
[http://www.wired.com/magazine/2010/06/ff_sergeys_search/all/...](http://www.wired.com/magazine/2010/06/ff_sergeys_search/all/1)
where Brin knows he carries a mutation in a specific gene (LRRK2) that has
been associated with a higher rate of Parkinson's disease. But the conclusion
looks the same: we still don't understand well how the genome interacts with
our actual biology, and we need more research, especially in data analysis.

Looking at it from a programming perspective, it seems we have a large pile
of "source code" and we are still evaluating individual functions (or even
variable names) without grasping the overall mode of operation. When we do
binary reverse engineering, we always look for an entry point where the
program starts executing. It seems that for the genome there are many entry
points...

~~~
phaedrus
Like a program binary, which has been developed by trial and error without a
design, which uses overlapping instructions that mean different things
depending on the entry point, and which makes extensive use of self modifying
code (via gene expression). And which goes through three phases of binary to
binary translation: DNA->RNA->Proteins.

~~~
mbreese
And that's just the simple stuff. Now imagine silencing a lot of the binary
code and only running small parts of it - and you may not know which parts. I
think this is one of the big things to come out of the human genome
project: exactly how little is there. And since there is so little code to
work with, there is a bigger need for epigenetic regulation and alternative
splicing.

------
jacquesm
I'm equal parts impressed with, annoyed by and intrigued by Craig Venter and
his undertakings.

Impressed because of the relentless energy he pours in to his ventures and
their success rate, annoyed because of his tendency to take credit and self-
aggrandise at every opportunity and intrigued because I wonder where it will
all lead.

------
checker659
Damn, I missed this article while it was higher up on the page. Not sure if
there are enough people around but I'll ask my question nonetheless.

"""Venter: Well, the goal is multifold. We have to start by creating minimal
cells. A human cell is too complex -- we have no idea how any human cell
works. We don't even know how the simplest bacterial cell works."""

Do we have no idea how a cell works? I know we understand the anatomy, but I
had no idea we hadn't moved beyond that. Is this some kind of satire/joke, or
is this really where the state of the science is?

~~~
danielford
I think what he's getting at is this: the genome of E. coli was sequenced in
the nineties, and it's one of the most common model systems used in research,
but we still don't know the functions of a lot of its genes. Last time I
heard anything about this, we had no idea what twenty percent of its genes
did. That number may be off by ten to fifteen percent either way, since I
haven't read anything on the topic in years, and it depends on the parameters
you set for how sure you are about a gene's function.

Keep in mind that this doesn't mean we're completely clueless about how the
cell works, just that there's still a lot of work to do.

