

Complete Genomics To Sequence 1 Million Genomes – Interview With CEO - kkleiner
http://singularityhub.com/2010/01/26/exclusive-complete-genomics-to-sequence-1-million-genomes-interview-with-ceo/

======
msg
Somebody needs to get on top of appropriate encryption and anonymization for
this area. After all, there's nothing I want more private than my DNA sequence
(against health insurers, if no one else), and nothing that is less anonymous
than something that can identify me uniquely out of all the world.

At the same time, I want my doctors to have access for testing before they
begin a course of treatment.

I would also like to participate in studies that use my genome for others'
benefit and data mine it, but appropriate safeguards and protocols will be a
must. I want a way to share part of my sequence, to compare one sequence to
another, or to perform a test on the sequence, and be secure doing so.

~~~
abstractbill
I take a very different approach: When I have it, I'll gladly share my entire
genome with _anyone_ who wants it (after all, with the rapid decrease in
price, and the ease of collecting a sample from me without my knowledge, any
sufficiently determined person won't even have to ask my permission in a
couple of decades).

_But_, I'll always be very strongly in favor of legislation that prevents
entities like health insurance companies from using my genome against me.

~~~
msg
I don't think your genome going public is mostly in danger from nefarious
ne'er-do-wells collecting your skin.

But there will be mass data breaches from stolen laptops or social engineering
scams at these sequencing/medical record companies. Past experience tells us
that the data will not be secured properly. So if you care about it enough,
the right approach is a more federated one in which the DNA owner controls
the key, as with PGP.

~~~
abstractbill
Insurance companies have been known to hire private investigators to check up
on their customers. If it were legal and cheap enough, why wouldn't they hire
collectors of genetic material?

In any case, I agree with you - there will be mass data breaches, so I prefer
to just assume all genome data will be essentially public, and then figure out
what to do from there.

------
yannis
Great interview and optimism.

 _The $1,000 genome is indeed within sight._

Let us hope that software evolves just as fast to handle the data.

~~~
ericb
So DNA is a long string, with certain identified parts that have meaning (code
for something we know). What would a layman like myself need to get myself to
a hello world program that could take an actual DNA string (file) as input and
give a hello for the presence of an actual known gene? Not code-wise, that's
obvious. I'm asking where I could get a sample genome, information about a
gene, fill in the gaps of my understanding, etc. Anyone know?

~~~
yannis
You can start with your favorite language by downloading a library: there are
BioPerl, Biopython, BioRuby and even BioPHP! The libraries have all the
routines to connect to public sources.
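
For the hello world itself, here is a minimal sketch in plain Python with no library at all. It assumes the input is FASTA text (a `>` header line followed by sequence lines) and treats "presence of a gene" as an exact substring match, which is a big simplification: real tools also check the reverse strand and tolerate variation. The record `SAMPLE`, the motif `MOTIF` and the helper names are made up for illustration, not real genomic data or any library's API.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; drop the leading '>'.
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def find_motif(sequence, motif):
    """Return 0-based start positions of every match, overlaps included."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical toy record, NOT real genomic data.
SAMPLE = """>toy_sequence hypothetical example
ACGTACGTTAGC
TATAAAGGCTAT
"""
MOTIF = "TATAAA"  # placeholder pattern standing in for "a known gene"

for name, seq in read_fasta(SAMPLE).items():
    hits = find_motif(seq, MOTIF)
    if hits:
        print(f"hello, {MOTIF} found in {name!r} at positions {hits}")
    else:
        print(f"no {MOTIF} in {name!r}")
```

To try it on real data you would replace `SAMPLE` with the contents of a downloaded FASTA file and `MOTIF` with a sequence you actually care about.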

The National Center for Biotechnology Information
[http://www.ncbi.nlm.nih.gov/nuccore/89889045?report=fasta...](http://www.ncbi.nlm.nih.gov/nuccore/89889045?report=fasta&log$=seqview)
has an enormous database and is all open source. The link I quoted lists the
genome for a virus. I wrote a short program in JavaScript that did frequency
analysis on the pairs. It took 30ms in Chrome. It is actually not all that
difficult at a certain level of analysis.
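
A rough Python sketch of that kind of frequency analysis, not the original JavaScript: it assumes "pairs" can mean either single bases or adjacent two-letter pairs, so it counts both. The sequence `toy` and the function names are invented for illustration.

```python
from collections import Counter


def base_frequencies(sequence):
    """Count how often each single base (A, C, G, T, ...) occurs."""
    return Counter(sequence.upper())


def pair_frequencies(sequence):
    """Count every adjacent two-letter pair, using a sliding window of 2."""
    seq = sequence.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


# Hypothetical toy sequence, NOT a real viral genome.
toy = "ACGTACGT"

bases = base_frequencies(toy)
pairs = pair_frequencies(toy)
total = sum(bases.values())

for base, count in sorted(bases.items()):
    print(f"{base}: {count} ({count / total:.1%})")

for pair, count in pairs.most_common():
    print(f"{pair}: {count}")
```

Both passes are linear in the length of the sequence, which is why even a whole viral genome is quick to process this way.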

In my mind the big question is, how can we understand how it works? It is like
being given a Windows binary to reverse engineer. My own question is: does my
DNA make me? If my grandkids have it on a memory stick (it can actually fit),
would they be able to 're-construct' me in some form of virtual reality?

One potential area for hackers to get involved, in my opinion, is the Cloud.
You can imagine, say, 10 million genomes stored out there in the not so
distant future, with distributed software analyzing and indexing them, and all
the ethical and other problems that come with that.

