
Color Genomics raises $45M to provide genetic tests that detect cancer risk - sandeepc
https://techcrunch.com/2016/09/27/color-genomics-raises-45-million-to-provide-cheaper-genetic-tests-that-detect-cancer-risk/
======
niels_olson
Folks, everyone in the world who can get their hands on an illumina sequencer
is developing these "30 gene", "400 gene", "N gene" tests, liquid biopsies,
blah, blah, blah. Even the fact that they got a VC to shell out $45M is
something that happens pretty regularly now. Source: senior pathology resident in
San Diego, driving past illumina and the Craig Venter Institute every day.
Developing these tests is literally all molecular pathologists do. All day
long.

The game is to actually get a lot of patients. Memorial Sloan Kettering,
Foundation One, Broad Institute, Venter are the biggest data-gatherers I'm
aware of right now, with the DoD starting to get in the game. But who really
wins will be the platforms that do the bioinformatics analysis: Google
Genomics, illumina (basespace), etc.

And the ethics questions and "we don't know about the environment" questions
aren't going to get answered until the data is collected. Wait till the EMRs
are tied into the big data pipelines. Oh, nellie.

~~~
untilHellbanned
Yep. Academic molecular biologist here. 100% agree. For as much as this
community knows about software it knows shockingly little about the rules of
health care.

Hate to break it to everyone breathless over yet another press clipping, but
this startup is dead in the water. Another $45M down the tubes.

~~~
dmix
Are you saying the lack of access to data will be the issue? Or following
health laws?

If they get some limited access to data, develop the machine learning software
better than anyone, what's stopping one of the big pharma companies from
shelling out $1B to acquire the company? Getting access to a tech team + IP
could be very valuable. Pharma companies aren't known for their software teams
either and this seems to be the core of their business offering here. Not the
tests that everyone's already doing... I mean I just reread the article and
they aren't claiming this is their core business proposition at all (as the OP
seems to imply).

Even Illumina makes $2B in revenue currently. There's lots of money in the
pharma industry and they always converge on a few big firms. They don't need
to be original to provide real technical value here.

Unlike the pharma industry where you shield your IP with lawyers for 10 yrs,
being first means nothing in the software world. It's about who can do it
best.

The founder's comments here also clarify that they aren't going after new
interesting areas because they are focusing on commercializing something that
works right now and advancing that data science aspect of it, instead of
making a future play on some original R&D.

~~~
untilHellbanned
Fair enough. Though proving that their tech is better is definitely a data
access issue. I obviously wish them luck, but some ex-Twitter employees don't
sound like a very leveraged team to be better at health data acquisition or
data science. Unlike Twitter or Facebook, being good at science requires
acquired knowledge. So even with one of the guys having started a bio PhD, the
strength of their competitors and the carcasses of those who came before
wouldn't give me any confidence to throw any amount of money, let alone $45M,
at this thing.

------
mcarlise
This is borderline academic imperialism. The founders seem to think that
software engineering and data science applied to biological data will provide
insight that traditional biology has not found. Unnecessary screening from a
single dimension (genome) is only going to steer patients toward
opportunistic drug companies and non-FDA approved remedies.

We are still unsure of the nature/nurture problem. What if cancer development
is more attributable to the environment?

~~~
brandonb
Elad started his career with a PhD in cancer biology prior to starting the
mobile team at Google, so I don't think "academic imperialism" is an accurate
descriptor.

I think you'll see more people with these hybrid skills over time. They take
years to develop, though.

If we want advancement in medicine, can we really do it without deep
collaboration between biology and computer science? If so, isn't Color
Genomics a model of particularly deep cross-disciplinary work, not academic
imperialism?

~~~
jahewson
> If we want advancement in medicine, can we really do it without deep
> collaboration between biology and computer science?

There's an entire field for that already, it's called _bioinformatics_.

------
noname123
Here's their white-paper on their testing methodology:
[https://s3.amazonaws.com/color-static-prod/pdfs/validationWh...](https://s3.amazonaws.com/color-static-prod/pdfs/validationWhitePaper.pdf)

"Color has developed a next-generation sequencing based test for hereditary
cancer. This test analyzes 30 genes associated with increased risk to develop
breast, ovarian, colorectal, melanoma, pancreatic, prostate, stomach, and
uterine cancers... The assay has a high degree of analytical validity for the
detection of single nucleotide variants, small insertions and deletions
(indels), and larger deletions and duplications (copy number variants, or
CNVs)."

So not micro-arrays like 23andMe that test for SNPs ($99?), but not full
genome sequencing either (~$1,000?); instead, targeted sequencing of the
genomic regions of those 30 genes.

Wet Lab sequencing method: "Specifically, it includes target enrichment by
Agilent’s SureSelect method (v1.7) and sequencing by Illumina’s NextSeq 500
(paired-end 150bp, High Output kit)". Unanswered question: are they doing the
sequencing in-house or using a facility somewhere else?

Computational method: "The bioinformatics pipeline was built using well-
established algorithms such as BWA-MEM, SAMtools, Picard and GATK. CNVs are
detected using dedicated internally developed algorithms for read depth
analysis and split-read alignment detection."

So basically perform the standard genome assembly, alignment of your partial
assembly with the human reference genome, and then identify what variant of
these 30 genes the patient sample has; plus a special sauce for counting the
number of specific bp repeats due to in-del events. This is not something I
am too familiar with, but presumably the number of specific k-mer repeats you
have in these genes of interest might correlate to a specific type of cancer?
(Would love to hear the opinion of someone who is an expert in this field.)
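
For what it's worth, a generic flow built from the tools named in that quote can be sketched as below. This is not Color's actual pipeline: the file names, the read-group string and the `build_commands` helper are made up for illustration, and the GATK4-style `gatk ...` invocations are an assumption, since the quoted excerpt gives no versions or flags. For a targeted panel the reads are usually aligned straight to the reference rather than assembled first. The sketch only builds the command strings; it does not run anything.

```python
import shlex


def build_commands(ref, fq1, fq2, sample, targets_bed):
    """Build shell commands for a generic align -> dedup -> call flow.

    All paths and the sample name are hypothetical placeholders.
    """
    q = shlex.quote
    # Read group header; the literal "\t" is expanded by bwa itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    vcf = f"{sample}.vcf.gz"
    return [
        # Align paired-end reads and sort the alignments by coordinate.
        f"bwa mem -R {q(read_group)} {q(ref)} {q(fq1)} {q(fq2)}"
        f" | samtools sort -o {q(sorted_bam)} -",
        # Mark PCR/optical duplicates (Picard's tool, bundled with GATK4).
        f"gatk MarkDuplicates -I {q(sorted_bam)} -O {q(dedup_bam)} -M {q(metrics)}",
        f"samtools index {q(dedup_bam)}",
        # Call SNVs and small indels, restricted to the panel's target regions.
        f"gatk HaplotypeCaller -R {q(ref)} -I {q(dedup_bam)}"
        f" -L {q(targets_bed)} -O {q(vcf)}",
    ]


if __name__ == "__main__":
    for cmd in build_commands("ref.fa", "r1.fq", "r2.fq", "s1", "panel.bed"):
        print(cmd)
```

CNV calling is not covered here; the quote says that part uses internally developed algorithms.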

"These [30] genes are APC, ATM, BAP1, BARD1, BMPR1A, BRCA1,BRCA2, BRIP1, CDH1,
CDK4, CDKN2A (p14ARF and p16INK4a), CHEK2, EPCAM, GREM1, MITF, MLH1, MSH2,
MSH6, MUTYH, NBN, PALB2, PMS2, POLD1, POLE, PTEN, RAD51C, RAD51D, SMAD4,
STK11, and TP53". (You can follow up by searching them here, e.g.,
[http://www.genecards.org/cgi-bin/carddisp.pl?gene=BRCA2&keyw...](http://www.genecards.org/cgi-bin/carddisp.pl?gene=BRCA2&keywords=BRCA2)).

Also interesting to note, since it's clinical, each of their tests has to be
verified by a certified "genetics counselor" and also meet lots of clinical
standards.

~~~
jrm5100
_plus a special sauce for counting the number of specific bp repeats due to
in-del events. This is not something I am too familiar with, but presumably the
number of specific k-mer repeats you have in these genes of interest might
correlate to a specific type of cancer? (Would love to hear the opinion of
someone who is an expert in this field.)_

"Copy number variant" refers to larger deletions and duplications that can
occur in the genome. There isn't some specific cutoff for size, but some
examples in these kinds of genes would be an entire exon or gene. There are
countless studies that find correlations between specific variants or CNVs and
risk of cancers.

Standard variant detection is pretty straightforward. CNVs are harder because
they are longer (several hundred to several thousand base pairs) than the raw
reads (150 to 250 bp for Illumina), so you don't get single reads that span the
entire variant. You have to normalize read depth and then look for differences
in coverage, or look for split reads (where the read is aligned on the border
of one of these CNVs).
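
A toy sketch of the read-depth idea, in case it helps: divide each target's depth by the sample's median depth, compare that against the same quantity from a reference sample, and flag targets whose ratio is far from 1. The exon names, the depths and the 0.75/1.25 cutoffs are invented for illustration; real callers model GC content, capture bias and noise far more carefully.

```python
from statistics import median


def copy_ratios(sample_depth, reference_depth):
    """Per-target copy ratio after median-normalizing each sample's depth."""
    sample_median = median(sample_depth.values())
    reference_median = median(reference_depth.values())
    return {
        target: (sample_depth[target] / sample_median)
        / (reference_depth[target] / reference_median)
        for target in sample_depth
    }


def call_cnvs(ratios, loss=0.75, gain=1.25):
    """Label targets whose ratio falls outside the (assumed) cutoffs."""
    calls = {}
    for target, ratio in ratios.items():
        if ratio <= loss:
            calls[target] = "loss"
        elif ratio >= gain:
            calls[target] = "gain"
    return calls


# Hypothetical mean depths per exon; the sample was sequenced about 2x deeper.
sample = {"exon1": 200, "exon2": 100, "exon3": 210, "exon4": 190, "exon5": 205}
reference = {"exon1": 100, "exon2": 95, "exon3": 110, "exon4": 90, "exon5": 100}

print(call_cnvs(copy_ratios(sample, reference)))  # {'exon2': 'loss'}
```

A ratio near 0.5 at exon2 is what a one-copy deletion in a diploid sample would roughly look like, which is why normalization matters: the raw depth of 100 there is no lower than the reference sample's.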

This kind of funding baffles me because they don't seem to be proposing
anything new at all (maybe slightly better CNV detection?) and there are
already lots of labs/companies doing this kind of testing. Maybe they are
working on being very efficient to offer a better price.

~~~
noname123
Thanks for your detailed explanation.

Just out of curiosity and to follow-up, presumably this is an example of the
list of detected CNVs in a TCGA Breast Cancer data-set you're referring to:
[http://cancer.sanger.ac.uk/cosmic/gene/analysis?ln=BRCA1#cnv...](http://cancer.sanger.ac.uk/cosmic/gene/analysis?ln=BRCA1#cnv_t)

According to Sanger (or maybe TCGA?), a gain is when a genomic region (for a
diploid) has more than five absolute copies of this region and a loss is when
the genomic region has no reads
([http://cancer.sanger.ac.uk/cosmic/help/cnv/overview](http://cancer.sanger.ac.uk/cosmic/help/cnv/overview)),
where the copy number is perhaps determined by that normalized distribution of
read coverage across the reference genome?

([http://bmcbioinformatics.biomedcentral.com/articles/10.1186/...](http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-S11-S1#Fig1)).
This is for CNVs that are longer than the 150-200bp Illumina fragments (Fig1c.
Read Depth method, e.g., exome#3 looks like it has two absolute copies vs
exome #1 and #2)

Then for small CNVs that perhaps span that 150-200bp fragment, we use the
split read method to filter for incompletely mapped reads that are only
aligned on the edges to the reference. This implies that there was a
duplication event that expanded that sequence? (Fig 1b. Split Read method).

Presumably, the pipeline would determine the CNV sites in a specific patient
sample, then cross-reference with the TCGA CNV data-set and come up with a
correlation score of how much those CNV sites match with consensus CNVs in
the cancer data-set? Thanks again for your detailed breakdown.

~~~
jrm5100
The Sanger/TCGA (The Cancer Genome Atlas) stuff seems to be specific to
microarray data which is different (older, more expensive) than the newer
high-throughput data.

The figure you linked is a good explanation. The split read method is helpful
for finding the edges of the CNV, while the number of reads (relative to other
regions that were tested) can give an idea of the number of copies. The
problem is that these methods all have their own unique biases/noise that
makes it non-trivial to figure out the absolute copy number change.
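
To make the split-read half concrete, here is a minimal sketch that looks for soft-clipped alignments (the `S` operation in a SAM CIGAR string) and reports positions where several reads are clipped at the same reference coordinate. The reads, the `min_support` cutoff and the helper names are hypothetical, and a real caller would also realign the clipped portion and use the supplementary alignments.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_positions(pos, cigar):
    """Reference coordinates where a read's alignment is soft-clipped.

    pos is the 1-based leftmost aligned position (the SAM POS field).
    Hard clips next to soft clips are ignored in this simplified version.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    sites = []
    if ops and ops[0][1] == "S":
        sites.append(pos)  # first aligned base sits right after the clip
    # Operations that consume reference bases: M, D, N, = and X.
    ref_len = sum(length for length, op in ops if op in "MDN=X")
    if len(ops) > 1 and ops[-1][1] == "S":
        sites.append(pos + ref_len)  # first reference base past the alignment
    return sites


def candidate_breakpoints(reads, min_support=2):
    """Positions supported by at least min_support clipped reads."""
    counts = Counter(
        site for pos, cigar in reads for site in clip_positions(pos, cigar)
    )
    return sorted(site for site, n in counts.items() if n >= min_support)


# Hypothetical (POS, CIGAR) pairs for 150 bp reads around one CNV edge.
reads = [
    (100, "150M"),
    (120, "80M70S"),
    (150, "50M100S"),
    (200, "60S90M"),
    (300, "10S140M"),
]

print(candidate_breakpoints(reads))  # [200]
```

This only locates candidate edges; the copy number itself would still come from the depth comparison, with all the bias and noise caveats mentioned above.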

Ideally they would find a similar CNV that has some clinical association.

The DGV has a lot of reference CNVs. Here are some in BRCA1:
[http://dgv.tcag.ca/gb2/gbrowse/dgv2_hg19/?name=id:3087443;db...](http://dgv.tcag.ca/gb2/gbrowse/dgv2_hg19/?name=id:3087443;dbid=gene:database)

~~~
noname123
Thanks jrm5100 for the link. I see the variants under the "DGV Structural
Variants" track. Really appreciate your explaining what CNVs are and also
following up on my questions/confusions!

------
daemonk
I am not sure there is really a large enough corpus of data out there that
covers multiple facets of this system to really give us a strong predictive
value. Just variant calls are probably not enough. There might be, if we can
somehow consolidate and integrate disparate datasets from various publications
and labs. But I don't think we are at that stage yet.

I am all for them trying though. I just don't think we are at a point where we
can make a good diagnosis/conclusion yet.

------
futuremeats
Does anybody know the total number / exact list of SNPs covered in this panel?
How about the read depth?

I found this whitepaper on their website, which provides some level of
detail...

[https://s3.amazonaws.com/color-static-prod/pdfs/validationWh...](https://s3.amazonaws.com/color-static-prod/pdfs/validationWhitePaper.pdf)

However, there were a good many asterisks and caveats about not testing every
position along these genes (some of which are quite large).

While I'm not aware of any other companies that are doing this type of direct
to consumer testing, companies like Myriad have offered targeted panels on
some of these gene targets for some time.

[http://myriadgenetics.eu/products/](http://myriadgenetics.eu/products/)

------
zzguy
There are already companies that do this... Today they are called innovators,
tomorrow they'll be called money-grubbing, unethical, anti-FDA maniacs for
doing the same thing.

