
That depends on the specific kind of analysis being done and on what size of dataset you want to run the analysis on. Some analysis techniques filter the data down to just a few MiB; others need as much as a few hundred GiB for a single individual. Sometimes you do population analysis on that data, which can multiply the per-individual data size by the number of individuals in the population.



Ok, what kind of ballpark time do each of those entail? You didn't give a single hard number.


I've done work in statistical genetics, but not in forensic analysis specifically, so take these rough estimates with a grain of salt. That said, forensic analysis generally involves identifying the origin of a tissue sample. If the sample is substantial enough for reliable genotyping, then once the genotyping is done, the actual identification step should take less than a second in a well-optimized system, even to pick out the origin from every human genome in history (assuming such a database were available). It's just a matter of finding who has matching allelic variants, and ensuring that you match over enough common alleles with low linkage disequilibrium (shared-haplotype correlation) that the risk of a coincidental match is infinitesimal, even among individuals with high consanguinity. If you organized the genomes in a tree where each node split on a single allele, you could search in logarithmic time.
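
A minimal Python sketch of the two ideas above, under toy assumptions that are not from the comment: each genome is reduced to presence/absence calls (0 or 1) at a fixed panel of variant sites, and those sites are assumed to be in low linkage disequilibrium, so per-site genotype frequencies can be multiplied (the product rule) to estimate the chance that an unrelated person matches by coincidence. `AlleleTrie` and `random_match_probability` are hypothetical names. A lookup walks one tree level per site, so its cost grows with the panel size; that is logarithmic in the number of stored genomes only when each site splits the population roughly in half.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Toy assumption: a genome is a fixed-order tuple of 0/1 calls, one per variant site.
Genotype = Tuple[int, ...]


class AlleleTrie:
    """Hypothetical tree in which each level splits on the call at one variant site."""

    def __init__(self) -> None:
        self.root: Dict = {}

    def insert(self, genotype: Genotype, person_id: str) -> None:
        node = self.root
        for call in genotype:
            node = node.setdefault(call, {})
        # Leaf: everyone sharing this exact genotype vector (e.g. identical twins).
        node.setdefault("ids", []).append(person_id)

    def lookup(self, genotype: Genotype) -> List[str]:
        node = self.root
        for call in genotype:
            if call not in node:
                return []  # No stored genome matches at this site.
            node = node[call]
        return node.get("ids", [])


def random_match_probability(genotype_freqs: Sequence[float]) -> float:
    """Product rule: only valid if the sites are independent (low LD)."""
    return math.prod(genotype_freqs)


db = AlleleTrie()
db.insert((1, 0, 1, 1), "sample-A")
db.insert((1, 1, 0, 1), "sample-B")
print(db.lookup((1, 0, 1, 1)))  # ['sample-A']
# Twenty independent sites, each with a 10% genotype frequency: about 1e-20.
print(random_match_probability([0.1] * 20))
```

Relatives share haplotypes, so the product rule understates their match probability; that is why the comment stresses low-LD sites and high-consanguinity cases.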


The situation is more difficult if you consider the possibility of having multiple contributors to a sample (mixtures), allele dropout, etc. I’ve heard some models/implementations take a day for MCMC sampling to calculate the probabilities. (I’m talking about forensic DNA specifically)
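
To give a feel for where MCMC time goes, here is a toy Metropolis sampler in Python. It is not any real forensic model: it assumes exactly two contributors, no dropout or stutter, and loci where only contributor 1 carries the observed allele (heterozygous), so that allele's expected share of reads is w/2 for mixture weight w. The function names and the data are hypothetical. Every iteration re-evaluates the likelihood over all loci; realistic models also sample over genotype combinations and artifact parameters, which plausibly helps explain day-long runs.

```python
import math
import random
from typing import List, Tuple


def log_likelihood(w: float, loci: List[Tuple[int, int]]) -> float:
    """Binomial log-likelihood (up to a constant) of mixture weight w.

    Each locus is (k, n): k reads support contributor 1's unique allele out of
    n total reads. Toy assumption: that allele's expected read fraction is w / 2.
    """
    if not 0.0 < w < 1.0:
        return -math.inf  # Outside the uniform prior's support.
    p = w / 2.0
    return sum(k * math.log(p) + (n - k) * math.log(1.0 - p) for k, n in loci)


def metropolis(loci: List[Tuple[int, int]], steps: int = 20000,
               step_size: float = 0.05, seed: int = 0) -> List[float]:
    """Random-walk Metropolis over w with a uniform prior on (0, 1)."""
    rng = random.Random(seed)
    w = 0.5
    current = log_likelihood(w, loci)
    samples = []
    for _ in range(steps):
        proposal = w + rng.gauss(0.0, step_size)  # Symmetric proposal.
        candidate = log_likelihood(proposal, loci)
        # Accept with probability min(1, L(proposal) / L(current)).
        if candidate >= current or rng.random() < math.exp(candidate - current):
            w, current = proposal, candidate
        samples.append(w)
    return samples


# Hypothetical data: 20 loci, each with 30 of 100 reads on the unique allele.
samples = metropolis([(30, 100)] * 20)
posterior = samples[2000:]  # Discard burn-in.
print(sum(posterior) / len(posterior))  # Close to 0.6, since 0.6 / 2 = 0.3.
```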


Are you talking from DNA extraction to customer-has-their-results? Or just specific analysis?

For a very high-priority sample, DNA extraction can take a couple of days; sequencing prep can take a few hours to a couple of days; sequencing can take a couple of hours to a couple of days, depending on how much data you need; quality control can take a few hours to a few days; and first-stage analysis software (which just calls what DNA is present and assigns quality scores) can take anywhere from a few hours to a couple of days, again depending on how much data you're analyzing. So getting from the start of extraction to DNA data you can analyze for a specific product takes about a week if you babysit every step, and up to four weeks if you care more about cost efficiency. After that, it depends on what type of customer-facing analysis you want to do and which software you use. Industry-standard, university-level software can take anywhere from a few hours to a few months, again depending on how much data it's looking at.
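
The stages above run one after another, so their times add. A minimal Python sketch of that arithmetic, where the hour values are hypothetical readings of the rough phrases ("a couple of hours" = 2, "a few hours" = 3, "a couple of days" = 48, "a few days" = 72) rather than figures from the comment:

```python
from typing import Dict, Tuple

# Hypothetical (best, worst) hours per stage, interpreting the comment's phrases.
STAGES_HOURS: Dict[str, Tuple[int, int]] = {
    "dna_extraction": (48, 48),
    "sequencing_prep": (3, 48),
    "sequencing": (2, 48),
    "quality_control": (3, 72),
    "first_stage_analysis": (3, 48),
}


def end_to_end_hours(stages: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sequential stages: latencies add. Returns (best, worst) total hours."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best, worst = end_to_end_hours(STAGES_HOURS)
print(f"{best} h to {worst} h ({best / 24:.1f} to {worst / 24:.1f} days)")
# prints "59 h to 264 h (2.5 to 11.0 days)"
```

This counts processing time only. It ignores queueing and hand-offs between stages, which may account for some of the gap to the quoted week (babysat) or four weeks (cost-optimized).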

Typically the turnaround time (from when we receive the DNA to when we've published results to the customer) is around 6 to 8 weeks for most of the analysis products my company sells.

> You didn't give a single hard number.

Considering end-to-end could take a few days (super high priority sample being baby-sat at every step of the way by experienced technicians) to a few months (statistical analysis on populations)... it's pretty hard to give a single number. It really depends on what analysis you want done.

There's a lot of room for improvement in this field of software!


> There's a lot of room for improvement in this field of software!

That's what I was thinking. If it takes a lot of CPU time, it seems that there is likely a huge amount of optimization that would be useful to the people using it.

> it's pretty hard to give a single number.

I wasn't looking for a single number that encapsulates everything; I was looking for at least one number to see whether the run time of these processes is a hindrance. What you have here is very interesting, thanks!

For the 6 to 8 week analysis, how many cores is that typically running on?


The 6-to-8-week analysis goes through many steps on many machines, so "how many cores" isn't directly comparable. All of our sold products target 6 to 8 weeks.

Think of it like an assembly line. How many robot arms does the assembly line need to produce something in 6 to 8 weeks? It depends on the type of object being produced; more intricate details, different sizes, etc. might require going through more robots with more arms. An assembly line for a complex item, such as a vehicle, doesn't put all of the materials in front of a single robot and give that robot more arms. Instead, the item goes through many different robots with specialized purposes.

So if your factory produces many different things, and some of those specialized robots can be generalized for multiple products so that they are shared among the things being produced, you might have one robot that spends 20% of its time producing a thousand units of product A, 75% of its time producing a hundred units of product B, and the remaining 5% in maintenance.

It's the same for DNA analysis: it goes through many different software steps, each with its own requirements for I/O throughput, CPU speed, memory size, etc. Each individual step could be (and generally is) run on different machines with resources suited to that particular step. Sometimes those steps can be shared among different products. But different products take different amounts of time to work through the analysis.
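
A minimal Python sketch of routing each step to a machine class that meets its CPU, memory, and I/O requirements. The step names, their requirements, and the machine pool are hypothetical; only the 88-core, 512 GiB class mirrors the largest machines described in this comment. Picking the smallest machine that fits leaves the big machines free for the steps that genuinely need them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Machine:
    name: str
    cores: int
    ram_gib: int
    fast_io: bool


@dataclass(frozen=True)
class Step:
    name: str
    cores: int
    ram_gib: int
    needs_fast_io: bool


def pick_machine(step: Step, pool: List[Machine]) -> Optional[Machine]:
    """Return the smallest machine class that satisfies the step, or None."""
    fits = [m for m in pool
            if m.cores >= step.cores
            and m.ram_gib >= step.ram_gib
            and (m.fast_io or not step.needs_fast_io)]
    return min(fits, key=lambda m: (m.cores, m.ram_gib), default=None)


# Hypothetical machine pool.
POOL = [
    Machine("small", cores=8, ram_gib=32, fast_io=False),
    Machine("io", cores=16, ram_gib=64, fast_io=True),
    Machine("large", cores=88, ram_gib=512, fast_io=True),
]

for step in (Step("qc", 4, 16, False),
             Step("io-heavy", 8, 32, True),
             Step("large-analysis", 64, 400, True)):
    print(step.name, "->", pick_machine(step, POOL).name)
# prints qc -> small, io-heavy -> io, large-analysis -> large (one per line)
```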

The machines running our largest analysis steps have 88 cores and 512 GiB of RAM. But the number of cores isn't what explains why processing takes 6 to 8 weeks.
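
Amdahl's law (a standard result, not something the comment cites) makes this point quantitative: if only a fraction of the wall-clock time is CPU-bound, adding cores can only shrink that fraction. The 10% figure below is purely hypothetical, chosen to stand in for a turnaround dominated by lab work, sequencing, QC review, and queueing.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Overall speedup when only `parallel_fraction` of the wall-clock time
    benefits from spreading the work across `cores` cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical: 10% of the turnaround is CPU-bound compute.
for cores in (1, 88, 10_000):
    print(cores, round(amdahl_speedup(0.10, cores), 3))
```

With these assumptions, even unlimited cores cap the overall speedup at 1 / 0.9, about 1.11x, so a 6-week turnaround would still be more than 5 weeks.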



