

SeqAlign: Hardware Acceleration of DNA Sequence Alignment - luu
http://www.chrisfenton.com/seqalign/

======
dalke
Did anything ever become of that work? Based on the project link to
[http://opencores.org/websvn,listing,seqalign](http://opencores.org/websvn,listing,seqalign)
, I see that the last modification time is 2009-08-17 18:21:06 GMT, so
it's been 4 years.

BTW, there are more details at [http://chrisfenton.com/wp-
content/uploads/2009/08/final_repo...](http://chrisfenton.com/wp-
content/uploads/2009/08/final_report.pdf) , which has a date one day before
the above "last modified" time.

~~~
tensor
As far as I know, people generally use more sophisticated algorithms that run
on commodity hardware. I remember hearing about this many years ago, but never
actually saw it in the wild.

~~~
epistasis
Smith-Waterman is the "gold-standard," but for speedy heuristic DNA alignment
these days, BWA and Bowtie are probably the two most common mappers, and
they're both based on the ideas of the FM-index [1]. BLAST and BLAT were
previously the two most commonly used, both hash-based aligners, and they're
still used today for one-off database searches, as they are more accurate,
particularly for long sequences.

[1] [http://en.wikipedia.org/wiki/FM-index](http://en.wikipedia.org/wiki/FM-
index)
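
For reference, the Smith-Waterman recurrence mentioned above can be sketched in a few lines. This is only a minimal illustration: the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the linear gap penalty are assumptions chosen for the example, not anything taken from the SeqAlign project or from BLAST/BWA.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions; gaps use a simple
    linear penalty rather than an affine one.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

Every cell of the len(a) x len(b) matrix is filled in, which is why the exact algorithm is expensive on genome-scale inputs and why both hardware acceleration and heuristic mappers exist.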

~~~
hyperbovine
But see: [http://snap.cs.berkeley.edu](http://snap.cs.berkeley.edu). BWT-based
aligners hark back to a time when it was not cheap to own enough RAM to store
an entire (human) genome seed lookup table in memory. That is no longer the
case (you need about 64 GB). Also, hash aligners perform better as the read
length increases.

~~~
epistasis
Well, BLAT, for example, also stores the entire seed table in memory; it just
has to use fewer, shorter, and more frequently occurring seeds. BWT requires
something like two random RAM accesses per base pair, which is incredibly
slow. With a large enough seed table, hash-based aligners can get down to just
a few random accesses per 100-base-pair read.
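
To make the per-base cost concrete, here is a rough sketch of FM-index backward search. It is a toy under stated assumptions: the suffix array is built by naive sorting and the occurrence table is stored in full, whereas real mappers use sampled tables, and the names `build_fm_index` and `count_occurrences` are made up for this example. The point is only that each pattern character triggers two rank lookups, at positions `lo` and `hi`, that are generally far apart in memory.

```python
def build_fm_index(text):
    """Build a toy FM-index (C table and full occurrence table) for text."""
    text += "$"  # sentinel; assumed to sort before every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)

    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1

    # C[c]: number of characters in text that are strictly smaller than c
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ, len(bwt)


def count_occurrences(pattern, index):
    """Return (number of matches, number of rank lookups performed)."""
    C, occ, n = index
    lo, hi = 0, n
    lookups = 0
    for ch in reversed(pattern):  # backward search: last character first
        if ch not in C:
            return 0, lookups
        lo = C[ch] + occ[ch][lo]  # first rank lookup for this character
        hi = C[ch] + occ[ch][hi]  # second rank lookup for this character
        lookups += 2
        if lo >= hi:
            return 0, lookups
    return hi - lo, lookups


if __name__ == "__main__":
    index = build_fm_index("ACGTACGTAC")
    print(count_occurrences("ACG", index))
```

A hash-based seed table, by contrast, pays roughly one lookup per seed rather than per base, which is the trade-off being described.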

Has SNAP been published? I had heard about it but not seen it used in practice
anywhere.

------
usamec
It is much better to speed up alignment using heuristics (e.g. use hashing to
find matching k-mers and do dynamic programming only on a small portion of the
data) than by using faster hardware. Look here: [http://bowtie-
bio.sourceforge.net/index.shtml](http://bowtie-
bio.sourceforge.net/index.shtml) or here:
[http://mummer.sourceforge.net/](http://mummer.sourceforge.net/)
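
A minimal sketch of that seed-and-extend idea, assuming exact k-mer seeds, a simple vote on the implied start offset, and plain edit distance as the dynamic programming step. The function names are hypothetical, and this is not how Bowtie or MUMmer are actually implemented (Bowtie is BWT-based and MUMmer uses suffix structures).

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Hash every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def edit_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def map_read(read, reference, index, k):
    """Return (start, edit distance) for the best-supported hit, or None."""
    votes = Counter()
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], ()):
            start = pos - j  # reference offset implied by this seed hit
            if 0 <= start <= len(reference) - len(read):
                votes[start] += 1
    if not votes:
        return None
    start, _ = votes.most_common(1)[0]
    # Dynamic programming runs only on the read-sized candidate window.
    window = reference[start:start + len(read)]
    return start, edit_distance(read, window)


if __name__ == "__main__":
    reference = "TTGACCAGTACGGATCCATGCAAGT"
    index = build_kmer_index(reference, 4)
    print(map_read("CAGTACTGATCC", reference, index, 4))
```

The expensive quadratic step touches only a window the size of the read instead of the whole reference, which is where the speedup over full Smith-Waterman comes from.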

------
TheLegace
I'm curious what people would think if they had this in their computers,
laptops, or game systems, so they could actively choose to help large-scale
computation for DNA sequencing or any other big scientific problem they could
lend computation to.

~~~
gren
Yeah! Like some hidden script in popular websites doing some computation with
WebCL.

------
oakwhiz
I wonder if it would somehow be possible to use this with the protein folding
game Foldit. [http://fold.it/portal/](http://fold.it/portal/)

Some puzzles start off with a sequence alignment phase.

~~~
epistasis
These particular algorithms are commonly used with proteins' amino acid
sequences for general database searches. But since the database of protein
sequences is far smaller than typical DNA search problems, more sophisticated
and computationally expensive algorithms such as Pair HMMs [1] or Profile
HMMs [2] can be used, and for fine-tuning of 3D model threading they would be
a much better option, since the problem is so small.

[1]
[http://ai.stanford.edu/~serafim/CS262_2008/notes/lecture8.pd...](http://ai.stanford.edu/~serafim/CS262_2008/notes/lecture8.pdf)

[2]
[http://www.biology.wustl.edu/gcg/hmmanalysis.html](http://www.biology.wustl.edu/gcg/hmmanalysis.html)

