

Compression with the Burrows-Wheeler Transform - ssp
http://james.fabpedigree.com/bwt.htm

======
barrkel
Unless I missed it, this page doesn't actually contain a description of how to
invert the BWT. Fortunately, Wikipedia has a fairly decent description:

 _The inverse can be understood this way. Take the final table in the BWT
algorithm, and erase all but the last column. Given only this information, you
can easily reconstruct the first column. The last column tells you all the
characters in the text, so just sort these characters to get the first column.
Then, the first and last columns together give you all pairs of successive
characters in the document, where pairs are taken cyclically so that the last
and first character form a pair. Sorting the list of pairs gives the first and
second columns. Continuing in this manner, you can reconstruct the entire
list. Then, the row with the "end of file" character at the end is the
original text._

------
carbocation
This algorithm is highly useful for next-gen sequencing.

Heng Li, the creator of MAQ, has recently released an aligner called BWA. The
heart of BWA is the Burrows-Wheeler transform. The benefits? For genome
research, they include a small memory footprint (<4GB), and fast exact
matching of short reads to a reference.

Text compression, meet genomics. Genomics, meet compression.

------
terra_t
In my mind, Burrows-Wheeler is a clever trick, but it's not clearly superior
to dictionary-based methods. bzip2 beats gzip on compression not because B-W
is better, but because it has a larger window size and because of the entropy
encoder at the tail end.

7Zip builds on gzip, greatly expanding the window and using an arithmetic
encoder at the tail end. It performs better in both time and space over bzip2,
making both gzip and bzip2 obsolete

~~~
wmf
In tests I've seen, LZMA -1 is even slower than gzip -9 so I wouldn't say gzip
is obsolete in situations where compression speed matters.

------
imurray
It’s funny that the article mentions gzip but not bzip2, which is also
commonly installed on Unix systems and uses the Burrows–Wheeler block sorting
transformation (followed by Huffman coding).

------
Daniel_Newby
The original Burrows & Wheeler paper, SRC Research Report 124:
[http://apotheca.hpl.hp.com/ftp/pub/DEC/SRC/research-
reports/...](http://apotheca.hpl.hp.com/ftp/pub/DEC/SRC/research-
reports/abstracts/src-rr-124.html)

