
Show HN: Fast suffix arrays in Python - Labo333
https://louisabraham.github.io/notebooks/suffix_arrays.html
======
matt4711
There are much faster SA construction algorithms than skew (check out
divsufsort). The O(n) algorithms using induced sorting are also likely much
faster than this work. The constants of recent O(n) algorithms are very low.

Here is a link to some state-of-the-art benchmarks of non-parallel SA
construction algorithms:
[https://github.com/y-256/libdivsufsort/blob/wiki/SACA_Benchm...](https://github.com/y-256/libdivsufsort/blob/wiki/SACA_Benchmarks.md)

There are now also parallel SA construction algorithms with linear speedup.

~~~
pelario
Hi, do you have any pointers for "linear speedup parallel SA construction
algorithms"? Thanks!

~~~
Labo333
Hi, sorry, I don't have anything in particular. However, it seems that
[http://snnynhr.github.io/ParallelSuffixArrays/](http://snnynhr.github.io/ParallelSuffixArrays/)
contains interesting algorithms and links.

------
lqdc13
Is there another implementation we can compare this with?

Kind of crazy to say that this is fast without any comparisons with anything
else. So far I've been using suffix trees from here:
[http://www.daimi.au.dk/~mailund/suffix_tree.html](http://www.daimi.au.dk/~mailund/suffix_tree.html)
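
For a quick baseline to compare against, a minimal sketch (not from the
notebook) is to sort the suffix start positions directly. The helper name
`check_and_time` and its `build` parameter are made up for illustration;
`build` stands for any function that takes a string and returns a suffix array.

```python
import time


def naive_suffix_array(s):
    # Sort suffix start positions by the suffix text itself.
    # Each slice copies the suffix, so the worst case is O(n^2 log n) time.
    return sorted(range(len(s)), key=lambda i: s[i:])


def check_and_time(build, text):
    # `build` is a hypothetical callable: string in, suffix array out.
    start = time.perf_counter()
    result = build(text)
    elapsed = time.perf_counter() - start
    # Verify the candidate against the naive baseline before trusting its timing.
    assert list(result) == naive_suffix_array(text), "suffix array mismatch"
    return elapsed


print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

Passing each implementation to `check_and_time` on the same input gives a
correctness check and a rough wall-clock number; for anything serious,
repeated runs on realistic inputs would be needed.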

~~~
dom0
The article seems to be mostly about efficiency, not raw speed.

~~~
lqdc13
"Suffix arrays : How to compute them fast with Python"

The assumption is it's fast in Python without using Cython etc. My intuition
says that looping and iterating in Python is slow. It seems like a great
reference implementation, but I don't think anyone should use it if there are
implementations that are (possibly) orders of magnitude faster.

I guess my bigger issue with this is that in many cases it's much more time
efficient to just drop to Cython earlier on instead of fighting with the
language.

------
xfer
When I was doing a derivation of LZ compression using suffix arrays, I used
induced sorting. Here is a good overview:
[http://zork.net/~st/jottings/sais.html](http://zork.net/~st/jottings/sais.html).

Also, I think Doug's C implementation of the MM suffix array is worth mentioning:
[http://www.cs.dartmouth.edu/~doug/sarray/](http://www.cs.dartmouth.edu/~doug/sarray/)
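
For readers unfamiliar with it, MM here presumably refers to Manber and
Myers' prefix-doubling construction. A minimal Python sketch of the idea
follows. It uses the simpler sort-based variant (O(n log^2 n)) rather than the
original bucket-based algorithm, it is not taken from Doug's C code, and the
function name is made up for illustration.

```python
def suffix_array_doubling(s):
    # Prefix doubling: repeatedly sort suffixes by their first 2k characters,
    # using the ranks computed for the first k characters.
    n = len(s)
    if n == 0:
        return []
    rank = [ord(c) for c in s]  # initial rank = character code (k = 1)
    sa = list(range(n))
    k = 1
    while True:
        def key(i):
            # Pair of ranks: first k chars, then the next k chars (-1 if past the end).
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        # Re-rank: suffixes with equal keys share a rank.
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            break
        k *= 2
    return sa


print(suffix_array_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```

This only accepts `str` input and is meant to show the structure of the
algorithm; a C implementation will be far faster than this pure-Python loop.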

~~~
Labo333
Cool page! It would be really great to have the code shipped in a single file
so that people can use it!

I love Jupyter notebooks because 1) they are beautiful 2) if you download
them, you can play :)

Doug's implementation does indeed seem really fast.

~~~
xfer
It's not mine. Sorry if my comment gave that impression.

------
visarga
Anyone remember "sary"? It was doing suffix arrays 17 years ago. Last update
was in 2005.

[http://sary.sourceforge.net/](http://sary.sourceforge.net/)

