

Sukhotin's Algorithm - budu
http://alaska-kamtchatka.blogspot.com/2010/07/sukhotins-algorithm.html

======
tansey
Very cool algorithm. I just implemented and tested it for fun:
<http://www.nashcoding.com/?p=51>

Basically, it counts how often each letter sits next to a different letter,
picks the letter with the highest total as a vowel, and subtracts 2x that
letter's adjacency count from every other letter's total. If some remaining
letter still has a positive total, it keeps going; otherwise it assumes all
the vowels have been found. Clever!
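
For anyone who wants to poke at it, here is a minimal Python sketch of the loop described above. It is not the code from either linked post; the function name `sukhotin_vowels` and the choice to ignore doubled letters (zero diagonal) are assumptions made for illustration.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the letters this sketch of Sukhotin's algorithm marks as vowels."""
    # Symmetric adjacency counts between distinct neighbouring letters.
    # Assumption: pairs of identical letters are ignored (zero diagonal).
    adj = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Every letter starts out with the sum of its adjacency counts.
    sums = {c: sum(adj[c].values()) for c in letters}
    vowels = []
    remaining = set(letters)
    while remaining:
        # sorted() only makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda c: sums[c])
        if sums[best] <= 0:
            break  # nothing positive left: assume all vowels are found
        vowels.append(best)
        remaining.remove(best)
        # Penalise every remaining letter by twice its adjacency to the new vowel.
        for c in remaining:
            sums[c] -= 2 * adj[best][c]
    return vowels
```

Calling `sukhotin_vowels(["banana", "cabana"])` returns `['a']`. On tiny samples the result can easily be wrong, since the method relies on corpus-wide statistics.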

~~~
lookACamel
What if the words in your language looked like 'thcobjk'?

~~~
tansey
Presumably it would work so long as vowels appear more frequently than
consonants and typically are seen next to consonants. If you want to try a
made-up language, just create one and feed it into the program.

------
daivd
If I understand this correctly, saying that vowels tend to sit next to consonants
is just a special case of using n-grams. If you have a fingerprint with the
common n-gram distribution for the target language (or even subject), you get
an optimization problem where you try to guess the substitutions such that the
angle between the fingerprint vectors is minimized.
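
A rough Python sketch of that objective, assuming character bigrams as the fingerprint and a plain dict as the substitution key. The helper names `ngram_fingerprint`, `angle`, and `apply_key` are made up for illustration, and both texts are assumed to be at least `n` characters long.

```python
import math
from collections import Counter


def ngram_fingerprint(text, n=2):
    """Relative frequencies of the character n-grams in text."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def angle(fp_a, fp_b):
    """Angle in radians between two sparse fingerprint vectors."""
    dot = sum(v * fp_b.get(gram, 0.0) for gram, v in fp_a.items())
    norm_a = math.sqrt(sum(v * v for v in fp_a.values()))
    norm_b = math.sqrt(sum(v * v for v in fp_b.values()))
    cos = dot / (norm_a * norm_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))


def apply_key(ciphertext, key):
    """Substitute each character via key; unmapped characters pass through."""
    return "".join(key.get(ch, ch) for ch in ciphertext)
```

A candidate key would then be scored with `angle(ngram_fingerprint(apply_key(ciphertext, key)), reference_fp)`, and the search (GA or otherwise) tries to drive that number down.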

If it cannot be solved analytically, it seems something like a GA should solve
it well.

Is there a standard method for solving substitution ciphers?

~~~
tudorachim
There is a much more straightforward and principled way: the Metropolis
algorithm
(<http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm>).
You are basically sampling from the distribution of n-gram likelihoods by
following a Markov chain. Here is a JavaScript implementation:
<http://www.andrew.cmu.edu/user/tachim/page.html>
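
For comparison, a small Python sketch of the same idea. It is not a port of the linked page: the add-one smoothed bigram model, the swap-two-letters proposal, and all function names here are assumptions chosen to keep the example short. The ciphertext is assumed to be lowercase.

```python
import math
import random
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def bigram_log_probs(reference):
    """Add-one smoothed log-probabilities of letter bigrams in reference."""
    letters = [c for c in reference.lower() if c in ALPHABET]
    counts = Counter(zip(letters, letters[1:]))
    total = sum(counts.values()) + len(ALPHABET) ** 2
    return {(a, b): math.log((counts[(a, b)] + 1) / total)
            for a in ALPHABET for b in ALPHABET}


def score(text, log_probs):
    """Log-likelihood of text under the bigram model (non-letters skipped)."""
    letters = [c for c in text if c in ALPHABET]
    return sum(log_probs[pair] for pair in zip(letters, letters[1:]))


def decode(ciphertext, key):
    """Map cipher letter ALPHABET[i] to plaintext letter key[i]."""
    return ciphertext.translate(str.maketrans(ALPHABET, key))


def metropolis_decipher(ciphertext, log_probs, steps=2000, seed=0):
    """Random walk over keys; returns the best key seen and its score."""
    rng = random.Random(seed)
    key = list(ALPHABET)
    rng.shuffle(key)
    current = score(decode(ciphertext, "".join(key)), log_probs)
    best_key, best = "".join(key), current
    for _ in range(steps):
        # Symmetric proposal: swap two positions of the key.
        i, j = rng.sample(range(len(ALPHABET)), 2)
        key[i], key[j] = key[j], key[i]
        proposed = score(decode(ciphertext, "".join(key)), log_probs)
        # Accept with probability min(1, exp(proposed - current)).
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed
            if current > best:
                best_key, best = "".join(key), current
        else:
            key[i], key[j] = key[j], key[i]  # reject: undo the swap
    return best_key, best
```

With a reference corpus of realistic size and a few thousand steps this kind of walk tends to land on readable text, but short ciphertexts or a tiny corpus may leave it stuck on gibberish.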

~~~
abecedarius
See also <http://norvig.com/ngrams/> for Python. It uses random-restart hill
climbing instead of Metropolis.

------
p3ll0n
A main problem for speaker-independent automatic speech recognition systems is
the variability of the speech signal - i.e. the same sequence of words uttered
by different speakers or even uttered several times by one speaker never
results in identical speech signals.

ROS (rate of speech) is one of the primary contributors to this variability, and
some recent research
(<http://www.ee.columbia.edu/~dpwe/papers/PfauR98-spkrate.pdf>) has shown that
good estimates of speaking rate can be obtained using vowel detection as
vowels in general correspond to syllable nuclei.

I wonder if Sukhotin's algorithm could be modified to improve upon this work?

~~~
shasta
Probably only in the same sense that Sukhotin's algorithm could be modified to
implement the rules to Super Mario Brothers.

