
Verifying the Substitution Cipher Folklore - nazri1
http://www.spinellis.gr/blog/20160318/
======
unlikelymordant
You _can_ break substitution ciphers pretty trivially using a simple
hill-climbing algorithm; see
[http://practicalcryptography.com/cryptanalysis/stochastic-
se...](http://practicalcryptography.com/cryptanalysis/stochastic-
searching/cryptanalysis-simple-substitution-cipher/)

If your ciphertext is at least 100 characters long, this will solve it very quickly.
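
A minimal sketch of the hill-climbing idea, not the code from the linked page. It assumes a fitness function built from bigram counts in a reference text that you supply; the names `bigram_scorer`, `decrypt` and `hill_climb` are made up for illustration, and a real solver would likely use larger n-gram tables and random restarts.

```python
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def bigram_scorer(reference_text):
    """Build a fitness function from bigram counts in a (non-empty) reference text."""
    letters = [c for c in reference_text.lower() if c in ALPHABET]
    counts = Counter(zip(letters, letters[1:]))
    total = sum(counts.values())
    floor = math.log(0.01 / total)  # assumed penalty for bigrams never seen
    logp = {bg: math.log(n / total) for bg, n in counts.items()}

    def score(text):
        ls = [c for c in text.lower() if c in ALPHABET]
        return sum(logp.get(bg, floor) for bg in zip(ls, ls[1:]))

    return score

def decrypt(ciphertext, key):
    """key[i] is the plaintext letter assigned to cipher letter ALPHABET[i]."""
    return ciphertext.lower().translate(str.maketrans(ALPHABET, key))

def hill_climb(ciphertext, score, iterations=2000, seed=0):
    """Swap two key letters at a time; keep the swap only if fitness improves."""
    rng = random.Random(seed)
    key = list(ALPHABET)
    rng.shuffle(key)
    best = score(decrypt(ciphertext, "".join(key)))
    for _ in range(iterations):
        i, j = rng.sample(range(26), 2)
        key[i], key[j] = key[j], key[i]
        candidate = score(decrypt(ciphertext, "".join(key)))
        if candidate > best:
            best = candidate
        else:
            key[i], key[j] = key[j], key[i]  # undo the swap
    return "".join(key), best
```

A single run can get stuck in a local maximum, so in practice you would probably restart from several random keys and keep the best-scoring result.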

------
arnarbi
Many misunderstand the classical lesson as "substitution ciphers are trivially
broken by symbol frequency analysis", which isn't the point.

The point is to illustrate a property of a cipher that leaks information: in
this case the symbol frequencies, which the cipher preserves. This is
information that we don't normally consider valuable when working with
plaintexts, but for crypto it's enormously valuable (i.e. it leaks a lot of
information).
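
A small sketch of what "preserves them" means, assuming a random monoalphabetic key over lowercase ASCII (the helper names are made up): the letters change, but the sorted list of counts does not.

```python
import random
import string
from collections import Counter

def random_key(seed=0):
    """Return a translation table for a random lowercase substitution."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return str.maketrans(string.ascii_lowercase, "".join(shuffled))

def frequency_profile(text):
    """Letter counts sorted from most to least frequent, ignoring which letter is which."""
    counts = Counter(c for c in text if c in string.ascii_lowercase)
    return sorted(counts.values(), reverse=True)

plaintext = "attack at dawn and retreat at dusk"
ciphertext = plaintext.translate(random_key())

# The substitution is a bijection, so the two profiles are identical.
print(frequency_profile(plaintext) == frequency_profile(ciphertext))
```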

------
grzm
_"I was expecting that this would yield an almost perfect result. In fact,
the result still needs significant guesswork to decrypt."_

I've never heard that substitution ciphers are simple to break using _only_
letter frequencies. It does get you to a point where it makes the guessing a
lot easier.
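
For reference, a sketch of the frequency-only first pass, assuming English text and one commonly quoted (approximate) letter ordering; `ENGLISH_ORDER` and `frequency_guess` are illustrative names. It pairs the most common cipher letter with `e`, the next with `t`, and so on, which is the starting point for guessing rather than a full solution.

```python
import string
from collections import Counter

# Assumption: an approximate most-to-least-frequent ordering of English letters.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def frequency_guess(ciphertext):
    """Map cipher letters to English letters purely by frequency rank."""
    text = ciphertext.lower()
    counts = Counter(c for c in text if c in string.ascii_lowercase)
    # Rank by descending count; ties are broken alphabetically.
    ranked = sorted(string.ascii_lowercase, key=lambda c: (-counts[c], c))
    table = str.maketrans("".join(ranked), ENGLISH_ORDER)
    return text.translate(table)
```

On short texts the ranks of all but the few most common letters are noisy, so most of the mapping still has to be corrected by hand or by a search.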

------
tptacek
If you'd like to play with this yourself, it's #6 in the cryptopals
challenges:

[http://cryptopals.com/sets/1/challenges/6](http://cryptopals.com/sets/1/challenges/6)

I agree with the author: it's conceptually very simple, but a little tricky to
code, even in the simplest case where you're relying on simple letter
frequencies. You could probably do 10 good challenges on different ways to
attack this problem, and towards the end you'd be getting into somewhat
serious cryptanalysis: for instance, look at what Paterson and AlFardan did
with RC4.

~~~
pvg
The problems in set 1 are a little different, though. You get the important
hint (which the author was missing) to use letter frequencies for a fitness
function. But the cryptopals set 1 ciphers have tiny keyspaces that are easily
searched exhaustively - a substitution cipher in general doesn't. To solve the
author's problem you need to re-invent something like stochastic hill
climbing.
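
To put rough numbers on that, assuming the set 1 cipher in question is single-byte XOR (256 keys) and the substitution cipher is over a 26-letter alphabet:

```python
import math

single_byte_xor_keys = 2 ** 8            # every possible one-byte key
substitution_keys = math.factorial(26)   # every permutation of 26 letters
substitution_bits = math.log2(substitution_keys)

print(single_byte_xor_keys)        # small enough to try every key
print(round(substitution_bits))    # key size in bits, far too large to enumerate
```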

------
empath75
He wrote like half the algorithm and then said it didn't work.

~~~
cmrx64
Seriously. With a simple genetic algorithm you can do _significantly_ better,
for example by considering 2-grams. It's nice that they decided to write up what
they "learned" with this simple experiment, but they didn't try very hard. Or
look at the literature.

~~~
Pitarou
Nobody is disputing that. If you think they are, you have misunderstood the
article.

------
stevetrewick
> _"In fact, the result still needs significant guesswork to decrypt."_

I have never heard anyone other than the author of this piece suggest
otherwise. Ironically, this result is trivial. That said, I have a pretty
serious classical crypto habit, so my conception of what constitutes 'crypto
folklore' may be poorly calibrated.

------
Smaug123
I used simulated annealing, which is a non-obvious but fairly easy algorithm:
[https://github.com/Smaug123/ClassicalCiphers.jl/blob/master/...](https://github.com/Smaug123/ClassicalCiphers.jl/blob/master/src/monoalphabetic.jl)
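
For anyone unfamiliar with it, the core of simulated annealing is the acceptance rule. This is a generic sketch of the usual Metropolis-style criterion, not a port of the Julia code linked above; `accept` and its parameters are illustrative names.

```python
import math
import random

def accept(delta, temperature, rng):
    """Decide whether to keep a candidate key.

    delta is (candidate fitness - current fitness). Improvements are always
    kept; a worse candidate is kept with probability exp(delta / temperature),
    so bad moves become rarer as the temperature is lowered.
    """
    if delta >= 0:
        return True
    return rng.random() < math.exp(delta / temperature)
```

Dropped into a swap-two-letters search loop in place of a strict "only if better" test, and with the temperature reduced over time, this lets the search climb out of local maxima that plain hill climbing would get stuck in.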

------
benchaney
I know plenty of CS undergrads who have broken the substitution cipher as part
of an assignment. Saying it isn't trivial just because you couldn't do it is
foolish.

~~~
johncolanduoni
Did they break it using _only_ single symbol frequency counts? Completely
ignoring positional information? Because when a human breaks it by starting
with frequency counts and then filling in blanks in words, that's a totally
different ball game. Computers can definitely do it, but you need to put in
more effort.

------
nullc
[http://www-i6.informatik.rwth-
aachen.de/unravel/](http://www-i6.informatik.rwth-aachen.de/unravel/)

------
jakewins
Does anyone know the history of where the word "trivial" started being used to
mean "easy" instead of as "unimportant", like the author does here?

It drives me crazy - but perhaps I'm the one that's wrong. Is it correct to
use "trivial" to mean "easy"?

I keep thinking it comes from people misunderstanding the meaning of
"non-trivial", as in "complex".

~~~
stan_rogers
It literally means "of the _trivium_" (grammar, logic/dialectic and rhetoric),
or, more broadly, that which can be arrived at by straightforward argument,
without further specialized knowledge (covered in the _quadrivium_). It can,
but doesn't necessarily, mean either easy or unimportant. Is identity merely
unimportant, or is it also easy? Is a long chain of logic either easy or
unimportant merely because it's trivial?

