
I suspect there's a second code hidden in there. From the article, describing the code symbols that are Roman letters:

    These unaccented Roman letters appeared with the frequency 
    you’d expect in a European language. But they don’t 
    represent letters—they mark the spaces between words.
It's implausible that these characters just happen to appear with a language-like frequency distribution and are all meaningless spaces. I suspect they actually have a meaning and provide a second message.

To clarify, it's like taking "SthisEisCtheRfirstEmessageT" and assuming all the capitals just indicate spaces.
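
A minimal sketch of that example, assuming (as in the toy string above, not the real cipher) that the capitals are the separators: the same string can be read once by splitting on the capitals and once by collecting them. `split_layers` is a name made up for this illustration.

```python
import re

def split_layers(ciphertext):
    """Split text into the lowercase 'words' and the capitals between them."""
    # Layer 1: treat every capital as a word boundary, keep what lies between.
    words = [w for w in re.split(r"[A-Z]", ciphertext) if w]
    # Layer 2: read the separators themselves as a message.
    separators = "".join(ch for ch in ciphertext if ch.isupper())
    return words, separators

words, separators = split_layers("SthisEisCtheRfirstEmessageT")
print(" ".join(words))   # this is the first message
print(separators)        # SECRET
```

If the separators were truly meaningless, the second line of output would be noise rather than a word.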

I wouldn't doubt it, since codes within codes were common. However, it's not clear that they are necessarily related. There were some secret codexes where the additional codes acted like a watermark to tell you whose copy of the notes it was, so if they leaked out you could go deal with them.

As a kid I was always making up codes and ciphers. Much more of a spy vs spy kid than a cops and robbers kid.

Unless they simply took random letters from some other text to get their spaces, in which case you'd get the correct frequency, in a sufficiently large text, with no meaning.
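
A rough sketch of why that would work, assuming the separators are drawn uniformly at random from the letters of some unrelated passage. `SOURCE` is a hypothetical stand-in text, and `letter_freq` and `random_separators` are names invented for this illustration.

```python
import random
from collections import Counter

# Hypothetical stand-in for "some other text"; any long enough passage would do.
SOURCE = (
    "It was a bright cold day in the spring and the clocks were striking. "
    "The traveller pulled his coat around him and hurried through the doors "
    "of the old library, though not quickly enough to keep the dust outside."
)

def letter_freq(letters):
    """Relative frequency of each letter in an iterable of letters."""
    counts = Counter(letters)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def random_separators(source, k, seed=0):
    """Pick k letters uniformly at random from the letters of source."""
    letters = [ch for ch in source.lower() if ch.isalpha()]
    rng = random.Random(seed)
    return [rng.choice(letters) for _ in range(k)]

source_freq = letter_freq(ch for ch in SOURCE.lower() if ch.isalpha())
sample_freq = letter_freq(random_separators(SOURCE, 20000))

# Largest per-letter difference between the source and the random picks.
max_gap = max(abs(source_freq[ch] - sample_freq.get(ch, 0.0)) for ch in source_freq)
print(f"largest frequency gap: {max_gap:.4f}")
```

The picks carry no message, yet their frequencies track the source text closely once there are enough of them.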

My initial thought was that they were inserted as a diversion to those who thought a simple frequency analysis would break what appeared to be a simple substitution cipher. Whereas in practice, the 'expected' substitutions yield gibberish, and the actual message is encoded using poly-alphabetic substitutions to throw off simple attacks.

    It's implausible that these characters just happen to
    appear with a language-like frequency distribution and
    are all meaningless spaces
Really? If I were to try to pick random letters, I suspect I would end up mirroring the frequency with which they appear in English.

Probably depends on how you were picking them. If you were using something like DiceWare to pick letters from an English document, then yeah, you would nail the frequency. But if you were picking letters at random in your head, I would be very surprised if you got anything more than the very roughest of distributions. Anything more than "lots of e's, few q's" would surprise me.

Humans rather suck at picking random numbers. We skew towards picking numbers that seem "more random", whatever that means (which would probably work in your favour for picking random letters with an English frequency, though I'm not too sure), but we also avoid "randomly" generating streaks of numbers, because we feel those are "less random".

If you put two teams in a room, one flipping a fair coin (and writing down the results), and the other pretending to flip a coin but just faking the results, it is usually trivial to pick out which team actually flipped the coin. The real flippers are going to have surprisingly long streaks of heads or tails.
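
A quick sketch of the streak effect, assuming 200 flips per sheet. The "faked" sequence is a hypothetical hand-typed example, not data from any experiment, and `longest_run` and `average_longest_run` are names invented here.

```python
import random

def longest_run(flips):
    """Length of the longest streak of identical consecutive outcomes."""
    best = current = 0
    previous = None
    for flip in flips:
        current = current + 1 if flip == previous else 1
        best = max(best, current)
        previous = flip
    return best

def average_longest_run(n_flips, trials, seed=0):
    """Average longest streak over many simulated sheets of fair flips."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        flips = [rng.choice("HT") for _ in range(n_flips)]
        total += longest_run(flips)
    return total / trials

real = average_longest_run(200, 1000)
# Hypothetical faked sheet: the writer never lets a streak go past two.
faked = longest_run("HTTHTHHTHTTHHTHTHHTTHTHTTHHTHTHHTH")
print(f"fair coin, average longest streak: {real:.1f}")
print(f"faked sheet, longest streak: {faked}")
```

A sheet whose longest streak is only two or three over a couple of hundred flips is the suspicious one.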

I don't have any evidence for it, but I suspect this anti-streak tendency would be strong enough to interfere with any correct frequencies which might otherwise appear. ("Oh my, this is far too many e's in a row...")

Probably not. People are bad at random: http://scienceblogs.com/cognitivedaily/2007/02/05/is-17-the-...

I think his point is valid exactly because people are bad at randomness.

/me rereads

Oh, heh, I can see it that way now. I had intended my comment to say that, since you'd be trying to reach that set of ratios to hide things, you'd probably fail miserably against any competent analysis.

I think his computer/software is defective.

Testing the "random distribution" the way it was done, with a small sample size, is ineffective at best.
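
To put a rough number on the sample-size problem, here is a sketch that assumes each respondent independently picks a given option with true probability 0.1. The sample sizes 30 and 3000 are arbitrary illustrations, not figures from the linked post, and `proportion_spread` is a made-up helper.

```python
import random
import statistics

def proportion_spread(p, sample_size, trials=500, seed=0):
    """Standard deviation of the observed share across repeated samples."""
    rng = random.Random(seed)
    observed = []
    for _ in range(trials):
        # Count how many of sample_size respondents picked the option.
        hits = sum(rng.random() < p for _ in range(sample_size))
        observed.append(hits / sample_size)
    return statistics.pstdev(observed)

small = proportion_spread(0.1, 30)
large = proportion_spread(0.1, 3000)
print(f"spread with 30 samples:   {small:.4f}")
print(f"spread with 3000 samples: {large:.4f}")
```

The spread shrinks roughly with the square root of the sample size, so with only a few dozen samples an apparent favourite can easily be noise.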

The article suggests this as well. In the last paragraph, they say that a second code using numbers could be hidden in the text.
