I suspect there's a second code hidden in there. From the article, describing the code symbols that are Roman letters:
These unaccented Roman letters appeared with the frequency
you’d expect in a European language. But they don’t
represent letters—they mark the spaces between words.
It's implausible that these characters just happen to appear with a language-like frequency distribution and are all meaningless spaces. I suspect they actually have a meaning and provide a second message.
To clarify, it's like taking "SthisEisCtheRfirstEmessageT" and assuming all the capitals just indicate spaces.
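A minimal Python sketch of that toy example. It assumes, as the example does, that every capital letter both ends a word of the cover text and contributes one letter to a hidden message. The function name split_messages is purely illustrative:

```python
def split_messages(text: str) -> tuple[str, str]:
    """Split text where capitals act as word separators for the lowercase
    cover message and, read in order, also spell out a hidden message."""
    hidden = []   # capital letters, in order of appearance
    words = []    # completed lowercase words of the cover message
    current = []  # letters of the lowercase word being built
    for ch in text:
        if ch.isupper():
            hidden.append(ch)
            if current:  # a capital closes the word before it
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:  # keep a trailing word with no closing capital
        words.append("".join(current))
    return " ".join(words), "".join(hidden)


cover, hidden = split_messages("SthisEisCtheRfirstEmessageT")
print(cover)   # this is the first message
print(hidden)  # SECRET
```

If the capitals were treated purely as spaces, the second string would simply be thrown away.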
I wouldn't doubt it, since codes within codes were common. However, it's not clear that the two are necessarily related. There were some secret codices where the additional code acted like a watermark, telling you whose copy of the notes it was, so if they leaked out you'd know who to go deal with.
As a kid I was always making up codes and ciphers. Much more of a spy vs spy kid than a cops and robbers kid.
My initial thought was that they were inserted as a diversion for anyone who assumed simple frequency analysis would break what looked like a simple substitution cipher. In practice, the 'expected' substitutions would yield gibberish, while the actual message is encoded using polyalphabetic substitution to throw off simple attacks.
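A rough Python sketch of why a polyalphabetic scheme defeats naive frequency analysis. It uses a randomly shuffled alphabet as the simple substitution and Vigenère (key "LEMON", chosen arbitrarily) as the polyalphabetic one; neither is claimed to be what the actual code uses. A simple substitution only relabels letters, so the sorted letter counts come out identical to the plaintext's, while Vigenère enciphers the same plaintext letter differently depending on its position:

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_uppercase


def clean(text: str) -> str:
    """Uppercase the text and keep only the letters A-Z."""
    return "".join(ch for ch in text.upper() if ch in ALPHABET)


def substitution_encrypt(text: str, seed: int = 1) -> str:
    """Simple (monoalphabetic) substitution: one shuffled alphabet for every position."""
    shuffled = list(ALPHABET)
    random.Random(seed).shuffle(shuffled)
    return text.translate(str.maketrans(ALPHABET, "".join(shuffled)))


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Polyalphabetic substitution: the shift depends on the key letter at each position."""
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        shift = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) + sign * shift) % 26])
    return "".join(out)


def profile(text: str) -> list[int]:
    """Letter counts sorted from most to least common (the 'shape' of the distribution)."""
    return sorted(Counter(text).values(), reverse=True)


plain = clean("It was the best of times, it was the worst of times, "
              "it was the age of wisdom, it was the age of foolishness")
print(profile(plain)[:5])
print(profile(substitution_encrypt(plain))[:5])  # same shape as the plaintext
print(profile(vigenere(plain, "LEMON"))[:5])     # shift varies, so counts tend to spread out
```

Matching the top ciphertext counts to E, T, A and so on works against the first ciphertext but not reliably against the second.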
Probably depends on how you were picking them. If you were using something like Diceware to pick letters from an English document, then yeah, you would nail the frequency. But if you were picking letters at random in your head, I would be very surprised if you got anything more than the very roughest of distributions. Anything beyond "lots of e's, few q's" would surprise me.
Humans rather suck at picking random numbers. We skew towards picking numbers that seem "more random", whatever that means (which would probably work in your favour for picking random letters with an English frequency, though I'm not too sure), but we also avoid "randomly" generating streaks of numbers, because we feel those are "less random".
If you put two teams in a room, one flipping a fair coin and writing down the results, and the other pretending to flip a coin and just faking the results, it is usually trivial to pick out which team actually flipped the coin: the real flips will have surprisingly long streaks of heads or tails.
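A rough sketch of that streak test in Python. It leans on the rule of thumb that n fair flips usually contain a longest run of roughly log2(n); the slack parameter and the function names are arbitrary choices for illustration, not a calibrated statistical test:

```python
import math
import random


def longest_run(flips: str) -> int:
    """Length of the longest streak of identical consecutive symbols."""
    best = current = 0
    prev = None
    for f in flips:
        current = current + 1 if f == prev else 1
        prev = f
        best = max(best, current)
    return best


def looks_faked(flips: str, slack: float = 2.0) -> bool:
    """Rough heuristic: flag a sequence whose longest streak falls well short
    of the roughly log2(n) typical for n genuinely fair flips."""
    if not flips:
        return False
    return longest_run(flips) < math.log2(len(flips)) - slack


real = "".join(random.Random(42).choice("HT") for _ in range(200))
faked = "HTTHTHHTHT" * 20  # the kind of 'evenly mixed' sequence people tend to write down
print(longest_run(real), looks_faked(real))
print(longest_run(faked), looks_faked(faked))
```

Someone faking flips tends to switch sides after two or three of the same, which keeps the longest run short; that is exactly what this check flags.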
I don't have any evidence for it, but I suspect this anti-streak tendency would be strong enough to interfere with any correct frequencies that might otherwise appear. ("Oh my, that's far too many e's in a row...")
Oh, heh, I can see it that way now. I had intended my comment to say that, since you'd be trying to reach that set of ratios to hide things, you'd probably fail miserably against any competent analysis.