If you randomly corrupt a highly redundant format, like a bitmap, it just changes a few pixels. In more compressed formats like JPEG it seems to affect the entire image, and in very specific ways (mainly the color of every block of pixels after the point that was corrupted).
If you corrupted a perfectly compressed image, it would give you a different image entirely, possibly of something very similar. I.e., if you had an image format very good at compressing faces, corrupting it would result in a different face entirely, not randomly misplaced pixels or colors. And the face would be similar to the original, maybe with something like a different nose type, or an extra freckle.
The corruption reveals what kinds of assumptions the format is making about the content.
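If you want to try the experiment yourself, here's a minimal sketch: flip a random bit somewhere past the header, leaving the header intact so the file still opens and the damage shows up as decoding artifacts instead of a refusal to parse. (The 128-byte default header skip is an arbitrary guess, not a property of any particular format.)

```python
import random

def corrupt(data: bytes, skip_header: int = 128, flips: int = 1, seed: int = 0) -> bytes:
    """Flip one random bit in `flips` random bytes past the header.

    Keeping the header intact means the file still parses as its
    format, so corruption shows up as artifacts, not a broken file.
    """
    rng = random.Random(seed)
    out = bytearray(data)
    for _ in range(flips):
        i = rng.randrange(skip_header, len(out))
        out[i] ^= 1 << rng.randrange(8)  # XOR with a single-bit mask
    return bytes(out)
```

Run it on a BMP and you should see a few wrong pixels; run it on a JPEG and everything after the flipped byte tends to shift.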
It goes like this: in a sufficiently "fault-tolerant" language (i.e., with low per-character information content, which roughly means large Hamming distances between valid words), crosswords become impossible because puzzles won't be satisfiable: words could never line up right. But in a sufficiently compact language (many bits per character, with Hamming distances between words thus tending towards zero), crosswords are impossible as well, but now because puzzles would be ambiguous: too many words fit. Somehow, natural languages appear to sit somewhere in the middle.
I don't recall where, but I have heard an argument that there is pressure on the information density of natural language, as follows: If you are frequently misunderstood, then you can save effort (not have to repeat/explain) by adding redundancy. If you are never misunderstood, then you can save effort by removing redundancy.
There's no particular reason the result of this process should be good for crosswords, but it's a reason to be in the middle rather than at the extremes.
If the added words improve comprehension, then they're not redundant. Not all repetition is redundant. Oh, by the way, did I forget to mention that not all repetition is redundant? :)
"in a sufficiently compact language crosswords are impossible as well, but now because puzzles would be ambiguous: too many words that fit"
It turns out that I either made up one side of the argument, or read it somewhere else, because the paper only talks about one side, the one about feasibility. The relevant paragraph follows:
"The redundancy of a language is related to the existence of crossword puzzles. If the redundancy is zero any sequence of letters is a reasonable text in the language and any two-dimensional array of letters forms a crossword puzzle. If the redundancy is too high the language imposes too many constraints for large crossword puzzles to be possible. A more detailed analysis shows that if we assume the constraints imposed by the language are of a rather chaotic and random nature, large crossword puzzles are just possible when the redundancy is 50%. If the redundancy is 33%, three-dimensional crossword puzzles should be possible, etc."
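You can get a crude feel for the number Shannon is talking about with a first-order estimate: take the entropy of the letter distribution of a text and compare it to the maximum log2(26) bits per letter. (Shannon's ~50% figure for English comes from much longer-range statistics, word structure and grammar, so this single-letter estimate will come out lower; it's just an illustration of the definition redundancy = 1 - H/Hmax.)

```python
import math
from collections import Counter

def redundancy(text: str) -> float:
    """First-order per-letter redundancy: 1 - H/Hmax, where H is the
    entropy of the single-letter distribution and Hmax = log2(26)."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    h = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return 1 - h / math.log2(26)
```

A text that uses one letter over and over has redundancy 1.0; one that uses all 26 letters equally often has redundancy 0.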
Anyway, to somewhat rescue a botched story, the ambiguity part reminds me of the mindblowingly awesome and famous puzzle by Jeremiah Farrell, run on the day of the 1996 presidential election, whose clue was "Lead story in tomorrow's paper", with possible answers BOBDOLE ELECTED or CLINTON ELECTED, both satisfying all the crossing clues as well. Amazing.
Though that was due to overly aggressive compression instead of corruption.
It makes me wonder what audio format could be fun to corrupt this way. FLAC, maybe?
Many video formats do look back a fair number of frames, so you do sometimes see corruption remain and spread for a few seconds, but then you hit a key frame or block boundary where everything resets and all is suddenly well again.
Audio compression is usually a different bag entirely: over a given time window, the compression algorithm looks for what it can leave out of or merge from the input signal, because human ears are unlikely to hear the difference. There is much less opportunity (compared to still images and video) to encode "copy that chunk from a frame or few back, rotate/skew/whatever it a small amount, then change these few blobs".
But I don't know if there is a way to create a tree of the possibilities.
It's very trial and error, but the results can be fun when you screenshot each progressive glitch and then gif them together to get stuff like this:
Neat that someone automated the process, though.
Later I combined this with another script to gather up the generated images and create an animated GIF.
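For anyone who wants to reproduce the effect, a sketch of the corruption half might look like the following: each frame carries all of the previous flips plus one new one, so the glitches accumulate across the sequence. (The function name and parameters are my own invention, not the poster's script; assembling the resulting frames into an animated GIF is a separate step, e.g. with ImageMagick or Pillow.)

```python
import random

def glitch_frames(data: bytes, n_frames: int = 10, skip: int = 2, seed: int = 1):
    """Yield progressively more-corrupted copies of `data`.

    Corruption is cumulative: frame k contains k flipped bytes.
    `skip` protects the first bytes (e.g. a JPEG's SOI marker) so
    each frame still decodes.
    """
    rng = random.Random(seed)
    buf = bytearray(data)
    for _ in range(n_frames):
        i = rng.randrange(skip, len(buf))
        buf[i] ^= 0xFF  # invert one byte
        yield bytes(buf)
```

Write each frame out as its own image, decode it, screenshot or re-save the result, and stitch the lot together.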
An example: https://plus.google.com/107781042718674753240/posts/icwnoyHY...
(Actually, now I remember there was a subreddit for pictures of dead children. I don't know if it's still up.)
edit: oh yeah, it's still up.
But seriously though, I was quite surprised by how little is lost if you repeatedly encode the same image. Then again, it makes a lot of sense if you look at how blocks are encoded.
Turns out, not every JPEG encoder is identical in that regard and Photoshop destroys more than others.
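A toy model of why generation loss stabilizes: JPEG's lossy step essentially snaps each block's DCT coefficients onto a quantization grid. The first encode loses real information, but once the values sit on the grid, re-quantizing with the same step is a no-op. (Real encoders also round in the color conversion and DCT itself, and may use different quantization tables each time, which is presumably why some encoders destroy more than others.)

```python
def quantize(block, q):
    """One JPEG-style lossy step: snap each 'DCT coefficient' to a
    multiple of the quantization step q (quantize, then dequantize)."""
    return [round(c / q) * q for c in block]

block = [13.7, -4.2, 91.0, 0.4]  # made-up coefficients
gen1 = quantize(block, 5)        # first encode: real information loss
gen2 = quantize(gen1, 5)         # re-encode: values already on the grid
# gen1 == gen2: no further loss with the same quantizer
```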
Hopefully more systems will start shipping with checksumming file systems by default. Even better if they have error correction.
I still have some of the first MP3s I ripped back in the late 90s. They still play, but it sounds like a scratched CD. HDDs aren't as robust as we'd like to think.
If you put enough restart markers in an image, you can swap or replace the compressed blocks between any given sets of restart markers in the same image without glitching it.
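Finding the swappable chunk boundaries is a straightforward scan: restart markers are the two-byte sequences 0xFF 0xD0 through 0xFF 0xD7. This naive scan is safe because any 0xFF inside the entropy-coded data is byte-stuffed as 0xFF 0x00, so an FF Dn pair in the scan really is a marker. (It assumes the encoder actually wrote restart markers, i.e. a DRI segment was set.)

```python
def restart_marker_offsets(jpeg: bytes) -> list:
    """Return byte offsets of JPEG restart markers (0xFFD0-0xFFD7).

    The entropy-coded segments between consecutive offsets are the
    independently decodable chunks you can swap or replace.
    """
    return [i for i in range(len(jpeg) - 1)
            if jpeg[i] == 0xFF and 0xD0 <= jpeg[i + 1] <= 0xD7]
```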
I'd share them, but copyright and all that.
Get an mp3 error checker and see if it has anything to say about the particular files.
You also might want to grab EncSpot Pro (free, for Windows) to figure out exactly what codec you used. Chances are that if the codec was buggy, it's been documented, such that a Google search on the specific codec and version will probably turn up people talking about the known errors.
I wonder, could you do this just by chopping I frames from one video with B and P frames from another?