
I'd be curious to see some numbers backing one or the other, but I suspect proportional would be better. Fixed width narrows to a number of characters. So it depends how many possible values have x characters. Proportional narrows to a particular width, which could be more granular than character count. So it depends how many possible values turn out to be exactly x pixels across given the type settings.

Proportional is easier to try to decode. See http://cryptome.org/cia-decrypt.htm for an example from 2004 about inferring words from a redacted US document.

From the Le Monde article: The space occupied by an "I" differs from that taken by "W," which can give additional clues, compared to the text-spacing known as "monospace," like that often used in e-mail, where all the letters have the same spacing.

From the NYT article: In January, the State Department required that its documents use a more modern font, Times New Roman, instead of Courier, Mr. Naccache said. Because Courier is a monospace font, in which all letters are of the same width, it is harder to decipher with the computer technique. There is no indication that the State Department knew that.

It doesn't really matter how many values turn out to be exactly x pixels across given the type settings. You still have to do an exhaustive search over different numbers of characters to find them. A fixed-width font simplifies that problem, because the width tells you the character count directly.
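To make the search-space point concrete, here is a rough sketch of how many string lengths you would have to consider for a redaction of a given pixel width. The function name and the pixel values are made up for illustration, and it ignores kerning and spaces. It only bounds the lengths; not every length in the range is necessarily achievable.

```python
import math

def possible_lengths(target_px, min_glyph_px, max_glyph_px):
    """Bound the character counts that could fill target_px pixels.

    Assumes every glyph is between min_glyph_px and max_glyph_px wide
    (hypothetical values, no kerning).
    """
    shortest = math.ceil(target_px / max_glyph_px)  # all widest glyphs
    longest = target_px // min_glyph_px             # all narrowest glyphs
    return range(shortest, longest + 1)

# Monospace: every glyph is 8 px, so a 64 px gap pins the length exactly.
mono = list(possible_lengths(64, 8, 8))

# Proportional: glyphs from 3 px to 11 px leave many lengths to search.
prop = list(possible_lengths(64, 3, 11))
```

With these made-up numbers the monospace case leaves one length to try and the proportional case leaves sixteen, which is the extra search being described.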

In general, though, a proportional face will give a larger number of possible widths, meaning more information to work with (fewer possible strings will map to any particular width).
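A toy illustration of that, assuming a made-up width table (real widths depend on the font, size, and renderer) and a small hypothetical candidate list of four-letter words:

```python
from collections import defaultdict

# Hypothetical glyph widths in pixels, not taken from any real font.
PROPORTIONAL = {"i": 3, "l": 3, "t": 4, "a": 6, "e": 6,
                "n": 7, "o": 7, "w": 10, "m": 11}
MONO_WIDTH = 8  # every glyph has the same width in the monospace case

def width(word, proportional):
    """Rendered width of word in pixels, ignoring kerning."""
    if proportional:
        return sum(PROPORTIONAL[ch] for ch in word)
    return MONO_WIDTH * len(word)

def bucket_by_width(words, proportional):
    """Group candidate words by the width they would occupy."""
    buckets = defaultdict(list)
    for w in words:
        buckets[width(w, proportional)].append(w)
    return dict(buckets)

# Hypothetical candidates for a redacted four-letter word.
CANDIDATES = ["tile", "lint", "wail", "male", "wine", "moan", "mown", "mint"]

mono_buckets = bucket_by_width(CANDIDATES, proportional=False)
prop_buckets = bucket_by_width(CANDIDATES, proportional=True)
```

Under these assumptions all eight candidates land in a single monospace bucket, while the proportional widths split them into seven buckets, with only "male" and "wine" colliding. Knowing the redaction width therefore narrows the proportional case much further.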

Additionally, depending on the type of information you are trying to recover, you may already know the number of characters (for example, a US SSN), in which case a fixed-width typeface gives you no information you didn't already have, whereas a proportional one would at least give you a little bit of info. But on the other hand, I would also guess fixed-length strings such as an SSN are more likely to appear in a more "table"-like part of a document than in the middle of a paragraph of text.

I suspect it'd still be a bloody hard analysis either way, but at least according to this back-of-the-envelope theory a proportional typeface should preserve more information.

Oh yeah, in the case where you already know the number of characters, a proportional font definitely gives you more information. That's a good point. I was thinking more of the general case where any number of characters might be redacted from a document.
