
> (e.g., Null character (�) a black diamond with white question mark in the middle).

The description in the linked article is wrong: that's not a null character (U+0000, ASCII's NUL). That black diamond symbol is U+FFFD, the Unicode Replacement Character. It means "Something went wrong, so here is this symbol instead". For example, if your decoder gets some gibberish and you can't or won't report a decoding error, a correctly designed decoder should produce U+FFFD in place of each invalid sequence. This is what Rust's String::from_utf8_lossy and String::from_utf16_lossy do.
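A minimal sketch of that behaviour, using the two standard-library functions named above. The helper names `decode_utf8_lossy` and `decode_utf16_lossy` and the sample inputs are made up for illustration:

```rust
// Hypothetical helper: decode bytes as UTF-8, substituting U+FFFD for invalid sequences.
fn decode_utf8_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

// Hypothetical helper: decode UTF-16 code units, substituting U+FFFD for unpaired surrogates.
fn decode_utf16_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

fn main() {
    // 0xFF can never appear in well-formed UTF-8.
    let bad_utf8 = [b'h', b'i', 0xFF, b'!'];
    // 0xD800 is a high surrogate with no low surrogate following it.
    let bad_utf16 = [0x0068, 0xD800, 0x0069];

    println!("{}", decode_utf8_lossy(&bad_utf8));
    println!("{}", decode_utf16_lossy(&bad_utf16));

    // A real NUL (U+0000) is valid input and passes through untouched.
    assert_eq!(decode_utf8_lossy(&[0x00]), "\u{0000}");
}
```

Note that the NUL byte decodes to U+0000 unchanged, which is the distinction the article blurred.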

U+FFFD was carefully chosen because it's not anything. It's not a letter, or a digit, it's not some ASCII control character, it's not a separator in any known writing system or protocol, it has no defined pre-existing purpose, which means if bad guys can trick your bad software into injecting U+FFFD into some data, chances are they achieved nothing of value.




Tangential aside: we've collectively made a mistake using A-F for hex representation. Alphabetical order seemed obvious at the time, but there's a far more literal option that's just pleasing on a visceral level: LHTIFE. The horizontal lines of each letter literally encode binary information. True, there's no letter encoding 3 in this block style, but that can either be invented or ignored. Or you can fudge the pattern slightly with a diagonal such as z or a curved lowercase letter such as b. It would have mapped so cleanly to 7-segment displays. Even the name "HEX" is two-thirds of the way to being self-descriptive. A perfect little numeral grouping of chars.
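One way to read "the horizontal lines encode binary" is sketched below. The bit weights (top bar = 4, middle bar = 2, bottom bar = 1) and the serifed I are assumptions for illustration, not something the scheme spells out. Under that reading the six letters come out in ascending order, with 3 as the only gap:

```rust
// Assumed reading: each letter has up to three horizontal bars.
// Weights are hypothetical: top = 4, middle = 2, bottom = 1.
fn bar_value(letter: char) -> Option<u8> {
    let (top, middle, bottom) = match letter {
        'L' => (false, false, true),
        'H' => (false, true, false),
        'T' => (true, false, false),
        'I' => (true, false, true), // assumes a serifed I with top and bottom bars
        'F' => (true, true, false),
        'E' => (true, true, true),
        _ => return None,
    };
    Some(((top as u8) << 2) | ((middle as u8) << 1) | (bottom as u8))
}

// Bar values of the letters in the order they are written: L H T I F E.
fn lhtife_values() -> Vec<u8> {
    "LHTIFE".chars().filter_map(bar_value).collect()
}

// Values in 1..=7 that none of the six letters covers.
fn missing_values() -> Vec<u8> {
    let present = lhtife_values();
    (1..=7).filter(|v| !present.contains(v)).collect()
}

fn main() {
    println!("values:  {:?}", lhtife_values());
    println!("missing: {:?}", missing_values());
}
```

These are three-bit patterns, so they describe the shapes of the letters rather than the hex values 10 to 15 that A-F stand for.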


That would be a piece of trivia you'd have to know; otherwise it just looks like a mess of arbitrarily chosen letters. The binary patterns shown on the chosen letters do not match the actual binary patterns those letters represent. And you don't try to do the same with 0 to 9, which makes the whole effort half-baked.


Just going to point out real quick that the lower left vertical on a 7 segment display does in fact come extremely close to directly encoding even/odd. Write them all out and see for yourself. Only the number 4 breaks the pattern. That would only leave two bits not directly encoded in segments as far as anyone has noticed.
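A quick check of the even/odd claim, assuming the conventional segment names a to g (a = top, then clockwise, g = middle), so the lower-left vertical is segment e. The `SEGMENTS` table uses the common digit shapes; some displays draw 6, 7 or 9 slightly differently, but those variants do not involve segment e.

```rust
// Lit segments for digits 0-9 on a conventional 7-segment display.
// Segment naming is the usual a-g convention; 'e' is the lower-left vertical.
const SEGMENTS: [&str; 10] = [
    "abcdef",  // 0
    "bc",      // 1
    "abdeg",   // 2
    "abcdg",   // 3
    "bcfg",    // 4
    "acdfg",   // 5
    "acdefg",  // 6
    "abc",     // 7
    "abcdefg", // 8
    "abcdfg",  // 9
];

// Digits where "segment e is lit" disagrees with "digit is even".
fn parity_exceptions() -> Vec<usize> {
    (0..10)
        .filter(|&digit| {
            let lower_left_lit = SEGMENTS[digit].contains('e');
            let is_even = digit % 2 == 0;
            lower_left_lit != is_even
        })
        .collect()
}

fn main() {
    println!("digits breaking the pattern: {:?}", parity_exceptions());
}
```

With this table the only exception is 4, which is even but leaves the lower-left segment dark.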


The I is a bad choice since it's similar to 1; using the letter Z instead would be much better. The letter T can't be mapped to a 7-segment display either, so if that's your goal you would need to use EFGHLP.


I'm reluctant to grab "encoding letters" out of the curvy set, simply because there's nearly enough of them to make a complete binary-lettering alternative on their own. uhbDPB. That one is also frustratingly one digit missing from completion. If they were complete sets, you could treat curvy/rigid as a bit and thus have an easy and obvious system for translating between half-bytes and English orthography.

I'm not getting sucked back into this. There comes a point when you're just staring at the alphabet letters and thinking "TF am I DOING?"


Also, yeah obviously T can't be mapped, but I still put T in the same category as all the other square intersecting grid letters. There are other, obviously repeating groupings that would work better on a different segmented basis. Some examples: diagonals {Z N X Y A V K 7 4 W M}, circle based {O G C Q D}, loopy: {B P b d q J S 8 9 3 }, 7 seg: {L H T I F E}


I've seen UVWXYZ. Less seriously, I've seen GBNJFL (Great Big Numbers Just For Laughs).



