The Sad Story Of The Unicode Snowman (☁→❄→☃→☀→☺→☂→☹→✝.ws)
149 points by treitnauer 2580 days ago | 36 comments

I really can't wait until the phishers get their hands on this. It's going to be a gas.

Maybe browser vendors could agree to render non-ASCII characters in a different colour?

That way the characters are respected, but the user is alerted if a very similar (non-ASCII) code has been used to dupe a user.
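A minimal sketch of the detection half of that suggestion: find every character outside 7-bit ASCII in a hostname. Actual colouring would be browser UI; `mark_non_ascii` is a hypothetical plain-text stand-in that brackets the suspicious characters instead.

```python
def flag_non_ascii(hostname: str) -> list:
    """Return (index, char) pairs for every character outside 7-bit ASCII."""
    return [(i, ch) for i, ch in enumerate(hostname) if ord(ch) > 0x7F]


def mark_non_ascii(hostname: str) -> str:
    """Hypothetical stand-in for colouring: wrap each non-ASCII char in [ ]."""
    return "".join(f"[{ch}]" if ord(ch) > 0x7F else ch for ch in hostname)


# "pаypal.com" with a Cyrillic а (U+0430) in second position
spoof = "p\u0430ypal.com"
print(flag_non_ascii(spoof))   # [(1, 'а')]
print(mark_non_ascii(spoof))   # p[а]ypal.com
```

Note that this flags every Cyrillic letter equally, which is exactly the objection raised in the reply below.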

That'd be fine for Americans (or other people using ASCII) faced with a Russian а, but how about a Russian faced with an ASCII a? You could hardly flag every use of ASCII, and creating a table of which characters go together seems like a lot of work (and it would be hard to do properly).
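One way around flagging all of ASCII or all of Cyrillic is to flag only labels that *mix* scripts. A rough sketch, assuming the first word of a character's Unicode name ("LATIN", "CYRILLIC", "GREEK", ...) is a usable proxy for its script; real implementations use the Unicode Script property, which Python's `unicodedata` module does not expose. `rough_script` and `is_mixed_script` are hypothetical helpers.

```python
import unicodedata
from typing import Optional


def rough_script(ch: str) -> Optional[str]:
    """Crude script guess: first word of the Unicode character name.

    Non-letters (digits, hyphens, symbols) return None, since they are
    shared across scripts and should not trigger a warning on their own.
    """
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def is_mixed_script(label: str) -> bool:
    """True if a single domain label mixes letters from more than one script."""
    scripts = {s for ch in label if (s := rough_script(ch)) is not None}
    return len(scripts) > 1


for label in ["paypal", "p\u0430ypal", "\u043f\u0440\u0438\u043c\u0435\u0440"]:
    print(label, is_mixed_script(label))
# paypal False
# pаypal True
# пример False
```

An all-Cyrillic label and an all-Latin label both pass; only the lookalike mix is caught. It does nothing against a whole-script spoof made entirely of Cyrillic letters that happen to resemble Latin ones.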

Would it really be a lot of work? I'd imagine the effort expended towards security in general exceeds what you're proposing.

EDIT: As I understand it, Cyrillic languages use code page 866 as an extension to ASCII http://en.wikipedia.org/wiki/Code_page_866. Is this correct?

I'm sure something would make it extremely painful: some sort-of-independent tribe or people that uses one or two letters that are "really part" of another language. In short, politics.

To the best of my knowledge, Cyrillic languages don't use the Roman script (except where letters appear to be similar). The ASCII subset of code page 866 is for "cd C:\", not for Cyrillic.

I agree - it could be tricky politically, but I'm not sure if the alternative of representing the characters via punycode conversion is more culturally supportive / sympathetic?

CP866 is a very old standard; it was used in pre-Windows times. There are at least 3 more standards for encoding Cyrillic in 8 bits. Today, most Cyrillic letters on the web are encoded in either UTF-8 or CP1251.

All of them define the whole alphabet, though, so even the letters that look similar to some Latin letters are always encoded differently.
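A quick check of that claim with Python's standard codecs: the Cyrillic а (U+0430) gets its own byte(s) in CP866, CP1251 and UTF-8, none of which equal the Latin a (0x61). `encodings_of` is just an illustrative helper.

```python
def encodings_of(ch: str, codecs=("cp866", "cp1251", "utf-8")) -> dict:
    """Map each codec name to the hex bytes it produces for `ch`."""
    return {codec: ch.encode(codec).hex() for codec in codecs}


print(encodings_of("\u0430"))  # {'cp866': 'a0', 'cp1251': 'e0', 'utf-8': 'd0b0'}
print(encodings_of("a"))       # {'cp866': '61', 'cp1251': '61', 'utf-8': '61'}
```

So the lookalike problem is purely visual: at the byte level the two letters never collide.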

The solution to the known problem is more or less "don't use them" (i.e., punycode them). It's still a pretty crap situation: it'd be nice to have them, but actually using them in a nice manner opens Pandora's box.

They show up properly for some TLDs, those that have a policy of not allowing confusing/pointless characters. As far as I can tell, though, you can't register names like the following (due to TLD restrictions): http://☁→❄→☃→☀→☺→☂→☹→✝.org/

In Chrome, the domain name in question shows up as:


This is to prevent phishing. Hardly ideal, but it works.
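For the curious, "punycode them" means showing the ASCII `xn--` form of the hostname. A minimal sketch using Python's built-in `idna` codec, which implements IDNA 2003 (nameprep plus Punycode). This only shows the conversion; when a browser displays the ASCII form instead of the Unicode one is the browser's own policy, not modelled here. Newer IDNA 2008 rules disallow symbols like these altogether, so treat this as the 2010-era behaviour.

```python
def to_ascii_hostname(host: str) -> str:
    """Unicode hostname -> the ASCII ("xn--") form that is sent to the DNS."""
    # The stdlib "idna" codec applies IDNA 2003 ToASCII to each label.
    return host.encode("idna").decode("ascii")


def to_unicode_hostname(host: str) -> str:
    """ASCII hostname -> Unicode form, as a browser might display it."""
    return host.encode("ascii").decode("idna")


print(to_ascii_hostname("\u2603.net"))  # xn--n3h.net

story = "☁→❄→☃→☀→☺→☂→☹→✝.ws"
# The punycoded form a browser may show instead of the symbols:
print(to_ascii_hostname(story))
print(to_unicode_hostname(to_ascii_hostname(story)) == story)  # True
```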

Back in the 80s one would be very excited to find a BBS that supported enhanced Apple IIe Mouse Text for drawing graphics, symbols, and menus at 1200 baud.

I suppose this is the Stanley Kubrick 2010 version of that.

Does anyone here remember the graphical BBS client that ran on the Apple II? It ran in hi-res mode. Wikipedia indicates it might have been called PixelTerm. Whatever it was called, I seem to recall it was written in assembly, as I remember printing out the source listing on my Epson MX100 and spreading all 50+ sheets out on the floor to learn assembly. Ah, the days, and sorry for the tangent. :-)

On a related note, does anyone remember the graphical DOS BBS client from the 90s called RIPterm?

My first modem was 2400 baud, so I'm a youngin' by some standards, but wow, seeing incremental graphic drawing was pretty cool back in the pre-AOL days. It was neater than progressive JPEG because you'd watch the vector image get constructed on the fly: first the background outline, then a flood fill, then some more shapes, a few more flood fills, some detail lines, etc.

Before that, I do remember using a modem in 3rd grade (circa 1988) and reading ASCII-art emails from my principal. It was a green monochrome display, so it was probably either ASCIIExpress or ProTerm, but I'm just guessing based on context.

I remember RIPscrip not getting much love on the BBS scene while ANSI got more and more elaborate; like a very juvenile microcosm of the war between bitmap and vector graphics.

It might be because, while RIP graphics were technically superior, they were just another lo-fi graphics medium. PCs had better graphics in games etc., while ANSI art has an enduring old-school quality about it. Also, it degrades gracefully to monochrome ASCII. :-)
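A rough illustration of that degradation, assuming the art is plain text plus ANSI escape sequences: strip the escapes and the characters are still there. The regex below is a simplified CSI matcher (an assumption, not a full ANSI parser); it also deletes cursor-movement sequences, so art positioned with those will not reassemble correctly. Real ANSI art also leaned on CP437 block characters, which are not strict ASCII.

```python
import re

# Simplified CSI sequence: ESC [ , optional numeric parameters, one final letter.
# Covers colour/bold (SGR, ending in "m") and cursor moves (e.g. "H", "C").
CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def strip_ansi(text: str) -> str:
    """Drop ANSI escape sequences, leaving the monochrome text."""
    return CSI.sub("", text)


banner = "\x1b[1;31mWELCOME\x1b[0m to the \x1b[34mBBS\x1b[0m"
print(strip_ansi(banner))  # WELCOME to the BBS
```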

High-end ANSI art did not degrade well, for what it's worth. Beyond that, towards the end of the trend, BBS banners were really only legible in TheDraw's high-res mode.

If you're a youngin' at 2400 baud, what does that make me, with my parallel-port 14,400 modem that was bigger than the laptop I am typing this on?

RIP was quite widespread in NZ for some reason.

Ha! Excellent. Although I might like to add a ♨ in there (technically the hot springs character, but it could also be melting snow).

I wonder how many twitter clients are going to recognize that as a link: http://twitter.com/#!/dlsspy/status/18062117684903937

Twitter itself does, but tweetdeck does not.
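One plausible reason a client misses the link: a linkifier whose host pattern only accepts ASCII letters. Both patterns below are hypothetical illustrations, not TweetDeck's or Twitter's actual code.

```python
import re

# Hypothetical linkifiers for illustration only.
ASCII_ONLY = re.compile(r"https?://[A-Za-z0-9.-]+(?:/\S*)?")
UNICODE_AWARE = re.compile(r"https?://[^\s/]+(?:/\S*)?")

text = "Sad story: http://☁→❄→☃→☀→☺→☂→☹→✝.ws/ heh"
print(ASCII_ONLY.findall(text))     # []
print(UNICODE_AWARE.findall(text))  # ['http://☁→❄→☃→☀→☺→☂→☹→✝.ws/']
```

The ASCII-only pattern fails on the very first character after `http://`, so nothing gets linked at all.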

FriendBinder does; I remember fixing it for the http://tinyarro.ws link shortener.

I don't know about twitter clients, but Google Reader makes a mess out of it in the HN RSS feed.

For real fun, try this one: http://twitter.com/#!/apag/status/1260699996

Twhirl does not display all the necessary characters, but posting the URL is possible. It did not convert into a clickable link on identi.ca.


That is how the names are represented to the DNS: http://en.wikipedia.org/wiki/Internationalized_domain_name#T...

I believe this is abstracted at the browser level.

I don't get it. Could someone explain the story?


One cold night, when the temperature sat just comfortably below freezing, clouds began to form in the heavens. Before long, these clouds let forth a surplus of unique, sticky snowflakes. These snowflakes came to litter the ground. A child looked out and saw that his front yard had been transformed into a winter wonderland. Rushing outside, he rolled the snow into three large orbs of descending size; thus he made the snow into the likeness of a man. The sun was shining warm as he worked. All was happy.

However, as the sun continued to come out, the falling snow quickly turned to rain. The rain came down upon the snow and, together with the sunlight, quickly made it go away. As the snow disappeared, so too did the snowman. He suffered a slow death as he melted to the ground. Such sorrow was wrought, until at last the final death arrived.

LOL, that's awesome :)

Neat, but I can't help thinking "oh goodie, URLs I can't type..."

I mean, there are always Unicode tables/charts you can copy the symbol out of, or perhaps keystroke combinations, but Unicode defines over 100,000 characters (the Basic Multilingual Plane alone has 65,536 code points).
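If you're producing the text programmatically rather than typing it, you can at least avoid the charts: Python's `unicodedata` module looks characters up by their official names, and string literals accept `\N{...}` escapes.

```python
import unicodedata

# Look a symbol up by its official Unicode name instead of hunting through charts.
print(unicodedata.lookup("SNOWMAN"))  # ☃
print(unicodedata.name("\u2603"))     # SNOWMAN
print(f"U+{ord('☃'):04X}")            # U+2603
print("\N{SNOWMAN}" == "\u2603")      # True

# Names of every symbol in the story's domain, in order:
for ch in "☁→❄→☃→☀→☺→☂→☹→✝":
    print(ch, unicodedata.name(ch))
```

That still doesn't help much in a browser's address bar, which is the commenter's real complaint.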

Can somebody post a screen-shot for those of us who only see squares?

http://ompldr.org/vNnA4bA - Screenshot of the linked homepage

http://ompldr.org/vNnA4bw - Screenshot of HN

Why do you only see squares?

Likely a lack of international language support. I see the same thing.

I was expecting a story about how/why it was introduced into Unicode (many parts of it have interesting stories). This is completely content-free.
