The magic words you want to look for are [Unicode canonicalization], which aspires to make that (and other string-comparison needs) actually work. Implementation quality across the universe of programs is... mixed.
More restricted, in that it only treats strings as the same when they are truly identical text with multiple representations. Normalization won't turn "foo" and "FOO" into the same string, but it will turn "fòó" and "fo<combining grave accent>o<combining acute accent>" into the same string.
Different, in the sense that it creates a new string rather than comparing two strings. Just as you neither need nor want to call tolower(s) when comparing case-insensitively, you neither need nor want to normalize Unicode strings to do a normalization-invariant comparison.
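A minimal Python sketch of the canonical-equivalence point, using the standard library's `unicodedata.normalize`. The helper name `canonically_equal` is hypothetical. For simplicity it does build normalized copies, which, as noted above, a careful implementation could avoid by comparing incrementally.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent (they share an NFC form)."""
    # Simple but not optimal: materializes normalized copies of both strings.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "f\u00f2\u00f3"   # "fòó" using U+00F2 and U+00F3
decomposed = "fo\u0300o\u0301"  # "fo" + COMBINING GRAVE ACCENT, "o" + COMBINING ACUTE ACCENT

print(precomposed == decomposed)                   # False: different code points
print(canonically_equal(precomposed, decomposed))  # True
print(canonically_equal("foo", "FOO"))             # False: normalization does not fold case
```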
The Unicode standard uses "compatibility equivalence" to treat "<fl ligature>" and "fl" as equivalent (see http://en.wikipedia.org/wiki/Unicode_equivalence; the Unicode technical report on normalization is at http://www.unicode.org/reports/tr15/tr15-18.html).
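To see the difference between canonical and compatibility equivalence on the ligature, here is a small Python sketch with `unicodedata.normalize`: the canonical forms (NFC, NFD) leave U+FB02 alone, while the compatibility forms (NFKC, NFKD) split it into "f" + "l".

```python
import unicodedata

ligature = "\ufb02"  # U+FB02 LATIN SMALL LIGATURE FL

# Canonical forms keep the ligature; compatibility forms decompose it.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, unicodedata.normalize(form, ligature) == "fl")
# NFC False / NFD False / NFKC True / NFKD True
```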
The practical part of me agrees with patio11 that the knowledge gained by having these semantics inherent in the file is outweighed many times over by possibly having to treat different bytes as semantically the same character.
My Ubuntu box can't render the first three so I don't know what they are.
"is there" is blackboard lettering, though Unicode insists on calling it double-struck lettering. You may recognize the capital R in double-struck, ℝ, as the symbol for the Reals. http://en.wikipedia.org/wiki/Real_number Similarly, ℤ is the integers, ℂ is the complexes, et cetera. I can say this on HN, without formatting, because unicode.
The word "unicode" is in Mathematical Bold Italic. The words "for different," which you probably interpret as Fraktur, are ... oh wait unicode calls them mathematical fraktur. http://www.w3.org/TR/MathML2/bycodes.html#U1D58C 𝖆 means a ring group. ℵ is used for the cardinality of infinite sets. 𝖌 is a Lie algebra. Etc.
I don't know what the last set are.
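A missing glyph doesn't stop you from looking a character up, because every assigned code point has a name in the Unicode Character Database. A small Python sketch (the helper `describe` is hypothetical) that prints the code point and official name of each character:

```python
import unicodedata

def describe(text: str) -> list[str]:
    """One entry per code point: 'U+XXXX' followed by its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

for line in describe("\u211d\U0001d586\u2135\U0001f146"):
    print(line)
# U+211D DOUBLE-STRUCK CAPITAL R
# U+1D586 MATHEMATICAL BOLD FRAKTUR SMALL A
# U+2135 ALEF SYMBOL
# U+1F146 SQUARED LATIN CAPITAL LETTER W
```

Pasting the unrenderable characters into something like this identifies them even on a machine without a font for them.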
What's weird is that if I copy/paste the variant characters from the article title into HN Search, it matches a whole bunch of articles, none of which are this one.
DuckDuckGo finds this post as its first hit if you copy/paste the title into its search box, but not any other articles about "Unicode" or "variants" (just lots of random junk).
Google, on the other hand, apparently canonicalizes the text, since it returns hits on other articles about Unicode and text variants, as well as this post.
So, here we have three text search programs that behave rather differently.
Also, Firefox doesn't find any of the variant text strings on this page if you search for the normal (ASCII) characters.
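One plausible way a search tool could make the plain-ASCII query match the variant text is to compare "folded" keys instead of raw strings. This is a Python sketch under that assumption, not a description of what Google actually does; `search_key` and `folded_contains` are hypothetical names. NFKC maps the bold fraktur and squared letters to plain letters, `casefold()` handles case, and NFKC is applied again because case folding can, in rare cases, leave text no longer normalized. This only approximates Unicode's formal definition of compatibility caseless matching.

```python
import unicodedata

def search_key(s: str) -> str:
    """Fold compatibility variants and case so look-alike text compares equal."""
    return unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", s).casefold())

def folded_contains(haystack: str, needle: str) -> bool:
    """Substring search on folded keys rather than on the raw strings."""
    return search_key(needle) in search_key(haystack)

# "unicode" spelled in MATHEMATICAL BOLD FRAKTUR (U+1D586 is bold fraktur small a;
# the bold fraktur lowercase letters a..z are contiguous).
fraktur = "".join(chr(0x1D586 + ord(c) - ord("a")) for c in "unicode")
squared = "\U0001f146\U0001f137\U0001f148"  # squared W, H, Y

print(fraktur == "unicode")                                # False
print(folded_contains("all about " + fraktur, "Unicode"))  # True
print(search_key(squared))                                 # why
```

One design caveat: folding can change string length, so a match position in the folded key does not map directly back to an offset in the original text, which matters for a find-in-page feature that has to highlight the hit.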
Mysql2::Error: Incorrect string value: '\xF0\x9D\x96\x86 m...' for column 'text' at row 1:
Thank you for the bug report ;)
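The bytes in that error are the UTF-8 encoding of U+1D586 (𝖆), a character outside the Basic Multilingual Plane that takes 4 bytes in UTF-8. The error is consistent with a column using MySQL's legacy `utf8` charset (an alias for `utf8mb3`), which stores at most 3 bytes per character; `utf8mb4` does not have that limit. A Python sketch reproducing the byte sequence and flagging such text (both helper names are hypothetical):

```python
def utf8_bytes(ch: str) -> str:
    """Render a character's UTF-8 bytes in the \\xNN style of the MySQL error."""
    return "".join(f"\\x{b:02X}" for b in ch.encode("utf-8"))

def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the BMP, i.e. needs 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)

print(utf8_bytes("\U0001d586"))     # \xF0\x9D\x96\x86
print(needs_utf8mb4("\u211d"))      # False: U+211D (ℝ) is 3 bytes in UTF-8
print(needs_utf8mb4("\U0001d586"))  # True
```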
It renders fine in both on my Mac.
In other words, Safari on iOS shows the same boxes. When and if Apple deigns to fix it, it will also be fixed in Chrome for iOS, since Chrome on iOS has to use the same system WebKit engine.
It's possible it's just a font issue. I tried pasting into FB Messenger, iOS Notes, and iOS Messages. All of them just show boxes except for 🅆🄷🅈.
Note that other than U+0020 (SPACE), U+1F146 (SQUARED LATIN CAPITAL LETTER W), U+1F137 (SQUARED LATIN CAPITAL LETTER H), and U+1F148 (SQUARED LATIN CAPITAL LETTER Y), iOS doesn’t have any fonts containing glyphs for any of the characters. (The squared letter characters are covered, on iOS 7 at least, by the fonts Hiragino Kaku Gothic ProN and Hiragino Mincho ProN².) However, for some reason, Safari on iOS is not performing font substitution here to display them with one of the Hiragino fonts. As you noted, this font substitution seems to work fine in other iOS apps, since these squared letters display there.
As an aside, on iOS 7, it is finally possible to install custom fonts yourself³ via a configuration profile⁴. I’ve installed a few fonts I’ve needed in this way⁵, and while they work perfectly in apps like Pages (for iOS), Safari seems to completely ignore their existence. Even after installing them and rebooting, Safari still shows the entire string as boxes⁶. (Symbola alone has a glyph for every character in the above string, so every character should certainly be displayable.) However, copying and pasting the string of boxes into an app like Pages shows all the characters just fine⁷.
¹ — Screenshot is of UnicodeChecker (http://earthlingsoft.net/UnicodeChecker/).
² — http://support.apple.com/kb/HT5878
³ — http://www.saturngod.net/create-custom-font-for-ios-7
⁴ — https://developer.apple.com/library/ios/featuredarticles/iPh...
⁵ — http://f.cl.ly/items/0o360a3t1q2R3E2a2g2b/IMG_0403.jpg
⁶ — http://f.cl.ly/items/3f1o1R2F082i3I0B1y1o/IMG_0404.jpg
⁷ — http://f.cl.ly/items/3y1J2G0u2u1c0Y1w2Q3Z/IMG_0405.jpg
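Font substitution, as described above, roughly means: for each character, walk an ordered list of fonts and use the first one that has a glyph for it, drawing a box only if none does. A simplified per-character Python sketch; the font names and coverage sets are toy data (not real font tables), with `FallbackFont` standing in for the Hiragino fonts that cover the squared letters:

```python
from typing import Dict, List, Optional, Set

def pick_font(ch: str, chain: List[str], coverage: Dict[str, Set[int]]) -> Optional[str]:
    """Return the first font in the fallback chain that has a glyph for ch."""
    for font in chain:
        if ord(ch) in coverage.get(font, set()):
            return font
    return None  # no font covers it, so the renderer draws a box ("tofu")

# Toy coverage data, loosely modeled on the iOS situation described above.
coverage = {
    "PrimaryFont": set(range(0x20, 0x7F)),      # hypothetical: printable ASCII only
    "FallbackFont": {0x1F146, 0x1F137, 0x1F148},  # stand-in for Hiragino: squared W, H, Y
}
chain = ["PrimaryFont", "FallbackFont"]

for ch in " \U0001f146\U0001d586":
    print(f"U+{ord(ch):04X}", pick_font(ch, chain, coverage))
# U+0020 PrimaryFont / U+1F146 FallbackFont / U+1D586 None
```

In these terms, Safari's behavior looks as if it were consulting a chain without the fallback font (`["PrimaryFont"]`), so even the squared letters come out as boxes. Real text engines also consider language, script, and font style when choosing fallbacks.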
These should almost never be used outside that context. They have different code points (and therefore different byte values) from the usual Roman characters, which means the computer doesn't even see them as equivalent without "help" from the programmer. They may not be supported by every browser or text search program, and they may not even render correctly on many operating systems: the glyph may have to be substituted from a different font, or, on older systems, may not be displayed at all.