
Unicode Text Converter - sysk
http://www.panix.com/~eli/unicode/convert.cgi?text=The+secret+is+out.+
======
Systemic33
Well that definitely takes the 𝕡𝕣𝕚𝕫𝕖 for most noticeable Hacker News
submission.

Suggestion (if you are the author): There are a lot of chars that look like
another char, often used on the web, so I think there are more advanced
versions to be made. I think I read that a lot of Thai and Cyrillic characters
look like Latin chars.

~~~
samuellevy
Yeah, it's great fun to put a cyrillic "а" into a variable name in code.
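
A minimal Python sketch of why this is so annoying, and one rough way to spot it. The helper `scripts_in` is hypothetical, and using the first word of each character's Unicode name as a script hint is a simplification rather than a real script-property lookup:

```python
import unicodedata

def scripts_in(identifier):
    """Return the first word of each letter's Unicode name (a rough script hint)."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }

latin = "data"
mixed = "d\u0430ta"  # second letter is CYRILLIC SMALL LETTER A (U+0430)

print(latin == mixed)              # False, although both render alike
print(sorted(scripts_in(latin)))   # ['LATIN']
print(sorted(scripts_in(mixed)))   # ['CYRILLIC', 'LATIN']
```

A linter could flag any identifier for which this set has more than one element.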

~~~
solistice
or having your variable names in _𝕱𝖗𝖆𝖐𝖙𝖚𝖗_, which might be more apparent but
nonetheless annoying. That'd make a nice useless language though.

    
    
      𝖕𝖚𝖇𝖑𝖎𝖈 𝖛𝖔𝖎𝖉[] 𝖒𝖆𝖎𝖓(𝖘𝖙𝖗𝖎𝖓𝖌[] 𝖆𝖗𝖌𝖘) {
        𝕮𝖔𝖓𝖘𝖔𝖑𝖊.𝖂𝖗𝖎𝖙𝖊𝕷𝖎𝖓𝖊("𝕳𝖆𝖑𝖑𝖔 𝖂𝖊𝖑𝖙");  
      }
      // 𝕽𝖊𝖈𝖍𝖊𝖓𝖒𝖆𝖘𝖈𝖍𝖎𝖓𝖊𝖓𝖘𝖕𝖗𝖆𝖈𝖍𝖊 "𝕱𝖗𝖆𝖐𝖙𝖚𝖗" 𝕰𝖎𝖓𝖘 𝕻𝖚𝖓𝖐𝖙 𝕹𝖚𝖑𝖑 𝕹𝖚𝖑𝖑

~~~
jevgeni
OMG, yes:

    
    
        # 𝕲𝖊𝖒ä𝖘𝖘 𝕽𝖊𝖎𝖈𝖍𝖘𝖆𝖚𝖘𝖘𝖈𝖍𝖚𝖘𝖘 𝖋ü𝖗 𝕬𝖑𝖌𝖔𝖗𝖎𝖙𝖍𝖒𝖎𝖘𝖈𝖍𝖊 𝕬𝖗𝖇𝖊𝖎𝖙 
    
        𝖐𝖑𝖆𝖘𝖘𝖊 𝕭𝖊𝖌𝖗ü𝖘𝖘𝖚𝖓𝖌𝖘𝖆𝖓𝖟𝖊𝖎𝖌𝖊𝖇𝖊𝖉𝖎𝖊𝖓𝖒𝖊𝖈𝖍𝖆𝖓𝖎𝖘𝖒𝖚𝖘: 
            𝖉𝖊𝖋 __𝖆𝖓𝖋𝖆𝖓𝖌𝖊𝖓__(𝖘𝖊𝖑𝖇𝖘𝖙, 𝖁𝖔𝖗𝖓𝖆𝖒𝖊): 
                𝖘𝖊𝖑𝖇𝖘𝖙.𝖁𝖔𝖗𝖓𝖆𝖒𝖊 = 𝖁𝖔𝖗𝖓𝖆𝖒𝖊
    
            𝖉𝖊𝖋 __𝖘𝖈𝖍𝖓𝖚𝖗__(𝖘𝖊𝖑𝖇𝖘𝖙): 
                𝖟𝖚𝖗ü𝖈𝖐𝖌𝖊𝖇𝖊𝖓 𝖘𝖊𝖑𝖇𝖘𝖙.𝖁𝖔𝖗𝖓𝖆𝖒𝖊 
    
            𝖉𝖊𝖋 𝖇𝖊𝖌𝖗ü𝖘𝖘𝖊𝖓(𝖘𝖊𝖑𝖇𝖘𝖙, 𝖁𝖔𝖗𝖓𝖆𝖒𝖊=𝕹𝖎𝖈𝖍𝖙𝖊𝖝𝖎𝖘𝖙𝖊𝖓𝖟): 
                𝖉𝖗𝖚𝖈𝖐𝖊𝖓("𝕲𝖚𝖙𝖊𝖓 𝕿𝖆𝖌, " + 𝖘𝖊𝖑𝖇𝖘𝖙.𝖁𝖔𝖗𝖓𝖆𝖒𝖊)
                𝖟𝖚𝖗ü𝖈𝖐𝖌𝖊𝖇𝖊𝖓 𝖘𝖊𝖑𝖇𝖘𝖙
    
        𝖇𝖊𝖌𝖗ü𝖘𝖘𝖊𝖗 = 𝕭𝖊𝖌𝖗ü𝖘𝖘𝖚𝖓𝖌𝖘𝖆𝖓𝖟𝖊𝖎𝖌𝖊𝖇𝖊𝖉𝖎𝖊𝖓𝖒𝖊𝖈𝖍𝖆𝖓𝖎𝖘𝖒𝖚𝖘("𝕳𝖆𝖓𝖘-𝕻𝖊𝖙𝖊𝖗 𝕯𝖊𝖚𝖙𝖘𝖈𝖍" )
        𝖇𝖊𝖌𝖗ü𝖘𝖘𝖊𝖗.𝖇𝖊𝖌𝖗ü𝖘𝖘𝖊𝖓()

~~~
smoyer
Google translate doesn't seem to do well with those characters ... could
someone please help with "𝕭𝖊𝖌𝖗ü𝖘𝖘𝖚𝖓𝖌𝖘𝖆𝖓𝖟𝖊𝖎𝖌𝖊𝖇𝖊𝖉𝖎𝖊𝖓𝖒𝖊𝖈𝖍𝖆𝖓𝖎𝖘𝖒𝖚𝖘".

~~~
jevgeni
Literally it means: Greeting-Display-Control-Mechanism. In German you can
jumble the words together to get a new, more precise German word. The most
notorious being this:
[http://www.telegraph.co.uk/news/worldnews/europe/germany/100...](http://www.telegraph.co.uk/news/worldnews/europe/germany/10095976/Germany-
drops-its-longest-word-Rindfleischeti....html)

~~~
nathell
Or this: [http://linguacuriosa.blogspot.com/2009/10/german-for-
beginne...](http://linguacuriosa.blogspot.com/2009/10/german-for-
beginners.html)

~~~
jevgeni
Mother of god...

~~~
Someone
That would be Gottesmutter or (way better) Gottesgebärerin
([http://de.wikipedia.org/wiki/Gottesgeb%C3%A4rerin](http://de.wikipedia.org/wiki/Gottesgeb%C3%A4rerin))

------
GregBuchholz

                ⎧1               if n = 0;
         F(n) ≡ ⎨1               if n = 1;
                ⎩F(n-1) + F(n-2) if n > 1.
        
        ⎛ ∇∙D⃑ = ρ         ⎞
        ⎜ ∇∙B⃑ = 0         ⎟
        ⎜ ∇×E⃑ = -∂B⃑/∂t    ⎟
        ⎝ ∇×H⃑ = J⃑ + ∂D⃑/∂t ⎠
        
             ⌠¹
        π = 2⎮ √1̅̅-̅̅x̅̅²̅̅ dx
             ⌡₋₁
    
         ⎡1 0 1⎤ ⎡î⎤
         ⎢0 1 0⎥ ⎢ĵ⎥
         ⎣1 0 1⎦ ⎣k̂⎦
    
        Γ ⊢ t:S    S<:T
        ―――――――――――――――  (T-Sub)
            Γ ⊢ t:T
    
                ⎛   1 ⎞ⁿ
        ℯ = lim ⎜1+ ― ⎟
            ⁿ→∞ ⎝   n ⎠

~~~
lisper
That's pretty slick. Did you do that manually?

~~~
GregBuchholz

        ╔════════════════════╗
        ║ Yes.  All manually ║
        ╙────────────────────╜

------
emillon
Funny how it triggered a bug in Firefox. When the tab is unfocused, its title
in the handle is "𝑼𝒏…", but when it gets the focus it becomes "𝑼<D835>…" (in a
square box). The next codepoint is U+1D48F whose UTF-16 BE encoding is d8 35
dc 8f.

I'd say that the truncation algorithm operates on bytes and that it can't make
sense of d8 35, but I'm not too sure how to fix that since graphemes can have
arbitrary length (right?). Do you have to compute the width in advance?
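
A small Python sketch of the suspected failure, assuming the title is cut at a fixed number of UTF-16 code units. The helper `truncate_utf16` is hypothetical and only avoids splitting surrogate pairs; it does not handle full grapheme clusters (combining marks and the like):

```python
ch = "\U0001D48F"  # MATHEMATICAL BOLD ITALIC SMALL N
print(ch.encode("utf-16-be").hex(" "))  # d8 35 dc 8f

title = "\U0001D47C\U0001D48F"  # the first two characters of the title
naive = title.encode("utf-16-be")[:6]  # cut after three 16-bit code units
print(naive.hex(" "))  # d8 35 dc 7c d8 35  (ends in a lone high surrogate)

def truncate_utf16(text, max_units):
    """Keep at most max_units UTF-16 code units without splitting a pair."""
    out, used = [], 0
    for c in text:
        need = 2 if ord(c) > 0xFFFF else 1  # astral characters need two units
        if used + need > max_units:
            break
        out.append(c)
        used += need
    return "".join(out)

print(len(truncate_utf16(title, 3)))  # 1
```

The width does not have to be known in advance for this part: it is enough to back off to the previous character boundary whenever the cut would land between the two halves of a pair.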

~~~
pwnna
Hm.. i'm on nightly and seems to be unaffected by this problem.

~~~
emillon
It depends on the size of the tab headers.

------
gus_massa
This is similar to pseudolocalization (þšéûðöļöçåļîžåţîöñ), which adds
random accents to English words to test the localization capabilities of a
program without requiring knowledge of another language.

An online version:
[http://www.pseudolocalize.com/](http://www.pseudolocalize.com/)

A library: [http://code.google.com/p/pseudolocalization-
tool/](http://code.google.com/p/pseudolocalization-tool/)

------
gojomo
Hey! I was just thinking about this site, and visited it for the first time in
years, after mentioning the old _San Francisco_ ransom-font in another thread.

By randomly mixing these Unicode letter and letterlike characters, you can
simulate a cut-and-paste ransom-note. For example, an acquired company could
announce changes to its privacy policy:

    
    
      wE ℎåve yøuR ρrIvᴀçy ⅈn a ᴡiNdøwleSs ℞oøm,
      & ℙℓaℕ τø ⅆo µnSρεaKᴀble †hiℕℊs t○ ⅈt

~~~
hanula
Heh, I created something like that in Python:
[https://github.com/hanula/weirdify](https://github.com/hanula/weirdify) while
playing with unicodedata module.

------
hbbio
Oh, no !

The cat should have stayed in a box, if this gains too much popularity, HN
will read like MySpace back in the days.

And top HN news will be: "A browser plugin that translates Unicode back to
ASCII".

~~~
errnoh
The problem is that this doesn't stop here. This method works everywhere and
it will spread.

We'll need a plugin to reverse this, anyone up for it?

~~~
peterwwillis
Go to your browser's menu bar, click 'View', go to 'Character Encoding', and
select 'Western (ISO-8859-1)'. Now it's just garbage characters. (It's not
reversed, but at least it's not bold?)

------
robjh
For others without that specific font or what have you: "Unicode Text
Converter"

On my windows box with chrome all i see are empty boxes.

~~~
TheAnimus
Use IE (wow, don't say that often) it has much better typography support, if
you are on a high DPI display, chrome just looks awful.

~~~
rossy
> if you are on a high DPI display, chrome just looks awful

I'm fairly sure this is no longer the case. Chrome is high-DPI aware on
Windows now, and it uses DirectWrite for font rendering, the same as IE. It
just can't display these characters for some reason.

~~~
mahouse
I think he does not only mean the font rendering, but the UI itself.

Anyway, DirectWrite was horrible at high DPI, if I remember correctly.

~~~
rossy
Nope, the UI got an update too. It renders at high-DPI on Windows. Chrome on a
high-DPI machine looks exactly the same as on a low-DPI machine, except
sharper. It used to be plagued with issues, but I'm fairly sure they're all
gone now. DirectWrite isn't perfect. It still has weird hinting and kerning at
high-DPI with some fonts, but it's better than GDI.

I find Chrome better than IE, actually. IE ignores my DPI settings and scales
pages to 250%, so everything looks too large. Chrome renders correctly at
200%.

------
MrBuddyCasino
This surprises me, what exactly is the point of encoding what are essentially
different fonts in unicode? Isn't that the job of the presentation layer?

(the Fraktur variant is awesome btw, and is apparently in the valid unicode
range for Java...)

~~~
masklinn
The graphical difference has semantic significance in some domains:
[http://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbo...](http://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols)

~~~
MrBuddyCasino
I guess that makes sense.

Personally I find it annoying how mathematical notation seems so intractable
today. Things that are easily understood in code for me are a mystery in math
notation. But I guess there will never be an overhaul with a more intuitive
typography...

~~~
RBerenguel
Keep in mind it is also true the other way around. Something can be
mathematically clear to someone and totally a mystery in code form. Each one
has his/her strengths and weaknesses.

~~~
MrBuddyCasino
Probably true, and I guess if you're a mathematician, you quickly get used to
the symbols. And I'm not arguing against having those symbols in the first
place, it's just that some of them have a 19th-century feel to them, and do not
seem intuitive.

The art of typography and signage really only matured in the 20th century, and
I'm certain some of the symbols would look very different if they were
designed today. Anything that helps with teaching math and making it appear
friendlier is a plus, imho.

~~~
RBerenguel
I'm not sure which symbols you are hinting at. At first I thought you meant
Fraktur-style letters, but obviously that can't be the case, as you
cite "teaching" as a plus of redesigning them, and Fraktur symbols are used
"traditionally" in relatively high level algebra (for some reason some symbols
are used more in some realms, for me Fraktur started appearing when talking
about complex stuff about ideals). Once you get used to them, it's like a
second language, and that's it. I remember reading Feynman used his own
symbols for sin, cos and other basic functions (turning them to one-stroke
symbols) but he had to give up once he had to talk with other people.

Math symbols are more or less a universal language. Once you know how the
symbol appeared, or get used to "reading it right" they are totally natural. I
don't see ∂ as a "weird d," I read this as "partial." It wasn't natural at
first, but I got used to it, just like I got used to English.

------
mxfh
Since it wasn't mentioned here earlier, it's worth taking a look at
shapecatcher to see what glyphs might resemble Latin letters.

Scribbling something resembling the latin capital letter A returns for example
any of these codepoints: A𝘈ΑАÅ𝖠∆ДΔ𝐴𝟺дᎪߡ𝛢Å4𝛥ᴬᐃⵠ𐌀𝘼𝛬Λ△𝟦Ą𝜟𝓐⌓⧍ᗋ🜂Ⲇ🗻🍙ⲇѦᗩᗅ

[http://shapecatcher.com/](http://shapecatcher.com/)
([https://news.ycombinator.com/item?id=5150107](https://news.ycombinator.com/item?id=5150107))

Also the Unicode Consortium has some reports on security:

[http://www.unicode.org/reports/tr36/](http://www.unicode.org/reports/tr36/)

[http://www.unicode.org/reports/tr39/](http://www.unicode.org/reports/tr39/)

listing all kinds of spoofing methods you haven't even thought of.

------
horse_continuum
One of my friends, moving to China for a semester to teach, was thinking of
using a proper Chinese name to make it easier for students to address him. He
had a good idea, even, which he shared on Facebook.

I proposed that we should name him after the lack of unicode support in our
browsers, and we ended up calling him "Box Boxbox" for a couple of months.

------
TorKlingberg
Does anyone know why there are separate Unicode code points for letters in
bold, bold italic and Fraktur? Normally this sort of thing should be handled
by different fonts / font variants. Is it for compatibility with some legacy
encoding?

~~~
rossy
They're mathematical symbols. I guess they're for situations where, say, a
double-struck letter has a different meaning to the regular letter.

[http://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbo...](http://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols)

------
jfmercer
I couldn't help but notice that this converter was copyrighted by Eli the
Bearded. Google "Eli the Bearded", but not from work. You'll get some very
interesting results.

[https://encrypted.google.com/#q=Eli%20the%20Bearded](https://encrypted.google.com/#q=Eli%20the%20Bearded)

------
qeorge
I was once bilked into buying some scraped content as original work by this
method. It passed Copyscape, and my test of Googling a random sentence in
quotes didn't bring anything up. I let it go because I had already accepted
the work, and the lesson was worth more than the article anyway.

Don't be a fool as I was! Had I manually transcribed a sentence into Google
instead of copying + pasting the Unicode chars, I would have found hundreds of
copies of the same article.

------
sthlm
In Javascript, many unicode characters are allowed [0], so háćḱéŕŃéẃś is a
valid variable name [1].

Note: The number of іllэБіъlэVаѓіаъlэИамэѕ [2] used in your production code is
inversely proportional to the number of friends you'll make in the maintenance
team.

[0] [https://mathiasbynens.be/notes/javascript-
identifiers](https://mathiasbynens.be/notes/javascript-identifiers)

[1] [https://mothereff.in/js-
variables#h%C3%A1%C4%87%E1%B8%B1%C3%...](https://mothereff.in/js-
variables#h%C3%A1%C4%87%E1%B8%B1%C3%A9%C5%95%C5%83%C3%A9%E1%BA%83%C5%9B)

[2]
[http://www.panix.com/~eli/unicode/convert.cgi?text=illegible...](http://www.panix.com/~eli/unicode/convert.cgi?text=illegibleVariableNames)
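
For comparison, a short Python 3 sketch (not JavaScript, so the exact rules differ). Python also accepts non-ASCII letters in identifiers, and it additionally NFKC-normalizes identifiers while parsing, so a name typed in a mathematical "font" ends up bound under its plain spelling:

```python
name = "háćḱéŕŃéẃś"
print(name.isidentifier())  # True

namespace = {}
# The source below reads "foo = 1" written in Mathematical Bold Fraktur.
exec("\U0001D58B\U0001D594\U0001D594 = 1", namespace)
print("foo" in namespace)   # True: the parser folded the name to plain "foo"
```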

~~~
jarek
I had quite a lot of fun defining 汉字 variable names in C#. Though definitely
not something to put into production code of course...

------
edgarallenbro
This is great, but why is the Australian translation called 'upside down
pseudoalphabet'?

------
cgranier
What I need is something that takes all the extended characters (think Spanish
or Swedish) and turns them into alternative safe versions.

For instance, á into a, ñ into n, å into a, etc.

Had my hopes up when I saw the title.

Does anyone have any ideas or links to working scripts that I can turn into
something useful? I need to "sanitize" a database of foreign documentaries
before uploading to YouTube (their metadata input system chokes on extended
chars). Thanks!

~~~
cataphract
You can use ICU transliterators. Example for the PHP ICU bindings:
[http://php.net/manual/en/transliterator.transliterate.php#11...](http://php.net/manual/en/transliterator.transliterate.php#111939)
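
If ICU isn't available, a rough alternative can be sketched with Python's standard `unicodedata` module: decompose each character (NFD) and drop the combining marks. This is a swapped-in technique, not the ICU transliterator above, and the function name `strip_accents` is just illustrative:

```python
import unicodedata

def strip_accents(text):
    """Decompose characters, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("á ñ å"))  # a n a
```

Letters that have no decomposition (for example "ø" or "ß") pass through unchanged, so a real cleanup would still need a small manual mapping for those.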

~~~
cgranier
Thanks. This looks very promising. I'll dig into it and hopefully come out
with a clean database ;-)

------
pud
I made an iPhone app that does kind of the same thing, but converts letters to
their upside-down unicode equivalent. It's fun for sending upside-down texts.

Free and ad-free, just a fun project:

[https://itunes.apple.com/us/app/texting-upside-down-
free/id4...](https://itunes.apple.com/us/app/texting-upside-down-
free/id435354073?mt=8)

~~~
tgcordell
Would it be possible to use the new third party keyboard API in iOS8 to have a
regular styled keyboard that types in an upside down fashion? This would allow
the user to continue having the same input experience, but translate the
output experience? Once confirmed this is possible, you could take OP's idea
and apply as well.

------
kcorbitt
Just a PSA for discoverability: since the replacement characters use different
code points than their more standard equivalents, the default HN search
([https://hn.algolia.com](https://hn.algolia.com)) at least doesn't find this
submission when searching for "unicode."
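
A sketch of how an indexer could avoid this, assuming it is free to normalize text before indexing (this says nothing about what Algolia actually does). NFKC compatibility normalization maps the mathematical letters back to their plain equivalents:

```python
import unicodedata

title = "𝑼𝒏𝒊𝒄𝒐𝒅𝒆 𝑻𝒆𝒙𝒕 𝑪𝒐𝒏𝒗𝒆𝒓𝒕𝒆𝒓"
folded = unicodedata.normalize("NFKC", title).casefold()

print(folded)               # unicode text converter
print("unicode" in folded)  # True
```

Not every lookalike has a compatibility mapping, so this only covers styles that Unicode itself defines as variants of ordinary letters.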

------
lazyjones
Great, now we'll have to rely on IDEs with clickable drop-down lists of
variables and function names because simple text input just got a lot harder
for languages where Unicode is allowed for symbols!

[http://play.golang.org/p/2zYfCx_J-O](http://play.golang.org/p/2zYfCx_J-O)

~~~
1010
Presumably, we are now in a situation where it is actually more difficult to
learn computer programming if you happen to have had the misfortune to be born
into a 'non-western' language and, to some extent, even non-english. That is
an absurd situation and means that, as a collective species, we are wasting a
huge amount of resources and potential. Definitely something we should look to
resolve.

Having a drop-down for variables certainly isn't a solution, granted.
Hopefully, there are some more sensible compromises - e.g. being able to
specify a locale-dependent subset of unicode in your personal environment,
appropriate use of metadata to describe the language of a file, etc.

------
Immortalin
On iOS 8.1 safari all I see is a bunch of squares ;(

------
petecooper
My iOS/Safari shows squares in the page itself, but a row of boxed aliens in
the `Bookmarks and History` list:

[http://imgur.com/l98p9oN](http://imgur.com/l98p9oN)

(image is safe for work, though other stuff on imgur.com is likely not)

------
tezza
🆃🅷🅴🆁🅴 🅶🅾🅴🆂 🆁🅴🅰🅳🅰🅱🅸🅻🅸🆃🆈, 🆂🅴🅰🆁🅲🅷🅰🅱🅸🅻🅸🆃🆈

~~~
crb
Ctrl-F "there goes" found this comment just fine.

~~~
tezza
No dice on my system, in-page search does not work. Mac or Windows Firefox.

Also some of the menu of glyphs are only visual analogues, not 1-1
replacements.

Plenty of systems and indexers will not be sophisticated enough to cope.

This may be very frustrating if you have visual impairments and need a screen
reader.

~~~
vhost-
Same here on windows, mac and debian with firefox.

------
grimgrin
My friend made a similar tool that you may enjoy:

[http://antglove.com/erger](http://antglove.com/erger)

~~~
akavel
And seems to have more "proper fonts" than the originally linked one,
actually.

------
rossy
I wish this worked on Windows/Chrome, or I knew why it didn't work so I could
star the issue on their bug tracker.

------
gojomo
Interesting; the title displayed OK minutes ago, on the main page, in
Firefox/OSX. But now it's showing as unsupported-glyph boxes inside the
page... but still looks OK in the titlebar of the item (comments) page.

Did some automated or administrative process mutate the characters? Or is this
just Firefox drifting, in choice of font?

------
hesselink
Strangely, for me on Firefox 33.1 on OS X, the title shows up fine on the main
page. But when I click through to the comment, I get boxes only, and from then
on, the main page also doesn't work anymore until I restart Firefox. I suspect
an extension, but I'm not sure.

~~~
hesselink
Found it, it was
[https://github.com/darrinhenein/VerticalTabs](https://github.com/darrinhenein/VerticalTabs).

------
spindritf
Also, strike-through. Which is the one I find genuinely useful because I like
the suggestive way to say s̶o̶m̶e̶t̶h̶i̶n̶g̶ then visibly correcting to
something else.

[http://adamvarga.com/strike/](http://adamvarga.com/strike/)

~~~
endgame
People have written ^H and ^W since forever^W^Wfor a very long timg.

~~~
spindritf
Those are lost on many people nowadays. And strike through imho looks better.

------
guardian5x
I only saw boxes in the title with Chrome 38. Tried out IE10 and it works just
fine.

~~~
rplnt
Boxes with (Blink) Opera as well. Works in firefox.

~~~
guardian5x
I just noticed that in the Chrome tabs it shows the title correctly, I guess
it's because it just uses Windows Unicode support there. But everywhere else
it's not showing.

------
geekam
This fails to show up on my iPhone 5S Safari and I thought it supported
Unicode.

------
ck2
Note that XP cannot show

    
    
        Negative Circled
        Squared
        Negative Squared
        Double-struck
        Bold
        Bold italic
        Bold script
        Fraktur
    

At least not with the fonts I have.

~~~
aruggirello
Firefox on my Ubuntu 14.04 PC cannot show:

    
    
        Negative Circled
        Squared
        Negative Squared

------
huuu
𝕯𝖔𝖊𝖘 𝖆𝖓𝖞𝖔𝖓𝖊 𝖐𝖓𝖔𝖜 𝖜𝖍𝖞 𝖙𝖍𝖊 𝖑𝖎𝖓𝖊 𝖍𝖊𝖎𝖌𝖍𝖙 𝖔𝖋 𝖙𝖍𝖊𝖘𝖊 𝖈𝖍𝖆𝖗𝖆𝖈𝖙𝖊𝖗𝖘 𝖎𝖘 𝖘𝖔 𝖍𝖎𝖌𝖍?

~~~
scarygliders
𝕀'𝕞 𝕡𝕣𝕖𝕥𝕥𝕪 𝕤𝕦𝕣𝕖 𝕒 𝕥𝕙𝕣𝕖𝕒𝕕 𝕠𝕗 𝕣𝕖𝕡𝕝𝕚𝕖𝕤 𝕔𝕠𝕞𝕡𝕣𝕚𝕤𝕖𝕕 𝕖𝕟𝕥𝕚𝕣𝕖𝕝𝕪 𝕥𝕙𝕖𝕤𝕖 𝕦𝕟𝕚𝕔𝕠𝕕𝕖-𝕔𝕠𝕟𝕧𝕖𝕣𝕥𝕖𝕕
𝕥𝕖𝕩𝕥𝕤 𝕨𝕚𝕝𝕝 𝕓𝕖𝕘𝕚𝕟 𝕥𝕠 𝕘𝕖𝕥 𝕠𝕝𝕕 𝕢𝕦𝕚𝕔𝕜𝕝𝕪 ;)

~~~
solistice
Hey, we got this toy and we want to play with it.

There's this great quote that anything that was fun when you were five is
still fun when you're thirty five, and playing around with funky letters was
certainly fun at the age of 5.

~~~
scarygliders
Oh I agree entirely - my post was meant for the irony rather than being a
45-year old curmudgeon ;)

(And I had fun too!)

------
sanxiyn
[https://twitter.com/benbjohnson/status/533848879423578112](https://twitter.com/benbjohnson/status/533848879423578112)

------
sovok
Very cool. Although the upside-down text doesn't work with ümlauts and
numbers. A reverse function would also be nice.

I wrote a similar tool that does this
([http://lunicode.com](http://lunicode.com)). It's on Github if you want to
use the code:
[https://github.com/combatwombat/Lunicode.js](https://github.com/combatwombat/Lunicode.js)

------
cturner
Different problem, but someone who knows about unicode will probably know this
-

When I paste from Microsoft documents into PuTTY, characters will often be
transformed to weird versions. Example - emdash is a different character to
'-'. It comes through as a weird tilde character instead of a dash. Mmm.
Frustrating.

Is there a robust program you can run on putty to catch such type and flatten
it to ascii?

~~~
benjaminjackman
I use Linux but there are similar problems, I usually will paste text like
that into sublime to remove all the special formatting, then re-copy paste it.
I also found this stack overflow post, which mentions a program (puretext)
that maps win+v to do a text only paste:
[http://stackoverflow.com/questions/122404/how-to-copy-and-
pa...](http://stackoverflow.com/questions/122404/how-to-copy-and-paste-code-
without-rich-text-formatting)

------
netheril96
𝕿𝖍𝖎𝖘 𝖋𝖊𝖊𝖑𝖘 𝖑𝖎𝖐𝖊 𝖙𝖊𝖗𝖗𝖎𝖇𝖑𝖊 𝖍𝖆𝖈𝖐 𝖇𝖚𝖙 𝕴 𝖑𝖎𝖐𝖊 𝖎𝖙. 𝕹𝖔𝖜 𝕴 𝖈𝖆𝖓 𝖚𝖘𝖊 𝖆𝖑𝖑 𝖐𝖎𝖓𝖉𝖘 𝖔𝖋 𝖋𝖆𝖓𝖈𝖞
𝖋𝖔𝖗𝖒𝖆𝖙𝖙𝖎𝖓𝖌 𝖔𝖓 𝖙𝖍𝖔𝖘𝖊 𝖘𝖎𝖙𝖊𝖘 𝖙𝖍𝖆𝖙 𝖉𝖔𝖊𝖘𝖓'𝖙 𝖘𝖚𝖕𝖕𝖔𝖗𝖙 𝖋𝖔𝖗𝖒𝖆𝖙𝖙𝖎𝖓𝖌.

~~~
masklinn
Except when the site in question is completely broken wrt astral codepoints.

Which is unexpectedly common as MySQL's "utf8" can't handle codepoints outside
the BMP and will just truncate text at the first astral codepoint[0]. You need
MySQL 5.5 _.3_ (because adding a whole new encoding in a minor version makes
perfect sense) and "utf8mb4" (because why would a codec called "utf8" actually
do UTF8?). And then the regex are probably broken because it's PHP and
developers use neither UNICODE mode nor properties (PCRE's "\w" will not match
all unicode letters, you need "\p{L}" for that, also note that e.g. "🆄" is a
symbol not a letter, although "𝔹" is a letter)

[0] [https://mathiasbynens.be/notes/mysql-
utf8mb4](https://mathiasbynens.be/notes/mysql-utf8mb4)
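
The two claims above are easy to check from Python's standard library (a sketch using `unicodedata`, not PCRE or MySQL themselves): both example characters are astral, so they take four bytes in UTF-8, and they fall into different general categories.

```python
import unicodedata

double_struck_b = "\U0001D539"  # 𝔹 MATHEMATICAL DOUBLE-STRUCK CAPITAL B
squared_u = "\U0001F184"        # 🆄 NEGATIVE SQUARED LATIN CAPITAL LETTER U

for ch in (double_struck_b, squared_u):
    # Prints the code point, its general category, and its UTF-8 length.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), len(ch.encode("utf-8")))
# U+1D539 Lu 4
# U+1F184 So 4
```

Four bytes per character is exactly what a three-byte-per-character column encoding cannot store.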

~~~
TazeTSchnitzel
MySQL is horrible for all the same reasons PHP is horrible, and this applies
to Unicode too, except PHP is actually trying to fix its Unicode problems
(UTF8 is the default now, moves towards adding a UString class), while MySQL
isn't fixing them.

------
anjbe
I’ve never been a fan of this sort of thing. The Unicode characters in these
font blocks are not letters for making words; at least the double‐struck,
fraktur, bold, italic, and bold italics are semantically for use in
mathematical equations.

This can have some strange effects if you try to use them like letters.
Example: What’s the lowercase transform of 𝑼? 𝑼! Not 𝒖.
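
In Python, for instance, this can be seen directly. The mathematical letters carry no case mappings, so lowercasing only works after a compatibility normalization back to plain letters:

```python
import unicodedata

u = "\U0001D47C"  # 𝑼 MATHEMATICAL BOLD ITALIC CAPITAL U

print(u.lower() == u)                            # True: no lowercase mapping
print(unicodedata.normalize("NFKC", u).lower())  # u
```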

------
petercooper
If you like this sort of thing, you might like this piece I wrote some time
back about writing a Ruby script using whitespace for all identifiers:
[http://www.rubyinside.com/the-split-is-not-enough-
whitespace...](http://www.rubyinside.com/the-split-is-not-enough-whitespace-
shenigans-for-rubyists-5980.html)

~~~
TheLoneWolfling
This sounds like it could be abused.

Someone submitting a path to an open-source program (in Ruby) with a NBSP
somewhere that changes the program logic or something. (a<NBSP>or<NBSP>b,
where earlier you did a<NBSP>or<NBSP>b=x, or something similar, is the first
example that comes to mind.)

~~~
TheLoneWolfling
Whoops. Patch, not path.

------
edent
This is the w̶o̶r̶s̶t̶ b̲e̲s̲t̲ use of Unicode!

------
hliyan
Impressive! Hopefully, this won't end with HN sanitizing everything except
latin + latin extended from submissions.

~~~
Cthulhu_
Well it does / should make people rethink allowing UTF-8 by default in user-
generated content. I wonder if the stuff generated by
[http://www.eeemo.net/](http://www.eeemo.net/) works here:

Z̡̖̥̙̱͓A̶͚̬̺L̷͖͓Ģ͕O̳̮!̗

~~~
joezydeco
#فͤ҈ͨͥ҉҉ͦ҈҉ͨ҈ͩ҉ͪ҈ͣͯͫ҉ͥͬͨ҈ͭ҉ͮ҈ͯ҉ͨ҈ͭͭͬ҉ͧͥ҈ͣ҉ͨ҉҉҈ͧͥ҉ͯ҈ͮͥ҉ͭ҈ͤ҈ͦ҈ͥ҉ͧ҈ͩͯ҉ͭ҈ͨ҉ͨͥ҉҉ͣ҉ͣͪ҉ͧ҈ͭ҉ͩ҈ͤ҉ͮ҈ͯͥ҈ͬ҈ͭ҈ͦ҈ͨͣ҉ͥ҈ͯ҉҉ͣͧ҈ͫ҉ͭ҈ͥͯͯ҉ͦ҈ͥ҉ͧ҉҈ͩ҉ͭ҈ͣͨ҉ͣͥ҈ͪ҉ͧ҈ͭᅠ'̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏
ᅠᅠ'̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋̋
กิิิิิิิิิิิิิิิิิิิิิิก้้้้้้้้้้้้้้้้้้้้้้ก็็็็็็็็็็็็็็็็็็็็็็ก้้้้้้้้้้้้้้้้้้้้้้ก็็็็็็็็็็็็็็็็็็็็็็กิิิิิิิิิิิิิิิิิิิิิิ
ͥͦͧͣͤ ͦͧͣͤͥ ͧͣͤͥͦ ͣͤͥͦͧ ͤͥͦͧͣ ͥͦͧͣͤ ͦͧͣͤͥ ͧͣͤͥͦ ͣͤͥͦͧ ͤͥͦͧͣ ͥͦͧͣͤ ͦͧͣͤͥ ͧͣͤͥͦ
ͣͤͥͦͧ ͤͥͦͧͣ ͥͦͧͣͤ ͦͧͣͤͥ ͧͣͤͥͦ ͥͦͧͣͤ ͦͧͣͤͥ ͧͣͤͥͦ ͥͦͧͣͤ ͦͧͣͤͥ
▲▲▲̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏̏
Works for me.

~~~
spyder
This comment behaves strangely in Firefox, which is not surprising, but
it's probably a bug: when I scroll to this comment, no characters extend
outside the comment box, but when I switch back to this page from another
tab, the characters spill outside the comment box.

~~~
joezydeco
You should try it on mobile. Every browser does something different. Twitter
has a /real/ fun time with this stuff, I've been inspired by @glitchr_

[https://twitter.com/glitchr_](https://twitter.com/glitchr_)

------
NoMoreNicksLeft
I don't really speak/read Russian, but I have a passable understanding of
Cyrillic, and those always look dumb. It doesn't look like "the" to me, it
looks like "guh-buh-yeh" or something.

Same thing with the Borat DVD cover.

~~~
dang
Or _Toys Ya Us_.

------
calineczka
Finally a way to express myself on facebook properly ;) I wonder if bold text
would lead to better conversion from ads using this trick. And I wonder when
is facebook going to ban this because obviously it works :)

------
dsjoerg
ᴅᴏᴇꜱ ᴀɴyᴏɴᴇ ᴋɴᴏᴡ ɪꜰ ᴩᴏᴩᴜʟᴀʀ ꜱᴇᴀʀᴄʜ ᴇɴɢɪɴᴇꜱ ᴅᴇ-ᴜɴɪᴄᴏᴅᴇ ᴛᴇxᴛ ᴡʜᴇɴ ɪɴᴅᴇxɪɴɢ?

~~~
guipsp
Google does, at least for fullwidth.

~~~
TazeTSchnitzel
Which makes sense, as fullwidth is likely to be accidentally typed when using
a Chinese/Japanese/Korean IME, and is entirely equivalent to normal
characters, it just fits in with CJK text layouts better.
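
The equivalence is mechanical: the fullwidth forms U+FF01 to U+FF5E mirror ASCII U+0021 to U+007E at a fixed offset of 0xFEE0, and NFKC normalization maps them back. A minimal sketch (the function name `to_fullwidth` is made up for illustration):

```python
import unicodedata

def to_fullwidth(text):
    """Map printable ASCII to the corresponding fullwidth forms."""
    out = []
    for ch in text:
        if ch == " ":
            out.append("\u3000")                   # IDEOGRAPHIC SPACE
        elif "!" <= ch <= "~":
            out.append(chr(ord(ch) + 0xFEE0))      # fixed offset into U+FF01..U+FF5E
        else:
            out.append(ch)
    return "".join(out)

wide = to_fullwidth("full width")
print(wide)
print(unicodedata.normalize("NFKC", wide))  # full width
```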

------
grayclhn
I look forward to a Hacker News front page that looks like a ransom note.

~~~
protomyth
I think of it as a karma backlash from Apple naming a new font San Francisco
for the iWatch instead of leaving the name for the old ransom font.

------
arikrak
See
[https://news.ycombinator.com/item?id=7383672](https://news.ycombinator.com/item?id=7383672)
though they changed my title to normal text.

------
codemonkeymike
Continued use of this would be a good way of making me not use HN.

------
DanBC
Chrome on iOS is giving me the character unavailable boxes. Normally I'd just
change the font but I can't do that here.

This doesn't feel like the future.

------
rplnt
Does not really work for characters like _úôä_ , not sure if there isn't
anything similar in those "styles" or it was just ignored.

------
parasj
𝓖𝓻𝓮𝓪𝓽 𝓯𝓸𝓻 𝓹𝓪𝓼𝓼𝔀𝓸𝓻𝓭𝓼

------
seba_dos1
𝕴 𝖋𝖊𝖊𝖑 𝖆 𝖓𝖊𝖜 𝖛𝖎𝖗𝖆𝖑 𝖙𝖍𝖎𝖓𝖌 𝖔𝖓 𝖘𝖔𝖈𝖎𝖆𝖑 𝖒𝖊𝖉𝖎𝖆 𝖈𝖔𝖒𝖎𝖓𝖌.

~~~
pbhjpbhj
This has been a ⓣⓗⓘⓝⓖ [thing] for quite some time - guess it might be making a
come back, I've seen zalgo
([http://knowyourmeme.com/memes/zalgo](http://knowyourmeme.com/memes/zalgo)
NSFW; ̖͈̪͙͉̰͈Z͓͎̬͓̯̖A̶̯̝̖͍̥̞L̻G̢̣̘͇̖͍̙O [Zalgo] generator
[http://www.eeemo.net/](http://www.eeemo.net/)) and flip and reverse text live
on my Facebook in the past at least.

------
seqizz
I feel nice.. [http://i.imgur.com/lbvRWwm.png](http://i.imgur.com/lbvRWwm.png)

------
darkstalker
I've used this page for a long time. Ｗｒｉｔｉｎｇ ｓｔｕｆｆ ｉｎ ｆｕｌｌｗｉｄｔｈ ｕｎｉｃｏｄｅ ｆｏｒ
ｓｕｒｅ ｍａｋｅｓ ｉｔ ｌｏｏｋ ｍｏｒｅ ｆｕｎｎｙ

------
getdavidhiggins
[https://www.unicod.es/](https://www.unicod.es/)

------
jrometty
It should be mentioned that this returns a blank title on the android app.

------
cm2012
On my android all the unicode characters (including the title) are blank.

------
tempodox
It works :)

𝑼𝒏𝒊𝒄𝒐𝒅𝒆 𝑻𝒆𝒙𝒕 𝑪𝒐𝒏𝒗𝒆𝒓𝒕𝒆𝒓

comes in a fancy bold italic font in my HN list. I love this hack.

~~~
eterm
Oddly in Firefox the tab name showing the title only gets as far as 𝑼𝒏𝒊𝒄𝒐
before giving up with what looks like a box with D835 in it.

~~~
acqq
The tab name is shortened in the middle of the sequence.

I still don't know what the sequence is though, any Unicode expert to explain?
Apparently is d835 "invalid"?

[http://www.charbase.com/d835-unicode-invalid-
character](http://www.charbase.com/d835-unicode-invalid-character)

Edit: I see now emillon explains:

"U+1D48F whose UTF-16 BE encoding is d8 35 dc 8f."

That's:

[http://codepoints.net/U+1D48F](http://codepoints.net/U+1D48F)

"MATHEMATICAL BOLD ITALIC SMALL N"

------
Flott
This is not good news if it bypasses the spam filters! Does it?

------
sjwright
The question I have is, what's the easiest way to strip this 🅹🆄🅽🅺 out of
unicode strings submitted by web users? With a nod to Cunningham's Law, surely
the right answer is a regular expression?

~~~
atlbeer
Depends on the language... but, the "correct" answer is support unicode and
welcome yourself into a world of pain.

~~~
dsjoerg
Glad you put "correct" in scare quotes, because that "correct" answer is
certainly not correct.

------
gpvos
I do feel that Unicode is slowly jumping the shark.

------
aruggirello
!ꙅᴙɘTliꟻ mAqꙅ ꟻo Tɘꙅ wɘᴎ ɘloHw A ꙅbɘɘᴎ ꙅiHT ,oᴎ HO

------
edem
Can you do z̝̗a͈̣̳͓l͏g̱̭͖̜̙o̢̦̫̯ as well?

------
JulianMorrison
𝕸𝖊𝖎𝖓 𝕷𝖚𝖋𝖙𝖐𝖎𝖘𝖘𝖊𝖓𝖋𝖆𝖍𝖗𝖟𝖊𝖚𝖌 𝖎𝖘𝖙 𝖛𝖔𝖑𝖑𝖊𝖗 𝕬𝖆𝖑𝖊.

------
getdavidhiggins
ｔｈｉѕ ｉѕ ｇｒéäｔ, ƅüｔ ìｔ'ｓ ｃｌ߀ｓèԁ ｓòùｒｃè!!!

üｎíｔｏ߀ɭｓ ìѕ ϻùｃհ ƃｅｔｔëｒ!!

[https://www.unicod.es/](https://www.unicod.es/)

------
noobermin
Ｑｕｉｔｅ ａ ｗａｙ ｔｏ ｍａｋｅ ｔｈｅ ｐｏｉｎｔ．

------
shaurz
What is the point of having different codepoints for FONTS in Unicode? What a
load of nonsense.

~~~
Arnt
Unicode generally includes these things because an older encoding did, in the
name of roundtrip compatibility. I expect some older font encoding did it to
cater to people who need more than 26 symbols in their maths papers. Let 𝒉 be
the...

Unicode's name for 𝒉 explains it all, really.

~~~
lmm
And yet the Unicode consortium went with Han unification, which is still
blocking adoption for a significant potential userbase (pretty much any
software that needs to display Japanese names).

~~~
Arnt
I went to a unicode meeting about a decade ago, and asked one of the
luminaries over beer one night. He told me that they did some practical
research, including reading newspapers and talking to editors. In Japan they
would ask questions like "I see that you mention Shanghai in today's paper,
and you use Japanese glyphs for the city's name, not the same as Chinese
newspaper use. Why?". The answer was generally "that's how we write Shanghai
here" and out of that came Han unification.

I suspect that if you could find a couple of mainstream publishers in Taiwan
or Japan that prefer to print the names of mainland Chinese using the same
glyphs as are used in mainland China instead of the glyphs used in Taiwan or
in Japan, you might be able to reopen the discussion of han unification.

~~~
Arnt
Or even better: A directive from the someone's ministry of education decreeing
deunified Han in school books, so at least one country's pupils would actually
learn to read deunified Han.

Now wouldn't that be fun: "When history textbooks cover the civil war in
1927-50, they shall use traditional Chinese for the names of then KMT-held
cities and simplified Chinese for the names of then communist-held cities."

------
ryanjmo
เ ђคשє Շ๏ Շгץ Շђเร ๏ยՇ.

------
vjvj
🆃🅷🅴 🆂🅴🅲🆁🅴🆃 🅸🆂 🅾🆄🆃.

------
sakri
fun for passwords

------
tmmm
How does it work?

------
tibbon
It appears to work on Facebook and Twitter.

ｉｎｃｅｐｔｉｏｎ

------
fiatjaf
.ǝɔıu ɹǝdns sɐʍ sıɥʇ

------
yAnonymous
𝓘 𝔀𝓸𝓷𝓭𝓮𝓻 𝓱𝓸𝔀 𝔀𝓮𝓵𝓵 𝓝𝓢𝓐 𝓓𝓟𝓘 𝓼𝔂𝓼𝓽𝓮𝓶𝓼 𝓱𝓪𝓷𝓭𝓵𝓮 𝓽𝓱𝓲𝓼.

------
jackmaney
I'd like to buy a vowel, please. Let's go with "e".

------
kalops
teh cancer that is HN. predicting next post someone shows off rageflipping
text

------
PSeitz
𝓚𝓐𝓦𝓐𝓘

------
Kiro
Twitch chat will love this.

------
Houshalter
☐☐☐☐☐☐ ☐☐☐ ☐☐☐☐☐☐ ☐☐☐☐ ☐☐☐☐☐☐☐☐

