Beware of the Unicode no-break space (0xC2 0xA0)
11 points by vicjicama 7 days ago | 7 comments
Hi all!

Yesterday I ran into some very strange behavior while working with some tags. At first I thought it was a bug!

Can you spot the difference? (https://jsfiddle.net/vbjtcf3d/)

<div pagews-snapshot="true" referenceid="remotemkt.listws.app" snapshotid="snapshot/board-companies" ></div>

And the other (https://jsfiddle.net/q9d08xp1/)

<div pagews-snapshot="true" referenceid="remotemkt.listws.app" snapshotid="snapshot/board-companies" ></div>

Both look exactly the same, but only one works.

It turned out to be a Unicode character issue: the spacing in the non-working tag used the Unicode non-breaking space (0xC2 0xA0). When I replaced those with the regular space character (0x20), everything worked again.

I think I copied this tag from an email. I will be extra careful about this kind of issue in the future.

It took me a while to figure this out, so I hope it helps someone. If your attributes are null and you are sure they are present, check your HTML with a hex editor.
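
If a hex editor isn't handy, a short script can do the same check. Here is a minimal sketch in Python, assuming the markup has already been decoded to a string. `find_nbsp` is a hypothetical helper name, and the two sample tags are shortened stand-ins for the ones above:

```python
NBSP = "\u00a0"  # no-break space; looks like a regular space on screen


def find_nbsp(text):
    """Return (line, column) pairs, both 1-based, for every U+00A0 in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch == NBSP:
                hits.append((line_no, col_no))
    return hits


# Shortened sample tags: the first uses a regular space, the second a no-break space.
good = '<div pagews-snapshot="true"></div>'
bad = '<div\u00a0pagews-snapshot="true"></div>'

print(find_nbsp(good))  # []
print(find_nbsp(bad))   # [(1, 5)]
```

An empty list means the text contains no no-break spaces. Anything else gives the line and column to look at.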

Let me know your thoughts

If you use Evernote, at some point in the last year or so they started rewriting regular spaces as non-breaking space characters. I get tripped up almost weekly by some command or code I pasted from my notes.

This is the main reason I stopped subscribing to Evernote. It messes up my notes and I can't even see how. I'd use another note taking tool, but most are worse. It would be nice to have something with just plaintext, even without formatting.

IMHO all the parsers should update to address this. A space is still a space even if it's a non-breaking one.
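
Until parsers do that, pasted markup can be cleaned up before it reaches them. A rough sketch in Python, where `normalize_spaces` is a made-up helper name and the sample tag is a shortened version of the one in the post. Note that it also rewrites any no-break spaces that were intentional in text content, so it is probably only suitable for snippets like tags or shell commands:

```python
def normalize_spaces(markup):
    """Replace every no-break space (U+00A0) with a regular space (U+0020)."""
    return markup.replace("\u00a0", " ")


# Shortened sample tag with no-break spaces between the attributes.
pasted = '<div\u00a0pagews-snapshot="true"\u00a0referenceid="remotemkt.listws.app"></div>'
cleaned = normalize_spaces(pasted)

print("\u00a0" in pasted)   # True
print("\u00a0" in cleaned)  # False
```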

Thanks for the heads up. Life sure seemed simpler when it was just ASCII and American English to worry about. ;-)

Another thing that trips me up regularly is BOM bytes at the start of text files. They can be really hard to debug if you don't know about them.
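
For anyone hitting the BOM problem, here is a small Python sketch, assuming the file is UTF-8 (other encodings have different BOM bytes). `strip_utf8_bom` is a hypothetical helper; the `utf-8-sig` codec is part of the Python standard library and skips a leading BOM when decoding:

```python
import codecs


def strip_utf8_bom(data):
    """Drop a leading UTF-8 BOM (EF BB BF) from raw bytes, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = codecs.BOM_UTF8 + b"hello"

print(strip_utf8_bom(raw))                   # b'hello'
print(raw.decode("utf-8-sig"))               # hello
print(raw.decode("utf-8") == "\ufeffhello")  # True: plain utf-8 keeps the BOM as U+FEFF
```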

The character is U+00A0.

C2 A0 is the UTF-8 encoding.
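
A quick way to check this, sketched in Python (the variable name `nbsp` is just for illustration):

```python
nbsp = "\u00a0"

print(hex(ord(nbsp)))              # 0xa0, the code point U+00A0
print(nbsp.encode("utf-8").hex())  # c2a0, the two bytes of its UTF-8 encoding
print(" ".encode("utf-8").hex())   # 20, the regular space
print(nbsp == " ")                 # False
```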

