Comparing how different devices display the SSID “á̶̛̛̓̿̈͐͆̐̇̒̑̈́͘͝aaa” (hamptonmoore.com)
238 points by herohamp on July 1, 2020 | 58 comments

Fun story about one of the devices mentioned there that I worked on. We used to store the saved wifi creds in a file named exactly what the SSID was.

Some user managed to break things, and with their permission we gathered detailed wifi logs and found they were connected to an SSID that was an ASCII depiction of the equation: boobs plus penis equals a smiley face. The issue was the forward slashes, presumably there to add fingers to the scene. Must have been an awkward customer service follow up when we told them to change their SSID while they waited for an update.

Sounds like a directory traversal to me :)

It's generally a bad idea to have the user in control of filenames you create if those files are not on a device they own.

In this case, it sounds like the files were on a device owned by the user?

The user in control here is the one configuring the SSID, which is not necessarily the same one owning the device used to connect to it.

I guess SCHiM means "own" as in "have administrative control over".

Really they should have fixed the software instead of telling the user to change it. It's a perfectly valid SSID.

And really, using raw environment-derived data directly on the filesystem?? What if the SSID had been "/etc/passwd" or something similar and it wrote to that?
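To make that concern concrete, here is a minimal sketch of how an SSID used verbatim as a filename can escape the intended directory. The directory `/var/lib/wifi` and the helper `naive_creds_path` are hypothetical and are not taken from the device discussed above.

```python
import posixpath

# Hypothetical directory where saved credentials would live.
CREDS_DIR = "/var/lib/wifi"

def naive_creds_path(ssid: str) -> str:
    """Build a credentials path by using the SSID verbatim (unsafe)."""
    return posixpath.normpath(posixpath.join(CREDS_DIR, ssid))

print(naive_creds_path("HomeNet"))              # stays inside CREDS_DIR
print(naive_creds_path("/etc/passwd"))          # absolute SSID replaces the base
print(naive_creds_path("../../../etc/passwd"))  # relative SSID climbs out
```

An absolute path discards the base directory entirely, and enough `..` components walk back up to the root, so both of the last two calls resolve outside `CREDS_DIR`.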

We just told them to change it until we could ship an update that fixed it. We agreed that it was a perfectly valid SSID.

Always base64 data you do that kind of thing with!
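A rough sketch of that idea, assuming the SSID is available as raw bytes. The helper names `ssid_to_filename` and `filename_to_ssid` are made up for illustration. Note that the standard base64 alphabet itself contains `/`, so this uses the URL-safe variant.

```python
import base64

def ssid_to_filename(ssid: bytes) -> str:
    """Encode raw SSID bytes into a name with no path separators."""
    return base64.urlsafe_b64encode(ssid).decode("ascii")

def filename_to_ssid(name: str) -> bytes:
    """Recover the original SSID bytes from the encoded name."""
    return base64.urlsafe_b64decode(name)

name = ssid_to_filename(b"../../etc/passwd")
print(name, filename_to_ssid(name))
```

On a case-insensitive filesystem two different SSIDs could still collide after encoding, so plain hex (`ssid.hex()`) may be the safer choice there.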

Yeah and definitely never use user-derived data directly in your filesystem.

I used to have something like this as my SSID: ʕ•̫͡•ʕ̫͡ʕ•͓͡•ʔ-̫͡-ʕ•̫͡•ʔ̫͡ʔ-̫͡-ʔ (Not this particular one as it was too long though!) Many nice examples at: https://1lineart.kulaone.com/#/

It was fun but some OSes didn't show it correctly, in particular Windows. It would just show it in HEX. And more annoyingly, some devices refused to connect to it at all, especially IoT crap like those WiFi power sockets.

So eventually I gave up.

PS: Something with more vertical stuff would also be really fun, some of these can write across multiple lines of unrelated content! Unfortunately most OSes block this from happening now. Example:


So the Unicode above this would write through the next lines on some platforms, even system screens like the wifi chooser :)

But these quickly get too long for an SSID too.

This worked for me on Windows + Chrome :)


EDIT: though not on native HN, I think it might be a result of having HNES installed.

Lol, that's really bad!

On native HN in Firefox Windows it also works but it stops at the "reply" button under my post.

And on native HN on Firefox Mac it doesn't work at all, strange enough. Firefox must rely strongly on the platform rendering.

The 802.11 standards have always allowed up to 32 bytes which can be filled with any data, it does not have to be in a particular encoding. In 802.11-2012 there is a separate tag SSIDEncoding which can be used to specify if these bytes are in UTF-8 or "unspecified". If the UTF-8 option is set, the SSID should be interpreted as UTF-8.

It is not clear in this case if the router sets this flag or not. Either way there is no stipulation in the spec about how the UTF-8 characters should be displayed so many of these options are potentially valid.

The bytestring was truncated after 32 bytes, in the middle of a UTF-8 byte sequence. This means the resulting truncated string is not valid UTF-8 anymore. So my guess is that most devices decide "if it's not valid UTF-8, it must be $LEGACY_ENCODING".

Unicode offers two ways forward when you can't decode what you have. One alternative is an exception: you just fail because you weren't able to decode something.

The other is that for any code unit that won't decode, you emit U+FFFD, the Unicode Replacement Character, and then carry on decoding.

For humans U+FFFD makes it obvious something is wrong, it's typically visualised as a black diamond with a white question mark. And for a machine it shouldn't match parsing rules, it isn't an alphanumeric, it isn't any of the common separator or spacing characters, so it's unlikely to be of use in an attack.
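In Python terms, the two options correspond roughly to the `strict` and `replace` error handlers of `bytes.decode`. The input bytes below are an invented example ending in a truncated sequence, not the SSID from the article.

```python
raw = b"abc\xcd"  # 0xcd starts a two-byte sequence that never finishes

# Option 1: fail loudly.
failed = False
try:
    raw.decode("utf-8")  # the default error handler is "strict"
except UnicodeDecodeError:
    failed = True

# Option 2: substitute U+FFFD and carry on.
decoded = raw.decode("utf-8", errors="replace")
print(failed, repr(decoded))
```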

That is a reasonable approach if you know that what you are decoding is supposed to be UTF-8.

If you don't know the text encoding because there is no information to indicate it (or you don't trust that information to be correct) then you will have to guess and "decode as UTF-8 for valid UTF-8, use some legacy encoding otherwise" is a common approach (used e.g. by many text editors).
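That guessing strategy might look something like the sketch below. The function name `guess_decode` and the choice of `cp1252` as the legacy encoding are assumptions for illustration; a real device could fall back to something else entirely.

```python
def guess_decode(raw: bytes, legacy: str = "cp1252") -> str:
    """Decode as UTF-8 when valid, otherwise fall back to a legacy encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the legacy single-byte encoding.
        return raw.decode(legacy, errors="replace")

print(guess_decode("café".encode("utf-8")))  # valid UTF-8 path
print(guess_decode(b"caf\xe9"))              # legacy fallback path
```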

I cannot believe I did not notice that. I will rerun all of my testing with a valid UTF-8 byte sequence :)

Huh, I'm surprised emojis aren't more popular for SSIDs... can't wait until this knowledge spreads more and we'd have a vomit of color when we open the "Wireless Networks" menu.

OTOH for most people the SSID is "Linksys 4FBD" or similar...

> OTOH for most people the SSID is "Linksys 4FBD" or similar...

And to think that one of the major reasons behind having random strings after <Vendor name> (apart from non-technical people in apartment blocks being super confused) is so that you can't go around with rainbow tables that work for large swathes of the routers you would encounter.

> can't wait until this knowledge spreads more and we'd have a vomit of color when we open the "Wireless Networks" menu.

You're limited to 32 bytes, which limits the spew somewhat. Some emoji are up to 4 bytes long, so you can in theory get a sequence of 8 of them in a row if you want. Should encourage a little bit of creativity to fit within those lines...
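A quick way to check the byte budget, assuming the name is sent as UTF-8. The helper `fits_in_ssid` is hypothetical. Emoji built from several code points joined with zero-width joiners cost more than 4 bytes each.

```python
MAX_SSID_BYTES = 32

def fits_in_ssid(name: str) -> bool:
    """True if the UTF-8 encoding of name fits in the 32-byte SSID field."""
    return len(name.encode("utf-8")) <= MAX_SSID_BYTES

pile = "\U0001F4A9"  # a single-code-point emoji, 4 bytes in UTF-8
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # ZWJ sequence

print(len(pile.encode("utf-8")), fits_in_ssid(pile * 8), fits_in_ssid(pile * 9))
print(len(family.encode("utf-8")))
```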

I don't even want to know if any system would process things like bell characters or Right to Left special character...

Unrelated note: Had to file a bug last month because OpenWrt's web interface kept accepting more than that and stopping wireless from coming back up when you tried. Javascript length checks are weird.
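One general way such a check can go wrong (an assumption about the class of bug, not a description of what OpenWrt actually did): a JavaScript string's `length` counts UTF-16 code units, while the SSID limit is in bytes. The sketch below reproduces both counts in Python; the helper `utf16_units` is made up for illustration.

```python
def utf16_units(s: str) -> int:
    """Number of UTF-16 code units, which is what JavaScript's .length reports."""
    return len(s.encode("utf-16-le")) // 2

name = "é" * 20  # 20 code units, but each "é" is 2 bytes in UTF-8

print(utf16_units(name))          # would pass a naive "length <= 32" check
print(len(name.encode("utf-8")))  # yet exceeds the 32-byte SSID limit
```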

I work on some ecommerce sites. I've had to cancel orders because the order exports can't handle emoji in the fields. I can't wait until baby names have actual emoji in them. I bet some idiot has already tried it.

Obligatory XKCD: https://xkcd.com/327/

> Both the s8 and the Firestick are rendering the result in what I deem as the correct way with it showing the name just with some of the vertical characters cutoff.

At least one is doing a poor job, though, because the diacritics look nothing alike…

> After asking around on the Apple discord server someone said it might be using the Mac OS Roman character set. It turns out it was, which is strange because iOS used UTF-8 internally and not Mac OS Roman, as that was phased out with the release of Mac OS X.

I would guess that some part of IOKit is passing a C or C++ string to CoreFoundation using an inappropriate function or using the “system encoding”. I can’t remember off the top of my head, but Mac OS Roman might also be encoding 0. In any case there’s certainly a convention going on there with a poor default or some sort of strange compatibility story.

(I’m actually curious if there is “supposed” to be an encoding for this. Perhaps Mac OS Roman is just as correct and more convenient?)

The first Apple Airport routers predate MacOS X, so it wouldn’t be crazy for the initial MacOS X implementation to fall back to MacOS Roman as backcompat to routers configured with MacOS 8.6/9. And then if they never changed it since for 99% of users the UTF8 auto detect works fine...

And out of curiosity, taking some from this:


especially the Asian ones, seem to vary from mildly amusing to interesting in their effects when you try to set them as an SSID.

What's so naughty about Lightwater Country Park?


(I'm not calling you a twat, I'm pointing out that twat is probably the problem.)

OK, I am a bit angry: first I thought there was fly shit on my screen, then that my GPU had a problem, then I read the title ;)

It's really crazy, looks completely different on my bsd-box compared to my linux-laptop LOVE IT!!

If you want to share screenshots I'll happily put them up on the site. My email is me (at) hampton {dot} pw

Sure, here you go:


My Canon printer won’t join my SSID containing an emoji, helpfully throws generic E36 (or something like that). All Apple devices show and connect to the SSID just fine.

I'd be curious to see how a car may display that.

I've paired my phone with a family member's Volkswagen SUV and it could not properly display the SSID, which is an emoji.

Most laptops are capable of displaying emoji SSIDs (bluetooth and wifi).

In my firefox it looks like four "a"s with a little rat sitting on top of the first of them.

On my Firefox it looks like four "a"s, with a sort of tower over the first "a" that ends in a frowny face with an accent over it. Is this[1] what you're seeing and describing differently? Or are we having different things displayed by Firefox?

[1] https://i.postimg.cc/nVPBqXjV/fireunicode.png

On my computer I see three different representations: In the text on Hacker News, I see the stuff on top of the first "a", in the tab title, it is on top of the second "a", and in the window title, it doesn't render the SSID string (although the rest of the title is displayed).

Same, I actually assumed that's what the string was supposed to be. Now, opening this in Chrome, I can see how wildly different it looks.

This[1] is how it looks to me on my Firefox, is that the Chrome version or the Firefox version on your side?

[1] https://i.postimg.cc/nVPBqXjV/fireunicode.png

Oh weird, are you on windows? I'm on mac and I see all the diacritics squished down into a small pile in firefox

I actually thought there was a squashed bug on my screen.

I've got some dirt on my screen. Be right back.

I tried to use my nail to scratch it off, no luck.

Very cool. It's pretty interesting to see the various failure modes. Some seem straightforward (e.g., the font is missing the glyphs) while others seem to be parsing limitations.

As an aside, this finally convinced me to explore using additional SSIDs in creative ways with emojis.

Out of curiosity, I ran this test on Nintendo Switch: https://i.imgur.com/8o2LLUm.png

It seems like its OS doesn't support combining characters.

My SSID is a single emoji, and the Switch displays just the missing char/"box" for my SSID as well.

For most of the Western world, if you take the set of all commonly used characters in the language(s) that are widely recognized in each country and form their intersection, you'll have at least the Arabic numerals and plain A-Z.

If SSIDs were restricted to just those characters, it would be fine in the Western World. But of course there is more to the world than the West.

Question: do most or all non-Western languages also have small subsets of characters that would be fine to restrict SSIDs to? For instance, Wikipedia tells me that Persian is written with a 32 character alphabet, and Arabic uses 28 characters for its alphabet.

I'd expect that for every alphabet-based language, there is a similar base set of characters you could reasonably limit SSIDs to, and so avoid all the problems you get with allowing full Unicode.

How about the languages that use logographic writing systems, such as Chinese, Japanese, Korean, and Vietnamese? Do they all have reasonable (albeit probably very large) subsets SSIDs could be limited to that would avoid all their weird stuff that can happen in Unicode but still allow most reasonable names to be used?

Don't forget that some of these are right-to-left (e.g. Hebrew, Arabic). Words are rendered right-to-left, and early email software would just expect each word to be sent reversed so that simple left-to-right rendering could be used. Unicode solves this (and many other issues) quite nicely.

I tested this out of curiosity, and all iPhones I could find in my household rendered correctly in UTF-8 with only 12 octets [0]. This is replicated on iPhone 7, SE and XR, all running 13.5.1. So it may well be the issue was fixed in 6s or 7.

[0] https://i.imgur.com/KDau4PP.jpg

At least it nowhere caused an exploitable crash

On popular, actively maintained operating systems.

Plug in your cheap Chinese IoT device and see what happens...

It might actually do well if you feed it Chinese…

Last I checked late last year, my PlayStation 4 was unable to connect to my network when I used a single emoji in the SSID.

My Logitech device won't even acknowledge an SSID with Japanese katakana.

Tried to set it as the SSID for an Android phone's Wi-Fi tethering; it said it exceeds the maximum character limit and would not let me set it. Bummer

This is a really good post that shines some light on how the insanity of encodings still isn't fixed today, since so many operating systems still don't completely use Unicode everywhere.

Some of the reasonings behind why the characters are displayed like that are slightly incorrect, though, so here are some corrections:

I'm going to supply each example here with some python3 code to reproduce with, with the following definition:

`data = b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd"`

First, let's start at the beginning:

> My router just cut the name down to 32 octets though to stay compliant

> This was what was being sent according to iw

> `a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd`

If you look at this closely, the last byte in this sequence is `\xcd`, which is an incomplete UTF-8 character. It's missing the final `\x84` that the router cut off (along with the three additional `a` characters).
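This can be checked directly. The sketch below uses the same 32 bytes as `data` (split over two lines for readability) and shows that strict UTF-8 decoding fails at the final byte, while appending the missing `\x84` makes the sequence valid.

```python
# The 32 bytes the router actually broadcast, as quoted above.
data = (b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b"
        b"\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd")

bad_offset = None
try:
    data.decode("utf-8")  # strict decoding
except UnicodeDecodeError as err:
    bad_offset = err.start  # index of the first undecodable byte

# Appending the missing continuation byte makes the sequence valid again.
repaired = (data + b"\x84").decode("utf-8")
print(len(data), bad_offset, len(repaired))
```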

> with the raw hex being

> `97ccb6cc81cc93ccbfcc88cc9bcc9bcd90cd98cd86cc90cd9dcc87cc92cc91cd`

small mistake: the hex of `a` is `61`, not `97` (that's decimal), but otherwise correct.
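For reference, a small check of the decimal versus hex value of `a` and of the hex dump of the whole byte string, using the same bytes as `data`:

```python
data = (b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b"
        b"\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd")

decimal_a = ord("a")          # code point of "a" in decimal
hex_a = format(ord("a"), "x")  # the same value in hexadecimal
hex_dump = data.hex()          # hex of all 32 bytes

print(decimal_a, hex_a)
print(hex_dump)
```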

> Galaxy S8 running Android 9 with Kernel 4.4.153

> Amazon Firestick

Everything correct, except for a small detail:

These two devices render the result of UTF-8 decoding while ignoring bytes that are invalid UTF-8 (in python3: `data.decode('utf-8', 'ignore')`)
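Spelled out as a runnable snippet with the same bytes as `data`: the dangling `\xcd` is dropped, leaving the letter `a` followed by 15 combining marks.

```python
import unicodedata

data = (b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b"
        b"\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd")

shown = data.decode("utf-8", "ignore")  # silently drops the trailing 0xcd

# General category of every code point after the leading "a".
categories = {unicodedata.category(ch) for ch in shown[1:]}
print(len(shown), categories)
```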

> iPhone 6 running iOS 13.5.1

> Apple TV Second Generation

Completely correct. This is definitely Mac OS Roman (in python3: `data.decode('mac_roman')`)
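As a runnable version with the same bytes as `data`: Python's `mac_roman` codec assigns a character to every byte value, so nothing is dropped and the result has one character per byte.

```python
data = (b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b"
        b"\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd")

as_mac_roman = data.decode("mac_roman")  # single-byte encoding, no errors
print(len(as_mac_roman), as_mac_roman)
```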

> Windows 10 Pro 10.0.19041

This one is incorrect again:

Windows is interpreting the characters in the "Windows Codepage 1252" (also known as "Western") encoding and ignoring invalid characters (in python3: `data.decode('cp1252', 'ignore')`)
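A runnable version with the same bytes as `data`. In Python's `cp1252` codec the bytes `\x81`, `\x90` and `\x9d` have no assigned character, so with `'ignore'` the four occurrences in this string are dropped.

```python
data = (b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b"
        b"\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd")

as_cp1252 = data.decode("cp1252", "ignore")  # unassigned bytes are skipped
dropped = len(data) - len(as_cp1252)
print(len(as_cp1252), dropped, as_cp1252)
```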

Decoding every character separately as UTF-8 would fail (since every byte that can be a continuation of a UTF-8 character is not a valid start byte).

Interpreting every character as a Unicode code-point number would give something very similar, but not exactly the same: What Windows decodes as quote, caret-y thing, angle bracket-y thing, tilde, dagger, double dagger, and single quote fall into a control character block at the start of the Unicode "Latin-1 Supplement" block (`\x80` to `\x9f`).
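The difference can be seen by decoding the same byte both ways. In the sketch below, `latin-1` stands in for "byte value equals code point", and the helper `compare` is made up for illustration.

```python
import unicodedata

def compare(byte: int):
    """Return (cp1252 character, Unicode category of the latin-1 character)."""
    raw = bytes([byte])
    return raw.decode("cp1252"), unicodedata.category(raw.decode("latin-1"))

# Bytes from the SSID that fall in the 0x80 to 0x9f range.
for byte in (0x93, 0x88, 0x9b, 0x98, 0x86, 0x87, 0x92):
    print(hex(byte), compare(byte))
```

Category `Cc` means a control character, which is why the code-point interpretation would not produce the visible punctuation that Windows shows.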

> Chromebook running ChromeOS 83.0.4103.97


The Chromebook seems to have rendered the ASCII a, but replaced all other 31 characters with question marks.

> Kindle Paperwhite running Firmware 5.10.2

> Vizio M55-C2 TV

Also correct.

Those two devices seem to opt to display hex instead of falling back to question marks as the Chromebook does.
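The two fallbacks can be imitated as below. This is only a guess at the observable behaviour, not the actual code on any of these devices, and the helper names `ascii_or_question` and `as_hex` are invented.

```python
data = (b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b"
        b"\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd")

def ascii_or_question(raw: bytes) -> str:
    """Keep ASCII bytes, show every other byte as a question mark."""
    return "".join(chr(b) if b < 0x80 else "?" for b in raw)

def as_hex(raw: bytes) -> str:
    """Show the raw bytes as a hex string instead of decoding them."""
    return raw.hex()

print(ascii_or_question(data))
print(as_hex(data))
```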

I hope this comment gave some useful insight into why these devices decoded it this way :)

Hey, I am the OP. Thank you so much. I will go through and amend what I got wrong. Any way that you wish for me to credit you?

If you want to credit me, just tag my twitter :)


How are you running iOS 13 on an iPhone 6? Or did you mean 6S?

> Comparing how different devices display the SSID “á̶̛̛̓̿̈͐͆̐̇̒̑̈́͘͝aaa”

I always thought that such Unicode characters were not allowed in HN titles.

This is a wonderful article and great work. I love this type of content. Brilliant!
