Some user managed to break things, and with their permission we gathered detailed wifi logs and found they were connected to an SSID that was an ASCII depiction of the equation: boobs plus penis equals a smiley face. The issue was the forward slashes, presumably there to add fingers to the scene. Must have been an awkward customer service follow up when we told them to change their SSID while they waited for an update.
It's generally a bad idea to have the user in control of filenames you create if those files are not on a device they own.
And really, using raw environment-derived data directly on the filesystem?? What if the SSID had been "/etc/passwd" or something similar and it wrote to that?
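One way to sidestep this is to never let the raw SSID touch the path at all, for example by hex-encoding its bytes first. A minimal sketch, assuming a hypothetical `log_path_for_ssid` helper and log directory (neither comes from the product described above):

```python
from pathlib import Path


def log_path_for_ssid(log_dir: Path, ssid: bytes) -> Path:
    """Build a log file path from raw SSID bytes (hypothetical helper)."""
    # Hex output only contains 0-9 and a-f, so slashes, dots, and NUL bytes
    # in the SSID can never change which directory the file lands in.
    # An SSID is at most 32 bytes, so the name is at most 64 hex digits.
    return log_dir / f"wifi-{ssid.hex()}.log"


# Hypothetical directory, with the hostile SSID from the comment above.
path = log_path_for_ssid(Path("/var/log/wifi"), b"/etc/passwd")
print(path)
```

The file name is no longer human-readable, but the original SSID can always be recovered with `bytes.fromhex`.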
It was fun but some OSes didn't show it correctly, in particular Windows. It would just show it in HEX. And more annoyingly, some devices refused to connect to it at all, especially IoT crap like those WiFi power sockets.
So eventually I gave up.
PS: Something with more vertical stuff would also be really fun, some of these can write across multiple lines of unrelated content! Unfortunately most OSes block this from happening now. Example:
So the Unicode above this would write through the next lines on some platforms, even system screens like the wifi chooser :)
But these quickly get too long for an SSID too.
EDIT: though not on native HN, I think it might be a result of having HNES installed.
On native HN in Firefox Windows it also works but it stops at the "reply" button under my post.
And on native HN on Firefox Mac it doesn't work at all, strange enough. Firefox must rely strongly on the platform rendering.
It is not clear in this case if the router sets this flag or not. Either way there is no stipulation in the spec about how the UTF-8 characters should be displayed so many of these options are potentially valid.
The other is that, for any code unit that won't decode, you emit U+FFFD (the Unicode Replacement Character) and then carry on decoding.
For humans U+FFFD makes it obvious something is wrong, it's typically visualised as a black diamond with a white question mark. And for a machine it shouldn't match parsing rules, it isn't an alphanumeric, it isn't any of the common separator or spacing characters, so it's unlikely to be of use in an attack.
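In Python, this strategy corresponds to the `'replace'` error handler. A small sketch, using a made-up byte string with one stray byte:

```python
# Made-up input: 0xE9 is 'é' in Latin-1, but it is not valid UTF-8 here.
raw = b"caf\xe9 wifi"

# 'replace' emits U+FFFD for the undecodable byte and keeps going.
text = raw.decode("utf-8", errors="replace")
print(text)

# U+FFFD is neither alphanumeric nor whitespace, so typical parsing
# rules will not treat it as part of a word or as a separator.
print("\ufffd".isalnum(), "\ufffd".isspace())
```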
If you don't know the text encoding because there is no information to indicate it (or you don't trust that information to be correct) then you will have to guess and "decode as UTF-8 for valid UTF-8, use some legacy encoding otherwise" is a common approach (used e.g. by many text editors).
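A rough sketch of that guessing approach, with `guess_decode` being a hypothetical helper and cp1252 an arbitrary choice of legacy encoding:

```python
def guess_decode(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 when valid, otherwise fall back to a legacy encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" keeps this from raising on bytes that the
        # fallback encoding leaves undefined.
        return raw.decode(fallback, errors="replace")


print(guess_decode("café".encode("utf-8")))  # valid UTF-8 path
print(guess_decode(b"caf\xe9"))              # legacy fallback path
```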
OTOH for most people the SSID is "Linksys 4FBD" or similar...
And to think that one of the major reasons for having random strings after <Vendor name> (apart from non-technical people in apartment blocks being super confused) is so that you can't go around with rainbow tables that work for large swathes of the routers you would encounter.
> can't wait until this knowledge spreads more and we'd have a vomit of color when we open the "Wireless Networks" menu.
You're limited to 32 bytes, which limits the spew somewhat. Some emoji are up to 4 bytes long, so you can in theory get a sequence of 8 of them in a row if you want. Should encourage a little bit of creativity to fit within those lines...
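To check whether a name fits, count UTF-8 bytes rather than characters. A small sketch with a hypothetical `truncate_ssid` helper that cuts at the 32-byte limit without leaving half a character behind (it does not try to keep combining sequences or ZWJ emoji together):

```python
MAX_SSID_BYTES = 32  # the limit mentioned above


def truncate_ssid(name: str) -> bytes:
    """Encode as UTF-8 and cut to 32 bytes without splitting a character."""
    raw = name.encode("utf-8")[:MAX_SSID_BYTES]
    # errors="ignore" drops a partial multi-byte sequence left at the cut.
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


emoji = "\U0001F600"  # 4 bytes in UTF-8
print(len((emoji * 8).encode("utf-8")))
print(len(truncate_ssid("a" + emoji * 8)))
```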
I don't even want to know if any system would process things like bell characters or Right to Left special character...
At least one is doing a poor job, though, because the diacritics look nothing alike…
> After asking around on the Apple discord server someone said it might be using the Mac OS Roman character set. It turns out it is, which is strange because iOS used UTF-8 internally and not Mac OS Roman as that was phased out with the release of Mac OS X.
I would guess that some part of IOKit is passing a C or C++ string to CoreFoundation using an inappropriate function or using the “system encoding”. I can’t remember off the top of my head, but Mac OS Roman might also be encoding 0. In any case there’s certainly a conversion going on there with a poor default or some sort of strange compatibility story.
(I’m actually curious if there is “supposed” to be an encoding for this. Perhaps Mac OS Roman is just as correct and more convenient?)
especially the Asian ones, seem to produce anything from mildly amusing to interesting effects when you try to set them as an SSID.
(I'm not calling you a twat, I'm pointing out that twat is probably the problem.)
It's really crazy, looks completely different on my bsd-box compared to my linux-laptop LOVE IT!!
I've paired my phone with a family member's Volkswagen SUV and it could not display the SSID, an emoji, properly.
Most laptops are capable of displaying emoji SSIDs (bluetooth and wifi).
As an aside, this finally convinced me to explore using additional SSIDs in creative ways with emojis.
It seems like its OS doesn't support combining characters.
If SSIDs were restricted to just those characters, it would be fine in the Western World. But of course there is more to the world than the West.
Question: do most or all non-Western languages also have small subsets of characters that would be fine to restrict SSIDs to? For instance, Wikipedia tells me that Persian is written with a 32 character alphabet, and Arabic uses 28 characters for its alphabet.
I'd expect that for every alphabet-based language, there is a similar base set of characters you could reasonably limit SSIDs to, and so avoid all the problems you get with allowing full Unicode.
How about the languages that use logographic writing systems, such as Chinese, Japanese, Korean, and Vietnamese? Do they all have reasonable (albeit probably very large) subsets SSIDs could be limited to that would avoid all the weird stuff that can happen in Unicode but still allow most reasonable names to be used?
Plug in your cheap Chinese IoT device and see what happens...
Some of the reasonings behind why the characters are displayed like that are slightly incorrect, though, so here are some corrections:
I'm going to supply each example here with some python3 code to reproduce it, using the following definition:
`data = b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd"`
First, let's start at the beginning:
> My router just cut the name down to 32 octets though to stay complient
> This was what was being sent according to iw
If you look at this closely, the last byte in this sequence is `\xcd`, which is an incomplete UTF-8 character. It's missing the final `\x84` that the router cut off (along with the three additional `a` characters).
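A quick way to see the truncation for yourself (a small sketch using the `data` definition from above, repeated here so it runs on its own):

```python
data = b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd"

print(len(data))  # number of octets in the SSID

# A strict decode fails because the last byte starts a two-byte
# sequence whose second byte was cut off.
error = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    error = err
    print(err.start, hex(data[err.start]), err.reason)
```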
> with the raw hex being
small mistake: the hex of `a` is `61`, not `97` (that's decimal), but otherwise correct.
> Galaxy S8 running Android 9 with Kernel 4.4.153
> Amazon Firestick
Everything correct, except for a small detail:
These two devices render the result of UTF-8 decoding while ignoring bytes that are invalid UTF-8 (in python3: `data.decode('utf-8', 'ignore')`)
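Spelled out as a runnable sketch (same `data` as above), this also lists which code points survive:

```python
import unicodedata

data = b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd"

# 'ignore' silently drops the dangling 0xCD at the end.
text = data.decode("utf-8", "ignore")
print(len(text))

# Everything after the leading 'a' is a combining mark stacked on it.
for ch in text[1:]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```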
> iPhone 6 running iOS 13.5.1
> Apple TV Second Generation
Completely correct. This is definitely Mac OS Roman (in python3: `data.decode('mac_roman')`)
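As a runnable sketch (same `data` as above): in Python's `mac_roman` codec every byte in this SSID maps to a character, so nothing gets dropped and you end up with one character per byte.

```python
data = b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd"

# Each byte is decoded on its own; there are no multi-byte sequences.
text = data.decode("mac_roman")
print(text)
print(len(text))
```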
> Windows 10 Pro 10.0.19041
This one is incorrect again:
Windows is interpreting the characters in the "Windows Codepage 1252" (also known as "Western") encoding and ignoring invalid characters (in python3: `data.decode('cp1252', 'ignore')`)
Decoding every character separately as UTF-8 would fail (since every byte that can be a continuation of a UTF-8 character is not a valid start byte).
Interpreting every character as a Unicode code-point number would give something very similar, but not exactly the same: What Windows decodes as quote, caret-y thing, angle bracket-y thing, tilde, dagger, double dagger, and single quote fall into a control character block at the start of the Unicode "Latin-1 Supplement" block (`\x80` to `\x9f`).
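To compare the two interpretations side by side, here is a small sketch (same `data` as above). Note that Python's `cp1252` codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, which is why `'ignore'` drops some bytes; whether Windows does exactly the same internally is an assumption on my part.

```python
import unicodedata

data = b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd"

windows_like = data.decode("cp1252", "ignore")  # undefined bytes dropped
latin1 = data.decode("latin-1")                 # byte value == code point
print(len(windows_like), len(latin1))

# For bytes 0x80-0x9F, cp1252 gives printable punctuation where
# Latin-1 (i.e. the raw code point) gives a C1 control character.
for b in sorted({b for b in data if 0x80 <= b <= 0x9F}):
    as_cp1252 = bytes([b]).decode("cp1252", "replace")
    as_latin1 = bytes([b]).decode("latin-1")
    print(hex(b), repr(as_cp1252), unicodedata.category(as_latin1))
```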
> Chromebook running ChromeOS 83.0.4103.97
The Chromebook seems to have rendered the ASCII a, but replaced all 31 other bytes with question marks.
> Kindle Paperwhite running Firmware 5.10.2
> Vizio M55-C2 TV
Those two devices seem to opt to display hex instead of falling back to question marks as the Chromebook does.
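Both fallbacks are easy to imitate. This is only a guess at what the devices do internally, but it matches the described output (same `data` as above):

```python
data = b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd"

# Chromebook-style: keep ASCII bytes, show every other byte as '?'.
question_marks = "".join(chr(b) if b < 0x80 else "?" for b in data)
print(question_marks)

# Kindle/TV-style: give up on decoding and show the raw bytes as hex.
as_hex = data.hex()
print(as_hex)
```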
I hope this comment gave some useful insight into why these devices decoded it this way :)
I always thought that such Unicode characters were not allowed in HN titles.