...can someone explain how the repo keeps resurfacing? I haven’t promoted it in a long time. (Looking at the repo traffic, it recently spiked on the 6th, but nothing since then.)
The BLNS let me prove it, and I hooked it into our integration and fuzz tests, which shook out a few bugs.
No idea if it's just coincidental resonance though.
Is there a place where common things in the dev world like this are accumulated? For example, a list of all countries, or a list of the US states for use with an HTML dropdown. I know there are various repos on GitHub that maintain these types of lists, such as English stop words, profanity word lists, etc., but is there a service that accumulates these in a familiar, structured API?
Some of them are quite meta, such as https://en.wikipedia.org/wiki/List_of_lists_of_lists
For a more structured source, Wikidata aims to be that, but I cannot comment on its completeness.
For example, when the intent is "list of places where the USPS ships" or "list of state-level political jurisdictions where US residents live".
It looks like 1.0 was just released as well :D
About the repo: nice job, I've used it a lot when testing sites/apps I've built. Good job on providing different formats too, so it's easy to automate testing!
I stumbled on it for the first time and already saved it for future testing. Thanks :)
This puts me in mind of interviews, where I point out to the candidate that their update routine would go into an infinite loop if there was a 2 node cycle in their data. So then they give me an if statement that detects only the 2 node loop. I've even then asked what would happen if there was a 3 node loop, and gotten a 2nd if statement for that as well.
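For reference, the general fix the special-case answers keep missing is a cycle check that works for any loop length, e.g. Floyd's tortoise-and-hare. A minimal sketch, assuming a simple singly linked `Node`:

```python
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

def has_cycle(head):
    # Floyd's tortoise-and-hare: one general check instead of an
    # if-statement per cycle length. The fast pointer moves two
    # steps per iteration; if there is a cycle of ANY length,
    # it eventually laps the slow pointer.
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False
```

With this, the 2-node and 3-node cases (and every other length) fall out for free.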
Apps which might crash due to processing untrusted data should be reading that data from a queue. Then a 2nd process can monitor the 1st process, taking problematic data off the queue if necessary. This way the 1st process can die, but be restarted, and your smartphone OS doesn't have to look completely broken due to a primary function just dying, requiring the user to reboot.
I hope someone tells me Apple has already done this. It's been something approaching a decade, at this point.
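A single-process simulation of the scheme described above. In a real system the worker would be a separate OS process and the "crash" an actual process death; `fragile_parse` here is a hypothetical stand-in for a parser that dies on a naughty string:

```python
def fragile_parse(text):
    # Hypothetical parser that hard-crashes on pathological input.
    if "\x00" in text:
        raise RuntimeError("boom")
    return text.upper()

def run_with_supervisor(queue, handler):
    """Drain `queue` through `handler`. If `handler` dies on an
    item, drop that item (the 'offending datum') and keep going,
    rather than letting one poisoned message take everything down."""
    results, dropped = [], []
    while queue:
        item = queue.pop(0)  # item comes off the queue before processing
        try:
            results.append(handler(item))
        except Exception:
            dropped.append(item)  # "restart the worker" and move on
    return results, dropped
```

For example, `run_with_supervisor(["hi", "bad\x00", "ok"], fragile_parse)` processes the two good messages and quarantines the bad one instead of looping on it forever.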
1. Creating a graceful degradation path is complicated and expensive, from a UX, product, and testing perspective - if you silently drop the message, that's much worse than crashing; if you insert some sort of tombstone "this message could not be parsed" into the UI, you have to figure out if that's something users will actually understand. These code paths are hard to exercise with normal integration or manual testing, and it's likely that over time they may stop working correctly.
2. Monitoring graceful degradation is more complicated than tracking crashes, especially for unforeseen issues like what you're describing. There's a real risk that if this periodically showed up with "failed to parse" messages, the actual issue would have remained undiscovered by Apple a lot longer.
3. Again, no familiarity with iOS, but on other mobile operating systems I've worked with there's a significant memory overhead to using multiple processes and IPCs this way. If your device is under memory pressure, the extra cost of a pipeline like this introduces new failure modes. First, your other process may be killed by the LMK, which the sender process could interpret as being a failed message. Second, you may increase the amount of memory required to receive a text message in the background - this can directly affect critical high-memory use cases like taking a live video with the camera. There may just not be enough room to do both at the same time.
4. There's significant input processing that can't be done in another process, or meaningfully isolated from that of other messages - the best example being UI rendering of the text. If there's a magic string that causes view measurement to fail, that's extremely difficult to attribute to any specific piece of input - so adding this extra process to do validation won't really help you, since the "validated" string will fail later on.
2. Since this has happened several times at this point, the objection is moot. Since this is familiar territory, why not log it and have the system tell HQ something weird happened?
3. May have been true in the past. Very doubtful at this point.
4. Irrelevant. The process can have the UI, crash, then be restarted by the other process. There is no validation done by the other process. The 2nd process only detects that the 1st process has crashed, then removes the offending datum from the queue.
This has real privacy considerations. I would rather Apple not simply get a message sent to me forwarded to them. Is the trade-off worth it in this case? That's not so easy to decide for all ~1/2 billion users.
All Apple would need to get is the notification that there was a crash. It could be left to the user if the crashed message gets forwarded.
If it's a widespread attack, then the traffic analysis would still be useful. A hex dump of the offending data could be sent.
Which may then crash the prompt asking if it can send the crash...
Thereby hiding away the problem, diminishing the urgency of a fix while strongly favouring its exploitation. Something that's just a DoS in an ASLR environment escalates to a remotely exploitable hole where the attacker can iterate until they get the correct offset.
Unless a process is expected to crash or terminate by design, automatic respawning is not a good security practice.
Programs should never, EVER crash. If you can’t decide/handle a Unicode codepoint, replace it with a question mark (or box etc) and carry on. And yes, a big-name program like iMessage absolutely should have unit tests for this.
I can’t believe I’m having to explain this.
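The "replace and carry on" approach is a one-liner in most languages; a minimal sketch in Python, using the standard replacement character U+FFFD rather than a literal question mark:

```python
# Bytes that are not valid UTF-8 (0xFF and 0xFE can't start a sequence).
raw = b"ok \xff\xfe bytes"

# errors="replace" substitutes U+FFFD for each undecodable byte
# instead of raising UnicodeDecodeError and taking the program down.
text = raw.decode("utf-8", errors="replace")
print(text)  # -> 'ok \ufffd\ufffd bytes'
```

The same idea applies at render time: an unhandleable code point becomes a visible box or replacement glyph, and the rest of the message survives.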
TrueType and OpenType are Turing-complete all by themselves.
There are nearly 140,000 Unicode code points; exhaustively testing the combinations is computationally infeasible.
It is the very definition of a non-trivial problem. Furthermore, layout passes often involve hundreds or even many thousands of string measurements.
Every platform has had numerous bugs, including exploitable ones, in font and string handling.
Oh yes they should. If, for example, the alternative is to continue in an undefined state, potentially corrupting user data. Not all error conditions are recoverable.
The problem is with triggering pathological cases in the complicated beast that is font rendering. And rendering text is something that's so extremely common in apps, and so crucial, that it needs to be done in-process in order to avoid unacceptable overhead.
Of course 'monitoring' is not exactly the same as 'processing' the actual data - so you'd have to know exactly what to pull off. At which point it would be just as easy to add that logic to the original process and delete the unwanted data as you pull it out of the queue anyway.
In either event, you seem fairly bright - I probably don't follow. The main thing is that I believe system processes do monitor those other processes and restart them if they crash - I think the problem comes in when data overflows or buffers spill out into other processes' memory areas. (I am not a programmer, although I do try to be for fun.)
The most universal way to detect the problem is to catch the exception from the crash. However, the process that does that will be in the realm of undefined behavior, so it should be crashed.
> Of course 'monitoring' is not exactly the same as 'processing' the actual data - so you'd have to know exactly what to pull off.
Your scheme is limited to all the known strings. My scheme can respond to anything in the future which might crash the 1st process.
I wouldn't remove it from the list, as this sort of thing adds a bit of character, fun and a sense of shared values to programming; what's life for if you can't have a laugh? But if you're ever making jokes of that ilk and someone in the room goes a bit ... quiet ... do gently check up on them.
The fact that Plato played a part in your case shows that while anything can act as a trigger, deep philosophical ideas do seem to be a common theme (2 out of 2 on this thread).
However, if such a surrounding simulation 'errors', that can be noticed. Moreover, there is the rather pressing issue of 'being turned off'. It's a difficult (semantic?) discussion whether you can notice your world being deleted, but the thought of unpredictable all-encompassing extinction isn't quite comforting.
A better reason not to care, is the idea that there is nothing you can do about it. However, if we are being simulated, there is probably some observer who might react to our actions. Moreover, if we are in a computer simulation we might be able to trigger some 'bug'. So even that line of reasoning isn't clear cut.
In the end, life does seem to be easier when you accept reality. Even more so because any way we have to influence the 'outside' is so infeasible as to be useless.
After all, you don't have data from a real reality to compare to.
Is it coincidence that some of the world's most famous philosophers went mad?
Yes. Some of <insert anything> went mad, that doesn't prove causation at all.
The most acute episode of psychosis happened in hospital and involved some pretty random behaviour accompanied by total loss of response to input. This did not last more than 10 minutes or so, but on return from that state she continued to behave in a way that pretty much resembled someone on a bad LSD trip. This went on for about a week, during which she hardly slept and was never sure whether awake or not, and reported many types of delusional thinking, e.g. that reality was not as it seemed (think inception/matrix/comment in the OP here). Also paranoia that family, friends or strangers were staring or wanted to harm her or her child. Also occasional thoughts of self harm as a means of escape from this awful state of being.
This was all pretty much solved by Olanzapine which she still takes 1.5 years later although dosage is now tapering down to almost nothing. Also CBT. At some point the diagnosis was augmented by one of "O-only OCD" resulting in a second prescription for Sertraline. But the condition is very much under control; she is back to work part time, in a role that includes responsibility for children; her co workers know the score and are alert to possible issues but there haven't been any that she couldn't manage herself. Occasionally various thoughts disturb her but she has learned to handle them appropriately.
Mental health issues are often exacerbated by stigma, and this is in part because people (for understandable reasons) don't rush to share their experiences with others, which is why I've taken the time to write this post. I would recommend anyone to familiarize themselves with the material published by Mind and others, partly because it's interesting, partly because spotting the signs - and understanding the differences between e.g. psychosis, schizophrenia, bipolar, OCD, etc (even if the science behind them is far from well understood) - might help someone you know one day.
Edit: interesting addendum, the matrix/inception fear had been there long before the psychosis. I remember years ago telling her I had lost The Game ( https://en.wikipedia.org/wiki/The_Game_(mind_game) ) which she hadn't encountered before. I explained that "the object of the game is not to remember the game" ... she took it to mean we were all in some sort of simulation and got kinda scared even then.
Clearly, the doctors did not intend for that specific message to appear specifically on my screen. Rather, their treatment will make my brain "fight back" against the delusion, and reality will start to creep in unexpectedly. Maybe someone will stop me on the street and tell me "you need to wake up". Maybe I'll hear a new song titled "you are in a coma". Maybe I'll find a hospital gown on my wardrobe. Or maybe a random string will appear on some repository.
Note: I also realize how much you can eff up someone's day by doing all of this. If you are one of those Youtube pranksters, don't do that.
Going majorly off topic, I have an internal voice, but I don't have an internal text. I assume this is normal, and not diagnostic of being in a coma.
I mean, do you think computer scientists named a sub-collection of instructions after you, or do you think it's a hint to wake up? Wake up.... Wake up subroutine.
Ｔｈｅ ｑｕｉｃｋ ｂｒｏｗｎ ｆｏｘ ｊｕｍｐｓ ｏｖｅｒ ｔｈｅ ｌａｚｙ ｄｏｇ
𝐓𝐡𝐞 𝐪𝐮𝐢𝐜𝐤 𝐛𝐫𝐨𝐰𝐧 𝐟𝐨𝐱 𝐣𝐮𝐦𝐩𝐬 𝐨𝐯𝐞𝐫 𝐭𝐡𝐞 𝐥𝐚𝐳𝐲 𝐝𝐨𝐠
𝕿𝖍𝖊 𝖖𝖚𝖎𝖈𝖐 𝖇𝖗𝖔𝖜𝖓 𝖋𝖔𝖝 𝖏𝖚𝖒𝖕𝖘 𝖔𝖛𝖊𝖗 𝖙𝖍𝖊 𝖑𝖆𝖟𝖞 𝖉𝖔𝖌
𝑻𝒉𝒆 𝒒𝒖𝒊𝒄𝒌 𝒃𝒓𝒐𝒘𝒏 𝒇𝒐𝒙 𝒋𝒖𝒎𝒑𝒔 𝒐𝒗𝒆𝒓 𝒕𝒉𝒆 𝒍𝒂𝒛𝒚 𝒅𝒐𝒈
𝓣𝓱𝓮 𝓺𝓾𝓲𝓬𝓴 𝓫𝓻𝓸𝔀𝓷 𝓯𝓸𝔁 𝓳𝓾𝓶𝓹𝓼 𝓸𝓿𝓮𝓻 𝓽𝓱𝓮 𝓵𝓪𝔃𝔂 𝓭𝓸𝓰
𝕋𝕙𝕖 𝕢𝕦𝕚𝕔𝕜 𝕓𝕣𝕠𝕨𝕟 𝕗𝕠𝕩 𝕛𝕦𝕞𝕡𝕤 𝕠𝕧𝕖𝕣 𝕥𝕙𝕖 𝕝𝕒𝕫𝕪 𝕕𝕠𝕘
𝚃𝚑𝚎 𝚚𝚞𝚒𝚌𝚔 𝚋𝚛𝚘𝚠𝚗 𝚏𝚘𝚡 𝚓𝚞𝚖𝚙𝚜 𝚘𝚟𝚎𝚛 𝚝𝚑𝚎 𝚕𝚊𝚣𝚢 𝚍𝚘𝚐
⒯⒣⒠ ⒬⒰⒤⒞⒦ ⒝⒭⒪⒲⒩ ⒡⒪⒳ ⒥⒰⒨⒫⒮ ⒪⒱⒠⒭ ⒯⒣⒠ ⒧⒜⒵⒴ ⒟⒪⒢
Bing doesn't understand any of them; if you search for one, the first result is the GitHub repo with these phrases.
Necessarily, then, there should not be other character sets that encode things you can't encode in Unicode, since then you couldn't displace those sets with Unicode.
So, particularly in the early life of Unicode, the goal was to collect stuff that already exists and add it to Unicode. (These days we're finished with that, and most new work is on adding things that weren't previously in any character set.)
Two controversial things were done, at opposite ends of the spectrum, during this period of consolidation:
What you're seeing here is the addition of copies of the entire Latin alphabet, but with some particular property that Latin users would not really consider part of the character, such as "bold" or "italic", but which _was_ preserved in some character set being used somewhere. Without this choice, if we converted a text file encoded in a way that distinguished bold and italic characters, we'd lose that bold/italic, and it might be significant. This would be like getting a black & white photocopy of a sheet that says
"Ignore any text below shown in red"
Um, but none of this text is red? Oh. Probably some of it was before it was photocopied. Oops.
At the far end of the spectrum, a process called CJK unification took place, in which scholars of the languages using characters from the Han ("Chinese") writing system decided that although, say, a Japanese character set and a Chinese character set both had a particular character, and Chinese and Japanese writers would not draw that character the same way, in some linguistic sense it's the same character (and in many cases the visual differences are quite small), and so Unicode should not encode both separately.
There's a coherent technical argument for why both these types of decisions made sense, but they were nonetheless controversial.
You should not use weird characters like italic Latin letters in new documents, but you also should not transform these characters without warning when processing an existing document as you may lose important meaning.
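The transform being cautioned against here is compatibility normalization. A quick sketch of what it does, and why it irreversibly loses the styling distinction (the sample string mixes fullwidth, mathematical bold, double-struck, and monospace letters from the lists above):

```python
import unicodedata

# Fullwidth, math-bold, double-struck, and monospace Latin letters
# all carry compatibility decompositions to plain ASCII.
styled = "Ｔｈｅ 𝐪𝐮𝐢𝐜𝐤 𝕓𝕣𝕠𝕨𝕟 𝚏𝚘𝚡"

# NFKC folds them to their plain forms - the "bold"/"fullwidth"
# property is gone for good, which is exactly the information loss
# the consolidation was designed to avoid.
plain = unicodedata.normalize("NFKC", styled)
print(plain)  # -> 'The quick brown fox'
```

Useful for search indexing or input sanitization; dangerous as a blanket transform on documents whose authors meant the styling.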
Both had always bothered me deeply, but I'd never stopped to think that they're also essentially opposed in philosophy to each other. So now that I'm aware of that, I'm triply annoyed :S
The first line, Ｔｈｅ ｑｕｉｃｋ ｂｒｏｗｎ ｆｏｘ, originates with East Asian character-based terminals, on which ideographic characters occupied twice the space of alphabetic characters, and there was also a desire to have Latin characters that were double width. See https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms
The middle lines are included as mathematical symbols. The justification is that 𝑖 is a mathematical symbol that has its own independent meaning, which only coincidentally looks like an italicized i. (I think this is silly, and it naturally leads to a bloody mess as people misuse these symbols as letters, and in this case there is no backwards-compatibility excuse.) https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symb...
The final line, like the first, is apparently present for compatibility with pre-Unicode east-Asian character sets. https://en.wikipedia.org/wiki/Enclosed_Alphanumerics
It was _supposed_ to be +++[wait 0.5s]AT... but that pause was patented by Hayes, so to avoid patent issues, many cloners didn't require it. At least, that's the story I heard.
Other words that had to go included 'Jew'. Nothing 'naughty' about the word; it just didn't look good in the codes, so it had to be added to the big, long list of mostly 'naughty' words.
We also used a reduced dictionary so that codes could be given out over the phone without people getting it wrong, so no '1, l, L' problem or '0, o, O' problem.
I didn't find a handy library for our exact requirements and, as a consequence, I would advise people to roll their own code for such applications.
The fun part was making my non-technical manager in charge of the big, long list of rude words. I think that his additions were the only contribution to 'code' that he ever made.
There is a similar project, which I think is better organized and has more lists to play with:
I think this is unfortunate since strings with newlines in them could certainly trip up some bash scripts. Filenames on Linux can contain newlines for example.
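To illustrate the point: newlines really are legal in Linux filenames, which is exactly why line-splitting `ls` output in a shell script is fragile. A small demonstration (in Python, to keep it self-contained):

```python
import os
import tempfile

# Create a file whose name contains an embedded newline - legal on Linux.
d = tempfile.mkdtemp()
open(os.path.join(d, "evil\nname.txt"), "w").close()

# os.listdir returns one entry; a script splitting `ls` output on
# newlines would see two bogus names instead.
names = os.listdir(d)
print(names)  # -> ['evil\nname.txt']
```

The usual shell-side fix is NUL-delimited output (`find -print0` piped into `xargs -0` or `read -d ''`), since NUL is the one byte a filename cannot contain.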
In our case the string that had the issue should have been machine generated but ended up being assigned user input... Is that kind of string also helpful in a list like this?
EDIT: Yup, there's an issue for that https://github.com/minimaxir/big-list-of-naughty-strings/iss...
It's a little ironic BLNS can't handle certain strings :)
I laugh every time I see this one. Which is just about every year on HN.
I can't for the life of me remember why, or even what I was testing. I honestly think it was more to amuse myself than for an actual practical reason.
$ echo -n 'dW5kZWZpbmVk' | base64 --decode
All of the entries in this repository are encoded using Base64, for obvious reasons.
If the reasons are not obvious, read the header of the repository here:
> Also, do not send a null character (U+0000) string, as it changes the file
> format on GitHub to binary and renders it unreadable in pull requests.
To prevent unintentional breaks of GitHub’s UI, the author decided to encode the strings.
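Consuming the encoded form is a one-liner per entry. A sketch, using the sample string from the shell command above plus one more (the list here is illustrative, not the repo's actual file):

```python
import base64

# Base64-encoded naughty strings; "dW5kZWZpbmVk" is the one decoded
# by the shell one-liner above.
encoded = ["dW5kZWZpbmVk", "bnVsbA=="]

# Decode each entry back to the raw (possibly GitHub-UI-breaking) string.
decoded = [base64.b64decode(e).decode("utf-8") for e in encoded]
print(decoded)  # -> ['undefined', 'null']
```

Keeping the strings encoded at rest and decoding only inside the test harness neatly sidesteps the "the test data breaks the tooling" problem.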