What Apple, and Microsoft before them, are discovering is that the tools and processes we use to create software simply don't scale. We're reaching the limits of what these giant software teams are able to produce and keep running. We can discuss solutions, but currently not everybody even agrees we have a problem. That's the first step we need to take.
I disagree that this is the product of scale. Looking at some of the bugs coming out of Apple lately and how they respond to them (assuming it's really a trend and not just a run of bad luck), it feels like they're slipping into a culture where individual feature teams are making emotional or schedule-based decisions about security without central guidance. A good example is the recent bug on macOS where a low-privileged user could kill other users' processes and reboot the machine, but allegedly this wasn't considered a security problem. It doesn't seem like they have a core part of their organization that drives security practices and knows how to triage and give guidance on these issues, because if the bug had been escalated to a group like that, it would certainly have gotten some traction. It would be nice to hear an insider's take on it...
I run iOS public betas. The staggering improvement in quality between beta 1 and beta 7 seems to be evidence that they're doing largely effective QA of major “happy paths.” What I do find interesting is that in early betas, things are broken that imply there are either constant fundamental rewrites going on, or that the architecture is sloppy. (Hope it’s the former.) After the final betas, the remaining bugs seem to be untested edge cases. Of course, with complexity, there’s a combinatorial explosion of edge cases to test, so it’s likely their QA is constantly being outpaced.
> where a low-privileged user could kill other users' processes
Incorrect. A low-privileged user can kill their own processes, which is the expected behavior of kill(-1). The only issue is the fact that this can apparently also kernel panic the machine. But the fact that it kills all the processes belonging to the user is the expected behavior of this call.
You're just making assumptions. All we know is that there's a failure.
We don't know what kind of and how many errors triggered the failure, nor do we know what the faults causing the errors were.
We certainly don't know what kind of hazard analysis they did and what kind of mitigations they have in place, nor do we know whether they accepted a specific amount of residual risk for e.g. economic or market pressure reasons.
We don't know what kind of QC they do.
Without this information, it's pointless to talk about how things feel or seem. This is a very large engineering organization, working on a very large and complex system. It's not possible to find the root cause (or any cause) of a failure by using guesswork or by resorting to familiar failure cases.
Exactly. One can discuss and imagine all they want without full information and it's very normal and socially and personally productive. I'd refrain from issuing final judgements unless you have "all the info" but that's pretty obvious.
One can gossip about Apple's technical problems, but it's neither interesting nor a good use of the time of most of the people on HN I'd imagine.
I don't see how that's productive, what has been "produced" here? Even the reddit comment from an (alleged) Apple employee was contradicted by other (alleged) Apple employees.
Gathering any insights into Apple's quality issues does not seem to be possible at all from internet comments... that makes the whole thing misleading and pointless to me.
I was speaking in a slightly more general sense. I mean, what can we as humans do then? Just shut up and go on our way? I don't necessarily disagree that there isn't much of tangible value to be gained for us, but I don't think it's useless to discuss these scenarios. Hopefully it helps us humans inform future decisions we may all have to make one day, and if people talk about problems enough, they have a way of becoming acknowledged.
Au contraire yourself. Your statement that this is the only thing we can talk about, does not contradict the assertion that talking about it is pointless.
Bingo. The entire system, including the software, hardware, people, customer demands, market pressures, and (lack of) regulations, has become too complex for humans to handle.
These repeated errors look like classic system accidents, or "normal accidents" as Perrow defined them in his book.
Part of me is wondering if this is just a case of showing the limits of the current "developers can do QA" development model.
I do think a lot of these issues could be easily caught before shipment by a well-paid adversarial QA department, but at the same time it's easy to see why this isn't done: despite all the griping on tech websites about code quality going down across the board, it's not like people will stop buying iPhones etc., so it's an easy ROI decision to continue with the "users are QA" mindset.
What's the Unicode codepoint of the character? Is it a single Unicode codepoint or multiple? (I tried to search for info on the web, but Chrome kept crashing.)
Supposedly it's a normal Telugu character, but how could that have gotten past even minimal testing in Hyderabad? Or is it some strange combining corner case? Telugu script is kind of unusual in Unicode, since it is syllabic but has separate codepoints for vowels and consonants, so character rendering is done by combining a vowel and consonant character into a totally different character. (Apologies for probably oversimplifying and mangling that explanation.) This is unlike Japanese katakana and hiragana, which have separate Unicode code points for each syllable so rendering is straightforward.
So: "JA" "VIRAMA" "NYA" "ZERO-WIDTH NON-JOINER" "VOWEL AA"
That looks like a non-trivial character. I wonder how many other crashing characters can be generated from this pattern.
JA is a consonant, VIRAMA indicates a lack of vowel, NYA is another consonant, and it's non-joined(?) with the vowel AA. Can a Telugu speaker explain what that's supposed to do?
It happens for Telugu, Devanagari, and Bengali: for pretty much any (C,C,V) choice in Telugu, for any (C,C,V) choice in Devanagari where the second consonant is 'ra', and for any such choice in Bengali where the second consonant is 'ra' or 'ya'. Some vowels don't work.
<consonant, virama, consonant> usually forms a ligature in most indic scripts. usually, that ligature is formed by munging the first consonant and tacking it on to the second.
However, in Telugu, it works the other way around -- the first consonant stays the same, but the second consonant is munged and placed below it. As a result, many forms render <virama, consonant> as a sort of "composite combining character", with a placeholder for where the first consonant goes, and the second consonant below it.
In Devanagari, the 'ra' consonant also does this, and 'ra' and 'ya' in Telugu.
This also happens for all Kannada consonants, but I can't trigger the bug with Kannada.
ZWNJ isn't really specified for Devanagari or Telugu; it can both make the vowel render separately or do nothing. In Bengali for some vowels it changes their form; however this bug occurs for more than just those vowels.
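The pattern above lends itself to systematic generation. Here's a rough sketch (my own illustration, not from the thread) that enumerates candidate <consonant, virama, consonant, ZWNJ, vowel-sign> sequences from the Telugu block using the standard library; the name-prefix filtering is deliberately coarse (it also picks up independent vowels), so treat it as an approximation:

```python
import unicodedata

VIRAMA = "\u0C4D"  # TELUGU SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

def telugu_chars(prefix):
    """Assigned codepoints in the Telugu block (U+0C00..U+0C7F)
    whose Unicode name starts with `prefix`."""
    return [chr(cp) for cp in range(0x0C00, 0x0C80)
            if unicodedata.name(chr(cp), "").startswith(prefix)]

# Coarse: "TELUGU LETTER" also matches independent vowels, not just consonants.
letters = telugu_chars("TELUGU LETTER")
vowel_signs = telugu_chars("TELUGU VOWEL SIGN")

# All <C1, virama, C2, ZWNJ, V> candidates; the reported crasher
# (JA + VIRAMA + NYA + ZWNJ + AA) is one of them.
sequences = [c1 + VIRAMA + c2 + ZWNJ + v
             for c1 in letters for c2 in letters for v in vowel_signs]
print(len(sequences))
```

A test harness could feed each of these strings through the text-shaping path; the set is small enough to run exhaustively.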
<nitpick>All Unicode characters map, one-to-one, to their code points. A code point being a numeric identifier. It's a grapheme that combines multiple characters to form a unit of writing.</nitpick>
<nitpick>The Unicode standard does not have a single definition for "character" because there's multiple interpretations. One reasonable interpretation is "a grapheme cluster".</nitpick>
More specifically, here's what the Unicode Consortium glossary defines for "Character":
> Character. (1) The smallest component of written language that has semantic value; refers to the abstract meaning and/or shape, rather than a specific shape (see also glyph), though in code tables some form of visual representation is essential for the reader’s understanding. (2) Synonym for abstract character. (3) The basic unit of encoding for the Unicode character encoding. (4) The English name for the ideographic written elements of Chinese origin. [See ideograph (2).]
An accent mark by itself has zero semantic meaning in a written context. It's a modifier. But you need to know what it's modifying in order to assign it any sort of meaning. We're talking about semantic meaning within the context of a written language, not technical details.
And if it gets into your history, it'll crash every time the character shows up again. I ended up having to delete the entry manually:
$ sqlite3 ~/Library/Safari/History.db
SQLite version 3.19.3 2017-06-27 16:40:08
Enter ".help" for usage hints.
sqlite> delete from history_items where url = "https://pastebin.com/9Tr8ytTr";
Ohhh nice. Yep, definitely crashed Safari on OSX 10.13.3 when the character is in the address bar. Firefox seems fine and is my daily driver, fortunately
Firefox Developer Edition 59.0b9 here on latest High Sierra. Opening the link doesn't crash anything, but highlighting the paste did. I would have expected at most the tab to crash, but it took out the whole browser.
Original iPad here running iOS 9.3.5. Everything works as expected. I can't seem to crash anything with this string. Oh well, I guess that's the benefit of not upgrading, ever.
I checked the character; it is anything but weird. It is a valid character in a language used by millions of people.
For those curious: It is a Telugu character; Telugu is one of several widely spoken languages in India.
I guess it depends on your definition of "widely spoken."
According to Wikipedia, the number of people speaking Telugu is 0.97% of the world. It's not even a statistical margin of error.
Still, how hard can it be to have a machine step through all of the possible combinations of every iOS-supported character set and jam them into iMessage to see if they're failsafe?
>This is why we shouldn't be writing new code in C.
Depends on your priorities. I'd take occasional bugs over jitter, latency, CPU inefficiency, and excess memory usage. And all that in practical terms means C is very environmentally friendly.
I suppose so. As an embedded developer it appeals to me, but I haven't looked too much into it TBH. Do you know of any analysis done on the bugs in servo or any other large-ish project?
Right, I just wanted to examine it in the 'proof is in the pudding' sense: if Rust is as good as people claim, then certain types of bugs would be entirely eliminated. I'm usually wary of such claims :)
There are compartmentalization options, such as XPC Services on macOS (specialized sub-processes that can “crash” if something is risky).
Also, non-C languages frequently implement lots of library features on top of C because it would be a ton of work not to (and slower), which means any language could be at risk.
Apple has been known to fuzz their code. Just because they didn't uncover this particular bug doesn't mean they didn't do any fuzzing, it just means the fuzzing didn't hit this case.
// my point in posting this was to demonstrate a bug ... that array will eventually run out of memory and this loops infinitely but not using a standard language loop construct... but I see someone didn’t think that was relevant
We need a new "Falsehoods programmers believe about character encodings" article to be added to the falsehoods list [1]. Joel Spolsky wrote in 2003 about the dangers of the ignorance of character encodings [2].
It is possible in other languages too, just to be clear. C++, for one example, could suffer from it.
But largely the issue is, when an error occurs, can the state change to [undefined]. In some languages, like C/C++, it is possible through bugs to enter an undefined state. In many higher level languages, all states are defined even if not all states are actually handled by the developer's code.
If all states are defined you can use tooling to look for states which aren't handled by any execution path, and ideally handle them. If they remain unhandled there is a documented way to handle the unhandled known state (e.g. global exception handlers).
In a language with undefined states that isn't possible, all you can do is look for areas likely to result in an undefined state, but even that is easier said than done.
Higher level languages save you from a few classes of bugs, and make it easier to find bugs because things are more narrowly defined.
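A rough sketch (my own illustration) of the "all states are defined" point in a managed language: an out-of-range access is a well-defined state (IndexError) rather than a wild memory read, and anything left unhandled funnels into one documented place (sys.excepthook) instead of undefined behavior:

```python
import sys

def global_handler(exc_type, exc, tb):
    # Documented last-resort path for known-but-unhandled states.
    print(f"unhandled: {exc_type.__name__}: {exc}")

# Install the global exception handler mentioned above.
sys.excepthook = global_handler

items = [1, 2, 3]
try:
    items[10]  # defined outcome: raises IndexError, never corrupts memory
except IndexError as e:
    print("caught a defined state:", e)
```

The C/C++ counterpart to `items[10]` is an out-of-bounds read whose behavior is undefined: no handler, defined or otherwise, is guaranteed to run.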
This is one of the most painfully ignorant things I have ever read.
There is absolutely no such thing as "all states are defined". This has nothing to do with the language. There is nothing magical about a "higher level language" to safeguard from failures. At best you could be referring to managed languages, where memory access and exceptions are controlled and the extent of an aborted execution is predetermined - but even those are not immune to just closing a process when it misbehaves. That is precisely what a process should do. Nonetheless, nothing in this bug report indicates that the system is acting in an undefined manner. The library that failed may have, by design, been programmed to close when font rendering reached these conditions. We don't know if this was a signal abort or an intentional "if (bad_outcome) close()".
Any level from the most managed sandboxed level to the most host level process may fail, regardless of the language used.
To be honest, though, if someone on my team suggested we implement an automated test that tries sending every Unicode character (and it would be applied to every interface of every app and API, right?) I would have objected that this was an over-complicated, over-engineered solution that will probably be too slow.
I'd argue that a set of test data selected to cover a range of patterns, especially ones that are considered risky (either is know to have caused problems in the past or appear complicated or tricky) would have almost as good coverage and be an order of magnitude more useful.
The problem with "run-every-case" tests is that they start off slow and get exponentially slower as you go deeper; you very quickly end up with tests that take too long to run to be useful. (E.g., {every Unicode character} is one thing... {every Unicode character} X {every interface and app} is a LOT more. {Every Unicode character} X {every interface and app} X {every build} X {every device model} X etc. is impossible.) So you end up with very broad but very shallow tests. And that means less coverage ultimately, not more.
Generally, I think you'll get more efficient, effective, useful tests if you tailor the test data sets to the problems you see rather than going for blanket coverage.
Since this affects a huge number of applications, even desktop applications, it's a low-level library that is causing the problem. I'm not sure it's unreasonable to test every Unicode codepoint against a shared library if that library is responsible for the rendering of Unicode text.
But I agree it would be unreasonable to do that for every app that uses the library.
You have a point. It's probably not unreasonable to test every Unicode codepoint against a low-level shared library, at least for APIs that process a single codepoint.
I don't think it's a no-brainer though.
And it is unreasonable to expect a testing technique to be applied in every case where it would be reasonable to do so. There are a lot of techniques that are reasonable to use in any given case, but it would make no sense at all to apply every one of them. To focus on this one now amounts to Monday morning quarterbacking. Once you know the bug, it's easy to see how it could have been caught. Of course, if you somehow knew of a bug ahead of time you wouldn't need any test at all!
Yeah, I don't disagree. Arguing this would have been caught by better testing comes close to implying that all software bugs are avoidable; we simply have to test "better" until there are no bugs ever.
> Generally, I think you'll get more efficient, effective, useful tests if you tailor the test data sets to the problems you see rather than going for blanket coverage.
Sure, but fuzzing and "throw everything and the kitchen sink at it" testing will generally expose logic bugs you weren't aware existed, a.k.a. the unknown unknowns. It's not like this is an either/or proposition; you could run the fuzzing-type gauntlet tests every week or so.
>I've never done any fuzzing personally, but wouldn't this be discoverable internally during testing?
Yes. This seems like a somewhat embarrassing fail in terms of whatever automated testing Apple does. It's a pretty minimal and restricted case of fuzzing, in fact I'm not sure it'd qualify as "fuzzing" at all because here the crasher is just an actual Unicode character, part of a fixed set that can be completely run through in deterministic time. Unicode is certainly large in human terms but in terms of automated test data sets it's not.
Given the notorious difficulties and edge cases Unicode parsing has long created, testing every single individual character in Unicode as part of standard unit/regression testing seems like something that should just be done as a bare minimum for an operating system or parsing framework/library release. That's not to say there wouldn't be more complex inputs causing problems that sufficient fuzzing could discover, but this kind of error from a single real-language character shouldn't have slipped through; it's not random.
Unicode and image/document handling frameworks are both areas that should have extensive fuzzing as well as deterministic input sets for testing. They shouldn't need to be messed with much either once they're "done", so in the case of an org like Apple this is the sort of place where it might even make sense to really devote resources to getting at least some parts of it formally verified.
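To give a sense of scale for the "deterministic input set" point, here's a sketch (my own, with render() as a hypothetical stand-in for the system's text-shaping entry point) of walking every assigned codepoint; it covers the whole space in seconds by automated-testing standards:

```python
import unicodedata

def render(text):
    """Hypothetical stand-in for the platform's text-shaping/rasterizing
    call; a crash here would be the test failure."""
    pass

# Every codepoint that has a Unicode name (a proxy for "assigned"),
# out of the full 0x110000 codepoint space.
assigned = [chr(cp) for cp in range(0x110000)
            if unicodedata.name(chr(cp), None) is not None]
print(len(assigned))  # on the order of 10^5 named codepoints

for ch in assigned:
    render(ch)
```

Single codepoints are exhaustible like this; the combinatorics only explode once you test multi-codepoint sequences, which is where targeted pattern sets and fuzzing come in.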
I wouldn't be so quick to call Unicode characters a "fixed set", because of the way that they can be arbitrarily composed from multiple codepoints. If you've encountered zalgo-text, for example, you've encountered characters made from a base letter and like 20 combining codepoints. Enumerating all of the possible characters is infeasible.
To be specific, here are the codepoints in the character from Pastebin:
U+0C1C [Lo] TELUGU LETTER JA
U+0C4D [Mn] TELUGU SIGN VIRAMA
U+0C1E [Lo] TELUGU LETTER NYA
U+200C [Cf] ZERO WIDTH NON-JOINER
U+0C3E [Mn] TELUGU VOWEL SIGN AA
I don't know if the ZWNJ is in there to make the bug happen or to neutralize it. But the entire group of five codepoints renders and selects as one character for me (in Chrome on Ubuntu).
I would hope that after all the character-crashing-everything bugs that have hit iOS there is a team dedicated to replacing the text rendering code with something more fault tolerant.
In Chrome I can see the character for a while before it crashes. That leads me to believe that the actual crash is not in the text rendering code but in some component above it that receives garbage at some point.
The issue is in the compositor, the component that determines where the windows that contain all your GUI widgets go. You can reproduce the bug on some of the newer versions of OSX.
Likely the two programs are disagreeing on the character's size, and therefore on widget size.
Don't knock instrumentation-based fuzzing when it comes to discovering complex edge cases and code paths.
I've used AFL [1] to generate semantically valid, crashing multi-statement SQL queries when given ~50 CPU days and a simplistic SQL-like database. All from thin air, with an empty text file as the initial testcase. It can also come up with valid JPEGs [2] given enough time.
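For intuition, here's a toy mutation fuzzer (my own sketch, nothing like AFL's coverage-guided instrumentation): it grows a corpus by inserting random bytes and returns the first input that crashes the target. The target is a hypothetical stand-in that crashes on one rare byte, 0xE0 (the first byte of the crasher's UTF-8 encoding); this naive approach only works because the trigger is so simple, which is exactly why real fuzzers need coverage feedback:

```python
import random

def toy_target(data: bytes) -> None:
    # Hypothetical parser/renderer stand-in: "crashes" on a rare byte.
    if b"\xE0" in data:
        raise RuntimeError("boom")

def fuzz(target, iterations=20000):
    """Naive mutation fuzzer: mutate corpus members by inserting one
    random byte, return the first input that makes the target raise."""
    random.seed(0)  # deterministic for the example
    corpus = [b""]
    for _ in range(iterations):
        sample = bytearray(random.choice(corpus))
        sample.insert(random.randrange(len(sample) + 1), random.randrange(256))
        sample = bytes(sample)
        try:
            target(sample)
        except RuntimeError:
            return sample  # found a crashing input
        corpus.append(sample)
    return None

crasher = fuzz(toy_target)
```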
Not technically correct; <consonant, virama, consonant, ZWNJ, vowel> is not a sequence you'd see in actual Telugu text (ZWNJ doesn't do anything for Telugu).
However, there are similar test cases for Bengali, and ZWNJ does have an effect on some Bengali vowels, so it's definitely a real world character in that case.
For those of you looking to test this out yourself, I have a Python script that could be helpful in pulling iMessages off your iPhone backup (since it might be difficult to access them if the application is in a crash loop).
For me it will take Google fixing the update situation for Android. Things would have to get quite a bit worse before I'd feel that a non-technical person, who does not buy a new phone at least once a year, would be safer on balance.
And it's interesting that you bring up Samsung, which for 3 years issued patches every single month for my old Note 4. You may not get the latest and greatest Android version immediately (missing the essential "hamburger emoji fix"), but you'll still get security updates.