Hacker News
A photo is crashing some Android phones (bbc.com)
377 points by pseudolus 30 days ago | 226 comments



Curiosity (stupidity?) got the better of me and I ended up awake until 2am fixing this last night on my Pixel 3a.

I was fortunate enough to have developer mode & USB debugging enabled, which allowed me to install some code to clear the wallpaper using the WallpaperManager [0].

Sadly it wasn't as simple as pushing the code to the phone and running a command to execute it. When the device booted, it would crash _almost_ instantly and reboot into recovery mode after attempting to show the wallpaper a few times. This meant I couldn't put my passcode in, which meant Android refused to run the application I made to clear the wallpaper.

The ADB (Android Debug Bridge) tool [1] lets you replicate user input for the connected device, so you can basically swipe the screen / enter keys from the command line. I was able to use ADB to swipe up on the lock screen and enter my passcode. This was a start but wasn't enough, as the crash was happening before the wallpaper deleting app could be started.

I figured the unlock process was taking too much time, and found out that since Android 8 (Oreo) you can actually use ADB to change or remove your passcode [2], provided you know your existing passcode. After I removed my passcode I was able to reboot the device, and run a single ADB command to swipe up on the screen and delete that cursed wallpaper.

And no, it wasn't worth it, but I enjoyed the challenge.

[0]. https://developer.android.com/reference/android/app/Wallpape...

[1]. https://developer.android.com/studio/command-line/adb

[2]. https://android.stackexchange.com/a/194779


Times like this are why it would be really nice for Google to allow a root shell for developers when booted in recovery mode or something. Trying to fix this from userspace in the short window you have before SystemUI crashes... what a silly avoidable nightmare.


You can root your phone, but then you lose SafetyNet which means that things like banking apps, etc. won't work. For "normal" developer access there's adb which does have some special privileges (nowhere close to actual root, but better than nothing), but using it to change screen background settings would be quite clunky. Even using the adb shell to delete the file would run into restrictions on general storage access that newer Androids have.


Magisk has an option to force enable passing SafetyNet, plus hide Magisk. If you combine those you can have root and still run banking apps and Google Pay like normal.

https://www.didgeridoohan.com/magisk/MagiskHideSafetyNet


Magisk Hide is an unreliable workaround, since remote attestation of the running OS could be required at any time to pass SafetyNet. Even the author of Magisk Hide is well aware of this.


Could you explain that more? I didn't understand your response.


See this tweet chain from the Magisk author. It seems Google might enable remote attestation at some point, as they did for a short period around the time of this tweet, which would defeat MagiskHide.

https://twitter.com/topjohnwu/status/1245956080779198464?s=1...


To me adb was the last resort to make an old tablet of mine useable. That little bastard resisted all rooting attempts, and was filled with branded junk which made it slower than a dead sloth; luckily adb allowed me to remove a ton of "system" rubbish.


I think that depends on the phone and the root application used. My HTC is rooted with Magisk and it claims SafetyNet is still in operation. I don't use my phone for much banking, but I've never seen any issues.


If you attach adb and run "stop" before SystemUI starts, it will shut down everything and let you play around in the shell as long as you want.


You may want to write a blog post on how to recover so that others may benefit from your experience.


I would love to see the write-up, but I don't think most people have ADB/dev mode enabled, or have the know-how to execute the whole thing.


A lot of people may know someone who does ;)


The problem is, there’s no way to enable ADB if you’re in a crash loop. Your phone has to already have it enabled, as far as I know.


If stock recovery mode (or TWRP - not sure if it’s still a thing as it’s been ages since I last used Android) allows access to the file system you should be able to figure out and edit whatever configuration file controls whether ADB is enabled.


This requires a wipe in most cases to unlock your bootloader (if that's even possible for your phone, not all can be). At that point, you might as well just wipe your phone.

EDIT: Some phones also require you to unlock it in settings too don’t they?


Seems to be the original issue (2018) - https://issuetracker.google.com/issues/76022479


In theory, you could develop an app that removes the image, publish it in the Play Store and then push it to your phone from the store's web interface.

No need for dev mode that way :)

Edit: for this to work, the app must act as a service/receiver so it can start automatically.

Edit 2: now that I think about this, Google could easily fix affected phones remotely via their Play Store services.


Apps can no longer run services without initial user interaction. No broadcasts are delivered until that happens.


Ah, did not know that.

Good work Google, I guess.


Thanks for the instructions. Curiosity got the better of me...

Another simple way to fix this if you have something like TWRP installed, is to go into data/system/users/0/ and rm -rf wallpaper


TWRP doesn't work on Android 10


I've vouched for this comment, because if it were that easy to disprove it where is the counter-example of a working Android 10 TWRP setup?

Sure, there's some work happening in the TWRP repos, but I can't figure out whether that means TWRP works on Android 10 or not.

https://gerrit.twrp.me/plugins/gitiles/android_bootable_reco...


It depends on what Android version your phone had. You cannot flash TWRP on phones that came with 10 by default (there might be a few workarounds). Mine came with 8.0 and I flashed it later.


I was so close to trying this myself, and I am glad I did not, based on the details you provided.

I would consider doing a write-up. Another less skilled person may have tried doing the same.


I wrote a lot of the Skia color management code involved here, and I think maybe I can clear some things up? I'd be happy to answer any questions you might have, either here or on https://bugs.chromium.org/p/skia/issues/detail?id=10301 where we've been trying to think about how much Skia's involved with the root cause of this bug.

There's no "Skia color profile" per se. When Skia's asked to write a JPEG from a pixel buffer tagged with a color profile (SkColorSpace), we auto-generate an ICC profile programmatically from the parameters that describe the color gamut and transfer functions we use to represent those color profiles. That's what the code in this file is up to: https://source.chromium.org/chromium/chromium/src/+/master:t...

So this profile is one of infinite possible profiles that'll have "Google/Skia/..big long MD5.." somewhere in them.

This profile in particular looks like it's probably describing the ProPhoto color space, which is very, very large and uses imaginary colors outside visible light to increase its gamut. While the JPEG itself is always storing values in range in the ProPhoto gamut, when you convert that to another gamut (say, to match the display, or to sRGB) it's very easy to end up with logical color channel values less than zero or greater than one, which do make a kind of sense, but not to code that expects colors are always sRGB bytes.

It's unclear to me what gamut the colors being read in that Java code are in, and that'd very much affect what the appropriate values for that LUMINOSITY_MATRIX should be. Those are the correct values to get the luminosity for sRGB colors (though I think what's being calculated here is subtly-different gamma-encoded "luma" instead), but they're not likely a sound way to get luminosity if the image is in any different gamut. A typical strategy is to convert any image you load to the display gamut early, since you'll need to do that eventually to display it. And it's more and more common that phone display gamuts are not sRGB, usually something wider now like Display P3.


From the Skia bug report:

> out of bounds write in Java

Isn't Java supposed to be a language safe from these kinds of bugs? At the very worst it throws an exception, which should be caught and dealt with accordingly… How can that crash the whole phone?


> At the very worst it throws an exception

Yes, from what I have read that's exactly what was happening, it was throwing an IndexOutOfBoundsException. Apparently, nothing was catching that exception, so the whole process was terminated.

> How can that crash the whole phone?

The main problem is that the process which was crashing is the Android equivalent of the window manager or the X server. It recovered from that crash either by restarting the process (which immediately crashed again), or by rebooting the phone (and immediately crashing again).


> Apparently, nothing was catching that exception, so the whole process was terminated.

Apposite moment perhaps to refer to the mind-boggling finding in "Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems" (Yuan et al, Usenix 2014 https://www.usenix.org/conference/osdi14/technical-sessions/...):

"almost all (92%) of the catastrophic system failures [in the real-world study] are the result of incorrect handling of non-fatal errors explicitly signaled in software"


I think this lends credence to the Go way: verbose, because then it's hard to mess up.


I mean... not really. This is the equivalent of `if err != nil { return nil, err }` which is so common it's a mindless default behavior worthy of memes: https://pbs.twimg.com/media/DCIF7-2W0AEAv9c.jpg

It absolutely results in the same kinds of failures.


Saw one case where it was `if err != nil { return nil }`, so the error was completely ignored and function exited without error...


If I had to guess: No relevant catch in the right place.

Perhaps there is no catch at all, so the systemUI process crashes. Since that process is critical, and responsible for things like the status bar, some supervisory process panics, and reboots the phone.

Alternatively perhaps a higher level catch clause does catch this error, but it is at too high a level to be able to recover cleanly by say displaying a black background instead of the wallpaper.

Hard to say without being more familiar with the codebase.


It's been several years since I've used Java, but from what I remember most devs rarely catch unchecked exceptions (like out of bounds access) in their code.


You generally wouldn't want to catch any random exception you don't specifically know you should or need to catch, as that way lies the madness of working with corrupted data. Usually you'd have a toplevel handler as a final line of defense, but of course that doesn't help if the software automatically restores the state which makes it crash; then you just get a crash loop.


The solution is, at every layer of the system, to think about what it means to recover from a failure. In this case, the obvious answer to a failed wallpaper rendering is to avoid rendering the wallpaper, not to allow the failure to bring down the entire UI with it.

I didn't always have this habit, but picked it up when I switched from working on servers to working on safety-critical hardware. It was a lot easier a habit to pick up than I thought it would be, and I've found it makes you write a more robust, well-understood system.


That's good advice, to always be vigilant about how a system would recover from a possible error/failure, at every layer. I imagine this is a discipline that a software developer or engineer learns from experience, by running into edge cases, weird bugs, corrupted state, crashes, etc.

It's quite a responsibility on the human side of software. Handling unexpected errors is a pretty easy thing to miss, for programmers who are inexperienced or have become too comfortable (not paying attention to every detail).

I can't help but think that the language or compiler (or tests, I suppose) could enforce this better, so that it forces the programmer to explicitly handle recovery from failures - at certain layers/thresholds - so that the error doesn't bubble up to the top and bring the whole thing down.

As you pointed out, an error in the UI layer shouldn't have been allowed to trigger a system crash loop. But, that's kind of the nature of bugs like these, they creep in through unexpected logic paths. In which case, the development environment needs to better enforce the (explicit) behavior of all possible paths that an error could take. Well, I'm sure that's constantly improving, as a perennial challenge of reliable software, and there will always be a need for discipline and attentiveness by the humans involved.


> But, that's kind of the nature of bugs like these, they creep in through unexpected logic paths

Sure, agreed, this isn't a post-hoc "what an obvious bug, they should have just <done everything perfectly from the start>". I was just addressing the dichotomy laid out by the GP comment, which only mentioned top-level generic exception handling instead of case-specific, intelligent exception handling.


It makes sense to catch any exception in certain specific places. For example in this case, wrap the entire batch of code for loading the file, parsing and processing it, and displaying on the screen in a try-catch in whatever code builds and displays the wallpaper. Then if anything goes wrong, you can log the error, and do some sane recovery steps such as removing that wallpaper and reverting to some default wallpaper that you know is good.
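A minimal sketch of that idea in plain Java. This is not Android's actual SystemUI code: `WallpaperRenderer`, `renderOrFallback`, and the black fallback color are hypothetical names chosen for illustration, and the "render" step is reduced to a supplier that returns a color.

```java
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from Android.
final class WallpaperRenderer {
    // Known-good fallback: opaque black.
    static final int DEFAULT_COLOR = 0xFF000000;

    // Runs the risky load/parse/process step. On any runtime failure it
    // logs the error and returns the default instead of crashing the caller.
    static int renderOrFallback(Supplier<Integer> riskyRender) {
        try {
            return riskyRender.get();
        } catch (RuntimeException e) {
            System.err.println("wallpaper failed, using default: " + e);
            return DEFAULT_COLOR;
        }
    }

    public static void main(String[] args) {
        int[] histogram = new int[256];
        // Simulates an index one past the end of the array.
        int color = renderOrFallback(() -> histogram[256]);
        System.out.println(Integer.toHexString(color));
    }
}
```

A real recovery path would presumably also clear the stored wallpaper so the next boot does not retry the same file; that part is left out here.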


There is another way: Common Lisp and Smalltalk both have resumable exceptions (although they’re more limited in Smalltalk): in an interactive environment, if the exception isn’t handled, you drop into a debugger that lets you pick a recovery strategy. If something like this were standard on phones, you wouldn’t get stuck in a boot loop and Android could provide the option to enable dev mode or check the internet for a solution.


You can configure a JVM to pause whenever an exception is first-thrown and prompt to attach a debugger. Same with the CLR. I think Windows lets you do it with any program that uses SEH too.


(In a better world, Android was written in Common Lisp.)


Newton almost had it.


Yeah but to most users, getting some cryptic console they can implement a workaround in ... isn’t much better.


It doesn’t need to be cryptic, a screen that says something like “enable dev mode” “attempt to fix from computer” (along with a desktop program for modifying settings) “check for updates” “reset to defaults” would be friendly enough


Visual Basic had an "ON ERROR RESUME NEXT" feature, which would simply ignore all exceptions and keep running code.

Obviously, when one thing goes wrong in a program, frequently that leads to other things going wrong, but IMO that's better than crashing the whole process.

Sure, you might get unexpected output, but for many usecases, unexpected output is better than no output at all.


> but IMO that's better than crashing the whole process

It's not. Crashing the process (if the error remains unhandled at the top level) is the right answer. When the program raises an exception, you want to either handle it or pass it up, not ignore it. It was raised for a reason. Once an error was raised, if the program ignores it and happily continues all bets are off. This is like reasoning out of false premises: everything goes, which is undesirable. Data is probably corrupted, maybe preconditions are unmet.

For this reason, most automated code checkers consider ignoring exceptions a serious antipattern.


Isn't it effectively the same thing as swallowing Throwable with an empty catch block in Java?


Yes, but you need to wrap every line of source code in your whole program (and all libraries) with an empty catch block to have the same result.

Just a single top level catch block would prevent other code after the error occurred from running, which might still be able to work fine without the results of the faulty code.
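To make the difference concrete, here is a small Java sketch (hypothetical names, not taken from any real codebase) that runs the same three steps two ways: once with an empty catch around every step, which is roughly what ON ERROR RESUME NEXT amounts to, and once with a single top-level catch.

```java
import java.util.Arrays;
import java.util.List;

final class ResumeNextDemo {
    // Every step gets its own empty catch, so a failing step is
    // skipped and the later steps still run.
    static int runEachGuarded(List<Runnable> steps) {
        int completed = 0;
        for (Runnable step : steps) {
            try {
                step.run();
                completed++;
            } catch (RuntimeException ignored) {
                // deliberately swallowed, for illustration only
            }
        }
        return completed;
    }

    // One top-level catch: the first failure abandons all remaining steps.
    static int runWithTopLevelCatch(List<Runnable> steps) {
        int completed = 0;
        try {
            for (Runnable step : steps) {
                step.run();
                completed++;
            }
        } catch (RuntimeException ignored) {
            // the steps after the failing one never ran
        }
        return completed;
    }

    public static void main(String[] args) {
        List<Runnable> steps = Arrays.asList(
            () -> {},
            () -> { throw new IllegalStateException("boom"); },
            () -> {});
        System.out.println(runEachGuarded(steps));       // prints 2
        System.out.println(runWithTopLevelCatch(steps)); // prints 1
    }
}
```

Whether the third step's result is still trustworthy after the second one failed is exactly the point being argued in this subthread.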


I see. I'm just not sure that situations where this way of exception handling is a viable strategy are common enough to warrant this syntactic sugar (which seems very easy to misuse). The Visual Basic docs also recommend using structured exception handling instead. Perhaps it is more useful in scripting contexts.


They are not viable strategies, which is why the analogous strategies in modern-day languages are heavily discouraged and wouldn't pass a code review.


Except maybe healthcare or finance or nuclear plant management software...


Turning off exceptions in this case doesn't prevent you from handling errors. You just have to check the error object to see.


No, this is not what the person who mentioned ON ERROR RESUME NEXT is arguing. They are arguing in some cases it's better if the program plods along and produces any output, even wrong output, rather than crashing.

I think the post above explains why this isn't the case in mission critical software. I'd argue the lesson from software engineering is that it's not advisable in most situations, not just for nuclear reactors. The error was raised for a reason, after all. I'd say it's the opposite: there are some limited cases where you absolutely know the exception is not relevant, and in those cases you can knowingly swallow the exception.


They stated that the feature "would simply ignore all exceptions". It allows you to ignore all exceptions, but you don't have to and probably wouldn't in most cases.

It was an example of blaming a tool for a hypothetical misuse that is not the only way you can use it.


I think the original poster implied one would indeed use it to ignore all exceptions. But also, it's exactly like an empty catch all exceptions block, which is considered a mistake today, for good reason.

I'll go an extra mile and say I'm familiar with this construct in Basic, have seen it used, and nobody ever checked any error codes when using it. It's a footgun that's used almost exclusively to shoot at feet.


Yep, I think that's exactly what happened, an uncaught java.lang.ArrayIndexOutOfBoundsException crashed the rather important com.android.systemui process.


Every time there's a new programming language which claims to be "safer" than existing ones, I look at what they do differently. In some cases, they've come up with a new way to restrict computation so that undefined/error cases aren't possible. In many cases, though, their solution is more along the lines of "don't do that", often with syntax that discourages (but doesn't forbid) it.

The trouble is that it's really easy to say "access the 10th position in a 5-element array". It's possible to make a programming language that doesn't allow you to say this, but it either makes it really awkward to do anything with arrays, or simply pushes the error somewhere else -- or both.


Well, if this process had been written in C, this may have been a security issue, so Java is still an improvement.


Being able to deny a victim access to their phone just by having them set a given wallpaper definitely IS already a security issue.


OK, fair enough. But you know what I meant. :-)


Java is "safer" in the sense that the error is defined. If you try accessing an array with an out of bounds index, you'll get this exception. A language is less safe than Java if the resulting behavior is undefined, or even worse, if it's random or depends on the content of some other memory.

Java doesn't claim it's magic. It's objectively safer by this measure. (More specifically, Java claims to be "memory safe" [1])

[1] https://en.wikipedia.org/wiki/Memory_safety


It's "safe" in that you can't read or write memory you're not supposed to be able to access. The program will crash, but at least you can't steal (or sabotage) data in memory.


If the same thing happened in X trying to render an i3 background, for example, the way you'd recover is starting a new session - which you've proactively set up to not launch X automatically - and inspecting logs and configs to see what went wrong.

It's basically the same thing here, but without some physical keys bound to 'switch to new session'. And maybe you can do that over adb anyway, which is about as (not) normal user friendly.


> which should be caught and dealt with accordingly

Which is easy when you have an Either<L, R> construct for your returns or when you have checked Exceptions, which Java devs dislike.

So you get errors like this, because devs are often not aware or misremember all the failure modes of calls they make. Because they're only humans and they're using tools and practices that like to pretend everything that matters is the happy path.


Yeah, from a static safety perspective, Java’s checked exceptions are a big missed opportunity: if Java could have figured out a better syntax for them that was less annoying to deal with they could have been really interesting. As it is, they actively prevent certain sorts of abstractions.


Java's checked exceptions pretty much died when they introduced lambdas. They don't work well together, at least not in the Java type system.

Further, I'd say that IndexOutOfBoundsException can happen pretty much everywhere, so it does not make much sense to make it a checked exception anyway.


> Further, I'd say that IndexOutOfBoundsException can happen pretty much everywhere, so it does not make much sense to make it a checked exception anyway.

Indeed. I don't know of any programming languages in which out-of-bounds array access is an error whose handling is enforced by the type system (checked exception, result, etc). Ditto for numeric overflow and division by zero. I believe even Ada doesn't force error handling of these.


It’s a bit different, but languages like Idris have type systems that can prevent out of bounds array access. I’d be interested in seeing what the exception handling equivalent is.


> they actively prevent certain sorts of abstractions.

Like what?


You can't have an interface implementation throw exceptions not declared in the interface, hence all implementations of a given interface have to declare the same failure modes. And then everything is nicely coupled.


A great example of this is ByteArrayOutputStream's close(). It's documented to do literally nothing, and yet you still have to "handle" an IOException that will never, ever happen.

https://docs.oracle.com/javase/7/docs/api/java/io/ByteArrayO...


The language doesn't force them to do that--they could override the interface and not declare the Exception. Then if you were working with an object declared to be ByteArrayOutputStream, you wouldn't need to handle the Exception. If working with the interface, you would, because that's the point of an interface.
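A short sketch of what that looks like. `Sink` and `MemorySink` are hypothetical stand-ins, not the real `java.io` classes: the interface declares `throws IOException`, the implementation's override drops the clause, and callers holding the concrete type no longer need a try/catch.

```java
import java.io.IOException;

final class NarrowedThrowsDemo {
    public static void main(String[] args) {
        MemorySink concrete = new MemorySink();
        concrete.close(); // no try/catch needed through the concrete type

        Sink viaInterface = new MemorySink();
        try {
            viaInterface.close(); // the interface type still forces handling
        } catch (IOException e) {
            throw new AssertionError("unreachable for MemorySink", e);
        }
        System.out.println(concrete.closed);
    }
}

// Hypothetical interface that reserves a checked failure mode.
interface Sink {
    void close() throws IOException;
}

final class MemorySink implements Sink {
    boolean closed = false;

    // The override omits the throws clause, which Java allows:
    // an override may declare fewer checked exceptions, never more.
    @Override
    public void close() {
        closed = true;
    }
}
```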


Yes, this is a poor example. The declaration of that exception is either a mistake, or a way of reserving some wiggle room for future changes.


This is backwards of how I think about it, so I'm curious what you're thinking.

If an implementation of an interface can throw an exception that its interface doesn't declare, that breaks the abstraction. How can you safely use the interface? You either have to catch Exception (which is effectively the same as wrapping every exception in some InterfaceException class), or give up on handling any Exceptions thrown by the interface.


I don't think this is backward of your understanding?

That's just how I phrased it 18 hours ago. The thing I find bad about it is not that exceptions have to be declared in the interface, that's good, it's that every implementation will have to declare them also, even in the case that a particular implementation has no failure mode.


This sounds like a misconception. Implementations don't have to declare the exceptions: https://news.ycombinator.com/item?id=23410443.


The interface can define a type parameter bounded to Throwable, and declare methods that throw that.

But in general, of course an implementation can't just randomly add checked exceptions, because that would violate the Liskov substitution principle.
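A sketch of the type-parameter approach, with made-up names (`Store`, `MapStore`, `FailingStore`). The failure type is part of the interface's signature, so an implementation with no checked failure mode can pick `RuntimeException` and its callers need no try/catch, even through the interface type.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

final class GenericThrowsDemo {
    public static void main(String[] args) {
        MapStore map = new MapStore();
        map.put("k", "v");
        // E is RuntimeException here, so nothing has to be caught.
        Store<RuntimeException> store = map;
        System.out.println(store.read("k"));

        Store<IOException> failing = new FailingStore();
        try {
            failing.read("k");
        } catch (IOException e) {
            System.out.println("handled: " + e.getMessage());
        }
    }
}

// Hypothetical interface: the checked failure type is a type parameter.
interface Store<E extends Throwable> {
    String read(String key) throws E;
}

// No checked failure mode, so it binds E to an unchecked type.
final class MapStore implements Store<RuntimeException> {
    private final Map<String, String> data = new HashMap<>();

    void put(String key, String value) {
        data.put(key, value);
    }

    @Override
    public String read(String key) {
        return data.get(key);
    }
}

// An implementation that can fail names its checked type once.
final class FailingStore implements Store<IOException> {
    @Override
    public String read(String key) throws IOException {
        throw new IOException("no backing file for " + key);
    }
}
```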


I've run into this when working with lambdas in particular. Say for instance I want to track the performance of a particular bit of code in a standardized way. I can write a method that takes a supplier and handles the performance tracking while calling the supplier and returning the result. This way tracked functions don't have a load of eg. timer initialization etc. If the supplier throws a checked exception, it should be caught and handled by the code that actually cares about the call, not the performance wrapper.

Unfortunately, because of the way Java handles checked exceptions, I can't feed the supplier with a lambda or any other method reference that I'm aware of that throws a checked exception and let it be passed up to the original caller directly. So I need to catch my checked exceptions and wrap them in an unchecked exception to catch. Not pretty.

There's probably something I'm missing but the language certainly doesn't go out of its way to help with this sort of thing.
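One workaround, assuming you control the wrapper's signature: replace `java.util.function.Supplier` with your own functional interface that carries the exception type as a type parameter. `ThrowingSupplier`, `timed`, and `load` below are hypothetical names, and the timing is reduced to a single static field to keep the sketch short.

```java
import java.io.IOException;

final class TimedDemo {
    // Hypothetical functional interface; unlike Supplier, its method
    // may throw a checked exception of type E.
    @FunctionalInterface
    interface ThrowingSupplier<T, E extends Exception> {
        T get() throws E;
    }

    // Duration of the most recent timed call, in nanoseconds.
    static long lastNanos;

    // Times the call and lets the supplier's checked exception pass
    // through to the caller with its original type.
    static <T, E extends Exception> T timed(ThrowingSupplier<T, E> body) throws E {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            lastNanos = System.nanoTime() - start;
        }
    }

    // Stand-in for work that can fail with a checked exception.
    static String load(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("disk error");
        }
        return "ok";
    }

    public static void main(String[] args) {
        try {
            System.out.println(timed(() -> load(false)));
            timed(() -> load(true));
        } catch (IOException e) {
            // The code that cares about the call handles it, not the wrapper.
            System.out.println("caller handles: " + e.getMessage());
        }
    }
}
```

This does not help with existing APIs such as `Stream.map`, which take the standard functional interfaces and so still need the wrap-in-unchecked approach.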


It’s been a while, but every time I’ve tried to mix higher order functions and checked exceptions in Java, I’ve found it pretty painful: there’s some annoying type checking issue around the way lambdas work in Stream.map or similar. Maybe I’ll come up with a demo when I’m back at my desk.


The lambda design team knew this was a pain point, and considered some designs that could've alleviated the problem (http://mail.openjdk.java.net/pipermail/lambda-dev/2010-June/...). I think they decided it wouldn't work well, but it's at least possible to do things here.

That said, I think the decision to do lambdas the way they did tacitly killed checked exceptions.


Java only makes you declare checked exceptions, not catch them. It's not _that_ difficult to end up with checked exceptions which get thrown all the way up to main and crash your program. IndexOutOfBounds isn't checked though, so it can get thrown from your code or someone else's and crash the whole process. Java exception handling is a mess.


“Restarting the device in safe mode (by holding down the volume button during boot-up) did not fix the issue.”

So, lesson learned: in safe mode, do not use a background image that the user set.


Ha. Perhaps I'm misremembering, but I think I remember this behavior (not using the user background in safe mode) back in Windows 95.


“The Wheel of Time turns, and Ages come and pass, leaving memories that become legend. Legend fades to myth, and even myth is long forgotten when the Age that gave it birth comes again.” - Robert Jordan


"And some things that should not have been forgotten were lost. History became legend. Legend became myth. And for two and a half thousand years, the ring passed out of all knowledge." - Tolkien


Unexpected diss to Robert Jordan for being a copycat.

I love both (I'm one of the few who wrestled through the entire WoT twice and enjoyed that) but your citation shows very pointedly how Jordan was often just copying LoTR.


I think they both just copied the idiom that history will repeat itself.


I'm fairly confident that Tolkien didn't write that. It's the prologue to the Fellowship of the Ring movie, and not in the text, as far as I'm aware.

So maybe Jordan is actually the one who got ripped off. Still though, pretty cliché idea, but the formulation is too close to not be influenced one way or another.


I just tried booting my Windows 10 in "Safe Mode". It, too, didn't display a custom lock or desktop image in this mode.

https://support.microsoft.com/en-us/help/12376/windows-10-st...

Microsoft knows what it's doing.


The bummer is that they made Safe Mode about 10 times harder to get into. Last time I tried, you could not hit F8 at boot time to select safe mode. You have to either be able to boot into the OS and say boot to safe mode, or you have to do it from recovery media, which doesn't come with new PCs. I think there's also the condition that you can get to recovery mode after the PC fails to start 3 times... either way, F8 made it easy enough to tell someone how to boot into Safe Mode over the phone without all the extra steps... but I guess you gotta save that half a second of boot time.


https://support.microsoft.com/en-us/help/12376/windows-10-st...

It's an unnecessary rigmarole to boot into safe mode without OS access, but is still possible.


Seems like it still falls under the condition that "you can get to recovery mode after the PC fails to start 3 times," as parent mentioned. Just a more explicit way of forcing the PC to fail to start.


They had a fun blog post about it back in the day: https://docs.microsoft.com/en-us/archive/blogs/b8/designing-...


I guess the obvious idea of holding down the Shift or Alt keys to extend the boot time and/or bring up the boot menu was off the table?


I think there's a problem of how the keyboard works and when in the boot process it's reset. With NVMe drives, the boot process is even faster!


Just yesterday I pointed out how a teal wall had a similar color to the Windows 95 background. To me, it's as iconic as the hilly Windows XP scene.


It also removed the IE thing which showed icons or some stuff like that... Wait, that was in 98.



Was that the feature that would occasionally replace your desktop background with an error message?


Ugh. Disabling that was always my first action on a fresh install.


What, you mean you didn’t feel the need to give up 75% of your system resources so you could have an active website as the wallpaper of your desktop?


...but hey, it allowed you to use a non-BMP image as a wallpaper! :)


And Super Metroid (1994) let you customize the controls. Firefox (2002), after about twenty years of development (more if you count its predecessor's code), won’t let you, as of the 2016 add-on API change.


My LG G5 in safe mode does exactly that, it sets the user background to a black screen with "Safe Mode" in the corner.


I personally wouldn’t have put a lot of faith in Android’s safe mode anyway. Nothing in that OS seems thought out carefully.


What makes you think that?


Their prejudice.


I'll bite.

Their prejudice toward...?


The Java ecosystem?


Android


Yeah my phone had the same wallpaper in safe mode...lol but TWRP saved me.


Granted we understand "brick" to mean the device cannot boot, but this comes pretty damn close on some phones, it seems.

> "After setting the image in question as a wallpaper, the phone immediately crashed. It attempted to reboot, but the screen would constantly turn on and off, making it impossible to pass the security screen," he noted.

> "Restarting the device in safe mode (by holding down the volume button during boot-up) did not fix the issue."

If restarting doesn't fix it, and not even safe mode will allow you to change the wallpaper to get around it, what method of recovery is left for the affected user? Factory reset from recovery mode?


It depends. If the bootloader is unlocked, they could boot a custom recovery and alter the system image to try and insert a workaround (such as deleting the buggy library and dealing with the resulting breakage). If the data partition is fully unencrypted, they can even delete the offending file straight from recovery. On a bootloader-locked, fully encrypted device (the default nowadays) they'd be SOL and have to use the stock recovery or bootloader to force a factory reset - not only would that involve obvious data loss, but they would even have to deal with FRP afterwards.


You can also delete the offending file with a fully encrypted main storage, since twrp supports decrypting and mounting the storage on demand.


> since twrp supports decrypting and mounting the storage on demand.

It advertises that support, but I've never really seen it work. It seems that modern FDE on Android is such that you can only really "decrypt" from the system environment itself, not from different code - and it's not clear how to fix this.


>It advertises that support, but I've never really seen it work

works for me. it really depends on whether your TWRP distribution implemented it properly. AFAIK android phones don't have a mechanism to bind encryption keys to a system state (similar to sealing keys to PCRs for TPMs on PCs), so I don't think your theory is correct.


Last time I checked, it was possible to factory reset the device from the bootloader menu. That causes the device to go into an anti-theft mode where you have to log in to the last Google account the device was used with, but it should be able to solve this issue.


Perhaps the only way to fix it would be get into fastboot mode and enable USB debugging, then look for that arcane settings file on the file-system which your OEM decided to use for configuring wall-papers and fix that file. I see no other way.


Wouldn't that settings file be on the user data partition, which is encrypted and thus not normally accessible in recovery mode?


(We detached this subthread from https://news.ycombinator.com/item?id=23404703, since that was just a terminological dispute and we changed the title to defuse it, but this subthread contains more information)



And the comment in Color.java:

  * @return A float value with a range defined by the specified color's color space


Using the sum of three floats to index an array?


What could go wrong!


It looks like this comment is not right. There is:

  public static int blue (int color)
Anyway this thing has range [0, 255] and adding three of them together as an index of an int[256] doesn't seem like it would ever work regardless of the colorspace.


It does normally work because they apply a luminosity matrix to make a "greyscale" version. The actual greyscale value being the sum of the three components, after applying this matrix. (Yeah, I know, the sane way is to use the matrix's inherent ability to sum the values, I mention that at the end).

This is done by using the formula: ".2126f * r + .7152f * g + .0722f * b" Apparently this will not yield out of bounds values if the colors are in normal range.

Yes, I've got some alarm bells going off in the back of my head about the float to integer rounding (or float multiplication rounding itself), possibly causing a value slightly too high, but it seems like that might not actually happen. Or perhaps it can only happen for some in range values that can only occur after a color profile correction. (i.e. the float versions of 8-bit sRGB never cause bad rounding, but coming from very specific other profiles might create such values). Lastly there is the possibility that the color values started out of range, so after the multiplication they can still sum to more than 256.

In any case, the sane way to do this, would be to create a true grayscale image with the matrix, and pick an arbitrary color component to look at. (I.E. using the matrix for both multiplication and addition, rather than only for multiplication, and then doing addition afterwards.) I'm guessing the `blue` et al static methods clamp their outputs, so this bug would have been avoided.
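
To make the arithmetic above concrete, here is a minimal Java sketch. It is not the actual Android source: it assumes each channel is scaled by its luminosity weight, truncated to an integer, and the three results are summed to index a 256-entry histogram. The names `LumaHistogramSketch`, `lumaIndex` and `safeLumaIndex` are made up for illustration.

```java
final class LumaHistogramSketch {
    // Weights quoted in the comment above (the Rec. 709 luma coefficients).
    static final float R_WEIGHT = .2126f;
    static final float G_WEIGHT = .7152f;
    static final float B_WEIGHT = .0722f;

    /** Scales each channel separately, truncates, then sums (the pattern discussed above). */
    static int lumaIndex(float r, float g, float b) {
        return (int) (R_WEIGHT * r) + (int) (G_WEIGHT * g) + (int) (B_WEIGHT * b);
    }

    /** Same sum, clamped so it is always a valid index into an int[256]. */
    static int safeLumaIndex(float r, float g, float b) {
        return Math.max(0, Math.min(255, lumaIndex(r, g, b)));
    }

    public static void main(String[] args) {
        int[] histogram = new int[256];
        histogram[safeLumaIndex(255f, 255f, 255f)]++; // in-range white pixel
        histogram[safeLumaIndex(300f, 300f, 300f)]++; // out-of-range input lands in the top bin
        // Unclamped index for the same out-of-range input; it would overflow int[256].
        System.out.println(lumaIndex(300f, 300f, 300f));
    }
}
```

With channels in [0, 255] the unclamped sum stays within bounds, so the clamp only matters when the decoded values are already out of range, which is the situation suspected in this thread.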


Red, green, and blue have been scaled by .2126, .7152, and .0722, which add up to 1.0 exactly (even in single-precision float), and if those values really were all in [0,255] range, the maximum value that could be produced by this math is only 254, due to rounding.

I think what's happening here is that the image is in a format that's holding those original red, green, and blue values in a format that can hold values outside logical [0,1].

Update: sorry, the comment below me doesn't seem to have a 'respond' link so I'll just edit one in here. I totally agree with you it's good practice to document your invariants in code, but in practice it would result in the same thing... an unhandled failure with nothing better to do than crash the process. In a way (if you squint) indexing into an array of size 256 is itself documenting the invariant that the index is less than 256. It's just that the invariant itself is wrong.


> sorry, the comment below me doesn't seem to have a 'respond' link so I'll just edit one in here.

For future reference: I'll often see that for a post in the context of a larger thread, but the 'reply' link has always appeared when I navigate to the post itself (by clicking on the timestamp). Maybe you already tried that, though.


Replying to your update: the reason you write out the invariants is then it becomes perfectly obvious in code review that the method only works for certain images and isn't protected against being called with unusable images, at which point the review can say "don't land this".

Truth in advertising. If this had been named "getHistogramOfSRGBBitmapElseOOBE" nobody would have stamped it because that's obviously dumb.


On my planet we write those invariants in code, such as

  CHECK(Color.red(pixel) < 256);
Or whatever.


You mean an assertion? I'm not sure how that would help. The code is still going to crash, and unless you tested for that specific case, it's not going to show up in testing either.


I actually looked into this when I saw it in the original tweet. I tried with a current Android 10 emulator, crashed SystemUI. Tried with the latest build of AOSP, was a-ok. Whatever the issue was, looks like it got resolved already in either the Android color library or the ImageProcessHelper for the wallpaper (the piece of code that was initially crashing with an out of bounds error). I haven't spent much time looking into exactly when/where it was fixed, though.

Quick edit to add that, despite this, I do not believe those changes have made their way into most devices. It seems the error stemmed from the possibility of returning a value over 255 when a histogram was calculated from the addition of color values. As stated in the article, this seemed to result from the use of the Skia color profile in particular. I do not know about other color profiles. The code mentioned by gruez was what I got when the emulator was crashing.


This could explain why it might not have shown up in testing.

It's not just that the image had a non-sRGB colorspace, it's also that the pixel values in this specific image were out of the expected range.


"Out of expected range" values should be the very first thing to test for...


And maybe it was; maybe the colour space aspect also was.

Testing combinations of edges is not so common/obvious.


Could you apply the bugged profile to any image and cause the crash?

GIMP sees it as a JFIF (apparently a predecessor to EXIF) with

"Google/Skia/E3CADAB7BD3DE5E3436874D2A9DEE126 Copyright: Google Inc. 2016"


Opening the picture in Windows Photo Viewer (the old one introduced in Windows 7) throws this error: "Windows Photo Viewer can't display this picture because there might not be enough memory available on your computer."

Other Windows included apps, like Paint, Paint3D and Photos open it just fine.


Paint is like the Notepad of Artwork apps. It can handle anything.

If notepad doesn't open it, nothing will.


Citing a comment I can't find anymore:

  For everyone who tried it and wants to recover phone without clean wipe. remove these files (I did it with twrp)

  /data/system/users/USER_ID/wallpaper_info.xml
  /data/system/users/USER_ID/wallpaper_orig
  /data/system/users/USER_ID/wallpaper

  it resets your wallpaper to default one


According to the linked theory from @evowizz, any photo with an unsupported colorspace would cause this issue. Seems like someone would have noticed before? Maybe in QA or at least during a beta instead of 9 months after launch?


What you see right now is the normal Google QA process.

They have literally billions of beta testers, who are even paying for the privilege.


>Seems like someone would have noticed before

Apparently not, and it's not too surprising either. The only way such a photo would get on is either through the on-device camera, or downloaded from the internet. The former seems most likely, considering that most picture sharing sites probably normalize/sanitize/recompress whatever their users upload. Are there any android phones that produce HDR (ie. 10 bit color, not the effect that's commonly available) pictures?



but they didn't notice


The lack of quality control in the Android SDK is pretty impressive. It is by far the worst SDK I've ever used. Hell, even a simple thing like hiding/showing a keyboard requires a ton of code and checks. In 2020.


It is out of control.

I actually think between the HAL layer, OEMs, chipsets and 3P ecosystem it is very difficult to dig out of the hole that Android has built. Too much was put into short term thinking on annual releases and gap closing with Apple. While Apple planned APIs years out for end to end products, Android had to blindly copy APIs and everything is far less elegant. Android isn't working. It isn't capturing the smartwatch ecosystem well (Fitbit) and it isn't capturing the TV ecosystem well (Roku).

The net impact is harmful to developers. We don't get the reach and compatibility we were promised. We have very fragmented ecosystem to develop for. We have specific device SDKs we have to use to tap into novel hardware. I can't complain enough.

Android had a huge opportunity to define the next evolution of compute platforms on top of the linux kernel and it has kind of blown it.

I think there are even signs Google leadership sees this and is investing in a portfolio of strategies such as CameraX, Pixel, Flutter/Fuchsia.

With the Huawei situation too, I think the general health of the Android ecosystem is some of the worst it has been. Globally I think we are going to see even more fragmentation.

We need something that ties this all together. It's not going to be a full blown OS layer, but something to manage the madness for 3P devs.


I find it so tragic that Android is worse than J2ME in regards to fragmentation and worse than Symbian/iOS/Windows/Maemo/Jolla in C++ support, and that at every Google IO we get the best practices rebooted.

Oboe and Vulkan are very good examples how clunky everything is.

Here we have two APIs that are supposed to be relevant for the platform (a real-time audio C++ framework and next-generation 3D API support), for which we get told to clone GitHub repos, compile everything from scratch and do the integration with the applications ourselves.

Comparing this with the iOS SDK and Xcode templates just feels like having to nuke it all.

I have been moving away from NDK to WebGL 2.0 / WebAssembly, as it is good enough for my interactive demos and Chrome team at least seems to be a bit more sane.


Exactly this. Imagine you are trying to build some kind of camera app that involves layering on effects or filters. You've got to have fine-grained control over all the camera sensors, the GPUs for rendering and compute, and potentially neural accelerators for some inferencing. How do you begin to do this on Android? Vulkan itself is hard enough. Now try adding the Android NN API, and Camera 1/2/X with its image processors. Now make it work across more than a single OEM hardware device. None of this is as simple as the iOS side, with a clean API that works together across GPU, NN and camera, in native code.


I've made some WebGL/WASM experiments, too - getting my UI and 3D to the browser. Any references for running this on Android smartphones? (I want to make an app, not a website)


You can pack it as PWA,

https://whatwebcando.today/

https://whatpwacando.today/

Or if you need more low level access to the host OS, as TWA,

https://developers.google.com/web/android/trusted-web-activi...

As sidenote, the TWA idea started originally at Microsoft, where packaged PWAs get access to Windows APIs without you having to write your own wrappers.

https://developer.microsoft.com/en-us/windows/pwa/

I expect the ongoing Edge Chrome efforts to eventually improve the TWA experience.


WebGL/WASM sounds like a good approach to solve fragmentation. Things will get even better once WebGPU is supported as a Vulkan substitute.


The irony of WebGPU, is that we will be able to get it via Chrome and frameworks like BabylonJS, while the Android team is not willing to provide something similar to Java/Kotlin devs.

Sceneform seems to be dead, and Filament is nothing more than a tech demo.


Some of this complexity is unavoidable. pmOS is based on the ordinary Linux stack, but even then it has a lot of trouble with supporting basic features like phone calls, etc. At least Android works as a daily driver, even in its basic AOSP version.


The way the NDK is managed, which took pressure from game devs to actually fix (see the GDC 2020 Google talks), has nothing to do with unavoidable complexity, especially given how other platforms do it.

Same applies to rebooting frameworks every year, or half baked support for Java.

The latest example being Jetpack Compose, naturally created as an answer to Flutter, while they are still "selling" the latest improvements in GUI tooling for Constraint and Motion Layout, which are going to be legacy when Jetpack Compose is finally released as stable.

Speaking of which, all stable releases have regressions. There is hardly a stable release that isn't followed up by regression complaints on Android developer channels.


+1 to all this. Android needs to take a deep breath that might need a couple of annual releases. Fix native code experience, fix the core APIs. Fix the things that are going to be around for as long as Android exists and fix them for good. The issue is the phone is just evolving too quickly. Requirements are aggressively being pushed from both the OEM side and 3P app side.


I think there are two kinds of complexity with Android. One is given from the nature of the ecosystem and the pace at which smartphone platforms have evolved. There is no fault of Android developers here. However, there is some responsibility to design the frameworks and HAL interfaces to gracefully expand into future requirements. It seems there are lots of "oh shit" APIs where they needed to be reworked. Camera.. just kidding! camera2.. just kidding CameraX!


Maybe we should treat Android API like an x86 processor (no one writes x86 assembly except compiler writers) and let the "compilers" like React Native reduce the mental strain.


There is no need to wrap the Android API in yet another Android API. This effort is already happening with AndroidX (Jetpack aka support library). The goal here is to reduce fragmentation and perform higher level functions easier. React Native solves a different thing (cross platform compatibility).

Jetpack may be a good temporary solution, but there is no reason why the core APIs, the ones that sit between the hardware and 3P apps, can't just be better.


I use mostly the NDK, and as incredible as it may sound, I occasionally do miss Symbian even with its clunky Symbian C++.

At least Symbian had a proper C++ API, not a bunch of C wrappers to Java and C++ code, expecting everyone to write their own wrappers and cloning half maintained github repos released by someone on the Android team.


> even a simple thing like hiding/showing a keyboard requires a ton of code and checks

To be fair, I have had the same experience while doing iOS development.


I think the Android SDK was made with something else in mind. I read that it was made for digital cameras; ironically, Android has a very bad camera API too. The concept of activity and fragment is just weird... I could say everything there is weird, but it's definitely not the worst SDK.


You're right. It was never designed as a mobile OS with a soft keyboard. Hell, it's 2020 and rotating the device still doesn't work as it should.


> rotating the device still doesn't work as it should.

What exactly is the issue?! Been using Android for about a decade now and can rotate the device just fine.


Yeah so it works, but from a developer perspective it's about 10x more complicated than iOS. It's also a source of bugs and crashes because of how Android manages the state change.


Rotating the phone restarts the app and it is the developer's job to restore it somehow.


For me, a lot of apps lose various state, e.g. text written in input fields, when rotating.


No wonder, rotating the phone restarts the app and it is the developer's job to restore it somehow.


I've heard there will finally be an api for keyboard state change.. not sure if that's added to the upcoming android version


Remember those iOS text handling bugs? Fun.


I wonder whether it would be a bad idea to have a "dev" mode where exceptions are thrown and a "production" mode of compilation where every single statement is wrapped in a try-catch block with a default behavior. For example:

    histogram = getHistogram();
should be wrapped automatically in:

    try { histogram = getHistogram(); }
    catch(Exception e) { console.log(e.toString()); histogram = DEFAULT_HISTOGRAM; }
Something like:

    y = a / b;
should be wrapped in:

    try { y = a/b; }
    catch(Exception e) { console.log(e.toString()); y = 1e999; }
Wrap every function call like this, and just don't let Java exceptions bring down an entire system in production especially when a lot of the time it's just a petty UI issue.

Obviously never do it in development though. But odd UI behavior instead of crashes might make a bunch more users happy which is ultimately what matters.
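
For what it's worth, the wrapping described above can be sketched by hand in plain Java. This is only an illustration of the idea, not a compiler mode: `FallbackSketch` and `orDefault` are hypothetical names, and the sketch logs the exception so the failure is not silently swallowed.

```java
import java.util.function.Supplier;

final class FallbackSketch {
    /** Runs the action; on any RuntimeException, logs it and returns the fallback. */
    static <T> T orDefault(Supplier<T> action, T fallback) {
        try {
            return action.get();
        } catch (RuntimeException e) {
            System.err.println("recovered from: " + e);
            return fallback;
        }
    }

    public static void main(String[] args) {
        int[] defaultHistogram = new int[256];
        int[] histogram = orDefault(() -> {
            int[] h = new int[256];
            h[298]++; // simulates an out-of-bounds histogram index
            return h;
        }, defaultHistogram);
        // True when the fallback was used instead of crashing.
        System.out.println(histogram == defaultHistogram);
    }
}
```

The catch is deliberately limited to `RuntimeException`; whether a default value is actually safe to continue with still has to be decided per call site.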


Ah, but your catch block doesn't look for exceptions in console, log, or toString().


It seems that all you'd need is a single try-catch around the high-level operation. I inherited a system where all the errors were swallowed and it was a nightmare. If something failed it was impossible to tell why.


Yeah, I agree that the developer shouldn't do that in the source code. I'm more thinking of a compilation mode where the system tries to consistently do something just slightly smarter than give up when there is an exception, and that is strictly only for production use.

A divide by zero error on a button's width shouldn't cause a spacecraft to give up and abort its mission to the moon. That wouldn't happen, but it's just a figurative description of what I would like. It's ridiculous for an OS to go into a boot loop because of a color histogram. A better behavior would be for the background to be set to an array of NaNs, and the background renderer to fail gracefully to a plain black or white background.


Floating point NaN is pretty close to what you're describing for floats only.

Really we need some kind of NaN value for all java objects.
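
A small Java sketch of the contrast: floating-point division quietly produces NaN or infinity, while integer division by zero throws. Using `Optional` as a rough "NaN for objects" is my own assumption about what such a value could look like, and `NanSketch`, `ratio` and `safeDivide` are illustrative names.

```java
import java.util.Optional;

final class NanSketch {
    /** Floating-point division never throws: 0.0 / 0.0 yields NaN, which then propagates. */
    static double ratio(double a, double b) {
        return a / b;
    }

    /** Optional as a rough "NaN for objects": empty instead of an ArithmeticException. */
    static Optional<Integer> safeDivide(int a, int b) {
        return b == 0 ? Optional.empty() : Optional.of(a / b);
    }

    public static void main(String[] args) {
        double width = ratio(0.0, 0.0);
        // NaN survives further arithmetic, so the check can happen at the very end.
        System.out.println(Double.isNaN(width + 10.0));
        // The caller chooses the default explicitly at the point of use.
        System.out.println(safeDivide(1, 0).orElse(0));
    }
}
```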


Another article, which includes the wallpaper in question and the reason why it crashes the phone.

https://techxplore.com/news/2020-06-wallpaper-image-android....


One more argument for why we need more (and more) fuzzing on a regular basis.

Basically everything where we can pass some specially encoded data (read: a format or protocol) can have bugs and edge cases that are not covered by tests or caught by inspection (how to say it properly in English? :)

So yeah, fuzzing.
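
As a toy illustration of the point, here is a seeded random-input loop in Java against a histogram routine shaped like the one discussed in this thread. It is a sketch under assumptions: `record` is a stand-in for the real code, the input range is chosen by me to include out-of-range channel values, and a real fuzzer would mutate actual image files instead.

```java
import java.util.Random;

final class HistogramFuzzSketch {
    /** Code under test: sums weighted channels and uses the result as an index. */
    static void record(int[] histogram, float r, float g, float b) {
        int y = (int) (.2126f * r) + (int) (.7152f * g) + (int) (.0722f * b);
        histogram[y]++; // throws if y is outside [0, 255]
    }

    /** Feeds random channel values, including out-of-range ones, and counts crashes. */
    static int fuzz(long seed, int iterations) {
        Random random = new Random(seed);
        int crashes = 0;
        for (int i = 0; i < iterations; i++) {
            // Deliberately sample beyond [0, 255] to mimic wide-gamut or malformed input.
            float r = random.nextFloat() * 512f - 128f;
            float g = random.nextFloat() * 512f - 128f;
            float b = random.nextFloat() * 512f - 128f;
            try {
                record(new int[256], r, g, b);
            } catch (ArrayIndexOutOfBoundsException e) {
                crashes++;
            }
        }
        return crashes;
    }

    public static void main(String[] args) {
        System.out.println("crashing inputs found: " + fuzz(42L, 10_000));
    }
}
```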


Tangentially related, but it might interest fellow readers: there are multiple ways to make a valid image crash a device, one example being a PNG bomb [1]. This is similar to a zip bomb: when opened, the file will be expanded to orders of magnitude more bytes than its compressed size, which might throw an OOM error or fill up the disk quickly.

[1] http://www.aerasec.de/security/advisories/decompression-bomb...
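
One common mitigation is to cap how much data a decoder is allowed to produce. PNG image data is deflate-compressed, so the idea can be sketched in Java with `java.util.zip` on raw bytes. This is not a PNG decoder, and `BoundedInflateSketch`, `deflate` and `inflateWithLimit` are names invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class BoundedInflateSketch {
    /** Compresses raw bytes with deflate (used here only to build a test "bomb"). */
    static byte[] deflate(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream z = new DeflaterOutputStream(out)) {
            z.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Inflates the input but refuses to produce more than maxBytes of output. */
    static byte[] inflateWithLimit(byte[] compressed, int maxBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                if (out.size() + n > maxBytes) {
                    throw new IllegalStateException(
                        "decompressed data exceeds " + maxBytes + " bytes");
                }
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 10 MB of zeros compresses to a tiny fraction of its original size.
        byte[] bomb = deflate(new byte[10_000_000]);
        System.out.println("compressed size: " + bomb.length);
        try {
            inflateWithLimit(bomb, 1_000_000);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```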


Admittedly I haven't been following this story, but now that I see it on HN, am genuinely curious whether they were able to rule out that hardware is involved?

Also curious to know if perhaps there are flaws in the image format's spec that are being exploited?


Oh wow, I set this on my Pixel 3 and started immediately flashing on and off. It reset and allowed me to do a factory reset, and now it works. That’s crazy, don’t do this.

To be clear, this was on my old phone not my main phone, so I didn’t mind testing it.


The article title is "Android: Why this photo is bricking some phones", which more accurately reflects the experience for the overwhelming majority of affected users. Would be good to update that.


One of the biggest and most high profile software companies in the world can't reliably read image files on their flagship phone.

People are scared of AGI. This sort of shit scares me more.


TL;DR: color profile in the metadata could not be interpreted by some phones. When you set that picture as your background it can cause your home screen to hang.

Samsung fixed this with an extremely early June security update. I think it took less than a week.


I wonder if Samsung will release this security update for phones released 3 or 4 years ago...


This was not a real security issue, but in general they update even old flagship devices for critical security issues.

IIRC, last time they had a really bad bug they updated some 4-5 year old Notes.


I'm looking at this title and totally not willing to click on it... they don't include the actual photo I would hope.


The tweet that contains the image is included. You must set the photo as your phone's background for it to have any effect, so it is safe to click.


They do, although they say the bug only happens if you set it as wallpaper.

I wouldn't click from a phone just in case, though.


I clicked it from one of the affected phones and nothing happened. Saved it, looked at it from the gallery... Nothing.

Seems like the crash is in some wallpaper-related code: https://news.ycombinator.com/item?id=23404772


FYI, it is not bricking phones, but that doesn't make a good headline.

It is making some of them unusable, requiring a hard reset.


> It is making some of them unusable, requiring a hard reset.

Well that's still a "soft brick" right?


Fun fact, there are also images that can cause some human beings to crash. Namely those that induce gamma oscillations such as black and white bar patterns.

It would be interesting to train a generative NN on human EEG response to create images that produce novel or unusual effects on the viewer.


There is McCollough effect, but it requires a long exposure to “crash” one’s vision. Are there some examples of these patterns? (Beyond epilepsy-inducing blinking, ofc)



Is that Squaretop Mountain in the Wind River Range?


a photo I took on iphone x11 pro crashes my android when trying to set it as a background, wonder if this is related.


Article says crash, not brick. Brick is not recoverable.


We've changed the title to sidestep this. The word 'brick' is up there with 'doxx' on the list of curiously provocative words that people will strongly protest if you use them wrong, yet aren't exactly well defined. Not as high on the list as 'doxx' though. 'Troll' is on that list too.


It's described as crashing in a loop, even on restart. For people who don't know how to reflash/wipe from recovery, it renders their phone useless, so it's as good as bricked.

I'm curious to see the post-mortem on this one. I'd guess a bug in the libjpeg used in Android, and a malformation in the encoding of the picture?


From my understanding, if it happens once, it's a crash, if it keeps rebooting it's a brick. If it can be recovered it's a soft-brick, otherwise it's a hard-brick.


The thing is, there are always more levels of bricked. Usually people refer to bricked as being unrecoverable through the USB flashing methods, but there is usually a way to open up the device and flash the NAND chip back to a working state. So it's bricked from the perspective of a user, but not for someone with electronics experience.


Another commenter agrees with you, writing:

> it soft-bricked their phones (ie. they had to factory reset their device)

But the whole point of coming up with the word "brick" was to describe the set of scenarios where even things like factory reset won't work. This is "literally" all over again.


But even with a hard brick (where it's "literally a brick"), you can sometimes bring it back to life by opening the device and swapping a component or soldering something.

I like soft vs hard brick, because it connects to software and hardware. With a soft brick, the device is basically a brick until you reset the software. With a hard brick, it requires potentially a hardware fix.


Some users report that it soft-bricked their phones (i.e. they had to factory reset their device). Others were able to recover it by either deleting the wallpaper from TWRP, or pressing a sequence of keys to take a picture and set that as the wallpaper (apparently for them only the display was broken, everything else worked).


But the article also says if you set it as your wallpaper, you cannot get past the lock screen, which I am guessing is because it crashes again loading or rendering the wallpaper. So maybe you could boot with a key combination and reset the phone, but it's pretty close to bricking.


Soft bricks can be recovered from.


Neither the article nor the headline say soft brick. Besides, "soft brick" is just someone not wanting to admit that brick has an understood definition but they like the word brick and want to use it.


Does brick have an understood definition though? I think this is the millionth time I see this exact discussion about a news story using the term differently from the one true definition. That doesn't seem very understood to me.


Well, yeah. "Brick", as in a paperweight, as in it's dead for good, forever. It's in the word.


It's so rare these days I really don't understand why people fuss over the definition.

It's almost impossible to brick anything with software, and even in hardware unless you damage the PCB itself you can usually recover if you tried hard enough (e.g. Even if you fry a $1500 FPGA you could technically reflow a new one and pray the bitstream is OK if you need it fixed now - although I'd rather you than me)


Technically you could also lay out new components and copper trace on an actual brick, too.

I've always understood it to mean 'broken beyond software or high-level firmware repair'. i.e. if not actually a blown component, something needs to be reflashed which has an inaccessible JTAG header or something, rather than something 'higher level' like recovery images over ADB & USB.


It is in the Merriam-Webster dictionary:

https://www.merriam-webster.com/dictionary/brick

Under the 'verb' usage:

2 : to render (an electronic device, such as a smartphone) nonfunctional (as by accidental damage, malicious hacking, or software changes) // … those who dared hack the phone to add features … risked having it "bricked"—completely and permanently disabled—on the next automatic update …


discussions must be had. it's one of those sesame street stories anyway; they want to use tech lingo with the kids.


I hadn't heard the term "soft brick" before, and if you Google it all the top results are about this.


"soft brick" is commonly used on XDA.


Is it a term that's usually only applied to androids?


'brick' in general is a sliding scale based on how many device debug/recovery tools you have access to and how much device disassembly you're willing to tolerate.


No, iOS devices can also soft-brick, it's just a general term. Most often people encounter the term in relation to hacking/rooting hardware, like a Nintendo DS/Switch for instance. Hitting up Wikipedia or Trends can help you dig through its usage considering how Google's search algo heavily biases current events wrt relevance.


Not really, it's used for example with OpenWRT too:

site:openwrt.org "soft-brick" - 213 results


Bricking? I would say that a wipe can fix it.


Crashing, not bricking. Can we update the title to remove the editorialization?


I have no idea why you are being down voted


My guess: because there is no editorialization here, since the BBC title also says “bricking”. The title must be editorialized if you want to change it. Again, my guess. I used my upvote!


I've experienced this when suggesting headline edits: get several upvotes by others who agree, dang agrees and changes the title (and folds the thread away, I think to prevent the following...) and then within a few minutes people plink away with the downvotes until it's negative. On one hand it seems like people are downvoting it for being irrelevant... but dang's response seems like it should provide the context to avoid that. Baffling.


Yes, we've de-baited it now.


Thanks!



