The discussion on race conditions at the end is an important one, and IMO the bugfix is a bandage at best: the notion of anything accessing the “current” object after any kind of delay, especially in an event handler, when there is any chance the thing is not a singleton, is a recipe for disaster. In this case, dismissing the “current” security code screen was a supported API surface and that should set off all the red flags.
Of course it’s annoying to have to track the identity of “our screen” and bind that identity to event handlers, or make it accessible with context etc. But it’s necessary in anything remotely security-adjacent.
(And never assume anything is a singleton unless you add a breadcrumb comment for someone who might change that assumption on a different team!)
Agreed. The fixed logic, at least judging by the commit message, still feels very shaky on correctness grounds ("if we are dismissing something that doesn't seem to be right, ignore it").
Since they're rewriting code and changing method signatures anyway, I would prefer they got rid of the notion of a "currently visible screen" and made sure that every dismiss() call carries a unique pointer or token identifying exactly what is being dismissed. If this were my codebase, their approach would give me all sorts of bad vibes about additional problems lurking deeper.
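To make that concrete, here's a minimal Kotlin sketch of the token idea (all names here are hypothetical, not AOSP APIs): dismissal only succeeds when the caller presents the token that was issued for the screen actually on display, so a stale reference held by a delayed event handler becomes a harmless no-op.

```kotlin
// Each shown screen gets its own token; identity comparison (===) is all
// the host needs to tell instances apart.
class DismissToken

class SecurityScreenHost {
    private var current: Pair<DismissToken, String>? = null

    /** Show a screen and hand back a token naming exactly this instance. */
    fun show(screenName: String): DismissToken {
        val token = DismissToken()
        current = token to screenName
        return token
    }

    /** Dismiss only if the token matches what is actually on screen. */
    fun dismiss(token: DismissToken): Boolean {
        val (activeToken, _) = current ?: return false
        if (activeToken !== token) return false // stale token: no-op
        current = null
        return true
    }
}

fun main() {
    val host = SecurityScreenHost()
    val simToken = host.show("SIM PIN")
    host.show("Device PIN")         // SIM screen replaced by a new screen
    println(host.dismiss(simToken)) // false: the stale token is ignored
}
```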
The whole process and the nature of the fix doesn't inspire a lot of confidence in the security of Pixel/Android in general.
So you'd go out and refactor a major security-sensitive component (which most likely predates your career) in the span of a single month, to hit an emergency security patch deadline?
That doesn't inspire a lot of confidence in your risk assessment and decision making.
I'd do what Google did: roll out a patch that addresses the immediate danger, and then backlog proper refactors over time.
Their fix included a similarly large refactor, they just used the "security screen type" as a newly introduced parameter instead of something unique to the screen instance.
I do agree that in the real world, sometimes you have to settle for a less-than-ideal solution. I hope my post reads less like "those people are idiots", which was not my intent, but more like: this specific fix isn't ideal, and knowing this type of code is live in a device doesn't fill me with confidence, even if I can understand reasons for why it was done that way.
Right? This was absolutely the "right" level of refactor for a hotfix, as the full refactor would introduce much more state management that could itself introduce bugs. And especially if behind the scenes there was a detailed audit of what things can currently access the current security screen, it would be fine for now.
But I sincerely hope that in the postmortem, there would be a larger meta-discussion around code review practices and how something like this "global dismiss" became part of the API surface to begin with, and a sincere prioritization of a larger review within the backlog. Though with everyone on edge at this time in big tech, I doubt that ends up happening :(
Their change is hardly a big refactor:

> 12 changed files with 102 additions and 26 deletions. [1]

That includes all the new code, all the parameter changes everywhere the function is used, and two additional test cases. It's a tiny change.
I don't think that is as much of an issue as the ridiculous process he had to go through.
Think about that first security researcher. You literally found a screen unlock bypass (should be Priority #1, right?) - and Google just went and put fixing it on the back burner.
If they will put something like that on the back burner, what else are they ignoring? It isn't confidence-inspiring.
Edit: Also, knowing Google, what are the odds of your full refactor ever happening? "Temporary" fixes become permanent fixes quickly.
Maybe it was an already well-known exploit. After all, this was a duplicate, and Google was sitting on it. Two people found it and reported it to Google. Why couldn't a third have found it and sold it instead?
> The fixed logic, at least judging by the commit message, still feels very shaky on correctness grounds
This was my experience as a dev on a team at Google for a few years. I saw a LOT of exceedingly lax treatment of correctness in the face of concurrency. More than once I saw the decision made to guess at a fix for a concurrency bug and ship it with a "well, looks good to me, let's see if it does anything."
It's par for the course, and folks get (got? =P) paid handsomely for doing it.
They already have multiple security screens, and a demonstrated critical bug with security screen confusion. Not sure how this is premature optimisation.
> the number of screens is small and there are few tiers (only 2)
Making this kind of assumption, when there are no such guards in the system itself, is exactly what leads to security issues.
If the system enforced two named singletons as security screens, so it was impossible to .dismiss() the wrong thing, then sure. But that's not how the system is built, and assuming that "the number of screens is small" and "there are only 2 tiers" without enforcing that assumption in code is pretty much how the original bug was introduced.
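Here's a hypothetical Kotlin sketch (mine, not AOSP code) of what "enforce the assumption in code" could look like: a sealed hierarchy turns "there are exactly two tiers" into a fact the compiler checks rather than a comment someone can silently invalidate.

```kotlin
// Exactly two tiers exist; adding a third requires touching this file.
sealed class SecurityTier {
    object SimLock : SecurityTier()     // PIN/PUK for the SIM
    object DeviceLock : SecurityTier()  // PIN/pattern/password for the device
}

fun onDismissed(tier: SecurityTier): Unit = when (tier) {
    SecurityTier.SimLock -> println("SIM usable; device stays locked")
    SecurityTier.DeviceLock -> println("full unlock")
    // No else branch: adding a third tier refuses to compile until every
    // dismiss path explicitly handles it.
}
```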
Since they are dismissing the Lock Screen _type_ (SIM, PUK, Pin) and not the instance, a logical example of where this might go wrong is dual SIM. Then again, worst case you dismiss the incorrect SIM Lock Screen; that won't give you a full unlock, and the 'wrong' SIM still won't work.
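A toy model of that ambiguity (illustrative names only, nothing from AOSP): with two SIM PIN screens on the stack, a type-keyed dismiss cannot say which slot it means.

```kotlin
// Both SIM PIN screens share the same type, so a check keyed on the type
// cannot tell the slots apart.
enum class SecurityMode { SimPin, SimPuk, Pin }

data class SecurityScreen(val type: SecurityMode, val simSlot: Int? = null)

val stack = ArrayDeque<SecurityScreen>()

fun dismissByType(expected: SecurityMode) {
    val top = stack.firstOrNull() ?: return
    if (top.type == expected) stack.removeFirst() // matches either slot!
}

fun main() {
    stack.addFirst(SecurityScreen(SecurityMode.SimPin, simSlot = 0))
    stack.addFirst(SecurityScreen(SecurityMode.SimPin, simSlot = 1))
    // A delayed callback meant for slot 0 dismisses slot 1's screen instead:
    dismissByType(SecurityMode.SimPin)
    println(stack.first()) // SecurityScreen(type=SimPin, simSlot=0)
}
```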
Yeah, an attacker may be able to use their own dual-SIM Pixel phone to bypass the SIM lock screen for a stolen SIM card whose PIN or PUK code they don't know, using a similar technique. But like you said, I'm almost certain it wouldn't actually let them send and receive texts (and if it did, that would really be an issue in the SIM card's OS, considering anyone could just modify AOSP to ignore the SIM lock screen and put that on their own phone).
Even still, being able to bypass the SIM lock screen would be a bug, just not a vulnerability. Google doesn't pay bounties for non-security bugs to the best of my knowledge, but I can't help but feel this is still not an ideal way to design the system. It is likely fine today, but as strix_varius said, these kinds of assumptions are what led to this vulnerability in the first place. Vulnerabilities do pop up from time to time in otherwise well-designed systems, but this lock screen bypass never would have been present had better design practices been followed. As krajzeg said [1]: "The whole process and the nature of the fix doesn't inspire a lot of confidence in the security of Pixel/Android in general."
I would indeed expect something more robust like a security state machine where not all states can transition to any other state freely. The UI shouldn't even have a say in what transitions are allowed.
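Something like this minimal sketch (my own shape, not Android's): the legal transitions live in one table, and the UI can only request a transition, never force one.

```kotlin
enum class LockState { Locked, SimVerified, Unlocked }

// The single source of truth for which transitions are legal.
private val allowed: Map<LockState, Set<LockState>> = mapOf(
    LockState.Locked to setOf(LockState.SimVerified, LockState.Unlocked),
    LockState.SimVerified to setOf(LockState.Unlocked),
    LockState.Unlocked to setOf(LockState.Locked), // relock only
)

class LockStateMachine {
    var state: LockState = LockState.Locked
        private set

    /** Returns false (and changes nothing) for an illegal transition. */
    fun request(target: LockState): Boolean {
        if (target !in allowed.getValue(state)) return false
        state = target
        return true
    }
}
```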
The Rx model works nicely. Rather than trying to model state transitions, you model "scopes of work." So if you have something like an event listener, you would tie it to the active unlock session. When that session ends, you dispose of that "scope of work" and all work associated with it would immediately stop.
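Roughly, with RxJava 3 (a sketch of the pattern, not Android's actual code):

```kotlin
import io.reactivex.rxjava3.disposables.CompositeDisposable
import io.reactivex.rxjava3.subjects.PublishSubject

// Every listener is registered against the session's CompositeDisposable,
// so ending the session stops all of them at once.
class UnlockSession {
    private val work = CompositeDisposable()
    private val events = PublishSubject.create<String>()

    /** Listeners live only as long as this session. */
    fun onEvent(handler: (String) -> Unit) {
        work.add(events.subscribe(handler))
    }

    fun emit(event: String) = events.onNext(event)

    /** Every subscription above is disposed together, so a stale handler
     *  can never fire against a newer session's screen. */
    fun end() = work.dispose()
}
```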
No, not exactly, but Android is old and gnarly enough that a lot of components don't have a clear Model/View separation you'd expect in modern codebases.
Well, the UI element's dismissal is what signals the system that holds the unlock state. And the problem was that multiple other systems would automate the dismissal of the current UI element… without checking whether it was the UI element they expected to be there!
Google invests a great amount of money in their Project Zero team, but does anyone know if they have a specific red team dedicated to Android?
As a former Google Information Security Engineer I can say: I don't know, because the Android security organization is completely separate from Google's security organization. It's something I always found odd, as it created some redundancy between the two organizations, a lack of homogeneous processes, etc.