Within an hour of V8 pushing the fix for this, our build automation alerted me that it had picked up the patch and built a new release of the Workers Runtime for us. I clicked a button to start rolling it out. After quick one-click approvals from EM and SRE, the release went to canary. After running there for a short time to verify no problems, I clicked to roll it out world-wide, which is in progress now. It will be everywhere within a couple hours. Rolling out an update like this causes no visible impact to customers.
In comparison, when a zero-day is dropped in a VM implementation used by a cloud service, it generally takes much longer to roll out a fix, and often requires rebooting all customer VMs which can be disruptive.
> Their use is in some sense way worse than a web browser, where you would have to wait for someone to come to your likely-niche page rather than just push the attack to get run everywhere.
I may be biased but I think this is debatable. If you want to target a specific victim, it's much easier to get that person to click a link than it is to randomly land on the same machine as them in a cloud service.
Great workflow! I long for the day when I can work for a company that actually has automation as efficient as this.
A few questions: do you have a way of differentiating critical patches like this one? If so, does that trigger an alert for the on-call person? Or do you still wait until working hours before such a change is pushed?
I'm just so tired of the whole microservices and prima donna developer bullshit.
* They set an easy measurement that doesn't match customer experience, so they say they're in-SLO when common sense suggests otherwise.
* They require customers to jump through hoops to get a credit after a major incident.
* The credits are often partial and/or tiered by reliability, so serving some errors may trigger only a fractional credit rather than a full one. At the very most, they give the customer a free month. It's not as if they make the customer whole on their lost revenue.
With a standard industry SLA, you can have a profitable business claiming uptime you never ever achieve.
We start a new instance of the server, warm it up (pre-load popular Workers), then move all new requests over to the new instance, while allowing the old instance to complete any requests that are in-flight.
Fewer moving parts make it really easy to push an update at any time. :)
For 1), write down the entire manual workflow. Start automating pieces that are easy to automate, even if someone has to run the automation manually. Continue to automate the in-between/manual pieces. For this you can use autonomation to fall back to manual work if complete automation is too difficult/risky.
For 2), look at your system's design. See where the design/tools/implementation/etc limit the ability to easily automate. To replace a given workflow section, you can a) replace some part of your system with a functionally-equivalent but easier to automate solution, or b) embed some new functionality/logic into that section of the system that extends and slightly abstracts the functionality, so that you can later easily replace the old system with a simpler one.
To get extra time/resources to spend on the automation, you can do a cost-benefit analysis. Record the manual processes' impact for a month, and compare this to an automated solution scaled out to 12-36 months (and the cost to automate it). Also include "costs" like time to market for deliverables and quality improvements. Business people really like charts, graphs, and cost saving estimates.
However... for this comment I then wanted to see how long ago that patch did drop, and it turns out "a week ago" :/... and the real issue is that neither Chrome nor Edge have merged the patch?!
> Agarwal said he responsibly reported the V8 security issue to the Chromium team, which patched the bug in the V8 code last week; however, the patch has not yet been integrated into official releases of downstream Chromium-based browsers such as Chrome, Edge, and others.
So uhh... damn ;P.
> I may be biased but I think this is debatable. If you want to target a specific victim, ...
FWIW, I had meant Cloudflare as the victim, not one of Cloudflare's users: I can push code to Cloudflare's servers and directly run it, but I can't do the same thing to a user (as they have to click a link). I appreciate your point, though (though I would also then want to look at "number of people I can quickly affect"). (I am curious about this because I want to better understand the mitigations in place by a service such as Cloudflare, as I am interested in the security ramifications of doing similar v8 work in distributed systems.)
This does not appear to be true. AFAICT the first patch was merged today:
(It was then rapidly cherry-picked into release branches, after which our automation picked it up.)
> I am curious about this because I want to better understand the mitigations in place by a service such as Cloudflare, as I am interested in the security ramifications of doing similar v8 work in distributed systems.
Here's a blog post with some more details about our security model and defenses-in-depth: https://blog.cloudflare.com/mitigating-spectre-and-other-sec...
BTW: if there is any hope you can help put me in touch with people at Cloudflare who work on the Ethereum Gateway, I would be super grateful (I wanted to use it a lot--as I had an "all in on Cloudflare" strategy to help circumvent censorship--but then ran into a lot of issues and am not at all sure how to file them... a new one just cropped up yesterday, wherein it is incorrectly parsing JSON-RPC id fields). On the off chance you are interested in helping me with such a contact (and I understand if you aren't; no need to even respond or apologize ;P): I am email@example.com and I am in charge of technology for Orchid.
HN post on Orchid Protocol for curious: https://news.ycombinator.com/item?id=15576457
This would require advance knowledge of the vulnerability, plus someone either within Cloudflare or at one of its code dependencies planting malicious code. Since a rolling upgrade seems to be fully automated at Cloudflare and can be done within a few hours across the complete infrastructure, I don't see CF being at high risk here.
Hope there is also some scanning for this put in place.
I'm curious how that works in practice. Specifically, the docs say (https://developers.cloudflare.com/workers/learning/how-worke...)
> Each isolate's memory is completely isolated, so each piece of code is protected from other untrusted or user-written code on the runtime.
But they don't quite specify if it's isolated at system level (separate threads with unshared memory) or something simpler ("you can't use native code, so v8 isolates your objects").
And here's a talk I gave about how Workers works more generally (not security-focused): https://www.infoq.com/presentations/cloudflare-v8/
So many juicy details and nuanced takes that really make me appreciate the thought and care CF has put into securing workers.
There are a few ad networks which allow you to run JS. They may not work for this specific exploit, though.
1) This is an RCE, so it achieves code execution in the browser, i.e. it can run arbitrary code from the attacker; it's literally like running a compiled program inside the target's browser. This doesn't bypass Chrome's sandbox, so a lot of OS operations are not reachable from within the browser (for example, a lot of syscalls can't be called).
This is the first step in an exploit chain; the second one would be a sandbox escape to expand the attack surface and do nasty stuff, or maybe a kernel exploit to achieve even higher privileges (such as root).
2) WASM being RWX is really not a security flaw. W^X protections in modern times (even in JIT-like memory mappings) make sense only when coupled with a strong CFI (control flow integrity) model, which basically tries to mitigate code-reuse attacks (such as JOP or ROP). Without this kind of protection, which needs to be a system-wide effort as iOS has shown, W^X makes literally zero sense, since any attacker can easily bypass it by executing an incredibly small JOP/ROP chain that changes memory protections in the target area to run the shellcode, or can even mmap RWX memory directly.
This is to say that with enough time, a sufficiently sophisticated and motivated actor can always find 0-days and achieve their goals.
The related article was discussed recently: https://news.ycombinator.com/item?id=26590862
How do you even go about getting rid of WRX in a JIT? Do you generate and then remove W?
In the Wasm engine inside of V8, the code is writable and executable at the same time because the JIT uses multiple concurrent threads in the background during execution and incrementally commits new JITed code as it is finished. (And, funnily enough, this performance optimization is mostly for asm.js code, which is verified and internally translated to Wasm, to be compiled and executed by the Wasm engine).
The long-term holy grail is to move the JIT compiler entirely to another process, so only that process has write permissions, and the renderer process (where Wasm and JS execute) has only R and execute permissions, and they used shared memory underneath.
Remove ArrayBuffers, remove Blob support, remove anything which can be used to assemble a contiguous binary without passing some sanitization.
2. Can be sanitised to be valid UTF-16
3. Can be intentionally mangled in memory to prevent abuse
JS strings are not UTF-16, they are 16-bit chunks of (potentially) nonsense, and enforcing valid UTF-16 would break quite a few existing uses. For example, anything that stores encrypted data in a string. Which "shouldn't" be done, that should be a Uint8Array, but existing APIs basically force you to do it. And there's such a thing as backwards compatibility.
Your 3rd point is much more feasible. I doubt any "real" mangling would be good enough from a performance standpoint while still being too difficult for attackers to use. But I could imagine eg breaking any invalid UTF-16/UTF-8 string up into separate rope nodes, maybe even ensuring the nodes don't get allocated too close to each other and/or injecting disrupting hardcoded bytes in between them. (I work on SpiderMonkey, the Firefox JS engine, and we do at least make sure to allocate string data in a separate part of the heap from everything else.)
People who hack JS to store arbitrary data in strings are already fighting a losing battle, and I see no point in helping them.
But my point is that we have moved from JS as a scripting language that did not allow arbitrary binary data to one that does, without much thought about it.
Half of the existing problems with zero-click, zero-day, and drive-by exploits running in the wild, and with Chrome becoming ActiveX 2.0, stem from that.
I'm not saying to axe it from JS, but the browser use case could be limited to a restricted subset of the JS standard.
But yeah, you could try to mangle it and represent them internally as e.g. a rope, but that's not a bulletproof solution.
2. Won't help.
3. Simply reverse, or make use of, the mangling.
I can check browser memory with a profiler and see if this page is marked as executable. It would not be.
See also https://news.ycombinator.com/item?id=26737803
At first glance, the core bug is in abusing an array enough to get an unsigned int into a function that expects them all to be signed, causing an off-by-one error, then leveraging that into an information leak (to get a pointer to a FixedArray of floats and a pointer to a FixedArray of objects) and replacing one with the other to create a type confusion and read/write arbitrary memory through it. r4j will probably correct me on the subtlety here though!
Source: extremely similar to HackTheBox RopeTwo, which I spent more time than I am prepared to admit solving.
Disclaimer: am noob at v8 exploitation, but have done enough of it to know some of the tricks.
See also, an article that helped me previously, much of the code is similar (eg: the WASM stuff, the addrof() and fakeobj() methods): https://faraz.faith/2019-12-13-starctf-oob-v8-indepth/
It's not a neat trick, but a grave problem with the WASM model.
WASM memory (in)security will be a big problem until all the memory-security mitigations from native code are migrated to the WASM world, and at that point there won't be much use for WASM anymore.
JS promoters want so badly for JS to supplant other major languages, without noticing that they are ignoring the decades-long path those languages took on robustness and security.
My email is in my profile.
AW SNAP: Access violation on both.
I then straced the tab processes on Debian (guessing as best I could which Chrome child processes were tabs from their activity in top as I loaded pages) and ran the exploit. Nothing seemed unusual; all the system calls just stopped after the exploit ran, and the tab crashed. I guess it would be good to attach gdb to the actual JS VM process and stepi through the instructions, but I don't know how to find the process that runs the JS VM for a given Chrome tab.
I forked the repo so I could easily run it from a GitHub pages site. I don't know what it does but I don't think it does anything:
So probably the standard PoC shellcode.
The program is shellcode, so the first thing it does is set itself up with the state it needs. This includes clearing the direction flag, aligning the stack to 16 bytes, and figuring out where it is in memory (the call to 0xca both sticks an address on the stack and avoids some code described below). It then calls into a little subroutine (at 0xa) to read through the PEB → LoaderData → InMemoryOrderModuleList. The first entry is the entry for the main binary, from which it grabs the "FullDllName", does some sort of trivial hash on it, and compares it to a known value (presumably a check to see that it's running in a sane environment?) Then it largely skips the next entry (ntdll.dll) and goes to kernel32.dll, where it looks into the exports table to find what is presumably CreateProcessA (this is done using a similar hashing scheme, so the string is not directly present in the shellcode). There's a "calc.exe" string at the end of the code (perhaps you can spot it) and with that it has enough to pop a calc.
OTOH, -e, depending on your shell, is either necessary (e.g. bash), unnecessary but harmless (e.g. zsh), or insufficient (e.g. dash). If you want to print binary stuff portably, you need to use printf(1).
(Shameless plug: you can use https://github.com/jwilk/printfify to generate printf commands for binary files.)
In this case, base64 encoding is much more efficient:
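For illustration, a hedged sketch of both approaches (the bytes and filenames here are arbitrary examples):

```shell
# Portable: printf(1) with octal escapes emits exact bytes in any POSIX shell.
printf '\110\151\012' > out.bin       # writes the three bytes "H", "i", "\n"

# More compact for larger payloads: ship base64 text and decode it locally.
echo 'SGkK' | base64 -d > out2.bin    # decodes to the same three bytes

cmp out.bin out2.bin && echo identical
```

(`base64 -d` is the GNU coreutils spelling; BSD/macOS may need `-D` or `--decode`.)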
- PicoCTF 2021: Kit Engine, Download Horsepower and Turboflan
- Hack The Box (HTBs): RopeTwo and Modern Typer
- DownUnderCTF 2020: Is This Pwn or Web
- LineCTF: BabyChrome
More info for other browsers and platforms:
window.AudioContext = undefined
It's probably tolerable to turn it off for now unless you know you need it.
Using WASM is the method du jour for turning an arbitrary read/write/object-leak primitive into RCE. Without WASM, the exploit writer would have to use ROP/etc., but the primitives are still there.
Check out this discussion by a bunch of the big names in the field, including Niklas B who was on the team that originally found this bug: https://twitter.com/spoofyroot/status/1379911787282243584
This way you get some control over the Chrome browser back. Not-so-fun fact: Google is very much against users having the ability to execute arbitrary user-defined code; AFAIR gorhill had a problem with Google concerning injectable scriptlets.
(1) Of course there are issues; you can't inject into <iframe src="data: (https://github.com/whatwg/html/issues/1753) in Chrome :( so you would have to manipulate CSP to disable those.
Thanks for the info about 'delete WebAssembly'
Just 2 days after I was heavily downvoted for saying that you're not particularly safe if you don't disable JS.
This absolutely made my day. HN truly is a source of nonstop entertainment.
This is quantifiable, not "alarmist".
What I do, while using OpenBSD, is limit which users access which apps, dividing my activities by user according to risk level, which apps a given user runs, and which sites that user browses (some user accounts I use regularly don't run a browser at all). Also, OpenBSD has pledge and unveil built in: kernel API calls placed inside apps that limit which syscalls and directories the app can access. Combined, these give me some increased confidence.
(Edit: On Debian I could do this with multiple simultaneous X sessions, moving data between them via a shared text file. On obsd, one could use SSH and some scripts, so they can share a desktop if/when desired.)
Maybe there is a maturity and effort level continuum, where we can help people along as appropriate, per their desires, interest, and situation.
The fix exists in the v8 source, but hasn't reached normal Chrome yet. Unless an org is compiling Chromium from scratch, there isn't an actual useful fix available.
Edit: critiques welcome of course.
At 26 I have twice the experience but only a third of the motivation and focus. When you're younger you do stuff just to find out whether you can; when you're older you already know you could, and decide not to do anything.
It's exceedingly rare that I'll find something interesting enough to keep me coding for 15 hours straight; at 18 that was the norm on pretty much every free day, of which there are a lot more when you're younger and not already working as a software dev.
My money will always be on the 18 year olds.
I am just smh over this.
Also: it could have been much worse than this irresponsible disclosure; at least it was disclosed.