Reverse-engineering Rosetta 2 Part 1: Analyzing AoT files and the runtime (ffri.github.io)
121 points by my123 on March 4, 2021 | 17 comments

This isn't something I would be even remotely capable of doing myself, and I probably don't understand the vast majority of what they are talking about, but I absolutely love that this exists and that there are people who enjoy digging into it and doing a nice write-up about it.

Yeah, I've been interested in learning assembly/reverse engineering/Ghidra at some stage, but reading stuff like this by people who understand what a computer's doing at this level makes me despair at my own lack of knowledge.

You can always learn! You're just a few books away from the knowledge.

Which books?

I think the raw reverse engineering know-how generally comes from reading and writing low-level code and reverse engineering other people's.

On top of that you also need an understanding of computer architecture, for which I vaguely recommend Hennessy & Patterson, but again you mainly get this knowledge by playing around.

A bit of compiler design would help too if you want to write one of these; Engineering a Compiler is a good one.

I hope those same people are looking into the M1 SSD mystery.

What mystery?

I wonder if they do indeed compute a checksum of the binary to come up with an AoT-translation cache key. Must be quite inefficient.

Not if your SSD delivers data at more than 2500MB/sec, as the one in the MacBook Air with M1 does. SHA-256 calculation can probably be performed at that speed as well, thanks to dedicated silicon, so even a large 250MB binary (I haven't seen binaries that large anywhere other than browsers) would be hashed in a tenth of a second. Not noticeable at all if it's just once, at startup.

This also sounds like something that could be cached if the file doesn’t change.

I think there is no good way of doing that. Each time the user tries to run an x86_64 binary, you'd have to actually checksum, or otherwise check the content of, the x86_64 binary to know whether you already have a translated version of it.
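
To make the "checksum as cache key" idea concrete, here is a minimal Python sketch. It is not how Rosetta 2 actually works; the function names, the cache directory layout, and the `.aot` suffix are all assumptions made up for illustration. The point it shows is that every lookup has to read the whole binary before it can tell whether a translation already exists.

```python
import hashlib
from pathlib import Path
from typing import Optional


def content_cache_key(binary_path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the file's full contents."""
    hasher = hashlib.sha256()
    with open(binary_path, "rb") as f:
        # Stream the file in chunks so large binaries are not loaded at once.
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()


def lookup_translation(binary_path: Path, cache_dir: Path) -> Optional[Path]:
    """Hypothetical lookup: a cached translation is stored as <digest>.aot."""
    candidate = cache_dir / f"{content_cache_key(binary_path)}.aot"
    return candidate if candidate.exists() else None
```

Two byte-identical binaries map to the same key regardless of path or timestamps, which is what makes the key tamper-resistant, and also what makes it cost a full read on every launch.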

Inode metadata such as timestamps is insufficient, I think. It can be tampered with.

> Inode metadata such as timestamps is insufficient, I think. It can be tampered with.

In macOS, there is a security-policy layer of some kind on top of xattrs, separate from the security-policy of the file itself. `com.apple.rootless` is an example of an xattr protected by this mechanism: users (even root) can't apply or remove `com.apple.rootless` from files on a filesystem mounted as the rootfs.

With this mechanism, it'd likely be possible to give executable binaries an xattr containing the checksum, generated by Gatekeeper+Rosetta, that the user couldn't modify, while still being able to otherwise modify/delete the file. (And, presumably, modifying the file would automatically invalidate/remove the checksum xattr.)

The kernel is aware when files are modified.

That's not generally applicable, e.g. not if files are on an external drive, or worse, a network filesystem (where they can change even during use).

In which case falling back on a hash is fine.
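
A rough Python sketch of that "trust the filesystem where you can, hash where you can't" approach. Everything here is an assumption for illustration: the `HashCache` class, the choice of fingerprint fields, and the `trust_metadata` flag are invented names, and plain `os.stat` metadata stands in for the protected xattr or kernel change tracking discussed above (which, as noted upthread, ordinary timestamps are not a secure substitute for).

```python
import hashlib
import os
from typing import Dict, Tuple

# (device, inode, size, mtime in ns): a cheap stand-in for "file unchanged".
Fingerprint = Tuple[int, int, int, int]


def fingerprint(path: str) -> Fingerprint:
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()


class HashCache:
    """Remembers digests and re-hashes only when the fingerprint changes."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Fingerprint, str]] = {}
        self.full_hashes = 0  # counts how often the whole file was read

    def digest(self, path: str, trust_metadata: bool = True) -> str:
        fp = fingerprint(path)
        cached = self._entries.get(path)
        if trust_metadata and cached is not None and cached[0] == fp:
            return cached[1]  # fast path: no file contents read
        # Fallback: external or network volumes, or any detected change.
        self.full_hashes += 1
        value = sha256_of(path)
        self._entries[path] = (fp, value)
        return value
```

Calling `digest(path, trust_metadata=False)` models the external-drive or network-filesystem case, where the cached value is never trusted and the file is hashed every time.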

Is it? Checksums are something that I use rather than understand, but the CPU is doing billions of instructions per second these days, and the hash only happens once.

Run `openssl speed sha256` to get an idea of how high the latency for a cache hit would be. I see a throughput of ~300MiB/sec. This can be parallelised easily, but we would still be burning lots of CPU cycles for nothing. Bad for battery life.
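
If you don't have `openssl` handy, a small Python sketch gives a comparable single-threaded number and turns it into a per-launch latency. The function names and the 64 MiB sample size are arbitrary choices for illustration, and the result depends on how your Python's `hashlib` was built, so treat it as a ballpark rather than a benchmark. MB and MiB are treated as interchangeable here, which is close enough for this estimate.

```python
import hashlib
import time

MIB = 1024 * 1024


def sha256_throughput_mib_s(total_mib: int = 64) -> float:
    """Hash total_mib of in-memory zero bytes and return MiB per second."""
    chunk = bytes(MIB)  # hashing from memory isolates CPU cost from disk I/O
    hasher = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(total_mib):
        hasher.update(chunk)
    hasher.digest()
    elapsed = time.perf_counter() - start
    return total_mib / elapsed


def seconds_to_hash(size_mib: float, throughput_mib_s: float) -> float:
    """Time to hash a file of size_mib at a sustained throughput."""
    return size_mib / throughput_mib_s


if __name__ == "__main__":
    measured = sha256_throughput_mib_s()
    print(f"measured SHA-256 throughput: {measured:.0f} MiB/s")
    # The 250MB binary from upthread, at three different hashing rates.
    for label, rate in (
        ("this machine", measured),
        ("~300 MiB/s figure", 300.0),
        ("2500 MB/s SSD-bound figure", 2500.0),
    ):
        print(f"250 MiB at {label}: {seconds_to_hash(250, rate):.2f} s")
```

The gap between the two quoted rates is the whole disagreement: at 2500MB/sec the 250MB binary costs about a tenth of a second, while at ~300MiB/sec it is closer to a full second per launch.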
