

Low level stuff in ARM - dacav
http://www.coranac.com/documents/bittrick/

======
ajross
The author seems to be freaking out a wee bit much over branching, which, the
last time I checked, is quite fast on the shallow-pipeline ARM cores on the
market. Branches get more expensive the deeper the pipeline gets, which means
they're hugely important for desktop CPUs (although a little less now than in
the days of the "NetBurst" cores from Intel). But for ARM? Meh.

What's a much bigger deal on ARM is cache behavior. The L1 caches are very
small, and there is no L2. Keeping working set sizes down for instructions and
data is hugely important, which means that tricks like these aren't always a
win if you're inlining them everywhere instead of (e.g.) calling a single
"min()" function.
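
To make the trade-off concrete, here is a rough C sketch of the two styles
being compared. The names min_plain and min_bits are made up for
illustration, and the bit trick is a generic one, not code taken from the
linked article. Whether either version is smaller or faster on a given ARM
core depends on the compiler and on inlining, so treat this as something to
measure rather than as a result.

```c
#include <stdint.h>

/* Plain version: one shared function that callers can branch to.
 * What the compiler emits for the ternary (a branch or a conditional
 * instruction) is toolchain-dependent; nothing is assumed here. */
static int32_t min_plain(int32_t a, int32_t b)
{
    return a < b ? a : b;
}

/* Branchless version: (a < b) is 1 or 0, so negating it gives a mask of
 * all ones or all zeros. With the mask set, b ^ (a ^ b) collapses to a;
 * with it clear, b ^ 0 stays b. */
static int32_t min_bits(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a < b);
    return b ^ ((a ^ b) & mask);
}
```

Both return the same value for every input; the difference is only in how
many instructions end up in the instruction cache once the trick is pasted
into many call sites.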

And (someone correct me if I'm misremembering) ARM doesn't have a physically
tagged cache, which means the caches can't survive a change in memory domain
like a system call. I know for a fact that syscalls on my Motorola A780
(XScale CPU, Linux 2.4 kernel) cost 20k cycles or more.

The bottom line is that I think the author is missing the point. These are
elegant assembly hacks, but aren't really where performance-conscious
programmers need to be focusing their efforts.

~~~
comatose_kid
ARMv6 (and higher) does have a physically tagged cache.

Another thing that helps with avoiding cache flushes on context switches is
the presence of a pid bit in the page table.

------
speek
This is pretty neat stuff.

