
SMT Solving on an iPhone (2018) - creolabs
https://www.cs.utexas.edu/~bornholt/post/z3-iphone.html
======
zero_k
Hah, funny to see my blog referenced. Yeah, cache misses are a huge part of
SAT solving (modern solvers have prefetch code specifically for this loop,
some seemingly following the code I wrote). And SAT solving is about 95+% of
SMT solving. So, yeah, cache is where it's at.
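
For the curious, here's a minimal Python sketch (my own illustration, heavily
simplified; real solvers use two-watched-literal schemes) of the propagation
loop I mean, and why it's one long pointer chase through cold memory:

    # Hypothetical, simplified unit propagation: clauses are lists of
    # nonzero ints, -x is the negation of x. Every clause visit is a
    # dereference into likely-uncached memory -- the part solvers prefetch.
    def propagate(assignment, watches, trail):
        head = 0
        while head < len(trail):
            lit = trail[head]
            head += 1
            for clause in watches.get(-lit, ()):  # pointer chase per clause
                if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                    continue                      # clause already satisfied
                free = [l for l in clause if assignment.get(abs(l)) is None]
                if not free:
                    return clause                 # conflict: all literals false
                if len(free) == 1:                # unit clause: force literal
                    assignment[abs(free[0])] = free[0] > 0
                    trail.append(free[0])
        return None                               # no conflict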

I once had an old but "enthusiast" i7 with a triple-channel memory setup, and
"upgraded" to a consumer dual-channel i7 that was 2-3 generations ahead. It had
the same performance(!) for SAT solving. Was pretty eye-opening.

~~~
HALtheWise
At the end of the day, cache miss latency should be bounded by the speed of
light and the distance to memory. It's entirely unsurprising to me that by
putting main memory on the same board as the CPU, as well as the optimizations
that come from allowing a custom communication protocol between them, better
performance is possible. I wonder what a desktop machine designed for cache
and memory latency would look like.

~~~
vvanders
It's SRAM vs DRAM, not the speed of light. Go take a look at DRAM CAS latencies
as clock speeds have increased: the ns per fetch has stayed pretty static
(~10 ns) since DDR first showed up.

There have been slight improvements, but nothing like the orders of magnitude
you've seen in other areas.

If you want to pay the power and cost of SRAM it can be done, but it doesn't
usually pencil out beyond what you see in caches today.
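
For a rough sense of the arithmetic (illustrative timing pairs, assuming
typical kits; exact numbers vary): absolute CAS latency is CL cycles divided
by the memory clock. And for scale, light covers ~30 cm per ns, so trace
length is a rounding error next to the DRAM array itself.

    # CAS latency in ns = CL cycles / memory clock (illustrative values)
    for name, cl, mhz in [("DDR-400 CL3",    3,  200),
                          ("DDR2-800 CL6",   6,  400),
                          ("DDR3-1600 CL11", 11, 800),
                          ("DDR4-3200 CL22", 22, 1600)]:
        print(f"{name}: {cl / mhz * 1e3:.1f} ns")
    # ~15, ~15, ~13.8, ~13.8 ns -- clocks rose, CL rose in step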

------
aminozuur
Apple Silicon has been pretty impressive as it outperforms many laptops and
desktops in benchmarks.

This guy even used the A12 chip from 2018. The A13 is even faster and ships in
all iPhone 11s and even the $399 iPhone SE 2nd gen.

I am eager to see how Apple Silicon will do in the upcoming MacBooks.

~~~
Mandatum
It's going to be awesome to see chips that actually utilize the board space
specifically for the functions the end device is used for. I get that general-
purpose boards are the de facto standard to suit the masses, and OEMs favour
them because they're easier to ship, but I think we're past the point of
needing that; adoption is no longer a problem.

I signed up for the Apple DTK, hoping they approve more people abroad...

------
panpanna
Note that this sort of software can be extremely sensitive to cache
arrangement.

Not only raw cache performance matters, but also small details such as data
arrangement, line size, replacement policy, and coherency model.

Source: I worked on performance improvement for a related piece of software.

~~~
sitkack
[https://emeryberger.com/research/stabilizer/](https://emeryberger.com/research/stabilizer/)

------
PaulHoule
SMT solvers and other 'old AI' workloads depend mainly on memory performance
under unpredictably branching workloads, which the iPhone chips seem to be
strong in. Compare that to 'new AI' workloads that eliminate branch
misprediction almost entirely.

~~~
hinkley
I have a hard time putting SAT, SMT, and at least some linear algebra
algorithms into the AI category. They are all logic, just of a sort that’s far
more mind-numbing than our usual data processing workflows.

I know the AI people get grumpy about all of the problem spaces that have been
harvested or “stolen” and called conventional code, but in this case I’m not
so sure that’s accurate.

It feels more like a bunch of people who wanted to use advanced math and logic
jumped on the AI gravy train while it was going past, and then hopped off
again when they had their fill.

~~~
kaik
SMT, SAT, etc. are absolutely AI. In fact, machine learning is just another
field inside AI (albeit a famous and successful one). As someone who got his
PhD doing research in "classical" AI, I find it surprising that nowadays most
people think that AI is just machine learning.

~~~
zero_k
+1. I am always surprised that we used to do (and of course still must do)
proper full-on formal verification of e.g. train track scheduling systems, so
that train accidents, and consequently hundreds of deaths, don't just happen
because "sorry for the bug, ver 1.2 fixes it" -- but people are happy to sit in
a Tesla going 120 km/h and let a well-tested, but not formally verified, deep
learning system make decisions. BTW, I know it's not formally verified, because
formally verifying deep learning systems is currently very much an unsolved
problem for anything but toy examples.
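
To make "formally verified" concrete, here's a toy sketch (my own
illustration, via Z3's Python bindings) of the basic move: assert the negation
of the property and ask the solver whether it's satisfiable; unsat means the
property holds for every possible input:

    from z3 import BitVec, BitVecVal, If, UGT, Solver, unsat

    x = BitVec("x", 32)
    clamp = If(UGT(x, 100), BitVecVal(100, 32), x)  # the "program"
    s = Solver()
    s.add(UGT(clamp, 100))  # negate "output never exceeds 100"
    if s.check() == unsat:
        print("property holds for all 2**32 inputs")
    else:
        print("counterexample:", s.model())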

------
wslh
I am always amazed how the user experience of an iPad mini (~USD 400) is much
better than a top-end 13" Windows-based notebook (~USD 2700). I know I am
comparing apples with oranges in terms of running processes and multitasking,
but the foreground app experience is much better.

~~~
epistasis
My uninformed hunch: there's likely a lot to be said for 1) requiring all apps
to use a single modern GUI toolkit, and 2) optimizing that toolkit for the use
case of touch, rather than having a single codebase that's supposed to solve
all use cases while carrying 30 years of baggage.

~~~
gowld
Most of it is running a single app at a time, with far less complexity than a
desktop system.

~~~
saagarjha
Note that at any given moment there are usually a couple dozen daemons running
in the background on iOS. And while the iPhone will only show one app at a
time, the iPad can do a handful.

~~~
MarkyC4
iOS optimizes here too, by limiting how much CPU/IO background apps can use.
It's frustrating to use Chromecast (for example) on iOS since it's always
disconnecting from the receiver.

~~~
saagarjha
I'm talking about background daemons from the system, which have different
resource limits.

~~~
why_only_15
the background daemons are mostly memory-limited, not processor-limited.

~~~
saagarjha
Fair. And to be honest, many of them are present on macOS too, without Jetsam
around to kill them ;)

------
haunter
I really want to see Apple Silicon under some real-world load: running a
current-gen AAA video game at 4K ultra settings while streaming with OBS,
playing music, and having Chrome open with 100+ tabs. That's what my current
PC does, for example, so I'm really curious about everything (performance,
temps, etc.).

To some extent it's just hard to believe that the iPhone CPU is better in
everything compared to the current-gen top-of-the-line Intel and AMD desktop
CPUs.

~~~
wu_187
This. While I can appreciate that Apple is taking a huge risk in creating
their own chips, based on Apple's past I cannot believe that their performance
will be on par with current x86_64 chipsets. In all honesty, the vast majority
of Apple's clientele for computers are businesses, which basically just use
them for basic web browsing/office work -- not a high bar to meet.

~~~
DiabloD3
I don't know of a single employee of a serious actual business that has been
issued an Apple anything. Not a laptop, desktop, or phone. I don't know of a
single person that, when given the choice of what to use for work, has chosen
Apple _unless_ they already were entrenched in the Apple ecosystem.

The vast majority of Apple's customers are Apple fans, people who merely wish
to think different, nothing more, nothing less.

Edit: Don't downvote just to disagree. Comment and state your reasoning on why
you think I'm wrong. If you work for a company that only issues Apple devices,
please speak up with your experience on how this went, and, if possible, why
the company didn't go with Windows or Linux computers instead.

For the comments that try to bring up software development roles: most
software developers I know either use Windows or use Linux on the desktop, and
loathe OSX more than I do. Very few prefer OSX.

The few people I know that prefer OSX (and will die on that hill) are frontend
developers.

~~~
Impossible
When I was at Facebook the majority of employees used MacBook Pros for work
(I was an exception because I worked on VR stuff, which was Windows-only). It
seems like the majority of Silicon Valley big tech and unicorn startups have a
large (if not majority) population of employees using Apple products for
development. I understand that many employees prefer or need Linux or Windows,
however.

~~~
DiabloD3
That seems to be true only at the places where MBPs are almost part of the
dress code. The HN community has a disproportionate representation of such
employment, and sometimes the Apple echo chamber shows up strongly because of
it.

It even happened at Microsoft until the Surface program was finally given a
proper go-ahead by Nadella. A lot of former MBP users seem to love the Surface
Book, so Microsoft succeeded in making a functional yet Veblen good in the
same vein as the MBP.

~~~
Razengan
Apple's A12Z Under Rosetta Outperforms Microsoft's Native Arm-Based Surface
Pro X:

[https://www.macrumors.com/2020/06/29/apple-
rosetta-2-a12z-be...](https://www.macrumors.com/2020/06/29/apple-
rosetta-2-a12z-beats-surface-pro-x/)

~~~
DiabloD3
Yes, but how well does it perform against something that isn't what might be
their slowest (but thinnest and most portable, though somehow not cheapest)
model?

Apparently, a Surface with a Ryzen 4800U exists and is coming out really soon.

The comparison isn't realistic because Apple's acquisition of PA Semi was a
brilliant move, and no one makes an ARM that is anywhere near that level of
performance, not even Qualcomm. I agree with Apple's decision to put their ARM
on the desktop as their answer to the Intel dilemma.

In this case, an ARM vs ARM comparison is not an apples-to-apples comparison:
Apple does not license their design to anyone else, so the only way to make a
fair comparison is to use the best commonly available part, which currently
means Zen 2.

I expect a Zen 2 to crush a 2-year-old Apple ARM, but I also expect a future
Apple desktop and laptop design to keep up with Zen 3. I actually hope Apple
can win the fight, but I also wish Apple would license their CPU to other
manufacturers to promote the general adoption of more architectures in this
space.

------
ashleyn
Are smartphone architectures really this amazing, or is Intel just dropping
the ball so hard that they're getting lapped by mobile hardware?

~~~
PaulHoule
Intel has focused most on data center performance, where you are operating at
scale and frequently using entirely different algorithms than people usually
use on a PC or phone. For instance,

[https://en.wikipedia.org/wiki/Column-
oriented_DBMS](https://en.wikipedia.org/wiki/Column-oriented_DBMS)

dominates the competition for ad-hoc analytics under "enterprise" conditions.
There the point is to maximize the use of memory bandwidth and never stall the
pipelines, to turn a commercial data processing job into something that looks
like HPC to the memory bus.

(Actually columnar organization improves performance reliably for small data;
video game programmers in the 1980s knew that it was better to put the x
coordinates of all the spaceships together, then put the y coordinates
together, then put the sprite id's together, ...)

Most muggles, though, go with the row orientation that comes with structs,
classes, and record types, and don't care. Pythoners know their language is
slow, but they know enough to use 'pandas', which is an object-oriented
interface to an in-memory column store.
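
A toy illustration of the difference (numpy, field names made up): a
structured array interleaves records like the structs muggles write, while
separate per-field arrays give you the 1980s spaceship layout:

    import numpy as np

    n = 1_000_000
    # Row-oriented: (x, y, sprite) records interleaved in memory
    aos = np.zeros(n, dtype=[("x", np.float32), ("y", np.float32),
                             ("sprite", np.int32)])
    # Column-oriented: each field is its own contiguous array
    xs = np.zeros(n, dtype=np.float32)

    aos["x"].sum()  # strided reads: ~2/3 of each cache line wasted
    xs.sum()        # sequential reads: full cache-line utilization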

Intel has sold its performance on the basis of what a sophisticated
organization could get out of complex programming. For instance, the AVX-512
instructions can nearly double performance for some workloads. If you are a
cloud provider that will rack 1000s of identical machines that run the same
software, that can be a big win. The average PC user runs software that has to
run on a wide range of hardware and may be supported by lower-tier
organizations; such organizations will rightly privilege a trouble-free
product and low support costs over giving every user the highest performance
possible.

~~~
robertlagrant
Columnar is just a specialisation, like graph or document. It's not a general-
purpose improvement on relational.

~~~
PaulHoule
From the viewpoint of relational it is an execution strategy. Many columnar
databases return row-wise results via ODBC or some similar interface, and they
look like relational databases.

The difficulties of doing columnar-style queries for document and graph
databases led at least one person I know to tell a certain three-letter agency
that they couldn't have the panopticon machine they wanted.

For online transaction processing rows are good, but for OLAP row orientation
is not even competitive.

------
derefr
> Both systems ran Z3 4.8.1, compiled by me _using Clang_ with the same
> optimization settings.

Hypothesis: LLVM's AArch64 backend has had more work put into it (by Apple, at
least) than LLVM's x86_64 backend has, specifically for "finishing one-off
tasks quickly" (as opposed to "achieving high throughput on long-running
tasks.")

To me, this would make sense—until recently, AArch64 devices were mostly
always-mobile, and so needed to be optimized for performing compute-intensive
workloads _on battery_, and so have had more thought put into the _efficiency_
of their single-threaded burst performance (the whole "race to sleep" thing).
I'd expect, for example, AArch64-optimized code-gen to favor low-code-size
serial loops over large-code-size SIMD vector ops, a la GCC -Os, in order to
both 1. keep the vector units powered down, and 2. keep cache-line contention
lower and thereby keep now-unneeded DMA channels powered down, both of which
keep the chip further from the TDP ceiling and thus let the part of the core
that _is_ powered on burst-clock longer. In such a setup, the simple serial-
processing loop may potentially outperform the SIMD ops. (Presuming the loop
has a variable number of executions that may be long or short, and is itself
run frequently.)

x86_64 devices, meanwhile, are generally only expected to perform compute-
intensive tasks _while connected to power_, and so the optimizations
contributed to compilers like LLVM that specifically impact x86_64 likely come
more from the HPC and OLTP crowds, who favor squeezing out continuous
aggregate _throughput_ at the expense of per-task time-to-completion (i.e.
holding onto Turbo Boost-like features at maximum duty cycle to increase
_mean_ throughput, even as the overheat conditions and license-switching
overhead lower _modal_ task performance).

~~~
my123
The LLVM x86_64 backend gets just as much work, and HPC is just one of the
reasons. The NVIDIA PGI compilers, for example, use LLVM as a code-generation
backend, and that's one commercial option.

------
mcny
> So, in a fit of procrastination, I decided to cross-compile Z3 to iOS, and
> see just how fast my new phone (or hypothetical future Mac) is.

> I bet the new iPad Pro’s A12X is even faster thanks to the larger thermal
> envelope a tablet affords.

IIRC Apple intends to complete the transition to Apple Silicon within three
years (and they often tend to be conservative with their estimates), so I'd
imagine the 2023 or 2024 Mac Pro will be interesting given that (I'd assume)
it won't have the thermal constraints an iPhone has. Thoughts?

~~~
matt2000
They are planning to complete the transition in 2 years:
[https://www.apple.com/newsroom/2020/06/apple-announces-
mac-t...](https://www.apple.com/newsroom/2020/06/apple-announces-mac-
transition-to-apple-silicon/)

Makes your question even more interesting though, what will a Mac Pro look
like running Apple silicon inside 2 years from now?

------
tyingq
Interesting. Is it possible that this particular workload takes a big hit due
to the Spectre/Meltdown mitigations?

~~~
corty
Yes. Mitigations drastically increase the cost of branch mispredictions, which
have always been expensive on x86 and have gotten even worse. See also the
sister comment.

Cache misses have the same problem.

~~~
riscy
I highly doubt the author compiled Z3 with those mitigations enabled, because
they are opt-in compiler options. Perhaps if the program makes frequent system
calls then the kernel’s mitigations would introduce overhead.

~~~
tyingq
My impression is that the mitigations that would hit hardest are Intel
microcode patches that load at boot time, and it may not have been obvious to
revert those. Or perhaps not wise to, depending on the production environment.

~~~
gowld
Why? If performance matters, don't browse Reddit on your lab machine.

I don't run antivirus on my microwave oven. There's no need.

~~~
corty
Support for most hardware depends on applying latest patches for HW and SW.
Only if you explicitly bought as HPC or Lab environment with the appropriate
contract specifying performance you might get around that.

------
pinewurst
> Indeed, after benchmarking I checked the iPhone’s battery usage report,
> which said Slack had used 4 times more energy than the Z3 app despite less
> time on screen.

------
samfisher83
The A12 has about 3-4 times the number of transistors of an i7-7700K. I know
there is a lot more on the A12 SoC than just the CPU, but it still has a
pretty large transistor budget. It was made on a 7nm process vs the 14nm that
Intel was using at the time the 7700K came out. Apple did a good job designing
the chips, but TSMC is a huge part of why these chips are so good. For the
longest time Intel's manufacturing process was way ahead of anyone else's,
which was reflected in their performance. If you give me way more transistors,
I can probably make a better chip.

If you give me more cache, decoders, execution units, etc., I can make a
faster chip.

Also, if you used all cores for a sustained amount of time, the Intel would
win out just because it can handle all that heat.

~~~
nknealk
I think the dimension you’re missing is that the A12 also consumes
significantly less power. While we don’t know exact numbers, the author
speculates it draws about a tenth of the power of the i7.

~~~
zamadatix
It's the cache that makes this fast, not higher-clocked, higher-voltage cores
on an older process.

Not that x86, particularly in 2018, is the pinnacle of perf per watt, but the
power dimension is non-linear in some of these variables, and this particular
benchmark doesn't happen to gain much from it.

------
1f60c
(2018)

~~~
NietTim
Makes me wonder what the article would look like today, or next month when the
new generation of iPhone hardware comes out. I guess benchmarks like Geekbench
can give us a general picture.

~~~
1f60c
The author's code is open source[0] if anyone wants to take a look.

[0]:
[https://github.com/jamesbornholt/z3-ios](https://github.com/jamesbornholt/z3-ios)

------
nickysielicki
On the topic of the portability of Z3, does anyone know why this example runs
fine on native amd64 but fails to solve when built with emscripten/wasm?

[https://github.com/sielicki/z3wasm](https://github.com/sielicki/z3wasm)

Do a build by running: “mkdir bld && cd bld && emcmake cmake ../ -GNinja &&
emmake ninja”

------
jcims
Not particularly apropos to the thrust of the article, but I want to apply SMT
to some tricky infosec use cases. Is there a good place to start for someone
without an academic background in math/logic/etc?

~~~
panpanna
I suggest you start with this paper:

"SMT Solvers in Software Security", Julien Vanegue, Sean Heelan, Rolf Rolles

[https://www.usenix.org/system/files/conference/woot12/woot12...](https://www.usenix.org/system/files/conference/woot12/woot12-final26.pdf)
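
For a quick taste before the paper, here's a toy Z3 query (Python bindings;
the names and bounds are made up for illustration) of the flavor those authors
build on -- asking for attacker inputs that slip past a size check via 32-bit
wraparound:

    from z3 import BitVec, Solver, ULT, UGT, sat

    a, b = BitVec("a", 32), BitVec("b", 32)
    s = Solver()
    s.add(ULT(a + b, 256))     # the (wrapping) bounds check the code does
    s.add(UGT(a, 0x7FFFFFF0))  # ...yet one operand is enormous
    if s.check() == sat:
        print(s.model())       # concrete overflowing inputs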

~~~
jcims
Looks perfect, thank you!!!

------
nromiun
For anyone trying to reproduce this on an Android phone: you don't need to
compile anything, just "apt install z3" in Termux. It will probably be pretty
slow, but it will work.

~~~
panpanna
Z3 is not available on Termux (though I am pretty sure it was available
before).

Back to the issue at hand: it is possible z3 is faster on arm64 due to the
AArch64 bitfield instructions, but I find that unlikely.

~~~
nromiun
It was recently merged but it is available.

[https://github.com/termux/termux-
packages/tree/master/packag...](https://github.com/termux/termux-
packages/tree/master/packages/z3)

~~~
panpanna
So it is, thanks. Had to update the Termux app first to see it.

Yes, z3 performs slightly better on AArch64. But performance seems to be
mostly cache-dependent.

Also, z3 is threaded, which would favor Apple with their fewer but faster
cores compared to others with more but weaker cores.

~~~
panpanna
Hmmm... I think my autocorrect had a meltdown.

Hopefully people can figure it out anyway.

------
vbezhenar
It's a very fun suggestion to put an iPhone in cold water. Imagine a cluster
of iPhones working underwater on math problems :)

------
api
When Apple releases ARM Macs I think we are all going to be really impressed.
Take these phone chips, beef them up a bit with more cores and maybe more
cache, clock them a bit higher, and give them better cooling so they can
sustain higher speeds.

If it goes how I think it will go, x86 is done. People will really start
wanting ARM on servers.

~~~
corty
People have been wanting ARM servers since the first ones were announced. But
there are two big problems. First, there is no unified boot and device-
initialisation standard. So you cannot just put in a bootable USB stick and
expect it to work. Second, you cannot reliably order any hardware, just very
small series barely above prototype stage...

~~~
my123
> First, there is no unified boot and device-initialisation standard.

There is. UEFI + ACPI.

> So you cannot just put in a bootable USB stick and expect it to work.

That's how it works today for Arm servers. Windows on Arm platforms also use
UEFI + ACPI, and ship with the bootloader unlocked.

------
dang
Discussed at the time:
[https://news.ycombinator.com/item?id=18383851](https://news.ycombinator.com/item?id=18383851)

------
m3kw9
Could be that Apple ARM chips have certain instructions that are more
optimized.

------
amelius
Beware: Apple devices are not general purpose computers. Whether you can use
them as such may be subject to change.

