
Is Intel within ARM’s reach? Pedestrian Detection shows the way - kracekumar
http://www.edn.com/electronics-blogs/systems-interface/4419918/Is-Intel-within-ARM-s-reach--Pedestrian-Detection-shows-the-way
======
jwise0
Sigh. Although some of this is Intel's own doing, "i3" is not a meaningful
descriptor, especially when compared to "Cortex-A15" or "Cortex-A9". "Core i3"
can refer to many, many generations of CPUs. This is not a nitpick but a real
complaint, because each of those CPUs has very different performance
characteristics. Core i3 has been a Westmere (Nehalem tick), a Sandy Bridge,
an Ivy Bridge (Sandy Bridge tick) and a Haswell. Saying "a 1.2GHz Core i3" is
therefore about as valuable as saying "a Tegra", which could be an ARM11
MPCore (Tegra 6xx), a Cortex-A9 without NEON (Tegra 2), a Cortex-A9 with NEON
(Tegra 3) or a Cortex-A15 (Tegra 4).

So, on a micro-level, the comparison is not terribly valid. On a macro-level
-- saying that the devices are within an order of magnitude -- the results are
reasonable, but certainly not novel...

~~~
ranjithparakkal
You are right that the i3 CPU we compared against (Core i3-530) is from a
somewhat older generation. I compared the Core i3-530 with the Core i3-2105
online. The 2105 is a Sandy Bridge part and runs at a slightly higher clock,
3.1 GHz (the 530 runs at 2.93 GHz, according to CPU World).

According to CPU World, the 2105 is 21% faster than the 530 for
single-threaded operations. Once you account for the difference in clock
speed, that is only about a 14% improvement over the 530.
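As a rough check of that arithmetic, the clock normalization can be sketched in Python. This is a minimal sketch, not the author's method: it assumes performance scales linearly with clock speed (optimistic for memory-bound code) and simply divides the measured speedup by the clock ratio. The function name `per_clock_gain` is hypothetical; the 1.21, 3.1 GHz and 2.93 GHz inputs are the figures quoted in the comment.

```python
def per_clock_gain(speedup: float, new_ghz: float, old_ghz: float) -> float:
    """Divide a measured speedup by the clock ratio to estimate the per-clock gain.

    Assumes performance scales linearly with clock speed, which is
    optimistic for memory-bound workloads.
    """
    clock_ratio = new_ghz / old_ghz
    return speedup / clock_ratio


# Figures quoted above: the i3-2105 at 3.1 GHz is 1.21x the i3-530 at 2.93 GHz.
gain = per_clock_gain(1.21, 3.1, 2.93)
print(f"clock ratio:    {3.1 / 2.93:.3f}")
print(f"per-clock gain: {(gain - 1) * 100:.1f}%")
```

With these inputs it prints a clock ratio of 1.058 and a per-clock gain of about 14.4%, in line with the "about 14%" figure.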

So it's really not such a bad comparison.

In the meantime, we will try to run this on a newer CPU and let you know the
results.

------
aristidb
So code fully optimized for ARM/Cortex-A15 is almost as fast as only partially
optimized code on a 1.2 GHz i3? Well, good to know I guess.

~~~
haricm
Yes, this is correct. But as you can see, the original unoptimized version ran
at nearly the same speed on both, meaning the A15 and Core i3 perform
comparably when run at the same clock. The general perception has been that
the raw performance of ARM CPUs (like the A15 and A57) and Intel Sandy Bridge
CPUs like the ones inside the i3/i5 is not in the same ballpark; most people
believe they are leagues apart. That is also one of the reasons such
comparisons haven't been made much before. The idea of the blog post is to
show that this is not completely true.

------
zurn
Spoiler: no. 1265 ms vs 439 ms on their OpenCV benchmark.

(Then they play some what-if games by underclocking the i3 in imaginative ways
and applying SIMD optimizations only to the ARM side.)
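A quick sketch of what those two numbers imply, in Python. The assignment of the figures is an assumption: it treats 1265 ms as the ARM run and 439 ms as the i3-530 at its stock 2.93 GHz, which the comment does not state explicitly. The break-even estimate also assumes runtime is inversely proportional to clock speed, so it ignores memory effects; both helper names are hypothetical.

```python
def runtime_ratio(slow_ms: float, fast_ms: float) -> float:
    """How many times faster the fast run is than the slow run."""
    return slow_ms / fast_ms


def breakeven_clock_ghz(fast_ms: float, slow_ms: float, fast_clock_ghz: float) -> float:
    """Clock at which the faster CPU would take as long as the slower one.

    Assumes runtime is inversely proportional to clock speed, ignoring
    memory latency and bandwidth, so treat the result as a rough estimate.
    """
    return fast_clock_ghz * fast_ms / slow_ms


# Assumption: 1265 ms is the ARM board, 439 ms the i3-530 at its stock 2.93 GHz.
ARM_MS, I3_MS, I3_GHZ = 1265.0, 439.0, 2.93

print(f"i3 speedup:       {runtime_ratio(ARM_MS, I3_MS):.2f}x")
print(f"break-even clock: {breakeven_clock_ghz(I3_MS, ARM_MS, I3_GHZ):.2f} GHz")
```

Under those assumptions the i3 is about 2.88x faster, and it would have to run at roughly 1.02 GHz before the two took the same time.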

~~~
berkut
This.

Assuming the compiler will generate good SSE code for the Intel CPU is a joke.
If you write intrinsics for one arch, write them for both.

I'd bet money the Intel side could be made 2.5-3x faster with proper SSE
intrinsics, and maybe 5-6x faster with a Haswell i3 using AVX (Sandy Bridge
doesn't have the cache bandwidth to fully utilise AVX).

~~~
haricm
The original OpenCV code already has intrinsics in many places, but enabling
them yields only about a 10% improvement.

We decided to report the non-intrinsics version, because presenting the
original OpenCV numbers with intrinsics as "SSE optimized" would be unfair to
Intel; apparently that code is not very well optimized.

My own guess is that if we add Intel intrinsics to our own C code, it will
speed up by around 2x. We could have written a blog post without reporting the
optimized Intel C numbers, but that would have been unfair to Intel again.

------
drill_sarge
I am getting tired of this whole dumb x86 vs. ARM comparison bullshit. You
cannot compare chips (read: chips; I'm not even talking about architecture
here, because it's not relevant) that are designed for completely different
purposes (a high-performance, big out-of-order workstation CPU vs. a low-power
mobile chip). Please stop making them.

~~~
Scaevolus
People always focus on the instruction set ("ARM's fixed-length instructions
are easy to decode, so it will win in the long run"), ignoring how decoding
is a tiny fraction of a CPU's silicon and power budget. Memory controllers,
pipelining, and efficient superscalar instruction dispatch have far more
effect, and Intel has a large lead on ARM in these areas.

~~~
ranjithparakkal
"Memory controllers, pipelining, and efficient superscalar instruction
dispatch have far more effect"

Memory controllers (at least the SoC-level ones) are normally developed by the
silicon vendor, such as Nvidia, Broadcom, Qualcomm, Samsung, TI, Freescale,
etc., not by ARM. And these companies have been working on them for many
years. They have had graphics, video acceleration, display and
camera-interface IPs all integrated into one SoC for almost a decade now.
Intel is in fact relatively new to this kind of integration.

In any case, everything, including memory controllers, pipelining and
superscalar dispatch, has already been taken into account in this benchmark.

What has been left out are higher clocks and Hyper-Threading, two things that
ARM doesn't have yet.

~~~
acqq
Something is wrong, Ranjith: several of your posts are "dead". You probably
triggered something like a "too many posts for a new account" limit.

Hello admins, Ranjith is the author of the linked article!

------
devx
Intel will be in a lot of trouble soon (within 2-3 years).

Forget the Core line chips; they're irrelevant. The line will remain a cash
cow for the next few years, but a rapidly shrinking one nonetheless. Intel
will move upmarket with it until there's nowhere left to move to.

ARM chips' improvements over the next years will make them "good enough" for
most people, and Intel's Core chips, which cost 10x more (literally), will be
very uncompetitive in that environment.

Their only option is to fight with Atom, but so far they've had no success
there. Even if they succeed, their profits will drop dramatically and they
will need to survive as a company with much lower revenue and profit, which
means the "all-powerful Intel" of the past will be but a faint memory.

~~~
drill_sarge
If any company has enough resources (financial and know-how) to pull out the
hammer when needed, it's probably Intel. They have the best people, the best
manufacturing, the most experience, etc. Just look at how they evolved shitty
Atom processors into something competitive (Bay Trail). Intel has staying
power. And they once made ARM processors too.

------
vrodic
In addition to what zurn said, they also don't specify the exact Core i3 model
number (it could be an older generation), the RAM speed, or whether the data
set fits in cache.

If RAM access is needed, ARM machines tend to fall behind quickly, since they
usually have much lower memory bandwidth.
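Whether the data set fits in cache can be estimated from the frame size alone. The sketch below is a hypothetical back-of-the-envelope check in Python: the thread gives neither the frame resolution nor the number of buffers the detector keeps live, and the 4 MiB last-level cache is an assumed figure for illustration. It only counts whole frames, so a real detector's working set would likely be larger.

```python
def frames_working_set(width: int, height: int, bytes_per_pixel: int, buffers: int) -> int:
    """Bytes needed to hold `buffers` full frames at the given resolution."""
    return width * height * bytes_per_pixel * buffers


def fits_in_cache(working_set_bytes: int, cache_bytes: int) -> bool:
    """True if the working set is no larger than the cache."""
    return working_set_bytes <= cache_bytes


# Hypothetical inputs: 8-bit grayscale frames, 4 live buffers, 4 MiB last-level cache.
LLC_BYTES = 4 * 1024 * 1024

for width, height in [(640, 480), (1920, 1080)]:
    ws = frames_working_set(width, height, 1, 4)
    print(f"{width}x{height}: {ws:,} bytes, fits in cache: {fits_in_cache(ws, LLC_BYTES)}")
```

With these made-up inputs the 640x480 case (1,228,800 bytes) fits, while the 1920x1080 case (8,294,400 bytes) does not, which is the point where the RAM bandwidth difference would start to dominate.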

~~~
haricm
The model number used for the evaluation is Core™ i3-530.

~~~
mattst88
That's a Westmere from Jan 2010. Westmere -> Sandy Bridge -> Ivy Bridge ->
Haswell.

Is comparing against something 3.5 years and three generations old useful?

~~~
ranjithparakkal
Copy-pasting my reply once again.

You are right that the i3 CPU we compared against (Core i3-530) is from a
somewhat older generation. I compared the Core i3-530 with the Core i3-2105
online. The 2105 is a Sandy Bridge part (I couldn't find a direct comparison
with an Ivy Bridge) and runs at a slightly higher clock, 3.1 GHz (the 530 runs
at 2.93 GHz, according to CPU World).

According to CPU World, the 2105 is 21% faster than the 530 for
single-threaded operations. Once you account for the difference in clock
speed, that is only about a 14% improvement over the 530.

So it's really not such a bad comparison.

In the meantime, we will try to run this on a newer CPU and let you know the
results.

------
ksec
You can't scale a server/desktop CPU two steps down into mobile devices.

That is what Atom is all about: the best power/performance x86 possible.

And you also can't scale a mobile device up to a server/desktop product.

That is what the ARMv8 Cortex-A57 is all about: low-power desktop and
server-class parts.

So, technically speaking, both are marching toward each other's end, although
Intel would lose out due to other factors, such as its business model.

------
haricm
For those who are not convinced, here is one more benchmark:

[http://www.inpai.com.cn/doc/hard/198143_8.htm](http://www.inpai.com.cn/doc/hard/198143_8.htm)

The page takes a while to load; then scroll down to the benchmarks. Take a
look at the single-threaded Linpack results for the i7 @ 3.5 GHz versus the
Exynos @ 1.6 GHz.

