
E3-1240 v5 3.50GHz single core perf worse than E5-2650 v2 2.60GHz PHP 5.X - erichileman
The E3-1240 v5 @ 3.50GHz performs worse than the E5-2650 v2 @ 2.60GHz for PHP
5.X. For PHP 7 (and everything else) the E3 is better.

The test setup is XenServer 6.5 with CentOS 6.8 (kernel
2.6.32-642.6.2.el6.x86_64) HVM guests. Each VM has 2 cores assigned. The test
uses siege. PHP 5.4, 5.5 and 5.6 are all nearly 50% slower on the E3. PHP 7 is
200% faster on the E3. Varnish is nearly 300% faster on the E3. Sysbench tests
are 150% - 300% faster on the E3. Only PHP 5.X is faster on the E5.

I've torn down and rebuilt the VMs several times and confirmed they are the
same. I've even live migrated them across to the other host/proc and confirmed
the same results.

I've tried strace, but it isn't going to work because it adds overhead to
every call and the E3 executes that overhead faster. In a browser the E5 TTFB
is 167ms; the E3 is 318ms. Stracing the call on the E5 is 548ms; E3 557ms. The
E3 executes the overhead of strace faster and the execution times equalize.

What is different about PHP 5.X that it would run so much better on the
older-generation, slower-clocked E5? Is it the larger L1/L2 cache making the
difference? Or something else, instruction set related maybe? What other tool
could I use, one that adds little overhead, to see the PHP execution
performance?
======
techjuice
You are comparing low-end Xeon processors with high-end Xeon processors
($250-$280 vs $1166-$1180 per processor). You would need to compare within the
same series, e.g. E3-1240 v2 vs E3-1240 v5, to have a more accurate test.
http://ark.intel.com/products/88176/Intel-Xeon-Processor-E3-1240-v5-8M-Cache-3_50-GHz
http://ark.intel.com/products/65730/Intel-Xeon-Processor-E3-1240-v2-8M-Cache-3_40-GHz

------
qb45
Maybe

    perf stat -d php ./benchmark.php

would show some difference? It measures some kernel and CPU events like
context switches, page faults, L1 and L3 cache misses.

~~~
erichileman
Unfortunately the hardware counters come back as <not supported>:

    Performance counter stats for 'php56 index.php':

            394.588620      task-clock (msec)         #    0.983 CPUs utilized
                   226      context-switches          #    0.573 K/sec
                     2      cpu-migrations            #    0.005 K/sec
                17,447      page-faults               #    0.044 M/sec
       <not supported>      cycles
       <not supported>      stalled-cycles-frontend
       <not supported>      stalled-cycles-backend
       <not supported>      instructions
       <not supported>      branches
       <not supported>      branch-misses
       <not supported>      L1-dcache-loads
       <not supported>      L1-dcache-load-misses
       <not supported>      LLC-loads
       <not supported>      LLC-load-misses

           0.401580145 seconds time elapsed

Working on finding the event descriptors...
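
Even with the hardware events unsupported in this guest, the software-side
numbers (wall time, user vs. system CPU time, page faults) can still be
collected from outside the process with very little overhead. A minimal sketch
in Python, assuming a Unix host; the `measure` helper and its field names are
illustrative and not part of perf or PHP:

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd once; return wall time plus CPU time and page faults of the child."""
    # RUSAGE_CHILDREN is cumulative over all waited-for children,
    # so take a snapshot before and after and report the difference.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall_s": wall,
        "user_s": after.ru_utime - before.ru_utime,
        "sys_s": after.ru_stime - before.ru_stime,
        "minor_faults": after.ru_minflt - before.ru_minflt,
        "major_faults": after.ru_majflt - before.ru_majflt,
    }
```

For example, `measure(["php56", "index.php"])` on each host should give figures
comparable to the task-clock and page-faults lines above, without strace-style
per-call overhead. A large gap in `sys_s` or `minor_faults` between the two
hosts would point toward memory management rather than raw compute.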

------
lossolo
PHP 5 allocates memory differently from PHP 7. That's why you see a difference
there, and memory (bandwidth, cache size) is the biggest difference between
the E5 and E3 here. PHP 7 is optimized to make fewer memory allocations
because it allocates in chunks, while PHP 5 is allocating/reallocating all the
time.

~~~
TazeTSchnitzel
Yep:
http://www.slideshare.net/nikita_ppv/php-7-what-changed-internally-php-barcelona-2015

~~~
erichileman
Memory seems likely.

On the E5 php 5.6 in top we see sys at 15%.

On the E3 php 5.6 in top we see sys at 7%.

On the E5 php 7 in top we see sys at 13%.

We are exploring memory perf more now.

------
peller
I'm just speculating here, but if I remember correctly, PHP5 uses
significantly more memory than PHP7, and the E5 has a 2.5x larger L3 cache and
almost twice the memory bandwidth of the E3. Perhaps that has something to do
with it?

~~~
erichileman
We thought that as well. The E5 has 4 memory channels with a max bandwidth of
51.2 GB/s. The E3 has 2 memory channels with a max bandwidth of 34.1 GB/s.

But we see a dramatic difference in single core tests. Our virtual machines
have 2 cores assigned and there's also a dramatic difference. I wouldn't think
that 1-2 cores would saturate 2 memory channels nor 34.1 GB/s bandwidth. If we
were testing all 8 cores on the E3 vs E5 8 core virtual machine, yeah maybe,
but 1-2 cores?
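
The quoted peak figures follow from channels x transfer rate x 8 bytes per
transfer. A quick check in Python, assuming DDR3-1600 on the E5 and DDR4-2133
on the E3 (the memory speeds are inferred from the numbers above, not stated in
the thread):

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s (decimal) for 64-bit DDR channels."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


# Assumed memory speeds: DDR3-1600 (E5) and DDR4-2133 (E3).
e5 = peak_bandwidth_gb_s(channels=4, mega_transfers_per_s=1600)
e3 = peak_bandwidth_gb_s(channels=2, mega_transfers_per_s=2133)
print(f"E5: {e5:.1f} GB/s, E3: {e3:.1f} GB/s, ratio {e5 / e3:.2f}x")
```

These are per-socket ceilings. What a single thread can actually pull is
usually limited well below either figure by memory latency and the number of
misses a core can keep in flight, which fits the doubt that 1-2 cores are
bandwidth-bound.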

The L3 cache is much larger on the E5 at 20MB Smartcache vs the E3 at 8MB
Smartcache. That seems to be the more likely suspect but I don't know enough
about how the cpu cache is used in relation to php to say for sure. Hopefully,
someone else does :)

Ref:
http://ark.intel.com/products/88176/Intel-Xeon-Processor-E3-1240-v5-8M-Cache-3_50-GHz
http://ark.intel.com/products/64590/Intel-Xeon-Processor-E5-2650-20M-Cache-2_00-GHz-8_00-GTs-Intel-QPI

~~~
rektide
You have talked about everything but what the parent was mentioning: actual
cache. As in, L1 and L2. Those vary sizably among the different price tiers,
somewhat understandably, for reasons related to this.

On recent IBM Power chips, there's a so-called TurboCore option that turns off
half the cores and lets the remaining cores double their share of L3. On some
workloads that's a net win. I also tend to think it's there for those people
paying a pricey per-core or per-socket fee, where a modest 15% performance
gain per core could be very rewarding in a way that scale-out/more-cores can't
replicate, but that's in a different realm than anyone I know.

~~~
erichileman
See the other comment above re: perf stat. Working on the event descriptors to
see and confirm the L1/L2 cache hits/misses.

------
sliken
Look at the cache miss counters, I suspect that's the explanation. Your other
workloads are more cache friendly.

------
mschuster91
> Varnish is nearly 300% faster for E3

That is the most worrying thing IMO. If the cache is hot (i.e. all loads are
from RAM), then the E5 should be vastly more powerful, not vice versa...

Could you try the benchmarks with Gentoo, with optimised builds for each CPU?

------
nanis
Just to make sure: You built all binaries and linked libraries yourself, from
scratch, with the same optimization settings, right?

~~~
erichileman
The binaries are from remi repo. We have a template we provision from. I used
the same template on each virtual machine.

    PHP 5.4.45 (cli) (built: Sep 19 2016 15:31:07)
    PHP 5.5.38 (cli) (built: Nov 9 2016 17:32:11)
    PHP 5.6.28 (cli) (built: Nov 9 2016 07:04:38)

The binaries are the same on each virtual machine. Are there build
optimizations for E3/V5 vs E5/V2 that could make such a difference?

~~~
cthalupa
> Are there build optimizations for E3/V5 vs E5/V2 that could make such a
> difference?

Potentially, yes.

Either way you're not comparing apples to apples using packages from a 3rd
party repo built for a different system.

Newer processors have different instruction sets (AVX is a big one). You'd
want to make sure you're not only compiling it on each different platform, but
also using a new enough compiler to support the instruction sets.

~~~
erichileman
Correct. We are going to build using the correct -march flags:
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

~~~
erichileman

    gcc -march=native -Q --help=target
    march=silvermont

gcc's native detection thinks this E3 is Silvermont, a low-power SoC.
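
`-march=native` is resolved from what CPUID reports inside the guest, so it may
help to check which instruction-set flags the VM actually sees. A minimal
sketch in Python, assuming a Linux guest with `/proc/cpuinfo`; the helper names
and the list of flags checked are illustrative choices:

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags, wanted=("sse4_2", "avx", "avx2", "fma", "bmi2", "movbe")):
    """Map each flag of interest to whether the CPU (as seen by the guest) exposes it."""
    return {name: name in flags for name in wanted}
```

Running `report(cpu_flags(open("/proc/cpuinfo").read()))` in a guest on each
host and comparing the results would show whether features such as AVX2 are
visible on the E3, which is one possible reason for gcc picking an unexpected
target.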

~~~
erichileman
We built 5.6 with -O2 -march=broadwell -mno-avx (AVX had to be removed,
probably because of a PECL extension issue).

There was about a 15% performance gain. Nothing that would explain the large
difference between E3 and E5.

------
the8472
Have you tried comparing on bare metal?

~~~
erichileman
We have not compared on bare metal.

