
How IBM Stacks Up Power8 Against Xeon Servers - baazaar
http://www.theplatform.net/2015/10/13/how-ibm-stacks-up-power8-against-xeon-servers/
======
ausjke
Too late to the game. Power should have done this way earlier. The Linux
community was booming around PowerPC a few years back, and now all the brains
have left to work on either ARM or x86. Without that community support (Power
does not really run Windows), the chip is just a piece of cold hardware, even
though it can shine in spots on the spec sheet.

One of Power's biggest users is probably a licensee in China, which replaced
its crypto logic and uses it for its own needs, but that's far from enough to
compete against Xeon.

~~~
teepo
It's all about Linux. IBM is running Power8 with Red Hat in SoftLayer data
centers. Hadoop and JBoss are some of the target workloads.

~~~
ctstover
I remember the announcements back in May, but I still don't see an offering on
their site. Can one really get a Linux instance on Power by the hour / month
in SL? Even better in my case would be just a VPS, as a build node doesn't
need a monster server.

At the same time, if they are "using it behind the scenes" yet not offering it
to customers, that would be very bad for brand image. Isn't server
consolidation the crux of the selling points? Which, of course, is exactly
what a "cloud provider" does.

I haven't tried it myself, but siteox.com does offer it.

~~~
pm90
IBM is an enterprise behemoth. If you look at their sources of revenue, they
focus a lot more on B2B, with a few select spinoffs designed to generate some
public interest. It's not surprising they don't publicize a lot of what they
do.

------
jmnicolas
My inner geek is longing for an affordable Power8 desktop machine ... do I
NEED it ? Well ... you know ...

~~~
Quequau
I hear you, man! It's a shame that there's never been a reasonably priced ATX
POWER mainboard available to individuals without a major service contract with
IBM.

Even that barebones dev server offer through OpenPower was overpriced.

------
graycat
Question: I looked up the price of an 18-core Xeon processor with a clock
speed under 3.0 GHz and got $4000+.

So, why be willing to pay so much per core for such slow cores?

At first I guessed that having lots of cores per processor would reduce the
number of processors needed and, then, maybe for some software, reduce
licensing costs, but the IBM Power processors are likely going to be running
Linux. So, is the goal to reduce licensing costs for Oracle? Something else?
Or do licensing costs have nothing to do with it?

As I plan my server farm for my startup, what am I missing about the value of
paying $4000+ for 18 relatively slow cores?

~~~
vessenes
Those cores are going to be sold to you largely on their memory bandwidth
benefits for the top end of the Xeon range.

If you need memory bandwidth, you will gladly pay for them, and look seriously
at IBM (something we just did).

If you don't know that you need memory bandwidth, just skip them. Also, I
would suggest you check out OVH or Hetzner for a new startup -- it's (sadly)
very unlikely that you will need enough servers for long enough to make buying
your own a good plan.

~~~
graycat
Thanks!

> Those cores are going to be sold to you largely on their memory bandwidth
> benefits for the top end of the Xeon range.

That is, (A) _bandwidth_ , bytes moved per second, or (B) total permitted
memory size, say, 1/2 TB?

My guess is that memory sizes of 1/2 TB require _registered_ memory, that is,
a _register_ in the memory modules to simplify timing, which can be
challenging for such large memories; but the register is an intermediate stop
on the way to/from the processor and its cache(s), so it actually reduces the
bytes per second below what might be achieved with, say, the simpler,
_consumer_ 1600 MHz DDR3?

Of course, other issues could include number of electronically _independent
channels_ to/from memory, address _interleaved_ memory, etc.?

> look seriously at IBM

I looked at the article; IBM seems to be trying to sell hardware (again!).
Okay.

So far my software is all written for Windows, and my guess is that Windows (7
Pro or Server) doesn't run on IBM's Power processors? And even if Windows does
run, lots of other software that runs on Windows and Intel x86 likely won't
run on IBM Power?

~~~
vessenes
For where you sound like you're at, I wouldn't even worry about it. Usually
when we say bandwidth, we mean bytes/second, to and from the caches and main
system memory.

But, really, don't worry about it -- for Windows, OVH or Hetzner, or if your
workload varies a lot, Azure or AWS are almost certainly what you want; put
the time into product development until it's so successful that you _need_ the
tech help to scale.
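To make "bytes/second" concrete, here is a rough sketch of a copy-bandwidth probe -- a toy, not a real benchmark (STREAM or similar is the proper tool; measureCopyBandwidth is my name, and the buffer size is only an assumption about typical cache sizes):

```go
package main

import (
	"fmt"
	"time"
)

// measureCopyBandwidth is a rough single-threaded probe: it repeatedly
// copies a buffer much larger than the caches and returns GB/s moved.
// Real memory-bandwidth tools are far more careful; this is only a sketch.
func measureCopyBandwidth(size, iters int) float64 {
	src := make([]byte, size)
	dst := make([]byte, size)

	start := time.Now()
	for i := 0; i < iters; i++ {
		copy(dst, src) // reads size bytes, writes size bytes
	}
	elapsed := time.Since(start).Seconds()
	return 2 * float64(size) * float64(iters) / elapsed / 1e9
}

func main() {
	// A 256 MiB buffer defeats any current L3 cache; the number printed
	// is not a claim about any particular Xeon or POWER part.
	fmt.Printf("~%.1f GB/s\n", measureCopyBandwidth(256<<20, 10))
}
```

A single thread usually cannot saturate a server's memory controllers, which is one reason vendor bandwidth figures are quoted for all cores at once.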

~~~
graycat
> Usually when we say bandwidth, we mean bytes/second, to and from the caches
> and main system memory.

I thought that the _registered_ memory of high-end server processors that can
support 100+ GB of main memory was significantly slower in _bandwidth_ than
the DDR3/4 main memory of consumer processors.

------
mtanski
The Spark workflow is 10% cheaper over 3 years (21.88 vs. 24.36). Is that
really worth having to support a non-mainstream architecture? I doubt it.

~~~
vessenes
I just did this math for our company a couple months ago.

Power8 memory bandwidth is VERY appealing. And, it's not always just about
cost per benchmark unit -- if you have some realtime requirements for your
analysis tools, then scale-up speed can be really valuable as compared to dev
time and management time.

In the end, we made the call for Intel because golang runs some key parts of
our tools, and the golang power8 story isn't there. But, as I gaze at our
servers where we paid what feels like thousands for extra megabytes of L3
cache, I wouldn't say I'm happy about the decision. A good go story from IBM
would have likely tipped things the other way.

~~~
mtanski
I think you just kind of proved my point: real-workload benchmarks would have
to be much more attractive to offset the cost of supporting an architecture
that's not tier one in many languages / software libraries / projects.

~~~
vessenes
Totally. That said, IBM isn't short on optimizing compiler folks. Open
ecosystem support is a strategy to pick up small and mid-size buyers; it will
be interesting to see if IBM gets there. I would look again next time we
source hardware.

And, if IBM put someone internally on a properly vectorized go compilation
pathway for Power8, I would buy in a heartbeat, provided it ran some sort of
debian variant.

~~~
mtanski
I don't think that go, as people tend to use it, really benefits from
vectorized code. Most people I interact with who are using go are not writing
numerical-processing code but network servers and business logic for
high-level web APIs. You might get minor speed-ups from a vectorized memcpy,
but I can't see much else.

I imagine the most important CPU features for most go code would be a good
branch predictor and fast atomics / synchronization primitives.

If you're using go for numerical processing code I'd like to hear more about
it. Mostly because it's kind of a PITA.

~~~
vessenes
Well, there are almost no real vectorizable primitives or functions in the
core library, so I'm not surprised that you don't run into people vectorizing
much. And, the go dev team's compiler focus has been elsewhere this last year.

And, so far the go team hasn't seemed to be able to interest Intel in doing
the heavy lifting that they might do for some other compilers.

So, branch predictions and faster sync primitives would be great, not least
because they would speed up channels in many cases, which would be cool; it
would be nice to widen the use cases for channel-based communication
significantly, but they're just VERY slow if you want to use them at scale in
a large application.

I am using go for some large-scale numerical processing, although it's the
sort with lots of logic attached, not just a giant matrix with some glue
around it. It's kind of a PITA. We are picking and choosing some outside
libraries, and we spend a lot of time massaging the go code for speed and
bitching about the garbage collector. (Did you know that for i, _ := range is
often 3 to 4x faster than for _, v := range? Do you know how awful code
written across four or five nested loops that uses indices looks?)

But, the size of codebase our team can manage with go is pretty great. We
wouldn't be nearly as productive in many other cool (or .. experienced)
languages when you add up the full life-cycle costs, including innovation,
enhancement, bug fixes, maintenance, and deployment. It's a win. I'd do it
again in a heartbeat.
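The channel-overhead complaint above can be sketched with a toy micro-benchmark (sumViaChannel and sumViaLoop are illustrative names; the actual ratio depends heavily on hardware, buffer size, and Go version):

```go
package main

import (
	"fmt"
	"time"
)

// sumViaChannel pushes n ints through a buffered channel to a consumer
// goroutine; sumViaLoop adds them directly. The gap between the two
// timings is the per-item channel overhead at issue in the thread.
func sumViaChannel(n int) int {
	ch := make(chan int, 1024)
	done := make(chan int)
	go func() {
		total := 0
		for v := range ch {
			total += v
		}
		done <- total
	}()
	for i := 0; i < n; i++ {
		ch <- i
	}
	close(ch)
	return <-done
}

func sumViaLoop(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i
	}
	return total
}

func main() {
	const n = 1000000
	t0 := time.Now()
	a := sumViaChannel(n)
	chanTime := time.Since(t0)

	t0 = time.Now()
	b := sumViaLoop(n)
	loopTime := time.Since(t0)

	fmt.Println(a == b, "channel:", chanTime, "loop:", loopTime)
}
```

Batching many items per channel send amortizes this overhead, which is the usual workaround when channels show up in a profile.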

~~~
mtanski
> So, branch predictions and faster sync primitives would be great, not least
> because they would speed up channels in many cases, which would be cool; it
> would be nice to widen the use cases for channel-based communication
> significantly, but they're just VERY slow if you want to use them at scale
> in a large application.

These operations are already pretty good on IA* processors, at least in
comparison to the less mainstream architectures. Other architectures focus on
bandwidth, parallelism (but often without a great synchronization story), or
optimizing power usage. So I doubt that Go would benefit from moving to
Power.

Some choices the Go people made about how channels work limited their options
for optimizing channels / increased the complexity of a lock-free
implementation (I don't have the mailing-list link handy). If you don't need
all those guarantees, you can pick an SPSC, SPMC, or MPMC implementation that
might work better for your use case.
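As a sketch of the SPSC idea: if only one goroutine ever pushes and one ever pops, a ring buffer with two atomic indices can replace the locking a general-purpose channel needs (spscQueue is a hypothetical type for illustration, not from any library):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spscQueue is a minimal single-producer/single-consumer ring buffer:
// exactly one goroutine may call push, and exactly one may call pop.
// Dropping the multi-producer/multi-consumer guarantees a channel gives
// you is what lets the indices be two plain atomics instead of a lock.
type spscQueue struct {
	buf  []int
	head uint64 // next slot to pop (written only by the consumer)
	tail uint64 // next slot to push (written only by the producer)
}

func newSPSC(capacity int) *spscQueue {
	return &spscQueue{buf: make([]int, capacity)}
}

func (q *spscQueue) push(v int) bool {
	tail := atomic.LoadUint64(&q.tail)
	if tail-atomic.LoadUint64(&q.head) == uint64(len(q.buf)) {
		return false // full
	}
	q.buf[tail%uint64(len(q.buf))] = v
	atomic.StoreUint64(&q.tail, tail+1) // publish the slot to the consumer
	return true
}

func (q *spscQueue) pop() (int, bool) {
	head := atomic.LoadUint64(&q.head)
	if head == atomic.LoadUint64(&q.tail) {
		return 0, false // empty
	}
	v := q.buf[head%uint64(len(q.buf))]
	atomic.StoreUint64(&q.head, head+1) // hand the slot back to the producer
	return v, true
}

func main() {
	q := newSPSC(4)
	q.push(1)
	q.push(2)
	a, _ := q.pop()
	b, _ := q.pop()
	fmt.Println(a, b)
}
```

Note this queue spins or fails rather than blocks, so it trades the convenience of channel select/blocking semantics for lower per-item cost.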

> (Did you know that for i, _ := range is often 3 to 4x faster than for _, v
> := range? Do you know how awful code written down four or five nested loops
> that uses indices looks?)

Yes, in the second version you have to make a copy of v; the larger v is, the
larger the impact will be. The first version just references the array cell
via a[i], for which no copy is needed -- it's one assembly instruction. Maybe
the optimizer can become better here, but I'm guessing it might break some
language contract.
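The per-element copy described above is easiest to see with a deliberately large element type (record, sumByIndex, and sumByValue are illustrative names; the thread's 3-4x figure will vary with element size and hardware):

```go
package main

import "fmt"

type record struct {
	payload [128]byte // large enough that copying it per iteration hurts
	id      int
}

// sumByIndex indexes into the slice directly: no per-element copy.
func sumByIndex(rs []record) int {
	total := 0
	for i := range rs {
		total += rs[i].id
	}
	return total
}

// sumByValue copies each record (136 bytes here on 64-bit) into v on
// every iteration -- the hidden cost behind "for _, v := range".
func sumByValue(rs []record) int {
	total := 0
	for _, v := range rs {
		total += v.id
	}
	return total
}

func main() {
	rs := make([]record, 1000)
	for i := range rs {
		rs[i].id = i
	}
	fmt.Println(sumByIndex(rs), sumByValue(rs)) // same sums, different cost
}
```

For small element types (ints, pointers) the copy is one register move and the two forms cost roughly the same, so the index form only matters for bulky structs.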

------
alyandon
So my question is: how much extra power does the IBM Power8 CPU need in order
to get performance that is on par with Intel's Haswell?

~~~
kev009
Look at the memory bandwidth numbers. These completely outclass the Xeon in
certain types of workloads, apples to oranges.

~~~
StillBored
Yes, if you need more memory bandwidth, POWER is where it's at. OTOH, as I've
been saying, the base cores don't do as well on single-threaded loopy code or
on problems with a decent L1/L2 cache hit rate. The specint_rate numbers look
good because of the 8x threading; the single-core results probably don't look
that good.

So for many workloads it won't be that great. For the one I was working on,
out of the box the POWER was half as fast. But that wasn't fair, because we
had a couple of highly optimized x86 code paths. Doing some basic POWER
optimizations brought the performance in line with the x86. But it would win
on some benchmarks and lose on others. So while we utilized a _LOT_ of
memory/IO bandwidth, the fact that there was nearly 2.5x available in the
POWER system over our E5 didn't give us enough of a boost to make it worth
the higher price tag (nearly 3x in our case, because we were comparing with a
Supermicro machine). Maybe this newer POWER machine changes that a little.

~~~
vardump
Supermicro is where it's at. $10k buys you 4x dual-socket 6-core Xeon E5 v3
servers (2U Twin^2). That's 48 cores total and 256 GB DDR4 RAM (minimum
config).

------
groupmonoid
Looking at the IBM-provided benchmarks, the new Power8 seems better. But are
there any independent benchmarks out there already?

~~~
TheCondor
It probably doesn't matter; they aren't better enough. Now, if IBM sold these
maybe 30% cheaper, it could be a different story.

------
eliben
The almost $3K price tag on a "Linux OS" is disturbing.

~~~
nabla9
Why do you find it disturbing that someone wants to pay for OS support?

~~~
eliben
That's a lot for supporting a free OS on a _single server_.

~~~
fnordfnordfnord
It comes with an engineer's phone number.

