
Intel Xeon D 12 and 16 core parts launched: first benchmarks - rbanffy
http://www.servethehome.com/intel-xeon-d-12-and-16-core-parts-launched-first-benchmarks/
======
matt_wulfeck
> If you are using a compute optimized AWS c4.4xlarge instance: you will be
> able to purchase the Intel Xeon D-1587 system for about the same price as
> the “Partial Upfront” up front AWS fee, then colocate the box saving over
> $1500/ year per instance while getting better performance.

I hear this type of argument a lot, but it's so important to include the cost
of the engineering and talent necessary to keep your datacenter humming.

If you're a small shop with small future growth expectations, then sure,
forget Amazon and start racking your own boxes. Just be ready to hire ops
staff competent enough to run your operation. You need to be realistic about
both aspects of cost.

~~~
ChuckMcM
Having done this computation a number of times, with these machines it would
flip over to 'colo' at about 150 machines. The really interesting thing is
that Amazon costs scale linearly with size while owned infrastructure scales
fractionally. So the advantage just keeps growing and growing.

Bottom line: if you can host your app and resources on fewer than 150
machines, Amazon wins; more than that and you're leaving money on the table.
Once you get to the point where you're deploying new datacenters to support
your customers, you get a huge boost in operational efficiency.

~~~
thezilch
Note: I manage dedicated-server hosting and all web services on Titanfall @
Respawn.

Surely your calculation doesn't / can't include bandwidth charges? And if you
have Windows hosts... cloud screws you too.

You're not wrong that it takes a lot more machines than most shops need, and
if you can make your scale elastic, 150 machines probably goes a long way! And
to your point, developers undervalue their time a lot, and dealing with
colocation and sourcing of hardware and backups can be a real drain to save
some on CapEx.

~~~
ChuckMcM
Our transit costs out of our Colo in Santa Clara are $6K/month for two
dedicated gigabit lines. That translates roughly to 518 TB of data transfer a
month if we could keep the pipes fully lit 24/7. Cogent has offered to make
one of our gigabit lines burstable to 10G if we would like, same cost if we
don't exceed a monthly average use of 1000Mbps.

And we use Linux so no Windows hosts charges.

There are more and more tools that are force multipliers for your Site
Reliability Engineer (SRE) equivalents. You can't avoid swapping out dead
drives, but with the right systems architecture you can make it pretty
painless.
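
The transfer ceiling is easy to sanity-check (a back-of-envelope sketch; exact
figures depend on month length and whether a TB means 10^12 bytes):

```python
# Monthly transfer ceiling for dedicated lines, assuming 30-day months
# and decimal units (1 TB = 1e12 bytes).

def monthly_tb(line_gbps, lines=1, days=30):
    seconds = days * 24 * 3600
    bits = line_gbps * 1e9 * lines * seconds
    return bits / 8 / 1e12  # bits -> bytes -> TB

print(monthly_tb(1))      # 324.0 TB: one gigabit line, fully lit
print(monthly_tb(1, 2))   # 648.0 TB: both lines
```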

~~~
rdl
Wow, you're getting ripped off.

~~~
cbg0
Indeed, I've seen Cogent go for $800 - $1000 for a 1GE over 10GE commit.

~~~
hhw
It's possible to only pay the lower end of that range for a 2Gb 90th
percentile commit on a 10Gb connection out of most major markets. Cogent
pricing fluctuates wildly depending on your ability to negotiate.

Ignore the whole song and dance where they have to bring a VP into the call to
specially approve the pricing they're pitching to you. Those are standard
transit sales tactics. Or avoid Cogent's sales entirely by working with a
reseller.

~~~
matt_wulfeck
Speaking with a VP to negotiate transit prices? Sounds like a PITA. AWS knows
that you're an engineer who just wants to write code and has made that as
frictionless as possible.

This goes back to my first point: going into the DC requires staff with very
different skill sets. It's not that it's not worth it, but there are a lot of
costs involved apart from the per-hour instance cost.

------
mey
"128GB is now starting to become a limitation. With 16 cores that is only 8GB/
core."

This comment gave me pause; I feel like I'm falling out of step with the
evolution of commodity server systems. Have we gone from being network
(10/40GbE) and disk-IO (fixed with SSD, PCIe NVRAM, and Fibre Channel) bound
in the near past to swinging towards being memory-capacity bound?

~~~
xxpor
I think that's more of a reaction to the "fixes" to disk IO still not being
fast enough.

The more you can keep in memory, the better off you're going to be, even with
the fastest SSDs in the world.

~~~
jakub_h
Isn't 8 GB per core still reasonable? Especially if you can share indices and
buffer caches across cores, for example. There must still be diminishing
returns for caching.

~~~
Sanddancer
Depends on your dataset. Systems like databases perform much, much better the
less you have to touch a storage controller. One of the things going into
Intel's 2017/2018 platform is support for six channels of memory, and not all
of that needs to be DRAM. There's a new standard, NVDIMM, that lets you use
non-volatile memory as RAM and take advantage of its much higher densities.
So if you have something like a database, you can map a much larger dataset
into actual memory, instead of needing to keep just the indexes in there.
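
Mapping a dataset into memory instead of paging through the storage stack can
be sketched with plain `mmap` even without NVDIMMs (a minimal illustration;
the file path and record layout here are hypothetical):

```python
import mmap
import os
import tempfile

# Minimal sketch: write a small fixed-width dataset, then mmap it so
# lookups become plain memory reads; the OS pages data in on demand.
RECORD = 8  # hypothetical 8-byte little-endian records
path = os.path.join(tempfile.gettempdir(), "dataset.bin")

with open(path, "wb") as f:
    for i in range(1000):
        f.write(i.to_bytes(RECORD, "little"))

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def record(i):
    # Random access by record number -- no explicit read() syscalls.
    return int.from_bytes(mm[i * RECORD:(i + 1) * RECORD], "little")

print(record(42))  # -> 42
```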

------
bnastic
Every couple of years Intel hardware keeps changing my mind from "I'll just
rent everything in the cloud" to "I should build a server from this and keep
everything in house!". And vice versa.

------
iheartmemcache
I can't stress enough how game-changing this is to anyone in super-computing.
(The whole direction Intel's been taking for the last ~5 years has been poised
for this.) I went into a bit of detail here[1] (n.b. this was pre-Xeon D, the
addition of which will only increase performance) for those who are curious.

Right now we're living in a golden age of financially accessible super-
computing. We don't need to deal with MPI anymore, we just use RDMA and have
Infiniband speed fetches to any machine joined to the cluster[see: MS Paper].
I was working on RELION as a favor to my father and got deeper and deeper into
it because my docket was pretty open and it was fascinating. Long story short,
I sketched up a test setup with Phi, RDMA (which admittedly wasn't used that
much, as the problem set would be characterized by anyone as "embarrassingly
parallel") and 10GBit. This is all commodity stuff, and we effectively _never
touch disk_ -- I cobbled together 18650's in the rack[see: batt] instead of a
UPS with a uC which fires off a 'persist state to disk' message on any power
interrupt but other than that it's all RAM and blazing fast. (Side-note:
anyone who read the _Nature Methods_ special on all the Cryo-EM stuff may be
seeing a paper or two with a few new sets of computational methodologies in
the near future ;))
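
The host-side half of that persist trick might look roughly like this (every
name here is hypothetical, not the actual cluster code; in the real setup a
microcontroller delivers the power-interrupt signal to the hosts):

```python
# Sketch of a host-side policy: on a power-interrupt signal from the
# battery microcontroller, flush in-memory state to disk before the
# batteries drain. Names and the checkpoint format are hypothetical.
import json
import signal

state = {"step": 0, "results": []}  # stand-in for the in-RAM working set

def persist_state(signum, frame):
    # Triggered when the uC signals power loss (delivered here as SIGUSR1).
    with open("checkpoint.json", "w") as f:
        json.dump(state, f)

# POSIX-only: register the handler for the power-interrupt signal.
signal.signal(signal.SIGUSR1, persist_state)
```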

Buyer beware though -- sacrificing clock speed for more cores can end up
costing you a lot more overall[2]. Licensing policies have a tendency to
change from version to version, so your perfectly licensed Oracle 11g (let's
say it was quantified by the physical chip you throw into the socket) might
not have a straightforward upgrade path (now by the number of vCPUs). A lot
of clients of mine got burned trying to move from iron to AWS, only to
realize that Oracle licensed per _available CPU_. Fines galore. They ended up
downgrading to a previous-revision Xeon (i.e. 4th-gen E5 MSRP $x,xxx instead
of 5th-gen E5 MSRP $0,xxx with better performance but more cores). Anyone
encountering this problem should keep that trick up their sleeve and find
some one-year-old off-lease equipment with fewer cores but a higher clock
speed to keep license compliance and still meet your computational demands.
Anyways, yeah, the latest and greatest might not be the most economic choice
for this reason[licensing].

[1]
[https://news.ycombinator.com/item?id=10805087](https://news.ycombinator.com/item?id=10805087)
[2] This has so many variables as one can imagine - you can't just go by what
people did in 1975 i.e. "this instruction takes x cycles, we can physically
count how long our computation will take with a sheet of graph paper" because
of pre-fetching, pipelining, cache patterns, the use (or lack of use) of the
AVX(2) registers, etc etc. [3][https://software.intel.com/en-
us/articles/intelr-xeon-phitm-...](https://software.intel.com/en-
us/articles/intelr-xeon-phitm-coprocessor-february-developer-webinar-qa-
responses) [MS Paper]
[http://sigops.org/sosp/sosp15/current/2015-Monterey/printabl...](http://sigops.org/sosp/sosp15/current/2015-Monterey/printable/227-dragojevic.pdf)
[Batt] As you probably have guessed, I have no formal EE training. I did,
however, rigorously read all of the safety datasheets, use very high quality
Panasonics, and strictly conform to the CC-CV guidelines so my father's lab
would not catch on fire. This is probably not the best idea, but it's
electrically safe and isolated. It'd pass UL certification... at least I
think ;) [licensing] If you're a corporate entity moving into any sort of
cloud or onto new hardware, my firm has quite a bit of expertise in licensing
compliance when either a) shifting to new hardware, or b) shifting to the
cloud, as well as getting the best bang for your buck with existing licenses.

~~~
voltagex_
>I cobbled together 18650's in the rack[see: batt] instead of a UPS with a uC
which fires off a 'persist state to disk' message on any power interrupt

How did your tests go with this? I'd totally use a system like this - properly
packaged up of course. Li-ion and Li-poly make me nervous.

~~~
iheartmemcache
I trust it as much as commercial grade APCs I've used in DCs. It definitely
cost more in my engineering time than buying an off-the-shelf component, since
I spent a lot of time researching it. But I'm a hardware nerd at heart and
rarely get to putz around with this kind of stuff, so it didn't really seem
like labor to me.

The precautions I took were probably overkill, but since it wasn't my lab I
chose to go pretty overboard. In addition to conforming to the strict charge-
discharge guidelines Panasonic thoroughly delineates in their sheets, I 1:
used "protected 18650's" (actually designated as 19760), which have internal
circuitry to keep, say, cheap Chinese eBay charge units from setting people's
houses on fire, and 2: sourced the batteries from a vendor I've trusted for
ages (since the Panasonic 'greens' have such a popular reputation, knock-offs
make it into all sorts of markets, including first-tier shops like Amazon and
Digikey, so I asked my vendor to source directly from Panasonic JP, which he
did, and he provided me with a packing slip). Fear of Li-* is reasonable, and
I figure risking permanent damage to one's body and/or life isn't worth the 7
bucks you save buying no-names -- go with the 19760's and live life to the
fullest! (Pro-tip: the form factor of the 760s is about a mm or 1.2 mm longer
than unprotected 850s due to the added circuitry -- which I presume is just a
PTC thermistor + kill switch, though I haven't looked it up. I'm sure there's
a mass difference too, if you have access to lab-quality scales or are
friendly with your local cannabis vendor.)

Thrown in the case, I have a live K-type thermocouple in all 8 of the 1U's (a
pair in each rack) with conservative failure policies should temperatures
exceed the parameters I set. I also put in some nice fusing (HRC... why?
Because I was already going overkill, this wasn't an off-the-shelf component,
and I'm not an EE with extensive PDU experience, so I decided to play it
safe. I also consulted with a post-doc buddy of mine who's actively in
lithium battery research and who, after a good chuckle, told me the
precautions I took were satisfactory.) It came in at around 5k USD for all 8
units with everything, including ~115 a piece for the Hammond 1U's. Each unit
can sustain a little under 4.5 kVA, which gives more than enough buffer time
for a graceful shutdown.

------
tosseraccount
Passmark has some benchmarks, including the E5 v3s , updated today ...

[http://www.cpubenchmark.net/high_end_cpus.html](http://www.cpubenchmark.net/high_end_cpus.html)

"The King of the Hill" is only 2.3 GHz?

~~~
biot
The article explains why:

    
    
      What one will see, very quickly, is that the new SKUs generally offer
      slightly lower clock speeds to maintain a 45w TDP. Maintaining this
      figure while adding 50% to 100% more cores and cache is no small feat
      and it makes sense that clock speeds suffer. We also see the TDP
      figures rise to 65w in order to accommodate more cores and higher
      clock speeds.
    
      We introduced the Core * Base GHz and Thread * Base GHz figures just
      to show how much of an improvement this is. The new chips represent
      double the cores but up to about 62% more clock cycles in aggregate
      over what we had as the previous fastest chip, the Intel Xeon D-1541.
      We also now, in the same TDP figure, have 23% more raw compute.
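
Those aggregates work out if we assume the comparison is the 16-core D-1587
at a 1.7 GHz base, plus a 16-core 1.3 GHz part for the 45w comparison,
against the 8-core, 2.1 GHz D-1541 (the exact SKUs and base clocks here are
my assumption):

```python
# Core * Base GHz aggregates, under these assumed base clocks:
#   D-1541: 8 cores @ 2.1 GHz (previous fastest, 45w)
#   D-1587: 16 cores @ 1.7 GHz (new top part, 65w)
#   new 45w part: 16 cores @ 1.3 GHz
d1541 = 8 * 2.1     # 16.8 core-GHz
d1587 = 16 * 1.7    # 27.2 core-GHz
new_45w = 16 * 1.3  # 20.8 core-GHz

print(round(d1587 / d1541 - 1, 2))    # 0.62 -> "62% more clock cycles"
print(round(new_45w / d1541 - 1, 2))  # 0.24 -> roughly the quoted 23%
```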

~~~
voltagex_
With the "23% more raw compute" figure, say I've got a program running on an
older system - do I need to recompile that binary (with new optimisations /
assembly instructions) to see that improvement or is that an improvement in
efficiency for existing instructions?

~~~
cbhl
If your program is single-threaded, it will run slower. If it parallelizes
well (e.g. ray tracing) then you will see that improvement.
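
That answer is essentially Amdahl's law: more, slower cores only help the
fraction of the program that parallelizes. A quick sketch with illustrative
core counts and base clocks:

```python
# Amdahl's-law-style throughput model: a fraction p of the work
# parallelizes across cores, the rest runs serially on one core.
# Core counts and clocks below are illustrative, not specific SKUs.

def throughput(cores, ghz, p):
    serial = (1 - p) * (1 / ghz)        # serial part: one core's speed
    parallel = p * (1 / (ghz * cores))  # parallel part: all cores
    return 1 / (serial + parallel)

# 8 cores @ 2.1 GHz vs 16 cores @ 1.7 GHz:
for p in (0.0, 0.5, 0.99):
    old = throughput(8, 2.1, p)
    new = throughput(16, 1.7, p)
    print(p, new > old)  # the new chip only wins at high parallel fractions
```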

------
dman
Is there some cloud for developers where one can try out new hardware from
Intel / Nvidia / AMD? This is turning out to be a great year for hardware and
theres too much new stuff coming out for my home lab.

~~~
Marat_Dukhan
I am working on [http://www.peachpy.io](http://www.peachpy.io) (an academic
project) which lets performance tuning experts optimize assembly kernels for
various x86-64 microarchitectures. We currently have Intel Nehalem, Intel Ivy
Bridge, Intel Haswell, Intel Broadwell, Intel Skylake, AMD Piledriver, AMD
Steamroller, Intel Bonnell (1st-gen Atom) and AMD Bobcat in the pool.

Source code:

- [https://github.com/PeachPy](https://github.com/PeachPy)

- [https://github.com/Maratyszcza/PeachPy](https://github.com/Maratyszcza/PeachPy)

~~~
jusssi
On the site, the "About PeachPy" link at top right corner seems to do nothing.

~~~
Marat_Dukhan
Peachpy.io is a work-in-progress...

------
simplexion
I wonder how much it will cost to have 2 of these in a server running Windows
Server 2016.

~~~
Sanddancer
These can only support a single socket per box. Also, Windows Server has
charged per physical processor for the past several editions, rather than per
core, so the $400 version would do. Now, if you wanna see sticker shock, ask
what Oracle will charge to run on the box.

~~~
simplexion
Server 2016 will be charging per core.

------
bhouston
We are trying out a few Intel Xeon D-1520 machines as the primary servers
for [https://Clara.io](https://Clara.io). They seem to be working fine at a
low cost.

------
intrasight
> ahead of the Xeon E5 V3 series

I'm confused (probably because I don't understand Intel Xeon). Isn't that
Haswell, and thus now two generations back?

~~~
Sanddancer
The server chips are pretty much always a generation back, having a
development cycle to try to fix any errata that may have popped up, as well
as to rearrange cores and cache, add a good amount of IO, and ditch things
like the onboard graphics.

Intel is supposed to release the E5 v4s in the next few weeks, which will be
Broadwell-based. edit: These are also Broadwell-based, supplanting the
Haswells they released last year.

------
userbinator
I wonder what the 'D' stands for, as the next earlier Intel CPU with 'D' in
its model name was the Pentium D, where it meant "dual core".

~~~
wmf
It stands for "full employment for Intel's branding department". Calling it
Xeon E4 would have been way too simple.

