

Amazon Orders More than 10,000 Nvidia Tesla cards
http://vr-zone.com/articles/amazon-orders-more-than-10-000-nvidia-tesla-k10-cards-k20s-to-follow-/17340.html

======
Karhan
I remember reading a blog post about the peculiarities of GPU programming. It
noted that on most modern graphics cards (at the time), if you can keep your
computable data in chunks no bigger than 64KB apiece, you can expect to see
enormous performance gains on top of what you'll already get from using
OpenCL/CUDA, because of a physical on-chip memory limit on the GPU itself.

I also remember thinking that a 64KB row size for DynamoDB was very odd.

I wonder if these things are at all related.
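
For what it's worth, the limit being described sounds like the GPU's on-chip
shared memory. A minimal sketch (not from the blog post) of how you can query
it through the CUDA runtime:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);   // properties of device 0
        // Fermi-class cards report 48KB here; the 64KB figure is the combined
        // shared memory + L1 budget per multiprocessor.
        printf("Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("L2 cache size: %d bytes\n", prop.l2CacheSize);
        return 0;
    }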

~~~
pavanky
You are talking about shared memory in CUDA, local memory in OpenCL. When you
are reading from the same locations over and over again (the most notable
cases being linear algebra routines and filtering in signal processing),
reading from DRAM is going to be costly. On CPUs this is solved by having
multiple layers of caches.

Early generations of NVIDIA GPUs did not have an automatic caching mechanism
(or could not use it from CUDA, I forget) that could help solve this issue.
But they did have memory available locally on each compute unit that you could
manually read / write data into. This helped reduce the overall read/write
overhead.

Even now that the newer generations have caches, it is still beneficial to use
this shared / local memory. And when the shared / local memory limits are hit,
there are alternatives like textures in CUDA and images in OpenCL that are
slightly slower, but still significantly faster than reading from DRAM.
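
To make the manual staging concrete, here is a minimal sketch (illustrative,
not from any vendor sample; TILE, RADIUS and boxFilter are made-up names) of a
1-D box filter where each block copies its slice of the input into __shared__
memory once and then reuses it, so the repeated reads are served on-chip
instead of from DRAM. It assumes a launch with blockDim.x == TILE, e.g.
boxFilter<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n):

    #define RADIUS 4
    #define TILE   256

    __global__ void boxFilter(const float *in, float *out, int n) {
        __shared__ float tile[TILE + 2 * RADIUS];

        int gid = blockIdx.x * blockDim.x + threadIdx.x;   // global index
        int lid = threadIdx.x + RADIUS;                    // index into the tile

        // Each thread stages its own element (plus a halo element, for the
        // first RADIUS threads) into shared memory exactly once.
        tile[lid] = (gid < n) ? in[gid] : 0.0f;
        if (threadIdx.x < RADIUS) {
            int left  = gid - RADIUS;
            int right = gid + blockDim.x;
            tile[lid - RADIUS]     = (left >= 0) ? in[left]  : 0.0f;
            tile[lid + blockDim.x] = (right < n) ? in[right] : 0.0f;
        }
        __syncthreads();   // whole tile must be loaded before anyone reads it

        // 2*RADIUS+1 reads per output element, all served from shared memory.
        if (gid < n) {
            float sum = 0.0f;
            for (int k = -RADIUS; k <= RADIUS; ++k)
                sum += tile[lid + k];
            out[gid] = sum / (2 * RADIUS + 1);
        }
    }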

------
mercuryrising
Amazon's cloud might be one of the coolest things I've seen in a while: hop
on, get some of the best computing performance available, hop off, and save
some money. If you have a random data analysis problem that would take your
computer three weeks, why not just pay $10 and get it done in two hours (plus
a few hours of debugging)?

If the article is correct, Amazon paid $15 million for those cards, which will
be out of style in about two years (not that they have to get rid of them, but
something faster, easier to maintain (if Nvidia starts opening up to Linux),
with more memory and less power usage will come out). They'll have to fork
over a large sum of money again to keep their top "on demand computing" title.

Amazon's cluster GPU instance right now has two Nvidia Tesla Fermis in it. I'm
going to assume Amazon will split their new cards into twos and fours, at
about half of each. That's ~1750 new computers that are going to load up.
Looking at the current rates of the cluster, it's $2.10 for an hour of the
normal instance, so I'll say it will be $4.20 for an hour on the jumbo with 4
GPUs.

They paid $15 million for just the cards. They need to get 2,380,952 hours of
usage out of the machines to break even on the cards. They need to log 1,360
hours per machine to break even, or have someone run all the machines at full
bore for 56 days. As the cards are the most expensive component (assumption),
and the total price of the rest of the computer will be about the price of one
of the cards, we'll add a bit of overhead for all the other things they need
to do to make it work - 120 days of full-time use to break even on an
investment of about $25 million (they need to buy lots of other things to put
all the GPUs in, and worry about all that heat, and have a place to put it
all, and have people install the new computers, etc...). I wonder what the
actual usage of those clusters is, and whether they've had anyone sign a deal
saying "we'll use the cluster for an entire month." That's a beautiful
maneuver though: say CERN didn't want to do all the data analysis from the LHC
in house, because by the time they got to this part of the experiment, the
technology they had purchased previously would be way out of date. Just let
Amazon do it. They will always have the latest technology, and you'll have an
inexpensive way of leveraging that power.
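
For the break-even numbers above, the rough arithmetic (as far as I can
reconstruct it) treats an hour on a two-GPU box plus an hour on a four-GPU box
as a single $6.30 unit:

    $15,000,000 / ($2.10/hr + $4.20/hr) ~= 2,380,952 hours
    2,380,952 hours / ~1,750 machines   ~= 1,360 hours per machine
    1,360 hours / 24                    ~= 56.7 days at full utilization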

Assuming they can make it all work (and I'm sure a lot of their decisions now
are strategic decisions aimed at future investments), this is a great time to
be a computer user: log on and get the best for a couple of hours for a couple
of dollars. Instead of shelling out $1,500 on a new computer personally, I
could log a ton of EC2 hours on significantly faster, more powerful machines
that never get 'stale' and whose lives are put to much better use (my computer
probably doesn't do anything "intensive" 70% of its life, whereas the EC2
machines are probably pushed a bit harder than that).

~~~
dave_sullivan
I've looked hard at using Amazon's GPU clusters, and the math just doesn't
work out.

If you're working on applications that will need to use the GPU regularly, you
can build a system with four GTX 580s for about $3,000, and one of those
systems will outperform 2, maybe 3, AWS GPU instances, which will run you
about $1,000 per month each (so two instances cost more in two months than the
box does up front). The ownership number does not include data
center/power/etc., but I still think buying is the better value if you'll be
using it a lot.

Now, if you're running GPU jobs sporadically, AWS may make sense, but you
should really look at this carefully; it's not the same value relationship as
hosting web servers on AWS (which I'm a general proponent of).

Although, to be fair, that may change if they really do pass on some of their
savings from this deal to the user.

~~~
samg_
Yeah, I did a few trial runs on the cluster GPU instances maybe 4 months ago.
I found that, while the GPUs themselves were really quite fast, moving data in
and out of the GPU was not. Maybe Amazon will focus on increasing bandwidth to
the GPUs for the new boxes.

~~~
pavanky
To be fair, there should not be a lot of data transfer to and from the GPU in
the first place. When you are running out of device memory, moving larger
chunks of data (instead of many smaller ones) and using asynchronous data /
compute streams to overlap transfers with kernels will increase performance.
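
A minimal sketch of the streams idea (illustrative names; "process" stands in
for whatever kernel you actually run): split the input into chunks, give each
chunk its own stream, and use pinned host memory so cudaMemcpyAsync on one
chunk can overlap with kernels from the others.

    #include <cuda_runtime.h>

    __global__ void process(float *d, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] = d[i] * 2.0f;              // stand-in for real work
    }

    int main() {
        const int N = 1 << 24, CHUNKS = 4, CHUNK = N / CHUNKS;

        float *h, *d;
        cudaMallocHost((void **)&h, N * sizeof(float));   // pinned host buffer
        cudaMalloc((void **)&d, N * sizeof(float));

        cudaStream_t s[CHUNKS];
        for (int c = 0; c < CHUNKS; ++c) cudaStreamCreate(&s[c]);

        for (int c = 0; c < CHUNKS; ++c) {
            size_t off = (size_t)c * CHUNK;
            cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                            cudaMemcpyHostToDevice, s[c]);
            process<<<(CHUNK + 255) / 256, 256, 0, s[c]>>>(d + off, CHUNK);
            cudaMemcpyAsync(h + off, d + off, CHUNK * sizeof(float),
                            cudaMemcpyDeviceToHost, s[c]);
        }
        cudaDeviceSynchronize();                    // wait for all streams

        for (int c = 0; c < CHUNKS; ++c) cudaStreamDestroy(s[c]);
        cudaFree(d);
        cudaFreeHost(h);
        return 0;
    }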

