

Nvidia Tesla Supercomputer (960 cores) - wesleyd
http://www.nvidia.co.uk/page/personal_computing.html

======
fireteller
(I can't see any of the 11 comments, so forgive me if this post is redundant in
some way)

Visual effects production often needs lots of computing resources for
rendering, a task that can be heavily parallelized. Usually this is done on a
frame-per-core basis, with memory being the limiting factor. I've started
using EC2 recently, which made me realize a different approach to rendering
tasks is possible.

Normally a facility has a fixed number of CPUs available to run jobs, say 100
CPUs. Let's say the average number of frames to render is usually around 200.
So if each frame takes an hour to render and yours is the only job on the
queue, it will take 2 hours to see the complete shot. And it's much worse in
the real world, with many artists working on many shots. Whereas with a
resource like EC2 you can always render your entire shot in 1 hour regardless
of the number of frames. You're only limited by the time it takes 1 frame to
render, and the cost is the same whether you use 1 CPU or 200.
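The arithmetic above can be sketched in a few lines (a toy model, assuming
perfectly independent frames and uniform per-frame render times; the numbers
are the 100-core / 200-frame example from the comment):

```python
import math

def wall_clock_hours(frames, cores, hours_per_frame=1.0):
    """Wall-clock time to render a shot: frames are independent,
    so the farm works through them in waves of `cores` at a time."""
    return math.ceil(frames / cores) * hours_per_frame

# Fixed 100-core farm, 200-frame shot: two waves -> 2 hours.
fixed = wall_clock_hours(frames=200, cores=100)

# Elastic (EC2-style): rent one core per frame -> one wave, 1 hour.
# Total core-hours (and so cost) are the same either way.
elastic = wall_clock_hours(frames=200, cores=200)

print(fixed, elastic)  # 2.0 1.0
```

The depth-for-breadth trade is exactly the `cores` parameter: cost scales with
`frames * hours_per_frame` regardless, while latency scales with the wave count.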

In other words, one can trade depth for breadth. Now this may seem obvious to
anyone familiar with EC2, but for me it means that tools like Tesla, which
were once in high demand for this type of work, are now much less valuable. I
would expect that its price/performance is much better than EC2's, but where
is the cutoff? How many hours do you have to run that Tesla to come out ahead?
And if you're running 1 Tesla for that many hours, might it not be worth a
premium to get your answer sooner by running more massively parallel (but for
less wall-clock time) on an EC2-like service?

I suspect these types of tools will remain relevant only for real-time
applications (due to reduced latency vs. EC2) and/or nonstop computing
(assuming there's a big price/performance win).

~~~
wmf
Pretty soon someone will rent you GPUs by the hour, at which point you'd have
the advantages of elasticity and price/performance.

~~~
andr
That actually sounds like a neat business.

~~~
twak
It's easy to imagine EC2 expanding beyond 1U servers and databases to GPGPUs,
FPGAs, or quantum cores. There's a lot of potential for a GPU cloud, from
quick rendering (xtraNormal/nVidia Gelato) to scientific computing
(Larrabee?).

------
rys
It's not quite 960 cores. There are 960 main ALUs across the system, but
they're grouped 8-wide per unit you'd maybe call a core (the streaming
multiprocessor), and those 8 run the same program instruction per clock. There
are multiple partitions to execution on top of that, too: 3 cores share a
memory space, 10 to a chip, each chip with access to board memory, and four
chips in the complete Tesla system.

Those 960 ALUs are just the main FP32 and INT32 hardware, too. There are
others (FP32 MUL and special function units, and an FP64 ALU too, per core).

------
blahblahblah
This is probably very cool for people doing CG work. It's still too limited
for most scientific work, though, because only single-precision floating point
is supported. If they made a version of this that supports double-precision
arithmetic, and supported it in the BLAS library so that parallelization of
matrix operations happens automatically without changing existing MATLAB code,
these things would sell like crazy in the scientific community.

~~~
wmf
CUDA now supports double precision and BLAS.

------
andr
I'd love me 2 of those. Has anyone had experience with CUDA?

~~~
lutorm
Yes.

It's pretty nice. As nice as C can get, that is... The biggest hurdle for me
was thinking data-parallel as opposed to sequentially or even task-parallel.
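That mental shift can be illustrated even without a GPU (a toy sketch in plain
Python; `saxpy` is just a stock example, not anything from this thread, and a
list comprehension over independent elements stands in for a grid of CUDA
threads):

```python
# Sequential habit: walk the data, carrying state from step to step.
def saxpy_sequential(a, x, y):
    out = []
    for i in range(len(x)):
        out.append(a * x[i] + y[i])
    return out

# Data-parallel habit (CUDA-style): write the per-element kernel,
# then apply it to every index independently. No element's result
# depends on any other, so all of them could run at once.
def saxpy_kernel(a, xi, yi):
    return a * xi + yi

def saxpy_parallel(a, x, y):
    return [saxpy_kernel(a, xi, yi) for xi, yi in zip(x, y)]

print(saxpy_parallel(2.0, [1, 2, 3], [10, 20, 30]))  # [12.0, 24.0, 36.0]
```

The hard part lutorm mentions is restructuring problems so the kernel really is
independent per element, rather than splitting a sequential job into a few
coarse tasks.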

~~~
chasingsparks
I use CUDA for finance-based Monte Carlo simulations. Unless you're somewhere
with access to a huge cluster, some problems simply cannot be solved without
CUDA.

~~~
light3
One downside to CUDA is that it's massively powerful for single-precision FP,
but double-precision FP performance is less than 1/10 of single-precision
performance.

When precision is needed, CUDA is much less useful: say you're running 10^10
simulations, then with single-precision FP you will only have a result
accurate to around 5 significant figures.
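How many figures you actually lose depends on the estimator, but the hard
floor is float32's 24-bit significand: once a running sum reaches 2^24, adding
1.0 to it is rounded away entirely. A minimal sketch (using `struct` to
emulate single-precision rounding, since Python floats are doubles):

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest IEEE-754 single."""
    return struct.unpack('f', struct.pack('f', x))[0]

# At 2^24 the spacing between adjacent float32 values is 2.0, so a
# naive single-precision accumulator stalls: every +1.0 rounds back.
total = to_f32(16777216.0)  # 2^24
for _ in range(1000):
    total = to_f32(total + 1.0)

print(total)  # still 16777216.0, not 16778216.0
```

This is why long accumulations on single-precision hardware need tricks like
pairwise/Kahan summation, or the double-precision support later CUDA parts
added.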

------
helveticaman
From the title, I was expecting something about the onboard computer of the
acclaimed nerdmobile.

------
Zak
From what little I know of the subject, I can see this becoming the dominant
platform for game servers, especially MMO games, within a few years.

------
Hates_
So what would be the primary use of something like this?

~~~
zandorg
Either a map-reduce-type operation, or a job that can be split into multiple
processes, with all their data cobbled together into a result at the end (or
periodically).

Optical character recognition could be done by, for instance, recognising each
letter on a different core.
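A toy sketch of that split (the glyph "classifier" here is a made-up lookup
table, and a thread pool stands in for the many cores; real OCR would run a
trained model on each segmented letter image):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical glyph table: keys stand in for segmented letter images.
GLYPHS = {(0, 1, 0): 'A', (1, 1, 0): 'B', (1, 0, 1): 'C'}

def classify(glyph):
    """Per-letter work: each glyph is recognised independently."""
    return GLYPHS.get(glyph, '?')

def ocr(glyphs):
    # Map step: one worker (one core) per letter, all in parallel.
    with ThreadPoolExecutor() as pool:
        letters = pool.map(classify, glyphs)
    # Reduce step: cobble the per-core results back into a word.
    return ''.join(letters)

print(ocr([(0, 1, 0), (1, 1, 0), (1, 0, 1)]))  # ABC
```

The catch palish raises below is the segmentation itself: the map step only
works once you know where each letter starts and ends.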

~~~
palish
Assuming you know where each letter starts and ends. So each word maybe.

------
MikeCapone
New Folding@home super-cruncher?

------
FlorinAndrei
How many FPS in Quake?

(just kidding)

------
zandorg
Wow, a link I posted 2 days ago (also .co.uk) got frontpage'd. :-D

I'm gonna love one of these babies for my AI work!

