
High performance models in TensorFlow - mrry
https://www.tensorflow.org/performance/performance_models
======
jacquesm
How would you ever saturate 8 GPUs when your average large motherboard with 2
CPUs has only 40 PCIe lanes per CPU at its disposal? 8 GPUs at the full x16
would need 128 lanes, well beyond what even a dual-socket board provides.

Is there a motherboard out there that allows 8 GPUs with each of them
allocated the full 16 lanes?

The best I've found so far is 4 GPUs on one board, either at 16/8/8/8 or, in
rare cases, at 16/16/16/16 (but that requires 2 CPUs).

And that's besides the physical space, which again seems to limit you to 4
double-wide GPUs on one motherboard.

~~~
deepnotderp
Two CPUs max out at 80 lanes iirc.

~~~
jacquesm
That's correct: two Xeon 2600s do 40 lanes each, and that's the maximum you
can get right now. It's also why dual-CPU boards usually require both CPUs to
be installed before all PCIe slots become available: the PCIe interface is
now embedded in the CPU rather than in the chipset.

~~~
frozenport
You can get 4 CPUs easily, and even 8 CPUs.

~~~
jacquesm
40 lanes / CPU is the maximum afaik.

~~~
deepnotderp
Yup, 2x40.

It also depends on what you plan to do with the GPU. For example, models that
do most of their work on the GPU and rarely ingest data from the host, such
as large and slow models, will run just fine even with fewer lanes. On the
other hand, attempting to parallelize training across GPUs and nodes is a
chore...
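
To make that concrete, here is a minimal TF 1.x-style sketch (the shapes,
layer count, and weight sharing are made up purely to keep it short) of a
model whose weights and compute live entirely on one GPU, so the only PCIe
traffic per step is one input batch going in and a scalar loss coming out:

    import numpy as np
    import tensorflow as tf

    with tf.device('/gpu:0'):
        x = tf.placeholder(tf.float32, [None, 1024])      # the only host->device copy
        w = tf.Variable(tf.random_normal([1024, 1024]))   # weights never leave the GPU
        y = x
        for _ in range(50):                  # a deep stack: lots of GPU work
            y = tf.tanh(tf.matmul(y, w))     # per byte moved over the bus
        loss = tf.reduce_mean(y)             # (one shared w just keeps this short)

    config = tf.ConfigProto(allow_soft_placement=True)    # fall back to CPU if no GPU
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        batch = np.random.rand(64, 1024).astype(np.float32)
        print(sess.run(loss, feed_dict={x: batch}))       # tiny device->host copy

A model like this barely notices whether it has 8 or 16 lanes; the bus only
matters when the input batches are large relative to the compute per step.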

------
nyamhap
My bottleneck is still the speed of reading data from JSON. I wonder whether
I should wait for features to be built out here or go down the path of
writing a custom data reader in C++.

~~~
jacksnipe
If your data has a little extra structure that isn't shared by JSON in
general, you could probably get serious performance gains by rolling your own.
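
As a toy illustration of that point: suppose every record is known to be a
single line of the exact shape '{"x": <float>, "y": <float>}' with fixed key
order (a made-up schema, just for the sketch). Then a hand-rolled parser can
skip the general-purpose JSON machinery entirely:

    import json

    def parse_general(line):
        d = json.loads(line)             # handles arbitrary JSON, and pays for it
        return d["x"], d["y"]

    def parse_specialized(line):
        # e.g. line == '{"x": 1.5, "y": 2.25}\n' -- relies on the fixed layout
        x_part, y_part = line.split(',')
        return float(x_part[6:]), float(y_part[6:-2])

    line = '{"x": 1.5, "y": 2.25}\n'
    assert parse_general(line) == parse_specialized(line)

Whether the specialized version is actually faster (and by how much) depends
on the data, so it's worth profiling before committing to a C++ reader.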

------
feelix
It appears that Hacker News users upvote anything with machine learning or
TensorFlow in it. This is merely a FIFO queue implementation, which is not
particularly significant in any way. Why it was submitted, much less upvoted,
is beyond me.

~~~
mylittlethrow
One (not so charitable, I have to admit) interpretation is that there is a
sizable group of people interested in ML who don't have the more traditional
computer science background, and who thus find this kind of text far more
appealing; xkcd's "ten thousand" and all that [0].

A more extreme version of this situation is that of the medical researcher
who "reinvented" the trapezoidal rule for numerical integration on his own
[1].

[0]: https://xkcd.com/1053/
[1]: https://fliptomato.wordpress.com/2007/03/19/medical-researcher-discovers-integration-gets-75-citations/

~~~
aub3bhat
No, yesterday the TensorFlow team released benchmarks [1], and as part of
those benchmarks they simultaneously published the tips for ensuring high
performance [2] that they used when running them. The reason for publishing
the second link is that, due to memory being copied between Python and C++
(when using feed_dict), TensorFlow can appear slower than competing
frameworks, so they felt a need to point out the methods they used to keep
performance high by reducing I/O latency.

Looking at [2] without the context of [1] can be confusing. But no one is
trying to pass this off as "innovation".

[1] https://www.tensorflow.org/performance/benchmarks
[2] https://www.tensorflow.org/performance/performance_models
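
For anyone who hasn't read [2]: the gist of the feed_dict issue can be shown
with a small TF 1.x sketch (this is just an illustration of the queue idea,
not the exact StagingArea pipeline from [2]; the shapes and the random-tensor
"reader" are stand-ins):

    import tensorflow as tf

    queue = tf.FIFOQueue(capacity=64, dtypes=[tf.float32], shapes=[[128]])
    example = tf.random_uniform([128])       # stand-in for a real file-reader op
    enqueue_op = queue.enqueue(example)
    qr = tf.train.QueueRunner(queue, [enqueue_op] * 4)   # 4 background threads
    tf.train.add_queue_runner(qr)

    batch = queue.dequeue_many(32)           # consumed directly inside the graph
    loss = tf.reduce_mean(batch)

    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        for _ in range(10):
            print(sess.run(loss))            # no feed_dict: no per-step copy from
        coord.request_stop()                 # Python into the C++ runtime
        coord.join(threads)

With feed_dict, every sess.run() first copies the batch from Python into the
runtime; with a queue the data is already inside the runtime when the step
starts.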

~~~
mylittlethrow
Oh, sorry if I was unclear, but I didn't intend to say that the TensorFlow
team is trying to pass this off as innovation. I was thinking about why some
readers might find this interesting and upvote it even without the context.

