

Should I use InfiniBand on my Hadoop cluster? - sgt101

All my cool friends say InfiniBand is a must, but I think that it's too expensive. Who is right, and why should I believe them?
======
wmf
Isn't the point of Hadoop to use the network as little as possible?

~~~
sgt101
Workloads have to use the network to do the following:

1. Shift data onto HDFS (obviously); this is rare, though.
2. Write results after reduce (or you can't read them). When you are doing ETL
   this can be significant: you are sorting out an image for a production
   system, and that is a non-trivial amount of data (hundreds of GB at least).
3. Shuffle. This is the worry (for me): MR jobs can shuffle a lot of data
   around before reduce.
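
To put rough numbers on points 2 and 3, here is a back-of-envelope sketch of how long it takes to push a given volume of data through a single link. The 500 GB figure and the link rates are assumptions picked for illustration (nominal line rates, not measurements), and `transfer_seconds` is a hypothetical helper. It ignores protocol overhead, disk speed, and the fact that a real shuffle is spread across many nodes and links.

```python
def transfer_seconds(data_gb: float, link_gbit_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Seconds to move data_gb gigabytes over one link.

    efficiency is the fraction of the nominal rate actually achieved
    (1.0 means the full line rate, which is optimistic).
    """
    gigabits = data_gb * 8  # 8 bits per byte
    return gigabits / (link_gbit_per_s * efficiency)


# Nominal rates in Gbit/s. These are illustrative assumptions only.
LINKS = {
    "1 GbE": 1.0,
    "10 GbE": 10.0,
    "40 Gbit/s interconnect": 40.0,
}

if __name__ == "__main__":
    data_gb = 500  # hypothetical job moving "hundreds of GB"
    for name, rate in LINKS.items():
        secs = transfer_seconds(data_gb, rate)
        print(f"{name}: about {secs / 60:.1f} min for {data_gb} GB")
```

If the network time from a calculation like this is small next to the map and reduce compute time for your jobs, the faster interconnect probably isn't buying much; if it dominates, it might.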

