
Google Compute Engine MPI Latency - gpoort
http://blog.rescale.com/mpi-latency-on-google-compute-engine/
======
montecarl
This isn't that interesting or surprising. 100-200 microseconds is the latency
of every Ethernet network I have ever seen. Infiniband or other high-performance
networks can achieve roughly 10-100x lower latency, but they are very expensive:
Infiniband switches and cards can double the cost of a cluster.
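Latency figures like these typically come from a ping-pong benchmark: bounce a small message back and forth many times and report half the average round trip. A minimal sketch of that scheme, using TCP over loopback rather than MPI (all names here are illustrative, not from any benchmark suite):

```python
# Ping-pong latency sketch: send a tiny message, wait for the echo,
# and take half the mean round-trip time as the one-way latency.
import socket
import threading
import time

def echo_server(srv):
    # Accept one connection and echo everything back until it closes.
    conn, _ = srv.accept()
    with conn:
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)

def measure_latency_us(iters=1000):
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))   # let the OS pick a free port
    srv.listen(1)
    threading.Thread(target=echo_server, args=(srv,), daemon=True).start()

    cli = socket.create_connection(srv.getsockname())
    # Disable Nagle's algorithm so small messages go out immediately.
    cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    msg = b"x" * 8
    start = time.perf_counter()
    for _ in range(iters):
        cli.sendall(msg)
        cli.recv(64)             # wait for the echo before the next send
    elapsed = time.perf_counter() - start

    cli.close()
    srv.close()
    return elapsed / iters / 2 * 1e6   # one-way latency in microseconds

if __name__ == "__main__":
    print(f"loopback one-way latency ~ {measure_latency_us():.1f} us")
```

Loopback skips the NIC entirely, so the number it prints is a floor, not a real network measurement; the point is only the measurement scheme.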

~~~
sargun
One of the things Google is also working on is PCIe switching, using it to
avoid the Ethernet encapsulation and conversion entirely. This allows for
significantly lower latencies, below 5 μs.

~~~
montecarl
That sounds awesome. Do you have any links describing this research? I would
be interested in the technology if it stands to lower the cost of low-latency
networks.

------
codemac
To me this is where the "software defined networking" type of virtualization
can really make an impact.

The network performance of a known cluster of virtualized instances could be
extremely quick if you just lie and _say_ the packet went through a network,
when really you just pass a pointer in the hypervisor.

I assume this has already been done, but at almost 200 microseconds, you know
it hasn't been done in these experiments.
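A toy illustration of the idea: between two processes on one host, bytes can move through the kernel with no Ethernet framing at all, which is the same trick a hypervisor or an MPI shared-memory transport plays. Sketched here with a `multiprocessing` pipe (the helper names are my own):

```python
# Two local processes exchanging a message with no network stack
# involved: a multiprocessing Pipe moves the bytes through the
# kernel directly, analogous to a hypervisor passing data between
# co-located VMs without touching a NIC.
from multiprocessing import Process, Pipe

def echo(conn):
    # Child process: receive one message and bounce it straight back.
    msg = conn.recv()
    conn.send(msg)
    conn.close()

def round_trip(payload):
    parent, child = Pipe()
    p = Process(target=echo, args=(child,))
    p.start()
    parent.send(payload)       # no Ethernet encapsulation on this path
    result = parent.recv()
    p.join()
    return result

if __name__ == "__main__":
    assert round_trip(b"hello") == b"hello"
```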

~~~
montecarl
That is only the case for networking inside a single physical machine. Most
HPC MPI use cases span many machines, and for those this is typical Ethernet
latency.

~~~
codemac
In cases where I've used OpenMPI it was spanning machines as well as being
multiple processes on the same box. The goal was making that interchangeable
(in my use case).

I imagine Google's compute engine isn't all on separate machines, but utilizes
VMs heavily, although it's probably a bad idea to put all of one customer's
VMs on the same box.

That's a long way of saying "of course you're right, I guess my thought
doesn't contribute as much as I thought" :)

~~~
montecarl
You are also right, of course. Intra-machine latency is quite important. Many
problems can be decomposed into smaller parallel parts that can be done per
machine.

