
Building a Distributed Hypervisor on FreeBSD (2015) [video] - xj9
https://www.youtube.com/watch?v=f-ug6B6ycng
======
mtanski
This is interesting but short on details. Is it essentially creating a virtual
many-core / NUMA machine? If so, I wonder what the overhead is for things such
as emulating x86 cache coherency.

~~~
tg2
Furthermore, fundamental things such as the fact that CPU interconnects are in
the 160+ Gbps range, compared to (in their example) a 10 Gbps network, prevent
this from scaling as a single system would. Even in a single dual-socket
system it is not uncommon to pin processes (via numactl) to a single CPU in
order to get full performance out of applications and avoid saturating this
interconnect.
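
For anyone who hasn't done this kind of pinning, here is a rough sketch of the
CPU half of it from inside a process, using Python's standard library on
Linux. The helper name pin_to_cpus is my own, and this only restricts which
CPUs the process may run on. It does not bind memory the way numactl's
--membind option does, so treat it as an illustration rather than a
replacement.

```python
import os


def pin_to_cpus(cpus):
    """Restrict the calling process to the given CPU ids (Linux only).

    Hypothetical helper for illustration; it sets CPU affinity only and
    does not control where memory is allocated.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity API not available here")
    os.sched_setaffinity(0, set(cpus))  # pid 0 means the calling process
    return os.sched_getaffinity(0)      # report the affinity now in effect


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    allowed = sorted(os.sched_getaffinity(0))
    # Pin to the first CPU we are currently allowed to use.
    print(pin_to_cpus(allowed[:1]))
```

Picking the CPU from the set the process is already allowed to use avoids
errors inside containers or cpusets that hide some cores.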

Accessing memory from a remote core over the network is seriously handicapped
in this respect, and purpose-built cluster interconnects such as InfiniBand
RDMA, with bandwidth in the 100 Gbps realm, still have issues with this
latency. A single stick of previous-gen DDR3-1600 can exceed that bandwidth,
with 15x faster access time.
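
The bandwidth half of that claim is easy to check by hand. A quick sketch,
assuming a standard 64-bit DIMM data bus and theoretical peak rates only (the
function name is just for illustration, and sustained throughput will be
lower):

```python
def ddr_peak_gbps(mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak bandwidth of one memory channel in Gbit/s.

    Assumes every transfer moves bus_width_bits bits; real sustained
    bandwidth is lower than this peak.
    """
    return mega_transfers_per_s * 1e6 * bus_width_bits / 1e9


if __name__ == "__main__":
    ddr3_1600 = ddr_peak_gbps(1600)
    print(f"DDR3-1600 peak: {ddr3_1600:.1f} Gbps")
    print(f"vs a 100 Gbps link: {ddr3_1600 / 100:.2f}x")
    print(f"vs a 10 Gbps link: {ddr3_1600 / 10:.2f}x")
```

That works out to 1600 MT/s x 64 bits = 102.4 Gbps, or 12.8 GB/s, for a single
channel.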

To give an example, the latency of a very low latency (non-Ethernet) network
is around 1300 ns (1.3 μs), local DRAM (from the same socket) is 60 ns, and L3
cache on the same socket is 15 ns.

You can be smart about pinning memory allocation to the local CPU core
requesting it, and minimizing thread migration to another host, but there is
no magic bullet to getting past these theoretical limits. Accessing packets
arriving on a network card in host A from host B's core would halve the
network bandwidth unless there is a dedicated network for the clustering.
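
To put a number on why placement helps but doesn't save you, here is a
back-of-the-envelope model of average memory access time when some fraction of
accesses go to another host. The 60 ns local figure is the one quoted above.
The remote latency is a parameter, and the 1000 ns default is an assumed round
number, not a measurement of their system. The model also ignores caching,
prefetching and queuing.

```python
def effective_latency_ns(remote_fraction, local_ns=60.0, remote_ns=1000.0):
    """Average access latency when remote_fraction of accesses are remote.

    Simple weighted average; remote_ns is an assumed placeholder value.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    for frac in (0.0, 0.01, 0.05, 0.25):
        avg = effective_latency_ns(frac)
        print(f"{frac:5.0%} remote -> {avg:7.1f} ns ({avg / 60.0:.2f}x local)")
```

Under these assumptions even a few percent of remote accesses noticeably
raises the average, which is why keeping threads and their memory on the same
host matters so much.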

That being said, virtualization isn't perfect either and the overhead can be
substantial as we've seen when comparing to containers on bare metal.

I'd really like to take this for a test drive and benchmark it with some R
jobs.

Even if it's slower, not having to use cluster-aware toolkits is valuable to
many, not to mention the simplicity of operation.

I wish they would release an open source version before getting sucked up as a
stop-loss by Intel.

~~~
mtanski
I imagine that an untuned guest operating system scheduler would wreak havoc
in this scenario.

