
Hoard: Distributed Data Caching System to Accelerate Deep Learning Training - godelmachine
https://arxiv.org/abs/1812.00669
======
ianmunoz
It doesn't seem like they actually provided or did anything new here.
(Hard to tell because there's also no obvious code. Is it just cachefsd?)

About a year and a half ago I did something similar with essentially the exact
same hardware.

A BeeGFS filesystem across three IBM POWER8 Minsky machines with NVMe drives
for distributing data quickly, in parallel, and effectively across multiple
machines. It really helped with scripting batch jobs, since the filesystem was
unified across the platform and had good in-memory caching.

I have the numbers somewhere, but I felt like theirs aren't really that great
considering 40 Gbps Mellanox and NVMe. You should really be able to get quite
a bit better throughput. I also ran thousands of jobs and many TB of data, not
just over 24 gigabytes.

(FWIW, I didn't read the actual article that thoroughly.)

------
tedivm
Has the code for this been released at all? I've gone through the paper and
can't seem to find it.

