Hacker News
Hoard: Distributed Data Caching System to Accelerate Deep Learning Training (arxiv.org)
38 points by godelmachine on Dec 11, 2018 | hide | past | favorite | 3 comments


It doesn't seem like they actually provided or did anything new here. (Hard to tell, because there's also no obvious code. Is it just cachefsd?)

About a year and a half ago I did something similar with essentially the exact same hardware.

I ran the BeeGFS filesystem across three IBM Power8 Minsky machines with NVMe drives to distribute data quickly, in parallel, and effectively across multiple machines. It really helped with scripting batch jobs, since the filesystem was unified across the platform and had some good in-memory caching.

I have the numbers somewhere, but I felt theirs weren't really that great considering 40 Gbps Mellanox and NVMe; you should be able to get quite a bit better throughput. I also ran thousands of jobs and many TB of data, not just over 24 gigabytes.

(FWIW, I didn't read the actual article that thoroughly.)


Has the code for this been released at all? I've gone through the paper and can't seem to find it.


[flagged]




