
I wonder how effective this would be on a fleet of Raspberry Pis. With things like Resin.io, Weave, and Kubernetes, it might be possible to create something like SETI@home for crowdsourced machine learning across all kinds of applications. Many of us have spare Raspberry Pis lying around that could be put to use in a global network.



You'd probably have to scale to hundreds or thousands of Pis to match the performance of a single $100-200 GPU.


One person has managed to build a non-accelerated version of TensorFlow for the Raspberry Pi. It can run a trained network, but training will be painfully, PAINFULLY slow (as in months or years of wall-clock time).

Maybe at some point it will be viable, but not with the hardware and software as they stand at the moment.


Notice that while 8 GPUs are 8x as effective as 1, 16 GPUs only get you 15x, and a hundred GPUs don't even get you a 70x speedup.
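
A quick back-of-the-envelope on those figures (parallel efficiency = speedup / GPU count; the numbers are the ones quoted above):

    # Parallel efficiency for the speedups quoted above.
    for gpus, speedup in [(8, 8), (16, 15), (100, 70)]:
        # Efficiency = fraction of ideal linear scaling actually achieved.
        print(f"{gpus:>3} GPUs: {speedup}x speedup -> {speedup / gpus:.0%} efficiency")
    # Prints 100%, ~94%, and 70% -- and that's with a datacenter-grade
    # interconnect, not a fleet of Pis on home broadband.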

I doubt your idea would prove efficient.


At the very least, computation could be distributed at the hyperparameter tuning stage. Each node would be responsible for training on a different set of hyperparameters. The master node would coordinate the selection and distribution of new hyperparameter sets, amortizing the data-set distribution time.
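
A minimal sketch of that coordination pattern, using Python's multiprocessing pool as a stand-in for the fleet of nodes; train_and_score is a hypothetical per-node training job, not anything from the article:

    import random
    from multiprocessing import Pool

    def train_and_score(params):
        # Hypothetical placeholder: a real node would train a full model on
        # its local copy of the data-set with these hyperparameters.
        learning_rate, batch_size = params
        return random.random()  # stand-in for validation accuracy

    if __name__ == "__main__":
        grid = [(lr, bs) for lr in (0.1, 0.01, 0.001) for bs in (32, 64, 128)]
        with Pool(processes=4) as pool:  # one worker process per "node"
            scores = pool.map(train_and_score, grid)
        best_score, best_params = max(zip(scores, grid))
        print("best hyperparameters:", best_params, "score:", best_score)

The data-set is distributed once up front; after that the master only ships tiny hyperparameter tuples, which is why the distribution cost amortizes.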

It would also be possible to distribute computation of batches across nodes. Each node would compute the gradients on its batch, and the master would combine gradients and distribute new weights.

High-speed interconnects (e.g. InfiniBand) are not needed in this scenario, and the bandwidth usage scales with the size of the weights and/or gradients, not the data-set size.
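
A hedged sketch of that synchronous scheme, with a linear least-squares model in plain NumPy standing in for a real network; the point is that each node ships back only a gradient the size of the weight vector, regardless of how much local data it holds:

    import numpy as np

    def node_gradient(weights, batch_x, batch_y):
        # Mean-squared-error gradient for a linear model -- a stand-in for
        # whatever model each node actually trains.
        residual = batch_x @ weights - batch_y
        return batch_x.T @ residual / len(batch_y)

    rng = np.random.default_rng(0)
    true_w = rng.normal(size=5)
    batches = []  # four "nodes", each holding its own local slice of the data
    for _ in range(4):
        x = rng.normal(size=(64, 5))
        batches.append((x, x @ true_w))

    weights = np.zeros(5)
    for step in range(200):
        # Each node computes a gradient on its local batch; only these small
        # vectors (not the data) cross the network.
        grads = [node_gradient(weights, x, y) for x, y in batches]
        weights -= 0.1 * np.mean(grads, axis=0)  # master averages and updates

    print("distance from true weights:", np.linalg.norm(weights - true_w))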


Moving data would be a bottleneck for sure. Distributing the model itself, along with the state of the computation and the required samples, is just too much compared to the CPU power a Raspberry Pi can provide.


I would have liked more detail about the cluster. You would still need high-speed interconnects (like InfiniBand) between the nodes/machines, so I don't think crowdsourcing would work.

This could be interesting if ported to an FPGA, though. That could give you a better power/performance tradeoff.


It depends on what kind of algorithms you are using for classification/ML. Some algorithms can be distributed easily, like recommendation engines; others, like SVMs, are much harder to distribute.

If you check out Apache Mahout, you can get an idea of what is possible and what is not.



