
Thanks for the clarification. gRPC is slow. We have in-house experiments showing that, on RDMA-capable networks, an optimized implementation can achieve a significant speedup over gRPC. And I bet Google's internal version is even faster.

MXNet has a highly efficient network stack that's open source; Caffe2 uses Gloo, which is open source; CNTK primarily uses Open MPI and NCCL, and soon NCCL 2.0. I think it would be fair for Google to open-source its internal network stack as well, because that stack is the key to scaling.

Most convolutional networks are not a stress test for scaling, because their ratio of model size to computation is too low. Use a speech model with many fully connected layers, or VGG16/19, and the communication cost will dominate; that's when CNTK's 1-bit SGD and Block Momentum really shine.
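To illustrate why 1-bit SGD helps when communication dominates, here is a minimal sketch, assuming a simplified variant: each gradient element is sent as a single sign bit, positive and negative entries are each reconstructed as the mean of their group, and the quantization error is carried into the next step (error feedback). This is not CNTK's implementation; the published method reportedly chooses reconstruction values per matrix column rather than per tensor, and Block Momentum is not shown. The function names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical helpers illustrating a simplified 1-bit SGD with error
# feedback; NOT CNTK's actual implementation. Quantization is done per
# tensor here for simplicity.

def dequantize(bits: List[bool], pos_value: float, neg_value: float) -> List[float]:
    """Rebuild a dense gradient from sign bits and two reconstruction values."""
    return [pos_value if b else neg_value for b in bits]

def one_bit_quantize(
    grad: List[float], residual: List[float]
) -> Tuple[List[bool], float, float, List[float]]:
    """Return (bits, pos_value, neg_value, new_residual)."""
    # Error feedback: fold in the quantization error left over from the last step.
    x = [g + r for g, r in zip(grad, residual)]
    bits = [v >= 0.0 for v in x]
    pos = [v for v, b in zip(x, bits) if b]
    neg = [v for v, b in zip(x, bits) if not b]
    # Each group is reconstructed as its mean (0.0 if the group is empty).
    pos_value = sum(pos) / len(pos) if pos else 0.0
    neg_value = sum(neg) / len(neg) if neg else 0.0
    recon = dequantize(bits, pos_value, neg_value)
    # The error lost now is carried into the next step instead of being dropped.
    new_residual = [v - q for v, q in zip(x, recon)]
    return bits, pos_value, neg_value, new_residual

def payload_bytes(n: int) -> int:
    """Wire size: n sign bits packed 8 per byte, plus two float32 values."""
    return math.ceil(n / 8) + 2 * 4

if __name__ == "__main__":
    bits, p, q, res = one_bit_quantize([0.5, -0.25, 1.5, -0.75], [0.0] * 4)
    print(bits, p, q, res)
    # Compare with sending the same gradient as raw float32 (4 bytes each).
    n = 1_000_000
    print(payload_bytes(n), 4 * n)
```

Running it prints `[True, False, True, False] 1.0 -0.5 [-0.5, 0.25, 0.5, -0.25]` and then `125008 4000000`: roughly a 32x smaller message per exchange, which matters most for parameter-heavy models where the exchange, not the compute, is the bottleneck.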

Again, I work at Microsoft.




Could you publish those results? They'd be very interesting to see. And it sounds like you think there are benchmarks missing from the common set of things people currently measure. What specific network would you like to see added to the mix? VGG16 doesn't register on my radar as "modern and applicable" in the age of ResNet.

Using NCCL is great; TF now supports it, as of about a month and a half ago (though I don't know how tightly integrated it is): https://github.com/tensorflow/tensorflow/blob/master/tensorf...

From the benchmarks available, and not knowing what your in-house experiments show, I don't believe that the "internal network stack" is key to scaling. The scalability numbers shown on tensorflow.org/performance are very reasonable: From 902 images/sec to 1783 (1.97x) going from 32->64 K80 GPUs on Amazon for Inception v3, and 565->981 (1.7x) for ResNet-512. I'd love to be proved wrong.
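The figures quoted above can be expressed as a scaling efficiency: the measured speedup divided by the ideal 2x from doubling the GPU count. A minimal sketch using only the numbers cited in this comment, assuming the same 32 to 64 GPU step applies to both models; the function name is illustrative.

```python
from typing import Tuple

def scaling(rate_small: float, rate_large: float,
            gpus_small: int, gpus_large: int) -> Tuple[float, float]:
    """Return (speedup, efficiency) when growing from gpus_small to gpus_large."""
    speedup = rate_large / rate_small
    ideal = gpus_large / gpus_small  # perfect linear scaling
    return speedup, speedup / ideal

# images/sec at 32 and 64 GPUs, as quoted in the comment above
for name, before, after in [("Inception v3", 902, 1783), ("ResNet-512", 565, 981)]:
    s, e = scaling(before, after, 32, 64)
    print(f"{name}: {s:.3f}x speedup, {e:.1%} scaling efficiency")
```

This prints about 1.977x (98.8% efficiency) for Inception v3 and 1.736x (86.8% efficiency) for ResNet-512, which makes the gap worth investigating easy to see.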

That 1.7x scaling on ResNet-512 would be a great point of comparison, for example. Based on my student Hyeontaek's results, I actually suspect that scheduling improvements, not networking improvements, could make up some of that difference.

As I'm sure you know (and are just fishing for), the reason the code links against gRPC externally is that trying to extract Google's internal networking code from the full internal codebase would be ridiculous. I think it's far more likely that things move in the other direction, with everything settling on gRPC; gRPC is actually newer and, in general, more full-featured than Stubby: https://cloudplatform.googleblog.com/2016/08/gRPC-a-true-Int...



