I'm looking forward to a review of the major systems available by someone who has a clue (i.e. not me). The investment to learn one of these is great enough that it will be worth some time invested to understand what each can do.
And, if I recall, MKL is included in Microsoft R Open (previously, Revolution R Open). At least, that's what the instructions here[1] lead me to believe.
Thanks for this. I personally have a license for MKL (it came with a licensed Intel C++ compiler), and it really does a good job. I feel Intel should do more to get MKL out in the open and into wider use by the open source community. For instance, I would love to have access to their FFT code so I could optimize it for my specific use case, but right now it's a black box. Very excited about CNTK, btw: releasing a fully cooked multi-machine distributed framework is just awesome.
That's good news. The requirement to purchase the Intel C compiler and tools like MKL before programming their processors efficiently is IMO one of the chief reasons why NVIDIA is kicking them to the curb here with 6.6 TFLOP $1,000 consumer GPUs that come with an enormous toolkit of free candy, including a DNN kernel library that plugs into all the important frameworks.
Compare and contrast with ~1 TFLOP ~$7,000 Xeon CPUs.
I have brought this up with multiple Intel engineers and for the most part they nod and agree. Then they tell me that there's no way Intel would ever start doing things like NVIDIA does here. And then I nod and tell them why I continue to bet on NVIDIA for the immediate future, sigh...
There's a joke that says those who hate Windows use Linux, and those who love Unix use BSD.
I think this recent open source push by Microsoft is banking on the truth of that statement. I think they're working under the assumption that the recent wave of developers moving to Linux has more to do with those developers wanting a superior open source environment than with preferring Unix/Linux as an operating system.
All of their open source stuff is really easy to use, IFF you're also using all their other open source software and Visual Studio.
If you're the type of developer who is only using Linux because it offers the path of least resistance to open source libraries and software, they're making good progress toward getting you back into a Microsoft ecosystem.
If you're the type of developer who likes the free software philosophy, they're not trying to grab you, because they feel that's not a sizable portion of the people using Linux.
I don't like Windows because I find it awkward to use, and the UI changes between major releases seem mostly gratuitous. I don't really hate it on general principles, i.e. because it's not free/open.
I'm a bit surprised they decided on a Caffe-like declarative language for specifying neural net architectures[1], instead of offering high-level software components that enable easy composition right from within a scripting language, e.g., Python in TensorFlow's case.[2]
Is there anyone from the Microsoft team here that can explain this decision?
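For contrast, here's a toy sketch of the two styles. The snippets are purely illustrative and don't use actual CNTK NDL or TensorFlow syntax; all the names (`dense`, `two_layer_net`, etc.) are invented for this example:

```python
import numpy as np

# Declarative style (Caffe/CNTK-like): the network is a text spec
# that the toolkit parses. Illustrative only, not real NDL syntax.
declarative_spec = """
h1  = Sigmoid(Times(W1, features))
out = Softmax(Times(W2, h1))
"""

# Compositional style (TensorFlow-like): layers are ordinary function
# calls in the host language, so loops, conditionals, and abstraction
# come for free.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def dense(x, weights, activation):
    return activation(weights @ x)

def two_layer_net(x, w1, w2):
    h1 = dense(x, w1, sigmoid)
    return dense(h1, w2, softmax)
```

The compositional style makes it trivial to, say, generate N layers in a host-language loop, which tends to be clumsy in a pure config language.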
This is great news - I was looking at Tensorflow the other day, but on Windows / OSX it doesn't take advantage of my GPUs. For my desktop I was stuck as my new 3440x1440 monitor doesn't, for the time being, work with Xorg's intel driver.
What gfx card are you using? If nvidia, you should install nvidia's proprietary drivers for linux -- it would be very surprising to me if they didn't support your monitor's resolution.
Yes, it looks like they have architected for low communication overhead and true parallelization when you have multiple GPUs. Going from one GPU to four gives roughly a 2x speedup on Torch and Caffe, but close to a full 4x on CNTK.
I was thinking the same thing, it almost looks a little too good to be true -- although it does kinda make sense given the focus on GPU-based clusters. I wonder how this compares to Baidu's warp-ctc [1]. They don't really seem to be the same thing, and maybe I'm missing something since I'm just starting to get into ML, but it seems to be conspicuously absent from this writeup.
1-bit SGD and insanely large minibatch sizes (8192), it would appear, which drastically reduce communication costs and make data-parallel computation scale.
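The core idea behind 1-bit SGD is to quantize each gradient element down to a single bit before exchanging it between workers, while carrying the quantization error forward into the next step so nothing is lost on average. A minimal sketch of that idea (the function name and the mean-per-sign encoding are illustrative, not CNTK's actual implementation):

```python
import numpy as np

def one_bit_quantize(grad, residual):
    """Quantize a gradient tensor to one bit per element with error feedback.

    Each worker would transmit only the sign bits plus two scalars
    (pos_mean, neg_mean) per tensor -- roughly a 32x reduction versus
    sending float32 gradients.
    """
    # Fold in the quantization error carried over from the previous step.
    corrected = grad + residual

    # Encode every element as the mean of its sign group.
    pos = corrected > 0
    pos_mean = corrected[pos].mean() if pos.any() else 0.0
    neg_mean = corrected[~pos].mean() if (~pos).any() else 0.0
    quantized = np.where(pos, pos_mean, neg_mean)

    # Whatever the 1-bit encoding failed to capture becomes the new
    # residual, to be added back next iteration (error feedback).
    new_residual = corrected - quantized
    return quantized, new_residual
```

Because the residual is re-injected every step, the quantization error doesn't accumulate, which is what lets such aggressive compression work at all.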
If so, while very cool, that's not a general solution. Scaling well at batch sizes of 256 or lower would be the real breakthrough. I suspect they get away with this because speech recognition has very sparse output targets (words/phonemes).
Too bad the code below isn't open source, because they got g2 instances with ~2.5 Gb/s interconnect to scale.
Yep. To elaborate: really big batch sizes can speed up training data throughput, but usually mean that less is learned from each example seen, so time-to-convergence might not necessarily improve (might even increase, if you take things too far).
Training data throughput isn't the right metric to compare -- look at time to convergence, or e.g. time to some target accuracy level on held-out data.
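A back-of-the-envelope illustration of that point, with entirely invented numbers: a large-batch configuration can process more examples per second yet still take longer to reach the target accuracy.

```python
# All numbers are hypothetical, chosen only to illustrate the metric.
# Compare configurations by wall-clock time to a target accuracy,
# not by raw examples/second.
small_batch = dict(examples_per_sec=10_000, epochs_to_target=10)
large_batch = dict(examples_per_sec=25_000, epochs_to_target=30)

DATASET_SIZE = 1_000_000  # examples per epoch

def seconds_to_target(cfg):
    # total examples processed before convergence / processing rate
    return cfg["epochs_to_target"] * DATASET_SIZE / cfg["examples_per_sec"]
```

Here the large-batch run has 2.5x the throughput but needs 3x the epochs, so it converges later in wall-clock time, which is why throughput alone is the wrong scoreboard.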
Warp-CTC implements one specific model (or at least, one specific loss function), it's not really a general framework in the same way as the other libraries mentioned.
I'm not sure if it's a deep learning framework. Perhaps they are in some ways complementary http://hunch.net/?p=2875548. Who knows, maybe CNTK is one big reduction.
They claimed to be "the only public toolkit that can scale beyond single machine". Well, mxnet (https://github.com/dmlc/mxnet) can scale across multiple CPUs and GPUs.
Project Philly is supposed to be a GPU platform in Azure that will enable us to run hundreds of GPUs. The primary focus is deep learning, and it will use Nvidia GPUs. CNTK, I think, will be the default way to harness that computing power.
I think that is all that has been revealed. I suspect this will be announced within a couple of months, and this CNTK release has something to do with that.