
Microsoft releases CNTK, its open source deep learning toolkit, on GitHub - fforflo
http://blogs.microsoft.com/next/2016/01/25/microsoft-releases-cntk-its-open-source-deep-learning-toolkit-on-github/
======
x-sam
Only Microsoft would open source documentation on GitHub in _docx_
[https://github.com/Microsoft/CNTK/tree/master/Documentation/...](https://github.com/Microsoft/CNTK/tree/master/Documentation/Documents)

~~~
cr4zy
Looks like they're moving it to GitHub wiki:
[https://github.com/Microsoft/CNTK/wiki/Config-file-overview](https://github.com/Microsoft/CNTK/wiki/Config-file-overview)

------
csvan
First TensorFlow, then Baidu's Warp-CTC, and now Microsoft's CNTK. It is a
very, very exciting time for open source machine learning indeed.

~~~
varelse
Except that it appears to _require_ MKL or ACML...

Sigh...

    ./configure
    Defaulting to --with-buildtype=release
    Cannot find a CPU math library.
    Please specify --with-acml or --with-mkl with a path.

~~~
pmelendez
At least ACML seems to be open source now too.

"ACML End of Life Notice: We have transitioned our math libraries from a
proprietary, closed source codebase (ACML) to open source solutions"[1]

[1] [http://developer.amd.com/tools-and-sdks/archive/amd-core-mat...](http://developer.amd.com/tools-and-sdks/archive/amd-core-math-library-acml/acml-downloads-resources/)

~~~
snoman
And, if I recall, MKL is included in Microsoft R Open (previously Revolution
R Open). At least, that's what the instructions here[1] lead me to believe.

[1]
[https://mran.revolutionanalytics.com/documents/rro/installat...](https://mran.revolutionanalytics.com/documents/rro/installation/)

~~~
boz_msft
Microsoft employee here. MKL libraries and headers will show up in CNTK soon.
We are working with Intel on this.

~~~
jhartmann
Thanks for this. I personally have an MKL license that came with a licensed
Intel C++ compiler, and it really does a good job. I feel Intel should do more
to get MKL out in the open and into wider use by the open source community.
For instance, I would love access to their FFT code so I could optimize it
for my specific use case, but right now it's a black box. Very excited
about CNTK, btw; releasing a fully cooked multi-machine distributed framework
is just awesome.

------
cs702
I'm a bit surprised they decided on a Caffe-like declarative language for
specifying neural net architectures[1], instead of offering high-level
software components that enable easy composition right from within a scripting
language, as TensorFlow does with Python.[2]

Is there anyone from the Microsoft team here that can explain this decision?

\--

[1] See examples on [https://github.com/Microsoft/CNTK/wiki/CNTK-usage-overview](https://github.com/Microsoft/CNTK/wiki/CNTK-usage-overview)

[2] See examples on
[https://www.tensorflow.org/versions/0.6.0/get_started/index....](https://www.tensorflow.org/versions/0.6.0/get_started/index.html)
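
For contrast, the compose-in-Python style of [2] looks roughly like this (a
minimal softmax-regression sketch against the TF 0.6-era API; trimmed and
approximate, not a full example):

    import tensorflow as tf

    # A model composed directly in Python, rather than described
    # in a separate config-file language.
    x = tf.placeholder(tf.float32, [None, 784])   # input images
    W = tf.Variable(tf.zeros([784, 10]))          # weights
    b = tf.Variable(tf.zeros([10]))               # biases
    y = tf.nn.softmax(tf.matmul(x, W) + b)        # predictions

    y_ = tf.placeholder(tf.float32, [None, 10])   # true labels
    cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

Since the whole graph is built from ordinary Python objects, you can assemble
it with loops and functions, which is what makes the scripting-language
approach so composable.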

~~~
singularity2001
According to their slides[1], it is planned:

Quoth: "Models can be described and modified with

• C++ code

• Network definition language (NDL) and model editing language (MEL)

• Brain Script (beta)

• Python and C# (planned)"

[1] [http://research.microsoft.com/en-us/um/people/dongyu/CNTK-Tu...](http://research.microsoft.com/en-us/um/people/dongyu/CNTK-Tutorial-NIPS2015.pdf) (thanks to @sharms)

~~~
boz_msft
Hi, MSFT employee here. Yes: high-level bindings are planned and are a high
priority. We will document the planned interfaces on GitHub soon.

~~~
canberroid
Thanks for letting us know. Any hand-wavy, non-committal indication of how
long that might be? Weeks? A month or two? More than three months?

------
sharms
This is great news - I was looking at TensorFlow the other day, but on Windows
/ OS X it doesn't take advantage of my GPUs. For my desktop I was stuck, as my
new 3440x1440 monitor doesn't, for the time being, work with Xorg's Intel
driver.

Hopefully this is a viable alternative; I would love to see an online course
in machine learning leveraging this. I found [http://research.microsoft.com/en-us/um/people/dongyu/CNTK-Tu...](http://research.microsoft.com/en-us/um/people/dongyu/CNTK-Tutorial-NIPS2015.pdf)
on the homepage, which looks well put together as a starting point.

~~~
hpenedones
Apparently there is also a CNTK book available as a PDF (~150 pages) at:
[http://research.microsoft.com/pubs/226641/CNTKBook-20160121....](http://research.microsoft.com/pubs/226641/CNTKBook-20160121.pdf)

------
baq
quoted performance numbers on multiple GPUs leave other frameworks in the
dust. where's the catch?

~~~
magicmu
I was thinking the same thing; it almost looks a little too good to be true --
although it does kinda make sense given the focus on GPU-based clusters. I
wonder how this compares to Baidu's warp-ctc [1]. They don't _really_ seem to
be the same thing, and maybe I'm missing something since I'm just starting to
get into ML, but it seems to be conspicuously absent from this writeup.

[1] [https://github.com/baidu-research/warp-ctc](https://github.com/baidu-research/warp-ctc)

~~~
varelse
1-bit SGD and insanely large minibatch sizes (8192), it would appear, which
drastically reduce communication costs and make data-parallel computation
scale.

If so, while very cool, that's not a general solution. Scaling at batch sizes
of 256 or lower would be the breakthrough. I suspect they get away with this
because speech recognition has very sparse output targets (words/phonemes).
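
For anyone curious, the core 1-bit SGD trick (quantize each gradient value to
its sign, keep the quantization error locally, and fold it into the next
minibatch) fits in a few lines of numpy. This is just a sketch of the idea,
not CNTK's actual implementation:

    import numpy as np

    def one_bit_quantize(grad, residual):
        # Fold in the quantization error carried over from the last step.
        g = grad + residual
        signs = np.where(g >= 0, 1.0, -1.0)   # 1 bit per value on the wire
        scale = np.abs(g).mean()              # plus one float per tensor
        decoded = signs * scale               # what the receiver reconstructs
        residual = g - decoded                # error feedback, kept locally
        return signs, scale, residual

    # Toy usage: a worker sends (signs, scale) instead of float32 gradients,
    # cutting gradient traffic by roughly 32x.
    rng = np.random.default_rng(0)
    residual = np.zeros(4)
    for step in range(3):
        grad = rng.standard_normal(4)
        signs, scale, residual = one_bit_quantize(grad, residual)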

Too bad the code behind this paper isn't open source, because they got g2
instances with ~2.5 Gb/s interconnect to scale:

[http://www.nikkostrom.com/publications/interspeech2015/strom...](http://www.nikkostrom.com/publications/interspeech2015/strom_interspeech2015.pdf)

~~~
mjw
Yep. To elaborate: really big batch sizes can speed up training-data
throughput, but usually mean that less is learned from each example seen, so
time to convergence might not improve (and might even increase if you take
things too far).

Training-data throughput isn't the right metric to compare -- look at time to
convergence, or e.g. time to some target accuracy level on held-out data.
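
A toy calculation (all numbers invented) of why throughput alone misleads:

    # Time to a target accuracy = epochs needed * seconds per epoch.
    # Bigger batches raise throughput but can need more epochs to reach
    # the same held-out accuracy; these numbers are purely illustrative.
    EPOCH_SIZE = 1_000_000  # training examples per epoch

    def hours_to_target(examples_per_sec, epochs_needed):
        return epochs_needed * (EPOCH_SIZE / examples_per_sec) / 3600

    small = hours_to_target(examples_per_sec=20_000, epochs_needed=10)  # batch 256
    large = hours_to_target(examples_per_sec=60_000, epochs_needed=50)  # batch 8192

    print(f"batch 256:  {small:.2f} h")   # ~0.14 h to target accuracy
    print(f"batch 8192: {large:.2f} h")   # ~0.23 h: 3x throughput, still slower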

------
shtangun
How many Deep Learning frameworks are there now? I think DL frameworks are
popping up like JavaScript frameworks.

~~~
matsiyatzy
Here's a (currently incomplete) list of Deep Learning frameworks:
[https://docs.google.com/spreadsheets/d/1XvGfi3TxWm7kuQ0DUqYr...](https://docs.google.com/spreadsheets/d/1XvGfi3TxWm7kuQ0DUqYrO6cxva196UJDxKTxccFqb9U/edit#gid=0)

There are many :)

~~~
aristus
Missing Vowpal Wabbit: [http://hunch.net/~vw/](http://hunch.net/~vw/)

~~~
charlescearl
I'm not sure if it's a deep learning framework. Perhaps they are in some ways
complementary [http://hunch.net/?p=2875548](http://hunch.net/?p=2875548). Who
knows, maybe CNTK is one big reduction.

------
mzahir
An evaluation of the open source ML options -
[https://github.com/zer0n/deepframeworks](https://github.com/zer0n/deepframeworks)

------
FLF_HOY
"...Worry about scaling; worry about vectorization; worry about data
locality...." [http://www.hpcwire.com/2016/01/21/conversation-james-
reinder...](http://www.hpcwire.com/2016/01/21/conversation-james-reinders/)

[http://icri-ci.technion.ac.il/events-2/presentation-files-ic...](http://icri-
ci.technion.ac.il/events-2/presentation-files-icri-ci-retreat-5-6-may-2015/)

[http://icri-ci.technion.ac.il/files/2015/05/00-Boris-Ginzbur...](http://icri-
ci.technion.ac.il/files/2015/05/00-Boris-Ginzburg-1505051.pdf)

Nvidia Chief Scientist for Deep Learning was poached from Intel ICRI-CI Group
[https://il.linkedin.com/in/boris-ginsburg-2249545?trk=pub-
pb...](https://il.linkedin.com/in/boris-ginsburg-2249545?trk=pub-pbmap)

[http://www.cs.tau.ac.il/~wolf/deeplearningmeeting/speaker.ht...](http://www.cs.tau.ac.il/~wolf/deeplearningmeeting/speaker.html)
Look for quote: "...In a very interesting admission, LeCun told The Next
Platform ..." [http://www.nextplatform.com/2015/08/25/a-glimpse-into-the-
fu...](http://www.nextplatform.com/2015/08/25/a-glimpse-into-the-future-of-
deep-learning-hardware/)

Yann LeCun states on Nov 2015 (29:00 min mark) GPU's short lived in Deep
Learning / CNN / NN
[https://www.youtube.com/watch?v=R7TUU94ir38](https://www.youtube.com/watch?v=R7TUU94ir38)

[https://www.altera.com/en_US/pdfs/literature/solution-
sheets...](https://www.altera.com/en_US/pdfs/literature/solution-
sheets/efficient_neural_networks.pdf)

------
tianlins
They claimed to be "the only public toolkit that can scale beyond single
machine". Well, mxnet
([https://github.com/dmlc/mxnet](https://github.com/dmlc/mxnet)) can scale
across multiple CPUs and GPUs.

~~~
santoshalper
"Multiple machines" implies scaling across a network or backbone -- multiple
logical computers. As I understand it, at least.

~~~
dgacmu
Mxnet does. The commenter just left that part out. "The library is portable
and lightweight, and is ready scales to multiple GPUs, and multiple machines."
[https://mxnet.readthedocs.org/en/latest/distributed_training...](https://mxnet.readthedocs.org/en/latest/distributed_training.html)

------
doczoidberg
Is this used in Azure Machine Learning? Do they use GPU-based machines for it
on Azure?

~~~
shtangun
Project Philly is a Deep Learning platform that is under development at
Microsoft.

~~~
doczoidberg
can you share more?

~~~
shtangun
Project Philly is supposed to be a GPU platform in Azure that will enable
running hundreds of GPUs. The primary focus is Deep Learning, and it will use
Nvidia GPUs. CNTK, I think, will be the default way to harness the computing
power.

I think that is all that has been revealed. I suspect it will be announced
within a couple of months, and that this CNTK release has something to do
with it.

------
blazespin
Can't wait for soumith's take:

[https://github.com/soumith/convnet-benchmarks/issues](https://github.com/soumith/convnet-benchmarks/issues)

------
FLF_HOY
AVX-512 instructions: [https://software.intel.com/en-us/blogs/2013/avx-512-instruct...](https://software.intel.com/en-us/blogs/2013/avx-512-instructions)

