
Cognitive Toolkit beta for deep learning advances - mrry
http://blogs.microsoft.com/next/2016/10/25/microsoft-releases-beta-microsoft-cognitive-toolkit-deep-learning-advances/
======
mrdrozdov
CNTK from Microsoft Research has been rebranded as Microsoft Cognitive
Toolkit.

Here's the github repo:
[https://github.com/Microsoft/CNTK](https://github.com/Microsoft/CNTK)

CNTK homepage ([http://www.cntk.ai/](http://www.cntk.ai/)) now redirects to
[https://www.microsoft.com/en-us/research/product/cognitive-t...](https://www.microsoft.com/en-us/research/product/cognitive-toolkit/)

> CNTK ([http://www.cntk.ai/](http://www.cntk.ai/)), the Computational Network
> Toolkit by Microsoft Research, is a unified deep-learning toolkit that
> describes neural networks as a series of computational steps via a directed
> graph. In this directed graph, leaf nodes represent input values or network
> parameters, while other nodes represent matrix operations upon their inputs.
> CNTK allows to easily realize and combine popular model types such as feed-
> forward DNNs, convolutional nets (CNNs), and recurrent networks
> (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error
> backpropagation) learning with automatic differentiation and parallelization
> across multiple GPUs and servers. CNTK has been available under an open-
> source license since April 2015. It is our hope that the community will take
> advantage of CNTK to share ideas more quickly through the exchange of open
> source working code.
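
To make the directed-graph description concrete, a minimal sketch in the new
Python bindings might look like this (API names per the Python docs; exact
beta signatures may differ):

```python
# Minimal sketch of the graph described above: leaves are inputs and
# parameters, interior nodes are matrix operations. API names follow
# the Python docs; exact beta signatures may differ.
import numpy as np
import cntk as C

x = C.input_variable(2)                                  # leaf: input value
W = C.parameter(shape=(2, 1), init=C.glorot_uniform())   # leaf: parameter
b = C.parameter(shape=(1,))                              # leaf: parameter

z = C.times(x, W) + b   # interior node: z = xW + b

# Evaluating the graph on a concrete input runs a forward pass.
print(z.eval({x: np.array([[1.0, 2.0]], dtype=np.float32)}))
```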

~~~
GrinningFool
> CNTK allows to

OT pet peeve: the tech culture's distaste for pronouns has made this all too
common. It doesn't even have to be a pronoun: "users", "people", "clients",
etc. all work -- but without one, the thing the author is referring to is
never specified. That offends me as a technical person far more than pronouns
ever will.

In particular, project descriptions and readmes are rife with "X allows to".

------
Blackthorn
Finally, Python bindings! I wanted to use this because TensorFlow is
impossible to run on Windows, but the lack of language bindings made CNTK a
non-starter. Glad this is finally here.

------
Eridrus
Besides the rebranding, the Python bindings seem relatively new (about two
months old). The docs seem to imply the API is pretty high-level compared to
other frameworks: [https://www.cntk.ai/pythondocs/](https://www.cntk.ai/pythondocs/)
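
For a sense of how high-level it is, a model definition in the style of those
docs looks roughly like this (a sketch based on the linked docs; the beta API
may still shift):

```python
# Rough sketch of the high-level layers API as the docs present it;
# the beta API may still be shifting, so treat signatures as approximate.
import cntk as C
from cntk.layers import Sequential, Dense

features = C.input_variable(784)
model = Sequential([Dense(200, activation=C.relu),
                    Dense(10, activation=None)])   # logits
z = model(features)
```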

One interesting note is that there seem to be plans to create a Keras backend
that lets you run Keras models on CNTK:
[https://github.com/Microsoft/CNTK/issues/797](https://github.com/Microsoft/CNTK/issues/797)

~~~
derpapst
CNTK contributor here - Keras indeed is pretty high on our list of things to
cover soon. But then, all our code is out there on GitHub and we welcome PRs
:-)

------
reckel
More information about Cognitive Toolkit is available here:
[https://www.microsoft.com/en-us/research/product/cognitive-t...](https://www.microsoft.com/en-us/research/product/cognitive-toolkit/)

------
Ph0X
How does it compare to TensorFlow?

~~~
ajwald
CNTK has significantly higher performance with one or more machines; great
multi-GPU scalability. Can train harder on bigger datasets given your
resources.

~~~
corysama
Can someone clarify this? In my head "one or more machines" means "always".
Does CNTK generally have higher perf even on a single machine? Or is ajwald
trying to say it is better at scaling to multiple machines?

~~~
cmarschner
CNTK user & contributor here. CNTK overall has very low framework overhead and
treats tensors with dynamic axes as first-class citizens. This means that
sequences can be expressed without padding, sorting of the input data, or any
other workarounds, and can be packed automatically by the toolkit in an
optimal way. In particular, while laying out the rectangular structure it uses
to run the multiple RNN sequences of a minibatch in parallel, it fits shorter
sequences into the holes and can reset the RNN state for those sequences as it
traverses the structure. This makes CNTK especially suitable for expressing
RNN models (for CNNs many of the calls are just forwarded to cuDNN, so the
difference might be much smaller).
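
Roughly, in the Python bindings that looks something like this (a sketch;
exact names may differ in the beta):

```python
# Sketch of dynamic axes: a sequence input carries a variable-length
# axis, so a recurrence runs over ragged minibatches with no manual
# padding. Exact names may differ in the beta Python bindings.
import cntk as C
from cntk.layers import Recurrence, LSTM

seq = C.sequence.input_variable(100)   # each sample: a sequence of 100-dim vectors

h = Recurrence(LSTM(256))(seq)         # the toolkit packs the minibatch itself
final = C.sequence.last(h)             # e.g. feed this to a classifier head
```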

As for distribution: a) it has an extremely simple way to run data-parallel
training (for CNTK 1 it was just a matter of using MPI and starting the worker
with a few extra options; I think CNTK 2 will add this to the Python bindings
in a week or so), and b) it has 1-bit SGD and, more recently, BlockMomentum,
which are dead simple methods for distributing the gradients, and they just
work. All of this is open source (though 1-bit SGD and BlockMomentum are
patented).
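
For the Python side, the data-parallel wrapper should end up looking roughly
like this (a sketch only, since as noted this hasn't landed in the bindings
yet; names may change):

```python
# Hedged sketch of data-parallel training with 1-bit SGD; this API has
# not landed in the Python bindings yet, so treat all names as tentative.
import cntk as C

x = C.input_variable(100)
y = C.input_variable(10)
z = C.layers.Dense(10)(x)
loss = C.cross_entropy_with_softmax(z, y)
metric = C.classification_error(z, y)

local = C.sgd(z.parameters,
              lr=C.learning_rate_schedule(0.01, C.UnitType.minibatch))

# num_quantization_bits=1 selects 1-bit SGD for the gradient exchange.
learner = C.train.distributed.data_parallel_distributed_learner(
    local, num_quantization_bits=1)
trainer = C.Trainer(z, (loss, metric), [learner])

# Launch with MPI, e.g.: mpiexec -n 4 python train.py
# ...trainer.train_minibatch({x: ..., y: ...}) per worker...
C.train.distributed.Communicator.finalize()
```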

~~~
dcl
An algorithm for updating parameters is patented?

------
akssri
How general/efficient are these AD systems?

New ops in TensorFlow seem to be oriented towards forward-mode AD rather than
reverse-mode (for which one needs a pullback op on a dual).
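
To illustrate the distinction in plain Python (no framework API implied):
forward mode pushes a tangent through each op, while reverse mode needs a
per-op pullback, i.e. a vector-Jacobian product:

```python
# Toy illustration, plain Python (no framework API implied):
# forward mode propagates a tangent alongside the primal; reverse mode
# needs a per-op pullback (vector-Jacobian product) to run backwards.

def mul(x1, x2):
    return x1 * x2

def mul_jvp(x1, x2, dx1, dx2):
    # Forward mode: one pass per input *direction*.
    return x2 * dx1 + x1 * dx2

def mul_vjp(x1, x2, ybar):
    # Reverse mode: one backward pass yields gradients w.r.t. all inputs;
    # this is the "pullback op on a dual" mentioned above.
    return (x2 * ybar, x1 * ybar)

print(mul_jvp(3.0, 4.0, 1.0, 0.0))   # d(mul)/dx1 = 4.0
print(mul_vjp(3.0, 4.0, 1.0))        # (4.0, 3.0): full gradient in one pass
```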

------
xdh168
The fastest for distributed deep-learning workloads... The proven toolkit for
Microsoft production systems... And now Python support is native! Cognitive
Toolkit rocks!!!

