Hacker News
Microsoft releases CNTK, its open source deep learning toolkit, on GitHub (microsoft.com)
546 points by fforflo on Jan 25, 2016 | hide | past | favorite | 56 comments

Only Microsoft would open-source documentation on GitHub as .docx files: https://github.com/Microsoft/CNTK/tree/master/Documentation/...

Looks like they're moving it to GitHub wiki: https://github.com/Microsoft/CNTK/wiki/Config-file-overview

Nevertheless, isn't docx an ECMA and ISO standard? [1]

[1]: https://en.wikipedia.org/wiki/Office_Open_XML

First TensorFlow, then Baidu's Warp-CTC, and now Microsoft's CNTK. It is a very, very exciting time for open source machine learning indeed.

Don't forget SystemML[1] and Singa[2], as well as venerable systems like Mahout[3] and SAMOA[4].

[1]: http://systemml.apache.org/

[2]: https://singa.incubator.apache.org/

[3]: http://mahout.apache.org/

[4]: https://samoa.incubator.apache.org/

Also Microsoft's DMTK: http://www.dmtk.io/

Definitely!

I'm looking forward to a review of the major systems available by someone who has a clue (i.e. not me). The investment to learn one of these is great enough that it will be worth some time invested to understand what each can do.

AFAIK CTC is just a loss function written for Torch, no? A few hundred lines of code vs. hundreds of thousands of LOC for TF or CNTK.

I think the guy was going for "big company something something open source" without knowing what they actually were.

"...In a very interesting admission, LeCun told The Next Platform ..." http://www.nextplatform.com/2015/08/25/a-glimpse-into-the-fu...

Yann LeCun stated in Nov 2015 (29:00 mark) that GPUs will be short-lived in deep learning / CNNs / NNs: https://www.youtube.com/watch?v=R7TUU94ir38


Except that it appears to require MKL or ACML...



    Defaulting to --with-buildtype=release
    Cannot find a CPU math library.
    Please specify --with-acml or --with-mkl with a path.

At least ACML seems to be open source now too.

"ACML End of Life Notice: We have transitioned our math libraries from a proprietary, closed source codebase (ACML) to open source solutions"[1]

[1] http://developer.amd.com/tools-and-sdks/archive/amd-core-mat...

And, if I recall, MKL is included in Microsoft R Open (previously, Revolution R Open). At least, that's what the instructions here[1] lead me to believe.

[1] https://mran.revolutionanalytics.com/documents/rro/installat...

Microsoft employee here. MKL libraries and headers will show up in CNTK soon. We are working with Intel on this.

Thanks for this, I personally have a license for MKL due to a licensed Intel C++ compiler and it really does a good job. I feel that Intel should do more to get MKL out in the open and more in use by the Open Source community. For instance I would love to have access to their FFT code so I could optimize it for my specific use case, but right now things are a black box. Very excited about CNTK btw, releasing a fully cooked multi machine distributed framework is just awesome.

That's good news. The requirement to purchase the Intel C compiler and tools like MKL before programming their processors efficiently is IMO one of the chief reasons why NVIDIA is kicking them to the curb here with 6.6 TFLOP $1,000 consumer GPUs that come with an enormous toolkit of free candy, including a DNN kernel library that plugs into all the important frameworks.

Compare and contrast with ~1 TFLOP ~$7,000 Xeon CPUs.
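For a back-of-the-envelope sense of that gap, here is the FLOPS-per-dollar arithmetic using the rough figures quoted above (ballpark prices and throughput from the comment, not exact SKUs):

```python
# Back-of-the-envelope using the rough figures above: a ~6.6 TFLOP/s
# consumer GPU at ~$1,000 versus a ~1 TFLOP/s Xeon at ~$7,000.
gpu_flops_per_dollar = 6.6e12 / 1_000
cpu_flops_per_dollar = 1.0e12 / 7_000
ratio = gpu_flops_per_dollar / cpu_flops_per_dollar
print(f"GPU: roughly {ratio:.0f}x the raw FLOPS per dollar")
```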

I have brought this up with multiple Intel engineers and for the most part they nod and agree. Then they tell me that there's no way Intel would ever start doing things like NVIDIA does here. And then I nod and tell them why I continue to bet on NVIDIA for the immediate future, sigh...

There's a joke that says those who hate Windows use Linux, and those who love Unix use BSD.

I think Microsoft's recent open source push is banking on the truth of that statement. They seem to be working under the assumption that developers' recent move to Linux has more to do with wanting a superior open source environment than with preferring Unix/Linux as an operating system.

All of their open source stuff is really easy to use, IFF you're also using all their other open source software and Visual Studio.

If you're the type of developer who is only using Linux because it offers the path of least resistance to using open source libraries and software, they're making good progress towards getting you back into a Microsoft ecosystem.

If you're the type of developer who likes the free software philosophy, they're not trying to grab you, because they feel that's not a sizable portion of the people using Linux.

I think they're probably right.

I don't like Windows because I find it awkward to use, and the UI changes between major releases seem mostly gratuitous. I don't really hate it on general principles, i.e. because it's not free/open.

Don't forget that IBM released SystemML.

I'm a bit surprised they decided on a Caffe-like declarative language for specifying neural-net architectures[1], instead of offering high-level software components that enable easy composition from within a scripting language, e.g. Python in TensorFlow's case.[2]

Is there anyone from the Microsoft team here that can explain this decision?


[1] See examples on https://github.com/Microsoft/CNTK/wiki/CNTK-usage-overview

[2] See examples on https://www.tensorflow.org/versions/0.6.0/get_started/index....
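To illustrate the distinction, here is a minimal sketch in plain NumPy (hypothetical layer objects, not CNTK's or TensorFlow's actual API) of what "composition from within a scripting language" buys you: the model definition is ordinary host-language code.

```python
import numpy as np

# Hypothetical layer objects for illustration only. The point is that wiring
# up a network is ordinary Python, so loops, conditionals, and helper
# functions all work directly on the model definition.
rng = np.random.default_rng(0)

class Dense:
    """A fully connected layer with randomly initialized weights."""
    def __init__(self, n_in, n_out):
        self.W = rng.standard_normal((n_in, n_out)) * 0.1
        self.b = np.zeros(n_out)

    def __call__(self, x):
        return x @ self.W + self.b

def relu(x):
    return np.maximum(x, 0.0)

# Compose a 784 -> 128 -> 10 network by plain function application.
hidden, output = Dense(784, 128), Dense(128, 10)

def model(x):
    return output(relu(hidden(x)))

logits = model(rng.standard_normal((32, 784)))  # batch of 32 examples
```

With a declarative config language, the same structure has to be expressed in a separate file format with its own (usually more limited) facilities for abstraction and reuse.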

According to their slides[1] it is planned:

Quoth: "Models can be described and modified with

• C++ code

• Network definition language (NDL) and model editing language (MEL)

• Brain Script (beta)

• Python and C# (planned)"


[1] http://research.microsoft.com/en-us/um/people/dongyu/CNTK-Tu... (thanks to @sharms)

Hi, MSFT employee here. Yes, high-level bindings are planned and are a high priority. We will document the planned interfaces on GitHub soon.

Thanks for letting us know. Any hand-wavy, non-committal indication of how long that might be? Weeks, a month, two months, more than three months?

Let's hope they decide to target F#.

I'm more interested in C# (probably better to target C# and let F# reach it via .NET interop).

It really seems like this sort of work is more suited to the dataflow model F# promotes, though?

This is great news - I was looking at Tensorflow the other day, but on Windows / OSX it doesn't take advantage of my GPUs. For my desktop I was stuck as my new 3440x1440 monitor doesn't, for the time being, work with Xorg's intel driver.

Hopefully this is a viable alternative; I would love to see an online course in machine learning leveraging this. I found http://research.microsoft.com/en-us/um/people/dongyu/CNTK-Tu... on the homepage, which looks like a well-put-together starting point.

Apparently there is also a CNTK book available in pdf (~150 pages) at: http://research.microsoft.com/pubs/226641/CNTKBook-20160121....

> For my desktop I was stuck as my new 3440x1440 monitor doesn't, for the time being, work with Xorg's intel driver.

I think this is now fixed upstream.


What gfx card are you using? If NVIDIA, you should install NVIDIA's proprietary Linux drivers; it would be very surprising to me if they didn't support your monitor's resolution.

The quoted performance numbers on multiple GPUs leave other frameworks in the dust. Where's the catch?

Yes, it looks like they have architected for low communication overhead and true parallelization when you have multiple GPUs: 4 GPUs give roughly a 2x speedup on Torch and Caffe, while on CNTK 4 GPUs give close to a 4x speedup.

No idea what the catch is :)

I was thinking the same thing; it almost looks a little too good to be true, although it does kind of make sense given the focus on GPU-based clusters. I wonder how this compares to Baidu's warp-ctc [1]. They don't really seem to be the same thing, and maybe I'm missing something since I'm just starting to get into ML, but it seems to be conspicuously absent from this writeup.

[1] https://github.com/baidu-research/warp-ctc

1-bit SGD and insanely large minibatch sizes (8192), it would appear, which drastically reduce communication costs and make data-parallel computation scale.

If so, while very cool, that's not a general solution. Scaling at batch sizes of 256 or lower would be the breakthrough. I suspect they get away with this because speech recognition has very sparse output targets (words/phonemes).
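For reference, the core of the 1-bit SGD trick can be sketched in a few lines; this is a generic toy version of sign quantization with error feedback, not CNTK's implementation:

```python
import numpy as np

def one_bit_sgd_step(grad, residual):
    """Toy 1-bit quantization with error feedback: workers transmit only
    the sign of (gradient + carried-over quantization error), plus a
    single magnitude scalar per tensor, and keep the quantization error
    locally so nothing is lost over time."""
    g = grad + residual                      # fold in last step's error
    scale = np.abs(g).mean()                 # one magnitude per tensor
    quantized = np.where(g >= 0, scale, -scale)
    new_residual = g - quantized             # error feedback for next step
    return quantized, new_residual

rng = np.random.default_rng(0)
grad = rng.standard_normal(1000)
q, r = one_bit_sgd_step(grad, np.zeros_like(grad))
# q takes only two distinct values, so it costs ~1 bit per element (plus
# one float) on the wire, and q + r reconstructs the input exactly.
```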

Too bad the code below isn't open source, because they got g2 instances with ~2.5 Gb/s interconnect to scale:


Yep. To elaborate: really big batch sizes can speed up training-data throughput, but usually mean that less is learned from each example seen, so time to convergence might not improve (and might even increase, if you take things too far).

Training data throughput isn't the right metric to compare -- look at time to convergence, or e.g. time to some target accuracy level on held-out data.
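A toy experiment makes the throughput-vs-convergence point concrete. The numbers below are purely illustrative (a tiny least-squares fit, not a real DNN workload): with the same learning rate, a big batch does fewer updates per epoch and needs more passes over the data to converge.

```python
import numpy as np

def epochs_to_converge(batch_size, lr=0.01, tol=1e-3, max_epochs=1000):
    """Fit y = w*x on a tiny dataset with minibatch SGD (no shuffling)
    and return the number of epochs until w is within tol of the true
    value. Toy numbers chosen for illustration only."""
    x = np.arange(1.0, 9.0)   # 8 samples
    y = 2.0 * x               # true weight is 2
    w = 0.0
    for epoch in range(1, max_epochs + 1):
        for i in range(0, len(x), batch_size):
            xb, yb = x[i:i + batch_size], y[i:i + batch_size]
            grad = np.mean(2 * xb * (w * xb - yb))
            w -= lr * grad
        if abs(w - 2.0) < tol:
            return epoch
    return max_epochs

# Same learning rate, same data: the full batch makes one (cheap, parallel)
# update per epoch but extracts less progress from each pass over the data.
print(epochs_to_converge(batch_size=1), epochs_to_converge(batch_size=8))
```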

Warp-CTC implements one specific model (or at least, one specific loss function), it's not really a general framework in the same way as the other libraries mentioned.

Looks like it isn't very scriptable so far, with the config files being written manually until "future releases": http://research.microsoft.com/en-us/um/people/dongyu/CNTK-Tu...

"You can also use Brain Script (beta) to specify the configuration or use Python and C# (in future releases) to directly instantiate related objects."

How many deep learning frameworks are there now? DL frameworks are popping up like JavaScript frameworks.

Here's a (currently incomplete) list of Deep Learning frameworks : https://docs.google.com/spreadsheets/d/1XvGfi3TxWm7kuQ0DUqYr...

There are many :)

Missing Vowpal Wabbit: http://hunch.net/~vw/

I'm not sure if it's a deep learning framework. Perhaps they are in some ways complementary http://hunch.net/?p=2875548. Who knows, maybe CNTK is one big reduction.

An evaluation of the open source ML options - https://github.com/zer0n/deepframeworks

"...Worry about scaling; worry about vectorization; worry about data locality...." http://www.hpcwire.com/2016/01/21/conversation-james-reinder...



Nvidia Chief Scientist for Deep Learning was poached from Intel ICRI-CI Group https://il.linkedin.com/in/boris-ginsburg-2249545?trk=pub-pb...

http://www.cs.tau.ac.il/~wolf/deeplearningmeeting/speaker.ht... Look for quote: "...In a very interesting admission, LeCun told The Next Platform ..." http://www.nextplatform.com/2015/08/25/a-glimpse-into-the-fu...



They claimed to be "the only public toolkit that can scale beyond single machine". Well, mxnet (https://github.com/dmlc/mxnet) can scale across multiple CPUs and GPUs.

"Multiple machines" implies scaling across a network or backbone, i.e. multiple logical computers. As I understand it, at least.

Mxnet does. The commenter just left that part out. "The library is portable and lightweight, and is ready scales to multiple GPUs, and multiple machines." https://mxnet.readthedocs.org/en/latest/distributed_training...

Is this used on Azure Machine Learning? Do they use GPU based machines for it on Azure?

I do not think they do yet. There are some plans for GPU-based VMs on Azure, though.

Project Philly is a deep learning platform that is under development at Microsoft.

Can you share more?

Project Philly is supposed to be a GPU platform in Azure that will enable us to run hundreds of GPUs. The primary focus is deep learning, and it will be using NVIDIA GPUs. CNTK, I think, will be the default way to harness the compute power.

I think that is all that has been revealed. I suspect this will be announced within a couple of months, and that this CNTK release has something to do with it.
