Ask HN: Does it matter which deep learning framework I use?
84 points by canterburry on Mar 28, 2017 | 21 comments
On one hand I fully understand there are differences in language and specific features available depending on framework out there.

On the other hand, if a framework "correctly" implements the underlying statistical theory/principles of deep learning, shouldn't I get the same results regardless of which framework I use?

If not, how would I know which framework produces "more correct" interpretations of the underlying data?

It all depends on what your goals are. If you just want to train a neural network on a dataset you have and you aren't all that interested in going into the details of how the NN works or is trained, Keras is fine. It has a nice high-level interface and the backend is either in Theano or Tensorflow (your choice).

If your problem is more complicated and you want to use some unique architecture, you'll have to use one of the more low-level frameworks. I would recommend TensorFlow just on the basis of its popularity (you're more likely to find people who have run into the same problems as you). But Theano, Torch, and MXNet are probably pretty much equivalent in terms of speed and ease of use. I hear Caffe has a steeper learning curve.

If you're really doing something fancy, then you'll have to look into more detail. Torch and MXNet have the advantage that you can adaptively change your computation graph based on the data, but you'd probably have to be pretty far into deep learning research before something like that is useful. Tensorflow Fold does something similar, but I'm not sure how well integrated it is with the rest of Tensorflow (I've never used it).

You might also take a look at this:


It's a little out of date now, but it'll get you started.

Some of these frameworks are more general than others (e.g., Tensorflow is more general than Keras), so you can specify architectures in some that you can't in others. But as long as you can specify the architecture in a particular framework, you'll be able to get a working model. Your choice of framework just comes down to whatever one is easiest to work with for the problem at hand.

For my particular case (OCR on a limited set of strings) I found using Caffe much easier than TensorFlow (even when using TFLearn).

You don't necessarily have to be doing much "fancy" to run into problems with Tensorflow/Keras/Theano. Almost anything with NLP is pretty hard to implement correctly with Keras.

As with most programming questions, the answer is a mix of yes and no, and it depends on the level of abstraction provided by the framework.

I started off using Caffe/Torch and currently use mostly Keras for my deep learning experiments. With a lower-level framework, I could actually tinker with the different moving components to understand why they are used as they are, while with a higher-level abstraction, I can concentrate on the problem at hand, knowing that the basic abstractions (or building blocks) are well developed already and have more or less been battle-tested by people far smarter than me.

And of course, when it comes to pure speed numbers and architecture for scaling/deployment, these frameworks do vary among themselves: https://github.com/zer0n/deepframeworks/blob/master/README.m...

> On the other hand, if a framework "correctly" implements the underlying statistical theory/principles of deep learning, shouldn't I get the same results regardless of which framework I use?

That is about right, provided that 1) you use the same initial values and hyperparameters, and 2) you can implement the same network in all frameworks. Issue 2) is complicated. Some networks that are easy to implement in one framework can be hard or even impossible in another. Here "hard" can mean two opposite things: lack of flexibility (which prevents you from constructing a certain topology) or excessive flexibility in the framework (which requires many steps and much care to construct a topology). Which framework to use depends on your goal and skill level. For starters, Keras is usually easier.
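A toy sketch of point 1), with made-up sizes and plain numpy standing in for any two frameworks: if the weights, inputs, and layer math are identical, two independent implementations agree to numerical precision.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: "same initial values"

# A toy dense layer y = relu(W @ x + b), with made-up dimensions.
W = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)

# "Framework A": explicit loops over the same math
y_loop = np.empty(4)
for i in range(4):
    s = b[i]
    for j in range(3):
        s += W[i, j] * x[j]
    y_loop[i] = max(s, 0.0)

# "Framework B": vectorized matrix product
y_vec = np.maximum(W @ x + b, 0.0)

# Same math, same inputs -> numerically the same output
assert np.allclose(y_loop, y_vec)
```

In practice, differences creep in through initialization schemes, random seeds, and floating-point reduction order, which is why bit-identical results across real frameworks are rare even when the theory matches.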

You might want to look for the video of the Feb 22 lecture comparing Caffe, Theano, Torch, and TF: http://cs231n.stanford.edu/syllabus.html. It was taken down from YouTube because it had no closed captions, but I'm sure it's archived in multiple places.

What? YouTube requires you to add captions manually, or the video gets taken down? That's brutal! I thought YT had automatic captioning. Wow, lame.

So they took the easy way out and removed the videos.

Thanks for this. While I work in Java, I know Python seems to rule the deep learning space.

Hi: VERY biased author of http://deeplearning4j.org here.

One path we recommend for Java developers who are new to deep learning is taking the fast.ai class: http://course.fast.ai

From there, map what you learn to our Keras model import: https://deeplearning4j.org/model-import-keras

That will more or less get you up and running.

We also have my O'Reilly book out in early release: http://shop.oreilly.com/product/0636920035343.do

And this will probably answer your next question: https://culurciello.github.io/tech/2016/06/04/nets.html

I was using Keras pretty heavily, but I have switched over to fully using TensorFlow. Once you build a decent library of boilerplate, TensorFlow becomes very usable. Packages like prettytensor may even surpass Keras in terms of usability. Also, I found the Keras documentation to be quite lacking, and I ended up reading the source code much more often than I'd like.

I ended up bumping into the edges of the Keras API too much, and coming up with hacky solutions to do things that are actually quite simple if you just do them in TensorFlow yourself.

Theano and Torch are also great options, but I think I will be sticking with TensorFlow, simply because I trust that Google will be putting solid effort behind it for years to come.

It does not matter, and there's not a lot to get wrong in deep learning.

The math involved is pretty simple, in terms of the calculations that have to be performed.

Where frameworks differ is in things like speed and ease of use. Use the one that is the easiest for you. Tensorflow is certainly going to be the most popular for the foreseeable future.

I am learning TensorFlow now and have no knowledge of the other frameworks.

What surprises me most is that TF, at least, is an almost declarative framework.

I needed to add some random noise to a point in a multidimensional space to generate n other points close to the first one.

In plain Python I would loop n times, each time adding some noise to the initial point and pushing the result into a list or some other structure (or use a list comprehension).

In TF I "stack" the original point n times to obtain n copies of that same point, then generate n noise vectors, and finally add the two.

The second solution is more elegant in my opinion, but it requires an important mental shift.
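A rough numpy sketch of the two styles (the point, n, and the noise scale are made up; in TF the second version would use ops like tf.tile and tf.random_normal instead):

```python
import numpy as np

rng = np.random.default_rng(42)
point = np.array([1.0, 2.0, 3.0])  # the original point (made-up values)
n = 5

# Loop style: add noise to the point n times, collect the results
noisy_loop = np.array([point + rng.normal(scale=0.1, size=point.shape)
                       for _ in range(n)])

# "Declarative" style, as described above: stack the point n times,
# generate an (n, d) block of noise, and add the two in one shot
stacked = np.tile(point, (n, 1))            # shape (n, 3)
noise = rng.normal(scale=0.1, size=(n, 3))  # shape (n, 3)
noisy_vec = stacked + noise                 # shape (n, 3)
```

Both produce an (n, 3) array of points near the original; the second expresses the whole computation as array operations, which is the shape TF wants.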

If the other frameworks are at all similar to TF, your biggest hurdle will be this kind of mental shift, so just pick one.

In many frameworks the low level mathematics are delegated to the installed implementation of BLAS[1] anyway, so I'd expect most of the really popular frameworks to get the same answers from that perspective. Other than that, my feeling is that if you stick to the well known / popular frameworks, you should be fine. If any one of them had a glaring deficiency, I'm pretty sure it would have been noted and widely disseminated by now.

[1]: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprogra...

There are a few things you need to consider.

First is language: you need to choose a familiar one.

Second is feature set: frameworks don't implement the same set of operators, but if you only want to use the common ones, most frameworks will have them.

Third is their ability to train in parallel. For example, does a framework support multiple machines, or just a single machine with multiple GPUs? Performance is also a factor. Do they support SIMD/GPU? Do they generate intermediate code and compile it to C++/CUDA, or do they just call into GPU libraries? Do you want to support mobile devices?

The fourth difference is the level of abstraction. If a framework is very low level, users need to understand many fundamentals of deep learning; on the other hand, if you want to extend the framework to add new operators, a low-level framework is easier to hack.

A high-level framework lets you write less code, but it hides details and makes it harder to hack.

The last thing to consider is the difference between dynamic and static frameworks. DyNet, Chainer, and TensorFlow with something called "Fold" are dynamic frameworks. I was told they are more flexible, but I don't understand the details.

Depends on your goal. Ultimately, the three axes are flexibility, speed, and speed of development, and all frameworks make tradeoffs between them. Researchers use slower (in both senses) frameworks to implement weird new ideas that require the flexibility, while engineers typically use faster (in both senses) frameworks that give them a performant and reliable model for production deployment.

A year ago it looked like Tensorflow might dominate, but most papers I read still publish their code in Caffe, so we've done a lot more with Caffe than Tensorflow.

Our own work calls cuDNN/cuBLAS directly because we're C++ programmers and it's just more convenient for our use case.

It's like any framework. You probably want to choose based on popularity (which equates to Stack Overflow answers explaining common pitfalls) and a programming language you already know.

Try using keras.io; that way you get an abstraction on top of TensorFlow, Theano, etc.
