
Ask HN: Does it matter which deep learning framework I use? - canterburry
On one hand I fully understand there are differences in language and specific features available depending on the framework.

On the other hand, if a framework "correctly" implements the underlying statistical theory/principles of deep learning, shouldn't I get the same results regardless of which framework I use?

If not, how would I know which framework produces "more correct" interpretations of the underlying data?
======
antognini
It all depends on what your goals are. If you just want to train a neural
network on a dataset you have and you aren't all that interested in going into
the details of how the NN works or is trained, Keras is fine. It has a nice
high-level interface, and the backend is either Theano or TensorFlow (your
choice).

If your problem is more complicated and you want to use some unique
architecture, you'll have to use one of the more low-level frameworks. I would
recommend Tensorflow just on the basis of its popularity (you're more likely
to find people who have run into the same problems as you). But Theano, Torch,
and MXNet are probably pretty much equivalent in terms of speed and ease of
use. I hear Caffe has a steeper learning curve.

If you're really doing something fancy, then you'll have to look into more
detail. Torch and MXNet have the advantage that you can adaptively change your
computation graph based on the data, but you'd probably have to be pretty far
into deep learning research before something like that is useful. Tensorflow
Fold does something similar, but I'm not sure how well integrated it is with
the rest of Tensorflow (I've never used it).

You might also take a look at this:

[https://github.com/zer0n/deepframeworks](https://github.com/zer0n/deepframeworks)

It's a little out of date now, but it'll get you started.

Some of these frameworks are more general than others (e.g., Tensorflow is
more general than Keras), so you can specify architectures in some that you
can't in others. But as long as you can specify the architecture in a
particular framework, you'll be able to get a working model. Your choice of
framework just comes down to whatever one is easiest to work with for the
problem at hand.

~~~
fest
For my particular case (OCR for a limited set of strings) I found using caffe
much easier than tensorflow (even when using tf-learn).

------
madmax108
As with most programming questions, the answer is a combination of yes and no,
and it depends on the level of abstraction provided by the framework.

I started off using Caffe/Torch and currently use mostly Keras for my deep
learning experiments. With a lower-level framework, I could actually tinker
with the different moving parts to understand why they are used as they are,
while with a higher-level abstraction I can concentrate on the problem at
hand, knowing that the basic building blocks are already well developed and
have more or less been battle-tested by people far smarter than me.

And of course, when it comes to pure speed numbers and architecture for
scaling/deployment, these frameworks do vary among themselves:
[https://github.com/zer0n/deepframeworks/blob/master/README.md](https://github.com/zer0n/deepframeworks/blob/master/README.md)

------
attractivechaos
> On the other hand, if a framework "correctly" implements the underlying
> statistical theory/principals of deep learning, shouldn't I get the same
> results regardless of whichever framework I use?

That is about right, provided that 1) you use the same initial values and
hyper-parameters, and 2) you can implement the same network in all
frameworks. Issue 2) is complicated. Some networks that are easy to implement
in one framework can be hard or even impossible in another. Here "hard" can
mean two opposite things: too little flexibility (which prevents you from
constructing a certain topology) or excessive flexibility (which means
constructing a topology takes many steps and a lot of care). Which framework
to use depends on your goal and skill level. For starters, Keras is usually
easier.
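To make issue 1) concrete, here is a minimal sketch (mine, not from the thread, with NumPy standing in for any framework): two different implementations of the same dense layer agree once the weights, inputs, and math are fixed.

```python
import numpy as np

# Hypothetical sketch: two "frameworks" computing the same dense layer
# y = relu(W x + b). With the same weights, inputs, and math, the
# outputs agree regardless of implementation style.

rng = np.random.default_rng(0)      # fixed seed: same "initial values"
W = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)

def forward_loops(W, b, x):
    """Naive scalar-loop implementation."""
    y = []
    for i in range(W.shape[0]):
        s = b[i]
        for j in range(W.shape[1]):
            s += W[i, j] * x[j]
        y.append(max(s, 0.0))        # ReLU
    return np.array(y)

def forward_vectorized(W, b, x):
    """Vectorized implementation (what a framework would hand to BLAS)."""
    return np.maximum(W @ x + b, 0.0)

print(np.allclose(forward_loops(W, b, x), forward_vectorized(W, b, x)))  # True
```

The agreement is up to floating-point rounding, which is all any two frameworks can promise.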

------
gtani
You might want to look for the video of the Feb 22 lecture comparing Caffe,
Theano, Torch, and TF:
[http://cs231n.stanford.edu/syllabus.html](http://cs231n.stanford.edu/syllabus.html).
It was taken down from YouTube because it had no closed captions, but I'm sure
it's archived in multiple places.

~~~
happyslobro
What? Youtube requires you to add captions manually, or the video is taken
down? That's brutal! I thought YT had automatic captioning. Wow, lame.

~~~
ReverseCold
No.

[https://mobile.nytimes.com/2015/02/13/education/harvard-and-mit-sued-over-failing-to-caption-online-courses.html](https://mobile.nytimes.com/2015/02/13/education/harvard-and-mit-sued-over-failing-to-caption-online-courses.html)

So they took the easy way out and removed the videos.

------
p1esk
Here's an in-depth answer: [https://www.svds.com/getting-started-deep-learning/](https://www.svds.com/getting-started-deep-learning/)

~~~
canterburry
Thanks for this. While I am on Java, I know Python seems to rule the deep
learning space.

~~~
agibsonccc
Hi: VERY biased author of
[http://deeplearning4j.org](http://deeplearning4j.org) here.

One path we recommend for Java developers who are new to deep learning is to
take the fast.ai class: [http://course.fast.ai](http://course.fast.ai)

From there, map what you learn to our Keras model import:
[https://deeplearning4j.org/model-import-keras](https://deeplearning4j.org/model-import-keras)

That will more or less get you up and running.

We also have my O'Reilly book out for early release:
[http://shop.oreilly.com/product/0636920035343.do](http://shop.oreilly.com/product/0636920035343.do)

------
akyu
I was using Keras pretty heavily, but I have switched over to using
TensorFlow exclusively. Once you build a decent library of boilerplate,
TensorFlow becomes very usable. Packages like prettytensor may even surpass
Keras in terms of usability. Also, I found the Keras documentation to be
quite lacking, and ended up reading the source code much more often than I
would have liked.

I ended up bumping into the edges of the Keras API too often, and coming up
with hacky solutions for things that are actually quite simple if you just do
them in TensorFlow yourself.

Theano and Torch are also great options, but I think I will be sticking with
TensorFlow, simply because I trust that Google will be putting solid effort
behind it for years to come.

------
wayn3
It does not matter, and there's not a lot to get wrong in deep learning.

The math involved is pretty simple, in terms of the calculations that have to
be performed.

Where frameworks differ is in things like speed and ease of use. Use the one
that is the easiest for you. Tensorflow is certainly going to be the most
popular for the foreseeable future.

------
siscia
I am learning TensorFlow now and have no knowledge of the other possible
frameworks.

What surprises me the most is that tf, at least, is an almost declarative
framework.

I needed to add some random noise to a point in a multidimensional space so
as to generate n other points close to the first one.

In Python I would loop over n, each time adding some noise to the initial
point and pushing the result into a list (or use a list comprehension).

In tf I am "stacking" the original point n times to obtain n copies of it,
then generating n random noise vectors, and finally adding the two.

The second solution is more elegant in my opinion, but it requires an
important mental shift.

If the other frameworks are at all similar to tf, your biggest hurdle will be
this kind of mental shift, so just pick one.
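The two styles described above can be sketched in NumPy, whose array semantics are close enough to tf's for this purpose (this is my illustration, not the commenter's code):

```python
import numpy as np

# Sketch of the two styles (NumPy stands in for tf; names are mine).
# Goal: from one point, generate n nearby points by adding noise.

rng = np.random.default_rng(42)
point = np.array([1.0, 2.0, 3.0])    # one point in 3-D space
n = 5

# Imperative style: loop n times, collect noisy copies in a list.
noisy_loop = np.stack([point + 0.1 * rng.standard_normal(3)
                       for _ in range(n)])

# "Declarative" style: stack the point n times, then add an (n, 3)
# noise tensor in a single expression -- no explicit loop.
stacked = np.tile(point, (n, 1))                  # shape (n, 3)
noisy_stacked = stacked + 0.1 * rng.standard_normal((n, 3))

print(noisy_loop.shape, noisy_stacked.shape)      # (5, 3) (5, 3)
```

In tf the second version would also let the whole computation live inside the graph, which is the mental shift the comment describes.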

------
mindcrime
In many frameworks the low level mathematics are delegated to the installed
implementation of BLAS[1] anyway, so I'd expect most of the really popular
frameworks to get the same answers from that perspective. Other than that, my
feeling is that if you stick to the well known / popular frameworks, you
should be fine. If any one of them had a glaring deficiency, I'm pretty sure
it would have been noted and widely disseminated by now.

[1]:
[https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms)
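One caveat to "the same answers from that perspective" (my sketch, not the commenter's): even within a single library, reordering a floating-point summation can change the result in the last bits, so cross-framework agreement is only ever agreement up to rounding.

```python
import numpy as np

# Sketch (mine): "the same math" can still differ in the last bits,
# because BLAS kernels are free to reorder and block floating-point
# sums. The differences are tiny rounding noise, not wrong answers.

rng = np.random.default_rng(7)
a = rng.standard_normal(10_000).astype(np.float32)

fwd = float(np.sum(a))        # one summation order
rev = float(np.sum(a[::-1]))  # the reverse order

# Bitwise equality is not guaranteed, but the two results agree
# to within float32 rounding error.
print(abs(fwd - rev) < 0.1)   # True
```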

------
billconan
there are a few things you need to consider.

first is language. you need to choose a familiar language.

second is feature set. they don't implement the same set of operators, but if
you only want to use the common ones, most frameworks have them.

the third is their ability to train in parallel. for example, does a framework
support multiple machines, or just a single machine with multiple gpus?
Performance is also a factor. do they support simd/gpu? do they generate
intermediate code and compile it into cpp/cuda, or do they just call into gpu
libraries? do you want to support mobile devices?

the fourth difference is the level of abstraction. if a framework is very low
level, users need to understand many fundamentals of deep learning. but on the
other hand, if you want to extend the framework to add new operators, a low
level framework is easier to hack.

a high level framework lets you write less code, but it hides details and
makes it harder to hack.

the last thing to consider is the difference between dynamic and static
frameworks. dynet, chainer, and tensorflow with something called "fold" are
dynamic frameworks. I was told they are more flexible, but I don't understand
the details.

------
deepnotderp
Depends on your goal. Ultimately, the three tenets are flexibility, speed and
speed of development. All frameworks make tradeoffs between them. Researchers
use slower (in both senses) frameworks to implement weird new ideas that
require the flexibility while engineers typically use faster (in both senses)
frameworks that allow them to have a performant and reliable model for
production deployment.

------
paulsutter
A year ago it looked like Tensorflow might dominate, but most papers I read
still publish their code in Caffe, so we've done a lot more with Caffe than
Tensorflow.

Our own work calls cudnn/cublas directly because we're C++ programmers and
it's just more convenient for our use case.

------
cjbprime
It's like any framework. You probably want to choose based on popularity
(which translates into Stack Overflow answers explaining common pitfalls) and
a programming language you already know.

------
starlineventure
Try using keras.io; that way you get an abstraction on top of TensorFlow,
Theano, etc.

