On the other hand, if a framework "correctly" implements the underlying statistical theory/principles of deep learning, shouldn't I get the same results regardless of which framework I use?
If not, how would I know which framework produces "more correct" interpretations of the underlying data?
If your problem is more complicated and you want to use some unique architecture, you'll have to use one of the more low-level frameworks. I would recommend Tensorflow just on the basis of its popularity (you're more likely to find people who have run into the same problems as you). But Theano, Torch, and MXNet are probably pretty much equivalent in terms of speed and ease of use. I hear Caffe has a steeper learning curve.
If you're really doing something fancy, then you'll have to look into more detail. Torch and MXNet have the advantage that you can adaptively change your computation graph based on the data, but you'd probably have to be pretty far into deep learning research before something like that is useful. Tensorflow Fold does something similar, but I'm not sure how well integrated it is with the rest of Tensorflow (I've never used it).
You might also take a look at this:
It's a little out of date now, but it'll get you started.
Some of these frameworks are more general than others (e.g., Tensorflow is more general than Keras), so you can specify architectures in some that you can't in others. But as long as you can specify the architecture in a particular framework, you'll be able to get a working model. Your choice of framework just comes down to whichever one is easiest to work with for the problem at hand.
I started off using Caffe/Torch and currently use Keras for most of my deep learning experiments. With a lower-level framework, I could actually tinker with the different moving components to understand why they are used as they are, while with a higher-level abstraction, I can concentrate on the problem at hand, knowing that the basic building blocks are well developed already and have more or less been battle-tested by people far smarter than me.
And of course, when it comes to pure speed numbers and architecture for scaling/deployment, these frameworks do vary among themselves: https://github.com/zer0n/deepframeworks/blob/master/README.m...
That is about right, provided that 1) you use the same initial values and hyper-parameters, and 2) you can implement the same network in all frameworks. Issue 2) is complicated. Some networks that are easy to implement in one framework can be hard or even impossible in another. Here "hard" can mean two opposite things: lack of flexibility (which prevents you from constructing a certain topology) or excessive flexibility (which makes constructing a topology take too many steps and too much care). Which framework to use depends on your goal and skill level. For starters, Keras is usually easier.
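To make point 1) concrete, here's a minimal sketch (the shapes and values are mine) showing that with identical weights, a dense layer computes the same thing in plain NumPy and in TF 1.x-style TensorFlow, up to floating-point rounding:

    import numpy as np
    import tensorflow as tf

    rng = np.random.RandomState(0)
    W = rng.randn(4, 3).astype(np.float32)  # shared weights
    b = rng.randn(3).astype(np.float32)     # shared bias
    x = rng.randn(1, 4).astype(np.float32)  # one input row

    # "Framework A": plain NumPy forward pass, y = relu(xW + b)
    y_np = np.maximum(x.dot(W) + b, 0.0)

    # "Framework B": the same layer in TensorFlow's 1.x graph API
    y_op = tf.nn.relu(tf.matmul(tf.constant(x), tf.constant(W)) + tf.constant(b))
    with tf.Session() as sess:
        y_tf = sess.run(y_op)

    print(np.allclose(y_np, y_tf))  # True: same weights + same input = same output

In practice, differences come from different default initializations, hyper-parameters, and non-deterministic ops, not from the frameworks disagreeing about the math.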
So they took the easy way out and removed the videos.
One path we recommend for Java developers who are new to deep learning is to take the fast.ai class:
From there, map what you learn to our Keras model import:
That will more or less get you up and running.
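For what that looks like on the Keras side, a minimal sketch (the layer sizes and file name are made up): you build and train in Keras, then save an HDF5 file, which is the format the model import reads:

    from keras.models import Sequential
    from keras.layers import Dense

    # A tiny Keras model; model.save() writes architecture + weights
    # to one HDF5 file, which Keras-model-import tools can then load.
    model = Sequential([
        Dense(10, activation='relu', input_shape=(4,)),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='sgd', loss='binary_crossentropy')
    model.save('model.h5')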
We also have my O'Reilly book out for early release:
I ended up bumping into the edges of the Keras API too much, coming up with hacky solutions to do things that are actually quite simple if you just do them in TensorFlow yourself.
Theano and Torch are also great options, but I think I will be sticking with TensorFlow, simply because I trust that Google will be putting solid effort behind it for years to come.
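For a sense of what "just doing it in TensorFlow yourself" looks like, here's a minimal sketch of a linear-regression training loop in the 1.x graph API (the data and learning rate are made up); since you own every tensor, odd loss terms or intermediate tweaks are easy to bolt on:

    import numpy as np
    import tensorflow as tf

    # Toy data: y = X . [1, -2, 0.5] + offset
    X = np.random.randn(100, 3).astype(np.float32)
    y = X.dot(np.array([1.0, -2.0, 0.5], dtype=np.float32)) + 0.1

    w = tf.Variable(tf.zeros([3]))
    b = tf.Variable(0.0)
    pred = tf.tensordot(tf.constant(X), w, axes=1) + b    # shape (100,)
    loss = tf.reduce_mean(tf.square(pred - tf.constant(y)))
    train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(200):
            sess.run(train_step)
        print(sess.run(loss))  # should end up near zero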
The math involved is pretty simple, in terms of the calculations that have to be performed.
Where frameworks differ is in things like speed and ease of use. Use the one that is the easiest for you. Tensorflow is certainly going to be the most popular for the foreseeable future.
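To illustrate how simple the core calculation is: here's a single gradient-descent step for one neuron with a squared-error loss, written out by hand in NumPy (the numbers are arbitrary). This is essentially the update every framework performs, just vectorized and automated:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.0])  # input
    target = 1.0
    w = np.array([0.1, 0.2])   # weights
    b = 0.0
    lr = 0.5                   # learning rate

    # Forward: pred = sigmoid(w.x + b), loss = (pred - target)^2
    pred = sigmoid(w.dot(x) + b)

    # Backward, by the chain rule:
    # dL/dw = 2*(pred - target) * pred*(1 - pred) * x
    grad = 2.0 * (pred - target) * pred * (1.0 - pred)
    w -= lr * grad * x
    b -= lr * grad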
What surprises me the most is that tf, at least, is almost declarative as a framework.
I needed to add some random noise to a point in a multidimensional space, in order to generate n other points close to the first one.
In Python I would loop through n, each time adding some noise to the initial point and pushing the result into a list or whatever structure; basically a list comprehension.
In tf I am "stacking" the original point n times to obtain n copies of it, then I am generating n random noises, and finally I am adding the two.
The second solution is more elegant in my opinion but requires an important mental shift.
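A minimal sketch of the two styles (TF 1.x API; the point, n, and noise scale are made up):

    import numpy as np
    import tensorflow as tf

    point = np.array([1.0, 2.0, 3.0], dtype=np.float32)
    n = 5

    # Imperative Python/NumPy style: loop and collect
    noisy_list = [point + np.random.normal(scale=0.1, size=point.shape)
                  for _ in range(n)]

    # Declarative tf style: stack the point n times, add one noise tensor
    p = tf.constant(point)
    stacked = tf.stack([p] * n)                              # shape (n, 3)
    noise = tf.random_normal(tf.shape(stacked), stddev=0.1)
    noisy_op = stacked + noise

    with tf.Session() as sess:
        print(sess.run(noisy_op))  # n points near the original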
If the other frameworks are at all similar to tf, your biggest hurdle will be this kind of mental shift; just pick one.
First is language: you need to choose a familiar language.
Second is feature set: they don't implement the same set of operators, but if you only want to use the common ones, most frameworks will have them.
The third is their ability to train in parallel. For example, does a framework support multiple machines, or just a single machine with multiple GPUs? Performance is also a factor. Do they support SIMD/GPU? Do they generate intermediate code and compile it into C++/CUDA, or do they just call into GPU libraries? Do you want to support mobile devices?
The fourth difference is the level of abstraction. If a framework is very low-level, users need to understand many fundamentals of deep learning; on the other hand, if you want to extend the framework to add new operators, a low-level framework is easier to hack.
A high-level framework lets you write less code, but it hides details and makes it harder to hack.
The last thing to consider is the difference between dynamic and static frameworks. DyNet, Chainer, and TensorFlow with something called "Fold" are dynamic frameworks. I was told they are more flexible, but I don't understand the details.
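For a rough sense of the difference, a sketch (TF 1.x API; the halving loop is a made-up example): in a static framework, data-dependent control flow has to be expressed as graph ops, whereas a dynamic framework lets you write it as an ordinary loop over real values:

    import tensorflow as tf

    # Static graph: the loop is itself a graph op, defined before any data is seen
    x = tf.placeholder(tf.float32, [])
    halved = tf.while_loop(cond=lambda v: v > 1.0,
                           body=lambda v: v / 2.0,
                           loop_vars=[x])

    with tf.Session() as sess:
        print(sess.run(halved, feed_dict={x: 37.0}))

    # In a dynamic framework, the same thing would just be:
    #     while v > 1.0:
    #         v = v / 2.0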
Our own work calls cuDNN/cuBLAS directly because we're C++ programmers and it's just more convenient for our use case.