
Open source deep learning models that programmers can download and run first try - pplonski86
https://github.com/samdeeplearning/The-Terrible-Deep-Learning-List
======
reader5000
There isn't really any math to deep learning beyond the concept of a
derivative, which is taught in high school calculus. The reason deep learning
papers seem mathy is that people take network architectures and various
elementary operations on them and try to express them symbolically in LaTeX
using summations and indexing-hell. For example, the easy concept of "updating
all the neurons in one layer based on the neurons in the previous layer and
the connecting weights" is expressed as matrix-vector multiplication for no
apparent reason other than that it is technically correct and makes for
slicker notation, and I guess makes it easier to use APIs that compute
gradients for you. Deep learning, however, is broadly an experimental science,
which in many ways is the opposite of math as traditionally envisioned, in
which great insights follow deductively from prior great insights. If you ask
a basic question like "why should I use 4 layers instead of 3?" there is no
answer other than "4 works better". Similarly with gradient descent versus
random search in weight space. There are many problem domains where random
search is as good as any known hill-climbing heuristic search (like gradient
descent). Why is GD so effective when learning image classifiers expressed as
stacked weighted sums? Who knows.
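
The claim that the matrix-vector form is just the loop in disguise is easy to
check with a toy sketch (the layer sizes and random weights here are made up
for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
prev = rng.standard_normal(3)      # activations of the previous layer
W = rng.standard_normal((4, 3))    # W[i, j]: weight from previous neuron j to neuron i
b = rng.standard_normal(4)         # biases of the current layer

# Loop form: "update each neuron from the previous layer's neurons and weights"
loop_out = np.empty(4)
for i in range(4):
    total = b[i]
    for j in range(3):
        total += W[i, j] * prev[j]
    loop_out[i] = total

# The same computation written the way the papers write it: a matrix-vector product
matvec_out = W @ prev + b

assert np.allclose(loop_out, matvec_out)
```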

~~~
davedx
Using matrices to perform the calculations is an optimization over doing a
bunch of for loops. This vectorization results in faster code in higher-level
languages and on certain hardware platforms (SIMD). It has nothing to do
with "slicker notation", although having written gradient descent with both
for loops and matrix operations, I find the vectorized version simpler and
cleaner to read.

~~~
Houshalter
He's not complaining about using vectorization in code. The problem is that
papers, and even explanations targeted at non-experts, often use obfuscated
math in place of clear explanations. I've complained about this before here:
[https://news.ycombinator.com/item?id=13953530](https://news.ycombinator.com/item?id=13953530)

Mathematical notation is basically a programming language. A programming
language with weird symbols you can't type to search for, single-letter
variable names for everything, and no comments. And it's written by
programmers who are obsessed with fitting everything into a single line and
making it as small as possible, no matter how difficult it is to read. Any
programmer understands this is incredibly bad practice. And even if you parse
every step and perfectly follow _what_ the code is doing, without explanation,
it's pretty difficult to figure out _why_.
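
To make the complaint concrete, take a formula like the softmax,
y_j = e^{x_j} / Σ_k e^{x_k}. The same computation with named variables and
comments (a sketch; the names are mine, not standard):

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtract the max score first: mathematically a no-op, but it
    # avoids overflow in exp() for large scores.
    shifted = [s - max(scores) for s in scores]
    exps = [math.exp(s) for s in shifted]
    normalizer = sum(exps)
    return [e / normalizer for e in exps]

probs = softmax([1.0, 2.0, 3.0])
assert abs(sum(probs) - 1.0) < 1e-9   # valid probability distribution
```

The one-line summation says _what_; the variable names and the stability
comment are the _why_ that the notation leaves out.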

~~~
davedx
OK, I see what you're saying. I think you have the same issue with "real"
programming languages too. If you compare some very concise Clojure or Scala
code with the equivalent in Java, it can be quite hard to understand if you're
not very familiar with the language. But I wouldn't necessarily say it's
"incredibly bad practice". A Scala programmer can write concise and elegant
code that to another Scala programmer is actually faster to understand because
of that conciseness. Whereas the same code written with for loops and class
method calls and all the boilerplate in Java would take more studying to
filter out the low level constructions.

It's about the level of abstraction. And yeah if you don't understand the
notation or syntax at the level of abstraction you're studying, it will be
very hard.

(FWIW I find Scala code quite hard to understand sometimes, but I also find
the more I know about the language, the more comprehensible it gets).

~~~
Houshalter
It's not necessarily the conciseness that's a problem. Using foreach instead
of a full for loop is one thing. What I'm complaining about is code in place
of an explanation. E.g. imagine coming across some nasty piece of code like
this:
[https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...](https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code)

It doesn't matter how familiar you are with the language. Without an
explanation of what the hell is going on, just looking at the code is useless.
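
The trick is just as opaque in any language. A Python sketch of the same bit
manipulation (using `struct` to reinterpret the float's bits; the magic
constant is the one from the Quake code):

```python
import struct

def fast_inv_sqrt(x):
    # Reinterpret the 32-bit float's bits as an unsigned integer
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    # The famous "magic" line: a cheap initial guess for 1/sqrt(x)
    i = 0x5f3759df - (i >> 1)
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One Newton-Raphson iteration refines the guess
    return y * (1.5 - 0.5 * x * y * y)

print(fast_inv_sqrt(4.0))  # roughly 0.5, i.e. 1/sqrt(4)
```

Which is exactly the point: without the comments, neither the C nor the
Python version explains itself.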

~~~
davedx
Now we're talking about documentation. You are correct, no code is completely
self-documenting. But that Quake code is very low-level, the opposite of what
I think the grandparent dislikes (very high-level abstract notation).

------
shoshin23
I would say that this title is misleading. A lot of what is presented there
needs a strong grasp of deep learning (and the other underlying concepts
behind it), without which all you'll do is load the examples in Xcode and run
them.

Moreover, I would probably encourage people to read examples of Tensorflow or
Caffe2 running on iOS rather than something like Forge. Forge is an
interesting project but won't really help you if you don't have a clue about
MPS or Deep Learning.

~~~
itg
The original title, which is also the title of the repo, was much more
accurate. The author mentions these are models to download and start playing
with right away, not a set of repositories to help you learn deep learning.

~~~
sctb
OK, we've updated the title from the (slightly edited) repo description of
“Examples to get started with Deep Learning without learning any of the math”
to this phrase from the description.

------
solomatov
It's a bad idea to learn deep learning without learning the math.

~~~
Retric
I disagree, now prove your point.

~~~
solomatov
Your deep learning models don't always work as expected. In such cases you
need to debug them, and understanding how a model works internally is
required for debugging it.

~~~
joshvm
I would wager that a lot of people who "do deep learning" have absolutely no
idea about the models they're using.

Hyperparameter optimisation is basically a fudge right now - you try
everything and see what works. Even the research groups who came up with the
standard network stacks, like VGG, basically lucked out and found an
architecture that worked, then tried several variants and found one that
worked better. DL papers are full of handwaving speculation about why
particular networks perform better than others, but right now it's just that:
highly educated speculation.

This isn't limited to deep learning. If you want to try any kind of machine
learning, it's totally reasonable to throw different fitting functions at your
problem to see which one works best. Unless you have an unusually clear
problem category, it's rarely possible to say at the outset that "This problem
would best be solved with method <X>". A counter here would be that if you
need to classify images, you should almost certainly use a convnet.

You need _some_ understanding of why things might be going wrong, e.g. your
loss isn't moving -> crank up the learning rate. You're seeing NaNs? Probably
your learning rate is too high. But none of that really needs any serious
maths to understand. You can get by quite well by figuring out empirical rules.

I'm not arguing that you _shouldn't_ learn the maths, it's a wise idea to,
but many people use deep learning models without knowing how backpropagation
works, for instance.

------
rrggrr
Gosh I wish this existed for Python.

~~~
jcl
Could you clarify? Several of the examples are Python projects (with and
without Tensorflow), and others are apps consuming models that probably came
from Tensorflow.

------
amelius
So sad to see the divide between iOS and Android platforms.

Half of this stuff I can't run.

