
Engineering is the bottleneck in deep learning research - sajid
http://blog.dennybritz.com/2017/01/17/engineering-is-the-bottleneck-in-deep-learning-research/
======
novaRom
After a few years of research in DL I learned not to trust any single paper.
99% of DL papers are not scientific so much as 'hey guys, look, this trick
is our new awesome discovery'.

I also learned to communicate with other teams and exchange ideas proven in
practice - this really helps.

To improve, I would suggest publishing the whole setup: all the parameters
used, the code, and either all the data or a reference to a large free
dataset (no MNIST in papers anymore, please).
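Publishing the "whole setup" can be as simple as serializing every knob next to the results. A minimal sketch, assuming a hypothetical experiment (field names, values, and the dataset URL here are all made up for illustration):

```python
import json
import random

# Hypothetical experiment configuration -- every value a reader would
# need to reproduce the run, collected in one place.
config = {
    "seed": 42,
    "learning_rate": 1e-3,
    "batch_size": 128,
    "epochs": 30,
    "dataset_url": "https://example.org/large-free-dataset.tar.gz",
}

# Seed everything from the published config, not from ad-hoc defaults.
random.seed(config["seed"])

# Write the config next to the results so the two are never separated.
with open("experiment_config.json", "w") as f:
    json.dump(config, f, indent=2, sort_keys=True)
```

The point is less the file format than the habit: if a value affected the run, it goes in the published artifact.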

~~~
daveguy
> (no MNIST anymore in papers, please)

If you have a breakthrough in transfer learning then you will be able to very
effectively demonstrate it with MNIST.

The race to the bottom is essentially over, but that doesn't mean MNIST can't
be used to demonstrate learning.

Regarding setup and parameters. I hope AI researchers move toward something
like pachyderm [[https://pachyderm.io](https://pachyderm.io)] -- providing a
single docker image to completely replicate their work. However, I sincerely
doubt that will happen. As "open" as research is, the details are almost always
obfuscated to prevent competition with the spin-out company (or other
researchers).

~~~
AIMunchkin
How about unlocking access to ImageNet? Unless one has a .edu account, its
overlords seem to ignore requests to access it. Mind you, it's relatively easy
to social engineer access to it, but why should this be necessary? OpenAI and
Google have both knocked it out of the park with easy access to datasets and
examples.

But sadly, IMO, at the amateur level TensorFlow must be considered harmful. I
have repeatedly observed novices blow the thing up by starting from one of its
many amazing and fantastic teaching examples. It's not a question of the
TensorFlow API, but rather of the engineering quality of its underlying
engine, which kind of sucks. Nothing ruins an enthusiastic data scientist's
day like a cryptic seg fault for no apparent reason whatsoever.

And I know they're working on it, but fer cryin' out loud, the API is great,
and Google has the bottomless pockets to do a lot better than this. It's been
over a year and I still see people throw their hands up in frustration trying
to make use of the thing. Of course, Google has never been a customer-driven
company, but if we don't want an AI Fall, methinks this needs to be fixed.

~~~
daveguy
I agree. I think all reference datasets should be free and open source. Also,
researchers shouldn't publish on datasets that _are not_ free and open source.
That is a basic requirement for repeatability.

As far as the TensorFlow API is concerned, this may be a tradeoff between
speed and robustness. Checking every operation every time would certainly
slow down the code for general use. Improved how-to / setup / use guides are
probably a better solution for this (unless it is a flat-out bug).
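The tradeoff can be illustrated in plain numpy: validating shapes up front in Python turns a potential crash deep in native code into a readable error, at the cost of a little overhead on every call. A hypothetical sketch (the wrapper name is invented, not a TensorFlow API):

```python
import numpy as np

def checked_matmul(a, b):
    """Matrix multiply with an explicit shape check.

    The check costs a little time on every call, but turns a cryptic
    failure in the backend into a clear Python-level error message.
    """
    a, b = np.asarray(a), np.asarray(b)
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError(f"expected 2-D inputs, got {a.ndim}-D and {b.ndim}-D")
    if a.shape[1] != b.shape[0]:
        raise ValueError(f"inner dimensions differ: {a.shape} vs {b.shape}")
    return a @ b
```

Frameworks skip checks like this on the hot path for speed; exposing them behind a debug flag is one way to get both.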

------
felxh
Anecdotally, my master's thesis on natural language processing was supposed to
consist of first reproducing the results of a then-influential paper and then
hopefully improving upon it by extending the model used.

The paper made it seem like they had been using a standard PCFG parser (which
circulated in the research community at the time) to achieve their results. It
turned out they hadn't and instead had written a custom one and in fact their
results were not reproducible using the standard parser.

What was meant to be a timesaver in terms of engineering (using a standard
parser instead of writing your own) turned out to be a massive time sink. It
also turned out that by using a custom parser they had unintentionally
deviated from a vanilla PCFG (probabilistic context-free grammar), or in other
words, some implementation details had led to a departure from the assumed
underlying theoretical model.
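For context, in a vanilla PCFG the probability of a parse tree is simply the product of the probabilities of the rules it uses; any implementation that scores trees differently (extra smoothing, pruning, reweighting) has silently left the model. A toy sketch with an invented grammar:

```python
# Toy vanilla PCFG: (parent, children) -> probability.
# Grammar and numbers are invented for illustration.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("dogs",)): 0.4,
    ("NP", ("cats",)): 0.6,
    ("VP", ("bark",)): 0.7,
    ("VP", ("sleep",)): 0.3,
}

def tree_probability(tree):
    """Probability of a parse tree = product of its rule probabilities."""
    label, children = tree
    if all(isinstance(c, str) for c in children):  # preterminal rule
        return rules[(label, tuple(children))]
    p = rules[(label, tuple(child[0] for child in children))]
    for child in children:
        p *= tree_probability(child)
    return p

# P(S -> NP VP) * P(NP -> dogs) * P(VP -> bark) = 1.0 * 0.4 * 0.7 = 0.28
parse = ("S", [("NP", ["dogs"]), ("VP", ["bark"])])
```

Anything the scorer does beyond this product is a departure from the theoretical model, which is exactly the kind of divergence described above.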

------
mattsouth
A lot of research depends on software written by researchers and yet writing
software is not really supported or incentivised by academia. Organizations
like the software sustainability institute in the UK
([http://software.ac.uk](http://software.ac.uk)) are lobbying to change this,
with some success, but I guess it takes a long time to effect cultural change.

~~~
jpolitz
See also Artifact Evaluation, a process used in several PL/SE conferences that
makes evaluation of code (and datasets/studies) an explicit step in the review
process:

[http://evaluate.inf.usi.ch/artifacts](http://evaluate.inf.usi.ch/artifacts)

[http://www.artifact-eval.org/](http://www.artifact-eval.org/)

~~~
mattsouth
Hey thanks - I hadn't seen those before. Good stuff.

------
PaulHoule
Unfortunately CS is not as rigorous as some other academic fields.

I've been to 100+ colloquia in the physics dept at Cornell and I have never
been to one that I felt was a waste of time or where the speaker did not
belong.

The CS department colloquium is a different story: yes I got to see Geoff
Hinton before he became a celebrity but maybe half of the talks are awful.

~~~
AIMunchkin
IMO (to be fair, some) CS people became engineering bottlenecks the day the
universities switched out teaching C/C++ for Java and Python (IMO the Why Not
Zoidberg? of programming languages). Those who learned C/C++ anyway became my
heroes.

I have sat through too many presentations obsessing on HW-level perf/W
especially w/r to Deep Learning ASIC wannabes. Just writing one's code in
C/C++ (and doing it well) guarantees at least a 2x improvement over Java and a
10-100x improvement over Python. I won't even bring up the computational coup
that is CUDA.
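The specific 2x / 10-100x figures above are the commenter's, not verified here, but the interpreter-overhead part of the claim can be seen without leaving Python: the same reduction as a pure-Python loop versus numpy's compiled kernel. A small benchmark sketch (timings vary by machine):

```python
import time
import numpy as np

N = 1_000_000
xs = list(range(N))
arr = np.arange(N, dtype=np.float64)

# Pure-Python loop: every addition goes through the interpreter.
t0 = time.perf_counter()
total_py = 0.0
for x in xs:
    total_py += x
t_py = time.perf_counter() - t0

# numpy: the same reduction runs in compiled C.
t0 = time.perf_counter()
total_np = float(arr.sum())
t_np = time.perf_counter() - t0

print(f"python loop: {t_py:.4f}s  numpy: {t_np:.6f}s  "
      f"speedup: {t_py / max(t_np, 1e-9):.0f}x")
```

Both paths compute the identical sum; only the per-element dispatch cost differs, which is the "practical computer architecture" point in miniature.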

But hey, let's base a mobile phone OS on Java and block low-level access to
its GPU, that's a fantastic idea, right?

See also many experiences with data scientist and CS primadonnas dismissing
low-level coding as "ops." I liken this to the Eloi dismissing the Morlocks as
"the help."

~~~
leecarraher
wrong thread, take the best programming language fight elsewhere

~~~
AIMunchkin
It's not a best-programming-language fight. In my experience in the industry,
an enormous amount of technical debt and operational inefficiency is
accrued when someone ignorant of how machines and processors actually work
(SIMD, cache, pipelines, threading, etc) is in a leadership position to
dictate the toolset for solving problems.

This wasn't a noticeable issue until about a decade ago. But it is now and it
continues to get worse IMO. The "programming language" bit is just one of its
symptoms when the root cause is ignorance of practical computer architecture.

That said, the mentality of throwing all big data problems at Hadoop clusters
with 4-year-old GPUs and flaky 10 Gb interconnects (many of which could be
solved _faster_ on one.big.modern.and.cheaper.machine(tm)) is working wonders
for my Amazon stock so maybe I should just shut up and get rich?

------
elitro
Ah yes, the story of my master's ML thesis.

I had to gather features from multiple papers and try to select the best
ones, with classification results to prove it.

A few problems included:

\- Incomplete/unavailable datasets (404 on some copyright pictures)

\- Features were given only as math formulas and text descriptions (no code
whatsoever)

\- Classifier names only (which framework did you use? parameter values?)

In the end I couldn't contribute either: I was instructed to keep my work in
a private repo despite being funded by an EU academic scholarship.

------
saip
Agreed. The tooling around deep learning is not as mature as the tooling
around software development. There is a fair amount of engineering and grunt
work needed to even get started, let alone build on others' research. A few
problems off the top of my mind:

\- Setup: Installing DL frameworks, Nvidia drivers and CUDA is an exercise in
dependency hell. Trying to run someone's project with different dependencies
than yours is difficult to get right. Docker images [1] and nvidia-docker
make this simple, but are still not the norm.

\- Reproducibility: This is big, as Denny mentions. Folks still use GitHub
for sharing code, but DL pipelines need versioning of more than just code:
it's code, environment, parameters, data and results.

\- Sharing and collaboration: I've noticed that most collaboration on deep
learning research, unlike software, happens only when the folks are
co-located (e.g. part of the same school or company). This likely links back
to reproducibility, but there are not many good tools for effective
collaboration currently IMHO.

[1] [https://github.com/floydhub/dl-docker](https://github.com/floydhub/dl-docker)
(Disclaimer: I created this)
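Of the "code, environment, parameters, data and results" list, the environment part at least can be snapshotted from inside Python with the standard library alone. A minimal sketch (the output file name is arbitrary):

```python
import json
import platform
import sys
from importlib import metadata

# Capture the environment alongside the code: Python version, OS,
# and the exact version of every installed package.
env = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": {
        dist.metadata.get("Name", "?"): dist.version
        for dist in metadata.distributions()
    },
}

with open("environment.json", "w") as f:
    json.dump(env, f, indent=2, sort_keys=True)
```

A snapshot like this checked in next to the code covers the "environment" axis; data and results need content hashing or a dedicated store on top.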

------
rubidium
Sorry to knock the post author off his high horse, but "just like you wouldn’t
want a highly trained surgeon spending several hours a day inputting patient
data from paper forms." Highly trained surgeons _do_ spend several hours a day
doing tedious paperwork.

As a researcher, I expect 50-90% of my time to be slogging through
organizational and preparatory work.

~~~
transcranial
They _do_, but they _shouldn't_. It's an inefficient allocation of skills and
resources.

~~~
ch4s3
>It's an inefficient allocation of skills and resources

Is it? Maybe the paperwork is important for other members of the care team and
the surgeon is the only one who is familiar enough with the surgery to fill
out the forms. And, you can't reasonably be doing surgery round the clock.

------
audleman
I agree with the central thesis: engineering is a huge bottleneck. I work for
a FinTech company that is building novel machine learning models and this is
our experience.

We've had a few machine learning experts working here for a couple of years,
but recently brought in a software engineer with a passion for machine
learning. He was able to, within a few months, streamline the data acquisition
pipeline to the point where we could iterate on a new model in about 30
minutes, down from days. He accomplished this not just with better data but by
building efficient in-memory data structures that avoid disk I/O, saving
literally days per iteration.

Before his work the training data versus the data we used in production had
minor differences. Each new release required intensive manual verification to
make sure that our model worked. Now we have much more certainty that the two
match up.

Looking down on engineering problems is like a famous architect looking down
on structural engineers. You're not gonna have a very good skyscraper if your
foundation is shaky and ad-hoc.

------
leecarraher
Coming from the machine learning research community, I am in awe of the
availability and relative ease of use of the deep learning frameworks. Rarely
can I find comparison code in ML that I didn't have to bug the author for, or
try to implement myself from the paper alone. In short, DL is on a much
better path than the author realizes. Perhaps we have the github/bitbucket
era to thank. The real problem with DL is that until there is a more robust
theory (if that's even possible; the bane and boon of ML is the complexity of
the models), much of the application research will remain a form of digital
alchemy.

------
siscia
Just yesterday, browsing the Kaggle forums, I thought that we may need a
GitHub for Deep Learning...

So that you could link your code to a dataset, have it automatically run, and
show the result...

Not sure if it is worth the time...

~~~
daveguy
I think OpenAI is working on this with gym and universe.

[https://universe.openai.com](https://universe.openai.com)

[https://gym.openai.com](https://gym.openai.com)

A general dataset pool in OpenAI would be nice. Kaggle has quite a few basic
datasets (MNIST etc.) for evaluation.

------
autokad
I was playing around with the emotion dataset on Kaggle using TensorFlow.
Depending on the seed, I was getting between 58 and 60% accuracy on the
held-out test set (what you submit against).

I thought I had come up with a good set of hyperparameters using AWS GPU
instances (Python 2.7). I wanted to visualize some of the outputs, so I
copied the code to my machine and ran it under Python 3.5 (Windows) and only
got 57% accuracy. These swings in accuracy are huge.
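Swings like that often come down to unseeded (or differently-seeded) randomness in weight initialization and data shuffling. A minimal numpy sketch of pinning the randomness so two runs agree (the `init_weights` function is hypothetical; a real setup would also need framework-level seeds, e.g. TensorFlow's graph seed):

```python
import numpy as np

def init_weights(seed):
    """Hypothetical weight init: with the seed pinned, two runs --
    even on different machines -- produce bit-identical weights."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(4, 3))

run_a = init_weights(seed=42)
run_b = init_weights(seed=42)  # same seed -> identical weights
run_c = init_weights(seed=43)  # different seed -> different weights

assert np.array_equal(run_a, run_b)
assert not np.array_equal(run_a, run_c)
```

This controls for the seed, but not for cross-platform differences (Python 2.7 vs 3.5, BLAS builds, GPU nondeterminism), which can shift results on their own, as the anecdote above shows.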

