
The most cited deep learning papers - sdomino
https://github.com/terryum/awesome-deep-learning-papers
======
cr0sh
I can understand why it probably isn't on the list yet (not as many citations,
since it is fairly new) - but NVidia's "End to End Learning for Self-Driving
Cars" needs to be mentioned, I think:

[https://arxiv.org/abs/1604.07316](https://arxiv.org/abs/1604.07316)

[https://images.nvidia.com/content/tegra/automotive/images/20...](https://images.nvidia.com/content/tegra/automotive/images/2016/solutions/pdf/end-to-end-dl-using-px.pdf)

I implemented a slight variation on this CNN using Keras and TensorFlow for
the third project in term 1 of Udacity's Self-Driving Car Engineer nanodegree
course (not special in that regard - it was a commonly used implementation,
because it works). Give it a shot yourself: take this paper, install
TensorFlow, Keras, and Python, download a copy of Udacity's Unity3D car
simulator (it was recently released on GitHub), and have at it!

Note: For training purposes, I highly recommend building a training/validation
set using a steering wheel controller, and you'll want a labeled set of about
40K samples (though I have heard you can get by with far fewer, even
unaugmented - my sample set actually used augmentation to boost about 8k real
samples up to around 40k). You'll also want to use a GPU and/or a generator or
some other batch processing for training (otherwise, you'll run out of memory
post-haste).
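
For reference, here's a rough sketch of what that looks like in Keras - the
layer sizes follow the paper, but the input shape, normalization, optimizer,
and the toy generator are my own simplifications, not the exact setup from the
paper or my Udacity project:

    # Rough sketch of the NVIDIA end-to-end CNN in Keras. Layer sizes follow the
    # paper; everything else (input shape, normalization, optimizer, generator)
    # is an assumption/simplification.
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Lambda, Conv2D, Flatten, Dense

    def build_model():
        model = Sequential([
            # Normalize pixels to [-1, 1]; the paper uses 66x200 YUV input planes.
            Lambda(lambda x: x / 127.5 - 1.0, input_shape=(66, 200, 3)),
            Conv2D(24, (5, 5), strides=(2, 2), activation='relu'),
            Conv2D(36, (5, 5), strides=(2, 2), activation='relu'),
            Conv2D(48, (5, 5), strides=(2, 2), activation='relu'),
            Conv2D(64, (3, 3), activation='relu'),
            Conv2D(64, (3, 3), activation='relu'),
            Flatten(),
            Dense(100, activation='relu'),
            Dense(50, activation='relu'),
            Dense(10, activation='relu'),
            Dense(1),  # single steering-angle output, trained with MSE
        ])
        model.compile(optimizer='adam', loss='mse')
        return model

    def batch_generator(image_paths, angles, load_image, batch_size=64):
        """Yield batches lazily so the whole training set never sits in memory."""
        n = len(image_paths)
        while True:
            idx = np.random.permutation(n)
            for start in range(0, n, batch_size):
                batch = idx[start:start + batch_size]
                images = np.stack([load_image(image_paths[i]) for i in batch])
                yield images, np.array([angles[i] for i in batch])

    # model = build_model()
    # model.fit(batch_generator(paths, angles, load_image=my_loader),
    #           steps_per_epoch=len(paths) // 64, epochs=5)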

~~~
amelius
Nice. I'm wondering how often NVidia's solution makes a mistake. Also, the
paper says:

> More work is needed to improve the robustness of the network, to find
> methods to verify the robustness, and to improve visualization of the
> network-internal processing steps.

But it doesn't hint at how this would be approached.

Also, how they arrived at the particular network topology seems sort of a
mystery.

------
pizza
[http://people.idsia.ch/~juergen/deep-learning-conspiracy.htm...](http://people.idsia.ch/~juergen/deep-learning-conspiracy.html) oh Juergen

> _Machine learning is the science of credit assignment. The machine learning
> community itself profits from proper credit assignment to its members. The
> inventor of an important method should get credit for inventing it. She may
> not always be the one who popularizes it. Then the popularizer should get
> credit for popularizing it (but not for inventing it). Relatively young
> research areas such as machine learning should adopt the honor code of
> mature fields such as mathematics: if you have a new theorem, but use a
> proof technique similar to somebody else's, you must make this very clear.
> If you "re-invent" something that was already known, and only later become
> aware of this, you must at least make it clear later._

~~~
curuinor
I mean, there was a nice 15 or so year period between the "new" post-Minsky-
and-Papert Perceptron book connectionism and the current "new new"
connectionism where neural nets were a definite backwater. Most of the PIs
doing neural nets dealt with it by being sad but Schmidhuber seems to have
dealt with it by doubling down on the weirdness.

~~~
latently
Just because it wasn't wildly popular doesn't mean there was a gap of any
sort. I agree with Schmidhuber.
[http://grey.colorado.edu/emergent](http://grey.colorado.edu/emergent)

------
kriro
This might be as good a place to ask as any. Does anyone have suggestions on
the problem of annotating natural language text to get a ground truth for
things that have no readily available ground truth (subjective judgments of
content etc.)? I do own the book "Natural Language Annotation", which is good
but not exactly what I need. The part about annotation guidelines and how the
annotation was done in practice is often only brushed over in research papers.
I get it at a high level: have a couple of raters, calculate inter- and
intra-rater reliability, and try to optimize that. However, like I said, I'm
struggling a bit with the details. What are good values to aim for, how many
raters do you want, do you even want experts or should you crowdsource, what
do good annotation guidelines look like, how do you optimize them, etc.? Just
to play around with the idea, we did a workshop with four raters and 250
tweets each (raters simply assigned one category to the entire tweet), and
that was already quite a bit of work yet still feels like it's on the
way-too-small side of things.

I feel like I should find a lot more info on this in the sentiment analysis
literature, but I haven't really.

~~~
PeterisP
You might want to browse the archives of the biennial LREC conference; they
have sections focused on resource creation, and some of the larger projects
should have papers on annotation methodology. LDC
([https://www.ldc.upenn.edu/](https://www.ldc.upenn.edu/)) is probably the
largest organization doing a large variety of annotation tasks; maybe they
have published how they do things, I'm not sure.

However, often there are no real shortcuts; in many projects the resource
annotation takes much more work and more people than everything else put
together, and it's not uncommon to see multiple man-years spent to do it
properly.

What you say about the high level is just about all that can be said in
general; everything else will depend on your particular problem. After you've
fixed the bugs in your process, interannotator agreement is not really a
description of your annotators but a measure of how subjective/objective your
task is - and you can't really change that without meaningful changes to how
exactly you define your task. Some tasks are well suited for crowdsourcing,
and some need dedicated experts. Some annotation tasks are straightforward and
the annotation guidelines fit on one page; for others the annotation
guidelines are a literal book, and one that needs revisions every few years as
you figure out that changes are needed. It depends. Shallow sentiment analysis
is generally on the trivial side of annotation (but highly subjective), though
you can go far enough down the rabbit hole to drag in all the surrounding
issues of intent, degree of belief, degree of certainty, etc. - then you hit
the full complexity of deep semantic annotation.
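
If you want to put a number on that subjectivity, pairwise Cohen's kappa is
the usual starting point for a categorical task like your tweet labeling. A
minimal sketch, assuming scikit-learn is available (the rater names and labels
below are made up):

    # Pairwise Cohen's kappa for a categorical annotation task. Rough rule of
    # thumb: above ~0.6 is usually read as substantial agreement, but what is
    # "good enough" depends on the task.
    from itertools import combinations
    from sklearn.metrics import cohen_kappa_score

    # One list of labels per rater, aligned by item (e.g. 250 tweets each).
    ratings = {
        "rater_a": ["pos", "neg", "neu", "pos", "neg"],
        "rater_b": ["pos", "neg", "pos", "pos", "neg"],
        "rater_c": ["neu", "neg", "neu", "pos", "neg"],
    }

    for (name1, labels1), (name2, labels2) in combinations(ratings.items(), 2):
        kappa = cohen_kappa_score(labels1, labels2)
        print(f"{name1} vs {name2}: kappa = {kappa:.2f}")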

Perhaps you just need to find the people who built the latest interesting
datasets in your domain and ask them directly. I don't handle sentiment, but
[http://alt.qcri.org/semeval2017/task5/](http://alt.qcri.org/semeval2017/task5/)
is one group of people that seems to do it seriously.

~~~
kriro
Thank you, excellent and helpful post. I see I have visited the LDC site
before but didn't remember it; I'll have another look. SemEval looks like it
could be exactly the kind of thing I'd love to poke around in, and they have
datasets available.

------
nojvek
Someone needs to make a summary of the top papers and explain it in a way a
layman can understand. I would pay $500 for such a book/course explaining the
techniques.

I've been reading a number of these papers, but it's really tough to
understand the nitty-gritty details.

~~~
lomereiter
Jeremy Howard's course ([http://course.fast.ai/](http://course.fast.ai/)) is
much like what you describe; the downside is that it's a little too boring for
people with a mathematical education.

~~~
nojvek
Thanks I'll check it out.

------
curuinor
No PDP book? It's old and weird but interesting and has a lot of original
ideas, notwithstanding the actual original backprop being from before then.
Nor the original backprop stuff?

~~~
aroman
The PDP book is the main textbook for a course I'm taking at CMU called...
PDP. It's digestible, but man is it weird to see things like "this is an
ongoing area of future research" where the future = after 1986.

I find it kind of hard to relate to for that reason — how do I know (besides
asking my prof., whose PhD advisor was Hinton, or doing my Googling) what
ideas ended up "sticking"? What areas of future research went nowhere vs
spawned whole new subfields?

Is there a more modern textbook of the sort I could cross-reference?

edit: here's a link to the course website:
[http://www.cnbc.cmu.edu/~plaut/IntroPDP/](http://www.cnbc.cmu.edu/~plaut/IntroPDP/)

~~~
curuinor
Nobody really cares about this stuff anymore except weirdos like me, so you
might actually be best off asking your prof.

There's also a lot of resurrection that goes on with weird ideas: FastFood
cites a paper that cites a paper that cites LeCun's optimal brain damage
paper, for example.

------
pks2006
I always wanted to apply deep learning to my day-to-day work. We build our own
hardware that runs Linux on an Intel CPU and then launches a virtual machine
that runs our proprietary code. Our code generates a lot of system logs that
vary based on the boot sequence, environment temperature, software config,
etc. We currently spend a significant amount of time going over these logs
when issues are reported. Sometimes there is a 1-to-1 mapping from an issue to
the logs, but more often, RCA'ing the issue requires knowledge of how the
system works and correlating that knowledge with the logs generated. We have
tons of these logs that could be used as a training set. Any clues on how we
can put all this together so that RCA'ing an issue requires as little human
involvement as possible?

~~~
curuinor
Use dumber ML first; try some random forests. Not because they're necessarily
better or worse, but because DL requires an enormous amount of knowledge and
fiddliness, and what you probably want is for the bulk of the actual work to
go into setting up the data for ML, not hyperparameter and architecture
fiddling.
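
As a first baseline to check whether the logs carry any signal at all,
something along these lines would do - the log snippets and issue labels are
made-up placeholders, and scikit-learn is assumed:

    # Baseline: TF-IDF over raw log text + a random forest to predict the issue
    # category. Log snippets and labels here are hypothetical placeholders; a
    # real dataset would have far more labeled examples.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    logs = [
        "boot sequence A: thermal warning at 85C, fan ramp",
        "thermal shutdown triggered after sustained 90C",
        "vm launch failed: image checksum mismatch",
        "vm launch retried after config reload",
        "boot sequence B: clean start, all services up",
        "clean start, periodic health checks passing",
    ]
    issues = ["thermal", "thermal", "vm_launch", "vm_launch", "none", "none"]

    X_train, X_test, y_train, y_test = train_test_split(
        logs, issues, test_size=0.33, random_state=0)

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),   # simple word/bigram features
        RandomForestClassifier(n_estimators=200, random_state=0),
    )
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))
    print(model.predict(["thermal warning at 88C during boot"]))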

~~~
shoyer
Before you try any sort of ML, explore your data [1, 2]. Exploratory data
analysis may very well tell you that there is absolutely no point in making a
fancy predictive model at all. If a few heuristics get you 90% of the way to
an optimal solution, then don't even bother to start on machine learning,
unless that last 10% is going to provide significant value.

[1]
[https://en.wikipedia.org/wiki/Exploratory_data_analysis](https://en.wikipedia.org/wiki/Exploratory_data_analysis)

[2] From one of my mentors:
[http://www.unofficialgoogledatascience.com/2016/10/practical...](http://www.unofficialgoogledatascience.com/2016/10/practical-advice-for-analysis-of-large.html)

------
mathoff
The most cited deep learning papers:
[https://scholar.google.com/scholar?q="deep+learning"](https://scholar.google.com/scholar?q="deep+learning")

------
gv2323
Has anyone downloaded them into their own separate folders and zipped the
whole thing up?

------
gravypod
This is a really lucky find for me. I was just about to do something to try
and get into machine learning. Right now I need some help getting started with
writing some machine learning code. I don't know where to start. I've come up
with a very simple project that I think this would work very well for.

I want to buy a Raspberry Pi Zero, put it in a nice case, add two push
buttons, and turn it into a car music player (hook it into the USB charger and
3.5mm jack in my car). The two buttons will be "like" and "skip & dislike".
I'll fill it with my music collection and write a Python script that just
finds a song, plays it, and waits for button presses.

I want the "like" button to be positive reinforcement and the "skip & dislike"
to be negative reinforcement.

Could someone point me in the right direction?

~~~
ctchocula
Your use case reminds me of the Netflix problem: given x movies that a user
has liked, recommend movies to them based on a large dataset of thousands of
users and their movie ratings. For music, there is a similar dataset[1] and
problem on Kaggle[2].

The way the system is evaluated is by building a model that predicts what
rating the user will give a song even though the user has not rated it yet.
The difference between predicted and actual rating is then computed as the
test error of the model. Some basic techniques for building such a model are
regression and matrix factorization using SVD (singular value decomposition).
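
As a toy illustration of the matrix factorization part (made-up ratings, plain
NumPy): fill in the missing entries, keep only a low-rank SVD, and read
predicted ratings off the reconstruction:

    # Toy matrix factorization via truncated SVD (made-up ratings).
    # Rows are users, columns are songs; 0 marks "not rated yet".
    import numpy as np

    ratings = np.array([
        [5, 4, 0, 1],
        [4, 0, 0, 1],
        [1, 1, 0, 5],
        [0, 1, 5, 4],
    ], dtype=float)

    # Fill missing entries with each user's mean rating before factorizing.
    filled = ratings.copy()
    for i, row in enumerate(filled):
        rated = row > 0
        filled[i, ~rated] = row[rated].mean()

    # Keep only the top-k singular values: a low-rank "taste" model.
    k = 2
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Predicted rating for user 1, song 2 (which they never rated):
    print(round(approx[1, 2], 2))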

Your use case might be slightly different from this problem, because you
wouldn't have to predict the ratings other users give a song (only your own),
and you want your model to change on the fly given a skip/dislike. A simple
but possibly effective solution might be to search the music dataset
containing the listening history of 1M users to find songs you haven't rated
before and download them to listen to.

[1]
[http://labrosa.ee.columbia.edu/millionsong/](http://labrosa.ee.columbia.edu/millionsong/)
[2]
[https://www.kaggle.com/c/msdchallenge#description](https://www.kaggle.com/c/msdchallenge#description)

~~~
gravypod
The main problem is that I'll just be starting with a folder of music files
from iTunes, for example, and from there I'll build up my like & dislike
profile. So basically I'd like to have two vectors: the first being played
songs (the last 10 or 20), the second being unplayed songs. Once the played
songs are removed from consideration, I want to call _sorted(unplayed,
key=some_score)[0]_ to get the next song to play.

My initial thought was just to keep a full list of songs and their liked,
disliked, and ambivalent states, and make some function to score a song based
on its distance from the played vector and its number of likes.

Doing this all by hand would be a LOT of work. Are there any frameworks that
do this? Is there any way to interface with them that is straightforward? Do I
need to get a PhD before this stuff comes easily to me?

So many questions, so little time.

~~~
ctchocula
I'd contend that you won't be starting with a folder of music files from
iTunes, but rather you'll have the benefit of the big data that others have
gone to the trouble of collating in the form of 1M users' listening histories.
From your initial like/dislike profile combined with the listening trends of
1M others, you can discover more implicit information than you can from just
your initial like/dislike profile.

The purpose of ML is that you can tell a machine to go through more data than
any human can. By expanding the corpus of songs from songs you've heard of to
all songs in the big dataset, you will be able to find songs you haven't heard
based on others who liked the songs you liked and disliked the songs you
disliked.

edit: Maybe I misunderstood the problem you're working on. I thought you
wanted to get recommendations of songs to listen to, but now I see that you
have a lot of songs you like already, so you just want a song recommended from
that list based on some factors like mood, time of day, etc.

~~~
gravypod
The point of the project is to be simple and to collect all the data I'm going
to analyze from my likes and dislikes. I want to do this as simply as possible
so I can just get my feet wet.

This project also has constraints. It needs to fit in a Raspberry Pi Zero and
on a microsd card with no internet connection running off the power from a USB
power supply in a car.

It needs to be fast-ish (I can run all of this math while the ~3 minute song
is playing) and it needs to be small. If I've got 10-20GB of music, then songs
+ OS + my data <= 32GB. These days I think you can get a Raspbian install down
to ~1-2GB, so that leaves ~10GB for data, assuming I never expand my music
collection, because after I get it set up I can't be bothered to update it
ever again.

So yes, for this I just want to start with my music folder. A folder of audio
files. No lyrics, no nothing. Just things that can be inferred from my likes
and dislikes.

------
applecore
Classic papers can be worth reading but it's still useful to know what's
trending.

Even a simple algorithm would be effective: the number of citations for each
paper decayed by the age of the paper in years.

~~~
quinnftw
I think what you are describing here is simply "average number of citations
per year", no?

~~~
rahimnathwani
I think he values recent citations more than older ones.
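
To make the distinction concrete, a minimal sketch of the two readings (toy
numbers, arbitrary half-life):

    # Two readings of "citations decayed by age".
    import math

    def citations_per_year(total_citations, age_years):
        # The "average citations per year" reading: divide the total by the age.
        return total_citations / max(age_years, 1)

    def decayed_score(citations_by_year, half_life=3.0):
        # The other reading: each citation is weighted down by how long ago it
        # happened, so recent citations count more than old ones.
        return sum(count * math.exp(-math.log(2) * years_ago / half_life)
                   for years_ago, count in citations_by_year.items())

    print(citations_per_year(total_citations=900, age_years=6))   # 150.0
    print(decayed_score({0: 300, 1: 250, 2: 200, 5: 150}))        # favors recent years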

------
EternalData
Nice. Super excited to read through and build out a few things myself.

------
husky480
TorchCraft is the best way to learn about machine learning.

If you can sim a set of boxes, you can learn what's inside them.

