The most cited deep learning papers (github.com)
452 points by sdomino on Feb 15, 2017 | 47 comments

I can understand why it probably isn't on the list yet (not as many citations, since it is fairly new) - but NVidia's "End to End Learning for Self-Driving Cars" needs to be mentioned, I think:



I implemented a slight variation on this CNN using Keras and TensorFlow for the third project in term 1 of Udacity's Self-Driving Car Engineer nanodegree course (nothing special in that regard - it was a commonly used implementation, because it works). Give it a shot yourself: take this paper, install TensorFlow, Keras, and Python, download a copy of Udacity's Unity3D car simulator (it was recently released on GitHub) - and see how far you get!

Note: For training purposes, I highly recommend building a training/validation set using a steering wheel controller, and you'll want a labeled set of about 40K samples (though I have heard you can get by with far fewer, even unaugmented - my sample set actually augmented about 8k real samples up to around 40k). You'll also want to use a GPU and/or a generator or some other batch processing for training (otherwise, you'll run out of memory post-haste).
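A generator of the kind mentioned above could be as simple as the following sketch. The (image_path, steering_angle) sample format is an assumption, and the actual image loading/augmentation step (which depends on the simulator's output format) is left as a stub:

```python
import random

def batch_generator(samples, batch_size=32):
    """Yield (inputs, labels) batches forever, reshuffling each epoch.

    `samples` is assumed to be a list of (image_path, steering_angle)
    pairs; loading and augmenting the actual images is left as a stub.
    """
    while True:
        random.shuffle(samples)
        for i in range(0, len(samples), batch_size):
            batch = samples[i:i + batch_size]
            # replace the raw paths with loaded/augmented image arrays here
            yield [p for p, _ in batch], [a for _, a in batch]
```

Because it yields one batch at a time, only `batch_size` images ever need to be in memory, which is what keeps a 40k-sample training run from exhausting RAM.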

Nice. I'm wondering how often NVidia's solution makes a mistake. Also, the paper says:

> More work is needed to improve the robustness of the network, to find methods to verify the robustness, and to improve visualization of the network-internal processing steps.

But it doesn't hint at how this would be approached.

Also, how they arrived at the particular network topology seems sort of a mystery.

That's pretty cool. How long does it take to train something like that? And did it work?

http://people.idsia.ch/~juergen/deep-learning-conspiracy.htm... oh Juergen

> Machine learning is the science of credit assignment. The machine learning community itself profits from proper credit assignment to its members. The inventor of an important method should get credit for inventing it. She may not always be the one who popularizes it. Then the popularizer should get credit for popularizing it (but not for inventing it). Relatively young research areas such as machine learning should adopt the honor code of mature fields such as mathematics: if you have a new theorem, but use a proof technique similar to somebody else's, you must make this very clear. If you "re-invent" something that was already known, and only later become aware of this, you must at least make it clear later.

I mean, there was a nice 15 or so year period between the "new" post-Minsky-and-Papert Perceptron book connectionism and the current "new new" connectionism where neural nets were a definite backwater. Most of the PIs doing neural nets dealt with it by being sad but Schmidhuber seems to have dealt with it by doubling down on the weirdness.

Just because it wasn't wildly popular doesn't mean there was a gap of any sort. I agree with Schmidhuber. http://grey.colorado.edu/emergent

This might be as good a place to ask as any. Does anyone have suggestions on the problem of annotating natural language text to get a ground truth for things that have no readily available ground truth (subjective judgments of content, etc.)? I own the book "Natural Language Annotation", which is good but not exactly what I need. The annotation guidelines, and how the annotation was done in practice, are often only brushed over in research papers. I get it at a high level: basically, have a couple of raters, calculate inter- and intra-rater reliability, and try to optimize that. However, like I said, I'm struggling a bit with the details. What are actually good values to aim for? How many raters do you want? Do you even want experts, or should you crowdsource? What do good annotation guidelines look like, and how do you optimize them? Just to play around with the idea a bit, we did a workshop with four raters and 250 tweets each (raters simply assigned one category per tweet), and that was already quite a bit of work while still feeling like it's on the way-too-small side of things.

I feel like I should find a lot more info on this in the sentiment analysis literature but I don't really.

You might want to browse archives of LREC biannual conference, they have sections focused on resource creation and some of the larger projects should have papers on annotation methodology. LDC (https://www.ldc.upenn.edu/) is probably the largest organization doing a large variety of annotation tasks, maybe they have published how they do things, I'm not sure.

However, often there are no real shortcuts; in many projects the resource annotation takes much more work and more people than everything else together, it's not uncommon to see multiple man-years spent to do that properly.

What you say about the high level is just about all that can be said in general; everything else will depend on your particular problem. After you've fixed the bugs in your process, interannotator agreement is not really a description of your annotators but a measure of how subjective/objective your task is - and you can't really change that without meaningful changes to how exactly you define your task. Some tasks are well suited for crowdsourcing, and some need dedicated experts. Some annotation tasks are straightforward and the annotation guidelines fit on one page; for others the annotation guidelines are a literal book, and one that needs revisions a few years later when you figure out that you need changes. It depends. Shallow sentiment analysis is generally on the trivial side of annotation (but highly subjective), yet you can go far enough down the rabbit hole to drag in all the surrounding issues of intent, degree of belief, degree of certainty, etc. - then you hit the full complexity of deep semantic annotation.
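Interannotator agreement in the two-rater case is usually quantified with something like Cohen's kappa, which corrects raw agreement for the agreement you'd expect by chance; a minimal sketch:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    assert len(a) == len(b) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # chance agreement under each annotator's marginal label distribution
    expected = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)
```

Values near 1 mean near-perfect agreement and 0 means chance-level; widely cited rules of thumb call roughly 0.6-0.8 "substantial", but as argued above, what's actually achievable depends on how subjective the task is.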

Perhaps you just need to find the people who did the latest more interesting datasets in your domain and ask them directly. I don't handle sentiment, but http://alt.qcri.org/semeval2017/task5/ is one group of people that seems to do it seriously.

Thank you, excellent and helpful post. I see I have visited the LDC site before but didn't remember that one. I'll have another look. SemEval looks like it could be exactly what I'd love to poke around and they have datasets available.

Have you heard of Word2Vec?

In a nutshell, it's a deep learning model that, given a word, predicts the other words around it - or alternatively, given some words in a sentence, predicts the missing word. The idea is that similar words end up being assigned similar vectors, all without knowing a ground truth.

Now that won't exactly answer your question, but then you can keep a couple of words related to your sentiment in a list and compare the words in the tweet to that list. If they are similar enough, you can write a rule to mark that tweet as matching your sentiment.
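The "similar enough" check described above is typically cosine similarity between word vectors. A sketch with tiny made-up 3-d vectors standing in for real word2vec embeddings (the words, vectors, and threshold are all illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# toy 3-d "embeddings" standing in for real word2vec vectors
vectors = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.85, 0.2, 0.05],
    "terrible": [-0.8, 0.1, 0.1],
}
seed_words = ["good"]  # the hand-picked sentiment list from the comment

def matches_sentiment(word, threshold=0.8):
    """True if `word` is close enough to any seed word."""
    return any(cosine(vectors[word], vectors[s]) >= threshold
               for s in seed_words)
```

With real embeddings you would load pretrained vectors (e.g. via gensim) instead of the toy dictionary, but the matching rule stays the same.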

Someone needs to make a summary of the top papers and explain them in a way a layman can understand. I would pay $500 for such a book/course explaining the techniques.

I've been reading a number of these papers, but it's really tough to understand the nitty-gritty of them.

Jeremy Howard's course (http://course.fast.ai/) is much as you describe; the downside is that it's a little too boring for people with a mathematical education.

Thanks I'll check it out.

No PDP book? It's old and weird but interesting and has a lot of original ideas, notwithstanding the actual original backprop being from before then. Nor the original backprop stuff?

The PDP book is the main textbook for a course I'm taking at CMU called... PDP. It's digestible, but man is it weird to see things like "this is an ongoing area of future research" where the future = after 1986.

I find it kind of hard to relate to for that reason — how do I know (besides asking my prof., whose PhD advisor was Hinton, or doing my Googling) what ideas ended up "sticking"? What areas of future research went nowhere vs spawned whole new subfields?

Is there a more modern textbook of the sort I could cross-reference?

edit: here's a link to the course website: http://www.cnbc.cmu.edu/~plaut/IntroPDP/

Nobody really cares about this stuff anymore except weirdos like me, so you might actually be best off asking your prof.

There's also a lot of resurrection that goes on with weird ideas: Fastfood cites a paper that cites a paper that cites LeCun's brain damage paper, for example.

For those interested in PDP, one of Jay McClelland's students is Professor Randy O'Reilly at CU. He develops the emergent neural network simulator and is using biologically plausible deep learning to learn how the mind works. He's been doing it all along. There was never a gap!

The simulator: http://grey.colorado.edu/emergent The textbook: https://grey.colorado.edu/CompCogNeuro

The PDP papers are in a bunch of volumes, in addition to the handbook. Chapter 3 of the handbook, I recall, has a little bit from Hinton's original BM work, which is of interest to the cognitive scientist, as well as a pretty abstruse but very good description of the entire process. Very historical, very strange by modern ANN standards.

> Very historical, very strange by modern ANN standards.

That's what I gathered from a brief look at things, which is why I find it very interesting - both from a historical context, as well as the idea that maybe there are some nuggets of (possibly overlooked?) wisdom hidden within, that might be useful in today's world.

Then again, I am but an interested (and very amateur) hobbyist in this whole ML space, and not likely to come up with any breakthroughs - but you never know I suppose!

Well, if there are any overlooked nuggets of wisdom, Hinton would probably still know, and Sejnowski, and Smolensky, and McClelland, and all those other folks still doing DL stuff, as they have been doing continuously since Reagan was president.

Hey, if you can find it, post some links here - even if not mentioned there, it could be interesting to others (hint: I'm interested!)...

Other things off the top of my head:

Smolensky's harmonium (the RBM)

the LeNet papers

Rumelhart's BPTT

Werbos's thesis

Jaeger's ESN nature paper

Forget gate paper, whose author I forget (ironically)

I've always wanted to apply knowledge of deep learning to my day-to-day work. We build our own hardware that runs Linux on an Intel CPU and then launches a virtual machine running our proprietary code. Our code generates a lot of system logs that vary based on the boot sequence, environment temperature, software config, etc. We spend a significant amount of time going over these logs when issues are reported. Some of the time we have a 1-to-1 mapping of issue to logs, but more often, RCA'ing the issue requires knowledge of how the system works and correlating that with the logs generated. We have tons of these logs that can be used as a training set. Any clues on how we can put all this together to make RCA'ing an issue involve as little human effort as possible?

Use dumber ML first; try some random forests. Not because they're that much better or worse, but because DL requires an enormous amount of knowledge and fiddliness, and what you probably want is for the bulk of the actual work to go into setting up the data for ML, not hyperparameter and architecture fiddling.

Before you try any sort of ML, explore your data [1, 2]. Exploratory data analysis may very well tell you that there is absolutely no point in making a fancy predictive model at all. If a few heuristics get you 90% of the way to an optimal solution, then don't even bother to start on machine learning, unless that last 10% is going to provide significant value.

[1] https://en.wikipedia.org/wiki/Exploratory_data_analysis

[2] From one of my mentors: http://www.unofficialgoogledatascience.com/2016/10/practical...

I agree with this - try the "simpler" solutions first to see if they'll model what you need. No sense in getting lost in more complex methods if a simpler solution will suffice.

What you could do is assemble the data in tabular form so that your data is in the shape:

    Issue     System log
    -------- ------------
    issue_1   corresponding system log
    issue_2   corresponding system log
    issue_3   corresponding system log
    issue_4   corresponding system log
    issue_5   corresponding system log
Once you've done that, you can train some sort of classifier on it, e.g. something like [1]. There's a bunch of stuff you want to do to make sure you're not overfitting (I'd scale your data & use 5-fold cross validation), but that would get you started.

[1]: http://scikit-learn.org/stable/tutorial/text_analytics/worki...
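Assuming scikit-learn (which the linked tutorial uses), the whole "table in, classifier out" step might look like this sketch; the log lines and issue labels are made up, and naive Bayes stands in for whatever classifier you end up choosing:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# toy stand-ins for the real (system log, issue) table
logs = [
    "kernel panic during boot sequence",
    "boot ok but thermal throttle at high temperature",
    "kernel panic after config reload",
    "thermal sensor warning temperature exceeded",
]
issues = ["issue_boot", "issue_thermal", "issue_boot", "issue_thermal"]

# bag-of-words TF-IDF features feeding a simple naive Bayes classifier
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(logs, issues)

# classify a previously unseen log line
prediction = clf.predict(["panic seen in kernel boot log"])[0]
```

In practice you'd fit on the training split only and check the cross-validated score before trusting the predictions, as the parent comment notes.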

First - great answer, and thanks for your time and response! Now, for some issues, the RCA depends on the order of the syslogs. For some complex issues, the RCA changes based on what path the code took, changing the order of the syslog entries and hence the RCA. Guess I will have to spend some time incorporating syslog order into the table format you are suggesting.

If it's possible to split the log out into a more granular format, beyond what fnbr has suggested, then it can potentially be used with more complex models; keep the issue as the "label" and the system log (or a hashed representation?) as the input - and if each log entry can be broken up into further data points, those can be useful in other ML methods.

Then again, if the log entries have a fairly fixed length (or can be truncated), you could feed one in as the input to a CNN (one input node/neuron per character), with the output layer consisting of the issue labels. I'm not sure what, if anything, that could net you; perhaps an unknown log could be fed into the trained network, and it could classify it to an existing issue?

If you can upload a sample log, I'd be happy to take a look and try to provide some more specific guidance (email's in profile).

I did some work using stack traces to predict duplicate bug reports, so I'm somewhat familiar with a similar problem.

The most cited deep learning papers: https://scholar.google.com/scholar?q="deep+learning"

Has anyone downloaded them into their own separate folders and zipped the whole thing up?

This is a really lucky find for me. I was just about to do something to try and get into machine learning. Right now I need some help getting started with writing some machine learning code. I don't know where to start. I've come up with a very simple project that I think this would work very well for.

I want to buy a Raspberry Pi Zero, put it in a nice case, add two push buttons, and turn it into a car music player (hooked into the USB charger and 3.5mm jack in my car). The two buttons will be "like" and "skip & dislike". I'll fill it with my music collection and write a Python script that just finds a song, plays it, and waits for button clicks.

I want the "like" button to be positive reinforcement and the "skip & dislike" to be negative reinforcement.

Could someone point me in the right direction?

Your use case reminded me of the Netflix problem: given x movies that a user has liked, try to recommend movies to them based on a large dataset of thousands of users and their movie ratings. For music, there is a similar dataset [1] and problem on Kaggle [2].

The way such a system is evaluated is by building a model that predicts what rating the user will give a song the user has not rated yet. The difference between predicted and actual ratings is then computed as the testing error of the model. Some basic techniques for building a model are regression and matrix factorization using SVD (singular value decomposition).
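The SVD approach mentioned above can be sketched in a few lines of NumPy. The rating matrix is made up, and treating unrated entries as zeros is a simplification that real recommenders avoid (e.g. by fitting only the observed entries):

```python
import numpy as np

# toy user x song rating matrix; 0.0 marks "not yet rated"
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

# rank-2 truncated SVD as a crude low-rank model of taste
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# R_hat[0, 2] is now a predicted rating for a song user 0 never rated
```

The low-rank reconstruction fills in the unrated cells from the taste patterns shared across users, which is exactly the prediction that gets compared against held-out ratings to compute the testing error.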

Your use case might be slightly different from this problem, because you wouldn't have to predict the ratings other users give a song (only your own), and you want your model to change on the fly given a skip and dislike. A simple but possibly effective solution might be to search the music dataset containing the listening history of 1M users for songs you haven't rated before, and download them to listen to.

[1] http://labrosa.ee.columbia.edu/millionsong/ [2] https://www.kaggle.com/c/msdchallenge#description

The main problem is that I'll just be starting with a folder of music files from iTunes, for example, and from there I'll build up my like & dislike profile. So basically I'd like to have two vectors: the first being played songs (last 10 or 20), the second being unplayed songs. When the played songs are subtracted, I want something like sorted(unplayed, key=some_score)[0] to get the next song to play.

My initial thought was just to keep a full list of songs and their liked, disliked, and ambivalent states, and make some function to score a song based on its distance from, and # of likes in, the played vector.

Doing this all by hand would be a LOT of work. Are there any frameworks that do this? Is there any straightforward way to interface with them? Do I need to get a PhD before this stuff will come easily to me?

So many questions, so little time.

I'd contend that you won't be starting with a folder of music files from iTunes, but rather you'll have the benefit of the big data that others have gone to the trouble of collating in the form of 1M users' listening histories. From your initial like/dislike profile combined with the listening trends of 1M others, you can discover more implicit information than you can from just your initial like/dislike profile.

The purpose of ML is that you can tell a machine to go through more data than any human can. By expanding the corpus of songs from songs you've heard of to all songs in the big dataset, you will be able to find songs you haven't heard based on others who liked the songs you liked and disliked the songs you disliked.

edit: Maybe I misunderstood the problem you're working on. I thought you wanted to get recommendations of songs to listen to, but now I see that you have a lot of songs you like already, so you just want a song recommended from that list based on some factors like mood, time of day, etc.

The point of the project is to be simple and to collect all the data that I'm going to analyze from my likes and dislikes. I want to do this as simply as possible so I can just get my feet wet.

This project also has constraints. It needs to fit in a Raspberry Pi Zero and on a microsd card with no internet connection running off the power from a USB power supply in a car.

It needs to be fast-ish (I can run all of this math while the ~3 minute song is playing) and it needs to be small. If I've got 10-20GB of music, then songs + OS + my data <= 32GB. These days I think you can get a Raspbian install down to ~1-2GB, so that leaves ~10GB for data if I never expand my music collection - because after I get it set up, I can't be bothered to update it ever again.

So yes, for this I just want to start with my music folder - a folder of audio files. No lyrics, no nothing. Just things that can be inferred from my likes and dislikes.

You should just build it and do the recommendations with a heuristic function first. Then you can substitute the function with an ML classifier once you have enough data to train on (and time to learn about ML). Don't wait on ML coding tips for this project.
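Such a heuristic could start out as simple as the following sketch; the song fields, weights, and signals (artist likes/dislikes, recently played) are all illustrative placeholders for whatever the player actually tracks:

```python
def score(song, artist_likes, artist_dislikes, recently_played):
    """Toy heuristic: favor liked artists, penalize disliked artists
    and anything played recently. All fields/weights are illustrative."""
    s = 2.0 * artist_likes.get(song["artist"], 0)
    s -= 3.0 * artist_dislikes.get(song["artist"], 0)
    if song["title"] in recently_played:
        s -= 5.0
    return s

def next_song(unplayed, artist_likes, artist_dislikes, recently_played):
    """Pick the highest-scoring unplayed song."""
    return max(unplayed,
               key=lambda s: score(s, artist_likes, artist_dislikes,
                                   recently_played))
```

Every like/dislike button press just updates the counters, so the player improves immediately while quietly accumulating the labeled data a later ML classifier would need.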

I'm not sure your problem is well defined enough. Do you want the ML to be able to select songs similar to those you've liked? Or would simply keeping track of the number of likes on each song suffice, such that songs with more likes have a higher probability of being played?

Preferably (ordered by perceived difficulty):

    - Prefer songs from the most liked genre/artist/album
    - Favor liked songs/genres/artists/albums & avoid consistently disliked ones.
    - If a specific song/genre/artist/album is liked more at a specific time, play it more often at that time.
    - If you are playing a song and there is a song that is always liked when it follows this song, favor that song when choosing the next song.
Possibly (if I decide to devote my life to making a music recommending engine):

    - Correlate GPS location with song choices
    - Correlate weather/humidity/temperature/month with song choices
    - Do NLP sentiment analysis to score all of the songs on an emotional-tone scale to help group them well.
    - Do some DSP on all of the audio to generate an intro and outro beat/rhythm profile and attempt to best match songs that will lead to a good entrance/exit.
edit: fixed formatting.

Classic papers can be worth reading but it's still useful to know what's trending.

Even a simple algorithm would be effective: the number of citations for each paper decayed by the age of the paper in years.

I think what you are describing here is simply "average number of citations per year", no?

I think he values recent citations more than older ones.
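One way to weight recent citations more than a flat per-year average, as a sketch (the decay factor is an arbitrary illustrative choice):

```python
def decayed_citations(citations_per_year, decay=0.7):
    """Score a paper from its per-year citation counts, most recent
    year first, discounting each older year by a factor of `decay`."""
    return sum(c * decay ** age for age, c in enumerate(citations_per_year))
```

Unlike the average, this ranks a paper cited 10 times last year above one cited 10 times several years ago, which captures "trending" rather than lifetime impact.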

Nice. Super excited to read through and build out a few things myself.

torchcraft is the best way to learn about machine learning.

If you can sim a set of boxes, you can learn what's inside them.
