My Favorite Deep Learning Papers of 2017
172 points by gjstein on Dec 28, 2017 | 13 comments

 Super stuff. Can anyone here who reads these research papers tell me their process for understanding them? Does anyone try to recreate the models using the math from the papers? I'm interested in this stuff at a hobby level, but find my mind glazes over when trying to process what is being talked about. It would help if these papers were more interactive, or perhaps I need to sit down and try to build these models in TensorFlow.
 Agh, I typed up a response but it got deleted somehow. But basically, I view there as being 3 layers of understanding. For whatever level you're stuck at, there are different ways to get "unstuck".

1. Understand the task they're solving, how they did it, and their results. Anybody with a basic understanding of the domain should be able to do this by reading the abstract/intro and conclusion. If you find yourself having trouble here, you probably just need more background in the field.

2. Gain some intuition for why their method works. This is probably one of the hardest parts to figure out how to do, and the part most people stumble on. Really, this is basically the entirety of what you're trying to do when you learn math. There are also varying levels of intuition: "I get why this might work", "I get why this works", and "I get why it's impossible that this doesn't work", in order of difficulty. The more background you have, the easier this intuition is to grasp. Alternatively, you can bootstrap your intuition by reading other people's blog posts, talking to somebody who understands the paper, asking the authors, playing around with your own implementations, etc. I'm not sure anybody has good answers for this stage. Personally, I really like blog posts from bloggers I know are good, but unluckily, many papers do not have blog posts attached :(

3. Finally, there's the strict mathematical rigor part. These levels aren't really strict; oftentimes, I'll treat math I'm not familiar with as a black-box theorem.
If you don't have the math background for these proofs, there's usually not much better recourse than learning the subject properly. Luckily, many ML papers barely have any mathematical proofs :)

Alex Irpan has a great explanation of the Wasserstein GAN paper here: https://www.alexirpan.com/2017/02/22/wasserstein-gan.html

If you're looking for interactive blog posts, distill.pub probably has the best: https://distill.pub/2017/research-debt/

And one final note: many papers (especially papers that aren't math papers) are often surprisingly simple to get to step 2; the simplicity is just hidden behind a lot of cruft. That said, it's wise to be careful about so-called "intuitive explanations" of a concept. If somebody gives an "intuitive" explanation for why X is true, but that explanation doesn't also explain why !X is false, it's not very useful.
 Thanks for the links, I hadn't even heard of Distill, but the papers on that site are a lot more approachable for sure. Thanks for the other insights too; I will keep them in mind for the next paper I read.
 Are GANs pretty hot these days, or is that just a coincidence of the author's preferences? I just received Ian Goodfellow's Deep Learning textbook [1], and I know he pretty much invented the technique, so I'm wondering how influential/important GANs are in the field.
 GANs are for graphics, not machine learning. They have no test scores because they cannot be run on a test set, so it's hard to tell how much they overfit.

But they have uses, like lossy compression of images or texture generation: stuff which focuses on the graphical side of things, where they outperform machine learning methods with their crisper samples.

For an overview of this argument: https://arxiv.org/abs/1511.01844

The idea of adversarial training is important and relevant in ML, though! It allows for setting up losses which are hard to formulate otherwise.
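To make "a loss that's hard to formulate otherwise" concrete, here's a toy sketch of the standard GAN objectives from the original GAN paper, written over raw discriminator logits. The function names and list-of-floats inputs are my own illustration, not anything from this thread:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminator_loss(d_real_logits, d_fake_logits):
    # D wants real samples scored high and generated samples scored low:
    # minimize -log D(x) - log(1 - D(G(z))).
    real_term = [-math.log(sigmoid(x)) for x in d_real_logits]
    fake_term = [-math.log(1.0 - sigmoid(x)) for x in d_fake_logits]
    return (sum(real_term) + sum(fake_term)) / len(real_term)

def generator_loss(d_fake_logits):
    # Non-saturating generator loss: instead of minimizing
    # log(1 - D(G(z))), maximize log D(G(z)), i.e. minimize -log D(G(z)).
    return sum(-math.log(sigmoid(x)) for x in d_fake_logits) / len(d_fake_logits)
```

The point is that the generator's loss is defined *through* another learned network rather than through a hand-written formula: the generator improves exactly when the discriminator starts rating its samples as real.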
 Huh? GANs are for graphics, not machine learning? First of all, how are GANs not machine learning? They are definitely "machine learning", and I don't think anyone would disagree.

Second, GANs are most definitely not just for graphics. They've been applied to text generation, to generating adversarial examples, to data preprocessing, etc.

Third, I have no idea what you even mean by "test set" in the context of GANs. It is true that their performance is hard to measure, but that's a separate issue. It's hard to evaluate performance because we're usually judging the quality of the generated images, and we don't have any good ways of evaluating "perceptual loss", i.e. how real an image looks.

As for the OP, GANs have been a very hot topic. Not as hot as this blog post makes them look, perhaps (with nearly every paper about them...), but I wouldn't really disagree with any of the papers posted. The only one I'm not familiar with is the "most useful" one, but the rest were all pretty great papers imo. As for Ian Goodfellow, he's a very smart guy who seems to do a pretty good job explaining things. I saw a couple of YouTube videos of him covering his DL book at a meetup, and he did a great job teaching.
 Although I would agree that GANs are part of machine learning, some people definitely do disagree, and their concerns are valid. It's definitely an area of open research.

Your third point is actually the point of those who disagree. It's the same reason why we have the principle of falsifiability in science.
 I'm a bit confused about the point you're stating. I've never seen anybody not group GANs under machine learning.

Machine learning is typically split into supervised learning, unsupervised learning, and reinforcement learning, and GANs are usually considered part of unsupervised learning. I guess the part I don't understand is what you mean by "their concerns are valid". What are their concerns about? Whether GANs are a promising path of research? And if GANs aren't part of machine learning, what are they?
 The difficulty of evaluating GANs is nowhere near the level of unfalsifiability, and it is not caused by GANs themselves being a bad technique, but by the problem space they are applied in.When you are trying to generate "realistic" samples of human concepts, the ultimate measure of evaluation is whether humans think that the output is realistic. So you have no choice but to ask humans to judge the quality of your results. That's a standard thing to do e.g. in text-to-speech generation, whether GANs are used or not.
 I don't think GANs are being used for anything very serious at the moment (I would love to be corrected on this), but they're very exciting in terms of their capabilities and promise. Some good progress has been made over the last year in making them easier to train and in getting them to produce more varied outputs.
 WaveNet2 is using a variant of GANs to produce the highest-fidelity text-to-speech of any computer system out there, in real time. Last year's WaveNet took 50 seconds (I think) to generate 1 second of audio. By using a variant of a GAN to re-engineer WaveNet, they can now produce real-time, high-quality text-to-speech: https://deepmind.com/documents/131/Distilling_WaveNet.pdf
 WaveNet2 is not a GAN, and describing it as a "variant of GANs" is like calling Python a variant of Java: it's highly misleading (WaveNet2 and GANs are both generative models, Python and Java are both programming languages).Also WaveNet2 makes no improvements in the actual quality of the model, only the run-time performance.
 Yep, from the paper:

> It is worth noting the parallels to Generative Adversarial Networks (GANs [7]), with the student playing the role of generator, and the teacher playing the role of discriminator. As opposed to GANs, however, the student is not attempting to fool the teacher in an adversarial manner; rather it cooperates by attempting to match the teacher's probabilities. Furthermore the teacher is held constant, rather than being trained in tandem with the student, and both models yield tractable normalised distributions.

WaveNet2's main resemblance to a GAN is that it uses another neural network for the loss function.
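The cooperative "match the teacher's probabilities" objective the quote describes is essentially a KL divergence between the frozen teacher's output distribution and the student's. Here's a toy sketch of that idea over plain logit lists; the function names are mine, and this is a generic distillation loss, not the actual Parallel WaveNet training code:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits):
    # KL(teacher || student): the student cooperates by matching the
    # frozen teacher's distribution -- there's no adversarial game, and
    # the teacher is never updated.
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return sum(pt * (math.log(pt) - math.log(ps))
               for pt, ps in zip(p_teacher, p_student))
```

Unlike a GAN's discriminator loss, this is zero exactly when the student reproduces the teacher's probabilities, and it only ever pushes the student toward the teacher, never the reverse.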
