Hacker News
Generative Models (openai.com)
353 points by nicolapcweek94 on June 16, 2016 | 55 comments

This is so cool and I can't help but feel like I'm missing something important that's taking place and has huge potential.

As a busy programmer who gets exhausted at night from the mental effort required at my day job, I have a feeling like I will never be able to catch up at this rate.

Are there any introductory materials for this field? Something I can read slowly during the weekends, that gives an overview of the fundamental concepts (primarily) and basic techniques (secondarily) without overwhelming the reader with the more advanced/complicated techniques (at least in the beginning).

I'd really appreciate any recommendations.

For reinforcement learning, one of OpenAI's focus areas, the book by Sutton & Barto is still the standard reference: https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html

Improved algorithms have been devised since it was written, see


and, in particular,


I personally found Andrew Ng's videos on Reinforcement Learning from cs229@stanford + the inverted pole balancing programming assignment to be great intros to the topic.

Also see Algorithms for Reinforcement Learning: http://www.ualberta.ca/~szepesva/RLBook.html

David Silver's Reinforcement Learning Course teaches from Sutton & Barto's book.


Assuming you know basic algebra and calculus already, learn introductory statistics and linear algebra first. Then pick up a book like http://www-bcf.usc.edu/~gareth/ISL/ (or companion lectures http://www.r-bloggers.com/in-depth-introduction-to-machine-l...).

If you're a beginner, don't start with deep nets. Start with basic data analysis.

Don't worry. The models are very simple.

Fundamentally, these models are just trees of multiplications that are computed over and over.

You can construct some here: http://playground.tensorflow.org
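To make the "multiplications computed over and over" point concrete, here is a minimal sketch of a forward pass in plain Python. The weights, the layer sizes and the choice of tanh are all made up for illustration; this is not how the playground or any real framework implements it.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per output unit, then tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Feed x through each (weights, biases) pair in turn."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x


# Hypothetical hand-picked weights: 2 inputs -> 2 hidden units -> 1 output.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]

output = forward([1.0, 2.0], LAYERS)
print(output)  # a single number between -1 and 1
```

Training is the part that adjusts those weights; the sketch only shows the repeated multiply-and-add structure.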

I'm in the same situation, but I'd say: don't worry too much.

It feels like many balls are still up in the air regarding deep learning, and it's likely that the dust will settle at some point. The tried and true will remain and its essence will emerge, while the rest will sink to the bottom.

I decided not to worry but stay well read.

Crossing my fingers for a library or API to do the grunt work for me.

We will have lots of pre-made AI blocks that do all sorts of functions. Actually using them will be easy. We don't need to understand every nuance of probability theory to call a library and have it do its work.

That reminds me of something I was doing recently, really rough corpus analysis, trying to see how much text coverage the words in the NGSL give you. Honestly thought it'd take me a couple of weeks to do.

Got into NLTK, used the built in sentence tokenizer, word tokenizer, then wordnet POS tagging to remove proper nouns, added some more cleanup code, and I had something passable within two days.

Now at this point I couldn't write a POS tagger to save my life, but it was cool seeing code you wrote over two evenings run over 30k books just like that (which still took a week, but ah well).
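For anyone curious what the coverage calculation itself boils down to, here is a rough sketch. It is not the code described above: a regex stands in for NLTK's sentence and word tokenizers, the proper-noun filtering step is skipped entirely, and the five-word list is a hypothetical placeholder for the NGSL.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real word tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def coverage(text, word_list):
    """Fraction of running tokens in `text` that appear in `word_list`."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for word, n in counts.items() if word in word_list)
    return covered / total


# Hypothetical five-word list standing in for the NGSL.
TOY_LIST = {"the", "cat", "sat", "on", "mat"}
sample = "The cat sat on the mat. The zebra sat."
ratio = coverage(sample, TOY_LIST)
print(f"{ratio:.1%} of tokens are covered")
```

Counting tokens rather than distinct words is an assumption about what "text coverage" means here; coverage over distinct word types would give a different number.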

I had the opportunity to study Coursera's ML course a couple of years back when I was in college, and I developed a deep passion for the area. I had been out of touch with ML for 1.5 years, and coming back to it now seems overwhelming. I mean, there is so much more to learn. The gap between classic ML and deep learning is noticeably huge, due to the rapid development in recent years. You won't get things like gradient clipping, learning rate decay, dropout, etc. in the Coursera course. Moreover, new papers are released every other day, and one needs to devote time to stay updated.

And when I think about people who are not familiar with even machine learning, they really need to buckle up and spend serious time to catch up with the technology that's making history today.

But now is really a good time to start. There are only a bunch of people in the whole wide world who are masters of DL and anyone with skills in it is in high demand. And it's not just about a job, "it is really cool" to play with it. I really feel I'm doing something heavy.

this would be worth your while https://www.youtube.com/watch?v=KeJINHjyzOU

I think the best option for starting out is to watch Andrew Ng's original ML course, the one he made before creating Coursera. It's just perfect - the right level of difficulty for beginners, full of insights and practical.

Brief summary: a nice intro about what generative models are and the current popular approaches/papers, followed by descriptions of recent work by OpenAI in the space. Quick links to papers mentioned:

Improving GANs https://arxiv.org/abs/1606.03498

Improving VAEs http://arxiv.org/abs/1606.04934

InfoGAN https://arxiv.org/abs/1606.03657

Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks http://arxiv.org/abs/1605.09674

Generative Adversarial Imitation Learning http://arxiv.org/abs/1606.03476

I think the last one seems very exciting, I expect Imitation Learning would be a great approach for many robotics tasks.

Very cool. As you're thinking about unsupervised or semi-supervised deep learning, consider medical data sets as a potential domain.

ImageNet has 1,034,908 labeled images. In a hospital setting, you'd be lucky to get 1000 participants.

That means those datasets really show off the power of unsupervised, semi-supervised, or one-shot learning algorithms. And if you set up the problem well, each increment of ROC translates into a life saved.

Happy to point you in the right direction when the time comes—my email is in my HN profile.

Most top hospitals in the USA have high-quality data on millions of patients, but the legal and bureaucratic challenges to sharing those datasets are insurmountable. However, if you are affiliated with a university hospital, it's not difficult to get 690,000 CT scans or time series data with 400+ signals from 450,000 operations.

Even outcomes data, such as procedures performed and diagnoses across multiple visits, can be easily obtained for millions of patients on a national scale. My research involves applying deep learning to these datasets.

Isn't the labeling really tricky, though?

In my limited experience, EHRs aren't usually set up to handle structured labeling of something like an image. There are lots of different fields for text entry that can be unstructured. Then the only label left is the billing code, which ends up being a poor choice of label since the hospital often bills for what it can get reimbursed for, not what you actually had.

You don't need labels for the image if you can get them from other patient information, in particular, the diagnosis.

E.g. you know from image metadata that it's a chest x-ray of patient #1234 taken on 2012/03/04. Then you automatically check the patient's EHR near that date (do they have lung cancer Y/N; do they have broken ribs Y/N; do they have TB Y/N, etc.) and make your image labels based on that. How diagnoses are codified, though, differs significantly between medical systems; I have no idea how it's done in US EHRs.
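A rough sketch of that labeling step, assuming the image metadata and the diagnosis records are already available as plain dictionaries. The field names, the condition codes and the 30-day window are all hypothetical; real EHR exports will look different.

```python
from datetime import date, timedelta


def label_images(images, diagnoses, conditions, window_days=30):
    """For each image, mark a condition True if the same patient has a
    diagnosis record for it within `window_days` of the image date."""
    window = timedelta(days=window_days)
    labeled = []
    for image in images:
        found = {
            d["condition"]
            for d in diagnoses
            if d["patient_id"] == image["patient_id"]
            and abs(d["date"] - image["date"]) <= window
        }
        labels = {c: c in found for c in conditions}
        labeled.append({**image, "labels": labels})
    return labeled


# Made-up records for illustration only.
images = [{"image_id": "xr-1", "patient_id": 1234, "date": date(2012, 3, 4)}]
diagnoses = [
    {"patient_id": 1234, "date": date(2012, 3, 10), "condition": "tb"},
    {"patient_id": 1234, "date": date(2010, 1, 1), "condition": "broken_ribs"},
    {"patient_id": 9999, "date": date(2012, 3, 5), "condition": "lung_cancer"},
]
CONDITIONS = ["lung_cancer", "broken_ribs", "tb"]

result = label_images(images, diagnoses, CONDITIONS)
print(result[0]["labels"])
```

Note that a missing diagnosis becomes a "no" label here, which is exactly where the billing-code concern raised above would bite.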

Very cool. This is an academic project? Can you talk at all about the tools you're using?

Yes, it's an academic project. You can find more info at: http://www.computationalhealthcare.com

We are using data provided by AHRQ HCUP and some internal datasets. TensorFlow for ML.

Have these techniques been used to generate realistic looking test data for testing software? I have had ideas along these lines but people think I'm talking about fuzz testing when I try and describe it.

I'm imagining something where you take a corporate db and reduce it down to a model. Then that can be shared with third parties and used to generate unlimited amounts of test data that looks like real data w/o revealing any actual user info.

That depends on the nature of the data, I think. If the data has a lot of sequential, sparse, hierarchical statistical dependencies (like source code, text or data streams), they might be better modeled by an LSTM. If you have high-dimensional dependencies (like images, where each pixel tends to depend spatially on many other pixels), then an autoencoder or some undirected model might be the right choice.
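As a toy version of the "reduce the db to a model, then sample from it" idea, here is a sketch that learns per-column value frequencies and samples new rows from them. This is a deliberate simplification, not an LSTM or autoencoder: it treats columns as independent, so correlations between columns are lost, and rare values can still leak, so it is not a privacy guarantee. The table contents and function names are invented for illustration.

```python
import random
from collections import Counter


def fit_marginals(rows):
    """Learn, for each column, how often each value occurs."""
    columns = rows[0].keys()
    return {col: Counter(row[col] for row in rows) for col in columns}


def sample_rows(model, n, seed=0):
    """Draw n synthetic rows, sampling every column independently."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        row = {}
        for col, counts in model.items():
            values = list(counts.keys())
            weights = list(counts.values())
            row[col] = rng.choices(values, weights=weights, k=1)[0]
        out.append(row)
    return out


# Made-up "corporate" table.
ROWS = [
    {"plan": "free", "country": "US"},
    {"plan": "pro", "country": "DE"},
    {"plan": "free", "country": "US"},
]

model = fit_marginals(ROWS)
fake = sample_rows(model, 5)
print(fake)
```

Only `model` would need to be shared with a third party, not `ROWS`.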

Looks like fake accounts on Facebook will have real unique userpics soon

And games will have more variety in NPC faces!

hahaha! Excellent remark, I didn't think of this one.

I really like that they used TensorFlow and published their code on GitHub. It will help a lot of people like me who are new to the field and want to learn more about generative models. Amazing work by the OpenAI team!

In theory everything OpenAI does will be available on GitHub or in some comparable form: that's the point of the organization. That's why it's called Open AI. So that we can all share the benefits, instead of just Google having it for themselves. Because we all know that's who's hoarding the AI progress.

It looks like they are using both TensorFlow and Theano. Is there a reason to use both?

The VAE code and the semi-supervised part of the GAN code build on code that was developed about half a year ago, when TensorFlow was less developed and was lacking in speed and functionality compared to Theano. It has since caught up and most new projects at OpenAI are now done using TensorFlow, which is what we used for the newer additions.

Could you mention a bit about why you're using TensorFlow?

I'm glad you are since I'm using it myself, but I haven't used any other frameworks so I'm wondering if I should expect more people to head in this direction, or spend time learning others.

There are currently many excellent frameworks to choose from: TensorFlow, Theano, Torch, MXNet are all great. The comparative advantage of TensorFlow is mostly its support in the community (e.g. most stars on GitHub, most new projects being developed, etc).

The community around TensorFlow is great (lots of people that try to recreate results from new papers using TF), but if you're worried about putting all your eggs in one basket (or want to work at a higher level) you should check out Keras if you haven't yet. It lets you write generic nets that can run on Theano or TF.

The actual outputs look grotesque. Disembodied dog torsos with seven eyeballs and such. It's cool, but to me this is clearly showing the local nature of convolutional nets; it's a limitation that one has to overcome if one is to truly generate lifelike images from scratch.

Those weren't the best images. Current best results don't have disembodied dog torsos. I remember a paper that was about generating plausible bedroom images. Not only did they look real, but they could interpolate between two bedrooms generating a transformation sequence.

Yup, that was the original DCGAN paper:


Check out these generated images: http://arxiv.org/pdf/1605.09304v1.pdf

However, the technique does not seem to have a generative interpretation.

The generated images look like the stuff nightmares are made out of. Which is to say they're extremely aesthetically unpleasant. So what exactly have these networks learned?

They've learned an approximation of what stuff looks like projected into 2D.

My guess is that your brain is creeped out by an uncanny-valley-like effect. The images are plausible in their structure so part of your visual system is happy, but the causality is not there, so your brain is thrashing around looking for meaning that is missing.

Can we see the generated images at a higher resolution somewhere?

No, that's how they come out of the model.

Using larger images means your code runs much (exponentially) slower, and gives you only slightly (asymptotically) better results so people usually use tiny images. All their outputs are 32*32.

Why do I constantly feel like I'm missing out with all this stuff?

What beautifully presented research.

Interesting topic, tedious article. Paraphrasing:

Q: What's a generative model?

A: Well, we have these neural nets and...

Ugh. I understand the excitement for one's own research but if the point is to make these results accessible to a wider audience then it's important not to get lost in the details, at least not right away. IMO, there's very little here in the way of high-level intuition. If I did not already have a PhD, and some exposure to ML (not my area), I would probably find this article entirely indecipherable. Again, paraphrasing:

Q: OK, so I understand you want to create pictures that resemble real photos. And you really like this DCGAN method, right?

A: Yes! See, it takes 100 random numbers and...

Come on guys. You can do better.

>if the point is to make these results accessible to a wider audience

It is not. While it's a big, growing field, it's really a narrow audience that can be expected to understand this, far from everyone in the field. How intuitive the writing appears is subjective. I'm sure I don't understand a word of it, not just for lack of intuition.

FWIW, I found this comment pretty indecipherable. I have no idea how your examples illustrate your point.

Maybe you can do better as well? Which is to say, effectively communicating something technical to a diverse audience is difficult, let's not be unnecessarily derisive.

>Which is to say, effectively communicating something technical to a diverse audience is difficult, let's not be unnecessarily derisive.

There's nothing especially derisive in my assessment. I don't think the content is bad, just boring. I also think it's too technical for a non-specialist audience.

> Maybe you can do better as well?

My first criticism is that generative models are not something specific to neural nets but that's not obvious from the article.

My second criticism is that their explanations are overly mechanical. In the case of DCGAN the article begins by talking about parameters and magic numbers; i.e. they explain how the thing works rather than what it does, at an intuitive level.

Clear enough?

Listen to gradstudent, he knows what he's talking about. This article presents generative models as a grab bag of NN algorithms that came out this quarter.

Most papers are migrating from plain classification to variational methods. This is the new trend. That means, instead of just predicting labels, they now also predict a probability for each label, a degree of confidence. And they work both ways: they can generate as well as classify. And they are unsupervised, which means they can benefit from tons of data lying around.

Your definition of "variational methods" is actually a definition of generative models: https://en.wikipedia.org/wiki/Generative_model. Variational methods are a much more specific concept, and your answer speaks more to the general, high-level concept of generative models.

Notice that the Wikipedia page for generative models covers a lot more than variational methods.
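To illustrate that "generative" is a broader idea than neural nets or variational methods, here is a sketch of about the simplest generative model I can think of: one Gaussian per class over a single number. Because it models the joint distribution of label and value, the same fitted model can both classify and generate. The data and function names are made up, and it needs labels, so it is not an example of the unsupervised case.

```python
import math
import random
from statistics import mean, pstdev


def fit(data):
    """data: {label: [values]} -> {label: (prior, mu, sigma)}."""
    total = sum(len(values) for values in data.values())
    return {
        label: (len(values) / total, mean(values), pstdev(values))
        for label, values in data.items()
    }


def log_joint(model, label, x):
    """log p(label) + log p(x | label) under a Gaussian per class."""
    prior, mu, sigma = model[label]
    return (
        math.log(prior)
        - math.log(sigma)
        - 0.5 * math.log(2 * math.pi)
        - 0.5 * ((x - mu) / sigma) ** 2
    )


def posterior(model, x):
    """Classify: p(label | x) via Bayes' rule, normalised stably."""
    logs = {label: log_joint(model, label, x) for label in model}
    top = max(logs.values())
    weights = {label: math.exp(v - top) for label, v in logs.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


def sample(model, rng):
    """Generate: draw a label from the prior, then a value from its Gaussian."""
    labels = list(model)
    priors = [model[label][0] for label in labels]
    label = rng.choices(labels, weights=priors, k=1)[0]
    _, mu, sigma = model[label]
    return label, rng.gauss(mu, sigma)


# Made-up one-dimensional training data.
DATA = {"small": [1.0, 2.0, 3.0], "large": [9.0, 10.0, 11.0]}

model = fit(DATA)
probs = posterior(model, 2.5)                    # classify a new value
label, value = sample(model, random.Random(0))   # generate a new (label, value)
print(probs, label, value)
```

A purely discriminative classifier would only give you `posterior`; having `sample` as well is what makes the model generative.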

>they explain how the thing works rather than what it does

In some kinds of logic this is impossible, but isn't that how reports are usually written? The what is the conclusion of the how. Many papers omit the mechanics completely and get criticized for it, too.

> I don't think the content is bad, just boring

It's talking about machines gaining the power of imagination. How is that boring?

> It's talking about machines gaining the power of imagination. How is that boring?

It does nothing of the sort! If anyone comes away with this conclusion I would say the article has failed entirely. Which, btw, is my whole point: there's no over-arching intuitive explanation of what generative models are, why they're interesting or even what concrete problem they're solving.

I have been reading up on ML papers for months, and found the article pretty basic. It just gave a nice overview of the state of the art. If anything, it didn't go deep enough to get to the real interesting bits.

From their perspective, it's hard to put such information in an accessible format. Try explaining redux for example, to a person who has no idea what functional programming is. How would you do it?
