
Learning to reinforcement learn - vivekchandsrc
https://arxiv.org/abs/1611.05763
======
samirm
How is this not at the top of the board? I can't wait until more people start
applying these ideas to figure out optimal parameters in all these data
mining and machine learning algorithms. Maybe actually put some science and
reasoning into the field.

------
sdx23
"Importantly, because it is learned, it is configured to exploit structure in
the training domain."

Interesting concept. An algorithm that learns specifically how to better learn
on data presented.
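To make the "learning to learn" idea concrete: a minimal sketch, not the paper's method, is a two-level loop in which an outer process tunes how an inner learner adapts, selected for performance across a whole distribution of tasks rather than on a single one. Here the inner learner is a simple bandit agent and the outer loop meta-learns its learning rate (all names and the candidate-search scheme are illustrative assumptions; the actual papers train an RNN end to end):

```python
import random

def run_bandit(probs, lr, steps=100, seed=0):
    # Inner loop: epsilon-greedy agent on a 2-armed Bernoulli bandit,
    # updating value estimates with the given learning rate.
    rng = random.Random(seed)
    q = [0.0, 0.0]
    total = 0.0
    for _ in range(steps):
        if rng.random() < 0.1:                 # explore
            a = rng.randrange(2)
        else:                                  # exploit
            a = 0 if q[0] >= q[1] else 1
        r = 1.0 if rng.random() < probs[a] else 0.0
        q[a] += lr * (r - q[a])                # incremental value update
        total += r
    return total

def meta_train(candidate_lrs, n_tasks=50):
    # Outer loop: pick the inner-loop hyperparameter that maximizes
    # total reward across a *distribution* of tasks -- the meta-learner
    # exploits structure shared by the training domain.
    rng = random.Random(42)
    tasks = [(rng.random(), rng.random()) for _ in range(n_tasks)]
    best_lr, best_score = None, float("-inf")
    for lr in candidate_lrs:
        score = sum(run_bandit(t, lr, seed=i) for i, t in enumerate(tasks))
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr

best = meta_train([0.01, 0.1, 0.5])
```

The papers replace this crude hyperparameter search with a recurrent policy that receives past actions and rewards as input, so the adaptation rule itself is learned rather than hand-picked.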

~~~
mappingbabeljc
Even more interesting - an OpenAI paper on roughly the same subject/technique
came out a little earlier (RL^2: Fast Reinforcement Learning via Slow
Reinforcement Learning,
[https://arxiv.org/pdf/1611.02779.pdf](https://arxiv.org/pdf/1611.02779.pdf)).
I think parallel inventions tend to indicate that multiple people have
stumbled on a similar good idea at the same time. (disclaimer: I work at
OpenAI. Very pleased to see these two papers emerge so close together.)

~~~
gwern
Or there's just so much work being done that people are stepping on each
other's toes. There were like 5-10 papers at ICLR this year on just the same
'let's treat NN hyperparameters as an MDP' idea. Or look at the concrete
distribution - two different Google groups published papers on almost the same
exact idea simultaneously apparently in total ignorance of each other. Or the
lipreading NNs the past week - two _Oxford_ groups working on different
datasets with different architectures publishing >human results. The pace is
so fast the left hand doesn't know what the right hand is doing. It's fun for
the spectator (you have no idea what will come out tomorrow, much less next
month or next year) but it strikes me as rather inefficient.

~~~
taeric
Why do you call it inefficient? If one side was suppressing their papers
because they were just slightly beat to the press, I would be worried. As it
is, I'd be more worried about breaking a system that is working for the sake
of efficiency than I am for any losses from the system.

I think more directly stated, do you think we could get better results
somehow? What would those look like?

~~~
gwern
If you've been beat to the press, you might as well release your paper to
salvage something from your sunk costs of time/effort/GPUs. It's inefficient
because in solving almost the same exact problems, they are duplicating each
other's work instead of sharing the intermediate steps. If there were more
sharing of low-grade information, along the lines of 'I'm working on a
lipreading CNN, it's going pretty well' 'oh hey _we're_ working on a lipreading
CNN too!', then the datasets and GPU clusters and math could be pooled and
better single results released quicker. As it is, now you have to read two
different papers about lipreading CNNs and puzzle over the differences and two
different papers about the concrete distribution trick, and they probably all
came out a month or two later because everyone had to redo work for their
separate system & paper.

~~~
taeric
Duplicated work is not necessarily a sign of inefficiency, though.

My hope would be that they each learned something slightly different in
solving the same problem. Eventually, things _may_ converge to a single
answer. However, there is no reason we should demand that convergence from
the start.

So, the shame here is if folks are not comparing and contrasting the different
solutions to the same problem. I confess I am guilty in that I have not read
both papers. But I will try to see if it can help me understand.

