Hacker News
Learning to reinforcement learn (arxiv.org)
97 points by vivekchandsrc on Nov 19, 2016 | 9 comments



How is this not at the top of the board? I can't wait until more people start applying these ideas to figure out optimal parameters for all these data mining and machine learning algorithms. Maybe actually put some science and reasoning into the field.


"Importantly, because it is learned, it is configured to exploit structure in the training domain."

Interesting concept. An algorithm that learns specifically how to learn better on the data it is presented with.
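
For anyone who wants the idea in code, here is a rough sketch of the interaction protocol only, not the method from either paper. It assumes a distribution of two-armed bandit tasks, and it swaps the trained recurrent network for a hand-coded rule, so nothing here is learned by gradient descent. The names `BanditTask`, `StatefulAgent`, and `run_trial` are made up for illustration. What it tries to show is that the agent receives its previous action and reward as input and keeps its internal state for the whole trial on one task, so any adaptation to the task happens inside that state.

```python
import random


class BanditTask:
    """A two-armed Bernoulli bandit; payout probabilities differ per task."""

    def __init__(self, probs, rng):
        self.probs = probs
        self.rng = rng

    def pull(self, arm):
        # Reward 1.0 with probability probs[arm], otherwise 0.0.
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


class StatefulAgent:
    """Hand-coded stand-in for a trained recurrent policy (an assumption).

    The 'hidden state' is per-arm pull counts and reward sums. It is reset
    only between trials (tasks), never between steps within a trial.
    """

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.reset()

    def reset(self):
        self.counts = [0] * self.n_arms
        self.sums = [0.0] * self.n_arms

    def act(self, prev_action, prev_reward):
        # Fold the previous (action, reward) pair into the hidden state.
        if prev_action is not None:
            self.counts[prev_action] += 1
            self.sums[prev_action] += prev_reward
        # Try each arm once, then pick the best empirical mean so far.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        means = [s / c for s, c in zip(self.sums, self.counts)]
        return means.index(max(means))


def run_trial(agent, task, steps):
    """Run one trial: agent state persists across all steps of one task."""
    agent.reset()
    prev_action, prev_reward, total = None, 0.0, 0.0
    for _ in range(steps):
        action = agent.act(prev_action, prev_reward)
        reward = task.pull(action)
        total += reward
        prev_action, prev_reward = action, reward
    return total


def demo(n_tasks=3, steps=20, seed=0):
    """Sample a few tasks and return the total reward from each trial."""
    rng = random.Random(seed)
    agent = StatefulAgent(2)
    totals = []
    for _ in range(n_tasks):
        task = BanditTask([rng.random(), rng.random()], rng)
        totals.append(run_trial(agent, task, steps))
    return totals
```

In a real meta-RL setup the rule inside `act` would be replaced by a recurrent network whose weights are trained across many sampled tasks; the loop around it would stay roughly the same shape.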


Even more interesting - an OpenAI paper on roughly the same subject/technique came out a little earlier (RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning, https://arxiv.org/pdf/1611.02779.pdf). I think parallel invention tends to indicate that multiple people have stumbled on a similar good idea at the same time. (disclaimer: I work at OpenAI. Very pleased to see these two papers emerge so close together)


Or there's just so much work being done that people are stepping on each other's toes. There were like 5-10 papers at ICLR this year on just the same 'let's treat NN hyperparameters as an MDP' idea. Or look at the concrete distribution - two different Google groups published papers on almost the same exact idea simultaneously, apparently in total ignorance of each other. Or the lipreading NNs the past week - two Oxford groups working on different datasets with different architectures publishing >human results. The pace is so fast the left hand doesn't know what the right hand is doing. It's fun for the spectator (you have no idea what will come out tomorrow, much less next month or next year) but it strikes me as rather inefficient.


In the Google and Oxford cases, I don't think it is common to broadcast your proposed ideas to colleagues in other groups before you actually start working on them. So I think this can happen even within the same company/university.

For the meta-learning papers, you may be interested in reading the related work section of the RL^2 paper https://arxiv.org/pdf/1611.02779.pdf.

Quoting:

"Our work draws inspiration from a particular line of work (Younger et al., 2001; Santoro et al., 2016; Vinyals et al., 2016), which formulates meta-learning as an optimization problem, and can thus be optimized end-to-end via gradient descent."

"Another line of work (Hochreiter et al., 2001; Younger et al., 2001; Andrychowicz et al., 2016; Li & Malik, 2016) studies meta-learning over the optimization process. There, the meta-learner makes explicit updates to a parametrized model."

Both groups were inspired by the same works, applied the meta-learning idea to RL problems, and hit the same ICLR deadline. That still makes sense, right?


Why do you call it inefficient? If one side was suppressing their papers because they were just slightly beat to the press, I would be worried. As it is, I'd be more worried about breaking a system that is working for the sake of efficiency than I am about any losses from the system.

More directly stated: do you think we could get better results somehow? What would those look like?


If you've been beat to the press, you might as well release your paper to salvage something from your sunk costs of time/effort/GPUs. It's inefficient because in solving almost the same exact problems, they are duplicating each other's work instead of sharing the intermediate steps. If there were more sharing of low-grade information, along the lines of 'I'm working on a lipreading CNN, it's going pretty well' 'oh hey we're working on a lipreading CNN too!', then the datasets and GPU clusters and math could be pooled and better single results released quicker. As it is, now you have to read two different papers about lipreading CNNs and puzzle over the differences, and two different papers about the concrete distribution trick, and they probably all came out a month or two later because everyone had to redo work for their separate system & paper.


Duplicated work is not necessarily a sign of inefficiency, though.

My hope would be that they each learned something slightly different in solving the same problem. Eventually, things may converge to a single answer. However, there is no evidence that we should demand convergence at the beginning.

So, the shame here is if folks are not comparing and contrasting the different solutions to the same problem. I confess I am guilty in that I have not read both papers. But I will try to see if it can help me understand.


machine superintelligence is hyperintegration of the metasystem

(disclaimer: incomplete knowledge is risk)



