
Learning from Scratch by Thinking Fast and Slow - rusht
https://davidbarber.github.io/blog/2017/11/07/Learning-From-Scratch-by-Thinking-Fast-and-Slow-with-Deep-Learning-and-Tree-Search/
======
nlperguiy
[https://arxiv.org/abs/1705.08439](https://arxiv.org/abs/1705.08439)

The original paper.

The references in the paper paint a much clearer picture of where exactly the
idea behind reinforcement learning with optimal, suboptimal, or random oracles
comes from. There are also mathematical proofs that these setups work.

I was quite shocked not to see references [6, 16] cited in any of the recent
MCTS papers.

These references prove why the approach works and show how well it works, yet
the whole field of imitation learning seems invisible to deep RL papers. I
don't have the faintest idea why.

The algorithm described is the ultimate generalized algorithm. If you have the
optimal expert policy, the algorithm learns in a fully supervised way; if the
expert policy is suboptimal but the score (loss) is fully computable, the
learned policy will outperform the reference policy; and if the expert policy
is completely random, the algorithm behaves like reinforcement learning.
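To make the three regimes concrete, here is a minimal sketch of the roll-out idea behind this family of imitation-learning methods, under toy assumptions: a hypothetical 1-D "walk to the goal" task with a fully computable loss, three hypothetical reference policies (optimal, noisy, uniformly random), and a lookup table standing in for the learner. Each action is scored by letting the reference policy finish the episode, and the learner copies the cheaper action. It is not the algorithm from the paper's references, only an illustration of why a suboptimal or even random reference can still produce a good learner.

```python
# Toy 1-D "walk to the goal" task. Every name and number here is a
# hypothetical stand-in, not taken from the paper or its references.
N = 6                # positions 0..N-1, goal at N-1
H = 8                # fixed episode length
ACTIONS = (+1, -1)   # min() keeps the first minimum, so ties go to +1


def step(pos, a):
    """Move one cell left or right, clamped to the board."""
    return min(max(pos + a, 0), N - 1)


def final_cost(pos):
    """The loss is fully computable: distance to the goal at the end."""
    return (N - 1) - pos


# Hypothetical reference ("expert") policies: position -> action probabilities.
def ref_optimal(pos):
    return {+1: 1.0}


def ref_suboptimal(pos):
    return {+1: 0.75, -1: 0.25}


def ref_random(pos):
    return {+1: 0.5, -1: 0.5}


def cost_to_go(ref, pos, t):
    """Exact expected final cost if `ref` acts from (pos, t) until t == H."""
    if t == H:
        return final_cost(pos)
    return sum(p * cost_to_go(ref, step(pos, a), t + 1) for a, p in ref(pos).items())


def learn_from_reference(ref):
    """Label each (pos, t) with the action whose reference roll-out is cheapest.

    A real learner would fit a classifier or network to these labels;
    a lookup table is enough for this toy.
    """
    return {
        (pos, t): min(ACTIONS, key=lambda a: cost_to_go(ref, step(pos, a), t + 1))
        for pos in range(N)
        for t in range(H)
    }


def run(policy, start=0):
    """Follow the learned (deterministic) policy and return its final cost."""
    pos = start
    for t in range(H):
        pos = step(pos, policy[(pos, t)])
    return final_cost(pos)


for name, ref in [("optimal", ref_optimal), ("suboptimal", ref_suboptimal), ("random", ref_random)]:
    learned = learn_from_reference(ref)
    print(f"{name:>10} reference cost: {cost_to_go(ref, 0, 0):.3f}  learned cost: {run(learned)}")
```

Because `cost_to_go` is computed exactly here, the random reference amounts to exact policy evaluation followed by one greedy improvement step. In this toy a single step already reaches the goal; in general it does not, and such methods iterate or estimate the costs from sampled roll-outs.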

What the paper at the top adds is the ability to improve the expert policy
with the learned one at the same time, and the math covered previously
guarantees improvement.
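The "improve the expert with the learned policy" loop can be sketched as a toy Expert Iteration: a slow, exhaustive look-ahead search (the expert) evaluates its leaves with roll-outs of a fast lookup-table apprentice, and the apprentice is then retrained to copy the search's choices. Everything here (the 1-D task, the search depth, the table apprentice, the deliberately bad starting policy) is a hypothetical stand-in; the paper itself uses MCTS and a neural network. With exact deterministic look-ahead, the classical rollout/policy-improvement argument says each round's policy is no worse than the previous one, which the recorded costs let you check.

```python
# Toy Expert Iteration on a 1-D walk. All names and numbers are hypothetical.
N = 10               # positions 0..N-1, goal at N-1
H = 12               # fixed episode length
ACTIONS = (+1, -1)   # min() keeps the first minimum, so ties go to +1


def step(pos, a):
    return min(max(pos + a, 0), N - 1)


def final_cost(pos):
    return (N - 1) - pos


def rollout(apprentice, pos, t):
    """Fast play: follow the apprentice's table until the horizon."""
    while t < H:
        pos = step(pos, apprentice[(pos, t)])
        t += 1
    return final_cost(pos)


def search_value(apprentice, pos, t, depth):
    """Slow play: exhaustive look-ahead, with apprentice roll-outs at the leaves."""
    if t == H:
        return final_cost(pos)
    if depth == 0:
        return rollout(apprentice, pos, t)
    return min(search_value(apprentice, step(pos, a), t + 1, depth - 1) for a in ACTIONS)


def expert_action(apprentice, pos, t, depth):
    """The expert picks the first action of the best depth-step plan."""
    return min(ACTIONS, key=lambda a: search_value(apprentice, step(pos, a), t + 1, depth - 1))


def expert_iteration(n_iters=8, depth=2):
    # Start from a deliberately bad apprentice that always walks away from the goal.
    apprentice = {(pos, t): -1 for pos in range(N) for t in range(H)}
    history = [rollout(apprentice, 0, 0)]
    for _ in range(n_iters):
        # Imitation step: the new apprentice copies the search-based expert,
        # which itself was guided by the previous apprentice.
        apprentice = {
            (pos, t): expert_action(apprentice, pos, t, depth)
            for pos in range(N)
            for t in range(H)
        }
        history.append(rollout(apprentice, 0, 0))
    return apprentice, history


apprentice, history = expert_iteration()
print("apprentice cost per iteration:", history)
```

Since the look-ahead is exact and the table copies the expert perfectly, this reduces to multi-step policy iteration; with a learned network and sampled search, the improvement would only hold approximately.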

~~~
adamweld
And of course, the name is a reference to Daniel Kahneman's excellent
research and his book of the same title. One of the most influential pieces of
literature I've had the pleasure of reading, everyone should read it.

[https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow](https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow)

~~~
signa11
> One of the most influential pieces of literature I've had the pleasure of
> reading, everyone should read it.

Seriously? I had almost the opposite reaction to this, more in line with:
[https://jasoncollins.org/2016/06/29/re-reading-kahnemans-thi...](https://jasoncollins.org/2016/06/29/re-reading-kahnemans-thinking-fast-and-slow/)
(which was discussed here:
[https://news.ycombinator.com/item?id=12030791](https://news.ycombinator.com/item?id=12030791))

~~~
marklgr
I liked that book very much when it came out, but I'm now in the leery camp.
Replication and the general quality of the studies are a concern, but even one
of the book's core tenets, namely System 1 vs. System 2, turns out to be a
gross oversimplification that can mislead as much as it can help.

The thing is, Kahneman is likeable, he has a good reputation, and books like
these are pure candy for the audience that enjoys that kind of literature
(myself included), but that's also a good warning sign. Could it be too
simple, too convenient and too satisfying to be true? How do you know when
you've fallen in love with an idea or theory?

~~~
sombremesa
Good question. From:
[http://slatestarcodex.com/2014/12/12/beware-the-man-of-one-s...](http://slatestarcodex.com/2014/12/12/beware-the-man-of-one-study/)

    
    
      But the question remains: what happens when (like in most cases) you don’t have a funnel plot?
    
      I don’t have a good positive answer. I do have several good negative answers.
    
      Decrease your confidence about most things if you’re not sure that you’ve investigated every piece of evidence.
    
      Do not trust websites which are obviously biased (eg Free Republic, Daily Kos, Dr. Oz) when they tell you they’re going to give you “the state of the evidence” on a certain issue, even if the evidence seems very stately indeed. This goes double for any site that contains a list of “myths and facts about X”, quadruple for any site that uses phrases like “ingroup member uses actual FACTS to DEMOLISH the outgroup’s lies about Y”, and octuple for RationalWiki.
    
      Most important, even if someone gives you what seems like overwhelming evidence in favor of a certain point of view, don’t trust it until you’ve done a simple Google search to see if the opposite side has equally overwhelming evidence.

~~~
mcguire
As an aside, please don't use indentation for quoting. I like your points,
but they're hard to read that way:

" _But the question remains: what happens when (like in most cases) you don’t
have a funnel plot?_

" _I don’t have a good positive answer. I do have several good negative
answers._

" _Decrease your confidence about most things if you’re not sure that you’ve
investigated every piece of evidence._

" _Do not trust websites which are obviously biased (eg Free Republic, Daily
Kos, Dr. Oz) when they tell you they’re going to give you “the state of the
evidence” on a certain issue, even if the evidence seems very stately indeed.
This goes double for any site that contains a list of “myths and facts about
X”, quadruple for any site that uses phrases like “ingroup member uses actual
FACTS to DEMOLISH the outgroup’s lies about Y”, and octuple for RationalWiki._

" _Most important, even if someone gives you what seems like overwhelming
evidence in favor of a certain point of view, don’t trust it until you’ve done
a simple Google search to see if the opposite side has equally overwhelming
evidence._ "

------
jph00
FYI, this is the paper that lays the key foundation for AlphaZero, which
recently got a lot of attention for easily beating the earlier Go-winning
algorithms without looking at human games, and then beating the best chess
algorithm after 6 hours of training.

~~~
oh_sigh
Hours seems like the wrong metric for a (potentially) highly distributable
training run. Has DeepMind released something like how many floating-point
operations it took, or perhaps how much energy (watt-hours) it used?

~~~
sanxiyn
DeepMind released the number of games played (Table S3): 44 million games for
chess, 24 million for shogi, and 21 million for Go.

------
sitkack
> Repeated deep study gradually improves intuitions.

Cognition and metacognition. The highest form of knowing is knowing why. There
is an easy solution to most of this: ruthless application of the scientific
method. Ruthless. Zero ego. Blank slate every time.

~~~
visarga
Yes, I wholly agree. Let's say we build an AI agent and it forms a
hypothesis. How is it going to test it, to make sure the relation is causal
and not a mere correlation? By devising an experiment. So the agent needs to
work like a scientist: propose an idea, test it, iterate. Even children learn
about the world like that: they interact with the world, trying out their
ideas and seeing what works and what doesn't.

But when the agent doesn't have access to a simulator or a world where it can
play, how could it understand causal relations and thus be able to reason?
The most important thing for the agent is access to experimentation, and
that's why supervised learning (a fixed dataset) is fundamentally limited
compared to RL (environment-based learning).
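The correlation-versus-causation point can be illustrated with a small simulation. Under a made-up linear model in which a hidden variable Z drives both X and Y while X has no effect on Y, passively collected data show a strong X-Y correlation (roughly 0.87 in the population for these coefficients), but when the agent sets X itself, as in a randomized experiment, the correlation disappears. The model, the coefficients and the function names are all illustrative assumptions.

```python
import random


def simulate(n, intervene, seed=0):
    """Hypothetical world: hidden Z drives both X and Y; X has no effect on Y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        if intervene:
            x = rng.gauss(0.0, 1.0)        # the agent sets X itself, independently of Z
        else:
            x = z + rng.gauss(0.0, 0.5)    # passively observed X just tracks Z
        y = 2.0 * z + rng.gauss(0.0, 0.5)  # Y depends only on Z, never on X
        xs.append(x)
        ys.append(y)
    return xs, ys


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


observational = correlation(*simulate(10_000, intervene=False))
experimental = correlation(*simulate(10_000, intervene=True))
print(f"observational corr(X, Y) = {observational:.2f}, "
      f"interventional corr(X, Y) = {experimental:.2f}")
```

A fixed dataset of the first kind only ever yields the first number; getting the second requires being able to act on the world, which is the gap the comment attributes to supervised learning.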

