
A formal solution to the grain of truth problem - ikeboy
https://intelligence.org/2016/06/30/grain-of-truth/
======
kbenson
I can only barely follow what's being said, given my unfamiliarity with the
field, but it _seems_ to say "agents used to only be able to choose an optimal
strategy if they were competing against agents with less information (fewer
possible strategies known)[1], but now they have discovered they can formally
come to a Nash equilibrium based on the use of something called a reflective
oracle."

Basically, two equal level participants should result in a Nash Equilibrium?

1: Except for cases where the set of possible strategies is small, such as the
Prisoner's Dilemma. See footnote 1 in the article.

~~~
danbruc
_Basically, two equal level participants should result in a Nash Equilibrium?_

No, whether you have a Nash equilibrium depends on the strategies used. You
can always have agents following strategies that aren't a Nash equilibrium,
independent of their relative strengths.
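
To make that concrete, here is a minimal sketch that checks which action pairs in a Prisoner's Dilemma are Nash equilibria. The payoff numbers are the usual textbook illustration, not something taken from the article, and `is_nash` is a name made up for this example.

```python
# Payoffs as (row player, column player); higher is better.
# "C" = cooperate, "D" = defect. Illustrative numbers only.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")


def is_nash(profile):
    """True if neither player gains by unilaterally switching actions."""
    a, b = profile
    u_a, u_b = PAYOFFS[profile]
    # Best payoff each player could get by deviating while the other stays put.
    best_a = max(PAYOFFS[(x, b)][0] for x in ACTIONS)
    best_b = max(PAYOFFS[(a, y)][1] for y in ACTIONS)
    return u_a >= best_a and u_b >= best_b


for profile in PAYOFFS:
    print(profile, is_nash(profile))
```

Two equally capable players who both cooperate are not at an equilibrium here, since either one would gain by defecting. Only mutual defection passes the check.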

The problem they solved is that agents were unable to reason about themselves
- what will my opponent do? He is just like me so he will ask himself what
will my opponent - that is me - do. But I will of course try to figure out
what my opponent will do but he is just like me and so he will... Simplified
of course, but here you have some infinite regress from which you have to
break free. The solution they came up with and analyzed is to just pick an
arbitrary, possibly randomized answer in certain situations.
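
A toy illustration of that regress, using matching pennies: one player wants to match the opponent's coin side, the other wants to mismatch, and each decides by simulating the other. This is only a sketch of the regress and of breaking it with a coin flip. It is not the paper's reflective oracle construction, and the function names and the depth cutoff are assumptions made for this example.

```python
import random


def naive_matcher():
    # No cutoff: simulate the opponent, who simulates me, who simulates...
    return naive_mismatcher()


def naive_mismatcher():
    return "T" if naive_matcher() == "H" else "H"


def matcher(depth, rng):
    """Wants to play the same side as the opponent."""
    if depth == 0:
        return rng.choice("HT")  # break the regress with a random answer
    return mismatcher(depth - 1, rng)


def mismatcher(depth, rng):
    """Wants to play the opposite side from the opponent."""
    if depth == 0:
        return rng.choice("HT")
    return "T" if matcher(depth - 1, rng) == "H" else "H"


try:
    naive_matcher()
except RecursionError:
    print("naive mutual simulation never bottoms out")

rng = random.Random(0)
print("with a randomized cutoff:", matcher(10, rng))
```

The naive version has no consistent deterministic answer at all, which is why some form of randomization is needed rather than just a deeper search.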

~~~
kbenson
Thanks for the clarification. So I take it that means I should read the
following:

 _The key feature of reflective oracles is that they avoid diagonalization and
paradoxes by randomizing in the relevant cases.[2] This allows agents with
access to a reflective oracle to consistently reason about the behavior of
arbitrary agents that also have access to a reflective oracle, which in turn
makes it possible to model agents that converge to Nash equilibria by their
own faculties (rather than by fiat or assumption)._

as not that the agents _will_ converge on a Nash equilibrium, but that now it's
possible to model agents that _do_ converge on a Nash equilibrium, which
previously we could not (or at least not definitively)? That's what it seems
to say outright, which I missed previously; I just want to make sure I'm not
approaching it from the wrong context.

~~~
danbruc
Don't quote me on that, I have really nothing to do with game theory, but as I
understand the paper, it shows that reflective-oracle-computable policies are
optimal in the discussed setup (Theorem 25) and that they will yield a Nash
equilibrium if all policies are asymptotically optimal (Theorem 28), which is
possible because a limit-computable reflective oracle exists (Theorem 6) and
Thompson sampling is asymptotically optimal. So achieving a Nash equilibrium
still requires that all agents play along: they have to have asymptotically
optimal policies. If other agents show erratic behavior you are unable to make
sense of, you cannot respond optimally.
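
For anyone who hasn't met Thompson sampling, here is a minimal sketch of the standard multi-armed bandit version with Beta posteriors. The paper applies the idea to far more general environments, so treat this only as an illustration of the principle; the function name, arm probabilities, and step count are assumptions made for this example.

```python
import random


def thompson_bernoulli(true_probs, steps, seed=0):
    """Run Thompson sampling on Bernoulli arms; return pull counts per arm."""
    rng = random.Random(seed)
    n = len(true_probs)
    wins = [1] * n    # Beta(1, 1), i.e. a uniform prior on each arm
    losses = [1] * n
    pulls = [0] * n
    for _ in range(steps):
        # Draw one plausible success rate per arm from its posterior,
        # then act as if those draws were the truth.
        samples = [rng.betavariate(w, l) for w, l in zip(wins, losses)]
        arm = max(range(n), key=samples.__getitem__)
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
        pulls[arm] += 1
    return pulls


pulls = thompson_bernoulli([0.2, 0.5, 0.8], steps=2000)
print(pulls)
```

Over time the pulls concentrate on the best arm, which is the bandit analogue of the asymptotic optimality mentioned above.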

~~~
AstralStorm
You can still respond optimally in a local sense, that is, derive a dominant
strategy. It will take longer than it would for an agent that is not
reflective though, by at least one sampling step.

------
rrggrr
I don't think the arbitrary solution can apply to real-world applications. A
logarithmic debit from a player's preferred payoff as a computation cost might
make more sense applied to the real world.

At a point the cost of calculating an opponent's preferred payoff becomes too
high relative to the perception of payoff itself. By way of example, this is
actually how many litigation negotiations are resolved.

I think these limits on individual computation, if I may mix disciplines,
manifest as emotions or, beyond that, as detachment. And I think there's an
opportunity for modeling real-world games more accurately when viewed from
that perspective.

~~~
Natanael_L
Aren't they already using such a computation limit?

[https://news.ycombinator.com/item?id=12019519](https://news.ycombinator.com/item?id=12019519)

------
evolve2k
So what does this now mean for me if I'm playing a game of werewolf/mafia?

