this book describes the linear programming formulation for finding the optimal value function. cool to see. I think this formulation is underrated by the CS community compared to other communities that also work on solving MDPs.
however I wish it also described the dual of this linear program. the dual optimizes over state-action frequencies (roughly, how often each state-action pair gets visited), which is equivalent to optimizing over policies.
so value functions and policies are dual to each other. that's pretty neat! not sure why modern RL texts don't talk about it at all.
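for anyone curious, here is a sketch of the primal/dual pair I have in mind. this is the standard textbook form for a finite, discounted MDP, not necessarily the notation the book uses. the symbols are my own assumptions: reward r(s,a), transition probabilities P(s'|s,a), discount factor \gamma in [0,1), and any strictly positive state weights \mu(s) (for example an initial state distribution).

```latex
% Primal LP: variables are the values v(s), one per state.
\begin{align*}
\min_{v} \quad & \sum_{s} \mu(s)\, v(s) \\
\text{s.t.} \quad & v(s) \;\ge\; r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v(s')
  \qquad \text{for all } s, a.
\end{align*}

% Dual LP: variables are the state-action frequencies x(s,a),
% one per state-action pair (one dual variable per primal constraint).
\begin{align*}
\max_{x} \quad & \sum_{s,a} r(s,a)\, x(s,a) \\
\text{s.t.} \quad & \sum_{a} x(s',a) \;-\; \gamma \sum_{s,a} P(s' \mid s,a)\, x(s,a) \;=\; \mu(s')
  \qquad \text{for all } s', \\
& x(s,a) \;\ge\; 0 \qquad \text{for all } s, a.
\end{align*}

% Recovering a policy from a feasible x (well-defined because \mu(s) > 0
% forces \sum_{a} x(s,a) \ge \mu(s) > 0):
\[
  \pi(a \mid s) \;=\; \frac{x(s,a)}{\sum_{a'} x(s,a')}.
\]
```

the way I read it: a feasible x(s,a) is the expected discounted number of visits to (s,a) under some policy when the start state is weighted by \mu, so the dual objective is just that policy's expected discounted return. by LP strong duality the two optimal objective values match, and an optimal x gives back an optimal policy through the normalization above.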