Value Iteration
In this post I’ll be covering value iteration. To best understand why it works, I want to first cover the Bellman equation: \[ U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s') \] We’ve already covered R(s), P(s’ | s, a), and U(s’) in a previous post, so I won’t waste your time on those here. Gamma is a discount constant that is greater than 0 and less than or equal to 1. What this equation is saying is that the utility of state s is its reward plus a discounted value of the best action available from s, where an action’s value is the sum over successor states s’ of the transition probability multiplied by that state’s utility. Value iteration is the repeated application of this update until the utilities converge. ...
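To make the update concrete, here’s a minimal sketch of value iteration in Python. It assumes a small MDP described by plain dictionaries; the names `states`, `actions`, `R`, `P`, `gamma`, and `tol` are illustrative choices for this sketch, not something defined in the post.

```python
# A minimal sketch of value iteration, assuming an MDP given as:
#   states:  a list of states
#   actions: a dict mapping each state to its available actions
#   R:       a dict mapping each state s to its reward R(s)
#   P:       a dict mapping (s, a) to a list of (s_prime, probability) pairs
# All of these names are illustrative, not from the post.

def value_iteration(states, actions, R, P, gamma=0.9, tol=1e-6):
    U = {s: 0.0 for s in states}  # start every utility at zero
    while True:
        delta = 0.0
        U_new = {}
        for s in states:
            # Bellman update: reward plus discounted value of the best action
            best = max(
                (sum(p * U[s_next] for s_next, p in P[(s, a)])
                 for a in actions[s]),
                default=0.0,  # states with no actions keep only their reward
            )
            U_new[s] = R[s] + gamma * best
            delta = max(delta, abs(U_new[s] - U[s]))
        U = U_new
        if delta < tol:  # stop once no utility changes by more than tol
            return U


# Toy two-state example (made-up numbers, just to show the call):
states = ["A", "B"]
actions = {"A": ["go"], "B": ["stay"]}
R = {"A": 0.0, "B": 1.0}
P = {("A", "go"): [("B", 1.0)], ("B", "stay"): [("B", 1.0)]}
print(value_iteration(states, actions, R, P))
```

Stopping when the largest change falls below a small tolerance is one common convergence test; the discount factor gamma guarantees the updates eventually settle.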