Policy Iteration

In this post I’ll be covering policy iteration. As a brief reminder, here is the Bellman equation that value iteration optimizes: \[ \begin{align} U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s'|s, a)U(s') \end{align} \]Unlike value iteration, the goal of policy iteration is not to find the exactly optimal utility of each state. If one state is clearly better than another, then the precise difference isn’t that important. After all, we care most about an agent making the correct decisions. This idea is how we arrive at policy iteration, which is broken into two steps: (1) policy evaluation and (2) policy improvement. Before going into each of these ideas, I want to give you the pseudo-code for the algorithm. ...
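The two steps above can be sketched in a few lines of Python. This is a minimal, hedged sketch, not the post’s own implementation: the dictionary-based MDP format (`P[(s, a)]` as a list of `(next_state, prob)` pairs, `R[s]` as the reward) and the two-state toy MDP at the bottom are my own illustrative assumptions.

```python
def policy_iteration(states, actions, P, R, gamma=0.9, eval_sweeps=50):
    """P[(s, a)] maps to a list of (next_state, prob) pairs; R[s] is the reward."""
    # Start from an arbitrary policy and zeroed utilities.
    pi = {s: actions[0] for s in states}
    U = {s: 0.0 for s in states}
    while True:
        # (1) Policy evaluation: approximate U under the current fixed policy.
        # No max here -- the action is dictated by pi, so this is a linear update.
        for _ in range(eval_sweeps):
            for s in states:
                U[s] = R[s] + gamma * sum(p * U[sp] for sp, p in P[(s, pi[s])])
        # (2) Policy improvement: act greedily with respect to the evaluated U.
        changed = False
        for s in states:
            best_a = max(actions,
                         key=lambda a: sum(p * U[sp] for sp, p in P[(s, a)]))
            if best_a != pi[s]:
                pi[s], changed = best_a, True
        if not changed:  # policy is stable, so it is optimal
            return pi, U

# Hypothetical two-state MDP: "move" switches states, "stay" keeps you put.
states = ["A", "B"]
actions = ["stay", "move"]
P = {
    ("A", "stay"): [("A", 1.0)], ("A", "move"): [("B", 1.0)],
    ("B", "stay"): [("B", 1.0)], ("B", "move"): [("A", 1.0)],
}
R = {"A": 0.0, "B": 1.0}
pi, U = policy_iteration(states, actions, P, R)
```

Note the payoff described in the post: evaluation only needs to be approximate (a fixed number of sweeps, no max), because improvement only cares about which action looks best, not the exact utilities.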

January 31, 2022 · 4 min

Value Iteration

In this post I’ll be covering value iteration. To best understand why it works, I want to first cover the Bellman equation: \[ \begin{align} U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s'|s, a)U(s') \end{align} \]We’ve already covered R(s), P(s’ | s, a), and U(s’) in a previous post, so I won’t waste your time on those. Gamma is a discount constant that is greater than 0 and less than or equal to 1. What this equation says is that the utility of state s is its reward plus a discounted value of the best action, where an action’s value is each neighboring state’s utility weighted by its transition probability. Value iteration is the application of this formula over multiple iterations until convergence. ...
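Applying the Bellman equation repeatedly until the utilities stop changing might look like the following sketch. The MDP encoding (`P[(s, a)]` as `(next_state, prob)` pairs, `R[s]` as rewards) and the tiny two-state example are assumptions of mine for illustration, not the post’s grid world.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """P[(s, a)] maps to a list of (next_state, prob) pairs; R[s] is the reward."""
    U = {s: 0.0 for s in states}
    while True:
        delta = 0.0  # largest utility change this sweep
        for s in states:
            # Bellman update: reward plus discounted value of the best action.
            best = max(sum(p * U[sp] for sp, p in P[(s, a)]) for a in actions)
            new_u = R[s] + gamma * best
            delta = max(delta, abs(new_u - U[s]))
            U[s] = new_u
        if delta < tol:  # converged: no state moved more than tol
            return U

# Hypothetical two-state MDP: "move" switches states, "stay" keeps you put.
states = ["A", "B"]
actions = ["stay", "move"]
P = {
    ("A", "stay"): [("A", 1.0)], ("A", "move"): [("B", 1.0)],
    ("B", "stay"): [("B", 1.0)], ("B", "move"): [("A", 1.0)],
}
R = {"A": 0.0, "B": 1.0}
U = value_iteration(states, actions, P, R)
```

In this toy MDP the fixed point is easy to check by hand: U(B) = 1 + 0.9·U(B) gives U(B) = 10, and U(A) = 0.9·U(B) = 9.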

January 26, 2022 · 2 min

Revised Direct Utility Estimation For Better MDP

In the previous post, I provided an implementation that was technically correct but lacking in two ways: (1) optimization and (2) formalism. The optimization was weak because I was using the function game.next_states(), which recomputed the next possible states given a state on every call. Instead, precompute all valid transitions and your code will be much more efficient. This also ties into formalism: I had an MDP, but I never directly defined the set of actions or transitions. So, let’s do that. ...
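The precomputation idea can be sketched like this. Everything here is a hypothetical stand-in, including the `next_states` helper and the 4×3 grid with one wall, since the post’s actual `game.next_states()` isn’t shown in the excerpt.

```python
def next_states(state, walls, width=4, height=3):
    """Successors of a cell: the four cardinal moves that stay in bounds
    and don't land on a wall. A toy stand-in for game.next_states()."""
    x, y = state
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [c for c in candidates
            if 0 <= c[0] < width and 0 <= c[1] < height and c not in walls]

walls = {(1, 1)}
states = [(x, y) for x in range(4) for y in range(3) if (x, y) not in walls]

# Precompute every transition once up front; afterwards each lookup is a
# cheap dict access instead of recomputing the successors in the main loop.
transitions = {s: next_states(s, walls) for s in states}
```

This also delivers the formalism the post asks for: `transitions` is an explicit definition of the transition structure, rather than something implicit in a function call.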

January 25, 2022 · 2 min

Direct Utility Estimation

In this post, I’m going to take you through one approach to solving the grid world environment with reinforcement learning, specifically direct utility estimation (DUE). (If you are unfamiliar with the grid world environment, fret not, because that is covered in the section directly below.) Before covering DUE, I’ll give a brief overview of Markov decision processes and why they are not great for problems with large state spaces. I’ll then show how DUE works and how it can be implemented in Python. Following that, I’ll show that DUE can solve MDPs, but it is slow enough that it is best seen as an introduction to more powerful approaches. ...
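The core of direct utility estimation is simple enough to sketch: estimate each state’s utility as the average return observed from that state across complete trials. The trial format below (a list of `(state, reward)` pairs per episode) is my own assumption for illustration, not necessarily the post’s.

```python
from collections import defaultdict

def direct_utility_estimation(trials, gamma=1.0):
    """Estimate U(s) as the mean observed return-to-go from s.
    Each trial is a complete episode: a list of (state, reward) pairs."""
    totals, counts = defaultdict(float), defaultdict(int)
    for trial in trials:
        ret = 0.0
        # Walk the episode backwards so `ret` accumulates the discounted
        # return-to-go from each visited state.
        for state, reward in reversed(trial):
            ret = reward + gamma * ret
            totals[state] += ret
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

# Two hypothetical episodes through states A then B.
trials = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("B", 0.0)],
]
U = direct_utility_estimation(trials)
```

The slowness the post mentions falls out of this design: every estimate needs many complete episodes to average over, and nothing learned about one state helps its neighbors.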

January 20, 2022 · 6 min

Generative Art: Conway's Game of Life

With the lightning algorithm out of the way, let’s look at Conway’s Game of Life. The idea is pretty cool: we create an n×m grid and randomly assign each cell to alive or dead. We then apply a set of rules for some number of iterations, and we get a very hard-to-predict result. The rules are pretty simple: Any live cell with fewer than two live neighbors dies, as if by under-population. Any live cell with two or three live neighbors lives on to the next generation. Any live cell with more than three live neighbors dies, as if by overpopulation. Any dead cell with exactly three live neighbors becomes a live cell, as if by reproduction. It turns out that despite the relative fame of Conway’s Game of Life, implementing it is very simple. I think there are three points of note. First, you are checking not only the four cardinal directions but also the corners, for a total of 8 cells. Second, Conway’s Game of Life was designed for an infinite grid, but that is not feasible, nor is it useful in our case of generating images. In this implementation I’ve made the grid the size of the image. A better way would be to make the grid larger than the image to approximate the infinite grid. Third, the grid is not modified in place (i.e. a new grid is built each iteration, and the old grid is used to get neighbor counts). ...
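A single step of the rules can be sketched as follows. This is a hedged, minimal version honoring the three points above (8 neighbors, out-of-bounds treated as dead, new grid each step); it is not necessarily the post’s own code, and the “blinker” pattern at the bottom is a standard test case I’ve added.

```python
import numpy as np

def life_step(grid):
    """One Game of Life step; grid is a 2-D array of 0s and 1s.
    Neighbor counts come from the old grid; a new grid is built fresh."""
    n, m = grid.shape
    new = np.zeros_like(grid)
    for i in range(n):
        for j in range(m):
            # Sum the 3x3 block around the cell (cardinals plus corners),
            # then subtract the cell itself. NumPy slicing clips at the
            # edges, so everything outside the grid counts as dead.
            neighbors = grid[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2].sum() - grid[i, j]
            if grid[i, j] == 1:
                new[i, j] = 1 if neighbors in (2, 3) else 0  # survival
            else:
                new[i, j] = 1 if neighbors == 3 else 0       # reproduction
    return new

# A "blinker": a vertical bar of three live cells flips to horizontal.
grid = np.zeros((5, 5), dtype=int)
grid[1:4, 2] = 1
grid = life_step(grid)
```

Building `new` separately is the third point in action: if you updated `grid` in place, later cells would count neighbors from a half-updated generation.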

January 13, 2022 · 4 min

Generative Art: Lightning

In the previous post, we generated images with circles and squares in Python with the package Pillow. The code was simple and the results were not that compelling. So, in this post I’m looking to step it up a bit. A while back Numberphile posted a video of Matt Henderson explaining how maze generation and breadth-first search can be used to generate a compelling animation of lightning. We are going to use a similar approach to generate images of lightning. ...
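The breadth-first search at the heart of that approach can be sketched generically. This is my own minimal illustration, assuming a hypothetical `neighbors` callback; how the depths get turned into lightning imagery is left to the post itself.

```python
from collections import deque

def bfs_depths(start, neighbors):
    """Breadth-first search from `start`, returning each reachable node's
    depth. Cells at the same depth form the expanding 'wavefront' that
    makes BFS useful for lightning-like imagery."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node):
            if nxt not in depth:  # first visit is always via a shortest path
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth

def grid_neighbors(cell, width=3, height=3):
    """Hypothetical 4-connected neighbors on a small open grid."""
    x, y = cell
    steps = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in steps if 0 <= a < width and 0 <= b < height]

depths = bfs_depths((0, 0), grid_neighbors)
```

On an open grid the depth is just the Manhattan distance from the start; on a maze, the depths trace the branching paths that read as lightning.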

December 12, 2021 · 3 min

Generative Art: Circles and Squares

This is the first of what will likely be a series of blog posts on generative art, specifically images. At the start, everything will be very simple. This is mainly because the topic is new to me, and so is the library we’re going to use to generate images. Speaking of which, we’re going to be using Python and Pillow to generate images, at least at the start; we’ll see where we go from there. I want to break this post into two parts: (1) pseudo-randomness and (2) generating a simple image with circles and squares. ...
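Part (1), pseudo-randomness, can be illustrated in a couple of lines of standard-library Python. This small example is my own addition: seeding the generator makes a “random” piece exactly reproducible, which is what makes generative art shareable.

```python
import random

# The same seed replays the same pseudo-random sequence, so a generative
# piece can be reproduced exactly by anyone who knows the seed.
random.seed(42)
first = [random.randint(0, 255) for _ in range(5)]

random.seed(42)
second = [random.randint(0, 255) for _ in range(5)]

assert first == second  # identical sequences from the same seed
```

Those reproducible integers are exactly the kind of values you would then feed into Pillow as circle radii, square positions, or RGB colors.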

October 20, 2021 · 5 min

Linear and Quadratic Discriminant Analysis

March 15, 2021 · 0 min

An Introduction to Naive Bayes Classification

March 10, 2021 · 0 min

Deriving the Ordinary Least Squares Linear Regression Solution

March 9, 2021 · 0 min