Previewing Files with Tkinter

Introduction This is related to an earlier post on panning and zooming a node graph visualized with Tkinter. In this post, the goal is to implement a level preview: when the user hovers over a node, the associated file is shown in a rectangle that appears, and when they stop hovering over the node, the preview goes away. Additionally, there is a section on writing an update_node function that simplified the codebase. ...
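A minimal sketch of the hover-preview idea, assuming a plain Tkinter Canvas; the node tag and the show_preview/hide_preview names are my own illustration, not code from the post:

```python
import tkinter as tk

# Illustrative sketch (not the post's code): show a text preview when the
# pointer enters a canvas item and remove it when the pointer leaves.
root = tk.Tk()
canvas = tk.Canvas(root, width=400, height=300)
canvas.pack()

node = canvas.create_rectangle(50, 50, 120, 100, fill="lightblue", tags=("node",))
preview_items = []

def show_preview(event):
    # In the real tool this would read the level file tied to the node;
    # here we just display placeholder text in a bordered rectangle.
    x, y = event.x + 15, event.y + 15
    box = canvas.create_rectangle(x, y, x + 170, y + 40, fill="white", outline="black")
    label = canvas.create_text(x + 85, y + 20, text="level segment preview")
    preview_items.extend([box, label])

def hide_preview(event):
    for item in preview_items:
        canvas.delete(item)
    preview_items.clear()

canvas.tag_bind("node", "<Enter>", show_preview)
canvas.tag_bind("node", "<Leave>", hide_preview)
root.mainloop()
```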

November 25, 2024 · 3 min

Panning and Zooming for a Node Graph Editor with Tkinter

Motivation As part of the work for my dissertation, I am building a tool that allows users to edit a Markov Decision Process (MDP). This relates to my paper, Level Assembly as a Markov Decision Process. This work is different because I am representing the MDP as a graph. Every node in the graph is a state, and a state is a level segment. Every edge represents an action. Levels are formed by connecting states together. I also have some metadata in the edges, which are links between level segments. ...
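As a hedged illustration of that structure (states as level segments, edges as actions with link metadata), here is one possible data model; the Node and Edge names and fields are assumptions, not the tool's actual code:

```python
from dataclasses import dataclass, field

# Illustrative sketch (not the tool's data model): each state is a level
# segment, and each edge carries the link metadata between segments.
@dataclass
class Edge:
    target: "Node"
    link: list[str] = field(default_factory=list)  # metadata joining two segments

@dataclass
class Node:
    segment: list[str]                              # rows of the level segment
    edges: dict[str, Edge] = field(default_factory=dict)  # action name -> edge
```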

November 24, 2024 · 8 min

Updating My Site to Hugo

It’s been over two years since I last made a post, and a lot has happened. One of the most minor of these is the change to my website: I’m now using Hugo to build the whole thing. Before Hugo, I was using a home-brewed tool I built in Python. It worked with HTML, but there were re-writable portions marked by {{ FILE_LOCATION }}. The tool would go to that file location and replace the line with whatever it found in that file. I also had a blog post system with a meta file holding information like the title and date of the post, among other features. However, it was a pain to work with. ...
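A rough sketch of how such a replacement tool might work, assuming each {{ path/to/file }} placeholder sits on its own line; this is an illustration, not the original tool:

```python
import re
from pathlib import Path

# Illustrative sketch (not the original tool): replace any line containing a
# {{ path/to/file }} placeholder with the contents of that file.
PLACEHOLDER = re.compile(r"\{\{\s*(.+?)\s*\}\}")

def render(template_path: str) -> str:
    rendered_lines = []
    for line in Path(template_path).read_text().splitlines():
        match = PLACEHOLDER.search(line)
        if match:
            # Swap the whole line for the referenced file's contents.
            rendered_lines.append(Path(match.group(1)).read_text().rstrip("\n"))
        else:
            rendered_lines.append(line)
    return "\n".join(rendered_lines)
```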

November 23, 2024 · 3 min

Q-Learning and SARSA

In this post I’m going to cover two active reinforcement learning methods: q-learning and SARSA. Neither method depends on a full MDP model; that is, we are not guaranteed access to the transition function P. Instead, we learn which actions can be taken in a state during playthroughs. This is represented by a q-table: a table indexed by state and action that yields a q-value, the expected utility of taking that action in that state. The expected utility of a state is then the value of the best action available in that state. See the equation below. ...
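As a hedged illustration of the q-table idea (not code from the post), here is a tabular sketch of both updates; Q, alpha, and gamma are assumed names:

```python
from collections import defaultdict

# Illustrative q-table sketch (not the post's code): Q[(state, action)] holds
# the estimated utility of taking `action` in `state`.
Q = defaultdict(float)
alpha, gamma = 0.1, 0.9

def state_utility(state, actions):
    # The expected utility of a state is the value of its best action.
    return max(Q[(state, a)] for a in actions)

def q_learning_update(state, action, reward, next_state, next_actions):
    # Q-learning bootstraps from the best next action (off-policy).
    target = reward + gamma * state_utility(next_state, next_actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])

def sarsa_update(state, action, reward, next_state, next_action):
    # SARSA bootstraps from the action actually taken next (on-policy).
    target = reward + gamma * Q[(next_state, next_action)]
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```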

February 22, 2022 · 4 min

Temporal Difference Learning

This post is on temporal difference learning (TDL). TDL does not rely on a Markov Decision Process (MDP). Instead, it traditionally uses a table with a row and column for each state, so that the table represents the utility of all possible transitions, and it uses observations to learn which transitions are valid. \[ \begin{align} U^{\pi}(s) \leftarrow U^{\pi}(s) + \alpha (R(s) + \gamma U^{\pi}(s') - U^{\pi}(s)) \end{align} \]This is very similar to the Bellman equation we’ve seen in past posts, but we aren’t summing over neighbors because temporal difference learning does not assume a model. Instead, an agent plays through the game and returns the sequence of states visited along the path. So, the utility of a state s is its current value plus the reward at s, plus the discounted utility of the next state in the path, minus its current utility, all scaled by a learning rate alpha, which is typically a small number between 0 and 1. ...
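A minimal sketch of that update applied along one observed episode; the U, alpha, and gamma names and the (state, reward) episode format are assumptions, not the post's code:

```python
# Illustrative TD(0) sketch (not the post's code): update utilities from one
# observed episode, a list of (state, reward) pairs in visit order.
def td_update(U, episode, alpha=0.1, gamma=0.9):
    for (s, r), (s_next, _) in zip(episode, episode[1:]):
        # Move U[s] toward the one-step bootstrapped target.
        U[s] = U.get(s, 0.0) + alpha * (r + gamma * U.get(s_next, 0.0) - U.get(s, 0.0))
    return U
```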

February 1, 2022 · 3 min

Policy Iteration

In this post I’ll be covering policy iteration. As a brief reminder, here is the Bellman equation that value iteration optimizes: \[ \begin{align} U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s'|s, a)U(s') \end{align} \]In policy iteration, the goal is not to find the exact optimal utility of every state, as value iteration does. If one state is clearly better than another, then the precise difference isn’t that important. After all, we care most about the agent making the correct decisions. This idea leads to policy iteration, which is broken into two steps: (1) policy evaluation and (2) policy improvement. Before going into each of these, I want to give you the pseudo-code for the algorithm. ...
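A hedged sketch of the two steps, not the post's own pseudo-code; it assumes P[s][a] is a list of (probability, next_state) pairs, R[s] is the reward, and actions(s) lists the actions available in state s:

```python
# Illustrative policy iteration sketch (not the post's pseudo-code).
def policy_iteration(states, actions, P, R, gamma=0.9, eval_sweeps=20):
    policy = {s: actions(s)[0] for s in states}
    U = {s: 0.0 for s in states}
    while True:
        # (1) Policy evaluation: estimate U under the current policy.
        for _ in range(eval_sweeps):
            for s in states:
                U[s] = R[s] + gamma * sum(p * U[s2] for p, s2 in P[s][policy[s]])
        # (2) Policy improvement: act greedily with respect to U.
        stable = True
        for s in states:
            best = max(actions(s), key=lambda a: sum(p * U[s2] for p, s2 in P[s][a]))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, U
```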

January 31, 2022 · 4 min

Value Iteration

In this post I’ll be covering value iteration. To best understand why it works, I want to first cover the Bellman equation: \[ \begin{align} U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s'|s, a)U(s') \end{align} \]We’ve already covered R(s), P(s’|s, a), and U(s’) in a previous post, so I won’t waste your time on those. Gamma is a discount constant that is greater than 0 and less than or equal to 1. What this equation says is that the utility of state s is its reward plus the discounted expected utility of its successors under the best available action, weighted by the transition probabilities. Value iteration is the application of this formula over multiple iterations until convergence. ...
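A minimal sketch of that iteration, using the same assumed P[s][a], R[s], and actions(s) conventions as the policy iteration sketch above; again an illustration rather than the post's implementation:

```python
# Illustrative value iteration sketch (not the post's code): apply the Bellman
# update to every state repeatedly until the largest change falls below tol.
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    U = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(sum(p * U[s2] for p, s2 in P[s][a]) for a in actions(s))
            new_u = R[s] + gamma * best
            delta = max(delta, abs(new_u - U[s]))
            U[s] = new_u
        if delta < tol:
            return U
```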

January 26, 2022 · 2 min

Revised Direct Utility Estimation For Better MDP

In the previous post, I provided an implementation that was technically correct but lacking in two ways: (1) optimization and (2) formalism. The optimization was weak because I was calling game.next_states() to compute the possible next states of a state on the fly; precomputing all valid transitions makes the code much more efficient. This also ties into formalism: I had an MDP, but I never directly defined the set of actions or transitions. So, let’s do that. ...
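A small sketch of that precomputation, assuming a game object with the next_states() function named above; the dictionary layout is my own illustration:

```python
# Illustrative sketch (not the post's code): cache every state's successors
# once, instead of calling game.next_states(state) repeatedly during learning.
def precompute_transitions(game, states):
    transitions = {}
    for state in states:
        # Each entry defines the transitions available from `state`
        # as the list of reachable successor states.
        transitions[state] = list(game.next_states(state))
    return transitions
```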

January 25, 2022 · 2 min

Direct Utility Estimation

In this post, I’m going to take you through one approach to solving the grid world environment with reinforcement learning, specifically direct utility estimation (DUE). (If you are unfamiliar with the grid world environment, fret not, because that is the section directly below.) Before DUE can be covered, I’ll give a brief overview of Markov decision processes and why they are not great for problems with large state spaces. I’ll then show how DUE works and how it can be implemented in Python. Following that, I’ll show that DUE can solve MDPs, but it is slow enough that DUE is best seen as an introduction to more powerful approaches. ...
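As a hedged sketch of the core idea (not the post's implementation), DUE can be read as averaging the returns observed after each state across many episodes; the (state, reward) episode format and gamma are assumptions:

```python
from collections import defaultdict

# Illustrative DUE sketch (not the post's code): estimate each state's utility
# as the average return observed after visiting it across many episodes.
def direct_utility_estimation(episodes, gamma=0.9):
    totals, counts = defaultdict(float), defaultdict(int)
    for episode in episodes:                 # episode = [(state, reward), ...]
        G = 0.0
        for state, reward in reversed(episode):
            G = reward + gamma * G           # return from this state onward
            totals[state] += G
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}
```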

January 20, 2022 · 6 min

Generative Art: Conway's Game of Life

With the lightning algorithm out of the way, let’s look at Conway’s Game of Life. The idea is pretty cool: we create an n×m grid and randomly assign cells as alive or dead. We then apply a set of rules for some number of iterations and get a result that is very hard to predict. The rules are pretty simple: any live cell with fewer than two live neighbors dies, as if by under-population; any live cell with two or three live neighbors lives on to the next generation; any live cell with more than three live neighbors dies, as if by overpopulation; and any dead cell with exactly three live neighbors becomes a live cell, as if by reproduction. It turns out that despite the relative fame of Conway’s Game of Life, implementing it is very simple. I think there are three points of note. First, you check not only the four cardinal directions but also the corners, for a total of 8 neighbors. Second, Conway’s Game of Life was designed for an infinite grid, but that is neither feasible nor useful in our case of generating images. In this implementation I’ve made the grid the size of the image; a better way would be to make the grid larger than the image to approximate the infinite grid. Third, the grid is not modified in place (i.e. a new grid is built each iteration, and the old grid is used to get neighbor counts). ...
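A minimal sketch of one generation step under those three points (a finite grid the size of the image, all 8 neighbors, and a fresh grid each iteration); the names are illustrative, not taken from the post:

```python
# Illustrative Game of Life step (not the post's code): `grid` is a list of
# lists of 0/1, and a brand-new grid is produced each generation.
def step(grid):
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count all 8 neighbors, staying inside the finite grid.
            neighbors = sum(
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if grid[r][c] == 1:
                new_grid[r][c] = 1 if neighbors in (2, 3) else 0
            else:
                new_grid[r][c] = 1 if neighbors == 3 else 0
    return new_grid
```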

January 13, 2022 · 4 min