Visualizing Hilbert Curves With Raylib

In my most recent post, I showed how to implement Conway’s Game of Life in C++ and visualize it with raylib. This post is of the same kind, except we are going to visualize Hilbert Curves with raylib. However, this post will only have an example of the final result and the code. I am of the opinion that the two sources below are of a high enough quality that I don’t have anything more to add. ...

December 3, 2024 · 3 min

Visualizing Conway's Game of Life with Raylib

Introduction Conway’s Game of Life was created by John Horton Conway. The idea is simple: you have a grid of cells that can either be alive or dead. In every iteration, a new grid is made from the current grid by following a set of rules based on a cell’s neighbors. Since this is a grid, there are 8 possible neighbors when diagonals are included. The rules are the following: Any live cell with fewer than two live neighbors dies. Any live cell with two or three live neighbors stays alive. Any live cell with more than three live neighbors dies. Any dead cell with exactly three live neighbors becomes a live cell. The rules are simple, but the results can be mesmerizing. However, there are more reasons to care about Conway’s Game of Life than the fact that it is nice to look at. For example, it has been shown that a Turing machine can be simulated with it. It is also proof of how incredibly complex phenomena can emerge from straightforward rules. ...
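The full post implements this in C++ with raylib; purely as a hedged sketch of the rules above (the grid representation, boundary handling, and names are my assumptions, not the post's code), one update step could look like this in Python:

```python
def step(grid):
    """Compute one Game of Life iteration on a 2D list of 0/1 cells.

    Cells outside the grid are treated as dead (no wrap-around).
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the 8 neighbors, including diagonals.
            live = sum(
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < rows
                and 0 <= c + dc < cols
            )
            if grid[r][c] == 1:
                # A live cell survives only with two or three live neighbors.
                new_grid[r][c] = 1 if live in (2, 3) else 0
            else:
                # A dead cell becomes alive with exactly three live neighbors.
                new_grid[r][c] = 1 if live == 3 else 0
    return new_grid
```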

November 26, 2024 · 7 min

Previewing Files with Tkinter

Introduction This is related to an earlier post on panning and zooming a node graph visualized with Tkinter. In this post, the goal is to implement a level preview feature: when the user hovers over a node, they can see the associated file in a rectangle that appears. When they stop hovering over the node, the preview should go away. Additionally, there is a section on writing an update_node function that simplified the codebase. ...
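The post builds the preview on top of the node graph editor from the earlier post. As a rough, hedged sketch of the hover mechanic only (the widget layout, tags, and placeholder rectangle are assumptions, not the post's code), Tkinter's <Enter> and <Leave> canvas bindings can toggle a preview item:

```python
import tkinter as tk

root = tk.Tk()
canvas = tk.Canvas(root, width=400, height=300, bg="white")
canvas.pack()

# A node drawn on the canvas; the real post ties each node to a level file.
node = canvas.create_rectangle(50, 50, 100, 100, fill="steelblue", tags="node")
preview = None  # canvas item id of the preview rectangle, if shown


def show_preview(event):
    # Draw a placeholder rectangle near the node; the post renders the
    # node's associated file inside it instead.
    global preview
    preview = canvas.create_rectangle(120, 50, 320, 200, fill="lightgray")


def hide_preview(event):
    # Remove the preview as soon as the cursor leaves the node.
    global preview
    if preview is not None:
        canvas.delete(preview)
        preview = None


# <Enter>/<Leave> fire when the cursor moves onto or off of the tagged item.
canvas.tag_bind("node", "<Enter>", show_preview)
canvas.tag_bind("node", "<Leave>", hide_preview)

root.mainloop()
```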

November 25, 2024 · 3 min

Panning and Zooming for a Node Graph Editor with Tkinter

Motivation As part of the work for my dissertation, I am building a tool that allows users to edit a Markov Decision Process (MDP). This relates to my paper, Level Assembly as a Markov Decision Process. This work is different because I am representing the MDP as a graph. Every node in the graph is a state, and a state is a level segment. Every edge represents an action. Levels are formed by connecting states together. I also store some metadata on the edges: the links between level segments. ...
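As a loose sketch of that graph representation (the field names and structure here are assumptions, not the tool's actual data model), the MDP might be stored as nodes keyed by id, with each edge carrying its link metadata:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A state in the MDP: one level segment."""
    segment: list[str]                      # rows of the level segment
    edges: list["Edge"] = field(default_factory=list)


@dataclass
class Edge:
    """An action: a transition from one segment to another."""
    target: str                                     # id of the destination node
    link: list[str] = field(default_factory=list)   # metadata joining the segments


# The whole MDP is a graph of named nodes.
graph: dict[str, Node] = {
    "A": Node(segment=["----", "X--X"]),
    "B": Node(segment=["--X-", "XXXX"]),
}
graph["A"].edges.append(Edge(target="B", link=["--"]))
```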

November 24, 2024 · 8 min

Updating My Site to Hugo

It’s been over two years since I last made a post, and a lot has happened. One of the most minor of these is the change to my website: I’m now using Hugo to build the whole thing. Before Hugo, I was using a home-brewed tool I built in Python. It worked with HTML, but there were rewritable portions marked by {{ FILE_LOCATION }}. The tool would go to that file location and replace the line with whatever it found in that file. I also had a blog post system with a meta file containing information like the title and the date of the blog post, among other features. However, it was a pain to work with. ...
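As a hedged sketch of the replacement step described above (the placeholder regex, file layout, and function name are assumptions, not the original tool's code):

```python
import re
from pathlib import Path

# Matches a line like "{{ partials/header.html }}" and captures the path.
PLACEHOLDER = re.compile(r"^\s*\{\{\s*(\S+)\s*\}\}\s*$")


def render(template_path: str) -> str:
    """Replace each {{ FILE_LOCATION }} line with that file's contents."""
    out_lines = []
    for line in Path(template_path).read_text().splitlines():
        match = PLACEHOLDER.match(line)
        if match:
            # Splice in the referenced file verbatim.
            out_lines.append(Path(match.group(1)).read_text())
        else:
            out_lines.append(line)
    return "\n".join(out_lines)
```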

November 23, 2024 · 3 min

Q-Learning and SARSA

In this post I’m going to be covering two active reinforcement learning methods: q-learning and SARSA. Neither method depends on a model of the MDP, meaning we are not guaranteed the presence of the transition function P. Instead, we learn which actions can be taken in a state during playthroughs. This is represented by a q-table, which is a table of states and actions that yields a q-value: the expected utility of taking an action in a given state. We can say that the expected utility of a state is the q-value of the best action available in that state. See the equation below. ...
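In standard notation (hedging that the post's own equation may be written differently), that relationship between a state's utility and the q-table is: \[ \begin{align} U(s) = \max_{a} Q(s, a) \end{align} \]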

February 22, 2022 · 4 min

Temporal Difference Learning

This post is on temporal difference learning (TDL). TDL does not rely on a Markov Decision Process (MDP). Instead, it traditionally uses a table where each state has a row and a column, so that the table represents the utility of all possible transitions. TDL uses observations to learn which transitions are valid. \[ \begin{align} U^{\pi}(s) \leftarrow U^{\pi}(s) + \alpha (R(s) + \gamma U^{\pi}(s') - U^{\pi}(s)) \end{align} \]This is very similar to the Bellman equation that we’ve seen in past posts, but we aren’t using neighbors because temporal difference learning does not assume a model. Instead, an agent plays through the game and returns the sequence of states visited along the path. So, the utility of a state s becomes its current utility plus alpha times the quantity: the reward of s, plus the discounted utility of the next state in the path, minus the current utility of s. Here, alpha is a learning rate, typically a small number between 0 and 1. ...
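As a hedged illustration of applying that update along one observed playthrough (the function and argument names are mine, not the post's), the rule in code form might be:

```python
def td_update(utilities, trajectory, rewards, alpha=0.1, gamma=0.9):
    """Apply the TD update along one observed trajectory of states.

    utilities:  dict mapping state -> current utility estimate
    trajectory: list of states visited, in order
    rewards:    dict mapping state -> reward R(s)
    """
    for s, s_next in zip(trajectory, trajectory[1:]):
        utilities.setdefault(s, 0.0)
        utilities.setdefault(s_next, 0.0)
        # U(s) <- U(s) + alpha * (R(s) + gamma * U(s') - U(s))
        utilities[s] += alpha * (
            rewards[s] + gamma * utilities[s_next] - utilities[s]
        )
    return utilities
```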

February 1, 2022 · 3 min

Policy Iteration

In this post I’ll be covering policy iteration. As a brief reminder, here is the Bellman equation that value iteration optimizes: \[ \begin{align} U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s'|s, a)U(s') \end{align} \]In policy iteration, the goal is not to find the perfectly optimal utility of states like value iteration does. If one state is clearly better than another, then the precise difference isn’t that important. After all, we care most about an agent making the correct decisions. This idea is how we come to policy iteration, which is broken into two steps: (1) policy evaluation and (2) policy improvement. Before going into each of these ideas, I want to give you the pseudo-code for the algorithm. ...
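As a rough sketch of those two alternating steps (the data structures, names, and fixed number of evaluation sweeps are assumptions, not the post's pseudo-code):

```python
import random


def policy_iteration(states, actions, P, R, gamma=0.9, eval_sweeps=20):
    """Alternate policy evaluation and improvement until the policy is stable.

    actions[s] lists the actions available in s, P[(s, a)] is a list of
    (next_state, probability) pairs, and R[s] is the reward for state s.
    """
    policy = {s: random.choice(actions[s]) for s in states}
    U = {s: 0.0 for s in states}
    while True:
        # Policy evaluation: estimate U under the fixed current policy
        # (a fixed number of sweeps here; the post may solve it differently).
        for _ in range(eval_sweeps):
            for s in states:
                U[s] = R[s] + gamma * sum(
                    p * U[s2] for s2, p in P[(s, policy[s])]
                )
        # Policy improvement: act greedily with respect to the new utilities.
        stable = True
        for s in states:
            best = max(
                actions[s],
                key=lambda a: sum(p * U[s2] for s2, p in P[(s, a)]),
            )
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, U
```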

January 31, 2022 · 4 min

Value Iteration

In this post I’ll be covering value iteration. To best understand why it works, I want to first cover the Bellman equation: \[ \begin{align} U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s'|s, a)U(s') \end{align} \]We’ve already covered R(s), P(s’|s, a), and U(s’) in a previous post, so I won’t waste your time on those. Gamma is a discount constant that is greater than 0 and less than or equal to 1. What this equation says is that the utility of state s is its reward plus the discounted value of the best action, where an action’s value is the sum, over possible next states, of the transition probability multiplied by that state’s utility. Value iteration is the application of this formula over multiple iterations until convergence. ...
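As a hedged sketch of that iteration (the data structures and names are assumptions, not the post's implementation):

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """Repeatedly apply the Bellman update until utilities stop changing.

    actions[s] lists the actions available in s, P[(s, a)] is a list of
    (next_state, probability) pairs, and R[s] is the reward for state s.
    """
    U = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # U(s) = R(s) + gamma * max_a sum_{s'} P(s'|s, a) * U(s')
            best = max(
                sum(p * U[s2] for s2, p in P[(s, a)]) for a in actions[s]
            )
            new_u = R[s] + gamma * best
            delta = max(delta, abs(new_u - U[s]))
            U[s] = new_u
        if delta < tol:
            return U
```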

January 26, 2022 · 2 min

Revised Direct Utility Estimation For Better MDP

In the previous post, I provided an implementation that was technically correct but lacking in two ways: (1) optimization and (2) formalism. The optimization was weak because I was using the function game.next_states(), which computed the next possible states given a state on every call. Instead, precomputing all valid transitions makes the code much more efficient. This also ties into formalism: I had an MDP, but I never directly defined the set of actions or transitions. So, let’s do that. ...
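As a hedged sketch of that precomputation (the game object and next_states() come from the earlier posts, but the exact call signature and this helper's name are my assumptions):

```python
def precompute_transitions(game, states):
    """Call game.next_states(s) once per state and cache the results.

    Afterwards, lookups are dictionary reads instead of repeated recomputation.
    """
    # Assumption: next_states(s) takes a state and returns its successors.
    return {s: tuple(game.next_states(s)) for s in states}
```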

January 25, 2022 · 2 min