N-Grams: Joint Probability

Let’s start by examining the simplest case of joint probability: the probability of a single word, via a unigram. If we want to know the probability of a word, without any context appearing before or after it, we can take a corpus of text and convert it into a dictionary with the word as the key and its count as the value: { "a": 1000, "an": 3, "animal": 12, ... }. Say we want to calculate P("a"), meaning the likelihood of “a.” We take the number of occurrences of the word “a” and divide it by the total number of words in the corpus: P("a") ≈ count("a") / corpus_word_count. This is called maximum likelihood estimation (MLE), and we use the “≈” sign because our corpus is not a perfect representation of the actual likelihood of the word. ...
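As a rough sketch of the counting described above, the snippet below builds the word-to-count dictionary and divides each count by the corpus size. The function name `unigram_mle` and the toy corpus are made up for illustration, and splitting on whitespace is an assumed stand-in for real tokenization.

```python
from collections import Counter


def unigram_mle(tokens):
    """Return the MLE unigram probability for every word in `tokens`.

    `unigram_mle` is a hypothetical helper name, not taken from the post.
    """
    if not tokens:
        raise ValueError("the corpus must contain at least one token")
    counts = Counter(tokens)        # word -> number of occurrences
    total = sum(counts.values())    # corpus_word_count
    return {word: count / total for word, count in counts.items()}


# Toy corpus (an assumption for the example); a real one would be far larger.
corpus = "a cat saw a dog and a bird".split()
probs = unigram_mle(corpus)
print(probs["a"])  # 3 occurrences out of 8 tokens -> 0.375
```

Because every count is divided by the same total, the probabilities sum to one over the words seen in the corpus. Any word that never appears gets no entry at all, which is one reason the estimate is only approximate.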

September 11, 2019 · 3 min

Ring Buffers

In my last post I discussed n-grams and gave an example of them being used on the text of Harry Potter, but I didn’t cover the implementation and instead linked the source. Today, I want to go over a key data structure used in my implementation: the ring buffer (also known as a circular buffer). A ring buffer needs a max size, N, that represents its max capacity. Until the buffer has reached its max capacity, it behaves exactly like a list. However, once the max capacity is reached, the buffer drops the oldest element whenever a new one is added, resulting in first-in, first-out (FIFO) behavior. An example of this data structure in action can be seen below. We initialize a ring buffer of size three. At first the ring buffer acts like a list, but that stops when the fourth element is added. On this add, the ring buffer drops the 0 because it was the first element added and the buffer has reached its max capacity of three. Say, for example, we ran this again and added a four. The buffer would then be [2,3,4] because the one would be the next element to be dropped. ...
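The behavior described above can be sketched in a few lines. This is not the implementation linked from the earlier post; the class name `RingBuffer` and its methods `add` and `to_list` are assumptions made for the example.

```python
class RingBuffer:
    """Fixed-capacity FIFO buffer that overwrites its oldest element when full.

    A minimal sketch; the class and method names are hypothetical.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = []
        self._start = 0  # index of the oldest element once the buffer is full

    def add(self, item):
        if len(self._items) < self.capacity:
            # Not full yet: behave exactly like a list.
            self._items.append(item)
        else:
            # Full: overwrite the oldest element and advance the start index.
            self._items[self._start] = item
            self._start = (self._start + 1) % self.capacity

    def to_list(self):
        # Return the elements ordered from oldest to newest.
        return self._items[self._start:] + self._items[:self._start]


buffer = RingBuffer(3)
for value in [0, 1, 2, 3]:
    buffer.add(value)
print(buffer.to_list())  # [1, 2, 3]: the 0 was dropped on the fourth add

buffer.add(4)
print(buffer.to_list())  # [2, 3, 4]: the 1 was the next element to go
```

Python's standard library offers the same behavior through `collections.deque(maxlen=3)`. The explicit version is shown here only to make the wrap-around index visible.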

September 8, 2019 · 3 min

N-Grams With Harry Potter

I recently started grad school and one of the classes I am taking is Natural Language Processing (NLP). Before the class I decided to watch a few videos on NLP and came across N-Grams. I have not made it known in any of my past posts but I love N-Grams. One of my projects is around reinforcing N-Grams which I hope to post about sometime later this year. Digression aside, I decided it would be a fun project to write an n-gram that uses the text of Harry Potter as input and see what we get; I know it isn’t the most original idea but it was fun. ...

September 5, 2019 · 10 min

Procjam 2018 Postmortem

A few months ago I completed the annual ProcJam jam. I wish I had done a postmortem earlier, when everything was fresh, but now I am preparing for 7DRL. I figure a retrospective on how I did in the last jam will be useful for the upcoming one. Before beginning, I would like to preface this with the fact that I was sick for the entire jam. It’s not an excuse, but it is one of the reasons why I did not complete as much of the game as I would have liked. My original design was a weapon-based roguelike, minus the turn-based part, where the player tries to get through as many levels as possible. The weapons would be dropped by enemies and they would be adversarially generated, meaning a player that spams weapon shots would receive guns that shoot slower. A player that was extremely accurate would receive guns that spray, such as a shotgun. In addition, all levels would be procedurally generated. I hoped to get in a few enemies and had a stretch goal of creating a boss. Lastly, I hoped to make the game a platformer. In short, I had an ambitious scope for a ten-day jam even if I hadn’t been sick. ...

February 10, 2019 · 12 min

Q Learning: Starting From the Top

I want to go over Q-learning (a form of reinforcement learning) in this post. To start, we could go in two directions. We could start at the bottom and look at the math behind neural networks and Q-learning. Or we could start at the top and see the end result. We are going to go with the latter. Figure 1: The mountain car environment. To do this we are going to need a few libraries and a testbed. To test, we are going to use OpenAI’s Gym and use MountainCar-v0. In this environment, proposed by Andrew Moore in his Ph.D. thesis, the car must reach the flag seen in figure 1. The car, though, does not have enough acceleration to achieve this by just going forward. Instead, it must go back and forth, steadily gaining enough speed to reach the goal. This problem can be solved simply with a rule-based agent; reinforcement learning approaches, however, can struggle with it. You’ll soon see that the number of episodes it takes for Q-learning to solve this is more than expected. ...
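To make the rule-based claim concrete, here is a minimal sketch of one such agent: always push in the direction the car is already moving. It does not use Gym. The `step` function is my own transcription of the classic mountain car update, and its constants, the fixed start position, and the step cap are all assumptions for the example. The action encoding (0 pushes left, 2 pushes right) follows MountainCar-v0.

```python
import math

# Constants assumed from the classic mountain car formulation.
FORCE, GRAVITY, MAX_SPEED = 0.001, 0.0025, 0.07
MIN_POSITION, MAX_POSITION, GOAL_POSITION = -1.2, 0.6, 0.5


def rule_based_action(velocity):
    """Push right (2) when moving right or at rest, otherwise push left (0)."""
    return 2 if velocity >= 0 else 0


def step(position, velocity, action):
    """Advance the hand-written simulation by one time step."""
    velocity += (action - 1) * FORCE - GRAVITY * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POSITION, min(MAX_POSITION, position + velocity))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the car stops when it hits the left wall
    return position, velocity


def run_episode(start_position=-0.5, max_steps=1000):
    """Return the number of steps needed to reach the flag, or None."""
    position, velocity = start_position, 0.0
    for t in range(1, max_steps + 1):
        action = rule_based_action(velocity)
        position, velocity = step(position, velocity, action)
        if position >= GOAL_POSITION:
            return t
    return None


steps = run_episode()
print("reached the flag" if steps is not None else "did not reach the flag")
```

The rule works because pushing along the current velocity always adds energy to the car, so each swing carries it higher than the last until it clears the hill. A learning agent has to discover that pattern from reward alone.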

January 26, 2019 · 5 min

Redacting PDFs

One day at work, I walked by someone who was going through a large set of PDFs, and for every one he put a black box over the name field. He mentioned it would take him several hours to accomplish this extremely menial task. Naturally, I found myself attracted to the problem due to my love of automation. I decided then and there I would write a small script that he could use to redact large sets of similarly formatted PDFs. ...

July 29, 2018 · 6 min

Visualizing Fractal Trees

I came across a YouTube video which showed a way to visualize fractal trees. I watched it while eating dinner and didn’t think much of it at the time. A week or two later, though, I decided to do a few more challenges for my challenges repository and this was at the top of the list. It is a fairly simple program that has a cool end result, seen in figures one and four. ...

July 10, 2018 · 10 min

Generative Design in Minecraft: Nuking the Ground

A Quick Note: I ended up getting pretty sick and I was out of commission for about two months. The good news is that I’m now in perfectly good health. The bad news is that it kind of destroyed my hopes of building a decent submission for GDMC. The competition ends in about thirteen days, which is not enough time to come up with a submission I would be proud of. In addition, the 40+ hours I spend every week at the Brain Game Center, where I work, are the very large nail in the coffin. Regardless, I plan on continuing to work on this problem until I have something cool I can show off. ...

June 17, 2018 · 9 min

Visualizing Sorting Algorithms with OpenGL

If you’ve read my previous posts, then you know I love Python. Regardless, it has been a goal of mine to be proficient in C++. I’m not exactly sure why I’m fascinated with this language that I have no use cases for, but I think it stems from my love of video games. C++ is used extensively by my favorite company, Blizzard Entertainment, and sees a wide range of use across the industry. In addition, it is also a part of a field that is of particular interest to me, AI. For example, TensorFlow is implemented in C++. ...

April 20, 2018 · 10 min

Making Rush Hour: GitHub and Matrix Formats

Source Control and GitHub: GitHub is an awesome website that allows you to have unlimited repositories, for free, that are backed up with Git on a remote server. In addition, it provides you with helpful tools like issues that allow you to keep track of bugs, features, and anything else you want. It is not the be-all and end-all of source control and has pros and cons that should be considered before you use it. ...

March 20, 2018 · 6 min