Tuesday, March 25, 2014

Artificial Intelligence for 2048

I've always wanted to write an artificial intelligence (AI) that plays a video game by actually "watching" the screen and actually sending the right keyboard commands.  Today, I've done that for 2048, a sliding-tile game implemented in Javascript by Gabriele Cirulli.

In brief, you are given a 4x4 grid and must input UP, DOWN, LEFT or RIGHT to slide tiles.  Tiles move as far as they can go and when two tiles with the same number collide, they merge into a new tile with double the value of the original tiles.  When a key is pressed every tile moves, so every move impacts the entire board.  After every move, a new tile appears on the grid at a random location (a "2" tile with 90% probability or a "4" tile with 10% probability).  (If that was unclear, just play the game for a minute).  The objective is to merge tiles until the 2048 tile appears, which is a formidable challenge.  When the board is full and no moves can be made, the game is over.

I implemented a 2048 AI agent in C# that uses the Minimax algorithm with Alpha-Beta pruning to perform a local search of the game space.

When the software is launched, it estimates the location of the 2048 grid on the screen by looking for the leftmost pixel with the color of the exterior grid (R=187, G=173, B=160) and assumes the grid is 500x500 pixels.  After it finds the grid, it pauses to allow you to set your browser to the active window and then enters a loop.  Within the loop, it:

1. Estimates the current state of the game
2. Performs the minimax search to a depth of 6 player moves
3. Sends the appropriate keypress to 2048 in the browser
4. Wait a second for the 2048 animation to resolve (otherwise state estimation will be wrong)
State estimation uses the distinctive colors of the tiles (parsed out of the CSS file at http://gabrielecirulli.github.io/2048/style/main.css) to determine which tile occupies each space.  Colors are sampled in the upper-left corner of each tile, to avoid conflicting with the number.

By choosing the minimax algorithm, the AI treats 2048 as if it were a two-player game.  The first player is simply "the player" (the agent) and the second "player" is the random tile placement.  The AI assumes that the random tile placement will be chosen in the worst possible place (from the perspective of the AI).  Obviously, sometimes the (random!) placement won't always be worst-case, but by making this assumption, riskier lines of gameplay can be avoided.

Since minimax can only feasibly look-ahead several moves, the quality of the farthest states it can look at must be estimated.  I've used a very simple heuristic.  The quality of a state is equal to:

Quality = - [ (Number of occupied tiles) + Entropy/ln(16) ]

In general, it tries to minimize the number of tiles on the board.  The Entropy term is just the Shannon entropy of the distribution of non-zero tiles on the grid.  In simple terms, the entropy term just says "Prefer to have tiles of a few types, rather than of many types".  The idea is that it's easier to combine tiles if you have a grid of three "4"s than if you have a grid with one "4", one "8" and one "16".  The entropy term is scaled to be in the range of 0.0-1.0 so that minimizing the number of occupied tiles in the grid is the dominant guiding force.

Keypresses are sent using the Windows.System.Forms.SendKey API, which is incredibly convenient.  However, you'll need to use an implementation of C# that allows this.  This works on Windows, but I'm not sure if it will work on Linux under Mono (possibly, but I haven't tried it).

Overall, the AI tends to do pretty well when playing the game.  It does not always win, but if often does.

Software is implemented in C# and is available on GibHub at: https://github.com/djparente/AI2048