Monte Carlo Tree Search (MCTS)

Chapter 1: The Arena – The Connect Four Battlefield

Imagine a large arena where two warriors face off on a vertical grid. The board has 7 columns (numbered 0 to 6) and 6 rows. Pieces fall like rain into the columns, stacking from the bottom up. The goal? Be the first to align four of your pieces in a row—horizontally, vertically, or diagonally!

Below is an empty Connect Four board:

  Columns: 0   1   2   3   4   5   6
         -----------------------------
Row 0: |   |   |   |   |   |   |   |
         -----------------------------
Row 1: |   |   |   |   |   |   |   |
         -----------------------------
Row 2: |   |   |   |   |   |   |   |
         -----------------------------
Row 3: |   |   |   |   |   |   |   |
         -----------------------------
Row 4: |   |   |   |   |   |   |   |
         -----------------------------
Row 5: |   |   |   |   |   |   |   |
         -----------------------------

Let’s begin by defining our ConnectFourState in Python. This class will represent the current board, manage whose turn it is (we’ll use 'X' and 'O'), and check for victory or a full board (a draw).

# Chapter 1: Setting Up the Battlefield (Connect Four State)

class ConnectFourState:
    ROWS = 6
    COLS = 7

    def __init__(self, board=None, current_player='X'):
        # Initialize an empty board if none is provided.
        if board is None:
            self.board = [[' ' for _ in range(ConnectFourState.COLS)] 
                          for _ in range(ConnectFourState.ROWS)]
        else:
            # Deep copy the board for safety.
            self.board = [row[:] for row in board]
        self.current_player = current_player

    def get_possible_moves(self):
        # A move is a column index where the top cell is still empty.
        return [col for col in range(ConnectFourState.COLS) if self.board[0][col] == ' ']

    def move(self, col):
        # Drop a piece into the chosen column and return the resulting state.
        if not 0 <= col < ConnectFourState.COLS or self.board[0][col] != ' ':
            raise ValueError(f"Column {col} is full or out of range.")
        new_board = [row[:] for row in self.board]
        # Pieces fall to the lowest available row.
        for row in reversed(range(ConnectFourState.ROWS)):
            if new_board[row][col] == ' ':
                new_board[row][col] = self.current_player
                break
        # Switch players.
        next_player = 'O' if self.current_player == 'X' else 'X'
        return ConnectFourState(new_board, next_player)

    def is_terminal(self):
        # The game ends when someone wins or the board is full.
        return self.get_winner() is not None or not self.get_possible_moves()

    def get_winner(self):
        # Check all directions for a win: horizontal, vertical, and both diagonals.
        board = self.board
        R, C = ConnectFourState.ROWS, ConnectFourState.COLS
        # Directions: (delta_row, delta_col)
        directions = [(0,1), (1,0), (1,1), (1,-1)]
        for row in range(R):
            for col in range(C):
                if board[row][col] == ' ':
                    continue
                player = board[row][col]
                for dr, dc in directions:
                    count = 0
                    r, c = row, col
                    while 0 <= r < R and 0 <= c < C and board[r][c] == player:
                        count += 1
                        if count == 4:
                            return player
                        r += dr
                        c += dc
        return None

    def get_reward(self, player):
        # Reward for the 'player': 1 if they win, 0 if they lose, and 0.5 for a draw.
        winner = self.get_winner()
        if winner == player:
            return 1
        elif winner is not None:
            return 0
        else:
            # If no moves remain and no winner, it's a draw.
            return 0.5

    def __repr__(self):
        # Create a neat string representation of the board.
        rows = []
        for row in self.board:
            rows.append("| " + " | ".join(row) + " |")
        header = "  " + "   ".join(map(str, range(ConnectFourState.COLS)))
        # Each printed row is 4 * COLS + 1 characters wide, so match that width.
        sep = "\n" + ("-" * (4 * ConnectFourState.COLS + 1)) + "\n"
        return header + "\n" + sep.join(rows)

In this chapter, we’ve set the stage: a Connect Four board where pieces drop into columns, and victory is earned by aligning four in a row.
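
To see the direction-scanning idea on its own, here is a minimal standalone sketch. The helper name four_in_a_row and the sample position are assumptions made for illustration, not part of the class above. It uses the same four (delta_row, delta_col) directions, but checks up front that the fourth cell stays on the board.

```python
# Hypothetical standalone helper: scans a grid for four equal pieces in a line.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]  # right, down, down-right, down-left

def four_in_a_row(board):
    """Return the piece ('X' or 'O') that has four in a line, or None."""
    rows, cols = len(board), len(board[0])
    for row in range(rows):
        for col in range(cols):
            piece = board[row][col]
            if piece == ' ':
                continue
            for dr, dc in DIRECTIONS:
                end_r, end_c = row + 3 * dr, col + 3 * dc
                if not (0 <= end_r < rows and 0 <= end_c < cols):
                    continue  # The line of four would leave the board.
                if all(board[row + i * dr][col + i * dc] == piece for i in range(4)):
                    return piece
    return None

# Sample position (an assumption for illustration): X holds a rising diagonal.
sample = [list(line) for line in [
    "       ",
    "       ",
    "   X   ",
    "  XO   ",
    " XOO   ",
    "XOOX   ",
]]
print(four_in_a_row(sample))
```

Because every occupied cell is treated as a possible starting point, four directions are enough: the opposite directions are covered when the scan starts from the other end of the line.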


Chapter 2: The Game’s Inner Magic – Rules & Moves

Now that we have our battlefield, our hero must learn the rules of engagement. The code above includes:

  • Possible Moves: Choosing any column that isn’t full.
  • Making a Move: Dropping a piece (which “falls” to the lowest empty spot).
  • Terminal Check & Winner Detection: Scanning horizontally, vertically, and diagonally for four in a row.

Imagine a scenario:

  • Player X drops a piece in column 3.
  • The piece lands at the bottom of column 3.

Here’s an illustration of that move:

  Columns: 0   1   2   3   4   5   6
         -----------------------------
Row 0: |   |   |   |   |   |   |   |
         -----------------------------
Row 1: |   |   |   |   |   |   |   |
         -----------------------------
Row 2: |   |   |   |   |   |   |   |
         -----------------------------
Row 3: |   |   |   |   |   |   |   |
         -----------------------------
Row 4: |   |   |   |   |   |   |   |
         -----------------------------
Row 5: |   |   |   | X |   |   |   |
         -----------------------------

This magic of gravity and strategy is encapsulated in our move method.
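
The gravity rule can be tried in isolation with a small sketch. The function drop_piece below is a hypothetical standalone helper working on a plain list of lists, not the class method itself. It assumes row 0 is the top of the board and row 5 the bottom, as in the diagrams.

```python
# Hypothetical helper mirroring the gravity rule: scan a column from the bottom up.
def drop_piece(board, col, piece):
    """Return a new board with `piece` in the lowest empty cell of `col`."""
    new_board = [row[:] for row in board]  # Copy so the input stays unchanged.
    for row in reversed(range(len(new_board))):
        if new_board[row][col] == ' ':
            new_board[row][col] = piece
            return new_board
    raise ValueError(f"column {col} is full")

empty = [[' '] * 7 for _ in range(6)]
after_x = drop_piece(empty, 3, 'X')    # Lands on the bottom row (row 5).
after_o = drop_piece(after_x, 3, 'O')  # Stacks on top of the X (row 4).
print(after_o[5][3], after_o[4][3])
```

Returning a fresh board instead of mutating the old one matters later: the search tree keeps many positions alive at once, and none of them may change underneath it.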


Chapter 3: The Decision Tree – Our MCTS Node for Connect Four

Now we introduce our wise explorer: the MCTSNode. Think of each node as a snapshot of the board mid-battle. The node records:

  • The state (i.e., the current board and whose turn it is).
  • Its parent (the state that led here).
  • Its children (the various moves that were explored).
  • The accumulated visits and total reward from simulations.

Visualize the tree branching out from the initial board:

           [Initial Board]
                /   |   \ ...
          Col 0  Col 1 ... Col 6
             |     |         |
         [Child Nodes: Each a new board state]

Here’s the code for our explorer node:

# Chapter 3: Crafting the Wise Explorer (MCTS Node)

import math
import random

class MCTSNode:
    def __init__(self, state, parent=None):
        self.state = state           # The snapshot of the board.
        self.parent = parent         # Where we came from.
        self.children = {}           # Dictionary: move (column) -> MCTSNode.
        self.visits = 0              # How many times we've visited this node.
        self.total_reward = 0.0      # Cumulative reward from simulations.

    def is_fully_expanded(self):
        # When every legal move from this state has been tried.
        return len(self.children) == len(self.state.get_possible_moves())

    def best_child(self, c_param=1.41):
        # Select the child with the highest UCT score: its average reward
        # plus an exploration bonus that is larger for rarely visited children.
        # Note: the factor 2 inside the square root combines with c_param,
        # so the effective exploration constant here is about 2.
        def uct(item):
            child = item[1]
            exploit = child.total_reward / child.visits
            explore = math.sqrt(2 * math.log(self.visits) / child.visits)
            return exploit + c_param * explore
        best_move, best_node = max(self.children.items(), key=uct)
        return best_move, best_node

    def expand(self):
        # Expand a new child node from an unexplored move.
        untried_moves = [move for move in self.state.get_possible_moves() if move not in self.children]
        move = random.choice(untried_moves)
        new_state = self.state.move(move)
        child_node = MCTSNode(new_state, parent=self)
        self.children[move] = child_node
        return child_node

    def update(self, reward):
        # Update the node with the simulation result.
        self.visits += 1
        self.total_reward += reward

Now our explorer can branch out, evaluate moves, and store its learned wisdom. Every time it simulates a game, it updates these values.
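
To get a feel for how the UCT score trades off exploitation against exploration, here is a small worked sketch. The function uct_score is a hypothetical standalone copy of the arithmetic in best_child, and the visit counts and rewards are made-up numbers chosen only for illustration.

```python
import math

def uct_score(total_reward, visits, parent_visits, c_param=1.41):
    """Average reward plus an exploration bonus that shrinks with visits."""
    exploit = total_reward / visits
    explore = math.sqrt(2 * math.log(parent_visits) / visits)
    return exploit + c_param * explore

# Hypothetical statistics for two children of a parent visited 100 times.
well_explored = uct_score(total_reward=30.0, visits=50, parent_visits=100)
barely_tried = uct_score(total_reward=1.0, visits=5, parent_visits=100)
print(f"well explored: {well_explored:.3f}, barely tried: {barely_tried:.3f}")
```

With these numbers the barely tried child scores higher even though its average reward is lower (0.2 versus 0.6). That is the point of the bonus: a move is not written off until it has been sampled often enough for its average to be trusted.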


Chapter 4: The Hero’s Journey – The MCTS Algorithm

The heart of our tale is the MCTS algorithm. Our hero embarks on many mini-battles to learn which moves lead to victory. The journey is divided into four phases:

  1. Selection: Start at the root and repeatedly follow the child with the highest UCT score until you reach a node that isn’t fully expanded or is terminal.
  2. Expansion: At that frontier, explore a new move.
  3. Simulation (Rollout): From that new node, play random moves until the game concludes.
  4. Backpropagation: Update all nodes along the path with the outcome of the simulation.

Imagine it as a river branching into streams:

          [Initial State]

       (Selection: follow promising paths)

       [Newly Expanded Node]

      (Rollout: random play to the end!)

       [Terminal Outcome]

 (Backpropagation: wisdom flows upward)

Let’s write the code for this grand journey:

# Chapter 4: The Hero's Journey (MCTS Algorithm)

def mcts(root, iterations, root_player):
    for _ in range(iterations):
        node = root

        # 1. Selection: Traverse the tree down to a leaf node.
        while not node.state.is_terminal() and node.is_fully_expanded():
            _, node = node.best_child()

        # 2. Expansion: If the node is not terminal, expand it.
        if not node.state.is_terminal():
            node = node.expand()

        # 3. Simulation (Rollout): Play randomly until the game ends.
        reward = rollout(node.state, root_player)

        # 4. Backpropagation: Update the nodes along the path with the result.
        # Each node stores the reward from the point of view of the player who
        # moved into it (the opponent of node.state.current_player). Without
        # this flip, best_child would pick the opponent's moves as if the
        # opponent were trying to help root_player.
        while node is not None:
            mover_is_root_player = node.state.current_player != root_player
            node.update(reward if mover_is_root_player else 1 - reward)
            node = node.parent

    # Choose the move from the root with the highest visit count.
    best_move, best_node = max(root.children.items(), key=lambda item: item[1].visits)
    return best_move, best_node

def rollout(state, root_player):
    # Simulate a game from this state by choosing random moves.
    current_state = state
    while not current_state.is_terminal():
        move = random.choice(current_state.get_possible_moves())
        current_state = current_state.move(move)
    return current_state.get_reward(root_player)

Each iteration is a mini-adventure—our hero learns from random skirmishes and gradually refines its judgment on which moves are most promising.
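
In a two-player game, a result that is good for one player is bad for the other, so a common approach is to store each node's reward from the perspective of the player who moved into it. The sketch below isolates that backpropagation idea on a three-node chain. The Node class and backpropagate helper are hypothetical names used only here, and the sketch assumes rewards in the range 0 to 1 with strictly alternating turns.

```python
class Node:
    """Hypothetical minimal node: only the fields backpropagation needs."""
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.total_reward = 0.0

def backpropagate(leaf, reward):
    """Walk from `leaf` to the root, flipping the reward at each level.

    `reward` is in [0, 1] from the point of view of the player who moved
    into `leaf`; that player's opponent moved into the parent, and so on.
    """
    node = leaf
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        reward = 1 - reward  # Zero-sum: one player's win is the other's loss.
        node = node.parent

root = Node()
child = Node(parent=root)
grandchild = Node(parent=child)
backpropagate(grandchild, 1.0)  # The player who moved into grandchild won.
print(grandchild.total_reward, child.total_reward, root.total_reward)
```

A draw (0.5) is unchanged by the flip, so both players see it as equally good, which matches the 0.5 draw reward used by get_reward.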


Chapter 5: The Final Showdown – Making the Move

At last, our hero is ready for the ultimate challenge. With the arena set and the explorer equipped, we begin with an empty board where player X starts the duel. After many simulated battles, our MCTS agent recommends the best column in which to drop a piece.

Visualize the starting board again:

  Columns: 0   1   2   3   4   5   6
         -----------------------------
Row 0: |   |   |   |   |   |   |   |
         -----------------------------
Row 1: |   |   |   |   |   |   |   |
         -----------------------------
Row 2: |   |   |   |   |   |   |   |
         -----------------------------
Row 3: |   |   |   |   |   |   |   |
         -----------------------------
Row 4: |   |   |   |   |   |   |   |
         -----------------------------
Row 5: |   |   |   |   |   |   |   |
         -----------------------------
Player X’s turn!

Now, let’s put everything together in the main execution block:

# Chapter 5: The Final Showdown (Main Execution)

if __name__ == '__main__':
    # Prepare the battlefield.
    initial_state = ConnectFourState()
    root = MCTSNode(initial_state)
    root_player = initial_state.current_player  # 'X' starts the duel.

    # Our hero embarks on many simulated battles.
    iterations = 2000  # You can adjust this number for more thorough search.
    best_move, best_node = mcts(root, iterations, root_player)
    
    print("MCTS has completed its epic quest!")
    print(f"Recommended move for player {root_player} is to drop a piece in column: {best_move}\n")
    print("Resulting board after the best move:")
    print(initial_state.move(best_move))

When you run this script, our MCTS agent runs 2,000 simulated games (the iterations value above) and recommends the column whose node was visited most often as the move for player X.
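
Why pick the most visited move rather than the one with the best average reward? The sketch below compares the two rules on made-up root statistics. The numbers and the helper names most_visited and best_average are assumptions for illustration only, not output from the script above.

```python
# Hypothetical root statistics: column -> (visits, total_reward).
root_stats = {
    2: (310, 170.5),
    3: (1240, 744.0),
    4: (447, 250.0),
    6: (3, 3.0),  # Three lucky rollouts: perfect average, very little evidence.
}

def most_visited(stats):
    """Column with the highest visit count (the rule used by mcts above)."""
    return max(stats, key=lambda col: stats[col][0])

def best_average(stats):
    """Column with the highest average reward, ignoring sample size."""
    return max(stats, key=lambda col: stats[col][1] / stats[col][0])

print("most visited:", most_visited(root_stats))
print("best average:", best_average(root_stats))
```

In this made-up example the two rules disagree: the average is won by a column that was sampled only three times. Visit counts are harder to fool, because the search only keeps returning to a move while its results hold up.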


Epilogue: Reflections on the Adventure

In this saga, we:

  • Built an immutable game state for Connect Four, complete with gravity and win detection.
  • Designed an explorer node that can branch out through countless moves.
  • Launched an epic MCTS journey where our hero learned from random simulations.
  • Crowned our champion with the best move for the current player.

This multi-layered adventure not only demonstrates the power of MCTS in a challenging setting but also provides a rich, step-by-step narrative complete with ASCII illustrations. Now, feel free to tweak the iterations, experiment with different strategies, or even extend the game logic for more complex challenges. Happy coding and may your algorithms lead you to victory!


This essay was generated by o3-mini after a series of prompts.