Week 12: Informed and Adversarial Search

Readings

Key Questions

  • What makes "informed search" informed?
  • What are some ways we can use heuristic information to guide search?
  • How can we come up with a heuristic for a new sort of search problem?
  • How do we compare two heuristics to tell which is better?
  • Is there such a thing as a general-purpose heuristic for search problems?
    • If there is, how does it depend on the way the problem is formulated?
  • What could happen if the heuristic were an over-estimate of the true cost?
  • What are some of the different relative strengths of humans and computers in game-playing?
  • What does it mean for a strategy to be optimal?
  • How do I evaluate minimax on a game tree?
  • What happens to minimax if the opponent does not play optimally?
  • What is a board evaluation function?
  • How is a board evaluation function similar to and different from a heuristic?

Topics

Informed Search

Every specific problem carries some structuring information.

  • In N-Queens, we could make a more compact encoding that reduced symmetries
  • In path planning we might reject any spot we've already visited since no path with a loop is going to be optimal
  • In the Sudoku solver, we used more constrained cells before less constrained cells
  • In path planning we might prefer trying spots that are closer to the goal before trying ones that are further away

In these last two cases, we aren't reducing the state space of the problem at all, but we are prioritizing those parts of the state space which seem likely to lead to solutions faster. Whereas in the first two cases we are making the state space simpler, in the last two cases we are making best-effort guesses about which parts of the space to explore first. We call this latter type of guess a heuristic (a "rule of thumb"). We'll discuss heuristics in more detail later.

Informed search uses information about the problem to guide search. There are many informed search algorithms; we'll only discuss best-first search and heuristic-guided depth-first search, and later outline the A* algorithm.

Best-first search

Best-first search uses a given heuristic to prioritize which node to visit next. You can imagine that it's similar to breadth-first search, only instead of expanding its search frontier in increasing layer order, it goes in increasing order of heuristic value (lower is better).

def best_first_search_iterative(start: SearchState) -> SearchState:
    pqueue = [start]
    while pqueue:
        # Removes the last element (the cheapest; see the sort below)
        first = pqueue.pop()
        if first.is_goal():
            return first
        for next_s in first.next_states():
            pqueue.append(next_s)
        # Sort so the cheapest states end up last, where pop() finds them
        pqueue.sort(key=lambda s: s.heuristic(), reverse=True)
    return None

You can see for yourself that this follows the iterative search template! The only difference is that it does some sorting of the queue. We can avoid re-sorting the entire queue if we use a specialized structure called a PriorityQueue (then we just use pop() and append() and the code is exactly like depth-first or breadth-first search), but we'll leave that for a future CS course.
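
If you're curious anyway, Python's built-in heapq module can play the role of that priority queue. Here is a minimal sketch, assuming the same SearchState interface (heuristic(), is_goal(), next_states()) as above; the counter is just a tie-breaker so the heap never has to compare two states directly:

import heapq

def best_first_search_heap(start):
    counter = 0  # tie-breaker for states with equal heuristic values
    frontier = [(start.heuristic(), counter, start)]
    while frontier:
        # heappop always returns the entry with the lowest heuristic
        _, _, first = heapq.heappop(frontier)
        if first.is_goal():
            return first
        for next_s in first.next_states():
            counter += 1
            heapq.heappush(frontier, (next_s.heuristic(), counter, next_s))
    return None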

Best-first search has the flaw that it's easily misled by a heuristic that's too optimistic. Challenge problem: Thinking of finding a path through a maze, can you come up with a case where "prefer the spot closest to the goal" will make best-first search perform poorly? Can you outline a tweak to best-first search that will help it perform better in cases like that?

Heuristic-guided depth-first search

Assignment 10 used a simpler mechanism for guiding search with a heuristic, taking advantage of the structure of Sudoku problems: instead of considering every empty square, just pick a square with the fewest possible candidate values. This is a special case of the more general "most constrained variable" heuristic for search and constraint satisfaction problems. The following code works in much the same way, except it doesn't refuse to consider other spots: it tries the most promising successors first but will still fall back on the rest. In problems like Sudoku it doesn't really matter what order squares are filled in, but in other problems the order might be important.

def guided_dfs(state):
    if is_goal_state(state):
        return True
    nexts = successor_states(state)
    # Try the most promising successors (lowest heuristic value) first
    nexts.sort(key=lambda s: s.heuristic())
    for next_state in nexts:
        if guided_dfs(next_state):
            return True
    return False

Challenge problem: Come up with a search problem where guided_dfs and best_first_search will traverse nodes in a different order.

A*

A* is a popular search algorithm which combines the best aspects of best-first search (taking promising routes earlier) and Dijkstra's algorithm (which is like our best-first outline above, but sorts states based on how "far away" they are from the starting point). It is especially well-suited to path planning, but works well for any search problem where actions have costs and the problem is to find a lowest-cost plan.

The key idea of A* is that instead of sorting by the heuristic (as in best first) or by the distance from the start (as in Dijkstra's), we should sort states by the sum of the heuristic and the distance so far. This strikes a balance between preferring paths we know are short and preferring paths which seem likely to reach the goal soon.
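
To make that concrete, here is a minimal A* sketch, not a definitive implementation: it assumes states are hashable and that next_states_with_costs() is a hypothetical variant of next_states() that also reports each step's cost:

import heapq

def a_star(start):
    counter = 0  # tie-breaker, as in the best-first sketch above
    # Each entry is (g + h, tie-breaker, g, state): g is the cost paid
    # so far, h is the heuristic's estimate of the cost remaining.
    frontier = [(start.heuristic(), counter, 0, start)]
    best_cost = {start: 0}  # cheapest known cost to reach each state
    while frontier:
        _, _, g, current = heapq.heappop(frontier)
        if current.is_goal():
            return current
        for next_s, step_cost in current.next_states_with_costs():
            new_g = g + step_cost
            if next_s not in best_cost or new_g < best_cost[next_s]:
                best_cost[next_s] = new_g
                counter += 1
                heapq.heappush(
                    frontier,
                    (new_g + next_s.heuristic(), counter, new_g, next_s))
    return None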

Heuristics

We've read about several heuristics so far—straight-line distance to the goal in a maze problem, using the number of valid values for a variable as the heuristic—but we don't have much practice coming up with heuristics.

One useful way to come up with heuristics is to imagine a simpler version of the problem and solve that. For example, in the eight-puzzle example we looked at a simpler problem where tiles could just be swapped arbitrarily, and in path planning we can imagine we can walk through walls. It's ideal if the heuristic we come up with under-estimates the true cost, but not by too much: we want it to be optimistic, but not overly so (and we consider heuristics that get closer to the true cost to be "better"). We considered earlier what can happen if the heuristic is too optimistic, but it's also a problem if the heuristic is too pessimistic.
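
For instance, the "walk through walls" relaxation of grid path planning gives exactly the Manhattan distance, sketched below assuming positions are (row, col) tuples:

def manhattan_distance(pos, goal):
    # The true cost of reaching the goal if we could walk through walls,
    # moving one grid square at a time; real walls can only make the path
    # longer, so this never over-estimates the true cost.
    (row, col), (goal_row, goal_col) = pos, goal
    return abs(row - goal_row) + abs(col - goal_col)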

Challenge problem: Imagine a search problem (for example, maze navigation) where the heuristic is too pessimistic, for example one that never reports less than "3 steps from the goal" no matter how close a space is to the goal (so for spaces three or more steps away it's perfectly correct, but for closer spaces it's an over-estimate). Can you design a maze where following this wrong heuristic with best-first search will lead to a worse path than using depth-first search?

The "most constrained variable" heuristic is especially interesting for constraint satisfaction problems (where we're trying to assign values to variables, e.g. columns to queens) since it seems to be pretty good regardless of the specific problem. If we can represent a search problem as one of giving values to variables, we can unlock a variety of relatively general-purpose heuristics like this. Can you think of any others?

Adversarial Search

Adversarial search, especially game-playing, has long been an area of interest in computer science. Computers have raw computational power and the ability to play out tons of possible futures, whereas people are reputed to have superior pattern matching abilities and can better abstract over similar (though superficially different) situations. Over time, the balance of power seems to be shifting towards computers for well-defined, constrained, and deterministic games like Chess and Go.

The basic idea of computer game playing is for a player to assume a rational opponent and select an optimal strategy. This approach comes from game theory and is easy to compute in some sense: starting from all the possible moves I might make, consider all the moves my opponent might make, and then each move I might make in response to that, and so on; eventually every imagined playout will come to an end and I should prefer the move now that gives me the best possible outcome in the end, assuming my opponent plays the best that they can.

This notion is the foundation of minimax, an essential innovation in computer game playing. The maximizing player (myself) will take the move giving me the best utility or final score, whereas the minimizing player (my opponent) will take the move forcing my utility to be as low as possible. If we can calculate out all possible plays, we can propagate win/loss information upwards in this way (on my turn taking the best option for me, on their turn taking the worst option from my perspective) to solve the game. This can tell us not only what move to pick, but also whether the game is unfair to begin with (e.g., can the first player always force a win or draw?).
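
As a sketch of the idea (not a full game player), minimax over a complete game tree might look like the following, assuming a game-state interface with is_terminal(), utility() (the final score from the maximizing player's perspective), and next_states(); to actually pick a move, we would take the root successor with the best value:

def minimax(state, maximizing):
    if state.is_terminal():
        # The game is over; report its final score for the maximizer
        return state.utility()
    scores = [minimax(next_s, not maximizing)
              for next_s in state.next_states()]
    # On my turn take the option best for me; on my opponent's turn,
    # assume they take the option worst for me.
    return max(scores) if maximizing else min(scores)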

When I use minimax I am playing optimally: I'll never take a move that leaves me worse off than some other move I might have taken instead, assuming my opponent is also playing optimally. This definition of optimality refers to minimizing the risk of losing: Since I assume the worst possible response from my opponent (from my perspective; in other words I assume they'll always make their best move), I know that if they pick a less optimal move I won't be any worse off. You can try this for yourself with the "baby Nim" example in the adversarial search slides.

Challenge question: Imagine a game where I don't just care about winning or losing, but by how much. Is there a situation where making the risk-minimizing choice causes me to leave some winnings on the table?

In general, games are too long and complex to allow for running a full minimax search every turn. Instead, we might opt to play out a certain number of turns in every direction, and then make an educated guess about whether each resulting board is likely to lead to a win or not. We can use a state evaluation function or board evaluation function, much like a heuristic, to suggest whether each leaf game state is indeed promising. Then we can run minimax as usual from those leaves up towards our current state.
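
One way to sketch that cutoff, assuming the same interface as the minimax sketch above plus a hypothetical evaluate() method standing in for the board evaluation function:

def minimax_limited(state, maximizing, depth):
    if state.is_terminal():
        return state.utility()
    if depth == 0:
        # Out of lookahead: make an educated guess about how promising
        # this board is for the maximizing player.
        return state.evaluate()
    scores = [minimax_limited(next_s, not maximizing, depth - 1)
              for next_s in state.next_states()]
    return max(scores) if maximizing else min(scores)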

Author: Joseph C. Osborn

Created: 2020-04-20 Mon 12:14
