Monthly Archives: July 2013

A Pathfinding Primer

There’s a thick mist to cut through on the way to good game AI. A data structure, after all, means nothing without a host of algorithms, and the algorithms can’t be described without their data. Everything depends on knowing everything else.

A graph consists of nodes and edges; an edge takes you from one node to another. Usually by a node we really mean a state that our system can enter; maybe it’s a position a monster can be in (quite concrete) or maybe it’s the position of everything in the game (a little more abstract). The edges just describe the way we can move from one state to another; they’re decisions we can make. That’s why graphs matter so much in game AI. They’re important everywhere, but in games we want to make decisions, and graphs make that easy to talk about.

Much of the cleverness comes in in deciding what counts as a node and what counts as an edge. That’s up to you. Sometimes you want edges to go both ways and sometimes you don’t; maybe you can only jump down a cliff, or maybe you don’t allow backwards motion because time only runs one direction. Often it doesn’t even matter (for reasons we’ll see) since the question you’re asking carries with it a sense of direction. There are lots of transformations you can make without changing what your graph means. There are bipartite graphs, for instance, where there are two disjoint sets of nodes, and every edge takes you from one set to the other; for example, you might have a node for a person and a node for a club, and edges tell you who’s in which clubs. If it’s easier you can turn the edges in any graph into nodes and end up with a bipartite graph, or turn one set of nodes in a bipartite graph back into edges, and you can even swap edges and nodes. Or sometimes you want to take a big complicated graph and simplify it by merging lots of nodes into one, losing some detail. The point is that if you ask how it has to be, you’re going to get a lot of waffling. You can do whatever you want with graphs, but that doesn’t help until you know what you can do.

Your graph might not even exist. Oh, sure, in the mathematician’s sense it exists, but it might not exist anywhere in computer memory. It might not even be possible to compute the whole thing before the universe dies. Sometimes the nodes actually exist, but you’ll just work out the edges in code; on a grid based map, there’s really no reason to construct data representing the step from one cell to another. (If it helps, remember that data doesn’t mean anything without code to read it; so pretending that an edge has to be stored as a pointer or an instance of a class, is pretending that it somehow doesn’t take code to follow that edge. In fact, you should always start with the assumption that edges are code, and only do something special if that’s clearly not true. This turns out to be especially convenient if you can easily undo an edge, since certain really thorny problems will then succumb to brute force.) It’s ok for it to be infinite if you don’t need to look at the whole thing to answer your question, but I’ll leave that for the section on Dijkstra.

What You’re Searching For

You’re looking for a path. A path is a sequence of edges. Each edge leads away from the node where the last one left you, and the first edge sets out from the source and the last one leads into the destination, if there is one. The informal language lines up with the formal language here.

When you want to find a path, you’re not just looking for the destination. If you were, you wouldn’t need a pathfinder at all; the path has to tell you how to get there, but the destination is just a location, and usually you’re starting with a destination in mind. (But not always! You can pick a path even when you have no destination in mind, so you should be able to search for a path in that case, too.) A path is a sequence of steps you plan to take, which means it’s a sequence of edges. It will never intersect itself, so it’s actually just a set of edges; but though a set is simpler in math, sequences are usually easier in computing. But sometimes the intuition that it doesn’t have to be a sequence will come in useful.

How you represent the path in memory as you build it is up to you, too. You might keep a sequence of steps to take (that is, a literal sequence of edges), or you might mark cells on the map to indicate which step you’re supposed to take out of them, or maybe you’ll keep the sequence of cells the path visits and figure out which direction to go based on that. If you use Dijkstra maps, you might not even have that: instead, you just have a score on each cell, and you construct a path as you go by moving to the best neighboring score. You can think of random walks and Brownian motion as being sequences of edges represented by the random seed!

(This isn’t any different from other data structures. Practice on a few. If you read that a list is “a first element and the rest of the list,” it means that there’s a chunk of code you can run to get the first element, and a chunk of code to get the rest of the list, and maybe all that code does is look something up in memory but that’s an implementation detail. If the rest of the list is represented by code and some state, you get a lazy list instead. Try it with the definition of any data structure you’re familiar with.)

Your search space is the set of all reasonable possibilities — so in the case of paths, you’re not searching “on a map” or “in a dungeon.” You’re searching through the set of all connected sequences of edges. But the practical sense of searching for a path through the map is identical, so you don’t usually need to think about it until you set about hybridizing multiple algorithms.

The Edges of a Dungeon

What do the edges actually look like? Everything depends on game rules. There are no edges leading from a floor into a wall cell, for instance, or through a corner that’s too “narrow” for the player to pass through. But there might be edges leading across teleports, and there can even be edges that depend on the time component of the node (remember that there will be many nodes for each cell if you need to track more state, and that that state might include time).

In Cogs of Cronus, I’ve got ice, which is slippery. That means that in addition to location you have momentum, and momentum affects where you can move next and is determined by where you moved from. So, for ice cells at least, there are actually nine different nodes: one for each direction of motion. From any of the eight neighboring cells, stepping onto the ice will lead you (for pathfinding purposes, at least) onto the node that represents that direction of motion; if you are able to stop and you wait then you move to the ninth node, which represents stillness. A little bit of thought will tell you a lot about how your monsters can think about the dungeon.

Try thinking about how you would label each node if you had to drop them all in a bag and then tell them apart later. You might name it by an <x, y> pair, or you might want to have a time component too, <x, y, t>, or you might have something like <x, y, t, <potion a, potion b, potion c>>. Even if some possible labels would never get used, this can give you some hints to help you figure out what you need for your particular problem. You always want to put as little into the label as you can. If you don’t need to know how much time has passed, don’t label the nodes that way. If you find yourself staring at graph paper, confused, you might ask yourself what you’d have to do to keep a path from intersecting itself. Sometimes there’s a bit of state you forgot.

Suppose you’re doing some cooperative pathfinding. A few different graphs make sense, and you’ll use more than one. There’s the graph (call it D) that represents the position of a single entity in the dungeon; the edges are the ways it can move from cell to cell, or wait, or teleport, or whatever. But then you can think of another graph where (let’s take the two monster case for a moment) the label of each node is <monster A is at position a and monster B is at position b>, where positions a and b are actually nodes in the simpler graph. The edges of this cooperative graph, if you imagine the monsters as moving simultaneously, will correspond to all possible pairs of edges of D (one move for each monster). If you imagine that one moves and then the other moves, then you get twice as many nodes (“A is here, B is there, and it’s A’s turn,” “A is here, B is there, and it’s B’s turn.”) but you have far fewer edges. Either way this is a huge search space and it takes some good tricks to do it well.

Search Strategies

A breadth-first search proceeds outward in shells, like an onion, visiting the source first, then all the nodes one edge away, then all the nodes two edges away, and so on out. When roguelike devs talk about floodfill, this is what they mean. To perform a breadth-first search you have to keep track of all of the “open” nodes, which are the ones in the current shell, so you can come back to them and visit their children. It’s a good approach when (as in the floodfill case) you actually want to visit every node in the graph.

A depth-first search just sets out on a path and keeps going until it hits a dead-end; then it backs up one step and tries something else. Backtracking algorithms (like Knuth’s Dancing Links) perform depth-first searches of combinations of constraints. Depth-first needs to remember its path back, which means you either need a stack or you need to mark nodes with the direction to their parent. It’s still up to you what order you pick the children of any node in; making a good guess can save a lot of time.

A best-first search also, like a breadth-first search, keeps track of open nodes; instead of running through all the nodes in one “shell,” though, it repeatedly picks the “best,” or most promising, open node. What the best open node is depends entirely on the problem you’re solving, so best-first search is a huge category. The other strategies can both be treated as forms of best-first, since in the case of breadth-first you can say the best node is the one the fewest steps from the start, and in the case of depth-first you can say the best node is the one that’s the most steps away. But best-first searches really shine when you have a different measure, such as distance, hazard, or time cost. It’s not just that you find your destination faster with a best-first search. This really useful property emerges: The sequence of steps from the source to the destination, when you find it, will also be the best path (or one of several ties for best), because if there had been a better path you would have expanded it first! (You try the best first, so of course the first is the best.)

Practical pathfinding algorithms expand the best nodes first. It is faster, easier to prove that the result is optimal, and more versatile. But it is useful to see the varieties as being more akin to breadth-first or to depth-first.

Dijkstra is more like breadth-first; it checks all of the options closer to the source before proceeding further. A* is more like depth-first; it sends runners out towards the destination, and only defaults to a Dijkstra-like strategy when they crash into obstacles. A* requires more information about global geometry than Dijkstra, since it has to know which edges are closer to the destination in order to send runners out towards it (this knowledge is called the “heuristic,” since it’s a rule of thumb for judging how far is left to go). We also normally let Dijkstra keep going after finding the destination, or we don’t even have a destination in mind, in order to get costs to every cell on the map; A* we normally terminate as soon as it finds the destination, and treat the result as the best path.

But you can terminate Dijkstra after finding a destination, or (if there is more than one), after finding a certain number of them, and you can also use a heuristic in order to find that destination faster. Make both of those simple changes and you’ve turned Dijkstra into A*.

In any search you can terminate after visiting a certain number of nodes, or after going a certain maximum distance, although then you can’t gurantee that you will find a result even if one exists; it might happen to be further away than you’re willing to look. For an AI this is generally quite ok. You can get passable results by running shorter searches most of the time, and proceeding to longer searches only if you have to, creating the appearance of indecision.

With Dijkstra we often want to visit every node, so we have to simplify our graph as far as possible. We usually want one node per cell of the dungeon, and we want edges to correspond to steps we take, and we don’t want to worry about things like backtracking. With A*, because we zero in on a path that looks quite likely, we can travel a more elaborate graph, or at least keep more information about each node. The rule of thumb is that with Dijkstra, we get exactly one number per node to treat as state, and that’s the score that we were already using to direct the best-first search.

The score, which is the cost of the best path so far, is the one extra piece of state you can track in Dijkstra without allocating additional space. (Really, though, it’s just that you’ve already allocated extra space for it.) If the number is time, then you can use that very carefully to let edge costs change with time. Monsters can wait for doors to open, or even wait for a floating platform to arrive. If that number represents an accumulated “hazard,” the chance of taking enough damage to be killed, then perhaps we believe that a certain kind of risk no longer matters after a certain threshold is passed (imagine separate shields and health, where some damage only affects shields.)

Neither A* nor Dijkstra allows us to speak of any property of the path as a whole. They cannot, for instance, prefer paths that look the most like straight lines when they move through open rooms; that must be achieved by post-processing. But if you are willing to track some extra state you can certainly minimize twistiness, defined by the number of times you have to change direction, by the same trick I use for ice in Cronus: give yourself eight nodes per dungeon cell and charge extra whenever the next step is in a different direction than the last. You can track the properties of the path, such as whether it crosses a certain kind of cell, and even to take that into account in choosing the best step to take next — but only as a tiebreaker, as with lexicographic ordering.

Gradients

It is possible, in some simple cases, to find paths without doing any pathfinding at all, as such. When you have a gradient map of any kind, representing (perhaps) the distance from valuable items, or the most recent time that a cell was visible to the player, or when cells were last traversed by fleeing monsters, it is possible to find a local optimum by simply rolling down (or up, if that’s how you roll,) the map. Find the neighbor with the best score and move there by whatever edge connects you.

If this gradient has been generated by a Dijkstra search, then the result will in fact be an optimal path to the destination. Influence maps and scent maps also produce gradients. It is possible to perform arithmetic on them, cell by cell, adding or multiplying them together; consider that our computers are more than a thousand times faster than they were thirty years ago, when arithmetic on single numbers was cheap enough not to worry about, and conclude that arithmetic on thousands of cells must now be cheap, too.

You can pick a random cell in a gradient map, if you treat each cell as a score, by picking a random number between one and the sum of all cells, then looping over them and subtracting them from that number until it falls below zero.

You can treat any gradient map as a cost map for each step in another pathfinding scan. You can also treat one as the initial state when pathfinding; the number in each cell tells you how much it costs to start there. (Brogue makes extensive use of this.) Making a Dijkstra scan, manipulating the resulting gradient, and then running another scan on the result, is a potent technique.