Whitespace Reasoning

Our agents need to make sense of a complicated world. They need to outwit the player every so often, and they need to make convincingly organic mistakes, not the systematic outcomes of obvious clockwork. There are simpler problems to deal with, too: they need to maneuver and plan in a world that changes, not only with time within a playthrough, but over the scale of development. You can’t afford to hold the hand of every monster you’ve ever added, every time you add another.

(A word about the brain, by way of metaphor. We think the purpose of the brain is to deal with the world, but its first purpose is to deal with the body. It doesn’t know, until it learns, how tall you are, how many colors you can distinguish, or even how many limbs you actually grew. Its job is to cope with whatever the body throws at it. If the brain had to change every time the body changed, every mutation would be abortive and every amputation, mortal. Equate the body with the game world, not with the monster’s body; good AI lets you change the game world without worrying about breaking carefully crafted monster behaviors. It’s about looser coupling.)

I’ll start by discussing the general principle of objectification (how you pretend that your beliefs about the world are actually out in the world, and how that idea can simplify monster AI), the more specific principle of whitespace reasoning (which is the point of bringing up objectification at all), and then start to discuss Dijkstra scans from the very particular perspective of the roguelike developer. In later posts, I’ll develop these themes more completely; this is something of a cross-section.


It is profitable for an actor not to think about what it can do. It should project its capacities onto the world around it, asking what the world can do for it. This is a very practical view, but it seems strange because we use it automatically, without understanding it. An everyday object like a hammer seems to be for driving nails and, if it has a claw, for pulling them; this is the only property of a hammer that sets it apart from other objects of similar heft. But this isn’t an ability of the hammer at all — you’re projecting your own ability (that is, to swing) onto it. Your actors need to objectify the world, and each other, and themselves.

Because video game actors spend most of their time moving around, it makes sense for them to objectify the space around them. You can see this as offloading thought from the actor onto the space around the actor — whitespace reasoning. Instead of imagining that your actors think for themselves, pretend that the space they’re standing in does the thinking for them. The floor knows there’s food in the fridge; the road knows there’s work at the factory; the grass knows there’s a wolf nearby. You can think in terms of ants and their pheromones if it helps you, but I suggest that you not think about it in those terms too long. Instead, think of places that feel strange even when they’re not serving their usual function. Consider how you feel standing in the middle of an empty street, or when you’ve wandered into the wrong neighborhood, or when you’re stuck in the woods. I feel queasy cutting across a queue, during the moment I intersect it geometrically.

You want to imagine that your actors feel this way.

You’ll most often see this done by analogy with ants or hounds, though. In a scent map everything interesting leaves a smell, and smells diffuse and fade with time. Most models you’ll see go in for a very literal interpretation based on physical diffusion, which is curious, and an example of taking the analogy too far. They also tend to stop right here, because the analogy doesn’t get you any further. Influence maps are scent maps by a different name, and I haven’t seen them used in any more sophisticated way. They’re often worse, using an exponential falloff function for which there is simply no justification, but which makes reasoning about the map much more difficult.

Neither of these, of course, does more than compute distance, with a few wrinkles that are hard to formalize. Both, for instance, propagate signals slowly: a more distant change takes longer to affect local values. (The reduction of jitter caused by slower propagation can justly be adduced in its favor, but jitter is likely to reemerge in unusual situations; it is better dealt with directly.) Both perform poorly, but as they are generally used sparingly, this is no great loss. We’ll be using a more general formulation that is faster, easier to integrate with other techniques, and much easier to reason about formally.
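To make that formulation concrete, here is a minimal multi-source Dijkstra scan in Python. The names (`dijkstra_scan`, `walkable`, `sources`) are mine, and 8-way, unit-cost movement is just one reasonable choice of metric:

```python
import heapq

def dijkstra_scan(walkable, sources):
    """Multi-source Dijkstra scan over a grid.

    walkable: set of (x, y) cells that can be entered.
    sources: dict mapping seed cells to their starting values.
    Returns a dict mapping every reachable cell to the cheapest value
    obtainable from any seed (8-way moves, unit cost per step).
    """
    best = dict(sources)
    frontier = [(v, c) for c, v in sources.items()]
    heapq.heapify(frontier)
    while frontier:
        value, (x, y) = heapq.heappop(frontier)
        if value > best[(x, y)]:
            continue  # stale entry, superseded by a cheaper one
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                step = (x + dx, y + dy)
                if step == (x, y) or step not in walkable:
                    continue
                if value + 1 < best.get(step, float("inf")):
                    best[step] = value + 1
                    heapq.heappush(frontier, (value + 1, step))
    return best
```

A monster then paths by rolling downhill: from its cell, it steps to whichever neighbor has the lowest value. No monster ever looks further than one cell away.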

Enter Dijkstra

If you haven’t read it, read Brian Walker’s The Incredible Power of Dijkstra Maps. Brogue shows just how exciting a sprinkling of pathfinding techniques can be. The algorithm given in the article performs suboptimally, but it hardly even matters. It’s amazing what you can get away with, and it’s worth it even if you can’t put together the most efficient scan. You can do more with distance maps than immediately presents itself.

Brogue’s safety maps, which are used when monkeys run away from the player, are calculated by scanning the distance from the player, multiplying by a constant smaller than -1, and scanning again. This is a totally different world from anything done with influence maps or scent maps. The first scan detects cells that are most distant from the player; the negation turns those cells into goals, and the constant makes the monster willing to dash past the player to get further away. Several nonlocal effects get rolled into one map, and the monsters exhibit global thinking, all without looking further than neighboring cells. Space itself has done the thinking for them.
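A sketch of that recipe might look like this. The helper names are mine, and the exact -1.2 constant is illustrative; Brogue’s real implementation differs in detail:

```python
import heapq

DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]

def scan(walkable, seeds):
    """Relax seed values over the grid: each cell ends up at
    min(its own seed, cheapest neighbor + 1)."""
    best = dict(seeds)
    frontier = [(v, c) for c, v in seeds.items()]
    heapq.heapify(frontier)
    while frontier:
        value, (x, y) = heapq.heappop(frontier)
        if value > best[(x, y)]:
            continue  # stale entry
        for dx, dy in DIRS:
            step = (x + dx, y + dy)
            if step in walkable and value + 1 < best.get(step, float("inf")):
                best[step] = value + 1
                heapq.heappush(frontier, (value + 1, step))
    return best

def safety_map(walkable, player):
    from_player = scan(walkable, {player: 0})              # distance to the player
    goals = {c: v * -1.2 for c, v in from_player.items()}  # distant cells become goals
    return scan(walkable, goals)                           # rescan so monsters can path toward them
```

A fleeing monster steps to the neighbor with the lowest safety value. The factor of -1.2 is what makes “getting far away” worth more than the steps it costs, so a cornered monster will dash past the player rather than freeze.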

In Brian’s fourth case (“Desire Driven AI”), you’ll want to make sure to run another scan after adding the desire maps together, and then add some multiple of the original sum back in. Why? You don’t want to get stuck in a local rut, but you really would like to stop by water on your way to a more important goal. These Dijkstra maps treat every cell as a potential source; the scan doesn’t take a single source and minimize its cost, nor even a list of sources. Instead it lets you give every cell an effective cap. If you can’t get there cheaper from somewhere else, then its own influence kicks in. This is absolutely invaluable for safety maps and similar global reasoning.
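These cap semantics are easy to demonstrate in miniature. The iterative relaxation below (one simple way to implement such a scan; the names are mine) gives every cell an intrinsic value, which survives only where no cheaper route undercuts it:

```python
def relax(walkable, caps):
    """Dijkstra scan in which every cell is a potential source: a cell
    keeps its own cap unless some neighbor offers a cheaper route.
    Implemented as repeated sweeps to a fixed point for clarity."""
    best = dict(caps)
    changed = True
    while changed:
        changed = False
        for (x, y) in walkable:
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    n = (x + dx, y + dy)
                    if n != (x, y) and n in best and best[n] + 1 < best[(x, y)]:
                        best[(x, y)] = best[n] + 1
                        changed = True
    return best
```

On a corridor where one cell is capped at 0 and the rest at 3, the scan climbs away from the cheap cell until each cell’s own cap kicks in, exactly as described above.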

What’s in a node?

We needn’t assume, even in a grid-based game, that one grid cell is one node. It’s certainly possible to reduce the number of nodes we need to look at; you can set one node at every doorway, for instance, and assume it’s easy to path across rooms. Each node, in that case, would represent more than one grid cell, speeding up pathfinding considerably. But that kind of optimization works against us, here; we’re looking for new interpretations of space, and that trimming would lock us into one model.

More than one node can belong to a single grid cell. This is much more useful: it lets us track state other than position. We don’t want to track any kind of global state this way, because it results in an exponential explosion of nodes, but we can track local state very easily. So keeping track of which doors have been opened is probably a bad idea, but if doors close automatically after two turns it’s fine (because we don’t need separate nodes for that state when we’re further than two steps from the door). If there’s slippery ground, we can have nine nodes for one cell: one for each of the eight directions of motion, plus one for standing still. Stepping from a neighbor onto such a cell actually brings us to the node that includes the proper direction.

For every two sets of state we want to track, we need to track their direct product. If we need to track whether a door is open and which way we’re sliding on ice, we need 2*9=18 nodes. If we need to track eight doors and sliding, we need 256*9=2304 nodes, all for one cell. We obviously need to control this carefully, but the general technique is too powerful to give up entirely; it’s also quite inexpensive, used carefully.
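As a sketch of the slippery-ground case: a node is a (cell, momentum) pair, and the successor function is where the ice rules live. The representation is hypothetical (assuming momentum carries you one cell per step until you leave the ice), not taken from any particular game:

```python
# Node = (cell, momentum). On ice, the direction you arrived with matters,
# so each icy cell expands into nine nodes: eight directions plus standing still.
DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]

def neighbors(node, walkable, icy):
    (x, y), momentum = node
    if momentum != (0, 0):
        # Sliding: the only successor is one more cell in the same direction.
        mx, my = momentum
        nxt = (x + mx, y + my)
        if nxt in walkable:
            return [(nxt, momentum if nxt in icy else (0, 0))]
        return [((x, y), (0, 0))]  # slid into a wall: come to a stop here
    out = []
    for dx, dy in DIRS:
        nxt = (x + dx, y + dy)
        if nxt in walkable:
            # Stepping onto ice lands you on the node carrying your direction.
            out.append((nxt, (dx, dy) if nxt in icy else (0, 0)))
    return out
```

Any scan or search that works on plain cells works unchanged on these richer nodes; only the neighbor function knows about the ice.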

Dijkstra and A*

The explosion of nodes representing various states is the major reason Dijkstra ends up falling flat for us: every node has to actually exist, and we expect to visit all of them. When that’s just one node per cell on an 80×25 map, that’s 2000 nodes; smaller than a couple lines of a high-resolution bitmap. When it’s sixteen nodes per cell on a 400×300 map, we hit 1.92 million nodes, about a thousand times bigger and a thousand times slower. Or keep the map small, still just 80×25, but give the player 26 potions to drink: each turn, the player can either move or drink one potion. We end up needing at least 2^26 × 2000 nodes, because at any given time each potion will have been used or not; in practice we need even more, so we can track the state of the effects caused by drinking potions. If each node takes one byte (which is unreasonably compact), that’s 125 gigabytes. Good luck.

Now, one way to simplify things is to construct nodes only when we need them; until your search algorithm tries drinking the Potion of Levitation, don’t actually allocate cells for that case, and then only allocate them on the nodes you visit. But we have to add a cutoff — either limit the number of nodes we expand, or limit the maximum distance we’ll search. Neither is entirely desirable, particularly since we’ll still only explore the immediate neighborhood. If the puzzle is remotely difficult, and the potion-drinking searcher needs to make it across the map, we’re pretty much certain not to find an answer.

A* and Dijkstra solve the same basic problem, but A* relies on a few extra hints in order to avoid visiting every cell. (This isn’t a huge gain for us in the 80×25 case; whitespace reasoning is all about visiting every cell, and it’s really quite cheap to do so, as long as we avoid an explosion of states. We do need A* for some very unusual problems, and we’ll come to those in a moment.) The first hint is a destination; we have a particular node (or set of nodes) that will short-circuit our search, if we happen to bump into them. (We can add a destination to Dijkstra, too; the result is halfway to A*.) The second hint has to do with the order in which we search. Instead of looking all around our starting point, we take advantage of geometry to start out walking blindly towards the destination, based on an admissible heuristic, which is a guess as to how much further we need to go before we get to the destination. It’s not actually a guess: it’s got to be the absolute best case. On a simple grid-based map, it’s the number of steps we’d have to take if there were no walls anywhere.

If the heuristic is always exactly right, then the search will always go straight to the destination, giving the best possible path as fast as possible. The more the heuristic underestimates the real cost, the longer it will take us to find the optimal path, but we will eventually find it. If the heuristic ever, even once, overestimates the real cost, it becomes possible that we will never find an optimal path. A heuristic that can produce an overestimate is inadmissible.
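A minimal grid A* with a Chebyshev-distance heuristic might look like this (the names are mine). For 8-way, unit-cost movement, Chebyshev distance is exactly the step count with no walls, so it is admissible:

```python
import heapq

def astar(walkable, start, goal):
    """A* over an 8-way, unit-cost grid. Returns the optimal path length,
    or None if the goal is unreachable."""
    def h(c):
        # Chebyshev distance: the best case, as if there were no walls.
        return max(abs(c[0] - goal[0]), abs(c[1] - goal[1]))
    best = {start: 0}
    frontier = [(h(start), start)]
    while frontier:
        _, (x, y) = heapq.heappop(frontier)
        if (x, y) == goal:
            return best[(x, y)]
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                step = (x + dx, y + dy)
                if step == (x, y) or step not in walkable:
                    continue
                g = best[(x, y)] + 1
                if g < best.get(step, float("inf")):
                    best[step] = g
                    heapq.heappush(frontier, (g + h(step), step))
    return None
```

On open ground the search runs straight at the goal; only when it hits a wall does it spread out the way Dijkstra would.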

On a map with teleports we cannot, in general, use any better heuristic than the distance to the nearest teleport (plus the distance from the destination to its nearest teleport). If the map is totally non-Euclidean and nodes are connected arbitrarily, as they might be in interactive fiction or on a MUD, then all bets are off. We simply cannot generate a valid heuristic; there’s no grid to overlay on the nodes anymore.

On the other hand, we’re allowed to track extra state with our A* nodes. We actually can attempt to solve a puzzle that involves drinking potions in a certain order. (Risto Saarelma wrote an AI for Vector Racer, in which players manually adjust the x and y components of a car’s velocity to navigate a track drawn on graph paper; it involved an A* search over the quartet <x, y, xv, yv>. This kind of thing is valid and really quite practical.) The only problem is that our heuristic is going to be terrible.

Dijkstra to the rescue. We can actually perform a Dijkstra scan under better-than-possible conditions and use the result as our heuristic — it’s an absolutely glorious possibility. It means that while our A* search is making the hard decisions, it doesn’t have to deal with things it’s bad at, like navigating a maze. Dijkstra, which is great at navigating mazes, has already handled that part. If the A* just follows the result of Dijkstra, whenever it’s possible, it’ll get to its destination. All that A* needs to handle, in this case, is the extra state that Dijkstra couldn’t handle. It’s a perfect pairing; the weakness of one plays off the other.
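Here is a sketch of the pairing, with a locked door standing in for the extra state. The Dijkstra scan is run under better-than-possible conditions (it pretends the door is already open), and A* handles the one thing the scan couldn’t: whether we’ve picked up the key. The scenario and names are invented for illustration:

```python
import heapq

DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def scan(walkable, goal):
    """Plain Dijkstra scan: distance from every cell to the goal,
    computed as if the locked door were already open."""
    dist = {goal: 0}
    frontier = [(0, goal)]
    while frontier:
        d, (x, y) = heapq.heappop(frontier)
        if d > dist[(x, y)]:
            continue
        for dx, dy in DIRS:
            step = (x + dx, y + dy)
            if step in walkable and d + 1 < dist.get(step, float("inf")):
                dist[step] = d + 1
                heapq.heappush(frontier, (d + 1, step))
    return dist

def astar_with_key(walkable, start, goal, door, key):
    h = scan(walkable, goal)  # the relaxed problem ignores the door entirely
    start_state = (start, False)
    best = {start_state: 0}
    frontier = [(h.get(start, 0), start_state)]
    while frontier:
        _, state = heapq.heappop(frontier)
        (x, y), has_key = state
        if (x, y) == goal:
            return best[state]
        for dx, dy in DIRS:
            cell = (x + dx, y + dy)
            if cell not in walkable:
                continue
            if cell == door and not has_key:
                continue  # the extra state A* has to handle itself
            nxt = (cell, has_key or cell == key)
            g = best[state] + 1
            if g < best.get(nxt, float("inf")):
                best[nxt] = g
                heapq.heappush(frontier, (g + h[cell], nxt))
    return None
```

The scan can only underestimate the true cost (opening the door for free never makes a path longer), so the heuristic stays admissible, and all the maze-navigating is effectively prepaid.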

Take group coordination. It’s easy to do approximate group coordination with Dijkstra alone; monsters can path towards their target, try to stay a certain distance from each other, or follow a leader just fine. But they’ll bump into each other, and you need to add game rules, like displacement, to resolve it. To do a Dijkstra scan for two monsters, over a map with N nodes, you now need N*(N-1) nodes, because each monster can be anywhere. Even if you force them to stay within five steps of each other (the implementation of which is an interesting challenge), it doesn’t help matters much, and you risk missing an otherwise valid solution. But A* can handle the two-monster case just fine, and many more monsters, too, as long as it has useful Dijkstra scans to help it find its way.
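A toy version of the two-monster case, with per-monster Dijkstra scans feeding the joint A* its heuristic (again, invented names and a deliberately tiny map). One monster moves per turn, and they may never share a cell, so each of them ducking aside for the other falls out of the search naturally:

```python
import heapq

DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def scan(walkable, goal):
    """Distance from every cell to a goal, ignoring the other monster."""
    dist = {goal: 0}
    frontier = [(0, goal)]
    while frontier:
        d, (x, y) = heapq.heappop(frontier)
        if d > dist[(x, y)]:
            continue
        for dx, dy in DIRS:
            step = (x + dx, y + dy)
            if step in walkable and d + 1 < dist.get(step, float("inf")):
                dist[step] = d + 1
                heapq.heappush(frontier, (d + 1, step))
    return dist

def pair_astar(walkable, starts, goals):
    """Joint A* over states (posA, posB). The summed per-monster scans are
    an admissible heuristic: each move advances one monster one step, so
    the sum never overestimates the moves remaining."""
    maps = [scan(walkable, g) for g in goals]
    def h(state):
        return maps[0][state[0]] + maps[1][state[1]]
    start = tuple(starts)
    best = {start: 0}
    frontier = [(h(start), start)]
    while frontier:
        _, state = heapq.heappop(frontier)
        if state == tuple(goals):
            return best[state]
        for who in (0, 1):
            (x, y) = state[who]
            other = state[1 - who]
            for dx, dy in DIRS:
                cell = (x + dx, y + dy)
                if cell not in walkable or cell == other:
                    continue  # no two monsters on one cell
                nxt = (cell, other) if who == 0 else (other, cell)
                g = best[state] + 1
                if g < best.get(nxt, float("inf")):
                    best[nxt] = g
                    heapq.heappush(frontier, (g + h(nxt), nxt))
    return None
```

On a one-cell-wide corridor with a single alcove, the search discovers on its own that one monster must step into the alcove to let the other pass.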

(Other refinements to A*, like Jump Point Search, amount to a simplification of the graph the search is being performed on; they’re really only useful when certain assumptions about symmetry hold, and we’re not willing to make those assumptions. Instead, we’ll go on designing around the more general cases.)

Future Topics

In a later post, I’ll be discussing the design and implementation of the PDS (Progressive Dijkstra Scan) suite and the novel possibilities that these techniques open up. Maps no longer have to be static; monsters can understand levers, wait for each other to finish walking through narrow tunnels, and make sensible decisions about their inventories. More important than anything, though, the bad decisions they still inevitably make will make sense to players, and the bugs (for there will be bugs) will be hilarious.

2 thoughts on “Whitespace Reasoning”

  1. TEttinger

    So can PDS techniques be used to ensure monsters with “combo attacks” (or similar abilities that work better when more than one monster uses them in a span of time) time their usage and align themselves correctly? It seems like a similar problem to tunnel movement, but would it involve an A* search as well?

    1. Joshua Post author

      Planning attacks well is hard because you get into traditional game theory, and you have to keep track of probabilities and that sort of thing. Getting mobs to play perfectly would be pretty hard.

      But to get them to make sensible decisions, yeah, some of this can help. If you want one mob to stand three cells from the player the same turn another mob is next to the player, which will enable them to use their attacks at the right times, then it should be pretty easy to pick a few scans that’ll let them do that. (I have archers in the dev version of Rook that path to the nearest point from which they can shoot you, sort of along those lines but without the cooperative aspect.)

