How to solve Solitaire

Abstract:

This document is about writing software to solve the game "Solitaire", in which the object is to clear a board of pegs/counters so that only one remains. If you were interested in finding solutions to the card game, I'm afraid you've come to the wrong place. If all you're looking for is the solution to the game, you can find it here.

There exist several variations on the game of solitaire. My code assumes that it is required that the last piece remaining must occupy the position that was originally empty. So if you started with the middle empty (as is traditional), you must end with the last piece in the middle square.

The software is written in C (will work on most1 compilers), and is based on a recursive descent strategy. On its own, such a method is usually too computationally expensive to be viable; this document focuses on the techniques used to improve the performance of this algorithm. The modifications are listed in the order I thought of them.

This document and the program supplied with it are © 2001 William Benfold. You are given permission to use and distribute them, provided they remain intact. Feel free to comment, criticise, or suggest further enhancements to either.

Contents:


0. Initial design

Initially, I decided to use a 7x7 array of int values to model the board, where a zero would indicate an empty position, and a nonzero value would indicate a filled position.

The recursive function for solving the game would operate like this:


	int solve
	{
		if(game won)
			return TRUE;

		for each move m
		{
			perform m on the board;

			if(solve())
				return TRUE;

			undo move m;
		}

		return FALSE;
	}

		
This led naturally to the idea that a move might be encapsulated in a structure of some sort, so I defined a struct to represent a move, consisting simply of three integers, one to represent each square involved in the move (start square, "jumped over" square, end square). These integers ranged from 0 to 48 (0 to 72-1); each different value representing a single square.

I wrote routines to apply a move to the board, to un-apply a move, and to describe a move (both for debugging, and in order to dump the solution at the end.

To check for the game being "finished", I added a parameter to solve(): an int indicating the number of the current move (eg. first move is 0, last is 31). This move number would be incremented after performing a move, and decremented after undoing it, so it would effectively keep track of how deep the function had recursed. If solve() was entered with a parameter of 32, then would be only a single piece left on the board. Then an appropriate check could be applied to the state of the board (depending on which variation of the rules is being played).

To display the solution, I modified the code so that if solve() returned true, the move m would be printed out before the function returned. Note that this would cause the moves to be listed in reverse order, but this is not a problem, as we can either follow the list backwards, or write some code to automatically reverse the list.


1. Caching lists of moves

An expensive part of the above algorithm is the benign-sounding "for each move m". Finding each possible move is messy, and likely to be slow. This can be improved by precalculating a list of moves. We can use an ugly algorithm for finding moves, eg.

for each square s on the board
    for each of the four cardinal directions
        if(s and the next two squares in the current direction are all valid squares)
            if(we haven't had this move already)
                add it to the list
		
We can do this once to construct a list of all moves, and then just run through this list when we want to search for moves. Obviously, when we go through the list we will have to check that each move can actually be performed with the current arrangement of pieces.


2. Symmetry

We may be able to cut some corners if we consider symmetry. For instance, if the board has horizontal symmetry, we will consider (2,0)->(2,2), but we needn't look at (4,0)->(4,2), as this is merely the mirror image.

Again, we can precalculate lists of symmetrical moves to save repeating calculations. We produce a list of all moves, and a smaller list of the moves necessary for each symmetry.

SymmetriesMoves to consider
None76
Horizontal43
Vertical43
Both38

Note that it is necessary to check the board for symmetry before applying one of the smaller lists. For this reason, considering symmetry can easily increase running time rather than decreasing it, and it would be wise to restrict symmetry checking to the earlier stages in the game, where it will be more useful. However, even used this way, symmetry checking becomes obsolete when a caching system (see below) is introduced.


3. Packing the board into one number

With the present model of the board, we have to make three comparisons to test whether a move is possible, and we need to modify three values to apply a move.

This burden can be reduced by representing the board in a single number. How do we do this? There are 33 squares on the board, each of which may or may not be filled. So the state of a single square can be stored as a single bit (eg. 1=filled, 0=empty), and thus the state of the whole board can be stored in 33 bits. This is slightly irritating, as the normal size for an integer value in a modern C compiler is 32 bits (since many machines have 32 bit registers); this leaves us one bit short. We could attempt to handle this extra bit separately, but this could become very complex. Instead, we make use of a new data type present in many C compilers, the 64 bit long long integer type.

We then arbitrarily decide on a mapping from squares on the board to bits in this number. The method I used was as follows:

	BOARD_TYPE coordToBit(int x, int y)
	{
		int pos;
		switch(y)
		{
			case 0:		pos = x-2;
						break;
			case 1:		pos = x+1;
						break;
			case 5:		pos = x+25;
						break;
			case 6:		pos = x+28;
						break;
			default:	pos = x-8 + y*7;
		}
		return ((BOARD_TYPE) 1) << pos;
	}
		

This maps all 33 squares into the first 33 bits of the value. Note the return type is declared to be BOARD_TYPE. At the top of the program, this is declared to be the same as long long . If this code were to be compiled with a compiler which does not have long long , but provided a 64 bit data type under a different name, we would only need to change the line where BOARD_TYPE is defined. Note also that we return a BOARD_TYPE value, which has a single bit set to indicate the position.

Now, using this representation of the board, we can store moves as two values: an AND mask, and a comparison value. Our AND mask will have three bits set, indicating the positions affected by the move. The comparison value will have two of those three bits set, indicating which positions must be filled, and which position must be empty. A move will be possible if and only if board & mask == comparison value. The AND mask causes us to ignore all the irrelevent bits, so we are just comparing three bits (corresponding to the positions of the three squares involved in the move) with the comparison value,

To perform a move, we simply XOR the board with the AND mask. Note that when performing a move in Solitaire, each square involved has its state inverted. This is exactly what we are doing when we XOR the board with the AND mask (recall that this has exactly three bits set).

To display a move, we can use as crude a method as we like. Moves may be tested and performed millions of times, but in the end, only 32 moves will be displayed. I simply search for a Move structure (see 0, earlier) equivalent to the pattern of bits for the move. There are plenty of better ways, but I don't care, as it's neither interesting nor important.


4. Inlining and macros

In most languages, including C, function calls are relatively expensive. When a function is called, some or all of the the CPU registers will be pushed onto the stack (at the very least, the program counter), arguments will be copied onto the stack, local variables and implicitly created "hidden" temporary variables will be allocated on the stack. The CPU will then branch off to the called function; the CPU's fetch/decode pipeline will have to start again, and depending on how far away this new block of program code is, the contents of the cache may be useless.

The point is that excessive function calls should be minimised. This conflicts with the idea of breaking down a program into a hierarchy of functions. It is true that the fastest program will be one where all the code is in a single function. However, this makes the code much more difficult to debug and maintain.

Inline functions and macros provide two solutions to this problem.

Macros:

Inline functions:

Think of a macro as a search-and-replace, and an inline function as, well, a more intelligent version...

Recursive programs like ours frequently do have a lot of function calls (due, obviously, to the recursion). However, making a recursive function inline (or attempting to collapse it to a macro) is not a good idea. Aside from any issues, doing so would (in theory) make your program infinitely long. If you want to get rid of the recursive function calls, rewrite the algorithm without recursion2.

How does this apply to our problem? Testing and performing a move are simple operations, but we'd like to keep them out of the way, so we'll put them in inline functions. Later on, any other little "helper functions" we add will be inline functions.

On the subject of general C optimising tricks, here's another: our program will have several options that can be set, eg. whether to enable certain experimental optimisations. Rather than checking these repeatedly with if statements, we "compile them in", using the #ifdef compiler directive. For instance, we would only want to test the board for symmetry if we had symmetry checking enabled, so we use:

	#ifdef SYM
		s = findSymmetries();
	#endif
		

To enable symmetry checking, we would have a "#define SYM" near the top of the code, or pass "-DSYM" to the compiler (assuming we're using the GNU C compiler). This way, we cut out an "if" statement.


5. Caching move evaluations

(This is the technique which gives the biggest boost to the speed of the program)

In solitaire, there are many sequences of equivalent moves. There is usually more than one way to get the board to a given state. In fact, it does not stretch the imagination if we guess that there may be millions of different ways to get to the same board layout. At present, our algorithm considers every board layout, every time it is reached. Each time this happens, all the possible sequences of moves starting from that layout have to be considered all over again.

How do we cache board layouts?

We can make our algorithm enormously more efficient by "remembering" which layouts we've considered. We can sound technical by calling this "caching", as that is, in fact, what this is. The only problem is that there are 33 squares on the solitaire board, and therefore theoretically 233 possible board layouts. For each board layout, there are three possible states: possible, not possible (ie. dead end), not yet considered. We never need to store the fact that a given layout is possible4, so we only need to worry about "not possible" and "not considered". We can store this information in a single bit, eg. 1="not possible", 0="not yet considered". So we need one bit for each of the 233 layouts. There are eight bits in a byte, so we need 230bytes (as 8 = 23). Suppose we had a huge array of bytes (unsigned char5 in C-speak). We could look up the appropriate bit for a given board by taking the BOARD_TYPE value, and shifting right three places (dividing by eight) to get the right byte, then AND the board value with 7 (to get right-hand three bits) to find out which bit to look at.

230 is 1073741824. That's exactly one Gigabyte (if you're sensible), or roughly a Gigabyte (if you think SI units and computers are compatible), or a Gybbybyte (if the SI people get their own way). Either way, a lot of memory is needed. At the time of writing this, desktop PCs are sold with 256MB or 128MB of RAM; a GB of RAM can be allocated using virtual memory, but paging our cache out to disk would entirely defeat the point. Besides, malloc(0x40000000) is not likely to just happily return a pointer to a gig of RAM. And even if we could grab that much, do we really need it? We're going to stop at the first solution we find, so we'll only consider a small fraction of all the possible board layouts. Also, there may well be some board layouts which are simply not possible. The trouble is that we can't tell in advance which layouts we need and which we don't.

Although caching can give us a huge speed increase, it causes an enormous increase in the memory required by the program. To become practical, we need to reduce this to the point where we are only using RAM, and the OS is not forced to page out our data. The solution is to use a "sparse array".

Sparse arrays

The idea is simple: allow the array to have holes (this is similar to the idea of sparse files on some Linux filesystems). We break our 1GB array down into "chunks", each of size 1K (say). So we have broken the huge array down into a million small arrays. We have an array of one million pointers, each of which points to a 1K block of RAM. Now the nice bit: we start off with all the pointers set to 0 (a.k.a. "null"). When we store something, we look to see whether the appropriate chunk pointer is zero. If it's nonzero, we use it as per normal. If it's zero, we allocate 1K of RAM, and set this pointer to point to it, then use it as normal. This way, chunk pointers that never get used will never have any RAM allocated for them.

This works well, and brings the solution well within reach. However, the program still claims an undesirable amount of RAM. We can reduce the requirement by:

  1. Adjusting the size of the chunks. Smaller chunks mean less RAM gets wasted, but the chunk pointer table grows.
  2. Not caching everything. If we cache all the even moves (2nd, 4th, 6th, 8th, ...), say, we won't get quite such a huge speed increase, but we can greatly reduce the size of the cache.


6. The program

The program consists of three files:

In addition, here is a makefile; this will compile the choice of settings which seems to work best (at least on my machine).

The program takes two numbers as options on the command line; these are the coordinates of the starting/ending square. For instance, to solve the usual version of the game, where one starts and ends on the centre6 square, you would type "sol 3 3", as (3,3) is the location of the centre square. On my machine, solving for the centre square takes two seconds. One of the "trickiest" squares to solve for seems to be (0,2), which takes 39 seconds. Interestingly, solving for (2,0) only takes 6 seconds, which highlights a useful trick: if the program is taking too long, try setting it to work on an equivalent problem.

Known bugs:


7. The solutions

Without further ado, here are the solutions. The solution to the classic game (where the centre square starts off empty) can be found here. For the versions of the game where the initial empty space is elsewhere, click the location of the empty square on the board below.

(2,0) (3,0) (4,0)
(2,1) (3,1) (4,1)
(0,2) (1,2) (2,2) (3,2) (4,2) (5,2) (6,2)
(0,3) (1,3) (2,3) (3,3) (4,3) (5,3) (6,3)
(0,4) (1,4) (2,4) (3,4) (4,4) (5,4) (6,4)
(2,5) (3,5) (4,5)
(2,6) (3,6) (4,6)


8. An interesting observation

Recall that the program displays the solution in reverse order, ie. the last move to be made is the first one listed. It would seem that the solution does not have to be read backwards to work; in other words, the sequence of moves required to solve the game can be applied forwards or backwards, with the same end result.

Thanks go to Isaac Morland (ijmorlan AT uwaterloo DOT ca) for sending an explanation:

This is so because Solitaire as usually played has a dual version in which the board starts empty (except for one peg), and spaces jump over other spaces, eliminating the jumped-over space by leaving behind a peg. The location from which the space jumped becomes a peg, and the target location must be a peg (which is removed). So running a move sequence backwards actually is playing the "through the looking glass" version of the game in the forwards direction.


Acknowledgments

I should mention that it was Guy Thompson who suggested I write this program. Having solved the game for every square other than (0,2) and those equivalent to it, he wanted to know whether there actually was a solution or not. See Guy's page here.

And the idea of using a sparse array came from Mark Wood, who listened to my confused mutterings and offered encouragement, as did David Simms. Mark's page has some useful DarkBasic programming resources, while Dave's page has links to some very amusing sites.

If you have any comments, suggestions or criticism on this document or the program, please send mail to me at william@soton.ac.uk.

Back to Will's page


Footnotes

1. Your C compiler must provide a data type large enough to hold a 33 bit value. The program assumes that long long is such a data type. The GNU C compiler provides a 64 bit implementation of this data type; other compilers may also do so, perhaps under a different name.


2. This is always possible. Recursive algorithms don't need to be recursive; all they really need is a stack3. By implementing that stack with your own code, rather than using the machines execution stack, you can eliminate the recursion.


3. Actually, you don't even have to have a stack. Instead of navigating your tree pre-order, post-order, or whatever, do it in depth order. You probably wouldn't want to do this; the point was just that you don't have to have a stack.


4. An interesting extension to this program would be a program to find all the solutions. Well, maybe not list them, but at least count them. Anyway, if you were doing this, it'd make sense to remember which layouts the game could be finished from, not just which ones were impossible. In fact, to be really clever, you'd remember which layouts the game could be completed from, and how many ways in which it could be completed . . .


5. The "unsigned" isn't strictly necessary for our program, but is a good idea in general. Otherwise, we may get unexpected results when performing any arithmetic on these numbers.


6. That's right, centre. Check it in a dictionary if you like. In an English dictionary. From England.