Parallel Thinking: Make Chess More Human Share your comment!


It is the holy grail of cognitive computing: get software to imitate human brain functioning. Most visionary computer scientists such as Douglas Hofstadter and Ray Kurzweil posit that the ability of humans to think parallel threads is a big part of what makes our cognition function the way it does.

No surprise then, that today’s parallel tools can facilitate just such parallel functioning. See how this demo chess engine uses parallelism threads to gain competitive advantage. It’s a fun way to consider how parallelism can advance your system development.

For this article, I wrote a demonstration of a chess engine that spawns several parallel threads to gain significant advantage over the old paradigm — a monolithic engine finding the best move. As I explain what I have done, you will understand and see just how powerful the Intel parallel processing technology is.

The Chess Engine

We need a few words to summarize how a traditional chess engine works. First, the engine makes a list of all legal moves. It goes through the list of moves in order to see which one is best (this is known as ply 1), then makes each move in the list. After making each of these moves, the engine makes a list of the legal moves an opponent can make (this is known as ply 2). Each of the opponent moves gets made, then a list of legal moves made (known as ply 3) and executed. If the engine goes deep enough, or to a high enough ply level, it can assess the board and arrive at a score with which an optimal move can be chosen. The search tree methodology usually gives chess engines the ability to traverse at least 5 ply levels for an easy skill level, or more like 8 ply levels for harder skill levels. The deeper the engine goes, the more moves ahead it examines, and theoretically the better it plays.

By now, though, you can see that the mathematics spiral out of control at an exponential rate. It doesn’t take long before the engine has to examine and score millions of board positions. For this reason, computer scientists have developed heuristics over the years that allow chess engines to know when can bail out of a ply. In simplistic terms, chess engines use what is known as an Alpha-Beta search tree with a cutoff mechanism. There are quite a few additional heuristics commonly employed such as a killer list, but this basic explanation should suffice for understanding how the program takes advantage of the parallel processing technology.

The Demonstration Program

Download Here

The demonstration program is a Visual Studio 2012 project and can be downloaded here. By design, white decides upon a move with a monolithic chess engine finding the optimal move. Black, though, acts more like a human plays. As humans, when our opponents are thinking about their move, we are deciding which moves they are most likely to make. We then think about those moves, and develop in our mind what we will do in response if our opponent actually makes one of the anticipated moves.

Immediately before White starts thinking and deciding its move, an instance of the chess engine is set to a fairly shallow level. It then determines the three most likely moves that White will make, and since it is set to a shallow ply level it finishes quickly without a noticeable delay time. And before white even starts thinking, the program spins up three instances of a chess engine to determine a response to the three potential moves that white might make. In this way, if white makes one of the three anticipated moves, Black has already been thinking about how to respond and makes an optimal move in a relatively short period of time after White’s move.

  • Before White starts thinking…
  • An engine set to a relatively shallow level finds what it determines to be the three most likely moves that white will make.
  • Three instances of the chess engine are spun up to determine how to respond to White’s move if it does indeed make one of the anticipated moves.
  • White then starts thinking about its move while the three engine instances that Black is using are thinking.
  • White decides its best move and makes the move.
  • If one of Black’s three parallel thinking processes matches White’s move, then that parallel thinking process finishes its thinking and makes its move.

The code to make the magic happens is easy and straightforward. A method named KickOffOneThinker() gets a move from one of the instantiated chess engines. It is as follows:

void CParallelChessThinkingDlg::KickOffOneThinker( int nWhich, short *SrcSq, short *DstSq )
// Here we set up this engine so that it will be able to
// operating and get a move.

m_objThinkAheadEngine[nWhich].SetSearchDepth( 8 );

// Make the move from the list. This is what was generated
// from the shallow search in the main timer method.
m_objThinkAheadEngine[nWhich].MakeHumanMove( m_nSrc[nWhich], m_nDst[nWhich], ::GetTickCount(), 0 );

BOOL bAbort = FALSE, bPause = FALSE;
int nMovesInQueue = 0, nMovesExamined = 0;

// Here we call on the engine to produce a move.
if( m_objThinkAheadEngine[nWhich].StartComputerMove( SrcSq, DstSq, &bAbort, &bPause,
&nMovesInQueue, &nMovesExamined ) != 1 )
// This will only happen if the search takes an extraordinary time.
while( m_objThinkAheadEngine[nWhich].GetComputerMove( SrcSq, DstSq ) != 1 && bAbort != TRUE )

Each engine could be spawned with a cilk_spawn method. But the Cilk Plus documentation clearly states that using a cilk_for is far more efficient and has much less overhead than separate cilk_spawn calls. A few simple lines of code as shown below kicks off each of Black’s engines and waits for a selection.

// Use the cilk_for syntax so that all three parallel

// thinking processes execute simulaneously with high

/ efficiency.

cilk_for (int i = 0; i < 3; i++)


// Call the method that kicks this engine off.

KickOffOneThinker( i, &src[i], &dst[i] );


// Now use cilk_sync to wait for the parallel thinking

// to complete.



Figure 1: When you run the program you will notice that White takes a noticeable amount of time to select its move, but Black usually makes a move almost immediately after because it has already been thinking.

There was one other thing that I did to optimize the results. After White made its move, I set a bail out flag for any of the three of Black’s parallel thinking engines that did not match White’s move. In this way, any CPU cores that are examining the untaken moves are freed, and can be used in other ways.

I’d be curious: Does this exercise give you any ideas about new places to add parallelism in your world? Let us know below.

Posted on May 6, 2013 by Rick Leinecker, Slashdot Media Contributing Editor