Fitness Function for Tic-Tac-Toe

As explained above, the fitness function $ f$ is a mapping $ f: \cal{W} \longmapsto \Phi$ from the possible states of the world into possible fitness values.

Some questions arise:

  1. Which kind of information should be encoded in the fitness function?
  2. How can the classifier system exploit this information in an optimal way?
  3. How often should a fitness value be applied to the acting systems?

There are not yet complete answers to these questions, only first considerations and experiments.

With regard to the possible information to be encoded one can state that the game has a clearly defined win state: if one player can generate a state where he has three pieces connected in one direction then he has won the game ('direction' here means {row, column, diagonal}). Thus within the $ 3^9$ possible states of the tic-tac-toe grid there exists a clear subset of those states where one kind of pieces - 'X' or 'O' - occupies three connected cells in one direction. Because there are only three rows, three columns, and two different diagonals, there are 8 possible winning lines; measured against the $ 3^9 = 19683$ theoretically possible states, this is about 0.04%.
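The goal-state test just described can be sketched as follows. This is a hypothetical Python sketch; the encoding of the grid as a 9-tuple of cells is an assumption, not part of the original text.

```python
# A board is a tuple of 9 cells, each ' ', 'X', or 'O'
# (row-major: cells 0-2 form the top row -- an assumed encoding).

# The 8 winning lines: three rows, three columns, two diagonals.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]

def is_goal_state(board, player):
    """True if 'player' ('X' or 'O') has three connected pieces."""
    return any(all(board[i] == player for i in line) for line in LINES)
```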

Thus if a player starts the game from an empty state he has to find the shortest possible 'path' to one of these rare goal states. An ideally realized game is then a finite sequence $ \langle w_{0}, m_{X}, w_{1}, m_{O}, ..., w_{i \ge 5}\rangle$ where at every position $ i$ in this path, from the different possible moves, always that one has been selected which yields the highest fitness value. The sum of the fitness values of all the individual state-move pairs is then an optimum.
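The claim that an ideal game maximizes the sum of fitness values over its state-move pairs can be expressed as a small sketch. The function `phi` stands for any per-pair fitness function; its concrete form is left open here, as in the text.

```python
def path_fitness(path, phi):
    """Total fitness of a game path, given as a list of
    (state, move) pairs; phi(state, move) is the fitness value
    of one individual state-move pair."""
    return sum(phi(state, move) for state, move in path)
```

An ideally realized game is then a path whose `path_fitness` is maximal among all legal paths.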

If an individual player is able to generate all those state-move pairs with the maximal fitness value, then this player should be able to act optimally.

This raises the question of how one can evaluate concrete states with a fitness function in such a way that for each possible move there is a fitness value ordering the different possibilities.

Generally, at each possible position in the path of a game one has a state $ w_{i}$ with a state value $ \upsilon(w_{i})$, and possible moves leading to a successor state $ w_{i+1}$ with a successor state value $ \upsilon(w_{i+1})$. This successor state value $ \upsilon(w_{i+1})$ can be better, equal, or worse. Ideally the player would only follow the 'better' values.

A state value function $ \upsilon_{P}: \cal{W} \longmapsto \Phi$ evaluates a state $ w_{i} \in \cal{W}$ with regard to its 'closeness' to a goal state from the point of view of a certain player $ P$. Some observations:

One player can have at most 5 pieces on the grid in any state. Furthermore, each piece always lies on a row, a column, or a diagonal. Thus we try to compute for every state $ w_{i} \in \cal{W}$ a state value $ \upsilon(w_{i})$.

For every turn in the game a player can compute the current state value $ \upsilon(w_{i})$, and for every possible move $ m_{P}$ he can compute the state value of the resulting state, $ \upsilon(w_{i+1,m})$, where the index $ m$ indicates the realized move.
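One way to realize such a state value, and the evaluation of every possible move against it, is sketched below. The concrete heuristic - counting the player's pieces over all winning lines not yet blocked by the opponent - is an assumption, only one possible choice of $ \upsilon$.

```python
# The 8 winning lines: three rows, three columns, two diagonals.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]

def state_value(board, player):
    """Heuristic 'closeness' of a state to a goal state for 'player':
    count the player's pieces over all lines not already blocked by
    the opponent.  (One possible choice of v -- an assumption.)"""
    opponent = 'O' if player == 'X' else 'X'
    value = 0
    for line in LINES:
        cells = [board[i] for i in line]
        if opponent not in cells:
            value += cells.count(player)
    return value

def best_move(board, player):
    """Compute v(w_{i+1,m}) for every legal move m and return the
    move whose successor state has the highest value."""
    moves = [i for i in range(9) if board[i] == ' ']
    def successor_value(m):
        successor = board[:m] + (player,) + board[m + 1:]
        return state_value(successor, player)
    return max(moves, key=successor_value)
```

Under this heuristic the center cell is the best first move, since it lies on four of the eight lines.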

In a strictly genetic system we assume that an LCS can apply a classifier (state, move, $ \phi_{k}$) to a situation $ w_{i}$ without actually computing all possible variants. The LCS will apply the available classifier with the currently best fitness value. After the realization of the proposed move $ m_{P}$ the state value $ \upsilon(w_{i+1,m})$ can be computed, and if that value differs from the current fitness value then the fitness value of this classifier can be modified.
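The selection and update step just described can be sketched as follows. The names are hypothetical, and the incremental Widrow-Hoff-style update with learning rate 0.2 is an assumption, one common choice in LCS research, not something fixed by the text.

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    state: tuple      # the matched state w_i
    move: int         # the proposed move m_P (cell index 0..8)
    fitness: float    # the current fitness value phi_k

def select_classifier(matching):
    """Apply the available classifier with the best fitness value."""
    return max(matching, key=lambda c: c.fitness)

def update_fitness(classifier, observed_value, rate=0.2):
    """After the proposed move is realized, compare the computed
    state value v(w_{i+1,m}) with the stored fitness and move the
    fitness toward the observed value."""
    classifier.fitness += rate * (observed_value - classifier.fitness)
```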

In a derived genetic system, where the set of classifiers is accompanied by at least a planning ability, it is conceivable that a classifier (state, move, $ \phi_{k}$) will only be applied to a situation $ w_{i}$ after actually computing all possible variants. In that case the planning function includes a fitness function as a built-in device. It is no longer the world which feeds back a fitness; the acting system can at every point in time compute a fitness value on its own. Thus we should distinguish between an external fitness function and an internal fitness function. From an evolutionary point of view it is a fundamental question how systems which include internal fitness functions can emerge. How can an external fitness function be 'projected' into an internal fitness function?

Gerd Doeben-Henisch 2012-03-31