Fitness Function for Tic-Tac-Toe

As explained above, the fitness function $ f$ is a mapping $ f: \cal{W} \longmapsto \Phi$ from the possible states of the world into possible fitness values.

Some questions arise:

  1. Which kind of information should be encoded in the fitness function?
  2. How can the classifier system exploit this information in an optimal way?
  3. How often should a fitness value be applied to the acting systems?

There are not yet complete answers to these questions, only first considerations and experiments.

With regard to the possible information to be encoded one can state that the game has a clearly defined win state: if one player can generate a state where he has three pieces connected in one direction then he has won the game ('direction' here means {row, column, diagonal}). Thus within the $ 3^9$ possible states of the tic-tac-toe grid there exists a clear subset of those states where one kind of pieces - 'X' or 'O' - occupies three connected cells in one direction. Because there are only three rows, three columns, and two different diagonals, there are 8 possible winning lines; measured against the $ 3^9 = 19683$ theoretically possible states, this is about 0.04%.
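The goal-state test just described can be sketched as follows. This is a hypothetical Python sketch; the encoding of the grid as a 9-tuple of cells is an assumption, not part of the original text.

```python
# A board is a tuple of 9 cells, each ' ', 'X', or 'O'
# (row-major: cells 0-2 form the top row -- an assumed encoding).

# The 8 winning lines: three rows, three columns, two diagonals.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]

def is_goal_state(board, player):
    """True if 'player' ('X' or 'O') has three connected pieces."""
    return any(all(board[i] == player for i in line) for line in LINES)
```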

Thus if a player starts the game from an empty state he has to find the shortest possible 'path' to one of these rare goal states. An ideally realized game is then a finite sequence $ \langle w_{0}, m_{X}, w_{1}, m_{O}, ..., w_{i \ge 5}\rangle$ where at every position $ i$ in this path, from the different possible moves, always that one has been selected which yields the highest fitness value. The sum of the fitness values of all the individual state-move pairs is then an optimum.
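The claim that an ideal game maximizes the sum of fitness values over its state-move pairs can be expressed as a small sketch. The function `phi` stands for any per-pair fitness function; its concrete form is left open here, as in the text.

```python
def path_fitness(path, phi):
    """Total fitness of a game path, given as a list of
    (state, move) pairs; phi(state, move) is the fitness value
    of one individual state-move pair."""
    return sum(phi(state, move) for state, move in path)
```

An ideally realized game is then a path whose `path_fitness` is maximal among all legal paths.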

If an individual player is able to generate all those state-move pairs with the maximal fitness value, then this player should be able to act optimally.

This raises the question of how one can evaluate concrete states with a fitness function in such a way that for each possible move there is a fitness value ordering the different possibilities.

Generally, at each possible position in the path of a game one has a state $ w_{i}$ with a state value $ \upsilon(w_{i})$, and possible moves leading to a successor state $ w_{i+1}$ with a successor state value $ \upsilon(w_{i+1})$. This successor state value $ \upsilon(w_{i+1})$ can be better, equal, or worse. Ideally the player would only follow the 'better' values.

A state value function $ \upsilon_{P}: \cal{W} \longmapsto \Phi$ evaluates a state $ w_{i} \in \cal{W}$ with regard to its 'closeness' to a goal state from the point of view of a certain player $ P$. Some observations:

One player can have at most 5 pieces on the grid in any state. Furthermore, each piece always lies on a row, a column, or a diagonal. Thus we try to compute for every state $ w_{i} \in \cal{W}$ a state value $ \upsilon(w_{i})$.

For every turn in the game a player can compute the current state value $ \upsilon(w_{i})$, and for every possible move $ m_{P}$ he can compute the state value of the resulting state, $ \upsilon(w_{i+1,m})$, where the index $ m$ indicates the realized move.
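One way to realize such a state value, and the evaluation of every possible move against it, is sketched below. The concrete heuristic - counting the player's pieces over all winning lines not yet blocked by the opponent - is an assumption, only one possible choice of $ \upsilon$.

```python
# The 8 winning lines: three rows, three columns, two diagonals.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]

def state_value(board, player):
    """Heuristic 'closeness' of a state to a goal state for 'player':
    count the player's pieces over all lines not already blocked by
    the opponent.  (One possible choice of v -- an assumption.)"""
    opponent = 'O' if player == 'X' else 'X'
    value = 0
    for line in LINES:
        cells = [board[i] for i in line]
        if opponent not in cells:
            value += cells.count(player)
    return value

def best_move(board, player):
    """Compute v(w_{i+1,m}) for every legal move m and return the
    move whose successor state has the highest value."""
    moves = [i for i in range(9) if board[i] == ' ']
    def successor_value(m):
        successor = board[:m] + (player,) + board[m + 1:]
        return state_value(successor, player)
    return max(moves, key=successor_value)
```

Under this heuristic the center cell is the best first move, since it lies on four of the eight lines.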

In a strictly genetic system we assume that an LCS can apply a classifier (state, move, $ \phi_{k}$) to a situation $ w_{i}$ without actually computing all possible variants. The LCS will apply the available classifier with the currently best fitness value. After the realization of the proposed move $ m_{P}$ the state value $ \upsilon(w_{i+1,m})$ can be computed, and if that value differs from the current fitness value then the fitness value of this classifier can be modified.
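The selection and update step just described can be sketched as follows. The names are hypothetical, and the incremental Widrow-Hoff-style update with learning rate 0.2 is an assumption, one common choice in LCS research, not something fixed by the text.

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    state: tuple      # the matched state w_i
    move: int         # the proposed move m_P (cell index 0..8)
    fitness: float    # the current fitness value phi_k

def select_classifier(matching):
    """Apply the available classifier with the best fitness value."""
    return max(matching, key=lambda c: c.fitness)

def update_fitness(classifier, observed_value, rate=0.2):
    """After the proposed move is realized, compare the computed
    state value v(w_{i+1,m}) with the stored fitness and move the
    fitness toward the observed value."""
    classifier.fitness += rate * (observed_value - classifier.fitness)
```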

In a derived genetic system, where the set of classifiers is accompanied by at least a planning ability, it is conceivable that a classifier (state, move, $ \phi_{k}$) will only be applied to a situation $ w_{i}$ after actually computing all possible variants. In that case the planning function includes a fitness function as a built-in device. It is no longer the world which feeds back a fitness; the acting system can at every point in time compute a fitness value on its own. Thus we should distinguish between an external fitness function and an internal fitness function. From an evolutionary point of view it is a fundamental question how systems which include internal fitness functions can emerge. How can an external fitness function be 'projected' into an internal fitness function?

Gerd Doeben-Henisch 2012-03-31