As has been explained above, the fitness function is a mapping from the possible states of the world to possible fitness values.
Some questions arise here. There are not yet complete answers available for these questions, only first considerations and experiments.
With regard to the possible information to be encoded, one can state that the game has a clearly defined win state: if one player can generate a state where he has three pieces connected in one direction, then he has won the game (and 'direction' means here {row, column, diagonal}). Thus within the many possible states of the tic-tac-toe grid there exists a clear subset of states where one kind of piece, 'X' or 'O', has three connected pieces in one direction. Because there are only three rows, three columns, and two different diagonals, we have 8 possible goal patterns out of 3^9 = 19683 theoretically possible cell assignments; that is, about 0.04% of the theoretically possible states are goal states.
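The count of goal patterns can be made concrete with a small sketch. The representation is an assumption (a state as a tuple of 9 cells, each ' ', 'X', or 'O', indexed row by row); the helper name has_won is hypothetical:

```python
# Assumed representation: a tic-tac-toe state is a tuple of 9 cells,
# each one of ' ', 'X', 'O'; cells are indexed 0..8, row by row.

# The 8 winning lines: 3 rows, 3 columns, 2 diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def has_won(state, piece):
    """True if 'piece' occupies all three cells of some winning line."""
    return any(all(state[i] == piece for i in line) for line in LINES)

# 8 winning line patterns against 3^9 = 19683 cell assignments,
# i.e. roughly 0.04%.
ratio = len(LINES) / 3 ** 9
```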
Thus if a player starts the game from an empty grid, he has to find the shortest possible 'path' to one of these rare goal states. An ideally realized game is a finite sequence of moves where, at every position in this path, the move selected from the different possible moves is always the one which yields the highest fitness value. The sum of the fitness values of all the individual state-move pairs is then an optimum.
If an individual player is able to generate all those state-move pairs with the maximal fitness value, then this player should be able to act optimally.
This raises the question of how one can evaluate concrete states with a fitness function in such a way that for each possible move there is a fitness value ordering the different possibilities.
Generally, at each possible position in the path of a game, one has a state s with a state value V(s) and possible moves, each leading to a successor state s' with a successor state value V(s'). This successor state value can be better, equal, or worse. Ideally the player would only follow the 'better' values.
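The greedy rule just described can be sketched as follows; the helper names successors (yielding (move, new_state) pairs) and state_value (the player's valuation of a state) are assumptions for illustration:

```python
def best_move(state, player, successors, state_value):
    """Pick the move whose successor state has the highest value,
    following only the 'better' successor values, as described above."""
    best = None
    best_value = float('-inf')
    for move, new_state in successors(state, player):
        value = state_value(new_state, player)
        if value > best_value:
            best, best_value = move, value
    return best, best_value
```

Ties are broken here by keeping the first candidate encountered; any other tie-breaking rule would do equally well for the argument.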
A state value evaluates a state with regard to its 'closeness' to a goal state, depending on the point of view of a certain player. Some cases:
It holds that one player can have at most 5 pieces in one state. Furthermore, each piece always lies on a row, a column, or a diagonal. Thus we try to compute for every state s a state-value V(s).
For every turn in the game a player can compute the actual state-value V(s), and for every possible move m he can compute the state-value V(s_m) for the new state, where the index m signals the realized move.
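One simple way to realize such a state-value, chosen here purely for illustration and not prescribed by the text, is to score each of the 8 lines by how close the player is to completing it, ignoring lines the opponent has already blocked:

```python
# The 8 winning lines of the grid (rows, columns, diagonals).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def state_value(state, player):
    """Illustrative V(s): for each line not blocked by the opponent,
    score 1, 10, or 100 for one, two, or three own pieces, so that a
    completed line (a goal state) dominates every other contribution."""
    opponent = 'O' if player == 'X' else 'X'
    value = 0
    for line in LINES:
        cells = [state[i] for i in line]
        if opponent in cells:
            continue  # a blocked line contributes nothing
        value += {0: 0, 1: 1, 2: 10, 3: 100}[cells.count(player)]
    return value
```

With this valuation, the successor with the highest V(s_m) at each turn tends toward a goal state, which is exactly the ordering of possibilities asked for above.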
In a strictly genetic system we assume that an LCS can apply a classifier (state, move, fitness) to a situation without making an actual computation about all possible variants. The LCS will apply the available classifier with the best fitness value currently given. After the realization of the proposed move the state-value can be computed, and if that value differs from the actual fitness value, the fitness value of this classifier can be modified.
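This select-then-correct cycle can be sketched as follows; the dictionary layout of a classifier and the Widrow-Hoff-style update with a learning rate are assumptions, not details fixed by the text:

```python
def select_classifier(classifiers, state):
    """Among the classifiers whose condition matches the state, apply
    the one with the best current fitness -- no lookahead computation."""
    matching = [c for c in classifiers if c['condition'] == state]
    return max(matching, key=lambda c: c['fitness']) if matching else None

def update_fitness(classifier, observed_value, rate=0.2):
    """After the proposed move has actually been realized, move the
    classifier's fitness toward the state-value observed in the world."""
    classifier['fitness'] += rate * (observed_value - classifier['fitness'])
```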
In a derived genetic system, where the set of classifiers is accompanied by at least a planning ability, it is conceivable that a classifier (state, move, fitness) will only be applied to a situation after an actual computation about all possible variants. In that case the planning function includes a fitness function as a built-in device. It is no longer the world which feeds back a fitness; the acting system can at every point in time compute a fitness value on its own. Thus we should distinguish between an external fitness function and an internal fitness function. From an evolutionary point of view it is a fundamental question how systems which include internal fitness functions can emerge. How can an external fitness function be 'projected' into an internal fitness function?
Gerd Doeben-Henisch 2012-03-31