Fitness Function

The 'fitness function' is a rather complex topic which will be discussed more intensively in the upcoming sessions. For now we define a simple starting point.

We assume a reactive system, i.e. a system with a fixed set of deterministic rules. Additionally we assume a fitness function that assigns external values to certain events in the environment. In the wood1 environments this is usually the eating of food.

Furthermore we assume an internal action memory called $ OLDACTIONS$. This set stores the last $ n$ actions, where $ n=ACTIONDEPTH$. The newest action is assumed to be located at $ OLDACTIONS(1)$. Every time a new action is selected, the older actions are shifted by one position down the memory. If an action receives a fitness value $ f \in F$, this value is first attached to the action at $ OLDACTIONS(1)$ and then 'distributed' to all preceding actions in $ OLDACTIONS$.
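As a minimal sketch, the shifting behaviour of the action memory can be modelled with a bounded double-ended queue. The names `select_action` and the example action strings are illustrative assumptions, not part of the original text; only $ OLDACTIONS$ and $ ACTIONDEPTH$ come from the description above.

```python
from collections import deque

# Hypothetical sketch of the OLDACTIONS memory with ACTIONDEPTH = 3.
# Index 0 of the deque corresponds to OLDACTIONS(1), the newest action.
# A full deque with maxlen silently drops the oldest (rightmost) entry,
# which models the shift of older actions out of the memory.
ACTIONDEPTH = 3
OLDACTIONS = deque(maxlen=ACTIONDEPTH)

def select_action(action):
    """Push a newly selected action to the front of the memory."""
    OLDACTIONS.appendleft(action)

for a in ["turn-left", "move", "eat"]:
    select_action(a)

print(list(OLDACTIONS))  # newest action first
```

With the three sample actions above, the memory ends up as `['eat', 'move', 'turn-left']`: the most recent action occupies $ OLDACTIONS(1)$.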

Different strategies are possible for this distribution. The guiding principle should be to establish a 'coherence' between those actions that contribute to a certain path.

In the case of maze learning by rats with a fixed starting point and a highly static environment, such a strategy can work quite well. As the conditions become more variable (different goals, different starting positions), this mechanism can become too 'weak' to identify a path sufficiently well.

For the length $ n$ of $ OLDACTIONS$ we assume a value at least as large as the length of the shortest path from a start position to a goal (in the TOLMAN1 experiment this would be $ n=3$).

For the distribution we assume the following scheme:

$\displaystyle f := \text{fitness value}$ (11.1)
$\displaystyle n := \text{length of } OLDACTIONS$ (11.2)
$\displaystyle s = 1 + 2 + \dots + n$ (11.3)
$\displaystyle p = f/s$ (11.4)
$\displaystyle f(i) = ((n+1)-i)\cdot p$ (11.5)

Example: Assuming $ f=100$ and $ n=3$, we have $ s=1+2+3=6$, $ p=f/s=16.666667$, and the distribution $ f(1)=50$, $ f(2)=33.333333$, and $ f(3)=16.666667$. Normalizing by $ f$ fixes the values as $ f(1)=0.5$, $ f(2)=0.3333333$, and $ f(3)=0.1666667$.
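The worked example can be reproduced directly from equations (11.1)-(11.5). The function name `distribute` is an assumption for illustration; the formulas themselves are taken from the text.

```python
def distribute(f, n):
    """Distribute fitness f over the last n actions, newest first.

    Implements equations (11.1)-(11.5): s = 1 + 2 + ... + n,
    p = f/s, and f(i) = ((n+1)-i)*p, so the newest action (i=1)
    receives the largest share.
    """
    s = n * (n + 1) // 2              # closed form of 1 + 2 + ... + n
    p = f / s
    return [((n + 1) - i) * p for i in range(1, n + 1)]

shares = distribute(100, 3)
print(shares)                         # [50.0, 33.333..., 16.666...]
print([x / 100 for x in shares])      # normalized by f: [0.5, 0.333..., 0.166...]
```

Note that the shares always sum to $ f$, since $ \sum_{i=1}^{n}((n+1)-i) = s$.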

Because rules can already carry a fitness value, the new values have to be added to the old ones.

Another aspect is that the fitness values can grow without bound, whereby some rules can become 'isolated' from the other supporting rules. To minimize this effect one can introduce a maximum value $ REWMAX$ and stop growth above this maximum. This supports a growing similarity between all rules of a certain path. It works as long as only a few of all possible paths lead successfully to a goal state.
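A capped update of a rule's stored fitness can be sketched in one line. The text names only $ REWMAX$; the concrete cap value and the helper name `add_fitness` are illustrative assumptions.

```python
REWMAX = 1000.0  # hypothetical cap; the text specifies no concrete value

def add_fitness(old, delta, rewmax=REWMAX):
    """Add a distributed fitness share to a rule's stored value,
    stopping growth at rewmax so that the rules of a successful
    path converge toward similar values instead of diverging."""
    return min(old + delta, rewmax)

print(add_fitness(990.0, 50.0))  # capped at 1000.0
print(add_fitness(10.0, 50.0))   # below the cap: 60.0
```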

We will implement this simple strategy in the first example of a reactive deterministic system and investigate its pros and cons.

Gerd Doeben-Henisch 2013-01-14