Behavior Mechanism Including Feedback

Presupposing a framework as described in the preceding section, one can describe a behavior mechanism which includes feedback as follows (cf. 16.4, 16.5; script names are abbreviated):


\begin{align}
(P) &= ainp(YO, XO, GRID, SHOW) \tag{4.91}\\
(ANIM, MSET) &= selectM(ANIM, CL, P, MSET, SHOW) \tag{4.92}\\
(IDXM, CAND, MASET) &= makeCAND(MSET, SHOW) \tag{4.93}\\
(ACT, ASET, ANIM) &= action2(IDXM, CAND, MSET, ASET, ANIM) \tag{4.94}\\
(YN, XN) &= aout(ACT, SHOW, YO, XO) \tag{4.95}\\
(CELLVAL) &= decode(YN, XN, GRID, SHOW) \tag{4.96}\\
(ANIM) &= impact(CELLVAL, ANIM, SHOW) \tag{4.97}\\
(ANIM, CL, ASET) &= feedback(CL, ANIM, ASET, SHOW) \tag{4.98}
\end{align}

This is a sequence of mappings starting with a perception (PERC). Triggered by this perception, the system has to filter out those classifiers which match this perception. The result is a match set of classifiers (MSET). From the possible actions encoded in these classifiers, the system selects the most promising ones by exploiting the feedback values stored in the classifiers. This yields a set of candidates (CAND) for the best possible actions. One candidate then leads to a real action (ACT), which ends in a new position in the environment (to 'stay' at the old position is subsumed under this view). The new (or 'new old') position causes a certain cell of the grid, i.e. a certain cell value (CELLVAL), to become the external feedback for the acting system. Based on such an external feedback, the built-in feedback mechanism can compute some impact on the internal states of the ANIMAT. Depending on this impact, it is possible to modify the feedback values of the classifiers.
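To make the flow of these mappings concrete, the following is a minimal, runnable Python sketch of one perception-action cycle. All data shapes are assumptions chosen for illustration (the grid as a dictionary of cell values, a classifier as a record with condition, action, and reward); it does not reproduce the original scripts.

\begin{verbatim}
import random

# GRID: dict (y, x) -> cell value; 'F' food, 'O' obstacle, '.' empty.
# A classifier: dict with condition string ('#' = wildcard), action, reward.
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1), 4: (0, 0)}  # N,E,S,W,stay

def ainp(y, x, grid):
    # (4.91) perception: the four neighbouring cells, clockwise from north
    return tuple(grid.get((y + dy, x + dx), 'O')
                 for dy, dx in [(-1, 0), (0, 1), (1, 0), (0, -1)])

def selectM(cl, p):
    # (4.92) match set: classifiers whose condition matches the percept
    return [c for c in cl
            if all(cc in ('#', pc) for cc, pc in zip(c['cond'], p))]

def makeCAND(mset):
    # (4.93) candidates: matched classifiers with the highest reward;
    # assumes MSET is never empty (e.g. a fully general '####' default)
    best = max(c['reward'] for c in mset)
    return [c for c in mset if c['reward'] == best]

def action2(cand):
    # (4.94) select one candidate action, ties broken at random
    return random.choice(cand)

def aout(act, y, x, grid):
    # (4.95) effector: execute the move unless it would hit an obstacle
    dy, dx = MOVES[act['action']]
    yn, xn = y + dy, x + dx
    return (yn, xn) if grid.get((yn, xn), 'O') != 'O' else (y, x)

def decode(y, x, grid):
    # (4.96) external feedback: the value of the cell now occupied
    return grid.get((y, x), 'O')

def impact(cellval, anim):
    # (4.97) internal impact: finding food raises an internal energy value
    if cellval == 'F':
        anim['energy'] += 1
    return anim

def cycle(y, x, grid, cl, anim):
    # one pass through (4.91)-(4.97); (4.98) is sketched further below
    p = ainp(y, x, grid)
    act = action2(makeCAND(selectM(cl, p)))
    yn, xn = aout(act, y, x, grid)
    anim = impact(decode(yn, xn, grid), anim)
    return yn, xn, act, anim
\end{verbatim}

For example, one can populate a small grid of '.' cells with a single 'F' cell, start with the fully general classifier list [{'cond': '####', 'action': a, 'reward': 0} for a in MOVES], and call cycle repeatedly to let the agent wander until it reaches the food.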

In the examples 16.6.1 and 16.6.3 one can see how a given (static) set of classifiers combined with a simple feedback system changes the behavior of the system drastically when compared to an ANIMAT0 or ANIMAT1 agent. As soon as the system has found some food, it increments the reward value of the classifier that was executed just before reaching the food. The result is that, from that point onwards, the system will either circulate around the food (if no non-move action is available) or even stay there forever (if a non-move action is available). This is reminiscent of the behavior of insects flying towards a source of light even if this finally causes their death. Clearly, this mechanism is far too simple for more advanced tasks. We will develop it step by step in the future.
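Within the sketch above, this simple credit assignment could look as follows; the energy bookkeeping via energy_prev is an assumption, not taken from the original scripts:

\begin{verbatim}
def feedback(cl, anim, last_act):
    # (4.98) credit assignment: if the internal energy has just risen
    # (food was found), reward the classifier executed immediately before
    if last_act is not None and anim['energy'] > anim.get('energy_prev', 0):
        last_act['reward'] += 1
    anim['energy_prev'] = anim['energy']
    return cl, anim
\end{verbatim}

Once one classifier's reward dominates, makeCAND returns only this classifier whenever it matches, so the agent either circles back to the food cell or, if a non-move classifier received the credit, stays put: the 'insect at the light' effect described above.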

In a more condensed form, one can represent the behavior function so far as follows:


\begin{align}
ainp &: GRID \longmapsto PERC \tag{4.99}\\
impact &: PERC \longmapsto E \tag{4.100}\\
behavior &: PERC \times CL \times E \longmapsto ACT \tag{4.101}\\
feedback &: E \times CAND \longmapsto CL \tag{4.102}
\end{align}
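Read as type signatures, these four mappings can be transcribed, for instance, into the following Python sketch; the concrete type choices (Percept as a tuple of cell values, E as an integer energy value) are assumptions:

\begin{verbatim}
from typing import Callable, Dict, List, Tuple

Grid = Dict[Tuple[int, int], str]   # GRID: position -> cell value
Percept = Tuple[str, ...]           # PERC
Classifier = dict                   # one element of CL
Energy = int                        # E: an internal value
Action = int                        # ACT

Ainp = Callable[[Grid], Percept]                                   # (4.99)
Impact = Callable[[Percept], Energy]                               # (4.100)
Behavior = Callable[[Percept, List[Classifier], Energy], Action]   # (4.101)
Feedback = Callable[[Energy, List[Classifier]], List[Classifier]]  # (4.102)
\end{verbatim}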

The feedback, depending on the structures of the working system, is part of an environment. There are no 'basic values' such as emotions 'outside' of this system. These values 'emerge' 'inside' the whole system, reflecting some properties of the given world. Furthermore, they are associated with 'goals'; independent of goals there are no values. One can say that, in the realm of biological life, the drive to live is an overall property of acting systems which emerges out of the given structures. This is possible because all necessary properties are already given as possibilities in the so-called 'matter' out of which everything emerges. Applied to our technical learning systems, this means that whatever we want an artificial system to be able to learn, we must provide all necessary properties in the environment as well as in the acting system itself. Just as in formal logic one can only prove theorems with regard to the available axioms and assumptions, no learning can happen if the necessary preconditions are not given, either as 'built-in' knowledge or as 'operationally derived' knowledge.
