Enhanced Random Agent ANIMAT$^{1}$

To enable non-random agents which behave with some 'rationale' we need a minimal perception of the environment on which the non-random behavior functions can be based. For this we select a visual perception of the bird's-eye-view type described before (cf. figure 4.4):

Figure 4.4: Alternative models of sensory input
\includegraphics[width=3.5in]{animat_views_22april10.eps}

Including this new structure we change the definition of the random agent $ ANIMAT^{0}$ into a new, different structure called $ AGENT^{1}$:


$\displaystyle PROP$ $\displaystyle \subseteq$ $\displaystyle \{0,1\}^{n}$ (4.29)
$\displaystyle ainp$ $\displaystyle :$ $\displaystyle POS \times DISTANCE \longmapsto PERC$ (4.30)
$\displaystyle POS$ $\displaystyle \subseteq$ $\displaystyle X \times Y$ (4.31)
$\displaystyle DISTANCE$ $\displaystyle \in$ $\displaystyle Nat$ (4.32)
$\displaystyle PERC$ $\displaystyle \subseteq$ $\displaystyle PROP^{n}$ (4.33)

This new $ AGENT^{1}$ no longer acts completely at random. It is primarily driven by a perception of the environment, which is evaluated by a reactive evaluation program (see below).

In the case of a bird's-eye-view perception with $ DISTANCE=1$ we have 8 cells surrounding the current position of the agent (cf. figure 4.2). Assuming the properties $ PROP_{wood1}$

PROP          SYMBOL   ENCODED
Empty Space   '.'      '00'
Border        'BB'     '01'
Object        'O'      '10'
Food          'F'      '11'
and assuming as an example an environment like the one below with the $ ANIMAT^{1}$ located at position (2,2), we get the following perceptual input

!.  .  .  !
!.  *  .  !
!.  O  O  !
!.  F  .  !


$\displaystyle ainp((2,2),1)$ $\displaystyle =$ $\displaystyle \langle 00 00 00 10 10 00 00 00\rangle$ (4.34)

The same function as implemented in Scilab, applied to a grid with food 'F' and to the same grid without food:

-->GD3O2F1
 GD3O2F1  =
 
!.  .  .  !
!         !
!O  O  .  !
!         !
!*  F  .  !

-->YI=3, X=1, GRID=GD3O2F1,SHOW=1,[PERC]=ainp(YI,X,GRID,SHOW)
 PERC  =
 1010110101010101   

 GRID  =
 
!.  .  .  !
!         !
!O  O  .  !
!         !
!*  .  .  !

-->YI=3, X=1, GRID=GD3O2F0,SHOW=1,[PERC]=ainp(YI,X,GRID,SHOW)

 PERC  =
 1010000101010101
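
For illustration, a minimal sketch of how such a perception function could be written in Scilab is given below. This is not the original ainp() (which additionally takes a SHOW parameter); the representation of the grid as a matrix of single-character strings, the clockwise-from-north ordering of the neighbour cells, and all names are assumptions derived from the sessions above.

// Sketch of a perception function in the spirit of ainp() (illustrative only).
// Assumptions: GRID is a matrix of single-character strings ('.', 'O', 'F'),
// cells outside the grid count as border 'BB', neighbours are read clockwise
// starting in the north.
function [PERC]=ainp_sketch(YI, X, GRID)
  DY = [-1 -1  0  1  1  1  0 -1];        // row offsets, clockwise from north
  DX = [ 0  1  1  1  0 -1 -1 -1];        // column offsets
  CODE = ['.' '00'; 'O' '10'; 'F' '11']; // property encoding (border = '01')
  [NR, NC] = size(GRID);
  PERC = '';
  for k = 1:8
    y = YI + DY(k); x = X + DX(k);
    if y < 1 | y > NR | x < 1 | x > NC then
      PERC = PERC + '01';                // outside the grid: border 'BB'
    else
      i = find(CODE(:,1) == GRID(y,x));
      PERC = PERC + CODE(i,2);
    end
  end
endfunction

Applied to GD3O2F1 with YI=3 and X=1 this sketch yields the same string 1010110101010101 as in the session above.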

Furthermore, the internal behavior function of the agent changes. In $ AGENT^{0}$ the agent threw a die for a new movement without evaluating the environment. If the proposed movement led into a border 'BB' or an object 'O', the move did not happen; the action was an empty action. The new $ AGENT^{1}$ first evaluates the perception with a fixed behavior ('reactive behavior') and only then, if there is no unique solution, throws a die. We will call this new function behavior1, including the subfunctions match, action, and aout:


$\displaystyle ACT$ $\displaystyle \subseteq$ $\displaystyle \{0,1\}^{4}$ (4.35)
$\displaystyle FITNESS$ $\displaystyle \in$ $\displaystyle Nat$ (4.36)
$\displaystyle CLASSIFIERS$ $\displaystyle \subseteq$ $\displaystyle PERC \times ACT \times FITNESS$ (4.37)
$\displaystyle CLASSIFIER$ $\displaystyle \in$ $\displaystyle CLASSIFIERS$ (4.38)
$\displaystyle match$ $\displaystyle :$ $\displaystyle PERC \times CLASSIFIERS \longmapsto INDXCL$ (4.39)
$\displaystyle action$ $\displaystyle :$ $\displaystyle INDXCL \times CLASSIFIERS \longmapsto ACT$ (4.40)
$\displaystyle aout$ $\displaystyle :$ $\displaystyle ACT \longmapsto POS$ (4.41)
$\displaystyle behavior1$ $\displaystyle =$ $\displaystyle match \otimes action \otimes aout$ (4.42)

According to the assumptions about the wood1-world (cf. figure 4.2) we will have the following action-encoding:

ACTION       ACT
No move      0000
north        0001
north-east   0010
east         0011
south-east   0100
south        0101
south-west   0110
west         0111
north-west   1000

The $ FITNESS$-value (also called reward) is an integer number which has to be provided either by the environment or, as in our case, by the body of the system as some prewired internal feedback function.
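
As a minimal sketch one could imagine such a prewired internal feedback function as follows; the concrete reward values and the function name are illustrative assumptions only:

// Sketch of a prewired internal reward function (illustrative only):
// the body signals a reward of 1 when the cell reached by a move contains food.
function [FITNESS]=reward_sketch(GRID, YN, XN)
  if GRID(YN, XN) == 'F' then
    FITNESS = 1;   // food found at the new position
  else
    FITNESS = 0;   // nothing gained
  end
endfunction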

An individual classifier $ CLASSIFIER$ is a 3-tuple with the constituents perception, action, and fitness. The intended meaning of a classifier is that its action shall be executed if its perception part is matched by an actual perception. The current fitness value gives an additional hint whether this shall be done or not. To guide a behavior one usually needs a set of individual classifiers, called $ CLASSIFIERS$, covering most of the possible situations.

Below is an example set of classifiers as used in the Scilab version of $ ANIMAT^{1}$. This is a matrix with classifier patterns as strings in the 1st column and associated moves as strings in the 2nd column; the 3rd column is not yet used (it will later contain the fitness values).

The 'logic' behind these classifiers is given by the fact that the perception has a certain 'order' in which the different possible 'properties' can occur. Depending on the position within the perception, every 'located occurrence' is associated with an 'appropriate movement'. This knowledge is encoded in the classifiers. Thus 'food' can occur at 8 different positions in the perception, and to get to this food 8 different movements are necessary.

In the $ AGENT^{1}$ this knowledge is prewired. The interesting question is whether a learning agent can learn all these associations.

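// Rows 1-8: food '11' perceived at neighbour position k --> move k.
// Rows 9-16: empty cell '00' perceived at neighbour position k --> move k.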
CLASSIF = [
'11##############' '1' '000'; 
'##11############' '2' '000'; 
'####11##########' '3' '000'; 
'######11########' '4' '000'; 
'########11######' '5' '000'; 
'##########11####' '6' '000'; 
'############11##' '7' '000'; 
'##############11' '8' '000';
'00##############' '1' '000'; 
'##00############' '2' '000'; 
'####00##########' '3' '000'; 
'######00########' '4' '000'; 
'########00######' '5' '000'; 
'##########00####' '6' '000'; 
'############00##' '7' '000'; 
'##############00' '8' '000'
]

The function match shall match an actual perception $ PERC$ against a given set of $ CLASSIFIERS$ in such a way that those classifiers are found which are compatible with the perception and furthermore provide an adequate action. The result is the set of matching classifiers.
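
A minimal sketch of such a match in Scilab could look as follows. It is not the original match1() (which, as the session below shows, also returns a REWARD value); the reading of '#' as a wildcard character and all names are assumptions based on the classifier listing above.

// Sketch of a wildcard match in the spirit of match1() (illustrative only):
// a classifier pattern matches the perception if every non-'#' character
// equals the corresponding character of PERC.
function [INDXCL]=match1_sketch(PERC, CLASSIF)
  INDXCL = [];
  for i = 1:size(CLASSIF, 1)
    PAT = CLASSIF(i, 1);
    ok = %t;
    for j = 1:length(PAT)
      c = part(PAT, j);
      if c <> '#' & c <> part(PERC, j) then
        ok = %f;
        break;
      end
    end
    if ok then
      INDXCL = [INDXCL i];   // collect indices of all matching classifiers
    end
  end
endfunction

For the perception 1010110101010101 and the classifier set above only classifier 3 ('####11##########') matches, in agreement with the session below.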

Given the following simple environment

-->GD3O2F1
 GD3O2F1  =
 
!.  .  .  !
!         !
!O  O  .  !
!         !
!.  F  .  !

we can apply the functions ainp(), match1(), action1(), aout() as follows:

First we get a perception of the nearby environment:

-->YI=3, X=1,GRID=GD3O2F1,SHOW=1,[PERC]=ainp(YI,X,GRID,SHOW)

 PERC  =
 1010110101010101

Then the match1() function identifies that classifier which 'matches' the perception:

-->CLASSIF=ANIMAT(5), [INDXCL,REWARD]=match1(PERC, CLASSIF,SHOW)

 INDXCL  =
    3. 

-->CLASSIF(3)
 ans  =
 ####11##########   
-->CLASSIF(3,2)
 ans  =
 3

Then we can apply the function action1() which finds an appropriate action for this perception based on the selected classifiers:

-->[ACT]=action1(CLASSIF, INDXCL)
 ACT  =
 3
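
The selection step could be sketched as follows; the original action1() may select among several matching classifiers (e.g. by their fitness values), whereas this sketch simply takes the first index. The conversion of the stored move string into a number is an assumption.

// Sketch of the selection step in the spirit of action1() (illustrative only):
// take the move column of the (first) selected classifier and convert the
// string '1'..'8' into a number.
function [ACT]=action1_sketch(CLASSIF, INDXCL)
  ACT = strtod(CLASSIF(INDXCL(1), 2));
endfunction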

The action, encoded as a direction in {1, ..., 8}, then has to be translated into a new position relative to the current position:

-->YO=YI, XO=X,[YN, XN]=aout(ACT,SHOW, YO, XO)
 YO  =
    3.  
 XO  =
    1.  

 XN  =
    2.  
 YN  =
    3.

This completes the behavior function behavior1().
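
Putting the pieces together, a minimal sketch of aout() and of the composed behavior could look as follows. The offset table repeats the clockwise-from-north convention assumed above, and the die throw for the case that no classifier matches follows the description of behavior1 given earlier; all names are illustrative, not the original code.

// Sketch of aout(): translate a direction 1..8 (clockwise from north) into a
// new position relative to the old one (illustrative only).
function [YN, XN]=aout_sketch(ACT, YO, XO)
  DY = [-1 -1  0  1  1  1  0 -1];
  DX = [ 0  1  1  1  0 -1 -1 -1];
  YN = YO + DY(ACT);
  XN = XO + DX(ACT);
endfunction

// Sketch of the composed behavior: perceive, match, select an action, move;
// if no classifier matches, a die is thrown as described above.
function [YN, XN]=behavior1_sketch(YI, X, GRID, CLASSIF)
  PERC   = ainp_sketch(YI, X, GRID);
  INDXCL = match1_sketch(PERC, CLASSIF);
  if isempty(INDXCL) then
    ACT = grand(1, 1, 'uin', 1, 8);      // no matching classifier: throw a die
  else
    ACT = action1_sketch(CLASSIF, INDXCL);
  end
  [YN, XN] = aout_sketch(ACT, YI, X);
endfunction

For the example above, behavior1_sketch(3, 1, GD3O2F1, CLASSIF) would move the agent from position (3,1) to (3,2), the cell containing the food.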

From these assumptions the following possible behavior schedule results:

  1. The agent $ a$ has a certain position $ (x,y)$ in an environment $ e$. Associated with this position is some property $ p$ and the same holds for all surrounding cells with distance 1 to the agent.
  2. The agent can perceive certain properties $ \{p_{1}, ..., p_{k} \}$ of those positions it is set up to perceive, using the function ainp(). The function ainp() performs a certain decoding too.
  3. The agent will match these decoded properties against a prewired set of classifiers.
  4. The agent will select a new action based on the selected classifiers.
  5. The action will be realized as a move in the environment to a new position $ (x',y')$, if a move is possible.
  6. Continue at the beginning again.

Proposed tests: One can compare the $ ANIMAT^{1}$-agent with the $ ANIMAT^{0}$-agent (1) with regard to the number of cycles needed to find a single piece of food and (2) with regard to the amount of reward gained within a defined course of life.

Gerd Doeben-Henisch 2012-03-31