To enable non-random agents which behave with some 'rationale', we need a minimal perception of the environment on which the non-random behavior functions can be based. For this we select a visual perception of the bird's-eye-view type described before (cf. figure 4.4):
Including this new structure we change the definition of the random agent into a new, different agent structure:
[Equations (4.29)-(4.33): formal definition of the new agent structure; the formulas were rendered as images in the original and are not recoverable here.]
This new agent no longer acts completely at random. It is primarily driven by a perception of the environment, which is evaluated by a reactive evaluation program (see below).
In the case of a bird's-eye-view kind of perception we have 8 cells surrounding the actual position of the agent (cf. figure 4.2). Assuming the properties
PROP | ENCODED |
Empty Space '.' | '00' |
Border 'BB' | '01' |
Object 'O' | '10' |
Food 'F' | '11' |
and assuming as an example an environment like the one below, with the agent located at position (2,2), we will get the following perceptional input:
!. . . !
!. * . !
!. O O !
!. F . !
PERC = 00 00 00 10 10 00 00 00 = '0000001010000000'   (4.34)

where the 8 neighboring cells are read clockwise starting in the north (N, NE, E, SE, S, SW, W, NW), as in the Scilab examples below.
The same function as implemented in Scilab, for a grid with food 'F' and for one without food 'F':
-->GD3O2F1
 GD3O2F1  =
!. . . !
!O O . !
!* F . !

-->YI=3, X=1, GRID=GD3O2F1, SHOW=1, [PERC]=ainp(YI,X,GRID,SHOW)
 PERC  =  1010110101010101

-->YI=3, X=1, GRID=GD3O2F0, SHOW=1, [PERC]=ainp(YI,X,GRID,SHOW)
 GRID  =
!. . . !
!O O . !
!* . . !
 PERC  =  1010000101010101
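The perception step can also be sketched outside of Scilab. The following Python function is a hypothetical mirror of ainp(); the encoding table above and the reading order N, NE, E, SE, S, SW, W, NW are taken from the examples, everything else (grid representation, signature) is an assumption for illustration:

```python
# Hypothetical Python mirror of the Scilab function ainp(): encode the
# 8 cells around the 1-indexed position (y, x) as a 16-bit string.
# Encoding as in the table above: '.' -> 00, border -> 01, 'O' -> 10, 'F' -> 11.

ENCODE = {'.': '00', 'O': '10', 'F': '11'}

# Neighbor offsets (dy, dx), clockwise starting north:
# N, NE, E, SE, S, SW, W, NW -- this order reproduces the PERC strings
# of the Scilab session above.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1),
           (1, 0), (1, -1), (0, -1), (-1, -1)]

def ainp(y, x, grid):
    """Return the 16-bit perception string for position (y, x)."""
    rows, cols = len(grid), len(grid[0])
    perc = ''
    for dy, dx in OFFSETS:
        ny, nx = y - 1 + dy, x - 1 + dx   # 0-indexed neighbor cell
        if 0 <= ny < rows and 0 <= nx < cols:
            perc += ENCODE[grid[ny][nx]]
        else:
            perc += '01'                   # outside the grid: border 'BB'
    return perc

GD3O2F1 = ['...', 'OO.', '*F.']            # the example grid from above
print(ainp(3, 1, GD3O2F1))                 # -> 1010110101010101
```

Run on the food-free grid GD3O2F0 = ['...', 'OO.', '*..'] the sketch likewise reproduces the second PERC string of the session above.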
Furthermore, the internal behavior function of the agent changes. The random agent threw a dice for a new movement without evaluating the environment. If the proposed movement was against a border 'BB' or an object 'O', the move did not happen; this amounted to an empty action. The new agent first evaluates the perception with a fixed behavior ('reactive behavior') and only then, if there is no unique solution, does it throw a dice. We will call this new function behavior1, including the subfunctions match1, action1, and aout:
[Equations (4.35)-(4.42): formal definition of behavior1 and its subfunctions; the formulas were rendered as images in the original and are not recoverable here.]
According to the assumptions about the wood1-world (cf. figure 4.2) we will have the following action-encoding:
ACTion | ACT |
No move | 0000 |
north | 0001 |
north-east | 0010 |
east | 0011 |
south-east | 0100 |
south | 0101 |
south-west | 0110 |
west | 0111 |
north-west | 1000 |
The reward value is some integer number which has to be provided either by the environment or, as in our case, by the body of the system as a prewired internal feedback function.
An individual classifier is a 3-tuple with the constituents perception, action, and fitness. The intended meaning of a classifier is that its action shall be executed if its perception part is matched by an actual perception. The actual fitness value gives an additional hint whether this shall be done or not. To guide a behavior one will usually need a set of individual classifiers to cover most of the possible situations.
Below is an example set of classifiers as used in the Scilab version. It is a matrix with the classifiers as strings in the 1st column and the associated moves as strings in the 2nd column; the 3rd column is not yet officially used (it will later contain the fitness values).
The 'logic' behind these classifiers is given by the fact that the perception has a certain 'order' in which the different possible 'properties' can occur. Depending on the actual position, every 'located occurrence' is associated with an 'appropriate movement'. This knowledge is encoded in the classifiers. Thus 'food' can occur at 8 different positions in the perception, and to get to this food 8 different movements are necessary.
In this agent the knowledge is prewired. The interesting question is whether a learning agent can learn all these associations.
CLASSIF = [
 '11##############' '1' '000';
 '##11############' '2' '000';
 '####11##########' '3' '000';
 '######11########' '4' '000';
 '########11######' '5' '000';
 '##########11####' '6' '000';
 '############11##' '7' '000';
 '##############11' '8' '000';
 '00##############' '1' '000';
 '##00############' '2' '000';
 '####00##########' '3' '000';
 '######00########' '4' '000';
 '########00######' '5' '000';
 '##########00####' '6' '000';
 '############00##' '7' '000';
 '##############00' '8' '000' ]
The function match1 shall match an actual perception against a given set of classifiers in such a way that those classifiers are found which are compatible with the perception and thus provide an adequate action. The result is the set of matching classifiers.
Given the following simple environment
-->GD3O2F1
 GD3O2F1  =
!. . . !
!O O . !
!* F . !
we can apply the functions ainp(), match1(), action1(), aout() as follows:
First we get a perception of the nearby environment:
-->YI=3, X=1, GRID=GD3O2F1, SHOW=1, [PERC]=ainp(YI,X,GRID,SHOW)
 PERC  =  1010110101010101
Then the match1() function identifies that classifier which 'matches' the perception:
-->CLASSIF=ANIMAT(5), [INDXCL,REWARD]=match1(PERC,CLASSIF,SHOW)
 INDXCL  =  3.

-->CLASSIF(3)
 ans  =  ####11##########

-->CLASSIF(3,2)
 ans  =  3
Then we can apply the function action1(), which finds an appropriate action for this perception based on the selected classifiers:
-->[ACT]=action1(CLASSIF, INDXCL)
 ACT  =  3
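In Python the selection step could look as follows. This is a sketch, not the Scilab source: the assumption that a dice is thrown among several matching classifiers follows the description of behavior1 above.

```python
import random

# Hypothetical sketch of action1(): return the move of one of the
# matching classifiers. With a unique match the result is deterministic;
# with several matches one is chosen at random ("throw a dice").

def action1(classif, indxcl):
    i = random.choice(indxcl)       # indxcl holds 1-based indices
    return int(classif[i - 1][1])   # 2nd column: the encoded move

CLASSIF = [
    ('11##############', '1', '000'),
    ('##11############', '2', '000'),
    ('####11##########', '3', '000'),
]
print(action1(CLASSIF, [3]))        # -> 3
```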
The action, encoded as a direction in {1, ..., 8}, then has to be translated into a real position relative to the actual position:
-->YO=YI, XO=X, [YN,XN]=aout(ACT,SHOW,YO,XO)
 YO  =  3.
 XO  =  1.
 XN  =  2.
 YN  =  3.
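A Python sketch of this translation; the direction numbering 1..8 (north = 1, counting clockwise) follows the action-encoding table, and the convention that y grows downward follows the grid examples. The signature is an assumption:

```python
# Hypothetical sketch of aout(): translate an action code into a new
# position (yn, xn) relative to the old position (yo, xo).
# Directions: 1=N, 2=NE, 3=E, 4=SE, 5=S, 6=SW, 7=W, 8=NW; 0 = no move.
# The grid's y-coordinate grows downward, so 'north' is dy = -1.

MOVES = {0: (0, 0), 1: (-1, 0), 2: (-1, 1), 3: (0, 1), 4: (1, 1),
         5: (1, 0), 6: (1, -1), 7: (0, -1), 8: (-1, -1)}

def aout(act, yo, xo):
    dy, dx = MOVES[act]
    return yo + dy, xo + dx

print(aout(3, 3, 1))   # -> (3, 2): one step east from (3,1)
```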
This completes the behavior function behavior1().
From these assumptions follows a possible behavior schedule: perceive the environment (ainp), match the perception against the classifiers (match1), select an action (action1), and execute the move (aout).
Proposed tests: One can compare the random agent with the new reactive agent (1) with regard to the number of cycles needed to find a single piece of food and (2) with regard to the amount of reward within a defined course of life.
Gerd Doeben-Henisch 2012-03-31