Test Environment 'Wood1'

Figure 4.1: Wilson 'wood1'-environment, overview
\includegraphics[width=4.0in]{env_wood1_complete.eps}

In his 1994 paper [347], Wilson describes a simple test environment called wood1. The overall layout (cf. figure 4.1) is a grid with 55 columns and 15 rows. When the animat '*' crosses a border, it continues in the same regular environment, which keeps up the illusion of an infinite world. The paper [347] gives no explicit rule for this. In contrast to Wilson, we will not assume an infinite world but a finite world which can be expanded dynamically if necessary. This can approximate the illusion of an infinite world while offering a fixed size that can be used for clear benchmarks.

We assume the following conventions:

  1. The upper left cell of the wood1 environment is the starting point (1,1) of a 2-dimensional coordinate system.
  2. The X-axis runs from 'left' to 'right' in parallel with the columns of the matrix.
  3. The Y-axis runs from 'up' to 'down' in parallel with the rows of the matrix.
  4. Following Wilson we label the 'upper' row as north.
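The conventions above can be illustrated with a minimal sketch. It assumes that the grid is stored as a list of row strings; the helper names `cell_at` and `wrap` are hypothetical and do not come from Wilson's paper. The wrap-around helper models Wilson's border crossing, not the dynamically expandable world proposed here.

```python
COLS, ROWS = 55, 15  # size of the wood1 grid as given in the text

def cell_at(grid, x, y):
    """Return the cell at 1-based (x, y); x counts columns, y counts rows from the top."""
    return grid[y - 1][x - 1]

def wrap(x, y, cols=COLS, rows=ROWS):
    """Map any integer (x, y) back into the grid, as in Wilson's wrap-around world."""
    return ((x - 1) % cols + 1, (y - 1) % rows + 1)
```

With these conventions, moving 'north' decreases y, and stepping over the right border at x = 55 leads back to x = 1.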

An animat can move in any of the eight directions as described in figure 4.2. Thus '00' encodes 'no move', '03' encodes a move to the right, etc. The environment is furthermore the 'cause' of feedback, also called reward. For the consumption of food (F) Wilson proposes a value of '1000'. Wilson's paper gives no further arguments about the 'nature' of the rewards. Wilson furthermore assumes that a move is blocked if the animat tries to move onto a cell which is occupied by an object (O). If the cell is occupied by food (F), then moving to the cell eats the food and gives the animat an immediate reward $ r_{imm}$.
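The movement rules can be sketched as a single step function. This is only an illustration under stated assumptions: the codes '00' and '03' are taken from the text, while the remaining codes are assumed to run clockwise starting in the north; the blank symbol '.', the removal of eaten food, the reward 0 for all other moves, and the wrap-around at the borders are likewise assumptions. The names `MOVES` and `step` are hypothetical.

```python
# Move codes: 0 is 'no move' and 3 is 'right' (both from the text);
# the other codes are ASSUMED to run clockwise, starting with north = 1.
# The Y-axis runs downwards, so north means dy = -1.
MOVES = {
    0: (0, 0),
    1: (0, -1), 2: (1, -1), 3: (1, 0), 4: (1, 1),
    5: (0, 1), 6: (-1, 1), 7: (-1, 0), 8: (-1, -1),
}
FOOD_REWARD = 1000                  # value proposed by Wilson
BLANK, OBJECT, FOOD = ".", "O", "F"  # '.' for a blank cell is an assumption

def step(grid, pos, code):
    """Apply one move; grid is a list of lists of cell symbols, pos is 1-based (x, y).

    Returns (new_pos, reward). Borders wrap around (assumption, see lead-in).
    """
    rows, cols = len(grid), len(grid[0])
    x, y = pos
    dx, dy = MOVES[code]
    nx = (x - 1 + dx) % cols + 1
    ny = (y - 1 + dy) % rows + 1
    cell = grid[ny - 1][nx - 1]
    if cell == OBJECT:               # move is blocked, animat stays in place
        return pos, 0
    if cell == FOOD:                 # moving onto food eats it
        grid[ny - 1][nx - 1] = BLANK  # assumption: eaten food disappears
        return (nx, ny), FOOD_REWARD
    return (nx, ny), 0
```

For example, an animat at (1,2) with an object at (2,2) stays at (1,2) when it tries move '03', while a move onto a food cell returns the new position together with the reward 1000.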

Figure 4.2: Wilson 'wood1'-environment, details
\includegraphics[width=4.0in]{env_wood1_detail.eps}

One can think about possible reasons behind the amount of reward. The assumption that food is the only source of positive reward, as Wilson makes in his paper, could be interpreted as a consequence of the assumption that the ANIMAT, as an idealized biological system, primarily needs energy. Food is some stuff which provides a certain amount of energy, represented as 'amount of reward'. Additionally, one can assume that the ANIMAT consumes energy as long as it is not 'eating' food. Thus energy consumption is a function of time, measured in cycles. Doing nothing consumes energy, and being active consumes even more energy. This leads to the working hypothesis that

  1. Doing no action consumes X1-many units of energy, represented as negative reward. Proposal: '-1'
  2. A normal move consumes X2-many units of energy, represented as negative reward. Proposal: '-2'
  3. A normal move with eating consumes X2-many units of energy and at the same time gains energy, represented as positive reward. Proposal (Wilson): '+1000'
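The working hypothesis can be written down as a small reward function. The sketch below is one possible reading: it assumes that in case 3 the cost of the move and the gain from the food are simply added, which the text does not state explicitly. The function name `energy_reward` and its parameter names are hypothetical.

```python
def energy_reward(moved, ate_food, x1=1, x2=2, food_gain=1000):
    """Net reward of one cycle under the working hypothesis.

    x1: energy cost of doing nothing (proposal: 1)
    x2: energy cost of a normal move (proposal: 2)
    food_gain: energy gained by eating (Wilson's value: 1000)
    """
    if ate_food and not moved:
        raise ValueError("eating is only possible as part of a move")
    if not moved:
        return -x1
    reward = -x2
    if ate_food:
        # Assumption: cost and gain of the same cycle are summed.
        reward += food_gain
    return reward
```

Under this reading, a cycle without action yields -1, a normal move -2, and a move with eating 1000 - 2 = 998.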

Wilson further assumes a simple sensory perception of the ANIMAT which covers all the neighbouring cells, starting with the cell 'in the north' and then going 'clockwise' around (cf. figure 4.2). For systematic explorations of how success depends on perception, it would be convenient to have a more dynamic perception model. We propose here two basic strategies for a scalable sensory perception (cf. figure 4.3):

Figure 4.3: Alternative models of sensory input
\includegraphics[width=3.5in]{animat_views_22april10.eps}

Either the sensory perception works like a bird's view with the ANIMAT at its center, which can be expanded outwards according to the formula $ (2n+1)^{2}$ with $ n \geq 1$, or the sensory perception models a front view which includes the ANIMAT as part of the view, with the formula $ (2n+1) \times m$ with $ n \geq 0$ and $ m \geq 1$.
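The two formulas can be checked by enumerating the cells of each view as offsets (dx, dy) relative to the ANIMAT. The sketch below makes two assumptions which are not fixed by the text: for the front view the ANIMAT faces north, and it sits in the middle of the row nearest to it. The function names are hypothetical.

```python
def birds_view_offsets(n):
    """Offsets of a (2n+1) x (2n+1) window centred on the ANIMAT, row by row."""
    if n < 1:
        raise ValueError("bird's view requires n >= 1")
    return [(dx, dy) for dy in range(-n, n + 1) for dx in range(-n, n + 1)]

def front_view_offsets(n, m):
    """Offsets of a (2n+1) x m window in front of the ANIMAT, row by row.

    Assumption: the ANIMAT faces north and occupies the middle of the
    last (nearest) row, so dy runs from -(m-1) up to 0.
    """
    if n < 0 or m < 1:
        raise ValueError("front view requires n >= 0 and m >= 1")
    return [(dx, dy) for dy in range(-(m - 1), 1) for dx in range(-n, n + 1)]
```

For $ n = 1$ the bird's view has $ 3^{2} = 9$ cells (the ANIMAT and its eight neighbours), for $ n = 2$ it has 25 cells; a front view with $ n = 1$ and $ m = 2$ has $ 3 \times 2 = 6$ cells.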

A bird's view would be 'translated' into a sensory string row by row using the encoding of Wilson. The same would happen in the case of a front view.

Assuming a front view induces some kind of direction. Therefore we will 'keep this aside' as long as possible, because it makes the world a lot more complicated.



Gerd Doeben-Henisch 2012-03-31