We assume as the overall structure the world (W) as described before (cf. 3.1), consisting of an environment (ENV) and at least one system (SYS), where the environment is realized as a grid of positions with properties. In our case we assume the further specialization of a WOOD1-style environment (cf. 3.3), and the system always occupies a certain position in the environment.
With this we can realize mappings such as saying that agent 'a' is mapped to the position '(x,y)'. And because in the world every position of an environment is associated with at least one property, the agent can be connected to the property of its own position, or to the properties of many positions if its perception allows this.
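The grid structure and the agent-to-position mapping could be sketched as follows. This is a minimal illustration; the class name, the property strings, and the method names are our own assumptions, not part of the WOOD1 definition:

```python
# A minimal sketch of a grid environment: every position carries
# at least one property (here simple strings such as 'empty' or 'food').
class GridEnvironment:
    def __init__(self, width, height, default_property="empty"):
        self.width = width
        self.height = height
        # Each position (x, y) is associated with a list of properties.
        self.properties = {(x, y): [default_property]
                           for x in range(width)
                           for y in range(height)}
        # Mapping from agent name to its current position.
        self.agent_positions = {}

    def place_agent(self, agent_name, x, y):
        # Realizes the mapping: agent 'a' -> position (x, y).
        self.agent_positions[agent_name] = (x, y)

    def properties_at(self, agent_name):
        # The agent is connected to the properties of its own position.
        return self.properties[self.agent_positions[agent_name]]

env = GridEnvironment(5, 5)
env.properties[(2, 3)].append("food")
env.place_agent("a", 2, 3)
print(env.properties_at("a"))  # ['empty', 'food']
```

An agent with a wider perception radius would simply collect the property lists of several neighbouring positions instead of only its own.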
While the mappings ainp() and aout() are meta-functions working 'above' the environment and the agent, we also have to assume some functions which are internal to the agent. We assume the following ones: decode() translates the perceived properties of the environment into 'agent-relevant values', and the function eval() evaluates the internal states with regard to rewards and recommended actions.
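A hedged sketch of how these internal functions might look. The value encoding, the reward rule, and the action names are illustrative assumptions; eval() is rendered as `evaluate` only to avoid shadowing Python's built-in `eval`:

```python
# decode(): translate perceived environment properties into
# agent-relevant values (here: a simple lookup table, an assumption).
PROPERTY_VALUES = {"empty": 0, "food": 1, "obstacle": -1}

def decode(perceived_properties):
    return [PROPERTY_VALUES.get(p, 0) for p in perceived_properties]

# eval(): evaluate the internal states with regard to rewards and
# recommended actions (here: reward = sum of values; recommend
# 'stay' on positive reward, otherwise 'move' -- an assumption).
def evaluate(internal_state):
    reward = sum(internal_state)
    recommended_action = "stay" if reward > 0 else "move"
    return reward, recommended_action

state = decode(["empty", "food"])
print(evaluate(state))  # (1, 'stay')
```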
The random agent will not make use of all this additional information. It only generates by chance a proposal for a new action, independent of the actual perception and independent of any other internal states. Thus we have the following sequence of mappings:
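The random agent's cycle could be sketched as follows: perception is received (via ainp()) but deliberately ignored for the action choice. The concrete action set is an assumption:

```python
import random

ACTIONS = ["north", "south", "east", "west"]  # assumed action set

def random_agent_step(perception):
    # The perception is received but deliberately ignored: the proposal
    # is independent of perception and of any other internal states.
    proposal = random.choice(ACTIONS)
    return proposal  # handed back to the world via aout()

action = random_agent_step(perception=["empty"])
print(action in ACTIONS)  # True
```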
Possible measures of performance with such an agent could be the number of actions needed, the amount of reward collected, and the number of errors made. With these measures it would be possible to establish an ordering between agents: those which need fewer actions, collect a larger amount of reward, or make fewer errors rank higher.
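Such an ordering between agents could be sketched as follows. The measure names, the priority among the three measures, and the sample data are illustrative assumptions:

```python
# Performance record per agent: actions needed, reward collected, errors made.
agents = {
    "a1": {"actions": 120, "reward": 30, "errors": 5},
    "a2": {"actions": 80,  "reward": 45, "errors": 2},
}

def rank(agents):
    # Fewer actions, more reward, fewer errors rank an agent higher
    # (lexicographic priority in that order, as an assumption).
    return sorted(agents,
                  key=lambda a: (agents[a]["actions"],
                                 -agents[a]["reward"],
                                 agents[a]["errors"]))

print(rank(agents))  # ['a2', 'a1']
```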