In the methodological remarks above (cf. ) we claim besides a simulated random system also a theoretical model. This shall be described here.
We assume as the overall structure a world (W) consisting of an environment (ENV) and at least one agent (AGENT), where the environment is realized as a grid of positions with properties and the agent has always a certain position in the environment. This means the mapping aout() maps an agent into a set of positions. This leads to a more specific theoretic model, closer to the simulating software model, but this must no be a hindering obstacle:
(4.16) | |||
(4.17) | |||
(4.18) | |||
(4.19) | |||
(4.20) | |||
(4.21) | |||
(4.22) | |||
(4.23) | |||
(4.24) | |||
(4.25) | |||
(4.26) |
With this we could realize mappings like saying that agent 'a' is mapped to the position '(x,y)' . And because in the world every position of an environment is associated with at least one property the agent can be connected to the property of the position or to the properties of many positions, if its perception does allow this.
While the mappings ainp() and aout() are meta functions working 'above' the environment and the agent, we have to assume some functions which are internal to the agent. For this we assume the following ones: decode() translates the perceived properties of the environment into some 'agent relevant values' and the function eval() evaluates the internal states with regard to rewards and recommended actions.
(4.27) | |||
(4.28) |
The random agent will not make use of all these additional informations. It will only generate by chance a proposal for a new action independent of the actual perception and independent of any other internal states. Thus we have the following sequence of mappings happening:
Possible measures of performance with such an agent could be:
With these measures it would be possible to establish some ordering between agents: those which need less actions or have a larger amount of reward or have less errors.
Gerd Doeben-Henisch 2012-03-31