Learning Algorithm: Questions

The case of the ANIMAT2 agent caused some discussions with the students. An ongoing argument started about how to systematize the problem. The following assumptions are taken as necessary to describe the problem sufficiently well:

  1. [GOOD/BAD]: Every evaluation needs a minimal set of values that are distinguished as good or bad, or as leading 'more' toward good or 'more' toward bad. These values are taken either as 'built in' or as 'externally given'.
  2. [IMPACT]: Every action a of a system can directly cause some of the internal states IS of the system either to increase their 'goodness values' or to increase their 'badness values', or it is neutral. Furthermore, the input INP of the system can cause some of the internal states of the system to increase their 'goodness values' or their 'badness values', or it is neutral.
  3. [FEEDBACK]: One has to assume a minimal value chain that starts with some action a and leads back to this action a, accompanied by value changes: $ a \rightarrow imp_{i}(a,IS) \in \{good, bad\} \;\&\; ENV(a,p) = p^{*} \rightarrow INP(p \ldots {}_{e}(i), imp_{e}(i^{*}) \in \{good, bad\} \rightarrow val(a,i^{*}) \in \{good, bad\} $. Such a chain is a purely logical construct; it says nothing about 'how' such a chain is 'implemented' in a concrete system.
  4. [DISSEMINATION]: If a minimal value chain exists, then it is possible to rate both the action a which had a direct impact on the internal states, producing some good or bad values, and the last action before the perceived change of certain conditions. In many (most?) cases an agent acting in an environment has to perform n-many actions that cause neither a perceivable change in the environment nor an increase in positive values. Disqualifying all these preceding actions could therefore, in the long run, lead to 'self destruction': either because the preparing actions are 'kicked out', or because the final action a occurs several times and causes the change that leads to a positive value only after m-many repetitions. Thus it has to be clarified how a 'good' solution can be found for disseminating the caused values among the involved actions.
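One possible answer to the dissemination question, borrowed from reinforcement learning rather than taken from the discussion above, is to spread the value of the final perceived change backward over all preceding actions with a discount factor. The minimal Python sketch below contrasts this with rating only the last action. The encoding of good/bad as +1/-1, the discount factor `gamma`, and the function and action names are hypothetical assumptions chosen for illustration.

```python
from typing import List

# Assumption: 'good' and 'bad' from the value chain are encoded as +1.0 and -1.0.
GOOD, BAD = 1.0, -1.0


def last_action_credit(n_actions: int, final_value: float) -> List[float]:
    """Rate only the last action before the perceived change; all others get 0."""
    credits = [0.0] * n_actions
    if n_actions > 0:
        credits[-1] = final_value
    return credits


def discounted_credit(n_actions: int, final_value: float, gamma: float = 0.5) -> List[float]:
    """Spread the final value back over all preceding actions (hypothetical scheme).

    The action k steps before the last one receives final_value * gamma**k,
    so preparing actions keep a smaller but non-zero rating.
    """
    return [final_value * gamma ** (n_actions - 1 - t) for t in range(n_actions)]


if __name__ == "__main__":
    episode = ["approach", "approach", "grasp", "eat"]  # hypothetical action names
    for label, credits in [
        ("last action only", last_action_credit(len(episode), GOOD)),
        ("discounted", discounted_credit(len(episode), GOOD)),
    ]:
        print(label, list(zip(episode, credits)))
```

With `gamma = 0.5` and four actions, rating only the last action leaves the three preparing actions at 0.0, whereas the discounted scheme gives them 0.125, 0.25 and 0.5 and the final action 1.0. Whether a fixed discount like this is a 'good' solution, in particular when the final action needs m-many repetitions, is exactly the open question of item 4.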



Gerd Doeben-Henisch 2012-03-31