Learning Systems

The general outline of a learning system, as described above, is as follows:

$\displaystyle LS \subseteq S$ (9.1)
$\displaystyle LS \cap RS = \emptyset$ (9.2)
$\displaystyle LS(s) \text{ iff } s = \langle I, O, IS, \varphi\rangle$ (9.3)
$\displaystyle I := \text{input strings of the system}$ (9.4)
$\displaystyle O := \text{output strings of the system}$ (9.5)
$\displaystyle IS := \text{internal states of the system}$ (9.6)
$\displaystyle \varphi : I \times IS \longmapsto IS \times O$ (9.7)
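The signature of $\varphi$ in (9.7) can be made concrete as a minimal interface. The following sketch is illustrative only; the names `Phi`, `run`, and `echo` are ours, not from the text:

```python
from typing import Callable, Tuple

# phi : I x IS -> IS x O  (cf. equation 9.7)
# One system step consumes an input string and the current internal
# state and yields the successor state together with an output string.
Phi = Callable[[str, dict], Tuple[dict, str]]

def run(phi: Phi, inputs: list, state: dict) -> list:
    """Drive the system: feed inputs one by one, threading the state."""
    outputs = []
    for i in inputs:
        state, o = phi(i, state)
        outputs.append(o)
    return outputs

# A trivial echo system whose internal state merely counts inputs.
def echo(i: str, state: dict) -> Tuple[dict, str]:
    return {"count": state.get("count", 0) + 1}, i

print(run(echo, ["a", "b"], {}))  # ['a', 'b']
```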

As we have already seen, there are several ways to define interesting subsets of learning systems. Here we will mention only three.

  1. A reactive system extended by a fitness function and an action memory of size $n$. This allows given rules to be grouped into collaborating rules that support a certain goal. We call this a weak learning system.
  2. A weak learning system further extended with the ability to change the rule set. We call this a level-1-learning system.
  3. A level-1-learning system extended by $n-1$ additional rule sets operating on each other we call a level-$n$-learning system.
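A weak learning system (item 1) can be sketched as follows. All names are hypothetical illustrations of the idea of a fitness function plus an action memory of size $n$, not a definitive implementation:

```python
from collections import deque

# Sketch of a weak learning system: fixed stimulus->response rules,
# each tagged with a fitness value, plus an action memory of size n
# over which a received reward is distributed.
class WeakLearner:
    def __init__(self, rules, n=3):
        self.rules = rules                      # {stimulus: response}
        self.fitness = {s: 0.0 for s in rules}  # fitness per rule
        self.memory = deque(maxlen=n)           # action memory of size n

    def respond(self, stimulus):
        self.memory.append(stimulus)
        return self.rules[stimulus]

    def reward(self, amount):
        # Sharing the reward among the recently fired rules groups
        # them implicitly into collaborators toward the rewarded goal.
        for s in self.memory:
            self.fitness[s] += amount / len(self.memory)

ws = WeakLearner({"s1": "r1", "s2": "r2"}, n=2)
ws.respond("s1"); ws.respond("s2")
ws.reward(1.0)
print(ws.fitness)  # {'s1': 0.5, 's2': 0.5}
```

Note that the rule set itself never changes here; that ability is what distinguishes a level-1-learning system (item 2).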

Depending on the operations allowed within level-$n$-learning systems, one can introduce many more distinctions.

The weakness of the classification introduced above follows from the dependency of these definitions on a certain architectural 'structure' of the learning systems (having rules or neurons labeled with fitness values).

A more general classification would be based solely on characteristic task sets like those used in psychological intelligence tests.

The judgment that a system 'can learn' is usually made from the 'outside' of the system, based on its observable behavior. The behavior rests on perceivable stimuli $ S$ from the environment and the possible responses $ R$ of the system into the environment. Relying on behavior makes the term 'learning' invariant with respect to the kind of observed system: plants, animals, robots, humans, or something else.

Such an approach to learning is rooted in the psychology of the beginning of the 20th century, connected to names like J. B. Watson (1878 - 1958) (cf. [408], [410]), Edwin R. Guthrie (1886 - 1959) (cf. [138], [140]), Clark L. Hull (1884 - 1952) (cf. [169], [170]), Edward C. Tolman (1886 - 1961) (cf. [380], [381]), and B. F. Skinner (1904 - 1990) (cf. [340], [345]). For a general overview see Bower and Hilgard (1981, German: 1983/4) [31].

If one has to rely on observable behavior, then the behavior can be understood as a sequence of stimulus-response pairs bound to certain points in time, like $ \{ (s_{1},r_{1}), (s_{2},r_{2}), ... \}$. This observable set defines a finite, incremental (and partial) empirical behavior function $ \phi_{OBS} = \{ (s_{1},r_{1}), (s_{2},r_{2}), ... \}$.

The transition from a fixed or static behavior function to a learning behavior function happens at the moment when we can identify a time point $ t^{*}$ after which we can observe at least one stimulus $ S$ that is now connected to a new response $ R'$, yielding a new pair $ (S,R')$. To classify a new stimulus-response pair as learned, one has to give a practical criterion which classifies this pair as 'beyond pure chance'. Thus the empirical behavior function can change and grow and thereby extend the space of possible behavior. Some of these pairs will be used more often than others.

To keep the property of uniqueness for the function $ \phi_{OBS}$, we have to demand that the appearance of a new pair $ (S,R')$ after the pair $ (S,R)$ will 'eliminate' future occurrences of $ (S,R)$: $ elimination: \{(S,R),(S,R')\} \longmapsto \{(S,R')\}$.
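This elimination operation can be pictured as a plain dictionary update: keeping $\phi_{OBS}$ a function means the newest observed response to a stimulus overwrites the old pair. A minimal sketch, with names of our choosing:

```python
# phi_OBS as a partial function from stimuli to responses.
# Observing (S, R') after (S, R) eliminates (S, R): a dictionary
# update realizes exactly this elimination rule, since each key
# (stimulus) can hold only one value (response) at a time.
phi_obs = {}

def observe(stimulus, response):
    phi_obs[stimulus] = response  # the later pair replaces the earlier one

observe("S", "R")
observe("S", "R'")   # (S, R) is eliminated
print(phi_obs)       # {'S': "R'"}
```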

The relationship between the empirical, incremental, and eliminating learning function $ \phi_{OBS}$ and a possible theoretical learning function $ \phi_{TH}$ can be summarized as follows:

$\displaystyle \phi_{OBS} \subseteq \phi_{TH}\upharpoonright(S \times R)$ (9.8)

If we did not allow the theoretical behavior function to modify its own states, as in $ \phi_{TH} : I_{s} \times IS \longmapsto O_{s}$, then such a fixed theoretical behavior function could not support a flexible empirical behavior function. Only the alternative format $ \phi_{TH} : I_{s} \times IS \longmapsto IS \times O_{s}$ can change sufficiently to allow the emergence of new stimulus-response pairs.
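The contrast between the two formats can be made concrete: a function that only reads its state must always emit the same response to a stimulus, whereas one that also rewrites the state can come to produce a new pair. A hypothetical toy sketch, with an arbitrary update rule purely for illustration:

```python
# Fixed format: phi(i, IS) -> O. The state is read-only, so the
# stimulus-response mapping can never change.
def phi_fixed(i, state):
    return state["rules"][i]

# Learning format: phi(i, IS) -> (IS, O). The state is rewritten,
# so the same stimulus may later be paired with a new response.
def phi_learning(i, state):
    o = state["rules"][i]
    # Toy update: after responding, swap in a revised rule (R -> R').
    new_rules = dict(state["rules"])
    new_rules[i] = o + "'"
    return {"rules": new_rules}, o

state = {"rules": {"S": "R"}}
state, out1 = phi_learning("S", state)
_, out2 = phi_learning("S", state)
print(out1, out2)  # R R'
```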

The general format of the system function $ \phi_{TH}$ does not by itself tell much about the working of the function. In the following we will look at two kinds of implementations of learning system functions: rule-based systems called learning classifier systems $ LCS$, and artificial neural networks $ ANN$.

Historically it was John H. Holland who in the years 1976, 1980, and 1986 published this kind of learning mechanism (cf. Holland (1975/1992) [161]:pp.171ff). A first textbook treatment of learning classifier systems was Goldberg (1989) [122]:pp.21f. A very inspiring paper about learning classifier systems is Wilson (1994) [422]. Another interesting explication can be found in Holland (1995) [162]:pp.41ff. Meanwhile there are many hundreds of papers and books on this topic, so this text can only be a first pointer into the subject. For the following discussion we will concentrate on the paper of Wilson (1994), because it seems to be a very good summary of Holland's key ideas, refined through the many discussions that followed Holland's publications.

The general idea is that the internal states $ IS$ are rules called classifiers. This set of rules can grow and can be changed. Thus the theoretical behavior function will change depending on this set of rules.
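In the classifier-system tradition a classifier's condition is a ternary string over $\{0, 1, \#\}$, where '#' is the "don't care" symbol matching either bit. A minimal matcher in that spirit (a sketch of the matching step only, not Wilson's full system):

```python
# A classifier pairs a condition over {0,1,#} with an action.
# '#' is the usual "don't care" symbol matching either bit.
def matches(condition, message):
    return all(c == "#" or c == m for c, m in zip(condition, message))

# Illustrative rule set (conditions and actions are made up).
classifiers = [("1#0", "A"), ("0##", "B"), ("11#", "C")]

def match_set(message):
    """Return the actions of all classifiers whose condition matches."""
    return [a for cond, a in classifiers if matches(cond, message)]

print(match_set("110"))  # ['A', 'C']
print(match_set("011"))  # ['B']
```

On each cycle the system would then select among the matching classifiers, which is where fitness values come into play.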

In the case of neurons, which we will discuss later, the theoretical behavior function is based on sets of neurons which can change, as can their connections and weights.
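In the neural case the changeable part of the internal state is essentially the weight vector. A minimal delta-rule sketch for a single linear neuron illustrates weights changing through repeated trials (the learning rate and inputs are arbitrary choices for the example):

```python
# In an ANN the internal state IS is (roughly) the weight vector;
# learning changes the weights rather than a symbolic rule set.
# Delta rule for one linear neuron: w_i <- w_i + lr * error * x_i.
def step(w, x, target, lr=0.1):
    y = sum(wi * xi for wi, xi in zip(w, x))   # neuron output
    error = target - y
    return [wi + lr * error * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(50):                 # repeated stimulus-response trials
    w = step(w, [1.0, 1.0], 1.0)
print([round(wi, 2) for wi in w])   # weights have moved toward the target
```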

Gerd Doeben-Henisch 2013-01-14