Evolving Clustering Method (ECM)

The first method which will be introduced is the Evolving Clustering Method (ECM) as described in Kasabov (2002) [161]:146ff and Kasabov 2007 [151]:61ff. This method is quite satisfying for the task at hand but it will show up severe deficiencies -like other methods- in those cases which are most characteristic for our moving scenarios.

The main idea of the ECM method is to map data points into a finite set of categories $c_{i} \in Cat$ which are defined by a center coordinate $Cc_{i}$ with a radius $Ra \in \Re$ . While the center coordinate as well as the radius will be built up in an unsupervised manner some category threshold $D_{thr}$ is assumed as some apriori knowledge. This apriori threshold induces a clear limit to the growth of the categories. It must hold:

$\displaystyle 0 \leq$

$\displaystyle Ra$

$\displaystyle < D_{thr}$

(8.1)

From this it is clear that the selection of the category threshold $D_{thr}$ is decisive whether and how the ecm algorithm will identify categories (= 'clusters', 'regions', 'areas', 'patterns',...). The procedure of forming categories runs then as follows (cf. Kasabov 2002 [161]:146):

Step 1: Take the next example $x \in INPUT$ which has not yet been processed and calculate the distances between this example and all already created categories $c \in Cat$ with $D = \mid\mid x - c\mid\mid$ interpreted as euclidean distance (cf. 8.2 ). If there is no category -as in the beginning- then take as first category $c_{1}$ with center $Cc_{1} = x$ and category radius $Ra_{1} = 0$ ). Else goto step 2.
Step 2: If there is any distance value equal to or less than at least one of the radii of the categories then the current example belongs to a category $c_{j}$ with the minimum distance $D_{xj} = min(\mid\mid x - c_{j}\mid\mid)$ . It holds that $D_{xj} \leq Ra_{j}$ . In this case, neither a new category is created, nor any existing category is updated; the algorithm returns to Step 1. Else: go to step 3.
Step 3: Find the category $c_{a}$ (with center $Cc_{a}$ and category radius $Ra_{a}$ ) from all existing category centers through calculating the values $S_{xc} = D_{xc} + Ra_{c}$ with the minimum value $S_{xa}$ .
Step 4: If $S_{xa}$ is greater than $2 \times D_{thr}$ , the example does not belong to any existing categories. A new category is created with $c_{new} = x$ and the algorithm returns to Step 1.
Step 5: If $S_{xa}$ is not greater than $2 \times D_{thr}$ the category $c_{a}$ is updated by moving its center $Cc_{a}$ and increasing the value of its radius $Ra_{a}$ . The updated radius $Ra_{a}$ is set to be equal to $\frac{S_{xa}}{2}$ and the new center $Cc_{a}$ is located at the point on the line connecting the and $Cc_{a}$ , and the distance from the new center $Cc_{a}$ to the point is equal to the new $Ra_{a}$ . The algorithm returns to Step 1.

In this algorithm the distance $D = \mid\mid x - c\mid\mid$ is understood as the Euclidean Distance defined as:

$\displaystyle \mid\mid x - y \mid\mid$

$\displaystyle =$

$\displaystyle [\frac{1}{n}\sum^{n}_{i=1}{\mid x_{i} - y_{i}\mid}^{2}]^{\frac{1}{2}}$

(8.2)

A user-defined scilab functiondoing this job is the following one:

//*******************************************************
// Computes the euclidean distance between two vectors x,y

function [d]=euclideanDistance(x,y)
  
  lx=length(x), ly=length(y)
  if lx <> ly then error('SIZE of VECTORS DO NOT MATCH'),
  else
    s=0
    for i=1:lx
    s=s+abs(x(i)-y(i))^2, 
  end
  d=sqrt(s/lx)
  end
  
endfunction
//***************************************************

Another often used method to define the similarity between two vectors v,w is the computation of the angle $\theta$ between these two vectors:

$\displaystyle \textbf{v}\cdot \textbf{w}$	$\displaystyle =$	$\displaystyle \mid\mid\textbf{v}\mid\mid \mid\mid\textbf{w}\mid\mid cos \theta$	(8.3)
$\displaystyle cos \theta$	$\displaystyle =$	$\displaystyle \frac{\textbf{v} \textbf{w}}{\mid\mid\textbf{v}\mid\mid\mid\mid\textbf{w}\mid\mid}$	(8.4)
$\displaystyle \theta$	$\displaystyle =$	$\displaystyle cos^{-1} \left(\frac{\textbf{v} \textbf{w}}{\mid\mid\textbf{v}\mid\mid\mid\mid\textbf{w}\mid\mid}\right)$	(8.5)

But as a comparison shows (cf. table 8.2.1) is the angle-method less detailed than the euclidean distance measure, because the angle-method can not distinguish between different distances within the 'same' direction:

Nr.	input x	input y	euclideanDistance()	cosvec()
1			0.7071068	$11.309932^{o}$
2			1.0	$0^{o}$
3			0.707106	$11.309932^{o}$
4			1.5811388	$8.1301024^{o}$
5			2.0	$0.0^{o}$
6			1.5811388	$8.1301024^{o}$
7			2.5495098	$6.3401917^{o}$
8			3.1622777	$11.30993^{o}$
9			3.0	$0.0^{o}$
10			3.5355339	$5.1944289^{o}$

In scilab one can apply the ecm algorithm with the following command: CAT=ecm(INP3). This activates the user-defined function ecm with the input argument (the normalized input data) and computes possible categories stored in . The category threshold is set to $\theta=0.1$ . After onle one pass (!) this algorithm gives three cluster centers which are located within the three objects (a description of the scilab implementation of the ecm algorithm is given in the mathematical appendix).

    0.2029629    0.3464794    
    0.6150446    0.6212005    
    0.8897657    0.2091188

**Figure 8.3:** Output data case 1 combined with input data; output black diamonds
$\includegraphics[width=4.0in]{case1_output.eps}$

Again one can use the plot-command in scilab to print the new data embedded into the old data:
plot2d(CAT(:,1), CAT(:,2), style=-4, rect=[-0.1,-0.1,1.1,1.1]) (cf. figure 8.3).

Within the ecm algorithm there is a special sub-algorithm linePartition() managing the adaptation of the old cluster center Cc to a new position following a line between the old cluster center $Cc_{old}$ and the new datapoint $INP_{i}$ . This algorithms works as follows:

//*******************************************************
// Computes a new point P(x,y) which intersects a line between 
// P2(x2,y2) and P1(x1,y2) in the proportion n:m where
// n = P2P and m = PP1.

function [P]=linePartition(P1,P2,n)
  
  D=euclideanDistance(P1,P2)
  m=D-n
  lm=m/n  //lm denotes lambda
  
  x= (P1(1)+lm*P2(1))/(1+lm)  //P1(1) = x1, P1(2) = y1,...
  y = (P1(2)+lm*P2(2))/(1+lm)

  P=[x y]
  
endfunction
**********************************************************

Gerd Doeben-Henisch 2012-03-31