I-ALGO - ALGORITHMS AND DATASTRUCTURES

Introduction
The problem as use case
A theoretical framework; first steps
Balanced binary search trees
Splay trees
AVL trees
Red-black trees
Bayesian networks
Deque
Stack
Queue
Priority queue
Linked list
Sets and multisets
Maps and multimaps
Exercises

I-ALGO SS03 - ALGORITHMS AND DATASTRUCTURES - Lecture with Exercises
Overview Sorting and Searching; special Data Structures: Deque, Stack, Queue, Priority Queue, Set, Map

Attention: The text is not a complete representation of the oral lecture!

AUTHOR: Gerd Döben-Henisch
DATE OF FIRST GENERATION: April-10, 2003
DATE OF LAST CHANGE: June-22, 2003
EMAIL: Gerd Döben-Henisch

1. Introduction

Because there are only 7 lectures left for a subject which could easily fill 50 or more lectures, we have to select some topics of special interest. In this lecture I will give an overview of the topics planned for the next lectures, so that everybody can organize their reading and programming experiments in better agreement with the lectures. Furthermore, we will intensify the exercises, because only there do we have a chance for practical applications.

2. The problem as use case

Before we decide which topics we want to study in more detail, we have to look around and attempt some evaluation of the field. The following (surely idealized) diagram of the field with its known problems shall serve as a guide.

[Figure: use case diagram. Caption: Problems to be solved]

At the center we find the algorithms with data structures and the operations optimized for handling these data structures. Every algorithm interacts with a user, who represents some kind of environment. The environment provides the input to the algorithm, and the algorithm responds. The time between the input event at t and the response event at t+c is the empirically observable response time c of the algorithm; it can be measured and compared with the computed theoretical response time c' of the algorithm.
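The empirical response time c described above can be measured directly in a program. The following is a minimal sketch, assuming std::chrono::steady_clock as the clock and a std::set as the data structure under test; the helper names measure_response_ns and time_one_search are hypothetical and chosen only for illustration.

```cpp
#include <chrono>
#include <set>

// Hypothetical helper: runs one operation and returns the elapsed time c
// between the input event (t) and the response event (t + c), in nanoseconds.
template <typename Operation>
long long measure_response_ns(Operation op) {
    const auto t_input = std::chrono::steady_clock::now();
    op();  // the algorithm computes its response
    const auto t_response = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               t_response - t_input).count();
}

// Hypothetical example: time a single search among n keys in a std::set.
long long time_one_search(int n, int key) {
    std::set<int> data;
    for (int i = 0; i < n; ++i) data.insert(i);
    bool found = false;
    const long long c = measure_response_ns([&] { found = data.count(key) > 0; });
    (void)found;  // the answer itself is not needed here, only the time
    return c;
}
```

A single measurement is noisy; in practice one would repeat the operation many times and compare the average with the theoretical estimate c'.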

The input d consists of commands like search(x,n), insert(x,n), delete(x,n) or others. The frequencies f(d) of the individual input items may or may not differ.

The data of the algorithm may or may not be given completely at the start; if not, the algorithm has to build up the data structure during runtime. The data may or may not have the shape of trees; trees may be binary or not, balanced or not, with or without keys, and their operations may or may not support concurrent use, etc.

Some algorithms presuppose that the data are given in a certain order; others do not and establish the needed order at run time. The ordering can be complete, or only partial with an adaptive extension as needed.

From this simplified picture of the scenario we want to select some subareas for the next 6 lectures.

I assume the following requirements to be the most interesting for further investigation:

The frequency of the input events is usually not evenly distributed.

The algorithm has at the start time no or nearly no data at hand; the data have to be collected from the input stream.

The data have to be sorted during runtime.

As the main data structure we are assuming tree-like structures.

As the main types of action-response pairs we assume

search - answer

insert - yes

delete - yes

stimulus - response yes/no with degree

forecast - yes/no with degree

From these assumptions we infer the following topics as interesting:

We will have a look at Splay Trees, because these assume that the frequencies of their input are not evenly distributed and they have to build up their data structures at run time.

We will also have a look at AVL Trees, because these are the prototypes (also historically) of balanced binary search trees. If time allows, we will also analyze Red-Black Trees as a more advanced form of balanced binary search trees.

Bayesian Networks are a practically very important form of tree-like structures (more precisely, directed acyclic graphs). They represent data in a way which allows asking for the probability of certain states in a described environment or for the probability of some outcome in the future.

As an introduction to these more advanced topics we will examine and practically use basic data types of the C++ Standard Library such as Deque, Stack, Queue, Priority Queue, Set and Map. Set and Map are especially interesting because these data structures are implemented as balanced binary search trees and, according to [JOSUTTIS 1999:176], usually as Red-Black Trees. Thus we can use these data structures, in some sense, as reference implementations for testing our own software.
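The idea of using std::set as a reference implementation can be sketched as follows. This is only an illustration under assumptions: NaiveSet is a hypothetical stand-in for a self-written container, and the interface insert(int) / contains(int) as well as the name agrees_with_std_set are invented for this example.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical stand-in for "our own software": a deliberately simple set
// based on an unsorted vector. In the course this would be a self-written tree.
class NaiveSet {
public:
    void insert(int key) {
        if (!contains(key)) items_.push_back(key);
    }
    bool contains(int key) const {
        return std::find(items_.begin(), items_.end(), key) != items_.end();
    }
private:
    std::vector<int> items_;
};

// Feeds the same keys to the candidate and to std::set, then checks that both
// give the same answer to every search in the range [lo, hi].
template <typename Candidate>
bool agrees_with_std_set(Candidate& candidate, const std::vector<int>& keys,
                         int lo, int hi) {
    std::set<int> reference;
    for (int k : keys) {
        candidate.insert(k);
        reference.insert(k);
    }
    for (int k = lo; k <= hi; ++k) {
        if (candidate.contains(k) != (reference.count(k) > 0)) return false;
    }
    return true;
}
```

Any own tree offering the same two member functions could be passed in place of NaiveSet.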

3. A theoretical framework; first steps

From mathematics one knows that mathematical objects are presented as (algebraic) structures like < A, + >, representing a set A with an operation '+' on A. For algorithms such representations are not common, but we will use them here. Starting with rather abstract ideas, we will try during the course to apply the general concepts more and more to the concrete cases we encounter.

As the basic setting we assume an algorithm with input and output and a system function which responds to the input through the output:

ALGO(a) iff a = << I,O,EXPR,T,D>, f_D >

with

I := input set with I ⊆ EXPR x Dⁿ

EXPR := set of expressions

D := set of data items

T := set of timepoints

O := output set

f_D: I x T ---> O x T (:= System function)

Thus an input element could be an item like <search,x>, given at time t, forcing the system to respond with f_D(<search,x>, t) = (<yes>, t+c), where c is the response time.
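To make the abstract structure a little more tangible, the system function f_D: I x T ---> O x T can be sketched in C++. All type and member names below (Command, Input, Output, Time, System) are hypothetical, the data set D is modelled with a std::set, and the constant response time c = 1 is purely an assumption for illustration.

```cpp
#include <set>
#include <utility>

enum class Command { Search, Insert, Delete };  // a tiny set EXPR (assumed)

struct Input  { Command expr; int x; };         // element of I
struct Output { bool yes; };                    // element of O
using Time = long;                              // element of T

// A system with data D and a system function f_D: I x T -> O x T.
class System {
public:
    std::pair<Output, Time> f(const Input& in, Time t) {
        const Time c = 1;  // assumed constant response time, illustration only
        bool yes = false;
        switch (in.expr) {
            case Command::Search: yes = data_.count(in.x) > 0;     break;
            case Command::Insert: yes = data_.insert(in.x).second; break;
            case Command::Delete: yes = data_.erase(in.x) > 0;     break;
        }
        return {Output{yes}, t + c};
    }
private:
    std::set<int> data_;  // D, empty at the start
};
```

Calling f with <search,x> at time t then returns the pair of the answer and the response time point t + c.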

During the following lectures we have to elaborate this concept step by step.

4. Balanced binary search trees

In the next lecture we will show what trees are compared to general graphs, and which special properties turn a tree into a binary tree, then into a binary search tree, and finally into a balanced binary search tree.

We will motivate these decisions and show implementations of basic tree structures and basic tree operations.
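As a first impression of such a basic tree structure, here is a minimal sketch of an unbalanced binary search tree, assuming integer keys and C++14. The names Node, bst_insert, bst_search and bst_height are hypothetical. The sketch also hints at why balancing matters: inserting keys in sorted order produces a tree as tall as a list.

```cpp
#include <algorithm>
#include <memory>

struct Node {
    int key;
    std::unique_ptr<Node> left;   // all keys smaller than key
    std::unique_ptr<Node> right;  // all keys larger than key
    explicit Node(int k) : key(k) {}
};

// Inserts key into the tree; returns false if the key was already present.
bool bst_insert(std::unique_ptr<Node>& root, int key) {
    if (!root) {
        root = std::make_unique<Node>(key);
        return true;
    }
    if (key < root->key) return bst_insert(root->left, key);
    if (key > root->key) return bst_insert(root->right, key);
    return false;
}

// Follows one path from the root downwards, guided by the key order.
bool bst_search(const Node* node, int key) {
    while (node) {
        if (key == node->key) return true;
        node = (key < node->key) ? node->left.get() : node->right.get();
    }
    return false;
}

// Number of nodes on the longest path from this node down to a leaf.
int bst_height(const Node* node) {
    if (!node) return 0;
    return 1 + std::max(bst_height(node->left.get()),
                        bst_height(node->right.get()));
}
```

Inserting 1, 2, 3, 4, 5 in this order yields height 5, while the order 3, 1, 4, 2, 5 yields height 3 for the same keys.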

5. Splay trees

Algorithms for splay trees will be our first example of a special tree type. Assuming unequal frequency distributions in the input and having no data structures at the beginning, splay trees build their data structures at runtime and try to optimize their response behaviour by moving the more frequently used items to the top of the tree. It has to be shown how one can implement operations on the tree which keep the overall structure of the tree approximately 'balanced'. Furthermore, one has to compute theoretical estimates of the response behaviour. Additionally, we have to do some empirical measurements to check the theoretical values.
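The idea of moving an accessed item to the top can be sketched with a recursive splay operation. This is one common textbook formulation, not necessarily the variant treated in the lecture; it assumes integer keys and raw pointers, and the names Node, splay, splay_insert, inorder and destroy are hypothetical.

```cpp
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

Node* rotate_right(Node* x) {  // left child of x becomes the subtree root
    Node* y = x->left;
    x->left = y->right;
    y->right = x;
    return y;
}

Node* rotate_left(Node* x) {   // right child of x becomes the subtree root
    Node* y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}

// Moves the node with the given key (or the last node on its search path)
// to the root by a sequence of rotations and returns the new root.
Node* splay(Node* root, int key) {
    if (!root || root->key == key) return root;
    if (key < root->key) {
        if (!root->left) return root;
        if (key < root->left->key) {          // left-left case
            root->left->left = splay(root->left->left, key);
            root = rotate_right(root);
        } else if (key > root->left->key) {   // left-right case
            root->left->right = splay(root->left->right, key);
            if (root->left->right) root->left = rotate_left(root->left);
        }
        return root->left ? rotate_right(root) : root;
    }
    if (!root->right) return root;
    if (key > root->right->key) {             // right-right case
        root->right->right = splay(root->right->right, key);
        root = rotate_left(root);
    } else if (key < root->right->key) {      // right-left case
        root->right->left = splay(root->right->left, key);
        if (root->right->left) root->right = rotate_right(root->right);
    }
    return root->right ? rotate_left(root) : root;
}

// Builds the tree at run time: splay first, then put the new key on top.
Node* splay_insert(Node* root, int key) {
    if (!root) return new Node(key);
    root = splay(root, key);
    if (root->key == key) return root;        // key already present
    Node* n = new Node(key);
    if (key < root->key) {
        n->right = root;
        n->left = root->left;
        root->left = nullptr;
    } else {
        n->left = root;
        n->right = root->right;
        root->right = nullptr;
    }
    return n;
}

// Collects the keys in sorted order; rotations must never change this order.
void inorder(const Node* n, std::vector<int>& out) {
    if (!n) return;
    inorder(n->left, out);
    out.push_back(n->key);
    inorder(n->right, out);
}

void destroy(Node* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

After every search one would call splay with the searched key, so that frequently requested keys stay close to the root.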

6. AVL trees

AVL trees (named after the authors of the algorithm, Adelson-Velskij and Landis) were created in 1962 as the historically first example of a balanced binary search tree. The idea is to keep the tree in balance, where balance is defined by the lengths of the different paths of the tree: at every node n, the heights of the two subtrees of n must not differ by more than 1. To indicate the current situation, every node in the tree carries a label with the balance factor p in {-1, 0, +1}.

Because inserting or removing can destroy the balance, the algorithm has to provide operations to restore it. These are the so-called rotation operations, which change the shape of the tree with regard to the lengths of the individual paths but leave the order of the keys untouched.
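The interplay of balance factor and rotations can be sketched as follows. This is a minimal illustration, assuming integer keys, a stored subtree height per node (from which the balance factor is derived) and raw pointers; the names Node, height_of, balance_factor, rebalance, avl_insert and destroy are hypothetical.

```cpp
#include <algorithm>

struct Node {
    int key;
    int height = 1;  // height of the subtree rooted at this node
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

int height_of(const Node* n) { return n ? n->height : 0; }

// Balance factor p = height(right) - height(left); AVL requires p in {-1,0,+1}.
int balance_factor(const Node* n) {
    return n ? height_of(n->right) - height_of(n->left) : 0;
}

void update_height(Node* n) {
    n->height = 1 + std::max(height_of(n->left), height_of(n->right));
}

// Rotations change path lengths but keep the in-order sequence of the keys.
Node* rotate_right(Node* y) {
    Node* x = y->left;
    y->left = x->right;
    x->right = y;
    update_height(y);
    update_height(x);
    return x;
}

Node* rotate_left(Node* x) {
    Node* y = x->right;
    x->right = y->left;
    y->left = x;
    update_height(x);
    update_height(y);
    return y;
}

// Restores the balance at node n after an insertion below it.
Node* rebalance(Node* n) {
    update_height(n);
    if (balance_factor(n) < -1) {                      // left side too tall
        if (balance_factor(n->left) > 0) n->left = rotate_left(n->left);
        return rotate_right(n);
    }
    if (balance_factor(n) > 1) {                       // right side too tall
        if (balance_factor(n->right) < 0) n->right = rotate_right(n->right);
        return rotate_left(n);
    }
    return n;
}

Node* avl_insert(Node* n, int key) {
    if (!n) return new Node(key);
    if (key < n->key) n->left = avl_insert(n->left, key);
    else if (key > n->key) n->right = avl_insert(n->right, key);
    return rebalance(n);  // duplicates are ignored, heights are refreshed
}

void destroy(Node* n) {
    if (!n) return;
    destroy(n->left);
    destroy(n->right);
    delete n;
}
```

Inserting the keys 1 to 7 in sorted order, which would degenerate an unbalanced search tree into a list, here yields a tree of height 3 with the key 4 at the root.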

7. Red-black trees

Red-black trees are a more 'modern' form of balanced binary search tree. Compared to AVL trees they are optimized in that they incur lower costs when reordering a tree which has lost its balance. Instead of a balance factor p they use colouring information: every node has to be coloured either red or black. The way the nodes are coloured reflects a certain order of the nodes and, implicitly, the information about the existing balance. Destroying the balance causes the colouring to become 'irregular'. Reordering here means restoring a regular colouring, by recolouring nodes and, where needed, by rotations.
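What a 'regular' colouring means can be made concrete with a small checker. The sketch below assumes the usual textbook rules (the root is black, a red node has no red child, and every path from a node down to an empty subtree contains the same number of black nodes); it does not check the key order, and the names Color, Node, black_height and is_red_black are hypothetical.

```cpp
enum class Color { Red, Black };

struct Node {
    int key;
    Color color;
    Node* left = nullptr;
    Node* right = nullptr;
    Node(int k, Color c) : key(k), color(c) {}
};

// Returns the black height of the subtree if its colouring is regular,
// or -1 if a rule is violated. Empty subtrees count as black.
int black_height(const Node* n) {
    if (!n) return 1;
    // Rule: a red node must not have a red child.
    if (n->color == Color::Red) {
        if ((n->left && n->left->color == Color::Red) ||
            (n->right && n->right->color == Color::Red)) {
            return -1;
        }
    }
    const int lh = black_height(n->left);
    const int rh = black_height(n->right);
    // Rule: both subtrees must contain the same number of black nodes.
    if (lh < 0 || rh < 0 || lh != rh) return -1;
    return lh + (n->color == Color::Black ? 1 : 0);
}

// Checks the colouring rules only; the search-tree order is not verified.
bool is_red_black(const Node* root) {
    return (!root || root->color == Color::Black) && black_height(root) > 0;
}
```

Such a checker is useful for testing an own implementation after every insert or delete.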

In recent years it has been shown that many different types of balanced binary search trees can be subsumed as subcases under the concept of red-black trees. For this reason, and because red-black trees are rather efficient, they have gained good acceptance in the community (cf. the case mentioned above, that the data structures 'set' and 'map' of the C++ Standard Library are usually realized as red-black trees).

8. Bayesian networks

Another interesting kind of algorithm is based on Bayesian networks. Instead of binary trees they use directed acyclic graphs (DAGs) to represent items and their relationships. The nodes are variables which have certain states and probabilities, and the edges represent observed or assumed dependencies between the variables. Bayesian networks can be used to compute the likelihood that some event which depends on others will happen in the future. Because the whole topic is rather large, we can only present an introductory example to give a glimpse of the power and beauty of this concept.
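The smallest possible Bayesian network has two nodes and one edge, cause ---> effect. The following sketch shows the two typical questions, forecasting the effect and inferring the cause from an observed effect (Bayes' rule). The struct TwoNodeNetwork and its member names are hypothetical, and any probabilities fed into it are invented example values, not data from the lecture.

```cpp
// Hypothetical two-node network: one edge from a cause C to an effect E.
struct TwoNodeNetwork {
    double p_cause;               // P(C = true)
    double p_effect_given_cause;  // P(E = true | C = true)
    double p_effect_given_not;    // P(E = true | C = false)

    // Forecast: P(E = true), summing over both states of the cause.
    double p_effect() const {
        return p_cause * p_effect_given_cause +
               (1.0 - p_cause) * p_effect_given_not;
    }

    // Diagnosis with Bayes' rule: P(C = true | E = true).
    double p_cause_given_effect() const {
        return p_cause * p_effect_given_cause / p_effect();
    }
};
```

With the invented values P(C) = 0.2, P(E|C) = 0.9 and P(E|not C) = 0.1 one gets P(E) = 0.26 and P(C|E) = 0.18 / 0.26, roughly 0.69. Observing the effect thus raises the belief in the cause from 0.2 to about 0.69.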

9. Deque

The data structure deque of the C++ STL is like a vector, but you can efficiently add or remove elements at the front as well as at the end of the structure (with a vector this is efficient only at the end). By default the deque is the basis of the container adaptors stack and queue; the priority queue is by default built on a vector. We will use the deque in the next exercise.
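A short sketch of the operations mentioned above, using only standard containers; the function names deque_demo, stack_top, queue_front and priority_queue_top are hypothetical and serve only to keep the example small.

```cpp
#include <deque>
#include <queue>
#include <stack>
#include <vector>

// Adds at both ends, removes at the end, and returns the remaining elements.
std::vector<int> deque_demo() {
    std::deque<int> d;
    d.push_back(2);    // d: 2
    d.push_back(3);    // d: 2 3
    d.push_front(1);   // d: 1 2 3
    d.pop_back();      // d: 1 2
    return std::vector<int>(d.begin(), d.end());
}

// stack: last in, first out (built on std::deque by default).
int stack_top() {
    std::stack<int> s;
    s.push(1);
    s.push(2);
    s.push(3);
    return s.top();    // the element pushed last
}

// queue: first in, first out (built on std::deque by default).
int queue_front() {
    std::queue<int> q;
    q.push(1);
    q.push(2);
    q.push(3);
    return q.front();  // the element pushed first
}

// priority_queue: the largest element is always on top
// (built on std::vector by default).
int priority_queue_top() {
    std::priority_queue<int> pq;
    pq.push(2);
    pq.push(7);
    pq.push(5);
    return pq.top();
}
```

The underlying container of each adaptor can be changed through its second template parameter, for example std::stack<int, std::vector<int>>.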