Graph-Based Patterns in Web-Based Systems
Matthias Dehmer
TU Darmstadt
dehmer@tk.informatik.tu-darmstadt.de
Abstract
The application of data mining methods to web-based hypertext data is referred to as Web Mining. In view of the huge amount of information available online, Web Mining is an important and fruitful research field. It can be divided into three main categories: Web Structure Mining, Web Usage Mining and Web Content Mining. In the area of Web Structure Mining, we explore graph-based patterns by extracting web-based hypertext structures. In contrast, graph-based patterns in Web Usage Mining can be obtained by processing the data of large web logs. These graph-based patterns form a special class of graphs: node-labelled, hierarchical, directed graphs. On the basis of the method of Dehmer et al., which determines the similarity of such graphs by means of string alignments, we obtain a generic model for measuring graph similarity consistently. We use this kind of web-based graph matching for graph classification and graph clustering. In this sense, the paper sheds light on the task of automatically analyzing web genre data.
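To make the idea of alignment-based graph similarity concrete, the following is a minimal sketch and not the method of Dehmer et al. itself. It assumes that a hierarchical directed graph is given as an edge list with a known root, that each level of the hierarchy is encoded as the sequence of out-degrees of its nodes, and that a plain edit distance stands in for the string alignment. The function names (level_sequences, edit_distance, graph_similarity) and the normalization to the interval [0, 1] are hypothetical choices made for illustration only.

```python
def level_sequences(edges, root):
    """Traverse the graph level by level from the root and return, for each
    level, the list of out-degrees of the nodes on that level."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    levels = []
    seen = {root}
    frontier = [root]
    while frontier:
        levels.append([len(children.get(node, [])) for node in frontier])
        next_frontier = []
        for node in frontier:
            for child in children.get(node, []):
                if child not in seen:  # each node is assigned to one level only
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return levels


def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs, an assumption)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # match or substitution
        prev = cur
    return prev[-1]


def graph_similarity(edges_a, root_a, edges_b, root_b):
    """Average, over all levels, of a normalized alignment score in [0, 1]."""
    levels_a = level_sequences(edges_a, root_a)
    levels_b = level_sequences(edges_b, root_b)
    depth = max(len(levels_a), len(levels_b))
    total = 0.0
    for i in range(depth):
        seq_a = levels_a[i] if i < len(levels_a) else []
        seq_b = levels_b[i] if i < len(levels_b) else []
        longest = max(len(seq_a), len(seq_b))
        total += 1.0 - edit_distance(seq_a, seq_b) / longest
    return total / depth


# Two small, made-up site structures used only as an example.
site_a = [("home", "about"), ("home", "products"), ("products", "item1")]
site_b = [("home", "about"), ("home", "products")]
print(graph_similarity(site_a, "home", site_a, "home"))
print(graph_similarity(site_a, "home", site_b, "home"))
```

Comparing a graph with itself yields the maximal score, while the two example sites agree on the root level, partly agree on the second level and differ on the third. A pairwise score of this kind could then serve as input to a standard clustering or classification procedure.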