Borges, J. and Levene, Mark (2000) A fine grained heuristic to capture web navigation patterns. SIGKDD Explorations 2 (1), pp. 40-50.

Abstract
In previous work we have proposed a statistical model to capture users' behaviour when browsing the web. The user navigation information obtained from web logs is modelled as a hypertext probabilistic grammar (HPG), which is within the class of regular probabilistic grammars. The set of highest-probability strings generated by the grammar corresponds to the users' preferred navigation trails. We have previously conducted experiments with a Breadth-First Search (BFS) algorithm to perform the exhaustive computation of all the strings with probability above a specified cutpoint, which we call the rules. Although the algorithm's running time varies linearly with the number of grammar states, it has the drawbacks of returning a large number of rules when the cutpoint is small and a small set of very short rules when the cutpoint is high. In this work, we present a new heuristic that implements an iterative deepening search wherein the set of rules is incrementally augmented by first exploring trails with high probability. A stopping parameter is provided which measures the distance between the current rule set and its corresponding maximal set obtained by the BFS algorithm. When the stopping parameter takes the value zero the heuristic corresponds to the BFS algorithm, and as the parameter takes values closer to one the number of rules obtained decreases accordingly. Experiments were conducted with both real and synthetic data, and the results show that for a given cutpoint the number of rules induced increases smoothly as the stopping parameter decreases. Therefore, by setting the value of the stopping parameter the analyst can determine the number and quality of rules to be induced; the quality of a rule is measured by both its length and its probability.
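The abstract describes the BFS baseline only in words. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It treats the HPG as a first-order Markov chain over pages estimated from session logs. Start probabilities use only the first page of each session (the original model may weight initial probabilities differently). A trail's probability is the product of its start and transition probabilities. Rules are the trails above the cutpoint that cannot be extended without falling to or below it. The names `build_hpg`, `bfs_rules` and `END` are hypothetical, and the paper's iterative deepening heuristic with its stopping parameter is not shown.

```python
from collections import defaultdict, deque

END = None  # hypothetical marker for the grammar's final state (session end)


def build_hpg(sessions):
    """Estimate start and transition probabilities from sessions (lists of pages).

    Assumption: start probabilities count only session first pages, and each
    page's outgoing transitions are normalised including the link to END.
    """
    starts = defaultdict(int)
    links = defaultdict(lambda: defaultdict(int))
    for s in sessions:
        if not s:
            continue
        starts[s[0]] += 1
        for a, b in zip(s, s[1:]):
            links[a][b] += 1
        links[s[-1]][END] += 1  # the session ends after its last page
    n = sum(starts.values())
    start_p = {page: c / n for page, c in starts.items()}
    trans_p = {}
    for a, outs in links.items():
        total = sum(outs.values())
        trans_p[a] = {b: c / total for b, c in outs.items()}
    return start_p, trans_p


def bfs_rules(start_p, trans_p, cutpoint):
    """Breadth-first expansion of trails whose probability exceeds the cutpoint.

    A trail with no extension above the cutpoint is returned as a rule,
    together with its probability.
    """
    assert 0 < cutpoint <= 1, "a positive cutpoint guarantees termination"
    rules = []
    queue = deque(((p,), prob) for p, prob in start_p.items() if prob > cutpoint)
    while queue:
        trail, prob = queue.popleft()
        extended = False
        for nxt, tp in trans_p.get(trail[-1], {}).items():
            if nxt is END:
                continue  # reaching the final state does not lengthen the trail
            if prob * tp > cutpoint:
                queue.append((trail + (nxt,), prob * tp))
                extended = True
        if not extended:
            rules.append((trail, prob))
    return rules


if __name__ == "__main__":
    logs = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"], ["A", "C"]]
    start_p, trans_p = build_hpg(logs)
    for cut in (0.3, 0.2):
        print(cut, bfs_rules(start_p, trans_p, cut))
```

On these toy sessions a cutpoint of 0.3 yields the single rule A, B, C with probability about 1/3, while lowering it to 0.2 yields three rules (B; A, C; A, B, C). This mirrors the trade-off the abstract attributes to the cutpoint alone.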
Item Type:  Article 

Additional Information:  The second author was at University College London when this paper was published. He is currently Professor of Computer Science at Birkbeck College.
Keyword(s) / Subject(s):  Usage mining, navigation patterns, hypertext probabilistic grammars 
School or Research Centre:  Birkbeck Schools and Research Centres > School of Business, Economics & Informatics > Computer Science and Information Systems 
Depositing User:  Administrator 
Date Deposited:  26 Sep 2005 
Last Modified:  30 Aug 2013 09:15 
URI:  http://eprints.bbk.ac.uk/id/eprint/232 