Zhang, Dell and Mao, R. (2008) Extracting community structure features for hypertext classification. In: Pichappan, P. and Abraham, A. (eds.) 2008 Third International Conference on Digital Information Management. Piscataway, U.S.: IEEE, pp. 436-441. ISBN 9781424429165.
Abstract
Standard text classification techniques assume that all documents are independent and identically distributed (i.i.d.). However, hypertext documents such as Web pages are interconnected by links. How to use such links as additional evidence to enhance automatic classification of hypertext documents is a non-trivial problem. We think a collection of interconnected hypertext documents can be considered as a complex network, and the underlying community structure of such a document network contains valuable clues about the correct classification of documents. This paper introduces a new technique, modularity Eigenmap, that can effectively extract community structure features from the document network, which is either induced from document link information only or constructed by combining document content and document link information. A number of experiments on real-world benchmark datasets show that the proposed approach leads to excellent classification performance in comparison with state-of-the-art methods.
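The abstract does not spell out how modularity Eigenmap computes its features, so the following is only a minimal sketch of one plausible reading: it assumes the features are the leading eigenvectors of Newman's standard modularity matrix B = A - k kᵀ / (2m) for an undirected link graph. The function name `modularity_features`, the symmetrization of links, and the parameter `num_features` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def modularity_features(adjacency, num_features):
    """Return leading eigenvectors of the modularity matrix as node features.

    Assumption: links are treated as undirected and unweighted, and the
    modularity matrix is Newman's B = A - k k^T / (2m). The paper itself
    may construct the network or the matrix differently.
    """
    A = np.asarray(adjacency, dtype=float)
    # Treat a link in either direction as an undirected edge (assumption).
    A = np.maximum(A, A.T)
    degrees = A.sum(axis=1)      # k_i: degree of each document
    two_m = degrees.sum()        # 2m: twice the number of edges
    if two_m == 0:
        raise ValueError("the document network has no links")
    B = A - np.outer(degrees, degrees) / two_m
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(B)
    leading = np.argsort(eigenvalues)[::-1][:num_features]
    return eigenvectors[:, leading]


# Toy network: two triangles of documents (0-1-2 and 3-4-5) joined by one link.
links = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    links[i, j] = 1.0

features = modularity_features(links, num_features=2)
```

In this toy network the first feature column takes one sign on documents 0 to 2 and the opposite sign on documents 3 to 5, which is the kind of community signal a downstream classifier could use alongside, or instead of, content features.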
Metadata
Item Type: | Book Section |
---|---|
Additional Information: | Third IEEE International Conference on Digital Information Management (ICDIM), November 13-16, 2008, London, UK, Proceedings |
School: | Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences |
Research Centres and Institutes: | Birkbeck Knowledge Lab |
Depositing User: | Administrator |
Date Deposited: | 30 May 2013 09:00 |
Last Modified: | 09 Aug 2023 12:33 |
URI: | https://eprints.bbk.ac.uk/id/eprint/7080 |