BIROn - Birkbeck Institutional Research Online

Extracting community structure features for hypertext classification

Zhang, Dell and Mao, R. (2008) Extracting community structure features for hypertext classification. In: Pichappan, P. and Abraham, A. (eds.) 2008 Third International Conference on Digital Information Management. Piscataway, U.S.: IEEE, pp. 436-441. ISBN 9781424429165.

Full text not available from this repository.

Abstract

Standard text classification techniques assume that all documents are independent and identically distributed (i.i.d.). However, hypertext documents such as Web pages are interconnected by links. How to exploit such links as extra evidence to improve the automatic classification of hypertext documents is a non-trivial problem. We propose that a collection of interconnected hypertext documents can be treated as a complex network, and that the underlying community structure of such a document network contains valuable clues about the correct classification of documents. This paper introduces a new technique, modularity Eigenmap, that can effectively extract community structure features from a document network that is either induced from document link information alone or constructed by combining document content with document link information. A number of experiments on real-world benchmark datasets show that the proposed approach leads to excellent classification performance in comparison with state-of-the-art methods.
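
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that community structure features are taken from the leading eigenvectors of Newman's modularity matrix B = A - k k^T / (2m), where A is the adjacency matrix of the document network, k the vector of node degrees, and m the number of links. The function name `modularity_eigenmap`, its parameters, the choice to treat links as undirected, and the toy graph are all hypothetical.

```python
import numpy as np


def modularity_eigenmap(adjacency, num_features):
    """Hypothetical helper: return the leading eigenvectors of the
    modularity matrix as per-document feature vectors (one row per node)."""
    A = np.asarray(adjacency, dtype=float)
    # Assumption: hyperlinks are treated as undirected edges.
    A = np.maximum(A, A.T)
    k = A.sum(axis=1)          # node degrees
    two_m = k.sum()            # twice the number of edges
    if two_m == 0:
        raise ValueError("the document network has no links")
    # Modularity matrix: observed links minus links expected by degree.
    B = A - np.outer(k, k) / two_m
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(B)
    order = np.argsort(eigenvalues)[::-1][:num_features]
    return eigenvectors[:, order]


# Toy network: two triangles of documents (0-1-2 and 3-4-5) joined by link 2-3.
A_example = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A_example[i, j] = A_example[j, i] = 1.0

features = modularity_eigenmap(A_example, num_features=1)
signs = np.sign(features[:, 0])
```

In this toy case the sign of the single extracted feature separates the two triangles. Under the same assumptions, such columns could be appended to content features (for example, term weights) before training any standard classifier.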

Metadata

Item Type: Book Section
Additional Information: Third IEEE International Conference on Digital Information Management (ICDIM), November 13-16, 2008, London, UK, Proceedings
School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
Research Centres and Institutes: Birkbeck Knowledge Lab
Depositing User: Administrator
Date Deposited: 30 May 2013 09:00
Last Modified: 09 Aug 2023 12:33
URI: https://eprints.bbk.ac.uk/id/eprint/7080

Statistics

Downloads (6 month trend): 0
Hits (6 month trend): 289

Additional statistics are available via IRStats2.
