Extracting key-substring-group features for text classification

Zhang, Dell and Lee, W.S. (2006) Extracting key-substring-group features for text classification. In: Eliassi-Rad, T., Ungar, L.H., Craven, M. and Gunopulos, D. (eds.) Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 474-483. ISBN 978-1-59593-339-3.

Full text not available from this repository.

Official URL: https://doi.org/10.1145/1150402.1150455

Abstract

In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostly focused on different variants of generative Markov chain models. Although discriminative machine learning methods like Support Vector Machine (SVM) have been quite successful in text classification with word features, it is neither effective nor efficient to apply them straightforwardly taking all substrings in the corpus as features. In this paper, we propose to partition all substrings into statistical equivalence groups, and then pick those groups which are important (in the statistical sense) as features (named key-substring-group features) for text classification. In particular, we propose a suffix tree based algorithm that can extract such features in linear time (with respect to the total number of characters in the corpus). Our experiments on English, Chinese and Greek datasets show that SVM with key-substring-group features can achieve outstanding performance for various text classification tasks.
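To make the grouping idea concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes that two substrings are "statistically equivalent" when they have identical per-document occurrence counts, and it treats a group as "important" when it occurs in at least a given number of documents. Both criteria are assumptions made here for illustration. It also enumerates substrings naively in quadratic time instead of using the linear-time suffix tree construction the abstract describes. The names `substring_groups`, `key_groups`, `max_len` and `min_docs` are hypothetical.

```python
from collections import Counter, defaultdict


def substring_groups(docs, max_len=None):
    """Group substrings that share the same per-document count vector.

    Naive O(n^2) enumeration per document; a stand-in for the suffix
    tree approach, used here only to illustrate the grouping.
    """
    counts = defaultdict(Counter)  # substring -> {document index: occurrences}
    for d, text in enumerate(docs):
        n = len(text)
        for i in range(n):
            end = n if max_len is None else min(n, i + max_len)
            for j in range(i + 1, end + 1):
                counts[text[i:j]][d] += 1

    groups = defaultdict(list)  # count vector -> substrings with that vector
    for s, c in counts.items():
        key = tuple(c.get(d, 0) for d in range(len(docs)))
        groups[key].append(s)
    return groups


def key_groups(groups, min_docs=2):
    """Keep groups present in at least min_docs documents (assumed criterion)."""
    return {
        key: sorted(members)
        for key, members in groups.items()
        if sum(1 for x in key if x > 0) >= min_docs
    }


if __name__ == "__main__":
    groups = substring_groups(["abab", "ab"])
    for key, members in key_groups(groups).items():
        print(key, members)
```

Each surviving group can then serve as a single feature, with its count vector supplying the per-document values, so a classifier such as an SVM sees one column per group rather than one per substring.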

Metadata

Item Type:	Book Section
School:	Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
Depositing User:	Sarah Hall
Date Deposited:	15 Nov 2021 14:09
Last Modified:	09 Aug 2023 12:52
URI:	https://eprints.bbk.ac.uk/id/eprint/46725
