Zhang, Dell and Chen, X. and Lee, W.S. (2005) Text classification with kernels on the multinomial manifold. In: Baeza-Yates, R.A. and Ziviani, N. and Marchionini, G. and Moffat, A. and Tait, J. (eds.) SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, pp. 266-273. ISBN 9781595930347.
Abstract
Support Vector Machines (SVMs) have been very successful in text classification. However, the standard kernels commonly used in SVMs ignore the intrinsic geometric structure of text data. It is natural to assume that documents lie on the multinomial manifold, which is the simplex of multinomial models furnished with the Riemannian structure induced by the Fisher information metric. We prove that the Negative Geodesic Distance (NGD) on the multinomial manifold is conditionally positive definite (cpd), and can therefore be used as a kernel in SVMs. Experiments show the NGD kernel on the multinomial manifold to be effective for text classification, significantly outperforming standard kernels on the ambient Euclidean space.
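To make the NGD kernel concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard closed form for the geodesic distance under the Fisher information metric on the simplex, d(p, q) = 2 arccos(Σᵢ √(pᵢ qᵢ)), which the abstract itself does not state. The function names `to_simplex` and `ngd_kernel`, and the additive smoothing used to map term counts onto the simplex, are illustrative assumptions.

```python
import numpy as np


def to_simplex(counts, smoothing=1.0):
    """Map term-count vectors (one per row) to points on the simplex.

    Additive smoothing is an assumption of this sketch; other embeddings
    of documents into the multinomial manifold are possible.
    """
    counts = np.asarray(counts, dtype=float) + smoothing
    return counts / counts.sum(axis=-1, keepdims=True)


def ngd_kernel(P, Q):
    """Negative geodesic distance between rows of P and rows of Q.

    Assumes each row is a probability vector. Uses the closed form
    d(p, q) = 2 * arccos(sum_i sqrt(p_i * q_i)).
    """
    # Inner products of the square-root-transformed vectors.
    inner = np.sqrt(P) @ np.sqrt(Q).T
    # Guard against rounding pushing values slightly outside [0, 1].
    inner = np.clip(inner, 0.0, 1.0)
    return -2.0 * np.arccos(inner)


# Hypothetical toy corpus: three documents over a four-term vocabulary.
docs = to_simplex([[3, 0, 1, 0],
                   [2, 1, 0, 0],
                   [0, 0, 4, 5]])
gram = ngd_kernel(docs, docs)
```

The resulting Gram matrix has zeros on the diagonal (up to rounding) and non-positive entries elsewhere. It could in principle be passed to an SVM implementation that accepts precomputed kernel matrices.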
Metadata
| Field | Value |
|---|---|
| Item Type | Book Section |
| School | Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences |
| Depositing User | Sarah Hall |
| Date Deposited | 15 Nov 2021 14:15 |
| Last Modified | 09 Aug 2023 12:52 |
| URI | https://eprints.bbk.ac.uk/id/eprint/46726 |