Experience of using SVM for the triage task in TREC 2004 genomics track
Zhang, Dell and Lee, W.S. (2004) Experience of using SVM for the triage task in TREC 2004 genomics track. In: Voorhees, E.M. and Buckland, L.P. (eds.) TREC 2004: Proceedings of the Thirteenth Text REtrieval Conference. NIST Special Publication 500. The National Institute of Standards and Technology.
Abstract
This paper reports our knowledge-ignorant machine learning approach to the triage task in the TREC 2004 Genomics Track, which is essentially a text categorization problem. We applied a Support Vector Machine (SVM) and found that information-gain-based feature selection is helpful. Although we achieved decent performance in leave-one-out cross-validation experiments, the evaluation result on the test data turned out to be surprisingly poor. Further experiments revealed a large gap between the training and test data distributions. It seems that more aggressive feature selection can partially alleviate the problems caused by this distribution change.
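The abstract names information-gain feature selection as the step that helped the SVM. A minimal sketch of that scoring is below, under stated assumptions: each document is reduced to a set of binary "term occurs / does not occur" features, information gain uses the textbook definition IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], and the top-k terms are kept. The function names, the toy data, and the tie-breaking rule are hypothetical illustrations, not the authors' code; the record does not give their exact weighting scheme or the value of k.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document' w.r.t. the class labels.

    docs: list of sets of tokens; labels: list of class labels (same length).
    """
    classes = sorted(set(labels))
    with_t = Counter(y for d, y in zip(docs, labels) if term in d)
    without_t = Counter(y for d, y in zip(docs, labels) if term not in d)
    n = len(docs)
    n_t = sum(with_t.values())
    n_not = n - n_t
    h_c = entropy([labels.count(c) for c in classes])
    # Conditional entropy H(C | term), weighted by how often the term is present/absent.
    h_cond = (n_t / n) * entropy([with_t[c] for c in classes]) + (
        n_not / n
    ) * entropy([without_t[c] for c in classes])
    return h_c - h_cond


def select_top_k(docs, labels, k):
    """Return the k terms with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus: label 1 = "relevant for triage", 0 = "not relevant".
    docs = [
        {"mouse", "gene", "expression"},
        {"mouse", "knockout"},
        {"human", "protein"},
        {"human", "gene"},
    ]
    labels = [1, 1, 0, 0]
    print(select_top_k(docs, labels, 2))  # ['human', 'mouse']
```

In this toy corpus "gene" appears equally often in both classes, so its IG is 0, while "mouse" and "human" each split the classes perfectly (IG = 1 bit). Shrinking k is one way to make the selection "more aggressive" in the sense the abstract describes; the selected features would then be passed to the SVM.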
Metadata
Item Type: | Book Section |
---|---|
School: | Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences |
Depositing User: | Sarah Hall |
Date Deposited: | 15 Nov 2021 15:31 |
Last Modified: | 09 Aug 2023 12:52 |
URI: | https://eprints.bbk.ac.uk/id/eprint/46732 |