Zhang, Dell and Lee, W.S. (2004) Experience of using SVM for the triage task in TREC 2004 genomics track. In: Voorhees, E.M. and Buckland, L.P. (eds.) TREC 2004: Proceedings of the Thirteenth Text REtrieval Conference. NIST Special Publication 500. The National Institute of Standards and Technology.
Abstract
This paper reports our knowledge-ignorant machine learning approach to the triage task in the TREC 2004 genomics track, which is essentially a text categorization problem. We applied a Support Vector Machine (SVM) and found that information-gain-based feature selection is helpful. Although we achieved decent performance in leave-one-out cross-validation experiments, the evaluation result on the test data turned out to be surprisingly poor. Further experiments revealed that there is a chasm between the training and test data distributions. It seems that more aggressive feature selection can partially alleviate the trouble caused by this distribution change.
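As an illustration of the information-gain feature selection mentioned in the abstract, the following is a minimal sketch using the standard term-presence definition, IG(t) = H(C) - H(C | t present or absent). It is not the authors' implementation: the abstract does not give their exact formulation, and the function names (`entropy`, `information_gain`, `select_top_terms`), the toy documents, and the labels are all hypothetical. A smaller `k` corresponds to the "more aggressive" selection the abstract refers to.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(term) = H(labels) - H(labels | term present or absent).

    Each document is represented as a set of terms (an assumption
    made for this sketch, not taken from the paper).
    """
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(present) / n) * entropy(present)
                   + (len(absent) / n) * entropy(absent))
    return entropy(labels) - conditional


def select_top_terms(docs, labels, k):
    """Return the k terms with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = set().union(*docs)
    ranked = sorted(vocabulary,
                    key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy corpus: label 1 = relevant, 0 = not relevant.
docs = [
    {"gene", "mutation", "mouse"},
    {"gene", "expression", "mouse"},
    {"protein", "structure"},
    {"protein", "mouse"},
]
labels = [1, 1, 0, 0]

top = select_top_terms(docs, labels, 2)
print(top)
```

In a full pipeline the selected terms would define the feature vectors passed to the SVM; that step is omitted here because the abstract does not specify the classifier settings.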
Metadata
| Item Type: | Book Section |
|---|---|
| School: | Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences |
| Depositing User: | Sarah Hall |
| Date Deposited: | 15 Nov 2021 15:31 |
| Last Modified: | 09 Aug 2023 12:52 |
| URI: | https://eprints.bbk.ac.uk/id/eprint/46732 |