BIROn - Birkbeck Institutional Research Online

    A Bayesian hierarchical model for comparing average F1 scores

    Zhang, Dell and Wang, J. and Zhao, X. and Wang, X. (2015) A Bayesian hierarchical model for comparing average F1 scores. In: 2015 IEEE International Conference on Data Mining (ICDM). IEEE Computer Society, pp. 589-598. ISBN 9781467395038.

    PID3868347.pdf - Author's Accepted Manuscript


    Abstract

    In multi-class text classification, the performance (effectiveness) of a classifier is usually measured by micro-averaged and macro-averaged F1 scores. However, the scores themselves do not tell us how reliably they forecast the classifier's future performance on unseen data. In this paper, we propose a novel approach that explicitly models the uncertainty of average F1 scores through Bayesian reasoning, and demonstrate that it can provide a much more comprehensive performance comparison between text classifiers than traditional frequentist null hypothesis significance testing (NHST).
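
    For readers unfamiliar with the two averages named in the abstract, the sketch below computes them from per-class counts of true positives, false positives and false negatives. It illustrates only the standard point estimates, not the paper's Bayesian model; the function names and the counts are hypothetical and chosen purely for illustration.

```python
from typing import Sequence, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); taken as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(per_class: Sequence[Counts]) -> Tuple[float, float]:
    """Return (micro-averaged F1, macro-averaged F1) for per-class counts."""
    # Micro: pool the counts over all classes, then compute a single F1.
    tp = sum(c[0] for c in per_class)
    fp = sum(c[1] for c in per_class)
    fn = sum(c[2] for c in per_class)
    micro = f1(tp, fp, fn)
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(*c) for c in per_class) / len(per_class)
    return micro, macro


# Hypothetical counts for three classes (A, B, C); B is rare and poorly classified.
example_counts = [(90, 10, 5), (5, 5, 15), (40, 10, 5)]
micro, macro = micro_macro_f1(example_counts)
print(f"micro-F1 = {micro:.4f}, macro-F1 = {macro:.4f}")
```

    With these made-up counts the micro average (0.84375) sits well above the macro average (about 0.70), because the macro average weights the rare, poorly classified class equally with the others. Neither number says how much it would vary on new data, which is the gap the abstract describes.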

    Metadata

    Item Type: Book Section
    Additional Information: 14-17 Nov 2015, Atlantic City, NJ.
    Keyword(s) / Subject(s): text classification, performance evaluation, hypothesis testing, model comparison, Bayesian inference.
    School: School of Business, Economics & Informatics > Computer Science and Information Systems
    Research Centres and Institutes: Birkbeck Knowledge Lab; Birkbeck Institute for Data Analytics
    Depositing User: Dr Dell Zhang
    Date Deposited: 21 Oct 2015 15:18
    Last Modified: 25 Jun 2020 17:30
    URI: https://eprints.bbk.ac.uk/id/eprint/13086

