BIROn - Birkbeck Institutional Research Online

    Learning structured medical information from social media

    Hasan, Abul and Levene, Mark and Weston, David (2020) Learning structured medical information from social media. Journal of Biomedical Informatics 110 (103568), ISSN 1532-0464.

    JBI_final_version.pdf - Author's Accepted Manuscript
    Available under License Creative Commons Attribution Non-commercial No Derivatives.


    Abstract

    Our goal is to summarise and aggregate information from social media regarding the symptoms of a disease, the drugs used, and the treatment effects, both positive and negative. To achieve this, we first apply a supervised machine learning method to automatically extract medical concepts from natural language text. In an environment such as social media, where new data is continuously streamed, we need a methodology that allows us to train continuously on the new data. To attain such incremental re-training, we develop a semi-supervised methodology that is capable of learning new concepts from a small set of labelled data together with a much larger set of unlabelled data. The semi-supervised methodology deploys a conditional random field (CRF) as the base-line training algorithm for extracting medical concepts. The methodology iteratively augments the training set with sentences labelled with high confidence, and adds terms to existing dictionaries, which are used as features by the base-line model for further classification. Our empirical results show that the base-line CRF performs strongly across a range of different dictionary and training sizes; when the base-line is built with the full training data, the F1 score lies in the range 84% to 90%. Moreover, we show that the semi-supervised method produces a mild but significant improvement over the base-line. We also assess the significance of the potential improvement offered by the semi-supervised methodology and find that it is significantly more accurate than the underlying base-line model in most cases.
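
    The iterative procedure described in the abstract can be illustrated with a minimal self-training sketch. This is not the authors' implementation: the functions `self_train` and `train_toy_tagger`, the confidence scores, the threshold value and the example sentences are all hypothetical, and a toy tagger (dictionary lookup plus previous-word cues) stands in for the CRF so that the example runs without external libraries.

```python
from typing import Callable, Dict, List, Tuple

Sentence = List[str]
Labels = List[str]
Tagger = Callable[[Sentence], Tuple[Labels, float]]


def train_toy_tagger(labelled, dictionary) -> Tagger:
    """Toy stand-in for CRF training (assumption, not the paper's model)."""
    lexicon: Dict[str, str] = dict(dictionary)
    cues: Dict[str, str] = {}  # previous word -> label it tends to precede
    for tokens, labels in labelled:
        for i, (token, label) in enumerate(zip(tokens, labels)):
            if label != "O":
                lexicon.setdefault(token.lower(), label)
                if i > 0:
                    cues[tokens[i - 1].lower()] = label

    def tag(tokens: Sentence) -> Tuple[Labels, float]:
        labels, scores = [], []
        for i, token in enumerate(tokens):
            word = token.lower()
            prev = tokens[i - 1].lower() if i > 0 else None
            if word in lexicon:      # dictionary feature: trusted fully
                labels.append(lexicon[word])
                scores.append(1.0)
            elif prev in cues:       # context guess: trusted less
                labels.append(cues[prev])
                scores.append(0.6)
            else:                    # simplification: "O" with full confidence
                labels.append("O")
                scores.append(1.0)
        confidence = sum(scores) / len(scores) if scores else 0.0
        return labels, confidence

    return tag


def self_train(train, labelled, unlabelled, dictionary,
               threshold=0.85, max_rounds=5):
    """Retrain repeatedly, absorbing confidently tagged sentences each round."""
    labelled, pool, dictionary = list(labelled), list(unlabelled), dict(dictionary)
    for _ in range(max_rounds):
        tag = train(labelled, dictionary)
        confident, remaining = [], []
        for tokens in pool:
            labels, confidence = tag(tokens)
            useful = confidence >= threshold and any(l != "O" for l in labels)
            (confident if useful else remaining).append((tokens, labels))
        if not confident:
            break  # no sentence passed the threshold; stop early
        for tokens, labels in confident:
            labelled.append((tokens, labels))  # augment the training set
            for token, label in zip(tokens, labels):
                if label != "O":
                    dictionary.setdefault(token.lower(), label)  # grow dictionary
        pool = [tokens for tokens, _ in remaining]
    return train(labelled, dictionary), labelled, dictionary


seed = [(["i", "took", "ibuprofen", "for", "headache"],
         ["O", "O", "DRUG", "O", "SYMPTOM"])]
stream = [["she", "took", "aspirin", "for", "headache"],
          ["aspirin", "helped", "a", "lot"],
          ["took", "codeine"]]
tagger, grown, terms = self_train(train_toy_tagger, seed, stream, {})
print(terms)
```

    In this toy run the second sentence is only absorbed in a later round, after "aspirin" has entered the dictionary, while the third sentence never reaches the confidence threshold and stays unlabelled. A real system would replace `train_toy_tagger` with a CRF trainer and use the model's own sequence probability as the confidence score.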

    Metadata

    Item Type: Article
    Keyword(s) / Subject(s): Social media mining, Medical concept extraction, Pharmacovigilance, Conditional random fields, Semi-supervised algorithm
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Depositing User: Abul Hasan
    Date Deposited: 16 Sep 2020 09:25
    Last Modified: 09 Aug 2023 12:49
    URI: https://eprints.bbk.ac.uk/id/eprint/40845
