BIROn - Birkbeck Institutional Research Online

    Machine learning approaches to TCR binding prediction: benchmarking existing methods and developing a novel model for predicting TCR-MHC specificity

    Gore, Trupti Amol (2025) Machine learning approaches to TCR binding prediction: benchmarking existing methods and developing a novel model for predicting TCR-MHC specificity. PhD thesis, Birkbeck, University of London.

    Abstract

    CD8+ T cells are key components of the adaptive immune system, playing crucial roles in targeting intracellular pathogens and in tumour surveillance. T cell immune function is mediated by their surface T cell receptors (TCRs), which bind to complexes formed between antigenic peptides and MHC molecules. It is estimated that a typical individual has around 10^8 unique CD8+ T cells. The sequences of many millions of unique TCRs are known (e.g. from single cell sequencing experiments), but in all but a tiny fraction of cases their antigenic targets are unknown. This gap between TCR sequence and TCR function is a key motivation for the research presented in this thesis, which concerns two complementary computational prediction tasks. The first task involved the evaluation of state-of-the-art deep learning tools that predict TCR binding to peptide-MHC complexes. This is known to be a challenging task, notably because TCR binding is modulated by six flexible loops. Tools were retrained in order to address the generalised TCR binding prediction task: can a tool predict whether a given TCR will bind to an antigenic peptide not present in the dataset used to train the tool? The results demonstrated that only two tools proved moderately successful at generalised prediction. A subsequent correlation analysis provided useful insights into the factors associated with prediction success or failure. MHC molecules, encoded by HLA alleles, are highly polymorphic, and the HLA types of the individuals from whom TCR data is acquired are rarely known. A TCR is said to be HLA-restricted, i.e. it will commonly bind to complexes involving a single type of MHC. The second task addressed in this thesis involved designing a transformer-based deep learning tool for predicting TCR-HLA associations. The tool achieved AUCs of 0.68 and 0.76 for HLA alleles present in both training and test sets.

    Metadata

    Item Type: Thesis
    Copyright Holders: The copyright of this thesis rests with the author, who asserts his/her right to be known as such according to the Copyright Designs and Patents Act 1988. No dealing with the thesis contrary to the copyright or moral rights of the author is permitted.
    Depositing User: Acquisitions And Metadata
    Date Deposited: 13 Mar 2025 10:44
    Last Modified: 18 Sep 2025 03:38
    URI: https://eprints.bbk.ac.uk/id/eprint/55155
    DOI: https://doi.org/10.18743/PUB.00055155
