BIROn - Birkbeck Institutional Research Online

    The importance of non-accessible crosslinks and solvent accessible surface distance in modelling proteins with restraints from crosslinking mass spectrometry

    Bullock, Joshua and Schwab, J. and Thalassinos, Konstantinos and Topf, Maya (2016) The importance of non-accessible crosslinks and solvent accessible surface distance in modelling proteins with restraints from crosslinking mass spectrometry. Molecular & Cellular Proteomics 15 (7), pp. 2491-2500. ISSN 1535-9476.

    15218.pdf - Author's Accepted Manuscript
    Restricted to Repository staff only
    Available under License Creative Commons Attribution.

    15218A.pdf - Published Version of Record
    Available under License Creative Commons Attribution.


    Abstract

    Crosslinking coupled to mass spectrometry (XL-MS) is becoming an increasingly popular technique for modelling protein monomers and complexes. The distance restraints obtained from these experiments can be used alone or as part of an integrative modelling approach that incorporates data from many sources. However, modelling practices vary and the differences in their usefulness are not clear. Here, we develop a new scoring procedure for models based on crosslink data, the Matched and Non-accessible Crosslink score (MNXL). We compare its performance with that of other commonly used scoring functions (Number of Violations and Sum of Violation Distances) on a benchmark of 14 protein domains, each with 300 corresponding models (at various levels of quality) and associated, previously published, experimental crosslinks (XLdb). The distances between crosslinked lysines are calculated either as Euclidean distances or as Solvent Accessible Surface Distances (SASD) using a newly developed method (Jwalk). MNXL takes into account whether a crosslink is non-accessible, i.e., whether an experimentally observed crosslink has no corresponding SASD in a model because of buried lysines. This metric alone is shown to have a significant impact on modelling performance, and it is a concept that is not considered at present if only Euclidean distances are used. Additionally, a comparison between modelling with SASD and with Euclidean distance shows that SASD is superior, even when factoring out the effect of the non-accessible crosslinks. Our benchmarking also shows that MNXL outperforms the other tested scoring functions in terms of precision and correlation to Cα-RMSD from the crystal structure. Finally, we test MNXL at different levels of crosslink recovery (i.e., the percentage of crosslinks experimentally observed out of all theoretical ones) and set a target recovery of ~20%, after which the performance plateaus.
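
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce MNXL or Jwalk, whose details are not given here. It assumes that each observed crosslink already has a distance measured in a model (or `None` when no SASD exists, i.e. the crosslink is non-accessible), that a violation means a distance above a cutoff, and that the Sum of Violation Distances is the total excess over that cutoff. The 30 Å cutoff and all function names are hypothetical choices made for illustration only.

```python
from typing import Optional, Sequence

# Assumed, illustrative upper distance bound in angstroms (not taken from the paper).
CUTOFF = 30.0


def number_of_violations(distances: Sequence[Optional[float]],
                         cutoff: float = CUTOFF) -> int:
    """Count crosslinks whose measured distance exceeds the cutoff."""
    return sum(1 for d in distances if d is not None and d > cutoff)


def sum_of_violation_distances(distances: Sequence[Optional[float]],
                               cutoff: float = CUTOFF) -> float:
    """Add up how far each violating crosslink exceeds the cutoff."""
    return sum(d - cutoff for d in distances if d is not None and d > cutoff)


def count_non_accessible(distances: Sequence[Optional[float]]) -> int:
    """Count observed crosslinks with no surface distance in the model."""
    return sum(1 for d in distances if d is None)


def crosslink_recovery(n_observed: int, n_theoretical: int) -> float:
    """Percentage of theoretical crosslinks that were observed experimentally."""
    if n_theoretical <= 0:
        raise ValueError("n_theoretical must be positive")
    return 100.0 * n_observed / n_theoretical


# Hypothetical distances for five observed crosslinks in one model;
# None marks a crosslink with no SASD because a lysine is buried.
model_distances = [12.5, 28.0, 35.0, None, 41.5]

violations = number_of_violations(model_distances)
excess = sum_of_violation_distances(model_distances)
buried = count_non_accessible(model_distances)
recovery = crosslink_recovery(n_observed=5, n_theoretical=25)
```

With Euclidean distances every crosslink receives a value, so the `None` case never arises; that is why non-accessibility only becomes visible when surface distances are used.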

    Metadata

    Item Type: Article
    Keyword(s) / Subject(s): Algorithms, Computer Modeling, Crosslinking, Mass Spectrometry, Protein Cross-linking*, Protein structure*, Non-accessible crosslinks, Solvent Accessible Surface Distance
    School: Birkbeck Schools and Departments > School of Science > Biological Sciences
    Research Centre: Bloomsbury Centre for Bioinformatics; Institute of Structural Molecular Biology (ISMB)
    Depositing User: Administrator
    Date Deposited: 18 May 2016 13:38
    Last Modified: 13 Feb 2018 11:13
    URI: http://eprints.bbk.ac.uk/id/eprint/15218
