BIROn - Birkbeck Institutional Research Online

    A community effort to optimize sequence-based deep learning models of gene regulation

    Rafi, Abdul Muntakim and Nogina, D. and Penzar, D. and Lee, D. and Lee, D. and Kim, N. and Kim, S. and Kim, D. and Shin, Y. and Kwak, I.-Y. and Meshcheryakov, G. and Lando, A. and Zinkevich, A. and Kim, B.-C. and Lee, J. and Kang, T. and Vaishnav, E.D. and Yadollahpour, P. and Bornelöv, S. and Svensson, F. and Trapotsi, M.-A. and Tran, D. and Nguyen, T. and Tu, X. and Zhang, W. and Qiu, W. and Ghotra, R. and Yu, Y. and Labelson, E. and Prakash, A. and Narayanan, A. and Koo, P. and Chen, X. and Jones, D.T. and Tinti, M. and Guan, Y. and Ding, M. and Chen, K. and Yang, Y. and Ding, K. and Dixit, G. and Wen, J. and Zhou, Z. and Dutta, P. and Sathian, R. and Surana, P. and Ji, Y. and Liu, H. and Davuluri, R.V. and Hiratsuka, Y. and Takatsu, M. and Chen, T.-M. and Huang, C.-H. and Wang, H.-K. and Shih, E.S.C. and Chen, S.-H. and Wu, C.-H. and Chen, J.-Y. and Huang, K.-L. and Alsaggaf, I. and Greaves, P. and Barton, Carl and Wan, Cen and Abad, N. and Körner, C. and Feuerbach, L. and Brors, B. and Li, Y. and Röner, S. and Dash, P.M. and Schubach, M. and Soylemez, O. and Møller, A. and Kavaliauskaite, G. and Madsen, J. and Lu, Z. and Queen, O. and Babjac, A. and Emrich, S. and Kardamiliotis, K. and Kyriakidis, K. and Malousi, A. and Palaniappan, A. and Gupta, K. and Kumar S, P. and Bradford, J. and Perrin, D. and Salomone, R. and Schmitz, C. and JiaXing, C. and JingZhe, W. and AiWei, Y. and Kim, S. and Albrecht, J. and Regev, A. and Gong, W. and Kulakovskiy, I.V. and Meyer, P. and de Boer, C.G. (2024) A community effort to optimize sequence-based deep learning models of gene regulation. Nature Biotechnology, ISSN 1087-0156.

    Text
    55373.pdf - Published Version of Record
    Available under License Creative Commons Attribution.

    Abstract

    A systematic evaluation of how model architectures and training strategies impact genomics model performance is needed. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast. For a robust evaluation of the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. All top-performing models used neural networks but diverged in architectures and training strategies. To dissect how architectural and training choices impact performance, we developed the Prix Fixe framework to divide models into modular building blocks. We tested all possible combinations for the top three models, further improving their performance. The DREAM Challenge models not only achieved state-of-the-art results on our comprehensive yeast dataset but also consistently surpassed existing benchmarks on Drosophila and human genomic datasets, demonstrating the progress that can be driven by gold-standard genomics datasets.

    Metadata

    Item Type: Article
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Depositing User: Administrator
    Date Deposited: 04 Apr 2025 15:55
    Last Modified: 14 May 2025 03:46
    URI: https://eprints.bbk.ac.uk/id/eprint/55373
