BIROn - Birkbeck Institutional Research Online

    Increasing Crossref Data Reusability With Format Experiments

    Eve, Martin Paul (2024) Increasing Crossref Data Reusability With Format Experiments. Crossref Blog.

    Text
    Crossref Data.doc - Author's Accepted Manuscript
    Available under License Creative Commons Attribution.

    Download (20kB)

    Abstract

    Every year, Crossref releases a full public data file of all of our metadata. This is partly a commitment to POSI and partly just what we do. We want the community to re-use our metadata and to find interesting ends to which they can be put! However, we have also recognized, for some time, that 170GB of compressed .tar.gz files, spread over 27,000 items, is not the easiest of formats with which to work. For instance, these files have no index, meaning that it is virtually impossible simply to pull out the record for a single DOI. Decompressing the .tar.gz files alone takes a good three hours or more, even on high-end hardware, before any additional processing. To that end, the Crossref Labs team has been experimenting with different formats for trial release that might allow us to reach broader audiences, including those who have not previously worked with our metadata files. The two new formats with which we have been experimenting, alongside the existing data file format, are JSON Lines and SQLite.
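
    The indexing point in the abstract can be illustrated with a minimal sketch. It contrasts a line-by-line scan of JSON Lines text with a keyed lookup in SQLite. The sample records, the DOIs, the table name `works`, and the column names are all made up for illustration; they are assumptions and not the schema of any actual Crossref release.

```python
import json
import sqlite3

# Hypothetical sample records; real Crossref records carry many more fields.
RECORDS = [
    {"DOI": "10.5555/example.1", "title": ["First example work"]},
    {"DOI": "10.5555/example.2", "title": ["Second example work"]},
    {"DOI": "10.5555/example.3", "title": ["Third example work"]},
]


def to_json_lines(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


def find_in_json_lines(text, doi):
    """Linear scan: parse each line in turn until the DOI matches."""
    wanted = doi.lower()
    for line in text.splitlines():
        record = json.loads(line)
        if record["DOI"].lower() == wanted:
            return record
    return None


def build_sqlite(records):
    """Load records into an in-memory SQLite table keyed by DOI.

    The table and column names are assumptions for illustration only.
    Declaring doi as PRIMARY KEY makes SQLite maintain an index on it.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE works (doi TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO works (doi, metadata) VALUES (?, ?)",
        [(record["DOI"].lower(), json.dumps(record)) for record in records],
    )
    conn.commit()
    return conn


def find_in_sqlite(conn, doi):
    """Indexed lookup: SQLite uses the primary-key index, not a full scan."""
    row = conn.execute(
        "SELECT metadata FROM works WHERE doi = ?", (doi.lower(),)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

    Calling `find_in_json_lines(to_json_lines(RECORDS), doi)` has to read records one at a time, whereas `find_in_sqlite(build_sqlite(RECORDS), doi)` goes through the index on `doi`. On three records the difference is invisible; the contrast only matters at the scale the abstract describes.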

    Metadata

    Item Type: Article
    School: Birkbeck Faculties and Schools > Faculty of Humanities and Social Sciences > School of Creative Arts, Culture and Communication
    Depositing User: Martin Eve
    Date Deposited: 23 Jan 2024 11:25
    Last Modified: 23 Jan 2024 11:25
    URI: https://eprints.bbk.ac.uk/id/eprint/52878

    Statistics

    Activity Overview (6 month trend): 26 downloads, 255 hits

    Additional statistics are available via IRStats2.
