Eve, Martin Paul (2024) Increasing Crossref Data Reusability With Format Experiments. Crossref Blog.
Text
Crossref Data.doc - Author's Accepted Manuscript. Available under License Creative Commons Attribution. (20kB)
Abstract
Every year, Crossref releases a full public data file of all of our metadata. This is partly a commitment to POSI and partly just what we do. We want the community to re-use our metadata and to find interesting ends to which it can be put! However, we have also recognized, for some time, that 170GB of compressed .tar.gz files, spread over 27,000 items, is not the easiest of formats with which to work. For instance, these files have no index, which makes it virtually impossible simply to pull out the record for a single DOI. Decompressing the .tar.gz files takes a good three hours or more even on high-end hardware, before any additional processing. To that end, the Crossref Labs team has been experimenting with different formats for trial release that might allow us to reach broader audiences, including those who have not previously worked with our metadata files. The two new formats with which we have been experimenting, alongside the existing data file format, are JSON Lines and SQLite.
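To illustrate why an indexed format makes single-DOI lookup practical, the following is a minimal sketch, not the Crossref Labs implementation. It assumes each JSON Lines row is one record carrying a top-level `DOI` field; the `works` table, its two columns, and the `build_index` and `lookup` helpers are hypothetical names chosen for this example, and the sample records are invented.

```python
import io
import json
import sqlite3


def build_index(lines, conn):
    """Load JSON Lines records into SQLite, keyed by DOI (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS works ("
        "doi TEXT PRIMARY KEY, record TEXT NOT NULL)"
    )

    def rows():
        # Stream one record at a time so the whole file never sits in memory.
        for line in lines:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Assumption: every record has a top-level "DOI" field.
            # DOIs are case-insensitive, so store them lowercased.
            yield record["DOI"].lower(), line

    with conn:  # commit once, after all rows are inserted
        conn.executemany(
            "INSERT OR REPLACE INTO works (doi, record) VALUES (?, ?)", rows()
        )
    return conn.execute("SELECT COUNT(*) FROM works").fetchone()[0]


def lookup(conn, doi):
    """Return the record for one DOI via the primary-key index, or None."""
    row = conn.execute(
        "SELECT record FROM works WHERE doi = ?", (doi.lower(),)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Invented sample data standing in for a JSON Lines file.
sample = io.StringIO(
    '{"DOI": "10.5555/example.1", "title": ["First sample record"]}\n'
    '{"DOI": "10.5555/example.2", "title": ["Second sample record"]}\n'
)
conn = sqlite3.connect(":memory:")
count = build_index(sample, conn)
found = lookup(conn, "10.5555/EXAMPLE.2")
```

Because `doi` is the primary key, SQLite answers the lookup from its index rather than scanning every record, which is the capability the compressed .tar.gz files lack.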
Metadata
Item Type: | Article |
---|---|
School: | Birkbeck Faculties and Schools > Faculty of Humanities and Social Sciences > School of Creative Arts, Culture and Communication |
Depositing User: | Martin Eve |
Date Deposited: | 23 Jan 2024 11:25 |
Last Modified: | 23 Jan 2024 11:25 |
URI: | https://eprints.bbk.ac.uk/id/eprint/52878 |