Publishing technologies and digital preservation

Birkbeck Centre for Technology and Publishing. 22nd March 2018.

A research paper

Professor Martin Paul Eve, Birkbeck, University of London

Common technologies in academic publishing

  • Platform Workflows
  • XML-First Production Workflows
  • JATS/BITS XML
  • XSLT
  • DOIs
  • LOCKSS/CLOCKSS/Portico

Platform workflows for academic journal publishing

A platform workflow

Output Documents, Production, and Standards

  • HTML
  • PDF

How do we make these and what are they?

HTML (HyperText Markup Language)

  • It's what you see "in your browser"
  • The "web" version of the article
  • Layout dynamically rendered by browsers with CSS
  • Semantically-poor format
  • Usually derived from semantically-rich XML

PDF (Portable Document Format)

  • Portable between platforms
  • Preserves layout
  • Semantically-poor format
  • Usually produced in an Adobe InDesign workflow

XML-First Production Workflows

XML Workflows

Image by Jonathan McGlone

True XML-First Workflows are Really Hard to Develop

Library Economy

What does JATS XML Look Like (and why)?

JATS XML
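
As a rough illustration of what JATS markup looks like, the sketch below embeds a small JATS-style fragment in Python and reads labelled metadata back out of it. The element names (article-title, contrib, surname, given-names, sec, italic) come from the JATS tag set, but the article title, author name and text are invented for this example, the fragment is not validated against the JATS DTD, and the describe function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative JATS-style fragment; title, author and text are invented.
JATS_SAMPLE = """<article article-type="research-article">
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>Preservation is an <italic>economic</italic> problem.</p>
    </sec>
  </body>
</article>"""


def describe(xml_text):
    """Read labelled metadata straight from the markup (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Each piece of information is found by its element name, not by layout.
    title = root.findtext(".//article-title")
    authors = [
        f"{name.findtext('given-names')} {name.findtext('surname')}"
        for name in root.iter("name")
    ]
    sections = [sec.findtext("title") for sec in root.iter("sec")]
    return {"title": title, "authors": authors, "sections": sections}


print(describe(JATS_SAMPLE))
```

The point of the "why" is visible here: because the markup says what each piece of text is (a surname, an article title, a section), software can extract it reliably, which is much harder from HTML or PDF.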

What Happens to the JATS XML?

  • It goes through an Extensible Stylesheet Language Transformation (XSLT)
  • Converts to HTML
  • Semantic richness is lost, but display becomes possible
  • XML usually not available to readers but hidden behind the scenes...
  • ... Useful for digital preservation
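
The conversion step above can be sketched in a few lines. Note that this is not XSLT itself: it is a plain Python stand-in, using only the standard library, that mimics what a simple stylesheet does by renaming JATS elements to HTML ones. The TAG_MAP mapping, the fallback to span, and the function names are assumptions made for illustration, not part of any real publisher's pipeline.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from JATS element names to HTML element names.
TAG_MAP = {"sec": "section", "title": "h2", "p": "p", "italic": "i", "bold": "b"}


def to_html(node):
    """Recursively copy a JATS element into an HTML element.

    Attributes are dropped and unmapped elements become a generic <span>,
    which is where the semantic richness of the source is lost.
    """
    out = ET.Element(TAG_MAP.get(node.tag, "span"))
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(to_html(child))
    return out


def jats_body_to_html(xml_text):
    """Convert the <body> of a JATS-style article into an HTML string."""
    body = ET.fromstring(xml_text).find("body")
    html_body = to_html(body)
    html_body.tag = "body"
    html_body.tail = None  # discard whitespace that followed </body> in the source
    return ET.tostring(html_body, encoding="unicode", method="html")


sample = (
    "<article><body><sec><title>Introduction</title>"
    "<p>Preservation is an <italic>economic</italic> problem.</p>"
    "</sec></body></article>"
)
print(jats_body_to_html(sample))
```

A production workflow would normally express the same mapping as XSLT templates and run them through an XSLT processor; the one-way nature of the mapping is the same in either case.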

Digital Preservation

  • Basic principle: Lots of Copies Keep Stuff Safe
  • Same principle as print preservation
  • Print doesn't preserve itself; we built libraries for this
  • LOCKSS, CLOCKSS and Portico are the main preservation systems in operation today

LOCKSS

  • Open-source, library-led digital preservation system
  • Libraries run LOCKSS boxes (computers/servers) at their institutions, which keep copies of their local collection.
  • Each library checks its own content and compares this to the same content at other libraries.
  • If damage is detected, the network will repair the damaged copy.
  • Dynamically migrates content to newer formats.
  • On a trigger event, the content is released to patrons at the participating library.
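
The check-compare-repair idea in the list above can be sketched as a simple majority vote over content hashes. This is a deliberately simplified illustration and not the actual LOCKSS polling protocol: the library names, the use of SHA-256, and the audit_and_repair function are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint a copy so that copies can be compared without trusting any one box."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(boxes):
    """Compare every box's copy of one item and overwrite damaged copies.

    `boxes` maps a (hypothetical) library name to its copy of the item.
    Returns the names of the boxes that were repaired.
    """
    votes = Counter(digest(copy) for copy in boxes.values())
    winning_hash, count = votes.most_common(1)[0]
    if count <= len(boxes) / 2:
        raise ValueError("no majority; cannot tell which copy is intact")

    # Any copy matching the majority hash can serve as the repair source.
    good_copy = next(c for c in boxes.values() if digest(c) == winning_hash)

    repaired = []
    for name, copy in list(boxes.items()):
        if digest(copy) != winning_hash:
            boxes[name] = good_copy
            repaired.append(name)
    return repaired


boxes = {
    "library_a": b"article text",
    "library_b": b"article text",
    "library_c": b"article t3xt",  # simulated bit rot
}
print(audit_and_repair(boxes))
```

The sketch also shows why many copies matter: with only two boxes that disagree, there is no majority and no way to know which copy is the damaged one.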

CLOCKSS

  • Controlled Lots of Copies Keep Stuff Safe
  • A private LOCKSS network that can be joined by publishers.
  • Publishers sign up with their titles and pay a membership fee.
  • On a trigger event, the content is released to everyone.

Portico

  • Publishers deposit content; libraries join as participants
  • Third party service
  • Independently certified by the Center for Research Libraries

What is the major threat in the digital world?

  • People assume that it's a zombie-apocalypse-style shutdown of electricity grids
  • It isn't
  • It's the economic cost of preservation and the inability to preserve everything (i.e. we must preselect)

The End

Thank you!

Presentation licensed under a CC BY-SA 3.0 license. All institutional images excluded from CC license. Available to view online at https://meve.io/BBKCTP2018.