---
layout: post
title: The UK copyright exemption for text and data mining vs. the DMCA and EUCD
categories: [copyright, DH]
tags: [copyright, DH]
published: True
---

New provisions in UK copyright law look promising for text and data mining. Last year, the government signed into effect an exemption to copyright for the purposes of non-commercial research. [This states that](https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/375954/Research.pdf):

> If a researcher has the right to read a copyright document under the terms of the licensing agreement with the content provider, they must be permitted to copy the work for the purpose of non-commercial text and data mining.

Wonderful! So all those novels that are in copyright can actually be data-mined if we can get a digital copy.

Except, as I discovered in a conversation with one of my Ph.D. students today, "if we can get a digital copy" is quite a large caveat, and it turns out to be not quite so straightforward. If we have a digital copy we can text mine it. However, if there are DRM (Digital Rights Management) restrictions on the text, we cannot remove those protections, even for the purpose of non-commercial research. This would violate the Digital Millennium Copyright Act in the USA and/or Article 6 of the [European Copyright Directive](https://en.wikipedia.org/wiki/Copyright_Directive), a violation that comes with severe penalties. On the other hand, if we saw the spines off the books and run them through a scanner and OCR process, that's fine for personal research.

There is an exemption, [apparently](https://en.wikipedia.org/wiki/Digital_Millennium_Copyright_Act), for "Literary works distributed in e-book format when all existing e-book editions of the work (including digital text editions made available by authorized entities) contain access controls that prevent the enabling either of the book's read-aloud function or of screen readers that render the text into a specialized format. (A renewed exemption from 2006, based on a similar exemption approved in 2003.)" But that's no good here, since it covers accessibility rather than research.

This is patently ridiculous: removing DRM for non-commercial text and data mining should be an exemption to both the DMCA in the USA and the EUCD.