BIROn - Birkbeck Institutional Research Online

    Lessons from the Library: Extreme Minimalist Scaling at Pirate Ebook Platforms

    Eve, Martin Paul (2022) Lessons from the Library: Extreme Minimalist Scaling at Pirate Ebook Platforms. Digital Humanities Quarterly 16 (3), ISSN 1938-4122.

    DHQ_ Digital Humanities Quarterly_ Lessons from the Library_ Extreme Minimalist Scaling at Pirate Ebook Platforms.pdf - Published Version of Record
    Available under License Creative Commons Attribution No Derivatives.


    At 33TB of data in its main collection, the highly illegal Library Genesis project is one of the largest repositories of copyright-violating educational ebooks ever created. Established over a decade ago, in 2008, Library Genesis has as its goal nothing short of a modern Library of Alexandria, albeit without anyone’s legal sanction. As one of its administrators wrote: ‘within decades, generations of people everywhere in the world will grow up with access to the best scientific texts of all time. [...] [T]he quality and accessibility of education to the poor will grow dramatically too. Frankly, I see this as the only way to naturally improve mankind: we need to make all the information available to them at any time’ (Bodó 2018b). Rooted in its homeland’s Russian communist principles and particularly the Soviet isolationist copyright policies of the twentieth century, LibGen is a formidable resource and a threat to conventional academic publishers. The Library Genesis database had just short of 1.2m records (books) in 2014 (Bodó 2018a). As of January 2020, this figure had doubled to 2.5m books. In this article, I examine the minimalist computational design choices taken by this maximal-in-intent, illicit archive of epistemological dissent and how such decisions have shaped the scalability and growth of the platform. This includes LibGen’s numerical subdivision of record identifiers into ‘buckets’ to work around directory file limitations in the GNU/Linux operating system; its use of MD5 hashing of filenames within directories capped at 1,000 files to avoid future hashing collisions while allowing for on-disk integrity checking; and its use of the MySQL socket/network server as opposed to SQLite or a similar disk-based database.
    Beyond these computational details, though, the theoretical tension that this article highlights lies between the path dependencies that are set in (illegal) computational projects with goals of absolute abundance and maximalist capacity, and the minimalist design principles that such projects must adopt at the outset to ensure a degree of scalability. I also query the ways in which the project’s contested mission statements target an economic (geographic) audience demographic with only minimalist access to high-capacity computing resources. Finally, I examine the limits on the scalability of distributing Library Genesis through its torrent archive and other distributed networking technologies such as IPFS, which, despite their promise of peer-to-peer redundancy, fall down on an archive of this size.
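
    The bucket and hashing scheme named in the abstract can be illustrated with a minimal Python sketch. It is not LibGen's actual code: the function names, the convention of naming each bucket after the record identifier rounded down to a multiple of 1,000, and the choice to hash the file's contents are all assumptions made for illustration only.

```python
import hashlib
from pathlib import PurePosixPath

# Cap of 1,000 files per directory, as stated in the abstract.
BUCKET_SIZE = 1000


def bucket_for(record_id: int, bucket_size: int = BUCKET_SIZE) -> int:
    """Return the (hypothetical) bucket number for a record identifier.

    Assumption: a bucket is named after the identifier rounded down to
    the nearest multiple of bucket_size, e.g. 1234567 -> 1234000.
    """
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    return (record_id // bucket_size) * bucket_size


def stored_path(record_id: int, content: bytes) -> PurePosixPath:
    """Build an illustrative on-disk path: <bucket>/<md5-of-content>."""
    digest = hashlib.md5(content).hexdigest()
    return PurePosixPath(str(bucket_for(record_id))) / digest


def verify(path: PurePosixPath, content: bytes) -> bool:
    """Integrity check: the filename should equal the MD5 of the content."""
    return path.name == hashlib.md5(content).hexdigest()


if __name__ == "__main__":
    path = stored_path(2500, b"example ebook bytes")
    print(path)                                  # bucket directory, then MD5 digest
    print(verify(path, b"example ebook bytes"))  # unchanged content passes
    print(verify(path, b"corrupted bytes"))      # altered content fails
```

    Under these assumptions no directory ever holds more than 1,000 entries, and a file can be checked for corruption by rehashing it and comparing the result with its own filename.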


    Item Type: Article
    School: Birkbeck Faculties and Schools > Faculty of Humanities and Social Sciences > School of Creative Arts, Culture and Communication
    Depositing User: Martin Eve
    Date Deposited: 13 Oct 2021 08:35
    Last Modified: 09 Aug 2023 12:52

