BIROn - Birkbeck Institutional Research Online

    Live monitoring 4chan discussion threads

    Pozzana, Iacopo and Prifti, Ylli and Provetti, Alessandro (2021) Live monitoring 4chan discussion threads. In: IC2S2 2021: 7th International Conference on Computational Social Science, 27-31 Jul 2021, Online.

    Text: 4chan_scrape.pdf (388kB)

    Abstract

    The 4chan portal has been known for several years as a "fringe" internet service for sharing and commenting on pictures. Because users can post anonymously, which is guaranteed by the total lack of a registration/identification mechanism, the portal has evolved into a global, if mostly US-centred, locus for the posting of extreme views, including racism and all sorts of hate speech. A pivotal role in the emergence of the website as a bastion of "free speech" has been played by the /pol/ board (https://boards.4chan.org/pol/), which declares its commitment to hosting "politically incorrect" discussions. Several research groups have intensively studied the structure, dynamics and contents of 4chan. Thanks to works such as [4, 12], we now have a fairly clear description of how 4chan works and what type of discussion dynamics the site supports. In particular, the latter work shed light on the extremely ephemeral nature of discussions: threads last on the website for a few hours at most, and often just for minutes, depending on the traffic they generate, before being removed to make room for new discussions. Given the fast-paced evolution of the content of the boards, and especially given how such ephemerality shapes the tone and the content of the discussion itself [4, 14], it is extremely important for researchers to be able to capture the content of the threads at various points over the course of their short lives. To the best of our knowledge, the existing 4chan literature has relied either on autoptic (first-hand) exploration by the scholars [14], or on large-scale data collection campaigns that drew their content from the archived versions of the threads [12], i.e. on copies of the threads as they appeared at the time of their closure, and at that time only. In order to observe the content on the website at a more fine-grained level, we devised a "scraping" architecture, summarised in Figure 2, which is based on the OXPath platform [9]. It enables the retrieval of the threads posted on a board at various points while they are still live.
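
    The idea of capturing live threads at several points in time can be illustrated with a minimal sketch. It is not the authors' OXPath-based architecture: as an assumption, it polls 4chan's public read-only JSON API instead, and every name in it (`fetch_json`, `live_thread_ids`, `snapshot_board`, `monitor`, the `pause_s` parameter) is hypothetical. Each polling round lists the threads currently live on a board and stores a timestamped copy of each one, so a thread that survives several rounds accumulates several snapshots.

```python
import json
import time
from typing import Any, Callable, Dict, List
from urllib.request import urlopen

# Assumption: the public read-only JSON API, not the OXPath setup of the paper.
API_ROOT = "https://a.4cdn.org"

Fetch = Callable[[str], Any]
Store = Dict[int, List[dict]]


def fetch_json(url: str) -> Any:
    """Download one URL and decode its JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def live_thread_ids(board: str, fetch: Fetch = fetch_json) -> List[int]:
    """Return the ids of the threads currently live on a board."""
    pages = fetch(f"{API_ROOT}/{board}/threads.json")
    return [thread["no"] for page in pages for thread in page["threads"]]


def snapshot_board(board: str, store: Store, fetch: Fetch = fetch_json,
                   pause_s: float = 1.0) -> Store:
    """Append one timestamped snapshot of every live thread to the store."""
    for no in live_thread_ids(board, fetch):
        try:
            thread = fetch(f"{API_ROOT}/{board}/thread/{no}.json")
        except OSError:
            # The thread may have been removed between listing and download.
            continue
        store.setdefault(no, []).append(
            {"taken_at": time.time(), "posts": thread["posts"]}
        )
        time.sleep(pause_s)  # pause between requests to limit server load
    return store


def monitor(board: str, rounds: int, interval_s: float,
            fetch: Fetch = fetch_json, pause_s: float = 1.0) -> Store:
    """Take `rounds` snapshots of a board, `interval_s` seconds apart."""
    store: Store = {}
    for i in range(rounds):
        snapshot_board(board, store, fetch, pause_s)
        if i < rounds - 1:
            time.sleep(interval_s)
    return store
```

    A call such as `monitor("pol", rounds=3, interval_s=300)` would return a dictionary that maps each thread id to the list of snapshots taken while the thread was live. Passing the downloader in as the `fetch` argument keeps the polling logic testable without network access.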

    Metadata

    Item Type: Conference or Workshop Item (Paper)
    Keyword(s) / Subject(s): 4chan, online conspiracy theories, social web, data retrieval, distributed systems, web communities
    School: School of Business, Economics & Informatics > Computer Science and Information Systems
    Depositing User: Alessandro Provetti
    Date Deposited: 28 Jan 2022 05:51
    Last Modified: 30 Jan 2022 06:58
    URI: https://eprints.bbk.ac.uk/id/eprint/45359

