BIROn - Birkbeck Institutional Research Online

    Analysis of category co-occurrence in Wikipedia networks

    Klaysri, Thidawan (2019) Analysis of category co-occurrence in Wikipedia networks. Doctoral thesis, Birkbeck, University of London.

    T.Klaysri BBK Thesis.pdf - Full Version

    Abstract

    Wikipedia has seen a huge expansion of content since its inception. Pages within this online encyclopedia are organised by assigning them to one or more categories, and Wikipedia maintains a manually constructed taxonomy graph that encodes the semantic relationships between these categories. An alternative, called the category co-occurrence graph, can be produced automatically by linking together categories that have pages in common. The properties of the latter graph and its relationship to the former are the concern of this thesis. An analytic framework, called t-component, is introduced to formalise the graphs and to discover category clusters that connect relevant categories together. The m-core, a cohesive subgroup concept used here as a clustering model, is used to construct a subgraph depending on the number of shared pages between the categories exceeding a given threshold t. The significance of the clustering result of the m-core is validated using a permutation test. This is compared to the k-core, another clustering model. The Wikipedia category co-occurrence graphs are scale-free, with a few category hubs, and the majority of clusters are of size 2. All observed properties for the distribution of the largest clusters of the category graphs obey power laws with decay exponents averaging around 1. As the threshold t on the number of shared pages is increased, a critical threshold is eventually reached at which the largest cluster shrinks significantly in size. This phenomenon is exhibited only by the m-core and not by the k-core. Lastly, the clustering in the category graph is shown to be consistent with the distance between categories in the taxonomy graph.
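
The construction described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy page data, and the choice to keep an edge when the shared-page count is at least t (rather than strictly greater than t) are all assumptions made for illustration, and plain connected components stand in here for the clusters found after thresholding.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(page_categories):
    """Count, for every pair of categories, how many pages they share."""
    weights = defaultdict(int)
    for categories in page_categories.values():
        # Sort so each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(set(categories)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def threshold_clusters(weights, t):
    """Connected components of the subgraph keeping edges with weight >= t.

    Assumption: ">= t" is an illustrative choice; the thesis may define
    the threshold with a strict inequality.
    """
    adjacency = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= t:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, clusters = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first search collects one component.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(component)
    return clusters


# Hypothetical toy data: page title -> categories assigned to that page.
pages = {
    "Page1": ["Physics", "Science"],
    "Page2": ["Physics", "Science", "Energy"],
    "Page3": ["Energy", "Science"],
    "Page4": ["Art", "History"],
}

weights = cooccurrence_graph(pages)
clusters_t1 = threshold_clusters(weights, 1)
clusters_t2 = threshold_clusters(weights, 2)
clusters_t3 = threshold_clusters(weights, 3)
```

In this toy data, raising t from 1 to 2 drops the Art/History pair, which shares only one page, and raising it to 3 leaves no clusters at all. This mirrors, on a very small scale, the idea that clusters shrink as the threshold increases.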

    Metadata

    Item Type: Thesis
    Copyright Holders: The copyright of this thesis rests with the author, who asserts his/her right to be known as such according to the Copyright Designs and Patents Act 1988. No dealing with the thesis contrary to the copyright or moral rights of the author is permitted.
    Depositing User: Acquisitions And Metadata
    Date Deposited: 07 Oct 2019 13:23
    Last Modified: 14 Jun 2021 19:24
    URI: https://eprints.bbk.ac.uk/id/eprint/40441
