BIROn - Birkbeck Institutional Research Online

    Measuring structural similarity of semistructured data based on information-theoretic approaches

    Helmer, Sven and Augsten, N. and Böhlen, M. (2012) Measuring structural similarity of semistructured data based on information-theoretic approaches. The VLDB Journal 21 (5), pp. 677-702. ISSN 1066-8888.

    Full text not available from this repository.

    Abstract

    We propose and experimentally evaluate different approaches for measuring the structural similarity of semistructured documents based on information-theoretic concepts. Common to all approaches is a two-step procedure: first, we extract and linearize the structural information from the documents; then, we determine the distance between the documents using similarity measures based on either Kolmogorov complexity or Shannon entropy. Unlike other approaches, ours achieve linear run-time complexity, and an experimental evaluation shows that their clustering quality is on a par with, or even better than, that of other, slower approaches.
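
    The two-step procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes XML input, linearizes the structure as the sequence of element names in document order, and uses the normalized compression distance (a standard, computable stand-in for Kolmogorov-complexity-based distances) with zlib as the compressor. The function names `linearize`, `ncd`, and `structural_distance` are hypothetical, and the Shannon-entropy variant is not shown.

```python
import zlib
import xml.etree.ElementTree as ET


def linearize(xml_text: str) -> bytes:
    """Step 1 (assumed form): drop all content and keep only the
    element names, in document (pre-order) order."""
    root = ET.fromstring(xml_text)
    tags = [elem.tag for elem in root.iter()]
    return " ".join(tags).encode("utf-8")


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a computable
    approximation of Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def structural_distance(doc_a: str, doc_b: str) -> float:
    """Step 2: compare the linearized structures of two documents."""
    return ncd(linearize(doc_a), linearize(doc_b))


if __name__ == "__main__":
    # Two documents with the same structure but different text content,
    # and a third document with a different structure.
    books_a = "<lib>" + "<book><title>A</title><author>X</author></book>" * 50 + "</lib>"
    books_b = "<lib>" + "<book><title>B</title><author>Y</author></book>" * 50 + "</lib>"
    people = "<site>" + "<person><name/><email/><phone/></person>" * 50 + "</site>"

    print("books_a vs books_b:", structural_distance(books_a, books_b))
    print("books_a vs people: ", structural_distance(books_a, people))
```

    Because the text content is discarded during linearization, the two book catalogues map to the same tag sequence and should come out closer to each other than to the structurally different document. Both steps run in time roughly linear in the document size, which is in keeping with the linear run-time complexity the abstract claims for the authors' approaches.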

    Metadata

    Item Type: Article
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Depositing User: Sarah Hall
    Date Deposited: 24 May 2013 08:13
    Last Modified: 09 Aug 2023 12:33
    URI: https://eprints.bbk.ac.uk/id/eprint/6996

