BIROn - Birkbeck Institutional Research Online

    XCQ: A queriable XML compression system

    Ng, W. and Lam, W.Y. and Wood, P.T. and Levene, M. (2006) XCQ: A queriable XML compression system. Knowledge and Information Systems 10 (4), pp. 421-452. ISSN 0219-1377.

    Full text not available from this repository.

    Abstract

    XML has become the de facto standard for specifying and exchanging data on the Web. However, XML is verbose by nature, so XML documents are usually large. This hinders its practical use, since it substantially increases the costs of storing, processing, and exchanging data. To tackle this problem, many XML-specific compression systems, such as XMill, XGrind, XMLPPM, and Millau, have recently been proposed. However, these systems usually suffer from one of two inadequacies: they either sacrifice performance in terms of compression ratio and execution time in order to support a limited range of queries, or they perform full decompression before processing queries over compressed documents. In this paper, we address these problems by exploiting the information provided by a Document Type Definition (DTD) associated with an XML document. We show that a DTD can facilitate better compression as well as produce more usable compressed data to support querying. We present the architecture of XCQ, a compression and querying tool for handling XML data. XCQ is based on a novel technique we have developed called DTD Tree and SAX Event Stream Parsing (DSP). Documents compressed by XCQ are stored in Partitioned Path-Based Grouping (PPG) data streams, which are equipped with a Block Statistics Signature (BSS) indexing scheme. The indexed PPG data streams support the processing of XML queries that involve selection and aggregation without the need for full decompression. To study the compression performance of XCQ, we carry out comprehensive experiments over a set of XML benchmark datasets.
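    The abstract names XCQ's components but does not describe how they work. As a rough, hypothetical illustration of the general idea of grouping data by path and indexing compressed blocks with summary statistics, the sketch below takes the following approach. It is not the paper's DSP, PPG or BSS algorithms; it ignores the DTD entirely and makes no claim about how XCQ lays out its streams. It uses Python's standard xml.sax parser to route leaf values into one stream per element path, compresses each stream in fixed-size zlib blocks, and keeps a min/max/sum/count summary per block. With those summaries, a selection can skip blocks without decompressing them, and a SUM can be answered from the summaries alone. The names PathGrouper, build_blocks and select_greater, the block size, and the choice of statistics are all assumptions made for illustration.

```python
# Hypothetical sketch only: path-based grouping of SAX events plus per-block
# statistics. This is NOT XCQ's DSP/PPG/BSS implementation.
import xml.sax
import zlib
from collections import defaultdict
from dataclasses import dataclass


class PathGrouper(xml.sax.ContentHandler):
    """Collects leaf text into one stream per root-to-leaf path (e.g. /db/book/price).

    Assumes data-centric XML where character data appears only in leaf elements.
    """

    def __init__(self):
        super().__init__()
        self.stack = []                     # current element path
        self.buffer = []                    # text chunks for the current element
        self.streams = defaultdict(list)    # path -> list of leaf values

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.buffer = []

    def characters(self, content):
        self.buffer.append(content)         # SAX may deliver text in pieces

    def endElement(self, name):
        text = "".join(self.buffer).strip()
        if text:
            self.streams["/" + "/".join(self.stack)].append(text)
        self.buffer = []
        self.stack.pop()


@dataclass
class Block:
    payload: bytes      # zlib-compressed, newline-separated values
    count: int
    minimum: float
    maximum: float
    total: float


def build_blocks(values, block_size=2):
    """Split a numeric stream into compressed blocks, each with summary statistics."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        nums = [float(v) for v in chunk]
        payload = zlib.compress("\n".join(chunk).encode("utf-8"))
        blocks.append(Block(payload, len(nums), min(nums), max(nums), sum(nums)))
    return blocks


def select_greater(blocks, threshold):
    """Return values > threshold, decompressing only blocks whose max allows a match."""
    out = []
    for b in blocks:
        if b.maximum <= threshold:
            continue                        # skipped using statistics alone
        for v in zlib.decompress(b.payload).decode("utf-8").split("\n"):
            if float(v) > threshold:
                out.append(float(v))
    return out


doc = (b"<db><book><price>12</price></book><book><price>30</price></book>"
       b"<book><price>7</price></book></db>")
handler = PathGrouper()
xml.sax.parseString(doc, handler)
prices = handler.streams["/db/book/price"]
blocks = build_blocks(prices, block_size=2)
print(select_greater(blocks, 20))        # only the first block is decompressed
print(sum(b.total for b in blocks))      # SUM answered from block summaries
```

    Here the selection decompresses only the first block, because the second block's maximum (7) already rules it out, and the SUM is computed without decompressing anything. A real system would choose block sizes and statistics to suit its query workload.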

    Metadata

    Item Type: Article
    School: School of Business, Economics & Informatics > Computer Science and Information Systems
    Depositing User: Sarah Hall
    Date Deposited: 25 May 2021 19:05
    Last Modified: 25 May 2021 19:05
    URI: https://eprints.bbk.ac.uk/id/eprint/44427
