BIROn - Birkbeck Institutional Research Online

    A methodology and architecture embedding quality assessment in data integration

    Martin, Nigel and Poulovassilis, Alexandra and Wang, J. (2014) A methodology and architecture embedding quality assessment in data integration. Journal of Data and Information Quality 4 (4), ISSN 1936-1955.

    jdiq_1401.pdf - Author's Accepted Manuscript
    Restricted to Repository staff only

    Abstract

    Data integration aims to combine heterogeneous information sources and to provide interfaces for accessing the integrated resource. Data integration is a collaborative task that may involve many people with different degrees of experience, knowledge of the application domain, and expectations relating to the integrated resource. These factors can make it difficult to determine and control the quality of an integrated resource. In this paper, we propose a data integration methodology that has embedded within it iterative quality assessment and improvement of the integrated resource. We also propose an architecture for the realisation of this methodology. The quality assessment is based on an ontology representation of different users’ quality requirements and of the main elements of the integrated resource. We use Description Logic as the formal basis for reasoning about users’ quality requirements and for validating that an integrated resource satisfies those requirements. We define quality factors and associated metrics which enable the quality of alternative global schemas for an integrated resource to be assessed quantitatively, and hence the improvement which results from the refinement of a global schema following our methodology to be measured. We evaluate our approach through a large-scale real-life case study in biological data integration in which an integrated resource is constructed from three autonomous proteomics data sources.

    Metadata

    Item Type: Article
    Additional Information: "© ACM, 2014. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Journal of Data and Information Quality, 4(4), 2014 - http://doi.acm.org/10.1145/nnnnnn.nnnnnn"
    Keyword(s) / Subject(s): Data integration, data quality, data quality assessment
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Research Centres and Institutes: Innovation Management Research, Birkbeck Centre for, Bioinformatics, Bloomsbury Centre for (Closed), Structural Molecular Biology, Institute of (ISMB), Birkbeck Knowledge Lab
    Depositing User: Nigel Martin
    Date Deposited: 20 May 2014 09:13
    Last Modified: 09 Aug 2023 12:34
    URI: https://eprints.bbk.ac.uk/id/eprint/9253
