BIROn - Birkbeck Institutional Research Online

    GORDIAN: efficient and scalable discovery of composite keys

    Sismanis, Y. and Brown, Paul and Haas, P.J. and Reinwald, B. (2006) GORDIAN: efficient and scalable discovery of composite keys. In: Dayal, U. and Whang, K.Y. and Lomet, D.B. and Alonso, G. and Lohman, G.M. and Kersten, M.L. and Kim, Y.-K. (eds.) Proceedings of the 32nd International Conference on Very Large Data Bases. Association for Computing Machinery, pp. 691-702.

    Full text not available from this repository.

    Abstract

    Identification of (composite) key attributes is of fundamental importance for many different data management tasks such as data modeling, data integration, anomaly detection, query formulation, query optimization, and indexing. However, information about keys is often missing or incomplete in many real-world database scenarios. Surprisingly, the fundamental problem of automatic key discovery has received little attention in the existing literature. Existing solutions ignore composite keys, due to the complexity associated with their discovery. Even for simple keys, current algorithms take a brute-force approach; the resulting exponential CPU and memory requirements limit the applicability of these methods to small datasets. In this paper, we describe GORDIAN, a scalable algorithm for automatic discovery of keys in large datasets, including composite keys. GORDIAN can provide exact results very efficiently for both real-world and synthetic datasets. GORDIAN can be used to find (composite) key attributes in any collection of entities, e.g., key column-groups in relational data, or key leaf-node sets in a collection of XML documents with a common schema. We show empirically that GORDIAN can be combined with sampling to efficiently obtain high quality sets of approximate keys even in very large datasets.
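To make the problem concrete: a set of attributes is a (composite) key if no two entities agree on all of those attributes, and it is minimal if no proper subset is also a key. The sketch below shows the brute-force baseline that the abstract contrasts GORDIAN against. It is not GORDIAN's algorithm. It is a minimal illustration, assuming rows are Python tuples and columns are addressed by index. The function names `is_key` and `minimal_keys_brute_force` are hypothetical. The enumeration checks up to 2^n - 1 column subsets for n columns, which shows why the exponential cost limits such methods to small datasets.

```python
from itertools import combinations


def is_key(rows, cols):
    """Return True if projecting every row onto column indices `cols` yields no duplicates."""
    seen = set()
    for row in rows:
        projected = tuple(row[c] for c in cols)
        if projected in seen:
            return False  # two rows agree on all of `cols`, so it is not a key
        seen.add(projected)
    return True


def minimal_keys_brute_force(rows, num_cols):
    """Hypothetical brute-force baseline (not GORDIAN): enumerate column subsets
    by increasing size and keep those that are keys and contain no smaller key.

    Exponential in num_cols: up to 2**num_cols - 1 subsets may be checked.
    """
    keys = []
    for size in range(1, num_cols + 1):
        for cols in combinations(range(num_cols), size):
            # A superset of a known key is also a key, but not a minimal one.
            if any(set(k) <= set(cols) for k in keys):
                continue
            if is_key(rows, cols):
                keys.append(cols)
    return keys
```

For example, on the rows `("Ann", "Lee", "555-1")`, `("Ann", "Kim", "555-2")`, and `("Bob", "Lee", "555-1")` with columns first, last, and phone, no single column is a key, while (first, last) and (first, phone) are minimal composite keys; (last, phone) is not, because the pair Lee/555-1 repeats. Any key of a full dataset remains a key on every sample of it, but a sample can also report column sets that are not keys of the full data, which is why keys discovered from samples are only approximate.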

    Metadata

    Item Type: Book Section
    School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
    Depositing User: Sarah Hall
    Date Deposited: 23 Feb 2021 19:45
    Last Modified: 09 Aug 2023 12:50
    URI: https://eprints.bbk.ac.uk/id/eprint/43158
