BHUNT: automatic discovery of fuzzy algebraic constraints in relational data
Brown, Paul and Haas, P.L. (2003) BHUNT: automatic discovery of fuzzy algebraic constraints in relational data. In: Freytag, J.C. and Lockemann, P.C. and Abiteboul, S. and Carey, M.J. and Selinger, P.G. and Heuer, A. (eds.) VLDB 2003: Proceedings of 29th International Conference on Very Large Data Bases. Morgan Kaufmann, pp. 668-679. ISBN 9780127224428.
Abstract
We present the BHUNT scheme for automatically discovering algebraic constraints between pairs of columns in relational data. The constraints may be “fuzzy” in that they hold for most, but not all, of the records, and the columns may be in the same table or different tables. Such constraints are of interest in the context of both data mining and query optimization, and the BHUNT methodology can potentially be adapted to discover fuzzy functional dependencies and other useful relationships. BHUNT first identifies candidate sets of column value pairs that are likely to satisfy an algebraic constraint. This discovery process exploits both system catalog information and data samples, and employs pruning heuristics to control processing costs. For each candidate, BHUNT constructs algebraic constraints by applying statistical histogramming, segmentation, or clustering techniques to samples of column values. Using results from the theory of tolerance intervals, the sample sizes can be chosen to control the number of “exception” records that fail to satisfy the discovered constraints. In query-optimization mode, BHUNT can automatically partition the data into normal and exception records. During subsequent query processing, queries can be modified to incorporate the constraints; the optimizer uses the constraints to identify new, more efficient access paths. The results are then combined with the results of executing the original query against the (small) set of exception records. Experiments on a very large database using a prototype implementation of BHUNT show reductions in table accesses of up to two orders of magnitude, leading to speedups in query processing by up to a factor of 6.8.
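The core idea in the abstract can be illustrated with a minimal sketch for the simplest case: a single "band" constraint on the difference of two numeric columns. This is not the BHUNT implementation. The function names (`min_sample_size`, `discover_band`, `partition`) are hypothetical, the band is taken as the sample minimum and maximum rather than derived by histogramming, segmentation, or clustering, and the sample-size rule is the classical nonparametric tolerance-interval bound, which is assumed here to stand in for the paper's own result.

```python
import random


def min_sample_size(exception_fraction, confidence):
    """Smallest n such that the interval [min, max] of n random samples
    covers at least (1 - exception_fraction) of the population with
    probability >= confidence (classical nonparametric tolerance bound).
    Assumption: used here only as a stand-in for the paper's result."""
    beta = 1.0 - exception_fraction
    n = 2
    # P(coverage >= beta) = 1 - n*beta^(n-1) + (n-1)*beta^n
    while 1.0 - n * beta ** (n - 1) + (n - 1) * beta ** n < confidence:
        n += 1
    return n


def discover_band(pairs, n, rng):
    """Sample n (a, b) value pairs without replacement and return the
    band (lo, hi) such that lo <= b - a <= hi holds on the whole sample."""
    sample = rng.sample(pairs, n)
    diffs = [b - a for a, b in sample]
    return min(diffs), max(diffs)


def partition(pairs, band):
    """Split all pairs into 'normal' records that satisfy the band
    constraint and 'exception' records that violate it."""
    lo, hi = band
    normal, exceptions = [], []
    for a, b in pairs:
        (normal if lo <= b - a <= hi else exceptions).append((a, b))
    return normal, exceptions


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: most delivery dates fall 2-5 days after the
    # order date, and a few outliers take much longer.
    pairs = [(d, d + rng.randint(2, 5)) for d in range(10_000)]
    pairs += [(d, d + rng.randint(30, 60)) for d in range(50)]
    n = min_sample_size(exception_fraction=0.05, confidence=0.95)
    band = discover_band(pairs, n, rng)
    normal, exceptions = partition(pairs, band)
    print(n, band, len(normal), len(exceptions))
```

In this sketch, a query with a predicate on one column could be rewritten to add the implied range predicate on the other column for the normal records, with the original query run separately against the exception records. With a 5% exception budget and 95% confidence, `min_sample_size` returns 93.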
Metadata
Item Type: | Book Section |
---|---|
Depositing User: | Sarah Hall |
Date Deposited: | 02 Mar 2021 16:32 |
Last Modified: | 27 Jun 2021 10:54 |
URI: | https://eprints.bbk.ac.uk/id/eprint/43258 |