# Taming the infinite chase: query answering under expressive relational constraints

Calì, Andrea and Gottlob, G. and Kifer, M.
(2008)
Taming the infinite chase: query answering under expressive relational constraints.
In:
Baader, F. and Lutz, C. and Motik, B. (eds.)
*Proceedings of the 21st International Workshop on Description Logics (DL2008), Dresden, Germany, May 13-16, 2008.*
CEUR Workshop Proceedings 353.

## Abstract

Answering queries posed over knowledge bases is a central problem in knowledge representation and database theory. In databases, query containment is one of the important query optimization and schema integration techniques [1,12,16]; in knowledge representation, it has been used for object classification, schema integration, service discovery, and more, in particular in the area of description logics [6,14]. Results on practical instances of the general problem were studied in [12], followed by [5,7,2,4,13]. In particular, [5] and [7] deal respectively with query containment and efficient query answering under expressive description logic constraints, which can express several constructs used in conceptual data modeling; [2] and [4] address query containment under constraints derived respectively from entity-relationship and object-oriented formalisms. The complexity of reasoning tasks on complex constraints based on answer set programs has been investigated in [19]. The problem of query containment is closely related to that of answering queries over knowledge bases; indeed, the two are mutually reducible; we focus on the former, and our results immediately extend to the latter. In our work, rather than focusing on specific logical theories, we analyze the fundamental difficulty that underlies earlier approaches, such as [12,2,4]. They all considered special classes of so-called tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs), all used the technique called chase, and all faced the problem that the chase generates infinite relations, and that query answering and containment are undecidable under general TGDs and EGDs. The chase [15,12] is a procedure that "repairs" violations of TGDs and EGDs until a fixed point is reached; it has been used in many works in data exchange [9,11,17]; the chase is also a form of tableau, and it has been successfully applied in terminological reasoning based on description logics [7,18].
We carve out a significantly larger class of TGDs, with the addition of EGDs. Notice that TGDs and EGDs are able to express most description logic constructs used in data modeling [7]. In particular, we first define the notions of sets of guarded TGDs (GTGDs) and of weakly guarded TGDs (WGTGDs). A TGD is guarded if its body contains an atom, called the guard, that covers all variables occurring in the body. Weakly guarded TGDs are a generalization of guarded TGDs that require guards to cover only the variables occurring at affected positions, i.e., positions in predicates that may contain fresh labelled nulls generated during the chase. The notion of guard is crucial, since query evaluation becomes undecidable once we allow the presence of a single non-guarded TGD. Our main contribution lies in the complexity bounds for query evaluation under WGTGDs and GTGDs. We show that the complexity of query evaluation (and, equivalently, of query containment) under WGTGDs is EXPTIME-hard in case of a fixed set of TGDs, and 2-EXPTIME-hard in case the TGDs are part of the input. As for upper bounds, let us first remark that we cannot (as one may think at first glance) directly or easily use known results on guarded logics [10] to derive complexity results for query evaluation, since queries are in general non-guarded. We therefore develop new algorithms, and prove that query answering is EXPTIME-complete in case of bounded predicate arities, and even in case the set of WGTGDs is fixed, and is 2-EXPTIME-complete in general. The proof of the upper bound is based on an alternating algorithm that mimics the chase by using a finite number of configurations: each of them corresponds to what we call the cloud of one atom a, i.e., the set of atoms in the chase whose arguments appear either in a or in the "active domain" of the input database instance. Then, we derive complexity results for reasoning with GTGDs.
While in the general case the complexity is the same as for WGTGDs, interestingly, when reasoning with a fixed set of dependencies (which is the usual setting in data exchange and in description logics), we get much better results: evaluating Boolean queries is NP-complete (the same complexity as answering without constraints [8]), and in PTIME in case the query is atomic. Our results subsume the results of [12] on IDs alone as a special case. Furthermore, we describe a semantic condition, called the Polynomial Clouds Criterion (PCC), requiring that the number of clouds generated during the chase is polynomial in the size of the input database instance, and that the cloud of each generated atom can be obtained in polynomial time from the cloud of the atom from which it was generated in the chase. Whenever a set of WGTGDs fulfills the PCC, answering Boolean queries is in NP, and answering atomic queries, as well as queries of bounded treewidth, is in PTIME. Finally, we introduce EGDs together with WGTGDs: we define a class of innocuous EGDs, which have the property that they can be ignored in the query answering phase, since they do not actually interact with TGDs. With the above results, we subsume both the main decidability and complexity result in [12], and decidability and complexity results on F-Logic Lite [13] as special cases, and we are actually considerably more general. We also show that F-Logic Lite [4], a meaningful fragment of F-Logic [13], can be handled by our approach. Details about the aforementioned results, including proofs, can be found in the technical report [3].
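
As an illustration of the guardedness condition stated in the abstract (a body atom that covers all variables occurring in the body), the following is a minimal Python sketch. It is not taken from the paper: the names `Atom`, `TGD`, and `find_guard` are hypothetical, and the convention that arguments starting with an uppercase letter are variables is an assumption made only for this example. The sketch covers plain guardedness only; checking weak guardedness would additionally require computing the affected positions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Atom:
    predicate: str
    args: Tuple[str, ...]  # assumption: uppercase-initial arguments are variables


@dataclass(frozen=True)
class TGD:
    body: Tuple[Atom, ...]
    head: Tuple[Atom, ...]


def variables(atom: Atom) -> FrozenSet[str]:
    """Return the variables of an atom under the uppercase-initial convention."""
    return frozenset(a for a in atom.args if a[:1].isupper())


def find_guard(tgd: TGD) -> Optional[Atom]:
    """Return a body atom containing every body variable, or None if there is none."""
    body_vars = frozenset().union(*(variables(a) for a in tgd.body))
    for atom in tgd.body:
        if body_vars <= variables(atom):
            return atom
    return None


# r(X, Y), s(Y) -> t(Y, Z): the atom r(X, Y) covers both body variables.
guarded = TGD(
    body=(Atom("r", ("X", "Y")), Atom("s", ("Y",))),
    head=(Atom("t", ("Y", "Z")),),
)

# r(X, Y), r(Y, Z) -> r(X, Z): no single body atom contains X, Y and Z.
transitive = TGD(
    body=(Atom("r", ("X", "Y")), Atom("r", ("Y", "Z"))),
    head=(Atom("r", ("X", "Z")),),
)

guard_of_guarded = find_guard(guarded)
guard_of_transitive = find_guard(transitive)
```

Here `guard_of_guarded` is the atom `r(X, Y)`, while `guard_of_transitive` is `None`, since the transitivity rule has no guard.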

## Metadata

| Item Type: | Book Section |
|---|---|
| Additional Information: | Series ISSN: 1613-0073 |
| School: | School of Business, Economics & Informatics > Computer Science and Information Systems |
| Research Centres and Institutes: | Birkbeck Knowledge Lab |
| Depositing User: | Administrator |
| Date Deposited: | 11 Jun 2013 09:04 |
| Last Modified: | 02 Dec 2016 13:22 |
| URI: | https://eprints.bbk.ac.uk/id/eprint/7430 |
