Dimartino, Mirko (2020) Integrating and querying linked datasets through ontological rules. PhD thesis, Birkbeck, University of London.
Abstract
The Web of Linked Open Data has developed from a few datasets in 2007 into a large data space containing billions of RDF triples published and stored in hundreds of independent datasets, forming the so-called Linked Open Data Cloud. This information cloud, ranging over a wide set of data domains, poses a challenge when it comes to reconciling the heterogeneous schemas or vocabularies adopted by data publishers. Motivated by this challenge, in this thesis we address the problem of integrating and querying multiple heterogeneous Linked Data sets through ontological rules. Firstly, we propose a formalisation of the notion of a peer-to-peer Linked Data integration system, where the mappings between peers comprise schema-level mappings and equality constraints between different IRIs; we call this formalism an RDF Peer System (RPS). We show that the semantics of the mappings preserves tractability of answering Basic Graph Pattern (BGP) SPARQL queries against the data stored in the RDF sources and the set of constraints given by the RPS mappings. Then, we address the problem of SPARQL query rewriting under RPSs and we show that it is not possible to rewrite an input BGP SPARQL query into a SPARQL 1.0 query under general RPSs, as the RPS peer mappings are not first-order-rewritable rules; this is a major drawback of general RPSs, since data materialisation is required to exploit their full semantics. With the adoption of the more recent standard SPARQL 1.1 and its property paths, we are able to extend the expressivity of the target language beyond first-order by including regular expressions in the body of the target SPARQL queries, that is, by expressing conjunctive two-way regular path queries (C2RPQs). Following this idea, in the second part of the thesis we step away from the language of RPSs to conduct a study on C2RPQ-rewritability under a broader ontology language.
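The role of property paths can be illustrated with a small sketch. Assume, purely for illustration, that equality constraints are stored as `sameAs` pairs between IRIs; the sample IRIs and the helper `equivalents` are hypothetical and not taken from the thesis. Collecting every IRI equal to a given one means following such pairs in both directions an unbounded number of times, which in general no fixed-length join can do, whereas a SPARQL 1.1 property path such as `(owl:sameAs|^owl:sameAs)*` expresses it directly. The Python below mimics that path over an in-memory set of pairs.

```python
from collections import deque

# Hypothetical equality constraints between IRIs published by three peers.
SAME_AS = {
    ("peerA:london", "peerB:London"),
    ("peerC:LDN", "peerB:London"),
}


def equivalents(iri, same_as):
    """Return every IRI reachable from `iri` via (sameAs | ^sameAs)*."""
    # Build an undirected adjacency map: the path may traverse pairs both ways.
    neighbours = {}
    for left, right in same_as:
        neighbours.setdefault(left, set()).add(right)
        neighbours.setdefault(right, set()).add(left)

    seen = {iri}  # the zero-length path: an IRI is always equal to itself
    queue = deque([iri])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Calling `equivalents("peerA:london", SAME_AS)` walks one pair forwards and another backwards, which is the two-way traversal that the `^` operator provides in a property path.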
We define ELHI_h^lin (harmless linear ELHI), an ontology language that generalises both the DL-Lite_R and linear ELH description logics. We prove the rewritability of instance queries (queries with a single atom in their body) under ELHI_h^lin knowledge bases with C2RPQs as the target language, presenting a query rewriting algorithm that makes use of non-deterministic finite-state automata. Following from that, we propose a query rewriting algorithm for answering conjunctive queries (CQs) under ELHI_h^lin knowledge bases, with C2RPQs as the target language. Since C2RPQs can be straightforwardly expressed in SPARQL 1.1 by means of property paths, we believe that our approach is directly applicable to real-world querying settings. Lastly, we undertake a complexity analysis for query answering under ELHI_h^lin. We analyse the computational cost of query answering in terms of both data complexity (where the ontology and the query are fixed and the data alone is a variable input) and combined complexity (where query, ontology and data all constitute the variable input). We show that answering instance queries under ELHI_h^lin is NLogSpace-complete for data complexity and in PTime for combined complexity; we also show that answering CQs under ELHI_h^lin is NLogSpace-complete for data complexity and NP-complete for combined complexity.
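As a rough illustration of why regular path queries are a natural rewriting target, consider a recursive axiom of the form ∃r.A ⊑ A, used here only as a textbook-style example. Under it, an individual x is an instance of A exactly when some chain of r-edges starting at x (possibly empty) reaches an individual asserted to be in A, which corresponds to the path expression `r*/type`. The sketch below evaluates such a two-way regular path query by exploring pairs of (graph node, automaton state). It shows query evaluation only and is not the rewriting algorithm of the thesis; the data, the automaton encoding and the names `step` and `eval_2rpq` are assumptions made for this example.

```python
from collections import deque

# Hypothetical data: labelled edges (subject, label, object).
EDGES = {
    ("a", "r", "b"),
    ("b", "r", "c"),
    ("c", "type", "A"),
    ("d", "type", "A"),
}

# NFA for the regular expression r*/type: state 0 loops on "r",
# and "type" moves to the accepting state 1.
NFA = {
    "start": 0,
    "accept": {1},
    "delta": {(0, "r"): {0}, (0, "type"): {1}},
}


def step(node, label, edges):
    """Nodes reached from `node` over `label`; a trailing '-' reads the edge backwards."""
    if label.endswith("-"):
        return {s for (s, l, o) in edges if l == label[:-1] and o == node}
    return {o for (s, l, o) in edges if l == label and s == node}


def eval_2rpq(source, nfa, edges):
    """Return all nodes y such that some path from `source` to y spells a word the NFA accepts."""
    start = (source, nfa["start"])
    seen = {start}
    queue = deque([start])
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in nfa["accept"]:
            answers.add(node)
        for (q, label), targets in nfa["delta"].items():
            if q != state:
                continue
            for nxt_node in step(node, label, edges):
                for nxt_state in targets:
                    pair = (nxt_node, nxt_state)
                    if pair not in seen:  # each (node, state) pair is expanded once
                        seen.add(pair)
                        queue.append(pair)
    return answers
```

Because each (node, state) pair is visited at most once, the search stays polynomial in the size of the data and the automaton. Here `"A" in eval_2rpq("a", NFA, EDGES)` plays the role of the instance check A(a).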
Metadata
Item Type: | Thesis |
---|---|
Copyright Holders: | The copyright of this thesis rests with the author, who asserts his/her right to be known as such according to the Copyright Designs and Patents Act 1988. No dealing with the thesis contrary to the copyright or moral rights of the author is permitted. |
Depositing User: | Acquisitions And Metadata |
Date Deposited: | 26 Apr 2022 15:35 |
Last Modified: | 01 Nov 2023 15:27 |
URI: | https://eprints.bbk.ac.uk/id/eprint/48112 |
DOI: | https://doi.org/10.18743/PUB.00048112 |