BIROn - Birkbeck Institutional Research Online

Learning to integrate web taxonomies

Zhang, Dell and Lee, W.S. (2004) Learning to integrate web taxonomies. Journal of Web Semantics 2 (2), pp. 131-151. ISSN 1570-8268.

Full text not available from this repository.

Abstract

We investigate machine learning methods for automatically integrating objects from different taxonomies into a master taxonomy. This problem is not only currently pervasive on the Web, but is also important to the emerging Semantic Web. A straightforward approach to automating this process would be to build classifiers through machine learning and then use these classifiers to classify objects from the source taxonomies into categories of the master taxonomy. However, conventional machine learning algorithms totally ignore the availability of the source taxonomies. In fact, source and master taxonomies often have common categories under different names or other more complex semantic overlaps. We introduce two techniques that exploit the semantic overlap between the source and master taxonomies to build better classifiers for the master taxonomy. The first technique, Cluster Shrinkage, biases the learning algorithm against splitting source categories by making objects in the same category appear more similar to each other. The second technique, Co-Bootstrapping, tries to facilitate the exploitation of inter-taxonomy relationships by providing category indicator functions as additional features for the objects. Our experiments with real-world Web data show that these proposed add-on techniques can enhance various machine learning algorithms to achieve substantial improvements in performance for taxonomy integration.
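The two techniques named in the abstract can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not the authors' implementation: the function names `cluster_shrinkage` and `add_category_indicators`, the linear interpolation toward a category centroid, and the shrinkage weight `lam` are all hypothetical choices made here to show the general idea. In particular, the second function shows only the "category indicators as additional features" step and omits any iterative or bootstrapping procedure the full method may use.

```python
from collections import defaultdict


def cluster_shrinkage(vectors, source_labels, lam=0.5):
    """Pull each feature vector toward the centroid of its source category.

    Assumption: shrinkage is modelled as linear interpolation with weight
    `lam` (0 = unchanged, 1 = collapsed onto the centroid).
    """
    groups = defaultdict(list)
    for vec, label in zip(vectors, source_labels):
        groups[label].append(vec)

    # Per-category centroid: the mean of each feature over the members.
    centroids = {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }

    return [
        [(1 - lam) * x + lam * c for x, c in zip(vec, centroids[label])]
        for vec, label in zip(vectors, source_labels)
    ]


def add_category_indicators(vectors, source_labels):
    """Append one 0/1 indicator feature per source category to each vector.

    Assumption: a plain one-hot encoding of source-category membership.
    """
    categories = sorted(set(source_labels))
    return [
        list(vec) + [1.0 if label == cat else 0.0 for cat in categories]
        for vec, label in zip(vectors, source_labels)
    ]


# Toy data: three objects in two source categories, "a" and "b".
vectors = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]]
labels = ["a", "a", "b"]

shrunk = cluster_shrinkage(vectors, labels, lam=0.5)
augmented = add_category_indicators(vectors, labels)
```

After shrinkage, the two objects in category "a" sit closer together than before, so a similarity-based learner is less inclined to split them across master categories. The augmented vectors could then be passed to any standard classifier trained on the master taxonomy.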

Metadata

Item Type: Article
School: Birkbeck Faculties and Schools > Faculty of Science > School of Computing and Mathematical Sciences
Depositing User: Sarah Hall
Date Deposited: 15 Nov 2021 14:31
Last Modified: 09 Aug 2023 12:52
URI: https://eprints.bbk.ac.uk/id/eprint/46727
