Semantic Labeling: A domain-independent approach

Published in ISWC 2016: The 15th International Semantic Web Conference, 2016

[Paper] | [Github]

Abstract

Semantic labeling is the process of mapping attributes in data sources to classes in an ontology and is a necessary step in heterogeneous data integration. Variations in data formats, attribute names and even ranges of values of data make this a very challenging task. In this paper, we present a novel domain-independent approach to automatic semantic labeling that uses machine learning techniques. Previous approaches use machine learning to learn a model that extracts features related to the data of a domain, which requires the model to be re-trained for every new domain. Our solution uses similarity metrics as features to compare against labeled domain data and learns a matching function to infer the correct semantic labels for data. Since our approach depends on the learned similarity metrics but not the data itself, it is domain-independent and only needs to be trained once to work effectively across multiple domains. In our evaluation, our approach achieves higher accuracy than other approaches, even when the learned models are trained on domains other than the test domain.
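The idea of learning a matching function over similarity metrics can be illustrated with a small, self-contained sketch. It is not the paper's implementation: the three similarity features (attribute-name overlap, value overlap, digit-ratio agreement), the tiny logistic-regression trainer, and all function names and toy data below are assumptions chosen only for illustration. The matcher is trained on a toy "weather" domain and then applied, unchanged, to a toy "museum" domain.

```python
import math

def jaccard(a, b):
    """Jaccard overlap of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def digit_ratio(values):
    """Fraction of characters that are digits (a crude value-format signal)."""
    chars = "".join(values)
    return sum(c.isdigit() for c in chars) / len(chars) if chars else 0.0

def similarity_features(col_a, col_b):
    """Compare two (name, values) attributes; only similarities are returned."""
    (name_a, vals_a), (name_b, vals_b) = col_a, col_b
    return [
        jaccard(name_a.lower().split("_"), name_b.lower().split("_")),  # names
        jaccard(vals_a, vals_b),                                        # values
        1.0 - abs(digit_ratio(vals_a) - digit_ratio(vals_b)),           # format
    ]

def match_probability(weights, features):
    """Logistic matching function: weights[0] is the bias term."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))

def train_matcher(examples, epochs=500, lr=0.5):
    """Fit the matcher by stochastic gradient ascent on (features, is_match)."""
    weights = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(epochs):
        for features, is_match in examples:
            err = is_match - match_probability(weights, features)
            weights[0] += lr * err
            for i, x in enumerate(features):
                weights[i + 1] += lr * err * x
    return weights

def label_column(weights, column, labeled_columns):
    """Return the label of the labeled column that best matches `column`."""
    return max(
        labeled_columns,
        key=lambda lc: match_probability(weights, similarity_features(column, lc[1])),
    )[0]

# Training domain (hypothetical toy data): weather attributes.
city = ("city_name", ["paris", "london", "rome"])
temperature = ("temperature", ["12", "15", "20"])
new_city = ("city", ["paris", "berlin", "rome"])
new_temp = ("temp", ["15", "20", "31"])
examples = [
    (similarity_features(new_city, city), 1),
    (similarity_features(new_city, temperature), 0),
    (similarity_features(new_temp, temperature), 1),
    (similarity_features(new_temp, city), 0),
]
weights = train_matcher(examples)

# Test domain (hypothetical toy data): museum attributes, never seen in training.
museum = [
    ("Artist", ("artist_name", ["monet", "picasso", "dali"])),
    ("Year", ("year_created", ["1889", "1907", "1931"])),
]
predicted = label_column(weights, ("artist", ["picasso", "dali", "klimt"]), museum)
print(predicted)
```

Because the matcher only ever sees similarity scores, never city names or artist names themselves, the weights fitted on the weather data can be reused on the museum data without retraining, which is the property the abstract describes.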