Tuesday, September 05, 2006

Ontology Mapping with SPARQL CONSTRUCT

Ontology mapping is regarded as one of the key technologies for data integration, for example to mediate between databases that have different but similar schemas. A lot of papers have been published on the topic (see a State of the Art paper from 2005).

I am not an expert in this topic, but most approaches that I have seen so far seem to employ specialized mapping ontologies that define bridges, for example to map a property "name" in a source ontology into property "lastName" in a target ontology. Mapping engines are then needed to interpret the mapping rules. I think a lot of research prototypes exist, but I don't think the Semantic Web community has reached any conclusive standard mapping ontology or implementations beyond prototypes yet.

We are running some very promising experiments with using SPARQL for ontology mapping. SPARQL is best known as the upcoming W3C standard query language for RDF, but few people notice that besides its SELECT form, SPARQL also defines a CONSTRUCT keyword. The input of a CONSTRUCT query is a WHERE clause that describes a pattern in a source model and binds variables to the matching nodes. The output is an RDF graph, built by substituting the variable bindings of every match into a graph template.
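To make the mechanics concrete, here is a toy model of CONSTRUCT in plain Python. It is only a sketch of the idea, not how a SPARQL engine is implemented: triples are tuples of strings, terms starting with "?" are variables, and the ex: instance names and the source:name / target:lastName properties are assumptions made up for the illustration.

```python
# Toy model of SPARQL CONSTRUCT. Triples are 3-tuples of strings and any
# term that starts with "?" is treated as a variable.

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result

def solutions(where, graph):
    """All variable bindings that satisfy every pattern in `where`."""
    bindings = [{}]
    for pattern in where:
        bindings = [b2 for b in bindings for triple in graph
                    if (b2 := match(pattern, triple, b)) is not None]
    return bindings

def construct(template, where, graph):
    """Instantiate `template` once per solution and collect the triples."""
    out = set()
    for b in solutions(where, graph):
        for triple in template:
            # Skip template triples that still contain an unbound variable.
            if all(not t.startswith("?") or t in b for t in triple):
                out.add(tuple(b.get(t, t) for t in triple))
    return out

# Hypothetical source data: two persons with a "name" property.
source = {
    ("ex:Bob", "rdf:type", "source:Person"),
    ("ex:Bob", "source:name", "Bob"),
    ("ex:Alice", "rdf:type", "source:Person"),
    ("ex:Alice", "source:name", "Alice"),
}
target = construct(
    template=[("?p", "rdf:type", "target:Person"),
              ("?p", "target:lastName", "?n")],
    where=[("?p", "rdf:type", "source:Person"),
           ("?p", "source:name", "?n")],
    graph=source,
)
```

The template is filled in once per match of the WHERE patterns, so two matching persons yield four target triples.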

Here is an example screenshot from TopBraid Composer's new SPARQL visualization. In this example, a SPARQL query is used to convert instances of a source:Person class into instances of target:Person. To make this example more interesting, the string values of source:car are converted into instances of a class target:Car (that's why the query looks so scary).

For example, if you have a set of source instances Bob and Alice

then the output is a new subgraph in the target model, but with the cars as objects instead of strings:

The trick is that CONSTRUCT generates new triples, and these triples can be treated as "inferences" and added to the target model. TopBraid's SPARQL window displays what happens under the hood (the screenshot actually shows a different version of the query from above):

As I said, I am not an expert on ontology mapping and therefore don't want to comment on whether this approach is better than other ontology mapping tools. However, it seems to me that the popularity of SPARQL and the large number of tools that support SPARQL make this a very promising idea. We may assume that in the near future most Semantic Web developers will know SPARQL and will therefore not need to learn any other "mapping ontology". Also, SPARQL is supported by optimized query engines, and SPARQL is fairly expressive with regard to query filters etc. And if the default expressivity is not enough, you still have property functions.

I guess a lot more research can go into this idea, and lots of new papers could be written, for example on typical design patterns (such as a "property bridge" pattern), how to edit SPARQL visually and how to shape future editions of the SPARQL standard to best meet the ontology mapping use case. For example, it appears to be impossible to create new URIs for the resources in the target ontology - only bnodes can be created on the fly. We therefore added a simple post-processor that uses the rdfs:label to create suitable URIs.
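A post-processor of this kind could look roughly like the following Python sketch. This is not the actual TopBraid code: the tuple representation, the "_:" convention for blank nodes, the function name name_blank_nodes, the example.org namespace and the car names are all assumptions for illustration.

```python
from urllib.parse import quote

def name_blank_nodes(triples, namespace):
    """Replace each labelled blank node ("_:..." term) by namespace + label.

    Blank nodes without an rdfs:label are left untouched. If a blank node
    has several labels, the first one encountered wins.
    """
    labels = {}
    for s, p, o in triples:
        if s.startswith("_:") and p == "rdfs:label":
            labels.setdefault(s, o)

    def rename(term):
        if term in labels:
            # Percent-encode the label so it is safe as a local name.
            return namespace + quote(labels[term], safe="")
        return term

    return {(rename(s), p, rename(o)) for s, p, o in triples}

# Hypothetical CONSTRUCT output: one person matched twice, once per car.
constructed = [
    ("_:b1", "rdf:type", "target:Person"),
    ("_:b1", "rdfs:label", "Bob"),
    ("_:b1", "target:car", "_:b2"),
    ("_:b2", "rdf:type", "target:Car"),
    ("_:b2", "rdfs:label", "Ford Focus"),
    ("_:b3", "rdf:type", "target:Person"),
    ("_:b3", "rdfs:label", "Bob"),
    ("_:b3", "target:car", "_:b4"),
    ("_:b4", "rdf:type", "target:Car"),
    ("_:b4", "rdfs:label", "Mini"),
]
named = name_blank_nodes(constructed, "http://example.org/target#")
```

Because the URI is derived from the label, blank nodes that carry the same label collapse into a single resource. That is convenient when one person was matched several times, but it also merges resources that merely share a label.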

Given the fact that CONSTRUCT queries can create or infer new triples, it may also be worth investigating whether SPARQL could serve as a rule language, similar to SWRL.

As a side effect of these new features (and its existing support for importing relational databases, UML, XML Schema and Excel, and for operating on Jena, Oracle and Sesame databases), TopBraid Composer is increasingly becoming a data and knowledge integration platform and is no longer "just" an ontology editor.

8 Comments:

At 2:04 AM, Anonymous Kasper van den Berg said...

Thanks for bringing the possibility of using SPARQL to integrate ontologies to my attention. I have some questions about it.

What about the persistence or accessibility of SPARQL-queries?

When someone creates an adaptor ontology for mapping two other ontologies, the adaptor can be published on the web and reused by other persons. I use SPARQL-queries on a more ad-hoc basis (of course I store them in comment fields within relevant OWL classes, properties, and instances ;) ).
Can SPARQL-queries be published to be reused by others?

When an OWL-file is used to integrate two (or more) ontologies, the OWL reasoner performs the necessary inferences. When a set of SPARQL queries is used, the SPARQL engine resolves the queries and the OWL reasoner performs the inferences. The query engine might depend on inferences, and new inferences might be possible after the query results are added; so the order of executing the two engines becomes important, and infinite loops might be the result.
What are your ideas about this? Is this a problem? Or has it been solved already?

But I agree that using the CONSTRUCT keyword of SPARQL queries might allow better ontology mapping. It offers possibilities in other areas as well. It is certainly worth looking into.

 
At 11:01 AM, Blogger Holger Knublauch said...

While not perfect, I think attaching queries into your ontology is a suitable work-around. We do this in Composer using a dedicated sparql:query property. Storing them as strings has some limitations however, and a more structured representation would be better.

In an ideal world, SPARQL could have an RDF Schema, so that the queries themselves become ontologies. Unfortunately, creating such an RDF Schema is non-trivial and the outcome rather messy, because the order of things is very important and you would need to use nested rdf:Lists to represent this.

On your second question I guess in ontology mapping scenarios you will not need to iterate in cycles, because the additional triples are all added to the source ontology (unless you have matching patterns that also use concepts from the target ontology).

More generally, if you regard SPARQL's CONSTRUCT as a rule language, then some way of chaining rules, similar to the RETE algorithm, will be needed. I think this should be very well possible and would be an interesting research topic. In a trivial approach you would just iterate over all rules as long as one of them adds new information.
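A minimal sketch of that trivial approach, written in plain Python instead of SPARQL: each rule is modelled as a function that takes the current set of triples and returns the triples it would construct. The names run_to_fixpoint, persons and agents, the target:Agent class and the max_rounds limit are assumptions made up for this illustration.

```python
def run_to_fixpoint(rules, graph, max_rounds=100):
    """Apply all rules repeatedly until a full round adds no new triple."""
    graph = set(graph)
    for _ in range(max_rounds):
        new = set()
        for rule in rules:
            # Keep only triples that are not already in the graph.
            new |= rule(graph) - graph
        if not new:
            return graph
        graph |= new
    # Rules that keep producing fresh triples would never terminate.
    raise RuntimeError("no fixpoint reached within max_rounds")

# Two hypothetical rules; the second depends on the output of the first.
def persons(graph):
    return {(s, "rdf:type", "target:Person")
            for s, p, o in graph
            if p == "rdf:type" and o == "source:Person"}

def agents(graph):
    return {(s, "rdf:type", "target:Agent")
            for s, p, o in graph
            if p == "rdf:type" and o == "target:Person"}

closed = run_to_fixpoint([agents, persons],
                         {("ex:Bob", "rdf:type", "source:Person")})
```

The order of the rules does not matter here, only the number of rounds: agents finds nothing in the first round and fires in the second. The round limit is a crude guard for rule sets that add new triples forever.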

But you are right - clearly any CONSTRUCT situation with recursive triple additions could lead to infinite loops.

By the way, it may be worth moving such discussions into the jena-dev list, where more SPARQL experts hang around.

 
At 11:07 AM, Blogger Holger Knublauch said...

Ooops - small but important typo: "On your second question I guess in ontology mapping scenarios you will not need to iterate in cycles, because the additional triples are all added to the target ontology..."

 
At 2:00 PM, Blogger jeff said...

I'm having trouble figuring out how such a scheme would work if each person drove multiple cars. The Construct works great if for each thing you find in the select you are going to create only one of each type of resource in the target.

Could you expand on how you would handle the single person/multiple car case?

Thanks,
Jeff

 
At 2:12 PM, Blogger Holger Knublauch said...

It works exactly like you see in the diagram: The CONSTRUCT part is executed for each match of the WHERE clause, i.e. the variable ?carName has multiple bindings and all of them are mapped into instances of target:Car.

If you are trying to reproduce this in Jena yourself, you need to do some post-processing though. As I said (somewhere), we convert blank nodes created from the CONSTRUCT to URI nodes, if the blank node has an rdfs:label. This label is taken as the local name of the new resource. This means that if you have two different blank nodes as a result of two matching CONSTRUCT instantiations, the system will still use the "same" URI node for the target:Person. I can send you the example files if you want to analyze them in TBC (Please contact me off-blog).

 
At 4:21 PM, Blogger jeff said...

Yes, the problem I was seeing was getting a new and different person node generated for each car that a single person owned. So if I understand, you suggest taking care of that in post-processing, converting the different blank nodes to nodes named after their labels, which only change if the person is actually different, right?

It would be nice if you could somehow do that in SPARQL directly, perhaps using nested CONSTRUCTs, where for each match in the outer CONSTRUCT the inner CONSTRUCT is invoked. That way the outer could just select people, and the inner could select the person's cars. Any chance nested CONSTRUCTs could become part of the language?

 
At 4:33 PM, Blogger Holger Knublauch said...

I am afraid that SPARQL-related questions are better handled by the SPARQL experts, for example on the jena-dev mailing list. I believe it is currently not easy to dynamically generate URI nodes at query time, and therefore I added this work-around. I agree this is not a perfect solution, but at least it allows our users to get work done until the languages and libraries catch up.

 
At 10:51 AM, Blogger jeff said...

I think I've got a slightly different way of getting around the "one to many" case that I mentioned earlier. Instead of using b-nodes in my CONSTRUCT query, I re-use the source variables from the WHERE part of the query, as in the following query (using the pizza ontology):

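# The namespace URIs below are placeholders; substitute the real ones.
PREFIX pizza:     <http://example.org/pizza#>
PREFIX pizzaTarg: <http://example.org/pizzaTarg#>
CONSTRUCT { ?inst a pizzaTarg:Pizza .
            ?top a pizzaTarg:Topping .
            ?inst pizzaTarg:hasTopping ?top }
WHERE { ?inst a pizza:Pizza .
        ?inst pizza:hasTopping ?top . }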

This creates target ontology instances in the same ratio of pizzas to toppings that exists in the source, as opposed to creating a new pizza b-node for every topping and ending up with different b-nodes that represent the same pizza resource. This also guarantees that the instance resource IDs will be unique. I'm not sure you could say that if you used just the source label of each resource (i.e. what if there were two different Alices?).

Of course this does use the same names for the resources in two different ontologies, but a simple post-processing script could be run to change the resource names from having a pizza prefix to a pizzaTarg prefix.
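Such a renaming script might look like the following Python sketch. It assumes the constructed triples are available as tuples of full URI strings; the function name retarget and the example.org namespaces are placeholders, not the real pizza ontology URIs.

```python
def retarget(triples, old_ns, new_ns):
    """Move subjects and objects from the old_ns namespace into new_ns."""
    def swap(term):
        if term.startswith(old_ns):
            return new_ns + term[len(old_ns):]
        return term
    # Predicates are left alone; the query already emits target properties.
    return {(swap(s), p, swap(o)) for s, p, o in triples}

OLD = "http://example.org/pizza#"      # placeholder source namespace
NEW = "http://example.org/pizzaTarg#"  # placeholder target namespace

constructed = {
    (OLD + "Margherita1", "rdf:type", NEW + "Pizza"),
    (OLD + "Margherita1", NEW + "hasTopping", OLD + "Mozzarella1"),
    (OLD + "Mozzarella1", "rdf:type", NEW + "Topping"),
}
renamed = retarget(constructed, OLD, NEW)
```

Since only the namespace part changes, two resources that are distinct in the source stay distinct in the target.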

 
