Friday, January 02, 2009

Introducing SPIN: the SPARQL Inferencing Notation

With this week's release of TopBraid Composer 3.0 beta1, TopQuadrant is adding new lego bricks to the Semantic Web stack. SPIN is a collection of RDF vocabularies enabling the use of SPARQL to define constraints and inference rules on Semantic Web models. Let me give you some (technical) background on why I believe SPIN will be useful. Future postings will elaborate on use cases and example applications.

One of the main selling points of Semantic Web technology is the ability to publish domain models with executable semantics. Most Semantic Web models contain class and property definitions together with definitions of ranges, domains, OWL restrictions, OWL property types, SWRL rules, etc. Any tool that implements the underlying languages can use these formal definitions to operate on the model, even if the tool has no hard-coded knowledge about the domain. So if I publish a Semantic Web model stating that all instances of Person can have string values for firstName, then Semantic Web tools can build suitable input forms to collect such instances. Or, if I include a rule stating that the age of a Person is the current date minus his or her birth date, then any Semantic Web tool can compute the value of age automatically, just by executing the rule. Again, nothing needs to be hard-coded, and the tool can dynamically discover what a given model is all about. This is also the foundation for various data integration and information discovery tasks.

RDF and RDF Schema provide only very limited expressivity for such definitions, and it has (intentionally) been left to higher-level languages such as OWL to provide richer modeling constructs. However, people quickly recognized that in practice OWL does not meet all requirements and use cases, so additional languages such as SWRL (and, more recently, RIF) have been proposed. These rule-based languages contain IF-THEN constructs that infer new triples whenever a precondition is met in the current state of the model. They cover very important use cases, and many practitioners find them quite natural to use.
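To make the IF-THEN idea concrete, here is a minimal Python sketch of forward chaining over an in-memory set of triples: the rules are applied repeatedly until they produce no new triples. It is not SWRL or RIF syntax; representing each rule as a Python function and the example names (my:hasChild, my:Parent, my:Person) are assumptions made purely for illustration.

```python
# Minimal forward-chaining sketch: rules fire until no new triples appear.
# Triples are plain (subject, predicate, object) string tuples; names are hypothetical.
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Iterable[Triple]]


def parent_rule(graph: Set[Triple]) -> Iterable[Triple]:
    # IF ?x my:hasChild ?y THEN ?x rdf:type my:Parent
    for s, p, o in graph:
        if p == "my:hasChild":
            yield (s, "rdf:type", "my:Parent")


def subclass_rule(graph: Set[Triple]) -> Iterable[Triple]:
    # IF ?c rdfs:subClassOf ?d AND ?x rdf:type ?c THEN ?x rdf:type ?d
    subclass_pairs = [(s, o) for s, p, o in graph if p == "rdfs:subClassOf"]
    for c, d in subclass_pairs:
        for s, p, o in graph:
            if p == "rdf:type" and o == c:
                yield (s, "rdf:type", d)


def forward_chain(graph: Set[Triple], rules: Iterable[Rule]) -> Set[Triple]:
    """Apply all rules repeatedly until a fixpoint is reached."""
    rules = list(rules)
    result = set(graph)
    while True:
        # Collect everything the rules derive from the current state, keep only new triples.
        new = {t for rule in rules for t in rule(result)} - result
        if not new:
            return result
        result |= new


data = {
    ("my:Parent", "rdfs:subClassOf", "my:Person"),
    ("my:alice", "my:hasChild", "my:bob"),
}
inferred = forward_chain(data, [parent_rule, subclass_rule])
```

The second rule only fires after the first has added alice's my:Parent type, which is why the loop runs until nothing new is derived.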

Now let's get back to the use cases of rich Semantic Web languages. Typically, people use them for two different purposes:
  • constraint checking: test whether the model is in a consistent/expected state
  • deriving new values: compute implicit property values from what's stated in a model
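OWL focuses on the latter. Many people nevertheless use it for constraint checking as well, either because they misunderstand its semantics or because they deliberately ignore the open-world assumption and the absence of a unique name assumption. Under OWL's semantics, a missing property value is simply unknown rather than an error, and two differently named resources may denote the same individual, so instead of reporting the violations such users expect, an OWL reasoner will often just infer additional facts (for example, that two values of a single-valued property are the same individual). Using OWL this way is therefore incorrect, and this widespread misuse indicates that other languages are needed to fill the gap.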
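The following Python sketch shows what closed-world constraint checking looks like: the stated triples are treated as complete, and distinct names are treated as distinct individuals. It is an illustration only, not OWL or SPIN syntax, and the property my:father and the individuals are hypothetical.

```python
# Closed-world cardinality check over stated triples only (hypothetical names).
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def instances_of(graph: Set[Triple], cls: str) -> Set[str]:
    return {s for s, p, o in graph if p == "rdf:type" and o == cls}


def values(graph: Set[Triple], subject: str, prop: str) -> Set[str]:
    return {o for s, p, o in graph if s == subject and p == prop}


def check_cardinality(graph: Set[Triple], cls: str, prop: str,
                      min_count: int, max_count: int) -> List[str]:
    """Report violations, treating missing triples as false (closed world)
    and distinct names as distinct individuals (unique name assumption).
    An OWL reasoner would instead treat a missing value as unknown and could
    infer that two differently named values denote the same individual."""
    violations = []
    for inst in sorted(instances_of(graph, cls)):
        n = len(values(graph, inst, prop))
        if n < min_count:
            violations.append(f"{inst}: {n} value(s) for {prop}, expected at least {min_count}")
        elif n > max_count:
            violations.append(f"{inst}: {n} value(s) for {prop}, expected at most {max_count}")
    return violations


graph = {
    ("my:alice", "rdf:type", "my:Person"),
    ("my:bob", "rdf:type", "my:Person"),
    ("my:bob", "my:father", "my:carl"),
    ("my:bob", "my:father", "my:charles"),
}
violations = check_cardinality(graph, "my:Person", "my:father", 1, 1)
```

Both instances are reported here, whereas an OWL reading of "exactly one father" would find this data consistent.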

But the quest for good modeling languages does not have to stop at OWL or SWRL: there is another well-known language in the Semantic Web space that can be used to formalize semantics, namely SPARQL. SPARQL is a firmly established W3C standard query language and is implemented by all major Semantic Web stores on the market. It is very expressive, as its WHERE clauses can match almost arbitrary RDF graph patterns. Also, many Semantic Web practitioners are already familiar with SPARQL, and various query editing tools exist. Furthermore, SPARQL seems to meet users' expectations very well with regard to things like the open-world assumption: SPARQL queries operate only on the triples that match the patterns in the WHERE clause, and no other implicit assumptions are used at query execution time. You get what you see.
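To illustrate what matching a graph pattern means, here is a simplified Python sketch of how a basic graph pattern could be evaluated: each triple pattern is matched against the stored triples, and variable bindings are joined across patterns. Real engines use indexes and query planning; this naive version only shows the semantics, and the data is hypothetical.

```python
# Naive basic-graph-pattern matching: terms starting with "?" are variables.
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if term in result and result[term] != value:
                return None  # variable already bound to a different value
            result[term] = value
        elif term != value:
            return None  # constant does not match
    return result


def evaluate_bgp(patterns: List[Triple], graph: Set[Triple]) -> Iterator[Binding]:
    """Yield every variable binding that satisfies all patterns (a join)."""
    def solve(i: int, binding: Binding) -> Iterator[Binding]:
        if i == len(patterns):
            yield binding
            return
        for triple in graph:
            extended = match_pattern(patterns[i], triple, binding)
            if extended is not None:
                yield from solve(i + 1, extended)
    yield from solve(0, {})


graph = {
    ("my:alice", "rdf:type", "my:Person"),
    ("my:alice", "my:age", "42"),
    ("my:bob", "rdf:type", "my:Parent"),
    ("my:bob", "my:age", "17"),
}
# SELECT ?p ?age WHERE { ?p rdf:type my:Person . ?p my:age ?age }
rows = list(evaluate_bgp([("?p", "rdf:type", "my:Person"), ("?p", "my:age", "?age")], graph))
```

Only alice is returned: bob is typed my:Parent, and since no subclass triple was inferred and stored, the query does not treat him as a my:Person.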

Most people know that SPARQL has the SELECT query form, but there is also the extremely useful CONSTRUCT keyword and the simple ASK keyword. The SPIN Modeling Vocabulary makes heavy use of the latter two. To simplify a bit, SPIN suggests using
  • ASK for constraint checking, and
  • CONSTRUCT for deriving new values
So, a SPIN-based ontology is a collection of classes and properties plus ASK and CONSTRUCT queries. The question then is: how can we connect those queries to the domain models? How can we store the queries together with the model in a seamless way?
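The division of labor between the two query forms can be sketched in Python. This illustrates their semantics under simplifying assumptions and is not SPIN's implementation: ASK returns true if the WHERE pattern has at least one solution (in this sketch, a true result is read as "violation found"), and CONSTRUCT instantiates a triple template once per solution. The WHERE clause is reduced to a single triple pattern plus an optional filter function, and all names are hypothetical.

```python
# ASK and CONSTRUCT over a single triple pattern with an optional filter.
from typing import Callable, Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, object]
Binding = Dict[str, object]


def solutions(pattern: Triple, graph: Set[Triple],
              keep: Optional[Callable[[Binding], bool]] = None) -> Iterator[Binding]:
    """Bind the variables (terms starting with '?') of one pattern, then apply the filter."""
    for triple in graph:
        binding: Binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if isinstance(term, str) and term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    ok = False  # same variable bound to two different values
            elif term != value:
                ok = False  # constant does not match
        if ok and (keep is None or keep(binding)):
            yield binding


def ask(pattern: Triple, graph: Set[Triple], keep=None) -> bool:
    # ASK: is there at least one solution?
    return any(True for _ in solutions(pattern, graph, keep))


def construct(template: List[Triple], pattern: Triple, graph: Set[Triple],
              keep=None) -> Set[Triple]:
    # CONSTRUCT: instantiate every template triple once per solution
    out = set()
    for b in solutions(pattern, graph, keep):
        for t in template:
            out.add(tuple(b.get(x, x) if isinstance(x, str) else x for x in t))
    return out


graph = {("my:alice", "my:age", 42), ("my:bob", "my:age", 17)}
# ASK WHERE { ?this my:age ?age . FILTER (?age < 18) }
has_minor = ask(("?this", "my:age", "?age"), graph, lambda b: b["?age"] < 18)
# CONSTRUCT { ?this rdf:type my:Adult } WHERE { ?this my:age ?age . FILTER (?age >= 18) }
adults = construct([("?this", "rdf:type", "my:Adult")],
                   ("?this", "my:age", "?age"), graph, lambda b: b["?age"] >= 18)
```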

In previous incarnations of SPIN (before it was called SPIN), TopBraid simply stored the SPARQL queries as strings in the domain model. We used a dedicated property called sparql:query that pointed from any RDF resource to a SPARQL string. This approach was, of course, fairly weak. Relying on a purely textual representation is error-prone: for example, when someone renames a resource, the change must also be made in every query string that mentions it. And what about the namespace prefixes used in SPARQL queries?
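The renaming problem is easy to reproduce. In the Python sketch below (a plain stand-in with hypothetical names, not TopBraid's actual sparql:query handling or SPIN itself), a model-wide rename rewrites every triple term equal to the old resource. A query kept as an opaque string literal is never equal to that resource, so it silently keeps the old name, while a query stored as structured triples is updated along with everything else.

```python
# Renaming a resource updates triples, but not query text hidden in a string literal.
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def rename(graph: Set[Triple], old: str, new: str) -> Set[Triple]:
    """Replace every occurrence of resource `old` by `new` in subject, predicate, or object."""
    return {tuple(new if term == old else term for term in t) for t in graph}


model = {
    # Old style: the query is just a string attached to a class
    ("my:Person", "sparql:query", "ASK WHERE { ?this my:age ?age . FILTER (?age < 18) }"),
    # Structured style: the triple pattern is itself stored as triples
    ("_:pattern1", "sp:subject", "spin:_this"),
    ("_:pattern1", "sp:predicate", "my:age"),
    ("_:pattern1", "sp:object", "sp:_age"),
    ("my:alice", "my:age", "42"),
}
renamed = rename(model, "my:age", "my:ageInYears")
string_query = next(o for s, p, o in renamed if p == "sparql:query")
structured_predicate = next(o for s, p, o in renamed if p == "sp:predicate")
```

After the rename, string_query still refers to my:age, while structured_predicate and the data triple use my:ageInYears.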

In order to provide a maintainable representation of SPARQL queries, SPIN defines an RDF vocabulary for storing them. Instead of storing an ASK query as a string, SPIN stores it as an instance of the dedicated RDF class sp:Ask (with corresponding classes for the other query forms). For example, the SPARQL query
    ASK WHERE {
        ?this my:age ?age .
        FILTER (?age < 18) .
    }
can be represented in SPIN RDF syntax in N3 format as
    [ a       sp:Ask ;
      sp:where ([ sp:object sp:_age ;
                  sp:predicate my:age ;
                  sp:subject spin:_this
                ]
                [ a sp:Filter ;
                  sp:expression
                          [ sp:arg1 sp:_age ;
                            sp:arg2 18 ;
                            a sp:lt
                          ]
                ])
    ]
This may remind some of you of SWRL, where rules are also triplified, or of OWL class expressions, which look similarly complex in RDF. The RDF syntax is not necessarily pretty, but it is intended to be used by software, not humans. In the case of SPIN, editing tools like TopBraid display these constructs on screen in human-readable SPARQL syntax. Furthermore, there is a free public web service for converting between the two syntaxes (textual SPARQL and SPIN RDF).
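Going from the RDF form back to readable text is mechanical. The sketch below mirrors the sp:Ask example above as nested Python dictionaries and renders it as SPARQL text. It handles only the constructs used in that example (a triple pattern, FILTER, and sp:lt), and the dictionary encoding and function names are hypothetical; it does not reproduce TopBraid's converter or the web service.

```python
# Render a SPIN-style ASK structure (mirrored as Python dicts) back to SPARQL text.
from typing import Any, Dict

OPERATORS = {"sp:lt": "<"}  # only the operator used in the example; others would be added the same way


def term(node: Any) -> str:
    if isinstance(node, dict):
        return expression(node)
    if isinstance(node, str) and ":_" in node:
        return "?" + node.split(":_", 1)[1]  # sp:_age -> ?age, spin:_this -> ?this
    return str(node)  # resources such as my:age, or literals such as 18


def expression(node: Dict[str, Any]) -> str:
    op = OPERATORS[node["a"]]
    return f"{term(node['sp:arg1'])} {op} {term(node['sp:arg2'])}"


def element(node: Dict[str, Any]) -> str:
    if node.get("a") == "sp:Filter":
        return f"FILTER ({expression(node['sp:expression'])}) ."
    # Anything else in sp:where is treated as a triple pattern
    return f"{term(node['sp:subject'])} {term(node['sp:predicate'])} {term(node['sp:object'])} ."


def render_ask(query: Dict[str, Any]) -> str:
    if query.get("a") != "sp:Ask":
        raise ValueError("expected an sp:Ask structure")
    body = "\n".join("    " + element(e) for e in query["sp:where"])
    return "ASK WHERE {\n" + body + "\n}"


ask_query = {
    "a": "sp:Ask",
    "sp:where": [
        {"sp:subject": "spin:_this", "sp:predicate": "my:age", "sp:object": "sp:_age"},
        {"a": "sp:Filter",
         "sp:expression": {"a": "sp:lt", "sp:arg1": "sp:_age", "sp:arg2": 18}},
    ],
}
print(render_ask(ask_query))
```

Running it prints the textual query from the beginning of this section, with ?this and ?age restored from spin:_this and sp:_age.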

But the main achievement here is that we can now store SPARQL expressions as part of our Semantic Web models and use SPARQL's rich expressivity to describe the concepts of our domain. The next question is: where do we put those SPARQL expressions? In SWRL, inference rules have global scope and are simply placed anywhere in the model (as instances of swrl:Imp). In OWL, a frame-based approach is used, where logical descriptions are attached to classes using rdfs:subClassOf or owl:equivalentClass. The latter has the advantage of providing some context and scope to the rules, i.e. the ontology designer consciously attaches each piece of domain knowledge to the class or property where it belongs. Inheritance, similar to that in object-oriented modeling, is used to reuse and specialize those definitions.

SPIN supports both approaches, i.e. rules and constraints can be either global or scoped to a given class. The recommended approach is to attach SPIN declarations to classes, following object-oriented design. As in object-oriented languages such as Java, there is a special variable called ?this that refers to the current instance. For example, assume you attach a rule that computes the age of a person from his or her birth date to the class my:Person. If my:Parent is a subclass of my:Person, then the rule will also be applied to all instances of my:Parent. At execution time, the variable ?this will be bound to the current instance (of either my:Person or my:Parent). Global rules are simply attached to the root class rdfs:Resource or owl:Thing and do not mention ?this.
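The scoping and inheritance behavior can be sketched in Python. Rules are registered per class, and each rule runs once for every instance of that class or of any of its subclasses, with ?this bound to that instance. This only illustrates the idea: in SPIN the rules would be SPARQL CONSTRUCT queries attached to classes, whereas here they are plain Python functions, and my:birthDate and the fixed evaluation date are hypothetical.

```python
# Class-scoped rules with ?this bound to each instance, inherited by subclasses.
from datetime import date
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, object]
Rule = Callable[[str, Set[Triple]], Set[Triple]]  # (this, graph) -> inferred triples


def superclasses(cls: str, graph: Set[Triple]) -> Set[str]:
    """cls plus all of its (transitive) rdfs:subClassOf ancestors."""
    seen, todo = set(), [cls]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(o for s, p, o in graph if s == c and p == "rdfs:subClassOf")
    return seen


def run_rules(graph: Set[Triple], rules: Dict[str, List[Rule]]) -> Set[Triple]:
    inferred: Set[Triple] = set()
    for this, p, cls in list(graph):
        if p != "rdf:type":
            continue
        # A rule attached to a superclass also applies to instances of its subclasses
        for c in superclasses(cls, graph):
            for rule in rules.get(c, []):
                inferred |= rule(this, graph)  # ?this bound to the current instance
    return inferred


TODAY = date(2009, 1, 2)  # fixed evaluation date (assumption, for reproducibility)


def age_rule(this: str, graph: Set[Triple]) -> Set[Triple]:
    # Roughly: CONSTRUCT { ?this my:age ?age } WHERE { ?this my:birthDate ?d ... }
    out = set()
    for s, p, born in graph:
        if s == this and p == "my:birthDate":
            # Whole years, minus one if this year's birthday has not happened yet
            years = TODAY.year - born.year - ((TODAY.month, TODAY.day) < (born.month, born.day))
            out.add((this, "my:age", years))
    return out


graph = {
    ("my:Parent", "rdfs:subClassOf", "my:Person"),
    ("my:alice", "rdf:type", "my:Person"),
    ("my:alice", "my:birthDate", date(1980, 6, 15)),
    ("my:bob", "rdf:type", "my:Parent"),
    ("my:bob", "my:birthDate", date(1950, 1, 2)),
}
inferred = run_rules(graph, {"my:Person": [age_rule]})
```

The rule is registered only on my:Person, yet bob (a my:Parent) also receives an age. A global rule registered under a root class would reach every instance only if the class hierarchy actually leads up to that root; this sketch does not add that link implicitly.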

The result of this mechanism is that SPIN users can exploit the whole range of SPARQL features to make their domain models executable and, because SPARQL is supported by all major stores, potentially even on the scale of the Semantic Web. The following postings will provide some examples of how to use SPIN as a rule and constraint language. To summarize where we are so far: SPIN is a very lightweight mechanism that leverages SPARQL for new application areas that go far beyond querying. SPIN also has additional capabilities, namely user-defined functions and query templates, which I will introduce in future postings as well. Please stay tuned...
