Friday, January 09, 2009

SPARQLpedia as an Example SPARQLMotion Web Application

TopQuadrant has recently launched SPARQLpedia, a new web service that allows users to share SPARQL queries and to search for queries that others have submitted. The submitted queries are stored in a server-side RDF database together with searchable metadata such as author and submission date. Here is a screenshot of a simple SPARQLpedia web search interface:


Pressing the Search button will display a list of matching database entries:


The user can then click on any of the search results to display its details (and execute the selected query):



The underlying services of SPARQLpedia can also be called directly as REST-based web services, as described on the API page.

In a sense, SPARQLpedia is a typical web application:
  • The application's designers have prepared a database to store the entries
  • Users can add entries to and delete entries from that database
  • Users can search for entries in the database by various criteria
  • An HTML web interface can be used to interact with the database
  • It can also be accessed programmatically via web services
In this blog article, I will give some details on how SPARQLpedia was implemented and highlight the role of SPARQLMotion as its server-side scripting language.

sparqlpedia.org hosts a standard Apache Tomcat server that runs a TopBraid Live 3.0 beta application. Installing this server was straightforward and basically consisted of dropping the TopBraid Live war file into Tomcat's applications folder. TopBraid Live itself is a generic application development framework that makes tons of RDF/OWL-related services available through its APIs. There are two server-side APIs in TopBraid Live:
  1. The SPARQLMotion API is an entirely model-driven way of creating web services based on the collection of services wrapped as SPARQLMotion modules.
  2. The TopBraid Live Java API can be used to access and extend the capabilities of the server. In our example, we only needed it to add some new specialized SPARQL functions.

Setting up the Database

The first step of developing SPARQLpedia was to set up a database with an associated base schema. We are using TopBraid Composer for this purpose. The entries in the database are themselves SPARQL queries (but it's easy to translate this to databases hosting product data, academic publications, medical records or whatever). SPARQL queries are entered as strings, but the database stores them in the SPIN RDF Syntax, because this will later allow us to run sophisticated queries on various aspects of the query that would be difficult to achieve if we only had the string representation. So, as a start, we have defined an empty schema ontology that imports the SPIN (sp) namespace. The rest of our database schema is simple: we store Entries that have been submitted by Users, as illustrated in the class diagram below.



The schema is a collection of RDFS classes and RDF properties. The class spedia:Entry currently has only a single subclass, spedia:QueryEntry, but we may want to add other types of entries later, such as discussion threads or votes. The schema is stored in a file spedia.owl in our Eclipse workspace.
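To make the schema a bit more concrete, here is a minimal Python sketch (not TopBraid code) that writes it down as plain (subject, predicate, object) triples and shows why an spedia:QueryEntry also counts as an spedia:Entry. Only the three class names come from the post; the namespace URI and the two properties spedia:submittedBy and spedia:password are hypothetical stand-ins for what the class diagram actually declares.

```python
# Minimal sketch of the SPARQLpedia schema as (subject, predicate, object) triples.
# ASSUMPTION: the namespace URI and the properties spedia:submittedBy and
# spedia:password are hypothetical; only the class names come from the post.
SPEDIA = "http://sparqlpedia.org/spedia#"  # hypothetical namespace URI
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

SCHEMA = {
    (SPEDIA + "Entry", RDF_TYPE, "rdfs:Class"),
    (SPEDIA + "QueryEntry", RDF_TYPE, "rdfs:Class"),
    (SPEDIA + "QueryEntry", SUBCLASS_OF, SPEDIA + "Entry"),
    (SPEDIA + "User", RDF_TYPE, "rdfs:Class"),
    (SPEDIA + "submittedBy", RDF_TYPE, "rdf:Property"),  # hypothetical: Entry -> User
    (SPEDIA + "password", RDF_TYPE, "rdf:Property"),     # hypothetical: User -> string
}


def superclasses(cls, triples):
    """Return cls plus all classes reachable via rdfs:subClassOf (transitively)."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(o for s, p, o in triples if s == current and p == SUBCLASS_OF)
    return seen


def is_instance_of(resource, cls, triples):
    """RDFS-style type check: is resource typed with cls or one of its subclasses?"""
    types = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    return any(cls in superclasses(t, triples) for t in types)


# Usage: a stored query entry also counts as an spedia:Entry.
entry = "http://sparqlpedia.org/entry/1"  # hypothetical entry URI
data = SCHEMA | {(entry, RDF_TYPE, SPEDIA + "QueryEntry")}
print(is_instance_of(entry, SPEDIA + "Entry", data))  # True
```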

Next we create a persistent database that will contain the submitted instances. For simplicity, we use the Sesame 2 native Java back-end in TopBraid, but we could also have used any of the other database types supported by TopBraid, including AllegroGraph and Oracle. Our Sesame database imports spedia.owl from above, and its files will also be stored in the workspace. We give it the base URI http://sparqlpedia.org/public so that we can later access it as a named graph from our SPARQLMotion scripts.
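The point of giving the database a base URI is that scripts refer to the graph by that URI rather than by its storage technology. The following Python sketch illustrates this indirection with a hypothetical registry (register_graph and open_graph are invented names, not TopBraid API calls); inside a SPARQL query the same idea could show up as a GRAPH <http://sparqlpedia.org/public> clause.

```python
# Sketch: scripts address the database by its base URI, not by its storage backend.
# ASSUMPTION: the registry and the Backend class are hypothetical stand-ins,
# not TopBraid classes.
class Backend:
    """Hypothetical stand-in for a triple store such as a Sesame 2 native store."""

    def __init__(self, kind):
        self.kind = kind
        self.triples = set()


_graphs = {}


def register_graph(base_uri, backend):
    """Bind a base URI to whichever backend currently stores that graph."""
    _graphs[base_uri] = backend


def open_graph(base_uri):
    """Roughly what a 'connect to DB by base URI' step does in this sketch."""
    try:
        return _graphs[base_uri]
    except KeyError:
        raise LookupError(f"no database registered for <{base_uri}>") from None


PUBLIC = "http://sparqlpedia.org/public"
register_graph(PUBLIC, Backend("sesame2-native"))
print(open_graph(PUBLIC).kind)  # sesame2-native

# Moving the data to another store only changes the registration, not the callers.
register_graph(PUBLIC, Backend("allegrograph"))
print(open_graph(PUBLIC).kind)  # allegrograph
```

Swapping Sesame for another store then only touches the registration, which is the same reason a script that opens the graph by its base URI does not need to change.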


Setting up the SPARQLMotion scripts

SPARQLMotion is a visual semantic web scripting language that can be used to build data processing pipelines through a graphical user interface. Typical SPARQLMotion scripts take some input, do some processing and then create some output. TopBraid Composer 3 Maestro Edition (ME) is used to build SPARQLMotion scripts, so everything we do (from schema definition and database maintenance to the implementation of the services) is done within a single, uniform environment. Let's have a look at an example script, whose outline is shown in the following screenshot.


This SPARQLMotion script (stored in an OWL file deleteQuery.sms.n3 in the workspace) implements the functionality to delete a query from the repository. The script takes two arguments as input:
  • The URI identifying the query that shall be deleted
  • The password of the submitting user (in SPARQLpedia, only the original author of a query can delete it)
SPARQLMotion scripts should be laid out (and read) from top to bottom, i.e. you see input coming in from the top, then the input will be processed through a pipeline and finally some results are returned. Each node in the diagram is of a certain module type, and the SPARQLMotion modules library provides a comprehensive list of frequently needed data processing tasks. The deleteQuery script has two exit points, marked by the two red icons at the bottom:
  • The left end module is used when the script is called via the web service API and just returns the string "OK" as its result.
  • The right end module is used when the script is called to render an HTML page.
Both end modules have the same type sml:ReturnText but return different mime types. The rest of the script is the same in both cases, i.e. the arguments and the steps to perform the actual deletion are used independently of whether we use the web service API or the HTML call. The web services themselves are declared as subclasses of spin:Functions as shown below:


The class spedia:deleteQueryBase is an "abstract" base class of the two different services and defines the arguments with the usual SPIN function syntax:


The two non-abstract subclasses of deleteQueryBase "inherit" the argument declarations but also point to the SPARQLMotion module in the script that creates the result. For example, the web service deleteQueryHTML has the module ReturnHTML as its return module:



Once the function has been stored in the workspace, it is accessible through a REST URL call such as http://sparqlpedia.org:8080/tbl/server/tbl/servlet?action=sparqlmotion&id=deleteQuery&uri=...&password=... (with the argument values filled in). At development time, we can run the same script within TopBraid Composer ME by pointing to localhost:8083 instead. Alternatively, we can debug the script manually via the debug button in TBC's graph view. This has the benefit that we can inspect each intermediate step of the script, including the state of the triple store and the variable bindings, with a few mouse clicks.
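The split between the abstract deleteQueryBase and its two concrete subclasses, and the REST call that invokes one of them, can be mimicked in a few lines of Python. This is only an analogy: the classes below are not a SPIN or TopBraid API, the text/plain mime type of the web-service variant is a guess, and the helper rest_url is hypothetical. Only the servlet URL and its action, id, uri and password parameters come from the post.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Analogy for the SPIN declarations: an "abstract" base class declares the
# arguments, and two concrete subclasses add their return module.
# ASSUMPTIONS: these classes are not a SPIN or TopBraid API, rest_url is a
# hypothetical helper, and the text/plain mime type is a guess.
SERVLET = "http://sparqlpedia.org:8080/tbl/server/tbl/servlet"


class DeleteQueryBase:
    """'Abstract' base: declares the arguments shared by both services."""
    arguments = ("uri", "password")
    return_module = None


class DeleteQuery(DeleteQueryBase):
    """Web-service variant that returns the string OK."""
    return_module = "Return delete query result"
    mime_type = "text/plain"  # assumption


class DeleteQueryHTML(DeleteQueryBase):
    """HTML variant that ends in the Return HTML module."""
    return_module = "Return HTML"
    mime_type = "text/html"


def rest_url(service_id, function_cls, base=SERVLET, **args):
    """Build a servlet URL, refusing calls that lack a declared argument."""
    missing = [name for name in function_cls.arguments if name not in args]
    if missing:
        raise ValueError(f"missing arguments: {', '.join(missing)}")
    params = {"action": "sparqlmotion", "id": service_id, **args}
    return f"{base}?{urlencode(params)}"


# Usage with placeholder values; the URI argument is percent-encoded by urlencode.
url = rest_url("deleteQuery", DeleteQuery, uri="http://example.org/q1", password="secret")
print(url)
print(parse_qs(urlparse(url).query)["uri"])  # ['http://example.org/q1']
```

For local testing against TopBraid Composer ME on localhost:8083, pass a different base; the exact local servlet path is not given in the post, so it is not hard-coded here.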


A Closer Look at the SPARQLMotion Script

Let's walk through the example script from above. The script starts with two Argument modules, which TopBraid places automatically based on the function definition. Technically, these are the same spl:Argument instances that appear as spin:constraints on the deleteQueryBase class. Here is a screenshot of the form for the password argument:


Each argument can declare a value type, which is used for error checking and to transform the REST arguments (which are always strings) into the correct kind of RDF literals or resources. All downstream modules of the SPARQLMotion script can now access the value of the password argument as a SPARQL variable called ?password. The other argument of the service is accessible as ?uri. The next module, Check entry exists, is of type sml:AssertTrue (new in 3.0) and simply verifies that the provided URI is in fact a valid entry in the database:


When this module is executed, the specified ASK query is run. If the query returns false, the script exits with an error message. The error message is constructed from the template given as sml:text, in which {?uri} is substituted with the actual argument value. But wait, which triple store does this query run on? If you scroll up to the script's overview, you can see that the Check entry exists module also has the Connect to DB module as one of its predecessors. As usual in SPARQLMotion, the triples from the predecessors are visible to the queries downstream. As shown in the following picture, Connect to DB just opens the Sesame database:


The module above connects to the database via its base URI, so we could later replace it with some other kind of database that has the same base URI, just in case our Sesame DB ever explodes in size... OK, by now we have verified that the provided ?uri is in fact a valid instance of spedia:QueryEntry in our database. Next, let's validate the password. Another sml:AssertTrue module is used for that purpose, as shown below.



The key aspect of the Check password module is an ASK query that gets the spedia:User who submitted the query with the given ?uri and then checks whether this user has the provided ?password. It throws an error if the password in the database does not match. Once both tests have passed, the script can go on with the actual deletion. The Delete entry module is an instance of sml:PerformUpdate that runs a SPARQL update call deleting all triples that have the given ?uri as their subject. Now that the query is gone, the script forks, depending on how it was called. Assuming the script was called to return an HTML page, it continues with the right branch, and the module Return delete query result is ignored. The resulting HTML page looks like the following:



To produce this HTML page, the end module uses a template string with the basic HTML outline. Only the URI string changes each time; the rest is static. The Return HTML module looks like the following.



The HTML page itself is encoded as a template into which {?uriString} and {?footer} will be inserted at execution time. An alternative way of creating HTML pages from a template is via the Semantic JSP support in TopBraid and SPARQLMotion (not shown here).
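The {?var} substitution used by the Return HTML template (and by the sml:text error message of Check entry exists) can be sketched as follows. This is a hypothetical Python helper, not SPARQLMotion's implementation; in particular, the HTML-escaping of values and the raw parameter for pre-rendered fragments such as the footer are assumptions made for the example, and the template text is invented.

```python
import html
import re

# Sketch of {?name} template expansion.
# ASSUMPTION: SPARQLMotion's real escaping rules are not described in the post;
# here values are HTML-escaped unless their name is listed in `raw`
# (e.g. a footer that is already HTML).
PLACEHOLDER = re.compile(r"\{\?([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(template, bindings, raw=()):
    """Replace each {?name} with the value bound to ?name."""

    def substitute(match):
        name = match.group(1)
        if name not in bindings:
            raise KeyError(f"unbound variable ?{name}")
        value = str(bindings[name])
        return value if name in raw else html.escape(value)

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical template and footer; the real ones are not shown in the post.
TEMPLATE = "<html><body><p>Deleted query {?uriString}</p>{?footer}</body></html>"
page = expand(
    TEMPLATE,
    {"uriString": "http://example.org/q1", "footer": "<hr/><p>SPARQLpedia</p>"},
    raw=("footer",),
)
print(page)
```

The same helper would also cover an error message template such as "No query entry {?uri}", since both cases boil down to replacing variable placeholders with the current bindings.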

The resulting page is then sent back to the client with the text/html mime type. The footer is reused in several HTML pages in our system, so we import it from a file:



One small detail completes the script: we want to display ?uri as a full string, so we insert a string conversion module before it goes into the template:



That's it! The script is now finished and ready to be used, assuming the deleteQuery.sms.n3 file has been uploaded to the TopBraid Live server's workspace.
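To summarize the control flow of deleteQuery, here is a Python sketch that runs the same steps over an in-memory set of triples: two AssertTrue-style checks, a delete of all triples with ?uri as subject, and the fork into the two end modules. It is a simplification under stated assumptions: the property names spedia:submittedBy and spedia:password, the error texts, the HTML body and the text/plain mime type are hypothetical, and the real script uses SPARQL ASK queries and an sml:PerformUpdate against Sesame instead of Python set operations.

```python
# Sketch of the deleteQuery pipeline over an in-memory set of triples.
# ASSUMPTIONS: the property names spedia:submittedBy and spedia:password, the
# error texts, the HTML text and the text/plain mime type are hypothetical; the
# real script runs SPARQL ASK queries and a SPARQL update against Sesame.
RDF_TYPE = "rdf:type"
QUERY_ENTRY = "spedia:QueryEntry"
SUBMITTED_BY = "spedia:submittedBy"  # hypothetical
PASSWORD = "spedia:password"  # hypothetical


class AssertionFailed(Exception):
    """Raised when an sml:AssertTrue-style check evaluates to false."""


def assert_true(ask_result, message):
    # Like sml:AssertTrue: a false ASK result stops the script with an error.
    if not ask_result:
        raise AssertionFailed(message)


def delete_query(db, uri, password, as_html=False):
    """Run the checks, delete the entry in place and return (mime_type, body)."""
    # "Check entry exists": is ?uri an spedia:QueryEntry?
    assert_true((uri, RDF_TYPE, QUERY_ENTRY) in db, f"No query entry {uri}")
    # "Check password": does the submitting user have the given password?
    submitters = {o for s, p, o in db if s == uri and p == SUBMITTED_BY}
    assert_true(any((u, PASSWORD, password) in db for u in submitters),
                "Wrong password")
    # "Delete entry": remove every triple that has ?uri as its subject.
    db.difference_update({t for t in db if t[0] == uri})
    # The script forks here; pick the end module for the calling mode.
    if as_html:
        return "text/html", f"<p>Deleted query {uri}</p>"
    return "text/plain", "OK"


# Usage with hypothetical data.
db = {
    ("ex:q1", RDF_TYPE, QUERY_ENTRY),
    ("ex:q1", SUBMITTED_BY, "ex:alice"),
    ("ex:alice", PASSWORD, "secret"),
}
print(delete_query(db, "ex:q1", "secret"))  # ('text/plain', 'OK')
print(len(db))  # 1: only the user's own triple is left
```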

The other services are also implemented using SPARQLMotion scripts:
  • submitQuery takes a query string, comment, user name and password as arguments and uses SPARQL INSERT queries to insert those as QueryEntries into the database.
  • findQueries takes a namespace, resource URIs or a user name as arguments and then runs a SPARQL CONSTRUCT query to create an RDF response. This response can optionally be rendered into an HTML table. For best performance, the SPARQL query string is assembled dynamically from various clause templates, depending on the input.
  • renderQuery takes a query URI and creates a pretty HTML page from it.
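The idea behind findQueries, adding only the clauses for the arguments that were actually supplied, can be illustrated with a small query builder. This is a hypothetical Python sketch: the prefix URI, the properties spedia:submittedBy, spedia:name and spedia:mentions, and the clause texts are invented for the example, and namespace filtering is omitted. A real implementation would look into the SPIN representation of each stored query rather than a single spedia:mentions property.

```python
# Sketch of building a SPARQL CONSTRUCT query from clause templates, so that
# only the clauses for the supplied arguments end up in the query.
# ASSUMPTIONS: the prefix URI, the properties spedia:submittedBy, spedia:name
# and spedia:mentions, and the clause texts are hypothetical.
PREFIXES = "PREFIX spedia: <http://sparqlpedia.org/spedia#>\n"  # hypothetical URI


def literal(text):
    """Quote a Python string as a SPARQL string literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def iri(uri):
    """Wrap a URI in angle brackets, rejecting characters not allowed in an IRI reference."""
    if any(c in '<>"{}|^`\\' or ord(c) <= 0x20 for c in uri):
        raise ValueError(f"not a usable IRI: {uri!r}")
    return f"<{uri}>"


def build_find_query(user=None, resources=()):
    """Assemble a CONSTRUCT query from the clauses that apply to the given input."""
    patterns = ["?entry a spedia:QueryEntry ."]
    if user is not None:
        patterns.append(
            f"?entry spedia:submittedBy ?user . ?user spedia:name {literal(user)} ."
        )
    for resource in resources:
        patterns.append(f"?entry spedia:mentions {iri(resource)} .")
    patterns.append("?entry ?p ?o .")  # copy all triples of each matching entry
    body = "\n  ".join(patterns)
    return f"{PREFIXES}CONSTRUCT {{ ?entry ?p ?o }}\nWHERE {{\n  {body}\n}}"


print(build_find_query(user="alice", resources=["http://xmlns.com/foaf/0.1/name"]))
```

Because unused clauses are never emitted, the query engine does not have to evaluate patterns for arguments that were left empty, which is one plausible reading of the performance remark above.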

Summary

This example shows how the TopBraid Live platform and its SPARQLMotion support can be used to implement scalable public web services and HTML-based internet applications. SPARQLMotion can be used to define almost arbitrary REST-based web services. Deployment of those services together with the ontologies and triple stores they operate on is fairly simple. The scripts and the ontologies can be defined and tested using TopBraid Composer ME.

Please be aware that the approach shown here covers just one aspect of the TopBraid Live platform. Another approach for developing user interfaces is via TopBraid Ensemble. Version 3.0 (coming soon) is a complete framework for building rich Flex-based interfaces from configurable components. More on this some other day...
