A First Look at the Kowari Triplestore
by Paul Ford
June 23, 2004
Kowari is an open-sourced (Mozilla Public License) triplestore optimized for
RDF storage, created by Tucana Technologies,
and written entirely in Java 1.4.2. It began its life as the storage component
of the Tucana Knowledge Server (TKS), Tucana's proprietary knowledge management
suite, and remains under active development by Tucana.
Installation
Kowari is named for a small, mouselike Australian mammal, but given that the
full version of the software is a 40+ meg download, and includes a host of open-sourced
Java components (including Apache's SOAP implementation, the Jetty web server, and
the Lucene search engine), a better name might be "platypus". In fairness, a
"Lite" version of the software is also available, at about 14 megs, which includes
two *.jar files, one to run the server, and the other to run a console.
This simplicity of installation and operation is quite welcome. Most of the
available open-sourced triplestores currently require either compilation, or
the installation of a relational database like PostgreSQL for persistence, or
are reliant on a host programming language like Perl or Python. In contrast,
Kowari's installation is a snap (if your machine has Java 1.4 installed): download,
unpack, and run. On launch, Kowari sets up a web server, on port 8080 (the port
number can be configured), which contains a number of useful resources.
A key component in Kowari's bag is a simple console app that allows for
direct interaction with the server using Tucana's own SQL-like query language,
iTQL. While most applications will end up calling the database via an external
program, this easy install allows you to quickly get a feel for the product,
and provides an easy way to perform common DBA-like tasks.
A Demo iTQL Session
Below, we'll use that console interface to create a database, populate it with
an RDF file that describes United States senators, and query that data. A sample
chunk of our RDF:
<USSenator rdf:about="http://xml.com/example/LiebermanJoseph">
<Name>Lieberman, Joseph</Name>
<Party>Democrat</Party>
<State>CT</State>
<URI>http://lieberman.senate.gov</URI>
</USSenator>
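To make the data model concrete, the sketch below uses Python's standard xml.etree.ElementTree module to flatten that sample into subject/predicate/object triples. The enclosing rdf:RDF element and the default namespace http://xml.com/example/ are assumptions, since the excerpt above omits its namespace declarations, and the hypothetical to_triples helper handles only the flat shape shown here rather than RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The wrapper element and namespace declarations are assumed; the article
# shows only the inner USSenator element.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://xml.com/example/">
  <USSenator rdf:about="http://xml.com/example/LiebermanJoseph">
    <Name>Lieberman, Joseph</Name>
    <Party>Democrat</Party>
    <State>CT</State>
    <URI>http://lieberman.senate.gov</URI>
  </USSenator>
</rdf:RDF>
"""


def to_triples(rdf_xml):
    """Flatten typed nodes with literal-valued properties into triples."""
    triples = []
    for node in ET.fromstring(rdf_xml):
        subject = node.get("{%s}about" % RDF_NS)
        # ElementTree spells qualified names as "{namespace}local";
        # dropping the braces yields the full URI.
        node_type = node.tag.replace("{", "").replace("}", "")
        triples.append((subject, RDF_NS + "type", node_type))
        for prop in node:
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, prop.text))
    return triples


for triple in to_triples(SAMPLE):
    print(triple)
```

Each child element becomes one statement about the senator's URI, which is the form the queries below operate on.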
First, we create a database on localhost (127.0.0.1) named "Senators". Kowari
uses Java RMI URIs to identify databases.
iTQL> create <rmi://127.0.0.1/server1#Senators>;
Our next command will load senators.rdf into that just-created database.
iTQL> load <file:///C:/rdf/senators.rdf>
into <rmi://127.0.0.1/server1#Senators>;
Kowari allows for aliases to be declared and used in a way akin to namespaces.
iTQL> alias <http://xml.com/example/> as ex;
iTQL> alias <http://tucana.org/tucana#> as kowari;
That first alias allows us to abbreviate the namespace of our senatorial RDF
in all further queries. The second alias is a convenience abbreviation for the
"is" equivalency operator built into Kowari, which we'll use below.
Now that we're initiated, populated, and aliased, we can query the triplestore.
The query below selects all senators and their party affiliations.
iTQL> select $subj $obj
from <rmi://127.0.0.1/server1#Senators>
where $subj <ex:Party> $obj;
Here's what's happening: the "where" clause in the select statement defines constraints on the triplestore. In the example above, our "where" clause asks for all triples that have a predicate equal to ex:Party (which is an alias for
http://xml.com/example/Party).
The output of the query above is a list of the 100 URIs making up the Senate,
and their party affiliations:
[ http://xml.com/example/AkakaDaniel, "Democrat" ]
[ http://xml.com/example/BaucusMax, "Democrat" ]
[ http://xml.com/example/BayhEvan, "Democrat" ]
[ http://xml.com/example/BidenJoseph, "Democrat" ]
...
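One way to picture a single constraint is as a pattern laid over every stored triple, with the variables acting as wildcards. The Python sketch below is a toy model of that idea over an in-memory list, not a description of how Kowari indexes or evaluates anything. The match helper is hypothetical, and the data is limited to the four rows shown above plus one State triple from the earlier sample.

```python
EX = "http://xml.com/example/"

# Toy data: the party triples visible in the output above.
TRIPLES = [
    (EX + "AkakaDaniel", EX + "Party", "Democrat"),
    (EX + "BaucusMax", EX + "Party", "Democrat"),
    (EX + "BayhEvan", EX + "Party", "Democrat"),
    (EX + "BidenJoseph", EX + "Party", "Democrat"),
    # A different predicate, to show that the constraint filters it out.
    (EX + "LiebermanJoseph", EX + "State", "CT"),
]


def match(triples, subj=None, pred=None, obj=None):
    """Return triples that fit the pattern; None plays the role of a variable."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subj in (None, s) and pred in (None, p) and obj in (None, o)
    ]


# Equivalent in spirit to: where $subj <ex:Party> $obj
for s, _, o in match(TRIPLES, pred=EX + "Party"):
    print([s, o])
```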
What if we only want to list Democrats? Using Kowari's built-in equivalency
operator, <kowari:is> (aliased above), we can match string literal values.
iTQL> select $subj $obj
from <rmi://127.0.0.1/server1#Senators>
where $subj <ex:Party> $obj
and ($obj <kowari:is> 'Democrat');
Now we'll use more than one constraint in the where clause, and return more
columns in our results. The query below names the different kinds of subjects
and objects we expect, in order to allow us to list the name, web address (URI),
and party affiliations for the senators from Connecticut (CT).
iTQL> select $name $uri $party
from <rmi://127.0.0.1/server1#Senators>
where $senator <ex:Name> $name
and $senator <ex:URI> $uri
and $senator <ex:Party> $party
and $senator <ex:State> $state
and $state <kowari:is> 'CT'
order by $name;
Our output:
[ "Dodd, Christopher", "http://dodd.senate.gov", "Democrat" ]
[ "Lieberman, Joseph", "http://lieberman.senate.gov", "Democrat" ]
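Each constraint that reuses $senator behaves like a join: a row survives only if one subject satisfies every pattern at once. The Python sketch below models that with a small backtracking matcher. It is a toy under stated assumptions, not Kowari's query engine: the solve and senator helpers are hypothetical, the subject URI for Dodd is assumed to follow the same naming pattern as Lieberman's, and the "Doe, Jane" record is made up so that the state filter has something to exclude.

```python
EX = "http://xml.com/example/"


def senator(local_name, name, party, state, uri):
    """Build the four triples for one senator, mirroring the sample RDF."""
    subject = EX + local_name
    return [
        (subject, EX + "Name", name),
        (subject, EX + "Party", party),
        (subject, EX + "State", state),
        (subject, EX + "URI", uri),
    ]


TRIPLES = (
    senator("LiebermanJoseph", "Lieberman, Joseph", "Democrat", "CT",
            "http://lieberman.senate.gov")
    # Subject URI assumed to follow the same naming pattern.
    + senator("DoddChristopher", "Dodd, Christopher", "Democrat", "CT",
              "http://dodd.senate.gov")
    # Entirely made-up record, included so the state filter has work to do.
    + senator("DoeJane", "Doe, Jane", "Independent", "ZZ",
              "http://example.org/doe")
)


def is_var(term):
    return term.startswith("$")


def solve(triples, patterns, binding=None):
    """Yield every variable binding that satisfies all patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                # A variable must keep the value it was first bound to.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from solve(triples, rest, candidate)


patterns = [
    ("$senator", EX + "Name", "$name"),
    ("$senator", EX + "URI", "$uri"),
    ("$senator", EX + "Party", "$party"),
    # The literal stands in for the <ex:State> and <kowari:is> 'CT' pair.
    ("$senator", EX + "State", "CT"),
]

# Sorting the tuples orders rows by name first, like "order by $name".
rows = sorted((b["$name"], b["$uri"], b["$party"]) for b in solve(TRIPLES, patterns))
for row in rows:
    print(list(row))
```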
And one final example:
iTQL> create <rmi://127.0.0.1/server1#feeds>;
iTQL> load <http://www.oreillynet.com/meerkat/?_fl=rss10&t=ALL&c=47>
into <rmi://127.0.0.1/server1#feeds>;
iTQL> select $uri $title
from <rmi://127.0.0.1/server1#feeds>
where $uri <http://purl.org/rss/1.0/title> $title;
The code above creates a database called "feeds", populates it with the most
recent site summary XML from O'Reilly/XML.com, and then, in response to a query,
lists the URI and title of each article: the bare bones of a queryable RSS aggregator in a few lines of iTQL.
As shown above, iTQL's syntax looks quite a bit like SQL and is clearly intended
to make transitioning to Kowari as simple as possible for DBAs. XML hackers
used to the brevity of XPath might be less accepting, however.
The iTQL console is one of several interfaces to the server. Access methods
also exist for JSP and SOAP, along with a JDBC driver, an iTQL JavaBean, and
Kowari's own low-level driver interface.
Other Features
Three other features worth noting are Lucene full-text integration, descriptors,
and named graphs.
Lucene full-text integration. RDF is not simply triples made up of URIs;
in practice, much RDF (as in the examples above) contains string literal or
XML data. Kowari can use the open-sourced Lucene search engine to index this
text.
To use Lucene indexing, the DBA creates a separate database using the Lucene
"model". Queries can then be constrained by the results returned from a Lucene
search. In practice, this allows for searches that keep track of the source
of a given token within a graph. In simple English, Lucene integration allows
queries like: "select all articles where the title includes the words 'hacking'
and 'library'," or "show me the publication dates of all books that contain
the word 'Texas'." Lucene allows for basic keyword lookups as well as complex
queries, including fuzzy matching and wildcards, and its presence in the database
provides Kowari users with an appealing combination of Semantic Web-style, graph-based
querying with old-school text lookups.
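The shape of such a query can be sketched without Lucene itself. The Python below builds a toy inverted index over literal values, remembering which subject and predicate each token came from, and then uses it to constrain a lookup. It illustrates the idea only: the article data is made up, the helper names are hypothetical, and nothing here reflects Kowari's actual Lucene model or its query syntax.

```python
import re
from collections import defaultdict

DC_TITLE = "http://purl.org/dc/elements/1.1/title"

# Made-up triples for illustration.
TRIPLES = [
    ("http://example.org/a1", DC_TITLE, "Hacking the Library"),
    ("http://example.org/a2", DC_TITLE, "A Library of Congress Tour"),
    ("http://example.org/a3", DC_TITLE, "Hacking RSS"),
]


def build_index(triples):
    """Map each lowercase token to the (subject, predicate) pairs it came from."""
    index = defaultdict(set)
    for subj, pred, literal in triples:
        for token in re.findall(r"\w+", literal.lower()):
            index[token].add((subj, pred))
    return index


def search(index, predicate, words):
    """Return subjects whose `predicate` literal contains every word."""
    hits = None
    for word in words:
        found = {s for (s, p) in index.get(word.lower(), set()) if p == predicate}
        hits = found if hits is None else hits & found
    return sorted(hits or [])


index = build_index(TRIPLES)
# "select all articles where the title includes 'hacking' and 'library'"
print(search(index, DC_TITLE, ["hacking", "library"]))
```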
Descriptors. Descriptors bind iTQL commands to XSLT variables. Using
descriptors, a developer can create an XSLT template and then populate it, dynamically,
with values fetched from an iTQL query. This feature will be of particular
interest to web developers who want to create custom, navigable web interfaces
above large RDF stores, along with anyone who wants to convert RDF data into
legacy XML formats. (Descriptors are not included in Kowari Lite.)
Named graphs. One problem that frequently comes up in the RDF community
is the "provenance" problem -- how do we know, in a large triplestore, where a
given triple comes from? Many have suggested named graphs as a solution,
which will turn triples into "quads". Kowari has taken this path. According
to Tom Adams, "Our triplestore is really a quad store, the 4th tuple being the
group/model that a triple belongs to."
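As a rough illustration of what the fourth element buys you, the Python sketch below stores statements as four-part tuples and answers the provenance question by reading the model back. The Quad type and both helpers are hypothetical, the feed statement is made up, and this is a model of the concept rather than of Kowari's storage layout.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "subject predicate obj model")

EX = "http://xml.com/example/"
SENATORS = "rmi://127.0.0.1/server1#Senators"
FEEDS = "rmi://127.0.0.1/server1#feeds"

QUADS = [
    Quad(EX + "LiebermanJoseph", EX + "Party", "Democrat", SENATORS),
    Quad(EX + "LiebermanJoseph", EX + "State", "CT", SENATORS),
    # Made-up statement placed in a different model.
    Quad("http://example.org/article1", "http://purl.org/rss/1.0/title",
         "Example title", FEEDS),
]


def provenance(quads, subject, predicate, obj):
    """Return the models that contain the given triple."""
    return [
        q.model
        for q in quads
        if (q.subject, q.predicate, q.obj) == (subject, predicate, obj)
    ]


def in_model(quads, model):
    """Return the plain triples that belong to one model."""
    return [tuple(q[:3]) for q in quads if q.model == model]


print(provenance(QUADS, EX + "LiebermanJoseph", EX + "State", "CT"))
print(in_model(QUADS, FEEDS))
```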
Benchmarks
Kowari is written in Java 1.4.2 and uses that version's New I/O (NIO). This
provides for a decrease in access times, as Kowari is able to bypass the need
for a storage layer (such as BerkeleyDB or MySQL), and write data blocks directly
to disk.
Tucana has tested the 32-bit version of Kowari with 10 million statements,
and the 64-bit with 50 million; TKS has been scaled up to 250 million statements
and can conceivably manage a billion triples. Currently the software is
used by a variety of clients, with applications in genomics research, defense
integration, and automobile manufacturing, and the firm reports dramatically
increased performance on graph queries over relational databases.
What's Next for Kowari?
While Kowari is capable of doing real work today, Tucana plans to continue
adding features to make the triplestore more standards compliant. Inferencing
via OWL is planned, and Tucana hopes to eventually support OWL
DL, with stops at RDFS and "OWL Tiny" along the way. Support for arbitrary data
types is also planned.
Tucana is also developing a new approach to file addressing, which they call
a "resolver". Resolvers allow any resource to be assigned a special "file://"
URI, and allow for the processing of arbitrary files as "pseudo-RDF". For instance,
a resolver that points to an MP3 file can automatically extract and store a
description of the file based on the ID3 tags embedded in the MP3; the same
could be done with JPEG files containing metadata. This approach seems particularly
interesting because it provides a simple way to absorb the "ambient data" on
a computer -- unstructured content like photo and MP3 directories -- into a database,
where it can be searched and explored.
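The general idea of treating a file as "pseudo-RDF" can be sketched in a few lines of Python. This is a guess at the concept, not Tucana's resolver design: the describe_file helper and the http://example.org/file# vocabulary are hypothetical, and because reading ID3 tags would need a third-party library, the sketch describes a file using only facts the file system already knows.

```python
import tempfile
from pathlib import Path

FILE_NS = "http://example.org/file#"  # hypothetical vocabulary


def describe_file(path):
    """Return triples describing a file, keyed by its file:// URI."""
    resolved = Path(path).resolve()
    info = resolved.stat()
    subject = resolved.as_uri()
    return [
        (subject, FILE_NS + "name", resolved.name),
        (subject, FILE_NS + "extension", resolved.suffix),
        (subject, FILE_NS + "sizeInBytes", str(info.st_size)),
    ]


# Demonstrate on a throwaway file rather than a real MP3.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "track.mp3"
    sample.write_bytes(b"not really audio")
    demo_triples = describe_file(sample)

for triple in demo_triples:
    print(triple)
```

Run over a whole directory, a function like this would yield statements that could be loaded and queried alongside ordinary RDF.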
Kowari Caveats and Conclusions
Kowari is a solid tool created by an enthusiastic, knowledgeable team. That
said, it's not for everyone: the architecture of the application is clearly
focused on the server, and developers looking for an embeddable RDF store for
desktop apps will likely want to look elsewhere, unless they are willing to
add several megs to their applications. Kowari's dependence on Java is another
possible sticking point for those developing tools using other frameworks. Documentation
is brief and unfinished, but what's there is useful for the adventurous.
Perhaps the most important caveat, however, is that Kowari lacks a security
model. Tucana clearly expects security-minded customers to look into TKS, which
provides full network-based authentication as part of its package. But DBAs
looking to replicate the user/privileges model of MySQL or other databases may
be disappointed by Kowari.
These minor issues aside, Kowari works as promised. SQL users should find it
easy to migrate their skills to iTQL. Most commendably, in the open source tradition,
the database has been designed to "play nice with others," allowing anyone who
has invested their energies in building, for instance, a Jena solution to migrate
to Kowari with minimum pain. Kowari should be a welcome addition to any Semantic
Web developer's toolbox.