WWW2004 Semantic Web Roundup
by Paul Ford
May 26, 2004
According to Tim Berners-Lee's WWW2004 keynote
address, the Semantic Web is entering "phase
II", a time of "less constraint" when Semantic Web developers are
encouraged to build upon the foundations of RDF and OWL to create working
applications on both the server and the desktop. And while other topics were
discussed at WWW2004, such as mixed
markup and XForms,
this was definitely the Semantic Web's moment in the sun, with academic and
corporate presentations alike focusing on the uses of RDF, triple stores, and
data sharing.
The Semantic Web focus was not without its critics. Elliotte Rusty Harold
posted the following to his site after listening to one of the many Semantic
Web-related presentations at the conference:
I feel like I'm a mechanical engineer in 1904 listening to a bunch
of other engineers talks about airplanes, but nobody's willing to show
me how they actually expect to get their flying machines into the air.
Maybe they can do it, but I won't believe it until I see a plane in the
air, and even then I really want to take the machine apart before I
believe it isn't a disguised hot air balloon. A lot of what I'm hearing
this morning sounds like it could float a few balloons.
Both Berners-Lee and Harold are asking the same question from different
vantages: where are the applications? There is a framework, not yet fully
proven, for a massively distributed, world-wide database, glued together by
ontologies -- and now what?
If the answer to "what can I do with the WWW?" was Mosaic 1.0, the question
"what can I do with the Semantic Web?" has no corresponding killer app. Indeed,
Berners-Lee asked the assembled group to forget about killer apps entirely; as
reported last week, he said that the proof of the Semantic Web will come when
new connections are made and new links between information emerge.
That said, there is a great deal of work going on within corporations and
academic research groups, each of them trying to answer the question in its own
way. Some are crafting better back-end storage and querying methods; others are
attempting to give the end user a better experience. Throughout WWW2004's
Semantic Web track, managed by Eric
Miller, the W3C's Semantic Web Activity Lead, the conversation
shifted from theory to practice as betas and demonstrations of working products
were shown.
The Server-Side Semantic Web
At the bottom of the Semantic Web there must be a means to store RDF, and
for some time the leading storage framework has been Jena, from HP Labs.
Many of the Semantic Web projects discussed during the conference used Jena as
a backing store, but one contender to Jena's throne is Kowari, from Tucana Technologies.
Written in Java 1.4 to take advantage of native I/O support, Kowari was
created from the ground up as a database for triples. This contrasts with Jena's
reliance on back-end database engines (e.g., MySQL) for persistence. Kowari is
available in three flavors: as a component of the Tucana Knowledge Server, an
enterprise product focused on metadata analysis and knowledge discovery; as an
open-sourced (Mozilla Public License 1.1) server with a long list of features,
including SOAP bindings and "Descriptors" for transforming data using XSLT; and
as a "Lite" version derived from the full version, which jettisons some features
to allow for a smaller (11 MB) download size.
Rather than competing directly with Jena, Kowari includes support for the
Jena API, as well as JRDF, an
alternative RDF-management API, and supplements these APIs with a new SQL-like
query language, iTQL, that can be used via a built-in interactive shell.
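To make the idea of a database built for triples more concrete, here is a minimal Python sketch of an in-memory triple store with wildcard pattern matching. It is not the Kowari, Jena, or JRDF API, and it says nothing about iTQL syntax; the class `TripleStore`, its methods `add` and `match`, and the `ex:` identifiers are all hypothetical names chosen for illustration.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory store (hypothetical; not the Kowari or Jena API)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record one (subject, predicate, object) statement."""
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for s, p, o in sorted(self._triples):
            if subject is not None and s != subject:
                continue
            if predicate is not None and p != predicate:
                continue
            if obj is not None and o != obj:
                continue
            yield (s, p, o)


# Invented example data using made-up "ex:" identifiers.
store = TripleStore()
store.add("ex:kowari", "ex:writtenIn", "Java")
store.add("ex:kowari", "ex:license", "MPL 1.1")
store.add("ex:jena", "ex:writtenIn", "Java")

# Which subjects have the statement (?, ex:writtenIn, "Java")?
java_projects = [s for s, _, _ in store.match(predicate="ex:writtenIn", obj="Java")]
print(java_projects)
```

A production store would index the triples rather than scan them linearly, but the pattern-with-wildcards query shape is the part this sketch is meant to show.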
Moving up from storage, the SIMILE (Semantic Interoperability of Metadata and Information in unLike
Environments) project, jointly developed by the W3C,
HP, and MIT, is focused on collecting and publishing Semantic Web data to the
(non-Semantic) Web. SIMILE has two major components, both open-sourced: Longwell and Knowle, which
work together to provide a user-friendly Web-based front-end to RDF. The SIMILE
framework has been deployed in several projects, and a publicly available
demo of the tools in operation allows users to traverse and
compare W3C Technical Reports.
A more general-purpose framework for building Semantic Web applications,
KAON (The KArlsruhe ONtology), is a tool suite and application server for
developers seeking to build any sort of ontology-driven application. To that
end, it provides a front-end to ontology development, called OI-Modeler, along
with a number of APIs for querying ontologies and managing RDF, and
code for generating a Web-based portal for exploring data managed by KAON.
The Client-Side Semantic Web
Sometimes the simplest software can do the most, and this was the case with
Ralph Swick's talk on using the Zakim IRC bot as a support system for
teleconferences. The bot, which is integrated with
a teleconferencing system, serves as a kind of automated secretary and group
notepad for conference calls, informing users of who is on the phone,
generating meeting minutes, and recording action items.
A more complex but equally promising application for the Semantic Web is Bibster, a peer-to-peer
framework for managing and sharing bibliographic data. Bibster allows users to
seek out book and article references from an arbitrary number of P2P hosts, and
also seeks to consolidate messy data to avoid redundant listings. In addition
to allowing literature searches on a number of fields (e.g., title, abstract,
author), it can search for references using the ACM's topic hierarchy (a
taxonomy of topics specific to computer science) and other taxonomies, and
allows users to browse through that hierarchy, then search for references which
cover a selected topic.
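The topic-based search described above can be sketched in a few lines of Python. This is only an illustration of searching references through a topic hierarchy, not Bibster's implementation: the topic names are invented placeholders rather than actual ACM categories, the references are made up, and the choice to include references filed under subtopics of the selected topic is an assumption.

```python
from typing import Dict, List, Optional, Set

# Hypothetical taxonomy: each topic maps to its parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Computing": None,
    "Information Systems": "Computing",
    "Databases": "Information Systems",
    "Software": "Computing",
}

# Invented references, each tagged with one or more topics.
REFERENCES: Dict[str, Set[str]] = {
    "Paper A": {"Databases"},
    "Paper B": {"Information Systems"},
    "Paper C": {"Software"},
}


def is_under(topic: str, ancestor: str) -> bool:
    """True if topic equals ancestor or sits below it in the hierarchy."""
    current: Optional[str] = topic
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def search_by_topic(selected: str) -> List[str]:
    """Return titles tagged with the selected topic or any of its subtopics."""
    return sorted(
        title
        for title, topics in REFERENCES.items()
        if any(is_under(topic, selected) for topic in topics)
    )


results = search_by_topic("Information Systems")
print(results)
```

Selecting a broader topic widens the result set, which is what makes browsing the hierarchy before searching useful.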
Offering a different view of the Semantic Web was
SWOOP (Semantic Web Ontology Overview and Perusal), from the University of Maryland Mindswap Lab.
SWOOP is an ontology browser that uses a Web-browser metaphor to allow users to
download OWL ontologies and edit them. While it is certainly not a tool that
will bring the Semantic Web to everyone, its interface offers a marked
improvement over older, more complex ontology management tools like Protégé.
Perhaps the most impressive client-side application built on a Semantic Web
framework is Haystack, a "universal
information client" that seeks to link together different kinds of user data
(emails, addresses, Web bookmarks, etc.) with a consistent interface. Emerging
from the Eclipse framework, Haystack uses
the concept of the "collection" as an organizing principle. A collection might
be a list of bookmarks or a set of email messages, which can then be displayed
in different "views" or through different "lenses." Haystack is, ultimately, a
tool for managing collections of these collections, all interlinked.
Like Eclipse, Haystack appears to be almost infinitely extensible, and one
presentation by Dennis Quan of IBM showed it being applied to problems in
bioinformatics, using LSID
URNs instead of URIs, creating a unified view of multiple bioinformatics
databases.
While Haystack shows much promise, it is also a large and slow application,
written in Java -- a download of more than 40 MB, with 512 MB of memory
required for use. Caveat downloader.
Java: the Semantic Web Language of Choice
Many of the Semantic Web applications demonstrated during the conference are
written in Java, as is much of the publicly available code for working with the
Semantic Web. While this may alienate some developers,
it also demonstrates a commitment on the part of the presenters to create
re-usable code, and this approach has paid off in tools like Kowari, which
simply grafts the Jena API on top of its triple store, allowing existing Jena
users to migrate with a minimum of pain. It may also indicate the desire of
many developers to see the Semantic Web take root in the enterprise, where Java
is an acceptable development tool.
One Java tool which was repeatedly mentioned at WWW2004 was Lucene, an
API for full-text search. A conference presentation by Doug Cutting, who
founded the Lucene project, described how Lucene has found its way into dozens
of projects, including Nutch, an open-source
search engine that hopes to one day compete with Google. At first, Lucene's
relationship to the Semantic Web may seem unclear -- after all, the Semantic
Web is about resource discovery by analyzing triples, not full-text search.
However, along with URIs, literal values make up a good portion of RDF, and
Lucene offers an easily embeddable means to provide for search within those
literal values. Most notably, Lucene is integrated into Kowari, where it allows
for combinations of graph-based querying and old-fashioned keyword lookup.
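As a rough illustration of combining keyword lookup with graph-based querying, the Python sketch below builds a tiny inverted index over literal values and intersects its hits with a triple-pattern match. It does not use Lucene or Kowari and does not reflect how either is implemented; the functions `build_keyword_index` and `search`, along with all of the `ex:` data, are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]

# Triples whose object is a literal string (invented example data).
LITERAL_TRIPLES: List[Triple] = [
    ("ex:doc1", "ex:title", "Storing RDF triples at scale"),
    ("ex:doc2", "ex:title", "Full-text search for beginners"),
    ("ex:doc3", "ex:title", "Search over RDF literals"),
]

# Triples whose object is another resource (invented example data).
LINK_TRIPLES: List[Triple] = [
    ("ex:doc1", "ex:author", "ex:alice"),
    ("ex:doc2", "ex:author", "ex:bob"),
    ("ex:doc3", "ex:author", "ex:alice"),
]


def build_keyword_index(triples: Sequence[Triple]) -> Dict[str, Set[str]]:
    """Map each lowercased word in a literal to the subjects that carry it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for subject, _, text in triples:
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(subject)
    return index


def search(index: Dict[str, Set[str]], keyword: str, predicate: str, obj: str) -> List[str]:
    """Subjects whose literals contain keyword AND that match (?, predicate, obj)."""
    keyword_hits = index.get(keyword.lower(), set())
    graph_hits = {s for s, p, o in LINK_TRIPLES if p == predicate and o == obj}
    return sorted(keyword_hits & graph_hits)


index = build_keyword_index(LITERAL_TRIPLES)

# Documents by ex:alice whose title mentions "search".
results = search(index, "search", "ex:author", "ex:alice")
print(results)
```

A real full-text engine adds stemming, ranking, and phrase queries; the point here is only that literal values can be indexed by word while the rest of the graph is still queried by pattern.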
While Java's success on the server is difficult to dispute, on the desktop
the lead is not as clear. A tool like Haystack, while elegant in screenshots,
appears to stagger under its own weight when run on even a fairly
powerful laptop (one attendee called it a "Shrek" -- sweet, but a monster).
While SWOOP is a lighter-weight application, as is Bibster, both have simple
GUIs, and don't provide the eye candy or visualization options that feature in
Haystack and that end-users have come to expect. For the Semantic Web to
succeed on the desktop, it may need to leave Java behind. One promising
approach might be to focus energies on .NET/Mono implementations; alternatively,
developers could consider using Mozilla's XUL, particularly given that
Mozilla already stores application data in RDF -- "triples all the way
down."
Summing Up the Semantic Web
Returning to Elliotte Rusty Harold's quote regarding engineers and
airplanes, while the Semantic Web applications shown at WWW2004 are not
equivalent to large commercial jetliners, several applications seem to be
self-propelled, running on more than hot air. But it is also clear that many
are still waiting for a "conversion experience" regarding the Semantic Web.
At WWW2004, it seemed as if a gauntlet was thrown down, both by Semantic Web
boosters like Berners-Lee and critics like Harold. Both are waiting for
applications to emerge, for working code. Given the attention paid to the
Semantic Web at the conference, and given that the W3C has invested a large
portion of its influence and resources to promote the Semantic Web to the Web
community at large, it is clear that the RDF/OWL framework must continue
to gain momentum and find its way into the hearts and minds of developers
before long, so that it can avoid the fate of other well-considered and useful
W3C specifications -- whither art thou, XInclude, XLink, and XPointer?
However, it does seem as if the Semantic Web has left its childhood and
entered its adolescence, venturing out from the sheltering roof of the W3C and
showing up in such dangerous places as Bristol, Maryland, and Brisbane. As with
any adolescent, it is difficult to know exactly what sort of adult it will
become: a set of interlinked desktop tools? A component on the server side? A
tool for scientists, or one for publishing TV schedules? While the destiny of
the Semantic Web is impossible to predict, one thing was made clear at WWW2004:
the next 12 months will be the Semantic Web's chance to stand up and prove
itself, if it is going to do so.