
RPV: Triples Made Plain
by Kendall Grant Clark
November 20, 2002
For as long as RDF has existed, people have been trying to fix
it. My predecessor in this spot, Leigh Dodds, wrote a column in the
summer of 2000 ("Instant
RDF") in which he discussed efforts to respond to complaints about
RDF's complexity. At that relatively early point, the two dominant
approaches to relating XML and RDF, as Dodds explained, were that RDF
should be embedded in XML documents or that RDF should be extracted
from, but not embedded in, XML documents.
In last week's column, I claimed, following conversations in the
XML development community, that RDF was good for representing "mundane
metadata", to use Bob DuCharme's phrase, and as an alternative to
RDBMS storage, that is, as a kind of unstructured or semistructured
data storage model. My goal was to route around complaints about RDF's
XML serialization by suggesting ways in which it didn't matter (not
much, anyway) what that serialization looked like, since the point was
to avoid writing it by hand or reading it, as it were, by eye.
I suggested using a programmatic triple or RDF store from a host
programming language, many of which have interesting RDF triplestore
implementations (for example, Redland has bindings for several languages). By means of a
triplestore API one makes 3-tuple assertions and combines them into
graphs, drawing on ontologies (of varying degrees of formality and
publicity) of terms and predicates, both of which are named by URIs,
and of values, which may be named by URIs or asserted as literals.
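A toy, in-memory version of such a store can be sketched in a few lines of Python. This is not Redland's API or that of any real library: the TripleStore class, its add and match methods, and the http://foo.com/ property and value URIs are hypothetical, chosen only to show the shape of the idea. Objects are stored as plain strings, so unlike a real triplestore this sketch does not distinguish URI values from literal ones.

```python
from collections import namedtuple

# A single RDF-style assertion: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])


class TripleStore:
    """Hypothetical toy triplestore: an in-memory set of triples (not Redland)."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        # Adding the same triple twice has no effect, since storage is a set.
        self.triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None in any position is a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.object == obj)
        )


store = TripleStore()
store.add("http://xml.com/", "http://foo.com/#siteType", "http://foo.com/#xml")
store.add("http://monkeyfist.com/", "http://foo.com/#siteType", "http://foo.com/#politics")
store.add("http://monkeyfist.com/", "http://foo.com/#Title", "Our Monkey, Your Fist")

for t in store.match(predicate="http://foo.com/#siteType"):
    print(t.subject, "->", t.object)
```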
In this scenario one gives up some of the constraints of SQL and
RDBMSes, but also most of their maturity, performance, and wider tool
support, in return for a considerable grant of flexibility and
extensibility. And if the XML serialization of these graphs of
triples, which might be used for exchanging graphs or simply for
on-disk storage, is terribly ugly or hard for most people to write and
read, who cares? No one is being asked to do so, except the people who
develop triplestore implementations, and they're RDF model-theory
wireheads anyway. If you're troubled by the idea that some things are
simply to be ignored by some people, think of an RDBMS like MySQL,
which is widely and successfully used by thousands of developers, most
of whom haven't the slightest idea about the technical details of,
say, ISAM table storage. They don't know, and they don't want, care,
or need to know. Perhaps RDF's XML serialization is like that?
In other words, if you don't like or understand or prefer RDF's XML
serialization, find a way to avoid dealing with it directly. Using an
RDF triplestore from a high-level language is one such way, and it
retains some, perhaps all, of the benefits of RDF's data model. So,
my argument is a more focused variant of the suggestion Shelley Powers
has been making repeatedly on XML-DEV lately: if you don't like or
understand or prefer RDF, just don't use it. This seems fair
enough.
Most recent discussion of RDF, which has bubbled over the bounds
of XML-DEV and out into the wider Web development community, has been
by turns absurd and sublime. It has ranged from foundational debates
about whether RDF is complex, and fights over how to characterize its
complexity, to awfully redundant discussions about whether its XML
serialization is really all that user-unfriendly, to meta-debates in
which various sides jockey to see which side can be described as
unfair or "politically correct" (whatever that could possibly mean in
this context) or dismissive or narrow-minded or high-handed, and on
and on.
Yet the debate has at times been productive, yielding, among other
things, Tim Bray's RPV proposal.
Resources, Properties, and Values
Bray says his RPV proposal "is an XML-based language for expressing
RDF assertions ... designed to be entirely unambiguous and highly
human-readable." That two-part design goal is worth dwelling on,
since it's emblematic of a good deal of the underlying debate over
RDF. To say that an XML language is or should be "entirely
unambiguous" and "highly human-readable" is to say that it should be
as easily digestible by machines as by humans. The tension between
those two demands runs all the way from XML to RDF.
Further, Bray suggests that RDF has failed to gain traction because
of this tension: his RPV proposal "is motivated by a belief that RDF's
problems are rooted at least in part in its syntax." He elaborates on
this point by saying, first, that RDF's XML serialization is
"scrambled and arcane," preventing people from easily reading or
writing it; second, that the XML serialization uses qualified names in
a way that's not user-friendly and is in some conflict with the TAG's
idea that significant resources be identified by URI; third, that
metadata folks don't seem to have any general problem thinking of
things in terms of RDF's 3-tuples; fourth, that some alternatives to
RDF-XML, like N3, suffer because, not being XML, they can't benefit
from the network effect of ubiquitous XML support; and, fifth, that
the idea of embedding RDF in XML languages, which in the summer of
2000 seemed a viable approach both to Leigh Dodds and to much of the
rest of the XML development community, "has failed resoundingly in the
marketplace."
To put it more plainly: RDF needs a new XML serialization because
the existing one is overly complex, and it should be possible to do
better. Bray's RPV proposal has at least one immediate virtue:
simplicity. It contains only two elements, R and PV,
for resources and property-value pairs, respectively. That means a
simple triple in RPV can be as straightforward as this:
<R r="http://xml.com/">
  <PV p="http://foo.com/#siteType" v="http://foo.com/#xml" />
</R>
The resource identified by the R element's r attribute has the
property identified by the URI in PV's p attribute, and that property
has the value identified by the URI in PV's v attribute. Since
there can be any number of PVs within an R, one can
easily add other properties to the resource by adding other
PV elements. As the object of a property can also be a
literal, RPV says that when the v attribute is missing from a
PV, the value of the property being predicated of the
resource is the content of the PV element:
<R r="http://monkeyfist.com/">
  <PV p="http://foo.com/#Title">Our Monkey, Your Fist</PV>
</R>
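The mapping from RPV to triples is mechanical enough to sketch with Python's standard xml.etree.ElementTree module. The sketch handles only the two cases shown so far (a v attribute naming a resource, or element content giving a literal), ignores namespaces as the examples above do, and wraps the fragments in a hypothetical RPV root element so the input is well-formed XML. It illustrates the idea; it is not an implementation of Bray's specification.

```python
import xml.etree.ElementTree as ET

# The two examples above, wrapped in a hypothetical <RPV> root element.
DOC = """<RPV>
  <R r="http://xml.com/">
    <PV p="http://foo.com/#siteType" v="http://foo.com/#xml" />
  </R>
  <R r="http://monkeyfist.com/">
    <PV p="http://foo.com/#Title">Our Monkey, Your Fist</PV>
  </R>
</RPV>"""


def rpv_triples(xml_text):
    """Return (subject, predicate, object, is_literal) tuples, one per PV."""
    root = ET.fromstring(xml_text)
    triples = []
    for r in root.iter("R"):
        subject = r.get("r")
        for pv in r.findall("PV"):
            predicate = pv.get("p")
            value = pv.get("v")
            if value is not None:
                # v present: the object is the resource named by that URI.
                triples.append((subject, predicate, value, False))
            else:
                # v absent: the PV element's text content is a literal value.
                triples.append((subject, predicate, pv.text or "", True))
    return triples


for triple in rpv_triples(DOC):
    print(triple)
```

Each PV element yields exactly one triple, and its subject always comes from the enclosing R, which is what keeps the triples visible on the page.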
An attributeless R means that the element itself
is (or represents) the resource being
described:
<R>
  <PV p="http://foo.com/#Type" v="http://foo.com/#Resource" />
</R>
A resource element with an id attribute, the value of
which must be unique within the XML document, can be referred to at
other points in the document:
<R r="http://monkeyfist.com/" id="r1">
  <PV p="http://foo.com/#Publisher">Monkeyfist Collective</PV>
</R>
<R r="#r1">
  <PV p="http://foo.com/#Subject">politics</PV>
</R>
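These two rules affect only how an R element's subject is determined, so a sketch can resolve subjects in a separate pass before collecting triples. Two interpretive assumptions here are mine, not Bray's: an attributeless R is treated as an anonymous (blank) node with an N-Triples-style label such as _:b1, and an r value of the form #id that matches an id in the same document is taken to name that element's resource. Only one level of such reference is followed. The RPV wrapper element and the function names are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

# The examples above, wrapped in a hypothetical <RPV> root element.
DOC = """<RPV>
  <R>
    <PV p="http://foo.com/#Type" v="http://foo.com/#Resource" />
  </R>
  <R r="http://monkeyfist.com/" id="r1">
    <PV p="http://foo.com/#Publisher">Monkeyfist Collective</PV>
  </R>
  <R r="#r1">
    <PV p="http://foo.com/#Subject">politics</PV>
  </R>
</RPV>"""


def resolve_subjects(root):
    """Map each R element to the subject URI or blank-node label it describes."""
    blank_labels = (f"_:b{n}" for n in itertools.count(1))
    subjects, by_id = {}, {}
    for r in root.iter("R"):
        ref = r.get("r")
        # Assumption: an R with no r attribute describes an anonymous (blank) node.
        subjects[r] = ref if ref is not None else next(blank_labels)
        if r.get("id") is not None:
            by_id[r.get("id")] = r
    direct = dict(subjects)  # subjects before any #id reference is followed
    for r, subject in direct.items():
        # Assumption: r="#id" names the resource of the R carrying that id.
        # Exactly one level of reference is followed; chains are not.
        if subject.startswith("#") and subject[1:] in by_id:
            subjects[r] = direct[by_id[subject[1:]]]
    return subjects


def rpv_triples(xml_text):
    """Return (subject, predicate, object) tuples in document order."""
    root = ET.fromstring(xml_text)
    subjects = resolve_subjects(root)
    triples = []
    for r in root.iter("R"):
        for pv in r.findall("PV"):
            value = pv.get("v")
            # Text content is the literal value when v is absent.
            obj = value if value is not None else (pv.text or "")
            triples.append((subjects[r], pv.get("p"), obj))
    return triples


for triple in rpv_triples(DOC):
    print(triple)
```

Run on the examples above, this gives the attributeless R a blank-node subject and attributes both the Publisher and Subject properties to http://monkeyfist.com/.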
That's about all there is to RPV (save for namespaces, which
I've omitted above, and some bits about relative URIs and
reification). RDF-RPV is clear and simple, easy to write and read;
more importantly, it makes the triples plainly visible. The murkiness
of the triples is one complaint people often make about RDF-XML.
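Going the other way is just as direct, which is part of what makes the triples visible: group triples by subject, emit one R per subject and one PV per property, and use a v attribute for resource values and element content for literals. The sketch below uses Python's standard ElementTree; the RPV wrapper element, the is_literal flag on each triple, and the to_rpv name are assumptions for illustration, and namespaces are omitted as in the examples above.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: (subject, predicate, object, is_literal) tuples.
TRIPLES = [
    ("http://monkeyfist.com/", "http://foo.com/#Title", "Our Monkey, Your Fist", True),
    ("http://xml.com/", "http://foo.com/#siteType", "http://foo.com/#xml", False),
    ("http://monkeyfist.com/", "http://foo.com/#Publisher", "Monkeyfist Collective", True),
]


def to_rpv(triples):
    """Serialize triples as RPV: one R per subject, one PV per property."""
    root = ET.Element("RPV")  # hypothetical wrapper element
    r_elements = {}  # subject -> its R element, in first-seen order
    for subject, predicate, obj, is_literal in triples:
        if subject not in r_elements:
            r_elements[subject] = ET.SubElement(root, "R", r=subject)
        pv = ET.SubElement(r_elements[subject], "PV", p=predicate)
        if is_literal:
            pv.text = obj  # literal value: element content
        else:
            pv.set("v", obj)  # resource value: v attribute
    ET.indent(root)  # pretty-print (ElementTree.indent needs Python 3.9+)
    return ET.tostring(root, encoding="unicode")


print(to_rpv(TRIPLES))
```

Grouping by subject is a choice rather than a requirement; since an R may hold any number of PV elements, one R per triple would presumably also be acceptable, just less compact.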
Whether the RDF Working Group will consider alternative syntaxes, and
whether something like RPV could possibly be adopted, remain open
questions. The value of Bray's RPV proposal is its demonstration that
an XML serialization of the RDF model does not have to be complex or
hard for humans to read.
One part of RDF that people seem to like is the clarity of its
tuples of subjects (resources), predicates (properties), and objects
(values). The 3-tuple isn't ideal for every situation, and, yes, some
people aren't interested in thinking of things in terms of graphs of
triples. For those who are, however, an XML serialization of RDF that
makes the triples obvious and plain seems to be an unambiguously good
thing.