
April Fool's Wisdom
by Micah Dubinko
April 13, 2005
"April is the cruellest month" -- T. S. Eliot
XML devotees are, as a general rule, thoughtful, creative, and a bit
mischievous. So when the calendar rolls around to April 1, it's a safe bet
that you'll find some interesting reading, not only across the internet but
also on the XML-Dev mailing list. This year was no exception.
I find that humor has an important place, even in otherwise serious
discussions. Arguments that might be uncomfortable, uncharacteristic,
exceedingly blunt, or previously discussed-to-death become more palatable with
a thin coating of silliness. Oftentimes, jokes are funny specifically because
they contain a kernel of truth. Well, maybe a grain of truth. Sliver? Speck?
Atom? Whatever the amount, this week's column will search it out, and
highlight the connections that lead back to more familiar topics.
Fool Me Once
The first artifice came from the fertile mind of Sean McGrath, with a message
bearing the serious title REST, SOAP, Speech Acts, and the mustUnderstand
Model of SOA Communications, with bonus points for dovetailing into the
previous discussion about REST versus SOAP. Interop is a key XML design
issue--frequently highlighted as one of the reasons for using XML in the
first place.
A recurring theme in XML development, as in life, is that common everyday
things may be generally known, but still lack an agreed-upon definition:
life, love, and Web services, to name three. Fortunately, we seem to get
plenty done without the benefit of standardized definitions. We are able to
meaningfully discuss such topics because each viewer forms his or her own
personal definition based on experience. When there's enough overlap between
these slightly different views, understanding may result. (If not, one or
both of the communicating parties are probably at fault.)
Taking an ephemeral April First message too seriously is a classic blunder,
so I'll avoid a full-scale rant. Nevertheless, "mustUnderstand" is a powerful
concept. Perhaps two disparate systems exchanging, say, a SOAP-wrapped UBL
Invoice are comparable to two individuals discussing the meaning of life or
love (or Web services). Both parties might have different ideas about what
they're actually talking about, but a sufficient amount of common ground can
result in understanding. A mustUnderstand qualifier crystallizes the
situation with guarantees about how much common ground truly is common.
So, musing about additional levels of mustUnderstand has an understandable
allure. Many of us spend a great deal of our professional time getting various
systems to "understand" each other.
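To make the real mustUnderstand rule concrete, here is a minimal Python sketch of the check a SOAP 1.1 receiver applies to header blocks. The function name, the sample message, and the urn:example:* header namespaces are hypothetical, chosen only for illustration; none of it comes from McGrath's message.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical message: one mandatory header block, one optional one.
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <t:Transaction xmlns:t="urn:example:tx" soap:mustUnderstand="1">5</t:Transaction>
    <d:Trace xmlns:d="urn:example:trace">abc</d:Trace>
  </soap:Header>
  <soap:Body/>
</soap:Envelope>"""


def unmet_must_understand(envelope_xml, understood):
    """Return the qualified names of mandatory header blocks that this
    receiver does not understand.

    `understood` is a set of names in ElementTree's "{namespace}local" form.
    """
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP_ENV}}}Header")
    if header is None:
        return []  # no headers, so nothing can be mandatory
    unmet = []
    for block in header:
        # In SOAP 1.1 the attribute value is "1" (mandatory) or "0".
        mandatory = block.get(f"{{{SOAP_ENV}}}mustUnderstand") == "1"
        if mandatory and block.tag not in understood:
            unmet.append(block.tag)
    return unmet
```

If the returned list is non-empty, a conforming SOAP 1.1 receiver rejects the message with a MustUnderstand fault instead of processing it. The optional Trace block, by contrast, can be ignored safely, which is exactly the "how much common ground is truly common" guarantee.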
Fool Me Twice
Another wile came from Microsofties Andrew Layman and Don Box, as a link to a
paper entitled XML Performance Improvements Through Interdisciplinary Factor
Assessment and Application, with bonus points for referencing the earlier
discussion about binary XML.
Performance is certainly on the minds of many XMLers, with the "off-by-one"
publication of the XML Binary Characterization Working Draft, or XBC.
Another undercurrent behind the Interdisciplinary paper is of measurement
and evidence, something that the XBC draft falls short on, with simple yes
and no columns instead of convincing data. Len Bullard comments on the binary
XML saga-in-progress: "The TAG is demanding benchmarks and test cases,
something that hasn't been demanded of disruptive technologies such as HTML,
RSS, CSS, XML, or even SOAP," and he later asks, "is the XML Binary a
disruptive technology that will change the current landscape of technology
companies?" As it turns out, in the resulting thread fear runs deep that a
not-XML specification would disrupt--in a bad sense as opposed to the "good"
Innovator's Dilemma sense--the existing web of XML specifications.
Steven DeRose provided insightful commentary on the situation. I highly
recommend reading the entire text, which I will only summarize here, using
the four main topics from his message. 1) What operations do you want to do?
Perhaps the most important question to ask. For several tasks, a binary
format may be faster. For others that involve working with the text stream,
binary formats will incur extra conversion overhead. 2) Where is the data
kept? Always something to keep in mind: disk versus RAM tradeoffs. 3) What
does "lossless" mean? In "roundtrip" scenarios, various pieces of information
may or may not get preserved. The Infoset is a commonly cited, yet
controversial, set of goalposts for what to preserve. Transport issues such
as byte-order-swapping might become significant. Finally, 4) Hybrid
solutions. Certain kinds of optimizations can be done entirely within XML,
for example, by adding in additional attributes that provide indexes into
other parts of the document. Steve concludes that "the solution space is much
wider than it may appear, and the answers are more complex. Also, that it can
be, and has been, done successfully. But except for really huge documents, I
don't think it's usually worth the effort."
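As a rough illustration of the hybrid idea in topic 4, the following Python sketch adds index attributes to an ordinary XML document. DeRose's message does not prescribe a particular scheme; the attribute names pos and end, and the document-order numbering behind them, are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def annotate(root):
    """Give every element two hypothetical index attributes:
    pos = its position in document order,
    end = the position of the last element inside its subtree.
    """
    counter = 0

    def visit(elem):
        nonlocal counter
        elem.set("pos", str(counter))
        counter += 1
        for child in elem:
            visit(child)
        # After the children are numbered, counter - 1 is the last
        # position handed out within this element's subtree.
        elem.set("end", str(counter - 1))

    visit(root)
    return root


def is_descendant(ancestor, candidate):
    """Containment test that reads only the index attributes."""
    return (int(ancestor.get("pos"))
            < int(candidate.get("pos"))
            <= int(ancestor.get("end")))
```

The annotated result is still plain XML, so every existing tool keeps working, while an index-aware reader can answer "is this element inside that one?" or "where does this subtree stop?" by comparing two numbers instead of walking the tree.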
In a curious bit of synchronicity, Bullard mentioned another perceived
benefit of a binary format: that authors "don't expose their content to
inspection." Paul Downey noted that just the day before, a technique to add
"View Source" for Flash had been posted, inspired by a Lawrence Lessig talk
at the FlashForward conference.
Won't Be Fooled Again
Besides the posting date and credible-sounding subject lines, closer
inspection reveals some additional strands connecting these two bits of
mischief with recent discussions.
Robustness: Elliotte Rusty Harold commented that part of the culture of
textual XML processing is a certain amount of redundancy and paranoia,
something shared by mustUnderstand processing.
Lossy Understanding: A primary decision point for the binary XML folks is
nailing down the amount of lossiness that the format will have considering
roundtrips to and from XML syntax. Authors might encode their intent with
various XML facilities, and so a lossy conversion might translate into a lossy
understanding.
Interop: Real and imagined levels of mustUnderstand exist to provide minimal
guarantees about interoperability, a parallel to discussions of whether a
binary XML format should be standardized, or left to individual
implementations.
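How much a roundtrip "loses" depends on which definition of sameness is applied. As a small sketch, assuming Python 3.8 or later, the standard library's Canonical XML 2.0 serializer can stand in for one such definition. The helper name same_content and the three sample documents are hypothetical.

```python
from xml.etree.ElementTree import canonicalize

# Three hypothetical serializations of "the same" document.
A = "<a y='2' x=\"1\"/>"                      # single quotes, y before x
B = '<a x="1" y="2"></a>'                     # reordered, explicit end tag
C = '<a x="1" y="2"><!-- draft --></a>'       # adds a comment


def same_content(xml_a, xml_b, keep_comments=False):
    """Compare two documents after Canonical XML 2.0 normalization.

    Canonicalization discards attribute order, quote style, and the
    empty-element shorthand. Comments are dropped unless keep_comments
    is set, so the flag changes what counts as "lossless".
    """
    return (canonicalize(xml_a, with_comments=keep_comments)
            == canonicalize(xml_b, with_comments=keep_comments))
```

Under this yardstick A and B are the same document even though their bytes differ, while C matches them only if comments are considered disposable. A binary format has to make the same kind of choice, and an author who encoded intent in something the format discards ends up with the lossy understanding described above.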
Looking ahead, what will we see next April First? Hopefully, industrious XML
community members aren't planning that far ahead. Another safe bet is that the
XML-Dev crowd will come up with something creative, funny, and reflective of
whatever issues are burning in our collective minds next spring. Until then,
we have about 11 months to brush up on our koans, limericks, haikus, Monty
Python sketches, and other creative outlets.
Births, Deaths, and Marriages
W3C Workshop on XML Schema 1.0 User Experiences
The W3C is organizing a Workshop on XML Schema 1.0 User Experiences to gather
concrete reports of user experience with XML Schema 1.0, and examine the full
range of usability, implementation, and interoperability problems around the
specification and its test suite.
When and Where: June 21-22, 2005, Oracle Conference Center, Redwood Shores, California, USA.
XML 2005 Call for Participation
Get in early; deadline is May 13.
Nux 1.1
Nux, an open source extension of the XOM and Saxon XML libraries, is available.
Documents and Data
Python processing in XML: Point, Counterpoint, Counter counterpoint.
Andrzej Jan Taramina on the real use of SOAP/WS.
Dimitre Novatchev shows how to find the deepest node via XSLT 2.0.
Michael Kay on the secret of efficient coding.