Binary Waltz, Play On
by Robin Berjon
January 28, 2004
Over two months have now passed since the W3C workshop on Binary
Interchange. While the binary XML debate rages on in the
XML community, the cautious observer will note that the discussion
is shifting from a polarised and rather sterile fight to a
slightly quieter conversation in which both sides of the
fence try to understand the issues raised by the other.
We are still a good distance away from a peaceful chat over tea, but
this shift towards communication is a welcome one.
This debate touches on interesting issues that will not simply go
away and that need to be addressed, characterized, and laid out
cleanly. Also, like many others, I have grown tired of the endless
bickering of the "binary XML" permathread and would like us to give
ourselves a chance to get around a table and put it to rest once and
for all.
There Be Dragons
Binary representation of XML is a problem space with its own
healthily varied ecosystem of solutions. Some of these differ wildly,
some are very similar; some are standardized by various organizations,
some entirely proprietary; some are generic, and some very much
ad hoc. But one thing that all the participants in the workshop agreed
on was that while one is free to do whatever one wants with XML, the
one and only place that could foster cross-domain and widely
applicable recommendations for XML (with the notable exception of
Relax NG) is the W3C. This raised the point that not only does the W3C
have no specification in this space -- which some consider a good
thing and others a bad one -- but, more importantly, it has formed no
official opinion on the matter. The Technical Architecture Group (TAG)
has opened the binaryXML-30 issue, but it hasn't moved in a while. With
Architecture of the World Wide Web in Last Call, they have other cats
to skin.
The question of whether an authoritative body has formed
an opinion on a given topic may seem, if not pointless, at least
abstract. If a problem is solved by a given technology it is solved no
matter what is said of it, and if it isn't then one must show that it
can be solved before trying to standardize a solution. But looking
more closely at the problem at hand, I can see two scenarios in which a
careless approach, either standardizing a solution immediately or
ignoring the issue entirely, would anger me and, I am sure, many of my
readers.
In the first scenario, the present situation of many competing
solutions is maintained, with not even a document to guide choices
regarding binary interchange solutions. Since there is thought to be a
genuine need for binary XML, a few solutions -- proprietary or
standardized by a consortium that does not care for royalty-free
technologies -- take over the market. I am a web content developer,
and I want to make my content and services available to the 2 billion
mobile terminals or to the many millions of Web-enabled digital
televisions out there. All of these devices operate under tight
technical constraints and use binary formats for SVG, XHTML, web
services, or Semantic Web agents. Yet since no consensus solution
exists, the many combinations of manufacturers, vendors, operators,
and technologies use different binary interchange formats. There is
no doubt that the XML nebula of technologies, with its insistence on
separating structure from presentation, has simplified multi-channel
publishing; but each channel still requires work to adapt content to
it, and that work should be simplified where possible, especially if
the content developer has to pay for access to some, if not all, of
the channels she wants to publish to. Clearly a single royalty-free
format is more desirable than this situation.
In the second scenario, the W3C or perhaps another sufficiently
respected entity produces a widely accepted binary interchange
format. But this has happened too fast, with no heed paid to the
benefits that having a single universal format has brought us. Again
I find myself reading about a service I wish to interact with which,
according to its documentation, answers queries formulated in XML.
Yet some developer or manager there has decided that since there are
two formats, one of which requires less processing power than the
other, there is no reason to support the less "efficient" one. I fire
up my text editor and generic HTTP client only to find that the
solution is still a few steps removed from what I was expecting. What
a number of people have dubbed "a threat to the XML brand" is first
and foremost a threat to universality, in that we shift from a single
format that everyone can use to a choice between two.
In communication technologies, choice is only good when you're the
one making it. Of course, there are always solutions involving
negotiation or discovery, but these complicate the situation and are
not always applicable. In the presence of a standard XML Binary
Interchange format, strong rules are required to preserve
interoperability. At the very least, endpoints that only accept the
binary format must have very good reasons for doing so, which is to
say that they must not use the binary format as an optimization of
XML, but because it is their only option. All others would have to
support XML, supporting the binary format only as an optional
optimization.
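To make that rule concrete, here is a minimal sketch in Python of how an HTTP endpoint might honor it through content negotiation: XML is always available, and the binary form is returned only when the client explicitly ranks it above XML and the endpoint happens to support it. The media type application/x-binary-xml and the function names are hypothetical, chosen for illustration only, and the Accept parsing is deliberately simplified (it ignores partial wildcards such as application/* and HTTP's specificity rules).

```python
# Hypothetical media type for a binary XML encoding (an assumption for
# illustration; the article names no such type).
BINARY_XML = "application/x-binary-xml"
TEXT_XML = "application/xml"


def parse_accept(header: str) -> dict[str, float]:
    """Parse a simplified HTTP Accept header into {media_type: q}."""
    prefs: dict[str, float] = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media] = q
    return prefs


def choose_representation(accept: str, binary_available: bool = True) -> str:
    """Pick a response format, treating binary purely as an optimization.

    XML is always the fallback; binary is chosen only when the endpoint
    supports it and the client ranks it strictly above XML.
    """
    prefs = parse_accept(accept) if accept else {}
    xml_q = max(prefs.get(TEXT_XML, 0.0), prefs.get("*/*", 0.0))
    bin_q = prefs.get(BINARY_XML, 0.0)
    if binary_available and bin_q > 0 and bin_q > xml_q:
        return BINARY_XML
    return TEXT_XML
```

Ties go to XML, which keeps the binary form an optimization rather than a requirement. Even this small amount of negotiation is extra machinery that every endpoint would have to get right.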
Based on my experience as a developer,
either of those scenarios is certain to cause me unhappiness,
likely to make me angry, and may possibly leave me frothing at the
mouth. Having had to deal with character encoding problems in CSV
interchange and binary generation issues in SWF/Flash publishing, to
take just two examples, I would like to avoid encountering similar
problems ever again. XML goes a long way in addressing issues in
both of these situations, but not all of them, nor all of those
encountered by the constituencies that want to use it. And so we are
left with a choice between two dragons. Either the W3C decides not
to define a binary interchange format and we have to deal with the
dragon of balkanization and ad hoc formats, or it does and we have
to struggle with the dragon of inappropriate usage in the face of
multiple options. The one thing that is certain is that as a
community of users of W3C technology, we can't afford to simply drop
the ball. We have to make a choice, and to make it we need to think
it through thoroughly. We have to figure out which dragon is best
for us to wrestle and which of them we are best equipped to handle,
and from there get to work seriously on earning the kill.
One Format To Rule Them All
We already have a Format To Rule Many Of Them, and it's called
XML. The problem is that, as most of the several dozen workshop
participants expressed, that format is so desirable that everyone
wants to use it (or at least what they consider to be "it"). Yet
there are a number of situations in which it is impractical, or even
impossible, to use. Personally I would much prefer to live in a
text-only world, but unfortunately experience tells me that it is not
an option for all of us.
It should not come as a surprise that having put a diverse crowd of
happy people in a big room for three days, the workshop diligently
produced an endless list of requirements. Looking more closely at that
list, however, it becomes apparent that there is considerable overlap
among many of them, and that they could be consolidated, something
the workshop decided not to do on the spot.
Having made a first rough pass at consolidating it (and at spotting
requirements that would be better solved in another layer), the
remaining list is still rather long. A cursory glance shows that
some requirements may, if not clash, at least cause some friction
were a format to be defined to address them. The question, naturally,
is whether a solution generic enough to cover such a large set of
requirements can be found at all, even if there is agreement that
producing a standard one is the right thing to do. The workshop
participants were very clear that they would much prefer a generic,
standard format that is somewhat less optimal for their needs to
something perfectly adapted but entirely ad hoc. That is to say, some
friction between requirements is not a problem, but we need to find
out just how much friction is tolerable, and how much would make the
format unusable to too large a part of the community.
Again, this is not a question that can be addressed with a little
benchmarking sprinkled here and there with flamebaiting or with
marketing speak. It requires some level of agreement on how binary
formats can be evaluated, as well as discussion between interested
parties on how much optimality they are willing to sacrifice to
obtain generality.
Let's Meet Again
As you can see, there are quite a few reasons to pursue work in
common on this topic, whether a format is eventually defined or not.
In a nutshell, I believe this is why the workshop reached
consensus on the idea that "the W3C should do further work in this
area". At least, that's my reading of it.
Given the strong opinions that some of its members expressed, it is
clear to me that the XML community at large has to be part of the
debate. I think that the best thing to do is for all interested
parties to prepare arguments to defend their positions in a way that
encourages progress. We've had much name calling and fear mongering
in this discussion already; it's about time things became civilized.
Much interesting talk awaits us; let's get to it.