
XML and JavaScript in the Browser
by John E. Simpson
March 26, 2003
It's been a slow month on the O'Reilly Network XML
Forum: it's not easy to write a column answering questions when
none have been asked. So, it's time to dig into the archives, pulling
out -- this time -- a couple of queries about processing XML.
Q: Can I process my XML with JavaScript?
For client-side work, is XML scriptable with JavaScript? In other
words, can you get something analogous to DHTML with it?
A: On the face of it, this question may seem almost meaningless or,
at least, unnecessary. After all, options for programming XML already
abound besides JavaScript: ASP, Perl, VBScript, Java,
Python...
A couple of considerations make this not such a silly question
after all. First, the questioner specified client-side
processing. This eliminates back-end programming languages such as
Perl, ASP, and Python. Second, JavaScript has a two-pronged advantage
over the languages remaining: it's a simple, cross-platform
solution. If you could actually get it working, any browser capable of
running JavaScript (and, of course, displaying XML in the first place)
could be made to handle the XML exactly the same way. You could
restructure the document tree on-the-fly, for example, or add
completely new elements, attributes, and other nodes to it -- all
without requiring any proprietary languages, and all without having to
know any language more complex than, well, JavaScript.
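To give a rough idea of what restructuring a document tree on-the-fly might look like, here is a minimal sketch that models an XML tree with plain JavaScript objects. The helper names (element, append, serialize) and the sample memo vocabulary are invented for this illustration and don't come from either of the projects discussed below; in a browser with built-in XML support you would presumably work with its own DOM instead.

```javascript
// Hypothetical in-memory node model; not any particular library's API.
function element(name, attrs, children) {
  return { name: name, attrs: attrs || {}, children: children || [] };
}

// Append a child (an element object or a text string) and return the parent.
function append(parent, child) {
  parent.children.push(child);
  return parent;
}

// Escape markup-significant characters in text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn the tree back into an XML string.
function serialize(node) {
  if (typeof node === "string") return escapeXml(node);
  var attrs = Object.keys(node.attrs).map(function (key) {
    return " " + key + '="' + escapeXml(node.attrs[key]) + '"';
  }).join("");
  if (node.children.length === 0) return "<" + node.name + attrs + "/>";
  return "<" + node.name + attrs + ">" +
    node.children.map(serialize).join("") +
    "</" + node.name + ">";
}

// Start with <memo><to>Ann</to><from>Bob</from></memo>
var memo = element("memo", {}, [
  element("to", {}, ["Ann"]),
  element("from", {}, ["Bob"])
]);

memo.attrs.priority = "high";                  // add a new attribute
append(memo, element("body", {}, ["Lunch?"])); // add a new element
memo.children.reverse();                       // restructure: reorder the children

console.log(serialize(memo));
// prints <memo priority="high"><body>Lunch?</body><from>Bob</from><to>Ann</to></memo>
```

Nothing here depends on a browser, which is the point: once the XML is an ordinary data structure, changing it is ordinary scripting.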
In researching this question, I found numerous possible solutions
to the problem. Two of them might interest you: the SourceForge XML for <SCRIPT>
project and Cyril Jandia's ESPX/TinyXSL.
XML for <SCRIPT>
Here's what the XML for <SCRIPT> site has to say about
it:
XML for <SCRIPT> is a simple, non-validating XML DOM
and SAX parser written in JavaScript. It was designed to help web
application designers implement cross platform, client side
manipulation of XML data. XML for <SCRIPT> is licensed under the
terms of the GNU Lesser General Public License (LGPL).
The benefits of this architecture are many.
- Server side code intermixed with HTML code can be reduced to
almost nothing.
- Client side code is simplified by having all form
initialization in one place.
- Applications are now free to maintain their own data,
reducing annoying round-trips to the server.
- Server side processing is simplified by having all relevant
form data be submitted in XML.
In effect, XML for <SCRIPT> allows n-tier client side
application development to become a reality.
ESPX/TinyXSL
ESPX, in Jandia's words, is "an ECMAScript Parser for (almost) XML,
with namespaces". The "almost"
refers to the fact that ESPX doesn't support DTDs (either internal or
external subset), let alone XML Schema. This may or may not be a fatal
limitation; for instance, if you need to recognize ID-type
attributes, as such, or to use declared entity references, you're out
of luck. On the other hand, ESPX does support quite a
few of HTML 4.0's built-in entity references. As Jandia's summary
implies, it also fully supports the W3C's Namespaces in XML
Recommendation.
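To make the namespace point concrete: a namespace-aware parser has to map each prefixed name to a namespace URI, using whichever xmlns declarations are in scope. The following is only a sketch of that lookup, not ESPX's actual code; the resolveName function and the array-of-scopes representation are assumptions made for illustration.

```javascript
// scopes: array of { prefix: uri } maps, outermost element first.
// A default namespace is stored under the empty-string prefix.
function resolveName(qname, scopes, isAttribute) {
  var colon = qname.indexOf(":");
  var prefix = colon === -1 ? "" : qname.slice(0, colon);
  var local = colon === -1 ? qname : qname.slice(colon + 1);

  // Unprefixed attributes are in no namespace; the default applies to elements only.
  if (prefix === "" && isAttribute) return { uri: null, local: local };

  // Search from the innermost scope outward so that nearer declarations win.
  for (var i = scopes.length - 1; i >= 0; i--) {
    if (Object.prototype.hasOwnProperty.call(scopes[i], prefix)) {
      return { uri: scopes[i][prefix] || null, local: local };
    }
  }
  if (prefix === "") return { uri: null, local: local }; // no default declared
  throw new Error("Undeclared namespace prefix: " + prefix);
}

var scopes = [
  { fo: "http://www.w3.org/1999/XSL/Format" }, // declared on the root element
  { "": "http://www.w3.org/1999/xhtml" }       // default declared on a descendant
];

console.log(resolveName("fo:block", scopes, false)); // element in the XSL-FO namespace
console.log(resolveName("p", scopes, false));        // element in the default namespace
console.log(resolveName("class", scopes, true));     // attribute in no namespace
```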
Importantly for cross-platform applications, ESPX has been tested
on the three main browsers (Microsoft Internet Explorer,
Netscape/Mozilla, and Opera) not only at their current levels -- which
(to varying degrees) already "know" XML -- but also in "down-level"
versions "without built-in XML support".
As for TinyXSL, Jandia has almost nothing to say except that it's
an "XML transform in-Script mini-Language" which sits atop an ESPX
framework. Essentially, it's something like an XSLT processor written
in JavaScript (or rather ECMAScript; Jandia insists that one of his
goals with both projects was standards compliance). Stylesheets for use with
TinyXSL look quite a bit like plain old XSLT stylesheets, with a
TinyXSL namespace in place of the standard one for XSLT
transformations.
Neither of the above two projects has seen any really recent
updates. XML for <SCRIPT> was last updated about a year ago. The
current ESPX/TinyXSL version is date-stamped March of 2001.
Caveats
Remember, whether you select either XML for <SCRIPT> or
ESPX/TinyXSL -- or probably any other JavaScript-based alternative --
there's nothing inherent in XML which makes it particularly
"programmable". There's no such thing as a built-in
script element, for one obvious example; even if a
particular vocabulary does include such, what it means depends
entirely on the vocabulary's purpose. (For instance, vocabularies
intended for use in marking up dramatic works and in handwriting
analysis might both include a script element. It probably
would be used in neither case to hold programming instructions,
though.)
Related to that first caveat, another implication of using
JavaScript (as opposed to many other languages) to process XML is that
it's meant for use in a web browser/server. While XML for
<SCRIPT> includes a mini-database sample application, the
database in question is retained on a web server which receives form
input from the browser (and makes heavy use of cookies to persist the
data until it's ready to be processed on the back end). If you want to
write some kind of general-purpose XML application which will run
(cross-platform or not) in some context other than the Web, you'll
need to consider some language other than JavaScript.
Q: What are the processing steps an XSL-FO engine follows?
I have read the XSL-FO
specification. There they have said that XSL-FO formatting
includes three steps:
- Objectifying
- Refinement
- Generating area tree
I am unable to understand these things clearly. Can you please
explain them, with an example?
A: Congratulations on having read the XSL-FO Recommendation. Just
embarking on that task had to require an act of almost unimaginable
willpower! There's no real mystery to the three concepts you've
singled out for your question. Let's look at them one at a time.
Objectifying
This is similar to what a DOM-based XML parser does: it converts a
stream of XML data into an in-memory tree. Specifically, it
constructs what's called a formatting object tree -- essentially a
hierarchy of boxes or containers within which the document's actual
content appears.
For instance, the skeleton of a simple XSL-FO document might look
something like this:
<fo:root [attributes]>
  <fo:layout-master-set [attributes]>
    <fo:simple-page-master [attributes]>
      <fo:region-body [attributes]>...</fo:region-body>
      <fo:region-before [attributes]>...</fo:region-before>
      <fo:region-after [attributes]>...</fo:region-after>
      <fo:region-start [attributes]>...</fo:region-start>
      <fo:region-end [attributes]>...</fo:region-end>
    </fo:simple-page-master>
    <fo:page-sequence-master [attributes]>...</fo:page-sequence-master>
  </fo:layout-master-set>
  <fo:page-sequence [attributes]>
    <fo:title [attributes]>...</fo:title>
    <fo:static-content [attributes]>...</fo:static-content>
    <fo:flow [attributes]>...</fo:flow>
  </fo:page-sequence>
</fo:root>
To objectify this stream of XML, the formatter converts it to a
tree of objects -- of formatting objects, as shown below:
fo:root
  fo:layout-master-set
    fo:simple-page-master
      fo:region-body
      fo:region-before
      fo:region-after
      fo:region-start
      fo:region-end
    fo:page-sequence-master
  fo:page-sequence
    fo:title
    fo:static-content
    fo:flow
Note that at this point, all that exists is a rough in-memory
metaphor (as it were) for how the final document will appear.
(The various elements' attributes and text content are also
included in this tree, each with its corresponding formatting object,
although not shown above.)
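The objectifying step can be sketched in a few lines of JavaScript. This is a deliberately naive illustration, not how any real formatter works: the objectify function is hypothetical, and its regular-expression tokenizer ignores the XML declaration, comments, CDATA sections, entity references, and attribute values in single quotes.

```javascript
// Build a tree of { name, attributes, children } objects from an XML string.
function objectify(xml) {
  var token = /<(\/?)([^\s\/>]+)([^>]*?)(\/?)>|([^<]+)/g;
  var attribute = /([^\s=]+)\s*=\s*"([^"]*)"/g;
  var root = { name: "#document", attributes: {}, children: [] };
  var stack = [root];
  var match;

  while ((match = token.exec(xml)) !== null) {
    var current = stack[stack.length - 1];
    if (match[5] !== undefined) {
      // Character data: keep it unless it is whitespace-only.
      if (match[5].trim() !== "") current.children.push(match[5].trim());
    } else if (match[1] === "/") {
      // End tag: it must close the element that is currently open.
      if (stack.length === 1 || current.name !== match[2]) {
        throw new Error("Unexpected end tag: " + match[2]);
      }
      stack.pop();
    } else {
      // Start tag or empty-element tag: create the object and attach it.
      var node = { name: match[2], attributes: {}, children: [] };
      var attr;
      attribute.lastIndex = 0;
      while ((attr = attribute.exec(match[3])) !== null) {
        node.attributes[attr[1]] = attr[2];
      }
      current.children.push(node);
      if (match[4] !== "/") stack.push(node);
    }
  }
  if (stack.length !== 1) {
    throw new Error("Unclosed element: " + stack[stack.length - 1].name);
  }
  return root.children[0];
}

var tree = objectify(
  '<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">' +
  '  <fo:layout-master-set>' +
  '    <fo:simple-page-master master-name="page">' +
  '      <fo:region-body/>' +
  '    </fo:simple-page-master>' +
  '  </fo:layout-master-set>' +
  '  <fo:page-sequence master-reference="page">' +
  '    <fo:flow flow-name="xsl-region-body">' +
  '      <fo:block>Hello</fo:block>' +
  '    </fo:flow>' +
  '  </fo:page-sequence>' +
  '</fo:root>'
);

// Print the element names, indented by depth.
function outline(node, depth) {
  if (typeof node === "string") return;
  console.log(new Array(depth + 1).join("  ") + node.name);
  node.children.forEach(function (child) { outline(child, depth + 1); });
}
outline(tree, 0);
```

The outline it prints has the same shape as the tree shown above, just for a smaller document.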
Refinement
The idea behind the refinement step is that when the final document
is produced, each formatting object (FO) will have traits which
instruct the rendering agent exactly how and where to display that
FO. For instance, a block of text in a top margin (which corresponds
to the fo:region-before FO) might be rendered in a
particular font face, centered horizontally between the margins. These
traits are often specified explicitly in the attributes for a given
FO's corresponding element, and direct mapping of attributes to traits
is one part of refinement.
(Aside: the XSL-FO Recommendation uses the terms "trait" and
"property" more or less interchangeably. Perhaps there's some
distinction between the terms in the spec's authors' minds, but for
all practical purposes you can consider them synonymous.)
Traits can be implied as well as expressed explicitly, however. For
example, many traits are inherited by lower-level FOs from their
higher-level ancestors. Some traits must be calculated based on
evaluating expressions. And some traits (such as a simple
border trait) are shorthand expressions of various
specific traits (such as border-top and
border-style). Deriving traits from these indirect
sources is another (very important) facet of the refinement step.
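Here is a toy sketch of refinement that covers the sources of traits just described: direct mapping, inheritance, shorthand expansion, and calculation. It rests on heavy simplifying assumptions which are mine, not the Recommendation's: only three properties inherit, border is the only shorthand and takes a single value for all four sides, and font sizes are given in points or percentages. The refine function and its data structures are hypothetical.

```javascript
// Simplified assumptions: only these properties inherit, and "border" is the
// only shorthand, taking a single value that applies to all four sides.
var INHERITED = ["font-family", "font-size", "color"];
var SIDES = ["top", "right", "bottom", "left"];

// Compute the traits for one FO and, recursively, for its descendants.
function refine(fo, parentTraits) {
  var traits = {};
  var attrs = fo.attributes || {};

  // 1. Inheritance: start from the inheritable traits of the parent.
  INHERITED.forEach(function (name) {
    if (parentTraits[name] !== undefined) traits[name] = parentTraits[name];
  });

  // 2. Shorthand expansion: "border" sets each side. It is expanded first so
  //    that a more specific attribute such as border-top can override it.
  if (attrs.border !== undefined) {
    SIDES.forEach(function (side) { traits["border-" + side] = attrs.border; });
  }

  // 3. Direct mapping: every other explicit attribute becomes a trait.
  Object.keys(attrs).forEach(function (name) {
    if (name !== "border") traits[name] = attrs[name];
  });

  // 4. Calculation: resolve a percentage font-size against the parent's value.
  var percent = /^(\d+(?:\.\d+)?)%$/.exec(traits["font-size"] || "");
  if (percent && parentTraits["font-size"] !== undefined) {
    var base = parseFloat(parentTraits["font-size"]); // e.g. "10pt" -> 10
    traits["font-size"] = (base * parseFloat(percent[1]) / 100) + "pt";
  }

  return {
    name: fo.name,
    traits: traits,
    children: (fo.children || []).map(function (child) {
      return refine(child, traits);
    })
  };
}

var refined = refine({
  name: "fo:block",
  attributes: {
    "font-family": "serif",
    "font-size": "10pt",
    border: "1pt solid black",
    "border-top": "none"
  },
  children: [
    { name: "fo:block", attributes: { "font-size": "120%" } }
  ]
}, {});

console.log(JSON.stringify(refined, null, 2));
```

In this example the inner block ends up with the outer block's font family and a calculated font size of 12pt, but no border traits at all, because in this sketch borders are not inherited.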
Generating the area tree
The final step in XSL-FO processing is the one which produces the
result you're really after when using XSL-FO in the first place: it
assigns a geometric area on each printed page for each block of
content, according to the specifications laid out in the fully-refined
tree of FOs. It moves the abstract, metaphoric expression of the
document's appearance to something which is actually usable by the
target medium, be it printed page, computer monitor, WAP-enabled cell
phone, or whatever.
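As a final toy sketch, area generation for the simplest imaginable case might look like the following: blocks of already-known height are stacked down the page, and a block that doesn't fit starts a new page. The generateAreas function, the page description, and the block heights are all assumptions for illustration; a real formatter also has to break lines, honor keeps and breaks, and fill the other regions of the page.

```javascript
// Assign each block a page number and a rectangle (in points) on that page.
// Blocks are stacked top to bottom; a block that does not fit starts a new page.
function generateAreas(blocks, page) {
  var areas = [];
  var pageNumber = 1;
  var y = page.marginTop; // distance from the top edge of the current page
  var bottom = page.height - page.marginBottom;

  blocks.forEach(function (block) {
    // Move to a fresh page unless we are already at the top of one.
    if (y + block.height > bottom && y > page.marginTop) {
      pageNumber += 1;
      y = page.marginTop;
    }
    areas.push({
      id: block.id,
      page: pageNumber,
      x: page.marginLeft,
      y: y,
      width: page.width - page.marginLeft - page.marginRight,
      height: block.height
    });
    y += block.height;
  });
  return areas;
}

// US Letter page (612pt x 792pt) with one-inch (72pt) margins.
var letter = {
  width: 612, height: 792,
  marginTop: 72, marginBottom: 72, marginLeft: 72, marginRight: 72
};

var areas = generateAreas([
  { id: "title", height: 48 },
  { id: "intro", height: 300 },
  { id: "table", height: 400 },
  { id: "notes", height: 100 }
], letter);

areas.forEach(function (a) {
  console.log(a.id + ": page " + a.page + ", y=" + a.y + ", height=" + a.height);
});
```

With these numbers the title and intro land on page 1, while the table, which would overrun the bottom margin, is pushed to the top of page 2 with the notes beneath it.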
If you're interested in learning more about XSL-FO -- a big but (I
think) important topic -- I encourage you to consult full-length
treatments such as Dave Pawson's XSL-FO or my own
Just XSL. (Note that the latter includes full coverage of
XSLT as well as XSL-FO.)