
Trees, Temporarily
by Bob DuCharme
December 03, 2003
XSLT 1.0 has a special data type called Result
Tree Fragments. For example, an xsl:variable element can
store a single string, but it can also store an XML element with all
the descendants and attributes you like. This structure is a Result
Tree Fragment. (I try to avoid using the acronym because of my many
unpleasant memories of writing awk, Perl and Omnimark scripts to read
or write Rich Text Format files). There's little you can do with
result tree fragments in XSLT 1.0; you can treat them as strings and
you can use xsl:copy-of to copy them to the result tree, and
that's it. Because many XSLT developers longed for a way to pass
composite structures to named templates, and then use the pieces of
those structures individually inside the named template, instead of
merely copying the structure to the result tree or pulling substrings
out of it, several XSLT 1.0 processors offer extension functions such
as Xalan's nodeset()
and Saxon's
node-set() that convert these fragments to node sets whose nodes
can be addressed with XPath expressions.
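As a rough sketch of what such an extension function buys you in XSLT 1.0, the following stylesheet uses the EXSLT version of the function, exsl:node-set(), rather than the Xalan or Saxon functions named above. It assumes a processor that supports the EXSLT common module; the colors variable and the second element are made up for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                exclude-result-prefixes="exsl"
                version="1.0">

  <!-- In XSLT 1.0 this variable holds a result tree fragment. -->
  <xsl:variable name="colors">
    <color>red</color>
    <color>green</color>
    <color>blue</color>
  </xsl:variable>

  <xsl:template match="/">
    <!-- $colors/color is an error in XSLT 1.0; exsl:node-set()
         (an assumption: the processor must support EXSLT) converts
         the fragment to a node set that XPath can step into. -->
    <second>
      <xsl:value-of select="exsl:node-set($colors)/color[2]"/>
    </second>
  </xsl:template>

</xsl:stylesheet>
```

The only XPath step that matters here is the one after the function call: without the conversion, an XSLT 1.0 processor has no way to reach the second color element on its own.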
XSLT 2.0
eliminates result tree fragments and replaces them with a more
powerful feature: temporary
trees. Once you create a temporary tree in
an xsl:variable, xsl:param,
or xsl:with-param element, you can do anything with it that
you can do with a source tree.
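Here is a minimal sketch of what that means in practice, assuming an XSLT 2.0 processor. The formats variable, the formatReport, chosen, and item elements, and the "list" mode are all hypothetical names invented for this example.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <!-- Hypothetical lookup table stored as a temporary tree. -->
  <xsl:variable name="formats">
    <format code="h">HTML</format>
    <format code="p">PDF</format>
    <format code="t">plain text</format>
  </xsl:variable>

  <xsl:template match="/">
    <formatReport count="{count($formats/format)}">
      <!-- Ordinary XPath steps and predicates work on the variable. -->
      <chosen><xsl:value-of select="$formats/format[@code = 'p']"/></chosen>
      <!-- So does xsl:apply-templates, just as with source tree nodes. -->
      <xsl:apply-templates select="$formats/format" mode="list"/>
    </formatReport>
  </xsl:template>

  <xsl:template match="format" mode="list">
    <item><xsl:value-of select="."/></item>
  </xsl:template>

</xsl:stylesheet>
```

No extension function is involved: the variable reference is simply the first step of each XPath expression.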
Passing Temporary Trees Around
Our first example shows how, after passing a variable containing a
temporary tree to a named template or function, you can do all sorts
of things with it that you couldn't do with a result tree fragment
passed to an XSLT 1.0 named template. (As usual, for now, the only
XSLT processor that implements enough of XSLT 2.0 to try this is Saxon 7.) The stylesheet in
the example just copies the source tree to the result tree after
adding a header comment with metadata about the stylesheet that
created the result. This could be done much more simply, and work just
fine using XSLT 1.0, if the stylesheet stored the metadata in a named
template and called that template to output the metadata, so don't
model any production code on what you see below — it's only doing
this with several XSLT 2.0 techniques in order to demonstrate how
those techniques work. (In fact, don't model any production code on
any XSLT 2.0 stylesheets you see before it becomes a Recommendation,
which hasn't happened yet.)
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="http://www.snee.com/ns/whatever"
version="2.0">
<xsl:variable name="genData">
<ssheetMetadata>
<filename>prepdata.xsl</filename>
<author>BD</author>
<releaseHist>
<version date="2003-10-12T14:52" fileSize="1543">1.2</version>
<version date="2003-09-11T10:12" fileSize="1322">1.1</version>
<version date="2003-07-24T08:03" fileSize="1134">1.0</version>
</releaseHist>
</ssheetMetadata>
</xsl:variable>
<xsl:template match="/">
<xsl:call-template name="outputMetadata">
<xsl:with-param name="revData" select="$genData"/>
</xsl:call-template>
<xsl:apply-templates/>
</xsl:template>
<xsl:template name="outputMetadata">
<xsl:param name="revData"/>
<xsl:comment>
File name: <xsl:value-of select="$revData/ssheetMetadata/filename"/>
Revision History:
<xsl:for-each select="$revData/ssheetMetadata/releaseHist/version">
release <xsl:value-of select="."/>
<xsl:text> </xsl:text><xsl:value-of select="@date"/>
</xsl:for-each>
average file size: <xsl:value-of select="my:avgFileSize($revData)"/>
</xsl:comment>
</xsl:template>
<xsl:function name="my:avgFileSize">
<xsl:param name="fileData"/>
<xsl:value-of
select="avg($fileData/ssheetMetadata/releaseHist/version/@fileSize)"/>
</xsl:function>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When the first template rule finds the root of the source tree, it
calls the outputMetadata named template, passing it a
reference to the genData variable for the named
template's revData parameter. After doing so, the first
template rule's xsl:apply-templates instruction tells the
XSLT processor to apply any relevant template rules to the root's
children. Because the only other template rule in the stylesheet with
a match attribute is the last one, which just copies
everything, the source tree will be copied to the result tree after
the output of the outputMetadata named template.
The declaration for the genData variable at the start of
the stylesheet holds an ssheetMetadata element, which has
several child and grandchild
elements. The outputMetadata named template that
has genData passed to it by the first template rule
uses xsl:value-of instructions to pull various information
out of genData to add to the result tree inside of a
comment. With the stylesheet shown above, the comment comes out like
this:
<!--
File name: prepdata.xsl
Revision History:
release 1.2 2003-10-12T14:52
release 1.1 2003-09-11T10:12
release 1.0 2003-07-24T08:03
average file size: 1333-->
The first thing it pulls out is the contents of
the ssheetMetadata element's filename child. The
XPath expression in the xsl:value-of
element's select attribute has a reference to the named
template's revData parameter as its first step. The mere
existence of more XPath location steps after this one is great
news — an XSLT 2.0 processor's ability to treat a passed parameter
as a tree, and reach down and grab a specific node of the tree using an
XPath expression, adds lots of new possibilities to what we can do in
named templates.
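To give an idea of those possibilities, here is a sketch that adds predicates to the steps after the parameter reference. It assumes an XSLT 2.0 processor and uses a trimmed copy of the metadata tree; the summarize template and the summary, latest, and largeReleases elements are hypothetical names, not part of the stylesheet above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <!-- Trimmed copy of the article's metadata tree. -->
  <xsl:variable name="genData">
    <ssheetMetadata>
      <releaseHist>
        <version date="2003-10-12T14:52" fileSize="1543">1.2</version>
        <version date="2003-09-11T10:12" fileSize="1322">1.1</version>
        <version date="2003-07-24T08:03" fileSize="1134">1.0</version>
      </releaseHist>
    </ssheetMetadata>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:call-template name="summarize">
      <xsl:with-param name="revData" select="$genData"/>
    </xsl:call-template>
  </xsl:template>

  <!-- summarize is a hypothetical template name. -->
  <xsl:template name="summarize">
    <xsl:param name="revData"/>
    <xsl:variable name="versions"
                  select="$revData/ssheetMetadata/releaseHist/version"/>
    <summary>
      <!-- Positional predicate: the first version element listed. -->
      <latest date="{$versions[1]/@date}">
        <xsl:value-of select="$versions[1]"/>
      </latest>
      <!-- Value predicate: versions whose file size exceeds 1200. -->
      <largeReleases>
        <xsl:value-of select="count($versions[@fileSize &gt; 1200])"/>
      </largeReleases>
    </summary>
  </xsl:template>

</xsl:stylesheet>
```

With the three version elements shown, the first predicate picks release 1.2 and the second matches the two releases whose fileSize values are 1543 and 1322.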
The "Revision History" part of the example demonstrates how we can
do more than just grab a single node from the temporary
tree. An xsl:for-each loop cycles through
the version elements in the passed subtree, again specifying
a reference to the template's revData parameter as the first
step in the XPath expression that identifies the node set to loop
through. For each version element that it finds, it adds the
word "release", the contents of the version element, a single
space, and the contents of the version
element's date attribute to the result tree, which creates
the three "release" lines in the output shown above.
The last thing that outputMetadata adds to the
comment sent to the result tree is the label "average file size:" and
the value returned by the my:avgFileSize() function defined
below it in the stylesheet. The ability to define and call functions
right in the stylesheet is another significant new feature in XSLT 2.0
(see
September's column for more on this); the ability to pass
arbitrary trees of information as parameters to these functions is not
only an improvement over XSLT 1.0, but also an improvement over most
other programming languages that let you define and call your own
functions.
The body of the my:avgFileSize() function definition
illustrates how we can pass temporary trees to built-in functions as
well. The new XPath (and XQuery) 2.0 avg()
function computes the average of the numeric values in the node set
passed to it. The stylesheet's my:avgFileSize() function passes
it the set of fileSize attributes of the version
grandchildren of the ssheetMetadata root element of the
temporary tree passed as a parameter.
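The same node set can be handed to the other XPath 2.0 aggregate functions. The sketch below assumes an XSLT 2.0 processor; my:fileSizeSpread() and the fileSizeStats wrapper are hypothetical names, and the metadata tree is trimmed to the parts the functions need.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:my="http://www.snee.com/ns/whatever"
                exclude-result-prefixes="my"
                version="2.0">

  <!-- Trimmed copy of the article's metadata tree. -->
  <xsl:variable name="genData">
    <ssheetMetadata>
      <releaseHist>
        <version date="2003-10-12T14:52" fileSize="1543">1.2</version>
        <version date="2003-09-11T10:12" fileSize="1322">1.1</version>
        <version date="2003-07-24T08:03" fileSize="1134">1.0</version>
      </releaseHist>
    </ssheetMetadata>
  </xsl:variable>

  <!-- Hypothetical function: largest file size minus smallest. -->
  <xsl:function name="my:fileSizeSpread">
    <xsl:param name="fileData"/>
    <xsl:variable name="sizes"
        select="$fileData/ssheetMetadata/releaseHist/version/@fileSize"/>
    <xsl:value-of select="max($sizes) - min($sizes)"/>
  </xsl:function>

  <xsl:template match="/">
    <fileSizeStats>
      <!-- count() and sum() take the same kind of argument as avg(). -->
      <count><xsl:value-of select="count($genData//version)"/></count>
      <total><xsl:value-of select="sum($genData//version/@fileSize)"/></total>
      <spread><xsl:value-of select="my:fileSizeSpread($genData)"/></spread>
    </fileSizeStats>
  </xsl:template>

</xsl:stylesheet>
```

For the three releases listed, the file sizes add up to 1543 + 1322 + 1134 = 3999, which is where the average of 1333 in the comment comes from, and the spread is 1543 - 1134 = 409.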
Using Temporary Trees for Two-Phase Processing
By limiting certain template rules for use in processing in
specific modes, you can convert source tree nodes into a temporary
tree and then process the temporary tree again before sending anything
to the result tree.
Picture this scenario for the following stylesheet: you already
have lots of stylesheets to convert CALS-style
tables to several different formats, and the architectural policy
where you work is to convert incoming tabular data to CALS tables so
that you can take advantage of the existing code to then turn this
data into other formats. You must write a stylesheet to turn the
following document about the chart positions of various bands' singles
into HTML:
<chart>
<header>
<date>2003-11-24</date>
<approvals>
<analyst time="09:42">GF</analyst>
<editor time="10:03">DC</editor>
</approvals>
</header>
<songs>
<song>
<title>Mondegreen Daydream</title>
<band>The New Wayouts</band>
<chartPos>4</chartPos>
</song>
<song>
<title>You (and Me)</title>
<band>Dr. Bellows</band>
<chartPos>1</chartPos>
</song>
<song>
<title>Fly in My Soup</title>
<band>King Timahoe</band>
<chartPos>12</chartPos>
</song>
</songs>
</chart>
You want your new stylesheet to convert each song element
in this document to a row in a CALS table. Next it must pass that
CALS table to the template rules in the following cals2html.xsl
stylesheet, which your company's been using to convert CALS tables to
HTML tables. Note how the template rules of the cals2html.xsl file all
explicitly identify themselves as having a mode value of
"CALS2HTML":
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:template match="row" mode="CALS2HTML">
<tr><xsl:apply-templates mode="CALS2HTML"/></tr>
</xsl:template>
<xsl:template match="table" mode="CALS2HTML">
<table><xsl:apply-templates mode="CALS2HTML"/></table>
</xsl:template>
<xsl:template match="entry" mode="CALS2HTML">
<td><xsl:apply-templates/></td>
</xsl:template>
</xsl:stylesheet>
(A more complete CALS2HTML stylesheet would be much longer.) The
stylesheet below, which is based on the example in the temporary
trees section of the "last call" XSLT Working Draft, converts the
songs input to a CALS table temporary tree and then uses the imported
cals2html.xsl stylesheet to convert that tree to HTML in the final
result tree.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:import href="cals2html.xsl"/>
<xsl:variable name="intermediate">
<xsl:apply-templates select="/" mode="phase1"/>
</xsl:variable>
<xsl:template match="/">
<xsl:apply-templates select="$intermediate" mode="CALS2HTML"/>
</xsl:template>
<xsl:template match="songs" mode="phase1">
<table><xsl:apply-templates mode="phase1"/></table>
</xsl:template>
<xsl:template match="song" mode="phase1">
<row><xsl:apply-templates mode="phase1"/></row>
</xsl:template>
<xsl:template match="/" mode="phase1">
<xsl:apply-templates mode="phase1"/>
</xsl:template>
<xsl:template match="title|band|chartPos" mode="phase1">
<entry><xsl:apply-templates/></entry>
</xsl:template>
</xsl:stylesheet>
The stylesheet's first template rule gets processed when the XSLT
engine sees the root of the source tree document. It specifies that
relevant templates with a mode value of "CALS2HTML" (which happen to
be the template rules in the cals2html.xsl stylesheet named in
the xsl:import statement) should be applied. Applied to what?
Not to the root's child nodes, which would be the default; they should
be applied to the value of the intermediate variable declared
above the template rule.
By sending intermediate to the CALS2HTML template rules,
this first template rule is clearly implementing the second step of
our songs-to-CALS, CALS-to-HTML sequence. The intermediate
declaration is where the first step happens: it applies the template
rule for which mode equals "phase1" to the root of the source
tree. That template rule — the second-to-last one in the
stylesheet — calls the other "phase1" template rules in the
stylesheet to convert the songs element to a CALS table. (A
CALS table, like an HTML table, calls the main element "table" but
then calls each row "row" and each entry "entry," unlike HTML's use of
"tr" and "td" for rows and entries.)
When this conversion step finishes, the result is a temporary tree
of a CALS table created by the intermediate variable
declaration, ready for use by any template rule that uses it. We've
already seen that the first template rule uses intermediate
to pass to the template rules that convert CALS to HTML, and the
result is an HTML version of what began as a group of song
elements wrapped in a songs element.
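Because each phase just reads one tree and writes another, the pattern chains. The following sketch, which assumes an XSLT 2.0 processor and the same cals2html.xsl file, puts a sorting phase in front of the other two. The sorted variable and the choice to sort by chart position are assumptions made for illustration, not part of the Working Draft example.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:import href="cals2html.xsl"/>

  <!-- Hypothetical extra phase: copy the songs, sorted by chart position. -->
  <xsl:variable name="sorted">
    <songs>
      <xsl:for-each select="/chart/songs/song">
        <xsl:sort select="chartPos" data-type="number"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </songs>
  </xsl:variable>

  <!-- Songs to CALS, reading from the first temporary tree. -->
  <xsl:variable name="intermediate">
    <xsl:apply-templates select="$sorted/songs" mode="phase1"/>
  </xsl:variable>

  <!-- CALS to HTML, reading from the second temporary tree. -->
  <xsl:template match="/">
    <xsl:apply-templates select="$intermediate" mode="CALS2HTML"/>
  </xsl:template>

  <xsl:template match="songs" mode="phase1">
    <table><xsl:apply-templates select="song" mode="phase1"/></table>
  </xsl:template>

  <xsl:template match="song" mode="phase1">
    <row><xsl:apply-templates select="*" mode="phase1"/></row>
  </xsl:template>

  <xsl:template match="title|band|chartPos" mode="phase1">
    <entry><xsl:apply-templates/></entry>
  </xsl:template>

</xsl:stylesheet>
```

Each variable selects from the one before it, so the source tree is only read once, at the start of the chain.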
You're certainly not limited to two-phase processing here, but even
doing it with only two stages requires a degree of comfort with using
modes in XSLT. If they're not set up just right, two different
template rules that use xsl:apply-templates on the source
tree root can get you stuck in a loop. Saxon 7 broke out of the loop
easily enough with a Ctrl-C, so go ahead and play. Temporary trees in
XSLT are new enough that you could end up breaking new ground
yourself; let me know what you come up with.