Fast XSLT
by Steve Punte
April 02, 2003
Introduction
This article reviews the birth and development of the promising
compiled-XSLT engine, Apache XSLTC, and the fierce competition among
developers of XSLT engines to be the performance leader.
I Had A Dream
All great revolutions begin with one person, one dream, one idea, and
one vision. And such is the case with Jacek Ambroziak:
originator, visionary, and father of the Apache XSLTC project.
Exactly one year ago Jacek collected together his thoughts and personal
perspective as he departed the project in a document I refer to as the XSLTC memoir. Like
all noteworthy breakthroughs, this one was inspired by the serendipitous
convergence of three bodies of knowledge: the Java Virtual Machines,
Compiler Theory, and XML transformation -- which all came together for
Jacek one dark and stormy night back in early 2000. From there, the
project was born at Sun
Microsystems' XML Technology Center. And life was good, for a
while.
To Compile or Not To Compile
An ancient question, stretching back to nearly the dawn of computer
science, has been, to compile or interpret? Procedural languages have
long explored this issue especially with the C and C++ language compilers
that have dominated the last few decades. Java is both compiled and
interpreted, depending upon how hard you look at it.
Does compilation always improve performance significantly? No. For
any given problem, this is called the "Compile Conjecture." In a
nutshell the answer is in the testing: if when all is said and done, the
final optimized XSLT engine, or whatever software product, spends only 10%
of its CPU time in the generated compiled code and 90% in generic
algorithm code, then the increased complexity by using the compiler tactic
is probably not worth it.
One of the oldest examples of "problem to compiled code" is the Unix
tool lex, which is basically a regular expression generator creating
optimized C code on the fly to fulfill the task of transforming a byte
stream into a token stream. In this situation all possible efficiency is
useful due to handling a large number of bytes. So the key question to ask
is what type of software problems benefit from compilation, and does XSLT
fit into this model?
Overview of Apache XSLTC
Apache XSLTC began in early 2000 at Sun Microsystems, first with Jacek
Ambroziak and eventually turning into a small team. XSLTC is most
significantly differentiated from other transformer engines by its
distinctive two-phase operation. The XSL stylesheet is first compiled
into a Java class called a "translet" ("transformer servlet", not related
to a web application servlet). This stand-alone translet can be loaded as
a Java class and run in a different process or even on an entirely
different machine. The core translet accepts a DOM-like object for input
and produces a SAX-like output stream. Adaptors are provided to convert
to all conventional standards.
While XSLTC demonstrates a significant speed advantage over other
transformers (300% to 500% over Xalan-J), it is still a young product and
has limited range of compliance and a limited feature set. However, its
true claim to fame may be its small memory footprint. While this is an
advantage in space limited situations, it could potentially be a deciding factor in large
high-volume web applications where sometimes thousands of requests are
fulfilled simultaneously. On a per-thread basis, a translet can easily
consume one-tenth the space of a conventional XSLT engine. This
characteristic mitigates a major concern to date of using any XSL
technology in high-volume sites.
Few generic products can easily gain acceptance as a proprietary
offering from a commercial firm. Sun Microsystems made the decision to
donate this software to the Apache Xalan effort in hopes of it seeing
wider acceptance and usage. Without this decision, XSLTC would have
probably become just a footnote in XSLT history. XSLTC is presently
bundled with Xalan-J, yet is still a clearly individual component. Both
Sun and IBM contribute key engineering resources to this project.
Gregor: A Second Generation Design
But Jacek Ambroziak has wiped the plate completely clean and started a
new XSLT engine, which he calls "Gregor", at his new firm Ambrosoft. As Jacek says, "In the
case of XSLTC (and especially Gregor) it is not only compilation but sets
of particular algorithmic ideas that matter a lot for performance
(speed/memory)".
In early testing, Gregor is tentatively showing nearly ten times the
performance of Xalan-J and twice that of XSLTC; but it is not a complete
product yet. Benchmarks can be misleading (See the "Metrics of Metrics"
section below).
In the end, the question always remains, "are their sufficiently
radical and advantageous architectural approaches that can improve
performance yet unexplored and untapped?" In the case of XSLT engines, the
answer is "probably", since it's a very young field.
To Be King
The race among XLST engines is not an easy one. There are many
disjoint objectives: performance, quality, compliance, small size, etc.
It is not always clear which architectural strategy will produce the best
results.
Developing Performance and Superiority
Each XSLT engine tends to discover key algorithms and data structures
that have produced superior results. Slowly, over time, the various
projects assimilate each other's tactics. And yet they also break new
ground in order to differentiate themselves, much like any competitive
software field.
Compilation to bytecode is the key approach being discussed here.
However, Michael Kay, author of Saxon, believes other tactics such as
tree-rewriting, tree-building, sorting, memory allocation, and in general
optimization at the semantic level (that is, rewriting the stylesheet)
have a much greater impact on performance.
Xalan-J pioneered the usage of an internal data structure called DTM
(Document Table Model); XSLTC will soon use this same tactic, achieving
more commonality between the two Apache products.
XSLT is a rapidly moving target. An XSLT transformer is dependent upon
numerous additional standards such as XPath, W3C XML Schema, XML
Namespaces, EXSLT, DOM, SAX, and potentially other proprietary APIs.
While in its pure form XSLT seems theoretically sufficient and complete,
in practice extensions such as those provided by the community extension
project EXSLT are essential. All these factors make it difficult to be an
XSLT contender. A project must be under continual development and
evolution or otherwise will lose its edge and fall behind.
Hardware Acceleration
Still need more horsepower? You are not alone.
DataPower has been pursuing the high throughput XSLT market for some time: first
with their just-in-time compiled XSLT product, XSLJIT, and more recently with a
dedicated hardware accelerator called XA35.
I spoke in depth with a Eugene Kuznetsov about this product, and tried to squeeze out of him the secrets
of why dedicated hardware makes such a difference. He was pretty tight lipped, but I have a
sneaky suspicion that associative memory, or something similar not found in the
classic von
Neumann CPU architectures, may play a key role. In any case, if
indeed XSLT processing is significantly accelerated by dedicated hardware, we may
see a day where specialized XSLT coprocessors are designed into CPUs much
like math coprocessors. In the mean time, such hardware accelerators are viable
solutions for the most challenging situations.DataPower also supports a
Java interface that handles the remote communication allowing the accelerator to
appear as a simple software component.
The Metric of Metrics
Benchmarks, especially XSLT benchmarks, are fuzzy because usage is a
complex issue. The key question is what constitutes typical usage.
Benchmark programs are characteristically composed of endlessly detailed
case situations and caveats making clear comparisons nearly impossible.
The state of affairs is reminiscent of earlier procedural compiler
benchmark comparisons, where the general trends seem clear but details are
debatable.
One further element that adds complexity is the separation of XML
parsing and serialization processing from XSL transformation processing.
While this is fairly straightforward to separate out if all engines use a
DOM object for input, what if a particular engine exhibits better
performance using SAX? Should this transformer be assigned a DOM to SAX
conversion time penalty?
Finally, the reported benchmarks below are only for performance in time
(that is, speed or throughput) and make no reference to footprint, breadth
of compliance, or other characteristics.
A summary of the dominant XSLT engine, plus the new Gregor, is shown
below. All performance numbers are normalized to be 100% for Xalan-J.
Much of this data is obtained from the particular XSLT development team
itself. It is expected that the data is biased toward their own product
because the teams have, inadvertently or otherwise, adjusted the test
suite to emphasis the features and functionality more in line with their
perception of the customer base and usage. And, further, they've optimized
their product accordingly. Basically, this is a Darwinian divergence in
progress and the correctness of their assumptions will directly influence
the market acceptance.
Relative Performance
Apache Xalan-J
Saxon
Apache XSLTC
Gregor
XA35
Authors, Developers, Support Team
Sun + IBM
Michael Kay
Sun + IBM
Ambrosoft
DataPower
Key Architectural Characteristics
Document Table Model
Sax-Centric Tiny-Trees
Compiled to Byte-Code
Pattern matching meta-algorithm
Dedicated and Specialized Hardware
Performance
Apache (Using DataPower XSLTMark++)
100%
na
360%
na
na
Ambrosoft (Using DataPower XSLTMark)
100%
130%
600%
1600%
na
Sarvega XSLT Benchmark Study
100%
200%
na
na
na
DataPower XSLTMark
100%
200%
na
na
2100%
Conclusion
Compilation to bytecode is just one of many tactics being utilized in
the XSLT performance race. It is possible that additional significant
gains will also be obtained in other areas such as intelligent
pre-optimization of stylesheets, clever internal data structures, or even
a more hot-spot like run-time optimizer. Most likely the field will remain
competitive, each product achieving improvements and gains on a
year-to-year basis, new contenders coming onto the scene, and some old
ones slipping off.
Performance is difficult to quantify and predict and highly dependent
upon the exact customer usage. For performance in critical situations it
is recommended that a small handful of the dominant engines be tested in a
particular application before a final decision is made.