Printing XML: Why CSS Is Better than XSL
by Håkon Wium Lie, Michael Day
January 19, 2005
Longtime readers of XML.com will remember the battles between XSL and
CSS that took place in these columns in 1999 and that were memorialized
in XSL and CSS: One Year Later. Since then, the two languages have
coexisted in relative peace: CSS is now used to style most web sites,
XSLT (the transformation part of XSL) is used in many server-side
systems, and
XSL-FO (the formatting part of XSL) has found a niche in the printing
industry.
A recent entry in the blog of a web luminary may signal the start of
a second round of hostilities. Norman Walsh, a member of the W3C's
Technical Architecture Group and co-author of the W3C's Web Architecture document
(WebArch), recently blogged:
...
web browsers suck at printing. ... And CSS is never going to fix it. Did
you hear me? CSS is never going to fix it.
It's unclear if this statement is a prediction or a threat. Or just
blogging on a bad day. Anyway, the pronouncement of CSS's printing
ineptness gives us a splendid opportunity to explain why CSS is a better
language than XSL for most printing needs. As we have just used CSS to
style a 400-page book (Cascading Stylesheets, designing for the web by
Håkon Lie and Bert Bos, 3rd ed, forthcoming from Addison-Wesley
this year), this is not purely an academic exercise in stylesheet
linguistics. So, would-be authors should continue reading.
The Problem
Both camps agree that a printed document is, in many ways, more
difficult to format than on-screen presentation. A printed document must
be split into numbered pages, with added headers and footers. Page
margins must be specified, and they may be different on left and right
pages. References that appear as hyperlinks on-screen often include page
numbers on paper.
The disagreement starts with how best to express all this. Walsh's
solution is to write a 1000-line XSL transformation that generates
XSL-FO, which is subsequently turned into PDF. We will argue that it's
much easier for most authors to express styling in CSS; in the case of
the WebArch document, one can reuse the existing CSS stylesheets (200
lines or so) and add some print-specific lines. And, although browsers
tend to focus on dynamic screens rather than on printing, products like
Prince happily combine CSS with
XML and produce beautiful PDF documents.
(Some disclosure at this point is appropriate. We, the authors, have
been actively involved in shaping CSS and are now working hard to build
software--Opera and Prince--that supports
CSS.)
The Flavors
Before going into the print-specific features, let's compare the
basic flavors of XSL and CSS. Consider this fragment from Walsh's XSL
transform:
<xsl:template
    match="html:p[@class='copyright' and ancestor::html:div[@class='head']]"
    priority="100">
  <fo:block space-before="8pt"
            space-after="8pt"
            font-size="75%">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
The purpose of this code is to select certain elements (specified in
the match attribute) and to set certain formatting properties
on these elements (e.g., font-size).
Using CSS, this can be written:
div.head p.copyright {
  margin-top: 8pt;
  margin-bottom: 8pt;
  font-size: 75%
}
Compare the two fragments. Which do you find more readable? Which
language would be easier to learn?
Explaining this XSL snippet to a non-programmer would also be
awkward:
<xsl:template match="html:ol/html:li">
  <fo:list-item>
    <xsl:if test="not(preceding-sibling::html:li)">
      <xsl:attribute name="keep-with-next">always</xsl:attribute>
    </xsl:if>
The CSS equivalent, however, is more intuitive:
ol li:first-of-type { page-break-after: avoid }
Printing with CSS
As we all know, simple tools cannot always perform advanced tasks.
Even if CSS were able to simplify some fragments, it wouldn't do much
good if the language had inherent limitations that made it impossible to
describe advanced features. The question becomes, then, whether there
are any inherent limitations in CSS that could make it unfit for
producing printed documents.
The answer is no. CSS2,
which became a W3C Recommendation in 1998, introduced the concept of
pages in CSS. By using it, one can set page breaks (even Internet
Explorer supports this) and page margins. More recently, a W3C Candidate
Recommendation (called CSS3 Paged Media
Module) added functionality to describe headers, footers, and more.
Let's start with a simple example:
@page { size: A4 portrait; }
This simple statement tells the formatter that the resulting PDF
document should be of size A4 (which is common outside North
America), and that the orientation should be portrait. To change the
size of the generated PDF document, one simply changes A4 into
another size. Peeking inside the XSL sheet again, we find two 40-line
switch statements to enable similar functionality. One of the statements
is reprinted in full below for entertainment purposes:
<xsl:param name="page.height.portrait">
  <xsl:choose>
    <xsl:when test="$paper.type = 'A4landscape'">210mm</xsl:when>
    <xsl:when test="$paper.type = 'USletter'">11in</xsl:when>
    <xsl:when test="$paper.type = 'USlandscape'">8.5in</xsl:when>
    <xsl:when test="$paper.type = '4A0'">2378mm</xsl:when>
    <xsl:when test="$paper.type = '2A0'">1682mm</xsl:when>
    <xsl:when test="$paper.type = 'A0'">1189mm</xsl:when>
    <xsl:when test="$paper.type = 'A1'">841mm</xsl:when>
    <xsl:when test="$paper.type = 'A2'">594mm</xsl:when>
    <xsl:when test="$paper.type = 'A3'">420mm</xsl:when>
    <xsl:when test="$paper.type = 'A4'">297mm</xsl:when>
    <xsl:when test="$paper.type = 'A5'">210mm</xsl:when>
    <xsl:when test="$paper.type = 'A6'">148mm</xsl:when>
    <xsl:when test="$paper.type = 'A7'">105mm</xsl:when>
    <xsl:when test="$paper.type = 'A8'">74mm</xsl:when>
    <xsl:when test="$paper.type = 'A9'">52mm</xsl:when>
    <xsl:when test="$paper.type = 'A10'">37mm</xsl:when>
    <xsl:when test="$paper.type = 'B0'">1414mm</xsl:when>
    <xsl:when test="$paper.type = 'B1'">1000mm</xsl:when>
    <xsl:when test="$paper.type = 'B2'">707mm</xsl:when>
    <xsl:when test="$paper.type = 'B3'">500mm</xsl:when>
    <xsl:when test="$paper.type = 'B4'">353mm</xsl:when>
    <xsl:when test="$paper.type = 'B5'">250mm</xsl:when>
    <xsl:when test="$paper.type = 'B6'">176mm</xsl:when>
    <xsl:when test="$paper.type = 'B7'">125mm</xsl:when>
    <xsl:when test="$paper.type = 'B8'">88mm</xsl:when>
    <xsl:when test="$paper.type = 'B9'">62mm</xsl:when>
    <xsl:when test="$paper.type = 'B10'">44mm</xsl:when>
    <xsl:when test="$paper.type = 'C0'">1297mm</xsl:when>
    <xsl:when test="$paper.type = 'C1'">917mm</xsl:when>
    <xsl:when test="$paper.type = 'C2'">648mm</xsl:when>
    <xsl:when test="$paper.type = 'C3'">458mm</xsl:when>
    <xsl:when test="$paper.type = 'C4'">324mm</xsl:when>
    <xsl:when test="$paper.type = 'C5'">229mm</xsl:when>
    <xsl:when test="$paper.type = 'C6'">162mm</xsl:when>
    <xsl:when test="$paper.type = 'C7'">114mm</xsl:when>
    <xsl:when test="$paper.type = 'C8'">81mm</xsl:when>
    <xsl:when test="$paper.type = 'C9'">57mm</xsl:when>
    <xsl:when test="$paper.type = 'C10'">40mm</xsl:when>
    <xsl:otherwise>11in</xsl:otherwise>
  </xsl:choose>
</xsl:param>
As the alert reader will already have inferred, the statement lists
the heights of many different paper sizes. As such, it is interesting
reading. However, we do not understand why this list belongs in a
stylesheet. CSS provides a simple and elegant alternative by naming the
different sizes in the specification rather than in each stylesheet.
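As a rough sketch of what "changing A4 into another size" can look like, the rules below show a few alternative forms of the size property, followed by different margins for left and right pages. The specific dimensions and margin values are illustrative assumptions, not taken from the WebArch stylesheets, and the named sizes rely on the CSS3 Paged Media draft, so support may differ between formatters. Only one of the three size rules would be used in a real stylesheet.

```css
/* Alternative 1: a named size with an explicit orientation */
@page { size: A4 landscape; }

/* Alternative 2: US Letter, another size named by the draft */
@page { size: letter; }

/* Alternative 3: a custom size given as width and height (illustrative values) */
@page { size: 170mm 240mm; }

/* Wider inner margins for a bound book (illustrative values):
   the inner edge of a left page is its right side, and vice versa */
@page :left  { margin-left: 20mm; margin-right: 25mm; }
@page :right { margin-left: 25mm; margin-right: 20mm; }
```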
Another example that shows the elegant simplicity of CSS is that of
page numbering. Page numbers are commonly printed on the outside edge
of a page so that they are easily visible when flipping through a book.
So, on a right page the page number should be on the right side, and on
a left page it should be on the left side. On the first page, there
should be no page number. In CSS, you can express this with:
@page :left {
  @bottom-left {
    content: counter(page);
  }
}
@page :right {
  @bottom-right {
    content: counter(page);
  }
}
@page :first {
  @bottom-right {
    content: normal;
  }
}
The statements, while not pure English prose, are easily
understandable for anyone who has read this far, and it would be a
simple exercise for the reader to move the page number from the bottom
of each page to the top.
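For readers who would rather check their answer, one possible solution to that exercise is sketched here. It assumes the same margin-box syntax from the CSS3 Paged Media draft used above and simply swaps the bottom boxes for their top counterparts.

```css
/* Page number in the top outside corner of left pages */
@page :left {
  @top-left {
    content: counter(page);
  }
}

/* Page number in the top outside corner of right pages */
@page :right {
  @top-right {
    content: counter(page);
  }
}

/* The first page is a right page, so suppress its top-right box */
@page :first {
  @top-right {
    content: normal;
  }
}
```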
Because of size constraints, we're not going to show you how page
numbers are expressed in XSL. We challenge you to find
it and then try explaining it to the first person you meet.
Reuse and Cascading
One reason why the web took off in the early 1990s was the manner in
which HTML is authored. By looking at the source code of other
documents, web authors could easily get started in web publishing. In a
sense, HTML is the most successful open source movement. CSS also
encourages reuse of code and has formalized how it works through the
cascading rules. For authors, this means they can take an
existing stylesheet and add to it their own rules instead of writing a
new one themselves.
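A minimal sketch of this kind of reuse might look as follows. The file name screen.css and the class name navigation are hypothetical stand-ins for whatever an existing site uses; the point is only that the imported rules stay in force and the print-specific rules are layered on top of them by the cascade.

```css
/* Reuse an existing stylesheet (hypothetical file name).
   @import must come before any other rules. */
@import url("screen.css");

/* Add rules that apply only when the document is printed.
   For equal specificity, these later rules win over the imported ones. */
@media print {
  body { font-family: serif; font-size: 10pt; }

  /* Hypothetical class: hide on-screen navigation on paper */
  div.navigation { display: none; }
}
```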
One case in point is how to express page breaks for printed
documents. Typically, you want to avoid page breaks after headings, and
this can be expressed by adding a simple rule:
h1, h2, h3, h4, h5, h6 { page-break-after: avoid; }
Here, the selector lists the elements to which the declaration applies.
As a result, the formatter will avoid page breaks after these elements.
XSL has no concept of cascading and cannot easily express the above
example. Instead of grouping elements, one has to add a rule to each
element's template. Here is what the template for h1
elements looks like:
<xsl:template match="html:h1">
  <fo:block space-before="0.25in"
            color="#00599C"
            font-size="16pt"
            font-family="{$title.font.family}"
            keep-with-next="always"
            id="{generate-id()}">
(XSL has chosen another name for the property, i.e.,
keep-with-next instead of page-break-after.)
Likewise, it is easy in CSS to remove text decorations (e.g.
underlining) on all elements:
* { text-decoration: none }
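The same grouping approach extends to the other page-break properties defined in CSS2. The sketch below is an illustration under our own assumptions about which elements deserve which treatment; it is not taken from the WebArch stylesheets.

```css
@media print {
  /* Start every top-level section on a new page */
  h1 { page-break-before: always; }

  /* Try to keep tables and preformatted blocks on a single page */
  table, pre { page-break-inside: avoid; }

  /* Leave at least two lines of a paragraph at the bottom
     and at the top of a page */
  p { orphans: 2; widows: 2; }
}
```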
Table of Contents
Many documents start with a table of contents (TOC). On-screen, the
TOC is clickable and takes the user to the requested section. Paper,
being more static in nature, needs references that can be followed
manually. A TOC on paper, therefore, lists the number of the page where
the section can be found.
Expressing this in CSS results in a slightly more complex rule than
the examples you have seen so far. Consider this:
ul.toc a:after {
  content: target-counter(attr(href), page);
}
In English, the rule would read as follows: inside ul
elements of class toc, all a elements should
be trailed (:after) by some generated content. The
generated content is the page number where the target of the link is
found. The link is expressed in the href attribute of the
a element.
One reason for the added complexity is that CSS, contrary to a common
misconception, has been designed to work with generic XML as well as
HTML. In HTML, links are expressed in href attributes on
a elements. In generic XML, however, links can be anywhere,
and their position must be specified.
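To illustrate the generic XML case, suppose a vocabulary uses an xref element whose target attribute holds a fragment identifier such as #sec-intro. Both the element and the attribute name are hypothetical, and the syntax is the same draft-stage target-counter() notation as above, so formatter support should be checked before relying on it.

```css
/* Hypothetical markup: <xref target="#sec-intro">Introduction</xref> */
xref:after {
  /* Append the page number on which the link target is found */
  content: " (page " target-counter(attr(target), page) ")";
}
```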
Another common feature of TOCs on paper is a dotted line between
section titles and the respective page numbers. This is called a
leader in typesetting terminology and can be expressed in CSS
as follows:
ul.toc a:after {
  content: leader('.') target-counter(attr(href), page);
}
Compared with this three-line CSS solution, expressing TOCs in the
WebArch XSL stylesheet takes more than 50 lines. In fairness, the XSL
code also expresses other properties for TOCs (for example, that page
breaks should be avoided). The CSS syntax in the above examples is still
at the draft stage.
By combining the print-specific CSS stylesheet described above with the
WebArch document, a nicely formatted PDF document can be created.
Multi-Column Layouts
On paper, content is often laid out in multiple columns. Stylesheets
must be able to express this. Using CSS, one can easily create
multi-column layouts:
body { column-count: 2; column-gap: 8mm; }
The content of the body element will now be poured into
two columns, between which there is an 8mm gap.
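A slightly fuller sketch, assuming a formatter that implements more of the CSS3 multi-column module, adds a rule between the columns and lets top-level headings run across both of them. The column-rule and column-span properties come from that module; whether a given product supports them is something to verify.

```css
body {
  column-count: 2;
  column-gap: 8mm;
  /* Draw a thin vertical line in the gap between the columns */
  column-rule: thin solid black;
}

/* Let main headings stretch across all columns */
h1 { column-span: all; }
```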
Multi-column layouts are also available in XSL, but the obligatory
verbosity/complexity warnings apply.
Conclusions
So can CSS do everything better than XSL? Not quite. XSL is a
Turing-complete language which, in principle, can be used for all
programming tasks and is particularly suited for document
transformations. Styling documents is only one of many things XSL can
do. CSS, on the other hand, has been developed with only one task in
mind: styling documents.
On the web, CSS is the stylesheet language of choice. However, the
usefulness of CSS is not limited to screens. If you want to transfer web
content--be it XML or HTML--onto paper, there are good
reasons to use CSS. The language is radically simpler than XSL,
and it is suitable both on-screen and on paper. This means that you
probably don't have to write a new stylesheet at all but can reuse an
existing one.
Finally, by using CSS you can preserve the semantics of your content
all the way to the printer. That, however, is a different discussion.