A Realist's SMIL Manifesto
by Fabio Arciniegas A.
May 29, 2002
Realist: One who is inclined to physical evidence or pragmatism.
-- From the Realist Manifesto (1920), written by constructivist authors and brothers Antoine Pevsner and Naum Gabo
The Synchronized Multimedia Integration Language, SMIL, has a
less-than-stellar past but a very interesting future. SMIL 2.0 recaptures the
simplicity and practicality of the declarative media synchronization introduced
by version 1.0, while adding modularization and the content-related features
sorely missed in the earlier version.
The goal of this two-part series is to illustrate best practices and creative
uses of SMIL 2.0, in particular the creation of guided-reading documents that
push the boundaries of Web narrative technology by combining classic layout and
design practices with television-like effects.
The present article deals with the problem of enhancing video inexpensively
and dynamically with SMIL 1.0, and it assumes no prior knowledge of SMIL. It
covers the current state of SMIL; the structure and syntax of the language, with
examples; and SMIL 1.0's strengths and flaws. It is meant to bring you up to speed
on the last three years of SMIL, while the next article will show you what lies
ahead in the coming years and how SMIL can be a player in improving narrative
technology on the Web. (You can download the example files I use
in this article, but be warned: they are about 4 MB.)
The State of SMIL
The SMIL project started in 1998 and, after initial enthusiasm in
multimedia circles developing kiosks and similar applications, virtually
disappeared from view as attention shifted to other technologies. With the
release of SMIL 2.0 in August 2001, the buzz is starting to return, but SMIL
suffers from two main problems: confusion about terminology, and a literature
that lacks any business or artistic orientation.
Confusion about terminology and versioning
Keeping up with version numbers in commercial multimedia packages is simple:
the relevant entities are the "editor" and the "player", which usually share a
version number, and each is either "beta" or "release". For technical and
bureaucratic reasons, things are not so simple with SMIL. First, SMIL 2.0 is
technically not just a language but a collection of reusable modules (animation,
layout, synchronization, and so on) that can be implemented independently and
used in other languages. Second, as a W3C specification, SMIL at any given point
carries less familiar status labels such as "Candidate Recommendation" or "Note",
which generally do little to clarify the situation for SMIL's intended audience.
In SMIL 2.0, elements and attributes are grouped into independent bundles called
modules; for example, the layout and region elements belong to
the Layout Module, and the animateColor and animateMotion elements
belong to the Animation Module. A set of SMIL modules combined into a language is
called a profile. There are two SMIL 2.0 profiles: the "SMIL 2.0 Language Profile"
and a simplified version, the "SMIL 2.0 Basic Profile", designed for small devices.
Both are supersets of the original SMIL 1.0 language.
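To make the profile idea concrete, here is a minimal sketch of a SMIL 2.0 Language Profile document. The profile is identified by the namespace on the root element, and the comments note the functional area of the SMIL 2.0 specification each element comes from. The file names, region name, and dimensions are placeholders, not part of this article's example files.

```xml
<!-- The namespace selects the SMIL 2.0 Language Profile -->
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <!-- Structure functional area: smil, head, body -->
  <head>
    <!-- Layout functional area: layout, root-layout, region -->
    <layout>
      <root-layout width="160" height="120"/>
      <region id="main" left="0" top="0" width="160" height="120"/>
    </layout>
  </head>
  <body>
    <!-- Timing and Synchronization functional area: seq (and par) -->
    <seq>
      <!-- Media Objects functional area: img (and video, audio, text, ...) -->
      <img src="slide1.gif" region="main" dur="3s"/>
      <img src="slide2.gif" region="main" dur="3s"/>
    </seq>
  </body>
</smil>
```

Every element used here also exists in SMIL 1.0, which is why simple documents like this one carry over between versions with little more than a change of namespace.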
Modules are designed to be reusable as parts of other XML vocabularies, so
vendors or other standards initiatives may decide to implement only parts of
SMIL. Examples include the combination of XHTML with the SMIL timing module,
implemented by IE6, and declarative animation in SVG, implemented by Adobe SVG
Viewer 2 and later. As for direct SMIL support, a number of SMIL 2.0 players are
in the making (see side box), but most available players still use SMIL 1.0. The
SMIL 2.0 Language Profile examples discussed in this article also work on SMIL 1.0
players, except where noted.
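SVG shows what reusing a piece of SMIL looks like outside SMIL itself: its animate element follows SMIL's animation model, so a timed fade-in takes one declarative element and no script. The sketch below assumes an SVG viewer with animation support, such as the Adobe SVG Viewer; the caption text is a placeholder.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="200" height="60">
  <!-- Start fully transparent; the animation below makes it visible -->
  <text x="10" y="35" font-size="20" opacity="0">Round 1
    <!-- SMIL-style declarative animation: fade the caption in over 2s
         and keep the final value afterwards (fill="freeze") -->
    <animate attributeName="opacity" from="0" to="1" dur="2s" fill="freeze"/>
  </text>
</svg>
```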
The other big impediment to popularizing SMIL is the nature of the current
literature, which for the most part consists of a descriptive overview of each
module, its elements, and its attributes, with occasional examples of a zooming
square or a photo slideshow. This documentation pattern doesn't address the
communicative potential of SMIL or what it can contribute to the medium. It's
certainly not going to convince a manager to invest in SMIL development or a
creative developer to learn SMIL. The key to popularizing SMIL is to emphasize its
potential to expand the possibilities of a media-rich Web, rather than its
strictly technical superiority.
The Process
Whether you target SMIL 1.0 or 2.0, creating a presentation always involves the
following steps (unless noted otherwise, "SMIL" hereafter refers to SMIL 1.0):
- Create an XML document and include the appropriate
namespace. The root element is smil, and its children are
head and body.
- In the head element, code the layout of the areas where
content can be inserted.
- In the body element, code the references to the content to
be inserted; specify where, when, and for how long each element is
shown.
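A minimal sketch of those three steps as a complete SMIL 1.0 file follows. The namespace URI is the one the SMIL 2.0 specification assigns to SMIL 1.0 documents; many SMIL 1.0 players also accept a bare smil root, as the listings in this article use. The file name, region name, and sizes are placeholders.

```xml
<!-- Step 1: an XML document whose root is smil, with head and body -->
<smil xmlns="http://www.w3.org/TR/REC-smil">
  <head>
    <!-- Step 2: declare the areas where content can appear -->
    <layout>
      <root-layout width="160" height="120"/>
      <region id="main" left="0" top="0" width="160" height="120"/>
    </layout>
  </head>
  <body>
    <!-- Step 3: what to show, where (region), and for how long (dur) -->
    <img src="hello.gif" region="main" dur="5s"/>
  </body>
</smil>
```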
The Problem: Late and Localized Annotations
When you watch even the simplest television show, you are looking at images
composed of several layers of content: the actual video filmed with a camera,
the channel's logo in a corner, annotations (in the case of Figure 1, the
name of the band and the song), and so on. Some networks add even more layers of
content, providing extra data about, say, the drivers in a NASCAR race or trivia
about the band in a music video.

Figure 1. Images are composed by superimposing layers
The problem with traditional TV is that all the layers are merged before they
are shipped to everyone's television, where people get one flat image. Media
such as digital HDTV, or the Web with SMIL, can keep track of the different
components of a presentation. They can therefore avoid merging layers early,
deciding at presentation time whether to hide or show content depending on user
preferences or other factors and constraints. For example, a DVD can show or
hide captions with script notes synchronized with the movie, at the viewer's
discretion.
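In SMIL 1.0, this kind of presentation-time decision can be expressed with a test attribute. In the sketch below, which assumes placeholder file names and a player that exposes a captions preference, the caption layer is rendered only when the viewer has captions turned on; otherwise the element is simply skipped. SMIL 2.0 spells the same attribute systemCaptions.

```xml
<smil xmlns="http://www.w3.org/TR/REC-smil">
  <head>
    <layout>
      <root-layout width="160" height="120"/>
      <region id="video" left="0" top="0" width="160" height="120" z-index="0"/>
      <region id="caption" left="10" top="95" width="140" height="20" z-index="1"/>
    </layout>
  </head>
  <body>
    <par>
      <video src="clip.rm" region="video"/>
      <!-- Shown only if the player's captions preference is "on" -->
      <text src="captions.txt" region="caption" dur="10s" system-captions="on"/>
    </par>
  </body>
</smil>
```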
Showing and hiding extra layers of content is just the tip of the iceberg;
using SMIL you can position and synchronize any media on top of your video
without ever having to decide on a final merge of your pieces. Furthermore, you
can combine SMIL with dynamic content and customize and localize your layers,
opening new opportunities for information, entertainment, and
publicity.
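Layering is expressed through regions and their z-index: where regions overlap, the one with the higher z-index is drawn on top. A minimal sketch, assuming a placeholder clip of about 30 seconds and a small logo image:

```xml
<smil xmlns="http://www.w3.org/TR/REC-smil">
  <head>
    <layout>
      <root-layout width="160" height="120"/>
      <!-- Bottom layer: the untouched footage -->
      <region id="footage" left="0" top="0" width="160" height="120" z-index="0"/>
      <!-- Upper layer: a corner logo stacked above the footage -->
      <region id="logo" left="120" top="5" width="32" height="32" z-index="1"/>
    </layout>
  </head>
  <body>
    <!-- par starts its children together, so both layers play at once -->
    <par>
      <video src="clip.rm" region="footage"/>
      <!-- Assumes a 30-second clip; adjust dur to the footage length -->
      <img src="logo.gif" region="logo" dur="30s"/>
    </par>
  </body>
</smil>
```

Because the logo lives in its own layer, swapping it for another image means editing one src attribute; the video file is never touched.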
The Project: Annotating Boxing Footage
What we want from this project is a solution for visually annotating videos,
adding layers of content with data dependent on the locale and preferences of
each viewer, without having to alter the video itself. This is a very desirable
feature for many media sites, which want to inexpensively add dynamic content to
their video for publicity and business purposes.
The steps to create an annotated video include:
- deciding what video and what kind of annotation data we want;
- creating a layout for the annotated video: figuring out which region serves
which purpose, the size and position of the regions;
- deciding the sequence and duration of events; and
- modifying the source of the annotations so that they can be localized and
customized.
Each step involves not only technical knowledge about the SMIL language, but
effective design ideas, which make the difference between a nice experiment and
an effective tool.
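For the localization step, SMIL 1.0 already offers a declarative mechanism: a switch element renders the first child whose test attributes match the viewer's settings. The sketch below, with hypothetical title images, chooses the opening title by the player's language preference. The title region uses the coordinates given in Table 1; the root-layout size is a placeholder.

```xml
<smil xmlns="http://www.w3.org/TR/REC-smil">
  <head>
    <layout>
      <root-layout width="160" height="120"/>
      <region id="title" left="12" top="99" width="113" height="15"/>
    </layout>
  </head>
  <body>
    <!-- The first child whose test attributes evaluate to true is played -->
    <switch>
      <img src="title-es.gif" region="title" dur="3s" system-language="es"/>
      <img src="title-fr.gif" region="title" dur="3s" system-language="fr"/>
      <!-- No test attribute: the fallback when nothing above matches -->
      <img src="title-en.gif" region="title" dur="3s"/>
    </switch>
  </body>
</smil>
```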
The Video
The video we will annotate is a portion of a 1951 boxing match between Jake La
Motta and Sugar Ray Robinson. I picked this clip because it is small, and
because sports feeds are a realistic example of video that can benefit from
dynamic annotations.
We want to add three kinds of annotation: opening titles, boxers' statistics,
and associated trivia. Figure 2 shows a snapshot of the final result versus the
original video.

Figure 2. The naked video vs. the final result
Layout
Layout is the process of arranging elements in a space. Effective layout
directs the attention of the user, guiding her through the hierarchy of
elements. This can be accomplished in a variety of ways, such as providing a sense
of depth, creating contrast between elements, or sequencing elements
intuitively.
Directing the viewer's attention to the different elements of a video is
a lot easier than in static graphics, because elements can pop up and disappear
from the screen. Nevertheless, classic style notions still apply to our example,
especially regularity, recognition, and depth. Table 1 shows the
layout regions for our content, the code necessary to implement them, and their
rationale.
<smil>
  <head>
    <layout>
      <root-layout id="video" width="159" height="120"/>
      <region id="comment" left="10" top="9"
              width="34" height="29" z-index="1"/>
      <region id="stats" left="105" top="14"
              width="43" height="75" z-index="1"/>
      <region id="title" left="12" top="99"
              width="113" height="15" z-index="1"/>
      <region id="caption" left="29" top="90"
              width="102" height="20" z-index="2"/>
    </layout>
  </head>
  <body>
    <!-- Not shown -->
  </body>
</smil>
Rationale:
- Using a total area no bigger than the video itself promotes the
reusability of the annotated video, because we don't have to make
compromises or assumptions about the background color of any area not
covered by the footage.
- Rhythm is important in a layout because it helps the user recognize and
classify information. With SMIL annotations, nothing is easier than
achieving regularity by consistently showing related information in
the same places. We use completely separate areas for tips and
statistics.
- Banking on well-known practices is often convenient. Users instantly
recognize titles and people's names at the bottom of the frame, as they do
white-on-black captions centered at the bottom, on top of everything
else.
Table 1. Video Annotation Layout
I've kept the code compatible with SMIL 1.0 because there are very few
players for SMIL 2.0 and the ideas introduced here are the same in SMIL 1.0 and
2.0.
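In SMIL terms, regularity simply means always sending the same kind of information to the same region. The sketch below reuses the stats and comment regions from Table 1 for two consecutive rounds of statistics and tips. The image names are hypothetical, and the root-layout height of 120 is an assumption; it only has to be large enough to contain every region.

```xml
<smil xmlns="http://www.w3.org/TR/REC-smil">
  <head>
    <layout>
      <!-- Height assumed; stats ends at 14 + 75 = 89, so anything >= 89 fits -->
      <root-layout width="159" height="120"/>
      <region id="comment" left="10" top="9" width="34" height="29" z-index="1"/>
      <region id="stats" left="105" top="14" width="43" height="75" z-index="1"/>
    </layout>
  </head>
  <body>
    <seq>
      <!-- Statistics always go to "stats", tips always go to "comment" -->
      <par>
        <img src="stats-lamotta.gif" region="stats" dur="5s"/>
        <img src="tip-lamotta.gif" region="comment" dur="5s"/>
      </par>
      <par>
        <img src="stats-robinson.gif" region="stats" dur="5s"/>
        <img src="tip-robinson.gif" region="comment" dur="5s"/>
      </par>
    </seq>
  </body>
</smil>
```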
Adding and Grouping Elements
The first elements we want to add to our presentation are the opening
credits, which are two simple GIF files. What we want, as specified in the
timeline of Figure 3, is for each GIF to appear for 3 seconds, one after the
other. To achieve this, we reference the media using img elements and
group them in a seq (for sequence) element, as shown in Listing
1.
<smil>
  <head>
    <!-- Layout exactly as in Table 1 -->
  </head>
  <body>
    <seq>
      <img src="Intro-Names.gif" region="video" dur="3s"/>
      <img src="Intro-Date.gif" region="video" dur="3s"/>
    </seq>
  </body>
</smil>
Figure 3. Timeline for credits
Listing 1. Showing the credits in a sequence
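The credits timeline can also be written with a par element and an explicit begin offset, which makes the seq semantics visible: inside a par, every child is timed from the start of the par, so the second image has to be delayed by the first one's 3 seconds. To keep the file self-contained, the layout here is reduced to a single full-frame region with the hypothetical name credits.

```xml
<smil xmlns="http://www.w3.org/TR/REC-smil">
  <head>
    <layout>
      <root-layout width="160" height="120"/>
      <region id="credits" left="0" top="0" width="160" height="120"/>
    </layout>
  </head>
  <body>
    <!-- Same timeline as the seq version: 0-3s names, 3-6s date -->
    <par>
      <img src="Intro-Names.gif" region="credits" dur="3s"/>
      <img src="Intro-Date.gif" region="credits" begin="3s" dur="3s"/>
    </par>
  </body>
</smil>
```

The seq version is easier to maintain: if the first image's duration changes, nothing else needs updating, whereas the par version needs its begin offset edited by hand.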
As you can see, specifying a sequence in SMIL is very intuitive. Before
getting into more sophisticated ways of specifying synchronization, we need to
answer a prior question: which media types can you synchronize? The media
element tags are:
- img: JPEG and GIF images work on all current players; see the documentation
of your player for details on other formats. GIF89 transparency is supported by
all current players, and RealPlayer prefers non-interlaced GIFs.
- video: MPEG, AVI, RealVideo, and other formats for motion clips must
be included using this element. Support for the different video formats is
especially player-dependent.
- text: Static text. HTML is not supported by any SMIL 1.0
player.
- audio: Audio clips, including WAV and AU. Also covers streaming
audio such as RealAudio.
- animation: Animation clips. The supported types are especially
player-dependent and limited (don't expect Flash or Mojo support in
standalone players).
- ref: Any clip not covered by the other elements but supported by the
player.
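Here is a sketch of several of these tags used together. The file names and formats are placeholders, and, as noted above, whether each one actually plays depends on the player.

```xml
<smil xmlns="http://www.w3.org/TR/REC-smil">
  <head>
    <layout>
      <root-layout width="320" height="240"/>
      <region id="main" left="0" top="0" width="320" height="200"/>
      <region id="logo" left="280" top="5" width="32" height="32" z-index="1"/>
      <region id="notes" left="0" top="200" width="320" height="40"/>
    </layout>
  </head>
  <body>
    <!-- All children of par start together -->
    <par>
      <!-- Continuous media end when the clip ends -->
      <video src="match.mpg" region="main"/>
      <!-- Audio is not drawn, so it needs no region -->
      <audio src="commentary.wav"/>
      <!-- Discrete media have no natural length, so give them a dur -->
      <img src="logo.gif" region="logo" dur="10s"/>
      <text src="notes.txt" region="notes" dur="10s"/>
    </par>
  </body>
</smil>
```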
It is important to realize that the existence of an explicit tag does not
mean that every SMIL player supports that media type. Incomplete support
for some media types in many players is one of the reasons for the slow adoption
of SMIL. For example, none of the current SMIL 1.0 players lets you see through
the transparent areas of a GIF file or include HTML as a media element, and
SMIL 2.0 players currently offer only partial support.
RealPlayer and QuickTime include extra elements for vendor-specific
"smart text" with effects such as tickers and basic formatting. Unless you have to
produce SMIL 1.0 specifically for one of those platforms, avoid such
extensions for the sake of portability.