Reconstructing DTD Best Practice
by Leigh Dodds
June 13, 2000
In a presentation at XML Europe 2000, Henry Thompson examined current "best practice"
in DTD design and provided a reinterpretation using XML Schemas. The talk
focused on the capability of schemas for defining complex types and asserting
equivalence among classes of elements.
DTD Best Practice
Henry Thompson explained that current best practice in DTD design is to use
parameter entities to define class hierarchies of element types. The
use of parameter entities allows textual declarations in the DTD to be
reused in multiple places.
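As a rough illustration of the technique described above, the following sketch shows parameter entities standing in for classes of elements. The element and entity names are invented for this example and are not taken from Thompson's talk. The declarations are assumed to live in an external DTD file, since parameter entity references inside declarations are not permitted in an internal subset.

```xml
<!-- Hypothetical external DTD subset; all names are illustrative only. -->

<!-- Parameter entities acting as informal "classes" of elements. -->
<!ENTITY % phrase "em | strong | code">
<!ENTITY % inline "#PCDATA | %phrase; | a">

<!-- Each content model reuses the class by textual substitution. -->
<!ELEMENT p      (%inline;)*>
<!ELEMENT em     (%inline;)*>
<!ELEMENT strong (%inline;)*>
<!ELEMENT a      (%inline;)*>
<!ELEMENT code   (#PCDATA)>

<!ATTLIST a href CDATA #REQUIRED>
```

Note that nothing in the DTD itself records that em, strong, and code form a family. That relationship exists only in the replacement text of the phrase entity.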
The problems with this methodology are two-fold. Firstly, heavy use of
parameter entities to properly structure large DTDs makes them harder to
maintain and interpret by greatly reducing readability. Secondly, the
hierarchies of element types are implicit within the parameter entity
declarations, and are not a formal part of the DTD. There is no direct
support for achieving this kind of reuse in DTD syntax.
Formal language design has progressed significantly since the 1960s,
when DTDs were first defined. Thompson observed that mechanisms based on
textual substitution, such as parameter entities, are now seen as a less
than perfect way to achieve reuse and structure.
Schema Best Practice
Thompson summarized the four basic requirements placed on the XML
Schemas effort as being to 1) reconstruct DTD functionality; 2) accommodate
XML Namespaces; 3) provide a richer set of data types; and 4) take advantage
of the current understanding of formal design. With the last of these in mind,
XML Schemas provide richer mechanisms for achieving type reuse and
defining class hierarchies. Thompson singled out two of these as the
main content of his presentation.
XML Schemas provide a separation between element declarations and type
declarations. The schema author can declare a type, and define its
content model and attributes. The author can later associate that with
any number of elements. Types can also be derived from each other,
providing a simple inheritance model. The ability to declare complex
element types provides a means to explicitly state the relationships
between elements. A type hierarchy clearly defines a related family of
elements, and supports reuse.
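A minimal sketch of this separation might look as follows. It assumes the syntax of the later XML Schema 1.0 Recommendation rather than the draft that was current at the time of the talk, and the type and element names are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A named type: content model and attributes, declared once. -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID"/>
  </xs:complexType>

  <!-- A derived type: extends the base with one more child element. -->
  <xs:complexType name="UKAddressType">
    <xs:complexContent>
      <xs:extension base="AddressType">
        <xs:sequence>
          <xs:element name="postcode" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- The same type associated with more than one element. -->
  <xs:element name="billTo" type="AddressType"/>
  <xs:element name="shipTo" type="AddressType"/>
  <xs:element name="ukShipTo" type="UKAddressType"/>

</xs:schema>
```

Here the relationship between the two address types is stated in the schema itself through the extension element, rather than being implied by shared replacement text.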
Equivalence classes, the second feature of Thompson's presentation,
allow the schema author to declare the equivalence of elements based on
their use in particular contexts. In contrast, type declarations define
equivalence based on structure or content. In HTML, for example, an
ordered list (<ol>) and an image (<img>)
have very different content (i.e., they are not of the same type),
but are equivalent in that both may be used within a paragraph
element (<p>).
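The ordered list and image example can be sketched roughly as below. This assumes the syntax of the later XML Schema 1.0 Recommendation, where this role is played by substitution groups. The head element name inP and the simplified content models are invented for illustration; they mirror the example in the text, not the real HTML DTD.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical abstract head: "anything that may appear in p". -->
  <xs:element name="inP" abstract="true"/>

  <!-- Two elements with unrelated types join the same group. -->
  <xs:element name="ol" substitutionGroup="inP">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="li" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="img" substitutionGroup="inP">
    <xs:complexType>
      <xs:attribute name="src" type="xs:anyURI" use="required"/>
    </xs:complexType>
  </xs:element>

  <!-- p refers only to the head; any member may stand in for it. -->
  <xs:element name="p">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="inP"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Adding another element to the group later would not require any change to the declaration of p, which is the kind of context-based equivalence described above.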
While these features overlap to a certain extent, when combined they
provide a rich set of functionality, in which the semantic relationships
between elements are transparent.
The Future
In a brief question and answer session following the presentation,
Thompson provided a few hints regarding future work on the Schema
specification. He explained that the naming of "equivalence classes"
would be altered in a forthcoming draft of the XML Schema Structures
specification to clarify some ambiguities implied by the current
terminology.
Thompson also explained that XML Schemas 1.0 would not include
multiple inheritance, as the Working Group is keen to produce a strong
design for a single inheritance model first. However, Thompson did not
rule out multiple inheritance featuring in a later version, or a model
similar to Java's (single inheritance supplemented by interfaces) being
considered at that time. The emphasis is
clearly on getting the first version of XML Schemas complete.