Protocol Design: Structure and Syntax
by Itamar Shtull-Trauring
April 21, 2004
Protocols use syntax, the way the data they send is formatted
and organized, to send and receive structured information. A POP3
server knows that the bytes in the message RETR 1
followed by a CRLF should be parsed by splitting, based on the space
character. The first part indicates a command and the second indicates
an argument to the command, in this case, an integer index written
using decimal representation, formatted using the ASCII characters for
numerals (the bytes ONE would be meaningless to the
server). Thus, a sequence of bytes formatted according to a known
syntax corresponds to a structured message: a command with an integer
argument.
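As a rough illustration of the parsing described above, here is a minimal sketch in Python. The function name parse_pop3_command is hypothetical, and the sketch handles only the one-argument form shown in the example; it is not a complete POP3 command parser.

```python
from typing import Tuple


def parse_pop3_command(line: bytes) -> Tuple[str, int]:
    """Parse a one-argument command line (e.g. RETR 1 plus CRLF)."""
    # Hypothetical helper: covers only the "COMMAND <integer>" shape.
    if not line.endswith(b"\r\n"):
        raise ValueError("command must end with CRLF")
    # Split on the space character, as the text describes.
    parts = line[:-2].split(b" ")
    if len(parts) != 2:
        raise ValueError("expected a command and one argument")
    command, argument = parts
    # The argument must be ASCII decimal digits; b"ONE" is rejected.
    if not argument.isdigit():
        raise ValueError("argument must be ASCII decimal digits")
    return command.decode("ascii").upper(), int(argument)


command, index = parse_pop3_command(b"RETR 1\r\n")
# command is "RETR" and index is the integer 1
```

Passing the bytes RETR ONE followed by a CRLF raises ValueError, which matches the point that such a message is meaningless to the server.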
What design goals should be kept in mind when choosing the syntax
for a protocol? The most important goals should be simplicity and
consistency. A syntax that is easy to parse and generate makes the
protocol implementation shorter and simpler. This will
minimize the occurrence of bugs, which in turn can make the
implementation less vulnerable to attackers (many, if not most,
vulnerabilities in network applications are in the parsing code). In
cases where multiple implementations are expected or encouraged, an
easy-to-implement protocol will be more likely to be adopted. More
importantly, it will be less likely to have interoperability problems
between different implementations.
Simplicity should not, of course, limit the functionality of the
protocol. The second goal when choosing a protocol's syntax is
extensibility -- the ability to accommodate future changes and
additional functionality.
Another much-touted goal is that of being human-readable,
sometimes described as being a "text protocol" rather than a "binary protocol." Unfortunately, these definitions are vague, and to some degree, meaningless. All protocols are ultimately sequences of bytes,
which is to say, numbers. Some protocols use bytes that
happen to match the way computers encode English text (i.e., the
alphanumeric bytes in ASCII) and choose
a syntax that a human can easily read. Even so, some
amount of post-processing is being done to make the bytes
understandable. A more meaningful and reasonable goal might be to
allow the protocol to be easily "parsed" and generated by people (for
debugging and testing purposes) using minimal software support.
With these goals in mind, two general approaches to syntax design
can be considered. One way of representing structured data in a
protocol is to create new syntax for each piece of structured
information that needs to be represented. For example, in the
following excerpt from a SIP (Session Initiation Protocol, used in
VoIP) message, the first line is a command in one syntax, with the
second argument of the command using the URI syntax, and the rest of
the lines using a different syntax indicating a key-value pair. Each of
these headers then has its own syntax for the value it needs to
represent. The Via header, for example, records the address of the
client that sent the message. The From and To headers use a
different syntax for a different form of address.
REGISTER sip:example.com SIP/2.0
Via: SIP/2.0/UDP 192.168.1.100:50609
From: <sip:smith@example.com:50609>
To: <sip:smith@example.com:50609>
Contact: "John Smith" <sip:smith@192.168.1.100:50609>
Call-ID: 94E7E5DAF39111D791C6000393764646@example.com
CSeq: 9898 REGISTER
There are a number of issues with this approach that make its use
problematic. Each new piece of structured information needs new
standards to be defined, and new code to generate and parse it. If the
specification is vague, different implementations will output the same
data in different ways. Extending and adding new information typically
involves creating new syntax, which can cause backwards compatibility
problems with old parsers.
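To make the cost concrete, the sketch below shows how even a heavily simplified reader for the SIP excerpt needs a separate parsing function for each syntax it meets. All function names are hypothetical, and the code ignores most of what the real SIP grammar allows (parameters, quoting, multiple values); it is an illustration of the problem, not a conforming parser.

```python
def parse_request_line(line):
    # Syntax 1: three space-separated fields on the first line.
    method, uri, version = line.split(" ")
    return {"method": method, "uri": uri, "version": version}


def parse_header(line):
    # Syntax 2: "Name: value" key-value pairs (split at the first colon).
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


def parse_via(value):
    # Syntax 3: the Via value, "protocol host:port" in this excerpt.
    protocol, sent_by = value.split(" ", 1)
    host, _, port = sent_by.partition(":")
    return {"protocol": protocol, "host": host, "port": int(port)}


def parse_cseq(value):
    # Syntax 4: the CSeq value, "number method".
    number, method = value.split(" ")
    return {"number": int(number), "method": method}


request = parse_request_line("REGISTER sip:example.com SIP/2.0")
name, value = parse_header("Via: SIP/2.0/UDP 192.168.1.100:50609")
via = parse_via(value)
cseq = parse_cseq("9898 REGISTER")
```

The From, To, and Contact values would each need yet another function, and every new header added to the protocol would add one more.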
An alternative approach to designing protocol syntax involves
separating the task into two stages. In the first stage, a syntax is
chosen that can be used to create generic structures not necessarily
tied to the protocol. This syntax should be simple, but powerful
enough to represent all potential information the protocol will want
to transmit. In the second stage, the protocol-specific information is
encoded using the supported structure, which can then be encoded to
bytes with the chosen syntax. The result is a single syntax that can
be used throughout the protocol, which only needs one parser and can
be easily validated. The protocol can be extended by changing the
structure of encoded information, with no need to change the
syntax. Of course, care needs to be taken in both design and
implementation to support future changes in structure.
Probably the best known example of such an approach is XML. XML
allows the encoding of structured information, using a well-defined
syntax. The structure is that of nested records that can have
attributes and contain other records or text. Namespaces allow
different information and schemas to be used in the same
document. Here's a sample XML message:
<?xml version="1.0" encoding="iso-8859-1"?>
<certificate>
<issuer>bob</issuer>
<subject>alice</subject>
</certificate>
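Because the syntax is generic, an off-the-shelf parser can read such a message without any protocol-specific parsing code. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name parse_certificate and the choice to return a flat dictionary are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A well-formed certificate message, given as bytes so that the
# encoding declaration in the first line is honored by the parser.
MESSAGE = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>\n'
    b"<certificate>\n"
    b"<issuer>bob</issuer>\n"
    b"<subject>alice</subject>\n"
    b"</certificate>\n"
)


def parse_certificate(data: bytes) -> dict:
    """Hypothetical helper: map each child element's tag to its text."""
    root = ET.fromstring(data)  # raises ET.ParseError if not well-formed
    if root.tag != "certificate":
        raise ValueError("unexpected root element")
    return {child.tag: child.text for child in root}


fields = parse_certificate(MESSAGE)
# fields == {'issuer': 'bob', 'subject': 'alice'}
```

A message whose closing tags do not match its opening tags is rejected by the generic parser with ET.ParseError, before any protocol-level code runs.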
XML's suitability as a structured syntax for protocols depends on
the requirements of the protocol. XML tends to be verbose compared to
custom syntax. In most cases this is irrelevant, but in some instances,
this can limit its usefulness. For example, SIP messages need
to fit in a UDP datagram and are therefore limited to a rather small
number of bytes. If SIP messages were encoded in XML, it's possible
that they would simply be too big.
Another, more important point is that XML documents are
formed of Unicode text (that is, a series of abstract characters such
as "Uppercase letter E" or "the letter 'Dalet' in the Hebrew alphabet"), not of a series of bytes. The text is then encoded to
bytes using an appropriate Unicode encoding. For protocols that
involve communication between humans, the use of Unicode is an
important feature. For example, the Jabber protocol is an instant
messaging protocol implemented using XML. The Unicode support allows
it to send structured messaging in virtually every human language. In
many other cases, the fact that messages are Unicode text is
unimportant or irrelevant.
Some protocols do have a problem with the fact that XML is composed of
Unicode text. The problem is that XML has no reasonable way of
representing bytes. Since, for example, a JPEG image is a sequence of
bytes, a protocol that requires sending such images will not be well-suited to a pure XML solution. There are a number of solutions to this
problem, which include using a separate connection for transferring
byte-oriented information (this is how Jabber sends
files) and various schemes for combining XML documents with other
types of structured sequences of bytes. Another small issue with XML
involves cryptographic signatures. Signatures require a canonical format
for data, but XML's flexible definition lets documents
represent the same information in a number of different ways. There
are standards for XML canonicalization, but not all XML-processing
tools support them.
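The canonicalization problem can be seen in a small sketch. It assumes Python 3.8 or later, whose standard library provides xml.etree.ElementTree.canonicalize (an implementation of the C14N 2.0 standard); the two sample documents are invented for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two documents carrying the same information: attribute order,
# quoting style, whitespace inside the tag, and the empty-element
# form all differ.
doc_a = '<certificate issuer="bob" subject="alice"/>'
doc_b = "<certificate subject='alice'   issuer='bob'></certificate>"

# Hashing (and therefore signing) the raw text treats them as different.
raw_match = (
    hashlib.sha256(doc_a.encode("utf-8")).digest()
    == hashlib.sha256(doc_b.encode("utf-8")).digest()
)

# After canonicalization the two serializations are identical, so a
# signature computed over the canonical form covers both.
canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)
canon_match = canon_a == canon_b
```

Here raw_match is False and canon_match is True. Both the signer and the verifier must apply the same canonicalization, which is why uneven tool support matters.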
A simpler alternative to XML is the s-expression. An s-expression is
essentially a structure composed of lists of byte sequences or other
s-expressions, which is to say, nested lists. While a number of syntax
representations are possible, a nice example is Ron Rivest's s-expression syntax,
which is used by the SPKI (Simple Public Key
Infrastructure) standard. Compared to XML, s-expressions are simpler, less
verbose, and support storing sequences of bytes quite nicely, while
the ability to have a structure of nested lists allows for the creation of complex data structures. S-expressions are thus a suitable replacement
for cases where XML may be inappropriate. The following example
consists of a list where the first item is an 11-character
byte-sequence, followed by two nested lists:
(11:certificate(6:issuer3:bob)(7:subject5:alice))
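The length-prefixed form shown above is simple enough that a complete encoder and decoder fit in a few lines. The following is a minimal sketch in Python, assuming the structure is represented as nested Python lists of bytes objects; the function names encode and decode are hypothetical, and only the length-prefixed form used in the example is handled.

```python
def encode(item) -> bytes:
    """Encode bytes as <length>:<bytes>, and a list as (...)."""
    if isinstance(item, bytes):
        return str(len(item)).encode("ascii") + b":" + item
    return b"(" + b"".join(encode(child) for child in item) + b")"


def decode(data: bytes):
    """Decode one complete s-expression; reject trailing bytes."""
    item, pos = _decode_at(data, 0)
    if pos != len(data):
        raise ValueError("trailing bytes after expression")
    return item


def _decode_at(data: bytes, pos: int):
    # Returns (decoded item, position just past it).
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    if data[pos:pos + 1] == b"(":
        pos += 1
        items = []
        while data[pos:pos + 1] != b")":
            if pos >= len(data):
                raise ValueError("unterminated list")
            child, pos = _decode_at(data, pos)
            items.append(child)
        return items, pos + 1
    # Otherwise expect a decimal length, a colon, then that many bytes.
    colon = data.find(b":", pos)
    if colon == -1 or not data[pos:colon].isdigit():
        raise ValueError("expected a decimal length prefix")
    start = colon + 1
    end = start + int(data[pos:colon])
    if end > len(data):
        raise ValueError("byte sequence runs past end of input")
    return data[start:end], end


certificate = [b"certificate", [b"issuer", b"bob"], [b"subject", b"alice"]]
wire = encode(certificate)
# wire == b"(11:certificate(6:issuer3:bob)(7:subject5:alice))"
```

Because every byte sequence carries its own length, arbitrary binary data (including parentheses and colons) can be embedded without escaping, and the same two functions serve every message in the protocol.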
There are many other standards for encoding various types of
structured data, from low-level data types used in RPC systems to
high-level data structures encoded on top of XML. Whether you choose
one of these standards or design your own format, you will do well to use a protocol that defines its messages using structured data on
top of a simple, consistent, and unified syntax. The resulting protocol
will tend to be simpler and easier to implement, easier to extend, and
less likely to suffer from interoperability and security problems.