Protocol Design: How Many Bytes?
by Itamar Shtull-Trauring
November 25, 2003
The Internet is built on protocols. Protocols take the raw,
unstructured capabilities of the network and, using rules and
restrictions, determine what programs can communicate and how. Choosing
the right rules is important: they determine to a large degree the
security, ease of implementation and performance of the protocol. This
is the first in a series of articles discussing basic concepts of
protocol design. The issue we will start with is how a protocol
knows how much data it is going to receive. Protocols are after all
mostly about sending and receiving data.
Before we begin, it's worth noting some basic assumptions. Unless
noted otherwise, the protocols being discussed all run over a
connection-oriented transport, typically TCP. There is an initiating side that starts the
connection and a receiving side that accepts it. In many
cases these will match the concepts of "client" and "server", and will
have different behavior depending on which they are. The connection is
assumed to transport a stream of bytes in an ordered, reliable
fashion.
Many protocols involve sending chunks of "payload" bytes -- data
which is not part of the protocol itself. An email is a structured
sequence of bytes, so when an email is sent or received, the receiving
side of the protocol needs to know when the email data ends and the
protocol begins again. An email that contains a transcript of a POP3
session should not be able to confuse a POP3 client that is
downloading it. In addition, commands and messages of the protocol
itself are also structured, and the receiving side needs to know when
they end and the next message begins.
The first approach that can be used is an end-of-data
indicator: some special way of marking when the transfer of
the data is over. For example, when sending a payload, the sending
side will send a message meaning the data will now be sent,
then the actual payload, and finally a message saying there is no
more data. One of the Internet's oldest protocols, SMTP, uses this
technique to allow clients to send emails to the server. SMTP is
documented in RFC 2821, an updated version of RFC 821, which was
written in 1982. In the SMTP protocol, a client connects to a server and
sends a series of commands indicating from whom and to whom the email
is being sent, followed by the body of the email; the server then deals with
delivery of the message.
SMTP follows (or perhaps, given its age, leads) the convention of
"line-based" protocols. An SMTP session is composed of a series of lines: a
"line" is a sequence of bytes terminated with CRLF, the bytes with the hex values 0x0D and
0x0A. A line can be a command, a response to a command, or part of a message.
Each of these lines recreates in its own small way the end-of-data indicator
method for finding the end of a message, in this case CRLF. The basic units
of the protocol, the lines, can be any length; the receiving side only
knows a line is over when its CRLF arrives. As a result, SMTP servers set an arbitrary
limit on the length of lines they accept; otherwise a single connection
sending an endless stream of non-CRLF bytes would use up the server's
memory. Here is an example of a simple SMTP session between a client and
server, taken from the RFC (note that each printed line would be sent with a
CRLF after it):
S: 220 foo.com Simple Mail Transfer Service Ready
C: EHLO bar.com
S: 250-foo.com greets bar.com
S: 250-8BITMIME
S: 250-SIZE
S: 250-DSN
S: 250 HELP
C: MAIL FROM:<Smith@bar.com>
S: 250 OK
C: RCPT TO:<Jones@foo.com>
S: 250 OK
C: DATA
S: 354 Start mail input; end with <CRLF>.<CRLF>
C: Blah blah blah...
C: ...etc. etc. etc.
C: .
S: 250 OK
C: QUIT
S: 221 foo.com Service closing transmission channel
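Before examining the session more closely, the CRLF line framing and the line-length limit described above can be sketched in code. This is a minimal illustration, not how any particular SMTP server is written: the `LineReceiver` class, the `LineTooLong` exception and the 1000-byte default are assumptions made for the sketch.

```python
MAX_LINE = 1000  # assumed limit for this sketch, counting the CRLF


class LineTooLong(Exception):
    """Raised when the peer sends more bytes than allowed without a CRLF."""


class LineReceiver:
    """Splits an incoming byte stream into CRLF-terminated lines."""

    def __init__(self, max_line=MAX_LINE):
        self.max_line = max_line
        self.buffer = b""

    def feed(self, data):
        """Add received bytes; return the complete lines, without their CRLF."""
        self.buffer += data
        lines = []
        while True:
            end = self.buffer.find(b"\r\n")
            if end == -1:
                break
            if end + 2 > self.max_line:
                raise LineTooLong()
            lines.append(self.buffer[:end])
            self.buffer = self.buffer[end + 2:]
        # No terminator yet: refuse to buffer an unbounded partial line.
        if len(self.buffer) > self.max_line:
            raise LineTooLong()
        return lines
```

Because TCP delivers a stream rather than messages, a line may arrive split across several reads. The receiver therefore keeps the unfinished tail in its buffer until the CRLF shows up or the limit is exceeded.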
Looking at the SMTP session carefully, we can see two more instances of the
end-of-data indicator. There are multiple responses to the EHLO command,
with response code 250, and the last response starts with "250 ",
rather than "250-", to indicate that no more
responses are forthcoming. A more interesting use is the "DATA"
command, which is used by the client to send the body of the email. The
email is sent as a series of lines, and a line with a single "."
(a period) indicates the end of the email body.
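The first of these indicators, the "250-" versus "250 " distinction, could be handled by a receiver roughly as follows. The function name `read_reply` and the use of already-split text lines are assumptions made for this sketch.

```python
def read_reply(lines):
    """Consume lines from an iterator until one reply is complete.

    Returns (code, texts). A "-" after the three-digit code means more
    lines follow; anything else marks the last line of the reply.
    """
    texts = []
    for line in lines:
        code, separator, text = line[:3], line[3:4], line[4:]
        texts.append(text)
        if separator != "-":
            return int(code), texts
    raise ValueError("connection ended in the middle of a reply")
```

Called repeatedly on the same iterator, it returns one reply at a time, so the five-line EHLO response in the session above comes back as a single result.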
On the face of it this is a reasonable approach, but there are some
serious issues which have led modern protocols to choose other solutions.
Consider what would happen if the email contained a line consisting solely of a
"." character -- the server would get confused and think the email
had ended, even though the period was actually part of the email, not an SMTP
command. To prevent this, the SMTP protocol specifies that when
sending the contents of a "DATA" command, any line beginning with
a period must have an extra period inserted at the beginning. The receiver
checks each incoming line: if it is a period followed by other
characters, the leading period is removed; if it is a lone period, this is the end of the data.
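The quoting rule can be sketched as a pair of functions, one for each side of the connection. The names `stuff` and `unstuff` are made up for this illustration, and the lines are assumed to have already been split at their CRLFs.

```python
def stuff(body_lines):
    """Sender side: quote the message lines and append the terminator."""
    out = []
    for line in body_lines:
        if line.startswith(b"."):
            line = b"." + line  # extra period protects the original one
        out.append(line)
    out.append(b".")  # the end-of-data indicator
    return out


def unstuff(lines):
    """Receiver side: return the message lines, stopping at the terminator."""
    body = []
    for line in lines:
        if line == b".":
            return body
        if line.startswith(b"."):
            line = line[1:]  # drop the period the sender inserted
        body.append(line)
    raise ValueError("data ended without the terminating '.' line")
```

Note that both sides have to inspect every line of the payload, which is the inefficiency discussed next.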
While this does work, it is inelegant and inefficient. A cleaner
solution would be to use length prefixing. Instead of
sending just "DATA", the client implementation of an imaginary
improved SMTP protocol would also send the length of the message, for
example "DATA 1235" for a message that is 1235 bytes
long. The server would then read exactly 1235 bytes, and then revert
back to line-based mode. No quoting would be necessary for the client,
no unquoting for the server. In practice, SMTP has an extension for
declaring the size of the message, but it is mostly used to allow the
server to reject overly large messages, and the server must still use the
period indicator method to detect the end of the message.
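A receiver for the imaginary "DATA 1235" command might look something like the sketch below. Everything here is hypothetical, since no such SMTP command exists: the function names, the size cap, and the use of a file-like object standing in for the connection are all assumptions.

```python
import io


def read_exactly(stream, n):
    """Read exactly n bytes from a file-like object, or fail."""
    parts = []
    remaining = n
    while remaining > 0:
        data = stream.read(remaining)
        if not data:
            raise EOFError("connection closed before all bytes arrived")
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)


def read_data_command(stream, max_size=10 * 1024 * 1024):
    """Handle the imaginary 'DATA <length>' command and return the payload."""
    line = stream.readline()
    if not line.endswith(b"\r\n"):
        raise ValueError("incomplete command line")
    command, _, arg = line[:-2].partition(b" ")
    if command != b"DATA" or not arg.isdigit():
        raise ValueError("expected 'DATA <length>'")
    length = int(arg)
    if length > max_size:  # the length is known up front, so reject early
        raise ValueError("message too large")
    return read_exactly(stream, length)
```

The payload is never inspected, so it may contain lone periods or anything else, and the receiver knows the size before accepting a single byte of it.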
HTTP, the protocol used for what is commonly referred to as "the
Web", uses length prefixing to indicate the length of a document it is
returning in response to a client request (the headers are still
sent using CRLF terminated lines). Here is a sample HTTP server
response. Notice that the body is separated from the headers by an
extra CRLF, and that the body can be any 12 bytes; there is no need
for quoting nor any restrictions on their values.
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 12

0123456789ab
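As a rough sketch of how a client could use the header, the function below splits a buffered response at the blank line and then takes exactly "Content-Length" bytes as the body. The name `parse_response` is an assumption, and the sketch ignores many details of real HTTP (repeated headers, responses without a length, and so on).

```python
def parse_response(raw):
    """Split a buffered HTTP response into status line, headers and body.

    Returns (status_line, headers, body, leftover), where leftover is
    whatever followed the body in the buffer.
    """
    head, separator, rest = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("headers are incomplete")
    status_line, *header_lines = head.split(b"\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers[b"content-length"])
    if len(rest) < length:
        raise ValueError("body is incomplete")
    return status_line, headers, rest[:length], rest[length:]
```

The headers are found by scanning for a terminator, while the body is found by counting, so the two framing techniques coexist in a single message.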
While quite a nice idea, length prefixing has a problem of
its own: it assumes the length of the data is known in advance. This is
certainly a valid assumption when sending the contents of a file, but when
generating dynamic content the length of the data is not known until all the
data is available. In theory it is possible to wait until all the data has
been generated, and then send it along with its length. In practice this is
inefficient, as it slows down the data transfer and requires extra temporary
storage, either in memory or on disk. One solution, used in HTTP 1.0, is to
allow omitting the "Content-Length" header, and indicating the end of the data
by closing the connection. This solution is also problematic: it makes
it hard to distinguish a failure in the transport (such as a broken TCP connection)
from the end of the data, and it is also inefficient since multiple HTTP requests
to the same server require opening multiple TCP connections.
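The ambiguity is easy to see in a sketch of a receiver that treats the end of the connection as the end of the data. The function name is made up, and a file-like object stands in for the socket.

```python
import io


def read_until_close(stream, bufsize=4096):
    """Collect bytes until the peer closes the connection."""
    parts = []
    while True:
        data = stream.read(bufsize)
        if not data:  # an empty read means the stream has ended
            return b"".join(parts)
        parts.append(data)


complete = read_until_close(io.BytesIO(b"0123456789ab"))
truncated = read_until_close(io.BytesIO(b"01234"))  # transfer cut short
```

Both calls return normally. Nothing in the second result tells the receiver that seven bytes never arrived.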
The updated HTTP 1.1
presented a solution that did not have these problems, a combination
of length prefixing and an end of data indicator. When data is
generated on the fly, it is assumed to be generated as a series of
"chunks", each chunk being at least 1 byte long. An HTTP response can
indicate that it is returning a chunked response, in which case it
returns the data as a series of length-prefixed chunks. The end of the
data is indicated by sending a chunk whose length is 0. A chunk's
length is encoded in hexadecimal numerals and followed by CRLF, after which the
chunk itself is sent, also followed by CRLF. Here is an example HTTP response using chunked encoding
(new lines indicate a CRLF). The "a" means the next chunk
is 10 bytes long, the "3" means the next chunk is 3 bytes
long, and the "0" indicates the end of the response.
HTTP/1.1 200 OK
Content-type: text/plain
Transfer-encoding: chunked

a
0123456789
3
abc
0
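The body of such a response could be produced and consumed along the following lines. This is a simplified sketch: the function names are assumptions, the decoder works on a fully buffered body, and trailer headers after the last chunk are ignored.

```python
def encode_chunked(chunks):
    """Encode an iterable of byte strings as a chunked body."""
    out = []
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the body early
            out.append(b"%x\r\n" % len(chunk) + chunk + b"\r\n")
    out.append(b"0\r\n\r\n")  # zero-length chunk: the end-of-data indicator
    return b"".join(out)


def decode_chunked(data):
    """Decode a complete chunked body back into the original bytes."""
    body = []
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)  # raises ValueError if missing
        size = int(data[pos:eol].decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            return b"".join(body)
        chunk = data[pos:pos + size]
        if len(chunk) != size or data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body.append(chunk)
        pos += size + 2
```

The sender only needs to know the length of the piece it is about to send, never the length of the whole response. Unlike closing the connection, a missing final "0" chunk lets the receiver detect a truncated transfer.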
The choice between end-of-data indicators and length prefixing is just one of the
issues protocol designers must deal with, but it is one that influences
many other aspects. In future articles we will discuss syntax and
structure, state and statelessness, handling multiple requests
and more.