From P2P to Web Services: Addressing and Coordination
by Andy Oram
April 07, 2004
Organizations are facing new technological challenges, often
finding them perplexing or even insolvable, as they modernize their
use of the Internet and intranets. But the common element which these
problems share is that their solutions go beyond technology. These
problems require a social infrastructure, a framework that determines
whether or not technological change is successful. This article
summarizes what researchers and standards committees are doing in
tentative attempts to create that infrastructure.
After making great strides in the 1990s to install LANs, web servers,
virtual private networks, and other facets of the TCP/IP revolution,
system administrators notice that:
- mobile users are logging in from all over the continent;
- employees are attaching wireless devices to your networks at
heaven knows what access point from minute to minute;
- people are exchanging sensitive data over instant messaging, an
outrageously insecure protocol by default, and one that additionally
replicates many of the problems of email such as viruses;
- employees are collaborating with people outside the company, using
email or more sophisticated collaboration tools, freely sharing
company data in ways you can't control;
- people are running servers on their PCs for the first time and,
thus, exposing services to the network--theoretically opening their
systems to compromise--through technologies such as Rendezvous.
These problems have to be solved at a fundamental level, involving
changes in business requirements, training, and organizational
communication patterns. In short, they require a social
infrastructure, which makes them harder to solve. Computer and
software vendors sell technical solutions for these problems; the
products are probably good ones. But they are not enough.
Perhaps we can gain some insight by looking at the ways in which
peer-to-peer technology was received and accommodated a few years ago.
No one ever offered a great definition of "peer-to-peer". Sometimes
the term is used to cover only file-sharing systems (Napster, Gnutella, and Kazaa), despite the fact that
peer-to-peer researchers and implementers were looking beyond
file-sharing. Other people defined peer-to-peer so broadly that it
included email. When used appropriately, "peer-to-peer" covers grid
computing, the new generation of collaborative tools (for example, Groove), and new types of distributed
databases and distributed filesystems (for example, OceanStore).
For our purposes here, it's fine to think of peer-to-peer as any
networking technology where crucial responsibility lies at the
end-points. This definition includes all the issues I mentioned
earlier. It may also characterize some aspects of Microsoft's Office
2003 suite.
In fact, definitional inadequacies aside, peer-to-peer isn't really
a set of technologies as much as it is a set of problems. And now the
problems of peer-to-peer are the problems we all face. Peer-to-peer
exposed the weaknesses that exist in the current implementation of the
Internet; it was an avant-garde. And while few peer-to-peer
technologies have been adopted thus far, I expect that in a decade or
so they will be adopted because the problems in social infrastructure
now must be solved.
The challenges of and lessons from peer-to-peer fall under one of
three categories: addressing, coordination, and trust. I discuss the
first two of these in the present article, taking up the third in a
future article.
Addressing
Most of us don't run applications that require personal, persistent
addresses. Suppose you have a great sale to offer customers and want
to promote it through a web service. SOAP offers a way to expose the
information to your customers, who can query you for promotions
through a SOAP call:
<s:Envelope xmlns:s="http://www.w3.org/2001/06/soap-envelope">
  <s:Body>
    <p:QueryPromotions xmlns:p="urn:Promotions">
      <category>travel</category>
      <expiration>2003-10-31</expiration>
    </p:QueryPromotions>
  </s:Body>
</s:Envelope>
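A client could assemble the same request programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name build_query is hypothetical, and the namespaces and element names are simply copied from the envelope above. It only builds the message and does not send it anywhere.

```python
import xml.etree.ElementTree as ET

# Both namespaces are copied from the sample envelope in the article.
SOAP_NS = "http://www.w3.org/2001/06/soap-envelope"
PROMO_NS = "urn:Promotions"


def build_query(category: str, expiration: str) -> bytes:
    """Return a QueryPromotions request serialized as UTF-8 XML.

    build_query is a hypothetical helper, not part of any SOAP toolkit.
    """
    # Ask ElementTree to emit the same s: and p: prefixes as the sample.
    ET.register_namespace("s", SOAP_NS)
    ET.register_namespace("p", PROMO_NS)

    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    query = ET.SubElement(body, f"{{{PROMO_NS}}}QueryPromotions")
    ET.SubElement(query, "category").text = category
    ET.SubElement(query, "expiration").text = expiration
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    print(build_query("travel", "2003-10-31").decode("utf-8"))
```

Note that nothing in the message says where a reply could be sent later; the answer can only travel back over the connection the client opened.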
But why wait for users to think about querying you? Perhaps this
promotion lasts only one week and you want to reach out to loyal
customers in time. You want push technology. Companies do this now
through email. And you're stuck with email; you can't do push through
web services. The problem is that there is no persistent address where
a user can be reached by way of a web service. Web services are
asymmetric: users can query a server, but the server can't query
users. It would also be good if a user could make a web service
request and then disconnect, letting you send the results to the user
at a later time. You're going to have to use email for that too. Web
services are synchronous: the sender has to wait for the reply.
The Real Problem
The two situations I've just described are related. I could
restate these points by saying that our current social infrastructure
provides only one persistent address for a user, an email address, and
that it cannot currently be used easily for other protocols. And even
email lacks adequate persistence; just look at the thousands of
MediaOne subscribers who had to change to AT&T accounts and then
change again to Comcast.
But the venerable email address, for lack of anything better, is
being used as a unique identifier. In other words, web clients lack a
robust return address. In theory, I could send you an IP address,
properly encrypted and signed to prevent spoofing. And a protocol
could be developed for you to send the results of your long-running
operation to the IP address when you're done. But there is no
guarantee I'll be at that IP address; I may have logged off long ago
and my neighbor, who may be my sworn enemy and work for a competing
company, may have logged in and received my old address.
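The hazard can be shown with a toy model. This is only a sketch under stated assumptions: AddressPool is a hypothetical stand-in for any DHCP-style pool that recycles addresses, and the address 10.0.0.5 and the user names are invented for illustration.

```python
class AddressPool:
    """Toy stand-in for a DHCP-style pool that recycles addresses."""

    def __init__(self, addresses):
        self.free = list(addresses)
        self.holder = {}  # address -> user currently holding it

    def login(self, user):
        # Hand out the first free address for this session.
        address = self.free.pop(0)
        self.holder[address] = user
        return address

    def logout(self, user):
        # Release every address the user held so it can be reissued.
        for address, current in list(self.holder.items()):
            if current == user:
                del self.holder[address]
                self.free.insert(0, address)

    def deliver(self, address):
        """Return whoever actually receives data sent to this address."""
        return self.holder.get(address)


pool = AddressPool(["10.0.0.5"])
return_address = pool.login("me")         # I submit a long-running request
pool.logout("me")                         # and disconnect before it finishes
pool.login("neighbor")                    # the address goes to someone else
recipient = pool.deliver(return_address)  # the server replies to the old address
print(recipient)
```

The reply reaches whoever holds the address at delivery time, which is why an IP address, however well signed, is not a persistent identity.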
When the addressing problem, which is related to resource
discovery, is raised, some people say, "Implement IPv6, thus providing
enough addresses for every device to be manufactured over the next
several hundred years. Give every device an address; and, while you're
at it, eliminate Network Address Translation and DHCP, and the problem
is solved." No, it's not. People are not tied to individual
devices. We go to work, we go home, we log in through PDAs and
telephones. I am not my computer device.
Furthermore, I need to change addresses as I move. If all of us
could reach the Internet through the same IP address whether we were in
Boston, Montreal, or Helsinki, Internet routing would bog down and
become unmanageable. IPv6 does not provide for the use of addresses
in different geographic locations; there is only an extension called
Mobile IP [http://www.ietf.org/html.charters/mobileip-charter.html],
an extra layer designed for cellular phone networks. Implementing IPv6
and eliminating NAT have benefits, but they don't remove the
addressing problem.
P2P Faces the Problem
The peer-to-peer movement had to face the problem of addressing head
on because people at individual PCs had to be reached in a wide
variety of environments. One of the cleverest ways to solve the addressing
problem is to design applications so that user addresses don't
matter. This is the solution chosen by Gnutella and many related
file-sharing systems: you just broadcast that you want a certain
file. The request passes from your system to a few systems you know
and then to a few systems each of them knows; eventually some system
comes back saying, "Here is the file." Another way of saying this is
that the addressing problem is moved from the user to the desired
resource. Individual users are free from the addressing problem.
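The broadcast idea can be sketched in a few lines. This is an illustration of flooding in general, not the Gnutella protocol itself: message identifiers, reply routing, and real networking are left out, and the peer names, file name, and the flood_query function are all hypothetical.

```python
from collections import deque


def flood_query(neighbors, holdings, start, wanted, ttl):
    """Flood a request outward from `start`, at most `ttl` hops deep.

    neighbors maps each peer to the peers it knows; holdings maps a peer
    to the set of files it offers. Returns the peers that hold `wanted`.
    """
    seen = {start}
    queue = deque([(start, 0)])
    hits = []
    while queue:
        peer, hops = queue.popleft()
        if wanted in holdings.get(peer, ()):
            hits.append(peer)
        if hops == ttl:
            continue  # the request has used up its hop budget
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:  # each peer handles a request only once
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return hits


if __name__ == "__main__":
    neighbors = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
    holdings = {"d": {"song.ogg"}, "e": {"song.ogg"}}
    print(flood_query(neighbors, holdings, "a", "song.ogg", ttl=2))
```

The requester names only the resource. It never needs to know in advance who holds the file or where that peer can be reached.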
It's interesting how many applications can function with
anonymity. As we have seen, the Web requires the client to identify
the server, but the server does not have to identify the client
(except to obtain a temporary IP address); the server is happy to
display its home page to anyone. Once someone wants to view sensitive
data or buy something, the server will put up a password dialog box or
require a credit card; that's a more advanced situation.
On the other hand, anonymity is currently being allowed in many
places where it's creating trouble, largely because of the rise of
wireless networks and the risk of drive-by intruders. For instance,
corporate file servers routinely put up public sites that anyone on
the network can read; it's assumed that everybody behind the firewall
can be trusted. This was always a bad assumption, but if the network
adds a wireless hub, the administrator has to worry constantly about
who's sneaking up to it and snooping around. In a similar fashion,
many corporate mail servers accept mail from anyone on the LAN. That
problem has been highly publicized because intruders have been using
such hubs to send unsolicited bulk email. So more and more, we are
discovering the need to assign persistent identities to users. In the
case of wireless, organizations are doing so by making them log in
before using the network.
Sender Policy Framework, which has
been in the news a lot as email software designers and ISPs call
for its adoption, works on a slightly different level. It doesn't
identify end users. Instead, it provides checks to ensure that
mail messages correctly identify the hubs and relays through which
they pass. This is more of a routing issue than an addressing
issue; the basic form of addressing (DNS) is no different when SPF
is used.
Solutions to the addressing problem fall into a few categories. In
the case of a wireless LAN, the first solution is simply to make users
authenticate themselves with a central repository that contains their
identities and to record their addresses for a single session. This is
what most instant messaging systems do. It was also the
quick-and-dirty solution chosen by Napster, which is why it could
easily be dismantled for vicarious and contributory copyright
infringement while modern Gnutella-based file-sharing systems cannot.
This dependence on central servers scales well. AOL Instant Messenger
shows that such a system can serve millions of users. Still, the system
suffers from a flat namespace (once someone chooses the name John, no
one else can use it), and it puts control in the hands of the people
who run the servers.
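A minimal sketch of this first approach follows. The Registry class and its methods are hypothetical and do not describe how any real instant messaging service works; passwords are kept in plain text only to keep the example short.

```python
class Registry:
    """Toy central repository: one flat namespace, per-session addresses."""

    def __init__(self):
        self.accounts = {}  # name -> password (plain text: sketch only)
        self.sessions = {}  # name -> address for the current session

    def register(self, name, password):
        # Flat namespace: the first person to claim a name keeps it.
        if name in self.accounts:
            raise ValueError(f"name already taken: {name}")
        self.accounts[name] = password

    def login(self, name, password, address):
        if name not in self.accounts or self.accounts[name] != password:
            raise PermissionError("authentication failed")
        self.sessions[name] = address  # recorded for this session only

    def logout(self, name):
        self.sessions.pop(name, None)

    def locate(self, name):
        """Return the user's current address, or None if offline."""
        return self.sessions.get(name)


if __name__ == "__main__":
    registry = Registry()
    registry.register("John", "secret")
    registry.login("John", "secret", "192.0.2.17")
    print(registry.locate("John"))
```

Every lookup goes through the one object that holds the tables, which is the sketch's version of putting control in the hands of whoever runs the servers.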
The second solution is used in Apple's Rendezvous (an
implementation of the Zero Configuration Networking or Zeroconf standard) and in many
other network systems meant for LANs, including some Microsoft
domains. Each would-be peer announces the address or name it wants,
and if it hasn't been claimed already by another peer, it is assigned
to the newcomer and recorded by each of the other peers. This solution
requires all participants to be on the same LAN, for several reasons:
it depends on broadcasts, it doesn't scale up to huge numbers, and
it's open to many attacks if a peer isn't well-behaved.
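The claim-and-record step can be sketched as follows. This is a toy model of the idea described above, not the Zeroconf protocol: the Lan and Peer classes are hypothetical, the broadcast is simulated by a loop over an in-memory list, and timing, retries, and conflict renaming are ignored.

```python
class Lan:
    """Toy broadcast domain: a probe reaches every attached peer."""

    def __init__(self):
        self.peers = []

    def probe(self, name, sender):
        # True if some other peer on the LAN already owns the name.
        return any(p is not sender and p.own_name == name for p in self.peers)


class Peer:
    def __init__(self, lan):
        self.lan = lan
        self.own_name = None
        self.known = {}  # name -> peer, recorded from announcements
        lan.peers.append(self)

    def claim(self, wanted):
        """Try to take `wanted`; return True if the claim succeeded."""
        if self.lan.probe(wanted, self):
            return False  # someone objected, so the name stays with them
        self.own_name = wanted
        for peer in self.lan.peers:  # every peer records the new owner
            peer.known[wanted] = self
        return True


if __name__ == "__main__":
    lan = Lan()
    first, second = Peer(lan), Peer(lan)
    print(first.claim("printer"), second.claim("printer"))
```

The sketch trusts every peer to answer probes honestly and sends every claim to every peer, which mirrors the reasons given above for keeping such schemes on a single LAN.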
The most robust and scalable solution in current use, the Domain
Name System, was created twenty years ago. DNS was extended long ago
with special records (MX records) to support email, which I mentioned
as the one form of persistent address in our social
infrastructure. DNS makes it easier to maintain a network of mail
servers. It would be interesting to see whether support in DNS for a
more generalized addressing solution could allow other services to
support persistent addresses or lead to a more general form of
addressing that could be used by many applications and protocols.
Such support was actually added about five years ago, in the form
of SRV records as specified in RFC 2782. These
records can specify any well-known server and provide the information
needed to reach it in a flexible manner. SRV records have not been
adopted across the Internet as a whole and are not being pushed by the
IETF, but several major systems depend on them: Apple's Rendezvous,
Kerberos, and
Microsoft's Active Directory.
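To make the format concrete, here is a sketch that reads SRV records written in the textual form given in RFC 2782 (owner name, TTL, class, SRV, priority, weight, port, target) and orders them for a client. The records use the reserved example.com domain and are invented for illustration; parse_srv and preferred are hypothetical helpers. One simplification is labeled in the code: RFC 2782 asks clients to choose randomly among targets of equal priority in proportion to weight, whereas this sketch just sorts heavier weights first.

```python
from typing import List, NamedTuple


class Srv(NamedTuple):
    service: str
    proto: str
    domain: str
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line: str) -> Srv:
    """Parse one record of the form
    _service._proto.name TTL class SRV priority weight port target
    """
    owner, _ttl, _cls, rtype, priority, weight, port, target = line.split()
    if rtype != "SRV":
        raise ValueError(f"not an SRV record: {line!r}")
    service, proto, domain = owner.split(".", 2)
    return Srv(service.lstrip("_"), proto.lstrip("_"), domain.rstrip("."),
               int(priority), int(weight), int(port), target.rstrip("."))


def preferred(records: List[Srv]) -> List[Srv]:
    # Lowest priority value is tried first. Sorting by weight is a
    # simplification of the weighted random choice in RFC 2782.
    return sorted(records, key=lambda r: (r.priority, -r.weight))


if __name__ == "__main__":
    zone = [
        "_xmpp-client._tcp.example.com. 86400 IN SRV 20 0 5222 backup.example.com.",
        "_xmpp-client._tcp.example.com. 86400 IN SRV 10 60 5222 im1.example.com.",
    ]
    for record in preferred([parse_srv(line) for line in zone]):
        print(record.target, record.port)
```

A client that knows only the domain and the service it wants can thus discover which host and port to contact, without any name such as "www" or "mail" being wired into the application.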
The Jabber instant messaging
service, an XML message passing system that is not highly popular yet
but whose protocol was officially standardized by the IETF as the
Extensible Messaging and Presence Protocol (XMPP),
partly solves the addressing problem by depending on DNS and
suggesting that each user run his or her own server. Doing so is not
required, but when practiced, it automatically gives each user an
address.
Domain names are perhaps the best solution in the ideal sense but the
worst in their practical implementation. They remain relatively
heavy-weight and present many barriers to the average user, partly
thanks to the original implementation of the system and partly thanks
to persistent intervention by large corporations and their legal
representatives. Particularly in the global top-level domains like COM
and ORG, this intervention has been effective in keeping most
individuals from taking advantage of the persistent names offered by
DNS.
The supply of domain names is artificially limited, so much so that
a whole business has grown up around notifying someone when his
desired name becomes available. VeriSign fought with
registrars over who gets to dominate this activity, which adds no
value to society. Compared to the cost of actually administering DNS
servers, prices of domain names in the popular top-level domains
amount to information highway robbery. Even if you get past these
barriers and obtain your own domain name, you cannot consider it safe
unless you also invest thousands of dollars to obtain a national
trademark. Furthermore, registration requires you to make your
contact information public, an anti-privacy measure that renders the
system inappropriate for individuals.
When you turn to country domains, the situation is much more
user-friendly. But registering is still too troublesome and expensive
for most people; compare its difficulty to the convenience of getting
a login account on one of the major instant messaging services.
Researchers have been searching for years for a distributed system
of addressing and resource discovery. The more heavy-weight
peer-to-peer systems such as Chord and
Tapestry,
both in the experimental stage, define their own addressing and routing
systems.
Each node that joins one of these systems is assigned a unique,
random identifier. Certain nodes know how to reach others with similar
numbers. When trying to reach another node, you start by choosing one
you know whose first few digits match the identifier of the node you
want. The system is a lot like standard Internet routing at the IP
layer.
Thus, if you want to reach 12345 and you have two choices, 12862
and 12347, you choose 12347 because more of the initial digits
match (four, as opposed to two). 12347 requires fewer hops to get to 12345. This kind of system
is intriguing, but we don't know yet how practical it is.
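The choice in that example can be written out directly. This sketch follows the simplified description given here, not the exact algorithm of Chord or Tapestry; identifiers are treated as strings of decimal digits, and the function names are hypothetical.

```python
def shared_prefix(a: str, b: str) -> int:
    """Count how many leading characters two identifiers have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def next_hop(target: str, known: list) -> str:
    """Pick the known node whose identifier shares the longest prefix
    with the target; that node is assumed to be closer to it."""
    return max(known, key=lambda node: shared_prefix(node, target))


if __name__ == "__main__":
    print(next_hop("12345", ["12862", "12347"]))
```

Each hop is expected to lengthen the matching prefix, so the number of hops grows with the length of the identifier rather than with the number of nodes in the system.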
Much of the p2p network research was subsidized by the music
industry, for which we should offer our sincere thanks. Without that
subsidy, how could researchers collect statistics over months and
years based on participation by literally tens of millions of nodes?
It was sheer genius to offer popular recordings to sign up users, and
the world will benefit from the testbeds that these systems
provide. The music industry botched the PR, of course. So did
universities, who also subsidized the research by providing large
amounts of bandwidth, but tried to strangle the traffic once they
noticed it.