
The Library of Congress Comes Home
by Kendall Grant Clark
March 17, 2004
In the inaugural article ("Geeks and the
Dijalog Lifestyle") of my new XML.com column, Hacking the Library, I offered
a short tour of the territory I intend to explore with you, dear reader. It's a
territory I call "dijalog", which stands for the confluence and intertwingling
of the digital and the analog. If you're like me, you will never live the
pure, weightless all-digital media lifestyle. Our media collections weren't
born digital.
While I presented dijalog last time as a characteristic of people with
lifestyles like ours, it's more fundamental than that. Dijalog is
really about correlating and managing the interplay between physical
space and virtual space, which is accomplished by organizing,
describing, and managing objects which exist in both of these spaces.
I know that sounds complex and cerebral, but it's actually a
simple, even basic idea. And the best way to explain it to you is by
way of example. Let's consider what will be my subject for this and
the next column: organizing your library (that is, your personal media
collection) at home.
The Library as Dijalog Institution
What is a library? Socially speaking, the first thing to say about
lending libraries in the United States is that they are thriving
socialist institutions in the midst of capitalist fervor. That's a
very interesting idea, one which could delay me for many paragraphs if
I let it.
What can we say about lending libraries from the perspective of
information management? Libraries, including some
personal ones, are dijalog institutions. Libraries are (1) chunks of
physical space, (2) highly organized and regimented, which exist, in
part, to facilitate (3) the navigation of a virtual space, in this
case, the information space of all (ideally, anyway) recorded human
knowledge.
Let's unpack that sentence in three steps.
First. Libraries are places, sites, locations in the physical world. A
library is a place that you can visit, around and in and through which
you can move, as a body moves through space.
Second. Libraries aren't merely spaces: they are highly regimented,
organized, controlled spaces. A library is a space that brims with all
the signs and pomps of human purpose. I want to revive an old word to
describe this kind of socially significant space. A library is a
habitation; it is a human dwelling place -- a place where
human projects, goals, purposes, and ends can be acted out.
Third. Libraries aren't merely habitations: they are
social spaces organized to aid people's navigation of another,
non-physical space, namely, the information space made up of and by
all recorded human knowledge.
Want to learn about Chinese pottery during the Han period? Take the
elevator to the third floor, go down 12 rows, turn right, walk halfway
down the aisle, five rows from the top, grab the first three
books. Need schematics for the design of a wastewater treatment plant
with excellent aerative capacity and a small footprint? Take the
tunnel to the next building, up the stairs to the seventh floor, turn
left....
Thus, by navigating through, that is, by cleverly inhabiting, a
particular, highly regimented social space, you can identify, locate,
and interact with objects -- born digital, born physical, or both --
that represent or constitute your very own culture, or cultures far
removed in space and time from your own.
A library is, then, a dijalog institution: it's a place where the
interplay of physical and information space is managed.
LCC@Home: Why and What
Who knew libraries were such complex places, right? Librarians knew,
of course, as do most people, even if we don't often think about
libraries in these terms. This social and informational complexity
means at the very least that it's okay for librarians to be so
incredibly anal retentive. It's okay because they have to be!
It's not a simple job.
But if it's so hard, why do I want you to consider implementing a
classification scheme for your library at home? Because, first, all of
the really hard work has already been done for you; and, second, there
are benefits in return for a small investment of time and energy. You'll
understand the point about the hard work having been done already
after next month's column; but what about the expected benefits? They include easier
discovery of things you own but don't know that you own; easier
management of the physical constraints of owning a large library;
easier digital management of physical objects, and so on.
Let me put this point in another way. XML.com writers, editors, and
columnists often say that no XML vocabulary is worth much unless or
until there is code that consumes and produces it. That's roughly the
case with personal media collections, too. Implementing a
classification scheme for your library is the first step toward
managing a dijalog lifestyle because it gives you a replicable,
algorithmic, tractable grounding in the real world. It means you can
easily, predictably, reliably put your hands on your copy of Wayne
Meeks' The Origins of Christian Morality or Jorge Luis
Borges' Ficciones -- both of which you forgot you even
owned -- as a result of asking a computer to tell you about some books
that discuss Christian ethics in the patristic period or notable
Spanish fabulists of the 20th century.
A Classification Scheme for Your Personal Media Collection
In the idiosyncratic way that I'm using the term, a "classification scheme" is a method of
organizing the items of a media collection in such a way that they can
be physically indexed, digitally queried, and physically
retrieved. I'm focusing on books, but the items of a media collection
may include other artifacts: CDs, DVDs, cassettes, 8-tracks, albums,
magazines, journals, and assorted ephemera.
In other words, a classification scheme is a method for
managing the interplay of information space (your collection) and
physical space (all of the items which constitute your collection, as
well as the space in which they reside). Chances are, if you have a
large library, you've already implemented some kind of classification
scheme. Probably something like this: most of my art history stuff lives on that
skinny shelf in the bedroom, except for the oversized coffee table books which live on the coffee table; the fiction stays in the rec room; all
the CDs are in the basement, near the stereo; and all the computer
books are in the office, separated into open source and non-open
source.
Which Classification Scheme?
Let's assume that I've convinced you to consider implementing a
classification scheme at home. Which one should you use? There are
several possibilities: Library of Congress Classification (LCC),
Universal Decimal Classification (UDC), Dewey Decimal Classification
(DDC), Colon Classification (CC), Bliss Classification (BC). If you're
curious about some of these, I've collected good web resources about
each one, as well as some microcommentary, in the Resources section at
the end of this article.
To anticipate next month's column, I'm going to show you how to
implement LCC. Well, really, I'm going to show you how to implement a
variant that I'm calling "LCC@Home". It's a variant because we're not
going to do much, if any, actual cataloging at all (though I probably
won't be able to resist telling you about cuttering, since
it's an interesting canonicalization algorithm of sorts), and
we're going to make a few simplifying moves and assumptions in order
to keep things realistic and manageable.
Before moving on, I want to show you the top-level categories of LCC,
so that you can start to get an idea of what it's like. According to
the Library of
Congress Classification Outline, there are 21 top-level
categories, one for most of the letters in the Latin alphabet:
A -- GENERAL WORKS
B -- PHILOSOPHY. PSYCHOLOGY. RELIGION
C -- AUXILIARY SCIENCES OF HISTORY
D -- HISTORY (GENERAL) AND HISTORY OF EUROPE
E -- HISTORY: AMERICA
F -- HISTORY: AMERICA
G -- GEOGRAPHY. ANTHROPOLOGY. RECREATION
H -- SOCIAL SCIENCES
J -- POLITICAL SCIENCE
K -- LAW
L -- EDUCATION
M -- MUSIC AND BOOKS ON MUSIC
N -- FINE ARTS
P -- LANGUAGE AND LITERATURE
Q -- SCIENCE
R -- MEDICINE
S -- AGRICULTURE
T -- TECHNOLOGY
U -- MILITARY SCIENCE
V -- NAVAL SCIENCE
Z -- BIBLIOGRAPHY. LIBRARY SCIENCE. INFORMATION RESOURCES (GENERAL)
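If you think better in code than in lists, the outline above is really just a lookup table. Here's a minimal Python sketch of it. The function name `top_level_class` and the call number "BR100" are my own inventions for illustration, not part of any Library of Congress tool.

```python
# Top-level LCC categories, copied from the outline above.
LCC_TOP_LEVEL = {
    "A": "GENERAL WORKS",
    "B": "PHILOSOPHY. PSYCHOLOGY. RELIGION",
    "C": "AUXILIARY SCIENCES OF HISTORY",
    "D": "HISTORY (GENERAL) AND HISTORY OF EUROPE",
    "E": "HISTORY: AMERICA",
    "F": "HISTORY: AMERICA",
    "G": "GEOGRAPHY. ANTHROPOLOGY. RECREATION",
    "H": "SOCIAL SCIENCES",
    "J": "POLITICAL SCIENCE",
    "K": "LAW",
    "L": "EDUCATION",
    "M": "MUSIC AND BOOKS ON MUSIC",
    "N": "FINE ARTS",
    "P": "LANGUAGE AND LITERATURE",
    "Q": "SCIENCE",
    "R": "MEDICINE",
    "S": "AGRICULTURE",
    "T": "TECHNOLOGY",
    "U": "MILITARY SCIENCE",
    "V": "NAVAL SCIENCE",
    "Z": "BIBLIOGRAPHY. LIBRARY SCIENCE. INFORMATION RESOURCES (GENERAL)",
}


def top_level_class(call_number):
    """Return the top-level category name for an LCC-style call number."""
    text = call_number.strip()
    if not text:
        raise ValueError("empty call number")
    letter = text[0].upper()
    if letter not in LCC_TOP_LEVEL:
        # I, O, W, X, and Y do not appear in the outline above.
        raise ValueError(f"no top-level LCC category for {letter!r}")
    return LCC_TOP_LEVEL[letter]


# "BR100" is a made-up call number used only for illustration.
print(top_level_class("BR100"))  # PHILOSOPHY. PSYCHOLOGY. RELIGION
```

Only the first letter matters at this level; the second and third letters pick out subclasses, which this sketch ignores.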
Alpha By Title or Author?
All of this raises a real question: why shouldn't we
just alphabetize our collection items by title or by author's last
name? There are a few reasons why that's not ideal.
It will come as no surprise to XML developers and other readers that
the choice of classification scheme goes a long way toward determining
what kinds of queries one can easily perform. As I discuss below in
the Resources section, some of the alternatives to LCC are faceted
schemes, which allow for a variety of complex, composable
queries. LCC is not in fact a faceted scheme, though for personal
collections that's not going to matter very much.
The problem with only arranging your collection physically by
alphabetical order is that, without a computerized index of the
collection, you can't form
queries like, "show me all the resources that are about Spanish
anarchism or anarcho-syndicalism" or "show me all the resources that
are about Buddhist folk magic". The only way to browse an alphabetized
collection is to stroll among its items, looking at each one carefully, trying
to decide whether it matches what you want. Or you have
to know a relevant title or author's name already.
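By contrast, here's a minimal Python sketch of what even a tiny computerized index buys you. Everything in it is hypothetical: the `Item` record, the `about` function, and the subject strings are invented for illustration, and a real index would borrow its subject headings from a library catalog rather than from me.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    author: str
    subjects: tuple = ()  # free-text subject labels (illustrative only)


def about(collection, *topics):
    """Return the items whose subjects include any of the given topics."""
    wanted = {topic.lower() for topic in topics}
    return [
        item
        for item in collection
        if wanted & {subject.lower() for subject in item.subjects}
    ]


# Two books from earlier in this column, with made-up subject labels.
collection = [
    Item("The Origins of Christian Morality", "Wayne Meeks",
         ("Christian ethics", "Patristic period")),
    Item("Ficciones", "Jorge Luis Borges",
         ("Spanish-language fiction", "Fabulists")),
]

for item in about(collection, "christian ethics"):
    print(item.title, "by", item.author)
```

The query works no matter how the books sit on the shelf. What alphabetical shelving can't give you is a predictable place to walk to once the query returns.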
Discovering resources, in the common case, is going to take longer if
your collection is arranged alphabetically, though that's really only
a problem once it grows above a certain size. For me that size was
about 1,000 books. As soon as I could no longer take in my library in
one continuous glance or eye-sweep, I started being unable to find
things easily. Now that my library is spread out over three
distinct physical spaces, it would be even harder to find things if it
were arranged alphabetically.
I should tell the truth: you can get by with physically arranging your
collection alphabetically by title or author. In that case, you use
some computerized index at a big library, probably over the Web, to
look for stuff. When you find items that you're interested in, you
retrieve them from your collection, if they exist there, by doing an
alphabetical lookup.
But that approach has real limitations. That physical arrangement makes future
planning harder because it doesn't scale well at all. It doesn't scale
well because it lacks a rational or predictable connection between the
information space which you query and the physical space within which
you retrieve the results of that query. Recall the top-level LCC
categories: LCC offers a connection between the conceptual and physical
arrangement of the collection.
For example, it is very unlikely that I will ever own any
items that are classifiable as U
or V. Sorry, but that's not gonna happen. Likewise, while
I have a copy of Black's Law Dictionary (what
self-respecting IP-loving geek doesn't?), and a few dozen other legal
reference texts, I'm never going to have a large number
of K or R items. On the flip side, as a
long-time student of philosophy and religion, I have some thousand or
so B books; I have nearly 500 M items
(mostly CDs, but lots of books about music); and my Q
and T sections are very crowded, too.
While this may seem a subtle, insignificant point, it's actually quite
useful. It allows me to do some rational preplanning about the
physical arrangement of my collection. I typically want all of the
items of a particular top-level category to be as close together,
physically, as possible. Since I know how LCC distributes and joins items by looking at its top-level categories (as well as major
categories within the top-levels: I have tons and tons
of BR, BT, B790--B802,
and T), I can make some decisions in advance about how to map my local distribution of items in top-level categories onto the constraints of my physical space. Given that I have a lot of B items, I need to make sure I leave a lot of shelf space for them, for example.
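Here's that preplanning as a back-of-the-envelope Python sketch. The B and M counts are the rough figures I gave above; the Q and T counts, the total of 40 shelves, and the function name `shelf_share` are assumptions I've made up purely for illustration.

```python
def shelf_share(counts, total_shelves):
    """Split total_shelves among categories in proportion to item counts."""
    total_items = sum(counts.values())
    if total_items == 0:
        raise ValueError("collection is empty")
    return {
        letter: total_shelves * n / total_items
        for letter, n in sorted(counts.items())
    }


# B and M are the rough counts mentioned above; Q and T are invented.
counts = {"B": 1000, "M": 500, "Q": 300, "T": 200}

for letter, shelves in shelf_share(counts, total_shelves=40).items():
    print(f"{letter}: {shelves:.1f} shelves")
```

A purely proportional split is the simplest possible rule. In practice you'd want to pad the categories you expect to grow, and CDs don't take up the same room as books.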
Compare this predictability to the case where your collection is
arranged alphabetically. Do you have any idea how many items in your
collection have titles that begin with "r" or "p" or any other letter? There's no
easy way to know this and certainly not with any predictability. We do know roughly the most commonly
used letters in English, but how that correlates to initial letters of
author names and resource titles is anyone's guess. You could roughly
allocate space according to wild guesses derived from letter
frequencies, but that's a far cry from the kind of planning you can do
with LCC.
Now, clearly, this argument is especially relevant to universities and other
institutions with very large collections. In fact, you can often
save yourself time on the campus of a large university by remembering
that the law, engineering, divinity, and medical schools are all
likely to have their own libraries, which is where you'll find the
highest concentrations
of K, T, B, and R
items. Copies of items of general relevance -- like that Black's Law
Dictionary I mentioned -- may, at some universities, be found in the central library, but not always.
So, yes, this is an argument more for large, resource-constrained
institutions, but I think it applies to resource-constrained
individuals with relatively large collections. If we're gonna do this, we
might as well do it right.
Why Not Dewey?
Finally, before concluding this column, I want to consider briefly the
reasons I had in mind when I chose LCC over Dewey Decimal
Classification (DDC). DDC is, as its partisans will remind you, the
most widely used scheme in the world. But I think that's slightly
misleading. I don't mean in any way to denigrate DDC, since I'm
neither remotely qualified to do so nor do I have any technical
reasons whatever for preferring LCC. However, "widely used" is ambiguous. I
have no doubt that DDC is the scheme used in the largest number of
libraries. I also have as little doubt that more money and
resources are poured into LCC cataloging efforts than into DDC
cataloging efforts. I'm also relatively confident that more
LCC-organized library indexes are available for query over the Web.
Why do those things matter? They matter because we want to push all of
the cataloging burden onto relatively well-heeled public institutions,
where it belongs. The reason that implementing LCC@Home is at all
possible is that individuals are able to push the cataloging burden
onto powerful public institutions, and we are able to take advantage
of the results of all the investment that's gone into cataloging to
date. The simple fact is that LCC is supported by the US federal
government and by the overwhelming majority of research universities
in the US. The goal is for us little folks to do as little actual
cataloging work as is possible; one way we can achieve that goal is to
align ourselves with the biggest and best funded cataloging
effort. Near as I can tell, that's LCC.
How Exactly?
As a teaser for next month's column, I want to summarize
very concisely how we'll actually implement LCC@Home:
1. Form an initial impression of the distribution of your collection
in terms of LCC top-level categories and major subcategories.
2. Allocate physical and storage space (bookshelves, primarily) in a
way that corresponds roughly with (1), taking into consideration your
present and expected future interests.
3. Gather item-labeling materials -- including a variety of labels,
stickers, and pens of various kinds -- taking into consideration any
special requirements presented by unusual items in your
collection.
4. For each item in your collection, find its unique LCC identifier
and affix that identifier to the item, using the materials in (3).
5. Depending on the number and type of items in your collection that
are not LCC cataloged, apply some other classification scheme, leave
the items unidentified, or consider cataloging the item yourself.
6. Physically arrange the distribution of items matching LCC
categories according to some locally-derived, sensible plan.
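The last step in the list above, physical arrangement, comes down to putting labeled items in shelf order. Here's a deliberately simplified Python sketch of a sort key for that. It assumes a label starts with one to three class letters followed by a class number that may carry a decimal part, and it treats whatever follows as plain text. The name `shelf_key` and the sample labels are mine, and real library filing rules are more careful than this.

```python
import re

# Class letters, an optional (possibly decimal) class number, then the rest.
_CALL_NUMBER = re.compile(r"^([A-Z]{1,3})\s*(\d+(?:\.\d+)?)?\s*(.*)$")


def shelf_key(call_number):
    """Build a sort key: class letters, numeric class number, remainder."""
    text = call_number.strip().upper()
    match = _CALL_NUMBER.match(text)
    if match is None:
        # Labels that don't look like LCC at all sort to the front.
        return ("", 0.0, text)
    letters, number, rest = match.groups()
    return (letters, float(number) if number else 0.0, rest)


# Made-up labels, for illustration only.
labels = ["T58.5", "BR100", "B802", "M1", "B790"]
print(sorted(labels, key=shelf_key))
# ['B790', 'B802', 'BR100', 'M1', 'T58.5']
```

Comparing the class number as a number rather than as text is the one thing this sketch insists on; it's what keeps B790 ahead of B802 and, under the assumption above, 76.73 ahead of 76.9.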
In next month's column, replete with pictures and diagrams, I'll walk
you through these steps, point out pitfalls and hidden traps, and
discuss some of the choices we have to make. As always, I'm curious to hear your feedback about this article and these ideas.