Enterprise Application Integration using Apache Cocoon 2.1
by Tony Culshaw
November 12, 2003
Apache Cocoon has typically been categorized as a web publishing
framework, but since the release of version 2.1 it has started to look
more like an XML application server.
I've just completed a project with a travel company to build a
web-based travel agency desktop which integrates several common backend
systems. These are systems that a typical agent would use in day-to-day
business, and they were chosen to demonstrate a variety of integration
techniques. In this article I outline how Cocoon 2.1 was the key to
building this product, including both advantages and disadvantages.
The Problem Space
The travel industry is an old industry in IT terms. The core systems
have been around a long time and are truly legacy systems. Even today
there may still be the occasional dumb terminal plugged into an ALC
(Airline Link Control) network. Most airlines and global distribution
systems still have user interfaces based on a 64 by 15 character
screen.
These days we are getting new interfaces to the core of these central
reservation systems, usually based on proprietary XML APIs. Standards such
as those evolving at the Open Travel Alliance will eventually provide a
much improved web service development space for suppliers and clients
alike.
The unfortunate reality is that the problem space we occupy today is a
fairly horrendous mix of rapidly changing technologies. Any system that
integrates the variety of supplier systems we are faced with is going to
have to be an agile system. Give us any system, and we have to be
able to talk to it and integrate it.
As the system architect, I also had to build the system from a very fluid
and incomplete requirements specification, with a small number of
developers who had varying skillsets in both open source and Microsoft
technologies. Extreme Programming seemed the only way forward in this
environment. An XML foundation appeared to be the compromise that would
bridge the developers' skills, since any developer would be able to
maintain and develop any part of the system with minimal training.
As time went on, it became increasingly clear that "Everything is XML" was
the mantra. Even dumb terminal feeds could be "XMLized".
The next step was answering the question: "If all my system feeds are in
XML, what benefit do I get in converting the XML into an object graph (as
in Java code), and then back out into a user interface as HTML?" The
answer caused some healthy debate amongst the developers, but in the end
the answer was that there is no benefit.
We had to develop or find a framework that handled input and output in
XML, had a suite of XML-aware components to handle authentication and
form validation, and was easily extensible to call a variety of external
interfaces. Cocoon appeared ideal in that it was based on XML pipeline
processing and we had the skillset to build extensions in Java.
Cocoon 2.1 was chosen because, although it was not even in alpha stage,
the anticipated timeline would
mean that any extensions we built would be based on the 2.1
infrastructure, not the old 2.0 infrastructure. This proved a good
decision because our product is now in beta testing, and Cocoon 2.1 has
just been released.
Five Second Introduction to Cocoon
There are a number of excellent articles on Cocoon basics, not to
mention the Cocoon site itself. The samples that come with the Cocoon
distribution are recommended
and hint at the many possibilities. For the purposes of this article I
will only introduce the basic components and the concepts of XML pipeline
processing.
A Cocoon Web Application consists of a number of hierarchical
sitemaps. Each sitemap consists of a number of Matchers (typically to
match a URL pattern) and each match can kick off the assembly of a
Pipeline. A pipeline gets assembled by Action and Selector components. A
pipeline, once assembled, is typically started with a Generator, followed
by one or more Transformers, and finally a Serializer.
The main components in Cocoon are:
- Matchers. Match on some input; for example, the Wildcard URI Matcher
matches on the URI delivered via a browser request.
- Actions. Take some action based on input parameters and result in
success or failure. They typically take on the role of the "C" in
traditional MVC.
- Selectors. Similar to actions, but allow multiple outcomes, as in
'if else if else if...'
- Generators. Most commonly the Request Generator, which takes a browser
request (POST/GET) and generates XML as SAX events.
- Transformers. XML SAX input is transformed to XML SAX output. The best
example would be the XSLT Transformer.
- Serializers. Accept SAX events as input and serialize them to an output
stream. For example, the HTMLSerializer serializes to the Cocoon Servlet
response output stream.
- Readers. A reader ties the input stream directly to the output stream.
Ideal for inputs that can't be readily XMLized, like JPGs.
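The generator, transformer, serializer chain can be imitated outside Cocoon with the SAX support in the standard JDK. The following is only a rough sketch of the idea, not Cocoon's own API: the class name, the stylesheet, and the `<greeting>` input are all invented for illustration. An XML parser plays the generator, an XSLT handler plays the transformer, and an identity handler writing to a string plays the serializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical stand-in for a Cocoon pipeline, built only from JDK classes.
class PipelineSketch {

    // Hypothetical stylesheet: turns <greeting> into a minimal HTML page.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/greeting'>"
      + "<html><body><p><xsl:value-of select='.'/></p></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String run(String inputXml) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();

        // "Serializer": an identity handler that writes incoming SAX events as HTML text.
        StringWriter out = new StringWriter();
        TransformerHandler serializer = factory.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        serializer.setResult(new StreamResult(out));

        // "Transformer": an XSLT stage that receives SAX events and emits SAX events.
        TransformerHandler transformer =
            factory.newTransformerHandler(new StreamSource(new StringReader(XSLT)));
        transformer.setResult(new SAXResult(serializer));

        // "Generator": a parser that turns the input text into SAX events.
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader generator = parserFactory.newSAXParser().getXMLReader();
        generator.setContentHandler(transformer);
        generator.parse(new InputSource(new StringReader(inputXml)));

        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<greeting>Hello</greeting>"));
    }
}
```

In Cocoon the same wiring is declared in the sitemap rather than written by hand; the sketch only shows that each stage is a SAX consumer handed to the stage before it.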
The Systems
The systems to be integrated were:
- a tour management system via a dumb terminal interface;
- a global distribution system via a published XML API accessing
air, car and hotel reservations;
- an air fare system via its web site;
- a travel insurance system via HTTP XML;
- an authentication and customization server via web service;
- a persistence engine, the 'data integrator', to provide the value-added
integration services (e.g. combined travel itineraries, customer
relationship management) via a published web service.
The diagram summarizes these systems and their methods of access. Note
Cocoon as the aggregator of these systems.
Cocoon Extensions
We required three main extensions, either because they didn't yet exist
in Cocoon or because the components that did exist weren't ready for
prime time.
These were:
- a custom form action extension including validation (using
Jakarta Commons Validator);
- a web service transformer which could deal with both basic HTTP
XML POST operations and full blown SOAP calls;
- an HTML transformer which could HTTP GET/POST to a regular web
site and return the response as well-formed HTML.
Both the web service transformer and the HTML transformer had to
maintain any remote session information transparently (as would a
regular session through a browser using cookies).
Note that all the code for these extensions has previously been
submitted to the Cocoon Developer mailing list.
The Big Issues
The big issues were "pipeline lock-in", debugging, standards drift, and
performance.
"Pipeline lock-in" means that XML data flowing through a
pipeline cannot influence the path through the pipeline. This isn't
actually a limitation, rather a state of mind that has to be adopted.
Imagine a pipeline that has to call two remote systems to achieve its end
result. If the call to the first system fails (call 1) we require that no
call to the second system be made, and that a suitable form of error
handling be invoked, including returning an error message to the
user.
...
<map:generate type="request"/>
<map:transform type="xalan" src="prepareCallToHost1.xsl"/>
<map:transform type="webservice">
  <map:parameter name="uri" value="http://host1/service"/> <!-- call 1 -->
</map:transform>
<map:transform type="xalan" src="logger.xsl">
  <map:parameter name="file" value="c:/tmp/call1.log"/> <!-- log result of call 1 -->
</map:transform>
<map:transform type="xalan" src="prepareCallToHost2.xsl"/>
<map:transform type="webservice">
  <map:parameter name="uri" value="http://host2/service"/> <!-- call 2 -->
</map:transform>
<map:transform type="xalan" src="result2html.xsl"/>
<map:serialize type="html"/>
...
Thus in the above sample prepareCallToHost2.xsl would
have to determine whether an error had occurred in call 1. It would either
prepare an agreed upon XML error structure without tags to trigger call 2
or prepare the XML for call 2. Similarly result2html.xsl
would have to determine whether to render an error or to render a normal
result to the user.
But surely it would be better to change the direction through the pipeline
and have another component just after call 1 to say 'if success do
endPipeWithA else do endPipeWithB'?
It takes a while to grasp, but think about how the XML flows through the
pipeline. The key is in the SAX events. Each event is fired all the way
through the chain before the next. If we were allowed to change direction
in a pipeline, we would essentially break the chain (startElement doesn't
match endElement scenarios). We would get halfway through an
HTMLSerializer, and then we could change to a PDFSerializer; it just
wouldn't make any sense.
The bottom line is that all possible error conditions must be accounted
for at each stage in the pipeline and propagated through the pipeline.
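To make the propagation concrete, here is a minimal sketch of the kind of decision a stylesheet such as prepareCallToHost2.xsl has to make. It is not the project's actual stylesheet: the `<error>`, `<booking>`, and `<host2-request>` element names are assumptions invented for the example, and the transform is run through the JDK's XSLT API rather than through Cocoon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration of error propagation between two pipeline stages.
class ErrorPropagationSketch {

    // Assumed vocabulary: <error> is the agreed error structure,
    // <booking> is a successful result from call 1.
    static final String PREPARE_CALL_2 =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        // Call 1 failed: pass the error along untouched, with nothing to trigger call 2.
      + "<xsl:template match='/error'><xsl:copy-of select='.'/></xsl:template>"
        // Call 1 succeeded: build the request for the second host.
      + "<xsl:template match='/booking'>"
      + "<host2-request><ref><xsl:value-of select='@ref'/></ref></host2-request>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String prepare(String call1Result) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(PREPARE_CALL_2)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(call1Result)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prepare("<booking ref='ABC123'/>"));
        System.out.println(prepare("<error code='E1'>Host 1 unavailable</error>"));
    }
}
```

Every later stage needs the same pair of branches: a final stylesheet like result2html.xsl would render either the normal result or the error element, but the route through the pipeline never changes.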
Debugging is more difficult, primarily due to the lack of
an IDE with breakpoints that you would get in an ordinary development
environment. Each stage in a pipeline must be serialized back to the
browser in XML for effective debugging. An alternative is to insert a
logging-type transformer in the pipeline where a suspected error occurs and
view the output that way. The above example logs the result of call 1
using the logger.xsl transform below.
<?xml version="1.0"?>
<!-- Logger Transform - Must use xalan! -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lxslt="http://xml.apache.org/xslt"
    xmlns:redirect="org.apache.xalan.xslt.extensions.Redirect"
    extension-element-prefixes="redirect">
  <xsl:param name="file" select="'log.xml'"/>
  <xsl:template match="/">
    <redirect:open select="$file"/>
    <redirect:write select="$file">
      <xsl:copy-of select="."/>
    </redirect:write>
    <redirect:close select="$file"/>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="node()|@*">
    <!-- Copy the current node -->
    <xsl:copy>
      <!-- Including any attributes it has and any child nodes -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
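The same tap-and-pass-through idea can also be sketched in Java as a SAX filter, for cases where a Xalan extension is not available. This is an illustrative sketch using only the JDK's `XMLFilterImpl`, not a Cocoon transformer: the class name is hypothetical, and it records element events in memory where a real logger would write to a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical logging stage: records each element event, then forwards it unchanged.
class LoggingFilterSketch extends XMLFilterImpl {

    final List<String> log = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        log.add("start " + qName);
        super.startElement(uri, localName, qName, atts); // pass downstream
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        log.add("end " + qName);
        super.endElement(uri, localName, qName); // pass downstream
    }

    static List<String> trace(String xml) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);

        LoggingFilterSketch filter = new LoggingFilterSketch();
        filter.setParent(parserFactory.newSAXParser().getXMLReader());
        // DefaultHandler stands in for whatever stage would follow the logger.
        filter.setContentHandler(new DefaultHandler());
        filter.parse(new InputSource(new StringReader(xml)));
        return filter.log;
    }

    public static void main(String[] args) throws Exception {
        trace("<a><b/></a>").forEach(System.out::println);
    }
}
```

Because the filter forwards every event it sees, the stages after it receive the same stream they would have received without it.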
Hopefully we'll see some decent debugging tools emerging as Cocoon's
popularity grows.
Standards Drift can occur simply because there's nothing
to enforce the quality of the XML flowing through a pipeline. In addition
we require best practice enforcement in the authorship of XSLT. Pair
programming can certainly help here. One possibility would be schema
enforcement at various stages in the pipelines, although this would be
removed prior to release for performance reasons.
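As a rough illustration of what a schema check between stages might look like, the sketch below validates a document against an XML Schema using the JDK's `javax.xml.validation` API. The schema itself, describing an `<error>` element with a required `code` attribute, is an assumption made up for the example and not one of the project's schemas.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical development-time check for the XML passing between two stages.
class SchemaCheckSketch {

    // Assumed schema: an <error> element with text content and a required code attribute.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='error'>"
      + "<xs:complexType><xs:simpleContent>"
      + "<xs:extension base='xs:string'>"
      + "<xs:attribute name='code' type='xs:string' use='required'/>"
      + "</xs:extension>"
      + "</xs:simpleContent></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no custom error handler set, validation errors surface as exceptions.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<error code='E1'>Host 1 unavailable</error>"));
        System.out.println(isValid("<error>missing code attribute</error>"));
    }
}
```

A check like this could sit behind a build or configuration switch so that it runs during development and testing but is skipped in the released system.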
Performance can be a big issue. XML transforms are still
slow, although the last few years have given us great leaps in efficiency.
Cocoon has an advanced caching mechanism, but this doesn't help us much
where most of the content is dynamic.
A few tidbits of advice:
- Use IFrames to selectively repopulate a page instead of
rebuilding entire pages each time.
- Minimize the number of transforms by using input modules where
possible.
- Minimize XML volume through each pipeline by using aggregation.
- Cocoon supports multiple XSLT types (xalan, xslt) and others will
be added in the future, including Saxon (XSLT 2.0) and Gregor (very
performant). It would be useful to experiment with different
implementations.
- Load test for scalability every couple of development iterations.
- Identify individual bottlenecks and consider writing custom
components to optimize.
- Use Apache with mod_jk in front of your Cocoon servlet
engine to cache content from Cocoon Readers and any other static
content.
Development Methodology
Did I mention Extreme Programming?
A tight specify/code/build/test/release cycle was vital for this
project.
Use cases (or Story cards) were implemented alongside JMeter test
scripts. Once a script run was successful it was incorporated into the Ant
build script for the project. Wherever Java components were developed,
unit tests were built and incorporated into the build. The JMeter scripts
were used to enforce continuous quality control and were used for load
testing as well as later system checks in the live system.
Without this strict adherence to testing as part of the build process, the
project would have failed due to lack of quality control. If a bug is
introduced, it's far better to catch it quickly, while the developer's mind
is still "in context". Needless to say, until our custom components were
bedded down, we had a lot of pipeline breakages. Our testing regime
ensured we kept on top of things.
Due to the wide variation in developers' skills, pair programming was used
to cross train and enhance skillsets. Pair programming is the best
training a programmer can get.
Looking Forward by Looking Back
This is always an important part of a project, enhancing your ability
to see where you can improve, not only for the next iteration of the
current project, but also for the next project.
What did we do wrong? What would be done differently next time around?
How has the technology changed since we started?
We need to use more XML Schemas to enforce standardization. I believe that
we made the right decision in not designing and enforcing schemas from day
one, but now that the project has matured and entered the maintenance phase it
is the right time.
Our form handling mechanisms are totally proprietary simply because there
were no satisfactory solutions in Cocoon. Cocoon 2.1 now incorporates
Flow/Continuations, JXPath, and Woody. These are very interesting
components that should be given serious consideration for any new
development work.