Document Engineering

An excellent book by Bob Glushko and Tim McGrath.

April 3, 2006

Bob Glushko and Tim McGrath’s new book Document Engineering: Analyzing and Designing Documents for Business Informatics and Web Services describes “document engineering” as a new discipline. The discipline, if not the name, will sound familiar to people who work with XML in an automated publishing context, a web services context, or somewhere in between. (More on the “between” later.)

I found the book’s subtitle to be misleading, because the book covers much more than the design of documents. As its introduction tells us,

The essence of Document Engineering is the analysis and design methods that yield:

Precise specifications or models for the information that business processes require.

Rules by which related processes are coordinated, whether between different firms to create composite services or virtual enterprises or within a firm to streamline information flow between organizations.

Document Engineering provides the concepts and methods needed to align business strategy and information technology, to bridge the gap between what we want to do and how to do it.

The two authors describe documents as “self-contained package[s] of related information”, which obviously means much more than publishable content that gets set into specific fonts to be read by human eyeballs. In particular, they describe documents as the interfaces between business processes. These processes are typically rendered as services these days, what with service-oriented architectures (SOA) being such a hot topic in IT architecture discussions. Such a document could be an invoice, a bill of lading, or a specific information package designed for the interaction between processes at two business partners.

Or, of course, it could be a novel or a user’s guide or a company’s annual report. There’s a common distinction in the XML world between “data-oriented” XML and “document-oriented” XML that I prefer to describe as transaction-oriented XML versus publishing-oriented XML (see Documents vs. Data, Schemas vs. Schemas). It’s all data, and it’s all documents. The status of all well-formed XML as both data and documents is something that Glushko and McGrath take very seriously, and they’ve studied engineering techniques from both the business transaction and the publishing worlds to help the reader address issues at both ends of the spectrum and in the many cases in between. For example, in addition to reviewing the methodology proposed in Eve Maler and Jeanne El Andaloussi’s classic book Developing SGML DTDs: From Text to Model to Markup, they present the first detailed approach I’ve seen to applying classic database normalization techniques to documents.

Along with the use of existing data engineering techniques, they also build on existing business processing models and standards such as ebXML. While the book will be very useful for business process people who want to learn more about the processing of non-relational data, a chapter like “Describing What Businesses Do and How They Do It” will be just as valuable to data people who want to understand the different classes of business processes, the different levels of abstraction used to discuss them, and potential interactions between them. That understanding is especially important considering the primary role that Glushko and McGrath see their idea of “documents” playing in those interactions.

Consultants who repeatedly perform business process analysis and related document workflow analysis for multiple clients will find this book particularly helpful for educating both themselves and their clients. For example, the chapter “When Models Don’t Match: The Interoperability Challenge” enumerates ways that exchanged data may fail to serve its intended purpose. I liked the chart showing the potential problems caused by differences in content, encoding, structure, and semantics for a simple piece of information like “100 US Dollars”.
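To give a rough feel for that kind of mismatch, here is a minimal Python sketch. It is not taken from the book or its chart: the three XML fragments, the element and attribute names, and the `normalize` function are all hypothetical, made up only to show how one fact can arrive in shapes that a receiving process has to reconcile.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Three hypothetical encodings of "100 US Dollars"; none comes from the book.
SAMPLES = [
    '<price currency="USD">100.00</price>',                          # unit in an attribute
    '<price><amount>100</amount><currency>USD</currency></price>',   # different structure
    '<price>$100</price>',                                           # unit folded into the content
]

def normalize(xml_text):
    """Return (amount, currency) for the three sample shapes above."""
    root = ET.fromstring(xml_text)
    if root.get("currency"):
        return Decimal(root.text), root.get("currency")
    if root.find("amount") is not None:
        return Decimal(root.findtext("amount")), root.findtext("currency")
    text = (root.text or "").strip()
    if text.startswith("$"):
        # Assumption: "$" means US dollars. It could as easily mean CAD or AUD,
        # which is exactly the kind of semantic gap a receiver has to settle.
        return Decimal(text[1:]), "USD"
    raise ValueError("unrecognized price format")

results = [normalize(sample) for sample in SAMPLES]
```

The third case is the interesting one: the structure parses fine, but the meaning of the currency symbol is only a guess unless the two partners have agreed on it in advance.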

The extended use case that the book repeatedly returns to is an integrated event calendaring system for the University of California at Berkeley. Anyone who considers shared calendaring to be the next killer app should consider themselves lucky that Glushko and McGrath chose this as a use case; the clear connections that they draw between the details of the use case and the more abstract discussions that form the bulk of the book will give calendar app developers a big jump in their analysis work. Also handy for anyone’s analysis are the lists of questions that the book suggests you ask of someone about any document that they use or send to others.

As a 702-page (with indexes and backmatter) hardcover, the book does weigh a bit; I read much of it on a series of plane trips in which I was not carrying a laptop. Carrying this book with a laptop and normal luggage might have given me back problems. The book is definitely worth getting, though, for just about anyone who deals with XML data that gets passed from one process to another, and that’s a lot of us.

2 Comments

By stelt on April 3, 2006 12:29 PM

Some very smart ideas about Documents being way more than a dead instance on a website or print-out:
http://www.google.com/search?q=%22future+of+Science+Communication+and+Publishing%22 and
http://www.google.com/search?q=%22hypermedia+for+science%22+datament

And what about many DOMs continuously interacting ?

By Bob Glushko on April 3, 2006 3:08 PM

Bob, I’m pleased that you like the book. Tim McGrath and I have set up a site at docengineering.com with a couple of sample chapters and various other talks and papers.

-Bob Glushko
