A schemaless computer database in 1965

To enable flexible metadata aggregation, among other things.

figure 3

I’ve been reading up on America’s post-war attempt to keep up the accelerated pace of R&D that began during World War II. This effort led to an infrastructure that made accomplishments such as the moon landing and the Internet possible; it also led to some very dry literature, and I’m mostly interested in what new metadata-related techniques were developed to track and share the products of the research as they led to development.

One dry bit of literature is the proceedings of the 1965 Toward a National Information System: Second Annual National Colloquium On Information Retrieval. The conference was sponsored by the American Documentation Institute, which had a big role in the post-war information sharing work, along with the University of Pennsylvania’s Moore School of Electrical Engineering (where Eckert and Mauchly built ENIAC and its successor EDVAC) and some ACM chapters.

In a chapter on how the North American Aviation company (now part of Boeing) revamped their practices for sharing information among divisions, I came across this description of some very flexible metadata storage:

All bibliographic information contained in both the corporate and divisional Electronic Data Processing (EDP) subsystems is retained permanently on magnetic tape in the form of variable length records containing variable length fields. Each field, with the exception of sort keys, consists of three adjacent field parts: field character count, field identification, and field text (see Figure 3). There are several advantages to this format: it is extremely compact, thereby reducing computer read-write time; it provides for definition and consequent addition of new types of fields of bibliographic information without reformatting extant files; and its flexibility allows conversion of files from other indexing abstracting services.

I especially like that “it provides for definition and consequent addition of new types of fields of bibliographic information without reformatting extant files.” This reminds me of one slide in my presentation last month at the Semantic Technology and Business / NoSQL Now! conferences, where my talk was on a track shared by both conferences. The slide was about how a key advantage of schemaless NoSQL databases is the ability to add a new value for a new property to a data set with no need for the schema evolution steps that can be so painful in a relational database.
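To make the quoted record layout concrete, here is a minimal Python sketch of records built from fields that each carry a character count, a field identification, and the field text. The paper as quoted doesn't give the widths of the count and identification parts, so the three-digit count, the two-character IDs, and the `TI`, `YR`, and `KW` codes are all my own assumptions for illustration. I've also left out the sort keys that the quote mentions as an exception.

```python
# Hypothetical widths and field codes: the 1965 description quoted above
# does not specify them, so these are assumptions for illustration only.
COUNT_WIDTH = 3  # decimal digits giving the length of the field text
ID_WIDTH = 2     # characters identifying the type of field


def encode_record(fields):
    """Pack (field_id, text) pairs into one variable-length record string."""
    parts = []
    for field_id, text in fields:
        if len(field_id) != ID_WIDTH:
            raise ValueError(f"field id must be {ID_WIDTH} characters: {field_id!r}")
        if len(text) >= 10 ** COUNT_WIDTH:
            raise ValueError("field text too long for the count width")
        # Field layout: character count, then identification, then text.
        parts.append(f"{len(text):0{COUNT_WIDTH}d}{field_id}{text}")
    return "".join(parts)


def decode_record(record):
    """Walk a record from left to right, returning (field_id, text) pairs."""
    fields = []
    pos = 0
    while pos < len(record):
        count = int(record[pos:pos + COUNT_WIDTH])
        pos += COUNT_WIDTH
        field_id = record[pos:pos + ID_WIDTH]
        pos += ID_WIDTH
        fields.append((field_id, record[pos:pos + count]))
        pos += count
    return fields


# An "extant" record written before anyone thought of a keyword field.
old_record = encode_record([("TI", "Toward a National Information System"),
                            ("YR", "1965")])

# A later record that adds a new KW field type.
new_record = encode_record([("TI", "Some later report"),
                            ("YR", "1966"),
                            ("KW", "information retrieval")])

# The same reader handles both, with no reformatting of the old record.
for record in (old_record, new_record):
    print(dict(decode_record(record)))
```

Because the reader learns each field's type and length from the field itself, a record written before the `KW` field existed and one written after it can sit in the same file, which is the property the quote is describing.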

Moore’s law has led to less of a reliance on arranging data in tables to allow the efficient retrieval of that data. The various NoSQL options have explored new ways to do this, and it was great to see that one aerospace company was doing it 49 years ago. Of course, retrieving data from magnetic tape is less efficient than modern alternatives, but it was a big step past the use of piles of punched cards, and pretty modern for its time, as you can see from the tape spools in the picture of EDVAC’s gleaming successor below. Tabular representation of data long predates relational databases (hierarchical and network databases also stored sets of entities as tables, but with much less flexibility), so I thought it was cool to see that someone had implemented such a flexible model so long ago, especially to represent metadata, and with a use case that we often see now with RDF: to allow “conversion of files from other indexing abstracting services.” In other words, to accommodate the aggregation of metadata from other sources that may not have structured their data the same way that yours is structured.

Univac 9400

Univac photo by H. Müller CC-BY-SA-2.5, via Wikimedia Commons