Measuring information

A short but dense classic offers some solid background.

I’ve always been fascinated by the idea of information as something quantifiable. When William Strunk (of Strunk and White fame) wrote “omit needless words,” and when George Orwell wrote “If it is possible to cut a word out, always cut it out” in Politics and the English Language, they affirmed that good writing packs more information into fewer words than bad writing does (or into fewer syllables, or even letters; in the same essay, Orwell wrote “Never use a long word where a short one will do”). While I won’t tag this weblog entry as part of my Documenting Software series, the idea of more “efficient” sentences holds obvious appeal to a geek writer.

The phrase “more information” here, though, does not refer to something quantifiable. For example, the second sentence below has more information than the first, but we can’t assign a number to the difference:

  • The Beatles' album "Revolver" is really so, completely, totally, you know, like, awesome.

  • The Beatles recorded their "Revolver" album at Abbey Road studios from 4/7/66 to 6/17/66.
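What can be counted is the symbols themselves. Here is a minimal sketch, assuming Python and a simple per-character frequency count; the function name char_entropy_bits and the variable names are my own, invented for illustration, and the measure is only a crude symbol-level statistic, not anything taken from the books discussed here. It reports each sentence's length and its empirical character entropy in bits per character:

```python
import math
from collections import Counter

# The two sample sentences from the list above.
GUSHING = 'The Beatles\' album "Revolver" is really so, completely, totally, you know, like, awesome.'
FACTUAL = 'The Beatles recorded their "Revolver" album at Abbey Road studios from 4/7/66 to 6/17/66.'


def char_entropy_bits(text: str) -> float:
    """Empirical entropy of the character frequencies in text, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


for name, sentence in [("gushing", GUSHING), ("factual", FACTUAL)]:
    print(f"{name}: {len(sentence)} characters, "
          f"{char_entropy_bits(sentence):.3f} bits per character")
```

Both sentences are 89 characters long, so a count of symbols cannot tell them apart, and nothing in this calculation looks at what the words mean.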

Years ago, when I heard about Claude Shannon’s work on information theory, I sent away for the book The Mathematical Theory of Communication that he co-authored with Warren Weaver. This was so long ago that I only realized today that the NYU building where I took every class of my computer science degree was named for the same co-author; if I ever referred to WWH in an e-mail to Matthew Fuchs, who got his PhD there, he’d know immediately that I meant Warren Weaver Hall.

[Shannon and Weaver cover]

Much of information theory came out of the study of communication, with the engineering problem being the loss of information. To know what percentage of transmitted information was received, you must be able to measure information, so it’s no surprise that Shannon did this work at Bell Telephone Labs. He wrote pages 36 to 125 of this book, and the math is over my head despite my CS degree. (I may claim that I’ve always specialized in getting computers to manipulate text, not numbers, but that’s a poor excuse considering how much of the payoff from Shannon’s work has been in applications that transmit and store text.)

Weaver’s 1949 introduction to the book runs 26 pages and includes a few logarithmic expressions, but I can handle those, and his whole section is fascinating. Part 2 of Weaver’s essay, the longest part, is an interpretation of Shannon’s work; Part 1 raises questions that add much clarity to my ruminations about the difference in the amount of information in the two sample sentences above, and Part 3 revisits the questions in light of Shannon’s work.

Part 1 lays out three levels of communication problems, which Weaver describes like this:

  • LEVEL A. How accurately can the symbols of communication be transmitted? (The technical problem.)

  • LEVEL B. How precisely do the transmitted symbols convey the desired meaning? (The semantic problem.)

  • LEVEL C. How effectively does the received meaning affect conduct in the desired way? (The effectiveness problem.)

So, the second of my two sentences about the Revolver album has more information in Level B terms (that is, more semantic information), while having an identical amount in Level A terms—with the two sentences being equal in length, my host provider’s server used the same energy to send each one to your computer. (I think that Level C concerns the receiving entity more than the received message, and Weaver doesn’t say much about it, so I won’t address it here.) Part 2 of Weaver’s piece is called “Communications Problems at Level A”; this is obviously where Shannon’s math has the most to offer. Weaver mostly discusses Levels B and C in terms of their relationship to Level A, and it’s an important relationship: they build on Level A, so problems at Level A cause problems in B and C. He suggests an interesting change to the following diagram, shown originally in Part 1:

[communication system schematic]

One can imagine, as an addition to the diagram, another box labeled “Semantic Receiver” interposed between the engineering receiver (which changes signals to the messages) and the destination. This semantic receiver subjects the message to a second decoding, the demand on this one being that it must match the statistical semantic characteristics [his italics] of the message to the statistical semantic capacities of the totality of receivers, or of that subset of receivers which constitute the audience one wishes to affect.

Similarly one can imagine another box in the diagram which, inserted between the information source and the transmitter, would be labeled “semantic noise,” the box previously labeled as simply “noise” now being labeled “engineering noise.” From this source is imposed into the signal the perturbations or distortions of meaning which are not intended by the source but which inescapably affect the destination. And the problem of semantic decoding must take this semantic noise into account. It is also possible to think of an adjustment of original message so that the sum of message meaning plus semantic noise is equal to the desired total message meaning at the destination.

To “match the statistical semantic characteristics” sounds like quite a challenge, but I’m sure there are semantic web researchers out there reading up on their Shannon and Weaver.

I’ve written all this as an introduction to a review of a more recent, pop-science-oriented book that I’ve just read and enjoyed very much, Hans Christian von Baeyer’s Information, but this is already long enough, so I’ll discuss von Baeyer’s book at some future point. I’ll finish with my favorite quote from The Mathematical Theory of Communication, in which Weaver discusses how the probability of a given symbol string affects efficient compression:

… anyone would agree that the probability is low for such a sequence of words as “Constantinople fishing nasty pink.” Incidentally, it is low, but not zero; for it is perfectly possible to think of a passage in which one sentence closes with “Constantinople fishing,” and the next begins with “Nasty pink.” And we might observe in passing that the unlikely four-word sequence under discussion has occurred in a single good English sentence, namely, the one above.

A Google search on the phrase “Constantinople fishing nasty pink” today gets 246 hits, but of course Shannon and Weaver’s influence is much more extensive than that.
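The link between probability and compression in that quote rests on a standard idea from information theory: the less probable a message is, the more bits it takes to single it out, namely the negative base-2 logarithm of its probability. A minimal sketch, assuming Python; the function name surprisal_bits is my own, and the probabilities below are hypothetical values picked only for illustration, not estimates for any real word sequence:

```python
import math


def surprisal_bits(probability: float) -> float:
    """Information content, in bits, of an event with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be greater than 0 and at most 1")
    return -math.log2(probability)


# Hypothetical probabilities, chosen only to show the trend.
examples = [
    ("a fair coin flip", 0.5),
    ("a one-in-a-thousand phrase", 1 / 1000),
    ("a one-in-a-billion phrase", 1e-9),
]

for label, p in examples:
    print(f"{label}: {surprisal_bits(p):.2f} bits")
```

A probability that is low but not zero, like the one Weaver describes, gives a large but finite number of bits.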

2 Comments

By Gavin Brelstaff on April 6, 2006 9:36 AM

Ezra Pound in his “ABC Of Reading” - Faber & Faber 1951
wrote p36

“Great literature is simply language charged with meaning to the utmost possible degree.”
DICHTEN = CONDENSARE

p63
“Incompetence will show in the use of too many words”

By Gavin Brelstaff on April 11, 2006 4:46 AM

Dear Bob

I (also) think your thinking might be elucidated by
the Russian linguist:

http://en.wikipedia.org/wiki/Roman_Jakobson

“Jakobson distinguishes six communication functions, each associated with a dimension of the communication process:”

Dimensions
1 context
2 message
3 sender ————— 4 receiver
5 channel
6 code

Functions
1 referential (= contextual information)
2 poetic (= autotelic)
3 emotive (= self-expression)
4 conative (= vocative or imperative addressing of receiver)
5 phatic (= checking channel working)
6 metalingual (= checking code working)