The past and present of hypertext

You know, links in the middle of sentences.

January 17, 2016

I’ve been thinking lately about the visionary optimism of the days when people dreamed of the promise of large-scale hypertext systems. I’m pretty sure they didn’t mean linkless content down the middle of a screen with columns of ads to the left and right of it, which is much of what we read off of screens these days. I certainly don’t want to start one of those rants of “the World Wide Web is deficient because it’s missing features X and Y, which by golly we had in the HyperThingie™ system that I helped design back in the 80s, and the W3C should have paid more attention to us” because I’ve seen too many of those. The web got so popular because Tim Berners-Lee found such an excellent balance between which features to incorporate and which (for example, central link management) to skip.

The idea of inline links, in which words and phrases in the middle of sentences link to other documents related to those words and phrases, was considered an exciting thing back when we got most of information from printed paper. A hypertext system had links between the documents stored in that system, and the especially exciting thing about a “world wide” hypertext system was that any document could link to any other document in the world.

But who does, in 2016? The reason I’ve been thinking more about the past and present of hypertext (a word that, sixteen years into the twenty-first century, is looking a bit quaint) is that since adding a few links to something I was writing at work recently, I’ve been more mindful of which major web sites include how many inline links and how many of those links go to other sites. For example, while reading the article Bayes’s Theorem: What’s the Big Deal? on Scientific American’s site recently, I found myself thinking “good for you guys, with all those useful links to other web sites right in the body of your article!”

To get some idea of relative proportions of internal links, external links, and linkless text on today’s successful websites, I went to a top 15 most popular blogs list and did some random checking of articles on these sites. (An exercise for the reader to make up for my haphazard skimming: write some scripts to scrape some editorial content from each site, count the internal and external links, and produce a bar chart.) Because these are professionally managed sites, I imagine that management at some of them encourage links to other articles on the same site and discourage links to others as a matter of policy, because they want to keep their readers looking at their advertisers’ ads.

There is a gray area between internal and external links: linking to other sites that are part of the same organization, such the many links in a Business Insider article to Tech Insider articles, or the many links between members of the Gawker Media stable, which is heavily represented in the top 15.

Of those top 15:

Huffington Post: a mix of internal and external links, but their number of external links fits with their business model of being a hub of other sites’ content.
All about the internal links: TMZ, Mashable, Gawker, The Daily Beast, Engadget, Jezebel (where most external links are to their Gawker Media sibling Gawker).
Deadspin: a reasonable percentage of external links.
Gawker Media’s video game site Kotaku: long stretches of text with no links, and others with both internal and external links.
TechCrunch: mostly internal and several to Gizmodo, even though TechCrunch is an AOL site and Gizmodo a Gawker media site.
Gawker Media’s lifehacker, which is probably the site I visit most of all those listed here: external links if an article describes the external site’s article, company, or product, but otherwise, internal links.
Perez Hilton: mostly internal links; external links tend to be redirected via goo.gl, I suppose so that Mr. Hilton’s people can track which external links get clicked.
Gawker Media’s Gizmodo: plenty of external links, even to non-Gawker sites, for a gadget site that I assume is mostly interested in helping advertisers sell gadgets.
Cheezburger: textual content not much of an issue here.

I’m guessing that there is no policy across all of Gawker Media about the use of links, but that each of their major properties has some sort of policy in place. (For an interesting, explicit enumeration of one carefully managed site’s linking policy, see the guidelines at IBM Developer Works.)

On particularly link-rich bit of content that I read regularly is Data is Plural, which ironically is delivered via email—a technology that had a firm foothold in the Internet before Berners-Lee came up with the Web, and which most young people today only use to communicate with us old people.

Who even thinks about hypertext as hypertext anymore? A quick look at the former Usenet newsgroup (and now Google Group) alt.hypertext shows an average of about one new message or comment per month for the last few years, including spam. (Compare January of 1998, when the newsgroup had 39 topics with one or more postings in that one month.) The most recent topic shown is titled “NCSA Mosaic for X 0.10 available” from Marc Andreesen, posted—I thought—last month, making me think “isn’t he a bit busy for Mosaic these days?” It turned out that last month someone added a comment to his original 1993 post. A relatively recent new topic is Paul Ford’s January 2014 query “Do documents have a chance? Or is the future more and smarter optimized applications?” Actually, that makes a solid answer to my question that began this paragraph: Paul Ford, and I’m really looking forward to his upcoming book.

The hypertext “novel” I bought in 1994 for $25

blog

home

blog

categories

writing

music

about

Recent Posts

Visualizing RDF

Using regular expressions to manipulate data in a SPARQL query

Appreciating the SPARQL property path slash character more

Triples about existing triples

Querying for labels

Human-readable names in RDF

My brief tenor banjo career

Nicer date and time handling in SPARQL 1.2

Passing your own data to use in Wikidata visualizations

Entity recognition from within a SPARQL query