17 years of my web bookmarks, with metadata

Featuring "75 Bleeding-Edge Search Engines To Beat Google", and more!

Much of the original point of the web was not just linking from one page to another but also saving and managing links, ideally with some metadata. Because of this, all browsers give you some way to save a link to a web page as a bookmark, and they typically let you sort these into a hierarchical arrangement of folders.

Third-party apps have cropped up with various strategies for improving on the built-in bookmark management offered by browsers. I have used diigo since 2004 and del.icio.us before that, and a recent review of my 71 pages of bookmarks was like a tour of my own mind for 17 years. (I seem to remember migrating the del.icio.us bookmarks when I made the transition but only see a few dozen of them showing up together in the early days of my using diigo.)

The ability to tag diigo bookmarks makes it easier for me to link to batches of them from here. For example, 63 of my early links are about the very concept of linking and the related standards and implementations that were evolving at the time. These links about linking include links to entries from my first blog, "Thinking About Linking", which I had on oreilly.com.

Reviewing the full bookmark collection showed some interesting patterns.

Too much fun to not share right away

Link rot (or not)

Big respect to the organizations whose URLs still point to the same content they pointed to way back when:

  • The New York Times, like with this Messing Around With Metadata piece that I bookmarked in 2007.
  • The Economist, like with this Start making sense piece about the semantic web from 2008.
  • Lifehacker, like with this review of a Windows uninstaller that I bookmarked in 2007. (One name that came up a lot in my bookmark inventory was Gina Trapani. She wrote many, many pieces at Lifehacker that I found useful enough to bookmark.)
  • I was going to give special kudos to technology news company GigaOM, because when I started reviewing these links I found that after GigaOM acquired the media news publisher PaidContent they redirected PaidContent article URLs to gigaom.com URLs for the same articles, but these links have rotted away.

The plain domain name paidcontent.org does redirect to gigaom.com, which reminds me of another interesting pattern I saw: expired domain names, including those with specific technical names, were often taken over by Japanese or Chinese sites that seem to have nothing to do with the original content. One example is medianmusic.com, which was selling groceries in Chinese when I first checked during my bookmark review and now shows Chinese content that I don’t understand well enough to generalize about.

Some other companies get neither the A+ of the companies above for link maintenance nor a failing grade. Two examples:

Failing grades:

  • IBM. While reviewing my diigo links I made note of five or six IBM developerWorks articles that were still there ten or so years after being published, but as I write this, none of them are there anymore. I’ll give them credit for one thing, though: they paid us to write those articles! After looking over my contract for one of them I recently republished it here: Taxonomy management with SKOS.
  • Taxonomy management tool vendor Synaptica. They had many bookmarkable articles about taxonomy management on their synapticacentral.com site, but all the ones I looked for are no longer there.

Of course, you could add to all three lists above; I'm just basing them on my own bookmark review. I purged many bookmarks from the collection before I started taking notes for this blog post.

The Wayback Machine is like a versioning system for the web that lets you see how just about any web page looked at earlier points in the web's history. It has been so valuable that I just donated $10 to them while reviewing one of my links that uses it. I have replaced several of my formerly dead diigo bookmarks with links to Wayback Machine versions, like this recipe for making candied ginger from fresh ginger.
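If you want to build a replacement link for a dead bookmark yourself, a Wayback Machine address is just the archive's prefix, a timestamp, and the original URL. Here is a minimal Python sketch of that pattern. The function name and the example URL in the usage note are made up for illustration, and the sketch does not check whether the archive holds a capture of the page.

```python
from datetime import datetime
from urllib.parse import urlsplit

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url, when):
    """Build a Wayback Machine address for original_url near the time `when`."""
    if not urlsplit(original_url).scheme:
        raise ValueError("expected an absolute URL such as http://example.com/")
    # The archive accepts a YYYYMMDDhhmmss timestamp in the path and
    # redirects to the closest capture it actually holds.
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"
```

Calling `wayback_url("http://example.com/ginger", datetime(2008, 5, 1))` asks for the capture closest to May 1, 2008; picking the date you originally bookmarked the page is a reasonable default.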

Things I thought were going to be a bigger deal than they turned out to be

  • I already mentioned that I used to be especially interested in linking—not just the ability to jump from web page to relevant web page, but standards-based architectures being built around these ideas and the ability for applications to take advantage of these architectures. Twenty years ago I realized that it wasn’t going to play out that way, although of course various JavaScript libraries and related tools have let people create more sophisticated link implementations. Standardized metadata built in to the links? Not so much.

  • A dozen links tagged RDFa. With millions (billions?) of HTML pages now using JSON-LD to embed triples, I won’t complain about RDFa’s failure. The goal of machine-readable triples being embedded in HTML pages was achieved.

  • HTML5 and the bitter process of its development.

  • Chatbots. My two most recent bookmarks with this tag are to Chatbots Magazine, whose newest article is over two years old, and a 2018 piece on chatbotslife.com titled Chatbots: What Happened?

  • Google+. I can’t even link to my bookmarks there because I deleted them all during my purge.

Some of my bookmarks showed the rise in popularity of things that continue to be popular, such as cloud computing, Twitter, and electronic book technologies.

Miscellaneous observations

I had many bookmarks for:

  • Tasks that were difficult to do in Linux 13 years ago but are easier now.
  • Windows utilities that would now be outdated even if I still used Windows.
  • Things I no longer need to bookmark because a web search to find them is faster than finding the bookmark (for example, a web form that escapes and unescapes URLs).
  • Things in the category of “I should read this but don’t feel like it; I will tag it so that I can come back to it if I ever regret not reading it”—especially in the field of machine learning.

How did I find the book image shown at the beginning of this blog post? After I wrote my first draft, I searched my diigo bookmarks for clipart and found Openclipart, which I had tagged as opensource and clipart. I guess I use my diigo bookmarks more than I realize.

Adding this data to a personal knowledge graph

The idea of a personal knowledge graph is hot lately. A curated set of over 1,700 favorite bookmarks sounds like an excellent addition to one. You can export diigo bookmarks to CSV, so I did that and used tarql to convert all of my links and their associated metadata to 7,641 triples.

In diigo you can assign multiple tags to your bookmarks; I apparently assigned four different tags to 21 of them. When you do this, diigo outputs a given bookmark's multiple tags as a single comma-separated list in the CSV output, so that the "tags" value for my bookmark for this cartoon about user interfaces is "Apple,Google,Comic,userInterface". Luckily, tarql supports Jena's apf:strSplit function, making it easy to split that list and create four different ex:tag triples for that bookmark. (That ex: namespace was just for the quick and dirty test. For a real application I would use dc:subject for the tags.) After I added this function to my conversion query, it created 583 more triples than it had before.
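For readers who don't use tarql, the effect of splitting the tag list can be sketched in plain Python. This is only an illustration of the idea, not the tarql query used for the conversion: the column names (title, url, tags), the example.com URLs, and the http://example.com/ns# namespace standing in for ex: are all assumptions.

```python
import csv
import io

# Hypothetical namespace standing in for the ex: prefix mentioned above.
EX = "http://example.com/ns#"

# Assumed shape of the export: the column names and URLs are illustrative.
SAMPLE_CSV = '''title,url,tags
"UI cartoon","http://example.com/cartoon","Apple,Google,Comic,userInterface"
"Untagged page","http://example.com/untagged",""
'''


def tag_triples(csv_text):
    """Return one (subject, predicate, object) tuple per tag per bookmark."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Split the single comma-separated value into individual tags,
        # dropping empty strings left by untagged bookmarks.
        tags = [t.strip() for t in row["tags"].split(",") if t.strip()]
        for tag in tags:
            triples.append((row["url"], EX + "tag", tag))
    return triples


def to_ntriples(triples):
    """Serialize the tuples as N-Triples lines with plain string literals."""
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return lines


if __name__ == "__main__":
    for line in to_ntriples(tag_triples(SAMPLE_CSV)):
        print(line)
```

The cartoon row yields four separate tag triples and the untagged row yields none, which is the same one-triple-per-tag result described above.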

How did I find out that I had assigned four different tags to 21 bookmarks? With a SPARQL query after doing the conversion, of course. With this data in RDF I can look for patterns and connect those tags to keywords in a taxonomy if I want. I can also connect up the data to other datasets. For example, the query that drives tarql could convert tags to URIs from standardized subject collections. I had tagged two bookmarks as F1; these could be converted to the URI <http://cv.iptc.org/newscodes/subjectcode/15039001>, the IPTC subject code for Formula One racing, for easier connection to other content out there. There are all kinds of possibilities.
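The two ideas in this paragraph, counting bookmarks by their number of tags and swapping a tag for a standard URI, can also be sketched in Python. The counting described above was done with SPARQL over the converted RDF; the version below is a stand-in that works on plain (bookmark, tag) pairs. The bookmark URLs and the one-entry lookup table are made up for illustration, and only the IPTC URI comes from the text above.

```python
from collections import Counter

# The only mapping taken from the post; a real table would be much longer.
TAG_TO_URI = {
    "F1": "http://cv.iptc.org/newscodes/subjectcode/15039001",
}

# Hypothetical (bookmark URL, tag) pairs, e.g. produced by a CSV conversion.
PAIRS = [
    ("http://example.com/cartoon", "Apple"),
    ("http://example.com/cartoon", "Google"),
    ("http://example.com/cartoon", "Comic"),
    ("http://example.com/cartoon", "userInterface"),
    ("http://example.com/race-report", "F1"),
    ("http://example.com/race-report", "cars"),
]


def bookmarks_with_n_tags(pairs, n):
    """Return the sorted bookmarks that carry exactly n distinct tags."""
    tags_per_bookmark = Counter(url for url, _tag in set(pairs))
    return sorted(url for url, count in tags_per_bookmark.items() if count == n)


def link_tags(pairs, mapping):
    """Replace each mapped tag with its URI and leave other tags as strings."""
    return [(url, mapping.get(tag, tag)) for url, tag in pairs]


if __name__ == "__main__":
    print(bookmarks_with_n_tags(PAIRS, 4))
    print(link_tags(PAIRS, TAG_TO_URI)[4])
```

Unmapped tags pass through unchanged, so a lookup table like this can grow gradually without breaking the rest of the data.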


Comments? Reply to my tweet announcing this blog entry.