
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">

  <channel>
    <title>bobdc blog</title>
    <link>https://www.bobdc.com/blog/</link>
    <description>
      Recent content from bobdc blog | 
      Bob DuCharme&#39;s weblog.
    </description>
    <generator>Hugo | gohugo.io | Theme twenty-sixteen</generator><language>en-us</language><lastBuildDate>Sun, 08 Mar 2026 09:53:24 -0400</lastBuildDate>
    
        <atom:link href="https://www.bobdc.com/blog/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>The best way to talk about AI: don&#39;t say &#39;AI&#39; so much; say what you really mean</title>
      <link>https://www.bobdc.com/blog/stopsayingai/</link>
      <pubDate>Sun, 25 Jan 2026 09:45:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/stopsayingai/</guid>
      
      
      <description><div>Be more specific to help reduce the hype.</div><div>&lt;p&gt;What do 1960s LISP programs for natural language understanding, 1980s Prolog programs for expert systems, and today&amp;rsquo;s use of large language models have in common? Nothing, really, except they&amp;rsquo;ve all been referred to as Artificial Intelligence.&lt;/p&gt;
&lt;p&gt;AI is not a technology. It&amp;rsquo;s a marketing term that tech industry people have used to discuss many different technologies over the last 60 years. What&amp;rsquo;s different now is the scale at which this marketing term is being used to sell us services. People like Sam Altman talk about generalized artificial intelligence because it gets them lots of media coverage that helps them market the services that are their real goal: to have us all sign up to give them money every month so that their artificial intelligence can write our emails for us and read the emails that other people send us so that we don&amp;rsquo;t have to. (And, of course, to collect lots of ad revenue along the way; look what it did for Google!)&lt;/p&gt;
&lt;p&gt;We could reduce a lot of the silliness out there if these discussions — especially the many overly optimistic and overly pessimistic ones — named the specific technology under discussion instead of calling it &amp;ldquo;AI&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;University of Washington linguistics professor Emily Bender &lt;a href=&#34;https://medium.com/@emilymenonbender/opening-remarks-on-ai-in-the-workplace-new-crisis-or-longstanding-challenge-eb81d1bee9f&#34;&gt;describes the term&amp;rsquo;s marketing role well&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[AI] is a marketing term.  It&amp;rsquo;s a way to make certain kinds of automation sound sophisticated, powerful, or magical and as such it’s a way to dodge accountability by making the machines sound like autonomous thinking entities rather than tools that are created and used by people and companies. It’s also the name of a subfield of computer science concerned with making machines that “think like humans” but even there it was started as a marketing term in the 1950s to attract research funding to that field.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(Interestingly, when I try to find out more about AI as a marketing term, I mostly just find pages — especially advertising — about &lt;a href=&#34;https://en.wikipedia.org/wiki/Artificial_intelligence_marketing&#34;&gt;using AI to help you do marketing&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Bender goes on to say that &amp;ldquo;discussions of this technology become much clearer when we replace the term AI with the word &amp;lsquo;automation&amp;rsquo;&amp;rdquo;. She makes some nice points to support that, but I find it too simplistic to globally replace &amp;ldquo;AI&amp;rdquo; with a single term. We&amp;rsquo;d all communicate our ideas better by naming the specific technology being marketed as AI, which, as I said above, has been a set of different technologies since the term was coined. (I do highly recommend her &lt;a href=&#34;https://www.dair-institute.org/maiht3k/&#34;&gt;Mystery AI Hype Theater 3000&lt;/a&gt; podcast with sociologist Dr. Alex Hanna. The title is a tribute to &lt;a href=&#34;https://en.wikipedia.org/wiki/Mystery_Science_Theater_3000&#34;&gt;Mystery Science Theater 3000&lt;/a&gt;, a late twentieth-century television show in which two hosts added their own hilarious commentary to bad old science fiction movies; Bender and Hanna analyze bad academic papers on AI technology and add their own sarcastic commentary.)&lt;/p&gt;
&lt;p&gt;I see the current meaning of AI as being &amp;ldquo;the use of generative text chatbot interfaces to work with popular large language models&amp;rdquo;. Only about five years ago, AI usually meant the use of &lt;a href=&#34;../semantic-web-semantics-vs-vect/&#34;&gt;vector embedding models&lt;/a&gt; with neural networks (with so many layers that they were &amp;ldquo;deep,&amp;rdquo; so it wasn&amp;rsquo;t just learning; it was &amp;ldquo;deep learning&amp;rdquo;!) to identify patterns that could help people make predictions: was that chest x-ray anomalous? Was that series of financial transactions unusual enough to maybe indicate fraud? Instead of &amp;ldquo;AI&amp;rdquo; I would call that &amp;ldquo;machine learning&amp;rdquo;; the biggest, most successful versions of it led to the Large Language Models that dominate our AI discussions today as they predict which words might make the most sense after a given set of input words.&lt;/p&gt;
&lt;p&gt;To complement a historical approach to the different things that AI has meant over the decades, at the recent &lt;a href=&#34;https://summit.graphwise.ai/graphwise-ai-summit-2025-gm5&#34;&gt;Graphwise AI Summit&lt;/a&gt; I learned from &lt;a href=&#34;https://graphrag.info/2025/07/25/graph-rag-curation-lowering-the-ai-noise-floor-draft/&#34;&gt;Alan Morrison&lt;/a&gt; about the excellent &lt;a href=&#34;https://www.jeffwinterinsights.com/insights/chatgpt-venn-diagram&#34;&gt;Where Does ChatGPT Fit in the Field of AI?&lt;/a&gt; Venn diagram by consultant &lt;a href=&#34;https://www.jeffwinterinsights.com/&#34;&gt;Jeff Winter&lt;/a&gt;:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/jeffWinterAIVennDiagram.png&#34; class=&#34;centered&#34; width=&#34;720&#34; alt=&#34;Jeff Winter AI Venn diagram&#34;/&gt;
&lt;p&gt;Compared with my linear, historical approach, Winter&amp;rsquo;s diagram is more top-down. By describing the various technologies that have been associated with Artificial Intelligence and showing their relationships, the diagram can also help us focus our discussions on what we can get out of which of these technologies rather than just referring to everything in it as &amp;ldquo;AI&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Many references to the term remind me of how non-technical people refer to &amp;ldquo;the&amp;rdquo; cloud as if there were just one. There are multiple clouds out there, and a cloud-based application uses one or more of them: AWS, or maybe Azure or Google Cloud or one of the others. There are many AI-related systems out there, and referring to all of them with one term poses a danger worse than the vagueness of describing &amp;ldquo;the&amp;rdquo; cloud: it lets people think that there is one thing that combines the capabilities of all the different &amp;ldquo;AI&amp;rdquo; technologies. That&amp;rsquo;s dangerous because it feeds the &lt;a href=&#34;https://www.nytimes.com/2025/09/03/opinion/ai-gpt5-rethinking.html&#34;&gt;panic about superhuman &amp;ldquo;generalized&amp;rdquo; AI&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In life in general, using a larger vocabulary helps us express ourselves better. With all the panic and blather about AI these days, using the broader vocabulary available for discussing these technologies can help everyone better appreciate their potential good and bad attributes and then plan for appropriate usage. Saying &amp;ldquo;LLM&amp;rdquo; is more specific; you can be even more specific by saying which LLM: ChatGPT, Claude, Gemini, or whatever. Maybe even better: say the name of the relevant tool. Don&amp;rsquo;t say &amp;ldquo;I&amp;rsquo;ll send you the AI summary of the meeting you missed&amp;rdquo; when you can say &amp;ldquo;I&amp;rsquo;ll send you the Copilot summary of the meeting you missed&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Recently, as I chatted with someone behind a counter while we waited for my credit card to be approved, he told me that his son was studying machine learning at a German university. I thanked him for saying &amp;ldquo;machine learning&amp;rdquo; instead of &amp;ldquo;AI&amp;rdquo;. Being more specific about what we mean makes for clearer communication, especially if we&amp;rsquo;re using an alternative to a term whose meaning keeps changing and means different things to different audiences.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to my &lt;a href=&#34;https://mas.to/@bobdc/115956331408873197&#34;&gt;Mastodon&lt;/a&gt; or &lt;a href=&#34;https://bsky.app/profile/bobdc.bsky.social/post/3mdayr3cz4s2a&#34;&gt;Bluesky&lt;/a&gt; posts announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2026">2026</category>
      
      <category domain="https://www.bobdc.com//categories/ai">AI</category>
      
    </item>
    
    <item>
      <title>My GraphRAG Curator interview</title>
      <link>https://www.bobdc.com/blog/graphragcuratorinterview/</link>
      <pubDate>Tue, 18 Nov 2025 10:15:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/graphragcuratorinterview/</guid>
      
      
      <description><div>Discussing graphs, RAG, GraphRAG, music...</div><div>&lt;p&gt;&lt;a href=&#39;https://graphrag.info/&#39;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/GRClogo.png&#34; style=&#34;margin: 0px 30px 20px 80px;&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;260pt&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.linkedin.com/in/alanmorrison/&#34;&gt;Alan Morrison&lt;/a&gt; of &lt;a href=&#34;https://graphrag.info/&#34;&gt;GraphRAG Curator&lt;/a&gt; recently interviewed me about my work both inside and outside of Graphwise, and it was a lot of fun. You can see it &lt;a href=&#34;https://www.youtube.com/watch?v=OyJPXUR8CXc&#34;&gt;on YouTube&lt;/a&gt; and read a &lt;a href=&#34;https://graphrag.info/2025/11/17/bob-ducharme-what-metadata-helps-businesses-most-with/&#34;&gt;transcript&lt;/a&gt; on the GraphRAG Curator website.&lt;/p&gt;
&lt;p&gt;Reading the transcript later, it felt like a first draft that I&amp;rsquo;m not allowed to edit. So, for example, when Alan asked &amp;ldquo;if there was some really common issue across companies that you knew you could change, what would you change to help us out with the governance problem that we have with data?&amp;rdquo; my answer was &amp;ldquo;It might be a cop-out, but for any given company, what are you trying to do?&amp;rdquo; In other words, I couldn&amp;rsquo;t think of general advice and would find it better to focus on each company&amp;rsquo;s individual goals.&lt;/p&gt;
&lt;p&gt;Later, though, I thought of something that could potentially help all companies, at least from a knowledge graph perspective: doing a complete inventory of all of your organization&amp;rsquo;s information assets might turn up some nice surprises for your enterprise knowledge graph. The first few things that you find will be the main systems that you use most often, but keep looking and you may find things you forgot about that relate to those first few datasets and can enrich your use of them. A little extra metadata for each dataset may be all you need to get them contributing to your knowledge hub.&lt;/p&gt;
&lt;p&gt;For example, let&amp;rsquo;s say a few years ago someone did a special project about a certain subset of their company&amp;rsquo;s inventory and the result was a set of spreadsheets. People might think &amp;ldquo;well, that wasn&amp;rsquo;t official company data but just some back-of-the-napkin experiments, and it would therefore not be in the domain of serious data governance&amp;rdquo;, but maybe it could still contribute in some interesting new ways. After converting the spreadsheets to RDF, which is easy enough, the insights gained from that project could become a long-term part of the more official data about those inventory items, even though these old spreadsheets would typically not be considered one of the company&amp;rsquo;s important assets. The flexibility of RDF makes it simple to add a few new properties to a few entities in your dataset with no need for a schema revision procedure.&lt;/p&gt;
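&lt;p&gt;As a hypothetical sketch (the &lt;code&gt;ex:&lt;/code&gt; prefix and property names here are made up for illustration), that enrichment could be nothing more than a few new triples added alongside the existing data:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# Turtle: two insights from the old spreadsheet project, added to an
# existing inventory item. No schema revision procedure is required;
# the new properties simply coexist with the item&#39;s other triples.
@prefix ex: &amp;lt;http://example.com/inventory/&amp;gt; .

ex:item42 ex:reorderLeadTimeDays 14 ;
          ex:seasonalDemandPeak &#34;Q4&#34; .
&lt;/code&gt;&lt;/pre&gt;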
&lt;p&gt;Here are links to some of the projects that Alan and I discussed, ranging from serious videos about Graphwise GraphDB capabilities to fun things I&amp;rsquo;ve done on my own, but nearly all with some RDF angle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Recent blog entry &lt;a href=&#34;../stopsemanticweb/&#34;&gt;Let&amp;rsquo;s stop saying &amp;ldquo;semantic web&amp;rdquo;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;2021 blog entry &lt;a href=&#34;../dontneedowl/&#34;&gt;You probably don&amp;rsquo;t need OWL&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The video &lt;a href=&#34;https://www.youtube.com/watch?v=dCndx2QJRIQ&#34;&gt;Build a shopping chatbot in four minutes with GraphDB Talk to Your Graph 2.0&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The video &lt;a href=&#34;https://www.youtube.com/watch?v=VDNoYhFdXIM&#34;&gt;Using GraphDB&amp;rsquo;s Lucene Connector&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;My 2017 blog entry &lt;a href=&#34;../sparql-queries-of-beatles-reco/&#34;&gt;SPARQL queries of Beatles recording sessions&lt;/a&gt;, which includes a link to the RDF data that you can download&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;My 2016 blog entry &lt;a href=&#34;../converting-between-midi-and-rd/&#34;&gt;Converting between MIDI and RDF: readable MIDI and more fun with RDF&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;2012 blog entry &lt;a href=&#34;../a-brief-opinionated-history-of/&#34;&gt;A brief, opinionated history of XML&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://iptc.org/standards/sport-schema/&#34;&gt;IPTC&amp;rsquo;s specialization of schema.org for sports data&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Bandcamp&amp;rsquo;s &lt;a href=&#34;https://daily.bandcamp.com/best-contemporary-classical/&#34;&gt;Best contemporary classical&lt;/a&gt; roundups&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Is the Dewey Decimal System available as a SKOS dataset? The W3C&amp;rsquo;s &lt;a href=&#34;https://www.w3.org/2001/sw/wiki/SKOS/Datasets#DDC_Dewey_Decimal_Classification&#34;&gt;SKOS/Datasets&lt;/a&gt; page points to one at &lt;a href=&#34;http://dewey.info&#34;&gt;http://dewey.info&lt;/a&gt;, along with a &lt;a href=&#34;http://dewey.info/sparql.php&#34;&gt;SPARQL endpoint&lt;/a&gt;, but neither of those links seems to work anymore.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I hope the interview is worthwhile if you are interested in knowledge graphs and Graph RAG applications.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to my &lt;a href=&#34;https://mas.to/@bobdc/115571615466394754&#34;&gt;Mastodon&lt;/a&gt; or &lt;a href=&#34;https://bsky.app/profile/bobdc.bsky.social/post/3m5w5lb5s3c2j&#34;&gt;Bluesky&lt;/a&gt; posts announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2025">2025</category>
      
      <category domain="https://www.bobdc.com//categories/ai">AI</category>
      
      <category domain="https://www.bobdc.com//categories/knowledge-graphs">knowledge-graphs</category>
      
    </item>
    
    <item>
      <title>Let&#39;s stop saying &#39;semantic web&#39;</title>
      <link>https://www.bobdc.com/blog/stopsemanticweb/</link>
      <pubDate>Sun, 28 Sep 2025 10:15:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/stopsemanticweb/</guid>
      
      
      <description><div>Like a startup pivot, the technology turned out to be great for things other than a new kind of &#39;web&#39;.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/semwebfordummies.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;240px&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt; 
&lt;p&gt;&amp;ldquo;Semantic web technology&amp;rdquo; refers to technology designed to create something that never got created. That&amp;rsquo;s OK. Lots of great things were and continue to be created.&lt;/p&gt;
&lt;h1 id=&#34;the-world-wide-web&#34;&gt;The (World-Wide) Web&lt;/h1&gt;
&lt;p&gt;Tim Berners-Lee created the original World Wide Web in 1990 by assembling five things that he had developed:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;A simple protocol for different systems to request and deliver resources (usually, text files) over a network: HTTP&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A syntax to uniquely identify such resources: URLs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A markup language to represent simple yet structured documents in these delivered files, in which an important part of the structure was the ability to represent hypertext links to other such documents: HTML&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A program that responded to HTTP requests by sending the requested file or delivering an appropriate HTTP status code if unable to deliver it: a web server&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A program that could request a document from one of these  servers and, if it was an HTML document, render it on a screen with headings, bulleted and numbered lists, hypertext links, and other document components displayed appropriately for humans to read it: a web client&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Eventually, servers loaded with documents full of links, and clients to request and read them, accumulated into the World Wide Web as we know it.&lt;/p&gt;
&lt;h1 id=&#34;the-semantic-web&#34;&gt;The &amp;ldquo;Semantic&amp;rdquo; web&lt;/h1&gt;
&lt;p&gt;The next idea was to build on these things (mostly 1, 2, and 4) to create a web of machine-readable data that would complement the human-readable web of data. The new web would use a simple yet flexible standardized data model (RDF) with universal identifiers modeled on the URLs from component 2 above to make, in addition to documents, machine-readable data available to any program that could use HTTP across the same network used for the World-Wide Web. This might be a browser, but could also be a simple Perl or Python script.&lt;/p&gt;
&lt;p&gt;As a bonus, this new web would use recent knowledge representation advances to store bits of meaning about the terms in any given data model. For example, if the model says that the property &lt;code&gt;locatedIn&lt;/code&gt; is transitive and a certain piece of inventory is located in room A and room A is located in building B, an automated system could &lt;a href=&#34;https://www.bobdc.com/blog/trying-out-blazegraph/&#34;&gt;infer that this piece of inventory is in building B&lt;/a&gt;, and you could get more out of your data.&lt;/p&gt;
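&lt;p&gt;As a rough sketch of that example (the &lt;code&gt;ex:&lt;/code&gt; names below are made up for illustration), a SPARQL property path can follow a chain of &lt;code&gt;locatedIn&lt;/code&gt; triples even without an inferencing engine:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# Sample data in Turtle:
#   ex:item42 ex:locatedIn ex:roomA .
#   ex:roomA  ex:locatedIn ex:buildingB .

PREFIX ex: &amp;lt;http://example.com/&amp;gt;

# The + property path follows one or more locatedIn steps, so ?place
# gets bound to both ex:roomA and ex:buildingB.
SELECT ?place
WHERE { ex:item42 ex:locatedIn+ ?place }
&lt;/code&gt;&lt;/pre&gt;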
&lt;p&gt;A model could also say that Employee is a subclass of Person. This provides some semantics of these terms but was a data modeling capability that most people already took for granted from other systems. The ability to say that creationDate was a subproperty of modifiedDate was new for most people with an object-oriented background, but I have not seen this latter capability get a lot of use. (I&amp;rsquo;m sure there are plenty of usage examples out there, but they don&amp;rsquo;t cross my path much.)&lt;/p&gt;
&lt;p&gt;This ability to store bits of meaning, or semantics, led people to start calling this potential network of data files the &amp;ldquo;semantic&amp;rdquo; web. Over time, not many people and systems took advantage of this aspect of the new web, so it turned out that this was not a great name for it. There were other issues as well.&lt;/p&gt;
&lt;h1 id=&#34;a-web-of-machine-readable-data&#34;&gt;A web of machine-readable data?&lt;/h1&gt;
&lt;p&gt;As with most Internet resources, there are two approaches to making RDF data available on the web: as static files or as dynamically generated data.&lt;/p&gt;
&lt;p&gt;There are plenty of static RDF files out there, but not with connections between them that would form any kind of web the way that HTML files link to each other. In the early days of RDF technology, people talked about &lt;a href=&#34;http://xmlns.com/foaf/spec/&#34;&gt;Friend of a Friend&lt;/a&gt; files as a step toward replacing commercial social networking sites with our own decentralized RDF-based version. In this version, our FOAF file&amp;rsquo;s triples would describe the personal data that we want to make available and also link to the FOAF files of our friends. &lt;a href=&#34;https://snee.com/bob/foaf.ttl&#34;&gt;My FOAF file&lt;/a&gt; is still on the hosting service that I use, and so is &lt;a href=&#34;http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf&#34;&gt;Tim Berners-Lee&amp;rsquo;s&lt;/a&gt;; as they show, we&amp;rsquo;re both friends of Norm Walsh, making Tim a friend of my friend. In my book &lt;a href=&#34;https://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt;, example &lt;a href=&#34;https://www.learningsparql.com/2ndeditionexamples/ex166.rq&#34;&gt;ex166.rq&lt;/a&gt; uses Berners-Lee&amp;rsquo;s FOAF file to demonstrate how the &lt;code&gt;FROM&lt;/code&gt; keyword can retrieve data from remote resources.&lt;/p&gt;
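&lt;p&gt;A minimal FOAF file (with a made-up person and URL for illustration) shows how these links between files worked: a &lt;code&gt;foaf:knows&lt;/code&gt; triple describes the friend, and an &lt;code&gt;rdfs:seeAlso&lt;/code&gt; triple points to the friend&amp;rsquo;s own FOAF file:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .

# &amp;lt;#me&amp;gt; is the person that this FOAF file describes.
&amp;lt;#me&amp;gt; a foaf:Person ;
    foaf:name &#34;Alice Example&#34; ;
    foaf:knows [ a foaf:Person ;
                 foaf:name &#34;Bob Example&#34; ;
                 rdfs:seeAlso &amp;lt;https://example.org/bob/foaf.ttl&amp;gt; ] .
&lt;/code&gt;&lt;/pre&gt;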
&lt;p&gt;But, as I wrote &lt;a href=&#34;https://www.bobdc.com/blog/replace-facebook-with-foaf-twi/&#34;&gt;fifteen years ago&lt;/a&gt;, &amp;ldquo;actual FOAF files have been used for little more than demos&amp;rdquo;, and I don&amp;rsquo;t know of any other sets of RDF files available on the public web that form any kind of useful network. (I should mention that Berners-Lee&amp;rsquo;s &lt;a href=&#34;https://en.wikipedia.org/wiki/Solid_(web_decentralization_project)&#34;&gt;Solid&lt;/a&gt; project is doing interesting work helping people and organizations to share data in ways that let them control their privacy, all with RDF underneath.)&lt;/p&gt;
&lt;p&gt;The best way to share RDF data dynamically is by making it available over a SPARQL endpoint. &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:Main_Page&#34;&gt;Wikidata&lt;/a&gt; and &lt;a href=&#34;https://www.dbpedia.org/&#34;&gt;DBpedia&lt;/a&gt; are two amazing examples of SPARQL endpoints that have given so much to so many people in so many disciplines. For many people, their whole motivation for learning SPARQL was to gain programmatic access to the data that these two endpoints provide.&lt;/p&gt;
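&lt;p&gt;A tiny example of the kind of query that draws people to these endpoints (runnable at the Wikidata Query Service, which predefines the &lt;code&gt;wd:&lt;/code&gt; and &lt;code&gt;wdt:&lt;/code&gt; prefixes):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# List ten resources that are an instance of (P31) house cat (Q146).
SELECT ?cat
WHERE { ?cat wdt:P31 wd:Q146 . }
LIMIT 10
&lt;/code&gt;&lt;/pre&gt;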
&lt;p&gt;But, despite the growth of the &lt;a href=&#34;https://lod-cloud.net/&#34;&gt;Linked Data Cloud&lt;/a&gt; over the years, outside of the life sciences world there don&amp;rsquo;t seem to be many endpoints available besides Wikidata and DBpedia anymore. Sure, there are some, but classic ones from brand-name organizations like &lt;a href=&#34;http://data.nytimes.com/&#34;&gt;http://data.nytimes.com/&lt;/a&gt; and &lt;a href=&#34;http://nasataxonomy.jpl.nasa.gov/fordevelopers/&#34;&gt;http://nasataxonomy.jpl.nasa.gov/fordevelopers/&lt;/a&gt; no longer work. After clicking many random nodes on the Linked Data Cloud diagram to find the relevant endpoints, I see the &amp;ldquo;!&amp;rdquo; failure triangle icon on nearly every SPARQL endpoint&amp;rsquo;s entry, like this:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/resource-unavailable.png&#34; class=&#34;centered&#34; width=&#34;500px&#34;/&gt; 
&lt;p&gt;Of course, organizations like the &lt;a href=&#34;https://www.weather.gov/documentation/services-web-api&#34;&gt;U.S. Weather Service&lt;/a&gt; that dynamically generate machine-readable data these days usually generate plain JSON, and rarely &lt;a href=&#34;https://json-ld.org/&#34;&gt;JSON-LD&lt;/a&gt;. There are no persistent URIs or references to other persistent URIs. Each one is basically an API to a silo, but as public APIs they still perform a valuable service. They&amp;rsquo;re just not part of any kind of &amp;ldquo;semantic web&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;There are still some nice SKOS and data model RDF datasets available, like those at the &lt;a href=&#34;https://id.loc.gov/&#34;&gt;Library of Congress&lt;/a&gt; and &lt;a href=&#34;https://agrovoc.fao.org/&#34;&gt;AGROVOC&lt;/a&gt;; these are good to push because they encourage data interoperability. And, &lt;a href=&#34;https://schema.org/&#34;&gt;schema.org&lt;/a&gt; has provided an excellent, RDF-based data model that (because of the ease with which RDF data models can be extended) has provided many data projects out there with a nice dose of interoperability.&lt;/p&gt;
&lt;h1 id=&#34;the-pivot&#34;&gt;The pivot&lt;/h1&gt;
&lt;p&gt;Does this mean that the technology was a failure? Not at all. I see it as a very successful &lt;a href=&#34;https://en.wikipedia.org/wiki/Lean_startup#Pivot&#34;&gt;pivot&lt;/a&gt;—a re-application of one or more of an organization&amp;rsquo;s technologies to a new domain to address new use cases, like &lt;a href=&#34;https://philmckinney.medium.com/i-completely-wrote-off-twitter-as-the-stupidest-pivot-ever-e2851f4b117d&#34;&gt;Twitter&lt;/a&gt; and &lt;a href=&#34;https://en.wikipedia.org/wiki/Slack_(software)#History&#34;&gt;Slack&lt;/a&gt; did in their early days. Instead of creating a web of interconnected public RDF datasets that are spread around the world, organizations have created their own internal webs that aren&amp;rsquo;t necessarily &amp;ldquo;semantic&amp;rdquo; but make it easier to share and evolve collections of datasets within those organizations. The data may be stored locally or using a cloud provider like &lt;a href=&#34;https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-format-rdf.html&#34;&gt;AWS&lt;/a&gt;. There are a wide variety of commercial and open source triplestores to store this data as well as tools to create SPARQL endpoint front ends to relational database managers so that legacy relational data can contribute to these internal webs.&lt;/p&gt;
&lt;p&gt;How about the term &amp;ldquo;Linked Data&amp;rdquo;? There isn&amp;rsquo;t much public RDF data to link to outside of the data behind the endpoints mentioned above. But the globally unique nature of URIs makes it easy for a resource&amp;rsquo;s triples within one dataset to reference a resource in any other accessible dataset, which may be public or may be stored on another server behind the same firewall as the referencing document. I think that &amp;ldquo;Linked Data&amp;rdquo; is a perfectly good term to describe the modern version of this, but it is starting to feel a bit old.&lt;/p&gt;
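&lt;p&gt;As a sketch (the &lt;code&gt;ex:&lt;/code&gt; names are made up; the DBpedia URI stands in for whatever public resource a local triple might point at), such a cross-dataset reference is just a triple whose object lives in someone else&amp;rsquo;s dataset:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix ex:  &amp;lt;http://example.com/ourdata/&amp;gt; .
@prefix dbr: &amp;lt;http://dbpedia.org/resource/&amp;gt; .

# A local resource referencing a resource in a remote public dataset.
ex:office1 ex:locatedInCity dbr:Charlottesville .
&lt;/code&gt;&lt;/pre&gt;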
&lt;p&gt;Twelve years ago I &lt;a href=&#34;http://www.bobdc.com/blog/coming-soon-new-expanded-editi/&#34;&gt;described&lt;/a&gt; how the second edition of my book &amp;ldquo;Learning SPARQL&amp;rdquo; had &amp;ldquo;55% more pages! 23% fewer mentions of the semantic web!&amp;rdquo; This was a bit of a joke, but even then I could see the pattern of RDF technologies (the term that I was &lt;a href=&#34;https://www.bobdc.com/blog/selling-rdf-technology-to-big/&#34;&gt;already using&lt;/a&gt; instead of  &amp;ldquo;semantic web&amp;rdquo;) doing quite well without being focused on a semantic version of the World Wide Web.&lt;/p&gt;
&lt;p&gt;After I had nearly finished drafting this blog entry, I watched a &lt;a href=&#34;https://www.youtube.com/watch?v=HkeudvN0u_c&#34;&gt;YouTube interview with Andreas Blumauer&lt;/a&gt;, the Senior Vice President of Growth at my employer &lt;a href=&#34;https://graphwise.ai/&#34;&gt;Graphwise&lt;/a&gt; (and the former CEO and co-founder of PoolParty, which merged with Ontotext to form Graphwise). At one point he says &amp;ldquo;It shouldn&amp;rsquo;t be called &amp;lsquo;semantic web&amp;rsquo; anymore but &amp;lsquo;semantic enterprise standards&amp;rsquo; because we see a lot of that adoption of RDF technologies, SKOS, SPARQL &amp;ndash; all that is now really all around in enterprises which want to implement an enterprise knowledge graph. So that&amp;rsquo;s pretty much the same technology under the hood and enterprise knowledge graphs have adopted the semantic web standards.&amp;rdquo; It was nice to see him confirm these ideas that I had been drafting notes about. The public knowledge graph (as we would now call it) that was part of the dream of the semantic web didn&amp;rsquo;t really happen, but a large, growing number of enterprises are seeing the benefits of having their own enterprise knowledge graphs. (Also, he and I have never discussed this, so it was nice to see him use my preferred term of &amp;ldquo;RDF technologies&amp;rdquo; there.)&lt;/p&gt;
&lt;h1 id=&#34;semantics-sneaking-in-there-after-all&#34;&gt;Semantics sneaking in there after all?&lt;/h1&gt;
&lt;p&gt;Combining the text analysis technology that both PoolParty and Ontotext brought to the merger with LLMs has given the &amp;ldquo;meaning&amp;rdquo; of words a growing role in the resulting applications as we use knowledge graphs to address tasks like hallucination minimization. This is a different kind of semantics from the &amp;ldquo;semantic web&amp;rdquo; kind, as I explained in &lt;a href=&#34;https://www.bobdc.com/blog/semantic-web-semantics-vs-vect/&#34;&gt;Semantic web semantics vs. vector embedding machine learning semantics&lt;/a&gt; nine years ago, and it&amp;rsquo;s the kind that provides the foundation for the large language model usage that is apparently what &amp;ldquo;AI&amp;rdquo; means these days. As with other aspects of the pivot, it&amp;rsquo;s not really what was planned, but it has worked out well, making all kinds of great applications possible without much usage of the OWL classes and properties that were originally supposed to provide the semantic web&amp;rsquo;s semantics.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to my &lt;a href=&#34;https://mas.to/@bobdc/115282413624037756&#34;&gt;Mastodon&lt;/a&gt; or &lt;a href=&#34;https://bsky.app/profile/bobdc.bsky.social/post/3lzvppedyis26&#34;&gt;Bluesky&lt;/a&gt; posts announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2025">2025</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Correcting some outdated &#34;Learning SPARQL&#34; examples</title>
      <link>https://www.bobdc.com/blog/updating-2nd-ed-examples/</link>
      <pubDate>Sun, 27 Jul 2025 12:15:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/updating-2nd-ed-examples/</guid>
      
      
      <description><div>Revising some queries to accommodate revised data.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/2ndEdCoverBig.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;240px&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt; 
&lt;p&gt;O&amp;rsquo;Reilly books such as &lt;a href=&#34;https://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt; have an &lt;a href=&#34;https://www.oreilly.com/catalog/errata.csp?isbn=0636920030829&#34;&gt;errata&lt;/a&gt; page where anyone can submit corrections for the book, and I appreciate all entries. Some are just basic typos, which is embarrassing. Some are examples that no longer work because a certain SPARQL endpoint is no longer up or, in several cases, because &lt;a href=&#34;https://www.dbpedia.org/&#34;&gt;DBpedia&lt;/a&gt; entries got revised to describe resources using different properties than they did when the book was published.&lt;/p&gt;
&lt;p&gt;For example, page 50 of the book&amp;rsquo;s second edition showed this example, which no longer works at &lt;a href=&#34;https://dbpedia.org/snorql/&#34;&gt;DBpedia&amp;rsquo;s SNORQL interface&lt;/a&gt; to the site&amp;rsquo;s SPARQL endpoint.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# filename: ex048.rq
# See update in ex048a.rq.

PREFIX d: &amp;lt;http://dbpedia.org/ontology/&amp;gt;

SELECT ?artistName ?albumName 
WHERE
{
  ?album d:producer :Timbaland .
  ?album d:musicalArtist ?artist . 
  ?album rdfs:label ?albumName . 
  ?artist rdfs:label ?artistName . 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;a href=&#34;http://dbpedia.org/ontology/musicalArtist&#34;&gt;http://dbpedia.org/ontology/musicalArtist&lt;/a&gt; property still exists, but DBpedia no longer uses it to describe Timbaland. Also, I shouldn&amp;rsquo;t have assumed that everything he produced was an album; some are singles, so I adjusted a variable name as well. The following now works in SNORQL:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# filename: ex048a.rq - 2025-07-23 update of ex048.rq
 
PREFIX d: &amp;lt;http://dbpedia.org/ontology/&amp;gt;

SELECT ?artistName ?musicalWork
WHERE
{
  ?album d:producer :Timbaland .
  ?album d:artist ?artist . 
  ?album rdfs:label ?musicalWork . 
  ?artist rdfs:label ?artistName . 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The comments at the top of the two queries above describe the relationship of the original and revised versions. As the &lt;a href=&#34;https://www.learningsparql.com/2ndeditionexamples/readme.txt&#34;&gt;readme.txt&lt;/a&gt; file that I included with the examples says, &amp;ldquo;A file whose name has the form exNNNa.rq is a 2025 update of the file exNNN.rq from the book&amp;rsquo;s second edition examples.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I have done this for five examples so far and will be going through my notes and the submitted errata to create more corrected versions to make available on the learningsparql.com &lt;a href=&#34;https://www.learningsparql.com/2ndeditionexamples/index.html&#34;&gt;examples&lt;/a&gt; page. You should see more &lt;code&gt;exNNNa.rq&lt;/code&gt; files showing up on that page over the next few months.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to my &lt;a href=&#34;https://mas.to/@bobdc/114926095348240889&#34;&gt;Mastodon&lt;/a&gt; or &lt;a href=&#34;https://bsky.app/profile/bobdc.bsky.social/post/3luxi7gejr22o&#34;&gt;Bluesky&lt;/a&gt; posts announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2025">2025</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>ChatGPT and Copilot as OWL processors</title>
      <link>https://www.bobdc.com/blog/chatgpt-copilot-owl/</link>
      <pubDate>Sun, 25 May 2025 11:20:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/chatgpt-copilot-owl/</guid>
      
      
      <description><div>Pretty impressive.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/owlChatgptCopilot.png&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;280px&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;I asked &lt;a href=&#34;https://en.wikipedia.org/wiki/ChatGPT&#34;&gt;ChatGPT&lt;/a&gt; and &lt;a href=&#34;https://en.wikipedia.org/wiki/Microsoft_Copilot&#34;&gt;Copilot&lt;/a&gt; to parse my two favorite home-grown OWL examples, do the appropriate inferencing, and show me the results, and I was impressed.&lt;/p&gt;
&lt;p&gt;First of all, &lt;a href=&#34;../dontneedowl&#34;&gt;you probably don&amp;rsquo;t need OWL&lt;/a&gt;, but I have one example about querying furniture inventory and another about querying people that I think nicely demonstrate the value of OWL for certain use cases. The first, which I created when &lt;a href=&#34;../trying-out-blazegraph&#34;&gt;Trying out Blazegraph&lt;/a&gt; and have used for other demos since, has triples that identify some chairs and desks and say what rooms they are each in. It also says which rooms are in which buildings. These triples aren&amp;rsquo;t enough to tell a SPARQL processor which furniture is in building 101, but if we have a triple saying that &lt;code&gt;dm:locatedIn&lt;/code&gt; is a transitive property, and we have two more triples telling us that &lt;code&gt;dm:Chair&lt;/code&gt; and &lt;code&gt;dm:Desk&lt;/code&gt; are both subclasses of &lt;code&gt;dm:Furniture&lt;/code&gt;, then an OWL processor can tell us which furniture is in which building. Blazegraph handled it fine at the time, and GraphDB and several other processors have done so since.&lt;/p&gt;
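&lt;p&gt;The data itself isn&amp;rsquo;t reproduced here, but the following Turtle sketch shows the general shape of the triples just described. (The resource names come from the ChatGPT exchange below; the exact triples in the original file may differ.)&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# A sketch of the furniture data&amp;rsquo;s general shape; not the literal file.
@prefix dm:   &amp;lt;http://learningsparql.com/ns/demo#&amp;gt; .
@prefix d:    &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix owl:  &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .

# The model triples that enable the inferencing:
dm:locatedIn a owl:TransitiveProperty .
dm:Chair rdfs:subClassOf dm:Furniture .
dm:Desk  rdfs:subClassOf dm:Furniture .

# Sample instance triples:
d:chair35 a dm:Chair ;
          dm:locatedIn d:room202 .
d:desk59  a dm:Desk ;
          dm:locatedIn d:room202 .
d:room202 dm:locatedIn d:building200 .
&lt;/code&gt;&lt;/pre&gt;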
&lt;p&gt;ChatGPT and Copilot both did this quite well. It was especially nice to see them explain the implications of the distinction between a regular SPARQL processor and one that uses inferencing.&lt;/p&gt;
&lt;h2 id=&#34;understanding-the-owl-properties-used-to-describe-my-data&#34;&gt;Understanding the OWL properties used to describe my data&lt;/h2&gt;
&lt;p&gt;Below is an edited version of my interaction with ChatGPT about the furniture example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Here is some RDF data in the turtle syntax that uses RDFS and OWL to define a data model. The RDF data also includes some instance data. I am going to ask you some questions about it:&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I then pasted the furniture data mentioned above; here is the response:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Thanks for sharing the RDF data. It defines a simple ontology involving buildings, rooms, and furniture, and models where each item is located using a transitive property dm:locatedIn.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;I’m ready for your questions—what would you like to ask about this data?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;List any furniture in building 200.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Furniture and Their Locations:&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;d:chair35 → locatedIn d:room202 → locatedIn d:building200&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;d:desk59 → locatedIn d:room202 → locatedIn d:building200&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;ChatGPT also explained its logic for getting that answer, which I won&amp;rsquo;t reproduce here to get to the interesting part faster. In an earlier version of this exchange, at this point I asked:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Can you show me a SPARQL query that is the equivalent of my question about which chairs are in building 200?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It responded with this and some additional explanation:&lt;/p&gt;
&lt;pre&gt;&lt;b&gt;PREFIX d: &lt;http://learningsparql.com/ns/data#&gt;
PREFIX dm: &lt;http://learningsparql.com/ns/demo#&gt;
PREFIX rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;

SELECT ?chair
WHERE {
  ?chair rdf:type dm:Chair .
  ?chair dm:locatedIn+ d:building200 .
}
&lt;/b&gt;&lt;/pre&gt;
&lt;p&gt;(It&amp;rsquo;s a good SPARQL habit to have the first triple pattern specify the class of the returned data!) I then asked:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How would that return any results if the input data didn&amp;rsquo;t have any triples of the form ?chair dm:locatedIn+ d:building200 ?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Awesome question — and you&amp;rsquo;re absolutely right to question that!&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;You&amp;rsquo;re pointing out something subtle but important: SPARQL doesn&amp;rsquo;t do OWL reasoning (like transitive inference) by default unless you either:&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Use a triplestore or reasoner that supports OWL inference (like Apache Jena/Fuseki with reasoning enabled, or GraphDB, Stardog, etc.).&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explicitly model the transitive paths in the data using property paths (as in dm:locatedIn+) — but only if those intermediate triples exist in the data.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And then the explanations continued. I was happy to see it spell out the distinction between SPARQL querying that takes advantage of inferencing and SPARQL querying that doesn&amp;rsquo;t, because this is an important point when learning about RDF technology. I was also happy to see it mention GraphDB (the product that I spend most of my professional time documenting or &lt;a href=&#34;https://www.youtube.com/watch?v=dCndx2QJRIQ&#34;&gt;making videos about&lt;/a&gt;) as one of the main tools that can do SPARQL querying with inferencing.&lt;/p&gt;
&lt;p&gt;Trying all of this out with Copilot gave me roughly the same results.&lt;/p&gt;
&lt;h2 id=&#34;inferring-class-membership&#34;&gt;Inferring class membership&lt;/h2&gt;
&lt;p&gt;A very old-school use of OWL is to declare classes and then, instead of listing which instances belong to which classes (which is what most systems with any notion of classes and instances do), to list conditions for membership in each class and then query to see what their instances are.&lt;/p&gt;
&lt;p&gt;OWL classes that are defined by describing which instances meet which conditions are known as restriction classes. In some cases, these conditions may be so complex that the calculations must run overnight. This led to various OWL profiles that standardized on specific subsets of inferencing capabilities so that implementations optimized for certain inferencing tasks wouldn&amp;rsquo;t necessarily take so long to run.&lt;/p&gt;
&lt;p&gt;Example &lt;a href=&#34;https://www.learningsparql.com/2ndeditionexamples/ex424.ttl&#34;&gt;ex424.ttl&lt;/a&gt; from my book &lt;a href=&#34;https://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt; demonstrates the use of restriction classes by listing several musicians, where they are from, and what instruments they play, with no explicit indication of any of them being members of classes. The dataset includes a &lt;code&gt;dm:Guitarist&lt;/code&gt; class defined as someone who plays the guitar and a &lt;code&gt;dm:Texan&lt;/code&gt; class defined as someone who has &amp;ldquo;TX&amp;rdquo; as their &lt;code&gt;dm:stateOfBirth&lt;/code&gt; value. A third class, &lt;code&gt;dm:TexasGuitarPlayer&lt;/code&gt;, is defined as the intersection of the first two classes. (Set operations are a popular tool for defining OWL restriction classes.)&lt;/p&gt;
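&lt;p&gt;The following Turtle sketch shows roughly how classes like these can be declared. It illustrates the technique rather than reproducing the literal contents of &lt;code&gt;ex424.ttl&lt;/code&gt;, so details such as the &lt;code&gt;dm:plays&lt;/code&gt; property name are assumptions.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# Illustrative sketch only; property names such as dm:plays are assumed.
@prefix dm:  &amp;lt;http://learningsparql.com/ns/demo#&amp;gt; .
@prefix d:   &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix owl: &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt; .

dm:Guitarist owl:equivalentClass
  [ a owl:Restriction ;
    owl:onProperty dm:plays ;
    owl:hasValue d:Guitar ] .

dm:Texan owl:equivalentClass
  [ a owl:Restriction ;
    owl:onProperty dm:stateOfBirth ;
    owl:hasValue &amp;#34;TX&amp;#34; ] .

# Defined with a set operation: the intersection of the two classes above.
dm:TexasGuitarPlayer owl:equivalentClass
  [ owl:intersectionOf ( dm:Guitarist dm:Texan ) ] .
&lt;/code&gt;&lt;/pre&gt;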
&lt;p&gt;I told Copilot the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The following is an RDF OWL model with some instance data, written in the Turtle syntax. Perform any necessary inferencing with this as input and then I will ask you some questions about the resulting triples.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;I&amp;rsquo;d be happy to help with RDF OWL inferencing! Please provide the Turtle syntax model, and I&amp;rsquo;ll process it accordingly. Once I&amp;rsquo;ve performed the necessary inferencing, you can ask me questions about the resulting triples.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I then pasted the &lt;code&gt;ex424.ttl&lt;/code&gt; content. Here is how it responded:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/copilotAndTXGuitarPlayers.png&#34; alt=&#34;[Copilot result of OWL inferencing]&#34; border=&#34;0&#34; /&gt;
&lt;p&gt;I had one more request for it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;List the instances of &lt;code&gt;http://learningsparql.com/ns/demo#TexasGuitarPlayer&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We&amp;rsquo;ve already seen the answer above, but I wanted to see if it understood that &lt;code&gt;http://learningsparql.com/ns/demo#TexasGuitarPlayer&lt;/code&gt; was the full URI version of &lt;code&gt;dm:TexasGuitarPlayer&lt;/code&gt;. It did:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Based on the inferencing we performed, the instances of &lt;code&gt;dm:TexasGuitarPlayer&lt;/code&gt; (&lt;code&gt;http://learningsparql.com/ns/demo#TexasGuitarPlayer&lt;/code&gt;) are:&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;d:m2&lt;/code&gt; (Charlie Christian)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;d:m6&lt;/code&gt; (Roky Erickson)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;These musicians are both born in Texas (&lt;code&gt;d:TX&lt;/code&gt;) and play guitar (&lt;code&gt;d:Guitar&lt;/code&gt;), fulfilling the criteria for being classified as &lt;code&gt;dm:TexasGuitarPlayer&lt;/code&gt;. Let me know if you’d like to explore further details or additional queries!&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(As a tech writer, I was happy to see it put the prefixed names in a monospaced font.) With the same input, ChatGPT inferred the &lt;code&gt;dm:TexasGuitarPlayer&lt;/code&gt; class membership just fine.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re a really big OWL fan, it would be interesting to push the boundaries of the OWL profiles that I described above by trying out their most sophisticated features with ChatGPT, Copilot, and other LLM-based chat tools.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to my &lt;a href=&#34;https://mas.to/@bobdc/114569318555462241&#34;&gt;Mastodon&lt;/a&gt; or &lt;a href=&#34;https://bsky.app/profile/bobdc.bsky.social/post/3lpz224et522j&#34;&gt;Bluesky&lt;/a&gt; posts announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2025">2025</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/owl">OWL</category>
      
      <category domain="https://www.bobdc.com//categories/ai">AI</category>
      
    </item>
    
    <item>
      <title>Converting RDFS schemas to SHACL constraints</title>
      <link>https://www.bobdc.com/blog/rdfs2shacl/</link>
      <pubDate>Sun, 09 Mar 2025 11:10:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdfs2shacl/</guid>
      
      
      <description><div>With SPARQL, of course.</div><div>&lt;p&gt;(This may look like a long blog entry, but it&amp;rsquo;s mostly sample schemas, data, and shapes. It should be a quick read.)&lt;/p&gt;
&lt;blockquote id=&#34;id202455&#34; class=&#34;pullquote&#34;&gt;if RDF technology uses triples to express &lt;i&gt;everything&lt;/i&gt; (except queries), why not automate the creation of SHACL constraints from RDF schema declarations?&lt;/blockquote&gt;
&lt;p&gt;In a blog entry titled &lt;a href=&#34;../whatisrdfspart2/&#34;&gt;What else can I do with RDFS?&lt;/a&gt; I described how the triple &lt;code&gt;{ vcard:given-name rdfs:domain emp:Person }&lt;/code&gt; lets us infer that a resource with a &lt;code&gt;vcard:given-name&lt;/code&gt; value is an instance of class &lt;code&gt;emp:Person&lt;/code&gt;. I then wrote &amp;ldquo;Sometimes we forget that RDFS and OWL were invented to enable this kind of inferencing across data found on the web. They were not invented to help us define data structures, but as I’ve shown, RDFS is handy to at least document them.&amp;rdquo;&lt;/p&gt;
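&lt;p&gt;Spelled out as triples, that inference looks something like this (the instance triple is invented for illustration):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# Schema triple:
vcard:given-name rdfs:domain emp:Person .

# Hypothetical instance triple:
d:x vcard:given-name &amp;#34;Alice&amp;#34; .

# What an RDFS reasoner can infer from the two triples above:
d:x a emp:Person .
&lt;/code&gt;&lt;/pre&gt;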
&lt;p&gt;In the data processing world, the purpose of schemas is usually to describe the structure of some data so that a person or process working with that data knows what to expect. If a standard automated process flags parts of the data that don&amp;rsquo;t comply with the schema, that&amp;rsquo;s a Good Thing—it means that the person working with the data doesn&amp;rsquo;t need to write error-checking code to do that.&lt;/p&gt;
&lt;p&gt;As I described above, this was not the reason for RDF schemas, but they&amp;rsquo;ve still been a handy way to describe the structure of a given dataset. Using these schemas for error checking is not an incorrect use of them; &lt;a href=&#34;https://www.w3.org/TR/rdf11-schema/#ch_domainrange&#34;&gt;section 4 of the RDF Schema specification&lt;/a&gt; tells us &amp;ldquo;Different applications will use this information in different ways. For example, data checking tools might use this to help discover errors in some data set, an interactive editor might suggest appropriate values, and a reasoning application might use it to infer additional information from instance data.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Some people thought that OWL would make it easier to describe these constraints, but when it came to actually enforcing them, OWL just made things more complicated. So, the W3C eventually published the &lt;a href=&#34;https://www.bobdc.com/blog/validating-rdf-data-with-shacl/&#34;&gt;Shapes Constraint Language&lt;/a&gt; standard, or SHACL. This makes it relatively easy to specify typical constraints such as &amp;ldquo;an instance of Employee must have a family name and given name value&amp;rdquo; and &amp;ldquo;an instance of employee must have another employee instance as its &lt;code&gt;emp:reportsTo&lt;/code&gt; value&amp;rdquo;.&lt;/p&gt;
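&lt;p&gt;A SHACL shape expressing those two constraints might look something like this sketch, where the &lt;code&gt;emp:&lt;/code&gt; namespace and the name properties are assumptions made for the example:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# Hypothetical sketch; emp: namespace and name property names are assumed.
@prefix sh:  &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .
@prefix emp: &amp;lt;http://www.example.com/ns/employees#&amp;gt; .

emp:EmployeeShape a sh:NodeShape ;
  sh:targetClass emp:Employee ;
  sh:property [ sh:path emp:familyName ; sh:minCount 1 ] ;
  sh:property [ sh:path emp:givenName  ; sh:minCount 1 ] ;
  sh:property [ sh:path emp:reportsTo  ; sh:class emp:Employee ] .
&lt;/code&gt;&lt;/pre&gt;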
&lt;p&gt;If I want to write out a list of classes and properties that are in a given dataset, though, it&amp;rsquo;s still much simpler with RDFS. Then I had an idea: if RDF technology uses triples to express &lt;em&gt;everything&lt;/em&gt; (except queries), why not automate the creation of SHACL constraints from RDF schema declarations? It turned out to be surprisingly easy.&lt;/p&gt;
&lt;p&gt;Here is a sample schema excerpt for a community orchestra. It declares classes for musicians and instruments and describes two properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the &lt;code&gt;m:Musician&lt;/code&gt; class&amp;rsquo;s &lt;code&gt;m:plays&lt;/code&gt; property, whose value is an instance of &lt;code&gt;m:Instrument&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;the same class&amp;rsquo;s &lt;code&gt;m:joined&lt;/code&gt; property, which shows the date that the musician joined the group&lt;/li&gt;
&lt;/ul&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix m:    &amp;lt;http://learningsparql.com/ns/music#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix xsd:  &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .

m:Musician a rdfs:Class . 
m:Instrument a rdfs:Class .

m:plays a rdf:Property ;
        rdfs:domain m:Musician ;
        rdfs:range m:Instrument . 

m:joined a rdf:Property ; 
         rdfs:domain m:Musician ;
         rdfs:range xsd:date . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;My goal was to write a SPARQL &lt;code&gt;CONSTRUCT&lt;/code&gt; query that created SHACL shapes from the schema above to flag the following errors when they come up in instance data:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an &lt;code&gt;m:plays&lt;/code&gt; triple whose value was not an instance of &lt;code&gt;m:Instrument&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;an &lt;code&gt;m:joined&lt;/code&gt; triple whose value was not a proper ISO 8601 date&lt;/li&gt;
&lt;li&gt;a musician with more than one &lt;code&gt;m:joined&lt;/code&gt; value&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The query that creates these shapes should not be about this specific data but work more generally with other object and datatype property values. Having this work with both &lt;a href=&#34;https://www.w3.org/TR/owl-ref/#Property&#34;&gt;object properties and datatype properties&lt;/a&gt; was very important for handling a wide variety of data structures.&lt;/p&gt;
&lt;p&gt;A query to do this was briefer than I thought it would be:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX xsd:  &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;
PREFIX sh:   &amp;lt;http://www.w3.org/ns/shacl#&amp;gt;

CONSTRUCT {
  ?class a sh:NodeShape ;
  sh:targetClass ?class ;
  sh:property [
     a sh:PropertyShape ; 
     sh:path ?property ;
     ?rangePredicate ?propertyRange ;
     sh:minCount 1 ;
     sh:maxCount 1 
   ]
}
WHERE {
  ?class a rdfs:Class .
  ?property rdfs:domain ?class ;
            rdfs:range ?propertyRange .
  BIND(IF(contains(xsd:string(?propertyRange),
          &amp;#34;http://www.w3.org/2001/XMLSchema#&amp;#34;),
	  sh:datatype, sh:class) AS ?rangePredicate) . 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I won&amp;rsquo;t describe the details of the SHACL syntax that it creates, because you can look that up yourself. The only somewhat tricky part of the query was identifying whether a declared property was a datatype property or an object property. The &lt;code&gt;IF()&lt;/code&gt; call that does this assumes that if a property is not a datatype property, it&amp;rsquo;s an object property; if you have more complex data, you can nest &lt;code&gt;IF()&lt;/code&gt; function calls to cover more complex cases.&lt;/p&gt;
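&lt;p&gt;Such nesting might look something like the following fragment, which checks for a second, custom datatype namespace before falling back to &lt;code&gt;sh:class&lt;/code&gt;. The &lt;code&gt;my-datatypes&lt;/code&gt; namespace is invented for illustration:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# Hypothetical sketch; the my-datatypes namespace is invented.
BIND(IF(contains(xsd:string(?propertyRange),
        &amp;#34;http://www.w3.org/2001/XMLSchema#&amp;#34;),
        sh:datatype,
        IF(contains(xsd:string(?propertyRange),
           &amp;#34;http://example.com/ns/my-datatypes#&amp;#34;),
           sh:datatype,
           sh:class)) AS ?rangePredicate) .
&lt;/code&gt;&lt;/pre&gt;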
&lt;p&gt;The query adds &lt;code&gt;sh:minCount&lt;/code&gt; and &lt;code&gt;sh:maxCount&lt;/code&gt; values of 1 for all properties so that each property is required and can have only one value. An orchestra member may actually play more than one instrument, so the SHACL shapes that this query outputs can be easily edited to account for that. For me, the real value of the query above is to automate the creation of the shapes and their relationships, leaving me to do easy things like adjusting the count values by hand.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s see it in action. Here are the SHACL shapes that the &lt;code&gt;CONSTRUCT&lt;/code&gt; query created from the musician schema above:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix m:    &amp;lt;http://learningsparql.com/ns/music#&amp;gt; .
@prefix rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix sh:   &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .
@prefix xsd:  &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .

m:Musician  rdf:type    sh:NodeShape;
        sh:property     [ rdf:type     sh:PropertyShape;
                          sh:datatype  xsd:date;
                          sh:maxCount  1;
                          sh:minCount  1;
                          sh:path      m:joined
                        ];
        sh:property     [ rdf:type     sh:PropertyShape;
                          sh:class     m:Instrument;
                          sh:maxCount  1;
                          sh:minCount  1;
                          sh:path      m:plays
                        ];
        sh:targetClass  m:Musician .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Do these shapes do what they&amp;rsquo;re supposed to do? In the following sample data, the musician kim has two different &lt;code&gt;m:joined&lt;/code&gt; values. Musician pat has only one, but it&amp;rsquo;s not a proper ISO 8601 date. Also, pat has an &lt;code&gt;m:plays&lt;/code&gt; value of &lt;code&gt;m:kim&lt;/code&gt;, which is not an instance of the &lt;code&gt;m:Instrument&lt;/code&gt; class.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix m: &amp;lt;http://learningsparql.com/ns/music#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix sh: &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .

### instance data ###

m:guitar a m:Instrument .

m:piano a m:Instrument . 

m:kim a m:Musician ;
   m:joined &amp;#34;2024-10-12&amp;#34;^^xsd:date ;
   m:joined &amp;#34;2024-10-13&amp;#34;^^xsd:date ;
   m:plays m:guitar . 
   
m:pat a m:Musician ;
   m:joined &amp;#34;2023-03-13&amp;#34; ;
   m:plays m:kim .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Validating the shapes created by the &lt;code&gt;CONSTRUCT&lt;/code&gt; query against this instance data (using the &lt;a href=&#34;https://shacl.org/playground/&#34;&gt;SHACL Playground&lt;/a&gt;, &lt;a href=&#34;https://graphdb.ontotext.com/documentation/10.8/shacl-validation.html&#34;&gt;GraphDB&lt;/a&gt;, and &lt;a href=&#34;https://www.bobdc.com/blog/jenagems/#shacl&#34;&gt;Jena&amp;rsquo;s SHACL validator&lt;/a&gt;), I got this result:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix m:    &amp;lt;http://learningsparql.com/ns/music#&amp;gt; .
@prefix rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix sh:   &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .
@prefix xsd:  &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .

[ rdf:type     sh:ValidationReport;
  sh:conforms  false;
  sh:result    [ rdf:type                      sh:ValidationResult;
                 sh:focusNode                  m:pat;
                 sh:resultMessage              &amp;#34;DatatypeConstraint[xsd:date]: Expected xsd:date : Actual xsd:string : Node \&amp;#34;2023-03-13\&amp;#34;&amp;#34;;
                 sh:resultPath                 m:joined;
                 sh:resultSeverity             sh:Violation;
                 sh:sourceConstraintComponent  sh:DatatypeConstraintComponent;
                 sh:sourceShape                _:b0;
                 sh:value                      &amp;#34;2023-03-13&amp;#34;
               ];
  sh:result    [ rdf:type                      sh:ValidationResult;
                 sh:focusNode                  m:pat;
                 sh:resultMessage              &amp;#34;ClassConstraint[&amp;lt;http://learningsparql.com/ns/music#Instrument&amp;gt;]: Expected class :&amp;lt;http://learningsparql.com/ns/music#Instrument&amp;gt; for &amp;lt;http://learningsparql.com/ns/music#kim&amp;gt;&amp;#34;;
                 sh:resultPath                 m:plays;
                 sh:resultSeverity             sh:Violation;
                 sh:sourceConstraintComponent  sh:ClassConstraintComponent;
                 sh:sourceShape                [] ;
                 sh:value                      m:kim
               ];
  sh:result    [ rdf:type                      sh:ValidationResult;
                 sh:focusNode                  m:kim;
                 sh:resultMessage              &amp;#34;maxCount[1]: Invalid cardinality: expected max 1: Got count = 2&amp;#34;;
                 sh:resultPath                 m:joined;
                 sh:resultSeverity             sh:Violation;
                 sh:sourceConstraintComponent  sh:MaxCountConstraintComponent;
                 sh:sourceShape                _:b0
               ]
] .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It looks like the shapes created by the &lt;code&gt;CONSTRUCT&lt;/code&gt; query did their job. (Isn&amp;rsquo;t it great that, along with RDFS schemas and SHACL shapes, the validation output is also expressed in triples? This means that you can make it part of a pipeline that combines additional steps into a complex workflow.)&lt;/p&gt;
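&lt;p&gt;For example, because the report is itself triples, a SPARQL query along these lines (a sketch using only standard SHACL vocabulary properties that appear in the report above) could pull out just the problem nodes and messages for the next step of such a pipeline:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX sh: &amp;lt;http://www.w3.org/ns/shacl#&amp;gt;

SELECT ?focusNode ?path ?message
WHERE {
  ?report a sh:ValidationReport ;
          sh:result ?result .
  ?result sh:focusNode ?focusNode ;
          sh:resultPath ?path ;
          sh:resultMessage ?message .
}
&lt;/code&gt;&lt;/pre&gt;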
&lt;p&gt;I also tried it with this next scheme, where the &lt;code&gt;hr:Employee&lt;/code&gt; class&amp;rsquo;s &lt;code&gt;hr:reportsTo&lt;/code&gt; property should have a value that is another &lt;code&gt;hr:Employee&lt;/code&gt; instance, and the &lt;code&gt;hr:jobGrade&lt;/code&gt; value must be an integer:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix hr:   &amp;lt;http://learningsparql.com/ns/humanResources#&amp;gt; .
@prefix d:    &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix xsd:  &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix sh:  &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .

hr:Employee a rdfs:Class .

hr:reportsTo a rdf:Property ;
rdfs:domain hr:Employee ;
rdfs:range hr:Employee . 

hr:name
   rdf:type rdf:Property ;
   rdfs:domain hr:Employee .

hr:hireDate
   rdf:type rdf:Property ;
   rdfs:domain hr:Employee ;
   rdfs:range xsd:date .

hr:jobGrade
   rdf:type rdf:Property ;
   rdfs:domain hr:Employee ;
   rdfs:range xsd:integer .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;code&gt;CONSTRUCT&lt;/code&gt; query above created these shapes from that:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix d:    &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix hr:   &amp;lt;http://learningsparql.com/ns/humanResources#&amp;gt; .
@prefix rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix sh:   &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .
@prefix xsd:  &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .

hr:Employee  rdf:type   sh:NodeShape;
        sh:property     [ rdf:type     sh:PropertyShape;
                          sh:datatype  xsd:date;
                          sh:maxCount  1;
                          sh:minCount  1;
                          sh:path      hr:hireDate
                        ];
        sh:property     [ rdf:type     sh:PropertyShape;
                          sh:class     hr:Employee;
                          sh:maxCount  1;
                          sh:minCount  1;
                          sh:path      hr:reportsTo
                        ];
        sh:property     [ rdf:type     sh:PropertyShape;
                          sh:datatype  xsd:integer;
                          sh:maxCount  1;
                          sh:minCount  1;
                          sh:path      hr:jobGrade
                        ];
        sh:targetClass  hr:Employee .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;My sample test instance data for that has an employee e3 who reports to &lt;code&gt;d:d1&lt;/code&gt;, a resource not mentioned elsewhere in the data as an instance of &lt;code&gt;hr:Employee&lt;/code&gt; or anything else. Employee e3 also has a non-integer job grade.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix d:    &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix hr:   &amp;lt;http://learningsparql.com/ns/humanResources#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix xsd:  &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix sh:   &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .

d:e1
   a hr:Employee;
   hr:name &amp;#34;Barry Wom&amp;#34; ;
   hr:hireDate &amp;#34;2017-06-03&amp;#34;^^xsd:date ;
   hr:reportsTo d:e3 ; 
   hr:jobGrade 5 .

d:e3
   a hr:Employee;
   hr:name &amp;#34;Stig O&amp;#39;Hara&amp;#34; ;
   hr:hireDate &amp;#34;2017-03-14&amp;#34;^^xsd:date ;
   hr:jobGrade 3.14 ;
   hr:reportsTo d:d1 .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;When the employee shapes created by the SPARQL query are run against this sample data, the validator finds both problems:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix d:    &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix hr:   &amp;lt;http://learningsparql.com/ns/humanResources#&amp;gt; .
@prefix rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix sh:   &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .
@prefix xsd:  &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .

[ rdf:type     sh:ValidationReport;
  sh:conforms  false;
  sh:result    [ rdf:type                      sh:ValidationResult;
                 sh:focusNode                  d:e3;
                 sh:resultMessage              &amp;#34;ClassConstraint[&amp;lt;http://learningsparql.com/ns/humanResources#Employee&amp;gt;]: Expected class :&amp;lt;http://learningsparql.com/ns/humanResources#Employee&amp;gt; for &amp;lt;http://learningsparql.com/ns/data#d1&amp;gt;&amp;#34;;
                 sh:resultPath                 hr:reportsTo;
                 sh:resultSeverity             sh:Violation;
                 sh:sourceConstraintComponent  sh:ClassConstraintComponent;
                 sh:sourceShape                [] ;
                 sh:value                      d:d1
               ];
  sh:result    [ rdf:type                      sh:ValidationResult;
                 sh:focusNode                  d:e3;
                 sh:resultMessage              &amp;#34;DatatypeConstraint[xsd:integer]: Expected xsd:integer : Actual xsd:decimal : Node 3.14&amp;#34;;
                 sh:resultPath                 hr:jobGrade;
                 sh:resultSeverity             sh:Violation;
                 sh:sourceConstraintComponent  sh:DatatypeConstraintComponent;
                 sh:sourceShape                [] ;
                 sh:value                      3.14
               ]
] .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Any SHACL fan is going to think of other things that the &lt;code&gt;CONSTRUCT&lt;/code&gt; query can deduce from a regular RDFS schema in order to add more useful triples to the SHACL shapes created from that schema. Let me know what you come up with!&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to my &lt;a href=&#34;https://mas.to/@bobdc/114133176596206659&#34;&gt;Mastodon&lt;/a&gt; or &lt;a href=&#34;https://bsky.app/profile/bobdc.bsky.social/post/3ljxe4hgqmc25&#34;&gt;Bluesky&lt;/a&gt; posts announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2025">2025</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/shacl">SHACL</category>
      
      <category domain="https://www.bobdc.com//categories/rdfs">RDFS</category>
      
    </item>
    
    <item>
      <title>Filtering foreign literals out of SPARQL query results</title>
      <link>https://www.bobdc.com/blog/filterforeignliterals/</link>
      <pubDate>Sun, 26 Jan 2025 10:20:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/filterforeignliterals/</guid>
      
      
      <description><div>And only the foreign literals.</div><div>&lt;blockquote class=&#34;pullquote&#34;&gt;At first I was treating this like an overly complex logic puzzle, wondering how I could get literals that were (not (not English)).&lt;/blockquote&gt;
&lt;p&gt;It&amp;rsquo;s &lt;a href=&#34;https://www.w3.org/TR/sparql11-query/#func-lang&#34;&gt;easy enough&lt;/a&gt; for a SPARQL query to specify that you only want literal values that are tagged with a particular spoken language such as English or French. Recently I had a more complex condition to express, one that comes up fairly often: how do I retrieve all the data for a particular resource &lt;em&gt;except&lt;/em&gt; the literals tagged with a foreign language? I want all the triples with object property values, and I want all the triples with literal values, regardless of type, unless they are tagged with a language other than English. (Obviously, you can substitute another language tag as the only one whose values you want to see.)&lt;/p&gt;
&lt;p&gt;This came up when I was playing with &lt;a href=&#34;https://yago-knowledge.org/&#34;&gt;YAGO&lt;/a&gt;, but it has also happened when I was working with &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:Main_Page&#34;&gt;Wikidata&lt;/a&gt; and &lt;a href=&#34;https://www.dbpedia.org/&#34;&gt;DBpedia&lt;/a&gt;. These are such international data collections that many of the string literal values are available in many languages, which is great, but when I retrieve all the data for a given resource, I see lots and lots of string values that I don&amp;rsquo;t need.&lt;/p&gt;
&lt;p&gt;For example, try a &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%2a%20WHERE%20%7B%0A%20%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ1369941%3E%20%3Fp%20%3Fo%20%0A%7D&#34;&gt;Wikidata query about data for bebop bassist Tommy Potter&lt;/a&gt;. Of the 156 triples that get returned, 25 are &lt;code&gt;rdfs:label&lt;/code&gt; values for his name tagged for different languages (but usually showing &amp;ldquo;Tommy Potter&amp;rdquo;), 3 are &lt;code&gt;skos:altLabel&lt;/code&gt; values for his name tagged with different languages, and 27 triples are &lt;code&gt;schema:description&lt;/code&gt;  values with the English one being &amp;ldquo;American jazz double bassist (1918–1988)&amp;rdquo; and the rest being variations on that in other languages.&lt;/p&gt;
&lt;p&gt;If I ask for triples whose objects are tagged as English language values, like this,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT * WHERE {
  &amp;lt;http://www.wikidata.org/entity/Q1369941&amp;gt; ?p ?o 
  FILTER( lang(?o) = &amp;#34;en&amp;#34;)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I&amp;rsquo;ll only get three search results: the English version of each of the three properties mentioned above. I&amp;rsquo;ll miss out on literal values that aren&amp;rsquo;t tagged as English, whether they are strings or other data types such as the one for Potter&amp;rsquo;s birthday. I&amp;rsquo;ll also miss out on triples that have a URI as an object.&lt;/p&gt;
&lt;p&gt;To come up with a good FILTER, I switched from querying for Tommy Potter data to querying the following small test data set that  I created:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix ls: &amp;lt;http://www.learningsparql/ns/test&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; . 
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .

ls:someEntity rdfs:label &amp;#34;some entity&amp;#34; ;
			  rdfs:label &amp;#34;Some Entity&amp;#34;@en ;
			  rdfs:label &amp;#34;alguna entidad&amp;#34;@es ;
			  ls:created &amp;#34;2025-01-23&amp;#34;^^xsd:date ;
			  ls:amount  4 ;
			  ls:rating  3.14 ;
			  ls:needs  ls:someOtherEntity . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I wanted a query that would retrieve all of this data except for the &amp;ldquo;alguna entidad&amp;rdquo; triple. I wanted the one with the &amp;ldquo;en&amp;rdquo; language tag, the string with no language tag, the three typed literals, and the triple that has a URI as an object.&lt;/p&gt;
&lt;p&gt;At first I was treating this like an overly complex logic puzzle, wondering how I could get literals that were (not (not English)). I finally realized that it would be much easier to have a boolean OR ask for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The triples where the objects are URIs.&lt;/li&gt;
&lt;li&gt;Literals that are tagged as being in English.&lt;/li&gt;
&lt;li&gt;Literals that have no language tag. This would get the first &amp;ldquo;some entity&amp;rdquo; triple in my sample data, but perhaps more importantly, it would get the &lt;code&gt;ls:created&lt;/code&gt;, &lt;code&gt;ls:amount&lt;/code&gt;, and &lt;code&gt;ls:rating&lt;/code&gt; values.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following does this.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT * WHERE {
   ?s ?p ?o .
   FILTER( ISIRI(?o) || (lang(?o) = &amp;#34;en&amp;#34;)  ||  (!(langMatches(lang(?o),&amp;#34;*&amp;#34;))) )
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The first two filter conditions are basic SPARQL: if a triple&amp;rsquo;s object is an IRI, we want it; if the triple&amp;rsquo;s object has a language tag of &amp;ldquo;en&amp;rdquo; for English, we want it.&lt;/p&gt;
&lt;p&gt;The third filter condition uses the &lt;code&gt;langMatches()&lt;/code&gt; function. I had forgotten about this one but was reminded by the section &amp;ldquo;Checking, Adding and Removing Spoken Language Tags&amp;rdquo; of my book &lt;a href=&#34;https://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt;. Without the &lt;code&gt;!&lt;/code&gt; to do a boolean NOT, the &lt;code&gt;langMatches()&lt;/code&gt; expression in this query with &amp;ldquo;*&amp;rdquo; as an argument would return True for any value of &lt;code&gt;?o&lt;/code&gt; that has any language tag; with the boolean NOT it returns True for any value that has no language tag. So, it does the job described by the third bullet above.&lt;/p&gt;
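The three-way test is easy to sanity-check outside of SPARQL. Here is a small Python sketch of the same boolean logic, using hypothetical (value, language-tag) pairs to stand in for the RDF terms in my test data (None marks an IRI object and an empty string marks a literal with no language tag):

```python
# Model each triple's object as (value, lang_tag): lang_tag is None for
# IRI objects and "" for untagged or typed literals. Hypothetical data
# mirroring the "some entity" test set above.
objects = [
    ("some entity", ""),           # plain literal, no language tag
    ("Some Entity", "en"),         # English-tagged literal
    ("alguna entidad", "es"),      # Spanish-tagged literal: drop this one
    ("2025-01-23", ""),            # typed literal, so no language tag
    ("4", ""),                     # typed literal
    ("3.14", ""),                  # typed literal
    ("ls:someOtherEntity", None),  # IRI object
]

def keep(lang_tag):
    """Mirror the FILTER: an IRI, or tagged 'en', or no language tag."""
    is_iri = lang_tag is None
    return is_iri or lang_tag == "en" or lang_tag == ""

kept = [value for value, tag in objects if keep(tag)]
```

Running this keeps six of the seven values, dropping only the Spanish-tagged one, which is exactly what the FILTER condition should do.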
&lt;p&gt;For my &amp;ldquo;some entity&amp;rdquo; sample data this query returned everything but the &amp;ldquo;alguna entidad&amp;rdquo;@es triple, as I had hoped. For the query of Tommy Potter data, you can &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%2a%20WHERE%20%7B%0A%20%20%20%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ1369941%3E%20%3Fp%20%3Fo%20.%0A%20%20%20%20FILTER%28%20ISIRI%28%3Fo%29%20%7C%7C%20%28lang%28%3Fo%29%20%3D%20%22en%22%29%20%20%7C%7C%20%20%28%21%28langMatches%28lang%28%3Fo%29%2C%22%2a%22%29%29%29%20%29%0A%0A%7D&#34;&gt;see for yourself&lt;/a&gt; that it returns 104 rows instead of 156, with no literal values tagged with a language other than English. The results include only one row for each of the &lt;code&gt;rdfs:label&lt;/code&gt;, &lt;code&gt;rdfs:description&lt;/code&gt;, and &lt;code&gt;skos:altLabel&lt;/code&gt; values. (Changing the &amp;ldquo;en&amp;rdquo; in the Tommy Potter query to &amp;ldquo;de&amp;rdquo; for German and &amp;ldquo;es&amp;rdquo; for Spanish got the expected results.)&lt;/p&gt;
&lt;p&gt;If anyone can suggest a more efficient version of that boolean &lt;code&gt;FILTER&lt;/code&gt; condition I&amp;rsquo;d love to see it, but meanwhile I&amp;rsquo;m sure I&amp;rsquo;ll be pasting it into a lot more queries in the future when I explore large international datasets.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;2025-02-01 update: I have learned that langMatches(lang(?o),&amp;quot;&amp;quot;) does the same thing as (!(langMatches(lang(?o),&amp;quot;*&amp;quot;))), which simplifies the expression further. Thanks &lt;a href=&#34;https://www.linkedin.com/feed/update/urn:li:activity:7289303997264351232?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7289303997264351232%2C7289571596011278336%29&amp;amp;dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287289571596011278336%2Curn%3Ali%3Aactivity%3A7289303997264351232%29&#34;&gt;
Mohammad Hossein Rimaz&lt;/a&gt; and &lt;a href=&#34;https://mstdn.social/@janmartinkeil/113901220387268807&#34;&gt;Jan Martin Keil&lt;/a&gt;!&lt;/em&gt;&lt;/p&gt;
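With that simplification in place, a sketch of the whole query (based on the commenters' suggestion, so test it against your own SPARQL engine before relying on it) would be:

```sparql
SELECT * WHERE {
   ?s ?p ?o .
   # IRI objects, English-tagged literals, or literals with no language tag
   FILTER( ISIRI(?o) || (lang(?o) = "en") || langMatches(lang(?o), "") )
}
```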
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to my &lt;a href=&#34;https://mas.to/@bobdc/113895362038956690&#34;&gt;Mastodon message&lt;/a&gt; or &lt;a href=&#34;https://bsky.app/profile/bobdc.bsky.social/post/3lgnqkgrf3k2l&#34;&gt;Bluesky post&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2025">2025</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Parsing JSON with Python</title>
      <link>https://www.bobdc.com/blog/pythonjson/</link>
      <pubDate>Sun, 15 Dec 2024 10:10:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/pythonjson/</guid>
      
      
      <description><div>My personal quick reference</div><div>&lt;p&gt;It seems like every few months I have a project where I need to parse some JSON and pull out certain parts. Maybe the JSON came in JSON files, or maybe I retrieved it from an API. The duration between each of these occasions is long enough that I&amp;rsquo;ve had to relearn some basics each time, so a year or two ago I made a sample JSON file that demonstrates a few data structures and features, and then I wrote a Python demo script that parses them. Now I look at that script to review the basics each time I need to do this.&lt;/p&gt;
&lt;p&gt;I usually need to pull out a subset of that JSON and convert it to RDF triples. If it&amp;rsquo;s &lt;a href=&#34;../json-ld&#34;&gt;JSON-LD&lt;/a&gt;, I don&amp;rsquo;t need any Python parsing because it&amp;rsquo;s already an RDF serialization format, so I can feed it to any proper RDF parser as-is, but it&amp;rsquo;s rarely JSON-LD.&lt;/p&gt;
&lt;p&gt;Another option is AtomGraph&amp;rsquo;s &lt;a href=&#34;../json2rdf&#34;&gt;JSON2RDF&lt;/a&gt;. This converts any JSON at all to RDF, but if I only need a small subset of the data, I then need to create a SPARQL query to run against the JSON2RDF output so that I can pull out the parts that I want and convert them to the RDF classes and properties that I need. I would also have to build and install JSON2RDF on the platform where I&amp;rsquo;m running this, which was not an option on the server where I recently had to work with some JSON.&lt;/p&gt;
&lt;p&gt;My sample demo data to parse is pretty close to the test input that I used when I wrote about JSON2RDF:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;color&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;red&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;arrayTest&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;north&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;south&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;east&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;escaped \&amp;#34;test\&amp;#34; string&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	    &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;west&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;boolTest&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#66d9ef&#34;&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;nullTest&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#66d9ef&#34;&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;addressBookEntry&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;givenName&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Richard&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;familyName&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Mutt&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;address&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;		&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;street&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;1 Main St&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;		&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;city&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Springfield&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;		&lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;zip&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;10045&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I read and output it with this Python:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#!/usr/bin/env python3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; json
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;f &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; open(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;jsondemo.js&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;data &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; json&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;load(f)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(data[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;color&amp;#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(data[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Pull something out of the middle of an array&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(data[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;arrayTest&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(data[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;boolTest&amp;#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(data[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;nullTest&amp;#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Use a boolean value&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; data[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;boolTest&amp;#34;&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;So boolean!&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Dig down into a data structure&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(data[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;addressBookEntry&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;address&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;city&amp;#34;&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;-- mydata properties: --&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; p &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; data[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(p)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;-- list addressBookEntry property names and values: --&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; p &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; data[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;addressBookEntry&amp;#34;&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(p &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;: &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; str(data[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;addressBookEntry&amp;#34;&lt;/span&gt;][p]))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Testing whether values are present.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;familyName&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; data[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;addressBookEntry&amp;#34;&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;There is a family name value.&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;There is no family name value.&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;phone&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; data[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;mydata&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;addressBookEntry&amp;#34;&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;There is a phone value.&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;There is no phone value.&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;f&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;close()
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It has print statements and comments describing the demonstrated tasks, so I don&amp;rsquo;t need to describe them here. Here is the output:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;red
3
escaped &amp;#34;test&amp;#34; string
True
None
So boolean!
Springfield
-- mydata properties: --
color
amount
arrayTest
boolTest
nullTest
addressBookEntry
-- list addressBookEntry property names and values: --
givenName: Richard
familyName: Mutt
address: {&amp;#39;street&amp;#39;: &amp;#39;1 Main St&amp;#39;, &amp;#39;city&amp;#39;: &amp;#39;Springfield&amp;#39;, &amp;#39;zip&amp;#39;: &amp;#39;10045&amp;#39;}
There is a family name value.
There is no phone value.
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I hope that someday, when someone asks themselves, as I have asked myself every few months, &amp;ldquo;how do I deal with that little bit of JSON in Python again?&amp;rdquo;, this demo can save them a few minutes.&lt;/p&gt;
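One related shortcut worth remembering: the dictionary get() method combines the presence test and the lookup, returning a default value when a key is missing. A small sketch using a fragment shaped like the addressBookEntry above:

```python
import json

# A fragment shaped like the addressBookEntry in the sample file.
entry = json.loads('{"givenName": "Richard", "familyName": "Mutt"}')

# get() returns the value if the key exists and the supplied default if
# it doesn't, replacing the if/else presence tests in the demo script.
print(entry.get("familyName", "(no family name)"))  # Mutt
print(entry.get("phone", "(no phone)"))             # (no phone)
```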
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to my &lt;a href=&#34;https://mas.to/@bobdc/113657519628296478&#34;&gt;Mastodon message&lt;/a&gt; or &lt;a href=&#34;https://bsky.app/profile/bobdc.bsky.social/post/3lde4mgel6k23&#34;&gt;Bluesky post&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/json">JSON</category>
      
    </item>
    
    <item>
      <title>Amazon&#39;s failed folksonomy and Kevin Federline </title>
      <link>https://www.bobdc.com/blog/federline/</link>
      <pubDate>Sat, 30 Nov 2024 12:01:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/federline/</guid>
      
      
      <description><div>What could go wrong? </div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/federlineAlbum.jpg&#34; alt=&#34;[Kevin Federline Playing with Fire album cover]&#34; border=&#34;0&#34; width=&#34;240&#34; align=&#34;right&#34; style=&#34;margin-left: 30px; margin-bottom: 30px&#34; /&gt;
&lt;p&gt;A few years ago I wrote about some &lt;a href=&#34;../firstmetadata&#34;&gt;metadata that was 4,000 years old&lt;/a&gt;. Today I wanted to write about another high point in the history of metadata — well, some sort of point: the failure of Amazon&amp;rsquo;s folksonomy and the &lt;a href=&#34;https://www.amazon.com/gp/product/tags-on-product/B000IU3YLY/&#34;&gt;Playing with Fire&lt;/a&gt; album by Kevin Federline, the former Mr. Britney Spears. Throughout this discussion I&amp;rsquo;ll show some of the tags that Amazon users added to this album,  and I&amp;rsquo;ll also provide a link to a page where you can see the top 100 entries. (As you&amp;rsquo;ll see from the link in the previous sentence, these tags are no longer on Amazon&amp;rsquo;s page for that album.)&lt;/p&gt;
&lt;blockquote&gt;
worst album ever&lt;br/&gt;
an assault to decency&lt;br/&gt;
bird vomit&lt;br/&gt;
every track ought to be hidden
&lt;/blockquote&gt;
&lt;p&gt;The &lt;a href=&#34;https://en.wikipedia.org/wiki/Folksonomy&#34;&gt;Wikipedia page for &amp;ldquo;folksonomy&amp;rdquo;&lt;/a&gt; begins by defining it as &amp;ldquo;a classification system in which end users apply public tags to online items, typically to make those items easier for themselves or others to find later. Over time, this can give rise to a classification system based on those tags and how often they are applied or searched for, in contrast to a taxonomic classification designed by the owners of the content and specified when it is published&amp;rdquo;. The rest of the page gives a lot of good background. (It also says that the &amp;ldquo;study of the structuring or classification of folksonomy is termed folksontology&amp;rdquo;, which looks like good work going on, but I sure hope they come up with a better name.)&lt;/p&gt;
&lt;p&gt;For another good definition, slide 6 of a &lt;a href=&#34;http://www.greenchameleon.com/uploads/iKMS_Christine_Connors_slides.pdf&#34;&gt;presentation&lt;/a&gt; (pdf) by &lt;a href=&#34;https://www.linkedin.com/in/cjmconnors/&#34;&gt;Christine Connors&lt;/a&gt;, who is currently at Raytheon but whose name will be familiar to anyone who has done much work with taxonomies and data modeling, has a nice summary of what folksonomies are: a &amp;ldquo;people&amp;rsquo;s classification management&amp;rdquo; system that draws on the wisdom of the crowd to apply user-generated tags to digital resources  so that, as with any tagging system, people can find the resources they need more easily.&lt;/p&gt;
&lt;p&gt;When folksonomies first became popular, some people became overly excited. For example, the 2007 paper &lt;a href=&#34;https://hal.science/hal-00531169/file/2007FolksonomySHORT.pdf&#34;&gt;Folksonomy: the New Way to Serendipity&lt;/a&gt; (pdf) claimed that &amp;ldquo;folksonomy allows various modalities of curious explorations: a cultural exploration and a social exploration.&amp;rdquo;&lt;/p&gt;
&lt;blockquote&gt;
music to make you long for the sweet release of death&lt;br/&gt;
should be working at wendys&lt;br/&gt;
tiresome and vulgar&lt;br/&gt;
vanilla ice
&lt;/blockquote&gt;
&lt;p&gt;A lot of professional taxonomists didn&amp;rsquo;t like folksonomies, because without some measure of control you don&amp;rsquo;t really have a &lt;a href=&#34;../what-is-a-taxonomy#id202592&#34;&gt;controlled vocabulary&lt;/a&gt;. You have a free-for-all. Quality taxonomies are built around a governance process to ensure that terms are added, revised, or deleted in an organized way. This process also ensures that these terms are presented consistently, and the resulting metadata helps users find the resources they need more efficiently. I attended the &lt;a href=&#34;https://www.taxonomybootcamp.com/2024/default.aspx&#34;&gt;Taxonomy Bootcamp&lt;/a&gt; conference for several years when folksonomies were hotter, and based on many presentations that I saw, it was clear that this hotness had many in the community feeling a greater need to prove the value of their profession.&lt;/p&gt;
&lt;p&gt;The Wikipedia page includes a &lt;a href=&#34;https://en.wikipedia.org/wiki/Folksonomy#Benefits_and_disadvantages&#34;&gt;pretty good list of the advantages and disadvantages&lt;/a&gt; of folksonomies, but it omits one of the key perceived advantages at the time: free metadata. Metadata&amp;rsquo;s purpose is to add value to data by making it easier to navigate, and when Amazon decided to let anyone tag any product with anything, they thought they were getting people to add value for nothing. It was an early hint about where the eventual &amp;ldquo;creator economy&amp;rdquo; would go: people doing free work for a multi-billion dollar company for the sense of community and for the satisfaction in seeing their work on the big famous platform, but certainly not for any worthwhile amount of money.&lt;/p&gt;
&lt;p&gt;What could go wrong? Let&amp;rsquo;s look at Exhibit A. Thanks to the &lt;a href=&#34;https://web.archive.org/&#34;&gt;Wayback Machine&lt;/a&gt; we can look at the &lt;a href=&#34;https://web.archive.org/web/20080115123201/http://www.amazon.com/Playing-Fire-Kevin-Federline/dp/tags-on-product/B000IU3YLY&#34;&gt;January 15, 2008 version&lt;/a&gt; of Amazon&amp;rsquo;s page for Mr. Federline&amp;rsquo;s debut album. Let&amp;rsquo;s look at an excerpt:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/federlineTags.png&#34; border=&#34;0&#34; width=&#34;360&#34; style=&#34;display: block; margin-left: auto; margin-right: auto; &#34; alt=&#34;Excerpt from 2008 Federline album Amazon page&#34;/&gt;
&lt;p&gt;Lots of interesting information here!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The first tag ever applied was &amp;ldquo;poser loser&amp;rdquo;&lt;/li&gt;
&lt;li&gt;The last tag was &amp;ldquo;stupid&amp;rdquo;&lt;/li&gt;
&lt;li&gt;The most popular tag was &amp;ldquo;talentless&amp;rdquo;, which 45 people applied&lt;/li&gt;
&lt;li&gt;The second most popular was &amp;ldquo;music to make you long for the sweet release of death&amp;rdquo;, which 27 people applied&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And so on. As you can see, the tags in that list were links, but they no longer work. Back then, each of these linked to a page that listed everything else with that tag. The first one went to the excellent URL &lt;a href=&#34;http://www.amazon.com/tag/talentless&#34;&gt;http://www.amazon.com/tag/talentless&lt;/a&gt;, which no longer has anything on it, but you can see a &lt;a href=&#34;https://web.archive.org/web/20081222022439/http://www.amazon.com/tag/talentless&#34;&gt;Wayback Machine version&lt;/a&gt; from that era that includes some of Paris Hilton&amp;rsquo;s work. At one point that older version also let you &amp;ldquo;narrow by popular tags&amp;rdquo; like &amp;ldquo;horrible&amp;rdquo; and &amp;ldquo;trash&amp;rdquo; as a sort of faceted search.&lt;/p&gt;
&lt;blockquote&gt;
frisbee&lt;br/&gt;
i left justin for this&lt;br/&gt;
pure concentrated evil&lt;br/&gt;
cole slaw
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Kevin_Federline&#34;&gt;Wikipedia&amp;rsquo;s Kevin Federline page&lt;/a&gt; says that the album is &amp;ldquo;commonly considered to be one of the worst albums ever released&amp;rdquo;. That links to the Wikipedia page titled &lt;a href=&#34;https://en.wikipedia.org/wiki/List_of_music_considered_the_worst&#34;&gt;List of music considered the worst&lt;/a&gt;, where it is the first entry under &amp;ldquo;2000s–2020s&amp;rdquo; (which, to be fair, is sorted chronologically). The album is not even on Spotify, but if you really want to hear an awful attempt at West Coast hiphop, it&amp;rsquo;s &lt;a href=&#34;https://www.youtube.com/watch?v=8LZdTnYmj18&#34;&gt;on YouTube&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Because of tags like these, Amazon eventually realized that folksonomies were not adding worthwhile value. They discontinued the use of folksonomy tags and eventually removed the &lt;a href=&#34;https://web.archive.org/web/20160309205856/http://www.amazon.com/gp/help/customer/display.html?ie=UTF8&amp;amp;nodeId=16238571&#34;&gt;About Tags&lt;/a&gt; page, which explained that they had once offered this feature but no longer did. As it says, &amp;ldquo;We&amp;rsquo;ve since continued to innovate on our more popular features such as Wish Lists, Customer Reviews, and Improve Your Recommendations&amp;rdquo;. More innovations in the creator economy!&lt;/p&gt;
&lt;p&gt;So taxonomists didn&amp;rsquo;t have to worry about proving the value of their profession. In fact, as more people learn that carefully curated knowledge graphs are a useful tool for reducing the number of hallucinations coming from Large Language Models, they are coming to appreciate that taxonomists&amp;rsquo; skills are an excellent fit for curating those knowledge models. I&amp;rsquo;m sure there are still plenty of Content Management Systems in production where users can make up their own tags, but I&amp;rsquo;ll bet trained taxonomists are reviewing and normalizing those contributions as part of their job of improving the system&amp;rsquo;s metadata.&lt;/p&gt;
&lt;blockquote&gt;
makes baby jesus cry&lt;br/&gt;
dumbass&lt;br/&gt;
rich wife&lt;br/&gt;
ear bleach
&lt;/blockquote&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to my &lt;a href=&#34;https://mas.to/@bobdc/113573018025138224&#34;&gt;Mastodon message&lt;/a&gt; or &lt;a href=&#34;https://bsky.app/profile/bobdc.bsky.social/post/3lc6ls5xjtk2n&#34;&gt;Bluesky post&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/amazon">Amazon</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>RDF serialization formats</title>
      <link>https://www.bobdc.com/blog/trig/</link>
      <pubDate>Sun, 27 Oct 2024 10:45:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/trig/</guid>
      
      
      <description><div>Starring TriG</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/trigPlaneSquare.png&#34; alt=&#34;[TRIG biplane]&#34; border=&#34;0&#34; width=&#34;240&#34; align=&#34;right&#34; style=&#34;margin-left: 30px; margin-bottom: 30px&#34; /&gt;
&lt;p&gt;For this month’s blog entry I originally planned to create a reference for RDF serialization formats. My idea was to create a table listing all the known formats, with links to their specs (when they have one), their age, origin, a sample, and some opinionated comments–for example, why creating new documents in RDF/XML made sense in 1999 but no longer does.&lt;/p&gt;
&lt;p&gt;I found a few nice existing surveys, so instead of creating a new one I will list those:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The GraphDB &lt;a href=&#34;https://graphdb.ontotext.com/documentation/10.7/rdf-formats.html&#34;&gt;RDF formats&lt;/a&gt; documentation page was created before I joined Ontotext (which, by the way, is merging with &lt;a href=&#34;https://semantic-web.com/&#34;&gt;Semantic Web Company&lt;/a&gt; of &lt;a href=&#34;https://www.poolparty.biz/&#34;&gt;PoolParty&lt;/a&gt; fame to become &lt;a href=&#34;https://graphwise.ai/&#34;&gt;Graphwise&lt;/a&gt;) and has good information about the important formats.&lt;/li&gt;
&lt;li&gt;The &lt;a href=&#34;https://docs.aws.amazon.com/neptune/latest/userguide/sparql-media-type-support.html&#34;&gt;Neptune list&lt;/a&gt; has some nice descriptions of each.&lt;/li&gt;
&lt;li&gt;The Medium article &lt;a href=&#34;https://medium.com/wallscope/understanding-linked-data-formats-rdf-xml-vs-turtle-vs-n-triples-eb931dbe9827&#34;&gt;Understanding Linked Data Formats&lt;/a&gt; doesn&amp;rsquo;t cover many formats but it does include examples of the ones that are listed.&lt;/li&gt;
&lt;li&gt;The W3C &lt;a href=&#34;https://www.w3.org/wiki/RdfSyntax&#34;&gt;RDFSyntax&lt;/a&gt; page is a bit out of date (saying, for example, that Turtle is still in the process of becoming a W3C Recommendation) but it&amp;rsquo;s interesting for the historical perspective it provides on attempts to create serialization formats.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I tend to just use Turtle for everything. I have found N-Triples useful for &lt;a href=&#34;../driving-hadoop-data-integratio/&#34;&gt;certain experiments&lt;/a&gt; because you can split a file up at any line breaks (for example, with some shell text processing utilities) and be confident that the pieces will all be syntactically correct.&lt;/p&gt;
&lt;p&gt;When I stored data with named graphs I used N-Quads, which is N-Triples with a graph name URI added to each line that represents a triple in a named graph. (N-Triples and especially N-Quads are difficult to write about: a single statement can&amp;rsquo;t contain a line break, and you must use full URIs instead of prefixed names, so it&amp;rsquo;s hard to come up with realistic examples that properly fit on a line of a book page or browser paragraph.)&lt;/p&gt;
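&lt;p&gt;For example, here is a single hypothetical N-Quads statement (made-up example.com URIs); the subject, predicate, object, and graph name must all stay on one line, each spelled out as a full URI:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://example.com/ns/data#x&amp;gt; &amp;lt;http://example.com/ns/demo#tag&amp;gt; &amp;#34;one&amp;#34; &amp;lt;http://example.com/ns/data#g1&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;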
&lt;p&gt;I never paid attention to &lt;a href=&#34;https://www.w3.org/TR/trig/&#34;&gt;TriG&lt;/a&gt;, which is now a bit embarrassing because TriG 1.1 has been a Recommendation for over ten years and it looks like a great way to represent data in named graphs. It&amp;rsquo;s basically Turtle with the SPARQL syntax for specifying triples in named graphs.&lt;/p&gt;
&lt;p&gt;As an example, let&amp;rsquo;s first look at the update request &lt;a href=&#34;https://www.learningsparql.com/2ndeditionexamples/ex338.ru&#34;&gt;example 338&lt;/a&gt; from my book &lt;a href=&#34;https://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt;. I used it as part of an example in my &lt;a href=&#34;../selectingall/&#34;&gt;previous blog entry&lt;/a&gt; to create two graphs and then put two triples in each of them as well as adding two triples to the default graph:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# filename: ex338.ru

PREFIX d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
PREFIX dm: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt;

INSERT DATA
{
  d:x dm:tag &amp;#34;one&amp;#34; . 
  d:x dm:tag &amp;#34;two&amp;#34; . 

  GRAPH d:g1
  { 
    d:x dm:tag &amp;#34;three&amp;#34; . 
    d:x dm:tag &amp;#34;four&amp;#34; . 
  }

  GRAPH d:g2
  { 
    d:x dm:tag &amp;#34;five&amp;#34; . 
    d:x dm:tag &amp;#34;six&amp;#34; . 
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Instead of running that update query to load those six triples, I could have just loaded the following TriG file and gotten the same result:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix dm: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt; .

{
  d:x dm:tag &amp;#34;one&amp;#34; . 
  d:x dm:tag &amp;#34;two&amp;#34; .
}

GRAPH d:g1 {
    d:x dm:tag &amp;#34;three&amp;#34; . 
    d:x dm:tag &amp;#34;four&amp;#34; .
}

GRAPH d:g2 {
    d:x dm:tag &amp;#34;five&amp;#34; . 
    d:x dm:tag &amp;#34;six&amp;#34; .
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;code&gt;GRAPH&lt;/code&gt; keywords in this TriG sample, which you&amp;rsquo;ll recognize as the SPARQL way to say &amp;ldquo;here comes a named graph&amp;rdquo;, are actually optional. The curly braces around the &amp;ldquo;one&amp;rdquo; and &amp;ldquo;two&amp;rdquo; triples to delimit the default graph&amp;rsquo;s triples are also optional. With or without these optional bits, anyone who has been using Turtle and SPARQL for a few years will find TriG&amp;rsquo;s syntax to be intuitive.&lt;/p&gt;
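&lt;p&gt;Dropping those optional bits gives an even terser file that loads the same six triples. This version is just a sketch, but it follows the TriG 1.1 grammar: default graph triples sit at the top level with no braces, and a named graph&amp;rsquo;s label simply precedes its curly braces:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix dm: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt; .

d:x dm:tag &amp;#34;one&amp;#34; .
d:x dm:tag &amp;#34;two&amp;#34; .

d:g1 {
    d:x dm:tag &amp;#34;three&amp;#34; .
    d:x dm:tag &amp;#34;four&amp;#34; .
}

d:g2 {
    d:x dm:tag &amp;#34;five&amp;#34; .
    d:x dm:tag &amp;#34;six&amp;#34; .
}
&lt;/code&gt;&lt;/pre&gt;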
&lt;p&gt;The &lt;a href=&#34;https://en.wikipedia.org/wiki/TriG_(syntax)&#34;&gt;Wikipedia page about TriG&lt;/a&gt; has another nice example.&lt;/p&gt;
&lt;p&gt;As I mentioned above, the &lt;a href=&#34;https://www.w3.org/TR/trig/&#34;&gt;current TriG Recommendation&lt;/a&gt; is release 1.1. Release 1.2 is &lt;a href=&#34;https://www.w3.org/TR/rdf12-trig/&#34;&gt;underway&lt;/a&gt;, and it looks like its main goal is to incorporate reification, or the ability to make RDF statements about RDF statements, as with the ongoing work on Turtle. As part of the work on the next version of SPARQL, it would be nice to see &lt;code&gt;CONSTRUCT&lt;/code&gt; queries use this kind of syntax to support the creation of quads in queries. An &lt;a href=&#34;https://jena.apache.org/documentation/query/construct-quad.html&#34;&gt;extension in Apache Jena&lt;/a&gt; has supported this &lt;a href=&#34;https://mvnrepository.com/artifact/org.apache.jena/jena-arq&#34;&gt;since 2015&lt;/a&gt;.&lt;/p&gt;
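&lt;p&gt;To sketch the idea in the style that the Jena extension documents (treat this as a hedged illustration rather than standard SPARQL), the &lt;code&gt;CONSTRUCT&lt;/code&gt; template itself contains a &lt;code&gt;GRAPH&lt;/code&gt; block, so the query produces quads:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# Copy every named graph triple into the result as a quad.
CONSTRUCT { GRAPH ?g { ?s ?p ?o } }
WHERE     { GRAPH ?g { ?s ?p ?o } }
&lt;/code&gt;&lt;/pre&gt;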
&lt;p&gt;Either way, I&amp;rsquo;m sure I&amp;rsquo;ll be using TriG a lot more in the future. I just used it at work on Friday!&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://x.com/bobdc/status/1850551990431928811&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/113379966709454414&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://creativecommons.org/licenses/by-sa/2.0/&#34;&gt;CC BY-SA 2.0&lt;/a&gt; &lt;a href=&#34;https://www.flickr.com/photos/biker_jun/14137025813/&#34;&gt;biplane photo&lt;/a&gt; by &lt;a href=&#34;https://www.flickr.com/photos/biker_jun/&#34;&gt;Jun&lt;/a&gt; (and cropped)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Selecting all the triples from all the graphs</title>
      <link>https://www.bobdc.com/blog/selectingall/</link>
      <pubDate>Sun, 29 Sep 2024 11:30:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/selectingall/</guid>
      
      
      <description><div>But the default graph?</div><div>&lt;p&gt;In my book &lt;a href=&#34;https://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt; I often use a query for all the triples in a dataset (that is, all the triples in the default graph and all the triples in any named graphs) that I now realize needs some revision to be more accurate.&lt;/p&gt;
&lt;p&gt;To see the issue that I ran into, first imagine running the update request in &lt;a href=&#34;https://www.learningsparql.com/2ndeditionexamples/ex338.ru&#34;&gt;example 338&lt;/a&gt; from the book on an empty dataset. It inserts two triples into the default graph and two each into two named graphs:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# filename: ex338.ru

PREFIX d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
PREFIX dm: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt;

INSERT DATA
{
  d:x dm:tag &amp;#34;one&amp;#34; . 
  d:x dm:tag &amp;#34;two&amp;#34; . 

  GRAPH d:g1
  { 
    d:x dm:tag &amp;#34;three&amp;#34; . 
    d:x dm:tag &amp;#34;four&amp;#34; . 
  }

  GRAPH d:g2
  { 
    d:x dm:tag &amp;#34;five&amp;#34; . 
    d:x dm:tag &amp;#34;six&amp;#34; . 
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Next we run &lt;a href=&#34;https://www.learningsparql.com/2ndeditionexamples/ex332.rq&#34;&gt;example 332&lt;/a&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# filename: ex332.rq

SELECT ?g ?s ?p ?o
WHERE
{
  { ?s ?p ?o }
  UNION
  { GRAPH ?g { ?s ?p ?o } }
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Contrasting this query with a &lt;code&gt;SELECT * WHERE {?s ?p ?o}&lt;/code&gt; query earlier in the book, I wrote &amp;ldquo;This really is the List All Triples query, because it lists a union of all triples in the default graph and all the triples in any named graph along with the associated graph names&amp;rdquo;. When run with the Jena &lt;a href=&#34;https://jena.apache.org/documentation/fuseki2/&#34;&gt;Fuseki&lt;/a&gt; triplestore, it lists the six triples shown in example 338 above with the associated graph names next to the last four.&lt;/p&gt;
&lt;p&gt;I had assumed that a triple is either in a named graph or in a default graph, but I have recently learned that it&amp;rsquo;s not always that simple. For example, according to the SPARQL query specification&amp;rsquo;s &lt;a href=&#34;https://www.w3.org/TR/sparql11-query/#exampleDatasets&#34;&gt;Examples of RDF Datasets&lt;/a&gt; section, &amp;ldquo;One possible arrangement of graphs in an RDF Dataset is to have the default graph be the RDF merge of some or all of the information in the named graphs&amp;rdquo;. According to my experiments, the &lt;a href=&#34;https://graphdb.ontotext.com/&#34;&gt;GraphDB&lt;/a&gt;, &lt;a href=&#34;https://blazegraph.com/&#34;&gt;Blazegraph&lt;/a&gt;, and &lt;a href=&#34;https://github.com/RDFLib/rdflib&#34;&gt;RDFLib&lt;/a&gt; query engines each assume that named graph triples are also in the default graph. With these query engines, running the query above with the data above gets me a list of ten query results because the &amp;ldquo;three&amp;rdquo;, &amp;ldquo;four&amp;rdquo;, &amp;ldquo;five&amp;rdquo;, and &amp;ldquo;six&amp;rdquo; triples appear with their graph names, and because of the &lt;code&gt;{?s ?p ?o}&lt;/code&gt; before the &lt;code&gt;UNION&lt;/code&gt; keyword, they also show up as part of the default graph.&lt;/p&gt;
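&lt;p&gt;A quick diagnostic for checking which behavior a given store uses is to count the triples visible without any &lt;code&gt;GRAPH&lt;/code&gt; wrapper. With the six triples above, a store whose default graph holds only its own triples should report 2, while one that treats the default graph as the union of all graphs should report 6:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT (COUNT(*) AS ?defaultGraphTripleCount)
WHERE { ?s ?p ?o }
&lt;/code&gt;&lt;/pre&gt;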
&lt;p&gt;As I learned in a &lt;a href=&#34;https://lists.apache.org/thread/1foxsmqbn5575yms9vxh8xj7cwv136cg&#34;&gt;conversation with Andy Seaborne&lt;/a&gt; on the Jena mailing list, you can configure Fuseki to do this. From now on, though, when I want to list all of a dataset&amp;rsquo;s triples by first listing those that aren&amp;rsquo;t in a named graph and then listing the ones that are with their graph names, I&amp;rsquo;ll use this new query below. It uses the &lt;code&gt;MINUS&lt;/code&gt; keyword to explicitly exclude named graph triples from the set of default graph triples being retrieved by the clause before the &lt;code&gt;UNION&lt;/code&gt; keyword:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?g ?s ?p ?o
WHERE
{
    { ?s ?p ?o
      MINUS { GRAPH ?g {?s ?p ?o} }
    }
  UNION
  { GRAPH ?g { ?s ?p ?o } }
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Using the data above, this query returns the same six rows with Fuseki, GraphDB, and RDFLib. (Blazegraph returns six rows, but with &lt;code&gt;bd:nullGraph&lt;/code&gt; as the &lt;code&gt;?g&lt;/code&gt; value for the &amp;ldquo;one&amp;rdquo; and &amp;ldquo;two&amp;rdquo; triples.)&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not a super efficient query, but asking for absolutely all the triples in a dataset rarely is. With small datasets it&amp;rsquo;s a quick way to answer the question &amp;ldquo;what do we have here&amp;rdquo;, so I use it often when showing the effect of various keywords and syntax in a SPARQL query. I&amp;rsquo;ll be using this new query often enough that I already have it as one of the &lt;a href=&#34;https://graphdb.ontotext.com/documentation/10.7/sparql-queries.html#save-and-share-queries&#34;&gt;Saved queries&lt;/a&gt; that GraphDB lets you keep handy regardless of what dataset or project you&amp;rsquo;re working on.&lt;/p&gt;
&lt;!-- image is from https://x.com/bobdc/status/1399068522760843270/photo/1 --&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/tripleDumpsters.png&#34; class=&#34;centered&#34; alt=&#34;Triples dumpsters&#34;/&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://x.com/bobdc/status/1840417941306663408&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/113221623437527774&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Editing schemas, ontologies, and SKOS taxonomies with VocBench</title>
      <link>https://www.bobdc.com/blog/vocbench/</link>
      <pubDate>Sun, 25 Aug 2024 11:06:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/vocbench/</guid>
      
      
      <description><div>A free GUI tool.</div><div>&lt;p&gt;&lt;a href=&#39;https://vocbench.uniroma2.it/&#39;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/VocBenchLogo.png&#34; alt=&#34;[VocBench logo]&#34; border=&#34;0&#34; align=&#34;right&#34; style=&#34;margin-left: 30px; margin-bottom: 30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;According to the &lt;a href=&#34;https://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System&#34;&gt;Wikipedia page for SKOS&lt;/a&gt;, &lt;a href=&#34;https://vocbench.uniroma2.it/&#34;&gt;VocBench&lt;/a&gt; &amp;ldquo;is an open-source, web-based RDF/OWL/SKOS/SKOS-XL editor developed by a collaboration between the Food and Agriculture Organization (FAO) of the United Nations, the University of Rome Tor Vergata and the Malaysian research centre MIMOS&amp;rdquo;. I&amp;rsquo;m usually happy to create schemas, ontologies, and SKOS taxonomies by hand with Turtle files in a &lt;a href=&#34;https://www.gnu.org/software/emacs/&#34;&gt;text editor&lt;/a&gt;, but I thought that a free, cross-platform graphical tool for creating and editing these was worth investigating—especially if it lets you create and edit instance data to go with your schemas and ontologies. VocBench did a fine job with all of these.&lt;/p&gt;
&lt;p&gt;Like my former employer TopQuadrant&amp;rsquo;s former product TopBraid Composer, VocBench&amp;rsquo;s overall layout appears to be descended from the open source ontology editor &lt;a href=&#34;https://en.wikipedia.org/wiki/Prot%C3%A9g%C3%A9_(software)&#34;&gt;Protégé&lt;/a&gt;. (The last time I tried Protégé and its more modern successor &lt;a href=&#34;https://protegewiki.stanford.edu/wiki/WebProtege&#34;&gt;WebProtégé&lt;/a&gt;, I still found them confusing, with more documentation about advanced features than about the simple basics. I gave up.)&lt;/p&gt;
&lt;p&gt;My test goals with VocBench were simple:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Load a sample existing RDF schema and  use VocBench&amp;rsquo;s web-based interface to:
&lt;ul&gt;
&lt;li&gt;Create a new class based on an existing one&lt;/li&gt;
&lt;li&gt;Create a property that has the new class as a domain&lt;/li&gt;
&lt;li&gt;Edit some instances of the new class and check whether the new properties are part of the edit form&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Query some data with SPARQL&lt;/li&gt;
&lt;li&gt;Load some existing SKOS and see how that looks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It was important for me to start off with existing files that were already on my hard disk. This way, I could verify that the tool&amp;rsquo;s standards support was really there and that using the tool didn&amp;rsquo;t depend on special things that it added to the model files.&lt;/p&gt;
&lt;h2 id=&#34;installing-and-running&#34;&gt;Installing and running&lt;/h2&gt;
&lt;p&gt;First I downloaded the most recent zip file from &lt;a href=&#34;https://bitbucket.org/art-uniroma2/vocbench3/downloads/&#34;&gt;https://bitbucket.org/art-uniroma2/vocbench3/downloads/&lt;/a&gt;. Unzipping this created a directory named &lt;code&gt;semanticturkey-12.2/&lt;/code&gt;. (VocBench&amp;rsquo;s home page tells us that its &amp;ldquo;business and data access layers are realized by Semantic Turkey, an open-source platform for Knowledge Acquisition and Management realized by the ART Research Group at the University of Rome Tor Vergata&amp;rdquo;.)&lt;/p&gt;
&lt;p&gt;Next, I ran the &lt;code&gt;semanticturkey.sh&lt;/code&gt; script in the &lt;code&gt;semanticturkey-12.2/bin&lt;/code&gt; directory, which also has a &lt;code&gt;semanticturkey.bat&lt;/code&gt; file for Windows users. As the &lt;a href=&#34;https://vocbench.uniroma2.it/doc/&#34;&gt;VocBench documentation&lt;/a&gt; describes, the next step is to send your browser to  &lt;a href=&#34;http://localhost:1979/vocbench3&#34;&gt;http://localhost:1979/vocbench3&lt;/a&gt;. You have to register and make up a password,  and then at &lt;a href=&#34;http://localhost:1979/vocbench3/#/Sysconfig&#34;&gt;http://localhost:1979/vocbench3/#/Sysconfig&lt;/a&gt; you set up the server. I just accepted all the default settings on the setup screen.&lt;/p&gt;
&lt;p&gt;Their documentation suggests that the next step is a &amp;ldquo;quick &lt;a href=&#34;https://vocbench.uniroma2.it/doc/user/test_drive.jsf&#34;&gt;Test Drive&lt;/a&gt;&amp;rdquo;. The Test Drive page is too long and detailed to do quickly, so I just figured out the following myself.&lt;/p&gt;
&lt;p&gt;I started by selecting &lt;strong&gt;Projects&lt;/strong&gt; on the main menu and then &lt;strong&gt;Create&lt;/strong&gt; on the Projects screen. The next screen, which is shown as the second screen shot on the Test Drive page, has many properties that you can fill out, but I went with the bare minimum:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Project name&lt;/strong&gt;: emptest&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Base URI&lt;/strong&gt;: &lt;code&gt;http://example.org/emptest/&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model&lt;/strong&gt;: RDFS&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lexicalization&lt;/strong&gt;: RDFS (the default value; you&amp;rsquo;ll see that I picked a different value for this for my SKOS test below)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I clicked &lt;strong&gt;Create&lt;/strong&gt; in the lower right, and then a popup told me &amp;ldquo;Project created successfully&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Back on the &lt;strong&gt;Projects&lt;/strong&gt; screen I clicked the radio button in the &amp;ldquo;Accessed&amp;rdquo; column of the &lt;code&gt;emptest&lt;/code&gt; row. This seemed to be the quickest way to make it the current project.&lt;/p&gt;
&lt;p&gt;In the upper left VocBench then showed &amp;ldquo;Current project: emptest&amp;rdquo; and a &amp;ldquo;Global Data Management&amp;rdquo; dropdown menu to the right of that. The dropdown has the handy choices to &lt;strong&gt;Load data&lt;/strong&gt;, &lt;strong&gt;Export data&lt;/strong&gt;, &lt;strong&gt;Clear data&lt;/strong&gt;, and a few others. I picked &lt;strong&gt;Load data&lt;/strong&gt; and imported the following &lt;code&gt;person.ttl&lt;/code&gt; file:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix ex:     &amp;lt;http://example.com/&amp;gt; .
@prefix schema: &amp;lt;http://schema.org/&amp;gt; .
@prefix rdfs:   &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix rdf:    &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; . 

ex:Person a rdfs:Class .

schema:familyName a rdf:Property ;
                  rdfs:domain ex:Person .

schema:givenName a rdf:Property ;
                 rdfs:domain ex:Person .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;After doing this, selecting &lt;strong&gt;Data&lt;/strong&gt; from the main menu showed this in a widget for editing the class hierarchy:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/vocbench1.png&#34; class=&#34;centered&#34;  alt=&#34;VocBench class hierarchy&#34; width=&#34;400&#34;/&gt;
&lt;p&gt;I selected &lt;code&gt;ex:Person&lt;/code&gt; there and then clicked &amp;ldquo;Create subClass&amp;rdquo; (the second of the four buttons above it—throughout these instructions, whenever I refer to an icon by name and you don&amp;rsquo;t see the name, mouse over the buttons until you find the tooltip with that name) to create an Employee subclass, and that was added to the visual hierarchy.&lt;/p&gt;
&lt;p&gt;To add a &lt;code&gt;startDate&lt;/code&gt; property to go with that, I selected the &lt;strong&gt;Property&lt;/strong&gt; tab above the class hierarchy and selected its &lt;strong&gt;Create property&lt;/strong&gt; button. I picked datatypeProperty from this button&amp;rsquo;s dropdown menu and entered &amp;ldquo;startDate&amp;rdquo; as the property name, and then  I saw this new property added to the list on the &lt;strong&gt;Property&lt;/strong&gt; tab on the left.&lt;/p&gt;
&lt;p&gt;With the new property selected there, on the right side of the screen I clicked the &lt;strong&gt;Add domain&lt;/strong&gt; button to the right of the Domains header. I then selected Employee from the class hierarchy widget that it displayed, and clicked &lt;strong&gt;OK&lt;/strong&gt; on the &lt;strong&gt;Add domain&lt;/strong&gt; dialog box. Below the Domain header was a Range header with a similar button that let me add &lt;code&gt;xsd:date&lt;/code&gt; as a range for &lt;code&gt;startDate&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To really see what this was creating, I went back to the &amp;ldquo;Global Data Management&amp;rdquo; dropdown and picked &lt;strong&gt;Export Data&lt;/strong&gt;. On the lower-right of the next screen I changed the Export Format from RDF/XML to Turtle (it is 2024, after all) and clicked &lt;strong&gt;Submit&lt;/strong&gt;. Here is the &lt;code&gt;export.ttl&lt;/code&gt; file that this created, without its prefix declarations:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://example.org/emptest/&amp;gt; a owl:Ontology .

ex:Person a rdfs:Class .

schema:familyName a rdf:Property;
  rdfs:domain ex:Person .

schema:givenName a rdf:Property;
  rdfs:domain ex:Person .

:Employee a rdfs:Class;
  rdfs:subClassOf ex:Person .

:startDate a owl:DatatypeProperty;
  rdfs:range xsd:date;
  rdfs:domain :Employee .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;That was just what I was hoping for.&lt;/p&gt;
&lt;p&gt;Going back to the  &lt;strong&gt;Class&lt;/strong&gt; tab we see an instances panel underneath it, where the first button lets you add an instance for the selected class:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/vocbench2.png&#34; class=&#34;centered&#34;  alt=&#34;VocBench class hierarchy&#34; width=&#34;400&#34;/&gt;
&lt;p&gt;Clicking it displays a popup where you create a URI for the new instance; clicking &lt;strong&gt;Ok&lt;/strong&gt; on that message box adds it to the instance list. Once it&amp;rsquo;s there, you can select it, and then the form on the right lets you edit it. Selecting the blue &lt;strong&gt;Add property&lt;/strong&gt; button to the right of &lt;strong&gt;Other properties&lt;/strong&gt; on that form displayed a popup that gave me the opportunity to add or edit the &lt;code&gt;startDate&lt;/code&gt;, &lt;code&gt;familyName&lt;/code&gt;, or &lt;code&gt;givenName&lt;/code&gt; properties for the new instance. I was happy to see that the editing of an instance&amp;rsquo;s data automates the use of both assigned and inherited properties on the editing widget for that data.&lt;/p&gt;
&lt;p&gt;Selecting SPARQL from the main menu gave me a SPARQL screen where it was easy enough to write and execute SPARQL queries, so all that was left was to load some existing SKOS and see what VocBench did with it. I took example &lt;a href=&#34;https://www.learningsparql.com/2ndeditionexamples/ex327.ttl&#34;&gt;ex327.ttl&lt;/a&gt; from my book &amp;ldquo;Learning SPARQL&amp;rdquo; and created a new VocBench project for it. For this project, I set both Model and Lexicalization to SKOS. (Don&amp;rsquo;t forget to set a Base URI, as I did several times when trying to create a new project.) After I created this new project and made it the current one I imported the &lt;code&gt;ex327.ttl&lt;/code&gt; file. I then picked &lt;strong&gt;Data&lt;/strong&gt; from the main menu and after expanding the widget in the &lt;strong&gt;Concept&lt;/strong&gt; tab a little I saw this:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/vocbench3.png&#34; class=&#34;centered&#34;  alt=&#34;VocBench SKOS  hierarchy&#34; width=&#34;400&#34;/&gt;
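&lt;p&gt;A hierarchy like that is driven by ordinary SKOS triples. The following fragment is just an illustrative sketch with made-up example.com concepts, not the book&amp;rsquo;s &lt;code&gt;ex327.ttl&lt;/code&gt;, but loading something similar should produce the same kind of nesting in the &lt;strong&gt;Concept&lt;/strong&gt; tab:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix ex:   &amp;lt;http://example.com/&amp;gt; .

ex:food a skos:Concept ;
    skos:prefLabel &amp;#34;food&amp;#34;@en .

ex:fruit a skos:Concept ;
    skos:prefLabel &amp;#34;fruit&amp;#34;@en ;
    skos:broader ex:food .
&lt;/code&gt;&lt;/pre&gt;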
&lt;p&gt;Of course VocBench can do much more than what I&amp;rsquo;ve shown here, but my first priority was to see how it covered the basics, and it did just great with these. The rest of its Test Drive page shows plenty of potential follow-up exercises.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://x.com/bobdc/status/1827727700238172614&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/113023336395446176&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
      <category domain="https://www.bobdc.com//categories/rdfs">RDFS</category>
      
      <category domain="https://www.bobdc.com//categories/owl">OWL</category>
      
    </item>
    
    <item>
      <title>SPARQLing anything</title>
      <link>https://www.bobdc.com/blog/sparqlanything/</link>
      <pubDate>Sun, 21 Jul 2024 11:45:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparqlanything/</guid>
      
      
      <description><div>MS Office files, XML, markdown, plain text, and more.</div><div>&lt;p&gt;&lt;a href=&#34;https://sparql-anything.cc/&#34;&gt;SPARQL Anything&lt;/a&gt;  is an open source tool that lets you use SPARQL to query data in a long list of popular formats: XML, JSON, CSV, HTML, Excel, Text, Binary, EXIF, File System, Zip/Tar, Markdown, YAML, Bibtex, DOCx, and PPTx. It has a lot of great documentation and features, but I&amp;rsquo;ll start here with an example of it in action.&lt;/p&gt;
&lt;p&gt;As you&amp;rsquo;ll see on &lt;a href=&#34;https://github.com/SPARQL-Anything/sparql.anything&#34;&gt;its GitHub page&lt;/a&gt;, there is a command line interface and a server version. I downloaded the jar file from its &lt;a href=&#34;https://github.com/SPARQL-Anything/sparql.anything/releases&#34;&gt;releases page&lt;/a&gt; with the goal of sending a SPARQL query to this spreadsheet, which I called &lt;code&gt;xlsxtest.xlsx&lt;/code&gt;:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/xlstest.png&#34; class=&#34;centered&#34; width=&#34;400&#34; alt=&#34;sample spreadsheet&#34;/&gt;
&lt;p&gt;(I created this spreadsheet with OpenOffice, but saved it as an MS Office Excel file and it worked just fine.)&lt;/p&gt;
&lt;p&gt;I then put the following SPARQL query in the file &lt;code&gt;sa1.rq&lt;/code&gt;. Note how its &lt;code&gt;SERVICE&lt;/code&gt; parameter includes the name of the spreadsheet file above:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;CONSTRUCT { ?s ?p ?o }
WHERE {
    SERVICE &amp;lt;x-sparql-anything:xlsxtest.xlsx&amp;gt; {
        ?s ?p ?o
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I called the jar file with &lt;code&gt;sa1.rq&lt;/code&gt; as the query file (run it with no parameters to see a wide choice of other parameters) and redirected the output to a Turtle file:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;java -jar ~/temp/sparql-anything-0.9.0.jar -q sa1.rq &amp;gt; xlsxtest.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here is the Turtle file:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;[ a       &amp;lt;http://sparql.xyz/facade-x/ns/root&amp;gt; ;
  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_1&amp;gt;
          [ &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_1&amp;gt;
                    &amp;#34;Given-name&amp;#34; ;
            &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_2&amp;gt;
                    &amp;#34;Family-name&amp;#34; ;
            &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_3&amp;gt;
                    &amp;#34;Hire-date&amp;#34; ;
            &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_4&amp;gt;
                    &amp;#34;random int&amp;#34;
          ] ;
  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_2&amp;gt;
          [ &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_1&amp;gt;
                    &amp;#34;Grace&amp;#34; ;
            &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_2&amp;gt;
                    &amp;#34;Lee&amp;#34; ;
            &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_3&amp;gt;
                    &amp;#34;45150.0&amp;#34;^^&amp;lt;http://www.w3.org/2001/XMLSchema#double&amp;gt; ;
            &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_4&amp;gt;
                    &amp;#34;3.0&amp;#34;^^&amp;lt;http://www.w3.org/2001/XMLSchema#double&amp;gt;
          ] ;
  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_3&amp;gt;
          [ &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_1&amp;gt;
                    &amp;#34;Johnson&amp;#34; ;
            &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_2&amp;gt;
                    &amp;#34;Frank&amp;#34; ;
            &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_3&amp;gt;
                    &amp;#34;44887.0&amp;#34;^^&amp;lt;http://www.w3.org/2001/XMLSchema#double&amp;gt; ;
            &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#_4&amp;gt;
                    &amp;#34;54.0&amp;#34;^^&amp;lt;http://www.w3.org/2001/XMLSchema#double&amp;gt;
          ]
] .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It&amp;rsquo;s nice to see that it didn&amp;rsquo;t turn &lt;em&gt;all&lt;/em&gt; the values into strings: it recognized the numeric values and typed them accordingly. The structure of this output, which uses a lot of blank nodes, conforms to a model developed by the SPARQL Anything developers called &lt;a href=&#34;https://github.com/SPARQL-Anything/sparql.anything/blob/v1.0-DEV/Facade-X.md&#34;&gt;Facade-X&lt;/a&gt;. Before digging into that, I just played around with SPARQL queries of the above triples and came up with this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;
PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;

SELECT ?rowID ?cellID ?value WHERE {
   ?root a &amp;lt;http://sparql.xyz/facade-x/ns/root&amp;gt; ;
   ?rowID ?rowContents.
   ?rowContents ?cellID ?value . 
}
ORDER BY ?rowID ?cellID
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running that query on the Turtle created by SPARQL Anything (I used Jena&amp;rsquo;s &lt;code&gt;arq --query xlsxtest.rq --data xlsxtest.ttl&lt;/code&gt;) gave me this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-------------------------------------------
| rowID  | cellID | value                 |
===========================================
| rdf:_1 | rdf:_1 | &amp;#34;Given-name&amp;#34;          |
| rdf:_1 | rdf:_2 | &amp;#34;Family-name&amp;#34;         |
| rdf:_1 | rdf:_3 | &amp;#34;Hire-date&amp;#34;           |
| rdf:_1 | rdf:_4 | &amp;#34;random int&amp;#34;          |
| rdf:_2 | rdf:_1 | &amp;#34;Grace&amp;#34;               |
| rdf:_2 | rdf:_2 | &amp;#34;Lee&amp;#34;                 |
| rdf:_2 | rdf:_3 | &amp;#34;45150.0&amp;#34;^^xsd:double |
| rdf:_2 | rdf:_4 | &amp;#34;3.0&amp;#34;^^xsd:double     |
| rdf:_3 | rdf:_1 | &amp;#34;Johnson&amp;#34;             |
| rdf:_3 | rdf:_2 | &amp;#34;Frank&amp;#34;               |
| rdf:_3 | rdf:_3 | &amp;#34;44887.0&amp;#34;^^xsd:double |
| rdf:_3 | rdf:_4 | &amp;#34;54.0&amp;#34;^^xsd:double    |
-------------------------------------------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It&amp;rsquo;s a pretty nice representation of the original spreadsheet.&lt;/p&gt;
&lt;p&gt;The SPARQL Anything server looks cool, but I liked that I could do all of the above with just the downloaded jar file and no configuration or setup.&lt;/p&gt;
&lt;p&gt;At 5:06 of the 15:34 video &lt;a href=&#34;https://www.youtube.com/watch?v=Ak3bykN2dgI&#34;&gt;Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything project - Enrico Daga&lt;/a&gt; one of the key SPARQL Anything developers gives some good background on the philosophy of Facade-X. To summarize, it models things as lists of lists. Blank nodes, as we saw above, play a large role. A paper published by Enrico and his colleagues for the &lt;a href=&#34;https://dl.acm.org/doi/10.1145/3555312&#34;&gt;ACM Transactions on Internet Technology&lt;/a&gt; (also available on one of the &lt;a href=&#34;https://sparql.xyz/FacadeX_TOIT.pdf&#34;&gt;SPARQL Anything&lt;/a&gt; websites) describes Facade-X in more detail.&lt;/p&gt;
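&lt;p&gt;To make that &amp;ldquo;lists of lists&amp;rdquo; idea a bit more concrete: Facade-X attaches rows to the root resource, and cells to rows, with the RDF container membership properties &lt;code&gt;rdf:_1&lt;/code&gt;, &lt;code&gt;rdf:_2&lt;/code&gt;, and so on, so a row or cell&amp;rsquo;s position can be recovered by parsing the predicate URI. Here is a little Python sketch of my own (not part of SPARQL Anything) showing the idea:&lt;/p&gt;

```python
# Sketch: recover the 1-based position encoded in an rdf:_n container
# membership property, as Facade-X uses them for rows and cells.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def membership_index(predicate_uri):
    """Return n for an rdf:_n predicate URI, or None if it isn't one."""
    if predicate_uri.startswith(RDF_NS + "_"):
        suffix = predicate_uri[len(RDF_NS) + 1:]
        if suffix.isdigit():
            return int(suffix)
    return None

print(membership_index(RDF_NS + "_1"))    # 1 (the header row, or a row's first cell)
print(membership_index(RDF_NS + "_4"))    # 4
print(membership_index(RDF_NS + "type"))  # None: rdf:type is not a membership property
```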
&lt;p&gt;For fun, I used SPARQL Anything to send SPARQL queries to some other formats as well, like PPTx and markdown. My query for most of these was simply &lt;code&gt;CONSTRUCT {?s ?p ?o} WHERE {?s ?p ?o}&lt;/code&gt; because I just wanted to convert the various formats to RDF and see what that looked like. More sophisticated &lt;code&gt;CONSTRUCT&lt;/code&gt; or &lt;code&gt;SELECT&lt;/code&gt; queries could pull out information modeled for specific applications.&lt;/p&gt;
&lt;p&gt;I can picture SPARQL Anything being useful in many, many projects. For example: about a half dozen of its possible input formats are typically used for unstructured natural language text. Turning that into triples, where the object of each triple stores a document or paragraph of natural language text, is a great way to hand these documents off to RDF-based text analysis tools such as the &lt;a href=&#34;https://spacy.io/&#34;&gt;spaCy&lt;/a&gt; entity recognition library that I wrote about in my post &lt;a href=&#34;../spacy&#34;&gt;Entity recognition from within a SPARQL query&lt;/a&gt;. (Ontotext&amp;rsquo;s GraphDB Free triplestore supports other entity recognition libraries that would benefit from these conversions as well.)&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m sure there are many other potential applications as an increasing number of projects seek to pull information from commonly used file formats to add to Knowledge Graphs. SPARQL Anything  is an excellent contribution to anyone&amp;rsquo;s toolbox of potential RDF workflow pipeline steps.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://x.com/bobdc/status/1815055289982349751&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/112825332181616751&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Querying for audio on Wikidata</title>
      <link>https://www.bobdc.com/blog/wikidataaudio/</link>
      <pubDate>Sat, 22 Jun 2024 11:00:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/wikidataaudio/</guid>
      
      
      <description><div>Music and more.</div><div>&lt;!-- picture source: https://flickr.com/photos/hwmobs/51389438903/ --&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/AustralianBrassBand.jpg&#34; alt=&#34;[Brass Band in Ballarat, Victoria - early 1900s]&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;440&#34; style=&#34;margin-left: 30px; margin-bottom: 30px&#34;/&gt;
&lt;p&gt;For a long time I&amp;rsquo;ve thought that it would be fun to use SPARQL queries of Wikidata to create music playlists that can be played back.  While researching last month&amp;rsquo;s blog entry &lt;a href=&#34;../querywatchmovies&#34;&gt;Use SPARQL to query for movies, then watch them&lt;/a&gt; I learned about the &lt;a href=&#34;https://www.wikidata.org/wiki/Property:P724&#34;&gt;P724&lt;/a&gt; Internet Archive ID property, and that turned out to be an excellent hook for finding Wikidata audio recordings that we can listen to.&lt;/p&gt;
&lt;p&gt;In that entry, my query for the films of Frank Capra searched for resources that have a P31 value of Q11424 (that is, they are instances of film) and a P724 value pointing to some movie that we can watch. This brought up the question: what other types besides film have P724 values? We can answer this with a simple query:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT DISTINCT ?name WHERE {
  ?s wdt:P724 ?internetArchiveID ;
     wdt:P31 ?type . 

  ?type rdfs:label ?name . 
  FILTER ( lang(?name) = &amp;#34;en&amp;#34; )
}
LIMIT 100
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(Instead of showing query results here, I will provide links so you can run each yourself.) The &lt;a href=&#34;https://query.wikidata.org/#SELECT%20DISTINCT%20%3Fname%20WHERE%20%7B%0A%20%20%3Fs%20wdt%3AP724%20%3FinternetArchiveID%20%3B%0A%20%20%20%20%20wdt%3AP31%20%3Ftype%20.%20%0A%0A%20%20%3Ftype%20rdfs%3Alabel%20%3Fname%20.%20%0A%20%20FILTER%20%28%20lang%28%3Fname%29%20%3D%20%22en%22%20%29%0A%7D%0ALIMIT%20100%0A&#34;&gt;list of types that have Internet Archive IDs&lt;/a&gt; includes many interesting possibilities. (I limited it to 100 results because it was threatening to time out.)&lt;/p&gt;
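&lt;p&gt;You can also run these queries from a script rather than from the browser form: the Wikidata Query Service accepts the query text as a &lt;code&gt;query&lt;/code&gt; parameter in a GET request to &lt;code&gt;https://query.wikidata.org/sparql&lt;/code&gt;. This Python sketch of my own just builds such a URL; fetch it with the HTTP library of your choice (an &lt;code&gt;Accept&lt;/code&gt; header of &lt;code&gt;text/csv&lt;/code&gt; gets you CSV results):&lt;/p&gt;

```python
# Sketch: build a GET URL that runs a SPARQL query on the
# Wikidata Query Service endpoint.
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

def wikidata_query_url(sparql):
    """Percent-encode the query and attach it to the endpoint URL."""
    return ENDPOINT + "?" + urlencode({"query": sparql})

query = """SELECT DISTINCT ?name WHERE {
  ?s wdt:P724 ?internetArchiveID ;
     wdt:P31 ?type .
  ?type rdfs:label ?name .
  FILTER ( lang(?name) = "en" )
}
LIMIT 100"""

print(wikidata_query_url(query))
```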
&lt;p&gt;While I encourage you to explore the various values that this retrieves, I decided to focus on Q105543609, &lt;a href=&#34;https://www.wikidata.org/wiki/Q105543609&#34;&gt;musical work/composition&lt;/a&gt;. I found that many had a &lt;a href=&#34;https://www.wikidata.org/wiki/Property:P136&#34;&gt;genre&lt;/a&gt; property, so I listed all the possible values that came up for that:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT DISTINCT ?genre ?genreName WHERE {
  ?s wdt:P31  wd:Q105543609; # a musical composition
     wdt:P51 ?recording ;    # where a recording exists
     wdt:P136 ?genre .       # that is tagged with a genre
  ?genre rdfs:label ?genreName . 
   FILTER( lang(?genreName) = &amp;#34;en&amp;#34; )
}
ORDER BY ?genreName
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;When I &lt;a href=&#34;https://query.wikidata.org/#SELECT%20DISTINCT%20%3Fgenre%20%3FgenreName%20WHERE%20%7B%0A%20%20%3Fs%20wdt%3AP31%20%20wd%3AQ105543609%3B%20%23%20a%20musical%20composition%0A%20%20%20%20%20wdt%3AP51%20%3Frecording%20%3B%20%20%20%20%23%20where%20a%20recording%20exists%0A%20%20%20%20%20wdt%3AP136%20%3Fgenre%20.%20%20%20%20%20%20%20%23%20that%20is%20tagged%20with%20a%20genre%0A%20%20%3Fgenre%20rdfs%3Alabel%20%3FgenreName%20.%20%0A%20%20%20FILTER%28%20lang%28%3FgenreName%29%20%3D%20%22en%22%20%29%0A%7D%0AORDER%20BY%20%3FgenreName%0A&#34;&gt;ran this&lt;/a&gt;  I found 150 genres.&lt;/p&gt;
&lt;p&gt;I got a little too excited until I rediscovered one of the common issues with metadata: just because people can tag something with structured metadata doesn&amp;rsquo;t mean that they do, so there are very few recordings associated with many of these tags. For example, the following asks for spirituals,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT * WHERE {
  ?s wdt:P31  wd:Q105543609; # a musical composition
     wdt:P51 ?recording ;    # with a recording
     wdt:P136 wd:Q212024 ;   # that has a genre of spiritual
     rdfs:label ?name . 
  ?wppage schema:about ?s .
   FILTER(contains(str(?wppage),&amp;#34;//en.&amp;#34;)) # Only the English Wikipedia pages  
   FILTER( lang(?name) = &amp;#34;en&amp;#34; )               
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;but if you &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%2a%20WHERE%20%7B%0A%20%20%3Fs%20wdt%3AP31%20%20wd%3AQ105543609%3B%20%23%20a%20musical%20composition%0A%20%20%20%20%20wdt%3AP51%20%3Frecording%20%3B%20%20%20%20%23%20with%20a%20recording%0A%20%20%20%20%20wdt%3AP136%20wd%3AQ212024%20%3B%20%20%20%23%20that%20has%20a%20genre%20of%20spiritual%0A%20%20%20%20%20rdfs%3Alabel%20%3Fname%20.%20%0A%20%20%3Fwppage%20schema%3Aabout%20%3Fs%20.%0A%20%20%20FILTER%28contains%28str%28%3Fwppage%29%2C%22%2F%2Fen.%22%29%29%20%23%20Only%20the%20English%20Wikipedia%20pages%20%20%0A%20%20%20FILTER%28%20lang%28%3Fname%29%20%3D%20%22en%22%20%29%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%0A%7D%0A&#34;&gt;run it&lt;/a&gt; you&amp;rsquo;ll only find two recordings. (Of the two, the &lt;a href=&#34;https://commons.wikimedia.org/wiki/File:The_Old_Time_Religion_-_Tuskegee_Institute_Singers.flac&#34;&gt;1915 Tuskegee Institute Singers 78RPM record of &amp;ldquo;The Old Time Religion&amp;rdquo;&lt;/a&gt; is a wonderful Wikimedia find.)&lt;/p&gt;
&lt;p&gt;That last query and several of the remaining ones also ask for the associated Wikipedia page, which was handy when one of my queries would turn up a recording that made me think &amp;ldquo;wait, what IS this?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Replacing the genre value in that query with others gave me some interesting results. Using wd:Q102932 for &amp;ldquo;avant-garde&amp;rdquo; got me a MIDI file and an Ogg Vorbis &amp;ldquo;recording&amp;rdquo; of John Cage&amp;rsquo;s famous silent piece &lt;a href=&#34;https://en.wikipedia.org/wiki/4%E2%80%B233%E2%80%B3&#34;&gt;4&#39;33&amp;quot;&lt;/a&gt;. A genre of wd:Q9734 for &amp;ldquo;symphony&amp;rdquo; found two movements of Beethoven&amp;rsquo;s 7th and no other recordings. The one recording tagged as wd:Q7749 for &amp;ldquo;rock and roll&amp;rdquo; was the U.S. Air Force band playing &amp;ldquo;When the Saints Go Marching In&amp;rdquo;, which reminds me of the old semantic web saying &amp;ldquo;anyone can say anything about anything&amp;rdquo;. (Considering their arrangement, I was tempted to change it to the wd:Q906647 category for &amp;ldquo;dixieland jazz&amp;rdquo;, but because the &lt;a href=&#34;https://www.wikidata.org/wiki/Q1753926&#34;&gt;Wikidata page&lt;/a&gt; lists the song as a &amp;ldquo;gospel hymn&amp;rdquo;, I changed &amp;ldquo;rock and roll&amp;rdquo; to that.)&lt;/p&gt;
&lt;p&gt;Another genre is &amp;ldquo;national anthem&amp;rdquo;. A &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%2a%20WHERE%20%7B%0A%20%20%3Fs%20wdt%3AP31%20%20wd%3AQ105543609%3B%20%23%20a%20musical%20composition%0A%20%20%20%20%20wdt%3AP51%20%3Frecording%20%3B%20%20%20%20%23%20with%20a%20recording%0A%20%20%20%20%20wdt%3AP136%20wd%3AQ23691%20%3B%20%20%20%23%20that%20has%20a%20genre%20of%20national%20anthem%0A%20%20%20%20%20rdfs%3Alabel%20%3Fname%20.%20%0A%20%20%3Fwppage%20schema%3Aabout%20%3Fs%20.%0A%20%20%20FILTER%28contains%28str%28%3Fwppage%29%2C%22%2F%2Fen.%22%29%29%20%23%20Only%20the%20English%20Wikipedia%20pages%20%20%0A%20%20%20FILTER%28%20lang%28%3Fname%29%20%3D%20%22en%22%20%29%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%0A%7D%0A&#34;&gt;query for that&lt;/a&gt; only gave one result, but genre values aren&amp;rsquo;t the only way to query for specific types of recordings. Instead of looking for recordings that are instances of musical composition, I can just look for those that are instances of national anthem. This turned up 369 recordings:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?anthemName ?wppage ?recording WHERE {
   ?anthem wdt:P31 wd:Q23691 ; # is a national anthem
      wdt:P51 ?recording ;
           rdfs:label ?anthemName . 
   ?wppage schema:about ?anthem .
         
   FILTER( lang(?anthemName) = &amp;#34;en&amp;#34; )    
   FILTER(contains(str(?wppage),&amp;#34;//en.&amp;#34;)) # Only the English Wikipedia pages  
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3FanthemName%20%3Fwppage%20%3Frecording%20WHERE%20%7B%0A%20%20%20%3Fanthem%20wdt%3AP31%20wd%3AQ23691%20%3B%20%23%20is%20a%20national%20anthem%0A%20%20%20%20%20%20wdt%3AP51%20%3Frecording%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20rdfs%3Alabel%20%3FanthemName%20.%20%0A%20%20%20%3Fwppage%20schema%3Aabout%20%3Fanthem%20.%0A%20%20%20%20%20%20%20%20%20%0A%20%20%20FILTER%28%20lang%28%3FanthemName%29%20%3D%20%22en%22%20%29%20%20%20%20%0A%20%20%20FILTER%28contains%28str%28%3Fwppage%29%2C%22%2F%2Fen.%22%29%29%20%23%20Only%20the%20English%20Wikipedia%20pages%20%20%0A%7D%0A&#34;&gt;Running that&lt;/a&gt; can let you create a pretty crazy playlist.&lt;/p&gt;
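&lt;p&gt;To follow through on the playlist idea: once a query like that has given you titles and recording URLs, a few lines of Python (my own sketch; the sample row is just for illustration) can turn them into an M3U playlist file that most audio players will open:&lt;/p&gt;

```python
# Sketch: write (title, recording URL) pairs from a query's results
# as an extended M3U playlist.
def make_m3u(rows):
    """rows: iterable of (title, url) pairs; returns the playlist text."""
    lines = ["#EXTM3U"]
    for title, url in rows:
        lines.append("#EXTINF:-1," + title)  # -1 means unknown duration
        lines.append(url)
    return "\n".join(lines) + "\n"

rows = [  # illustrative row, not actual query output
    ("La Marseillaise", "https://example.org/recordings/marseillaise.ogg"),
]
with open("anthems.m3u", "w", encoding="utf-8") as f:
    f.write(make_m3u(rows))
```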
&lt;p&gt;&lt;a href=&#34;https://www.wikidata.org/wiki/Property:P870&#34;&gt;Instrumentation&lt;/a&gt; was another interesting property to use when searching for music. I started by asking, for all the recordings, which instrumentation values were used:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT DISTINCT ?instrumentation WHERE {
  ?s  wdt:P51 ?recording ;
      wdt:P870 ?instrumentationURI .
  ?instrumentationURI rdfs:label ?instrumentation.
  FILTER ( lang(?instrumentation) = &amp;#34;en&amp;#34; )    
}
ORDER BY ?instrumentation
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://query.wikidata.org/#SELECT%20DISTINCT%20%3Finstrumentation%20WHERE%20%7B%0A%20%20%3Fs%20%20wdt%3AP51%20%3Frecording%20%3B%0A%20%20%20%20%20%20wdt%3AP870%20%3FinstrumentationURI%20.%0A%20%20%3FinstrumentationURI%20rdfs%3Alabel%20%3Finstrumentation.%0A%20%20FILTER%20%28%20lang%28%3Finstrumentation%29%20%3D%20%22en%22%20%29%20%20%20%20%0A%7D%0AORDER%20BY%20%3Finstrumentation%0A&#34;&gt;Running it&lt;/a&gt; showed 44 results. One was viola, so I wondered how many have that as their instrumentation value:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?recording WHERE {
  ?s  wdt:P51 ?recording ;
      wdt:P870 ?instrumentationURI .
  ?instrumentationURI rdfs:label &amp;#34;viola&amp;#34;@en.
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3Frecording%20WHERE%20%7B%0A%20%20%3Fs%20%20wdt%3AP51%20%3Frecording%20%3B%0A%20%20%20%20%20%20wdt%3AP870%20%3FinstrumentationURI%20.%0A%20%20%3FinstrumentationURI%20rdfs%3Alabel%20%22viola%22%40en.%0A%7D%0AORDER%20BY%20%3Finstrumentation%0A&#34;&gt;Running this one&lt;/a&gt; showed nine pieces whose instrumentation includes a viola. For example, Mozart&amp;rsquo;s &amp;ldquo;A Little Night Music&amp;rdquo; has four instrumentation values:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT DISTINCT ?instrumentation  WHERE {
  wd:Q12025 wdt:P51 ?recording ;
               wdt:P870 ?instrumentationURI .
    ?instrumentationURI rdfs:label ?instrumentation.
    FILTER ( lang(?instrumentation) = &amp;#34;en&amp;#34; )    
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://query.wikidata.org/#SELECT%20DISTINCT%20%3Finstrumentation%20%20WHERE%20%7B%0A%20%20wd%3AQ12025%20wdt%3AP51%20%3Frecording%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wdt%3AP870%20%3FinstrumentationURI%20.%0A%20%20%20%20%3FinstrumentationURI%20rdfs%3Alabel%20%3Finstrumentation.%0A%20%20%20%20FILTER%20%28%20lang%28%3Finstrumentation%29%20%3D%20%22en%22%20%29%20%20%20%20%0A%7D%0A&#34;&gt;You will see&lt;/a&gt; that it was written for a string orchestra.&lt;/p&gt;
&lt;p&gt;As you can see, for many of these I would see which properties were used with resources that had recordings and then do more queries with those properties. There are bird calls, historic speeches from the early days of audio recording, and all kinds of things to explore. I&amp;rsquo;m sure I&amp;rsquo;ll be doing more.&lt;/p&gt;
&lt;p&gt;I will leave you with one of my more successful queries:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?name ?wppage ?recording WHERE {
  
  ?composerURL rdfs:label &amp;#34;Johann Sebastian Bach&amp;#34;@en .   
  ?instrumentationURI rdfs:label &amp;#34;harpsichord&amp;#34;@en . 
  
  ?s  wdt:P51 ?recording ;
      wdt:P870 ?instrumentationURI ; 
      rdfs:label ?name ;
      wdt:P86 ?composerURL .
  ?wppage schema:about ?s . 
  FILTER( lang(?name) = &amp;#34;en&amp;#34; )    
  FILTER(contains(str(?wppage),&amp;#34;//en.&amp;#34;)) # Only the English Wikipedia pages
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3Fname%20%3Fwppage%20%3Frecording%20WHERE%20%7B%0A%20%20%0A%20%20%3FcomposerURL%20rdfs%3Alabel%20%22Johann%20Sebastian%20Bach%22%40en%20.%20%20%20%0A%20%20%3FinstrumentationURI%20rdfs%3Alabel%20%22harpsichord%22%40en%20.%20%0A%20%20%0A%20%20%3Fs%20%20wdt%3AP51%20%3Frecording%20%3B%0A%20%20%20%20%20%20wdt%3AP870%20%3FinstrumentationURI%20%3B%20%0A%20%20%20%20%20%20rdfs%3Alabel%20%3Fname%20%3B%0A%20%20%20%20%20%20wdt%3AP86%20%3FcomposerURL%20.%0A%20%20%3Fwppage%20schema%3Aabout%20%3Fs%20.%20%0A%20%20FILTER%28%20lang%28%3Fname%29%20%3D%20%22en%22%20%29%20%20%20%20%0A%20%20FILTER%28contains%28str%28%3Fwppage%29%2C%22%2F%2Fen.%22%29%29%20%23%20Only%20the%20English%20Wikipedia%20pages%0A%7D%0A&#34;&gt;Running this&lt;/a&gt; will give you recordings of 14 J.S. Bach harpsichord pieces.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://x.com/bobdc/status/1804899348913717428&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/112666611022796557&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">wikidata</category>
      
    </item>
    
    <item>
      <title>Use SPARQL to query for movies, then watch them</title>
      <link>https://www.bobdc.com/blog/querywatchmovies/</link>
      <pubDate>Sun, 26 May 2024 12:00:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/querywatchmovies/</guid>
      
      
      <description><div>On YouTube and more.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/hisGirlFriday.png&#34; alt=&#34;[still from His Girl Friday]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34; width=&#34;240&#34;/&gt;
&lt;p&gt;I recently learned about &lt;a href=&#34;https://wikiflix.toolforge.org/#/&#34;&gt;WikiFlix&lt;/a&gt;, which lets you search for streamable movies on the Internet. It was assembled by &lt;a href=&#34;https://pro.europeana.eu/person/sandra-fauconnier&#34;&gt;Sandra Fauconnier&lt;/a&gt; and &lt;a href=&#34;https://en.wikipedia.org/wiki/Magnus_Manske&#34;&gt;Magnus Manske&lt;/a&gt;. (Magnus played a major role in developing MediaWiki, which I&amp;rsquo;ve blogged about several times.) Sandra has provided some &lt;a href=&#34;https://commons.wikimedia.org/wiki/User:Spinster/Thoughts_about_WikiFlix_(and_dynamic_multimedia_portals)&#34;&gt;good background&lt;/a&gt; on the history and goals of WikiFlix on Wikimedia.&lt;/p&gt;
&lt;p&gt;When I sent her some geeky questions about the role of Wikidata in that, she told me about a great Wikidata property that I hadn&amp;rsquo;t known about: &lt;a href=&#34;https://www.wikidata.org/wiki/Property:P1651&#34;&gt;P1651&lt;/a&gt;, or &amp;ldquo;YouTube video ID&amp;rdquo;. It usually links to a video of an entire movie. Once I started playing with it, it didn&amp;rsquo;t take me long to come up with this query for the titles and YouTube links of films with Cary Grant:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?filmTitle ?youTubeURL  WHERE {
  ?castMember rdfs:label &amp;#34;Cary Grant&amp;#34;@en . 
  ?film wdt:P161 ?castMember ;
        rdfs:label ?filmTitle ;
        wdt:P1651 ?youtubeID .
  FILTER(lang(?filmTitle) = &amp;#34;en&amp;#34;)
  BIND(URI(CONCAT(&amp;#39;https://www.youtube.com/watch?v=&amp;#39;,?youtubeID)) AS ?youTubeURL)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(I didn&amp;rsquo;t include prefix declarations in the query because I only used prefixes that are predeclared in Wikidata.) The last line just adds the P1651 value to the usual YouTube stub and converts the result to a URI (or, for our purposes, a URL, because it locates something). As you&amp;rsquo;ll see if you &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3FfilmTitle%20%3FyouTubeURL%20%20WHERE%20%7B%0A%20%20%3FcastMember%20rdfs%3Alabel%20%22Cary%20Grant%22%40en%20.%20%0A%20%20%3Ffilm%20wdt%3AP161%20%3FcastMember%20%3B%0A%20%20%20%20%20%20%20%20rdfs%3Alabel%20%3FfilmTitle%20%3B%0A%20%20%20%20%20%20%20%20wdt%3AP1651%20%3FyoutubeID%20.%0A%20%20FILTER%28lang%28%3FfilmTitle%29%20%3D%20%22en%22%29%0A%20%20BIND%28URI%28CONCAT%28%27https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D%27%2C%3FyoutubeID%29%29%20AS%20%3FyouTubeURL%29%0A%7D%0A%0A&#34;&gt;run this query on Wikidata&lt;/a&gt;, the &lt;code&gt;?youTubeURL&lt;/code&gt; links will take you to the listed movies on YouTube.&lt;/p&gt;
&lt;p&gt;Some of the links actually lead to a page saying that the video has been taken down because of a copyright claim. The earlier the film was made, the more likely it is to be available on YouTube, so let&amp;rsquo;s list them sorted by date:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?releaseDate ?filmTitle ?youTubeURL  WHERE {
  ?castMember rdfs:label &amp;#34;Cary Grant&amp;#34;@en . 
  ?film wdt:P161 ?castMember ;
        wdt:P577 ?releaseDate ; 
        rdfs:label ?filmTitle ;
        wdt:P1651 ?youtubeID .
  FILTER(lang(?filmTitle) = &amp;#34;en&amp;#34;)
  BIND(URI(CONCAT(&amp;#39;https://www.youtube.com/watch?v=&amp;#39;,?youtubeID)) AS ?youTubeURL)
}
ORDER BY ?releaseDate
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If you &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3FreleaseDate%20%3FfilmTitle%20%3FyouTubeURL%20%20WHERE%20%7B%0A%20%20%3FcastMember%20rdfs%3Alabel%20%22Cary%20Grant%22%40en%20.%20%0A%20%20%3Ffilm%20wdt%3AP161%20%3FcastMember%20%3B%0A%20%20%20%20%20%20%20%20wdt%3AP577%20%3FreleaseDate%20%3B%20%0A%20%20%20%20%20%20%20%20rdfs%3Alabel%20%3FfilmTitle%20%3B%0A%20%20%20%20%20%20%20%20wdt%3AP1651%20%3FyoutubeID%20.%0A%20%20FILTER%28lang%28%3FfilmTitle%29%20%3D%20%22en%22%29%0A%20%20BIND%28URI%28CONCAT%28%27https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D%27%2C%3FyoutubeID%29%29%20AS%20%3FyouTubeURL%29%0A%7D%0AORDER%20BY%20%3FreleaseDate%0A&#34;&gt;run that on Wikidata&lt;/a&gt; you&amp;rsquo;ll see them listed by date. The links for the first few movies in the result worked fine for me. Some films link to multiple YouTube URLs; I thought those were worth leaving in case the first one you try doesn&amp;rsquo;t work.&lt;/p&gt;
&lt;p&gt;Of course movies have all kinds of metadata to query for, which adds to the fun. For example, I could query for Cary Grant movies directed by Howard Hawks:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?releaseDate ?filmTitle ?youTubeURL  WHERE {
  ?castMember rdfs:label &amp;#34;Cary Grant&amp;#34;@en . 
  ?director rdfs:label &amp;#34;Howard Hawks&amp;#34;@en . 
  ?film wdt:P161 ?castMember ;
        wdt:P577 ?releaseDate ; 
        rdfs:label ?filmTitle ;
        wdt:P57 ?director ; 
        wdt:P1651 ?youtubeID .
  FILTER(lang(?filmTitle) = &amp;#34;en&amp;#34;)
  BIND(URI(CONCAT(&amp;#39;https://www.youtube.com/watch?v=&amp;#39;,?youtubeID)) AS ?youTubeURL)
}
ORDER BY ?releaseDate
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You&amp;rsquo;ll see &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3FreleaseDate%20%3FfilmTitle%20%3FyouTubeURL%20%20WHERE%20%7B%0A%20%20%3FcastMember%20rdfs%3Alabel%20%22Cary%20Grant%22%40en%20.%20%0A%20%20%3Fdirector%20rdfs%3Alabel%20%22Howard%20Hawks%22%40en%20.%20%0A%20%20%3Ffilm%20wdt%3AP161%20%3FcastMember%20%3B%0A%20%20%20%20%20%20%20%20wdt%3AP577%20%3FreleaseDate%20%3B%20%0A%20%20%20%20%20%20%20%20rdfs%3Alabel%20%3FfilmTitle%20%3B%0A%20%20%20%20%20%20%20%20wdt%3AP57%20%3Fdirector%20%3B%20%0A%20%20%20%20%20%20%20%20wdt%3AP1651%20%3FyoutubeID%20.%0A%20%20FILTER%28lang%28%3FfilmTitle%29%20%3D%20%22en%22%29%0A%20%20BIND%28URI%28CONCAT%28%27https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D%27%2C%3FyoutubeID%29%29%20AS%20%3FyouTubeURL%29%0A%7D%0AORDER%20BY%20%3FreleaseDate%0A&#34;&gt;three movies&lt;/a&gt; in the results. (I have certainly seen &amp;ldquo;Bringing Up Baby&amp;rdquo; and &amp;ldquo;His Girl Friday&amp;rdquo; but I have never heard of Hawks and Grant doing a film called &amp;ldquo;Monkey Business&amp;rdquo;, which I will certainly need to check out.) The &lt;a href=&#34;https://www.wikidata.org/wiki/Property:P136&#34;&gt;P136&lt;/a&gt; genre property is another that can make movie query results more interesting.&lt;/p&gt;
&lt;p&gt;Sandra also told me about two more properties that point at other video collections: &lt;code&gt;P10&lt;/code&gt; and &lt;code&gt;P724&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.wikidata.org/wiki/Property:P10&#34;&gt;P10&lt;/a&gt; points to video content stored on Wikimedia. For example, the &lt;a href=&#34;https://www.wikidata.org/wiki/Q829250&#34;&gt;Wikidata page&lt;/a&gt; for Dziga Vertov&amp;rsquo;s &lt;a href=&#34;https://en.wikipedia.org/wiki/Man_with_a_Movie_Camera&#34;&gt;Man with a Movie Camera&lt;/a&gt;, which I knew was an important early Soviet silent but have never seen, includes this triple; the triple&amp;rsquo;s object &lt;a href=&#34;http://commons.wikimedia.org/wiki/Special:FilePath/Man%20With%20A%20Movie%20Camera%20%28Dziga%20Vertov%2C%201929%29.webm&#34;&gt;links to a version of the film that we can watch&lt;/a&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;Q829250 wdt:P10 &amp;lt;http://commons.wikimedia.org/wiki/Special:FilePath/Man%20With%20A%20Movie%20Camera%20%28Dziga%20Vertov%2C%201929%29.webm&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://www.wikidata.org/wiki/Property:P724&#34;&gt;P724&lt;/a&gt; is a resource&amp;rsquo;s Internet Archive ID. This may be a film, or it may be an emulated video game or even software such as VisiCalc. The &lt;a href=&#34;https://www.wikidata.org/wiki/Q59317&#34;&gt;Wikidata page&lt;/a&gt; for the 1944 Frank Capra film &lt;a href=&#34;https://en.wikipedia.org/wiki/Arsenic_and_Old_Lace_(film)&#34;&gt;Arsenic and Old Lace&lt;/a&gt; has two &lt;code&gt;wdt:P724&lt;/code&gt; values: &amp;ldquo;1944-arsenic-and-old-lace-arsenico-por-compasion-frank-capra-vose&amp;rdquo; and &amp;ldquo;1944-arsenic-and-old-lace-este-mundo-e-um-hospicio-frank-capra-legendado&amp;rdquo;. Add either one to the stub &lt;code&gt;https://archive.org/details/&lt;/code&gt; and you&amp;rsquo;ll have a URL that lets you watch the whole movie.&lt;/p&gt;
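&lt;p&gt;To summarize how these three properties lead to something watchable: P10 values are already complete URLs, while P1651 and P724 values are IDs that need the appropriate stub prepended. Here is that logic as a small Python sketch of my own:&lt;/p&gt;

```python
# Sketch: resolve the video-related Wikidata property values
# discussed in this post into URLs you can open.
STUBS = {
    "P1651": "https://www.youtube.com/watch?v=",  # YouTube video ID
    "P724": "https://archive.org/details/",       # Internet Archive ID
}

def watchable_url(prop, value):
    """Turn a (property, value) pair into a watchable URL."""
    if prop == "P10":  # already a full Wikimedia Commons URL
        return value
    return STUBS[prop] + value

print(watchable_url("P724",
    "1944-arsenic-and-old-lace-arsenico-por-compasion-frank-capra-vose"))
# https://archive.org/details/1944-arsenic-and-old-lace-arsenico-por-compasion-frank-capra-vose
```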
&lt;p&gt;Because &lt;code&gt;P724&lt;/code&gt; gets applied to so many different media, when looking for movies it&amp;rsquo;s a good idea to have your query specify that you want an &lt;a href=&#34;http://www.wikidata.org/prop/direct/P31&#34;&gt;instance of&lt;/a&gt; &lt;a href=&#34;http://www.wikidata.org/entity/Q11424&#34;&gt;film&lt;/a&gt;. For example:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?title ?internetArchiveURL WHERE {
  ?film wdt:P31	 wd:Q11424 ;   # it&amp;#39;s a film 
        wdt:P724 ?internetArchiveID; 
        wdt:P57 ?director ;
        rdfs:label ?title . 
  ?director rdfs:label &amp;#34;Frank Capra&amp;#34;@en .
  FILTER(lang(?title) = &amp;#34;en&amp;#34;)
  BIND(URI(CONCAT(&amp;#39;https://archive.org/details/&amp;#39;,?internetArchiveID)) AS ?internetArchiveURL)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(When I &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3Ftitle%20%3FinternetArchiveURL%20WHERE%20%7B%0A%20%20%3Ffilm%20wdt%3AP31%09%20wd%3AQ11424%20%3B%20%20%20%23%20it%27s%20a%20film%20%0A%20%20%20%20%20%20%20%20wdt%3AP724%20%3FinternetArchiveID%3B%20%0A%20%20%20%20%20%20%20%20wdt%3AP57%20%3Fdirector%20%3B%0A%20%20%20%20%20%20%20%20rdfs%3Alabel%20%3Ftitle%20.%20%0A%20%20%3Fdirector%20rdfs%3Alabel%20%22Frank%20Capra%22%40en%20.%0A%20%20FILTER%28lang%28%3Ftitle%29%20%3D%20%22en%22%29%0A%20%20BIND%28URI%28CONCAT%28%27https%3A%2F%2Farchive.org%2Fdetails%2F%27%2C%3FinternetArchiveID%29%29%20AS%20%3FinternetArchiveURL%29%0A%7D%0A&#34;&gt;ran that one&lt;/a&gt; it was interesting to see how many of the World War II propaganda films that Capra directed are available for viewing.)&lt;/p&gt;
&lt;p&gt;So the next time you&amp;rsquo;re looking for a film to watch, instead of Netflix or Apple TV, let some SPARQL queries of Wikidata point you to classic films that you can watch for free!&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://x.com/bobdc/status/1794765215508333004&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/112508303003154160&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>SPARQL queries of the Billboard Hot 100</title>
      <link>https://www.bobdc.com/blog/hot100/</link>
      <pubDate>Sun, 21 Apr 2024 11:25:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/hot100/</guid>
      
      
      <description><div>Current and historical data!</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/hot100sparql.png&#34; alt=&#34;[Hot 100 and SPARQL logos]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;Wikipedia describes the &lt;a href=&#34;https://en.wikipedia.org/wiki/Billboard_Hot_100&#34;&gt;Billboard Hot 100&lt;/a&gt; as &amp;ldquo;the music industry standard record chart in the United States for songs, published weekly by Billboard magazine. Chart rankings are based on sales (physical and digital), online streaming, and radio airplay in the U.S.&amp;rdquo; A song that ranks  highly there is a hit song (in the U.S.) by definition. The data goes back to the beginning of the chart&amp;rsquo;s history in 1958, when Rick Nelson&amp;rsquo;s &lt;a href=&#34;https://www.youtube.com/watch?v=R12H8QWnwvE&#34;&gt;Poor Little Fool&lt;/a&gt; was the number one song.&lt;/p&gt;
&lt;p&gt;I recently learned about &lt;a href=&#34;https://github.com/mhollingshead/billboard-hot-100&#34;&gt;billboard-hot-100&lt;/a&gt;, which is &amp;ldquo;a git repository of JSON files for every Billboard Hot 100 chart in history, updated daily&amp;rdquo;. Of course I thought it would be fun to query it with SPARQL, so I wrote something to convert the data to RDF. I did it as a github fork of the project that I called &lt;a href=&#34;https://github.com/bobdc/billboard-hot-100-rdf/&#34;&gt;billboard-hot-100-rdf&lt;/a&gt; in case anyone else wants to play with it. One nice advantage of doing it this way is that, because Billboard updates their chart on Tuesday mornings and the billboard-hot-100 repository scrapes that data every Wednesday, you can do SPARQL queries against the latest data on Thursday through Monday.&lt;/p&gt;
&lt;p&gt;Here is a sample of the JSON data from that billboard-hot-100 project. Instead of listing all the hits from that week, my excerpt only lists two:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;[{
  &amp;#34;date&amp;#34;: &amp;#34;2024-04-06&amp;#34;,
  &amp;#34;data&amp;#34;: [
    {
      &amp;#34;song&amp;#34;: &amp;#34;Cruel Summer&amp;#34;,
      &amp;#34;artist&amp;#34;: &amp;#34;Taylor Swift&amp;#34;,
      &amp;#34;this_week&amp;#34;: 16,
      &amp;#34;last_week&amp;#34;: 10,
      &amp;#34;peak_position&amp;#34;: 1,
      &amp;#34;weeks_on_chart&amp;#34;: 47
    },
    {
      &amp;#34;song&amp;#34;: &amp;#34;Redrum&amp;#34;,
      &amp;#34;artist&amp;#34;: &amp;#34;21 Savage&amp;#34;,
      &amp;#34;this_week&amp;#34;: 30,
      &amp;#34;last_week&amp;#34;: 22,
      &amp;#34;peak_position&amp;#34;: 5,
      &amp;#34;weeks_on_chart&amp;#34;: 11
    }  ]
}]
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;a href=&#34;https://github.com/bobdc/billboard-hot-100-rdf/blob/main/rdf/h100json2rdf.py&#34;&gt;h100json2rdf.py&lt;/a&gt; Python script in my fork of the project  (which is only about 60 lines including white space and comments)  converts the data above to this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix h1: &amp;lt;http://rdfdata.org/hot100#&amp;gt; .
@prefix schema: &amp;lt;http://schema.org/&amp;gt; .
@prefix dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt; .
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .

h1:TaylorSwift a h1:MusicalArtist ; 
   rdfs:label &amp;#34;Taylor Swift&amp;#34;@en .

h1:TaylorSwiftCruelSummer a schema:MusicRecording;
     schema:byArtist h1:TaylorSwift;
     dc:title &amp;#34;Cruel Summer&amp;#34;;
     h1:charted &amp;#34;2024-04-06&amp;#34;^^xsd:date {| 
        h1:position 16
|}.

h1:21Savage a h1:MusicalArtist ; 
   rdfs:label &amp;#34;21 Savage&amp;#34;@en .

h1:21SavageRedrum a schema:MusicRecording;
     schema:byArtist h1:21Savage;
     dc:title &amp;#34;Redrum&amp;#34;;
     h1:charted &amp;#34;2024-04-06&amp;#34;^^xsd:date {| 
        h1:position 30
|}.
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A few things to note about the conversion:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;As always, I used classes and prefixes from existing schemas when I could and made up new ones where necessary.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When the input data said that a given artist had a given hit, I created instances for both the artist and the song. As we&amp;rsquo;ll see below, this makes queries about artists like &amp;ldquo;who had hits in the most decades&amp;rdquo; easier.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;URIs for the artists are made from their names with a little cleanup to make them proper URIs. URIs for the songs are similar but with the artist name and song title combined because sometimes completely different songs happen to have the same title and I wanted to distinguish them from each other. For example, as the charts tell us, &lt;a href=&#34;https://www.youtube.com/watch?v=l9ml3nyww80&#34;&gt;Bananarama&lt;/a&gt; had a hit with a song called &amp;ldquo;Cruel Summer&amp;rdquo; 40 years ago that is completely different from the &lt;a href=&#34;https://www.youtube.com/watch?v=GrKQvyXpNgc&#34;&gt;Taylor Swift&lt;/a&gt; song of the same name.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To record that a given song charted on a given date at a particular position, I used RDF-star annotation syntax (the &lt;code&gt;{| ... |}&lt;/code&gt; blocks above) to make the position value (&lt;code&gt;h1:position&lt;/code&gt;) a property of the triple saying that the song charted on that date. To put it another way, the position value is a property of the graph edge about the date.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I didn&amp;rsquo;t convert all the JSON properties from the original because, as we&amp;rsquo;ll see, values like &lt;code&gt;weeks_on_chart&lt;/code&gt; are easy enough to query for with the data that I did convert.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
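&lt;p&gt;The URI construction described above can be sketched in a few lines of Python. This is only an illustration of the idea; the &lt;code&gt;uri_fragment&lt;/code&gt; helper here is hypothetical, and the actual cleanup in &lt;code&gt;h100json2rdf.py&lt;/code&gt; may differ:&lt;/p&gt;

```python
import re

H1 = "http://rdfdata.org/hot100#"

def uri_fragment(name):
    # Drop any character that is not safe in a URI local name.
    return re.sub(r"[^A-Za-z0-9]", "", name)

def artist_uri(artist):
    return H1 + uri_fragment(artist)

def song_uri(artist, title):
    # Combining artist name and title keeps different songs that
    # share a title (like the two "Cruel Summer"s) distinct.
    return H1 + uri_fragment(artist) + uri_fragment(title)

print(artist_uri("21 Savage"))                   # http://rdfdata.org/hot100#21Savage
print(song_uri("Taylor Swift", "Cruel Summer"))  # http://rdfdata.org/hot100#TaylorSwiftCruelSummer
```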
&lt;p&gt;I used my Python script to convert the project&amp;rsquo;s &lt;code&gt;all.json&lt;/code&gt; data file into Turtle RDF, loaded the Turtle file into the free version of Ontotext GraphDB, and was ready to start querying. (Other files in that repository hold data for individual weeks, which can make some queries go much faster.)&lt;/p&gt;
&lt;h2 id=&#34;querying-for-the-data-that-we-didnt-convert&#34;&gt;Querying for the data that we didn&amp;rsquo;t convert&lt;/h2&gt;
&lt;p&gt;Our first few queries will show why the Python script didn&amp;rsquo;t bring all the numbers into the Turtle data. Let&amp;rsquo;s query for the number of weeks that a recording had been on the chart:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX h1: &amp;lt;http://rdfdata.org/hot100#&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt;
PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;

PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
SELECT (COUNT(?chartPosition) AS ?weeksOnChart) WHERE {
?recording a schema:MusicRecording ; 
               dc:title &amp;#34;Cruel Summer&amp;#34; ; 
               schema:byArtist/rdfs:label &amp;#34;Bananarama&amp;#34;@en .
    
   &amp;lt;&amp;lt; ?recording h1:charted ?chartDate &amp;gt;&amp;gt; h1:position ?chartPosition .
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The result of this query is that Bananarama&amp;rsquo;s &amp;ldquo;Cruel Summer&amp;rdquo; was on the charts for 18 weeks. Comparing the &lt;a href=&#34;https://www.billboard.com/charts/hot-100/1984-11-17/&#34;&gt;November 17th&lt;/a&gt; and &lt;a href=&#34;https://www.billboard.com/charts/hot-100/1984-11-24/&#34;&gt;November 24th&lt;/a&gt; charts from 1984 confirms that November 17th was the 18th and last chart appearance of Bananarama&amp;rsquo;s &amp;ldquo;Cruel Summer&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Change the &lt;code&gt;SELECT&lt;/code&gt; line in that query to the following and you&amp;rsquo;ll learn that recording&amp;rsquo;s highest U.S. chart position:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT (MIN(?chartPosition) AS ?highestPosition) WHERE {
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The query result is 9, which is confirmed as the recording&amp;rsquo;s highest chart position on its &lt;a href=&#34;https://en.wikipedia.org/wiki/Cruel_Summer_(Bananarama_song)&#34;&gt;Wikipedia page&lt;/a&gt;.&lt;/p&gt;
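&lt;p&gt;The logic behind both aggregate queries is easy to restate outside of SPARQL: model a recording&amp;rsquo;s chart history as date/position pairs, count the pairs for weeks on chart, and take the minimum position for the peak. A small Python sketch with made-up values:&lt;/p&gt;

```python
# A recording's chart history as (date, position) pairs.
# These values are hypothetical, just to show the two aggregations.
chart_history = [
    ("1984-07-07", 48),
    ("1984-07-14", 35),
    ("1984-07-21", 21),
    ("1984-07-28", 19),
]

weeks_on_chart = len(chart_history)                   # COUNT(?chartPosition)
peak_position = min(pos for _, pos in chart_history)  # MIN(?chartPosition)

print(weeks_on_chart, peak_position)
```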
&lt;p&gt;How do we query for a recording&amp;rsquo;s position on the chart the week before a given appearance? We&amp;rsquo;ll ask about Dua Lipa&amp;rsquo;s &amp;ldquo;Houdini&amp;rdquo;. The following query uses a nested query to find the most recent date that the recording was on the chart; the outer query then asks about its position the week before that:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX h1: &amp;lt;http://rdfdata.org/hot100#&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt;
PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;
PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;

SELECT ?dateLastWeek ?positionLastWeek WHERE {
   # 2. Find out the week before the latest chart 
   # appearance and the position from that week. 
   BIND (?latestChartDate - &amp;#34;P7D&amp;#34;^^xsd:duration AS ?dateLastWeek)
   &amp;lt;&amp;lt; ?recording h1:charted ?dateLastWeek &amp;gt;&amp;gt; h1:position ?positionLastWeek . 
   {
     # 1. Find the date of the latest chart appearance. 
     SELECT ?recording (MAX(?chartDate) AS ?latestChartDate) WHERE {
       ?recording a schema:MusicRecording ; 
                  dc:title &amp;#34;Houdini&amp;#34; ; 
                  schema:byArtist/rdfs:label &amp;#34;Dua Lipa&amp;#34;@en .
       &amp;lt;&amp;lt; ?recording h1:charted ?chartDate &amp;gt;&amp;gt; h1:position ?chartPosition .
      }
   GROUP BY ?recording
   } 
} 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The result tells me that last week (as I write this), on the March 30th chart, this song was at position 29, which is confirmed &lt;a href=&#34;https://www.billboard.com/charts/hot-100/2024-03-30&#34;&gt;on the Billboard web site&lt;/a&gt;.&lt;/p&gt;
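&lt;p&gt;The two numbered steps in that query translate directly into ordinary date arithmetic: take the maximum chart date, subtract a seven-day duration, and look up the position stored for the resulting date. A Python sketch of the same steps (the March 30 position comes from the result above; the other values are made up):&lt;/p&gt;

```python
from datetime import date, timedelta

# (chart date -> position) for one recording.
history = {
    date(2024, 3, 23): 34,   # made up
    date(2024, 3, 30): 29,   # confirmed on the Billboard web site
    date(2024, 4, 6): 41,    # made up
}

# 1. Find the date of the latest chart appearance.
latest_chart_date = max(history)

# 2. Step back one week (the "P7D" duration) and get that week's position.
date_last_week = latest_chart_date - timedelta(days=7)
position_last_week = history.get(date_last_week)

print(date_last_week, position_last_week)
```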
&lt;h2 id=&#34;querying-for-new-things&#34;&gt;Querying for new things&lt;/h2&gt;
&lt;p&gt;Besides querying for data that the Python script didn&amp;rsquo;t bring over to the Turtle data, what else can we query for? For example, which artist had hits in the most decades? For the decade, this next query takes the first three digits of the chart date and then groups and sorts by that. Then, it lists everyone who has had hits in at least five decades:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX schema: &amp;lt;http://schema.org/&amp;gt;
PREFIX h1: &amp;lt;http://rdfdata.org/hot100#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 

SELECT (COUNT(DISTINCT ?decade) AS ?decades) ?artistName WHERE { 
?recording a schema:MusicRecording ; 
             schema:byArtist/rdfs:label ?artistName ;
             h1:charted ?chartDate . 
   BIND (SUBSTR(str(?chartDate),1,3) AS ?decade)
}
GROUP BY ?artistName
HAVING (?decades &amp;gt; 4)
ORDER BY DESC(?decades)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The results:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;?decades,?artistName
7,&amp;#34;Elvis Presley&amp;#34;@en
6,&amp;#34;Frank Sinatra&amp;#34;@en
6,&amp;#34;Cher&amp;#34;@en
6,&amp;#34;Paul McCartney&amp;#34;@en
6,&amp;#34;Michael Jackson&amp;#34;@en
5,&amp;#34;Chuck Berry&amp;#34;@en
5,&amp;#34;Andy Williams&amp;#34;@en
5,&amp;#34;The Isley Brothers&amp;#34;@en
5,&amp;#34;Brenda Lee&amp;#34;@en
5,&amp;#34;The Beatles&amp;#34;@en
5,&amp;#34;The Rolling Stones&amp;#34;@en
5,&amp;#34;Stevie Wonder&amp;#34;@en
5,&amp;#34;Fleetwood Mac&amp;#34;@en
5,&amp;#34;Eagles&amp;#34;@en
5,&amp;#34;Prince&amp;#34;@en
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In what decades did Little Richard have hits?&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs:   &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX h1:     &amp;lt;http://rdfdata.org/hot100#&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt;

SELECT DISTINCT ?decade WHERE {
  ?artist rdfs:label &amp;#34;Little Richard&amp;#34;@en . 
  ?s schema:byArtist ?artist ;
     h1:charted ?chartDate .
  BIND (CONCAT(SUBSTR(str(?chartDate),1,3),&amp;#39;*&amp;#39;) AS ?decade)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(The three-digit year values looked odd so I added asterisks to make them look more like years with wildcards.)&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;?decade
&amp;#34;195*&amp;#34;
&amp;#34;196*&amp;#34;
&amp;#34;197*&amp;#34;
&amp;#34;198*&amp;#34;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;To be honest, everything I&amp;rsquo;ve done so far could be done with a relational database. I&amp;rsquo;ve been experimenting with ways to make this a real knowledge graph by adding additional data from Wikidata. I have some ideas for more interesting queries to make about the artists and their relationships to their hits.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://twitter.com/bobdc/status/1782074668947582999&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/112310006578986653&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Visualizing RDF</title>
      <link>https://www.bobdc.com/blog/visualizing-rdf/</link>
      <pubDate>Sun, 24 Mar 2024 04:22:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/visualizing-rdf/</guid>
      
      
<description><div>I see nodes and edges...</div><div>&lt;p&gt;I recently did a review of options for creating visual representations of RDF data. I didn&amp;rsquo;t just want a general visualization tool, but something that understood RDF well enough to represent class instances and literal values differently. I will emphasize &lt;em&gt;instances&lt;/em&gt; because several tools out there can read RDF schema or ontologies and create a visualization of classes and their relationships and potential properties, but I want to see instances with their property values.&lt;/p&gt;
&lt;p&gt;My favorite ended up being the &lt;a href=&#34;https://rdfshape.weso.es/&#34;&gt;RDF Shape&lt;/a&gt; tool from the University of Oviedo&amp;rsquo;s WESO group in northern Spain. I also liked &lt;a href=&#34;https://www.ldf.fi/service/rdf-grapher&#34;&gt;RDF Grapher&lt;/a&gt; from the Linked Data Finland project. Both let me create SVG files that I can edit with the &lt;a href=&#34;https://inkscape.org/&#34;&gt;Inkscape&lt;/a&gt; editor if I don&amp;rsquo;t like their algorithmic layout of a particular dataset&amp;rsquo;s RDF graph nodes. Before I go into detail about that and demonstrate some Inkscape editing, I wanted to describe some of the research I did to get there.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.w3.org/RDF/Validator/&#34;&gt;W3C RDF Validation Service&lt;/a&gt; can create visualizations that you may recognize from W3C publications. It has been available for at least sixteen years, according to the &amp;ldquo;Last modified&amp;rdquo; date at the bottom of the page. It lets you paste some RDF into a field or enter the URL of an RDF dataset in another field, and then after you set the &amp;ldquo;Triples and/or Graph&amp;rdquo; field to include a Graph in its output,  clicking the Parse button generates the image.&lt;/p&gt;
&lt;p&gt;To test the various graph generation tools I used the &lt;a href=&#34;https://www.learningsparql.com/2ndeditionexamples/ex012.ttl&#34;&gt;ex012.ttl&lt;/a&gt; sample data from my book &lt;a href=&#34;https://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt; and then added a few schema.org &lt;a href=&#34;https://schema.org/follows&#34;&gt;follows&lt;/a&gt; triples to connect up the three people in the sample data:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-turtle&#34; data-lang=&#34;turtle&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;d:&lt;span style=&#34;color:#f92672&#34;&gt;i0432&lt;/span&gt; schema:&lt;span style=&#34;color:#f92672&#34;&gt;follows&lt;/span&gt; d:&lt;span style=&#34;color:#f92672&#34;&gt;i9771&lt;/span&gt;. 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;d:&lt;span style=&#34;color:#f92672&#34;&gt;i8301&lt;/span&gt; schema:&lt;span style=&#34;color:#f92672&#34;&gt;follows&lt;/span&gt; d:&lt;span style=&#34;color:#f92672&#34;&gt;i0432&lt;/span&gt;, d:&lt;span style=&#34;color:#f92672&#34;&gt;i9771&lt;/span&gt;. 
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The W3C RDF Validation Service created this image from that data.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/w3cvalidator.svg&#34; border=&#34;0&#34;  alt=&#34;RDF graph image generated by W3C Validator&#34;/&gt;
&lt;p&gt;The text labels are small enough to be illegible. As we&amp;rsquo;ll see, if a tool can generate SVG, like the W3C RDF Validation Service  can, editing the image with an SVG editing tool  might make it narrower so that it displays the labels at a more readable size (&amp;ldquo;Scalable Vector Graphics!&amp;rdquo;). For this image, though, that&amp;rsquo;s just too much editing.&lt;/p&gt;
&lt;p&gt;Output this wide, with labels this hard to read, is pretty common with the W3C RDF Validation Service. Also, the input must be RDF/XML, which was another reason to look for newer alternatives.&lt;/p&gt;
&lt;p&gt;A &lt;a href=&#34;https://stackoverflow.com/questions/66720/are-there-any-tools-to-visualize-a-rdf-graph-please-include-a-screenshot&#34;&gt;Stack Overflow&lt;/a&gt; discussion provided a good starting place for research into alternatives. Some alternatives were more focused on visualizing schema and ontology classes, as I mentioned above, and others were general-purpose visualization tools that had an RDF plugin available that may or may not be up to date with the latest version of the visualization tool.&lt;/p&gt;
&lt;p&gt;This list is where I found out about WESO &lt;a href=&#34;https://rdfshape.weso.es/&#34;&gt;RDF Shape&lt;/a&gt;.  To learn more about that project, see its &lt;a href=&#34;https://rdfshape.weso.es/about&#34;&gt;About&lt;/a&gt; page.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;Data analysis and visualization&amp;rdquo; link on the RDF Shape page leads to the &lt;a href=&#34;https://rdfshape.weso.es/dataInfo&#34;&gt;Data analysis&lt;/a&gt; form where you can paste some RDF in just about any serialization, click the blue Analyze button, and then click the Visualizations tab that appears with the result.&lt;/p&gt;
&lt;p&gt;RDF Shape did this with the data that I used to make the previous image:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/rdfshape-weso-es.svg&#34; border=&#34;0&#34;  alt=&#34;RDF graph image generated by RDF Shape&#34;/&gt;
&lt;p&gt;I don&amp;rsquo;t love the yellow. I could edit each individual square with the Inkscape editor and change their color, but because it&amp;rsquo;s SVG, it&amp;rsquo;s XML, which means that I could edit that directly with a text editor. I globally replaced the &lt;code&gt;polygon/@fill&lt;/code&gt; values of &amp;ldquo;#ffff00&amp;rdquo; with &amp;ldquo;#8aeaea&amp;rdquo; and then the graph looked like this:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/rdfshape-weso-es-lightblue.svg&#34; border=&#34;0&#34;  alt=&#34;RDF graph image generated by RDF Shape, after global replace of color value&#34;/&gt;
&lt;p&gt;If you prefer a certain style of font, rectangle fill colors, or oval and rectangle outline colors, a little XSLT or even Perl could turn the default RDF Shape SVG into whatever you like with similar replacements.&lt;/p&gt;
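&lt;p&gt;For example, the fill-color replacement described above needs nothing fancier than a regular expression over the SVG text. A sketch, using a trivial stand-in for one polygon&amp;rsquo;s attributes rather than RDF Shape&amp;rsquo;s real output:&lt;/p&gt;

```python
import re

# A stand-in for one polygon's attributes in the RDF Shape SVG.
svg_fragment = 'polygon fill="#ffff00" stroke="black"'

# Globally replace the yellow fill value with light blue.
recolored = re.sub(r'fill="#ffff00"', 'fill="#8aeaea"', svg_fragment)
print(recolored)
```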
&lt;p&gt;Using Inkscape for more hands-on editing, I moved a few shapes and arrows in that last image to make the image narrower so that it (and especially its text) can be displayed bigger, which makes the whole thing easier to read:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/rdfshape-weso-es-lightblue-edited.svg&#34; border=&#34;0&#34;  alt=&#34;RDF Shape image after some Inkscape&#34;/&gt;
&lt;p&gt;To learn how to do these kinds of edits with Inkscape, I started with their &lt;a href=&#34;https://inkscape.org/doc/tutorials/basic/tutorial-basic.html&#34;&gt;Basic tutorial&lt;/a&gt; and then skipped around in the sections of their &lt;a href=&#34;https://inkscape-manuals.readthedocs.io/en/latest/index.html&#34;&gt;Beginners&amp;rsquo; Guide&lt;/a&gt;. For editing the lines with the arrows, the section &lt;a href=&#34;https://inkscape-manuals.readthedocs.io/en/latest/editing-paths.html&#34;&gt;Editing Paths with the Node Tool&lt;/a&gt; was helpful.&lt;/p&gt;
&lt;p&gt;I doubt that I know 5% of what Inkscape can do. Instead of writing up the parts I had to learn to make the edits described above, I just made a two-minute demo video:&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/7RFkdIVCxc4&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;I mentioned that in addition to the WESO RDF Shape tool, I also liked the RDF Grapher tool. Here is the RDF Grapher version of the same data as an SVG image:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/rdf-grapher.svg&#34; border=&#34;0&#34;  alt=&#34;RDF Grapher version of same data&#34;/&gt;
&lt;p&gt;Overall it&amp;rsquo;s similar to the RDF Shape version, and you have similar options for editing its SVG XML directly (for example, those five lines of text at the bottom were easy to find in the SVG XML and delete) or using Inkscape like I did in the video.&lt;/p&gt;
&lt;p&gt;Have you found any RDF visualization tools that you really like?&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://twitter.com/bobdc/status/1771930841683304891&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/112151511341344755&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Using regular expressions to manipulate data in a SPARQL query</title>
      <link>https://www.bobdc.com/blog/regex/</link>
      <pubDate>Sun, 25 Feb 2024 10:58:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/regex/</guid>
      
      
      <description><div>A pure, standards-compliant SPARQL query.</div><div>&lt;p&gt;&lt;a href=&#39;https://xkcd.com/208/&#39;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/xkcdregex.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;xkcd frame&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I have often lamented that SPARQL&amp;rsquo;s &lt;a href=&#34;https://www.w3.org/TR/sparql11-query/#func-regex&#34;&gt;REGEX&lt;/a&gt; function returns only a boolean value. That makes it handy in &lt;code&gt;FILTER&lt;/code&gt; tests, where &lt;a href=&#34;https://en.wikipedia.org/wiki/Regular_expression&#34;&gt;regular expressions&lt;/a&gt; let you build more complex conditions about which results a query should or shouldn&amp;rsquo;t return. But I wished that, instead of just returning true or false, it would let me grab the pieces of a string that match a pattern and recombine them into new values, the way the regular expression support in most programming languages does.&lt;/p&gt;
&lt;p&gt;I only recently noticed that SPARQL&amp;rsquo;s &lt;a href=&#34;https://www.w3.org/TR/sparql11-query/#func-replace&#34;&gt;&lt;code&gt;REPLACE&lt;/code&gt;&lt;/a&gt; function, which comes right after &lt;code&gt;REGEX&lt;/code&gt; in the SPARQL query specification, supports regular expressions, so I can do this regex string manipulation in SPARQL after all.&lt;/p&gt;
&lt;p&gt;One of those other languages is JavaScript. In &lt;a href=&#34;https://www.bobdc.com/blog/arqjavascript/&#34;&gt;Calling your own JavaScript functions from SPARQL queries&lt;/a&gt; I showed how once you write a JavaScript function that does some regex string manipulation, you can then call that function from a SPARQL query being executed with &lt;a href=&#34;https://jena.apache.org/documentation/query/&#34;&gt;Jena ARQ&lt;/a&gt;. (Soon I&amp;rsquo;ll be showing how to do that with GraphDB on the &lt;a href=&#34;https://www.ontotext.com/blog/&#34;&gt;Ontotext blog&lt;/a&gt;.) The demo in my earlier blog entry used a regular expression in a JavaScript function to normalize some U.S. phone numbers.&lt;/p&gt;
&lt;p&gt;The SPARQL query below demonstrates why I didn&amp;rsquo;t need to call those JavaScript functions. Using SPARQL&amp;rsquo;s &lt;code&gt;REPLACE&lt;/code&gt; function and the same input data as that demo, I can normalize the same phone numbers using nothing but pure W3C-compliant SPARQL.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX v:    &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt;

SELECT ?name ?phoneNum ?fixedPhone
WHERE {
    ?s v:given-name ?name ;
  v:homeTel ?phoneNum .
  BIND (replace(?phoneNum,&amp;#34;.*(\\d\\d\\d).*(\\d\\d\\d).*(\\d\\d\\d\\d).*&amp;#34;,
                &amp;#34;$1-$2-$3&amp;#34;) AS ?fixedPhone)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The regular expression in the &lt;code&gt;replace()&lt;/code&gt; function call&amp;rsquo;s second argument looks for two three-digit sequences followed by a four-digit sequence, ignoring everything before, after, or in between. The third argument then recombines the three captured sequences, separated by hyphens.&lt;/p&gt;
&lt;p&gt;Here is the sample data from that earlier blog entry; note the different punctuation and spacing used with the four phone numbers:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix v: &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt; .
@prefix d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .

d:i9771 v:given-name &amp;#34;Cindy&amp;#34; ;
        v:homeTel &amp;#34;1 (203) 446-5478&amp;#34; .

d:i0432 v:given-name &amp;#34;Richard&amp;#34; ;
        v:homeTel &amp;#34;   (729)556-5135   &amp;#34; .

d:i8301 v:given-name &amp;#34;Craig&amp;#34; ;
        v:homeTel &amp;#34;9232765135&amp;#34; .

d:i8309 v:given-name &amp;#34;Leigh&amp;#34; ;
        v:homeTel &amp;#34;843-5544&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The result after running the query above with this data shows the phone numbers from the data and the results of the  &lt;code&gt;replace()&lt;/code&gt; calls:&lt;/p&gt;
&lt;style type=&#34;text/css&#34;&gt;
 tr   { font-weight: bold; }
 td   { font-weight: normal ! important; text-align: left; }
&lt;/style&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;name&lt;/th&gt;
&lt;th&gt;phoneNum&lt;/th&gt;
&lt;th&gt;fixedPhone&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Craig&lt;/td&gt;
&lt;td&gt;9232765135&lt;/td&gt;
&lt;td&gt;923-276-5135&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Leigh&lt;/td&gt;
&lt;td&gt;843-5544&lt;/td&gt;
&lt;td&gt;843-5544&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Richard&lt;/td&gt;
&lt;td&gt;   (729)556-5135   &lt;/td&gt;
&lt;td&gt;729-556-5135&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Cindy&lt;/td&gt;
&lt;td&gt;1 (203) 446-5478&lt;/td&gt;
&lt;td&gt;203-446-5478&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;As the SPARQL query spec tells us, this function corresponds to the XPath &lt;a href=&#34;https://www.w3.org/TR/xpath-functions/#func-replace&#34;&gt;&lt;code&gt;fn:replace&lt;/code&gt;&lt;/a&gt; function. That leads to more documentation, which points to a separate &lt;a href=&#34;https://www.w3.org/TR/xpath-functions/#regex-syntax&#34;&gt;Regular expression syntax&lt;/a&gt; section that lists available &lt;a href=&#34;https://www.w3.org/TR/xpath-functions/#flags&#34;&gt;flags&lt;/a&gt; such as &lt;code&gt;i&lt;/code&gt; for case-insensitive matching and &lt;code&gt;m&lt;/code&gt; for multiline matching.&lt;/p&gt;
&lt;p&gt;Those links ultimately lead to an &lt;a href=&#34;https://www.w3.org/TR/xmlschema-2/#nt-WildcardEsc&#34;&gt;escape character table&lt;/a&gt; in the XML Schema Part 2 specification. This table tells us the typical regular expression codes—for example, that &lt;code&gt;\s&lt;/code&gt; matches white space characters and &lt;code&gt;\d&lt;/code&gt; matches a numeric digit. Note that when I used the &lt;code&gt;\d&lt;/code&gt; codes in the SPARQL query above they&amp;rsquo;re in a quoted string, so the backslash itself needed escaping; that&amp;rsquo;s why you see two backslashes before each &lt;code&gt;d&lt;/code&gt; in my query&amp;rsquo;s regular expression.&lt;/p&gt;
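&lt;p&gt;As a cross-check, the same pattern behaves identically in other regex implementations. Here it is in Python, where a raw string means the &lt;code&gt;\d&lt;/code&gt; codes need no doubled backslashes (this just re-runs the query&amp;rsquo;s logic; it is not part of the original demo):&lt;/p&gt;

```python
import re

# Same pattern as the SPARQL query's second argument; the r"..." raw
# string avoids the backslash doubling that the quoted SPARQL string needs.
PATTERN = r".*(\d\d\d).*(\d\d\d).*(\d\d\d\d).*"

def fix_phone(phone):
    # Strings with too few digits don't match and come back
    # unchanged, just as with SPARQL's REPLACE.
    return re.sub(PATTERN, r"\1-\2-\3", phone)

print(fix_phone("1 (203) 446-5478"))  # 203-446-5478
print(fix_phone("9232765135"))        # 923-276-5135
print(fix_phone("843-5544"))          # 843-5544 (unchanged)
```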
&lt;p&gt;The &lt;code&gt;REPLACE&lt;/code&gt; function&amp;rsquo;s ability to find substrings and delete or rearrange them in RDF literal data should be very handy for data cleanup and enhancement. I&amp;rsquo;m sorry I didn&amp;rsquo;t notice it before!&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://twitter.com/bobdc/status/1761785944796275069&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/111992997650896225&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Excerpt from &lt;a href=&#34;https://xkcd.com/208/&#34;&gt;xkcd comic&lt;/a&gt; by Randall Munroe, &lt;a href=&#34;https://creativecommons.org/licenses/by-nc/2.5/&#34;&gt;CC BY-NC 2.5 DEED&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Appreciating the SPARQL property path slash character more</title>
      <link>https://www.bobdc.com/blog/slashnote/</link>
      <pubDate>Sun, 21 Jan 2024 10:25:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/slashnote/</guid>
      
      
      <description><div>Querying for labels and more.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/slash.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;220&#34; alt=&#34;Slash with natural Les Paul&#34;/&gt;
&lt;p&gt;I&amp;rsquo;ve understood SPARQL&amp;rsquo;s &lt;a href=&#34;https://www.w3.org/TR/sparql11-query/#propertypaths&#34;&gt;property path&lt;/a&gt; features well enough to demo them in the &amp;ldquo;Searching Further in the Data&amp;rdquo; section of my book &lt;a href=&#34;http://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt;. (See &lt;a href=&#34;http://www.learningsparql.com/2ndeditionexamples/&#34;&gt;example files&lt;/a&gt; ex074 - ex085.) To be honest, I have very rarely used them in actual queries that I&amp;rsquo;ve written. I&amp;rsquo;ve only just realized how the property path slash operator can help with a pattern that I have used in a large percentage of my queries. It makes these queries more concise and removes at least one variable that would not have been in my &lt;code&gt;SELECT&lt;/code&gt; statement anyway.&lt;/p&gt;
&lt;p&gt;As an example, here is some very simple data about three people and who follows who on social media:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix schema: &amp;lt;http://schema.org/&amp;gt; .
@prefix d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .

d:i0432 d:name &amp;#34;Richard Mutt&amp;#34; . 
d:i9771 d:name &amp;#34;Cindy Marshall&amp;#34; . 
d:i8301 d:name &amp;#34;Craig Ellis&amp;#34; . 

d:i0432 schema:follows d:i9771, d:i8301 . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If I want to list who Richard follows, I want their actual names, not their URIs. This would be an obvious query to do that:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX d:      &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt; 

SELECT ?name WHERE {
  
  ?follower d:name &amp;#34;Richard Mutt&amp;#34; ;
            schema:follows ?person .
  
  ?person d:name ?name .
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It finds the URIs of the people that Richard follows, stores them in the &lt;code&gt;?person&lt;/code&gt; variable, and then finds the &lt;code&gt;d:name&lt;/code&gt; value of each of those people. Having a query find resources that meet a certain condition and then using another triple pattern to get the human-readable names of those resources (and then using those names in the &lt;code&gt;SELECT&lt;/code&gt; statement) is extremely common in SPARQL.&lt;/p&gt;
&lt;p&gt;The property path slash character lets me do the same thing with no need for the &lt;code&gt;?person&lt;/code&gt; variable in the previous query. This next query asks, for each resource that Richard follows, what their name is:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX d:      &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt; 

SELECT ?name WHERE {
  ?follower d:name &amp;#34;Richard Mutt&amp;#34; ;
            # For each followed resource, what is its name?
            schema:follows/d:name ?name . 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In graph terms, we store the URI of Richard Mutt&amp;rsquo;s node in the &lt;code&gt;?follower&lt;/code&gt; variable, then traverse &lt;code&gt;schema:follows&lt;/code&gt; graph edges to any nodes that then have a &lt;code&gt;d:name&lt;/code&gt; edge, and then we store each value that the &lt;code&gt;d:name&lt;/code&gt; edge leads to in the &lt;code&gt;?name&lt;/code&gt; variable.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t think that it&amp;rsquo;s intuitively very readable, which is why I added the comment in the query, but perhaps as I use this more I will get used to it. (Note also that the comment doesn&amp;rsquo;t ask &amp;ldquo;What is the name of each followed resource?&amp;rdquo;; I wanted it to reflect the syntax it describes a little more closely.)&lt;/p&gt;
&lt;p&gt;This is such a common pattern that I wanted to show some examples from more real-life contexts. The following query asks Wikidata for the names of the members of Daft Punk. It does this by storing the URI representing each member of the group in the &lt;code&gt;?member&lt;/code&gt; variable, and it then asks for the &lt;code&gt;rdfs:label&lt;/code&gt; value of each, filtered to only show the English representation. (You can &lt;a href=&#34;https://query.wikidata.org/#PREFIX%20wd%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0A%0ASELECT%20%3Fname%20WHERE%20%7B%0A%0A%20%20wd%3AQ185828%20wdt%3AP527%20%3Fmember%20.%20%0A%20%20%3Fmember%20rdfs%3Alabel%20%3Fname%20.%20%0A%20%20FILTER%28lang%28%3Fname%29%20%3D%20%22en%22%29%0A%7D%0A%0A&#34;&gt;execute this query with the Wikidata Query Service&lt;/a&gt; yourself.)&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX wd: &amp;lt;http://www.wikidata.org/entity/&amp;gt;

SELECT ?name WHERE {
  wd:Q185828 wdt:P527 ?member . 
  ?member rdfs:label ?name . 
  FILTER(lang(?name) = &amp;#34;en&amp;#34;)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;But, we don&amp;rsquo;t need that &lt;code&gt;?member&lt;/code&gt; variable and second triple pattern! We can just do this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX wd: &amp;lt;http://www.wikidata.org/entity/&amp;gt;

SELECT ?name WHERE {
# For each member of Daft Punk, what is their name? 
  wd:Q185828 wdt:P527/rdfs:label ?name . 
  FILTER(lang(?name) = &amp;#34;en&amp;#34;)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://query.wikidata.org/#PREFIX%20wd%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0A%0ASELECT%20%3Fname%20WHERE%20%7B%0A%20%20wd%3AQ185828%20wdt%3AP527%2Frdfs%3Alabel%20%3Fname%20.%20%0A%20%20FILTER%28lang%28%3Fname%29%20%3D%20%22en%22%29%0A%7D%0A%0A&#34;&gt;Run this second query&lt;/a&gt; and you will see the same results as the query before it.&lt;/p&gt;
&lt;p&gt;I could do this with something besides names, such as their &lt;a href=&#34;https://query.wikidata.org/#PREFIX%20wd%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0A%0ASELECT%20%2a%20WHERE%20%7B%0A%20%20wd%3AQ185828%20wdt%3AP527%2Fwdt%3AP569%20%3FbirthDate%20.%20%0A%7D%0A%0A&#34;&gt;birth dates&lt;/a&gt;, but a list of dates with no context about what resources they describe isn&amp;rsquo;t very helpful. (Using it for names also just happens to build on a theme of recent entries in my blog, &lt;a href=&#34;../rdflabels&#34;&gt;Human-readable names in RDF&lt;/a&gt; and &lt;a href=&#34;../wikibaselabel&#34;&gt;Querying for labels&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;As another example, I was going to create a query for the Rhizome Artbase SPARQL endpoint that I wrote about in &lt;a href=&#34;../snowmanartbasept1/&#34;&gt;Generating websites with SPARQL and Snowman, part 1&lt;/a&gt;. Then, I realized that I could use a query that was already in that blog entry, which you can &lt;a href=&#34;https://query.artbase.rhizome.org/#PREFIX%20rt%3A%20%3Chttps%3A%2F%2Fartbase.rhizome.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20DISTINCT%20%3FartistName%20WHERE%20%7B%0A%20%20%3Fartwork%20rt%3AP29%20%3Fartist%20.%20%0A%20%20%3Fartist%20rdfs%3Alabel%20%3FartistName%20.%0A%7D%0AORDER%20BY%20%28%3FartistName%29%0ALIMIT%20250&#34;&gt;run yourself&lt;/a&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rt: &amp;lt;https://artbase.rhizome.org/prop/direct/&amp;gt;

SELECT DISTINCT ?artistName WHERE {
  ?artwork rt:P29 ?artist . 
  ?artist rdfs:label ?artistName .
}
ORDER BY (?artistName)
LIMIT 250
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This time, we&amp;rsquo;ll remove the &lt;code&gt;?artist&lt;/code&gt; variable from the end of the first triple pattern and the beginning of the second and create a property path out of &lt;code&gt;rt:P29&lt;/code&gt; and &lt;code&gt;rdfs:label&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rt: &amp;lt;https://artbase.rhizome.org/prop/direct/&amp;gt;

SELECT DISTINCT ?artistName WHERE {
  ?artwork rt:P29/rdfs:label ?artistName .
}
ORDER BY (?artistName)
LIMIT 250
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://query.artbase.rhizome.org/#PREFIX%20rt%3A%20%3Chttps%3A%2F%2Fartbase.rhizome.org%2Fprop%2Fdirect%2F%3E%0A%0ASELECT%20DISTINCT%20%3FartistName%20WHERE%20%7B%0A%20%20%3Fartwork%20rt%3AP29%2Frdfs%3Alabel%20%3FartistName%20.%0A%7D%0AORDER%20BY%20%28%3FartistName%29%0ALIMIT%20250&#34;&gt;Run this one&lt;/a&gt; and you&amp;rsquo;ll see the same result as the previous query.&lt;/p&gt;
&lt;p&gt;Going the other direction, the next query uses plain triple patterns to list the names of the artworks attributed to a single artist:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rt: &amp;lt;https://artbase.rhizome.org/prop/direct/&amp;gt;

SELECT * WHERE {
  ?artist rdfs:label &amp;#34;Jessica Gomula&amp;#34;@en . 
  ?artwork rt:P29 ?artist .
  ?artwork rdfs:label ?name . 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Has anyone else found a particular property path pattern to be worth using in a high percentage of their SPARQL queries?&lt;/p&gt;
&lt;!--
This works at https://query.artbase.rhizome.org/ but is a bit too complicated to demonstrate the point here.

The issue is that instead of &#34;artist created artwork&#34; triples they have &#34;artwork created by artist&#34; triples. 

PREFIX rt: &lt;https://artbase.rhizome.org/prop/direct/&gt;

SELECT ?artwork WHERE {
  ?artist rdfs:label &#34;Jessica Gomula&#34;@en . 
  ?artist ^rt:P29/rdfs:label ?artwork .
}
--&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://twitter.com/bobdc/status/1749093134745633154&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/111794672776689311&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://creativecommons.org/licenses/by-sa/2.0/&#34;&gt;CC BY-SA 2.0&lt;/a&gt; &lt;a href=&#34;https://flickr.com/photos/dgoomany/7183123425&#34;&gt;photo&lt;/a&gt; by &lt;a href=&#34;https://flickr.com/photos/dgoomany/&#34;&gt;Dineshraj Goomany&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Triples about existing triples</title>
      <link>https://www.bobdc.com/blog/etriplesabout/</link>
      <pubDate>Sun, 17 Dec 2023 10:35:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/etriplesabout/</guid>
      
      
      <description><div>The easy way and the hard way.</div><div>&lt;img id=&#34;idm45504699000944&#34; src=&#34;https://www.bobdc.com/img/main/rdfrdf.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;160&#34; alt=&#34;triple within a triple&#34;/&gt;
&lt;p&gt;Several years ago in the blog post &lt;a href=&#34;../rdf-and-sparql&#34;&gt;RDF* and SPARQL*&lt;/a&gt; I described how I had played with implementations of the new reification syntax that Olaf Hartig and Bryan Thompson proposed in their paper &lt;a href=&#34;https://arxiv.org/pdf/1406.3399.pdf&#34;&gt;Foundations of an Alternative Approach to Reification in RDF&lt;/a&gt;. I found the new syntax to be straightforward and useful. As you can see from the recent W3C Community Group Report &lt;a href=&#34;https://w3c.github.io/rdf-star/cg-spec&#34;&gt;RDF-star and SPARQL-star&lt;/a&gt;, this syntax has progressed&mdash;with a more search-engine-friendly spelling of the spec&amp;rsquo;s name&mdash;closer to W3C standardization. (You&amp;rsquo;ll also see me listed as an author of that specification; I merely submitted a pull request that revised the tutorial from an earlier draft, so I was honored to be co-credited on that document.)&lt;/p&gt;
&lt;p&gt;Because of the advancing specification, the wider implementation, and some potential syntax trickiness for situations that I would consider to be edge cases, I wanted to first review the current syntax that I feel will be the most popular and then review the potentially tricky part that I think most people can ignore. (I realized that the second part of my subtitle of &amp;ldquo;The easy way and the hard way&amp;rdquo; could imply the &lt;a href=&#34;https://www.w3.org/wiki/RdfReification&#34;&gt;original reification syntax&lt;/a&gt; from years ago, but I think we can all put that behind us.)&lt;/p&gt;
&lt;h2 id=&#34;the-simple-way-annotation-syntax&#34;&gt;The simple way: annotation syntax&lt;/h2&gt;
&lt;p&gt;The simple way is called annotation syntax, which as far as I know did not exist yet when I did my earlier experiments with RDF-Star and SPARQL-Star. Using the  &lt;a href=&#34;https://w3c.github.io/rdf-star/cg-spec/2021-12-17.html#turtle-star&#34;&gt;Turtle-Star&lt;/a&gt; syntax, if you have a triple that expresses a statement and you want to record other triples about that triple in annotation syntax, you put them after it inside of &lt;code&gt;{|&lt;/code&gt; and &lt;code&gt;|}&lt;/code&gt; delimiters.&lt;/p&gt;
&lt;p&gt;Here is the example from that earlier blog entry expressed in annotation syntax. It has three triples that I got from Olaf&amp;rsquo;s slides that the blog entry linked to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;One triple saying that (Stanley) Kubrick was influenced by (Orson) Welles.&lt;/li&gt;
&lt;li&gt;Another saying that triple 1 has a significance of 0.8.&lt;/li&gt;
&lt;li&gt;A third one saying that triple 1 has its source at a URL at &lt;code&gt;nofilmschool.com&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix d: &amp;lt;http://www.learningsparql.com/ns/data/&amp;gt; .

d:Kubrick d:influencedBy d:Welles {| 
   d:significance 0.8 ;
   d:source &amp;lt;https://nofilmschool.com/2013/08/films-directors-that-influenced-stanley-kubrick&amp;gt;
|} .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Using Apache Jena arq or the free version of Ontotext&amp;rsquo;s GraphDB, a &lt;code&gt;SELECT * WHERE {?s ?p ?o}&lt;/code&gt; query to get all the triples in that block of Turtle-Star retrieves this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;?s                                      ?p             ?o
--------------------------------------- -------------- ------------------
d:Kubrick                               d:influencedBy d:Welles .
&amp;lt;&amp;lt; d:Kubrick d:influencedBy d:Welles &amp;gt;&amp;gt; d:significance &amp;#34;0.8&amp;#34;^^xsd:decimal . 
&amp;lt;&amp;lt; d:Kubrick d:influencedBy d:Welles &amp;gt;&amp;gt; d:source https://nofilmschool.com/2013/08/films-directors-that-influenced-stanley-kubrick . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It&amp;rsquo;s the three triples from the numbered list above.&lt;/p&gt;
&lt;p&gt;To understand better what this syntax adds, here is the sample data from my earlier blog entry on this topic:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix d: &amp;lt;http://www.learningsparql.com/ns/data/&amp;gt; .
&amp;lt;&amp;lt;d:Kubrick d:influencedBy d:Welles&amp;gt;&amp;gt; d:significance 0.8 ;
      d:source &amp;lt;https://nofilmschool.com/2013/08/films-directors-that-influenced-stanley-kubrick&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The same query on this data will show the second and third result rows above but not the first one. In other words, this data doesn&amp;rsquo;t actually say that Kubrick was influenced by Welles; it only has metadata about this statement.&lt;/p&gt;
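A quick way to see the difference is a query for the base statement itself, sketched here with the same prefix as the data. It finds a result against the annotation-syntax data but nothing against the version that only quotes the triple:
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX d: &amp;lt;http://www.learningsparql.com/ns/data/&amp;gt; 

SELECT ?influence WHERE {
  d:Kubrick d:influencedBy ?influence
}
&lt;/code&gt;&lt;/pre&gt;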
&lt;p&gt;When you use annotation syntax in a SPARQL query, you&amp;rsquo;re using SPARQL-Star. To let me make my next SPARQL-Star query a little more interesting, I added the following data to the triples above:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;
d:Scorsese d:influencedBy d:Rosselini {| 
   d:significance 0.9 ;
   d:source &amp;lt;https://en.wikipedia.org/wiki/Martin_Scorsese&amp;gt;
|} .

d:Tarantino d:influencedBy d:Scorsese .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For which director influence triples do we have annotations about the significance of that influence?&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX d: &amp;lt;http://www.learningsparql.com/ns/data/&amp;gt; 

SELECT ?director
WHERE {
  
  ?director d:influencedBy ?o {|
      d:significance ?significanceScore
  |} .

}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The result:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;
?director
-----------
d:Scorsese
d:Kubrick
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It&amp;rsquo;s all pretty simple, until we get to&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;quoted-and-asserted-triples&#34;&gt;Quoted and asserted triples&lt;/h2&gt;
&lt;p&gt;The original proposal that I mentioned in the first paragraph above did not mention the concepts of quoted or asserted triples until its authors later added a &amp;ldquo;This document has become obsolete&amp;rdquo; paragraph at the top. In the &lt;a href=&#34;https://www.w3.org/2021/12/rdf-star.html&#34;&gt;latest version of the specification&lt;/a&gt;, the first subsection of the &lt;a href=&#34;https://www.w3.org/2021/12/rdf-star.html#concepts&#34;&gt;Concepts and Abstract Syntax&lt;/a&gt; section is titled &lt;a href=&#34;https://www.w3.org/2021/12/rdf-star.html#quoted-triples&#34;&gt;Quoted and Asserted Triples&lt;/a&gt; and includes this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A quoted triple is a triple used as the subject or object of another triple. Quoted triples can also be called &amp;ldquo;embedded triples&amp;rdquo;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;in RDF 1.1, an asserted triple is an element of the set of triples that make up an RDF graph. RDF-star does not change this except that an RDF-star triple can contain quoted triples. A triple can be used as an asserted triple, a quoted triple, or both, in a given graph.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This tells me that the regular triples that we&amp;rsquo;ve been using all along are now known as asserted triples, and the new kind—the kind that can be used as the subject or object of another triple—are known as quoted or embedded triples.  (I did enjoy this quote from the Community Group Report after it used a &lt;a href=&#34;https://en.wikipedia.org/wiki/Lisp_(programming_language)&#34;&gt;Lisp&lt;/a&gt; analogy to explain the difference between asserted and quoted triples: &amp;ldquo;Obviously this way of thinking is helpful only if you understand how Lisp works&amp;rdquo;.)&lt;/p&gt;
&lt;p&gt;Here is an example. The following Turtle translated to plain English says &amp;ldquo;Sam said that the earth is flat&amp;rdquo;.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .

d:sam d:said &amp;lt;&amp;lt; d:earth d:shape &amp;#34;flat&amp;#34; &amp;gt;&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It does this with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An asserted triple that tells us that Sam said something.&lt;/li&gt;
&lt;li&gt;A quoted triple that tells us what he said: that the earth is flat.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That second one is a quoted triple because it&amp;rsquo;s used as the object of the first triple, and the &lt;code&gt;&amp;lt;&amp;lt;&lt;/code&gt; &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; delimiters show us that it&amp;rsquo;s a quoted triple.&lt;/p&gt;
&lt;p&gt;If I do a &lt;code&gt;SELECT * WHERE {?s ?p ?o}&lt;/code&gt; query on this data to get all of that example&amp;rsquo;s triples, this is all I will see:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;--------------------------------------------------------
| s       | p        | o                                |
========================================================
| d:sam   | d:said   | &amp;lt;&amp;lt; d:earth d:shape &amp;#34;flat&amp;#34; &amp;gt;&amp;gt;     |
--------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;What if I do a query asking for triples about the earth&amp;rsquo;s shape, like this?&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt;

SELECT *
WHERE {
  d:earth d:shape ?earthShape
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I won&amp;rsquo;t get any response. That data has no &lt;em&gt;asserted&lt;/em&gt; triples about the earth&amp;rsquo;s shape.&lt;/p&gt;
&lt;p&gt;If I wanted this earth-is-flat triple to be both an asserted triple and a quoted triple, I can record it as both:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .

d:sam d:said &amp;lt;&amp;lt; d:earth d:shape &amp;#34;flat&amp;#34; &amp;gt;&amp;gt; .
d:earth d:shape &amp;#34;flat&amp;#34; . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://www.w3.org/2021/12/rdf-star.html#example-7&#34;&gt;Example 7&lt;/a&gt; in the Community Group Report also demonstrates this.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t like this redundancy because maintaining a thing and a separate copy of the thing is usually a bad idea. If you edit one, then maybe you do or don&amp;rsquo;t need to make the same edit to the other, and maintenance gets messy. That&amp;rsquo;s why it was nice to see that the first section of the Community Group Report&amp;rsquo;s &lt;a href=&#34;https://www.w3.org/2021/12/rdf-star.html#concrete-syntaxes&#34;&gt;Concrete Syntaxes&lt;/a&gt; section is &lt;a href=&#34;https://www.w3.org/2021/12/rdf-star.html#annotation-syntax&#34;&gt;Annotation Syntax&lt;/a&gt;, which I described above as the simpler way to just have triples about triples without some of those triples having a special status that prevents them from showing up as the result of an &lt;code&gt;?s ?p ?o&lt;/code&gt; query.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m sure that having this separate status be a part of the architecture will enable some finer-grained modeling. To just have triples about triples (especially to express data about edges between graph nodes, which was a key inspiration for all of this), I&amp;rsquo;m happy with the annotation syntax for now.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://twitter.com/bobdc/status/1736412063793131589&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/111596532783340382&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/turtle">turtle</category>
      
    </item>
    
    <item>
      <title>Querying for labels</title>
      <link>https://www.bobdc.com/blog/wikibaselabel/</link>
      <pubDate>Sun, 19 Nov 2023 11:20:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/wikibaselabel/</guid>
      
      
      <description><div>The normal way and the wikibase:label service way</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/tomatoPlantsWithSPARQLLogo.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;250&#34; alt=&#34;labeled tomato plants with SPARQL logo&#34;/&gt;
&lt;p&gt;In my &lt;a href=&#34;../rdflabels&#34;&gt;last blog entry&lt;/a&gt; I discussed various ways that different RDF datasets assign human-readable labels to resources, with the &lt;code&gt;rdfs:label&lt;/code&gt; property being at the center of them all. I mentioned how schema.org doesn&amp;rsquo;t use &lt;code&gt;rdfs:label&lt;/code&gt; but its own equivalent of that, &lt;code&gt;schema:name&lt;/code&gt;, which its schema declares as a subproperty of &lt;code&gt;rdfs:label&lt;/code&gt;. Since I wrote that, &lt;a href=&#34;https://twitter.com/FanLi_RnD&#34;&gt;Fan Li&lt;/a&gt; &lt;a href=&#34;https://twitter.com/FanLi_RnD/status/1718687939352236107&#34;&gt;pointed out&lt;/a&gt; that Facebook&amp;rsquo;s &lt;a href=&#34;https://ogp.me/&#34;&gt;Open Graph protocol&lt;/a&gt; also has its own equivalent: &lt;code&gt;og:title&lt;/code&gt;, which you can see used in the HTML source of &lt;a href=&#34;https://www.imdb.com/title/tt22041854/?ref_=ttls_li_tt&#34;&gt;IMDB&lt;/a&gt;, &lt;a href=&#34;https://www.instagram.com/bobdcofficial/&#34;&gt;Instagram&lt;/a&gt;, and &lt;a href=&#34;https://www.yelp.com/biz/peter-changs-china-grill-charlottesville&#34;&gt;yelp&lt;/a&gt;. (I tried pointing each of those three links to the view-source version of the pages, and that didn&amp;rsquo;t work, so you&amp;rsquo;ll have to take the extra step with each to view their source and see each one&amp;rsquo;s &lt;code&gt;og:title&lt;/code&gt; value.) This also gets defined as a subproperty of &lt;code&gt;rdfs:label&lt;/code&gt; in the &lt;a href=&#34;https://ogp.me/ns/ogp.me.ttl&#34;&gt;OGP schema&lt;/a&gt;, so a serious RDFS application could parse that schema and then treat &lt;code&gt;og:title&lt;/code&gt; values as &lt;code&gt;rdfs:label&lt;/code&gt; values.&lt;/p&gt;
&lt;h1 id=&#34;treating-those-rdfslabel-variations-as-rdfslabel-values&#34;&gt;Treating those rdfs:label variations as rdfs:label values&lt;/h1&gt;
&lt;p&gt;Querying for &lt;code&gt;rdfs:label&lt;/code&gt; values is simple enough. To demonstrate how a query for &lt;code&gt;rdfs:label&lt;/code&gt; values will retrieve &lt;code&gt;og:title&lt;/code&gt; and &lt;code&gt;schema:name&lt;/code&gt; values when a query engine that can do inferencing has access to the Open Graph Protocol and schema.org schemas, I added some of those values to the following document with comments about where I found each. (Where I found them they were not in Turtle syntax like they are here, but they were in machine-readable formats that could easily be converted to Turtle.)&lt;/p&gt;
&lt;p&gt;Sample data:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix og: &amp;lt;http://ogp.me/ns#&amp;gt; .
@prefix schema: &amp;lt;https://schema.org/&amp;gt; .

# og:title examples

&amp;lt;https://www.imdb.com/title/tt22041854/?ref_=ttls_li_tt&amp;gt;  
  og:title &amp;#34;Priscilla (2023) ⭐ 6.9 | Biography, Drama, Music&amp;#34; . 

&amp;lt;https://www.instagram.com/bobdcofficial/&amp;gt; 
  og:title &amp;#34; (&amp;amp;#064;bobdcofficial) &amp;amp;#x2022; Instagram photos and videos&amp;#34; . 

&amp;lt;https://www.yelp.com/biz/peter-changs-china-grill-charlottesville&amp;gt; 
  og:title &amp;#34;Peter Chang&amp;#39;s China Grill - Charlottesville, VA&amp;#34; . 

# schema:name examples

## (added by Hugo as a default with no special configuration from me)
&amp;lt;https://www.bobdc.com/blog/rdflabels/&amp;gt; 
  schema:name &amp;#34;Human-readable names in RDF&amp;#34; . 

&amp;lt;https://www.newyorker.com/best-books-2023&amp;gt; 
  schema:name &amp;#34;The Best Books We Read This Week&amp;#34; . 

&amp;lt;https://www.landsend.com/products/mens-super-t-long-sleeve-t-shirt/id_130670&amp;gt; 
  schema:name &amp;#34;Men&amp;#39;s Super-T Long Sleeve T-Shirt&amp;#34; . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I downloaded the &lt;a href=&#34;https://schema.org/docs/developers.html&#34;&gt;schema.org&lt;/a&gt; and &lt;a href=&#34;https://ogp.me/&#34;&gt;OGP&lt;/a&gt; schema files and combined them into a single schema file:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;cat ogp.me.ttl schemaorg-current-https.ttl &amp;gt; comboschema.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Then, as I described in &lt;a href=&#34;https://www.bobdc.com/blog/jenagems/&#34;&gt;Hidden gems included with Jena’s command line utilities&lt;/a&gt;, I used the Jena &lt;code&gt;riot&lt;/code&gt; tool to do RDFS inferencing with the data above and the combined schemas. It produced a lot of triples, so I used &lt;code&gt;grep&lt;/code&gt; to only show the ones that mentioned the &lt;code&gt;rdfs:label&lt;/code&gt; value:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;riot --rdfs comboschema.ttl labeldata.ttl | grep &amp;#34;#label&amp;#34; 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It produced these results:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;https://www.imdb.com/title/tt22041854/?ref_=ttls_li_tt&amp;gt; &amp;lt;http://www.w3.org/2000/01/rdf-schema#label&amp;gt; &amp;#34;Priscilla (2023) ⭐ 6.9 | Biography, Drama, Music&amp;#34; .
&amp;lt;https://www.instagram.com/bobdcofficial/&amp;gt; &amp;lt;http://www.w3.org/2000/01/rdf-schema#label&amp;gt; &amp;#34; (&amp;amp;#064;bobdcofficial) &amp;amp;#x2022; Instagram photos and videos&amp;#34; .
&amp;lt;https://www.yelp.com/biz/peter-changs-china-grill-charlottesville&amp;gt; &amp;lt;http://www.w3.org/2000/01/rdf-schema#label&amp;gt; &amp;#34;Peter Chang&amp;#39;s China Grill - Charlottesville, VA&amp;#34; .
&amp;lt;https://www.bobdc.com/blog/rdflabels/&amp;gt; &amp;lt;http://www.w3.org/2000/01/rdf-schema#label&amp;gt; &amp;#34;Human-readable names in RDF&amp;#34; .
&amp;lt;https://www.newyorker.com/best-books-2023&amp;gt; &amp;lt;http://www.w3.org/2000/01/rdf-schema#label&amp;gt; &amp;#34;The Best Books We Read This Week&amp;#34; .
&amp;lt;https://www.landsend.com/products/mens-super-t-long-sleeve-t-shirt/id_130670&amp;gt; &amp;lt;http://www.w3.org/2000/01/rdf-schema#label&amp;gt; &amp;#34;Men&amp;#39;s Super-T Long Sleeve T-Shirt&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;So, asking for the &lt;code&gt;rdfs:label&lt;/code&gt; values when the schemas were available retrieved the &lt;code&gt;schema:name&lt;/code&gt; and &lt;code&gt;og:title&lt;/code&gt; values because they were subproperties of &lt;code&gt;rdfs:label&lt;/code&gt; and because I used a query engine that could do inferencing. (When I created a repo that would do RDFS inferencing with the &lt;a href=&#34;https://www.ontotext.com/products/graphdb&#34;&gt;free version of GraphDB&lt;/a&gt;, the same thing happened. Standards!)&lt;/p&gt;
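Against an endpoint that applies the same RDFS inferencing (a sketch; a GraphDB repository configured with an RDFS ruleset behaved this way for me), a plain &lt;code&gt;rdfs:label&lt;/code&gt; query returns those inferred labels directly:
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;

SELECT ?resource ?label WHERE {
  ?resource rdfs:label ?label .
}
&lt;/code&gt;&lt;/pre&gt;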
&lt;h1 id=&#34;some-extra-help-from-the-wikidata-query-service&#34;&gt;Some extra help from the Wikidata Query Service&lt;/h1&gt;
&lt;p&gt;Querying for an &lt;code&gt;rdfs:label&lt;/code&gt; value in Wikidata can be simple enough:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX wd:   &amp;lt;http://www.wikidata.org/entity/&amp;gt;

SELECT * WHERE {
   wd:Q144 rdfs:label ?name
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://query.wikidata.org/#SELECT%20%2a%20WHERE%20%7B%0A%20%20%20wd%3AQ144%20rdfs%3Alabel%20%3Fname%0A%20%20%20%20%20%20%20%20%20%20%20%7D&#34;&gt;Doing this in Wikidata&lt;/a&gt;, though, gets about 300 results (and the number has gone up since I first drafted this blog entry) because Wikidata knows the word for &amp;ldquo;dog&amp;rdquo; in so many languages. We could &lt;code&gt;FILTER&lt;/code&gt; it down to one or just a few languages like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX wd:   &amp;lt;http://www.wikidata.org/entity/&amp;gt;

SELECT ?label WHERE {
   wd:Q144 rdfs:label ?label
   FILTER (lang(?label) IN (&amp;#34;en&amp;#34;,&amp;#34;es&amp;#34;))
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Wikidata has a &lt;a href=&#34;https://en.wikibooks.org/wiki/SPARQL/SERVICE_-_Label&#34;&gt;special service&lt;/a&gt; to make this easier. To demonstrate it, let&amp;rsquo;s say I&amp;rsquo;m wondering about the topics of the Wikiquote pages &lt;a href=&#34;https://en.wikiquote.org/wiki/Dogs&#34;&gt;https://en.wikiquote.org/wiki/Dogs&lt;/a&gt; and &lt;a href=&#34;https://en.wikiquote.org/wiki/Cats&#34;&gt;https://en.wikiquote.org/wiki/Cats&lt;/a&gt; (although it&amp;rsquo;s pretty clear from the URLs). The following query, which you can try &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3Ffoo%20%3Fbar%0AWHERE%20%7B%0A%20%20%7B%20%3Chttps%3A%2F%2Fen.wikiquote.org%2Fwiki%2FDogs%3E%20schema%3Aabout%20%3Ffoo%20%7D%0A%20%20UNION%0A%20%20%7B%20%3Chttps%3A%2F%2Fen.wikiquote.org%2Fwiki%2FCats%3E%20schema%3Aabout%20%3Fbar%20%7D%0A%7D%0A&#34;&gt;on the Wikidata Query Service&lt;/a&gt;, will show me a &lt;code&gt;?foo&lt;/code&gt; value of &lt;code&gt;wd:Q144&lt;/code&gt; and a &lt;code&gt;?bar&lt;/code&gt; value of &lt;code&gt;wd:Q146&lt;/code&gt;, which are not very informative:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?foo ?bar
WHERE {
  { &amp;lt;https://en.wikiquote.org/wiki/Dogs&amp;gt; schema:about ?foo }
  UNION
  { &amp;lt;https://en.wikiquote.org/wiki/Cats&amp;gt; schema:about ?bar }
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I could ask for &lt;code&gt;rdfs:label&lt;/code&gt; values of &lt;code&gt;?foo&lt;/code&gt; and &lt;code&gt;?bar&lt;/code&gt;, but instead I&amp;rsquo;ll use the &lt;code&gt;wikibase:label&lt;/code&gt; service built into the Wikidata Query Service. This not only looks up the labels but even creates variables for them by adding &amp;ldquo;Label&amp;rdquo; to the names of the variables representing the resources that I&amp;rsquo;m querying about:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?fooLabel ?barLabel
WHERE {
  { &amp;lt;https://en.wikiquote.org/wiki/Dogs&amp;gt; schema:about ?foo }
  UNION
  { &amp;lt;https://en.wikiquote.org/wiki/Cats&amp;gt; schema:about ?bar }
  SERVICE wikibase:label { bd:serviceParam wikibase:language &amp;#34;[AUTO_LANGUAGE]&amp;#34; } 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3FfooLabel%20%3FbarLabel%0AWHERE%20%7B%0A%20%20%7B%20%3Chttps%3A%2F%2Fen.wikiquote.org%2Fwiki%2FDogs%3E%20schema%3Aabout%20%3Ffoo%20%7D%0A%20%20UNION%0A%20%20%7B%20%3Chttps%3A%2F%2Fen.wikiquote.org%2Fwiki%2FCats%3E%20schema%3Aabout%20%3Fbar%20%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%22%20%7D%20%0A%7D%0A&#34;&gt;Running that query&lt;/a&gt; gives us the following results:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;fooLabel    barLabel
--------    --------
dog         house cat
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I could name a specific language if I wanted;  &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3FfooLabel%20%3FbarLabel%0AWHERE%20%7B%0A%20%20%7B%20%3Chttps%3A%2F%2Fen.wikiquote.org%2Fwiki%2FDogs%3E%20schema%3Aabout%20%3Ffoo%20%7D%0A%20%20UNION%0A%20%20%7B%20%3Chttps%3A%2F%2Fen.wikiquote.org%2Fwiki%2FCats%3E%20schema%3Aabout%20%3Fbar%20%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22de%22%20%7D%20%0A%7D%0A&#34;&gt;running the next one&lt;/a&gt; shows a &lt;code&gt;?fooLabel&lt;/code&gt; value of &amp;ldquo;Hund&amp;rdquo; and a &lt;code&gt;?barLabel&lt;/code&gt; value of &amp;ldquo;Hauskatze&amp;rdquo;.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?fooLabel ?barLabel
WHERE {
  { &amp;lt;https://en.wikiquote.org/wiki/Dogs&amp;gt; schema:about ?foo }
  UNION
  { &amp;lt;https://en.wikiquote.org/wiki/Cats&amp;gt; schema:about ?bar }
  SERVICE wikibase:label { bd:serviceParam wikibase:language &amp;#34;de&amp;#34; } 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A neat Wikidata Query Service trick that I only recently learned about is how the web interface lets you reset the default language. If I click on &amp;ldquo;English&amp;rdquo; in the upper right of the &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3FfooLabel%20%3FbarLabel%0AWHERE%20%7B%0A%20%20%7B%20%3Chttps%3A%2F%2Fen.wikiquote.org%2Fwiki%2FDogs%3E%20schema%3Aabout%20%3Ffoo%20%7D%0A%20%20UNION%0A%20%20%7B%20%3Chttps%3A%2F%2Fen.wikiquote.org%2Fwiki%2FCats%3E%20schema%3Aabout%20%3Fbar%20%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22de%22%20%7D%20%0A%7D%0A&#34;&gt;query screen&lt;/a&gt; I get a drop-down, searchable list of languages. If I pick &amp;ldquo;español&amp;rdquo; from this list, the query screen&amp;rsquo;s &amp;ldquo;Examples&amp;rdquo; button gets renamed as &amp;ldquo;Ejemplos&amp;rdquo;, &amp;ldquo;Help&amp;rdquo; becomes &amp;ldquo;Ayuda&amp;rdquo;, and so forth with the rest of the UI. When I run the  &lt;code&gt;[AUTO_LANGUAGE]&lt;/code&gt; query from above after doing this,  it shows a &lt;code&gt;?fooLabel&lt;/code&gt; value of &amp;ldquo;perro&amp;rdquo; and a &lt;code&gt;?barLabel&lt;/code&gt; value of &amp;ldquo;gato doméstico&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;With a made-up language code of &amp;ldquo;xyz&amp;rdquo; that it doesn&amp;rsquo;t recognize, it gives me the Q names from the &lt;code&gt;?foo&lt;/code&gt; and &lt;code&gt;?bar&lt;/code&gt; values as &lt;code&gt;?fooLabel&lt;/code&gt; and &lt;code&gt;?barLabel&lt;/code&gt; values:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;fooLabel  barLabel
--------  --------
Q144      Q146
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;code&gt;wikibase:label&lt;/code&gt; service is not standard SPARQL, but with the tremendous amount of multi-lingual data available in Wikidata, it adds a lot of convenience that can trim down the length of your Wikidata queries.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://twitter.com/bobdc/status/1726278213205463105&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/111438192229406571&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://creativecommons.org/licenses/by/2.0/&#34;&gt;CC BY 2.0&lt;/a&gt; &lt;a href=&#34;https://www.flickr.com/photos/krossbow/52186070666/&#34;&gt;photo&lt;/a&gt; by &lt;a href=&#34;https://www.flickr.com/photos/krossbow/&#34;&gt;F Delventhal&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>Human-readable names in RDF</title>
      <link>https://www.bobdc.com/blog/rdflabels/</link>
      <pubDate>Sun, 29 Oct 2023 11:04:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdflabels/</guid>
      
      
      <description><div>Sometimes simple, sometimes not.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/tomatoPlants.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;250&#34; alt=&#34;labeled tomato plants&#34;/&gt;
&lt;h1 id=&#34;rdfslabel&#34;&gt;rdfs:label&lt;/h1&gt;
&lt;p&gt;First, reviewing some basics before I discuss the edge cases: resources in RDF are represented by URIs, and the spelling of a given URI often provides no clues about what the URI represents. For example, you wouldn&amp;rsquo;t know from looking at &lt;code&gt;http://www.wikidata.org/entity/Q144&lt;/code&gt; that it represents &amp;ldquo;dog&amp;rdquo; as a Wikipedia topic. (We&amp;rsquo;ll see below that this is for a good reason.)&lt;/p&gt;
&lt;p&gt;Subject-predicate-object triples use predicate-object pairs to describe the resources represented as URIs by each subject. (We sometimes forget that RDF stands for &amp;ldquo;Resource Description Framework&amp;rdquo;.) The most popular predicate is the one that gives us a human-readable name to tell us what resource the URI represents: &lt;code&gt;rdfs:label&lt;/code&gt;. People typically use it to assign an identifying name to a resource.&lt;/p&gt;
&lt;p&gt;You can optionally add a language tag to indicate the spoken language of the label value. Assigning multiple terms in different languages to the same resource makes it easier to build &lt;a href=&#34;../using-sparql-queries-from-nati#multilanguage&#34;&gt;multi-lingual&lt;/a&gt; applications.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix wd: &amp;lt;http://www.wikidata.org/entity/&amp;gt; . 

wd:Q144 rdfs:label &amp;#34;dog&amp;#34;@en . 
wd:Q144 rdfs:label &amp;#34;perro&amp;#34;@es . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This also reminds us why it&amp;rsquo;s a bad practice to include descriptive text as part of the URI: including &amp;ldquo;dog&amp;rdquo; in the URI &lt;code&gt;http://www.wikidata.org/entity/Q144&lt;/code&gt; would only help people who know English, and including &amp;ldquo;perro&amp;rdquo; would only help people who know Spanish.&lt;/p&gt;
&lt;h1 id=&#34;schemaname&#34;&gt;schema:name&lt;/h1&gt;
&lt;p&gt;While most schemas and ontologies are built around a specific domain such as a business sector or an academic discipline, the very successful &lt;a href=&#34;https://schema.org/&#34;&gt;schema.org&lt;/a&gt; is much broader, covering many aspects of ordinary life and commerce. Unlike most other vocabularies, schema.org does not use &lt;code&gt;rdfs:label&lt;/code&gt; for names, but its own &lt;a href=&#34;https://schema.org/name&#34;&gt;&lt;code&gt;schema:name&lt;/code&gt;&lt;/a&gt; property instead. The discussion &lt;a href=&#34;https://github.com/schemaorg/schemaorg/issues/1762&#34;&gt;What is the difference between schema:name and rdfs:label?&lt;/a&gt; on a schema.org development issue page explains why: many processors that can read schema.org data from a web page won&amp;rsquo;t know about RDF and won&amp;rsquo;t recognize &lt;code&gt;rdfs:label&lt;/code&gt;. As part of that discussion, &lt;a href=&#34;https://github.com/danbri&#34;&gt;Dan Brickley&lt;/a&gt; mentions adding a subPropertyOf assertion to the definition of &lt;code&gt;schema:name&lt;/code&gt;, which we see right in the property&amp;rsquo;s definition in the RDFS schema that you can download from the schema.org &lt;a href=&#34;https://schema.org/docs/developers.html&#34;&gt;Developers&lt;/a&gt; page:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;schema:name a rdf:Property ;
    rdfs:label &amp;#34;name&amp;#34; ;
    rdfs:comment &amp;#34;The name of the item.&amp;#34; ;
    rdfs:subPropertyOf rdfs:label ;
    owl:equivalentProperty dcterms:title ;
    schema:domainIncludes schema:Thing ;
    schema:rangeIncludes schema:Text .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This is a perfect response to RDF geeks who complain that schema.org should have used &lt;code&gt;rdfs:label&lt;/code&gt; instead of making up its own &lt;code&gt;schema:name&lt;/code&gt; property—for a system that can parse full RDF and do even minimal inferencing, a &lt;code&gt;schema:name&lt;/code&gt; value counts as an &lt;code&gt;rdfs:label&lt;/code&gt; value. It says so right on the fourth line of the above excerpt.&lt;/p&gt;
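&lt;p&gt;As a minimal sketch of that payoff (assuming a SPARQL engine with RDFS inferencing enabled and the schema.org RDFS schema loaded), a query that asks only for &lt;code&gt;rdfs:label&lt;/code&gt; values will also match &lt;code&gt;schema:name&lt;/code&gt; values:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs:   &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX schema: &amp;lt;https://schema.org/&amp;gt;

# With rdfs:subPropertyOf inferencing on, data that only says
# { ?x schema:name &amp;#34;some name&amp;#34; } satisfies this pattern too.
SELECT ?resource ?label
WHERE { ?resource rdfs:label ?label }
&lt;/code&gt;&lt;/pre&gt;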
&lt;h1 id=&#34;a-brief-detour-dctitle-and-skospreflabel&#34;&gt;A brief detour: dc:title and skos:prefLabel&lt;/h1&gt;
&lt;p&gt;The schema excerpt above also includes an assertion that &lt;code&gt;schema:name&lt;/code&gt; is an &lt;code&gt;owl:equivalentProperty&lt;/code&gt; to the &lt;a href=&#34;https://www.dublincore.org/&#34;&gt;Dublin Core&lt;/a&gt; &lt;code&gt;dcterms:title&lt;/code&gt; property. The Dublin Core vocabulary is almost as old as the web itself, predating schema.org by sixteen years. That vocabulary&amp;rsquo;s specification describes both &lt;a href=&#34;https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#title&#34;&gt;&lt;code&gt;dcterms:title&lt;/code&gt;&lt;/a&gt; and the property of which it is a subproperty, &lt;a href=&#34;https://www.dublincore.org/specifications/dublin-core/dcmi-terms/elements11/title/&#34;&gt;&lt;code&gt;dc:title&lt;/code&gt;&lt;/a&gt;,  as &amp;ldquo;A name given to the resource&amp;rdquo;, which supports Dan&amp;rsquo;s note that the &lt;code&gt;schema:name&lt;/code&gt; property means the same thing as &lt;code&gt;dc:title&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;I think of the Dublin Core terms as slightly narrower than that. The &lt;a href=&#34;https://en.wikipedia.org/wiki/Dublin_Core&#34;&gt;Wikipedia page&lt;/a&gt; for Dublin Core describes it as a set of &amp;ldquo;metadata items for describing digital or physical resources&amp;rdquo;, which aligns it with &lt;code&gt;rdfs:label&lt;/code&gt;, but Dublin Core was first developed in response to the rapidly expanding ideas of what constituted &amp;ldquo;publishing&amp;rdquo; in the early days of the web, so I&amp;rsquo;ve always thought of it as by and for the publishing industry. (I once took part in a standards group that developed standards more specifically for the magazine industry, and when they needed separate properties for a given issue&amp;rsquo;s publication date and newsstand date, making each a subproperty of &lt;code&gt;dcterms:date&lt;/code&gt; was a perfect use case for RDFS subproperties.) I suppose the word &amp;ldquo;title&amp;rdquo; also makes me think of a label for a book, musical album, or other published work.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.bobdc.com/blog/skosibm/&#34;&gt;SKOS&lt;/a&gt; &lt;a href=&#34;https://www.w3.org/TR/skos-reference/#L1304&#34;&gt;&lt;code&gt;skos:prefLabel&lt;/code&gt;&lt;/a&gt; property, which names something&amp;rsquo;s preferred label (as opposed to alternative or hidden labels, which are additional SKOS properties), may seem equivalent to &lt;code&gt;rdfs:label&lt;/code&gt;. I don&amp;rsquo;t think of it as suitable for just any existing or imaginary resource, the way &lt;code&gt;rdfs:label&lt;/code&gt; is, but instead for naming concepts within the taxonomies and thesauruses that SKOS was designed to help manage. &lt;a href=&#34;https://www.w3.org/2009/08/skos-reference/skos.html#prefLabel&#34;&gt;The SKOS specification&lt;/a&gt; does say that it&amp;rsquo;s a subproperty of &lt;code&gt;rdfs:label&lt;/code&gt;, so this supports the idea that it&amp;rsquo;s a specialized version of that, but the &lt;a href=&#34;https://www.w3.org/2009/08/skos-reference/skos.rdf&#34;&gt;actual SKOS schema&lt;/a&gt; shows that &lt;code&gt;skos:prefLabel&lt;/code&gt; does not have an &lt;code&gt;rdfs:domain&lt;/code&gt; of &lt;code&gt;skos:Concept&lt;/code&gt; (that is, it&amp;rsquo;s not defined as being used only for describing labels of Concepts) as I had expected. Still, it was defined as part of SKOS, and SKOS is about managing vocabulary terms and their relationships and other metadata, with concepts being the central organizing unit for managing these terms and their metadata.&lt;/p&gt;
&lt;p&gt;One person&amp;rsquo;s SKOS taxonomy might be another person&amp;rsquo;s  hierarchical class structure; converting one to the other with a SPARQL CONSTRUCT query has helped many people take advantage of available data that otherwise wasn&amp;rsquo;t a perfect fit for their system. This typically means converting between concept &lt;code&gt;skos:prefLabel&lt;/code&gt; values and class &lt;code&gt;rdfs:label&lt;/code&gt; values.&lt;/p&gt;
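&lt;p&gt;A sketch of such a conversion (the choice of &lt;code&gt;owl:Class&lt;/code&gt; as the target class type is an assumption for illustration; adjust it to your own model):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX owl:  &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt;

# Each SKOS concept becomes a class whose rdfs:label value
# comes from the concept&amp;#39;s skos:prefLabel value.
CONSTRUCT {
  ?concept a owl:Class ;
           rdfs:label ?label .
}
WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
}
&lt;/code&gt;&lt;/pre&gt;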
&lt;p&gt;How do we query for all these types of labels? Generally, the same way we query for any other RDF values, but in my next blog entry I&amp;rsquo;ll talk about a built-in special service in Wikidata that lets you replace several lines of label-retrieving SPARQL code with a single line.&lt;/p&gt;
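&lt;p&gt;When inferencing isn&amp;rsquo;t available, one sketch of a workaround is a SPARQL 1.1 property path that lists the label properties as alternatives:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs:   &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX schema: &amp;lt;https://schema.org/&amp;gt;
PREFIX skos:   &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;

# Match any of the three label properties without inferencing.
SELECT ?resource ?name
WHERE { ?resource rdfs:label|schema:name|skos:prefLabel ?name }
&lt;/code&gt;&lt;/pre&gt;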
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://twitter.com/bobdc/status/1718648951870501221&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/111318983999801365&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://creativecommons.org/licenses/by/2.0/&#34;&gt;CC BY 2.0&lt;/a&gt; &lt;a href=&#34;https://www.flickr.com/photos/krossbow/52186070666/&#34;&gt;photo&lt;/a&gt; by &lt;a href=&#34;https://www.flickr.com/photos/krossbow/&#34;&gt;F Delventhal&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>My brief tenor banjo career</title>
      <link>https://www.bobdc.com/blog/tenorbanjo/</link>
      <pubDate>Sun, 24 Sep 2023 10:50:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/tenorbanjo/</guid>
      
      
      <description><div>Brief but symphonic.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/wsobanjo.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;250&#34; alt=&#34;banjo with the WSO&#34;/&gt;
&lt;p&gt;During the first pandemic summer I asked my wife for a cheap &lt;a href=&#34;https://en.wikipedia.org/wiki/Banjo#Tenor_banjo&#34;&gt;tenor banjo&lt;/a&gt; for my birthday. These are tuned like a viola and smaller than the traditional five-string banjos  used for bluegrass. Instead of fingerpicking patterns with the right hand like bluegrass banjos are famous for, tenor banjo players strum chords with a pick for volume as a rhythm instrument. When you hear a banjo in old-timey jazz, that&amp;rsquo;s a tenor. (Around the time the ground was being laid for bebop, &lt;a href=&#34;../cmdlineowl&#34;&gt;Charlie Christian&lt;/a&gt; showed that an amplified electric guitar could do a lot more than a banjo, so banjos faded from use in jazz groups outside of trad and Dixieland circles.) It was also fun knowing that the chords that I learned for the tenor banjo could work on the viola; composer Jessie Montgomery&amp;rsquo;s wonderful piece &lt;a href=&#34;https://www.youtube.com/watch?v=-ZmVRWjpNxw&#34;&gt;Strum&lt;/a&gt; was also an inspiration toward thinking about plucking chords on this otherwise bowed instrument.&lt;/p&gt;
&lt;p&gt;When the &lt;a href=&#34;https://waynesborosymphonyorchestra.org/&#34;&gt;Waynesboro Symphony Orchestra&lt;/a&gt; was rehearsing William Grant Still&amp;rsquo;s &lt;a href=&#34;https://en.wikipedia.org/wiki/Afro-American_Symphony&#34;&gt;Afro-American Symphony&lt;/a&gt; in February of last year, after the third movement the conductor &lt;a href=&#34;https://peterwilsonmusician.com/&#34;&gt;Peter Wilson&lt;/a&gt; told us that the actual performance would include a harp, a xylophone, and a banjo. On the next break I went up to him and said “that would be a tenor banjo, right? Because of the old time jazz effect? If so, I have one.” (“Rhapsody in Blue” by Still&amp;rsquo;s friend and arranging student George Gershwin also includes a tenor banjo.) He said “bring it!”&lt;/p&gt;
&lt;p&gt;Here is that movement as we performed it at Waynesboro&amp;rsquo;s First Presbyterian Church. To make it easier for the audience to hear the banjo, Peter had me stand practically right in front of him for this movement. (You will see links in YouTube to the other movements, where I am in the back row of the violas.)&lt;/p&gt;
&lt;iframe width=&#34;560&#34; height=&#34;315&#34; style=&#34;display: block; margin: 20pt auto 20pt auto;&#34; src=&#34;https://www.youtube.com/embed/fekt-ErKJ2I&#34; title=&#34;YouTube video player&#34; frameborder=&#34;0&#34; allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&#34; allowfullscreen&gt;&lt;/iframe&gt;
&lt;h2 id=&#34;tenor-banjo-chord-cheat-sheet&#34;&gt;Tenor Banjo Chord Cheat Sheet&lt;/h2&gt;
&lt;p&gt;To get to know the instrument when I first got it I went through several books and many Internet charts of chords to learn. I eventually scrawled a single-page chart of the chord forms that I thought were most useful. I wanted to share that with others, but I didn&amp;rsquo;t want to share the actual scrawl, so I made a neater PDF version of my &lt;a href=&#34;https://www.bobdc.com/miscfiles/tenorBanjoChords.pdf&#34;&gt;Tenor Banjo Chord Cheat Sheet&lt;/a&gt;. The first three rows show a few forms of major, minor, and (dominant) seventh chords, which should be enough for people doing most simple songs. The last three rows show half-diminished, sharp 5, and diminished chords for people playing more typical jazz standards.&lt;/p&gt;
&lt;p&gt;The chart shows all chords in the open position so that at least one string is played without any left-hand fingers pressing it down, but they all work as bar chords if you play them higher on the neck and press your left first finger across the neck where you see open strings in each chord diagram. I also wrote &amp;ldquo;R&amp;rdquo; in each chord diagram under any strings that are playing the root of the chord. That way, if I need (for example) a B flat minor chord, which is not shown on the chart, I can just find the note B flat somewhere on the neck and then pick the minor chord shape from the cheat sheet that is built around the chord&amp;rsquo;s root being on that string.&lt;/p&gt;
&lt;p&gt;I hope this chart can help someone else who decides to try this fun instrument with an important role in early jazz history. My one live gig with it was certainly an interesting experience.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://twitter.com/bobdc/status/1705964565853344218&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/111120788859709858&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Nicer date and time handling in SPARQL 1.2</title>
      <link>https://www.bobdc.com/blog/sparql12time/</link>
      <pubDate>Fri, 18 Aug 2023 10:25:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparql12time/</guid>
      
      
      <description><div>Add, subtract, and ADJUST() dates and times. </div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/calendarsparql.png&#34; alt=&#34;[SPARQL logo and calendar]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34; width=&#34;200&#34;/&gt;
&lt;p&gt;SPARQL 1.1 has been with us for about ten years. Work on SPARQL 1.2 is &lt;a href=&#34;https://github.com/w3c/sparql-dev&#34;&gt;currently underway&lt;/a&gt;, and one nice set of improvements will let us do much more with date and time values.&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t realize how minimal SPARQL 1.1&amp;rsquo;s ability to handle these was until I saw the introductory material in the &lt;a href=&#34;https://github.com/w3c/sparql-dev/blob/main/SEP/SEP-0002/sep-0002.md&#34;&gt;Add Support Durations, Dates, and Times&lt;/a&gt; issue recently added to the SPARQL 1.2 development discussion. I had never noticed how the SPARQL 1.1 Recommendation &lt;a href=&#34;https://www.w3.org/TR/sparql11-query/#otherTermConstraints&#34;&gt;explicitly says&lt;/a&gt; that it supports the &lt;code&gt;xsd:dateTime&lt;/code&gt; data type without mentioning &lt;code&gt;xsd:date&lt;/code&gt; or any of the other related date and time data types.&lt;/p&gt;
&lt;p&gt;With this support, SPARQL 1.1 lets you &lt;a href=&#34;https://www.w3.org/TR/sparql11-query/#expressions&#34;&gt;compare &lt;code&gt;xsd:dateTime&lt;/code&gt; values&lt;/a&gt; so that you can filter for events that occur before or after a particular point in time (see &lt;a href=&#34;http://www.learningsparql.com/2ndeditionexamples/ex230.rq&#34;&gt;this example&lt;/a&gt; from my book &lt;a href=&#34;http://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt;) or between two points in time. SPARQL 1.2 will add many related options, including the ability to do date arithmetic.&lt;/p&gt;
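&lt;p&gt;For example, a filter like the following (a sketch; the &lt;code&gt;d:when&lt;/code&gt; property is made up for illustration) keeps only events after a given moment:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;
PREFIX d:   &amp;lt;http://learningsparql.com/ns/data#&amp;gt;

# Keep events whose timestamp is after the start of 2023.
SELECT ?event
WHERE {
  ?event d:when ?dateTime .
  FILTER (?dateTime &amp;gt; &amp;#34;2023-01-01T00:00:00&amp;#34;^^xsd:dateTime)
}
&lt;/code&gt;&lt;/pre&gt;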
&lt;p&gt;My experiments with subtracting one date or date-time value from another had different levels of success with different SPARQL processors because some of these processors have added degrees of support just because it was useful to have, even though it wasn&amp;rsquo;t explicitly required by the standard. One example is &lt;a href=&#34;https://graphdb.ontotext.com/documentation/10.3/time-functions.html&#34;&gt;Ontotext&amp;rsquo;s added support for date and time manipulation&lt;/a&gt;. Another is the way that Wikidata&amp;rsquo;s SPARQL endpoint can subtract &lt;code&gt;xsd:date&lt;/code&gt; values, although it &lt;a href=&#34;https://query.wikidata.org/#PREFIX%20xsd%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0A%0ASELECT%20%2a%20WHERE%20%7B%0A%20%20BIND%20%28%28%222023-10-15%22%5E%5Exsd%3Adate%20-%20%222023-10-12%22%5E%5Exsd%3Adate%29%20AS%20%24dateTest%29%0A%7D%0A&#34;&gt;returns a decimal number&lt;/a&gt;; with Ontotext and the proposed 1.2 standard it returns a &lt;a href=&#34;https://www.w3.org/TR/xmlschema-2/#duration&#34;&gt;duration&lt;/a&gt; value. Having more extensive support for working with dates and times right in the SPARQL 1.2 standard will ensure that all the SPARQL processors support it and that they all use a consistent syntax and return consistent data types. This is why we use standards!&lt;/p&gt;
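&lt;p&gt;The subtraction itself can be sketched with the same dates as the linked Wikidata query:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;

# Wikidata currently binds ?diff to a decimal day count; under
# the proposed 1.2 behavior it is a duration (here, three days).
SELECT ?diff
WHERE {
  BIND ((&amp;#34;2023-10-15&amp;#34;^^xsd:date - &amp;#34;2023-10-12&amp;#34;^^xsd:date) AS ?diff)
}
&lt;/code&gt;&lt;/pre&gt;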
&lt;p&gt;The proposed 1.2 additions will also let us add and subtract durations from &lt;code&gt;xsd:time&lt;/code&gt; and &lt;code&gt;xsd:dateTime&lt;/code&gt; values. And, it will give us a new function that builds on these: &lt;code&gt;ADJUST()&lt;/code&gt;, which adjusts &lt;code&gt;xsd:dateTime&lt;/code&gt; values based on their time zones.&lt;/p&gt;
&lt;p&gt;The latest release of &lt;a href=&#34;https://jena.apache.org/&#34;&gt;Apache Jena&lt;/a&gt; supports this new function, so I tried it out. The following Turtle data shows the start and end time of a meeting that takes place in the New York City time zone:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix d:   &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix t:   &amp;lt;http://purl.org/tio/ns#&amp;gt; . 

d:meeting1 t:starts &amp;#34;2023-10-14T12:30:00-05:00&amp;#34;^^xsd:dateTime ;
           t:ends   &amp;#34;2023-10-14T15:00:00-05:00&amp;#34;^^xsd:dateTime . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The query below uses the &lt;code&gt;ADJUST()&lt;/code&gt; function to calculate the Los Angeles start and end time of the same meeting. The time zone of the New York meeting was indicated with &lt;code&gt;-05:00&lt;/code&gt;, so the LA equivalent is &lt;code&gt;-08:00&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX t: &amp;lt;http://purl.org/tio/ns#&amp;gt; 
PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;

SELECT ?NYCStartTime ?LAStartTime
WHERE
{
  ?mtg t:starts ?NYCStartTime . 
  BIND (ADJUST(?NYCStartTime, xsd:dayTimeDuration(&amp;#34;-PT8H&amp;#34;)) AS ?LAStartTime)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The result:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-----------------------------------------------------------------------------------------
| NYCStartTime                              | LAStartTime                               |
=========================================================================================
| &amp;#34;2023-10-14T12:30:00-05:00&amp;#34;^^xsd:dateTime | &amp;#34;2023-10-14T09:30:00-08:00&amp;#34;^^xsd:dateTime |
-----------------------------------------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Read the  &lt;a href=&#34;https://github.com/w3c/sparql-dev/blob/main/SEP/SEP-0002/sep-0002.md&#34;&gt;Add Support Durations, Dates, and Times&lt;/a&gt; issue mentioned above for more details about the expanded support for manipulation of date and time data. You&amp;rsquo;ll see some great new things that we&amp;rsquo;ll be able to do with a lot of existing data, especially with all those date values in Wikidata.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to  &lt;a href=&#34;https://twitter.com/bobdc/status/1692545508277993594&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/110911115060935960&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Passing your own data to use in Wikidata visualizations</title>
      <link>https://www.bobdc.com/blog/your-values-wikidata/</link>
      <pubDate>Sun, 23 Jul 2023 11:15:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/your-values-wikidata/</guid>
      
      
<description><div>A VALUES-based approach.</div><div>&lt;p&gt;I&amp;rsquo;ve had a decent understanding of what the &lt;code&gt;VALUES&lt;/code&gt; keyword can do for a while (see &lt;a href=&#34;https://www.bobdc.com/blog/sparql-11s-new-values-keyword/&#34;&gt;SPARQL 1.1&amp;rsquo;s new VALUES keyword&lt;/a&gt; and &amp;ldquo;Creating Tables of Values in your Queries&amp;rdquo; in my book &lt;a href=&#34;http://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt;) but lately I&amp;rsquo;ve gained a greater appreciation of ways to use it. For example, &lt;a href=&#34;https://www.bobdc.com/blog/spacy/&#34;&gt;last month&lt;/a&gt; I used it to map codes assigned by an entity recognition tool to schema.org classes. This month I found a nice way to use it to control one of Wikidata&amp;rsquo;s &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Wikidata_Query_Help/Result_Views&#34;&gt;many cool data visualization possibilities&lt;/a&gt;. By sending the Wikidata query service some data in a &lt;code&gt;VALUES&lt;/code&gt; clause in my query, I don&amp;rsquo;t have to rely completely on what&amp;rsquo;s in Wikidata to drive the visualization.&lt;/p&gt;
&lt;p&gt;Wikidata&amp;rsquo;s &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Wikidata_Query_Help/Result_Views#Timeline&#34;&gt;timeline visualization&lt;/a&gt; lets you view a chart of events displayed in the order in which they happened&amp;ndash;for instance, the launch date of space probes, which Wikidata&amp;rsquo;s &lt;a href=&#34;https://query.wikidata.org/#%23defaultView%3ATimeline%0ASELECT%20%3Fitem%20%3FitemLabel%20%3Flaunchdate%20%28SAMPLE%28%3Fimage%29%20AS%20%3Fimage%29%0AWHERE%0A%7B%0A%09%3Fitem%20wdt%3AP31%20wd%3AQ26529%20.%0A%20%20%20%20%3Fitem%20wdt%3AP619%20%3Flaunchdate%20.%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22%20%7D%0A%20%20%20%20OPTIONAL%20%7B%20%3Fitem%20wdt%3AP18%20%3Fimage%20%7D%0A%7D%0AGROUP%20BY%20%3Fitem%20%3FitemLabel%20%3Flaunchdate&#34;&gt;sample timeline query&lt;/a&gt; asks for. Requesting two dates in your query result adds bars to the display to visually show the elapsed time between those dates.&lt;/p&gt;
&lt;p&gt;As their demonstration shows, this is pretty simple if you can come up with a query for the things you want displayed on the chart. But what if the entities you want to see there don&amp;rsquo;t have anything in common that you can query for?&lt;/p&gt;
&lt;p&gt;In my case, I wanted to create a timeline of Shakespeare and certain famous people who lived in his time. I wanted to see two composers, two scientists, and the &amp;ldquo;statesman&amp;rdquo; Walter Raleigh, but I knew of no single query that would return these six people. (I read books on music and science that mention one or the other, so I thought it would be nice to see just how contemporary they were.)&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;VALUES&lt;/code&gt; query made it easy. I used Wikipedia and Wikidata to find the identifier for each person (for example, for one of them, I clicked &lt;a href=&#34;https://www.wikidata.org/wiki/Q179277&#34;&gt;Wikidata item&lt;/a&gt; on &lt;a href=&#34;https://en.wikipedia.org/wiki/Giovanni_Pierluigi_da_Palestrina&#34;&gt;Palestrina&amp;rsquo;s Wikipedia page&lt;/a&gt; and saw from its URL that his identifier is Q179277) and added them to the &lt;code&gt;VALUES&lt;/code&gt; list with a &lt;code&gt;wd:&lt;/code&gt; prefix. The comment at the top of the query tells Wikidata that I want to see the Timeline visualization:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;#defaultView:Timeline
SELECT ?name ?dateOfBirth ?dateOfDeath 
WHERE {
  VALUES ?person { wd:Q692 wd:Q179277 wd:Q53068 wd:Q307 
                   wd:Q9191 wd:Q189144 }
  ?person wdt:P569 ?dateOfBirth ;
          rdfs:label ?name ; 
          wdt:P570 ?dateOfDeath .
  FILTER ( lang(?name) = &amp;#34;en&amp;#34; )
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can &lt;a href=&#34;https://query.wikidata.org/#%23defaultView%3ATimeline%0ASELECT%20%3Fname%20%3FdateOfBirth%20%3FdateOfDeath%20%0AWHERE%20%7B%0A%20%20VALUES%20%3Fperson%20%7B%20wd%3AQ692%20wd%3AQ179277%20wd%3AQ53068%20wd%3AQ307%20%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wd%3AQ9191%20wd%3AQ189144%20%7D%0A%0A%20%20%3Fperson%20wdt%3AP569%20%3FdateOfBirth%20%3B%0A%20%20%20%20%20%20%20%20%20%20rdfs%3Alabel%20%3Fname%20%3B%20%0A%20%20%20%20%20%20%20%20%20%20wdt%3AP570%20%3FdateOfDeath%20.%0A%20%20FILTER%20%28%20lang%28%3Fname%29%20%3D%20%22en%22%20%29%0A%7D%0A&#34;&gt;try it out yourself&lt;/a&gt;. Here is the result:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/shakespeareContemporaries.png&#34; border=&#34;0&#34; align=&#34;center&#34; alt=&#34;Timeline of Shakespeare contemporaries&#34;/&gt;
&lt;p&gt;As I showed in last month&amp;rsquo;s blog entry, a &lt;code&gt;VALUES&lt;/code&gt; clause can hold two-dimensional sets of data in addition to simple lists like my query about Shakespeare&amp;rsquo;s contemporaries. This enables even more possibilities when using your own data with Wikidata&amp;rsquo;s wide choice of visualization tools.&lt;/p&gt;
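&lt;p&gt;If you build these queries from a script instead of by hand, a &lt;code&gt;VALUES&lt;/code&gt; clause is easy to assemble from a plain Python list. Here is a minimal sketch of that idea; the helper function names are my own invention, not part of any SPARQL library:&lt;/p&gt;

```python
# Assemble SPARQL VALUES clauses from Python lists. The helper names
# (values_clause, values_table) are my own, not from any library.

def values_clause(var, qids):
    """Return a one-variable VALUES clause binding var to wd: items."""
    items = " ".join("wd:" + q for q in qids)
    return "VALUES ?%s { %s }" % (var, items)

def values_table(variables, rows):
    """Return a two-dimensional VALUES clause; each row is a list."""
    header = " ".join("?" + v for v in variables)
    body = " ".join("(%s)" % " ".join(row) for row in rows)
    return "VALUES (%s) { %s }" % (header, body)

# The six Q-ids from the Shakespeare-contemporaries query:
people = ["Q692", "Q179277", "Q53068", "Q307", "Q9191", "Q189144"]
clause = values_clause("person", people)
print(clause)
```

&lt;p&gt;The generated clause can then be pasted into the rest of the query before sending it to the query service.&lt;/p&gt;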
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1683135977139609600&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/110764092135012496&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Entity recognition from within a SPARQL query</title>
      <link>https://www.bobdc.com/blog/spacy/</link>
      <pubDate>Sun, 25 Jun 2023 10:53:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/spacy/</guid>
      
      
      <description><div>Using my new employer&#39;s excellent free product.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/ontotextspacy.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Ontotext and spacy logos&#34; width=&#34;180&#34;/&gt;
&lt;p&gt;I &lt;a href=&#34;https://twitter.com/bobdc/status/1663563990302359562&#34;&gt;recently announced&lt;/a&gt; that I have joined &lt;a href=&#34;https://www.ontotext.com/&#34;&gt;Ontotext&lt;/a&gt; as a full-time Senior Tech Writer. I have admired their free &lt;a href=&#34;https://www.ontotext.com/products/graphdb/&#34;&gt;GraphDB&lt;/a&gt; triplestore for a long time (for example, I &lt;a href=&#34;https://www.bobdc.com/blog/geosparqlgraphdb/&#34;&gt;wrote about&lt;/a&gt; how well it supports the GeoSPARQL geospatial extension in October of 2020) and I am now learning about all the great capabilities of their &lt;a href=&#34;https://www.ontotext.com/products/&#34;&gt;commercial products&lt;/a&gt;, such as the scalability of GraphDB Enterprise.&lt;/p&gt;
&lt;p&gt;As always, though, in this blog I will focus on free RDF-related software, so this month I will write about a cool feature of GraphDB Free that I learned about just last week: its use of the &lt;a href=&#34;https://spacy.io&#34;&gt;spaCy&lt;/a&gt; library to let you do text analysis and entity recognition from within a SPARQL query.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://graphdb.ontotext.com/documentation/10.2/text-mining-plugin.html&#34;&gt;Text Mining Plugin&lt;/a&gt; page of the GraphDB documentation describes text mining protocols that it supports: spaCy, GATE Cloud, and Ontotext&amp;rsquo;s Tag API. The spaCy section of that page shows the two lines necessary to create and then run a spaCy client with  &lt;code&gt;docker&lt;/code&gt;, and then it shows a SPARQL &lt;code&gt;INSERT DATA&lt;/code&gt; command that establishes a connection from GraphDB to the spaCy client. Once that&amp;rsquo;s done you&amp;rsquo;re ready to run queries that tell spaCy to analyze content that you pass to it.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://graphdb.ontotext.com/documentation/10.2/text-mining-plugin.html#find-spacy-entities-through-graphdb&#34;&gt;Find spaCy entities through GraphDB&lt;/a&gt; section that follows shows a query that passes a paragraph of text about Dyson Vacuum Cleaners to spaCy and returns several columns of information about how spaCy annotated it to indicate the entities that it found. Beneath that on the &lt;a href=&#34;https://graphdb.ontotext.com/documentation/10.2/text-mining-plugin.html&#34;&gt;Text Mining Plugin&lt;/a&gt; page you can see the results: it identifies &amp;ldquo;Dyson Ltd.&amp;rdquo; as an organization, James Dyson as a person, Singapore as a geopolitical entity, and more. (While that documentation shows six of the returned rows, I got twelve when I ran it.)&lt;/p&gt;
&lt;p&gt;That query was a &lt;code&gt;SELECT&lt;/code&gt; query. I wanted to run a &lt;code&gt;CONSTRUCT&lt;/code&gt; query that would create new triples about some of the identified things. If it recognized people, places, and organizations, I wanted it to create triples making those instances &lt;a href=&#34;https://schema.org/&#34;&gt;schema.org&lt;/a&gt; classes. Revising the &lt;code&gt;SELECT&lt;/code&gt; query mentioned above, I ended up with this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# getting triples from endpoint with this query: 
# curl -H &amp;#34;Accept: text/turtle&amp;#34; --data-urlencode \
# &amp;#34;query@spacytest.rq&amp;#34; http://bob-inspiron:7200/repositories/my_repo

PREFIX txtm:      &amp;lt;http://www.ontotext.com/textmining#&amp;gt;
PREFIX txtm-inst: &amp;lt;http://www.ontotext.com/textmining/instance#&amp;gt;
PREFIX s:         &amp;lt;https://schema.org/&amp;gt;
PREFIX rdfs:      &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX dc:        &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;

# Don&amp;#39;t forget to start the spaCy server and run the INSERT query
# that establishes the connection to it before running this query. 

CONSTRUCT {
        ?annotatedDocument txtm:annotations ?annotation .
        ?annotation txtm:annotationText ?annotationText .

        ?entityID a ?soClassname ; 
                  rdfs:label ?annotationText . 
        # The annotation has a related resource: this
        # new resource being declared. 
        ?annotation dc:relation ?entityID .
}
WHERE {
  ?searchDocument a txtm-inst:localSpacy;
     txtm:text &amp;#39;&amp;#39;&amp;#39;Dyson Ltd. plans to hire 450 people globally, with
     more than half the recruits in its headquarters in Singapore.
     The company best known for its vacuum cleaners and hand dryers will
     add 250 engineers in the city-state. This comes short before the founder
     James Dyson announced he is moving back to the UK after moving residency
     to Singapore. Dyson, a prominent Brexit supporter who is worth US$29
     billion, faced criticism from British lawmakers for relocating his
     company&amp;#39;&amp;#39;&amp;#39; .

    GRAPH txtm-inst:localSpacy {
        ?annotatedDocument txtm:annotations ?annotation .
        ?annotation txtm:annotationText ?annotationText ;
                    txtm:annotationKey ?annotationKey;
                    txtm:annotationType ?annotationType ;
    }
    VALUES (?annotationType ?soClassname) {
      (&amp;#34;ORG&amp;#34;    s:Organization) 
      (&amp;#34;GPE&amp;#34;    s:AdministrativeArea)
      (&amp;#34;PERSON&amp;#34; s:Person)
    }

    # Create a URI to use as the subject of each newly
    # recognized entity being declared as a schema.org class. 
    BIND(UUID() AS ?entityID)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;code&gt;WHERE&lt;/code&gt; clause grabs the information generated by spaCy like the &lt;code&gt;WHERE&lt;/code&gt; clause in the original &lt;code&gt;SELECT&lt;/code&gt; query in the GraphDB documentation does. It also uses SPARQL&amp;rsquo;s &lt;a href=&#34;https://www.bobdc.com/blog/sparql-11s-new-values-keyword/&#34;&gt;VALUES&lt;/a&gt; clause to map spaCy annotation types to schema.org classes. (With more input text, I&amp;rsquo;m sure spaCy would recognize more types of entities, so you could easily extend this &lt;code&gt;VALUES&lt;/code&gt; list to accommodate those.) Then instead of a &lt;code&gt;SELECT&lt;/code&gt; clause, I have a &lt;code&gt;CONSTRUCT&lt;/code&gt; to create triples saying that the recognized entities are instances of the appropriate classes.&lt;/p&gt;
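&lt;p&gt;The same type-to-class mapping is easy to sketch outside of SPARQL. This Python sketch mirrors what the &lt;code&gt;VALUES&lt;/code&gt; clause and &lt;code&gt;BIND(UUID())&lt;/code&gt; do in the query above; the annotation rows are made-up samples in the shape (text, type), not output from any particular tool:&lt;/p&gt;

```python
# Map entity-annotation type codes to schema.org class names, mirroring
# the VALUES clause and BIND(UUID()) in the CONSTRUCT query above. The
# annotation rows here are made-up samples in the shape (text, type).
import uuid

TYPE_TO_CLASS = {
    "ORG": "s:Organization",
    "GPE": "s:AdministrativeArea",
    "PERSON": "s:Person",
}

def triples_for(annotations):
    """Yield (subject, predicate, object) triples for mapped types."""
    for text, ann_type in annotations:
        cls = TYPE_TO_CLASS.get(ann_type)
        if cls is None:
            continue  # annotation types with no mapping are skipped
        entity_id = "urn:uuid:" + str(uuid.uuid4())  # like BIND(UUID())
        yield (entity_id, "a", cls)
        yield (entity_id, "rdfs:label", text)

sample = [("Dyson Ltd.", "ORG"), ("Singapore", "GPE"),
          ("James Dyson", "PERSON"), ("450", "CARDINAL")]
result = list(triples_for(sample))
```

&lt;p&gt;As in the query, unmapped annotation types simply drop out, and extending the mapping is just a matter of adding dictionary entries.&lt;/p&gt;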
&lt;p&gt;This is only a beginning. For example, spaCy recognizes Singapore as a geopolitical entity in two different places, but it doesn&amp;rsquo;t know that the two identified entities are the same thing, so my query creates a separate &lt;code&gt;s:AdministrativeArea&lt;/code&gt; instance for each. There are tools that could be used further down the pipeline to straighten this out and maybe connect it to &lt;code&gt;http://www.wikidata.org/entity/Q334&lt;/code&gt;, the Wikidata identifier for Singapore; because this &lt;code&gt;CONSTRUCT&lt;/code&gt; query creates triples instead of a table of results, it will be much easier to pass the result of its work down a pipeline to other tools that can do further enhancements.&lt;/p&gt;
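&lt;p&gt;A first pipeline step for that cleanup could be as simple as merging annotations that share the same text and type. This is a naive sketch of my own; real entity linking would also normalize labels and look each entity up in an external source such as Wikidata:&lt;/p&gt;

```python
# Merge entity annotations that share the same text and type so that,
# for example, two "Singapore" mentions become one entity. A naive
# sketch: real entity linking would also normalize labels and consult
# an external source such as Wikidata.

def merge_entities(annotations):
    """Return a dict mapping each distinct (text, type) to its count."""
    merged = {}
    for text, ann_type in annotations:
        key = (text, ann_type)
        merged[key] = merged.get(key, 0) + 1
    return merged

counts = merge_entities([("Singapore", "GPE"), ("Singapore", "GPE"),
                         ("James Dyson", "PERSON")])
# The two Singapore annotations collapse into a single entity.
```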
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1672984947034845185&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/110605478030184648&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Getting ChatGPT to turn a flat vocabulary list into a hierarchical taxonomy</title>
      <link>https://www.bobdc.com/blog/chatgpttaxonomy/</link>
      <pubDate>Sat, 20 May 2023 11:15:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/chatgpttaxonomy/</guid>
      
      
      <description><div>ChatGPT-3, Chat GPT-4.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/labels.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;SKOS diagram&#34; width=&#34;320&#34;/&gt;
&lt;p&gt;I was catching up with my old friend &lt;a href=&#34;https://www.linkedin.com/in/paulprescod/&#34;&gt;Paul Prescod&lt;/a&gt; the other day. We have not only known each other since the early days of XML, but actually before that: &amp;ldquo;since XML was a &lt;a href=&#34;https://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language&#34;&gt;four-letter word&lt;/a&gt;&amp;rdquo;, to quote Paul.&lt;/p&gt;
&lt;p&gt;One current popular topic we discussed is where LLM tools such as ChatGPT can add value in the data pipelines that we have worked with. We&amp;rsquo;ve all seen blog posts where people got ChatGPT to create code in their favorite languages; Paul and I, as always, were focused on how it could improve content and content metadata. I&amp;rsquo;ve often said that the point of metadata is to add value to content, so automating the creation of useful metadata is automating the addition of value to content.&lt;/p&gt;
&lt;p&gt;Automating the assignment of keyword terms from a controlled vocabulary to content, in order to improve content findability, has been a classic goal for decades. While talking to Paul, I wondered whether the controlled vocabulary itself could be improved by ChatGPT, specifically by turning flat lists into hierarchies.&lt;/p&gt;
&lt;p&gt;How does this add value? Imagine that Sidney at the hypothetical Snee Company stores a picture of &lt;a href=&#34;https://en.wikipedia.org/wiki/Lassie&#34;&gt;Lassie&lt;/a&gt; tagged as &amp;ldquo;Collie&amp;rdquo; in a CMS, and that term in the CMS&amp;rsquo;s taxonomy has a link to the broader term &amp;ldquo;Dog&amp;rdquo;. Taylor, another Snee employee, is writing an article about hints for taking your pets on vacation and searches the CMS for dog pictures. Sidney didn&amp;rsquo;t tag the Lassie picture as &amp;ldquo;Dog&amp;rdquo;, but the taxonomy-aware search engine knows that it shows a collie, which therefore makes it a dog, and returns that picture to Taylor. Taylor found a good picture for the article and has benefited from the value added by this piece of metadata.&lt;/p&gt;
&lt;p&gt;I thought I&amp;rsquo;d create a controlled vocabulary of animal species and broader terms as a simple flat list and see how well ChatGPT-3 could impose some hierarchical structure on this by adding links such as the Collie-to-Dog one described above. Of course, I would have it use the RDF-based &lt;a href=&#34;https://www.bobdc.com/blog/skosibm/&#34;&gt;SKOS&lt;/a&gt; standard for controlled vocabularies, taxonomies, and thesauri.&lt;/p&gt;
&lt;p&gt;A web search showed that Kurt Cagle, another old friend from XML&amp;rsquo;s early days, had given me a nice head start in his recent posting &lt;a href=&#34;https://thecaglereport.com/2023/03/16/nine-chatgpt-tricks-for-knowledge-graph-workers/&#34;&gt;Nine ChatGPT Tricks for Knowledge Graph Workers&lt;/a&gt;. His list of animals (see &amp;ldquo;Example 8. Taxonomy Construction&amp;rdquo; in that article) sorted and indented the terms to show their hierarchy. I wanted ChatGPT to do that work, so I made a copy of Kurt&amp;rsquo;s list, removed all the leading spaces, and shuffled the lines into a random order. (Cool Linux command line tool I found for that: &lt;a href=&#34;https://man7.org/linux/man-pages/man1/shuf.1.html&#34;&gt;&lt;code&gt;shuf&lt;/code&gt;&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;I then wrote a one-off Perl script that converted the list to SKOS Turtle RDF. All it said was that these were concepts with these labels. Here are the first 15 lines; the remainder follows the pattern shown:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; . 
@prefix d:    &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .

d:c1 a skos:Concept ;
     skos:prefLabel &amp;#34;Tigers&amp;#34; .

d:c2 a skos:Concept ;
     skos:prefLabel &amp;#34;Bears&amp;#34; .

d:c3 a skos:Concept ;
     skos:prefLabel &amp;#34;Mammals&amp;#34; .

d:c4 a skos:Concept ;
     skos:prefLabel &amp;#34;Primates&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I wanted to see if ChatGPT could add triples such as &lt;code&gt;d:c4 skos:broader d:c3&lt;/code&gt;. To summarize my result, the free ChatGPT-3 did OK, but not great; when Paul later tried it with ChatGPT-4, for which he is paying for a subscription, that did much better. Important ChatGPT lesson here: you get what you pay for.&lt;/p&gt;
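&lt;p&gt;The one-off conversion script mentioned above was in Perl, but the job is simple enough that a rough Python equivalent fits in a few lines; the &lt;code&gt;@prefix&lt;/code&gt; declarations for &lt;code&gt;skos:&lt;/code&gt; and &lt;code&gt;d:&lt;/code&gt; are assumed to be prepended separately:&lt;/p&gt;

```python
# Convert a flat list of vocabulary terms into SKOS Turtle concept
# declarations, mirroring the output excerpted above. A rough Python
# stand-in for the one-off Perl script; the @prefix declarations for
# skos: and d: are assumed to be prepended separately.

def terms_to_skos(terms):
    """Return Turtle declaring each term as a skos:Concept."""
    stanzas = []
    for i, label in enumerate(terms, start=1):
        stanzas.append('d:c%d a skos:Concept ;\n     skos:prefLabel "%s" .'
                       % (i, label))
    return "\n\n".join(stanzas)

print(terms_to_skos(["Tigers", "Bears", "Mammals"]))
```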
&lt;p&gt;Here is the prompt that I gave to ChatGPT-3:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Take the following set of Turtle RDF triples and add more triples that use skos:broader as their predicate. The new triples should show the hierarchical relationship of the existing terms.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Below that prompt I pasted the data that is excerpted above; you can see the whole thing at &lt;a href=&#34;https://bobdc.com/miscfiles/simpleSKOS.ttl&#34;&gt;https://bobdc.com/miscfiles/simpleSKOS.ttl&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The system responded with my original RDF triples and the new ones that it generated based on what I asked for. The prefix declarations at the top were missing their angle brackets, so I added those to make it parse properly. I then wrote the following SPARQL query to process the returned RDF and show me a report that was more intuitive to read than statements like  &lt;code&gt;d:c4 skos:broader d:c3&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; 
prefix d:    &amp;lt;http://learningsparql.com/ns/data#&amp;gt;

SELECT ?narrowerLabel ?broaderLabel WHERE {
  ?narrower skos:prefLabel ?narrowerLabel ;
            skos:broader ?broader .
  ?broader skos:prefLabel ?broaderLabel . 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here is the resulting report:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;----------------------------------
| narrowerLabel  | broaderLabel  |
==================================
| &amp;#34;Mammals&amp;#34;      | &amp;#34;Bears&amp;#34;       |
| &amp;#34;Mammals&amp;#34;      | &amp;#34;Tigers&amp;#34;      |
| &amp;#34;Coyotes&amp;#34;      | &amp;#34;Canines&amp;#34;     |
| &amp;#34;Felines&amp;#34;      | &amp;#34;Animals&amp;#34;     |
| &amp;#34;Vertebrates&amp;#34;  | &amp;#34;Animals&amp;#34;     |
| &amp;#34;Ursines&amp;#34;      | &amp;#34;Bears&amp;#34;       |
| &amp;#34;Wolves&amp;#34;       | &amp;#34;Canines&amp;#34;     |
| &amp;#34;Animals&amp;#34;      | &amp;#34;Vertebrates&amp;#34; |
| &amp;#34;Primates&amp;#34;     | &amp;#34;Mammals&amp;#34;     |
| &amp;#34;Chimpanzees&amp;#34;  | &amp;#34;Primates&amp;#34;    |
| &amp;#34;Canines&amp;#34;      | &amp;#34;Mammals&amp;#34;     |
| &amp;#34;Lions&amp;#34;        | &amp;#34;Bears&amp;#34;       |
| &amp;#34;Apes&amp;#34;         | &amp;#34;Primates&amp;#34;    |
| &amp;#34;Carnivores&amp;#34;   | &amp;#34;Animals&amp;#34;     |
| &amp;#34;Insectivores&amp;#34; | &amp;#34;Carnivores&amp;#34;  |
| &amp;#34;Badgers&amp;#34;      | &amp;#34;Carnivores&amp;#34;  |
| &amp;#34;Panthers&amp;#34;     | &amp;#34;Felines&amp;#34;     |
| &amp;#34;Humans&amp;#34;       | &amp;#34;Felines&amp;#34;     |
----------------------------------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It usually made sensible connections, but a few are completely wrong, and as you can see, many of the connections are backward, like the first two.&lt;/p&gt;
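&lt;p&gt;Backward links like these are easy to spot by eye in a small vocabulary, but one mechanical check helps at any scale: no concept should be transitively broader than itself. (The report above actually has such a cycle: Vertebrates is listed as broader than Animals and Animals as broader than Vertebrates.) A minimal Python sketch of that check over (narrower, broader) label pairs:&lt;/p&gt;

```python
# Sanity-check generated skos:broader pairs: no concept should be
# transitively broader than itself. Pairs are (narrower, broader)
# label strings like the rows in the report above.

def find_cycles(pairs):
    """Return the labels that can reach themselves via broader links."""
    broader = {}
    for narrower, b in pairs:
        broader.setdefault(narrower, set()).add(b)
    cycles = set()
    for start in broader:
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in broader.get(node, ()):
                if nxt == start:
                    cycles.add(start)
                elif nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return cycles

pairs = [("Animals", "Vertebrates"), ("Vertebrates", "Animals"),
         ("Canines", "Mammals")]
print(find_cycles(pairs))  # both members of the cycle are flagged
```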
&lt;p&gt;Paul had much better luck doing the exact same thing with ChatGPT-4. He also did a lot of prompt refinement to get the system to explain why it did what it did. He has promised me that he will be writing that up soon, and when he does I will link to his writeup from here. It&amp;rsquo;s an interesting start to an answer for one of the important questions of 2023: &amp;ldquo;What useful work can I get Large Language Models to do for me?&amp;rdquo;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1659944436501979137&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/110401725068040843&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/ai-and-machine-learning">ai-and-machine-learning</category>
      
    </item>
    
    <item>
      <title>More advice about software documentation</title>
      <link>https://www.bobdc.com/blog/techwritingadvice/</link>
      <pubDate>Sun, 14 May 2023 12:10:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/techwritingadvice/</guid>
      
      
      <description><div>Especially documenting APIs.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/apipic.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;API diagram&#34; width=&#34;320&#34;/&gt;
&lt;p&gt;Early last year, in the blog entry &lt;a href=&#34;../ieee-podcast&#34;&gt;Doing a podcast interview about technical writing&lt;/a&gt;, I described an interview I did for the IEEE Software Engineering Radio podcast. Listening to it again this week I saw that I covered a lot of good ground. Since then I have thought of a few other points I wish I&amp;rsquo;d mentioned, so here they are in another bulleted list. Because of some recent experience I had enough thoughts about documenting APIs that I gave that discussion its own section below.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Documentation (in particular, the User Guide) should reflect a company’s official vision for the product, which is something that a team of people at the company typically worked pretty hard on. Coordinate with them and their work. For each thing that the vision promises about the product, it should be easy to find information about how to do that thing in the documentation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Good documentation is a form of marketing literature and good marketing literature is a form of documentation. Documentation should convince the reader that the product will help them get useful work done and marketing literature should educate the reader about how the product gets used.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;../jupytersparql/&#34;&gt;Jupyter notebooks&lt;/a&gt; are good for documentation–but just for tutorials, because they walk through a series of specific steps for a particular scenario and show the results. When you have tutorials, you still need a User Guide to explain big-picture topics and a Reference Guide to explain every detail of the product. Notebooks are so focused on specific scenarios that they’re not good for those purposes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Something I hate, and I’ve seen entire books of it: developers who think that the long, heavily commented programs they put in the documentation are the best way for others to learn. (Ooh, a “complete application!”) Examples should be short and self-contained so that they are easier to apply to other contexts. In a book, explanations of code samples should be in a readable proportionally-spaced font such as Times Roman or Helvetica. This is part of the appeal of Jupyter notebooks, which let you mix executable code with nicely-formatted prose text.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;While I stand by my description of the basic categories of documentation, the term “User Guide” seems to have gone out of fashion. Instead of a five-section User Guide as a top-level document in a product’s documentation collection, nowadays a software company is more likely to present each of those five sections as a top-level document in its own right. The good news: what was formerly a third-level heading in that content becomes a second-level heading, so it’s easier to display descriptions of more sections in an expanded table of contents. The bad news: the list of top-level documents can more easily get too long and therefore difficult to quickly evaluate when looking for something.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;documenting-apis&#34;&gt;Documenting APIs&lt;/h2&gt;
&lt;p&gt;The core of a software product may be an API, or &lt;a href=&#34;https://en.wikipedia.org/wiki/API&#34;&gt;Application Programming Interface&lt;/a&gt;. Customers doing their work in a particular programming language are given libraries or access to a server where they can call the functions that make up the API. If you buy a robot with a Python API, the product includes Python libraries so that you might make calls like &lt;code&gt;head.turn(left,30)&lt;/code&gt; if you want to turn the robot&amp;rsquo;s head 30 degrees to the left. You don&amp;rsquo;t have to worry about the robot&amp;rsquo;s internal electronics because the vendor has provided you with a Python-based interface at a higher level of abstraction that lets you simply tell the robot what to do.&lt;/p&gt;
&lt;p&gt;Usually, the developers who actually implemented the API functions like &lt;code&gt;head.turn(direction,degrees)&lt;/code&gt; included comments with their code that describe more about these functions and what you can do with them. If their comments follow the right formatting conventions, an automated program such as &lt;a href=&#34;https://swagger.io/&#34;&gt;Swagger&lt;/a&gt;, &lt;a href=&#34;https://www.sphinx-doc.org/en/master/tutorial/automatic-doc-generation.html&#34;&gt;Sphinx&lt;/a&gt;, &lt;a href=&#34;https://docs.python.org/3/library/pydoc.html&#34;&gt;PyDoc&lt;/a&gt;, or &lt;a href=&#34;https://www.doxygen.nl/&#34;&gt;Doxygen&lt;/a&gt; can extract those comments (known as &lt;a href=&#34;https://en.wikipedia.org/wiki/Docstring&#34;&gt;docstrings&lt;/a&gt;) and package them in HTML documents that make it easy for API users to look up the information they need. The conversion package will include CSS modules and other means to customize the content&amp;rsquo;s look and feel.&lt;/p&gt;
&lt;p&gt;From the user&amp;rsquo;s perspective, the API documentation is the formatted list of the functions or methods that they can call along with a description of each one&amp;rsquo;s purpose, the role and types of parameters to pass to it, what it returns, and maybe an example.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve heard people refer to this as automated creation of API documentation, but the automated generation of the HTML doesn&amp;rsquo;t mean that the actual &lt;em&gt;writing&lt;/em&gt; of the documentation is automated. For example, the developer who coded the &lt;code&gt;head.turn()&lt;/code&gt; function might just give the two parameter names of &lt;code&gt;direction&lt;/code&gt; and &lt;code&gt;degrees&lt;/code&gt; and leave it at that. This can lead to many questions: how do you specify the direction? What are the choices? Are they constants or quoted strings? Does the number of degrees have to be a whole number?&lt;/p&gt;
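&lt;p&gt;A docstring that answers those questions might look like the following. The robot API here is invented for illustration; the point is what a tech writer adds: the allowed values, the units, the range, and an example call.&lt;/p&gt;

```python
# An expanded docstring for the hypothetical head.turn() function.
# The robot API is invented for illustration; the additions are the
# kind a tech writer makes: choices, units, ranges, and an example.

def turn(direction, degrees):
    """Turn the robot's head.

    Args:
        direction: either "left" or "right" (a quoted string,
            not a constant).
        degrees: how far to turn, as a whole number of degrees
            from 0 to 90 relative to the current position.

    Example:
        head.turn("left", 30)   # look 30 degrees to the left
    """
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    return (direction, degrees)
```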
&lt;p&gt;To improve these descriptions, a tech writer working on API documentation functions as an editor, but much more than a copy editor correcting spelling and punctuation. These tech writers are also reporters, interviewing the developers about who would use a given function, for what purpose, in what situations. You often ask those same questions about each parameter passed to the function: why is each one there? What powers does it give to the developer calling the function? API documentation seems very different from marketing literature, but to return to my point above about aligning them, even API documentation should make it clear that each feature provides useful value to the user that fits in with the bigger picture of what the marketing department promises about the product.&lt;/p&gt;
&lt;p&gt;Just providing the type of each parameter might not be enough. For example, for a number, the documentation should indicate what would be a typical high value and a typical low value and what effect these would have. I was once revising the description of a parameter whose docstring merely said that it was a decimal number and then found out from the responsible developer that it represented a percentage, so that a value of .5 meant 50%. The fact that 1 was the highest possible value that you could pass was a surprise to me, and I made sure to include that in its documentation.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not unusual for tech writers to ask the developer these questions and then go and fix the docstrings in the source code themselves–coordinating, of course, with the developers in charge of maintaining that code and often going through the same &lt;a href=&#34;https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests&#34;&gt;GitHub pull request&lt;/a&gt; steps that modifications to the executable code go through. These tech writers must be familiar with the syntax of the programming language being used, because you don’t want to break anything; accidentally deleting the wrong comma can prevent the code from compiling at product build time.&lt;/p&gt;
&lt;p&gt;If the API is a significant part of the product, tech writers should know the language that calls the API well enough to write programs that try out these functions themselves. This can often help them to answer their questions about the functions that need more documentation, thereby reducing the need to pester the developers who wrote these functions in the first place.&lt;/p&gt;
&lt;p&gt;Ideally, tech writers will enjoy writing programs that use the API to make the product do interesting things. If so, they are in the right job!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://flickr.com/photos/thesmith/4574969567/&#34;&gt;API diagram picture&lt;/a&gt; by &lt;a href=&#34;https://flickr.com/photos/thesmith/&#34;&gt;Ben Smith&lt;/a&gt;, &lt;a href=&#34;http://creativecommons.org/licenses/by-nc/2.0/&#34;&gt;Creative Commons CC BY-NC 2.0&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1657790800934076416&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/110368074996491025&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/documenting-software">documenting software</category>
      
    </item>
    
    <item>
      <title>Introducing RDF and related standards</title>
      <link>https://www.bobdc.com/blog/rdfintroseries/</link>
      <pubDate>Sun, 30 Apr 2023 11:01:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdfintroseries/</guid>
      
      
      <description><div>The series.</div><div>&lt;p&gt;&lt;a href=&#34;../writing/rdfstandards/&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/rdflogo.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;RDF logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A co-worker recently told me that he was considering using OWL for something but didn&amp;rsquo;t want to deal with all that XML. It was disappointing to hear this in the year 2023, but I guess those early images of RDF/XML being used to implement OWL restriction classes really scared some people off.&lt;/p&gt;
&lt;p&gt;There have been other occasions where I wanted to suggest that someone could learn the basics of RDF, OWL, and the related standards by reading the introductions I did for each as blog entries in 2021, but I hate to send these people five URLs or tell them to go to the list of &lt;a href=&#34;../../categories/2021/&#34;&gt;2021 entries&lt;/a&gt; and read the June through October entries in the reverse of the order shown.&lt;/p&gt;
&lt;p&gt;So now there&amp;rsquo;s one URL for a table of contents page at &lt;strong&gt;&lt;a href=&#34;../../articles/rdfstandards&#34;&gt;A brief introduction to RDF, related standards, and what they can do for you&lt;/a&gt;&lt;/strong&gt;. It lists the relevant entries, in order, with a brief description of each. I hope that it&amp;rsquo;s useful both for people who want to ramp up quickly on these topics and for anyone who wants to point them to a simple, brief—if a bit opinionated—introduction. (I didn&amp;rsquo;t mean for this table of contents page to be displayed with the full blog entry theming, but to take advantage of the CSS and so forth that I&amp;rsquo;m using with the &lt;a href=&#34;../changing-my-blogs-domain-name/&#34;&gt;Hugo&lt;/a&gt; blog engine, I just used the existing template rather than creating a new one for it.)&lt;/p&gt;
&lt;p&gt;I wrote a similar article many years ago titled &lt;a href=&#34;http://www.snee.com/rdf/semweboverview.html&#34;&gt;RDF, The Semantic Web, and Linked Data&lt;/a&gt; that did a broader summary with references to many more blog entries. Much of that, such as its references to XMP and RDFa, is quite dated now. That article might be interesting for its historical perspective from 2009, but the brevity of this new summary should make it more helpful overall.&lt;/p&gt;
 &lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1652693229060407302&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/110288426502118230&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">rdf</category>
      
      <category domain="https://www.bobdc.com//categories/owl">OWL</category>
      
      <category domain="https://www.bobdc.com//categories/rdfs">RDFS</category>
      
    </item>
    
    <item>
      <title>Normalizing company names (and more) with SPARQL and Wikidata</title>
      <link>https://www.bobdc.com/blog/wikidatanormalizing/</link>
      <pubDate>Sun, 26 Mar 2023 11:16:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/wikidatanormalizing/</guid>
      
      
      <description><div>As a service!</div><div>&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Big_Blue_(mascot)&#34;&gt;&lt;img id=&#34;id143548&#34; src=&#34;https://www.bobdc.com/img/main/bigblue.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;[ODU mascot]&#34; width=&#34;160&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Several years ago I wrote &lt;a href=&#34;https://www.bobdc.com/blog/normalizing-company-names-with/&#34;&gt;Normalizing company names with SPARQL and DBpedia&lt;/a&gt; to describe how SPARQL queries to DBpedia let you take advantage of the Wikipedia logic that redirects the URL  &lt;a href=&#34;http://en.wikipedia.org/wiki/Big_Blue&#34;&gt;http://en.wikipedia.org/wiki/Big_Blue&lt;/a&gt; to &lt;a href=&#34;https://en.wikipedia.org/wiki/IBM&#34;&gt;https://en.wikipedia.org/wiki/IBM&lt;/a&gt; and &lt;a href=&#34;http://en.wikipedia.org/wiki/Bobby_Kennedy&#34;&gt;http://en.wikipedia.org/wiki/Bobby_Kennedy&lt;/a&gt; to &lt;a href=&#34;http://en.wikipedia.org/wiki/Robert_F._Kennedy&#34;&gt;http://en.wikipedia.org/wiki/Robert_F._Kennedy&lt;/a&gt;. This lets SPARQL queries normalize names—a useful task to perform for data cleanup.&lt;/p&gt;
&lt;p&gt;This time I did it with Wikidata. As with the DBpedia version, I used a &lt;code&gt;SERVICE&lt;/code&gt; call to Wikidata so that a query running somewhere besides Wikidata can take advantage of this. I also showed how to make it work for countries as well as companies, and minor changes should make it work for most other classes.&lt;/p&gt;
&lt;p&gt;Because my new version uses alternative names instead of redirect data, the name of the company that it returns is the name on the Wikipedia page, not the official name of the company. So, for example, instead of &amp;ldquo;Eastman Kodak Company&amp;rdquo; it will return &amp;ldquo;Kodak&amp;rdquo; and instead of &amp;ldquo;Apple, Inc.&amp;rdquo; it will return &amp;ldquo;Apple&amp;rdquo;.  Still, the normalization down to a single name is useful.&lt;/p&gt;
&lt;p&gt;Here is the query I entered into the Wikidata Query Service to send to its endpoint. You can also see and execute the query with &lt;a href=&#34;https://query.wikidata.org/#PREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX%20wd%3A%20%20%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0APREFIX%20skos%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%3E%0A%0ASELECT%20DISTINCT%20%3FwdName%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%23%20Find%20the%20Wikidataname%0AWHERE%20%7B%0A%20%20BIND%28%22Big%20Blue%22%40en%20AS%20%3FenglishName%29%20%20%20%20%23%20Search%20based%20on%20a%20nickname.%0A%20%20%23BIND%28%22IBM%22%40en%20AS%20%3FenglishName%29%20%20%20%20%20%20%20%20%20%23%20Search%20based%20on%20Wikipedia%20name.%0A%20%20%23BIND%28%22Apple%20Inc.%22%40en%20AS%20%3FenglishName%29%20%20%23%20Search%20based%20on%20official%20name.%0A%20%20%0A%20%20BIND%20%28wd%3AQ4830453%20as%20%3FentityClass%29%20%20%20%20%20%20%23%20Business%20is%20the%20entity%20class...%0A%20%20%0A%20%20%3Fentity%20wdt%3AP31%20%3FentityClass%20.%20%20%20%20%20%20%20%20%20%20%23%20that%20we%20want%20to%20search.%0A%20%20%0A%20%20%23%20Union%20of%20two%20sets%20of%20triples%3A%20entities%20that%20have%20the%20input%20name%20as%0A%20%20%23%20an%20alternative%20name%20and%20those%20that%20have%20it%20as%20their%20official%20name.%20%0A%20%20%7B%20%20%20%20%20%20%20%0A%20%20%20%20%3Fentity%20skos%3AaltLabel%20%3FenglishName%20%3B%20%0A%20%20%20%20%20%20%20%20%20%20%20%20rdfs%3Alabel%20%3FofficialName%20.%0A%20%20%20%20FILTER%20%28%20lang%28%3FofficialName%29%20%3D%20%22en%22%20%29%0A%20%20%7D%0A%20%20UNION%0A%20%20%7B%20%3Fentity%20rdfs%3Alabel%20%3FenglishName%20.%20%7D%0A%20%20%0A%20%20%23%20if%20there%20was%20an%20officialName%20to%20go%20with%20the%20input%20name%20as%20an%20alternative%0A%20%20%23%20name%2C%20use%20that%20as%20the%20Wikidata%20name%2C%20otherwise%20use%20the%20input%20name%20as%20%0A%20%20%23%20the%20Wikidata%20name%20if%20it%20was%20used%20as%20the%20rdfs%3Alabel%20name%20for%20that%20entity.%20%0A%20%20BIND%28STR%28COALESCE%28%3FofficialName%2C%3FenglishName%29%29%20AS%20%3FwdName%29%0A%7D%0A&#34;&gt;this query link&lt;/a&gt;.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX wd:   &amp;lt;http://www.wikidata.org/entity/&amp;gt;
PREFIX wdt:  &amp;lt;http://www.wikidata.org/prop/direct/&amp;gt;
PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;

SELECT DISTINCT ?wdName                   # Find the Wikidata name
WHERE {
  # A few test cases to choose from
  BIND(&amp;#34;Big Blue&amp;#34;@en AS ?englishName)    # Search based on a nickname.
  #BIND(&amp;#34;IBM&amp;#34;@en AS ?englishName)         # Search based on Wikipedia name.
  #BIND(&amp;#34;Apple Inc.&amp;#34;@en AS ?englishName)  # Search based on official name.

  BIND (wd:Q4830453 as ?entityClass)      # Business is the entity class...
  ?entity wdt:P31 ?entityClass .          # that we want to search.
  
  # Union of two sets of triples: entities that have the input name as
  # an alternative name and those that have it as their official name. 
  {       
    ?entity skos:altLabel ?englishName ; 
            rdfs:label ?officialName .
    FILTER ( lang(?officialName) = &amp;#34;en&amp;#34; )
  }
  UNION
  { ?entity rdfs:label ?englishName . }
  
  # Get the official name if it was bound, otherwise the
  # English name if part 2 of the UNION found it. 
  BIND(STR(COALESCE(?officialName,?englishName)) AS ?wdName)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I won&amp;rsquo;t discuss the query much here because it&amp;rsquo;s heavily commented. You can also read about the query logic in my earlier article on doing this with DBpedia (especially the use of the under-appreciated &lt;code&gt;COALESCE&lt;/code&gt; function). I will say that, although Wikidata sometimes uses its own vocabulary to express ideas that could have used basic parts of the standard (for example, using &lt;code&gt;wdt:P31&lt;/code&gt; instead of &lt;code&gt;rdf:type&lt;/code&gt; to show class membership), it&amp;rsquo;s nice to see the RDFS and SKOS vocabularies used as part of the data I was retrieving.&lt;/p&gt;
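SPARQL's COALESCE returns the first of its arguments that is bound and error-free. A rough stdlib Python analogue (a sketch for illustration only, not part of the query above) makes that fallback behavior concrete:

```python
def coalesce(*values):
    """Return the first value that is not None, mimicking SPARQL's
    COALESCE, which returns its first bound, error-free argument."""
    for value in values:
        if value is not None:
            return value
    return None

# When the UNION's first branch matched, ?officialName is bound and wins;
# otherwise we fall back to the input name, as the query's final BIND does.
print(coalesce("IBM", "Big Blue"))   # prints IBM
print(coalesce(None, "Big Blue"))    # prints Big Blue
```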
&lt;h2 id=&#34;as-a-service&#34;&gt;As a SERVICE&amp;hellip;&lt;/h2&gt;
&lt;p&gt;You can run the query below with any SPARQL processor that has access to the Internet so that it can make the &lt;code&gt;SERVICE&lt;/code&gt; call. I ran it locally with &lt;a href=&#34;https://jena.apache.org/documentation/query/&#34;&gt;&lt;code&gt;arq&lt;/code&gt;&lt;/a&gt; in a query that demonstrates batch processing of names to normalize.&lt;/p&gt;
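A SPARQL processor is not the only kind of client that can reach such an endpoint; any HTTP library can submit a query. This stdlib Python sketch (illustrative only: it builds a request object without sending it, and the tiny query is a stand-in rather than the normalization query from this post) shows the general shape of a SPARQL-over-HTTP GET request:

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_sparql_request(endpoint, query):
    """Build (but do not send) a GET request for a SPARQL endpoint,
    asking for JSON-formatted results."""
    params = urlencode({"query": query, "format": "json"})
    return Request(
        endpoint + "?" + params,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "sparql-sketch/0.1",  # polite clients identify themselves
        },
    )

# A stand-in query; a real one would be the normalization query shown below.
req = build_sparql_request(
    "https://query.wikidata.org/sparql",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
)
print(req.full_url)
```

Passing the request to urllib.request.urlopen would then return the JSON result set.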
&lt;p&gt;This query is more flexible than the one above, letting you disambiguate terms from different classes—in my example, both company and country names. (I tried it with people as well, but there are just too many famous people who share their name with someone else such as the various &lt;a href=&#34;https://en.wikipedia.org/wiki/Michael_Jordan_(disambiguation)&#34;&gt;Michael Jordans&lt;/a&gt; and &lt;a href=&#34;https://en.wikipedia.org/wiki/David_Thomas&#34;&gt;Dave Thomases&lt;/a&gt;.) You just need to include the class URI in the input.&lt;/p&gt;
&lt;p&gt;Here is the sample input, listing some entities by their Wikidata names and some by alternative names that they are known for.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix s:  &amp;lt;http://learningsparql.com/ns/sample/&amp;gt; .
@prefix wd: &amp;lt;http://www.wikidata.org/entity/&amp;gt; .

s:company1 a wd:Q4830453;
           s:name &amp;#34;Kodak&amp;#34; .

s:company2 a wd:Q4830453;
           s:name &amp;#34;Big Blue&amp;#34; .

s:company3 a wd:Q4830453;
           s:name &amp;#34;Coca Cola&amp;#34; .

s:country1 a wd:Q6256;
        s:name &amp;#34;The UK&amp;#34; .
        
s:country2 a wd:Q6256;
        s:name &amp;#34;Nigeria&amp;#34; .
        
s:country3 a wd:Q6256;
        s:name &amp;#34;U.S.&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The query is a &lt;code&gt;CONSTRUCT&lt;/code&gt; query, so it will return triples: the original data, the Wikidata name, and the Wikidata qname so that something further down the processing pipeline can retrieve more data from Wikidata about the entity.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX wd:   &amp;lt;http://www.wikidata.org/entity/&amp;gt;
PREFIX wdt:  &amp;lt;http://www.wikidata.org/prop/direct/&amp;gt;
PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
PREFIX s:    &amp;lt;http://learningsparql.com/ns/sample/&amp;gt;

CONSTRUCT {
  ?entity rdfs:label ?wdName ;
          s:name ?name ; 
          s:wikidataURI ?wikidataEntity . 
}
WHERE {
  ?entity a ?entityClass;
          s:name ?name . 
  
  BIND(STRLANG(?name,&amp;#34;en&amp;#34;) AS ?englishName)
  
  SERVICE &amp;lt;https://query.wikidata.org/sparql&amp;gt; 
  # Look for something with that name and entity class. 
  {
    ?wikidataEntity wdt:P31 ?entityClass . 
    {       
      ?wikidataEntity skos:altLabel ?englishName ; 
         rdfs:label ?officialName .
      FILTER ( lang(?officialName) = &amp;#34;en&amp;#34; )
    }
    UNION
    { ?wikidataEntity rdfs:label ?englishName .
      FILTER ( lang(?englishName) = &amp;#34;en&amp;#34; )
    }
  }
  BIND(STR(COALESCE(?officialName,?englishName)) AS ?wdName)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;code&gt;SERVICE&lt;/code&gt; clause hands part of the work off to Wikidata, and the rest executes locally with my copy of &lt;code&gt;arq&lt;/code&gt;. The syntax is otherwise very close to the more heavily commented example above.&lt;/p&gt;
&lt;p&gt;Here are the results I get:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix s:    &amp;lt;http://learningsparql.com/ns/sample/&amp;gt; .
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix wd:   &amp;lt;http://www.wikidata.org/entity/&amp;gt; .
@prefix wdt:  &amp;lt;http://www.wikidata.org/prop/direct/&amp;gt; .

s:company1  rdfs:label  &amp;#34;Kodak&amp;#34; ;
        s:name         &amp;#34;Kodak&amp;#34; ;
        s:wikidataURI  wd:Q486269 .

s:country2  rdfs:label  &amp;#34;Nigeria&amp;#34; ;
        s:name         &amp;#34;Nigeria&amp;#34; ;
        s:wikidataURI  wd:Q1033 .

s:company2  rdfs:label  &amp;#34;IBM&amp;#34; ;
        s:name         &amp;#34;Big Blue&amp;#34; ;
        s:wikidataURI  wd:Q37156 .

s:country3  rdfs:label  &amp;#34;United States of America&amp;#34; ;
        s:name         &amp;#34;U.S.&amp;#34; ;
        s:wikidataURI  wd:Q30 .

s:country1  rdfs:label  &amp;#34;United Kingdom&amp;#34; ;
        s:name         &amp;#34;The UK&amp;#34; ;
        s:wikidataURI  wd:Q145 .

s:company3  rdfs:label  &amp;#34;The Coca-Cola Company&amp;#34; , &amp;#34;Coca-Cola Hellenic&amp;#34; ;
        s:name         &amp;#34;Coca Cola&amp;#34; ;
        s:wikidataURI  wd:Q3295867 , wd:Q1104910 .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The fact that the &amp;ldquo;Coca Cola&amp;rdquo; entry returns two companies shows that this approach may not completely normalize a given name. To identify which output entities have more than one Wikidata name, and therefore need some review, we can run this query on the output of the CONSTRUCT query above:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX  s:  &amp;lt;http://learningsparql.com/ns/sample/&amp;gt;

SELECT ?name (COUNT(?uri) AS ?uriCount)
WHERE {
  ?s s:name ?name ;
     s:wikidataURI ?uri 
}
GROUP BY ?name
HAVING (?uriCount &amp;gt; 1)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Or, you could try to find some logic related to how Wikidata models companies as a way to pick just one of the companies—or countries, because this issue comes up with them as well. The nice thing is that the query can work with different classes of entities, so it provides a foundation to build on.&lt;/p&gt;
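The same review step could also run outside of SPARQL. A stdlib Python sketch (the name and URI pairs are copied from the sample results above) that flags names bound to more than one Wikidata entity:

```python
from collections import defaultdict

# (name, wikidataURI) pairs copied from the CONSTRUCT query's results above.
pairs = [
    ("Kodak", "wd:Q486269"),
    ("Big Blue", "wd:Q37156"),
    ("Coca Cola", "wd:Q3295867"),
    ("Coca Cola", "wd:Q1104910"),
]

name_to_uris = defaultdict(set)
for name, uri in pairs:
    name_to_uris[name].add(uri)

# Names mapped to more than one entity need human review, just as
# the HAVING clause in the SPARQL version flags them.
needs_review = sorted(n for n, uris in name_to_uris.items() if len(uris) > 1)
print(needs_review)  # prints ['Coca Cola']
```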
 &lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1640013052853493761&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/110090304693753766&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">rdf</category>
      
      <category domain="https://www.bobdc.com//categories/owl">OWL</category>
      
      <category domain="https://www.bobdc.com//categories/rdfs">RDFS</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">Wikidata</category>
      
    </item>
    
    <item>
      <title>Using the AWS Graph Explorer with Fuseki and local datasets</title>
      <link>https://www.bobdc.com/blog/graphexplorerandfuseki/</link>
      <pubDate>Sun, 26 Feb 2023 11:03:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/graphexplorerandfuseki/</guid>
      
      
      <description><div>An open source visual graph navigator.</div><div>&lt;p&gt;When I first heard about the &lt;a href=&#34;https://github.com/aws/graph-explorer&#34;&gt;AWS Graph Explorer&lt;/a&gt; I assumed that it was a cloud-based tool for use with Neptune, the AWS cloud-based triplestore. After I read Fan Li&amp;rsquo;s &lt;a href=&#34;https://apex974.com/articles/aws-graph-explorer-first-impression&#34;&gt;First Impressions of the AWS Graph Explorer&lt;/a&gt; I realized that you can install this open source tool locally and point it at any SPARQL endpoint you want, so I cranked up &lt;a href=&#34;https://jena.apache.org/documentation/fuseki2/&#34;&gt;Jena Fuseki&lt;/a&gt; on my laptop, loaded some data into it, and installed the Graph Explorer.&lt;/p&gt;
&lt;p&gt;The first three of the &lt;a href=&#34;https://github.com/aws/graph-explorer#steps-to-install-graph-explorer&#34;&gt;Steps to install Graph Explorer&lt;/a&gt; on the project&amp;rsquo;s &lt;a href=&#34;https://github.com/aws/graph-explorer&#34;&gt;git repo&lt;/a&gt; readme page were all I needed. For step two, where it says &lt;code&gt;{hostname-or-ip-address}&lt;/code&gt;, I just put &lt;code&gt;localhost&lt;/code&gt;, which is also what Fan did, and that worked fine.&lt;/p&gt;
&lt;p&gt;After I did step three&amp;rsquo;s &lt;code&gt;docker run&lt;/code&gt; command  to run the Docker container, I didn&amp;rsquo;t need to do the remaining steps on the list, which were for running this on an EC2 instance. I sent a browser to &lt;code&gt;https://localhost:5173/&lt;/code&gt; and that got redirected to &lt;code&gt;https://localhost:5173/#/connections&lt;/code&gt;, which is where you create and manage connections to the data sources with the data you want to visualize.&lt;/p&gt;
&lt;p&gt;For some local data to explore I loaded the W3C&amp;rsquo;s  &lt;a href=&#34;https://www.w3.org/TR/vcard-rdf/&#34;&gt;vcard&lt;/a&gt; business card ontology into Fuseki. (The ontology file is available at its namespace URI, &lt;a href=&#34;http://www.w3.org/2006/vcard/ns&#34;&gt;http://www.w3.org/2006/vcard/ns&lt;/a&gt;. It&amp;rsquo;s always nice when a namespace URI is a URL pointing to the ontology itself.) I also made a little file with four instances of the ontology&amp;rsquo;s &lt;code&gt;Individual&lt;/code&gt; class (the following plus three variations) and loaded that into Fuseki to see how the Graph Explorer showed them.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;ex:r4 a v:Individual ;
      v:given-name &amp;#34;Dana&amp;#34; ;
      v:family-name &amp;#34;Williams&amp;#34; . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The next step was to connect the Graph Explorer to the dataset. I clicked the plus sign on the connections screen mentioned above to display the &amp;ldquo;Add New Connection&amp;rdquo; dialog box. I only had to fill out two things there:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I changed the default value for &amp;ldquo;Graph Type&amp;rdquo; from &amp;ldquo;PG (Property Graph)&amp;rdquo; (which supports the &lt;a href=&#34;https://tinkerpop.apache.org/&#34;&gt;Apache Tinkerpop&lt;/a&gt; variety) to &amp;ldquo;RDF (Resource Description Framework)&amp;rdquo;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For the endpoint, I learned from Fan that the &amp;ldquo;Public or Proxy Endpoint&amp;rdquo; field assumes that your SPARQL endpoint URL ends with &lt;code&gt;/sparql&lt;/code&gt;, so you shouldn&amp;rsquo;t include that part when entering the URL. The endpoint URL for this vcard dataset on Fuseki was &lt;code&gt;http://localhost:3030/vcard/sparql&lt;/code&gt;, so I entered &lt;code&gt;http://localhost:3030/vcard/&lt;/code&gt; in that field:&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/graphExplorerAddConnect.png&#34; class=&#34;centered&#34; alt=&#34;Add Connection dialog box of AWS Graph Explorer&#34;/&gt;&lt;/p&gt;
&lt;p&gt;After clicking &amp;ldquo;Add Connection&amp;rdquo;, the next screen will either have a message &amp;ldquo;Connection successfully synchronized&amp;rdquo; on the right or an error message. My error messages were due to accidentally picking PG as the graph type and using the full endpoint URL instead of omitting the &lt;code&gt;/sparql&lt;/code&gt; part.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/graphExplorerConnectionsScreen.png&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/graphExplorerConnectionsScreen.png&#34; class=&#34;centered&#34; width=&#34;500&#34; alt=&#34;Graph Explorer Connections screen&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you use this screen&amp;rsquo;s plus sign to add additional connections, the panel on the left will list them with an &amp;ldquo;Active&amp;rdquo; toggle to the right of each one&amp;rsquo;s name to select it. Clicking the circular arrows in the upper-right next to &amp;ldquo;Last Synchronization&amp;rdquo; will update the data that the Explorer is using from the data source. This was useful for me when I loaded the vcard ontology into Fuseki, created a Graph Explorer connection to view it, and then added the sample instance data into that Fuseki dataset and wanted Graph Explorer to use that new instance data as well as the ontology data.&lt;/p&gt;
&lt;p&gt;Once you have a connection up and running, the Explorer&amp;rsquo;s Graph View tells you &amp;ldquo;To get started, click into the search bar to browse graph data. Click + to add to Graph View.&amp;rdquo; I searched for &amp;ldquo;Dana&amp;rdquo; in the search bar, got the results shown on the left here, and clicked that to see the additional details on the right:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/graphExplorerSearchResults.png&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/graphExplorerSearchResults.png&#34; class=&#34;centered&#34; width=&#34;400&#34; alt=&#34;Search Results&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Clicking &amp;ldquo;Add Selected&amp;rdquo; in the lower right of that dialog box put this instance on the Graph View with data underneath it in the Table View. Double-clicking the instance there expanded the graph a bit:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/graphExplorerGraphView.png&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/graphExplorerGraphView.png&#34; class=&#34;centered&#34; width=&#34;400&#34; alt=&#34;Instance and its class in Graph View&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve barely scratched the surface here. vcard is a fairly rich ontology, so exploring its structure was also fun. With a name like &amp;ldquo;AWS Graph Explorer&amp;rdquo; they are obviously pushing it for use with cloud-based datasets, but I was happy to see how easily it works with small local setups as well. To learn more before you try it out, don&amp;rsquo;t miss Fan Li&amp;rsquo;s description of his experiments with this tool.&lt;/p&gt;
 &lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1629877129092407300&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/109931923333176170&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">rdf</category>
      
      <category domain="https://www.bobdc.com//categories/owl">OWL</category>
      
      <category domain="https://www.bobdc.com//categories/rdfs">RDFS</category>
      
    </item>
    
    <item>
      <title>SPARQL and OWL on the command line—of my phone!</title>
      <link>https://www.bobdc.com/blog/rdflibonphone/</link>
      <pubDate>Sun, 22 Jan 2023 11:15:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdflibonphone/</guid>
      
      
      <description><div>Termux and rdflib on my Android phone.</div><div>&lt;p&gt;I recently wondered &amp;ldquo;could I run a Python script that includes the &lt;a href=&#34;https://rdflib.readthedocs.io/en/stable/&#34;&gt;rdflib&lt;/a&gt; library on my Samsung Android phone?&amp;rdquo; Five minutes later, I was doing it, and about three of those minutes were spent installing Python.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://play.google.com/store/apps/details?id=com.termux&amp;amp;hl=en_US&amp;amp;gl=US&#34;&gt;termux&lt;/a&gt; terminal emulator lets you treat your Android phone as a regular Linux machine. (I even have a Bluetooth keyboard for when I&amp;rsquo;m getting super geeky with termux on my phone by running Emacs or command line git.) From the termux command line, I installed Python and rdflib the same way I would on any machine:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-sh&#34; data-lang=&#34;sh&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pkg install python
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pip install rdflib
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I pasted the Python script in &amp;ldquo;A tiny example&amp;rdquo; from the &lt;a href=&#34;https://rdflib.readthedocs.io/en/stable/gettingstarted.html&#34;&gt;rdflib getting started page&lt;/a&gt; into a text file and ran it with no problem five minutes after wondering if all this would work.&lt;/p&gt;
&lt;p&gt;To point my phone&amp;rsquo;s Python scripts at the interesting place that &lt;code&gt;pkg install&lt;/code&gt; put my Python executable, I did have to make this the first line of each script:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;#!/data/data/com.termux/files/usr/bin/python
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;To try a SPARQL query, I took the first example on the rdflib documentation&amp;rsquo;s &lt;a href=&#34;https://rdflib.readthedocs.io/en/stable/intro_to_sparql.html&#34;&gt;Querying with SPARQL&lt;/a&gt; page, substituted the URL of my own ancient FOAF file &lt;a href=&#34;http://snee.com/bob/foaf.rdf&#34;&gt;http://snee.com/bob/foaf.rdf&lt;/a&gt; as the parameter for the demo script&amp;rsquo;s &lt;code&gt;g.parse()&lt;/code&gt; call, and the Python script ran fine with the expected output of the script&amp;rsquo;s SPARQL query.&lt;/p&gt;
&lt;p&gt;Then I got ambitious and tried some OWL inferencing. After I did the &lt;code&gt;pip install owlrl&lt;/code&gt; command shown at the top of the &lt;a href=&#34;https://pypi.org/project/owlrl/&#34;&gt;owlrl home page&lt;/a&gt; in termux, everything from my blog entry &lt;a href=&#34;../cmdlineowl/&#34;&gt;My command line OWL processor&lt;/a&gt; worked fine. I took the script shown at the end of that blog entry and only had to make one change (not because of the phone but, I&amp;rsquo;m guessing, because the library has evolved a bit): I removed &lt;code&gt;.decode()&lt;/code&gt; from the last line.&lt;/p&gt;
&lt;p&gt;To review the goal of the &amp;ldquo;command line OWL processor&amp;rdquo; demo, it started with Turtle data about musicians, their instruments, and the states that they were from, such as:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;d:m2 rdfs:label &amp;#34;Charlie Christian&amp;#34; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     dm:plays d:Guitar ;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     dm:stateOfBirth d:TX .
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;d:m4 rdfs:label &amp;#34;Kim Gordon&amp;#34; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     dm:plays d:Bass ;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;     dm:stateOfBirth d:NY .
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;It also declared three OWL restriction classes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;dm:Guitarist&lt;/code&gt; as resources that have a &lt;code&gt;d:Guitar&lt;/code&gt; value for their &lt;code&gt;dm:plays&lt;/code&gt; property.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;dm:Texan&lt;/code&gt; as resources that have a value of &lt;code&gt;d:TX&lt;/code&gt; for their &lt;code&gt;dm:stateOfBirth&lt;/code&gt; property.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;dm:TexasGuitarPlayer&lt;/code&gt; as the intersection of the &lt;code&gt;dm:Guitarist&lt;/code&gt; and &lt;code&gt;dm:Texan&lt;/code&gt; classes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
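The three restriction classes lend themselves to a plain set-logic emulation. This stdlib Python sketch (toy dictionaries standing in for the Turtle data; it is not how the owlrl library works internally) computes the same memberships:

```python
# Toy facts mirroring the Turtle sample: who plays what, and where they were born.
plays = {"m2": "Guitar", "m4": "Bass"}
born = {"m2": "TX", "m4": "NY"}

def classes_of(musician):
    """Infer restriction-class membership: Guitarist and Texan follow
    from property values, and TexasGuitarPlayer is the intersection
    of those two classes."""
    inferred = set()
    if plays.get(musician) == "Guitar":
        inferred.add("Guitarist")
    if born.get(musician) == "TX":
        inferred.add("Texan")
    if {"Guitarist", "Texan"}.issubset(inferred):
        inferred.add("TexasGuitarPlayer")
    return inferred

print(sorted(classes_of("m2")))  # prints ['Guitarist', 'Texan', 'TexasGuitarPlayer']
print(sorted(classes_of("m4")))  # prints []
```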
&lt;p&gt;The sample data did not identify any of the musicians as instances of any classes; finding this out required OWL inferencing, and the &lt;code&gt;owlrl&lt;/code&gt; library made this possible. Below you can see the script being invoked:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/pythonrdfonphone1.jpg&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/pythonrdfonphone1.jpg&#34; class=&#34;centered&#34; width=&#34;400&#34; alt=&#34;Running a Python OWL script on Termux&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In this excerpt from its output, you can see that the OWL inferencing happened, and that resource &lt;code&gt;http://learningsparql.com/ns/m2&lt;/code&gt; (Charlie Christian) is an instance of all three restriction classes:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/pythonrdfonphone2.jpg&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/pythonrdfonphone2.jpg&#34; class=&#34;centered&#34; width=&#34;400&#34; alt=&#34;Output of command from previous image&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In &lt;a href=&#34;../sqlite/&#34;&gt;Converting sqlite browser cookies to Turtle and querying them with SPARQL&lt;/a&gt; I wrote that most of your computing devices probably have some SQLite data on them, and I showed that converting this data to RDF is pretty easy.  &lt;code&gt;sqlite3&lt;/code&gt; was as easy to install with termux as the other packages above; using it to reach the SQLite data on my phone was another matter. As you might guess from where &lt;code&gt;pkg install&lt;/code&gt; put my Python executable, termux has its own storage section on my phone. From there I can access music files, downloads, and other files on my phone from the termux command line, but I &lt;a href=&#34;https://mas.to/@bobdc/109580313975700412&#34;&gt;couldn&amp;rsquo;t find&lt;/a&gt; any SQLite data in my phone&amp;rsquo;s termux storage area. &lt;a href=&#34;https://en.wikipedia.org/wiki/Rooting_(Android)&#34;&gt;Rooting&lt;/a&gt; my phone would give me access to more, so that&amp;rsquo;s something to consider.&lt;/p&gt;
&lt;p&gt;Still, as the first two examples above show, a Python script with rdflib can retrieve data from the Internet and run SPARQL queries on that, so that provides some interesting possibilities. The most pleasant surprise for me about all this was just how easy it was to use this set of tools on my phone.&lt;/p&gt;
 &lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1617197043352485889&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/109733799937066871&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2023">2023</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">rdf</category>
      
      <category domain="https://www.bobdc.com//categories/owl">OWL</category>
      
    </item>
    
    <item>
      <title>Web3 and Web 3.0 at OriginTrail</title>
      <link>https://www.bobdc.com/blog/origintrail/</link>
      <pubDate>Sun, 18 Dec 2022 13:17:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/origintrail/</guid>
      
      
      <description><div>An interview with CTO and co-founder Branimir Rakić</div><div>&lt;p&gt;&lt;a href=&#34;https://origintrail.io/&#34;&gt;OriginTrail&lt;/a&gt; is doing one of the most interesting combinations of blockchain technology and RDF that I have seen. In November I spoke with CTO and co-founder &lt;a href=&#34;https://twitter.com/BranaRakic&#34;&gt;Branimir Rakić&lt;/a&gt;.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;a href=&#34;https://origintrail.io/&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/origintrailLogo.png&#34; alt=&#34;OriginTrail logo&#34; border=&#34;0&#34; width=&#34;220&#34; align=&#34;right&#34;  style=&#34;margin: 0px 30px 20px 40px;&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Tell me about OriginTrail.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;OriginTrail is both an ecosystem and technology stack. Its mission is to grow an open, permissionless system for discovering, verifying and querying valuable assets, be they physical or digital. It merges the benefits of two technologies—blockchains and semantic tech (both named Web3, at different times), hence forming a Decentralized Knowledge Graph, or DKG for short.&lt;/p&gt;
&lt;p&gt;With this &amp;ldquo;merge&amp;rdquo; of technologies OriginTrail enables innovative applications that transition from &amp;ldquo;managing data&amp;rdquo; to managing assets, with associated tools such as data marketplaces, knowledge tokens, and user-tailored search. It operates on a network of hundreds of nodes run by individuals and companies around the world (including the British Standards Institution, US retailers, and Swiss Railways), based on open source tech and standards such as W3C RDF/SPARQL and emerging Decentralized Identifiers and Verifiable Credentials. OriginTrail DKG can be seen as &amp;ldquo;middleware&amp;rdquo;, connecting different (often legacy) systems in a novel &amp;ldquo;Semantic Web3&amp;rdquo; network.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;How would an interested user get started using this?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;One of the best ways to start is to explore the &lt;a href=&#34;https://docs.origintrail.io/&#34;&gt;official documentation&lt;/a&gt;. With a pending update of the network to version 6 (in December), we&amp;rsquo;re also about to release an updated version of the documentation with example tutorials so that would be a great starting point.&lt;/p&gt;
&lt;p&gt;Naturally, knowing about graphs and SPARQL would also be a good start.&lt;/p&gt;
&lt;p&gt;You can also develop graph-native Web3 applications, interfacing with assets on the DKG using the OriginTrail SDK. There are currently two SDKs available; one is available on the &lt;a href=&#34;https://cloudmarketplace.oracle.com/marketplace/en_US/listing/123693746&#34;&gt;Oracle Cloud Marketplace&lt;/a&gt; and another one on &lt;a href=&#34;https://marketplace.digitalocean.com/apps/origintrail-node&#34;&gt;DigitalOcean&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;I guess what I mean is, what would a brand new user set about creating as a first step of using OriginTrail?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Broadly speaking, a user can think of the OriginTrail DKG as a global decentralized graph &amp;ldquo;database&amp;rdquo; to which one can publish knowledge assets and from which one can query them. Both of these can be done using the DKG libraries (such as dkg.js) or public web interfaces (OriginTrail’s Project &amp;ldquo;Magnify&amp;rdquo;, currently in private beta).&lt;/p&gt;
&lt;p&gt;Writing (or &amp;ldquo;publishing&amp;rdquo;) would, as a first step, entail prep work on the information to be published (triple generation) and then publishing the resulting triples as knowledge assets in DKG records.&lt;/p&gt;
&lt;p&gt;For querying the DKG, one could explore the existing knowledge assets (for example, via the Project Magnify interface) and run SPARQL queries on them.&lt;/p&gt;
&lt;p&gt;Apart from being a user, since OriginTrail is a permissionless decentralized system, one can also become a &amp;ldquo;system operator&amp;rdquo; by running an OriginTrail Network node and hosting the DKG state. For hosting the state, nodes collect publishing fees in the form of TRAC tokens.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;I found &lt;a href=&#34;https://www.reddit.com/r/CryptoCurrency/comments/qykbb3/can_someone_tell_me_why_origintrail_trac_shouldnt/&#34;&gt;this&lt;/a&gt; description on reddit; would you consider it accurate?&lt;/em&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&amp;ldquo;OriginTrail allows anyone to store knowledge assets on its decentralized network of nodes by paying a fee. Those assets can then be queried, verified and made valuable because of the relationships that can be represented in the knowledge graph and also because of the interoperable nature of the platform.&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That is a pretty good description.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Is RDF a typical format for publishing this data?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Starting with the latest version 6, yes. This is about to reach production (release on the OriginTrail DKG main network) in December.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Is there SPARQL access to the published data?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Yes. There are two ways to query the data with SPARQL. One is through a SPARQL service (one is provided in Project Magnify) which provides a gateway into the DKG. The other would be to run your own gateway by running an OriginTrail Node.&lt;/p&gt;
&lt;p&gt;On top of SPARQL access, one can verify the integrity of each triple in the graph by associating it with the issuer&amp;rsquo;s public key (associated with its blockchain account) and Merkle proofs.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;There is some way to plug your own triplestore into a node, right?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Yes, absolutely. The node connects to a triple store and is decoupled from it. It currently supports Apache Jena (Fuseki), Blazegraph and GraphDB, with plans to extend direct support for others. Essentially, you can consider the node as a &amp;ldquo;modem&amp;rdquo; for your triple store that connects it with other nodes and uses blockchains for verification and transactions.&lt;/p&gt;
&lt;p&gt;Nodes come in two flavors—full and light nodes, where light nodes do not have a triple store of their own and do not participate in running the system, but can still perform operations on it such as publish and query.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;If I&amp;rsquo;m going to publish data on one of these nodes and sell access to it, what are the potential mechanisms for my customers to pay for this data?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The payment mechanisms come in several flavors—paying with TRAC tokens, or paying with &amp;ldquo;Knowledge Tokens&amp;rdquo; (kTokens) which you can create on your own.&lt;/p&gt;
&lt;p&gt;This enables you (as Bob) to create e.g. 1000 Bob tokens, which you can sell via the blockchain as &amp;ldquo;pay as you go&amp;rdquo; access tokens for your data. This enables interesting novelties such as the application of market mechanisms for price discovery on your data.&lt;/p&gt;
&lt;p&gt;To briefly elaborate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The data you are selling would be private (kept by you, in a triple store of your choice, connected to a DKG node of your choice).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Metadata about it would be published on the DKG, to make it discoverable.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Depending on how you decide to implement payments, you could opt for one of the above options.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When a buyer discovers your data, they initiate the purchase via OriginTrail smart contracts by locking the right amount of tokens (escrow fashion).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Your node verifies the initiation of the transaction (tokens in escrow) and packages the data for consumption and verification by the buyer.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Data is swapped for tokens. Using a &amp;ldquo;Proof of Misbehavior&amp;rdquo; system,  tokens will only be spent if the original data has been transmitted.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;OriginTrail&amp;rsquo;s website mentions the use of the W3C &lt;a href=&#34;https://www.w3.org/TR/did-core/&#34;&gt;Decentralized Identifiers&lt;/a&gt; (DIDs) specification. What does this provide to your technology?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Decentralized identifiers are the key piece of tech enabling the blockchain side of things, and the core component of UALs (Universal Asset Locators—URLs in Web3, with resources being assets). With UALs, DIDs enable a standard for provisioning ownable identifiers without a need for a central authority and without dependency on a specific technology. OriginTrail is designed to be blockchain agnostic and, via this standard, can reference any object on any decentralized network (including the DKG itself).&lt;/p&gt;
&lt;p&gt;With DIDs one can identify and interact with data issuers, verify integrity of data and fully control their identifiers.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;It sounds like this is helping to tie the blockchain technology and the W3C standards-based technologies together.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Indeed it does, and it&amp;rsquo;s one of the recommendations getting the most traction, together with W3C &lt;a href=&#34;https://www.w3.org/2017/vc/WG/&#34;&gt;Verifiable Credentials&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;That is great to hear. There are a lot of W3C Recommendations that are nice in theory but not being applied anywhere.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;What kind of OriginTrail customers are using it for what kinds of applications?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;OriginTrail has been used quite a bit by enterprises. The Swiss Railway company uses it to track rail parts and maintenance events. Several food and beverage producers (whiskey, poultry, beef, etc.) use it to show ingredient provenance information to their consumers. The British Standards Institution (BSI) issues verifiable certificates for their trainings on the DKG, and US retailers such as Walmart, Target, and The Home Depot use it to exchange factory audit reports among each other in a privacy-preserving fashion. The World Federation of Hemophilia NGO uses it to track donated vaccines and medicine.&lt;/p&gt;
&lt;p&gt;Most of these applications built on top of OriginTrail aggregate information from different sources (for example, rail companies, food supply chain companies, and factories) and perform various graph traversal queries to obtain product histories and discover associated events. Many of them also use OriginTrail together with &lt;a href=&#34;https://www.gs1.org/standards/epcis&#34;&gt;GS1 EPCIS and CBV&lt;/a&gt; data models; GS1 is to the supply chain world what W3C is to the Web.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Supply chain applications seem to be a theme there. Are any of them using RDF?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Most of the applications mentioned are either already fully RDF-based or being migrated to RDF. Specifically, the ones using GS1 standardization are benefiting from RDF as it enables a great extension to the descriptive capabilities of those standards. The EPCIS 2.0 standard, which came out recently and we helped co-create through the GS1 Working group, makes this easy, as it&amp;rsquo;s created with RDF compatibility in mind. RDF and SPARQL are an important component of making these implementations easily extendable.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Is there anything else you&amp;rsquo;d like to add?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Just to reiterate that we are about to launch the latest OriginTrail version (V6) in a couple of weeks&amp;rsquo; time and are excited to showcase to the wider audience the new capabilities unlocked by incorporating RDF/SPARQL into the tech stack. The great thing about OriginTrail is that it has a vibrant community of technologists and enthusiasts who help create content in and around the DKG. It&amp;rsquo;s a truly global community with lots of resources, so I encourage everyone who is interested in finding out more to join our &lt;a href=&#34;https://discord.gg/cCRPzzmnNT&#34;&gt;Discord&lt;/a&gt; and check out the community-created resources that can be found on our &lt;a href=&#34;https://Linktr.ee/origintrail&#34;&gt;linktree&lt;/a&gt; site.&lt;/p&gt;
 &lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1604543874952634368&#34;&gt;my tweet&lt;/a&gt; (or even better, my &lt;a href=&#34;https://mas.to/@bobdc/109536090380504228&#34;&gt;Mastodon message&lt;/a&gt;) announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2022">2022</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">rdf</category>
      
    </item>
    
    <item>
      <title>SPARQL queries of git repository data</title>
      <link>https://www.bobdc.com/blog/sparqlgit/</link>
      <pubDate>Sun, 20 Nov 2022 10:58:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparqlgit/</guid>
      
      
      <description><div>If we&#39;re going to think of git data as a graph...</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/sparqlAndGitLogos.png&#34; alt=&#34;SPARQL and Git logos&#34; border=&#34;0&#34; width=&#34;220&#34; align=&#34;right&#34;  style=&#34;margin: 0px 30px 20px 40px;&#34; /&gt;
&lt;p&gt;&lt;a href=&#34;https://twitter.com/thejustman&#34;&gt;Justin Dowdy&lt;/a&gt; recently created an open source project to convert the metadata in a git repository to RDF, and I&amp;rsquo;ve been having some fun with it. Before getting into the details, as a brief demo I&amp;rsquo;ll start with a sample SPARQL query that I did to list all of the 2019 commits in my &lt;a href=&#34;https://github.com/bobdc/misc&#34;&gt;misc&lt;/a&gt; github repo:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX dcterms: &amp;lt;http://purl.org/dc/terms/&amp;gt; 
PREFIX wd:      &amp;lt;http://www.wikidata.org/entity/&amp;gt; 
PREFIX x:       &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;
PREFIX gist:    &amp;lt;https://ontologies.semanticarts.com/gist/&amp;gt; 

SELECT ?title ?dateTime WHERE {
  ?commit a wd:Q20058545 ;  # it&amp;#39;s an instance of the commit class
            dcterms:subject ?subject ;
            gist:atDateTime ?dateTime . 
  ?subject  dcterms:title ?title .
  FILTER (?dateTime &amp;gt;= &amp;#34;2019-01-01T00:00:00&amp;#34;^^x:dateTime &amp;amp;&amp;amp; 
          ?dateTime &amp;lt; &amp;#34;2020-01-01T00:00:00&amp;#34;^^x:dateTime)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It produced this result:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;title                                     dateTime
-----                                     --------
adding sqlite rdf files                   2019-07-13T16:19:39-04:00
added tableList.scr                       2019-07-13T16:21:39-04:00
adding readme                             2019-07-28T12:00:55-04:00
added files to go with 2019-10 blog entry 2019-10-20T16:46:07-04:00
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Justin&amp;rsquo;s software that makes this all possible is at &lt;a href=&#34;https://github.com/justin2004/git_to_rdf&#34;&gt;https://github.com/justin2004/git_to_rdf&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Once I installed that software and created a &lt;code&gt;/home/bob/temp/rdf&lt;/code&gt; directory, the following variation on the command line from Justin&amp;rsquo;s github page read my local copy of the &lt;code&gt;misc&lt;/code&gt; repo and put 35,353 triples about it in two files in &lt;code&gt;/mnt/temp/rdf&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;/home/bob/git/git_to_rdf/git_to_rdf.sh \
  --repository /mnt/git/misc  --output /mnt/temp/rdf
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(Referencing &lt;code&gt;/home/bob/temp/rdf&lt;/code&gt; as &lt;code&gt;/mnt/temp/rdf&lt;/code&gt; is a Docker thing that I don&amp;rsquo;t completely understand myself. Justin said that he is working to simplify that.) I loaded the new triples into &lt;a href=&#34;https://jena.apache.org/documentation/fuseki2/&#34;&gt;Jena Fuseki&lt;/a&gt; and tried a few of my &lt;a href=&#34;../exploringadataset/&#34;&gt;Queries to explore a dataset&lt;/a&gt; that I typically use, which is how I found out that it had 35K triples.&lt;/p&gt;
&lt;p&gt;To really understand the possibilities, read Justin&amp;rsquo;s blog entry &lt;a href=&#34;https://github.com/justin2004/weblog/tree/master/git_repo_as_rdf&#34;&gt;Git Repositories as RDF Graphs&lt;/a&gt;. I especially like how it explained that he didn&amp;rsquo;t necessarily have to make &amp;ldquo;thoughtful&amp;rdquo; RDF (well-modeled RDF that takes advantage of standard vocabularies) and why and how he did so. His blog entry also includes a nice diagram of his data model, generated with &lt;a href=&#34;https://www.oxfordsemantic.tech/product&#34;&gt;RDFox&lt;/a&gt;, that you&amp;rsquo;ll want to keep handy while you develop any queries for your own git repo data converted to RDF.&lt;/p&gt;
&lt;p&gt;Several of his sample queries will be especially useful for querying git repos that have commits from multiple people. He demonstrates these with RDF generated from the repo for the cURL utility that I have written about here &lt;a href=&#34;https://www.google.com/search?q=curl&amp;amp;as_sitesearch=bobdc.com&#34;&gt;many times&lt;/a&gt;. My &lt;code&gt;misc&lt;/code&gt; repo that I used to generate RDF only has commits from me, so these sample queries were less useful to me, but they still provided a good model for how to get at certain kinds of repo information.&lt;/p&gt;
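&lt;p&gt;For a repo with several contributors, a query along the following lines could rank committers by number of commits. (This is only a sketch: the author property shown here is a made-up placeholder, so check the data model diagram in Justin&amp;rsquo;s blog entry for the actual predicate to use.)&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX wd: &amp;lt;http://www.wikidata.org/entity/&amp;gt; 

SELECT ?author (COUNT(?commit) AS ?commitCount) WHERE {
  ?commit a wd:Q20058545 ;   # it&amp;#39;s a commit
          &amp;lt;http://example.com/author&amp;gt; ?author .  # placeholder predicate
}
GROUP BY ?author
ORDER BY DESC(?commitCount)
&lt;/code&gt;&lt;/pre&gt;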
&lt;p&gt;To build on what he wrote there I wanted to create at least one more query that was different from his examples, so I created this one to find  the commits that used blocks of text with the word &amp;ldquo;music&amp;rdquo; in them:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX wd:      &amp;lt;http://www.wikidata.org/entity/&amp;gt; 
PREFIX gist:    &amp;lt;https://ontologies.semanticarts.com/gist/&amp;gt; 
PREFIX dcterms: &amp;lt;http://purl.org/dc/terms/&amp;gt;

SELECT DISTINCT ?commitTitle ?commitTime ?filename ?textLine  WHERE {
  
  ?commit a wd:Q20058545 ; # it&amp;#39;s a commit
          gist:hasPart ?part ;
          dcterms:subject ?commitSubject ;
          gist:atDateTime ?commitTime . 
  
  ?commitSubject dcterms:title ?commitTitle .
  
  ?part gist:produces  ?contiguousLines .
  
  ?contiguousLines gist:occursIn ?file ; 
                   &amp;lt;http://example.com/containedTextContainer&amp;gt; ?textContainer . 
  
  ?file gist:name ?filename .
    ?textContainer ?line ?textLine .
  
  FILTER(contains(?textLine,&amp;#34;music&amp;#34;))
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;And here is the result:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/sparqlGitQueryResult.png&#34; alt=&#34;query result&#34; border=&#34;0&#34;/&gt;
&lt;p&gt;This combination of the world&amp;rsquo;s most popular version control system and the ability to manipulate metadata about what it contains could provide the basis for a Content Management System in the broader original sense of the term: something to manage the storage and workflow of multiple kinds of content for multiple kinds of publication media. (In recent years the term&amp;rsquo;s meaning has narrowed to mean &amp;ldquo;platform to help automate web publishing&amp;rdquo;.)&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s just one of the possibilities. Read Justin&amp;rsquo;s blog entry and see what ideas it gives you!&lt;/p&gt;
 &lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1594361640987901952&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2022">2022</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/git">git</category>
      
    </item>
    
    <item>
      <title>Your own free, publicly available SPARQL endpoint</title>
      <link>https://www.bobdc.com/blog/ec2fuseki/</link>
      <pubDate>Sun, 23 Oct 2022 11:59:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/ec2fuseki/</guid>
      
      
      <description><div>Free as in tier.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/sparqlAndEc2Logos.png&#34; alt=&#34;SPARQL and EC2 logos&#34; border=&#34;0&#34; width=&#34;220&#34; align=&#34;right&#34;  style=&#34;margin: 0px 30px 20px 40px;&#34; /&gt;
&lt;p&gt;There are a few tutorials out there about how to start up your own free-tier Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance and then run your own publicly available web server. I&amp;rsquo;ve planned for a while to try this with a &lt;a href=&#34;https://jena.apache.org/documentation/fuseki2/&#34;&gt;Jena Fuseki&lt;/a&gt; triplestore and SPARQL endpoint, but I postponed it because I thought it might be complicated. It turned out to be pretty easy.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://medium.com/@KerrySheldon/ec2-exercise-1-1-host-a-static-webpage-9732b91c78ef&#34;&gt;EC2 Exercise 1.1: Host a Static Webpage&lt;/a&gt; by Kerry Sheldon is an example of one of the tutorials described above, and it was a good starting point for putting up an Apache web server. Because AWS now has a  &amp;ldquo;new launch experience&amp;rdquo; I couldn&amp;rsquo;t follow her 2018 instructions exactly, but my first few instructions below are based on hers.&lt;/p&gt;
&lt;h2 id=&#34;tell-aws-you-want-to-launch-an-instance&#34;&gt;Tell AWS you want to launch an instance&lt;/h2&gt;
&lt;p&gt;If you don&amp;rsquo;t have an AWS account, create one. Then log in and pick EC2 on the &lt;a href=&#34;https://us-east-1.console.aws.amazon.com/console/&#34;&gt;AWS Console&lt;/a&gt; and &amp;ldquo;Launch Instance&amp;rdquo; from the orange &amp;ldquo;Launch Instance&amp;rdquo; button&amp;rsquo;s dropdown menu.&lt;/p&gt;
&lt;h2 id=&#34;configure-and-launch-the-instance&#34;&gt;Configure and launch the instance&lt;/h2&gt;
&lt;p&gt;The older version of this &amp;ldquo;experience&amp;rdquo; was more of a wizard leading you through various small screens to fill out. The current version has one big screen where you fill out these details:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Add something to the &amp;ldquo;Name&amp;rdquo; field like &amp;ldquo;Fuseki server&amp;rdquo;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pick from the &amp;ldquo;Application and OS Images&amp;rdquo; selection. This includes a field where you can search from many choices or, under that, you can pick one of the Quick Start choices. I clicked the blue Amazon Linux AWS Quick Start category and then, under that, picked the first choice: &amp;ldquo;Amazon Linux 2 AMI (HVM) - Kernel 5.10, SSD Volume Type Free tier eligible&amp;rdquo;. Scrolling down that list you can see more machine-learning-oriented images with additional features such as GPUs and PyTorch. This is one of those places where you have to be careful to pick something that will cost you little or nothing, and it&amp;rsquo;s up to you to keep track of that. (After all my experiments with this project so far, as I write the first draft of this blog entry the AWS &lt;a href=&#34;https://us-east-1.console.aws.amazon.com/billing&#34;&gt;billing management&lt;/a&gt; screen says that I currently owe them 20 cents.) I went with the first free tier choice mentioned above.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Under that is the &amp;ldquo;Instance type&amp;rdquo;. I selected the first choice there, &amp;ldquo;t2.micro&amp;rdquo; which is also Free tier eligible. Again, it&amp;rsquo;s up to you to make the choice that will cost you little or nothing, and some of the choices can be expensive.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Under that, create or select  a Key Pair—a public and private key combination that will let you log in to your new instance from your local machine. If you are an AWS user and have an existing one you can pick it from the dropdown list there. If you don&amp;rsquo;t have one, click &amp;ldquo;Create new key pair&amp;rdquo;, give it a name such as fuseki-key-pair, leave the other settings at their default, and click the orange &amp;ldquo;Create key pair&amp;rdquo; button. It will create one with a name like &lt;code&gt;fuseki-key-pair.pem&lt;/code&gt; that your browser downloads. Save that (a typical destination would be the &lt;code&gt;.ssh&lt;/code&gt; subdirectory of your home directory) and remember where you saved it for later.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Moving down the instance configuration page, the next box to fill out is &amp;ldquo;Network settings&amp;rdquo;. &amp;ldquo;Allow SSH traffic from Anywhere&amp;rdquo; is checked as a default, meaning that anyone can use the &lt;code&gt;ssh&lt;/code&gt; utility for shell access to your instance from anywhere on the Internet. (Shell access will also need the file that you downloaded in the previous step, so that&amp;rsquo;s a somewhat decent level of security. As with the potential costs, it&amp;rsquo;s up to you to research other configurations if that&amp;rsquo;s what you need.) Add checks to the &amp;ldquo;Allow HTTPS traffic&amp;rdquo; and &amp;ldquo;Allow HTTP traffic&amp;rdquo; checkboxes so that browsers and other tools can send HTTP requests to your web server or Fuseki SPARQL endpoint.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scroll around to see the other things you can set, leave them at their default for this exercise, and click the orange &amp;ldquo;Launch instance&amp;rdquo; button. After a few seconds you should see a screen that says &amp;ldquo;Success&amp;rdquo; with an orange &amp;ldquo;View all instances&amp;rdquo; button in the lower right. Click that to display the Instances list.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;review-your-running-instance-and-start-a-terminal-session-with-it&#34;&gt;Review your running instance and start a terminal session with it&lt;/h2&gt;
&lt;p&gt;Sometimes, when doing this, I didn&amp;rsquo;t see my new instance right away. If this happens to you, wait a minute, reload your browser, and you should eventually see it. The instances list will show that the &amp;ldquo;Instance state&amp;rdquo; of your new instance is already &amp;ldquo;Running&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Click the checkbox to the left of your instance on the instances list. From the &amp;ldquo;Instance state&amp;rdquo; dropdown at the top you will see that this is the place to Stop, Start, and Terminate the instance, along with a few other options.&lt;/p&gt;
&lt;p&gt;The tabs below the instance list let you do further configuration of the checked instance. The Security tab shows &amp;ldquo;Inbound rules&amp;rdquo; that allow inbound traffic on port 22 for SSH, 80 for HTTP, and 443 for HTTPS.&lt;/p&gt;
&lt;p&gt;Fuseki uses port 3030 as a default, so add a rule for that: on the Security tab under &amp;ldquo;Security groups&amp;rdquo; click the Security group name of &lt;code&gt;sg-long-hex-number&lt;/code&gt; and then under Inbound rules click &amp;ldquo;Edit Inbound Rules&amp;rdquo;. Click &amp;ldquo;Add rule&amp;rdquo; to create a new one with a &amp;ldquo;Port range&amp;rdquo; of 3030. Set the sixth column to 0.0.0.0/0 like the others by picking &amp;ldquo;Anywhere-IPv4&amp;rdquo; from the fifth column&amp;rsquo;s dropdown. Leave the Type value at &amp;ldquo;Custom TCP&amp;rdquo; and click &amp;ldquo;Save rules&amp;rdquo; at the bottom.&lt;/p&gt;
&lt;p&gt;Now your instance is all set up. Pick &amp;ldquo;Instances&amp;rdquo; under &amp;ldquo;Instances&amp;rdquo; (yes, a bit confusing) on the left to return to your Instances list, go back to the Details tab to the left of your new instance&amp;rsquo;s Security tab, and copy the Public IPv4 address into your clipboard. I will use 12.345.678.90 in my examples below, so substitute yours for that. There are ways to map these IP addresses to registered domain names, but for this exercise, that address will be your server&amp;rsquo;s name when you use &lt;code&gt;ssh&lt;/code&gt; or a web browser to do anything with it.&lt;/p&gt;
&lt;p&gt;Before you log in to your new machine you will need to reset the permissions on the &lt;code&gt;pem&lt;/code&gt; file that you downloaded earlier to something acceptable to your &lt;code&gt;ssh&lt;/code&gt; utility, because the default permissions after downloading are too permissive. Enter the following, adjusting the path as necessary for the file you downloaded:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;chmod &lt;span style=&#34;color:#ae81ff&#34;&gt;400&lt;/span&gt; ~/.ssh/fuseki-key-pair.pem
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;(Windows users will have &lt;a href=&#34;https://superuser.com/questions/106181/equivalent-of-chmod-to-change-file-permissions-in-windows&#34;&gt;some other command&lt;/a&gt; to use instead of &lt;code&gt;chmod&lt;/code&gt;, and also may be using  &lt;a href=&#34;http://www.putty.org&#34;&gt;PuTTY&lt;/a&gt; instead of &lt;code&gt;ssh&lt;/code&gt;. I&amp;rsquo;m not sure of the exact Windows syntax to do these tasks, but they shouldn&amp;rsquo;t be difficult to find out.)&lt;/p&gt;
&lt;p&gt;In a shell window on your local computer, enter the following command, substituting the IPv4 address that you copied above and pointing the &lt;code&gt;-i&lt;/code&gt; parameter to the file that you downloaded earlier:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ssh -i ~/.ssh/fuseki-key-pair.pem ec2-user@12.345.678.90
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A prompt will ask if you are sure you want to continue, so answer yes, and then you&amp;rsquo;ll be logged in to your new instance as it waits for you to tell it what to do:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt; 
       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
[ec2-user@ip-987-65-4-321 ~]$ 
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;download-and-unzip-the-jena-software&#34;&gt;Download and unzip the Jena software&lt;/h2&gt;
&lt;p&gt;You will need the software for the Fuseki server itself and also the Jena tools that let you load data into that server and work with that data. (I described some of those tools in the &lt;a href=&#34;../jenagems/#fusekiDatasets&#34;&gt;Working with Fuseki datasets from the command line&lt;/a&gt; section of my blog post &lt;a href=&#34;../jenagems&#34;&gt;Hidden gems included with Jena’s command line utilities&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;After visiting the &lt;a href=&#34;https://jena.apache.org/download/index.cgi&#34;&gt;Jena download page&lt;/a&gt; to find the URLs of these distribution files I executed these commands at the EC2 prompt to retrieve the files to the current directory:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;wget https://dlcdn.apache.org/jena/binaries/apache-jena-fuseki-4.6.1.zip
wget https://dlcdn.apache.org/jena/binaries/apache-jena-4.6.1.zip 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(If this posting that you are reading is more than a few months old, you&amp;rsquo;ll want to check the download page yourself to get more recent versions of these files.)&lt;/p&gt;
&lt;p&gt;Unzip the two files you downloaded, for example with &lt;code&gt;unzip apache-jena-fuseki-4.6.1.zip&lt;/code&gt; and &lt;code&gt;unzip apache-jena-4.6.1.zip&lt;/code&gt;. (For demo purposes, you can just do it from your new instance&amp;rsquo;s root directory. For a more serious production system you would want to create some directories to organize all this better.)&lt;/p&gt;
&lt;h2 id=&#34;install-java&#34;&gt;Install Java&lt;/h2&gt;
&lt;p&gt;Jena is a Java-based tool, and the default version of this EC2 instance doesn&amp;rsquo;t have Java, so you have to add it. I found the x64 RPM Package URL on &lt;a href=&#34;https://www.oracle.com/java/technologies/downloads/&#34;&gt;https://www.oracle.com/java/technologies/downloads/&lt;/a&gt;. The next two commands pull that package into the EC2 instance and then install it there:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   wget https://download.oracle.com/java/19/latest/jdk-19_linux-x64_bin.rpm
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   sudo yum localinstall jdk-19_linux-x64_bin.rpm 
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;try-the-fuseki-server&#34;&gt;Try the Fuseki server&lt;/h2&gt;
&lt;p&gt;Change into the directory with the Fuseki binary (created by unzipping above) and see if it responds to a simple command:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cd apache-jena-fuseki-4.6.1/
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;./fuseki-server --help
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you see the help information, that means that you installed Fuseki and Java correctly.&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s start it up for real:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;./fuseki-server
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Give it a few seconds until the status messages stop scrolling and then send a browser to port 3030 of the Public IPv4 address you saved earlier. Your URL will be something like http://12.345.678.90:3030/.&lt;/p&gt;
&lt;p&gt;You should see the main Apache Jena Fuseki management screen, with the message &amp;ldquo;No datasets created - add one&amp;rdquo;. Don&amp;rsquo;t bother to click on &amp;ldquo;add one&amp;rdquo;, because this server doesn&amp;rsquo;t have permission to write to your new instance&amp;rsquo;s disk storage, even if you had started &lt;code&gt;fuseki-server&lt;/code&gt; with its &lt;code&gt;--update&lt;/code&gt; switch.  We will load data using the Jena tools.&lt;/p&gt;
&lt;h2 id=&#34;create-an-empty-dataset-for-the-triples-that-you-will-load&#34;&gt;Create an empty dataset for the triples that you will load&lt;/h2&gt;
&lt;p&gt;In the shell window where you started up Fuseki, press ^C to shut it down, because the command line tools that you&amp;rsquo;re about to use don&amp;rsquo;t work with a server that is up and running. Make the root directory your default and, as a sample data set to load, get the data file I created for &lt;a href=&#34;../sparql-queries-of-beatles-reco/&#34;&gt;SPARQL queries of Beatles recording sessions&lt;/a&gt;. With this data loaded in Fuseki, people will be able to query its endpoint about who played what instruments on which Beatles recordings:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cd 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;wget https://bobdc.com/miscfiles/BeatlesMusicians.ttl
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;To tell Fuseki which named dataset on the server should receive your data, you need to identify that dataset&amp;rsquo;s assembler file. Your new Fuseki instance has no datasets or assembler files, so how can we create them?&lt;/p&gt;
&lt;p&gt;As I explained in the introduction to  &lt;a href=&#34;../jenagems/#fusekiDatasets&#34;&gt;Working with Fuseki datasets from the command line&lt;/a&gt;, instead of learning the syntax of these files I found that I could just create one with the web interface to a Fuseki server running on my local machine, as long as I started it up with the &lt;code&gt;--update&lt;/code&gt; switch so that the web interface would have write permission. For that one, I called the dataset that I created dataset2, and Fuseki put the assembler file into &lt;code&gt;~/apache-jena-fuseki/run/configuration/dataset2.ttl&lt;/code&gt; on my local machine. I put a &lt;a href=&#34;https://bobdc.com/miscfiles/dataset2.ttl&#34;&gt;copy of that &lt;code&gt;dataset2.ttl&lt;/code&gt; file on my blog&amp;rsquo;s server&lt;/a&gt; so that I could &lt;code&gt;wget&lt;/code&gt; it to my EC2 instance. (I could have also &lt;code&gt;sftp&lt;/code&gt;&amp;rsquo;d it from my local machine to the EC2 instance, but this way it&amp;rsquo;s available to others who want to try the same thing.)&lt;/p&gt;
&lt;p&gt;From your EC2 shell&amp;rsquo;s root directory, execute the following to change into the directory where assembler files get stored, get a copy of the assembler file mentioned above, and rename it for the Beatles data:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cd apache-jena-fuseki-4.6.1/run/configuration
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;wget https://bobdc.com/miscfiles/dataset2.ttl
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;mv dataset2.ttl beatlesSessions.ttl
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Next, you need to edit it for your new dataset. The &lt;code&gt;vi&lt;/code&gt; and &lt;code&gt;nano&lt;/code&gt; editors are included with this Amazon Linux 2 image, but I need my emacs, so I installed it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;sudo yum install emacs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Open up &lt;code&gt;beatlesSessions.ttl&lt;/code&gt; with your editor. Near the bottom you will see some triples that look like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;:tdb_dataset_readwrite
	rdf:type       tdb2:DatasetTDB2 ;
        tdb2:location  &amp;#34;/home/bob/bin/apache-jena-fuseki/run/databases/dataset2&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(Isn&amp;rsquo;t it nice that the configuration file for this triplestore stores everything as triples?) Change that &lt;code&gt;tdb2:location&lt;/code&gt; value to &amp;ldquo;/home/ec2-user/apache-jena-fuseki-4.6.1/run/databases/dataset2/beatlesSessions&amp;rdquo;, do a global replace of &amp;ldquo;dataset2&amp;rdquo; with &amp;ldquo;BeatlesSessions&amp;rdquo; elsewhere in the file (including in the pathname that you entered in the previous step), save the file, and quit out of your editor.&lt;/p&gt;
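&lt;p&gt;If you&amp;rsquo;d rather not edit the file by hand, the same two changes can be scripted with &lt;code&gt;sed&lt;/code&gt;. This is just a sketch of the steps described above; it assumes GNU sed&amp;rsquo;s &lt;code&gt;-i&lt;/code&gt; in-place editing, and the stand-in file it creates reproduces only the triples shown, so that the commands can be tried anywhere:&lt;/p&gt;

```shell
# Stand-in for beatlesSessions.ttl holding just the triples shown above;
# on the EC2 instance, skip this step and work on the real file.
printf '%s\n' \
  ':tdb_dataset_readwrite' \
  '    rdf:type       tdb2:DatasetTDB2 ;' \
  '    tdb2:location  "/home/bob/bin/apache-jena-fuseki/run/databases/dataset2" .' \
  > beatlesSessions.ttl

# Step 1: point tdb2:location at the database directory on the EC2 instance
sed -i 's|/home/bob/bin/apache-jena-fuseki/run/databases/dataset2|/home/ec2-user/apache-jena-fuseki-4.6.1/run/databases/dataset2/beatlesSessions|' beatlesSessions.ttl

# Step 2: the global replace of dataset2 with BeatlesSessions
sed -i 's/dataset2/BeatlesSessions/g' beatlesSessions.ttl

# show the resulting location triple
grep 'tdb2:location' beatlesSessions.ttl
```

&lt;p&gt;Either way, the result is the same edited assembler file, so use whichever you find less error-prone.&lt;/p&gt;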
&lt;p&gt;Now that you&amp;rsquo;ve created this empty dataset for the server, let&amp;rsquo;s make sure that Fuseki recognizes it before we load any data. Change into the Fuseki directory and start up the Fuseki server again:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cd ~/apache-jena-fuseki-4.6.1/
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;./fuseki-server
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After the startup status messages stop scrolling, send your browser to the same IP address you did before. You should see &lt;code&gt;/BeatlesSessions&lt;/code&gt; listed as an available dataset. If you like, you can click the &amp;ldquo;query&amp;rdquo; action and run the default query, which asks for ten triples. (Click the dark gray triangle to the right of the query to actually execute it.) It won&amp;rsquo;t get any data, but it shouldn&amp;rsquo;t show an error, either, so you know that the query engine works with this dataset.&lt;/p&gt;
&lt;h2 id=&#34;load-some-triples-into-the-new-dataset&#34;&gt;Load some triples into the new dataset&lt;/h2&gt;
&lt;p&gt;At the shell window, press ^C to end the server session and go back to the command prompt. With the following two commands, go back to the root directory and, before loading data with Jena&amp;rsquo;s &lt;code&gt;tdbloader&lt;/code&gt; tool, use the &lt;code&gt;riot&lt;/code&gt; tool to verify that the data file we&amp;rsquo;re about to load has no syntax problems; data load time is not a good time to find out about such problems:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cd
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;./apache-jena-4.6.1/bin/riot --validate BeatlesMusicians.ttl
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You shouldn&amp;rsquo;t see any error messages.&lt;/p&gt;
&lt;p&gt;Next, load that data into your new dataset by pointing the &lt;code&gt;tdb2.tdbloader&lt;/code&gt; command line tool at the data file and at the dataset&amp;rsquo;s assembler file (this is a single long command that I split up to show here, but pasting it as shown worked for me):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   ./apache-jena-4.6.1/bin/tdb2.tdbloader --tdb &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;      ./apache-jena-fuseki-4.6.1/run/configuration/beatlesSessions.ttl &lt;span style=&#34;color:#ae81ff&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;&lt;/span&gt;      BeatlesMusicians.ttl
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;(Read more about &lt;code&gt;riot&lt;/code&gt;, &lt;code&gt;tdbloader&lt;/code&gt;, and their companion utilities at  &lt;a href=&#34;../jenagems/#fusekiDatasets&#34;&gt;Working with Fuseki datasets from the command line&lt;/a&gt;. These will let you edit and perform other maintenance on the data loaded in Fuseki.)&lt;/p&gt;
&lt;h2 id=&#34;query-the-data&#34;&gt;Query the data&lt;/h2&gt;
&lt;p&gt;Start up the server again:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-shell&#34; data-lang=&#34;shell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cd ~/apache-jena-fuseki-4.6.1/
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;./fuseki-server
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Run that default query again, and this time you should see ten triples about the Beatles&amp;rsquo; recording sessions.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s try a more interesting query. Paul was known as the bass player but sometimes added guitar solos. On which songs? Paste the following into that query screen and run it to find out:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX s:     &amp;lt;http://learningsparql.com/ns/schema/&amp;gt; 
PREFIX i:     &amp;lt;http://learningsparql.com/ns/instrument/&amp;gt; 
PREFIX rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 
PREFIX m:     &amp;lt;http://learningsparql.com/ns/musician/&amp;gt; 
SELECT ?title WHERE { 
  ?song a s:Song ;
  i:leadguitar m:PaulMcCartney .
  ?song rdfs:label ?title . 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You will see a surprising number of songs where he played lead guitar. (Optional step: check out his &lt;a href=&#34;https://www.youtube.com/watch?v=sjb9AxDkwAQ#t=1m16s&#34;&gt;amazing solo&lt;/a&gt;  on &amp;ldquo;Good Morning Good Morning&amp;rdquo;. Be sure to wait for the last lick, after John sings &amp;ldquo;it&amp;rsquo;s time for tea and meet the wife&amp;rdquo;.)&lt;/p&gt;
&lt;p&gt;Remember, what you see in your browser is not a SPARQL endpoint, but the HTML interface to one. &lt;a href=&#34;../endpointandcurl/&#34;&gt;There&amp;rsquo;s an important difference&lt;/a&gt;. To really test this as a SPARQL endpoint, paste the query above into a file on your local machine (or any machine with web access) called &lt;code&gt;paulquery.rq&lt;/code&gt; and then enter the following at the machine&amp;rsquo;s command prompt, substituting the IPv4 address that you copied above into the URL:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;curl --data-urlencode &amp;#34;query@paulquery.rq&amp;#34; \
   http://12.345.678.90:3030/BeatlesSessions/sparql
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It should display a JSON version of the query results. (You can learn how to customize this behavior in my blog posting &lt;a href=&#34;../curling-sparql&#34;&gt;Curling SPARQL&lt;/a&gt;.)&lt;/p&gt;
&lt;h2 id=&#34;your-own-sparql-web-server&#34;&gt;Your own SPARQL web server&lt;/h2&gt;
&lt;p&gt;If it works with curl, it will work with all kinds of other tools, letting those applications take advantage of the data you provide over your new SPARQL endpoint.  A few more points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Be careful in the options you pick when setting this up, because some can get expensive. I copied this from one of the setup pages: &amp;ldquo;Free tier: In your first year includes 750 hours of t2.micro (or t3.micro in the Regions in which t2.micro is unavailable) instance usage on free tier AMIs per month, 30 GiB of EBS storage, 2 million IOs, 1 GB of snapshots, and 100 GB of bandwidth to the internet.&amp;rdquo; The &lt;a href=&#34;https://aws.amazon.com/ec2/instance-types/t2/&#34;&gt;Amazon EC2 T2 Instances&lt;/a&gt; page says that a t2 micro instance costs  $0.0116 per hour, which works out to about $1.95 per week. Of course, if you want to scale way up and host a ton of data on a faster instance, the more expensive options are available.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;That being said, forgetting about it for a year and then owing AWS a hundred bucks would be no fun. Remember to stop your instance when the time is right and to check the &lt;a href=&#34;https://us-east-1.console.aws.amazon.com/billing&#34;&gt;billing management&lt;/a&gt; screen every now and then.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The  &lt;a href=&#34;https://medium.com/@KerrySheldon/ec2-exercise-1-1-host-a-static-webpage-9732b91c78ef&#34;&gt;EC2 Exercise 1.1: Host a Static Webpage&lt;/a&gt; article mentioned above explains how to add a regular Apache web server to your EC2 instance so that you can host static web pages from your new EC2 instance.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
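&lt;p&gt;(A quick check of the weekly cost arithmetic quoted above:)&lt;/p&gt;

```shell
# t2.micro hourly rate times 24 hours times 7 days = cost per week, in dollars
awk 'BEGIN { printf "%.4f\n", 0.0116 * 24 * 7 }'   # prints 1.9488
```

&lt;p&gt;Rounded, that matches the &amp;ldquo;about $1.95 per week&amp;rdquo; figure.&lt;/p&gt;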
&lt;p&gt;The most important thing is that you can use some robust open source software to create a SPARQL endpoint that costs practically nothing and is available to everyone on the Internet. That provides some big opportunities for standards-based data publishing.&lt;/p&gt;
 &lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1584217353314717697&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2022">2022</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/fuseki">Fuseki</category>
      
    </item>
    
    <item>
      <title>More Picasso paintings in one year than all the Vermeer paintings?</title>
      <link>https://www.bobdc.com/blog/picassovermeer/</link>
      <pubDate>Sun, 25 Sep 2022 12:09:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/picassovermeer/</guid>
      
      
      <description><div>Answering an art history question with SPARQL.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/w1500-Vermeer-Lady-Letter.jpg&#34; alt=&#34;Woman Writing a Letter, with her Maid by Johannes Vermeer&#34; border=&#34;0&#34; width=&#34;240&#34; align=&#34;right&#34;  style=&#34;margin: 0px 30px 20px 40px;&#34; /&gt;
&lt;p&gt;Sometimes a question pops into my head that, although unrelated to computers, could likely be answered with a SPARQL query. I don&amp;rsquo;t necessarily know the query off the top of my head and have to work it out. I&amp;rsquo;m going to discuss an example of one that I worked out and the steps that I took, because I wanted to show how I navigated the Wikidata data model to get what I wanted.&lt;/p&gt;
&lt;p&gt;On a recent trip to Dublin my wife and I went to Dublin&amp;rsquo;s wonderful &lt;a href=&#34;https://www.nationalgallery.ie&#34;&gt;National Gallery of Ireland&lt;/a&gt;. Among other paintings we saw Vermeer&amp;rsquo;s &lt;a href=&#34;https://www.nationalgallery.ie/art-and-artists/highlights-collection/woman-writing-letter-her-maid-johannes-vermeer-1632-1675&#34;&gt;Woman Writing a Letter, with her Maid&lt;/a&gt; and Picasso&amp;rsquo;s &lt;a href=&#34;https://www.nationalgallery.ie/art-and-artists/highlights-collection/still-life-mandolin-pablo-picasso-1881-1973&#34;&gt;Still Life with a Mandolin&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Seeing any Vermeer is a treat because there are so few of them around, and the way he depicts light makes for a huge difference between seeing a picture of the painting and seeing the real thing in front of you. (Remember, when you see these dumb discussions about AI-generated &amp;ldquo;paintings&amp;rdquo;: we can discuss &lt;a href=&#34;https://www.amazon.com/What-Art-Arthur-C-Danto/dp/0300205716/bobducharmeA/&#34;&gt;whether they&amp;rsquo;re art or not&lt;/a&gt;, but they&amp;rsquo;re not paintings if there is no paint. They&amp;rsquo;re PNG and JPG files. If you compare the image above with the Vermeer hanging on the wall at the National Gallery of Ireland you&amp;rsquo;ll see what a tremendous difference that can be.) The Picasso was also great to see live because it was from his more colorful late cubist period; while some of his related collages included bits of wall paper, for this one he painted wallpaper-like patterns onto the canvas.&lt;/p&gt;
&lt;!-- hspace and vspace not making any difference --&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/stillLifeWithAMandolin.jpg&#34; width=&#34;300&#34; alt=&#34;Still Life with a Mandolin&#34; border=&#34;0&#34; align=&#34;right&#34;  style=&#34;margin: 30px 30px 20px 40px;&#34;/&gt;
&lt;p&gt;We know that Picasso was very prolific for many decades. This led me to wonder: was there any single year of Picasso&amp;rsquo;s career where he produced more paintings than Vermeer produced in his whole life? (Judging, in both cases, by surviving paintings that we have record of.)&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://en.wikipedia.org/wiki/Johannes_Vermeer&#34;&gt;Wikipedia page for Vermeer&lt;/a&gt; tells us that &amp;ldquo;only 34 paintings are universally attributed to him today&amp;rdquo;, so I didn&amp;rsquo;t need SPARQL for that. The question for me to answer was this: were there any years where Picasso painted more than 34 paintings?&lt;/p&gt;
&lt;h2 id=&#34;what-triples-say-picasso-made-this-painting&#34;&gt;What triples say &amp;ldquo;Picasso made this painting&amp;rdquo;?&lt;/h2&gt;
&lt;p&gt;First I had to identify how Wikidata tells us that Picasso painted a given painting. I started with one of his most famous ones and clicked &lt;a href=&#34;https://www.wikidata.org/wiki/Special:EntityPage/Q175036&#34;&gt;Wikidata item&lt;/a&gt; on the left side of the &lt;a href=&#34;https://en.wikipedia.org/wiki/Guernica_(Picasso)&#34;&gt;Guernica (Picasso)&lt;/a&gt; Wikipedia page. This showed me that Q175036 is the Wikidata identifier for this painting. I knew that the Wikidata triples with subjects that build on this ID would provide some good clues about developing a query that could count up his paintings per year.&lt;/p&gt;
&lt;h3 id=&#34;what-triples-say-its-a-painting&#34;&gt;What triples say &amp;ldquo;It&amp;rsquo;s a painting&amp;rdquo;?&lt;/h3&gt;
&lt;p&gt;I didn&amp;rsquo;t want to count up all his artworks per year, but just his paintings, so I entered the following query and &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%2a%20WHERE%20%7B%0A%20%20wd%3AQ175036%20wdt%3AP31%20%3Fclass%20.%0A%20%20%3Fclass%20rdfs%3Alabel%20%3Fname%20.%0A%20%20FILTER%20%28lang%28%3Fname%29%20%3D%20%22en%22%29%0A%7D%0A&#34;&gt;executed it&lt;/a&gt; to see what class Guernica was an instance of. (Note that instead of using &lt;code&gt;rdf:type&lt;/code&gt; or &lt;code&gt;a&lt;/code&gt; as a property meaning &amp;ldquo;is an instance of&amp;rdquo;, Wikidata uses &lt;a href=&#34;https://www.wikidata.org/wiki/Property:P31&#34;&gt;wdt:P31&lt;/a&gt;. Being reminded of this was part of my navigation around the Wikidata data model that I mentioned above.)&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT * WHERE {
  wd:Q175036 wdt:P31 ?class .
  ?class rdfs:label ?name .
  FILTER (lang(?name) = &amp;#34;en&amp;#34;)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This showed that it is an instance of &lt;a href=&#34;https://www.wikidata.org/wiki/Q3305213&#34;&gt;wd:Q3305213&lt;/a&gt;, or &amp;ldquo;painting&amp;rdquo;.&lt;/p&gt;
&lt;h2 id=&#34;what-triples-say-its-by-picasso&#34;&gt;What triples say &amp;ldquo;It&amp;rsquo;s by Picasso&amp;rdquo;?&lt;/h2&gt;
&lt;p&gt;I went to the &lt;a href=&#34;https://en.wikipedia.org/wiki/Pablo_Picasso&#34;&gt;Wikipedia page for Picasso&lt;/a&gt;, picked &lt;a href=&#34;https://www.wikidata.org/wiki/Special:EntityPage/Q5593&#34;&gt;Wikidata item&lt;/a&gt;, and saw that Picasso&amp;rsquo;s Wikidata identifier is Q5593.&lt;/p&gt;
&lt;p&gt;Next, I did a &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%2a%20WHERE%20%7B%0A%20%20wd%3AQ175036%20%3Fp%20%3Fo%20%0A%7D%20%0A&#34;&gt;very simple query&lt;/a&gt; for all the data about the painting Guernica:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT * WHERE {
  wd:Q175036 ?p ?o 
} 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The result of this query included  &amp;ldquo;wdt:P170 wd:Q5593&amp;rdquo;.  If &lt;code&gt;wd:Q5593&lt;/code&gt; is Picasso, what is &lt;code&gt;wdt:P170&lt;/code&gt;? This is easy enough to find out when executing the query with the Wikidata SPARQL endpoint HTML form: I just clicked on this name in the query result and it &lt;a href=&#34;https://www.wikidata.org/wiki/Property:P170&#34;&gt;showed me&lt;/a&gt; that &lt;code&gt;wdt:P170&lt;/code&gt; means &amp;ldquo;creator&amp;rdquo;.&lt;/p&gt;
&lt;h2 id=&#34;what-triples-say-what-year-a-painting-was-created&#34;&gt;What triples say what year a painting was created?&lt;/h2&gt;
&lt;p&gt;The Wikipedia page for Guernica says that it was created in 1937. The earlier result of asking for all the triples about the painting showed that it has a &lt;code&gt;wdt:P571&lt;/code&gt; value of &amp;ldquo;1 January 1937&amp;rdquo;, where &lt;a href=&#34;https://www.wikidata.org/wiki/Property:P571&#34;&gt;wdt:P571&lt;/a&gt; means &amp;ldquo;inception.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;what-paintings-in-what-years&#34;&gt;What paintings in what years?&lt;/h2&gt;
&lt;p&gt;Next, I used &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%2a%20WHERE%20%7B%0A%20%20%3Fpainting%20wdt%3AP31%20wd%3AQ3305213%20%3B%20%23%20it%27s%20a%20painting%0A%20%20wdt%3AP170%09%20wd%3AQ5593%20%3B%20%20%20%20%20%20%20%20%20%20%20%23%20by%20Picasso%0A%20%20rdfs%3Alabel%20%3Ftitle%20%3B%0A%20%20wdt%3AP571%20%3FinceptionDate%20.%0A%20%20FILTER%20%28lang%28%3Ftitle%29%20%3D%20%22en%22%29%0A%7D%20%0A&#34;&gt;this query&lt;/a&gt; to list all the paintings by Picasso and the dates they were created:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT * WHERE {
  ?painting wdt:P31 wd:Q3305213 ; # it&amp;#39;s a painting
  wdt:P170 wd:Q5593 ;             # by Picasso
  rdfs:label ?title ;
  wdt:P571 ?inceptionDate .
  FILTER (lang(?title) = &amp;#34;en&amp;#34;)
} 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This listed them, but the Wikidata endpoint interface was displaying dates like 1913-01-01 as &amp;ldquo;1 January 1913&amp;rdquo; (with a suspiciously large number of them having that &amp;ldquo;1 January&amp;rdquo;, so that may be a default when the month and day were unavailable). I wanted just the year, since I was going to count total paintings per year. I eventually realized that the date values were in &lt;a href=&#34;https://en.wikipedia.org/wiki/ISO_8601&#34;&gt;ISO 8601&lt;/a&gt; format, so I tried pulling out the year values with &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%2a%20WHERE%20%7B%0A%20%20%3Fpainting%20wdt%3AP31%20wd%3AQ3305213%20%3B%20%23%20it%27s%20a%20painting%0A%20%20wdt%3AP170%09%20wd%3AQ5593%20%3B%20%20%20%20%20%20%20%20%20%20%20%23%20by%20Picasso%0A%20%20rdfs%3Alabel%20%3Ftitle%20%3B%0A%20%20wdt%3AP571%20%3FinceptionDate%20.%0A%20%20BIND%28substr%28%3FinceptionDate%2C1%2C4%29%20AS%20%3Fyear%29%0A%20%20FILTER%20%28lang%28%3Ftitle%29%20%3D%20%22en%22%29%0A%7D%20%0A&#34;&gt;this query&lt;/a&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT * WHERE {
  ?painting wdt:P31 wd:Q3305213 ; # it&amp;#39;s a painting
  wdt:P170 wd:Q5593 ;             # by Picasso
  rdfs:label ?title ;
  wdt:P571 ?inceptionDate .
  BIND(substr(?inceptionDate,1,4) AS ?year)
  FILTER (lang(?title) = &amp;#34;en&amp;#34;)
} 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The dates still looked inconsistent, so I stored that query in the file &lt;code&gt;pquery1.rq&lt;/code&gt; and used  &lt;a href=&#34;https://en.wikipedia.org/wiki/CURL&#34;&gt;curl&lt;/a&gt; to run the query from my shell command line so that I could see the raw result:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;curl --data-urlencode &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;query@pquery1.rq&amp;#34;&lt;/span&gt; https://query.wikidata.org/sparql
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;That showed me that the dates weren&amp;rsquo;t just arranged in ISO 8601 format—they were actually typed as ISO dates, so I revised the query above to convert those to regular strings before pulling out the year value with &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%2a%20WHERE%20%7B%0A%20%20%3Fpainting%20wdt%3AP31%20wd%3AQ3305213%20%3B%20%23%20it%27s%20a%20painting%0A%20%20wdt%3AP170%09%20wd%3AQ5593%20%3B%20%20%20%20%20%20%20%20%20%20%20%23%20by%20Picasso%0A%20%20rdfs%3Alabel%20%3Ftitle%20%3B%0A%20%20wdt%3AP571%20%3FinceptionDate%20.%0A%20%20BIND%28substr%28str%28%3FinceptionDate%29%2C1%2C4%29%20AS%20%3Fyear%29%0A%20%20FILTER%20%28lang%28%3Ftitle%29%20%3D%20%22en%22%29%0A%7D%20%0A&#34;&gt;this query&lt;/a&gt;, and the &lt;code&gt;?year&lt;/code&gt; values came as the four-digit numbers I wanted to see:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT * WHERE {
  ?painting wdt:P31 wd:Q3305213 ; # it&amp;#39;s a painting
  wdt:P170 wd:Q5593 ;             # by Picasso
  rdfs:label ?title ;
  wdt:P571 ?inceptionDate .
  # added str() call to following
  BIND(substr(str(?inceptionDate),1,4) AS ?year)
  FILTER (lang(?title) = &amp;#34;en&amp;#34;)
} 
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;how-many-picasso-paintings-per-year&#34;&gt;How many Picasso paintings per year?&lt;/h2&gt;
&lt;p&gt;I wasn&amp;rsquo;t really interested in the painting titles or their month and day of inception. I had everything I needed to answer my original question: how many paintings did Picasso do each year?&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?year (COUNT(?painting) AS ?paintingsInYear) WHERE {
  ?painting wdt:P31 wd:Q3305213 ; # it&amp;#39;s a painting
  wdt:P170 wd:Q5593 ;             # by Picasso
  wdt:P571 ?inceptionDate .
  BIND(substr(str(?inceptionDate),1,4) AS ?year)
} 
GROUP BY ?year
ORDER BY DESC(?paintingsInYear)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here are the first few rows of the results:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;year    paintingsInYear
1901	52
1906	33
1908	31
1909	30
1905	25
1914	24
1903	23
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;So there&amp;rsquo;s the answer: we know of more Picasso paintings from 1901 than we know of Vermeer paintings from his whole life, and in 1906 Picasso came close to the Vermeer total. The first decade of the twentieth century was a very busy time for Picasso. (I then found a website showing his paintings by year; the &lt;a href=&#34;https://www.pablo-ruiz-picasso.net/year-1901.php&#34;&gt;1901&lt;/a&gt; page is interesting.)&lt;/p&gt;
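&lt;p&gt;The query&amp;rsquo;s extract-the-year-and-count pattern is the same thing you&amp;rsquo;d do with ordinary text tools. Here&amp;rsquo;s a sketch on a few made-up ISO 8601 inception dates (the real numbers, of course, come from the Wikidata query above):&lt;/p&gt;

```shell
# take the first four characters of each date (the year), then count
# occurrences per year, most frequent first -- the shell analogue of the
# BIND/GROUP BY/ORDER BY combination in the SPARQL query
printf '%s\n' 1901-01-01 1901-06-15 1906-01-01 |
  cut -c1-4 | sort | uniq -c | sort -rn
```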
&lt;p&gt;The eye icon dropdown &amp;ldquo;Display result as&amp;rdquo;  menu on the left side of the Wikidata Query Service page offers other ways to visualize the data. I changed the &lt;code&gt;ORDER BY&lt;/code&gt; line in the last query to sort by the &lt;code&gt;?year&lt;/code&gt; value, &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3Fyear%20%28COUNT%28%3Fpainting%29%20AS%20%3FpaintingsInYear%29%20WHERE%20%7B%0A%20%20%3Fpainting%20wdt%3AP31%20wd%3AQ3305213%20%3B%20%23%20it%27s%20a%20painting%0A%20%20wdt%3AP170%09%20wd%3AQ5593%20%3B%20%20%20%20%20%20%20%20%20%20%20%23%20by%20Picasso%0A%20%20wdt%3AP571%20%3FinceptionDate%20.%0A%20%20BIND%28substr%28str%28%3FinceptionDate%29%2C1%2C4%29%20AS%20%3Fyear%29%0A%7D%20%0AGROUP%20BY%20%3Fyear%0AORDER%20BY%20%3Fyear%0A&#34;&gt;ran the query&lt;/a&gt;, and then picked &amp;ldquo;line chart&amp;rdquo; from the dropdown and got this graph of the number of Picasso&amp;rsquo;s paintings per year:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/picassoPaintingsPerYear.png&#34; alt=&#34;Picasso paintings per year&#34; border=&#34;0&#34; /&gt;
&lt;p&gt;This makes it even clearer how busy he was in the first decade of that century.&lt;/p&gt;
&lt;p&gt;There are other display types, and of course, many other painters. There is a lot more fun to be had here!&lt;/p&gt;
&lt;p&gt;The most difficult part of creating such a query is the cryptic nature of the entity and property IDs: a single letter followed by a few digits. If the resources and properties used more readable names such as &amp;ldquo;Guernica (painting)&amp;rdquo; and &amp;ldquo;creator&amp;rdquo; instead, queries would be more intuitive and easier to write—for those of us who speak English. But Wikidata is designed to be usable by everyone in the world, not just English speakers, and that&amp;rsquo;s a good thing. I won&amp;rsquo;t complain.&lt;/p&gt;
&lt;p&gt;One more note: I included a &lt;a href=&#34;https://www.bobdc.com/tags/digital-humanities/&#34;&gt;digital-humanities&lt;/a&gt; tag with this post because it&amp;rsquo;s about using technology to answer an art history question. The field is often about accumulating data from different sources so that people can identify new patterns, and as Wikidata accumulates more and more data, there are more and more great things we can do with this wonderful source.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1574071859666190337&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2022">2022</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">Wikidata</category>
      
      <category domain="https://www.bobdc.com//categories/digital-humanities">digital-humanities</category>
      
    </item>
    
    <item>
      <title>Learn RDF in Y minutes</title>
      <link>https://www.bobdc.com/blog/learnrdfinyminutes/</link>
      <pubDate>Sun, 28 Aug 2022 17:10:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/learnrdfinyminutes/</guid>
      
      
      <description><div>Where X = RDF</div><div>&lt;p&gt;I have always loved the website &lt;a href=&#34;https://learnxinyminutes.com/&#34;&gt;Learn X in Y minutes&lt;/a&gt;, which provides short crash courses in several dozen programming languages plus additional topics such as &lt;a href=&#34;https://learnxinyminutes.com/docs/set-theory/&#34;&gt;set theory&lt;/a&gt; and &lt;a href=&#34;https://learnxinyminutes.com/docs/git/&#34;&gt;git&lt;/a&gt;. Its home page tells us &amp;ldquo;Take a whirlwind tour of your next favorite language&amp;rdquo;; I&amp;rsquo;ll bet it&amp;rsquo;s especially popular with applicants on their way to job interviews where languages that are new to them are in the job description.&lt;/p&gt;
&lt;p&gt;I have been planning to add a SPARQL page, but I still haven&amp;rsquo;t. Four years ago they didn&amp;rsquo;t even have an SQL page, so as groundwork for a future SPARQL page I converted the SQL quick reference from an &lt;a href=&#34;https://www.bobdc.com/blog/my-sql-quick-reference/&#34;&gt;old blog entry&lt;/a&gt; of mine into a &lt;a href=&#34;https://learnxinyminutes.com/docs/sql/&#34;&gt;Learn SQL in Y minutes&lt;/a&gt; page for them. That has since been translated into Spanish, Italian, Russian, Turkish, and Chinese.&lt;/p&gt;
&lt;p&gt;More groundwork: I have just created a &lt;a href=&#34;https://learnxinyminutes.com/docs/rdf/&#34;&gt;Learn RDF in Y minutes&lt;/a&gt; page that shows some Turtle syntax and a few basics of RDFS. The &amp;ldquo;Further Reading&amp;rdquo; section at the end points to my &lt;a href=&#34;https://www.bobdc.com/blog/whatisrdf/&#34;&gt;What is RDF?&lt;/a&gt; and &lt;a href=&#34;https://www.bobdc.com/blog/whatisrdfs/&#34;&gt;What is RDFS?&lt;/a&gt; blog entries, which are more detailed introductions, but I hope that this taste of RDF&amp;rsquo;s value on the Learn X in Y minutes site helps to spread the word of RDF&amp;rsquo;s potential value to a broader audience.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1564002259037573120&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2022">2022</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/rdfs">RDFS</category>
      
    </item>
    
    <item>
      <title>SPARQL and Instacart&#39;s Knowledge Graph</title>
      <link>https://www.bobdc.com/blog/instacartsparql/</link>
      <pubDate>Sun, 31 Jul 2022 13:05:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/instacartsparql/</guid>
      
      
      <description><div>Managing data quality.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/sparqlAndInstacartLogos.png&#34; style=&#34;margin: 0px 30px 20px 80px;&#34; border=&#34;0&#34; align=&#34;right&#34;  alt=&#34;SPARQL and Instacart logos&#34;  /&gt;
&lt;p&gt;Two recent articles describe a fascinating use of SPARQL to improve data quality in a knowledge graph at the successful grocery delivery service &lt;a href=&#34;https://www.instacart.com/&#34;&gt;Instacart&lt;/a&gt;. &lt;a href=&#34;https://www2022.thewebconf.org/PaperFiles/28.pdf&#34;&gt;On Reliability Scores for Knowledge Graphs&lt;/a&gt; (pdf) is a short paper submitted to the &lt;a href=&#34;https://www2022.thewebconf.org/&#34;&gt;2022 ACM Web Conference&lt;/a&gt; in Lyon, and a longer piece on Instacart&amp;rsquo;s &lt;a href=&#34;https://tech.instacart.com/&#34;&gt;tech blog&lt;/a&gt; is titled &lt;a href=&#34;https://tech.instacart.com/red-means-stop-green-means-go-a-look-into-quality-assessment-in-instacarts-knowledge-graph-9ceeb3f1be24&#34;&gt;Red Means Stop. Green Means Go: A Look into Quality Assessment in Instacart’s Knowledge Graph&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The abstract from the Web Conference paper gives an overview of the goal:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Instacart KG is a central data store which contains facts regarding grocery products, ranging from taxonomic classifications to product nutritional information. With a view towards providing reliable and complete information for downstream applications, we propose an automated system for providing these facts with a score based on their reliability. This system passes data through a series of contextualized unit tests; the outcome of these tests are aggregated in order to provide a fact with a discrete score: reliable, questionable, or unreliable. These unit tests are written with explainability, scalability, and correctability in mind.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They &amp;ldquo;propose an automated system&amp;rdquo; that the tech blog piece shows is successfully in production. To quote more from the Web Conference  paper&amp;rsquo;s introduction:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Instacart KG contains information regarding products, recipes, and various product attributes, together with millions of contextual facts regarding these entities&amp;hellip; Due to their large scale it is infeasible to curate such graphs by hand. Because of this, automated quality control mechanisms are important to ensure KGs contain valid information. Often KGs are created through a series of automated ETL processes which analyze both structured and unstructured data from a variety of sources to generate facts for the graph. This automation, combined with questionable source data, can cause KGs to acquire noise in the form of incorrect statements during their build processes. This noise can present itself in a variety of ways: incorrect product attributes can lead to negative storefront interactions, and noisy training sets can lead to less precise machine learning models. This has led to much work regarding quality assessment, error detection, and error correction in knowledge graphs.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As the tech blog put it, this system &amp;ldquo;helps us preemptively discover and flag flaws in our data which can then be corrected at the source [and] acts as a basic guardrail which prevents noisy and unreliable data from being published and corrupting downstream processes&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;They store their knowledge graph data as RDF triples in AWS Neptune. They evaluate and record the quality of facts with the following steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Use SPARQL to retrieve a set of data such as nutritional information.&lt;/li&gt;
&lt;li&gt;Run the retrieved data through a series of Python unit tests designed for that dataset and log the results.&lt;/li&gt;
&lt;li&gt;Tag facts as being either reliable, questionable, or unreliable.&lt;/li&gt;
&lt;li&gt;Use SPARQL Update to record the results in the named graphs ReliableKG, QuestionableKG, and UnreliableKG.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With this system in place, downstream applications within the company can use more reliable data or just more data as appropriate for their needs. According to the Web Conference paper, &amp;ldquo;It is trivially easy to restrict a KG query to only select data which is at or above a certain reliability score&amp;rdquo;.&lt;/p&gt;
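&lt;p&gt;Neither piece includes the scoring code itself, but the aggregation they describe is straightforward to sketch. The Python below is my own guess at the shape of such a scorer; the test names and the one-failure threshold are invented, not Instacart&amp;rsquo;s actual rules:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def score_fact(test_results):
    # test_results maps a unit-test name to True (passed) or False (failed)
    failures = sum(1 for passed in test_results.values() if not passed)
    if failures == 0:
        return &#34;reliable&#34;        # step 4 would record this in ReliableKG
    if failures == 1:
        return &#34;questionable&#34;    # ...in QuestionableKG
    return &#34;unreliable&#34;          # ...in UnreliableKG

# Hypothetical step 2 results for one nutrition fact
print(score_fact({&#34;protein_within_range&#34;: True,
                  &#34;sugar_carb_ratio_plausible&#34;: False}))   # questionable
&lt;/code&gt;&lt;/pre&gt;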
&lt;p&gt;The tests in step 2 might flag something that is marked as Vegan but not Vegetarian so that someone can check whether it really is Vegan and set its Vegetarian value to True if so. The Web Conference paper includes other examples of how different classes of tests, such as identification of outliers (for example, a dessert with an abnormally large amount of protein per serving, or items with an unreasonable sugar-carbohydrate ratio), led to better data quality. Because of this paper&amp;rsquo;s academic orientation, it also includes an &amp;ldquo;Impact Analysis&amp;rdquo; section about how they quantified the improvements to data quality, as well as references to previous academic work on data quality.&lt;/p&gt;
&lt;p&gt;According to the tech blog, another benefit of their pipeline is that the system can pass the logs &amp;ldquo;to upstream data providers to make it easier to find and correct data inaccuracies at the source&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Another part of their knowledge graph that provides metadata about the products is a taxonomy that, according to the article&amp;rsquo;s author Thomas Grubb, is represented in RDF as &lt;code&gt;rdfs:Class&lt;/code&gt; instances with &lt;code&gt;rdfs:subClassOf&lt;/code&gt; relationships. This taxonomy drives some of the rules used to identify data problems. It also provides input to machine learning steps that help to identify new relationships about items; this process uses word embeddings (which I described in &lt;a href=&#34;../docembeddings&#34;&gt;Document analysis with machine learning&lt;/a&gt;) and the k-Nearest Neighbors algorithm to identify taxonomy classifications based on product names.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s great to see SPARQL play such an important role in a powerful, useful system that takes advantage of several other interesting technologies. I especially like seeing their use of SPARQL Update—between the &amp;ldquo;QL&amp;rdquo; in &amp;ldquo;SPARQL&amp;rdquo; and the way that Wikidata is driving much of SPARQL&amp;rsquo;s current popularity, many people don&amp;rsquo;t realize that SPARQL is not a read-only technology. I also loved seeing a well-known brand name use and publicize SPARQL&amp;rsquo;s power, as you can see in this tweet:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#39;https://twitter.com/strafstrudel/status/1525144465299718144&#39;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/instacartSPARQLTweet.png&#34; class=&#34;centered&#34; alt=&#34;Instacart SPARQL Tweet&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1553763479605186563&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2022">2022</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/knowledge-graphs">knowledge-graphs</category>
      
    </item>
    
    <item>
      <title>Generating websites with SPARQL and Snowman, part 2</title>
      <link>https://www.bobdc.com/blog/snowmanartbasept2/</link>
      <pubDate>Sun, 19 Jun 2022 13:05:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/snowmanartbasept2/</guid>
      
      
      <description><div>With Rhizome&#39;s excellent ArtBase SPARQL endpoint. </div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/chatonskyoneplusoneplus.png&#34; style=&#34;margin: 0px 30px 20px 80px;&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;300&#34; alt=&#34;1+1+1+1+1+1+1+1+1+1+1+1 by Grégory Chatonsky&#34;  /&gt;
&lt;p&gt;In &lt;a href=&#34;../snowmanartbasept1&#34;&gt;part one&lt;/a&gt; of this two-part series, we saw how the open source &lt;a href=&#34;https://github.com/glaciers-in-archives/snowman/releases/tag/0.1.0&#34;&gt;Snowman&lt;/a&gt; static web site generator can generate websites with data from a SPARQL endpoint. I showed how I created a sample website project with its &lt;code&gt;snowman new&lt;/code&gt; command and then reconfigured the project to retrieve a list of artists from the  &lt;a href=&#34;https://artbase.rhizome.org/wiki/Main_Page&#34;&gt;Rhizome&lt;/a&gt; ArtBase endpoint, a repository of data about digital artworks since 1999. Here in part two I will build on that to add lists of artists&amp;rsquo; works with links to Rhizome pages about them.&lt;/p&gt;
&lt;h1 id=&#34;add-lists-of-artists-works-with-links-to-more-information&#34;&gt;Add lists of artists&amp;rsquo; works with links to more information&lt;/h1&gt;
&lt;p&gt;To add these lists of the artists&amp;rsquo; works under their names, I started by removing the last two lines of the project&amp;rsquo;s &lt;code&gt;views.yaml&lt;/code&gt; file (which was generated by the original &lt;code&gt;snowman new&lt;/code&gt; command) and the &lt;code&gt;templates/static.html&lt;/code&gt; file that the  last line pointed to because I didn&amp;rsquo;t need that additional view:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;views:
  - output: &amp;quot;index.html&amp;quot;
    query: &amp;quot;index.rq&amp;quot;
    template: &amp;quot;index.html&amp;quot;
  - output: &amp;quot;static/index.html&amp;quot;
    template: &amp;quot;static.html&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Snowman &lt;a href=&#34;https://github.com/glaciers-in-archives/snowman#readme&#34;&gt;github readme&lt;/a&gt; file tells you more about views.&lt;/p&gt;
&lt;p&gt;A lot of incremental development in Snowman consists of adding to a query such as the &lt;code&gt;queries/index.rq&lt;/code&gt; one that I started editing in part one and then editing the corresponding display template to take advantage of the new parts of the query. I gradually worked the query in &lt;code&gt;queries/index.rq&lt;/code&gt; up to the following. It asks for artist names and their works that have &amp;ldquo;Flash&amp;rdquo; in their list of tags:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rt: &amp;lt;https://artbase.rhizome.org/prop/direct/&amp;gt;
SELECT DISTINCT ?artistName ?artist ?searchTag WHERE {
   BIND(&amp;quot;Flash&amp;quot; AS ?searchTag)
   ?artwork rt:P29 ?artist . 
   ?artist rdfs:label ?artistName .
   ?artwork rt:P48 ?artbaseLegacyTags .
   # Compare lower-case versions of both to make it case-insensitive
   FILTER CONTAINS(LCASE(?artbaseLegacyTags),LCASE(?searchTag))
}
ORDER BY (?artistName)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A few notes about this query:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An artwork&amp;rsquo;s &lt;code&gt;rt:P48&lt;/code&gt; value is a comma-delimited list of tags that have been assigned to it. (Some artworks in the dataset do not have tags assigned, so because this triple pattern is not optional, this query would not retrieve any of those.)&lt;/li&gt;
&lt;li&gt;I learned about which properties (such as &lt;code&gt;rt:P48&lt;/code&gt;) do what mostly through exploratory queries and guesswork. Visiting URLs like &lt;a href=&#34;https://artbase.rhizome.org/wiki/Property:P48&#34;&gt;https://artbase.rhizome.org/wiki/Property:P48&lt;/a&gt; would then show me how good my guesses were.&lt;/li&gt;
&lt;li&gt;I could have just put &lt;code&gt;&amp;quot;Flash&amp;quot;&lt;/code&gt; as the second parameter to &lt;code&gt;CONTAINS()&lt;/code&gt; in the &lt;code&gt;FILTER&lt;/code&gt; line instead of storing it in a &lt;code&gt;?searchTag&lt;/code&gt; variable and referencing that. Storing it in a variable at the top of the query made it easier to change to other values to look for other kinds of works, as we&amp;rsquo;ll see below.&lt;/li&gt;
&lt;/ul&gt;
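&lt;p&gt;The case-insensitive matching in the &lt;code&gt;FILTER&lt;/code&gt; line is the same trick you would use in any language: lower-case both sides before comparing. A Python equivalent, with invented tag strings, looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def has_tag(tag_list, search_tag):
    # Compare lower-case versions of both to make it case-insensitive,
    # like FILTER CONTAINS(LCASE(?artbaseLegacyTags), LCASE(?searchTag))
    return search_tag.lower() in tag_list.lower()

print(has_tag(&#34;Flash,net.art,animation&#34;, &#34;flash&#34;))   # True
print(has_tag(&#34;video,sound&#34;, &#34;Flash&#34;))               # False
&lt;/code&gt;&lt;/pre&gt;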
&lt;p&gt;Following up on the query revision above, the new version of the &lt;code&gt;template/index.html&lt;/code&gt; display template shown below has three new things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A slightly different title.&lt;/li&gt;
&lt;li&gt;The artist name in an &lt;code&gt;h2&lt;/code&gt; subhead element with the search value (for example, &amp;ldquo;Flash&amp;rdquo;) appended.&lt;/li&gt;
&lt;li&gt;A Snowman &lt;code&gt;include&lt;/code&gt; function to insert more content. It names an HTML template to format the inserted content, a query to generate values for the new template, and a parameter to pass to the query: the &lt;code&gt;?artist&lt;/code&gt; value (a URL) retrieved by the main query above.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;{{ template &amp;#34;base&amp;#34; . }}
{{ define &amp;#34;title&amp;#34; }}Rhizome Artbase Artists and Works {{ end }}

{{ define &amp;#34;content&amp;#34; }}
&amp;lt;h1&amp;gt;Rhizome Artbase Artists and Works&amp;lt;/h1&amp;gt;
&amp;lt;ul&amp;gt;
    {{ range . }}
&amp;lt;h2&amp;gt;Artist: {{ .artistName }} ({{ .searchTag }} and other works)&amp;lt;/h2&amp;gt;
{{ include &amp;#34;artistsWorks.html&amp;#34; (query &amp;#34;artistsWorks.rq&amp;#34; .artist.String) }}
{{ end }}
&amp;lt;/ul&amp;gt;
{{ end }}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Below is the &lt;code&gt;queries/artistsWorks.rq&lt;/code&gt; query referenced by the &lt;code&gt;include&lt;/code&gt; function above. The &lt;code&gt;artist&lt;/code&gt; value passed to it by the template above is plugged in using the &lt;code&gt;&amp;lt;{{.}}&amp;gt;&lt;/code&gt; construct, which I believe is Go template syntax. (I tried to learn more about it, but it&amp;rsquo;s difficult to do web searches for strings like that. You can see another demonstration of it in Snowman&amp;rsquo;s &lt;code&gt;inline-queries&lt;/code&gt; example project.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rt: &amp;lt;https://artbase.rhizome.org/prop/direct/&amp;gt;
SELECT DISTINCT ?workTitle ?creationDate ?artworkPage ?artbaseLegacyTags WHERE {
  # r:Q676 is the artist Andy Cox if I need to sub it in next line for testing
  ?artwork rt:P29  &amp;lt;{{.}}&amp;gt; ;    # artwork by artist
           rdfs:label ?workTitle;
           rt:P26 ?creationDateTime .
  OPTIONAL { ?artwork rt:P48 ?artbaseLegacyTags . }
  # Don&#39;t need full ISO date value; just yyyy-mm-dd
  BIND(SUBSTR(str(?creationDateTime),1,10) AS ?creationDate)
  ?artworkPage schema:about ?artwork;
               schema:isPartOf &amp;lt;https://artbase.rhizome.org/&amp;gt;.
}
ORDER BY (?workTitle)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I left the qname for one of the artists in a comment near the top of the query because I sometimes replaced the &lt;code&gt;&amp;lt;{{.}}&amp;gt;&lt;/code&gt; with that qname when working out other parts of the query logic.&lt;/p&gt;
&lt;p&gt;Remember that the &lt;code&gt;include&lt;/code&gt; function mentioned both this new &lt;code&gt;queries/artistsWorks.rq&lt;/code&gt; file and the template file that goes with it to format the result: &lt;code&gt;template/artistsWorks.html&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;table&amp;gt;
  &amp;lt;tr&amp;gt;&amp;lt;th width=&amp;quot;200&amp;quot;&amp;gt;title&amp;lt;/th&amp;gt;&amp;lt;th width=&amp;quot;100&amp;quot;&amp;gt;creation date&amp;lt;/th&amp;gt;&amp;lt;th&amp;gt;tags&amp;lt;/th&amp;gt;&amp;lt;/tr&amp;gt;
    {{ range . }}
    &amp;lt;tr&amp;gt;
      &amp;lt;td&amp;gt;&amp;lt;a href=&#39;{{ .artworkPage }}&#39;&amp;gt;{{ .workTitle}}&amp;lt;/a&amp;gt;&amp;lt;/td&amp;gt;
      &amp;lt;td&amp;gt;{{ .creationDate}}&amp;lt;/td&amp;gt;
      &amp;lt;td&amp;gt;{{ .artbaseLegacyTags }}&amp;lt;/td&amp;gt;
    &amp;lt;/tr&amp;gt;
        {{ end }}
&amp;lt;/table&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This creates a table for each artist&amp;rsquo;s works with a row for each one. The first cell of each row uses the URL stored in the &lt;code&gt;?artworkPage&lt;/code&gt; value retrieved by the &lt;code&gt;artistsWorks.rq&lt;/code&gt; query to create a link to that page—for example, to &lt;a href=&#34;https://artbase.rhizome.org/wiki/Q3050&#34;&gt;this page&lt;/a&gt; for one of the retrieved works.&lt;/p&gt;
&lt;p&gt;Once the additions and modifications have been made to the files described so far, a &lt;code&gt;snowman build&lt;/code&gt; creates a new &lt;code&gt;site/index.html&lt;/code&gt; file with the work lists under each artist&amp;rsquo;s name.&lt;/p&gt;
&lt;h1 id=&#34;looking-more-stylish&#34;&gt;Looking more stylish&lt;/h1&gt;
&lt;p&gt;I added some simple CSS, but first, in the &lt;code&gt;templates/layouts/default.html&lt;/code&gt; file in the project, I took the / out of the following line so that the generated &lt;code&gt;index.html&lt;/code&gt; file would look for &lt;code&gt;style.css&lt;/code&gt; in the same directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;link rel=&amp;quot;stylesheet&amp;quot; href=&amp;quot;/style.css&amp;quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There was already a &lt;code&gt;style.css&lt;/code&gt; file in the project&amp;rsquo;s &lt;code&gt;static&lt;/code&gt; directory. I replaced its contents with the following minimal CSS:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;* { font-family: arial,helvetica; font-size:12pt; }

body {
    margin: .25in .5in .25in .5in; /* t,r,b,l */
    font-family: arial,helvetica; 
}

th {
    text-align: left;
    background: lightgray;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Another &lt;code&gt;snowman build&lt;/code&gt; then created a version of the page that looks like the one that I previewed in part one of this series:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/snowmanArtbasePreview.png&#34; class=&#34;centered&#34;  alt=&#34;Preview of Snowman ArtBase project&#34;/&gt;
&lt;h1 id=&#34;query-for-3d-works&#34;&gt;Query for 3D works&lt;/h1&gt;
&lt;p&gt;I mentioned how I stored the string &amp;ldquo;Flash&amp;rdquo; in the &lt;code&gt;?searchTag&lt;/code&gt; variable of the &lt;code&gt;queries/index.rq&lt;/code&gt; query to make it easier to have this query search for artwork tagged with other values. After changing this variable&amp;rsquo;s value to &amp;ldquo;3D&amp;rdquo; and doing another build, the top of the &lt;code&gt;index.html&lt;/code&gt; file looked like this:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/snowmanArtbasePreview2.png&#34; class=&#34;centered&#34;  alt=&#34;Snowman ArtBase project listing 3D works&#34;/&gt;
&lt;p&gt;The image at the top of this blog entry is from &lt;a href=&#34;https://artbase.rhizome.org/wiki/Q3273&#34;&gt;1+1+1+1+1+1+1+1+1+1+1+1&lt;/a&gt; by &lt;a href=&#34;https://artbase.rhizome.org/wiki/Q1128&#34;&gt;Grégory Chatonsky&lt;/a&gt;, one of the works tagged as 3D.&lt;/p&gt;
&lt;p&gt;The search for the keyword is just a simple substring search of the CSV list. If a work had been tagged with &amp;ldquo;&lt;a href=&#34;https://www.youtube.com/watch?v=jPNVOxZ7Ius#t=2m50s&#34;&gt;3DogNight&lt;/a&gt;&amp;rdquo;, that also would have been retrieved in the search for &amp;ldquo;3D&amp;rdquo;. For a more serious search of tags, I would make a copy of the keyword list that was not only all lower-case but also had spaces removed and began and ended with commas &amp;ldquo;,like,this,&amp;rdquo;. Then, a search of that for a version of &lt;code&gt;?searchTag&lt;/code&gt; enclosed by commas such as &amp;ldquo;,3d,&amp;rdquo; would be more accurate.&lt;/p&gt;
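&lt;p&gt;That comma-delimited matching is simple to implement. Here is a Python sketch of the normalization just described; the tag strings are my own examples:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def has_exact_tag(tag_list, search_tag):
    # Lower-case, remove spaces, and wrap in commas &#34;,like,this,&#34;
    normalized = &#34;,&#34; + tag_list.lower().replace(&#34; &#34;, &#34;&#34;) + &#34;,&#34;
    # A comma-enclosed search value such as &#34;,3d,&#34; then matches whole tags only
    return (&#34;,&#34; + search_tag.lower() + &#34;,&#34;) in normalized

print(has_exact_tag(&#34;Flash, 3D, net.art&#34;, &#34;3d&#34;))   # True
print(has_exact_tag(&#34;3DogNight, video&#34;, &#34;3d&#34;))     # False
&lt;/code&gt;&lt;/pre&gt;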
&lt;h1 id=&#34;possible-next-steps&#34;&gt;Possible next steps&lt;/h1&gt;
&lt;p&gt;I zipped up my &lt;a href=&#34;http://www.bobdc.com/miscfiles/artbase.zip&#34;&gt;artbase snowman project&lt;/a&gt; and made it available so that you can unzip it in your own Snowman &lt;code&gt;examples&lt;/code&gt; directory and try it out. The &amp;ldquo;Getting started with Snowman&amp;rdquo; section of the &lt;a href=&#34;https://github.com/glaciers-in-archives/snowman/releases/tag/0.1.0&#34;&gt;Snowman&lt;/a&gt; home page has brief descriptions of other projects included in that &lt;code&gt;examples&lt;/code&gt; directory as part of the Snowman distribution. Each of those demonstrates other features that you can incorporate into your own Snowman website projects. One included example that is not listed there is &lt;code&gt;nested-lists-with-single-query&lt;/code&gt;, which shows a way to &amp;ldquo;render nested lists without the need of multiple queries or views&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;You can apply these features to your own copy of my &lt;code&gt;artbase&lt;/code&gt; project, or you can apply them to your own new Snowman projects that you create with &lt;code&gt;snowman new&lt;/code&gt;. Let me know how it turns out!&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1538571776636198915&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2022">2022</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Generating websites with SPARQL and Snowman, part 1</title>
      <link>https://www.bobdc.com/blog/snowmanartbasept1/</link>
      <pubDate>Sun, 22 May 2022 11:20:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/snowmanartbasept1/</guid>
      
      
      <description><div>With Rhizome&#39;s excellent ArtBase SPARQL endpoint. </div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/colorFieldTelevision.png&#34; style=&#34;margin: 0px 30px 20px 80px;&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;400&#34; alt=&#34;Color Field Television by Andrew Venell&#34; title=&#34;Color Field Television by Andrew Venell&#34;  /&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/glaciers-in-archives/snowman/releases/tag/0.1.0&#34;&gt;Snowman&lt;/a&gt; is an open-source project that generates static web sites from data served up by SPARQL endpoints. The history of the web is full of sites generated from relational database back ends, so it&amp;rsquo;s nice to see this significant step toward doing it with RDF data.&lt;/p&gt;
&lt;p&gt;Snowman is written in the Go programming language. The Hugo tool that I use to &lt;a href=&#34;../changing-my-blogs-domain-name/&#34;&gt;generate this website&lt;/a&gt; is also written using Go, and as with Hugo, no knowledge of Go is required to use Snowman. (If you do learn some Go, it&amp;rsquo;s &lt;a href=&#34;../rdf2modsxml/&#34;&gt;pretty cool&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;I built a website around Rhizome&amp;rsquo;s ArtBase project to get to know Snowman better. &lt;a href=&#34;https://artbase.rhizome.org/wiki/Main_Page&#34;&gt;Rhizome&lt;/a&gt;, as their home page describes, &amp;ldquo;is an archive of born-digital artworks from 1999 to the present day&amp;rdquo; affiliated with &lt;a href=&#34;https://www.newmuseum.org/&#34;&gt;The New Museum&lt;/a&gt; in New York City. When you think of museum art preservation work, you usually think of preservationists dealing with fading paint colors and cracks in artwork; the Rhizome project is doing the difficult work of maintaining an infrastructure to present older computer-based art that often relies on obsolete technology such as &lt;a href=&#34;https://www.adobe.com/products/flashplayer/end-of-life.html&#34;&gt;Flash&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And, Rhizome makes the data about their collection available as a SPARQL endpoint! &lt;a href=&#34;https://artbase.rhizome.org/wiki/Query&#34;&gt;It has good documentation&lt;/a&gt; that links to their &lt;a href=&#34;https://query.artbase.rhizome.org/&#34;&gt;endpoint&amp;rsquo;s HTML interface&lt;/a&gt; in addition to describing it. It does not mention the actual endpoint URL, which is &lt;code&gt;https://query.artbase.rhizome.org/proxy/wdqs/bigdata/namespace/wdq/sparql&lt;/code&gt;, but the HTML interface does something related that is very handy: after you run a query on the HTML front end, the &amp;ldquo;&amp;lt;/&amp;gt; Code&amp;rdquo; link in the upper-right of the results displays an escaped version of the query with the actual endpoint URL. You can pass this to &lt;a href=&#34;./curling-sparql&#34;&gt;curl&lt;/a&gt; or other tools to build applications around this data.&lt;/p&gt;
&lt;p&gt;In Snowman, a given website is built around a specified endpoint, and the set of files used to create that website are known as a project. The &lt;a href=&#34;https://github.com/glaciers-in-archives/snowman#readme&#34;&gt;github readme file&lt;/a&gt; does a nice job of explaining all the pieces of a project and how they fit together. This includes a writeup of the &lt;code&gt;snowman new&lt;/code&gt; command, which generates a skeleton project that you can modify to use your own data and presentation. (The readme also describes the straightforward process for installing and building Snowman.) In this two-part series I will walk through the steps I used to create a web page listing artists and their works where the work had been tagged in the data with a particular keyword such as &amp;ldquo;Flash&amp;rdquo; or &amp;ldquo;3D&amp;rdquo;.&lt;/p&gt;
&lt;h1 id=&#34;create-load-and-view-a-sample-project&#34;&gt;Create, load, and view a sample project&lt;/h1&gt;
&lt;p&gt;The Snowman project includes an &lt;code&gt;examples&lt;/code&gt; directory with several projects that you can explore to learn more about Snowman&amp;rsquo;s features. The following command from within that directory created an &lt;code&gt;artbase&lt;/code&gt; project as a sibling of the other examples. (As you may have guessed from the &lt;code&gt;../&lt;/code&gt; part, after you build Snowman the &lt;code&gt;snowman&lt;/code&gt; binary is in the parent directory of &lt;code&gt;examples&lt;/code&gt;.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; ../snowman new --directory=&amp;quot;artbase&amp;quot;     # directory should not already exist
 Your project has been created in: artbase
 You can now run:
 cd artbase
 snowman build
 snowman server
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &amp;ldquo;you can now run&amp;rdquo; commands that it suggests assume that &lt;code&gt;snowman&lt;/code&gt; is in your path. If not, point to it like I did in the &lt;code&gt;../&lt;/code&gt; call to it above.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;snowman server&lt;/code&gt; command suggested by the &lt;code&gt;snowman new&lt;/code&gt; output started up a server at http://127.0.0.1:8000/, where I could see the results of the &lt;code&gt;site/index.html&lt;/code&gt; file generated by &lt;code&gt;snowman build&lt;/code&gt;. Instead of running the server, you can just load the &lt;code&gt;site/index.html&lt;/code&gt; file created by &lt;code&gt;snowman build&lt;/code&gt; directly into your browser, which was what I did for most of my development. The advantage of the server is the ability to use features like AJAX requests and fancier JavaScript things that won&amp;rsquo;t work with files loaded using &lt;code&gt;file://&lt;/code&gt; URLs.&lt;/p&gt;
&lt;h1 id=&#34;point-the-project-to-the-artbase-endpoint-instead-of-the-default-one&#34;&gt;Point the project to the ArtBase endpoint instead of the default one&lt;/h1&gt;
&lt;p&gt;The default &lt;code&gt;site/index.html&lt;/code&gt; file created by &lt;code&gt;snowman build&lt;/code&gt; (not to be confused with the &lt;code&gt;site/static/index.html&lt;/code&gt; file that it also creates) tells us that the endpoint to query is specified in the &lt;code&gt;snowman.yaml&lt;/code&gt; file, so in that file I changed the endpoint URL from &lt;a href=&#34;https://query.wikidata.org/sparql&#34;&gt;https://query.wikidata.org/sparql&lt;/a&gt; to &lt;a href=&#34;https://query.artbase.rhizome.org/proxy/wdqs/bigdata/namespace/wdq/sparql&#34;&gt;https://query.artbase.rhizome.org/proxy/wdqs/bigdata/namespace/wdq/sparql&lt;/a&gt;. The default query created by &lt;code&gt;snowman new&lt;/code&gt; in &lt;code&gt;queries/index.rq&lt;/code&gt; just asks for any ten triples, so it should work with any endpoint. After revising the endpoint URL I did another &lt;code&gt;snowman build&lt;/code&gt;, reloaded the browser, and saw ten triples from the ArtBase project instead of from Wikidata. (Sometimes I saw the same triples that I saw with the Wikidata endpoint: triples defining the &lt;a href=&#34;https://schema.org/&#34;&gt;schema.org&lt;/a&gt; ontology that they both use. In the next step we will definitely see ArtBase triples in the result.)&lt;/p&gt;
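&lt;p&gt;That default query is the classic &amp;ldquo;show me anything&amp;rdquo; SPARQL query. The exact text that &lt;code&gt;snowman new&lt;/code&gt; puts in &lt;code&gt;queries/index.rq&lt;/code&gt; may differ slightly, but it amounts to something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?subject ?predicate ?object
WHERE {
  ?subject ?predicate ?object .
}
LIMIT 10
&lt;/code&gt;&lt;/pre&gt;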
&lt;h1 id=&#34;query-for-artist-names-instead-of-random-triples&#34;&gt;Query for artist names instead of random triples&lt;/h1&gt;
&lt;p&gt;After developing the query below in the ArtBase endpoint&amp;rsquo;s  &lt;a href=&#34;https://query.artbase.rhizome.org/&#34;&gt;HTML interface&lt;/a&gt;, I changed the default query in &lt;code&gt;queries/index.rq&lt;/code&gt; to this query so that my Snowman project would ask the ArtBase endpoint for artists who had any artworks, in alphabetical order:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rt:   &amp;lt;https://artbase.rhizome.org/prop/direct/&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
SELECT DISTINCT ?artistName WHERE {
  ?artwork rt:P29 ?artist . 
  ?artist rdfs:label ?artistName .
}
ORDER BY (?artistName)
LIMIT 250
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The last time I checked there were 1,268 artists in the dataset, so the &lt;code&gt;LIMIT&lt;/code&gt; line helped to speed the edit-reload cycle. A later version of this query will filter based on artwork tags as another way to limit the number of displayed artists.&lt;/p&gt;
&lt;h1 id=&#34;adjust-the-display-template-to-use-data-from-the-revised-query&#34;&gt;Adjust the display template to use data from the revised query&lt;/h1&gt;
&lt;p&gt;You can see above that the revised &lt;code&gt;queries/index.rq&lt;/code&gt; query binds values to an &lt;code&gt;?artistName&lt;/code&gt; variable, so I replaced the contents of the default &lt;code&gt;templates/index.html&lt;/code&gt; file (which had a lot of other markup in it to demo various Snowman features) with the following so that the &lt;code&gt;?artistName&lt;/code&gt; values would get inserted where I wanted them. The Go template &lt;code&gt;range&lt;/code&gt; keyword iterates through a list passed to it; in the following that will create a new &lt;code&gt;li&lt;/code&gt; element inside the &lt;code&gt;ul&lt;/code&gt; element for each &lt;code&gt;?artistName&lt;/code&gt; value:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{{ template &amp;quot;base&amp;quot; . }}
{{ define &amp;quot;title&amp;quot; }}Rhizome Artbase Artists{{ end }}

{{ define &amp;quot;content&amp;quot; }}
&amp;lt;h1&amp;gt;Rhizome Artbase Artists&amp;lt;/h1&amp;gt;
&amp;lt;ul&amp;gt;
    {{ range . }}
&amp;lt;li&amp;gt;{{ .artistName }}&amp;lt;/li&amp;gt;
    {{ end }}
&amp;lt;/ul&amp;gt;
{{ end }}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After another rebuild and reload, the browser showed a bulleted list of the first 250 artist names.&lt;/p&gt;
&lt;p&gt;In part two, we&amp;rsquo;ll see how to add a query that lists the work of artists for whom at least one artwork has a particular tag, such as Flash, and I&amp;rsquo;ll add CSS. Below is a screenshot of the eventual end result:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/snowmanArtbasePreview.png&#34; class=&#34;centered&#34;  alt=&#34;Preview of Snowman ArtBase project&#34;/&gt;
&lt;p&gt;Each work title on the left is a link to a Rhizome page about the work so that you can see it along with a description and other metadata. &lt;a href=&#34;https://artbase.rhizome.org/wiki/Q2564&#34;&gt;Kriegspiel&lt;/a&gt; is one example from the illustration above. The &lt;a href=&#34;https://artbase.rhizome.org/wiki/Q2513&#34;&gt;Color Field Television&lt;/a&gt; image by &lt;a href=&#34;https://artbase.rhizome.org/wiki/Q1122&#34;&gt;Andrew Venell&lt;/a&gt; at the top of this blog entry is another artwork that the generated report links to.&lt;/p&gt;
&lt;p&gt;The ability to do a hierarchical display of the returned result like this is a nice contribution to the world of RDF development, because SPARQL queries normally return either a flat table or triples. (You can still see some repetition in the screenshot because the dataset stores the tags as delimited lists and some works have more than one such list.) Watch this space to see what I did to the Snowman project files to get this result!&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1528398147864735751&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2022">2022</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Queries to explore a dataset</title>
      <link>https://www.bobdc.com/blog/exploringadataset/</link>
      <pubDate>Sat, 30 Apr 2022 08:09:06 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/exploringadataset/</guid>
      
      
      <description><div>Even a schemaless one. </div><div>&lt;!-- image from https://www.loc.gov/pictures/resource/cph.3a03809/ --&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/explorerMacMillan.png&#34; style=&#34;margin: 0px 30px 20px 80px;&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;300&#34;  /&gt;
&lt;p&gt;I recently worked on a project where we had a huge amount of RDF and no clue what was in there apart from what we saw by looking at random triples. I developed a few SPARQL queries to give us a better idea of the dataset&amp;rsquo;s content and structure, and these queries are generic enough that they could be useful to other people.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve written about other exploratory queries before. In &lt;a href=&#34;https://www.bobdc.com/blog/exploring-a-sparql-endpoint/&#34;&gt;Exploring a SPARQL Endpoint&lt;/a&gt; I wrote about queries that look for common vocabularies that might be in use at a particular endpoint, and how getting a few clues led me to additional related queries. That blog post also mentioned the “Exploring the Data” section of my book &lt;a href=&#34;http://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt;, which has other generally useful queries.&lt;/p&gt;
&lt;p&gt;You can see those listed in the book&amp;rsquo;s &lt;a href=&#34;http://www.learningsparql.com/toc.html&#34;&gt;table of contents&lt;/a&gt;; they often assume that some sort of schema or ontology is in use. A great thing about SPARQL and RDF, though, is that with no knowledge of a schema or any other clues about a dataset&amp;rsquo;s contents, simple queries can still let you explore that dataset to see what&amp;rsquo;s there. Today&amp;rsquo;s exploratory queries were not included among those that I described above.&lt;/p&gt;
&lt;p&gt;Example output for each query uses the &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/BeatlesMusicians.ttl&#34;&gt;Beatles Musicians&lt;/a&gt; dataset that I described at &lt;a href=&#34;https://www.bobdc.com/blog/sparql-queries-of-beatles-reco/&#34;&gt;SPARQL queries of Beatles recording sessions&lt;/a&gt;.&lt;/p&gt;
&lt;h1 id=&#34;how-many-triples-does-this-dataset-have-in-all&#34;&gt;How many triples does this dataset have in all?&lt;/h1&gt;
&lt;pre&gt;&lt;code&gt;SELECT (COUNT(*) AS ?tripleCount) WHERE {
   ?s ?p ?o
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Definitely a hall of fame, classic query. Here is the result for the Beatles musician data after performing the query with the Jena &lt;a href=&#34;https://jena.apache.org/documentation/tools/index.html&#34;&gt;arq&lt;/a&gt; command line query engine:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---------------
| tripleCount |
===============
| 4089        |
---------------
&lt;/code&gt;&lt;/pre&gt;
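&lt;p&gt;For reference, a result like the one above comes from an &lt;code&gt;arq&lt;/code&gt; invocation along these lines (the query file name here is my own choice):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;arq --data BeatlesMusicians.ttl --query tripleCount.rq
&lt;/code&gt;&lt;/pre&gt;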
&lt;h1 id=&#34;show-all-the-types-being-used&#34;&gt;Show all the types being used&lt;/h1&gt;
&lt;p&gt;Never mind whether any types were declared; how many types are used? List them, but don&amp;rsquo;t repeat any.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?type WHERE {
   ?s a ?type
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result with the Beatles musician data:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;----------------------------------------------------
| type                                             |
====================================================
| &amp;lt;http://learningsparql.com/ns/schema/Song&amp;gt;       |
| &amp;lt;http://learningsparql.com/ns/schema/Musician&amp;gt;   |
| &amp;lt;http://learningsparql.com/ns/schema/Instrument&amp;gt; |
----------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;h1 id=&#34;count-instances-per-type&#34;&gt;Count instances per type&lt;/h1&gt;
&lt;p&gt;Of the types that the previous query found being used, how many instances of each are there? This is useful when you are prioritizing what you&amp;rsquo;re going to do with the data.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT  ?type (COUNT (?s) AS ?instanceCount) 
WHERE {
   ?s a ?type . 
}
GROUP BY  ?type
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;--------------------------------------------------------------------
| type                                             | instanceCount |
====================================================================
| &amp;lt;http://learningsparql.com/ns/schema/Instrument&amp;gt; | 180           |
| &amp;lt;http://learningsparql.com/ns/schema/Song&amp;gt;       | 293           |
| &amp;lt;http://learningsparql.com/ns/schema/Musician&amp;gt;   | 238           |
--------------------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;h1 id=&#34;count-the-properties-that-each-type-uses&#34;&gt;Count the properties that each type uses&lt;/h1&gt;
&lt;p&gt;Of the types that were found above, how many different properties does each use?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?type (COUNT(DISTINCT ?p) AS ?c)
WHERE {
   ?s a ?type . 
   ?s ?p ?o . 
}
GROUP BY ?type
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Number of properties used in the Beatles data, by type:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;----------------------------------------------------------
| type                                             | c   |
==========================================================
| &amp;lt;http://learningsparql.com/ns/schema/Instrument&amp;gt; | 2   |
| &amp;lt;http://learningsparql.com/ns/schema/Song&amp;gt;       | 182 |
| &amp;lt;http://learningsparql.com/ns/schema/Musician&amp;gt;   | 2   |
----------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The next query will show us why the &lt;code&gt;Song&lt;/code&gt; class uses so many properties.&lt;/p&gt;
&lt;h1 id=&#34;list-properties-per-type&#34;&gt;List properties per type&lt;/h1&gt;
&lt;p&gt;What are these properties that each type uses? This is also useful for prioritization. Note the similarities with and differences from the previous query.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?type ?property
WHERE {
   ?s a ?type .
   ?s ?property ?o .
}
ORDER BY ?type ?property
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following is an excerpt from the middle of this query&amp;rsquo;s result, with &lt;code&gt;&amp;lt;http://learningsparql.com/ns/schema/Song&amp;gt;&lt;/code&gt; reduced to &lt;code&gt;s:Song&lt;/code&gt; to make it all fit better here. This sample shows that all the different instruments, with all their different spellings, were properties of each song. (Read more about how that worked in my  &lt;a href=&#34;https://www.bobdc.com/blog/sparql-queries-of-beatles-reco/&#34;&gt;SPARQL queries of Beatles recording sessions&lt;/a&gt; blog post.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;| s:Song | &amp;lt;http://learningsparql.com/ns/instrument/guiro&amp;gt;
| s:Song | &amp;lt;http://learningsparql.com/ns/instrument/guitar&amp;gt;
| s:Song | &amp;lt;http://learningsparql.com/ns/instrument/handbell&amp;gt;
| s:Song | &amp;lt;http://learningsparql.com/ns/instrument/handclaps&amp;gt;
| s:Song | &amp;lt;http://learningsparql.com/ns/instrument/harmonica&amp;gt;
| s:Song | &amp;lt;http://learningsparql.com/ns/instrument/harmonium&amp;gt;
| s:Song | &amp;lt;http://learningsparql.com/ns/instrument/harmonyvocals&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;
&lt;h1 id=&#34;have-a-query-create-a-schema-for-this-schemaless-data&#34;&gt;Have a query create a schema for this schemaless data&lt;/h1&gt;
&lt;p&gt;Consider that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The dataset has no schema but we found types being used&lt;/li&gt;
&lt;li&gt;We found properties associated with these types&lt;/li&gt;
&lt;li&gt;Schemas are themselves datasets of triples&lt;/li&gt;
&lt;li&gt;SPARQL lets you create triples&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This all adds up to the ability to create a schema where there isn&amp;rsquo;t any. In fact, we can do it with a slight variation on the last query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; 

CONSTRUCT {
   ?type a rdfs:Class .
   ?property a rdf:Property .
}
WHERE {
  ?s a ?type .
  ?s ?property ?o .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note how the &lt;code&gt;WHERE&lt;/code&gt; clause of this query is identical to the one from the preceding &lt;code&gt;SELECT&lt;/code&gt; query. Here is an excerpt of what it created with the Beatles session data:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;s:Instrument  rdf:type  rdfs:Class .
s:Song  rdf:type  rdfs:Class .
s:Musician  rdf:type  rdfs:Class .
i:recorder  rdf:type  rdf:Property .
i:celesta  rdf:type  rdf:Property .
i:tabla  rdf:type  rdf:Property .
i:tenorsaxophone  rdf:type  rdf:Property .
rdfs:label  rdf:type  rdf:Property .
i:harmonica  rdf:type  rdf:Property .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We could go a little further by having the schema use the &lt;code&gt;rdfs:domain&lt;/code&gt; and &lt;code&gt;rdfs:range&lt;/code&gt; properties to associate the declared properties with the classes that the query found them with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;

CONSTRUCT {
  ?type a rdfs:Class .
  ?property a rdf:Property .
  ?property rdfs:domain ?type .
  ?property rdfs:range ?otype . 
}
WHERE {
  ?s a ?type  .
  ?s ?property ?o .
  OPTIONAL { ?o a ?otype }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Along with the schema triples you see above, this new version adds triples like these:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;i:banjo  rdf:type    rdf:Property ;
        rdfs:domain  s:Song ;
        rdfs:range   s:Musician .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It also gives the &lt;code&gt;rdfs:label&lt;/code&gt; property &lt;code&gt;rdfs:domain&lt;/code&gt; values of &lt;code&gt;s:Instrument&lt;/code&gt;, &lt;code&gt;s:Musician&lt;/code&gt;, and &lt;code&gt;s:Song&lt;/code&gt;, which isn&amp;rsquo;t quite right; as the RDFS spec &lt;a href=&#34;https://www.w3.org/TR/rdf-schema/#ch_label&#34;&gt;tells us&lt;/a&gt;, &amp;ldquo;[t]he &lt;code&gt;rdfs:domain&lt;/code&gt; of &lt;code&gt;rdfs:label&lt;/code&gt; is &lt;code&gt;rdfs:Resource&lt;/code&gt;&amp;rdquo;. The spec also &lt;a href=&#34;https://www.w3.org/TR/rdf-schema/#ch_domain&#34;&gt;tells us&lt;/a&gt; that &amp;ldquo;the resources denoted by subjects of triples with predicate P are instances of all the classes stated by the &lt;code&gt;rdfs:domain&lt;/code&gt; properties&amp;rdquo;, which in the case of my example means that every instance with an &lt;code&gt;rdfs:label&lt;/code&gt; property is an instrument, a musician, and a song.&lt;/p&gt;
&lt;p&gt;We clearly don&amp;rsquo;t want to say that, but if you are creating a schema for a dataset that lacks one, &lt;code&gt;CONSTRUCT&lt;/code&gt; queries like this can give you a big head start. Just run one or the other with the dataset and then edit the schema that it creates as you see fit.&lt;/p&gt;
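&lt;p&gt;One such edit can happen in the query itself. This variation (my own tweak, not from the original) filters out &lt;code&gt;rdfs:label&lt;/code&gt; so that no misleading domain and range triples get generated for it in the first place:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;

CONSTRUCT {
  ?type a rdfs:Class .
  ?property a rdf:Property .
  ?property rdfs:domain ?type .
  ?property rdfs:range ?otype .
}
WHERE {
  ?s a ?type .
  ?s ?property ?o .
  OPTIONAL { ?o a ?otype }
  FILTER (?property != rdfs:label)  # its real domain is rdfs:Resource
}
&lt;/code&gt;&lt;/pre&gt;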
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1520446102658592768&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2022">2022</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/digital-humanities">digital-humanities</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
      <category domain="https://www.bobdc.com//categories/sql">SQL</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Doing a podcast interview about technical writing</title>
      <link>https://www.bobdc.com/blog/ieee-podcast/</link>
      <pubDate>Sun, 06 Mar 2022 12:10:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/ieee-podcast/</guid>
      
      
      <description><div>History, tools, and more.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/se-radio-logo.png&#34; style=&#34;margin: 0px 30px 20px 80px;&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;350&#34;  /&gt;
&lt;p&gt;After listening to hundreds of podcast interviews over the years I finally got to be the subject of one myself. Nikhil Krishna interviewed me for the &lt;a href=&#34;https://www.se-radio.net/&#34;&gt;Software Engineering Radio&lt;/a&gt; podcast, which is sponsored by the &lt;a href=&#34;https://www.ieee.org&#34;&gt;IEEE&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It was titled &lt;a href=&#34;https://www.se-radio.net/2022/03/episode-501-bob-ducharme-on-creating-technical-documentation-for-software-projects/&#34;&gt;Bob DuCharme on Creating Technical Documentation for Software Projects&lt;/a&gt;. I&amp;rsquo;m going to quote the episode page&amp;rsquo;s list of topics we discussed, but to practice one of the things I preached, I will convert that page&amp;rsquo;s description to a bulleted list:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The difference between different types of documentation and the audiences they target&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The importance of using proper grammar and clarity in writing good documentation that people want to read&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Other forms of documentation (images, video and audio)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Challenges of maintaining and updating documentation&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Keeping documentation in sync with products&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Toolchains for building documentation&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;History of software documentation tooling and standards&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Another important topic we covered was working with other people in a tech organization such as developers and marketing people.&lt;/p&gt;
&lt;p&gt;After my discussion of XML&amp;rsquo;s role in the history of technical documentation in the interview (basically, a retelling of &lt;a href=&#34;https://www.bobdc.com/blog/a-brief-opinionated-history-of/&#34;&gt;this history of XML&lt;/a&gt; that I wrote several years ago) I was happy to see that the Software Engineering Radio podcast does offer an &lt;a href=&#34;https://seradio.libsyn.com/rss&#34;&gt;RSS feed&lt;/a&gt; for people to track the podcast guests and topics. You can find other blog entries that I&amp;rsquo;ve written on tech writing topics in the category &lt;a href=&#34;https://www.bobdc.com/categories/documenting-software/&#34;&gt;documenting software&lt;/a&gt; in this blog. The podcast episode page has links to additional relevant material.&lt;/p&gt;
&lt;p&gt;One thing I regretted forgetting to mention in the interview, when we were discussing writing style, was George Orwell&amp;rsquo;s classic essay &lt;a href=&#34;https://www.orwellfoundation.com/the-orwell-foundation/orwell/essays-and-other-works/politics-and-the-english-language/&#34;&gt;Politics and the English Language&lt;/a&gt;. I had meant to recommend that everyone read it but pretend that the title is &amp;ldquo;Technology and the English Language&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;So if you&amp;rsquo;re interested in doing technical writing or just being involved with technical writing tasks, you might find this podcast episode useful.&lt;/p&gt;
&lt;p&gt;&lt;iframe loading=&#34;lazy&#34; style=&#34;border: none;&#34; src=&#34;//html5-player.libsyn.com/embed/episode/id/22301357/height/90/theme/custom/autoplay/no/autonext/no/thumbnail/yes/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/337598/&#34; width=&#34;100%&#34; height=&#34;90&#34; scrolling=&#34;no&#34; allowfullscreen=&#34;allowfullscreen&#34;&gt;&lt;/iframe&gt;&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1500524882383228931&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2022">2022</category>
      
      <category domain="https://www.bobdc.com//categories/documenting-software">documenting software</category>
      
    </item>
    
    <item>
      <title>Taking some RDF beyond what it could do in a relational database</title>
      <link>https://www.bobdc.com/blog/dhconfrdfpart2/</link>
      <pubDate>Sun, 27 Feb 2022 11:02:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/dhconfrdfpart2/</guid>
      
      
      <description><div>Part 2 of 2.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/IndexOfDigitalHumanitiesConferencesHomePage.png&#34; style=&#34;margin: 0px 30px 20px 80px;&#34; border=&#34;0&#34; align=&#34;right&#34;  /&gt;
&lt;p&gt;In my &lt;a href=&#34;../dhconfrdfpart1/&#34;&gt;last posting&lt;/a&gt; I described Carnegie Mellon University&amp;rsquo;s &lt;a href=&#34;https://dh-abstracts.library.cmu.edu/&#34;&gt;Index of Digital Humanities Conferences&lt;/a&gt; project, which makes over 60 years of Digital Humanities research abstracts and relevant metadata available on both the project&amp;rsquo;s website and as a file of zipped CSV that they update often. I also described how I developed scripts to convert all that CSV to some pretty nice RDF and made the scripts available on github. I finished with a promise to follow up by showing some of the things we can do with RDF versions of this data that we can&amp;rsquo;t do (or at least, can&amp;rsquo;t do nearly as easily) with the relational version. And here we are.&lt;/p&gt;
&lt;h2 id=&#34;easier-addition-of-new-properties-that-only-apply-to-a-few-instances-of-some-classes&#34;&gt;Easier addition of new properties that only apply to a few instances of some classes&lt;/h2&gt;
&lt;p&gt;What if you want to store additional data about the abstracts, conferences, or authors? For example, what if you want to store the hashtags associated with the conferences? The Chesapeake Digital Humanities Consortium 2020 conference (&lt;code&gt;&amp;lt;http://rdfdata.org/dha/conference/i170&amp;gt;&lt;/code&gt; in my RDF data) has a &lt;code&gt;dha:url&lt;/code&gt; value of &lt;a href=&#34;https://chesapeakedh.github.io/conference-2020&#34;&gt;https://chesapeakedh.github.io/conference-2020&lt;/a&gt;. That&amp;rsquo;s the conference home page, and if I go there I see that the conference hashtag is #CDHC20. When I&amp;rsquo;m at a conference (or not there and wishing that I was) Twitter searches for the conference&amp;rsquo;s hashtag can tell me interesting things that are going on or about to go on. This means that a Twitter hashtag is a hook to additional information about the conference, as you can see with a &lt;a href=&#34;https://twitter.com/hashtag/cdhc20?src=hash&#34;&gt;search on #CDHC20&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s say that you could only find hashtags for 15% of the conferences. If you were storing the full dataset in relational tables, is it worth adding a new column to the &lt;code&gt;conferences&lt;/code&gt; table to store this value that will be blank for 85% of the rows? In this particular case, it&amp;rsquo;s not even up to me. I would have to convince the team at Carnegie Mellon to add this column to their &lt;code&gt;conferences&lt;/code&gt; table and populate it.&lt;/p&gt;
&lt;p&gt;With RDF, I don&amp;rsquo;t have to worry about any of this. I can create the data when I have it as more triples like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://rdfdata.org/dha/conference/i170&amp;gt; dha:hashtag &amp;#34;#CDHC20&amp;#34; . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(RDF geek note: Instead of storing the hash tag as a literal string value I was tempted to do it as the URL for the Twitter search because resource URIs as objects can then link to other resources. I left it as a string value because the same hashtag might be used with other social media such as &lt;a href=&#34;https://www.instagram.com/explore/tags/cdhc20/&#34;&gt;Instagram&lt;/a&gt;.)&lt;/p&gt;
&lt;h2 id=&#34;linking-to-other-data-sets-out-there-linked-data&#34;&gt;Linking to other data sets out there (Linked Data!)&lt;/h2&gt;
&lt;p&gt;I can also add triples of data that enrich the metadata stored with the project. For example, the RDF I created shows that seven works have a keyword value of &lt;code&gt;http://rdfdata.org/dha/keyword/i6995&lt;/code&gt;, which has the label &amp;ldquo;TEI&amp;rdquo;. Wikipedia tells us that the &lt;a href=&#34;https://en.wikipedia.org/wiki/Text_Encoding_Initiative&#34;&gt;Text Encoding Initiative&lt;/a&gt; is &amp;ldquo;a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s&amp;rdquo;. They&amp;rsquo;ve been putting classic works of literature, along with copious metadata, into XML ever since XML was a &lt;a href=&#34;https://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language&#34;&gt;four-letter word&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If the Text Encoding Initiative has a Wikipedia page, then it also has &lt;a href=&#34;https://www.wikidata.org/wiki/Q780920&#34;&gt;triples in Wikidata&lt;/a&gt;. These show the project&amp;rsquo;s Twitter handle, its Library of Congress authority ID, its home page, and much more. Just as I added the hashtag value for the Chesapeake Digital Humanities conference above with a triple, I can add another triple that connects the Index of Digital Humanities Conferences URI for TEI to all that great information about it in Wikidata:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://rdfdata.org/dha/keyword/i6995&amp;gt; dha:wikidata &amp;lt;http://www.wikidata.org/entity/Q780920&amp;gt; . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This makes the available metadata about the seven Digital Humanities Conferences works tagged this way much richer.&lt;/p&gt;
&lt;h2 id=&#34;easy-federation-and-integration-of-new-data&#34;&gt;Easy federation and integration of new data&lt;/h2&gt;
&lt;p&gt;This goal blurs a bit with &amp;ldquo;Linking to other data sets out there&amp;rdquo; described above, because if you can link to a dataset with a SPARQL endpoint such as Wikidata then you can send it a CONSTRUCT query and retrieve data from it to store with your local data. The &amp;ldquo;Using standards instead of ad-hoc namespaces&amp;rdquo; section of part one of this blog entry was another step toward this kind of integration, because much of the point of using shared vocabularies is the ability to connect your data to other datasets that use the same vocabularies.&lt;/p&gt;
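&lt;p&gt;As a sketch of what that kind of retrieval can look like, the following query (my own illustration; the &lt;code&gt;dha:homepage&lt;/code&gt; property is hypothetical) would run against a local triple store that supports federated queries, follow the &lt;code&gt;dha:wikidata&lt;/code&gt; links described below, and ask the Wikidata endpoint for each linked entity&amp;rsquo;s official website:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dha: &amp;lt;http://rdfdata.org/dha/ns/dh-abstracts/&amp;gt;
PREFIX wdt: &amp;lt;http://www.wikidata.org/prop/direct/&amp;gt;

CONSTRUCT { ?keyword dha:homepage ?homepage . }
WHERE {
  ?keyword dha:wikidata ?wikidataEntity .
  SERVICE &amp;lt;https://query.wikidata.org/sparql&amp;gt; {
    ?wikidataEntity wdt:P856 ?homepage .  # P856 = official website
  }
}
&lt;/code&gt;&lt;/pre&gt;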
&lt;p&gt;Other data sources offer interesting potential connections to the Digital Humanities conference data. One is the &lt;a href=&#34;https://en.wikipedia.org/wiki/Virtual_International_Authority_File&#34;&gt;Virtual International Authority File&lt;/a&gt;, or &lt;a href=&#34;https://viaf.org/&#34;&gt;VIAF&lt;/a&gt;. This has some fairly official data about authors and their works that you can retrieve in RDF. Author names may not always be completely unique, but looking at this data I realized that many authors are self-disambiguating: if your name is &amp;ldquo;John Smith&amp;rdquo;, you know that many other authors share that name, and your middle name is Francis, you may choose to use &amp;ldquo;John Francis Smith&amp;rdquo; or some variation such as &amp;ldquo;J. Frank Smith&amp;rdquo; or &amp;ldquo;Jack F. Smith&amp;rdquo; as your author name to make it easier for people to find the work that you wrote.&lt;/p&gt;
&lt;p&gt;The RDF that my script generated from the Carnegie Mellon data included this in &lt;code&gt;appellations.ttl&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://rdfdata.org/dha/appellation/i13&amp;gt;
        rdf:type        dha:Appellation ;
        dha:id          &amp;#34;13&amp;#34; ;
        dha:first_name  &amp;#34;A. Charles&amp;#34; ;
        dha:last_name   &amp;#34;Muller&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;VIAF has A. Charles Muller at &lt;a href=&#34;https://viaf.org/viaf/117299466/#Muller,_A._Charles,_1953-&#34;&gt;https://viaf.org/viaf/117299466/#Muller,_A._Charles,_1953-&lt;/a&gt;, with 117299466 being their database&amp;rsquo;s unique identifier for this author. We can use that identifier to create the URL &lt;a href=&#34;https://viaf.org/viaf/117299466/rdf.xml&#34;&gt;https://viaf.org/viaf/117299466/rdf.xml&lt;/a&gt; and then download 111 triples about him. We can also download &lt;a href=&#34;http://viaf.org/viaf/data/&#34;&gt;various versions of the entire VIAF dataset&lt;/a&gt;, but that is too many gigabytes for me to do some quick experiments with. If it was loaded into a triple store, a SPARQL query that concatenates the &lt;code&gt;dha:first_name&lt;/code&gt; and &lt;code&gt;dha:last_name&lt;/code&gt; values above could help to automate the connection of conference paper authors to VIAF records.&lt;/p&gt;
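&lt;p&gt;A sketch of such a query, assuming that both datasets are loaded into the same triple store and that the VIAF triples name each person with &lt;code&gt;schema:name&lt;/code&gt; (that property choice is my assumption; check the actual VIAF RDF):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dha:    &amp;lt;http://rdfdata.org/dha/ns/dh-abstracts/&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt;

SELECT ?appellation ?viafEntity WHERE {
  ?appellation dha:first_name ?first ;
               dha:last_name  ?last .
  ?viafEntity schema:name ?viafName .
  # STR() drops any language tag before comparing the concatenated name
  FILTER (STR(?viafName) = CONCAT(?first, &amp;quot; &amp;quot;, ?last))
}
&lt;/code&gt;&lt;/pre&gt;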
&lt;h2 id=&#34;inferencing-finding-new-facts-and-connections&#34;&gt;Inferencing: finding new facts and connections&lt;/h2&gt;
&lt;p&gt;Authors of the conference papers made up their own keywords to assign to their works instead of selecting from a curated taxonomy, so it&amp;rsquo;s one big flat list. I did a little curation myself to give the list some hierarchy, making it easier to find relationships between related papers.&lt;/p&gt;
&lt;p&gt;There were over two dozen keywords that were some variation on &amp;ldquo;TEI&amp;rdquo; or &amp;ldquo;Text Encoding Initiative&amp;rdquo;. In my github project&amp;rsquo;s &lt;code&gt;newrdf&lt;/code&gt; directory I added some triples to the SKOS scheme called &lt;code&gt;keywordScheme&lt;/code&gt; that I described in part one. The &lt;code&gt;modelTriples.ttl&lt;/code&gt; file in that directory begins like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix rdf:   &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix skos:  &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix dha:   &amp;lt;http://rdfdata.org/dha/ns/dh-abstracts/&amp;gt; .
@prefix dhak:  &amp;lt;http://rdfdata.org/dha/keyword/&amp;gt; .

dhak:r10001 a               skos:Concept ;
            skos:inScheme   dha:keywordScheme ;
            skos:prefLabel  &amp;#34;Text Encoding Initiative (TEI)&amp;#34; .

dhak:i1100 skos:broader dhak:r10001 . # generated tei
dhak:i2639 skos:broader dhak:r10001 . # tei and structural markup
dhak:i2641 skos:broader dhak:r10001 . # tei encoding
dhak:i2642 skos:broader dhak:r10001 . # tei markup
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;First, it defines a new SKOS concept called &amp;ldquo;Text Encoding Initiative (TEI)&amp;rdquo;. The triples that follow it say that each of the relevant SKOS concepts generated from the Carnegie Mellon CSV by my automated conversion has this new one as its &lt;code&gt;skos:broader&lt;/code&gt; value, just as &amp;ldquo;dachshund&amp;rdquo; in an animal taxonomy might have a broader value of &amp;ldquo;dog&amp;rdquo; to group together the different breeds. After the &lt;code&gt;dhak:i2642&lt;/code&gt; triple shown above there are 22 more about other TEI-related keywords. (I was tempted to automate the creation of all of these by looking for a substring of &amp;ldquo;tei&amp;rdquo; in the generated keyword concepts, but existing keywords like &amp;ldquo;Wittgenstein&amp;rdquo; and &amp;ldquo;Frankenstein&amp;rdquo; showed me that this was a bad idea.)&lt;/p&gt;
&lt;p&gt;The git repository where I stored all the files for this conversion project has a &lt;a href=&#34;https://github.com/bobdc/dhconf2rdf&#34;&gt;readme&lt;/a&gt; file that shows some queries demonstrating the value added by this additional data modeling of the otherwise flat keyword list. A SPARQL query for all the works tagged &amp;ldquo;tei&amp;rdquo; retrieves a list of 90 of them. A query for all works tagged with something in the taxonomic subtree of &amp;ldquo;Text Encoding Initiative (TEI)&amp;rdquo; finds 132, so adding a little bit of semantics in the form of explicit relationships between related topics made it possible to find more papers about the TEI. A third query in the readme counts how many TEI-related papers were submitted each year for results that could be turned into a chart of the TEI&amp;rsquo;s popularity at these conferences over time:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;1996: 5
1997: 1
1998: 2
2001: 6
2004: 2
2013: 12
2014: 19
2015: 14
2016: 18
2017: 12
2018: 7
2019: 24
2020: 8
2021: 2
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &amp;ldquo;inferencing&amp;rdquo; here is the deduction, based on the little bit of modeling that I did, of connections that were not otherwise explicit between resources described by the dataset.&lt;/p&gt;
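&lt;p&gt;The subtree query can be sketched with a SPARQL 1.1 property path; this is a minimal version built from the triples shown above (the query in the readme may differ in its details):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX dha:    &amp;lt;http://rdfdata.org/dha/ns/dh-abstracts/&amp;gt;
PREFIX dhak:   &amp;lt;http://rdfdata.org/dha/keyword/&amp;gt;
PREFIX skos:   &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt;

SELECT ?title WHERE {
  # zero or more skos:broader steps up to the new TEI concept
  ?keyword skos:broader* dhak:r10001 .
  ?work schema:keywords ?keyword ;
        dha:title ?title .
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Because &lt;code&gt;skos:broader*&lt;/code&gt; matches paths of any length, a query like this keeps working if more levels of hierarchy get added later.&lt;/p&gt;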
&lt;p&gt;The triples in &lt;code&gt;modelTriples.ttl&lt;/code&gt; that enable this, like the RDF triples about conference hash tags, demonstrate how RDF can add value to a dataset that is outside of the control of the person doing the adding. As long as the &lt;code&gt;id&lt;/code&gt; values in the original database keep identifying the same things, we can turn them into URIs that let us connect new kinds of data to the original dataset. It&amp;rsquo;s another great example of the new possibilities that become available when you use RDF to store your data.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1497967289299259394&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2022">2022</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/digital-humanities">digital-humanities</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
      <category domain="https://www.bobdc.com//categories/sql">SQL</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Converting Digital Humanities paper and conference metadata to RDF</title>
      <link>https://www.bobdc.com/blog/dhconfrdfpart1/</link>
      <pubDate>Sun, 30 Jan 2022 11:35:06 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/dhconfrdfpart1/</guid>
      
      
      <description><div>How and why.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/IndexOfDigitalHumanitiesConferencesHomePage.png&#34; style=&#34;margin: 0px 30px 20px 80px;&#34; border=&#34;0&#34; align=&#34;right&#34;  /&gt;
&lt;p&gt;I think that RDF has been very helpful in the field of &lt;a href=&#34;https://en.wikipedia.org/wiki/Digital_humanities&#34;&gt;Digital Humanities&lt;/a&gt; for two reasons: first, because so much of that work involves gaining insight from adding new data sources to a given collection, and second, because a large part of this data is metadata about manuscripts and other artifacts. RDF&amp;rsquo;s flexibility supports both of these very well, and several standard schemas and ontologies have matured in the Digital Humanities community to help coordinate the different data sets.&lt;/p&gt;
&lt;p&gt;Unrelated to RDF, in late 2020 a project at Carnegie Mellon University released &lt;a href=&#34;https://dh-abstracts.library.cmu.edu/&#34;&gt;The Index of Digital Humanities Conferences&lt;/a&gt;. As the project&amp;rsquo;s home page tells us, &amp;ldquo;Browse 7,296 presentations from 500 digital humanities conferences spanning 61 years, featuring 8,651 different authors hailing from 1,853 institutions and 86 countries&amp;rdquo;. These numbers have gone up since the original release of the project. The &lt;a href=&#34;https://dh-abstracts.library.cmu.edu/pages/about/&#34;&gt;About&lt;/a&gt; page and Scott Weingart&amp;rsquo;s &lt;a href=&#34;http://scottbot.net/dh-conf-index/&#34;&gt;blog post&lt;/a&gt; about the project give more good background.&lt;/p&gt;
&lt;p&gt;The presentation abstracts, along with the connections to their presenters and their affiliations, are a gold mine for Digital Humanities research. One of the project&amp;rsquo;s main menus is &lt;a href=&#34;https://dh-abstracts.library.cmu.edu/downloads&#34;&gt;Downloads&lt;/a&gt;, which lets you download all the data used for the project. The &amp;ldquo;Last updated&amp;rdquo; message on that page gives me the impression that they update it several times a week, if not every day. The &amp;ldquo;Full Data&amp;rdquo; zip file that you can download from there has CSV files of all the tables in the project&amp;rsquo;s database.&lt;/p&gt;
&lt;p&gt;According to the project&amp;rsquo;s &lt;a href=&#34;https://dh-abstracts.library.cmu.edu/pages/colophon/&#34;&gt;Colophon&lt;/a&gt;, they store their data  in PostgreSQL and built the interface with Django. I can&amp;rsquo;t blame them for storing the data as a relational database instead of RDF, precisely because tools like Django and Ruby on Rails make it so easy to generate nice websites from relational data.&lt;/p&gt;
&lt;p&gt;Of course, though, I converted it all to RDF, so I&amp;rsquo;m going to describe here how I converted it—or rather, how I built a process to convert it, because I wanted an automated system that could easily be re-run when the CSV data to download gets updated. My next posting will describe the cool new things I could do with the data once it was in RDF, because &amp;ldquo;why bother&amp;rdquo; is an important question for any such project. Here&amp;rsquo;s a preview to whet your appetite:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Easier addition of new properties that only apply to a few instances of some classes&lt;/li&gt;
&lt;li&gt;Linking to other data sets out there (Linked Data!)&lt;/li&gt;
&lt;li&gt;Easy federation and integration of new data&lt;/li&gt;
&lt;li&gt;Inferencing: finding new facts and connections&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I put everything necessary to do the conversion and enhancements on &lt;a href=&#34;https://github.com/bobdc/dhconf2rdf&#34;&gt;github&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I could have loaded the CSV files into a locally running relational database and then used &lt;a href=&#34;https://www.bobdc.com/tags/d2rq/&#34;&gt;D2RQ&lt;/a&gt; as an intermediary layer to treat the relational data as triples. When the Index of Digital Humanities Conferences releases an updated version of their data, though, clearing out the relational data tables and then reloading the updated tables would have been a lot more trouble than just running the short scripts that I wrote, especially if the structure of any of those tables had evolved. And, part of the fun of the conversion was moving beyond the original model to take advantage of relevant standards for easier connection to other projects.&lt;/p&gt;
&lt;h1 id=&#34;converting-the-data&#34;&gt;Converting the data&lt;/h1&gt;
&lt;p&gt;I wanted the ability to re-run my set of scripts and queries to accommodate updated versions of the data, and &amp;ldquo;updated versions&amp;rdquo; could mean two things: some tables of data might have new or revised rows, but there might also be entirely new tables and columns. If the data models evolve, I want my output triples to reflect this evolution. (This has already paid off. When I first wrote up my notes on this conversion, the Index of Digital Humanities Conferences project had 22 tables; now it has 23, and I did not need to revise any of my scripts to include the new table&amp;rsquo;s data.)&lt;/p&gt;
&lt;p&gt;With three of the tables loaded into spreadsheets we can see how one table defines the connections between data in the other two the relational way:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/ThreeDHConfTables.png&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/ThreeDHConfTables.png&#34; class=&#34;centered&#34; width=&#34;600&#34; alt=&#34;Three DH Conference tables as spreadsheets&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;works_keywords.csv&lt;/code&gt; table currently has 13,730 rows. As you can see above, rows 2 and 3 of that spreadsheet tell us that the keywords with IDs 889 (&amp;ldquo;&lt;a href=&#34;https://www.loc.gov/ead/&#34;&gt;ead&lt;/a&gt;&amp;rdquo;) and 2439 (&amp;ldquo;sgml-encoding&amp;rdquo;) have been assigned to work 103, &amp;ldquo;What&amp;rsquo;s Interesting for Humanities Computing About Whitman&amp;rsquo;s Poetry Manuscripts?&amp;rdquo; This database has nine tables whose sole job is recording relationships between other tables, as &lt;code&gt;works_keywords&lt;/code&gt; does for the &lt;code&gt;works&lt;/code&gt; and &lt;code&gt;keywords&lt;/code&gt; tables. (As you&amp;rsquo;ll see, RDF does a better job of expressing such relationships.)&lt;/p&gt;
&lt;p&gt;I used the open source &lt;a href=&#34;https://www.bobdc.com/blog/tarql/&#34;&gt;tarql&lt;/a&gt; tool to convert all the tables to RDF. Here are some excerpts from the initial conversion:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# from keywords.ttl
&amp;lt;http://rdfdata.org/dha/keyword/i889&amp;gt;
        rdf:type   dha:Keyword ;
        dha:id     &amp;#34;889&amp;#34; ;
        dha:title  &amp;#34;ead&amp;#34; .

# from works.ttl
&amp;lt;http://rdfdata.org/dha/work/i103&amp;gt;
        rdf:type        dha:Work ;
        dha:id          &amp;#34;103&amp;#34; ;
        dha:conference  &amp;lt;http://rdfdata.org/dha/conference/i2&amp;gt; ;
        dha:title       &amp;#34;What&amp;#39;s Interesting for Humanities Computing About Whitman&amp;#39;s Poetry Manuscripts?&amp;#34; ;
        dha:work_type   &amp;#34;3&amp;#34; .

# from works_keywords.ttl
&amp;lt;http://rdfdata.org/dha/works_keywords/i1&amp;gt;
        rdf:type     dha:works_keywords ;
        dha:id       &amp;#34;1&amp;#34; ;
        dha:work     &amp;lt;http://rdfdata.org/dha/work/i103&amp;gt; ;
        dha:keyword  &amp;lt;http://rdfdata.org/dha/keyword/i889&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;To convert whatever CSV files happened to be in the downloaded zip file, my &lt;code&gt;makeQueries.pl&lt;/code&gt; perl script reads all of the CSV files that it finds in the &lt;code&gt;dh_conferences_data&lt;/code&gt; subdirectory and:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;If a file has no underscore in its name and is therefore not a list of relationships, the perl script uses a proper-cased singular version of the file&amp;rsquo;s name as a class name for the data it contains—for example, &amp;ldquo;Work&amp;rdquo; for the data in &lt;code&gt;works.csv&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Creates the query that will drive tarql&amp;rsquo;s conversion of the CSV file. &lt;code&gt;makeQueries.pl&lt;/code&gt; reads the property names from the CSV&amp;rsquo;s first line and uses them to create a SPARQL CONSTRUCT query that creates an instance of the class whose name it identified in the previous step. Each data row&amp;rsquo;s ID value (with an &amp;ldquo;i&amp;rdquo; prefix added) is used as the local name of the URI that represents that row&amp;rsquo;s resource. This gives the first work listed (&amp;ldquo;Writing about It: Documentation and Humanities Computing&amp;rdquo;) a URI of &lt;code&gt;http://rdfdata.org/dha/work/i1&lt;/code&gt;, and the 103rd one, which is shown above, a URI of  &lt;code&gt;http://rdfdata.org/dha/work/i103&lt;/code&gt; .&lt;/li&gt;
&lt;li&gt;Writes the query to the &lt;code&gt;dh_conferences_sparql&lt;/code&gt; subdirectory with the same filename as the input CSV file and an extension of &lt;code&gt;rq&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Writes a line to standard out that tells tarql to read this new SPARQL query file, run it, and put the output in the &lt;code&gt;dh_conferences_rdf&lt;/code&gt; subdirectory in a file with the same name as the query and an extension of &lt;code&gt;ttl&lt;/code&gt;. The directions with the script say to redirect its output of all of these tarql calls to a shell script, so when the perl script is done you can run that shell script to do the actual conversion of all that CSV to RDF.&lt;/li&gt;
&lt;/ol&gt;
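&lt;p&gt;For example, the generated query for &lt;code&gt;works.csv&lt;/code&gt; would look roughly like the following. (This is a reconstruction based on the output triples shown above, not the script&amp;rsquo;s verbatim output; tarql binds a variable to each column, named after the CSV header.)&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;
PREFIX dha: &amp;lt;http://rdfdata.org/dha/ns/dh-abstracts/&amp;gt;

CONSTRUCT {
  ?workURI rdf:type       dha:Work ;
           dha:id         ?id ;
           dha:conference ?conferenceURI ;
           dha:title      ?title .
}
WHERE {
  # ?id, ?conference, and ?title come from the CSV header row
  BIND(URI(CONCAT(&amp;#34;http://rdfdata.org/dha/work/i&amp;#34;, ?id)) AS ?workURI)
  BIND(URI(CONCAT(&amp;#34;http://rdfdata.org/dha/conference/i&amp;#34;, ?conference)) AS ?conferenceURI)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Running tarql with a query like this against the CSV file produces the Turtle shown earlier.&lt;/p&gt;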
&lt;p&gt;The &lt;code&gt;makeQueries.pl&lt;/code&gt; perl script also has an array of &lt;code&gt;foreignKeyFields&lt;/code&gt; so that it knows that when a line from one CSV file is referencing an instance of data in another, it should reference it with a URI. (Knowledge graphs!) So, for example, a value of &amp;ldquo;1&amp;rdquo; for a work&amp;rsquo;s conference (The Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities, in Glasgow) is turned into the appropriate URI so that the triple about the &amp;ldquo;Writing about it&amp;rdquo; paper&amp;rsquo;s conference is this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://rdfdata.org/dha/work/i1&amp;gt; dha:conference &amp;lt;http://rdfdata.org/dha/conference/i1&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If the data model of the relational input data ever adds a new column of foreign key references, the perl script will need a slight adjustment to add that column to the &lt;code&gt;foreignKeyFields&lt;/code&gt; array.&lt;/p&gt;
&lt;h1 id=&#34;making-the-rdf-better-than-the-relational-data&#34;&gt;Making the RDF better than the relational data&lt;/h1&gt;
&lt;p&gt;Once you have data as triples—&lt;em&gt;any&lt;/em&gt; triples—you can use SPARQL CONSTRUCT queries to improve that data.&lt;/p&gt;
&lt;h2 id=&#34;using-standards-instead-of-ad-hoc-namespaces&#34;&gt;Using standards instead of ad-hoc namespaces&lt;/h2&gt;
&lt;p&gt;My conversion script puts a lot of resources in namespaces built around my domain name &lt;code&gt;rdfdata.org&lt;/code&gt;. When possible, I&amp;rsquo;d rather that they use standard namespaces. For example, the script above created this in &lt;code&gt;keywords.ttl&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://rdfdata.org/dha/keyword/i2641&amp;gt;
        rdf:type   dha:Keyword ;
        dha:id     &amp;#34;2641&amp;#34; ;
        dha:title  &amp;#34;tei encoding&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If we&amp;rsquo;re using keywords to assign subjects to works, I&amp;rsquo;d rather store information about those keywords using the &lt;a href=&#34;https://www.bobdc.com/categories/skos/&#34;&gt;SKOS&lt;/a&gt; standard, so my &lt;code&gt;keywords2skos.rq&lt;/code&gt; SPARQL query turns the above into this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://rdfdata.org/dha/keyword/i2641&amp;gt;
        rdf:type        skos:Concept ;
        skos:inScheme   dha:keywordScheme ;
        skos:prefLabel  &amp;#34;tei encoding&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Note that it&amp;rsquo;s not actually &lt;em&gt;converting&lt;/em&gt; the &lt;code&gt;http://rdfdata.org/dha/keyword/i2641&lt;/code&gt; resource, but just adding new triples about it in the SKOS namespace. These triples are stored separately from the original, so we don&amp;rsquo;t have to load originals into a triplestore when we use this data in an application.&lt;/p&gt;
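&lt;p&gt;The CONSTRUCT query that does this can be as simple as the following sketch, which is consistent with the input and output triples shown above (the actual &lt;code&gt;keywords2skos.rq&lt;/code&gt; may differ in its details):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
PREFIX dha:  &amp;lt;http://rdfdata.org/dha/ns/dh-abstracts/&amp;gt;

CONSTRUCT {
  ?keyword a skos:Concept ;
           skos:inScheme  dha:keywordScheme ;
           skos:prefLabel ?label .
}
WHERE {
  # every converted keyword resource becomes a SKOS concept
  ?keyword a dha:Keyword ;
           dha:title ?label .
}
&lt;/code&gt;&lt;/pre&gt;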
&lt;p&gt;The conference and abstract data also assigned topics to the various papers, so I did a similar conversion with them, storing them in the SKOS &lt;code&gt;dha:topicScheme&lt;/code&gt; scheme instead of the &lt;code&gt;dha:keywordScheme&lt;/code&gt; one shown above that I used for keywords.&lt;/p&gt;
&lt;p&gt;If I were creating a serious production application, I could take this further. For example, instead of using the property &lt;code&gt;http://rdfdata.org/dha/ns/dh-abstracts/title&lt;/code&gt; to reference the abstracts&amp;rsquo; titles, I could use &lt;code&gt;http://purl.org/dc/elements/1.1/title&lt;/code&gt;, and there is probably an ontology for conferences out there that has defined some of these other properties. (The &lt;a href=&#34;https://schema.org/&#34;&gt;schema.org&lt;/a&gt; &lt;a href=&#34;https://schema.org/Event&#34;&gt;&lt;code&gt;Event&lt;/code&gt;&lt;/a&gt; class looks like it could cover a lot of the latter.)&lt;/p&gt;
&lt;h2 id=&#34;improving-the-links-between-resources&#34;&gt;Improving the links between resources&lt;/h2&gt;
&lt;p&gt;As we saw above, the &lt;code&gt;works_keywords.ttl&lt;/code&gt; RDF file that this process creates from the &lt;code&gt;works_keywords.csv&lt;/code&gt; data ends up with triples like this, which tells us that &lt;code&gt;works_keywords&lt;/code&gt; row &lt;code&gt;i1&lt;/code&gt; represents a link from work &lt;code&gt;i103&lt;/code&gt; to keyword &lt;code&gt;i889&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://rdfdata.org/dha/works_keywords/i1&amp;gt;
        rdf:type     dha:works_keywords ;
        dha:id       &amp;#34;1&amp;#34; ;
        dha:work     &amp;lt;http://rdfdata.org/dha/work/i103&amp;gt; ;
        dha:keyword  &amp;lt;http://rdfdata.org/dha/keyword/i889&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;RDF lets us do better than this relational database style out-of-line linking. Instead of a &amp;ldquo;link&amp;rdquo; resource that references the two linked resources, why not just say in the data about work &lt;code&gt;i103&lt;/code&gt; that it has a keyword of resource &lt;code&gt;i889&lt;/code&gt;? The &lt;code&gt;createWorkKeywordTriples.rq&lt;/code&gt; query does just that, reading the above triples and creating a new &lt;code&gt;workKeywordTriples.ttl&lt;/code&gt; file in the &lt;code&gt;newrdf&lt;/code&gt; subdirectory that has triples like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;dhaw:i103 schema:keywords dhak:i889 .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Once I&amp;rsquo;ve done that, I don&amp;rsquo;t even need the triples in the &lt;code&gt;works_keywords.ttl&lt;/code&gt; file. They&amp;rsquo;re just an artifact of the data&amp;rsquo;s relational heritage. I also used the schema.org standard&amp;rsquo;s property &lt;code&gt;schema:keywords&lt;/code&gt; to show that a given keyword was assigned to a given work. If I&amp;rsquo;m going to connect keywords to a work the RDF way, I may as well use a property from a well-known standard to do it!&lt;/p&gt;
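&lt;p&gt;A query that collapses those link resources can be sketched like this, built from the &lt;code&gt;works_keywords&lt;/code&gt; triples shown above (the actual &lt;code&gt;createWorkKeywordTriples.rq&lt;/code&gt; may differ in its details):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX dha:    &amp;lt;http://rdfdata.org/dha/ns/dh-abstracts/&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt;

CONSTRUCT {
  ?work schema:keywords ?keyword .
}
WHERE {
  # each link resource points at exactly one work and one keyword
  ?link a dha:works_keywords ;
        dha:work    ?work ;
        dha:keyword ?keyword .
}
&lt;/code&gt;&lt;/pre&gt;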
&lt;p&gt;A &lt;code&gt;createWorkTopicTriples.rq&lt;/code&gt; SPARQL CONSTRUCT query does the same thing with the topic assignments that &lt;code&gt;createWorkKeywordTriples.rq&lt;/code&gt; did with the keyword assignments.&lt;/p&gt;
&lt;h1 id=&#34;what-have-we-got&#34;&gt;What have we got?&lt;/h1&gt;
&lt;p&gt;Once we have made these improvements, we can run the following query to ask about the title, conference year, and any keywords associated with any works that mention Whitman in their title:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX dha:    &amp;lt;http://rdfdata.org/dha/ns/dh-abstracts/&amp;gt; 
PREFIX schema: &amp;lt;http://schema.org/&amp;gt; 

SELECT ?title ?conferenceYear ?keyword WHERE {
 ?work dha:title ?title ;
       dha:conference ?conferenceID ;
       schema:keywords ?keywordID . 
  ?keywordID dha:title ?keyword . 
  FILTER (CONTAINS(?title,&amp;#34;Whitman&amp;#34;))
  ?conferenceID dha:year ?conferenceYear . 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;There is only one work, but because it has two different keywords assigned to it, the result shows up as two rows:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/dhquery1results.png&#34; class=&#34;centered&#34;  alt=&#34;Results of first sample query&#34;/&gt;
&lt;h1 id=&#34;next-steps&#34;&gt;Next steps&lt;/h1&gt;
&lt;p&gt;The github repository&amp;rsquo;s &lt;a href=&#34;https://github.com/bobdc/dhconf2rdf#readme&#34;&gt;readme file&lt;/a&gt;  has a step-by-step enumeration of which scripts to run when, with less discussion than you&amp;rsquo;ve seen here. It also provides a preview of  some of the things I&amp;rsquo;ll talk about &lt;a href=&#34;../dhconfrdfpart2&#34;&gt;next time&lt;/a&gt; when I demonstrate some of the things we can do with RDF versions of this data that we can&amp;rsquo;t do (or at least, can&amp;rsquo;t do nearly as easily) with the relational version.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1487829265320132617&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2022">2022</category>
      
      <category domain="https://www.bobdc.com//categories/digital-humanities">digital-humanities</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
      <category domain="https://www.bobdc.com//categories/sql">SQL</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>17 years of my web bookmarks, with metadata</title>
      <link>https://www.bobdc.com/blog/bookmarks/</link>
      <pubDate>Fri, 31 Dec 2021 12:58:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/bookmarks/</guid>
      
      
      <description><div> Featuring &#34;75 Bleeding-Edge Search Engines To Beat Google&#34;, and more!</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/bookmark.png&#34; style=&#34;margin: 0px 30px 20px 80px;&#34; border=&#34;0&#34; align=&#34;right&#34; /&gt;
&lt;p&gt;Much of the original point of the web was not just linking from one page to another but also saving and managing links, ideally with some metadata. Because of this, all browsers give you some way to save a link to a web page as a bookmark, and they typically let you sort these into a hierarchical arrangement of folders.&lt;/p&gt;
&lt;p&gt;Third-party apps have cropped up with various strategies for improving on the built-in bookmark management offered by browsers. I have used &lt;a href=&#34;https://www.diigo.com/&#34;&gt;diigo&lt;/a&gt; since 2004  and &lt;a href=&#34;https://en.wikipedia.org/wiki/Delicious_(website)&#34;&gt;del.icio.us&lt;/a&gt; before that, and a recent review of &lt;a href=&#34;https://www.diigo.com/user/bobducharme&#34;&gt;my 71 pages of bookmarks&lt;/a&gt; was like a tour of my own mind for 17 years. (I seem to remember migrating the del.icio.us bookmarks when I made the transition but only see a few dozen of them showing up together in the early days of my using diigo.)&lt;/p&gt;
&lt;p&gt;The ability to &lt;a href=&#34;https://www.diigo.com/user/bobducharme/tags&#34;&gt;tag&lt;/a&gt; diigo bookmarks makes it easier for me to link to batches of them from here. For example, 63 of my early links are about the very concept of &lt;a href=&#34;https://www.diigo.com/user/bobducharme?query=%23linking&#34;&gt;linking&lt;/a&gt; and the related standards and implementations that were evolving at the time. These links about linking include links to entries from my first blog, &lt;a href=&#34;http://www.snee.com/xml/tal.html&#34;&gt;Thinking About Linking&lt;/a&gt;, which I had on oreilly.com.&lt;/p&gt;
&lt;p&gt;Reviewing the full bookmark collection showed some interesting patterns.&lt;/p&gt;
&lt;h1 id=&#34;too-much-fun-to-not-share-right-away&#34;&gt;Too much fun to not share right away&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The &lt;a href=&#34;http://museumofbadart.org/&#34;&gt;Museum of Bad Art (MOBA)&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;From 2008: &lt;a href=&#34;https://www.cmswire.com/cms/search/-75-bleedingedge-search-engines-to-beat-google-002861.php&#34;&gt;75 Bleeding-Edge Search Engines To Beat Google&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://i.imgur.com/qc4rHwf.gifv&#34;&gt;Christopher Loopin: recursive Winnie the Pooh&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://infomesh.net/2002/swhaiku/&#34;&gt;The Semantic Web&amp;hellip; in Haiku&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&#34;link-rot-or-not&#34;&gt;Link rot (or not)&lt;/h1&gt;
&lt;p&gt;Big respect to the organizations whose URLs still point to the same content they pointed to way back when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The New York Times, like with this &lt;a href=&#34;https://open.blogs.nytimes.com/2007/10/23/messing-around-with-metadata/&#34;&gt;Messing Around With Metadata&lt;/a&gt; piece that I bookmarked in 2007.&lt;/li&gt;
&lt;li&gt;The Economist, like with this &lt;a href=&#34;https://www.economist.com/technology-quarterly/2008/04/09/start-making-sense&#34;&gt;Start making sense&lt;/a&gt; piece about the semantic web from 2008.&lt;/li&gt;
&lt;li&gt;Lifehacker, like with this &lt;a href=&#34;https://lifehacker.com/completely-remove-programs-with-revo-uninstaller-282337&#34;&gt;review of a Windows uninstaller&lt;/a&gt; that I bookmarked in 2007. (One name that came up a lot in my bookmark inventory was &lt;a href=&#34;https://ginatrapani.org/&#34;&gt;Gina Trapani&lt;/a&gt;. She wrote many, many pieces at Lifehacker that I found useful enough to bookmark.)&lt;/li&gt;
&lt;li&gt;I was going to give special kudos to technology news company &lt;a href=&#34;https://gigaom.com/&#34;&gt;GigaOM&lt;/a&gt;, because when I started reviewing these links I found that after GigaOM acquired the media news publisher &lt;a href=&#34;https://en.wikipedia.org/wiki/PaidContent&#34;&gt;PaidContent&lt;/a&gt; they redirected PaidContent article URLs to gigaom.com URLs for the same articles, but these links have &lt;a href=&#34;http://www.paidcontent.org/entry/419-the-inside-word/&#34;&gt;rotted away&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The plain domain name paidcontent.org does redirect to gigaom.com, which reminds me of another interesting pattern I saw: expired domain names, including those with specific technical names, were often taken over by Japanese or Chinese sites that seem to have nothing to do with the original content. One example is &lt;a href=&#34;http://medianmusic.com&#34;&gt;medianmusic.com&lt;/a&gt;, which was selling groceries in Chinese when I first checked during my bookmark review and now shows Chinese content that I don&amp;rsquo;t understand well enough to generalize about.&lt;/p&gt;
&lt;p&gt;Some other companies get neither the A+ that the companies above earned for link maintenance nor a failing grade. Two examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;O&amp;rsquo;Reilly. The URLs for the tons of content that was once on oreillynet.com (like to this &lt;a href=&#34;http://www.oreillynet.com/pub/a/oreilly/digitalmedia/2005/11/16/what-is-screencasting.html&#34;&gt;What is screencasting&lt;/a&gt; piece) now just redirect to the oreilly.com home page. It is nice to see that O&amp;rsquo;Reilly Radar pieces like &lt;a href=&#34;http://radar.oreilly.com/archives/2007/09/recent-conversa.html&#34;&gt;Recent conversations about online documentation&lt;/a&gt; and O&amp;rsquo;Reilly Tools of Change for Publishing pieces like &lt;a href=&#34;http://toc.oreilly.com/2008/07/ala-2008-librarians-and-patron.html&#34;&gt;ALA 2008: Librarians and Patrons Want More Openness&lt;/a&gt; still lead to the original articles.&lt;/li&gt;
&lt;li&gt;Publishing industry newsletter The Gilbane Advisor, which still has the 2009 Bill Trippe piece &lt;a href=&#34;https://gilbane.com/2009/07/random_house_creating_a_21st_century_publishing_framework&#34;&gt;Random House: Creating a 21st Century Publishing Framework&lt;/a&gt; but not several of the other pieces I had bookmarked.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Failing grades:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;IBM. While reviewing my diigo links I made note of five or six IBM developerWorks articles that were still there ten or so years after being published, but as I write this, none of them are there anymore. I&amp;rsquo;ll give them credit for one thing, though: they paid us to write those articles! After looking over my contract for one of them I recently republished it here: &lt;a href=&#34;../skosibm&#34;&gt;Taxonomy management with SKOS&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Taxonomy management tool vendor Synaptica. They had many bookmarkable articles about taxonomy management on their synapticacentral.com site, but all the ones I looked for are no longer there.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, you could add to all three lists above; I&amp;rsquo;m just basing them on my own bookmark review. I purged many of them from the collection before I started taking notes for this blog post.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://archive.org/web/&#34;&gt;Wayback Machine&lt;/a&gt; is like a versioning system for the web that lets you see how just about any web page looked at earlier points in the web&amp;rsquo;s history. It has been so valuable that I just donated $10 to them while reviewing one of my links that use it. I have replaced several of my formerly dead diigo bookmarks with links to Wayback Machine versions, like &lt;a href=&#34;http://web.archive.org/web/20120614115555/http://www.americastestkitchenfeed.com/do-it-yourself/2011/07/how-to-make-candied-ginger/&#34;&gt;this recipe for making candied ginger from fresh ginger&lt;/a&gt;.&lt;/p&gt;
&lt;h1 id=&#34;things-i-thought-were-going-to-be-a-bigger-deal-than-they-turned-out-to-be&#34;&gt;Things I thought were going to be a bigger deal than they turned out to be&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I already mentioned that I used to be especially interested in linking—not just the ability to jump from web page to relevant web page, but standards-based architectures being built around these ideas and the ability for applications to take advantage of these architectures. &lt;a href=&#34;https://www.xml.com/pub/a/2002/03/13/xlink.html&#34;&gt;Twenty years ago&lt;/a&gt; I realized that it wasn&amp;rsquo;t going to play out that way, although of course various JavaScript libraries and related tools have let people create more sophisticated link implementations. Standardized metadata built in to the links? Not so much.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A dozen links tagged &lt;a href=&#34;https://www.diigo.com/user/bobducharme?query=%23rdfa&#34;&gt;RDFa&lt;/a&gt;. With millions (billions?) of HTML pages now using JSON-LD to embed triples, I won&amp;rsquo;t complain about RDFa&amp;rsquo;s failure: the goal of embedding machine-readable triples in HTML pages was achieved anyway.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://www.diigo.com/user/bobducharme?query=%23HTML5&#34;&gt;HTML5&lt;/a&gt; and the bitter process of its development.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://www.diigo.com/user/bobducharme?query=%23chatbots&#34;&gt;Chatbots&lt;/a&gt;. My two most recent bookmarks with this tag are to &lt;a href=&#34;https://chatbotsmagazine.com/&#34;&gt;Chatbots Magazine&lt;/a&gt;, whose newest article is over two years old, and a 2018 piece on chatbotslife.com titled &lt;a href=&#34;https://chatbotslife.com/chatbots-what-happened-dcc3f91a512c&#34;&gt;Chatbots: What Happened?&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Google+. I can&amp;rsquo;t even link to my bookmarks there because I deleted them all during my purge.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some of my bookmarks showed the rise in popularity of things that continue to be popular, such as cloud computing, Twitter, and electronic book technologies.&lt;/p&gt;
&lt;h1 id=&#34;miscellaneous-observations&#34;&gt;Miscellaneous observations&lt;/h1&gt;
&lt;p&gt;I had many bookmarks for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tasks that were difficult to do in Linux 13 years ago but are easier now.&lt;/li&gt;
&lt;li&gt;Windows utilities that would now be outdated even if I still used Windows.&lt;/li&gt;
&lt;li&gt;Things I no longer need to bookmark because a web search to find them is faster than finding the bookmark (for example, a web form that escapes and unescapes URLs).&lt;/li&gt;
&lt;li&gt;Things in the category of “I should read this but don’t feel like it; I will tag it so that I can come back to it if I ever regret not reading it”—especially in the field of machine learning.&lt;/li&gt;
&lt;li&gt;How did I find the book image shown at the beginning of this blog post? After I wrote my first draft, I searched my diigo bookmarks for &lt;a href=&#34;https://www.diigo.com/user/bobducharme?query=clipart&#34;&gt;clipart&lt;/a&gt; and found &lt;a href=&#34;https://openclipart.org/&#34;&gt;Openclipart&lt;/a&gt;, which I had tagged as &lt;a href=&#34;https://www.diigo.com/user/bobducharme/?query=%23opensource&#34;&gt;opensource&lt;/a&gt; and &lt;a href=&#34;https://www.diigo.com/user/bobducharme/?query=%23clipart&#34;&gt;clipart&lt;/a&gt;. I guess I use my diigo bookmarks more than I realize.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&#34;adding-this-data-to-a-personal-knowledge-graph&#34;&gt;Adding this data to a personal knowledge graph&lt;/h1&gt;
&lt;p&gt;The idea of a &lt;a href=&#34;https://twitter.com/search?q=%22personal%20knowledge%20graph%22&amp;amp;f=live&#34;&gt;personal knowledge graph&lt;/a&gt; is hot lately. A curated set of over 1,700 favorite bookmarks sounds like an excellent addition to one. You can export diigo bookmarks to CSV, so I did that and used &lt;a href=&#34;../tarql&#34;&gt;tarql&lt;/a&gt; to convert all of my links and their associated metadata to 7,641 triples.&lt;/p&gt;
&lt;p&gt;In diigo you can assign multiple tags to your bookmarks; I apparently assigned four different tags to 21 of them. When you do this, diigo outputs a given bookmark&amp;rsquo;s multiple tags as a single comma-separated list in the CSV output, so that the &amp;ldquo;tags&amp;rdquo; value for my bookmark for &lt;a href=&#34;https://devio.wordpress.com/2012/02/19/user-interface-design/&#34;&gt;this cartoon about user interfaces&lt;/a&gt; is &amp;ldquo;Apple,Google,Comic,userInterface&amp;rdquo;. Luckily, tarql supports Jena’s &lt;code&gt;apf:strSplit&lt;/code&gt; function, making it easy to split that list and create four different &lt;code&gt;ex:tag&lt;/code&gt; triples for that bookmark. (That &lt;code&gt;ex:&lt;/code&gt; namespace was just for the quick and dirty test. For a real application I would use &lt;a href=&#34;https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#subject&#34;&gt;dc:subject&lt;/a&gt; for the tags.) After I added this function to my conversion query, the conversion created 583 more triples than before.&lt;/p&gt;
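&lt;p&gt;The splitting step can be sketched in a few lines of Python. This is just a stand-in for what &lt;code&gt;apf:strSplit&lt;/code&gt; does inside the tarql query, not the query itself, and the triples here are plain tuples rather than RDF:&lt;/p&gt;

```python
# Sketch of the tag-splitting step: one comma-separated diigo "tags"
# value becomes one tag triple per tag. ex:tag is the same
# quick-and-dirty test predicate mentioned above.
def tag_triples(bookmark_uri, tags_field):
    """Return one (subject, predicate, object) triple per tag."""
    return [(bookmark_uri, "ex:tag", tag.strip())
            for tag in tags_field.split(",")]

for triple in tag_triples(
        "https://devio.wordpress.com/2012/02/19/user-interface-design/",
        "Apple,Google,Comic,userInterface"):
    print(triple)
```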
&lt;p&gt;How did I find out that I had assigned four different tags to 21 bookmarks? With a SPARQL query after doing the conversion, of course. With this data in RDF I can look for patterns and connect those tags to keywords in a taxonomy if I want. I can also connect up the data to other datasets. For example, the query that drives tarql could convert tags to URIs from standardized subject collections. I had tagged two bookmarks as &lt;a href=&#34;https://www.diigo.com/user/bobducharme?query=%23F1&#34;&gt;F1&lt;/a&gt;; these could be converted to the URI &lt;code&gt;&amp;lt;http://cv.iptc.org/newscodes/subjectcode/15039001&amp;gt;&lt;/code&gt;, the &lt;a href=&#34;https://iptc.org/&#34;&gt;IPTC&lt;/a&gt; subject code for Formula One racing, for easier connection to other content out there. There are all kinds of possibilities.&lt;/p&gt;
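&lt;p&gt;The grouping logic of that counting query can be sketched in plain Python (the bookmark IDs below are hypothetical; the real version is a SPARQL query that groups the &lt;code&gt;ex:tag&lt;/code&gt; triples by bookmark and filters on the count):&lt;/p&gt;

```python
# Sketch of the "which bookmarks have exactly n tags?" check, the way
# a SPARQL GROUP BY ?bookmark / HAVING (COUNT(?tag) = n) query would
# do it. The (bookmark, tag) pairs below are hypothetical.
from collections import Counter

def bookmarks_with_n_tags(pairs, n):
    counts = Counter(bookmark for bookmark, _tag in pairs)
    return sorted(b for b, c in counts.items() if c == n)

pairs = [("b1", "Apple"), ("b1", "Google"), ("b1", "Comic"),
         ("b1", "userInterface"), ("b2", "sparql"), ("b2", "rdf")]
print(bookmarks_with_n_tags(pairs, 4))  # prints ['b1']
```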
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1476978518504480773&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2021">2021</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">sparql</category>
      
    </item>
    
    <item>
      <title>My command line OWL processor</title>
      <link>https://www.bobdc.com/blog/cmdlineowl/</link>
      <pubDate>Sun, 21 Nov 2021 12:30:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/cmdlineowl/</guid>
      
      
<description><div>With most of the credit going to Ivan Herman.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/CharlieChristian.jpg&#34; style=&#34;margin: 0px 30px 20px 80px;&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;260pt&#34; alt=&#34;Charlie Christian&#34;/&gt;
&lt;p&gt;I recently &lt;a href=&#34;https://twitter.com/bobdc/status/1446514759562666005&#34;&gt;asked on Twitter&lt;/a&gt; about the availability of command line OWL processors. I got some leads, but most would have required a little coding or integration work on my part. I decided that a &lt;a href=&#34;https://www.bobdc.com/blog/driving-hadoop-data-integratio/&#34;&gt;small project&lt;/a&gt; that I did with the OWL-RL Python library a few years ago gave me a head start on just creating my own OWL command line processor in Python. It was pretty easy.&lt;/p&gt;
&lt;p&gt;My goal was something that would read RDF files, do inferencing, and output any triples created by the inferencing. The heavy lifting is done by the &lt;a href=&#34;https://pypi.org/project/owlrl/&#34;&gt;OWL-RL library&lt;/a&gt;, which builds on the classic &lt;a href=&#34;https://github.com/RDFLib/rdflib&#34;&gt;RDFLib&lt;/a&gt; Python library. The OWL-RL library was originally written by &lt;a href=&#34;http://www.ivan-herman.net/&#34;&gt;Ivan Herman&lt;/a&gt; and is now maintained by &lt;a href=&#34;https://pypi.org/user/ashleysommer/&#34;&gt;Ashley Sommer&lt;/a&gt; and &lt;a href=&#34;https://pypi.org/user/ncar/&#34;&gt;Nicholas Car&lt;/a&gt;. (As you would guess from its name, this library implements the &lt;a href=&#34;https://www.w3.org/TR/owl2-profiles/#OWL_2_RL&#34;&gt;rule-based OWL profile&lt;/a&gt; known as OWL RL.) My script is short and simple enough that instead of putting it on github I&amp;rsquo;ve just pasted it below.&lt;/p&gt;
&lt;h2 id=&#34;testing-it&#34;&gt;Testing it&lt;/h2&gt;
&lt;p&gt;In my recent blog posting &lt;a href=&#34;../dontneedowl/&#34;&gt;You probably don&amp;rsquo;t need OWL&lt;/a&gt;, I wrote about an inferencing use case:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For example, in &lt;a href=&#34;../trying-out-blazegraph/&#34;&gt;Trying Out Blazegraph&lt;/a&gt; (which only supports bits of OWL), I showed a dataset that had triples about various chairs and desks being located in various rooms, as well as triples about which rooms were in which buildings, but nothing about which furniture was in which buildings (or for that matter, what counted as furniture). I then used the RDFS &lt;code&gt;rdfs:subClassOf&lt;/code&gt; property to declare that &lt;code&gt;dm:Chair&lt;/code&gt; and &lt;code&gt;dm:Desk&lt;/code&gt; were subclasses of  &lt;code&gt;dm:Furniture&lt;/code&gt;, and I also declared that my &lt;code&gt;dm:locatedIn&lt;/code&gt; property was an &lt;code&gt;owl:TransitiveProperty&lt;/code&gt;. With these additional modeling triples, a SPARQL query to an OWL processor that understood &lt;code&gt;rdfs:subClassOf&lt;/code&gt; and &lt;code&gt;owl:TransitiveProperty&lt;/code&gt; could then list which furniture was in which building. This little bit of OWL actually added some semantics to the model as well, because it tells us—and OWL processors—a little about the &amp;ldquo;meaning&amp;rdquo; of &lt;code&gt;dm:locatedIn&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To try this example with my new command line processor, I didn&amp;rsquo;t even need to use SPARQL. I just stored the &amp;ldquo;Trying Out Blazegraph&amp;rdquo; sample data in a file called &lt;code&gt;chairsAndTables.ttl&lt;/code&gt; and fed it to my script like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;owl-rl-inferencing.py chairsAndTables.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here are the first three triples of the output:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://learningsparql.com/ns/data#chair15&amp;gt; a ns2:Furniture, ns1:Thing ;
    ns2:locatedIn &amp;lt;http://learningsparql.com/ns/data#building100&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It inferred that chair 15 is an instance of the &lt;code&gt;Furniture&lt;/code&gt; class (and of the &lt;code&gt;Thing&lt;/code&gt; class) and that it&amp;rsquo;s in building 100. It also output triples about what buildings all the other chairs and tables were in, so I counted this as a successful test.&lt;/p&gt;
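&lt;p&gt;What the reasoner does with &lt;code&gt;owl:TransitiveProperty&lt;/code&gt; amounts to computing a transitive closure, which can be sketched in plain Python (the pairs below are hypothetical stand-ins mirroring the chairs-and-buildings data; a real OWL RL engine applies many more rules than this one):&lt;/p&gt;

```python
# Sketch of the owl:TransitiveProperty rule: keep joining locatedIn
# pairs until no new pairs appear.
def transitive_closure(pairs):
    closure = set(pairs)
    while True:
        new = {(a, c) for a, b in closure
                      for b2, c in closure if b == b2}
        if new.issubset(closure):
            return closure
        closure.update(new)

asserted = {("chair15", "room101"), ("room101", "building100")}
print(transitive_closure(asserted) - asserted)  # prints {('chair15', 'building100')}
```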
&lt;p&gt;For another test, I was especially happy to see the script do the inferencing I expected from one particular example in my book &lt;a href=&#34;http://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt;. Example dataset &lt;a href=&#34;http://www.learningsparql.com/2ndeditionexamples/ex424.ttl&#34;&gt;&lt;code&gt;ex424.ttl&lt;/code&gt;&lt;/a&gt; lists the name, instrument played, and birth state of six musicians without saying that any is a member of any class. Here are two examples:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;d:m2 rdfs:label &amp;#34;Charlie Christian&amp;#34; ;
     dm:plays d:Guitar ;
     dm:stateOfBirth d:TX .

d:m4 rdfs:label &amp;#34;Kim Gordon&amp;#34; ;
     dm:plays d:Bass ;
     dm:stateOfBirth d:NY .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It also includes the following &lt;a href=&#34;https://www.cs.vu.nl/~guus/public/owl-restrictions/&#34;&gt;restriction class&lt;/a&gt; definitions, which specify conditions that qualify an instance as a member of the classes &lt;code&gt;Guitarist&lt;/code&gt;, &lt;code&gt;Texan&lt;/code&gt;, and &lt;code&gt;TexasGuitarPlayer&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;dm:Guitarist
   owl:equivalentClass
           [ rdf:type owl:Restriction ;
             owl:hasValue d:Guitar ;
             owl:onProperty dm:plays
           ] .

dm:Texan
   owl:equivalentClass
           [ rdf:type owl:Restriction ;
             owl:hasValue d:TX ;
             owl:onProperty dm:stateOfBirth
           ] .

dm:TexasGuitarPlayer
   owl:equivalentClass
        [ rdf:type owl:Class ;
          owl:intersectionOf (dm:Texan dm:Guitarist)
        ] .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;To test my script&amp;rsquo;s ability to read different serializations, I split up &lt;code&gt;ex424.ttl&lt;/code&gt; into &lt;code&gt;ex424a.ttl&lt;/code&gt;, &lt;code&gt;ex424b.nt&lt;/code&gt;, and &lt;code&gt;ex424c.rdf&lt;/code&gt; before feeding them to the script like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;owl-rl-inferencing.py ex424a.ttl ex424b.nt ex424c.rdf 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The output included the following triples, so we know that it inferred that Charlie Christian was an instance of all three classes:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://learningsparql.com/ns/data#m2&amp;gt; a
        &amp;lt;http://learningsparql.com/ns/demo#Guitarist&amp;gt;,
        &amp;lt;http://learningsparql.com/ns/demo#Texan&amp;gt;,
        &amp;lt;http://learningsparql.com/ns/demo#TexasGuitarPlayer&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It did not infer that resource &lt;code&gt;m4&lt;/code&gt;, New York bassist Kim Gordon, was in any of those classes. It did infer that Texas piano player Red Garland was a &lt;code&gt;Texan&lt;/code&gt;, but not a &lt;code&gt;Guitarist&lt;/code&gt; or a &lt;code&gt;TexasGuitarPlayer&lt;/code&gt;, and it inferred that native Californian Bonnie Raitt was a &lt;code&gt;Guitarist&lt;/code&gt; but not a member of the other two classes.&lt;/p&gt;
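&lt;p&gt;The logic behind those inferences can be sketched in plain Python (the dictionaries below are hypothetical stand-ins for the RDF data; a real OWL processor derives all of this from the restriction definitions themselves):&lt;/p&gt;

```python
# Sketch of the restriction-class inferences: each owl:hasValue
# restriction checks one property value, and owl:intersectionOf
# requires membership in every listed class.
musicians = {
    "m2": {"plays": "Guitar", "stateOfBirth": "TX"},   # Charlie Christian
    "m4": {"plays": "Bass", "stateOfBirth": "NY"},     # Kim Gordon
}

def classes_for(musician):
    inferred = set()
    if musician["plays"] == "Guitar":        # hasValue d:Guitar on dm:plays
        inferred.add("Guitarist")
    if musician["stateOfBirth"] == "TX":     # hasValue d:TX on dm:stateOfBirth
        inferred.add("Texan")
    if {"Guitarist", "Texan"}.issubset(inferred):  # intersectionOf
        inferred.add("TexasGuitarPlayer")
    return inferred

print(classes_for(musicians["m2"]))
```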
&lt;h2 id=&#34;combining-this-with-other-tools&#34;&gt;Combining this with other tools&lt;/h2&gt;
&lt;p&gt;The inferred triples may need some management after they&amp;rsquo;re materialized. If chair 15 gets moved from room 101 in building 100 to room 201 in building 200, we don&amp;rsquo;t want that inferred triple about it being in building 100 hanging around any more. Named graphs can help here, as I described in &lt;a href=&#34;../materializing&#34;&gt;Living in a materialized world: Managing inferenced triples with named graphs&lt;/a&gt;. That post shows how RDFLib lets you &lt;a href=&#34;../pipelining-sparql-queries-in-m/&#34;&gt;pipeline&lt;/a&gt; a series of queries and updates, letting you combine simple and complex operations into sophisticated applications. The ability to do OWL inferencing can contribute a lot to these pipelines.&lt;/p&gt;
&lt;p&gt;Without taking advantage of RDFLib&amp;rsquo;s pipelining ability at the Python code level, you can do some pipelining right from your operating system command line, sending the output of my &lt;code&gt;owl-rl-inferencing.py&lt;/code&gt; script to an Apache Jena tool such as &lt;a href=&#34;../jenagems/#riot&#34;&gt;riot&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Either way, I hope the script is useful to someone. Let me know!&lt;/p&gt;
&lt;h2 id=&#34;the-code&#34;&gt;The code&lt;/h2&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#!/usr/bin/env python3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# owl-rl-inferencing.py: read RDF files provided as command line&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# arguments, do OWL RL inferencing, and output any new triples&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# resulting from that.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; sys
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; rdflib
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; owlrl
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; len(sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;argv) &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt;  &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;:  &lt;span style=&#34;color:#75715e&#34;&gt;# print directions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Read RDF files, perform inferencing, and output the new triples.&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Enter one or more .ttl, .nt, and .rdf filenames as arguments.&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;exit()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;inputGraph &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; rdflib&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;Graph()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;graphToExpand &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; rdflib&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;Graph()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Read the files. arg 0 is the script name, so don&amp;#39;t parse that as RDF.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; filename &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;argv[&lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;:]:   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; filename&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;endswith(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.ttl&amp;#34;&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       inputGraph&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;parse(filename, format&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;turtle&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;elif&lt;/span&gt; filename&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;endswith(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.nt&amp;#34;&lt;/span&gt;):       
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       inputGraph&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;parse(filename, format&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;nt&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;elif&lt;/span&gt; filename&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;endswith(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;.rdf&amp;#34;&lt;/span&gt;):       
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;       inputGraph&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;parse(filename, format&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;xml&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;# Filename &amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; filename &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34; doesn&amp;#39;t end with .ttl, .nt, or .rdf.&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Copy the input graph so that we can diff to identify new triples later.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; s, p, o &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; inputGraph:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    graphToExpand&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;add((s,p,o))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Do the inferencing. See&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# https://owl-rl.readthedocs.io/en/latest/stubs/owlrl.DeductiveClosure.html#owlrl.DeductiveClosure&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# for other owlrl.* choices.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;owlrl&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;DeductiveClosure(owlrl&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;OWLRL_Semantics)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;expand(graphToExpand)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;newTriples &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; graphToExpand &lt;span style=&#34;color:#f92672&#34;&gt;-&lt;/span&gt; inputGraph  &lt;span style=&#34;color:#75715e&#34;&gt;# How cool is that? &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Output Turtle comments reporting on graph sizes&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(&lt;span style=&#34;color:#e6db74&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;# inputGraph: &lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;{&lt;/span&gt;len(inputGraph)&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt; triples&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(&lt;span style=&#34;color:#e6db74&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;# graphToExpand: &lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;{&lt;/span&gt;len(graphToExpand)&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt; triples&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(&lt;span style=&#34;color:#e6db74&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;# newTriples: &lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;{&lt;/span&gt;len(newTriples)&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt; triples&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Output the new triples (decode() is to omit &amp;#34;b&amp;#39;&amp;#39; &amp;#34; in output)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(newTriples&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;serialize(format&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;turtle&amp;#39;&lt;/span&gt;)&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;decode())
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1462476624406929408&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2021">2021</category>
      
      <category domain="https://www.bobdc.com//categories/owl">OWL</category>
      
      <category domain="https://www.bobdc.com//categories/rdfs">RDFS</category>
      
    </item>
    
    <item>
      <title>You probably don&#39;t need OWL</title>
      <link>https://www.bobdc.com/blog/dontneedowl/</link>
      <pubDate>Sun, 17 Oct 2021 11:50:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/dontneedowl/</guid>
      
      
      <description><div>And if you do there&#39;s a simple way to prove it.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/w3cOwlLogo.png&#34; style=&#34;margin: 0px 30px 20px 80px;&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;260pt&#34; /&gt;
&lt;p&gt;During the course of my recent blog posts &lt;a href=&#34;../whatisrdf/&#34;&gt;What is RDF?&lt;/a&gt;, &lt;a href=&#34;../whatisrdfs/&#34;&gt;What is RDFS?&lt;/a&gt;, &lt;a href=&#34;../whatisrdfspart2/&#34;&gt;What else can I do with RDFS?&lt;/a&gt;, and &lt;a href=&#34;../skosibm/&#34;&gt;Taxonomy management with SKOS&lt;/a&gt;, some readers wondered if I would do a &amp;ldquo;What is OWL?&amp;rdquo; followup. I recommended to one inquirer that he read pages 39-41 and 263-269 of &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt;; I think that provides a pretty good introduction to OWL&amp;rsquo;s history and how to do some of the set-based logic that was an important part of its original intent.&lt;/p&gt;
&lt;p&gt;A recent blog entry by Irene Polikoff, a founder of my former employer TopQuadrant, has also inspired &lt;a href=&#34;https://twitter.com/jindrichmynarz/status/1401918042532110349&#34;&gt;a lot of conversation&lt;/a&gt; about when people should or shouldn&amp;rsquo;t use OWL. Her entry&amp;rsquo;s title is pretty categorical: &lt;a href=&#34;https://web.archive.org/web/20220702180448/https://www.topquadrant.com/owl-blog/&#34;&gt;Why I Don’t Use OWL Anymore&lt;/a&gt;. I think that bits of OWL can be more useful than she does, but still less useful than many people do. I&amp;rsquo;ll get to some examples below.&lt;/p&gt;
&lt;h1 id=&#34;data-modeling-use-rdfs&#34;&gt;Data modeling? Use RDFS&lt;/h1&gt;
&lt;p&gt;At its simplest level, data modeling is the identification and enumeration of the pieces of information that you want to keep track of and the relationships between them. A standards-based, machine-readable version of this enumeration is very valuable to application development. As I wrote in  &lt;a href=&#34;../whatisrdfs/&#34;&gt;What is RDFS?&lt;/a&gt; and &lt;a href=&#34;../whatisrdfspart2/&#34;&gt;What else can I do with RDFS?&lt;/a&gt;, RDFS can do that pretty well. It does an especially good job for &lt;a href=&#34;https://schema.org/&#34;&gt;schema.org&lt;/a&gt;, one of the great success stories of RDF-based technology, &lt;a href=&#34;../whatisrdfs#schemaorg&#34;&gt;as I described&lt;/a&gt; in the first of those two pieces.
You can go beyond RDFS to add information about your data&amp;rsquo;s structures and potential relationships in even more detail, but as we&amp;rsquo;ll see, machine-readable descriptions of this information won&amp;rsquo;t do you much good unless you have tools that will read these descriptions and use them to contribute value to your applications.&lt;/p&gt;
&lt;h1 id=&#34;defining-constraints-on-that-data-model-use-shacl&#34;&gt;Defining constraints on that data model? Use SHACL&lt;/h1&gt;
&lt;p&gt;OWL can go beyond RDFS to describe additional details about your classes and properties, but it can only rarely describe what counts as a valid instance of a class and what doesn&amp;rsquo;t. This has been a fundamental need of data processing for as long as people have been using data on computers: developers who write applications that use data don&amp;rsquo;t want to write lots of code to make sure that the data they read is what they&amp;rsquo;re really expecting. They want to assume that the processes that created that data already did this validation. SQL&amp;rsquo;s CREATE TABLE statements let you specify data types of and dependencies between table columns, not to mention which are required and which are optional; DTDs and later forms of schema do the same for XML.&lt;/p&gt;
&lt;p&gt;RDF never really had this until the W3C standard SHACL, as I described in &lt;a href=&#34;../validating-rdf-data-with-shacl/&#34;&gt;Validating RDF data with SHACL&lt;/a&gt;. Irene&amp;rsquo;s followup to her blog entry mentioned above is titled &lt;a href=&#34;https://www.topquadrant.com/shacl-blog/&#34;&gt;Why I Use SHACL For Defining Ontology Models&lt;/a&gt;, and it explains many of the advantages that SHACL brings. (She does write &amp;ldquo;I no longer used RDFS/OWL (besides declaring classes and subclasses)&amp;rdquo;, so she hasn&amp;rsquo;t completely replaced her usage of RDFS.)&lt;/p&gt;
&lt;h1 id=&#34;controlled-vocabulary-use-skos&#34;&gt;Controlled vocabulary? Use SKOS&lt;/h1&gt;
&lt;p&gt;Last month in  &lt;a href=&#34;../skosibm/&#34;&gt;Taxonomy management with SKOS&lt;/a&gt; I described how taxonomies and thesauri are controlled vocabularies that typically let you store metadata about the vocabulary terms, including their relationships to each other. You could picture a taxonomy or thesaurus as a potentially large collection of terms arranged in a tree in which lower levels of the tree describe subsets of the higher levels. If we want to represent this all in RDF, should we do it as OWL classes? I say: no. This is not a nail for that hammer.&lt;/p&gt;
&lt;p&gt;First of all, the lower levels of a taxonomy tree do not represent subsets of the higher levels. The tree&amp;rsquo;s nodes represent terms, not sets of things, and lower levels of the tree show more specific terms: for example, &amp;ldquo;collie&amp;rdquo; and &amp;ldquo;bulldog&amp;rdquo; as more specific versions of &amp;ldquo;dog&amp;rdquo; and &amp;ldquo;dog&amp;rdquo; as a more specific version of &amp;ldquo;mammal&amp;rdquo;. Heather Hedden, author of &lt;a href=&#34;https://www.hedden-information.com/accidental-taxonomist/&#34;&gt;the leading introduction to taxonomy development&lt;/a&gt;, summed it up nicely in her blog post &lt;a href=&#34;https://accidental-taxonomist.blogspot.com/2020/12/differing-definitions-of-ontologies.html&#34;&gt;Differing Definitions of Ontologies&lt;/a&gt;: &amp;ldquo;ontology structures are meant to model data, not to organize taxonomy concepts that could be either generic (common nouns) or named entities (proper nouns)&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;In a taxonomy, &amp;ldquo;Person broader than Employee&amp;rdquo; means that a book or other form of media about employees is also a work about persons. In an ontology, &amp;ldquo;Employee is a subclass of Person&amp;rdquo; lets you distinguish between properties that apply to all persons (family name, given name) and properties that apply to employees but not to persons in general (hire date, salary).&lt;/p&gt;
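&lt;p&gt;The difference is easy to see in Turtle. In this minimal sketch (the &lt;code&gt;v:&lt;/code&gt; namespace and URIs are invented for illustration), the two statements look superficially similar but mean very different things:&lt;/p&gt;
&lt;pre&gt;
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix v:    &amp;lt;http://www.example.com/vocab/&gt; .

# Taxonomy view: the concept &#34;employee&#34; is a narrower term than &#34;person&#34;.
v:employee skos:broader v:person .

# Ontology view: every instance of the class Employee is also a Person.
v:Employee rdfs:subClassOf v:Person .
&lt;/pre&gt;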
&lt;p&gt;SKOS is itself an OWL ontology that defines a data model for storing controlled vocabularies and their metadata. It has commercial and open source support among popular vocabulary management tools. (Pinterest &lt;a href=&#34;https://arxiv.org/pdf/1907.02106.pdf&#34;&gt;developed their own&lt;/a&gt; ontology for taxonomy management, but it draws on SKOS.) SKOS is a W3C standard that is specialized for this particular job. SKOS vocabularies and OWL ontologies can use each other as input; a straightforward SPARQL query can often create one from the other, but keep their different purposes in mind. The traction that SKOS-based tools have achieved over the years is a powerful argument to use this standard for vocabulary management.&lt;/p&gt;
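&lt;p&gt;For example, here is a sketch of a SPARQL query that derives SKOS concepts from an OWL class hierarchy. (This is only a starting point; a real conversion would also need decisions about labels, concept schemes, and top concepts.)&lt;/p&gt;
&lt;pre&gt;
PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX owl:  &amp;lt;http://www.w3.org/2002/07/owl#&gt;

# Turn each class into a concept and each subclass link into a broader link.
CONSTRUCT {
  ?class a skos:Concept ;
         skos:broader ?superclass .
}
WHERE {
  ?class a owl:Class ;
         rdfs:subClassOf ?superclass .
}
&lt;/pre&gt;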
&lt;h1 id=&#34;but-if-you-really-need-owl&#34;&gt;But if you really need OWL&amp;hellip;&lt;/h1&gt;
&lt;p&gt;If you really need OWL, prove it! Do something with your data and an OWL processor that would have been noticeably more difficult without that processor. This will demonstrate what value OWL brings to your data.&lt;/p&gt;
&lt;p&gt;For example, in &lt;a href=&#34;../trying-out-blazegraph/&#34;&gt;Trying Out Blazegraph&lt;/a&gt; (which only supports bits of OWL), I showed a dataset that had triples about various chairs and desks being located in various rooms, as well as triples about which rooms were in which buildings, but nothing about which furniture was in which buildings (or for that matter, what counted as furniture). I then used the RDFS &lt;code&gt;rdfs:subClassOf&lt;/code&gt; property to declare that &lt;code&gt;dm:Chair&lt;/code&gt; and &lt;code&gt;dm:Desk&lt;/code&gt; were subclasses of  &lt;code&gt;dm:Furniture&lt;/code&gt;, and I also declared that my &lt;code&gt;dm:locatedIn&lt;/code&gt; property was an &lt;code&gt;owl:TransitiveProperty&lt;/code&gt;. With these additional modeling triples, a SPARQL query to an OWL processor that understood &lt;code&gt;rdfs:subClassOf&lt;/code&gt; and &lt;code&gt;owl:TransitiveProperty&lt;/code&gt; could then list which furniture was in which building. This little bit of OWL actually added some semantics to the model as well, because it tells us—and OWL processors—a little about the &amp;ldquo;meaning&amp;rdquo; of &lt;code&gt;dm:locatedIn&lt;/code&gt;.&lt;/p&gt;
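&lt;p&gt;A sketch of that approach looks like this. (The URIs here are illustrative reconstructions, not the exact ones from that post.)&lt;/p&gt;
&lt;pre&gt;
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix owl:  &amp;lt;http://www.w3.org/2002/07/owl#&gt; .
@prefix dm:   &amp;lt;http://www.example.com/dm/&gt; .

# A little modeling: two subclasses and one transitive property.
dm:Chair rdfs:subClassOf dm:Furniture .
dm:Desk  rdfs:subClassOf dm:Furniture .
dm:locatedIn a owl:TransitiveProperty .

# Some instance data.
dm:chair3 a dm:Chair ;
          dm:locatedIn dm:room12 .
dm:room12 dm:locatedIn dm:building2 .
dm:building2 a dm:Building .

# An inferencing processor can now answer &#34;which dm:Furniture instances
# are dm:locatedIn dm:building2?&#34; even though no triple says so directly.
&lt;/pre&gt;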
&lt;p&gt;That was pretty easy. I think it&amp;rsquo;s a good general rule that if you want to demonstrate the value of a certain technology, show something that you can do with it that would have been a lot more trouble, if not impossible, without it. A query about data that is relevant to many different businesses, such as employee or facility data, is a great way to do this. (I always thought that Protégé&amp;rsquo;s famed &lt;a href=&#34;https://protegeproject.github.io/protege/getting-started/#open-the-pizza-ontology&#34;&gt;pizza ontology&lt;/a&gt; was a little too cutesy of a demonstration domain—of course everyone likes pizza, but why not use a domain where there is an actual chance that people would use an ontology to manage the relevant data?)&lt;/p&gt;
&lt;p&gt;The most visible pushback that I saw to Irene&amp;rsquo;s blog posts about not using OWL was &lt;a href=&#34;https://triply.cc/blog/2021-08-why-we-use-owl&#34;&gt;Why We Use OWL Every Day At Triply&lt;/a&gt; from the Amsterdam-based company. Their explanations of OWL&amp;rsquo;s value focused on its role as human-readable documentation of modeling intentions, which is certainly valuable, but they &lt;a href=&#34;https://twitter.com/bobdc/status/1435627413812219906&#34;&gt;did not point to&lt;/a&gt; any usage of OWL as machine-readable modeling instructions when I asked.&lt;/p&gt;
&lt;p&gt;I am not done playing with OWL, and I still dream of making the following pin and wearing it to a conference where at least some of the attendees will get the joke:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/itsAnOWLThing.png&#34; class=&#34;centered&#34; width=&#34;200&#34; alt=&#34;It&#39;s an owl:Thing&#34;/&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1449766266873466888&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2021">2021</category>
      
      <category domain="https://www.bobdc.com//categories/owl">OWL</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
      <category domain="https://www.bobdc.com//categories/rdfs">RDFS</category>
      
    </item>
    
    <item>
      <title>Taxonomy management with SKOS</title>
      <link>https://www.bobdc.com/blog/skosibm/</link>
      <pubDate>Sun, 19 Sep 2021 11:47:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/skosibm/</guid>
      
      
      <description><div>Republishing an IBM developer works article.</div><div>&lt;p&gt;&lt;em&gt;In 2011, IBM developerWorks published an article that I wrote titled &amp;ldquo;Improve your taxonomy management using the W3C SKOS standard.&amp;rdquo; (They have always loved those &amp;ldquo;Get Better at This Thing&amp;rdquo; titles.) Several years later they took it (and a ton of other developerWorks content) down. I have republished it here as background for recent discussions about when OWL is appropriate to use and when it isn&amp;rsquo;t; more on that next month. I didn&amp;rsquo;t change anything but added a few comments in &lt;em&gt;&lt;strong&gt;bold italics&lt;/strong&gt;&lt;/em&gt; about my 2021 perspective on some of these issues. See also &lt;a href=&#34;../../categories/skos&#34;&gt;several other pieces&lt;/a&gt; that I&amp;rsquo;ve written about SKOS over the years.&lt;/em&gt;&lt;/p&gt;
&lt;div class=&#34;sidebar&#34;&gt;
&lt;h1 id=&#34;controlled-vocabularies-taxonomies-and-thesauri-whats-the-difference&#34;&gt;Controlled vocabularies, taxonomies, and thesauri: What&amp;rsquo;s the difference?&lt;/h1&gt;
&lt;p&gt;A &lt;em&gt;controlled vocabulary&lt;/em&gt; is a list of terms that define the potential values for something—for example, the possible subjects of a set of news stories or the official two-letter abbreviations of the states of the United States. A &lt;em&gt;taxonomy&lt;/em&gt; is a controlled vocabulary arranged in a hierarchy to show relationships between terms. The list of possible subjects for a set of news stories is most likely this kind of controlled vocabulary, with &amp;ldquo;Acquisition&amp;rdquo; and &amp;ldquo;Executive hiring&amp;rdquo; as children of the hierarchy&amp;rsquo;s &amp;ldquo;Business news&amp;rdquo; node.&lt;/p&gt;
&lt;p&gt;These relationships are metadata that indicate, for example, that a story about an executive being hired is a type of business news story or that a dachshund in an animal taxonomy is a kind of dog. When a taxonomy-aware image search engine returns a picture tagged &amp;ldquo;dachshund&amp;rdquo; to someone searching for &amp;ldquo;dog&amp;rdquo; pictures, it takes advantage of this metadata to help the searcher get greater value from the image collection.&lt;/p&gt;
&lt;p&gt;A &lt;em&gt;thesaurus&lt;/em&gt; is typically a taxonomy with additional metadata about each term such as alternative terms (for example, &amp;ldquo;mutt&amp;rdquo; for &amp;ldquo;dog&amp;rdquo;) and pointers to related terms that might or might not be in the same hierarchy (for example, &amp;ldquo;doghouse&amp;rdquo; for &amp;ldquo;dog&amp;rdquo;). People who specialize in the creation and maintenance of thesauri are usually known as taxonomists, perhaps because the term &amp;ldquo;thesaurist&amp;rdquo; sounds too much like &amp;ldquo;thesaurus&amp;rdquo; or maybe because &amp;ldquo;thesaurus&amp;rdquo; reminds people from outside the metadata management field too much of books of synonym lists used as writing aids, such as &lt;em&gt;Roget&amp;rsquo;s Thesaurus&lt;/em&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Whether you manage a taxonomy to integrate business processes in an enterprise, to manage keywords assigned to content for more intelligent retrieval, or to manage the menus of a large web-based retail site, you might find that your taxonomy management tool stores data in a proprietary binary format that doesn&amp;rsquo;t migrate well to other tools. A standards-based way to represent this data can help you integrate vocabulary data from multiple sources while reducing your dependence on proprietary tools.&lt;/p&gt;
&lt;p&gt;The Simple Knowledge Organization System (SKOS) is a W3C standard that builds on the W3C&amp;rsquo;s RDF, RDFS, and OWL specifications to provide a standard model for representing controlled vocabularies. You can use SKOS for flat lists and also for more structured controlled vocabularies with additional metadata such as taxonomies and thesauri.&lt;/p&gt;
&lt;p&gt;Because SKOS is defined using the RDF model, it&amp;rsquo;s easy to read and create data in an XML format. &lt;em&gt;&lt;strong&gt;(Not so much encouraging RDF/XML here as namechecking a standard that readers unfamiliar with RDF would have heard of.)&lt;/strong&gt;&lt;/em&gt; Growing tool support for SKOS means that using it requires no knowledge of the related W3C standards, but the more you know, the more you can take advantage of the extensibility of SKOS to include customized metadata in your vocabularies that might not be part of the SKOS standard.&lt;/p&gt;
&lt;p&gt;As organizations ranging from The New York Times to NASA to the UN Food and Agriculture Organization make their subject listings available in SKOS, this standard also makes it easier to reuse well-known vocabularies and to create connections between your content and other content that uses the same vocabularies.&lt;/p&gt;
&lt;h1 id=&#34;terms-versus-concepts-and-labels&#34;&gt;Terms versus concepts and labels&lt;/h1&gt;
&lt;p&gt;Vocabulary management systems have always been structured to manage terms, along with relationships between terms and other metadata. SKOS takes a higher-level view of what you manage, which makes internationalization much easier. For example, an older system might store the term &amp;ldquo;dog&amp;rdquo; with a broader term of &amp;ldquo;mammal&amp;rdquo; and narrower terms of &amp;ldquo;dachshund&amp;rdquo; or &amp;ldquo;bulldog.&amp;rdquo; The term &amp;ldquo;mutt&amp;rdquo; would be a separate term, and &amp;ldquo;dog&amp;rdquo; would have what taxonomists call a use-for relationship to &amp;ldquo;mutt&amp;rdquo;—if someone assigning keywords to photographs wants to assign the word &amp;ldquo;mutt&amp;rdquo; to a picture of Lassie, the vocabulary application would direct them to use the word &amp;ldquo;dog&amp;rdquo; instead. The term &amp;ldquo;perro&amp;rdquo; could have a relationship &amp;ldquo;Spanish&amp;rdquo; to the term &amp;ldquo;dog,&amp;rdquo; and &amp;ldquo;chien&amp;rdquo; could have the relationship &amp;ldquo;French&amp;rdquo; to it, but a Spanish user wondering about the French term for &amp;ldquo;perro&amp;rdquo; might not be able to look this up without knowing that they&amp;rsquo;re connected by their relationship to the English term.&lt;/p&gt;
&lt;p&gt;Another disadvantage of this arrangement is that the terms &amp;ldquo;mutt&amp;rdquo; and &amp;ldquo;perro&amp;rdquo; are as separate from &amp;ldquo;dog&amp;rdquo; as the term &amp;ldquo;cat&amp;rdquo; or &amp;ldquo;gato&amp;rdquo; (a Spanish term). Even though &amp;ldquo;mutt,&amp;rdquo; &amp;ldquo;dog,&amp;rdquo; and &amp;ldquo;perro&amp;rdquo; refer to the same thing, their relationships must be explicitly specified. Figure 1 displays these relationships in a diagram; solid-line arrows represent a &amp;ldquo;broader than&amp;rdquo; relationship (mammal to cat and dog; dog to bulldog and dachshund), and dotted-line arrows are labeled for the Spanish (&amp;ldquo;perro&amp;rdquo;) or French (&amp;ldquo;chien&amp;rdquo;) equivalents for &amp;ldquo;dog,&amp;rdquo; alternate terms in Spanish (&amp;ldquo;chucho&amp;rdquo;) and English (&amp;ldquo;mutt&amp;rdquo;) for &amp;ldquo;dog,&amp;rdquo; plus the Spanish (&amp;ldquo;gato&amp;rdquo;) for &amp;ldquo;cat.&amp;rdquo;&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/labels.jpg&#34; width=&#34;400&#34; class=&#34;centered&#34;  alt=&#34;Sample label relationships in a pre-SKOS taxonomy&#34;/&gt;
&lt;center&gt;&lt;i&gt;Figure 1. Sample label relationships in a pre-SKOS taxonomy&lt;/i&gt;&lt;/center&gt;&lt;br/&gt;
&lt;p&gt;With SKOS, you manage concepts that have different kinds of labels, and each label might have a language associated with it. The most important label is the preferred label, and SKOS allows each concept to have only one of these in each language. A single concept could have an English preferred label of &amp;ldquo;dog,&amp;rdquo; a Spanish preferred label of &amp;ldquo;perro,&amp;rdquo; and a French preferred label of &amp;ldquo;chien.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Another kind of label is the alternative label, which SKOS-based software might use to represent labels that are being tracked but not recommended. For example, the concept with an English preferred label of &amp;ldquo;dog&amp;rdquo; might have an English alternative label of &amp;ldquo;mutt&amp;rdquo; and a Spanish alternative label of &amp;ldquo;chucho.&amp;rdquo; Instead of being separate terms that must have their relationships explicitly typed, &amp;ldquo;dog,&amp;rdquo; &amp;ldquo;perro,&amp;rdquo; &amp;ldquo;chien,&amp;rdquo; &amp;ldquo;mutt,&amp;rdquo; and &amp;ldquo;chucho&amp;rdquo; all refer to the same concept, providing different information about that concept depending on the needs of each application. Figure 2 illustrates the information from Figure 1 rearranged as SKOS concepts, with fewer arrows and clearer relationships between the terms. (As with the earlier figure, solid-line arrows represent a &amp;ldquo;broader than&amp;rdquo; relationship.) The actual identifiers for each concept, which might be hidden under the covers by a vocabulary management application, are URIs.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/concepts.jpg&#34; width=&#34;550&#34; class=&#34;centered&#34;  alt=&#34;Sample concepts relationship in SKOS&#34;/&gt;
&lt;center&gt;&lt;i&gt;Figure 2. Sample concepts relationship in SKOS&lt;/i&gt;&lt;/center&gt;&lt;br/&gt;
&lt;p&gt;When you compare the two diagrams, you can see that in Figure 1, &amp;ldquo;perro&amp;rdquo; and &amp;ldquo;mutt&amp;rdquo; were just additional terms that &amp;ldquo;dog&amp;rdquo; pointed to, like &amp;ldquo;bulldog&amp;rdquo; and &amp;ldquo;dachshund,&amp;rdquo; but in Figure 2 you can see that &amp;ldquo;perro&amp;rdquo; and &amp;ldquo;mutt&amp;rdquo; refer to the same concept while &amp;ldquo;bulldog&amp;rdquo; and &amp;ldquo;dachshund&amp;rdquo; are different concepts.&lt;/p&gt;
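&lt;p&gt;In SKOS Turtle, part of the Figure 2 arrangement might look like this sketch (the &lt;code&gt;ex:&lt;/code&gt; concept URIs are invented for illustration):&lt;/p&gt;
&lt;pre&gt;
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&gt; .
@prefix ex:   &amp;lt;http://www.example.com/animals/&gt; .

# One concept, many labels; skos:prefLabel allows one per language.
ex:c104 a skos:Concept ;
        skos:prefLabel &#34;dog&#34;@en, &#34;perro&#34;@es, &#34;chien&#34;@fr ;
        skos:altLabel  &#34;mutt&#34;@en, &#34;chucho&#34;@es ;
        skos:broader   ex:c100 .    # the &#34;mammal&#34; concept

# &#34;bulldog&#34; is a different concept with its own URI.
ex:c105 a skos:Concept ;
        skos:prefLabel &#34;bulldog&#34;@en ;
        skos:broader   ex:c104 .
&lt;/pre&gt;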
&lt;p&gt;Concepts can have many kinds of relationships in SKOS besides &amp;ldquo;broader than.&amp;rdquo; The concept with an English preferred label of &amp;ldquo;dog&amp;rdquo; might have a &amp;ldquo;related&amp;rdquo; relationship with a &amp;ldquo;doghouse&amp;rdquo; concept in a different taxonomy. Because SKOS uses unique URIs as concept identifiers instead of the labels themselves, you can define relationships between a given concept and any concept in any accessible SKOS vocabulary in the world, even if it&amp;rsquo;s maintained by NASA or The New York Times.&lt;/p&gt;
&lt;p&gt;The UN Food and Agriculture Organization&amp;rsquo;s &lt;a href=&#34;http://www.fao.org/agrovoc/releases&#34;&gt;AGROVOC&lt;/a&gt; thesaurus for food-related domains such as fishing and farming must serve a truly international audience. A single AGROVOC concept can have preferred labels in over a dozen languages and even more alternative labels because there is no limit to the number of alternative labels you can specify for a given concept from each language. SKOS uses concepts with label properties to make multi-lingual tracking of terms much easier than one of the older, term-based approaches to organizing thesaurus data would, and this in turn makes communication between people from different cultures about food issues much easier.&lt;/p&gt;
&lt;h1 id=&#34;more-metadata&#34;&gt;More metadata&lt;/h1&gt;
&lt;p&gt;Along with the preferred and alternative labels and relationships between concepts described above, SKOS lets you store a term&amp;rsquo;s definition, scope notes, history notes, and a variety of other properties about each concept. Because SKOS is defined using the W3C&amp;rsquo;s OWL standard for specifying ontologies, it&amp;rsquo;s very easy to define additional properties that are specific to your industry or business and apply them to the concepts in your vocabularies.&lt;/p&gt;
&lt;p&gt;These properties can come from other data and metadata standards, such as the Dublin Core vocabulary, the Market Data Definition Language developed for the financial industry, or the Metadata Object Description Schema developed by the Library of Congress. They can also be properties that are specific to your company&amp;rsquo;s system and that no one else uses because they&amp;rsquo;re part of the added value for how you manage your information. For example, a pharmaceutical company might define a new &amp;ldquo;requires&amp;rdquo; relationship in an animal taxonomy to point to concepts in another taxonomy&amp;rsquo;s data about veterinary vaccines.&lt;/p&gt;
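&lt;p&gt;Defining such a property takes only a few triples. In this hypothetical sketch, the &lt;code&gt;pharma:requires&lt;/code&gt; property and all three namespaces are invented for illustration:&lt;/p&gt;
&lt;pre&gt;
@prefix rdf:     &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix rdfs:    &amp;lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix pharma:  &amp;lt;http://www.example.com/pharma/&gt; .
@prefix animals: &amp;lt;http://www.example.com/animals/&gt; .
@prefix vacc:    &amp;lt;http://www.example.com/vaccines/&gt; .

# Declare the custom property...
pharma:requires a rdf:Property ;
    rdfs:label &#34;requires&#34; .

# ...and use it to connect concepts from two different taxonomies.
animals:dog pharma:requires vacc:rabiesVaccine .
&lt;/pre&gt;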
&lt;p&gt;SKOS-based tools for editing and managing your vocabularies should understand that extensibility is part of this standard. Additional properties from outside of the SKOS specification should be part of their interface as you work with that data, showing up on the forms and reports along with the standardized SKOS properties.&lt;/p&gt;
&lt;h1 id=&#34;more-granular-metadata-skos-xl&#34;&gt;More granular metadata: SKOS-XL&lt;/h1&gt;
&lt;p&gt;Although the OWL language used to specify SKOS has certain crucial differences from object-oriented approaches to data modeling, it has one important thing in common: You define a data model by declaring classes, subclasses, and properties (or, to use the object-oriented term, attributes) of those classes. The SKOS ontology defines a Concept class, and preferred labels, alternative labels, and relationships to other concepts are modeled as properties of that class.&lt;/p&gt;
&lt;p&gt;You can assign all the metadata you want to a given concept, but SKOS provides no way to assign metadata to a specific label. What if you want to store data that describes the source of the label &amp;ldquo;chucho,&amp;rdquo; or when it was last edited, or who edited it?&lt;/p&gt;
&lt;p&gt;To accommodate this situation, the W3C published the SKOS Extension for Labels (SKOS-XL) specification, in which the values for a concept&amp;rsquo;s preferred, alternative, and other labels are not strings but members of a new Label class defined by the extension specification. Being instances of a class, these labels can have all the metadata you want to assign to them, which gives you a lot more flexibility.&lt;/p&gt;
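&lt;p&gt;In Turtle, a SKOS-XL label looks something like this sketch (the &lt;code&gt;ex:&lt;/code&gt; URIs and the metadata values are invented for illustration):&lt;/p&gt;
&lt;pre&gt;
@prefix skosxl:  &amp;lt;http://www.w3.org/2008/05/skos-xl#&gt; .
@prefix dcterms: &amp;lt;http://purl.org/dc/terms/&gt; .
@prefix ex:      &amp;lt;http://www.example.com/animals/&gt; .

ex:c104 skosxl:altLabel ex:label88 .

# The label is a resource of its own, so it can carry metadata.
ex:label88 a skosxl:Label ;
    skosxl:literalForm &#34;chucho&#34;@es ;
    dcterms:source     &#34;regional usage survey&#34; ;
    dcterms:modified   &#34;2011-03-14&#34; .
&lt;/pre&gt;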
&lt;h1 id=&#34;easier-metadata-integration&#34;&gt;Easier metadata integration&lt;/h1&gt;
&lt;p&gt;Earlier I mentioned that because SKOS uses unique URIs as concept identifiers, you can define a relationship between a given concept and any other SKOS-based concept whose URI ID you know, whether it&amp;rsquo;s in the same taxonomy as a given concept or in a different taxonomy published on the web by a separate company. This ability is also great for a situation that falls between these two extremes: when different groups within the same enterprise have their own vocabularies to manage, integration of these vocabularies into a single, centrally managed vocabulary can do more harm than good, because maintenance becomes more complex as the vocabulary grows and because the data must be revised to reach compromises between the needs of different groups. The marketing department and the repairs department might mean different things when they use the term &amp;ldquo;customer,&amp;rdquo; and they might have good reasons for doing so; forcing them both to use the same definition can reduce the vocabulary&amp;rsquo;s value for both of them.&lt;/p&gt;
&lt;p&gt;With SKOS, you can define relationships between concepts from different vocabularies. Because of this, well-defined concept relationship metadata gives you the hooks to use vocabularies from different departments together without forcing you to revise and combine them all into a monolithic single vocabulary that doesn&amp;rsquo;t fully meet any group&amp;rsquo;s needs. The relationships can be standard SKOS relationships such as &amp;ldquo;related&amp;rdquo; or &amp;ldquo;broader&amp;rdquo; (for example, you might say that the marketing department&amp;rsquo;s concept of &amp;ldquo;customer&amp;rdquo; is broader than the repair department&amp;rsquo;s), but again, you can define your own customized relationships as well.&lt;/p&gt;
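&lt;p&gt;For the &amp;ldquo;customer&amp;rdquo; example, a single mapping triple can do the job. (The department namespaces here are invented for illustration; SKOS also offers &lt;code&gt;skos:broadMatch&lt;/code&gt;, &lt;code&gt;skos:closeMatch&lt;/code&gt;, and other properties for mapping across vocabularies.)&lt;/p&gt;
&lt;pre&gt;
@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&gt; .
@prefix mkt:  &amp;lt;http://www.example.com/marketing/&gt; .
@prefix rep:  &amp;lt;http://www.example.com/repairs/&gt; .

# Each department keeps its own &#34;customer&#34; concept; one mapping triple
# says that the repairs department&#39;s concept is the narrower of the two.
mkt:customer skos:narrowMatch rep:customer .
&lt;/pre&gt;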
&lt;h1 id=&#34;skos-and-the-semantic-web&#34;&gt;SKOS and the Semantic Web&lt;/h1&gt;
&lt;p&gt;When becoming interested in semantic technology, many worry that before they build their first application, they must learn the RDF data model, the various syntaxes for expressing it, the SPARQL query language, and how to model data with RDF schema and OWL. When you use a SKOS-based vocabulary manager, you most likely fill out forms and use typical user interface widgets to manage your data with no need to learn the base W3C standards that underlie SKOS, but if you choose to learn a little about them, you can get more out of your data. For example, you can use the SPARQL query language to ask questions that might not be part of your vocabulary management package, and as mentioned above, you can define new properties and even classes to keep track of more customized metadata.&lt;/p&gt;
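&lt;p&gt;For example, this sketch of a SPARQL query asks which concepts have an English preferred label but no Spanish one yet, a question that a vocabulary manager&amp;rsquo;s built-in reports might not cover:&lt;/p&gt;
&lt;pre&gt;
PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&gt;

SELECT ?concept ?enLabel
WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?enLabel .
  FILTER (lang(?enLabel) = &#34;en&#34;)
  # Keep only concepts with no Spanish preferred label.
  FILTER NOT EXISTS {
    ?concept skos:prefLabel ?esLabel .
    FILTER (lang(?esLabel) = &#34;es&#34;)
  }
}
&lt;/pre&gt;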
&lt;p&gt;You can also connect your data to a wider variety of data out there, whether it uses the SKOS ontology or not. The ability of the RDF data model to connect independently created data is what makes the Semantic Web a web, and the ability to combine datasets is an important payoff of this ability. For example, by making their SKOS-based subject header index freely available on the web, The New York Times lets other publishers use these subject headers for their own content, giving those publishers connections to related New York Times articles. More importantly, for The New York Times, it drives more traffic to their articles tagged with those subject headers.&lt;/p&gt;
&lt;p&gt;After you&amp;rsquo;ve added some properties to your SKOS data and run a few SPARQL queries against it, you can think about defining new ontologies apart from SKOS (or finding other existing standard ontologies besides SKOS to extend) and take greater and greater advantage of Semantic Web technologies.&lt;/p&gt;
&lt;h1 id=&#34;tools&#34;&gt;Tools&lt;/h1&gt;
&lt;p&gt;Any RDF tool that can edit data guided by a particular ontology can load the SKOS OWL ontology and let you create SKOS concepts and populate their properties with the appropriate metadata. For management of vocabularies by staff with no RDF background, several tools are available:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;TopQuadrant&amp;rsquo;s Enterprise Vocabulary Net (EVN) is a commercial web-based collaborative system built around the SKOS data model for the management of controlled vocabularies across an enterprise. &lt;em&gt;&lt;strong&gt;This has since evolved into &lt;a href=&#34;https://www.topquadrant.com/products/topbraid-enterprise-data-governance/&#34;&gt;TopBraid EDG&lt;/a&gt;, which focuses on a broader set of Data Governance tasks. I was happy to see that all of the remaining tools in this list are still around ten years after I originally wrote this piece.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://www.poolparty.biz/&#34;&gt;PoolParty&lt;/a&gt; is a commercial thesaurus management and SKOS editor system that includes text mining and linked data capabilities.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;a href=&#34;https://code.google.com/archive/p/skoseditor/&#34;&gt;SKOSed&lt;/a&gt; plug-in for the Protégé ontology editor lets you edit thesauri represented in SKOS. Both SKOSed and Protégé are open source.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://iqvoc.net/&#34;&gt;iQvoc&lt;/a&gt; is an open source tool for managing vocabularies that can import and export SKOS.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://www.vocabularyserver.com/&#34;&gt;TemaTres&lt;/a&gt; is an open source vocabulary manager that can output vocabulary data as SKOS files.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Import and export of SKOS by vocabulary management tools should eventually be as common as import and export of comma-separated values by spreadsheet programs. If you use a taxonomy management program that doesn&amp;rsquo;t support the standard, let its makers know that you want to see it.&lt;/p&gt;
&lt;p&gt;The RDF basis of SKOS also means that you can take advantage of RDF-aware application development tools and libraries to build SKOS editing systems yourself much more quickly than you can build a taxonomy management system where you had to define and implement all the data structures yourself.&lt;/p&gt;
&lt;h1 id=&#34;starting-small-and-scaling-up&#34;&gt;Starting small and scaling up&lt;/h1&gt;
&lt;p&gt;If you have one or more large, complex controlled vocabularies to manage, converting it all to use a new format can be a big, expensive job. Converting a subset to SKOS as a pilot project can be much easier, and if you convert a few different subsets and then eventually connect them by defining the appropriate concept relationships across vocabulary boundaries, you start to see the benefit of SKOS in your own organization. With the growing support of both free and commercial software for the standard, SKOS is definitely worth further investigation by anyone who manages vocabularies and is interested in the benefits of standardization.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1439621667832086531&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2021">2021</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
    </item>
    
    <item>
      <title>What else can I do with RDFS?</title>
      <link>https://www.bobdc.com/blog/whatisrdfspart2/</link>
      <pubDate>Fri, 20 Aug 2021 11:01:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/whatisrdfspart2/</guid>
      
      
      <description><div>Schemas can be a little fancier and even more useful with no need for OWL.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/schemapic.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;180px&#34; /&gt;
&lt;p&gt;In my last blog entry, &lt;a href=&#34;../whatisrdfs&#34;&gt;What is RDFS?&lt;/a&gt;, I described how the RDF Schema language lets you define RDF vocabularies, with the definitions themselves being RDF triples. We saw how simple class and property name definitions in a schema can, as machine-readable documentation for a dataset&amp;rsquo;s structure, provide greater interoperability for data and applications built around the same domain. Today we&amp;rsquo;ll look at how RDF schemas can store additional kinds of valuable information to add to what we saw in the sample schemas last time, and then we&amp;rsquo;ll look at some of the cool things that RDF schemas let you do.&lt;/p&gt;
&lt;h1 id=&#34;more-data-modeling&#34;&gt;More data modeling&lt;/h1&gt;
&lt;p&gt;When we use RDFS to define class and property names we can also define relationships between them. The following expands on the schema from last time to define relations between classes, between properties, and between classes and properties:&lt;/p&gt;
&lt;pre&gt;
@prefix rdf:     &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix rdfs:    &amp;lt;http://www.w3.org/2000/01/rdf-schema#&gt; . 
@prefix vcard:   &amp;lt;http://www.w3.org/2006/vcard/ns#&gt; .
@prefix emp:     &amp;lt;http://www.snee.com/schema/employees/&gt; .
@prefix ex:      &amp;lt;http://www.snee.com/example/&gt; .
@prefix dcterms: &amp;lt;http://purl.org/dc/terms/&gt; . 

emp:Person rdf:type rdfs:Class ;
          rdfs:label &#34;person&#34; . 

emp:Employee a rdfs:Class ; 
            &lt;b&gt;rdfs:subClassOf emp:Person ;&lt;/b&gt;
            rdfs:label &#34;employee&#34; . 

vcard:given-name  rdf:type rdf:Property ;
                  &lt;b&gt;rdfs:domain emp:Person ;&lt;/b&gt;
                  rdfs:label &#34;given name&#34;.

vcard:family-name rdf:type rdf:Property ;
                  &lt;b&gt;rdfs:domain emp:Person ;&lt;/b&gt;
                  rdfs:label &#34;family name&#34; ;
                  rdfs:label &#34;apellido&#34;@es . 

emp:hireDate a rdf:Property ;
            &lt;b&gt;rdfs:domain  emp:Employee ;&lt;/b&gt;
            rdfs:label   &#34;hire date&#34; ;
            rdfs:comment &#34;The first day an employee was on the payroll.&#34;  ;
            &lt;b&gt;rdfs:subPropertyOf dcterms:date . &lt;/b&gt;

emp:reportsTo a rdf:Property ; 
             &lt;b&gt;rdfs:domain emp:Employee ;
             rdfs:range  emp:Employee ;&lt;/b&gt;
             rdfs:label  &#34;reports to&#34; .
&lt;/pre&gt;
&lt;p&gt;The first thing that this schema has that the earlier one didn&amp;rsquo;t is a triple saying that &lt;code&gt;emp:Employee&lt;/code&gt; is a subclass of &lt;code&gt;emp:Person&lt;/code&gt;. If an inferencing parser saw that employees &lt;code&gt;ex:id2&lt;/code&gt; (Heidi Smith) and &lt;code&gt;ex:id3&lt;/code&gt; (Jane Berger) are instances of the &lt;code&gt;emp:Employee&lt;/code&gt; class, it would know that they were also instances of the &lt;code&gt;emp:Person&lt;/code&gt; class.&lt;/p&gt;
&lt;p&gt;Now that we know how to declare classes and indicate which is a subclass of another, we can build class hierarchies. These will be familiar to people who have used most modern programming languages. However, few if any of these programming languages let you also build property hierarchies. The schema above declares the &lt;code&gt;emp:hireDate&lt;/code&gt; property to be a subproperty of the popular &lt;a href=&#34;https://www.dublincore.org/specifications/dublin-core/dcmi-terms/&#34;&gt;Dublin Core&lt;/a&gt; vocabulary&amp;rsquo;s &lt;code&gt;dcterms:date&lt;/code&gt; property.&lt;/p&gt;
&lt;p&gt;What does this buy you? For one thing, a tool that generates a user interface for this human resources data might not recognize the &lt;code&gt;emp:hireDate&lt;/code&gt; property, but if it does the inferencing to find out that this property is a specialized version of the standard &lt;code&gt;dcterms:date&lt;/code&gt; one, it might know that a &lt;a href=&#34;https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/date&#34;&gt;date widget&lt;/a&gt; would be more appropriate to represent this field on an editing form than a plain text box.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.dublincore.org/specifications/dublin-core/dcmi-terms/dublin_core_terms.ttl&#34;&gt;Turtle version&lt;/a&gt; of the RDFS schema for the Dublin Core DCMI Metadata Terms vocabulary includes nine triples with the predicate and object &lt;code&gt;rdfs:subPropertyOf &amp;lt;http://purl.org/dc/elements/1.1/date&amp;gt;&lt;/code&gt;. These show us that properties such as &lt;code&gt;dcterms:available&lt;/code&gt;, &lt;code&gt;dcterms:created&lt;/code&gt;, and &lt;code&gt;dcterms:dateAccepted&lt;/code&gt; are dates. You might guess that from a property named &amp;ldquo;dateAccepted&amp;rdquo;, but you wouldn&amp;rsquo;t know this about a &amp;ldquo;created&amp;rdquo; property without this machine-readable way to describe the semantics of that property. (I rarely use the term &amp;ldquo;semantics&amp;rdquo;, but when I do use it, I mean it.)&lt;/p&gt;
&lt;p&gt;The next new thing to note in this schema, now that we&amp;rsquo;ve seen how to define relationships between classes, and between properties, is how this schema defines a relationship between a property and a class. The first &lt;code&gt;rdfs:domain&lt;/code&gt; triple above associates the &lt;code&gt;vcard:given-name&lt;/code&gt; property with the &lt;code&gt;emp:Person&lt;/code&gt; class. (Remember that if &lt;code&gt;emp:Employee&lt;/code&gt; is a subclass of  &lt;code&gt;emp:Person&lt;/code&gt;, then this property is now associated with &lt;code&gt;emp:Employee&lt;/code&gt; as well.) Is there anything wrong with associating a property defined in a standard vocabulary with my own thing that I&amp;rsquo;m defining in my own vocabulary? Absolutely not; it&amp;rsquo;s actually a good thing, because it provides a standards-based context for the thing I&amp;rsquo;m defining for my own application.&lt;/p&gt;
&lt;p&gt;As the &lt;a href=&#34;https://www.w3.org/TR/rdf-schema/#ch_domain&#34;&gt;W3C RDFS Recommendation&lt;/a&gt; tells us, &amp;ldquo;&lt;code&gt;rdfs:domain&lt;/code&gt; is an instance of &lt;code&gt;rdf:Property&lt;/code&gt; that is used to state that any resource that has a given property is an instance of one or more classes&amp;rdquo;. Given this, my schema is saying that if an RDF resource has a &lt;code&gt;vcard:given-name&lt;/code&gt; property, then we can infer that that resource is an instance of &lt;code&gt;emp:Person&lt;/code&gt;. (If this leads to an inference that the office dog is a person, I should re-evaluate my class hierarchy and which properties are associated with which classes.)&lt;/p&gt;
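&lt;p&gt;Here&amp;rsquo;s a sketch of that inference in Turtle, using the class and property names from this schema:&lt;/p&gt;
&lt;pre&gt;
@prefix rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix rdf:   &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix vcard: &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt; .
@prefix emp:   &amp;lt;http://www.snee.com/schema/employees/&amp;gt; .
@prefix ex:    &amp;lt;http://www.snee.com/example/&amp;gt; .

# In the schema:
vcard:given-name rdfs:domain emp:Person .

# In the instance data:
ex:id1 vcard:given-name &#34;Francis&#34; .

# What an RDFS inferencing engine can then add:
ex:id1 rdf:type emp:Person .
&lt;/pre&gt;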
&lt;p&gt;Sometimes we forget that RDFS and OWL were invented to enable this kind of inferencing across data found on the web. They were not invented to help us define data structures, but as I&amp;rsquo;ve shown, RDFS is handy to at least document them. Continuing with my user-interface-generation example, a system generating an edit form for an Employee instance would know from this schema&amp;rsquo;s &lt;code&gt;rdfs:domain&lt;/code&gt; triples that this editing form should include &lt;code&gt;vcard:given-name&lt;/code&gt;, &lt;code&gt;vcard:family-name&lt;/code&gt;, &lt;code&gt;emp:hireDate&lt;/code&gt;, and &lt;code&gt;emp:reportsTo&lt;/code&gt; fields. (And, as I mentioned last time, it should know that the form would be easier to read if these fields were labeled with the properties&amp;rsquo; &lt;code&gt;rdfs:label&lt;/code&gt; values and not the actual property names.)&lt;/p&gt;
&lt;p&gt;Software developers who are used to defining class hierarchies may be a bit confused by the relationship between classes and properties in RDFS. In standard object-oriented modeling, when you define a class, you define the properties used by that class, and some may be inherited from superclasses. In RDFS, you define classes and properties separately and then associate them, if you like, with the &lt;code&gt;rdfs:domain&lt;/code&gt; property. (The fact that properties can have their own hierarchies is something else that can take object-oriented developers some time to get accustomed to.)&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;rdfs:range&lt;/code&gt; value defined for the &lt;code&gt;emp:reportsTo&lt;/code&gt; property is another way to define a relationship between a class and a property. According to the RDFS Recommendation, it &amp;ldquo;is used to state that the values of a property are instances of one or more classes&amp;rdquo;. We saw that if &lt;code&gt;emp:reportsTo&lt;/code&gt; has an &lt;code&gt;rdfs:domain&lt;/code&gt; of &lt;code&gt;emp:Employee&lt;/code&gt;, then &amp;ldquo;X reports to Y&amp;rdquo; means that X is an &lt;code&gt;emp:Employee&lt;/code&gt;; if &lt;code&gt;emp:reportsTo&lt;/code&gt; has an &lt;code&gt;rdfs:range&lt;/code&gt; of &lt;code&gt;emp:Employee&lt;/code&gt;, we can infer from the same statement that Y is an &lt;code&gt;emp:Employee&lt;/code&gt;—that is, that an employee reports to another employee. Even if we don&amp;rsquo;t plan on doing this kind of inferencing with &lt;code&gt;rdfs:range&lt;/code&gt;, it&amp;rsquo;s still useful to indicate what kind of values to expect for a given property. For example, the application generating a form to edit employee data could generate a drop-down list of employee names on the &amp;ldquo;reports to&amp;rdquo; part of the form instead of a plain text box.&lt;/p&gt;
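&lt;p&gt;Here&amp;rsquo;s a sketch of those two declarations and what they let us infer from a single &amp;ldquo;reports to&amp;rdquo; triple:&lt;/p&gt;
&lt;pre&gt;
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix emp:  &amp;lt;http://www.snee.com/schema/employees/&amp;gt; .
@prefix ex:   &amp;lt;http://www.snee.com/example/&amp;gt; .

# In the schema:
emp:reportsTo rdfs:domain emp:Employee ;
              rdfs:range  emp:Employee .

# In the instance data:
ex:id3 emp:reportsTo ex:id2 .

# Inferred from the domain: ex:id3 a emp:Employee .
# Inferred from the range:  ex:id2 a emp:Employee .
&lt;/pre&gt;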
&lt;h1 id=&#34;more-support-of-interesting-applications&#34;&gt;More support of interesting applications&lt;/h1&gt;
&lt;p&gt;I&amp;rsquo;ve written other blog entries about how I applied the ideas described above to various useful projects.&lt;/p&gt;
&lt;h2 id=&#34;drive-a-mobile-user-interface&#34;&gt;Drive a (mobile!) user interface&lt;/h2&gt;
&lt;p&gt;In &lt;a href=&#34;../using-sparql-queries-from-nati/&#34;&gt;Using SPARQL queries from native Android apps&lt;/a&gt; I describe how I used the MIT App Inventor toolkit to create a native Android app that lets the user pick a clothing product and a color and a size for that product before sending the selected information off to a web server. The choices of products, colors, and sizes are all stored in an RDFS model; screenshots from my phone show how the list of color choices expanded after I added a new one to the RDF schema that stored the model. This blog entry also describes how additions to the RDFS model (with no changes to the Android app) would enable support in the app for other spoken languages besides English.&lt;/p&gt;
&lt;h2 id=&#34;data-integration&#34;&gt;Data integration&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;http://www.bobdc.com/blog/driving-hadoop-data-integratio/&#34;&gt;Driving Hadoop data integration with standards-based models instead of code&lt;/a&gt; describes a data integration demo that combines data from Microsoft&amp;rsquo;s SQL Server Northwind sample database with data from Oracle&amp;rsquo;s sample HR database. These databases both describe human resources data but use different names (for example, &lt;code&gt;LastName&lt;/code&gt; and &lt;code&gt;last_name&lt;/code&gt;) for similar properties. Using Python and a SPARQL query, the demo collects data from the two sources and represents them using a common vocabulary. The system uses an RDFS model both to define this vocabulary and&amp;mdash;this part is crucial&amp;mdash;to define the mapping from the two data sources to this vocabulary using the &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; property mentioned above. After I expanded the RDFS model to cover more of the input, running the demo again integrated more of the source data; no changes to the Python scripts were necessary.&lt;/p&gt;
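&lt;p&gt;The heart of that mapping is a pair of &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; triples. Here&amp;rsquo;s a sketch, with hypothetical namespaces standing in for the two data sources and a hypothetical &lt;code&gt;emp:lastName&lt;/code&gt; property standing in for the common vocabulary:&lt;/p&gt;
&lt;pre&gt;
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix emp:  &amp;lt;http://www.snee.com/schema/employees/&amp;gt; .
# Hypothetical namespaces for the Northwind and Oracle HR data:
@prefix nw:   &amp;lt;http://www.snee.com/northwind/&amp;gt; .
@prefix ora:  &amp;lt;http://www.snee.com/oraclehr/&amp;gt; .

nw:LastName   rdfs:subPropertyOf emp:lastName .
ora:last_name rdfs:subPropertyOf emp:lastName .

# With inferencing, a query for emp:lastName values
# then finds data from both sources.
&lt;/pre&gt;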
&lt;p&gt;All the ideas I&amp;rsquo;ve described about this project so far are pretty simple. The novelty of the article was that I set it all up to happen on a &lt;a href=&#34;../hadoop&#34;&gt;Hadoop&lt;/a&gt; cluster distributed across multiple systems, because that was especially hot at the time.&lt;/p&gt;
&lt;p&gt;Because this article was written to accompany something I did for IBM Data Magazine, it doesn&amp;rsquo;t assume familiarity with RDF as much as other entries on my blog, so if you&amp;rsquo;re new to RDF that might be helpful.&lt;/p&gt;
&lt;h2 id=&#34;transform-data-with-partial-schemas&#34;&gt;Transform data with partial schemas&lt;/h2&gt;
&lt;p&gt;My more recent blog entry &lt;a href=&#34;../partialschemas&#34;&gt;Transforming data with inferencing and (partial!) schemas&lt;/a&gt; describes how, if you have a big mess of more data than you need, an RDF schema for the subset of that data that you actually want can be very useful. This is especially true when you use inferencing to transform the data. I&amp;rsquo;ll quote the whole first paragraph of that blog posting here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I originally planned to title this “Partial schemas!” but as I assembled the example I realized that in addition to demonstrating the value of partial, incrementally-built schemas, the steps shown below also show how inferencing with schemas can implement transformations that are very useful in data integration. In the right situations this can be even better than SPARQL, because instead of using code—whether procedural or declarative—the transformation is driven by the data model itself.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here&amp;rsquo;s another paragraph from after the piece walks through the demo:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This idea of letting the data and its schema evolve in a more flexible manner is especially great for data integration projects. My example here started off with a (somewhat) big mess of RDF; if you&amp;rsquo;re working with more than one RDF dataset—maybe with some converted from other formats such as JSON or relational databases—then the use of RDFS to identify little subsets of those datasets &lt;em&gt;and to specify relationships between components of those subsets&lt;/em&gt; can help your knowledge graph and the applications that use it become useful a lot sooner.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Again, you&amp;rsquo;ll see many of the techniques outlined in today&amp;rsquo;s blog post put to good use in that project.&lt;/p&gt;
&lt;h2 id=&#34;a-bit-more-useful-background&#34;&gt;A bit more useful background&lt;/h2&gt;
&lt;p&gt;When using certain standards, it&amp;rsquo;s easy to assume that the standard itself is a long batch of technical jargon. The W3C RDF Schema Recommendation is not very long and actually quite readable, as I wrote in &lt;a href=&#34;http://www.bobdc.com/blog/rdfs-the-primary-document/&#34;&gt;RDFS: The primary document&lt;/a&gt;, so I recommend it. The &lt;a href=&#34;https://en.wikipedia.org/wiki/RDF_Schema&#34;&gt;RDF Schema Wikipedia page&lt;/a&gt; also nicely summarizes what RDFS offers and what kinds of things you can do with it.&lt;/p&gt;
&lt;p&gt;I have been referring to inferencing quite casually, although my &amp;ldquo;Data integration&amp;rdquo; and &amp;ldquo;Transform data with partial schemas&amp;rdquo; examples do go into more detail about actually executing that. You may also find &lt;a href=&#34;../materializing&#34;&gt;Living in a materialized world&lt;/a&gt; useful; this covers the potential role and mechanics of RDFS inferencing.&lt;/p&gt;
&lt;p&gt;And, &lt;a href=&#34;../jenagems&#34;&gt;Hidden gems included with Jena’s command line utilities&lt;/a&gt; describes how an open source multi-platform Apache Jena tool can perform RDFS inferencing for you.&lt;/p&gt;
&lt;p&gt;Let me know how you end up using RDFS! There is a lot of potential there that has gone untapped for too long.&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1428737402185584645&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://creativecommons.org/licenses/by/2.0/&#34;&gt;CC BY 2.0&lt;/a&gt; &lt;a href=&#34;https://www.flickr.com/photos/h_duncan/50259072881/in/photolist-2jzdTnk-7HmJg6-7HqDcb-2kjuRxL-jpPNH9-6tX2xu-FLV2Gk-P8tcKE-6tSUk8-6tX3is-jvFuFV-joAz5Y-jpLVtu-4n7jAD-2cegzFj-6tX3g7-6tSUap-7HqDhy-6PBVjR-6tX3jA-6tSUg8-jrh79d-3VFfGk-98uTat-5Rf3kN-67kHPt-6tkLQB-5PQECt-jvv58E-2Bf8o5-6tX2uh-5CRngn-7nTViq-6tSU2X-aooPVM-6tX2Eo-8VHpaL-d4uS5u-6tSTNc-6tX2yQ-6tX2wC-6rJh81-ieejJg-6tSU5M-a9cwu-6T3cGR-6TY6HT-4JaYY9-6tSU1X-7HmHWr&#34;&gt;photo&lt;/a&gt; by &lt;a href=&#34;https://www.flickr.com/photos/h_duncan/&#34;&gt;Howard Duncan&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2021">2021</category>
      
      <category domain="https://www.bobdc.com//categories/rdfs">RDFS</category>
      
    </item>
    
    <item>
      <title>What is RDFS?</title>
      <link>https://www.bobdc.com/blog/whatisrdfs/</link>
      <pubDate>Sun, 25 Jul 2021 11:55:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/whatisrdfs/</guid>
      
      
      <description><div>And how much can a simple schema do for you?</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/schemapic.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedPicture&#34; width=&#34;180px&#34; /&gt;
&lt;p&gt;RDFS, or RDF Schema, is a &lt;a href=&#34;https://www.w3.org/TR/rdf-schema/&#34;&gt;W3C standard&lt;/a&gt; specialized vocabulary for describing RDF vocabularies and data models. Before I discuss it further, though, I&amp;rsquo;d like to explain why the use of standardized, specialized vocabularies (whether RDFS itself or a vocabulary that someone uses RDFS to describe) can be useful beyond the advantages of sharing a vocabulary with others for easier interoperability.&lt;/p&gt;
&lt;p&gt;Last month, in &lt;a href=&#34;../whatisrdf/&#34;&gt;What is RDF?&lt;/a&gt;, my example dataset included triples whose predicates came from the W3C standard &lt;a href=&#34;https://www.w3.org/TR/vcard-rdf/&#34;&gt;vCard business card&lt;/a&gt; ontology. It also included triples from a namespace that I had created myself with my own domain name. Certain kinds of RDF applications go through data and, when they find predicates that use a specialized vocabulary designed for such applications, they execute special tasks designed around that vocabulary. For example, GeoSPARQL applications that find predicates from the &lt;code&gt;http://www.opengis.net/def/function/geosparql/&lt;/code&gt; namespace can perform geospatial math that answers questions such as &amp;ldquo;what museums are within a mile of New York&amp;rsquo;s Museum of Modern Art?&amp;rdquo;, as I described in &lt;a href=&#34;../geosparqlgraphdb/&#34;&gt;GeoSPARQL queries on OSM Data in GraphDB&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The use of RDF does not require any schemas. However, the commercial and open source tools that can understand the RDFS vocabulary (by which I mean the RDFS vocabulary itself, not necessarily the ones you define with it) make it easier to build user interfaces for RDF-based applications, to integrate data from disparate datasets, and more. Before we get there, though, let&amp;rsquo;s look at an example of an RDF schema and some data that uses it.&lt;/p&gt;
&lt;p&gt;The following RDFS schema uses the Turtle syntax to describe a few classes and properties.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# Employee schema version 1
# Pound sign lets you add comments to Turtle.
@prefix rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; . 
@prefix rdf:   &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; . 
@prefix vcard: &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt; .
@prefix emp:   &amp;lt;http://www.snee.com/schema/employees/&amp;gt; .

emp:Person   rdf:type rdfs:Class .
emp:Employee a        rdfs:Class .

vcard:given-name  a rdf:Property .
vcard:family-name a rdf:Property .
emp:hireDate      a rdf:Property .
emp:reportsTo     a rdf:Property .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The first thing to note is that the schema itself consists of triples, here using the Turtle syntax to describe a few RDF structures. This means that you can use SPARQL and other RDF tools to work with the schema itself and with collections of schemas.&lt;/p&gt;
&lt;p&gt;The second thing to note is how simple a schema can be—in this case, just six triples saying &amp;ldquo;Here are some classes and properties to potentially use&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;rdf:type&lt;/code&gt; predicate means &amp;ldquo;is an instance of the following class&amp;rdquo;, so the first  triple above says that &lt;code&gt;emp:Person&lt;/code&gt; is itself a class. (Below we&amp;rsquo;ll see how to create instances of &lt;code&gt;emp:Person&lt;/code&gt;.) This schema&amp;rsquo;s next triple says that &lt;code&gt;emp:Employee&lt;/code&gt; is also a class. Instead of the &lt;code&gt;rdf:type&lt;/code&gt; predicate, that line uses the shortcut &amp;quot; a &amp;quot;. This means the same thing, but with a syntax that brings the triple closer to the English expression &amp;ldquo;&lt;code&gt;emp:Employee&lt;/code&gt; is a class&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;The remaining four triples in that example list some available properties. I copied two from the vCard vocabulary and made up two myself.&lt;/p&gt;
&lt;h2 id=&#34;using-the-schema&#34;&gt;Using the schema&lt;/h2&gt;
&lt;p&gt;The following instance data uses the classes and properties declared above:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix vcard: &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt; .
@prefix emp:   &amp;lt;http://www.snee.com/schema/employees/&amp;gt; .
@prefix ex:    &amp;lt;http://www.snee.com/example/&amp;gt; .

ex:id1 a emp:Person ; 
       vcard:given-name  &amp;#34;Francis&amp;#34; ;
       vcard:family-name &amp;#34;Jones&amp;#34; .

ex:id2 a emp:Employee ;
       vcard:given-name  &amp;#34;Heidi&amp;#34; ;
       vcard:family-name &amp;#34;Smith&amp;#34; ;
       emp:hireDate      &amp;#34;2015-01-13&amp;#34; .

ex:id3 a emp:Employee ; 
       vcard:given-name  &amp;#34;Jane&amp;#34; ;
       vcard:family-name &amp;#34;Berger&amp;#34; ;
       emp:reportsTo     ex:id2 . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;These triples use another bit of Turtle syntax that I didn&amp;rsquo;t cover last month: a semicolon means &amp;ldquo;the next triple has the same subject as the last one&amp;rdquo;. For example, the first three lines after the prefix declarations in this sample data say that resource &lt;code&gt;sn:id1&lt;/code&gt; is an instance of the class Person, has a given name of Francis, and a family name of Jones.&lt;/p&gt;
&lt;p&gt;The schema above doesn&amp;rsquo;t say much, but it&amp;rsquo;s already at least as useful as a list of the columns in a relational table. Someone who has this schema and is working with this data knows what property names to use if they want to query the data, add to it, or delete from it. They also know what the potential classes are and can query for instances of those classes. All of these abilities are a big help if multiple people are going to create interoperable data and applications.&lt;/p&gt;
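&lt;p&gt;For example, someone who has this schema knows enough to write a SPARQL query like this sketch, which lists each &lt;code&gt;emp:Employee&lt;/code&gt; instance along with its hire date when one is present:&lt;/p&gt;
&lt;pre&gt;
PREFIX emp: &amp;lt;http://www.snee.com/schema/employees/&amp;gt;

SELECT ?employee ?hireDate
WHERE {
  ?employee a emp:Employee .
  OPTIONAL { ?employee emp:hireDate ?hireDate }
}
&lt;/pre&gt;
&lt;p&gt;(The &lt;code&gt;OPTIONAL&lt;/code&gt; keyword keeps employees such as &lt;code&gt;ex:id3&lt;/code&gt;, which has no &lt;code&gt;emp:hireDate&lt;/code&gt; value, in the results.)&lt;/p&gt;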
&lt;h2 id=&#34;adding-to-the-schema&#34;&gt;Adding to the schema&lt;/h2&gt;
&lt;p&gt;The next version of the same schema goes a little further by providing more information about the classes and properties:&lt;/p&gt;
&lt;pre&gt;
# Employee schema version 2
@prefix rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&gt; . 
@prefix rdf:   &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; . 
@prefix vcard: &amp;lt;http://www.w3.org/2006/vcard/ns#&gt; .
@prefix emp:   &amp;lt;http://www.snee.com/schema/employees/&gt; .

emp:Person rdf:type rdfs:Class ;
           &lt;b&gt;rdfs:label &#34;person&#34;&lt;/b&gt; . 

emp:Employee a rdfs:Class ; 
             &lt;b&gt;rdfs:label &#34;employee&#34; ;
             rdfs:comment &#34;A full-time, non-contractor employee.&#34; &lt;/b&gt;.

vcard:given-name  rdf:type rdf:Property ;
                  &lt;b&gt;rdfs:label &#34;given name&#34;&lt;/b&gt;.

vcard:family-name rdf:type rdf:Property ;
                  &lt;b&gt;rdfs:label &#34;family name&#34; ;
                   rdfs:label &#34;apellido&#34;@es &lt;/b&gt;. 

emp:hireDate a rdf:Property ;
             &lt;b&gt;rdfs:label   &#34;hire date&#34; ;
             rdfs:comment &#34;The first day an employee was on the payroll.&#34;&lt;/b&gt; .

emp:reportsTo a rdf:Property ; 
              &lt;b&gt;rdfs:label  &#34;reports to&#34;&lt;/b&gt; .
&lt;/pre&gt;
&lt;p&gt;This version includes &lt;code&gt;rdfs:comment&lt;/code&gt; and &lt;code&gt;rdfs:label&lt;/code&gt; properties. The former function as documentation for the things they&amp;rsquo;re describing. They should provide clarity as to exactly what the described resource means, like the &lt;code&gt;rdfs:comment&lt;/code&gt; value for the &lt;code&gt;emp:Employee&lt;/code&gt; resource: &amp;ldquo;A full-time, non-contractor employee.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;rdfs:label&lt;/code&gt; property provides a human-readable name for the resource being described. This is especially helpful for reports and applications that use this data. For example, if your application will display a form where people can edit data about employees, it would be difficult for these end users to read the form if it labeled its fields with actual property names such as &amp;ldquo;vcard:given-name&amp;rdquo; and  &amp;ldquo;emp:hireDate&amp;rdquo;. On the other hand, you shouldn&amp;rsquo;t hard-code more readable form field names like &amp;ldquo;hire date&amp;rdquo; and &amp;ldquo;family name&amp;rdquo; in your application code, either.&lt;/p&gt;
&lt;p&gt;For some real &lt;a href=&#34;https://martinfowler.com/bliki/ModelDrivenSoftwareDevelopment.html&#34;&gt;model-driven development&lt;/a&gt; you want to set it up so that as your model (as encoded by the schema) evolves the application automatically adapts to this evolution wherever possible. Providing display names as part of the model helps move your application toward this goal. An application that uses the revised version of my sample schema can use &lt;code&gt;rdfs:label&lt;/code&gt; values such as &amp;ldquo;family name&amp;rdquo; and &amp;ldquo;given name&amp;rdquo; to provide much more readable form field labels.&lt;/p&gt;
&lt;p&gt;RDF (and hence RDFS) also lets you add language tags to literal values. If you add multiple &lt;code&gt;rdfs:label&lt;/code&gt; values to an RDF resource and you tag each of these values according to its language, then the model-driven development described above can extend to the generation of forms in different languages for different users. In the second version of my schema the resource &lt;code&gt;vcard:family-name&lt;/code&gt; has labels in both English and Spanish. (A future version of the schema should have Spanish labels for the other classes and properties as well.) You can even include language codes for &lt;a href=&#34;http://learningsparql.com/2ndeditionexamples/ex037.ttl&#34;&gt;country-specific&lt;/a&gt; versions of terms so that a given form could be displayed in American English, British English, Castilian Spanish, Mexican Spanish, and more, all based on data in the schema.&lt;/p&gt;
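&lt;p&gt;A form generator can then ask the schema itself for a field label in the user&amp;rsquo;s language. Here&amp;rsquo;s a sketch of such a lookup in SPARQL:&lt;/p&gt;
&lt;pre&gt;
PREFIX rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX vcard: &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt;

SELECT ?label
WHERE {
  vcard:family-name rdfs:label ?label .
  FILTER (lang(?label) = &#34;es&#34;)
}
&lt;/pre&gt;
&lt;p&gt;Run against the second version of the schema above, this should return the Spanish label &amp;ldquo;apellido&amp;rdquo;.&lt;/p&gt;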
&lt;p&gt;Remember that while I&amp;rsquo;m using &lt;code&gt;rdfs:label&lt;/code&gt; and &lt;code&gt;rdfs:comment&lt;/code&gt; values in an RDFS schema here, you can also use them in any RDF you like. For example:&lt;/p&gt;
&lt;pre&gt;
@prefix rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&gt; . 
@prefix vcard: &amp;lt;http://www.w3.org/2006/vcard/ns#&gt; .
@prefix emp:   &amp;lt;http://www.snee.com/schema/employees/&gt; .
@prefix ex:    &amp;lt;http://www.snee.com/example/&gt; .

ex:id3 a emp:Employee ;
       vcard:given-name &#34;Jane&#34; ;
       vcard:family-name &#34;Berger&#34; ;
       rdfs:label &#34;Jane Berger&#34; ;
       &lt;b&gt;rdfs:comment &#34;&#34;&#34;Jane has taken the sales department from being only her
                    and an assistant to the ten-person team we have today.&#34;&#34;&#34;&lt;/b&gt; .
&lt;/pre&gt;
&lt;p&gt;(This &lt;code&gt;rdfs:comment&lt;/code&gt; value here is shown as a &lt;a href=&#34;https://www.w3.org/TR/turtle/#grammar-production-STRING_LITERAL_LONG_QUOTE&#34;&gt;long literal&lt;/a&gt;, which encloses the values in triple quotation marks so that the value can include carriage returns.) Similarly, you can add language tags to any RDF literal values you want—not just RDFS schemas.&lt;/p&gt;
&lt;h1 id=&#34;schemaorg&#34;&gt;schema.org&lt;/h1&gt;
&lt;p&gt;In my next blog entry I&amp;rsquo;ll describe some fancier modeling that you can do with RDFS and how it can help applications such as data integration and even a mobile application. I&amp;rsquo;ll also mention (as I have &lt;a href=&#34;../partialschemas/&#34;&gt;before&lt;/a&gt;) how, in the debate over schema-driven software development versus schemaless development, the use of partial schemas can give you the best of both worlds. (&lt;a href=&#34;../whatisrdf/&#34;&gt;Last month&lt;/a&gt; I promised a few of those things for this blog entry, but here I wanted to emphasize the value of RDFS&amp;rsquo;s most basic constructs.)&lt;/p&gt;
&lt;p&gt;Meanwhile, take a look at the RDFS schema for &lt;a href=&#34;https://schema.org/&#34;&gt;schema.org&lt;/a&gt;. From the &lt;a href=&#34;https://schema.org/docs/developers.html#defs&#34;&gt;Vocabulary Definition Files&lt;/a&gt; section of the page &lt;a href=&#34;https://schema.org/docs/developers.html&#34;&gt;Schema.org for Developers&lt;/a&gt; you can pick which variation you want, in which serialization; I would pick the &lt;a href=&#34;https://schema.org/version/latest/schemaorg-current-https.ttl&#34;&gt;Turtle serialization&lt;/a&gt; to see how the schema demonstrates what I&amp;rsquo;ve been describing here.&lt;/p&gt;
&lt;p&gt;You should recognize a lot of the Turtle version of the schema.org schema, because it&amp;rsquo;s mostly declarations of classes and properties with &lt;code&gt;rdfs:label&lt;/code&gt;  values and descriptive &lt;code&gt;rdfs:comment&lt;/code&gt; values. Schema.org provides an excellent role model for RDFS development—all without any OWL! Fifteen years ago I had a &lt;a href=&#34;../rdfs-without-rdfowl&#34;&gt;difficult time finding&lt;/a&gt; an example of RDFS being used without any OWL mixed in, and I think Schema.org has been a real inspiration since then.&lt;/p&gt;
&lt;p&gt;From now on, when you see a given set of RDF terms being used, ask &amp;ldquo;where can I find a schema documenting it?&amp;rdquo; And, if you find a schema (or OWL ontology) describing a model, ask &amp;ldquo;where can I see sample data that follows this schema?&amp;rdquo; (Schema.org sample data tends to be in JSON-LD, but you can &lt;a href=&#34;../jenagems/#riot&#34;&gt;convert&lt;/a&gt; it to Turtle easily enough.)&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;&lt;em&gt;Comments? Reply to &lt;a href=&#34;https://twitter.com/bobdc/status/1419329037005205504&#34;&gt;my tweet&lt;/a&gt; announcing this blog entry.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://creativecommons.org/licenses/by/2.0/&#34;&gt;CC BY 2.0&lt;/a&gt; &lt;a href=&#34;https://www.flickr.com/photos/h_duncan/50259072881/in/photolist-2jzdTnk-7HmJg6-7HqDcb-2kjuRxL-jpPNH9-6tX2xu-FLV2Gk-P8tcKE-6tSUk8-6tX3is-jvFuFV-joAz5Y-jpLVtu-4n7jAD-2cegzFj-6tX3g7-6tSUap-7HqDhy-6PBVjR-6tX3jA-6tSUg8-jrh79d-3VFfGk-98uTat-5Rf3kN-67kHPt-6tkLQB-5PQECt-jvv58E-2Bf8o5-6tX2uh-5CRngn-7nTViq-6tSU2X-aooPVM-6tX2Eo-8VHpaL-d4uS5u-6tSTNc-6tX2yQ-6tX2wC-6rJh81-ieejJg-6tSU5M-a9cwu-6T3cGR-6TY6HT-4JaYY9-6tSU1X-7HmHWr&#34;&gt;photo&lt;/a&gt; by &lt;a href=&#34;https://www.flickr.com/photos/h_duncan/&#34;&gt;Howard Duncan&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2021">2021</category>
      
      <category domain="https://www.bobdc.com//categories/rdfs">RDFS</category>
      
    </item>
    
    <item>
      <title>What is RDF?</title>
      <link>https://www.bobdc.com/blog/whatisrdf/</link>
      <pubDate>Sun, 27 Jun 2021 13:20:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/whatisrdf/</guid>
      
      
      <description><div>What can this simple standardized model do for you?</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/rdflogo.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; /&gt;
&lt;p&gt;&lt;em&gt;I have usually assumed that people reading this blog already know what RDF is. After recent discussions with people coming to RDF from the Linked (Open) Data and Knowledge Graph worlds, I realized that it would be useful to have a simple explanation that I could point to. This builds on material from the first three minutes of my video &lt;a href=&#34;https://www.youtube.com/watch?v=FvGndkpa4K0&#34;&gt;SPARQL in 11 Minutes&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;RDF, or Resource Description Framework, is a &lt;a href=&#34;https://www.w3.org/RDF/&#34;&gt;W3C standard&lt;/a&gt; (along with HTML, CSS, and XML) for a simple, flexible data model. RDF lets you describe data using a collection of three-part statements that can say things like &amp;ldquo;employee 3 has a title of &amp;lsquo;Vice President&amp;rsquo;.&amp;rdquo; We call these three parts the subject, predicate, and object. You can think of them as an entity identifier, an attribute name, and an attribute value.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/emp3TitleVP.png&#34; class=&#34;centered&#34; width=&#34;400&#34; alt=&#34;sample triple: emp3 title Vice President&#34;/&gt;
&lt;p&gt;The subject and predicate are actually represented using URIs (Uniform Resource Identifiers) to make it absolutely clear what we&amp;rsquo;re talking about. URIs are similar to URLs (Uniform Resource Locators), and often look like them, but they&amp;rsquo;re not locators, or addresses; they&amp;rsquo;re just identifiers.&lt;/p&gt;
&lt;p&gt;The URIs in the following show that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;we mean  employee 3 from a specific company&lt;/li&gt;
&lt;li&gt;we mean &amp;ldquo;title&amp;rdquo; in the sense of job title and not a label for a book, movie, or other creative work, because we&amp;rsquo;re using the URI for title defined by the W3C&amp;rsquo;s published version of the &lt;a href=&#34;https://www.w3.org/TR/vcard-rdf/&#34;&gt;vCard business card&lt;/a&gt; ontology&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/emp3TitleVPURIs.png&#34; class=&#34;centered&#34; width=&#34;500&#34; alt=&#34;sample triple with URIs&#34;/&gt;
&lt;p&gt;The object, or third part of a triple, can also be a URI:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/emp8reportsToemp3.png&#34; class=&#34;centered&#34; width=&#34;600&#34; alt=&#34;triple: emp8 reports to emp3&#34;/&gt;
&lt;p&gt;This way, the same resource can be the object of some triples and the subject of others, which lets you connect up triples into networks of data called graphs.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/emp3ReportsToemp8Graph.png&#34; class=&#34;centered&#34; width=&#34;400&#34; alt=&#34;graph with previous image&#39;s triple&#34;/&gt;
&lt;p&gt;RDF&amp;rsquo;s popular Turtle syntax often shortens the URIs by having an abbreviated prefix stand in for everything in the URI before the last part. This makes URIs simpler to read and write.&lt;/p&gt;
&lt;pre&gt;
@prefix &lt;span style=&#34;color:red&#34;&gt;vcard:&lt;/span&gt; &amp;lt;http://www.w3.org/2006/vcard/ns#&gt; .
@prefix &lt;span style=&#34;color:red&#34;&gt;sn:&lt;/span&gt;    &amp;lt;http://www.snee.com/hr/&gt; .

&lt;span style=&#34;color:red&#34;&gt;sn:&lt;/span&gt;emp3 &lt;span style=&#34;color:red&#34;&gt;vcard:&lt;/span&gt;title &#34;Vice President&#34; . 
&lt;/pre&gt;
&lt;p&gt;Just about any data can be represented as a collection of triples. For example, we can usually represent each entry of a table by using the row identifier as the subject, the column name as the predicate, and the value as the object.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/tableWithTriple.png&#34; class=&#34;centered&#34; width=&#34;600&#34; alt=&#34;triple in table&#34;/&gt;
&lt;p&gt;This can give us triples for every fact in the table.&lt;/p&gt;
&lt;pre&gt;
@prefix vcard: &amp;lt;http://www.w3.org/2006/vcard/ns#&gt; .
@prefix sn: &amp;lt;http://www.snee.com/hr/&gt; .

sn:emp1   vcard:given-name   &#34;Heidi&#34; .
sn:emp1   vcard:family-name   &#34;Smith&#34; .
sn:emp1   vcard:title   &#34;CEO&#34; .
sn:emp1   sn:hireDate   &#34;2015-01-13&#34; .
sn:emp1   sn:completedOrientation   &#34;2015-01-30&#34; .

sn:emp2   vcard:given-name   &#34;John&#34; .
sn:emp2   vcard:family-name   &#34;Smith&#34; .
sn:emp2   sn:hireDate   &#34;2015-01-28&#34; .
sn:emp2   vcard:title   &#34;Engineer&#34; .
sn:emp2   sn:completedOrientation   &#34;2015-01-30&#34; .
sn:emp2   sn:completedOrientation   &#34;2015-03-15&#34; .

sn:emp3   vcard:given-name   &#34;Francis&#34; .
sn:emp3   vcard:family-name   &#34;Jones&#34; .
sn:emp3   sn:hireDate   &#34;2015-02-13&#34; .
&lt;span style=&#34;color:red&#34;&gt;sn:emp3   vcard:title   &#34;Vice President&#34; .&lt;/span&gt;

sn:emp4   vcard:given-name   &#34;Jane&#34; .
sn:emp4   vcard:family-name   &#34;Berger&#34; .
sn:emp4   sn:hireDate   &#34;2015-03-10&#34; .
sn:emp4   vcard:title   &#34;Sales&#34; .
&lt;/pre&gt;
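&lt;p&gt;To make the row/column/value mapping concrete, here is a small sketch of my own (the &lt;code&gt;rowToTriples&lt;/code&gt; function is a hypothetical illustration, not part of any RDF tool): each cell of a row becomes one triple.&lt;/p&gt;

```javascript
// Hypothetical helper illustrating the mapping described above:
// subject = row identifier, predicate = column name, object = cell value.
function rowToTriples(rowId, row) {
  return Object.entries(row).map(
    ([column, value]) => `sn:${rowId}   ${column}   "${value}" .`
  );
}

console.log(rowToTriples("emp4", {
  "vcard:given-name": "Jane",
  "vcard:family-name": "Berger",
  "sn:hireDate": "2015-03-10",
  "vcard:title": "Sales"
}).join("\n"));
```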
&lt;p&gt;Some of the property names here come from the vcard standard vocabulary. For the properties not available in vcard or another standard vocabulary that I knew of, I made up my own property names using my own domain name.  Many other standardized vocabularies such as &lt;a href=&#34;https://schema.org/docs/schemas.html&#34;&gt;schema.org&lt;/a&gt;, &lt;a href=&#34;https://www.geonames.org/ontology/documentation.html&#34;&gt;geonames&lt;/a&gt;, and &lt;a href=&#34;https://www.dublincore.org/specifications/dublin-core/dcmi-terms/&#34;&gt;Dublin Core&lt;/a&gt; provide URIs to help you make the exact sense of a term clear.  (As one example, I would have used Dublin Core if I wanted to use the term &amp;ldquo;title&amp;rdquo; to &lt;a href=&#34;https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/title&#34;&gt;refer to a book&lt;/a&gt;.) RDF makes it easy to mix and match standard vocabularies and customizations.&lt;/p&gt;
&lt;p&gt;The data in the example above fits neatly into the table shown. Imagine that it was in a relational table and we wanted to add information about Heidi Smith&amp;rsquo;s university degree. With a relational table, we&amp;rsquo;d have to add a new column to the table—a structural change to the database itself that would probably require a database administrator. To do this with RDF, it&amp;rsquo;s just one more triple:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;sn:emp1 sn:degree &amp;#34;MFA University of Iowa 2015&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Imagine that a database administrator had added a &lt;code&gt;degree&lt;/code&gt; column to the relational table, but now Heidi has an additional degree to describe in the data. The degree column can only store one degree description, so to allow for employees having more than one degree in a relational database, the database administrator would probably remove the new &lt;code&gt;degree&lt;/code&gt; column from that table and then create one or more entirely new tables to track the relationship of employees to degrees. In RDF, it would be just one more triple:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;sn:emp1 sn:degree &amp;#34;MBA Wharton 2019&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A triple object that is not a URI is known as a literal. In the examples we&amp;rsquo;ve seen so far, the literals are all strings, but they can be other data types.  They can be &lt;a href=&#34;https://www.w3.org/TR/xmlschema-2/&#34;&gt;XSD data types&lt;/a&gt; such as boolean, integer or float, and they can be data types that you define yourself:&lt;/p&gt;
&lt;pre&gt;
@prefix sn:  &amp;lt;http://www.snee.com/hr/&gt; .
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&gt; .

sn:emp1 sn:startDate &#34;2021-03-04&#34;&lt;span style=&#34;color:red&#34;&gt;^^xsd:date&lt;/span&gt; . 
sn:emp1 sn:empCode   &#34;D1&#34;&lt;span style=&#34;color:red&#34;&gt;^^sn:myCustomDataType&lt;/span&gt; . 
&lt;/pre&gt;
&lt;h1 id=&#34;rdf-syntaxes&#34;&gt;RDF syntaxes&lt;/h1&gt;
&lt;p&gt;I mentioned earlier that RDF is a standardized data &lt;strong&gt;model&lt;/strong&gt;. There have been various syntaxes to write it down. The original was called &lt;a href=&#34;https://www.w3.org/TR/rdf-syntax-grammar/&#34;&gt;RDF/XML&lt;/a&gt;; XML was used because it was standardized and flexible, and also because one of the original RDF use cases was to add arbitrary metadata to web pages—the idea was that an additional block of XML would fit well into an HTML file&amp;rsquo;s &lt;code&gt;head&lt;/code&gt; element. As it turned out, using XML to represent arbitrary collections of relationships could get verbose and messy. Because of this, no one uses RDF/XML anymore, but unfortunately, in the early days, this particular syntax gave RDF itself a bad reputation. (My own theory is that the file naming convention of giving RDF/XML files an extension of &amp;ldquo;.rdf&amp;rdquo; made people think that that&amp;rsquo;s what RDF really was.)&lt;/p&gt;
&lt;p&gt;Now most people use Turtle, which is much simpler and also a &lt;a href=&#34;https://www.w3.org/TR/turtle/&#34;&gt;W3C standard&lt;/a&gt;. &lt;a href=&#34;https://en.wikipedia.org/wiki/Resource_Description_Framework#Serialization_formats&#34;&gt;Other syntaxes&lt;/a&gt; are available, including the increasingly popular JSON-LD. All the examples shown in this introduction use Turtle syntax.&lt;/p&gt;
&lt;h1 id=&#34;sparql&#34;&gt;SPARQL&lt;/h1&gt;
&lt;p&gt;SPARQL (&amp;ldquo;SPARQL Protocol and RDF Query Language&amp;rdquo;) is another W3C standard. The protocol part is usually only an issue for people writing programs that pass SPARQL queries back and forth between different machines.&lt;/p&gt;
&lt;p&gt;SPARQL queries typically use a Turtle-like syntax to describe patterns of what kinds of triples to retrieve from a dataset. The patterns often resemble Turtle triples but with variables serving as wildcards to add flexibility to the matching patterns and to store values that result from matches. The following query asks for the given name and family name  of everyone with a job title of &amp;ldquo;Vice President&amp;rdquo;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX  vcard: &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt;
PREFIX  sn:    &amp;lt;http://www.snee.com/hr/&amp;gt;

SELECT ?givenName ?familyName
WHERE
  { ?employee vcard:title &amp;#34;Vice President&amp;#34; .
    ?employee vcard:given-name  ?givenName .
    ?employee vcard:family-name ?familyName .
  }
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can see more examples of simple SPARQL queries in the video &lt;a href=&#34;https://www.youtube.com/watch?v=FvGndkpa4K0&#34;&gt;SPARQL in 11 Minutes&lt;/a&gt;.&lt;/p&gt;
&lt;h1 id=&#34;triplestores&#34;&gt;Triplestores&lt;/h1&gt;
&lt;p&gt;A triplestore is a database manager for RDF triples. A wide choice of open source and commercial triplestores is available, some of which can store billions of triples. They typically offer both web-based graphical user interfaces and programmatic ways to add, edit, and retrieve data.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;P&amp;rdquo; for &amp;ldquo;Protocol&amp;rdquo; in &amp;ldquo;SPARQL&amp;rdquo; is the basis for some of the programmatic interfaces. This is yet another example of how tools for working with RDF are all based on open, published standards and supported by a broad range of implementations.&lt;/p&gt;
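&lt;p&gt;As a rough sketch of what that protocol amounts to (the endpoint URL below is hypothetical, and this is my own illustration rather than any particular triplestore&amp;rsquo;s API): a query travels as an ordinary HTTP request whose &lt;code&gt;query&lt;/code&gt; parameter carries the query text, with a content-negotiation header asking for the results in a particular format.&lt;/p&gt;

```javascript
// A minimal sketch of the HTTP side of the SPARQL Protocol.
// The endpoint URL passed in below is hypothetical.
function buildSparqlRequest(endpoint, query) {
  return {
    // GET form of the protocol: the query text goes in a URL parameter
    url: endpoint + "?query=" + encodeURIComponent(query),
    // ask for the results as SPARQL Query Results JSON
    headers: { Accept: "application/sparql-results+json" }
  };
}

const req = buildSparqlRequest(
  "http://localhost:3030/hr/sparql", // hypothetical local endpoint
  "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
);
console.log(req.url);
```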
&lt;h1 id=&#34;data-integration&#34;&gt;Data Integration&lt;/h1&gt;
&lt;p&gt;The second sentence of the &lt;a href=&#34;https://www.w3.org/RDF/&#34;&gt;W3C RDF Overview&lt;/a&gt; page tells us that &amp;ldquo;RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed&amp;rdquo;. At the simplest level, you can integrate two different RDF datasets by just concatenating the files together, assuming that both use a syntax such as Turtle or &lt;a href=&#34;https://www.w3.org/TR/n-triples/&#34;&gt;N-Triples&lt;/a&gt;. Loading multiple datasets into the same dataset of a triplestore, whether those datasets use the same syntax or not, is also easy and popular.&lt;/p&gt;
&lt;p&gt;This ease of data integration has been a big driver in RDF&amp;rsquo;s success as people convert data from various other formats and models to RDF in order to easily use the combination. (In an upcoming &amp;ldquo;What is RDFS?&amp;rdquo; blog entry I will describe how RDF Schema can define optional models that make this even easier.)&lt;/p&gt;
&lt;h1 id=&#34;the-semantic-web&#34;&gt;The Semantic Web&lt;/h1&gt;
&lt;p&gt;In the early days of RDF, the idea of sharing machine-readable data across the World Wide Web as the &amp;ldquo;Semantic Web&amp;rdquo; was popular to the point of being overhyped because it sometimes got mixed up in vague, old-school Artificial Intelligence ideas of machines &amp;ldquo;understanding&amp;rdquo; things. We saw above how to show that &amp;ldquo;title&amp;rdquo; was meant in the sense of &amp;ldquo;job title&amp;rdquo; instead of a label for a book; this indicates some of the meaning, or semantics, of the word in a useful, machine-readable way.&lt;/p&gt;
&lt;p&gt;In &amp;ldquo;What is RDFS?&amp;rdquo; we&amp;rsquo;ll see how triples can show that Heidi Smith is an instance of the Employee class, and how if Employee is a subclass of Person, then we can infer that Heidi is also an instance of Person and has the associated properties. OWL lets you do even more. These little bits of semantics can be very useful, but the hype around the possibilities of  a connected web of such semantics—and around this web&amp;rsquo;s potential destiny as a platform for end-user applications—led to the term &amp;ldquo;Semantic Web&amp;rdquo; falling out of fashion.&lt;/p&gt;
&lt;h1 id=&#34;linked-open-data&#34;&gt;Linked (Open) Data&lt;/h1&gt;
&lt;p&gt;There is no standard specification for what counts as Linked Data. Many point to a &lt;a href=&#34;https://www.w3.org/DesignIssues/LinkedData.html&#34;&gt;Design Issues document&lt;/a&gt; that web inventor and W3C Director Tim Berners-Lee wrote with the caveat &amp;ldquo;personal view only&amp;rdquo;. The document outlines some rules and best practices for sharing of machine-readable data across platforms.&lt;/p&gt;
&lt;p&gt;Below the document&amp;rsquo;s four rules of Linked Data is an enumeration of the &amp;ldquo;5 Stars of Linked Data&amp;rdquo; that reflects how I&amp;rsquo;ve seen the term widely used. It includes the possibility that a CSV file available on a web server can be considered Linked Data, if not 5 Star Linked Data, and this has appealed to many people who admire the ideas behind Linked Data but don&amp;rsquo;t necessarily like RDF in any syntax—especially in RDF/XML. In general, Linked Data puts more emphasis on the sharing of machine-readable data using URIs and URLs than on the syntax of the data itself.&lt;/p&gt;
&lt;p&gt;Many organizations have found that Linked Data principles for sharing data across platforms have benefited their own use of data integration behind their firewalls. Linked Open Data applies these principles to data shared with the world. Berners-Lee&amp;rsquo;s document describes Linked Open Data as &amp;ldquo;Linked Data which is released under an open licence, which does not impede its reuse for free&amp;rdquo;; this typically means data shared on the public Internet where everyone can access it.&lt;/p&gt;
&lt;p&gt;Whether your Linked Data is open or not, the on-line book &lt;a href=&#34;https://patterns.dataincubator.org/book/&#34;&gt;Linked Data Patterns&lt;/a&gt; by Leigh Dodds and Ian Davis is a great place to learn about best practices for sharing data using Linked Data principles.  Jonathan Blaney&amp;rsquo;s &lt;a href=&#34;https://programminghistorian.org/en/lessons/intro-to-linked-data&#34;&gt;Introduction to the Principles of Linked Open Data&lt;/a&gt; also provides some good background.&lt;/p&gt;
&lt;h1 id=&#34;knowledge-graphs&#34;&gt;Knowledge Graphs&lt;/h1&gt;
&lt;p&gt;We&amp;rsquo;ve seen  how RDF triples can combine into graphs. Graph data structures are older than computer science itself. The term &amp;ldquo;knowledge graph&amp;rdquo; has been around for a few years too, but it became especially popular after an engineering SVP at Google published &lt;a href=&#34;https://blog.google/products/search/introducing-knowledge-graph-things-not/&#34;&gt;Introducing the Knowledge Graph: things, not strings&lt;/a&gt; in 2012. After this, many people working with different kinds of graph data tools started saying &amp;ldquo;Google stores their data in a knowledge graph? So do we, and you can, too!&amp;rdquo; RDF-based systems store data in a graph and include many options for storing semantics, so they&amp;rsquo;re an excellent candidate for storing knowledge graphs. The ease of data integration is also appealing to people interested in knowledge graphs, who often want to merge multiple graphs into a whole that is greater than the sum of its parts. I wrote more about this at &lt;a href=&#34;http://www.bobdc.com/blog/knowledgegraphs/&#34;&gt;Knowledge Graphs!&lt;/a&gt;&lt;/p&gt;
&lt;h1 id=&#34;rdf-and-you&#34;&gt;RDF and You&lt;/h1&gt;
&lt;p&gt;If you first learned about RDF from one of the approaches described above, I hope that I&amp;rsquo;ve given you a broader context of what it has done and can do. It&amp;rsquo;s important to remember that RDF and SPARQL are open standards with many implementations in the commercial and open source worlds. Because of their popularity in the academic world, many accuse these standards of being limited to academia, but that&amp;rsquo;s just not true. &lt;a href=&#34;http://sparql.club/&#34;&gt;Brand-name companies&lt;/a&gt; all over the world are seeing the value and increasing their usage of these standards all the time.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d like to close with a quote from the foreword that &lt;a href=&#34;https://twitter.com/danbri&#34;&gt;Dan Brickley&lt;/a&gt; and &lt;a href=&#34;https://twitter.com/libbymiller&#34;&gt;Libby Miller&lt;/a&gt; wrote for the book &lt;a href=&#34;http://book.validatingrdf.com/bookHtml005.html&#34;&gt;Validating RDF Data&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;People think RDF is a pain because it is complicated. The truth is even worse. RDF is painfully simplistic, but it allows you to work with real-world data and problems that are horribly complicated. While you can avoid RDF, it is harder to avoid complicated data and complicated computer problems.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Next time we&amp;rsquo;ll see how RDFS can help deal with some of those complications.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2021">2021</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Calling your own JavaScript functions from SPARQL queries</title>
      <link>https://www.bobdc.com/blog/arqjavascript/</link>
      <pubDate>Sun, 23 May 2021 11:25:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/arqjavascript/</guid>
      
      
      <description><div>More Jena arq fun.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/sparqlAndJSLogos.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; /&gt;
&lt;p&gt;When I saw &amp;ldquo;Add support for scripting languages other than JavaScript&amp;rdquo; in the &lt;a href=&#34;https://mail-archives.apache.org/mod_mbox/jena-users/202104.mbox/%3C05b4ad3b-0da8-4016-77b6-9aef7933da9d%40apache.org%3E&#34;&gt;Jena 4.0.0 release notes&lt;/a&gt;, my first reaction was &amp;ldquo;What? I can run the &lt;code&gt;arq&lt;/code&gt; command line SPARQL processor and call my own functions that I wrote in JavaScript?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://jena.apache.org/documentation/query/javascript-functions.html&#34;&gt;ARQ - JavaScript SPARQL Functions&lt;/a&gt; page of the Jena documentation shows how to do this. I had some fun playing with this capability, and as you&amp;rsquo;ll see, it offers some easy opportunities to clean up and improve your data.&lt;/p&gt;
&lt;p&gt;First, let&amp;rsquo;s see how it looks on the command line to run &lt;code&gt;arq&lt;/code&gt; with a SPARQL query that calls external JavaScript functions. It&amp;rsquo;s basically a typical invocation of &lt;code&gt;arq&lt;/code&gt; with an additional &lt;code&gt;--set&lt;/code&gt; parameter to point at a file of JavaScript functions, which in this example is called &lt;code&gt;myjs.js&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;arq --set arq:js-library=myjs.js --query jstest.rq --data phoneNumbers.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The data file that I used for my experiments simply lists a few people and their phone numbers. The &lt;code&gt;v:homeTel&lt;/code&gt; values use several different conventions for notating US phone numbers:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix v: &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt; .
@prefix d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .

d:i9771 v:given-name &amp;#34;Cindy&amp;#34; ;
        v:homeTel &amp;#34;1 (203) 446-5478&amp;#34; .

d:i0432 v:given-name &amp;#34;Richard&amp;#34; ;
        v:homeTel &amp;#34;   (729)556-5135   &amp;#34; .

d:i8301 v:given-name &amp;#34;Craig&amp;#34; ;
        v:homeTel &amp;#34;9232765135&amp;#34; .

d:i8309 v:given-name &amp;#34;Leigh&amp;#34; ;
        v:homeTel &amp;#34;843-5544&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The query in &lt;code&gt;jstest.rq&lt;/code&gt; copies the triples and also does the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Passes the &lt;code&gt;v:homeTel&lt;/code&gt; value to a &lt;code&gt;normalizeUSPhoneNumber()&lt;/code&gt; function that I wrote in the &lt;code&gt;myjs.js&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;Calls the &lt;code&gt;createRating()&lt;/code&gt; function in the same JavaScript file and passes the result to the CONSTRUCT clause, which puts the generated value in a &lt;code&gt;d:rating&lt;/code&gt; triple.&lt;/li&gt;
&lt;li&gt;Calls a JavaScript &lt;code&gt;Date()&lt;/code&gt; function directly (as opposed to calling it via something in &lt;code&gt;myjs.js&lt;/code&gt;) and assigns the returned value to an &lt;code&gt;?updateDate&lt;/code&gt; variable that also gets used in the CONSTRUCT clause.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Notice how all of the JavaScript function calls in the SPARQL query have a &lt;code&gt;js:&lt;/code&gt; prefix that is declared at the top like any other prefix. This is how &lt;code&gt;arq&lt;/code&gt; knows that these are external JavaScript functions.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# jstest.rq

PREFIX js: &amp;lt;http://jena.apache.org/ARQ/jsFunction#&amp;gt;
PREFIX d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt; 
PREFIX v:  &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt; 

CONSTRUCT {
  ?s v:given-name ?name ; 
  v:homeTel ?normalizedUSPhoneNumber ;
  d:rating  ?starRating ;
  d:as-of   ?updateDate;
}
WHERE {
  ?s v:given-name ?name ;
  v:homeTel ?phoneNum .
  BIND (js:normalizeUSPhoneNumber(?phoneNum) AS ?normalizedUSPhoneNumber)
  BIND (js:createRating() AS ?starRating)
  BIND (js:Date() AS ?updateDate)  # calling JavaScript function directly
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The JavaScript file defines two functions, both mentioned above:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;normalizeUSPhoneNumber()&lt;/code&gt; uses regular expressions to convert the phone number to an nnn-nnn-nnnn format if it has an area code and nnn-nnnn if it doesn&amp;rsquo;t. While SPARQL offers some support for regular expressions when you&amp;rsquo;re calculating a Boolean value to use in a FILTER expression, it doesn&amp;rsquo;t let you use regular expressions to manipulate values that can then be used in output, so I wanted to write a function that would demonstrate that.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;createRating()&lt;/code&gt; generates a random integer between one and five to demonstrate how we can call the &lt;code&gt;random()&lt;/code&gt; function to generate a number and then use other functions to massage that number into something we want.&lt;/li&gt;
&lt;/ul&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;// myjs.js

function normalizeUSPhoneNumber(phoneNumber) {
  phoneNumber = phoneNumber.replace(/ /g, &amp;#34;&amp;#34;)
    .replace(/^1/g,&amp;#34;&amp;#34;)
    .replace(/-/g,&amp;#34;&amp;#34;)
    .replace(/\(/g,&amp;#34;&amp;#34;)
    .replace(/\)/g,&amp;#34;&amp;#34;)
    .replace(/(\d\d\d\d$)/, &amp;#34;-$1&amp;#34;);
  if (phoneNumber.length &amp;gt; 10) {
     phoneNumber = phoneNumber.replace(/^(\d\d\d)/,&amp;#34;$1-&amp;#34;);
  }
  return phoneNumber;
}

function createRating() {
   return Math.ceil(Math.random()*5);
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Running the command line shown with these files gives us this output:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix d:     &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix v:     &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt; .
@prefix js:    &amp;lt;http://jena.apache.org/ARQ/jsFunction#&amp;gt; .

d:i9771  d:as-of      &amp;#34;Mon May 10 2021 08:02:35 GMT-0400 (EDT)&amp;#34; ;
        d:rating      3 ;
        v:given-name  &amp;#34;Cindy&amp;#34; ;
        v:homeTel     &amp;#34;203-446-5478&amp;#34; .

d:i8309  d:as-of      &amp;#34;Mon May 10 2021 08:02:35 GMT-0400 (EDT)&amp;#34; ;
        d:rating      5 ;
        v:given-name  &amp;#34;Leigh&amp;#34; ;
        v:homeTel     &amp;#34;843-5544&amp;#34; .

d:i0432  d:as-of      &amp;#34;Mon May 10 2021 08:02:35 GMT-0400 (EDT)&amp;#34; ;
        d:rating      4 ;
        v:given-name  &amp;#34;Richard&amp;#34; ;
        v:homeTel     &amp;#34;729-556-5135&amp;#34; .

d:i8301  d:as-of      &amp;#34;Mon May 10 2021 08:02:35 GMT-0400 (EDT)&amp;#34; ;
        d:rating      2 ;
        v:given-name  &amp;#34;Craig&amp;#34; ;
        v:homeTel     &amp;#34;923-276-5135&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Running it more than once gives different values for &lt;code&gt;d:rating&lt;/code&gt; each time, as I had hoped. (You always want to double-check that with random functions.)&lt;/p&gt;
&lt;p&gt;I also wanted to demonstrate a filter condition using a function that takes multiple arguments and returns true or false. That&amp;rsquo;s easy enough to do, but I couldn&amp;rsquo;t think of a good example that did something I couldn&amp;rsquo;t already do in SPARQL: comparing the values of multiple variables to arrive at a Boolean might take a few more lines in standard SPARQL, making the query more verbose, but it&amp;rsquo;s still straightforward without calling an external function.&lt;/p&gt;
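&lt;p&gt;For example, a multiple-argument Boolean function for use in a FILTER might look like this sketch (the &lt;code&gt;sameAreaCode&lt;/code&gt; function is a hypothetical one of my own, not part of the demo above):&lt;/p&gt;

```javascript
// Hypothetical multiple-argument Boolean function for a SPARQL FILTER:
// true when two phone numbers already normalized to nnn-nnn-nnnn form
// share an area code.
function sameAreaCode(num1, num2) {
  // the area code is everything before the first hyphen
  return num1.split("-")[0] === num2.split("-")[0];
}

console.log(sameAreaCode("203-446-5478", "203-555-0100")); // true
console.log(sameAreaCode("203-446-5478", "729-556-5135")); // false
```

&lt;p&gt;A query could then use it in a line like &lt;code&gt;FILTER(js:sameAreaCode(?tel1, ?tel2))&lt;/code&gt;.&lt;/p&gt;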
&lt;p&gt;Since writing this little demo, I have already used this ability to call external JavaScript functions to clean up some data in another project, the way I did with the phone numbers above. I had the SPARQL query above call &lt;code&gt;js:Date()&lt;/code&gt; directly to show that we &lt;em&gt;can&lt;/em&gt; call JavaScript functions directly from such queries. If I hadn&amp;rsquo;t, I would have had the query call a new function in the &lt;code&gt;myjs.js&lt;/code&gt; file that called JavaScript&amp;rsquo;s &lt;code&gt;Date()&lt;/code&gt; and then used regular expressions or other string manipulation tools to trim the returned date value down or convert it to &lt;a href=&#34;https://en.wikipedia.org/wiki/ISO_8601&#34;&gt;ISO 8601&lt;/a&gt; format. It would be another good example of how the ability to call external JavaScript functions from a SPARQL query makes the excellent library of native JavaScript functions available to a SPARQL developer.&lt;/p&gt;
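&lt;p&gt;Such a date-trimming wrapper function could be as short as this sketch (the name &lt;code&gt;isoDate&lt;/code&gt; is my own invention):&lt;/p&gt;

```javascript
// Hypothetical wrapper around JavaScript's Date: return just the
// ISO 8601 yyyy-mm-dd date instead of the full default date string.
function isoDate() {
  // toISOString() gives e.g. "2021-05-10T12:02:35.000Z";
  // the first ten characters are the date portion
  return new Date().toISOString().substring(0, 10);
}

console.log(isoDate());
```

&lt;p&gt;A query could then bind the result of &lt;code&gt;js:isoDate()&lt;/code&gt; instead of calling &lt;code&gt;js:Date()&lt;/code&gt; directly.&lt;/p&gt;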
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2021">2021</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Hidden gems included with Jena’s command line utilities</title>
      <link>https://www.bobdc.com/blog/jenagems/</link>
      <pubDate>Sun, 25 Apr 2021 11:58:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/jenagems/</guid>
      
      
      <description><div>Lots of ways to manipulate your RDF from the open-source multiplatform tool kit</div><div>&lt;img id=&#34;idm45478314451696&#34; src=&#34;https://www.bobdc.com/img/main/gems.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;240&#34;/&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#rdfdiff&#34;&gt;rdfdiff&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#shacl&#34;&gt;shacl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#qparse&#34;&gt;qparse and uparse&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#rsparql&#34;&gt;rsparql&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#rupdate&#34;&gt;rupdate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#rdfparse&#34;&gt;rdfparse&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#fusekiDatasets&#34;&gt;Working with Fuseki datasets from the command line&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#dumpingDatasets&#34;&gt;Dumping dataset contents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#queryingFuseki&#34;&gt;Querying a Fuseki dataset&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#updatingFuseki&#34;&gt;Updating a Fuseki dataset&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#loadingFile&#34;&gt;Loading a data file into a Fuseki dataset&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#otherUtilities&#34;&gt;Other command line utilities for Fuseki datasets&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#riot&#34;&gt;riot&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#convertingSerializations&#34;&gt;Converting serializations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#counting&#34;&gt;Counting triples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#concatenating&#34;&gt;Concatenating&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#inferencing&#34;&gt;Inferencing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On page 5 of my book &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt; I described how the open source RDF processing framework &lt;a href=&#34;https://jena.apache.org/&#34;&gt;Apache Jena&lt;/a&gt; includes command line utilities called &lt;code&gt;arq&lt;/code&gt; and &lt;code&gt;sparql&lt;/code&gt; that let you run SPARQL queries with a simple command line like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;arq --data mydata.ttl --query myquery.rq
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;At the time, the &lt;code&gt;arq&lt;/code&gt; one supported some SPARQL extensions that the &lt;code&gt;sparql&lt;/code&gt; one didn&amp;rsquo;t. I don&amp;rsquo;t even remember what they were; I tended to use &lt;code&gt;arq&lt;/code&gt; just because its name is shorter. I have since learned that, with support for those extensions added to &lt;code&gt;sparql&lt;/code&gt;, there is now no particular difference between the two.&lt;/p&gt;
&lt;p&gt;Jena (which recently celebrated &lt;a href=&#34;https://mail-archives.apache.org/mod_mbox/jena-users/202104.mbox/%3C05b4ad3b-0da8-4016-77b6-9aef7933da9d%40apache.org%3E&#34;&gt;release 4.0.0&lt;/a&gt;) includes Linux and Windows versions of many other utilities in addition to &lt;code&gt;arq&lt;/code&gt; and &lt;code&gt;sparql&lt;/code&gt;. I&amp;rsquo;ve mentioned several here when I used one or another to accomplish a particular task, and I thought it would be nice to summarize some of the ones that I have and have not mentioned before. I may be repeating some earlier explanations, but it should be handy to have them in one place.&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;ll find Linux utilities such as &lt;code&gt;arq&lt;/code&gt; and &lt;code&gt;shacl&lt;/code&gt; in Jena&amp;rsquo;s &lt;code&gt;bin&lt;/code&gt; directory and corresponding Windows utilities such as &lt;code&gt;arq.bat&lt;/code&gt; and &lt;code&gt;shacl.bat&lt;/code&gt; in its &lt;code&gt;bat&lt;/code&gt; directory.&lt;/p&gt;
&lt;p&gt;Remember that, like &lt;code&gt;arq&lt;/code&gt; and &lt;code&gt;sparql&lt;/code&gt;, many of these utilities support additional command line parameters beyond the ones I show here. Use &lt;code&gt;--help&lt;/code&gt; with each to find out more. I&amp;rsquo;ve tried to demonstrate what I found most useful about each.&lt;/p&gt;
&lt;p&gt;You can find more background about some of these utilities on the Jena documentation pages &lt;a href=&#34;https://jena.apache.org/documentation/query/cmds.html&#34;&gt;ARQ - Command Line Applications&lt;/a&gt; (which covers more than just &lt;code&gt;arq&lt;/code&gt;) and the &amp;ldquo;Command line tools&amp;rdquo; section of the &lt;a href=&#34;https://jena.apache.org/documentation/io/&#34;&gt;Reading and Writing RDF in Apache Jena&lt;/a&gt; page.&lt;/p&gt;
&lt;p&gt;And thanks to &lt;a href=&#34;https://twitter.com/AndySeaborne&#34;&gt;Andy Seaborne&lt;/a&gt; for reviewing a draft of this!&lt;/p&gt;
&lt;h1 id=&#34;rdfdiff&#34;&gt;rdfdiff&lt;/h1&gt;
&lt;p&gt;Use the &lt;code&gt;rdfdiff&lt;/code&gt; utility to compare two dataset files. It&amp;rsquo;s like the venerable UNIX command &lt;a href=&#34;https://www.man7.org/linux/man-pages/man1/diff.1.html&#34;&gt;&lt;code&gt;diff&lt;/code&gt;&lt;/a&gt;, except that it looks for different triples instead of lines. The order of the input triples doesn&amp;rsquo;t matter to &lt;code&gt;rdfdiff&lt;/code&gt;, and it can compare data files in different serializations. For example, here is a little RDF/XML file:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-xml&#34; data-lang=&#34;xml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;&amp;lt;!-- joereceiving.rdf --&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;rdf:RDF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a6e22e&#34;&gt;xmlns:rdf=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#a6e22e&#34;&gt;xmlns:d=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://whatever/&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;rdf:Description&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;rdf:about=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://whatever/emp3&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;d:dept&amp;gt;&lt;/span&gt;receiving&lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/d:dept&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;d:name&amp;gt;&lt;/span&gt;joe&lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/d:name&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;d:insurance&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;rdf:resource=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://www.uhc.com&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;/&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/rdf:Description&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/rdf:RDF&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here is a Turtle file with roughly the same information:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# joereceiving.ttl

@prefix w: &amp;lt;http://whatever/&amp;gt; .

w:emp3 w:name &amp;#34;Joseph&amp;#34; ;
       w:dept &amp;#34;receiving&amp;#34; ;
       w:insurance &amp;lt;http://www.uhc.com&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I ran this command to compare the two, also including the names of their formats:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;rdfdiff joereceiving.rdf joereceiving.ttl RDF/XML TURTLE
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I got this output:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt; [http://whatever/emp3, http://whatever/name, &amp;#34;joe&amp;#34;]
&amp;gt; [http://whatever/emp3, http://whatever/name, &amp;#34;Joseph&amp;#34;]
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Like the text file comparison utility &lt;code&gt;diff&lt;/code&gt;, the report uses &lt;code&gt;&amp;lt;&lt;/code&gt; as a prefix to show you what was in the first file but not the second and &lt;code&gt;&amp;gt;&lt;/code&gt; to show you what was in the second but not the first.&lt;/p&gt;
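&lt;p&gt;The comparison itself boils down to set difference over parsed triples. Here is a minimal Python sketch of that idea (the hand-parsed triples and the &lt;code&gt;graph_diff&lt;/code&gt; helper are my own illustration, not Jena code):&lt;/p&gt;

```python
# A graph diff is set difference over (subject, predicate, object) tuples.
# These hand-parsed triples mirror joereceiving.rdf and joereceiving.ttl.
first = {
    ("http://whatever/emp3", "http://whatever/dept", "receiving"),
    ("http://whatever/emp3", "http://whatever/name", "joe"),
    ("http://whatever/emp3", "http://whatever/insurance", "http://www.uhc.com"),
}
second = {
    ("http://whatever/emp3", "http://whatever/dept", "receiving"),
    ("http://whatever/emp3", "http://whatever/name", "Joseph"),
    ("http://whatever/emp3", "http://whatever/insurance", "http://www.uhc.com"),
}

def graph_diff(a, b):
    """Return (only_in_a, only_in_b), like rdfdiff's two report sections."""
    return sorted(a - b), sorted(b - a)

only_a, only_b = graph_diff(first, second)
print("only in first:", only_a)
print("only in second:", only_b)
```

&lt;p&gt;Because RDF graphs are sets of triples, serialization details like RDF/XML versus Turtle drop out entirely once both files are parsed.&lt;/p&gt;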
&lt;p&gt;As with many other Jena utilities, you can use the URL of a remote file instead of the name of a local file for either or both of the first two arguments.&lt;/p&gt;
&lt;h1 id=&#34;shacl&#34;&gt;shacl&lt;/h1&gt;
&lt;p&gt;In &lt;a href=&#34;../validating-rdf-data-with-shacl/&#34;&gt;Validating RDF data with SHACL&lt;/a&gt; I described how to use an open source tool developed by &lt;a href=&#34;http://www.topquadrant.com&#34;&gt;TopQuadrant&lt;/a&gt; to validate RDF data against constraints described using the W3C SHACL standard. Jena includes a &lt;a href=&#34;https://jena.apache.org/documentation/shacl/index.html&#34;&gt;&lt;code&gt;shacl&lt;/code&gt;&lt;/a&gt; utility to do the same kind of validation; when I ran it with the &lt;a href=&#34;http://snee.com/bobdc.blog/files/employees.ttl&#34;&gt;&lt;code&gt;employees.ttl&lt;/code&gt;&lt;/a&gt; file linked from that blog entry, all of the examples described there worked with Jena &lt;code&gt;shacl&lt;/code&gt; as well.&lt;/p&gt;
&lt;p&gt;Because the &lt;code&gt;employees.ttl&lt;/code&gt; file has its class definitions, instance data, and SHACL shapes all in one file, I passed that filename as both the &lt;code&gt;--data&lt;/code&gt; and the &lt;code&gt;--shapes&lt;/code&gt; parameters when I ran this command line tool:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;shacl validate --data employees.ttl --shapes employees.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It found all of my test constraint violations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;After I uncommented the data&amp;rsquo;s &lt;code&gt;e2&lt;/code&gt; example, &lt;code&gt;shacl&lt;/code&gt; reported that it was missing the required &lt;code&gt;hr:jobGrade&lt;/code&gt; value.&lt;/li&gt;
&lt;li&gt;After I uncommented the &lt;code&gt;e3&lt;/code&gt; example, it reported that its &lt;code&gt;hr:jobGrade&lt;/code&gt; value was not an integer.&lt;/li&gt;
&lt;li&gt;After I uncommented the &lt;code&gt;e4&lt;/code&gt; example, it reported that its &lt;code&gt;hr:jobGrade&lt;/code&gt; value fell outside the allowed range.&lt;/li&gt;
&lt;/ul&gt;
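&lt;p&gt;The three checks above boil down to presence, datatype, and range constraints. Here is a small Python sketch of that logic (the specific &lt;code&gt;jobGrade&lt;/code&gt; rules and the range of 1 to 10 are my paraphrase for illustration, not the actual shapes or SHACL itself):&lt;/p&gt;

```python
# Mimic three SHACL-style checks on a jobGrade value:
# required presence, integer datatype, and an allowed range
# (assumed here to be 1 through 10).
def check_job_grade(record, lo=1, hi=10):
    violations = []
    if "jobGrade" not in record:
        violations.append("missing required jobGrade")
        return violations
    value = record["jobGrade"]
    if not isinstance(value, int):
        violations.append("jobGrade is not an integer")
    elif value not in range(lo, hi + 1):
        violations.append("jobGrade out of allowed range")
    return violations

print(check_job_grade({"name": "e2"}))                  # required value absent
print(check_job_grade({"name": "e3", "jobGrade": "9"})) # wrong datatype
print(check_job_grade({"name": "e4", "jobGrade": 99}))  # out of range
```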
&lt;p&gt;As &lt;a href=&#34;https://www.w3.org/TR/shacl/#validation-report&#34;&gt;the SHACL specification requires&lt;/a&gt;, the validation reports produced by &lt;code&gt;shacl&lt;/code&gt; were themselves sets of triples, whether it found violations or not. This makes it easier to fit the tool into an RDF processing pipeline.&lt;/p&gt;
&lt;p&gt;Adding &lt;code&gt;-v&lt;/code&gt; for &amp;ldquo;verbose&amp;rdquo; after &lt;code&gt;shacl validate&lt;/code&gt; in that command line adds additional information to the output.&lt;/p&gt;
&lt;p&gt;The utility&amp;rsquo;s &lt;code&gt;print&lt;/code&gt; option outputs the shapes in the file. It can do this as regular RDF, &lt;a href=&#34;https://w3c.github.io/shacl/shacl-compact-syntax/&#34;&gt;compact SHACL syntax&lt;/a&gt; (surprisingly useful if you have a lot of shapes), or the default: a simple text representation.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;shacl print --out=RDF employees.ttl     # out=RDF, compact, or text
&lt;/code&gt;&lt;/pre&gt;&lt;span id=&#34;qparse&#34;/&gt;
&lt;h1 id=&#34;qparse-and-uparse&#34;&gt;qparse and uparse&lt;/h1&gt;
&lt;p&gt;The &lt;code&gt;qparse&lt;/code&gt; utility parses a query and can do various things with it as described by its &lt;code&gt;--help&lt;/code&gt; option. I recently learned that it can pretty-print queries, so if the spacing and indentation of a query that you&amp;rsquo;re trying to understand is a mess, &lt;code&gt;qparse&lt;/code&gt; can make it easier to understand and even capitalize keywords and add line numbers.&lt;/p&gt;
&lt;p&gt;Here is a sloppily formatted little query:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# namedept.rq
prefix w: &amp;lt;http://whatever/&amp;gt; Select
* WHERE { ?s w:name ?name . optiONAL {       ?s w:dept ?dept } }
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I run this command,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;qparse --query namedept.rq
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;and I get this output:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX  w:    &amp;lt;http://whatever/&amp;gt;

SELECT  *
WHERE
  { ?s  w:name  ?name
    OPTIONAL
      { ?s  w:dept  ?dept }
  }
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Adding &lt;code&gt;--num&lt;/code&gt; to the command line would add line numbers to the output.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;uparse&lt;/code&gt; utility can do the same thing for update queries. The following pretty-prints the file &lt;code&gt;updatetest.ru&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;uparse --file=updatetest.ru
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Further documentation about both commands is available in the &lt;a href=&#34;https://jena.apache.org/documentation/query/cmds.html&#34;&gt;Jena documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h1 id=&#34;rsparql&#34;&gt;rsparql&lt;/h1&gt;
&lt;p&gt;This sends a local query to a SPARQL endpoint specified with a URL. I would typically use &lt;a href=&#34;https://curl.se/&#34;&gt;&lt;code&gt;curl&lt;/code&gt;&lt;/a&gt; for this, but after reviewing the &lt;code&gt;--help&lt;/code&gt; options for &lt;code&gt;rsparql&lt;/code&gt; I see that it makes it easier to specify that you want the results in text, XML, JSON, CSV, or TSV. When sending a SPARQL query with &lt;code&gt;curl&lt;/code&gt;, you can&amp;rsquo;t assume that the endpoint supports all of these result formats, and you probably have to look up their &lt;a href=&#34;https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types&#34;&gt;MIME types&lt;/a&gt;, because I certainly haven&amp;rsquo;t memorized them.&lt;/p&gt;
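&lt;p&gt;For comparison, here is roughly what you have to juggle yourself when using &lt;code&gt;curl&lt;/code&gt;: a Python sketch mapping result format names to the Accept header that an endpoint expects. (The format names and the &lt;code&gt;accept_header&lt;/code&gt; helper are my own; the MIME types are the standard SPARQL 1.1 result types.)&lt;/p&gt;

```python
# The Accept header value a SPARQL endpoint needs for each result format;
# rsparql hides this lookup behind a simple format flag.
SPARQL_RESULT_TYPES = {
    "xml":  "application/sparql-results+xml",
    "json": "application/sparql-results+json",
    "csv":  "text/csv",
    "tsv":  "text/tab-separated-values",
}

def accept_header(fmt):
    """Build the header dict for an HTTP SPARQL query request."""
    return {"Accept": SPARQL_RESULT_TYPES[fmt]}

print(accept_header("json"))
```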
&lt;p&gt;The following sends the SPARQL query in the &lt;code&gt;5triples.rq&lt;/code&gt; file to the Wikidata endpoint and then outputs the results at the command line:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;rsparql --query 5triples.rq --service=https://query.wikidata.org/sparql
&lt;/code&gt;&lt;/pre&gt;&lt;h1 id=&#34;rupdate&#34;&gt;rupdate&lt;/h1&gt;
&lt;p&gt;This sends a local update query to a SPARQL endpoint specified with a URL. It will have to be one where you have update permission, which may well be a locally running copy of Fuseki. The following executes the update request stored in &lt;code&gt;updatetest.ru&lt;/code&gt; on the test1 dataset in the locally running copy of Fuseki (assuming that &lt;code&gt;fuseki-server&lt;/code&gt; was started up with the &lt;code&gt;--update&lt;/code&gt; parameter, as described below):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;rupdate --service=http://localhost:3030/test1 --update=updatetest.ru
&lt;/code&gt;&lt;/pre&gt;&lt;h1 id=&#34;rdfparse&#34;&gt;rdfparse&lt;/h1&gt;
&lt;p&gt;This parses an RDF/XML document. People don&amp;rsquo;t use RDF/XML much anymore, and with good reason, but if you find any RDF/XML this is a simple way to convert it. The &lt;code&gt;riot&lt;/code&gt; utility, described below, is even better, but I especially like the &lt;code&gt;-R&lt;/code&gt; switch available with &lt;code&gt;rdfparse&lt;/code&gt;; this tells it to search through an arbitrary XML document and extract any triples stored within embedded &lt;code&gt;rdf:RDF&lt;/code&gt; elements. That can be great for processing some RDF that was embedded into XML before &lt;a href=&#34;http://www.bobdc.com/blog/json-ld/&#34;&gt;JSON-LD&lt;/a&gt; or even &lt;a href=&#34;http://www.bobdc.com/blog/rdfa-can-be-so-simple/&#34;&gt;RDFa&lt;/a&gt; were around.  Here&amp;rsquo;s a nice arbitrary XML document that I called &lt;code&gt;xproduct1.xml&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-xml&#34; data-lang=&#34;xml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;myDoc&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;header&amp;gt;&amp;lt;whatev/&amp;gt;&amp;lt;/header&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;rdf:RDF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#a6e22e&#34;&gt;xmlns:rdf=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#a6e22e&#34;&gt;xmlns:d=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://whatever/&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;rdf:Description&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;rdf:about=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://whatever/emp1&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;d:dept&amp;gt;&lt;/span&gt;shipping&lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/d:dept&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;d:name&amp;gt;&lt;/span&gt;jane&lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/d:name&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/rdf:Description&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/rdf:RDF&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;arbitraryElement/&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;rdf:RDF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#a6e22e&#34;&gt;xmlns:rdf=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#a6e22e&#34;&gt;xmlns:d=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://whatever/&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;rdf:Description&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;rdf:about=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://whatever/emp3&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;d:dept&amp;gt;&lt;/span&gt;receiving&lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/d:dept&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;d:name&amp;gt;&lt;/span&gt;joe&lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/d:name&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/rdf:Description&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/rdf:RDF&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;/myDoc&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I run the following command,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;rdfparse -R xproduct1.xml 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;and it produces this nice ntriples output:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;http://whatever/emp1&amp;gt; &amp;lt;http://whatever/dept&amp;gt; &amp;#34;shipping&amp;#34; .
&amp;lt;http://whatever/emp1&amp;gt; &amp;lt;http://whatever/name&amp;gt; &amp;#34;jane&amp;#34; .
&amp;lt;http://whatever/emp3&amp;gt; &amp;lt;http://whatever/dept&amp;gt; &amp;#34;receiving&amp;#34; .
&amp;lt;http://whatever/emp3&amp;gt; &amp;lt;http://whatever/name&amp;gt; &amp;#34;joe&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;h1 id=&#34;fusekiDatasets&#34;&gt;Working with Fuseki datasets from the command line&lt;/h1&gt;
&lt;p&gt;Jena includes several utilities that let you work with datasets created using Jena&amp;rsquo;s &lt;a href=&#34;https://jena.apache.org/documentation/fuseki2/&#34;&gt;Fuseki&lt;/a&gt; SPARQL server. Their ability to load and update data can be very helpful in an automated system that uses Fuseki as its backend data store.&lt;/p&gt;
&lt;p&gt;To create some of this data to test with, I used the following command to start up Fuseki in a mode that would allow updates to data that it was storing:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;fuseki-server --update
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;When you go to Fuseki&amp;rsquo;s GUI interface at http://localhost:3030 and tell it that you want to create a new dataset, you have to choose between three types of dataset: in-memory ones that will not persist from session to session, &amp;ldquo;Persistent&amp;rdquo; ones that use the older TDB format, and &amp;ldquo;Persistent (TDB2)&amp;rdquo; ones that use the more advanced TDB2 format. For my examples below I just created TDB2 datasets. TDB versions of the commands are also included with Jena, but if you&amp;rsquo;re creating a new dataset, you may as well use TDB2.&lt;/p&gt;
&lt;p&gt;Most of these utilities expect you to specify a path to an assembler file to tell those utilities which Fuseki dataset to operate on. I never tried making my way through the &lt;a href=&#34;https://jena.apache.org/documentation/assembler/assembler-howto.html&#34;&gt;Jena Assembler howto&lt;/a&gt; documentation, but I recently noticed that Fuseki creates assembler files for us, so I don&amp;rsquo;t have to worry about their structure and syntax because I can have Fuseki make them for me. When I used Fuseki&amp;rsquo;s GUI to create a TDB2 dataset called test1, Fuseki created the assembler file &lt;code&gt;apache-jena-fuseki/run/configuration/test1.ttl&lt;/code&gt;, so I knew where to point the command line utilities.&lt;/p&gt;
&lt;p&gt;These command line tools won&amp;rsquo;t work with the Fuseki datasets if you have Fuseki running because Fuseki locks the files. My examples below assume that I have created the test1 dataset described above, used the web-based interface to upload data to it (although, as we&amp;rsquo;ll see, this can be done with command line tools as well), and then shut down the Fuseki server.&lt;/p&gt;
&lt;p&gt;Additional information about these commands is available at &lt;a href=&#34;https://jena.apache.org/documentation/tdb2/tdb2_cmds.html&#34;&gt;TDB2 - Command Line Tools&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;dumpingDatasets&#34;&gt;Dumping dataset contents&lt;/h2&gt;
&lt;p&gt;The following command showed me the contents of that TDB2 dataset at the command line:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;tdb2.tdbdump --tdb ../../apache-jena-fuseki/run/configuration/test1.ttl 
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;queryingFuseki&#34;&gt;Querying a Fuseki dataset&lt;/h2&gt;
&lt;p&gt;With a SPARQL query stored in &lt;code&gt;myquery.rq&lt;/code&gt;, this command queries the test1 dataset and outputs the results at the command line:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;tdb2.tdbquery --tdb ../../apache-jena-fuseki/run/configuration/test1.ttl --query myquery.rq
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Setting the output format works much as it does with &lt;code&gt;arq&lt;/code&gt;. Run &lt;code&gt;tdb2.tdbquery --help&lt;/code&gt; to find out more.&lt;/p&gt;
&lt;h2 id=&#34;updatingFuseki&#34;&gt;Updating a Fuseki dataset&lt;/h2&gt;
&lt;p&gt;With  the file &lt;code&gt;updatetest.ru&lt;/code&gt; storing a SPARQL INSERT update request that inserts a single triple, the following command didn&amp;rsquo;t show anything at the command line,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;tdb2.tdbupdate --tdb ../../apache-jena-fuseki/run/configuration/test1.ttl --update updatetest.ru
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;but when I restarted the Fuseki server and used the web-based interface to query dataset test1 for all of its triples, I saw the triple inserted by the &lt;code&gt;updatetest.ru&lt;/code&gt; query in there with the triples that had been in there before.&lt;/p&gt;
&lt;h2 id=&#34;loadingFile&#34;&gt;Loading a data file into a Fuseki dataset&lt;/h2&gt;
&lt;p&gt;The following loaded the triples in the file &lt;code&gt;furniture.ttl&lt;/code&gt; into the test1 dataset (which I confirmed the same way I did with my previous example) and displayed some status messages:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;tdb2.tdbloader --tdb ../../apache-jena-fuseki/run/configuration/test1.ttl furniture.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It&amp;rsquo;s best to make sure that there are no parsing problems with the file you load before you load it. A quick way to do that is with the &lt;code&gt;--validate&lt;/code&gt; parameter of the &lt;code&gt;riot&lt;/code&gt; command:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;riot --validate furniture.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;otherUtilities&#34;&gt;Other command line utilities for Fuseki datasets&lt;/h2&gt;
&lt;p&gt;The following commands all work on the dataset  whose assembler file you point to with the &lt;code&gt;--tdb&lt;/code&gt; parameter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;tdb2.tdbstats&lt;/code&gt; outputs a LISPy set of parenthesized expressions telling you about the dataset.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;tdb2.tdbbackup&lt;/code&gt; creates a gzipped copy of the dataset&amp;rsquo;s triples.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I tried &lt;code&gt;tdb2.tdbcompact&lt;/code&gt; and got a status message of &amp;ldquo;Compacted in 0.570s&amp;rdquo;; someday I&amp;rsquo;ll try this with a larger dataset to really investigate the effect.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&#34;riot&#34;&gt;riot&lt;/h1&gt;
&lt;p&gt;Jena includes many command line utilities that I won&amp;rsquo;t describe here because &lt;code&gt;riot&lt;/code&gt; (&amp;ldquo;RDF I/O Technology&amp;rdquo;) combines them all into one utility that I have been using more and more lately. I mentioned in &lt;a href=&#34;http://www.bobdc.com/blog/turtlefromgooglekg/&#34;&gt;Pulling Turtle RDF triples from the Google Knowledge Graph&lt;/a&gt; how it can accept triples via standard input, which was great for the use case that I described there of converting Google Knowledge Graph JSON-LD to Turtle triples on the fly.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve already seen another nice use of &lt;code&gt;riot&lt;/code&gt; above: validating a file of triples before loading it into a dataset stored on a server.&lt;/p&gt;
&lt;h2 id=&#34;convertingSerializations&#34;&gt;Converting serializations&lt;/h2&gt;
&lt;p&gt;To simply convert an RDF file from one serialization to another, use the &lt;code&gt;riot&lt;/code&gt; &lt;code&gt;--output&lt;/code&gt; parameter to name the new serialization:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;riot --output=JSONLD emps.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The Jena utilities &lt;code&gt;nquads&lt;/code&gt;, &lt;code&gt;ntriples&lt;/code&gt;, &lt;code&gt;rdfxml&lt;/code&gt;, &lt;code&gt;trig&lt;/code&gt;, and &lt;code&gt;turtle&lt;/code&gt; are all specialized versions of &lt;code&gt;riot&lt;/code&gt; that produce the named serializations with no need for an &lt;code&gt;--output&lt;/code&gt; parameter.&lt;/p&gt;
&lt;span id=&#34;counting&#34;/&gt;
&lt;h2 id=&#34;counting-triples&#34;&gt;Counting triples&lt;/h2&gt;
&lt;p&gt;When I want to know how many triples are in a Turtle file, here&amp;rsquo;s what I usually do:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Look around my hard disk for a query file that uses COUNT to count all the triples.&lt;/li&gt;
&lt;li&gt;Give up looking.&lt;/li&gt;
&lt;li&gt;Look up the COUNT syntax in my book &amp;ldquo;Learning SPARQL&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;Write another query file for counting all the triples.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Now I can just use &lt;code&gt;riot&lt;/code&gt; with this simple command line:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;riot --count furniture.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It also works with quads.&lt;/p&gt;
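&lt;p&gt;If what you have is N-Triples, the count is simple enough to sketch in a few lines of Python. (This is only a rough approximation of what a real parser does: it skips blank lines and comments but does no actual syntax checking, and the sample data omits the angle brackets that real N-Triples URIs require.)&lt;/p&gt;

```python
# Count statements in N-Triples text: one triple per non-blank,
# non-comment line. A real parser would also validate each line.
def count_ntriples(text):
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count

# Made-up sample data (URI angle brackets omitted for brevity).
sample = """# furniture sample
_:chair1 http://whatever/legCount "4" .

_:chair1 http://whatever/material "oak" .
"""
print(count_ntriples(sample))
```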
&lt;span id=&#34;concatenating&#34;/&gt;
&lt;h2 id=&#34;concatenating&#34;&gt;Concatenating&lt;/h2&gt;
&lt;p&gt;Jena includes an &lt;code&gt;rdfcat&lt;/code&gt; utility that outputs the concatenated contents of any data files listed on its command line, but the first thing it outputs is a header that says &amp;ldquo;DEPRECATED: Please use &amp;lsquo;riot&amp;rsquo; instead&amp;rdquo;. Providing multiple data file names as arguments when running &lt;code&gt;riot&lt;/code&gt; (I think I just got &lt;a href=&#34;https://idioms.thefreedictionary.com/run+riot&#34;&gt;another pun&lt;/a&gt; out of the name) outputs an ntriples version of their concatenated triples by default, with status messages showing where each file&amp;rsquo;s triples start. Adding &lt;code&gt;--quiet&lt;/code&gt; suppresses the status messages, and &lt;code&gt;--output&lt;/code&gt; lets you specify a different output serialization.&lt;/p&gt;
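&lt;p&gt;When the inputs are all loaded into a single model, as &lt;code&gt;rdfcat&lt;/code&gt; did, concatenating RDF files amounts to a set union of their triples, so a triple that appears in more than one input file shows up only once. A quick Python sketch with made-up triples:&lt;/p&gt;

```python
# Merging RDF files into one model is a set union of triples,
# so a triple present in both inputs appears only once.
file_a = {
    ("http://whatever/emp1", "http://whatever/dept", "shipping"),
    ("http://whatever/emp1", "http://whatever/name", "jane"),
}
file_b = {
    ("http://whatever/emp1", "http://whatever/dept", "shipping"),  # duplicate
    ("http://whatever/emp3", "http://whatever/name", "joe"),
}

merged = file_a | file_b
print(len(merged))  # 3 distinct triples, not 4
```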
&lt;h2 id=&#34;inferencing&#34;&gt;Inferencing&lt;/h2&gt;
&lt;p&gt;Jena includes an &lt;code&gt;infer&lt;/code&gt; utility that does inferencing from an RDFS model, but I no longer bother with it because &lt;code&gt;riot&lt;/code&gt; can do this as well.
The following little RDFS model shows that two properties from the Oracle and Microsoft sample relational databases are subproperties of similar schema.org properties:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# empmodel.ttl
@prefix rdfs:     &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; . 
@prefix schema:   &amp;lt;http://schema.org/&amp;gt; . 
@prefix oraclehr: &amp;lt;http://snee.com/vocab/schema/OracleHR#&amp;gt; .
@prefix nw:       &amp;lt;http://snee.com/vocab/schema/SQLServerNorthwind#&amp;gt; .

oraclehr:employees_first_name rdfs:subPropertyOf schema:givenName  . 
oraclehr:employees_last_name  rdfs:subPropertyOf schema:familyName . 
nw:employees_FirstName        rdfs:subPropertyOf schema:givenName  . 
nw:employees_LastName         rdfs:subPropertyOf schema:familyName . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here is some data using the Oracle and Microsoft properties:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# emps.ttl
@prefix rdfs:     &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; . 
@prefix schema:   &amp;lt;http://schema.org/&amp;gt; . 
@prefix oraclehr: &amp;lt;http://snee.com/vocab/schema/OracleHR#&amp;gt; .
@prefix nw:       &amp;lt;http://snee.com/vocab/schema/SQLServerNorthwind#&amp;gt; .

oraclehr:employees_100 oraclehr:employees_last_name &amp;#34;King&amp;#34; ;
    oraclehr:employees_first_name &amp;#34;Steven&amp;#34; .

nw:employees_2 nw:employees_LastName &amp;#34;Fuller&amp;#34; ;
    nw:employees_FirstName &amp;#34;Andrew&amp;#34; . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This command tells &lt;code&gt;riot&lt;/code&gt; to do inferencing on &lt;code&gt;emps.ttl&lt;/code&gt; using the RDFS modeling in &lt;code&gt;empmodel.ttl&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;riot --rdfs empmodel.ttl emps.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;And here is the ntriples result with spaces added for more readability:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://snee.com/vocab/schema/OracleHR#employees_100&amp;gt;
  &amp;lt;http://snee.com/vocab/schema/OracleHR#employees_last_name&amp;gt; &amp;#34;King&amp;#34; .
  
&amp;lt;http://snee.com/vocab/schema/OracleHR#employees_100&amp;gt;
  &amp;lt;http://schema.org/familyName&amp;gt; &amp;#34;King&amp;#34; .
  
&amp;lt;http://snee.com/vocab/schema/OracleHR#employees_100&amp;gt;
  &amp;lt;http://snee.com/vocab/schema/OracleHR#employees_first_name&amp;gt; &amp;#34;Steven&amp;#34; .
  
&amp;lt;http://snee.com/vocab/schema/OracleHR#employees_100&amp;gt;
  &amp;lt;http://schema.org/givenName&amp;gt; &amp;#34;Steven&amp;#34; .
  
&amp;lt;http://snee.com/vocab/schema/SQLServerNorthwind#employees_2&amp;gt;
  &amp;lt;http://snee.com/vocab/schema/SQLServerNorthwind#employees_LastName&amp;gt; &amp;#34;Fuller&amp;#34; .
  
&amp;lt;http://snee.com/vocab/schema/SQLServerNorthwind#employees_2&amp;gt;
  &amp;lt;http://schema.org/familyName&amp;gt; &amp;#34;Fuller&amp;#34; .
  
&amp;lt;http://snee.com/vocab/schema/SQLServerNorthwind#employees_2&amp;gt;
  &amp;lt;http://snee.com/vocab/schema/SQLServerNorthwind#employees_FirstName&amp;gt; &amp;#34;Andrew&amp;#34; .
  
&amp;lt;http://snee.com/vocab/schema/SQLServerNorthwind#employees_2&amp;gt;
  &amp;lt;http://schema.org/givenName&amp;gt; &amp;#34;Andrew&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The new triples show that these employees have schema.org properties in addition to the original OracleHR and Northwind properties. This ability makes this kind of inferencing great for data integration, as I described in &lt;a href=&#34;http://www.bobdc.com/blog/driving-hadoop-data-integratio/&#34;&gt;Driving Hadoop data integration with standards-based models instead of code&lt;/a&gt;. (In that article I used the Python library &lt;a href=&#34;https://rdflib.dev/&#34;&gt;rdflib&lt;/a&gt; to do the same kind of inferencing, but that&amp;rsquo;s the beauty of standards—having a choice of tools to implement the same expected behavior.)&lt;/p&gt;
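&lt;p&gt;The inference rule at work here is easy to sketch: for every data triple whose predicate has an &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; superproperty in the model, add a second triple that uses the superproperty. Here is a minimal Python version of that single rule (it ignores transitive subproperty chains and the rest of RDFS, which full inferencing also handles; prefixed names stand in for full URIs to keep it short):&lt;/p&gt;

```python
# rdfs:subPropertyOf inferencing, single level: each data triple
# using a subproperty also implies a triple using its superproperty.
SUBPROPERTY = {
    "oraclehr:employees_first_name": "schema:givenName",
    "oraclehr:employees_last_name":  "schema:familyName",
    "nw:employees_FirstName":        "schema:givenName",
    "nw:employees_LastName":         "schema:familyName",
}

data = [
    ("oraclehr:employees_100", "oraclehr:employees_last_name",  "King"),
    ("oraclehr:employees_100", "oraclehr:employees_first_name", "Steven"),
    ("nw:employees_2", "nw:employees_LastName",  "Fuller"),
    ("nw:employees_2", "nw:employees_FirstName", "Andrew"),
]

def infer_subproperties(triples, model):
    """Return the input triples plus the ones they imply."""
    inferred = list(triples)
    for s, p, o in triples:
        if p in model:
            inferred.append((s, model[p], o))
    return inferred

for t in infer_subproperties(data, SUBPROPERTY):
    print(t)
```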
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2021">2021</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Pulling Turtle RDF triples from the Google Knowledge Graph</title>
      <link>https://www.bobdc.com/blog/turtlefromgooglekg/</link>
      <pubDate>Sun, 28 Mar 2021 13:05:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/turtlefromgooglekg/</guid>
      
      
      <description><div>Even querying by type!</div><div>&lt;p&gt;When I wrote about my first deep dive into &lt;a href=&#34;../knowledgegraphs/&#34;&gt;Knowledge Graphs&lt;/a&gt;, I mentioned that although the term was around well before 2012, the idea of a Knowledge Graph was blessed as an official Google thing that year when one of their engineering SVPs published the article  &lt;a href=&#34;https://blog.google/products/search/introducing-knowledge-graph-things-not/&#34;&gt;Introducing the Knowledge Graph: things, not strings&lt;/a&gt;. This blessing gave some focus to many members of the graph database community because they could say that what they had been doing was similar, if not the same, as what Google was doing.&lt;/p&gt;
&lt;p&gt;I still didn&amp;rsquo;t think of the Google Knowledge Graph as a specific thing, but as more of a marketing term describing a set of technologies, like IBM&amp;rsquo;s Watson. I have changed my mind: in Pascal Hitzler&amp;rsquo;s &lt;a href=&#34;https://cacm.acm.org/magazines/2021/2/250085-a-review-of-the-semantic-web-field/fulltext&#34;&gt;A Review of the Semantic Web Field&lt;/a&gt; in the Communications of the ACM I learned that there is an actual, RESTful &lt;a href=&#34;https://developers.google.com/knowledge-graph&#34;&gt;Google Knowledge Graph Search API&lt;/a&gt;, and I&amp;rsquo;ve been having some fun pulling Turtle RDF triples out of it.&lt;/p&gt;
&lt;p&gt;That Google page demonstrates what you can put in a URL to request JSON-LD data from their Knowledge Graph. Their first example sends a search for &amp;ldquo;Taylor Swift&amp;rdquo;; below I have used that example with &lt;a href=&#34;https://curl.se/&#34;&gt;curl&lt;/a&gt; and piped the output through the &lt;a href=&#34;https://jena.apache.org/documentation/io/&#34;&gt;Jena riot&lt;/a&gt; command line utility (not to be confused with &lt;a href=&#34;http://jennariot.com/&#34;&gt;DJ Jenna Riot&lt;/a&gt;, who I just learned about in a web search) so that I could get Turtle triples of the result. I won&amp;rsquo;t even bother showing the JSON-LD version here because I can get the Turtle version with this single command:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;curl \
  &amp;#34;https://kgsearch.googleapis.com/v1/entities:search?query=taylor+swift&amp;amp;key=API_KEY&amp;amp;limit=1&amp;amp;indent=True&amp;#34; \
  | riot --syntax=JSONLD --output=turtle
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Two notes about this command line:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I substituted my own API key for &amp;ldquo;API_KEY&amp;rdquo; above. You can get your own at &lt;a href=&#34;https://developers.google.com/maps/documentation/javascript/get-api-key&#34;&gt;API Key&lt;/a&gt; by filling out a few forms.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When you feed RDF to riot, it can usually guess the serialization from the end of the input filename, but when piping data to it on standard input as I do above, you need the &lt;code&gt;--syntax&lt;/code&gt; parameter to tell it what flavor of RDF you are feeding it.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
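&lt;p&gt;If you&amp;rsquo;d rather build that request in a program than on a command line, a few lines of Python&amp;rsquo;s standard library can assemble the same URL. (This is just a sketch: the &lt;code&gt;kg_search_url&lt;/code&gt; helper name is mine, not part of any API, and you&amp;rsquo;d still pipe the fetched JSON-LD through riot as above.)&lt;/p&gt;

```python
# Sketch: assemble the same Knowledge Graph Search API request URL in Python.
# "API_KEY" is a placeholder, as in the curl example; the helper name is mine.
from urllib.parse import urlencode

def kg_search_url(query, api_key, limit=1, types=None):
    """Build a URL for the kgsearch entities:search endpoint."""
    params = {"query": query, "key": api_key, "limit": limit, "indent": "True"}
    if types:
        # e.g. "Person", or a comma-delimited list like "Person,Corporation"
        params["types"] = types
    return ("https://kgsearch.googleapis.com/v1/entities:search?"
            + urlencode(params))

url = kg_search_url("taylor swift", "API_KEY")
```

&lt;p&gt;Fetching that URL (with &lt;code&gt;urllib.request&lt;/code&gt;, say) returns the same JSON-LD that the curl command does.&lt;/p&gt;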
&lt;p&gt;That command gave me 14 triples, including these:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;http://g.co/kg/m/0dl567&amp;gt;
        &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&amp;gt;  &amp;lt;http://schema.org/Thing&amp;gt; ;
        &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&amp;gt;  &amp;lt;http://schema.org/Person&amp;gt; ;
        goog:detailedDescription  _:b2 ;
        &amp;lt;http://schema.org/description&amp;gt;  &amp;#34;American singer&amp;#34; ;
        &amp;lt;http://schema.org/name&amp;gt;  &amp;#34;Taylor Swift&amp;#34; ;
        &amp;lt;http://schema.org/url&amp;gt;   &amp;#34;http://www.taylorswift.com/&amp;#34; .

_:b2    &amp;lt;http://schema.org/articleBody&amp;gt;  &amp;#34;Taylor Alison Swift is an American 
            singer-songwriter. Her narrative songwriting, which often takes 
            inspiration from her personal life, has received widespread 
            critical praise and media coverage.\n&amp;#34; ;
        &amp;lt;http://schema.org/url&amp;gt;  &amp;#34;https://en.wikipedia.org/wiki/Taylor_Swift&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;a href=&#34;https://en.wikipedia.org/wiki/Freebase_(database)&#34;&gt;Wikipedia page&lt;/a&gt; for the now-defunct Freebase database tells us that &amp;ldquo;On 16 December 2015, Google officially announced the Knowledge Graph API, which is meant to be a replacement to the Freebase API&amp;rdquo;, so I&amp;rsquo;ve been missing out on this for a while. The Taylor Swift data above includes an interesting bit of the Freebase legacy: the local name of the URI used to represent her as a resource in the Google Knowledge Graph is &lt;code&gt;m/0dl567&lt;/code&gt;, which we can see on her &lt;a href=&#34;https://www.wikidata.org/wiki/Q26876&#34;&gt;Wikidata page&lt;/a&gt; was the identifier that Freebase used for her. For people, places, and things that were not represented in Freebase at the time that Freebase shut down in 2016 (for example, &lt;a href=&#34;https://www.wikidata.org/wiki/Q62591281&#34;&gt;Lil Nas X&lt;/a&gt;, whose Wikidata page shows no Freebase identifier and says that he has been active since 2018) I assume that some Google algorithm just generates new identifiers in their Knowledge Graph.&lt;/p&gt;
&lt;h1 id=&#34;more-query-api-options&#34;&gt;More query API options&lt;/h1&gt;
&lt;p&gt;You can pick apart the URL with the Taylor Swift query and then reassemble it with new pieces using the Google Knowledge Graph &lt;a href=&#34;https://developers.google.com/knowledge-graph/reference/rest/v1&#34;&gt;API Reference&lt;/a&gt;. For instance, that query has a &lt;code&gt;limit&lt;/code&gt; value of 1, but the API reference tells us that this can be up to 500, with a default value of 20. The reference page also includes a form you can fill out with sample API call parameters to learn about them more interactively than you would by revising a curl command over and over.&lt;/p&gt;
&lt;p&gt;A more interesting option for the query URL is &lt;code&gt;types&lt;/code&gt;, which lets you limit your search to entities of one or more specified &lt;a href=&#34;https://developers.google.com/knowledge-graph#knowledge_graph_entities&#34;&gt;schema.org types&lt;/a&gt;. For example, a query that uses parameters of &lt;code&gt;query=charles+schwab&amp;amp;types=Corporation&lt;/code&gt; returns information about the company with that name, but &lt;code&gt;query=charles+schwab&amp;amp;types=Person&lt;/code&gt; returns information about its founder. (Because &lt;code&gt;types&lt;/code&gt; is plural, you can also specify a comma-delimited list as that parameter&amp;rsquo;s value.)&lt;/p&gt;
&lt;p&gt;With no &lt;code&gt;limit&lt;/code&gt; parameter in the URL, the query about Charles Schwab the person actually returned eight people: Charles R. Schwab, the founder of the financial services firm; Pennsylvania steel magnate Charles M. Schwab; Émile Martin Charles Schwabe, a Swiss Symbolist painter and printmaker; and five other people.&lt;/p&gt;
&lt;p&gt;This brings me to a few triples returned by my command line above that I didn&amp;rsquo;t show in the Taylor Swift example. Because the request sends a query to Google, just like a search entered at &lt;a href=&#34;https://www.google.com&#34;&gt;www.google.com&lt;/a&gt;, the server actually returns a list of search results. Here is the beginning of the Turtle version of the search result for the person Charles Schwab:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;_:b0    &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&amp;gt;  &amp;lt;http://schema.org/ItemList&amp;gt; ;
        &amp;lt;http://schema.org/itemListElement&amp;gt;  _:b1 ;
        &amp;lt;http://schema.org/itemListElement&amp;gt;  _:b2 ;
        &amp;lt;http://schema.org/itemListElement&amp;gt;  _:b3 ;
        &amp;lt;http://schema.org/itemListElement&amp;gt;  _:b4 ;
        &amp;lt;http://schema.org/itemListElement&amp;gt;  _:b5 ;
        &amp;lt;http://schema.org/itemListElement&amp;gt;  _:b6 ;
        &amp;lt;http://schema.org/itemListElement&amp;gt;  _:b7 ;
        &amp;lt;http://schema.org/itemListElement&amp;gt;  _:b8 .

_:b1    &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&amp;gt; goog:EntitySearchResult ;
        goog:resultScore            1.105882568359375E3 ;
        &amp;lt;http://schema.org/result&amp;gt;  &amp;lt;http://g.co/kg/m/028lhc&amp;gt; .

&amp;lt;http://g.co/kg/m/028lhc&amp;gt;
        &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&amp;gt; &amp;lt;http://schema.org/Person&amp;gt; ;
        &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&amp;gt; &amp;lt;http://schema.org/Thing&amp;gt; ;
        goog:detailedDescription  _:b9 ;
        &amp;lt;http://schema.org/description&amp;gt;  &amp;#34;American magnate&amp;#34; ;
        &amp;lt;http://schema.org/name&amp;gt;  &amp;#34;Charles M. Schwab&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The first instance in the data is an item list. This points at instances of &lt;code&gt;EntitySearchResult&lt;/code&gt;; the first of these has the blank node &lt;code&gt;_:b1&lt;/code&gt; as its identifier. This search result points to information about the steel magnate, which identifies him with his &lt;a href=&#34;https://www.wikidata.org/wiki/Q365218&#34;&gt;Freebase ID&lt;/a&gt;, and it also has a search result score.&lt;/p&gt;
&lt;p&gt;The API documentation tells us that the result score is &amp;ldquo;an indicator of how well the entity matched the request constraints&amp;rdquo;. I imagine that this is not simply a score of string similarity but also takes into account the popularity of each search result—otherwise, I don&amp;rsquo;t know how the result score would be 12 for financial services firm founder Charles R. Schwab, 1.1 for steel magnate Charles M. Schwab, and 6 for Swiss symbolist Émile Martin Charles Schwabe.&lt;/p&gt;
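&lt;p&gt;A side note on reading those scores: riot serializes them as &lt;code&gt;xsd:double&lt;/code&gt; values in E notation, which any programming language&amp;rsquo;s float parser can turn into ordinary numbers. For example, in Python:&lt;/p&gt;

```python
# The resultScore literals use the same E notation as most languages' doubles,
# so converting Charles M. Schwab's score from the Turtle above is one call.
score = float("1.105882568359375E3")
print(round(score, 2))  # 1105.88
```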
&lt;h1 id=&#34;linking-that-data&#34;&gt;Linking that data&lt;/h1&gt;
&lt;p&gt;The Google Knowledge Graph API doesn&amp;rsquo;t return a large amount of data for each entity, but when you have the Freebase ID, you can use it to retrieve additional data about that entity from Wikidata. The following simple little Wikidata query (try it &lt;a href=&#34;https://query.wikidata.org/#CONSTRUCT%20%7B%3Fs%20%3Fp%20%3Fo%20%7D%20WHERE%20%7B%0A%20%20%20%3Fs%20wdtn%3AP646%20%3Chttp%3A%2F%2Fg.co%2Fkg%2Fm%2F028lhc%3E%20%3B%0A%20%20%20%20%20%20%3Fp%20%3Fo%20.%0A%20%20%7D&#34;&gt;here&lt;/a&gt;) uses the Freebase ID that we saw above for steel magnate Charles Schwab to pull down 140 triples about him from Wikidata:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;CONSTRUCT {?s ?p ?o } WHERE {
   ?s wdtn:P646 &amp;lt;http://g.co/kg/m/028lhc&amp;gt; ;
      ?p ?o .
  }
&lt;/code&gt;&lt;/pre&gt;&lt;h1 id=&#34;exploring-for-more-data&#34;&gt;Exploring for more data&lt;/h1&gt;
&lt;p&gt;The Google Knowledge Graph API includes a boolean &lt;code&gt;prefix&lt;/code&gt; parameter that &amp;ldquo;[e]nables prefix (initial substring) match against names and aliases of entities&amp;rdquo;. The following asks for all entities of type &lt;code&gt;MusicGroup&lt;/code&gt; whose name begins with &amp;ldquo;bea&amp;rdquo;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;curl \
  &amp;#34;https://kgsearch.googleapis.com/v1/entities:search?prefix=true&amp;amp;query=bea&amp;amp;types=MusicGroup&amp;amp;limit=500&amp;amp;key=API-KEY&amp;#34; \
  | riot --syntax=JSONLD &amp;gt; beagroups.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The 481 results included the Beatles, the Beach Boys, and the Beastie Boys, as I expected.&lt;/p&gt;
&lt;p&gt;I was wondering if a sorted list of result scores would reveal any pattern, and then I realized, duh, I can write a SPARQL query to do that; it&amp;rsquo;s why I pulled the data as triples! (I could execute a &lt;a href=&#34;../json-ld/&#34;&gt;query against the JSON-LD&lt;/a&gt;, but I prefer to work with Turtle because it&amp;rsquo;s easier to read.)&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX s: &amp;lt;http://schema.org/&amp;gt;

SELECT ?resultScore ?bandName WHERE {
  ?result      &amp;lt;http://schema.googleapis.com/resultScore&amp;gt; ?resultScore ;
               s:result ?musicGroup .
  ?musicGroup  s:name ?bandName . 
}
ORDER BY DESC(?resultScore)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here are the first few results when running this query against the RDF of &amp;ldquo;bea&amp;rdquo; music groups that the curl command above pulled down:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;---------------------------------------------------------------------------------------------
| resultScore         | bandName                                                            |
=============================================================================================
| 2.518089111328125E3 | &amp;#34;The Beatles&amp;#34;                                                       |
| 3.5488818359375E2   | &amp;#34;Beastie Boys&amp;#34;                                                      |
| 1.969714050292969E2 | &amp;#34;Beak&amp;#34;                                                              |
| 1.761080169677734E2 | &amp;#34;Beatrice&amp;#34;                                                          |
| 1.361932220458984E2 | &amp;#34;Brooklyn Bounce&amp;#34;                                                   |
| 1.338223876953125E2 | &amp;#34;Battle Beast&amp;#34;                                                      |
| 1.335271911621094E2 | &amp;#34;Beatsteaks&amp;#34;                                                        |
| 1.331909942626953E2 | &amp;#34;Beartooth&amp;#34;                                                         |
| 1.256562881469727E2 | &amp;#34;Beady Belle&amp;#34;                                                       |
| 1.212170104980469E2 | &amp;#34;Beatfreakz&amp;#34;                                                        |
| 1.101853561401367E2 | &amp;#34;The Trammps&amp;#34;                                                       |
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(Yes, the Trammps, of &lt;a href=&#34;https://www.youtube.com/watch?v=BPV6kpNnr3c&#34;&gt;Disco Inferno&lt;/a&gt; fame.) The Beach Boys ranked at 111, well below many groups I&amp;rsquo;ve never heard of, some of which, like the Trammps, didn&amp;rsquo;t even have &amp;ldquo;bea&amp;rdquo; anywhere in their name: Vansire? The Parlotones? Turbotronic?&lt;/p&gt;
&lt;p&gt;The ability to pull typed data directly from Google&amp;rsquo;s Knowledge Graph is pretty great, especially since we can link much of that data to other good data sources. I had considered titling this blog entry “Piping data to stdin of Jena’s riot utility” (talk about your clickbait!) but as you can see decided to go with the Knowledge Graph angle—not because this term is a popular way to talk about graph databases in general, but because we&amp;rsquo;re pulling data from the graph that Google itself is calling a Knowledge Graph.&lt;/p&gt;
&lt;p&gt;Still, the ability to feed data to riot via stdin is pretty nice; it handles a key handoff in this trick the old-fashioned UNIX way, with small tools piped together. Assembled like this, these pieces make it easier to incorporate Google Knowledge Graph data into the wide range of RDF-based tools out there, and I expect that combination to have many great applications.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2021">2021</category>
      
      <category domain="https://www.bobdc.com//categories/knowledge-graphs">knowledge-graphs</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">rdf</category>
      
      <category domain="https://www.bobdc.com//categories/linked-data">linked-data</category>
      
    </item>
    
    <item>
      <title>Linking different knowledge graphs together</title>
      <link>https://www.bobdc.com/blog/linkingkgs/</link>
      <pubDate>Sun, 28 Feb 2021 11:10:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/linkingkgs/</guid>
      
      
      <description><div>Really linking them, not doing ETL.</div><div>&lt;p&gt;Lately I&amp;rsquo;ve been thinking about some aspects of RDF technology that I have taken for granted as basic building blocks of dataset design but that Knowledge Graph fans who are new to RDF may not be fully aware of—especially when they compare RDF to alternative ways to build knowledge graphs. A key building block is the ability to link independently created knowledge graphs.&lt;/p&gt;
&lt;blockquote id=&#34;id202592&#34; class=&#34;pullquote&#34;&gt;...it gives a better idea of what the “semantic web” was about: the world-wide linking of, not just documents, but (in more 2021 terminology) knowledge graphs.&lt;/blockquote&gt;
&lt;p&gt;For a little historical perspective: before Tim Berners-Lee invented the web, hypertext systems were all very closed systems. A &lt;a href=&#34;https://en.wikipedia.org/wiki/Storyspace&#34;&gt;Storyspace&lt;/a&gt; story (one of which I still own on a three-and-a-half-inch floppy disk) could not link to an Apple &lt;a href=&#34;https://en.wikipedia.org/wiki/HyperCard&#34;&gt;HyperCard&lt;/a&gt; &amp;ldquo;stack&amp;rdquo;, and a HyperCard stack could not link to a Storyspace story. The World Wide Web let any hypertext page anywhere in the world link to any other, and just look how far that has scaled.&lt;/p&gt;
&lt;p&gt;Imagine that you and I want to create relational data and use it in the same SQL system. We can&amp;rsquo;t just go off and each define our own database schema and expect our two databases to work together. The design work must be coordinated so that our respective contributions are essentially designed as a single system. Otherwise, the data from your system must be read (Extracted from your system), converted to be compatible with my system (Transformed), and then Loaded into my system—a process known in the industry as &lt;a href=&#34;https://en.wikipedia.org/wiki/Extract,_transform,_load&#34;&gt;ETL&lt;/a&gt;. If the data in your system later gets updated, my system&amp;rsquo;s users won&amp;rsquo;t know it until we repeat the whole process or invoke some custom ETL process to identify and retrieve the new parts.&lt;/p&gt;
&lt;p&gt;This was never the case with independently designed web pages, because anyone&amp;rsquo;s page could link to anyone else&amp;rsquo;s web page, and it&amp;rsquo;s not the case with RDF knowledge graphs. If I make one available on the public Internet, you can connect yours to mine so that as your and my datasets evolve, the connections themselves can remain the same but you&amp;rsquo;ll gain the benefits of the updated datasets. If we&amp;rsquo;re using different identifiers to refer to the same things, a little modeling can be part of the connection to indicate which things are the same, and then you&amp;rsquo;re off and running using the two datasets as one knowledge graph.&lt;/p&gt;
&lt;p&gt;The format of RDF graph node identifiers follows a &lt;a href=&#34;https://tools.ietf.org/html/rfc3986&#34;&gt;published IETF standard&lt;/a&gt;. The identifiers themselves are universally unique (as with Java package names, they&amp;rsquo;re built off of domain names, which lets domain owners establish their own naming conventions), so your ability to reference one of my graph nodes from your data means that a link from your data to mine will work very simply. This was the “linked” part of &lt;a href=&#34;https://en.wikipedia.org/wiki/Linked_data&#34;&gt;Linked Data&lt;/a&gt;, and getting back to the once-revolutionary possibility of any hypertext document linking to any other hypertext document, it gives a better idea of what the “semantic web” was about: the world-wide linking of, not just documents, but (in more 2021 terminology) knowledge graphs, especially when modeling of the graph can be part of the graph itself.&lt;/p&gt;
&lt;p&gt;(Just to whet your appetite, I&amp;rsquo;m going to demonstrate all of this below by linking a new graph of the Beatles&amp;rsquo; favorite drinks and sports to a remote graph I made several years ago about who played what instruments on which songs.)&lt;/p&gt;
&lt;p&gt;There are two basic steps to linking your knowledge graph to someone else&amp;rsquo;s. As with HTML documents, you don&amp;rsquo;t need any kind of permission or cooperation from  anyone on the destination system to make the link if that destination is available via HTTP.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Either use the same resource identifiers that the graph you are linking to does or add some modeling that maps your identifiers to theirs.&lt;/li&gt;
&lt;li&gt;Use a SPARQL query (and implicitly, the SPARQL protocol—the &amp;ldquo;P&amp;rdquo; in SPARQL that defines a standard way to transmit queries and results back and forth) to ask an endpoint for a graph subset meeting the conditions for your application.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The nodes of a graph need identifiers, and some non-RDF graph storage systems keep these under the covers to make your life simpler. If you can see these identifiers, though, both in your own graphs and in the graphs of others, it&amp;rsquo;s easier to identify nodes in the different graphs and create connections between them, connecting their host graphs into a larger graph. (And, RDF URI identifiers are no more difficult to read than URLs, which aren&amp;rsquo;t that difficult to read&amp;hellip; unless you&amp;rsquo;re in SharePoint world, in which case you have my sympathy.) If you know that two graphs use different identifiers for the same resource, your own data model can assert that both identifiers reference the same resource—with the data model statements just being additional edges on your own graph—and then standards-compliant (often free!) software can then take advantage of those assertions.&lt;/p&gt;
&lt;p&gt;To quote Pascal Hitzler&amp;rsquo;s recent Communications of the ACM article &lt;a href=&#34;https://cacm.acm.org/magazines/2021/2/250085-a-review-of-the-semantic-web-field/fulltext&#34;&gt;A Review of the Semantic Web Field&lt;/a&gt; (which uses the abbreviation IRI for &amp;ldquo;Internationalized Resource Identifiers&amp;rdquo;, a superset of URIs that allow a broader range of character choices),&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What is usually associated with the term &amp;ldquo;linked data&amp;rdquo; is that linked data consists of a (by now rather large) set of RDF graphs that are linked in the sense that many IRI identifiers in the graphs appear also in other, sometimes multiple, graphs. In a sense, the collection of all these linked RDF graphs can be understood as one very big RDF graph.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1 id=&#34;retrieving-remote-data-from-a-sparql-endpoint&#34;&gt;Retrieving remote data from a SPARQL endpoint&lt;/h1&gt;
&lt;p&gt;A SPARQL query can use the SERVICE keyword to request data from another graph via an endpoint. It can then combine the retrieved data with local data and use the combination as a larger graph with more helpful connections than the local data has. For example, let&amp;rsquo;s say that after reviewing the website &lt;a href=&#34;http://www.beatlesinterviews.org&#34;&gt;The Beatles Interview Database&lt;/a&gt; you&amp;rsquo;ve compiled the following data about the Fab Four, and you have it stored locally in a file called &lt;code&gt;BeatlesFaves.ttl&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix b: &amp;lt;http://www.bobdc.com/ns/beatles/&amp;gt; .
@prefix wd: &amp;lt;http://www.wikidata.org/entity/&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .

# Sources:
# http://www.beatlesinterviews.org/db1964.0614b.beatles.html
# http://www.beatlesinterviews.org/db1964.0906.beatles.html

wd:Q2632 rdfs:label &amp;#34;Ringo Starr&amp;#34; ;
    b:favoriteDrink &amp;#34;bourbon&amp;#34; ;
    b:favoriteBritishGroup &amp;#34;Rolling Stones&amp;#34; . 

wd:Q1203 rdfs:label &amp;#34;John Lennon&amp;#34; ;
    b:favoriteBritishGroup &amp;#34;Rolling Stones&amp;#34; . 

wd:Q2599 rdfs:label &amp;#34;Paul McCartney&amp;#34; ;
    b:favoriteBritishGroup &amp;#34;The Searchers&amp;#34; ;
    b:favoriteDrink &amp;#34;scotch and Coke&amp;#34; . 

wd:Q2643 rdfs:label &amp;#34;George Harrison&amp;#34; ;
    b:favoriteBritishGroup &amp;#34;The Animals&amp;#34; . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This data uses the Wikidata identifiers to identify the individual Beatles so that the data will more easily integrate with other data that may use these identifiers—just as Pascal described above—such as Wikidata itself. (The Wikidata identifiers are easy to find; just look for &amp;ldquo;Wikidata item&amp;rdquo; on the left side of any Wikipedia page, such as &lt;a href=&#34;https://en.wikipedia.org/wiki/Ringo_Starr&#34;&gt;Ringo&amp;rsquo;s&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;You have learned that several years ago some guy (OK, me) published an RDF graph of data at &lt;a href=&#34;http://www.bobdc.com/miscfiles/BeatlesMusicians.ttl&#34;&gt;http://www.bobdc.com/miscfiles/BeatlesMusicians.ttl&lt;/a&gt; about who played what instruments on which Beatles songs. (The creation of this dataset is described at &lt;a href=&#34;http://www.bobdc.com/blog/sparql-queries-of-beatles-reco/&#34;&gt;SPARQL queries of Beatles recording sessions&lt;/a&gt; along with some fun queries.) The URIs that identify the musicians in that dataset are built from the musicians&amp;rsquo; names instead of being taken from Wikidata. (There were so many musicians that I didn&amp;rsquo;t want to look all of them up in Wikidata manually, and some have names that are &lt;a href=&#34;https://en.wikipedia.org/wiki/Chris_Thomas&#34;&gt;common enough&lt;/a&gt; that automating the lookup wouldn&amp;rsquo;t have worked too well.)&lt;/p&gt;
&lt;p&gt;To show that &lt;code&gt;wd:Q2632&lt;/code&gt; from one graph is the same as &lt;code&gt;m:RingoStarr&lt;/code&gt; from the other I created a triple using &lt;code&gt;owl:sameAs&lt;/code&gt;. This predicate basically says &amp;ldquo;all facts about each of these two resources are true for the other one, so they are effectively the same resource&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;My use of an OWL predicate required me to use a SPARQL processor that could handle more than RDFS. (See &lt;a href=&#34;http://www.bobdc.com/blog/partialschemas/&#34;&gt;Transforming data with inferencing and (partial!) schemas&lt;/a&gt; for examples of RDFS inferencing as part of a graph processing pipeline.) I only needed a little more than RDFS; &amp;ldquo;RDFS Plus&amp;rdquo; is a non-standard superset that adds &lt;code&gt;owl:sameAs&lt;/code&gt; support and a few other useful OWL bits to a SPARQL processor without committing to a full implementation of one of the &lt;a href=&#34;https://www.w3.org/TR/owl2-primer/#OWL_2_Profiles&#34;&gt;official OWL Profiles&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To get this  &lt;code&gt;owl:sameAs&lt;/code&gt; support I used the free version of the GraphDB triplestore, which I&amp;rsquo;ve also &lt;a href=&#34;http://www.bobdc.com/blog/geosparqlgraphdb/&#34;&gt;used recently&lt;/a&gt; because of its GeoSPARQL support. &amp;ldquo;RDFS Support&amp;rdquo; is something you select when creating a GraphDB repository, so I did that and unchecked GraphDB&amp;rsquo;s &amp;ldquo;Disable owl:SameAs&amp;rdquo; checkbox. (I&amp;rsquo;m guessing that this checkbox is available because overuse of &lt;code&gt;owl:sameAs&lt;/code&gt; can use a lot of computing cycles.)&lt;/p&gt;
&lt;p&gt;After loading the &lt;code&gt;BeatlesFaves.ttl&lt;/code&gt; data above, I loaded the following &lt;code&gt;mapToWikidata.ttl&lt;/code&gt; file:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix m:   &amp;lt;http://learningsparql.com/ns/musician/&amp;gt; .
@prefix wd:  &amp;lt;http://www.wikidata.org/entity/&amp;gt; .
@prefix owl: &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt; .

wd:Q2632 owl:sameAs m:RingoStarr  . 
wd:Q1203 owl:sameAs m:JohnLennon . 
wd:Q2599 owl:sameAs m:PaulMcCartney .
wd:Q2643 owl:sameAs m:GeorgeHarrison .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;After doing this, a query of this repository for all the triples showed statements like &lt;code&gt;{m:GeorgeHarrison rdfs:label &amp;quot;George Harrison&amp;quot;}&lt;/code&gt;, which was not a triple in either of the loaded knowledge graphs but was inferred from the combination, so I knew I was all set.&lt;/p&gt;
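&lt;p&gt;To picture what that inference is doing, here is a toy sketch (my own illustration, not GraphDB&amp;rsquo;s actual machinery) of one pass of &lt;code&gt;owl:sameAs&lt;/code&gt; reasoning: copy every statement about a resource over to each of that resource&amp;rsquo;s &lt;code&gt;owl:sameAs&lt;/code&gt; partners, in both directions.&lt;/p&gt;

```python
# Toy illustration of one pass of owl:sameAs inference over triples stored
# as (subject, predicate, object) tuples. Real reasoners also handle
# transitive chains of sameAs; this sketch only copies facts across
# directly linked pairs.
SAME_AS = "owl:sameAs"

def same_as_copy(triples):
    """Return the input triples plus copies rewritten across owl:sameAs pairs."""
    aliases = {}
    for s, p, o in triples:
        if p == SAME_AS:
            # sameAs is symmetric, so record the mapping both ways.
            aliases.setdefault(s, set()).add(o)
            aliases.setdefault(o, set()).add(s)
    inferred = set(triples)
    for s, p, o in triples:
        if p == SAME_AS:
            continue
        for alias in aliases.get(s, ()):
            inferred.add((alias, p, o))
        for alias in aliases.get(o, ()):
            inferred.add((s, p, alias))
    return inferred

facts = {
    ("wd:Q2643", "rdfs:label", "George Harrison"),
    ("wd:Q2643", "owl:sameAs", "m:GeorgeHarrison"),
}
```

&lt;p&gt;Running &lt;code&gt;same_as_copy(facts)&lt;/code&gt; on those two statements yields the extra triple &lt;code&gt;{m:GeorgeHarrison rdfs:label &amp;quot;George Harrison&amp;quot;}&lt;/code&gt; described above.&lt;/p&gt;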
&lt;h1 id=&#34;the-sparql-query&#34;&gt;The SPARQL query&lt;/h1&gt;
&lt;p&gt;I could have read the &lt;a href=&#34;http://www.bobdc.com/miscfiles/BeatlesMusicians.ttl&#34;&gt;http://www.bobdc.com/miscfiles/BeatlesMusicians.ttl&lt;/a&gt; file into GraphDB just like I read in &lt;code&gt;BeatlesFaves.ttl&lt;/code&gt; and &lt;code&gt;mapToWikidata.ttl&lt;/code&gt;, but that would be the old-fashioned ETL approach, where querying across datasets is really a query of a single dataset created by copying them all into one place. What if the remote dataset got updated with the names of the cellists on &amp;ldquo;The Long and Winding Road&amp;rdquo;, which are currently &lt;a href=&#34;https://www.beatlesbible.com/songs/the-long-and-winding-road/&#34;&gt;not there&lt;/a&gt;? I would have to either identify the new triples added to the remote data or reload the whole remote dataset. Instead of reading in the entire remote dataset, I would rather read the data that I need from it dynamically at query time to make sure that I had the most recent data.&lt;/p&gt;
&lt;p&gt;I can do this with SPARQL&amp;rsquo;s SERVICE keyword. This specifies the URL of a SPARQL endpoint and a query to send to it. The following query uses this keyword to find out the favorite British band of the bass player from &amp;ldquo;The Long and Winding Road&amp;rdquo;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 
PREFIX i:    &amp;lt;http://learningsparql.com/ns/instrument/&amp;gt;
PREFIX s:    &amp;lt;http://learningsparql.com/ns/schema/&amp;gt; 
PREFIX b:    &amp;lt;http://www.bobdc.com/ns/beatles/&amp;gt;

SELECT ?britishGroup
WHERE { ?bassist b:favoriteBritishGroup ?britishGroup .
  SERVICE &amp;lt;https://dydra.com/bobdc/beatles-musicians/sparql&amp;gt;
  { SELECT ?bassist 
    WHERE { ?song a s:Song ; 
                  rdfs:label &amp;#34;The Long And Winding Road&amp;#34; ;
            i:bass ?bassist .
          }
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(Fun fact I had never noticed before: John plays bass on that.) For this demo, I stored the data on &lt;a href=&#34;https://dydra.com/about&#34;&gt;Dydra&lt;/a&gt;, which made it very easy to create a free account, upload data, make it available via a SPARQL endpoint, and then set user access levels for that data. The data can be maintained on Dydra easily enough, so a call to a Dydra endpoint really is retrieval from a dynamic database.&lt;/p&gt;
&lt;p&gt;The inner query above asks the remote data about the song&amp;rsquo;s bass player, binding the URI representing the bassist to the &lt;code&gt;?bassist&lt;/code&gt; variable.  The outer query then asks for the favorite British group of this bass player, which turned out to be the Rolling Stones.&lt;/p&gt;
&lt;p&gt;Note that the &lt;code&gt;?bassist&lt;/code&gt; variable will store the identifier &lt;code&gt;http://learningsparql.com/ns/musician/JohnLennon&lt;/code&gt;, while the locally-stored data says that the Stones were the favorite British band of resource &lt;code&gt;wd:Q1203&lt;/code&gt;. That&amp;rsquo;s why I added the modeling triple &lt;code&gt;wd:Q1203 owl:sameAs m:JohnLennon&lt;/code&gt; and used GraphDB, a triplestore that handles &lt;code&gt;owl:sameAs&lt;/code&gt; as part of the RDFS superset that it implements. Remember, not all triplestores do, so that&amp;rsquo;s something to think about when planning an application.&lt;/p&gt;
&lt;p&gt;This ability to send a subquery off to a remote system and then use the result locally is an important aspect of both the SPARQL query language and the SPARQL protocol, which has its &lt;a href=&#34;https://www.w3.org/TR/sparql11-protocol/&#34;&gt;own standardized specification&lt;/a&gt;. When you consider different systems that may play roles in building and using knowledge graphs, keep in mind that SPARQL&amp;rsquo;s mechanics for tying local and remote data together are both standardized and widely implemented.&lt;/p&gt;
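&lt;p&gt;For the curious, the protocol part is plain HTTP. Here is a hedged sketch, using only Python&amp;rsquo;s standard library, of the request that a client sends to an endpoint. The endpoint URL is the Dydra one from the query above; actually opening the request requires network access, so this just builds it.&lt;/p&gt;

```python
# Sketch of a SPARQL protocol request: the query travels as an
# application/x-www-form-urlencoded POST body, and the Accept header
# asks for a standard results format.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://dydra.com/bobdc/beatles-musicians/sparql"

query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX i:    <http://learningsparql.com/ns/instrument/>
PREFIX s:    <http://learningsparql.com/ns/schema/>
SELECT ?bassist WHERE {
  ?song a s:Song ;
        rdfs:label "The Long And Winding Road" ;
        i:bass ?bassist .
}
"""

req = Request(
    ENDPOINT,
    data=urlencode({"query": query}).encode("utf-8"),
    headers={"Accept": "application/sparql-results+json"},
)
# urllib.request.urlopen(req) would return the JSON results document.
```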
&lt;p&gt;(A background note: I had also planned to show another query that retrieved the recording session data from &lt;a href=&#34;http://www.bobdc.com/miscfiles/BeatlesMusicians.ttl&#34;&gt;http://www.bobdc.com/miscfiles/BeatlesMusicians.ttl&lt;/a&gt; using SPARQL&amp;rsquo;s FROM keyword. When I investigated why some SPARQL processors did retrieve remote RDF files specified by this keyword and some didn&amp;rsquo;t, I learned that as a &lt;a href=&#34;https://www.w3.org/TR/sparql11-query/#security&#34;&gt;security consideration&lt;/a&gt; this retrieval is not required.)&lt;/p&gt;
&lt;p&gt;SPARQL&amp;rsquo;s ability to link together different RDF knowledge graphs—even when those graphs aren&amp;rsquo;t necessarily using the same identifiers to refer to the same resources—provides another huge benefit: it reduces the need for large complex schemas (typically, ontologies) to create useful knowledge graphs. Imagine that I create a small RDF knowledge graph that achieves certain goals, and then you create another that achieves different goals, and then a third person realizes that these two graphs are both related to the application that she is working on. Ideally, you and I would have each included a schema (which is just more triples!) listing the classes and properties we used; even small schemas would help people like this third person take advantage of our datasets. Whether we made schemas available or not, though, she can use the technique described above to connect the two graphs into a whole that is greater than the sum of its parts, growing into a larger knowledge graph the way that the collection of HTML pages available via HTTP has grown into the World Wide Web since 1993.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2021">2021</category>
      
      <category domain="https://www.bobdc.com//categories/knowledge-graphs">knowledge-graphs</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">rdf</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">rdf/owl</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic-web</category>
      
      <category domain="https://www.bobdc.com//categories/linked-data">linked-data</category>
      
    </item>
    
    <item>
      <title>Transforming data with inferencing and (partial!) schemas</title>
      <link>https://www.bobdc.com/blog/partialschemas/</link>
      <pubDate>Sun, 24 Jan 2021 11:00:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/partialschemas/</guid>
      
      
      <description><div>An excellent compromise between schemas and &#34;schemaless&#34; development.</div><div>&lt;blockquote id=&#34;id202455&#34; class=&#34;pullquote&#34;&gt;If you’re working with more than one RDF dataset, then the use of RDFS to identify little subsets of those datasets and to specify relationships between components of those subsets can help your knowledge graph and the applications that use it become useful a lot sooner.&lt;/blockquote&gt;
&lt;p&gt;I originally planned to title this &amp;ldquo;Partial schemas!&amp;rdquo; but as I assembled the example I realized that in addition to demonstrating the value of partial, incrementally-built schemas, the steps shown below also show how inferencing with schemas can implement transformations that are very useful in data integration. In the right situations this can be even better than SPARQL, because instead of using code—whether procedural or declarative—the transformation is driven by the data model itself.&lt;/p&gt;
&lt;p&gt;Also, the models are &lt;a href=&#34;https://www.w3.org/TR/rdf-schema/&#34;&gt;RDF Schemas&lt;/a&gt;, also known as RDFS. When people talk about RDF inferencing, they&amp;rsquo;re often talking about some of the more advanced inferencing that the superset (actually, &lt;a href=&#34;https://www.w3.org/TR/owl2-primer/#OWL_2_Profiles&#34;&gt;supersets&lt;/a&gt;) of RDFS known as OWL can do. Many people don&amp;rsquo;t realize how much you can do with simple RDFS inferencing.&lt;/p&gt;
&lt;h1 id=&#34;schema-no-schema-or-some-schema&#34;&gt;Schema, no schema, or&amp;hellip; some schema&lt;/h1&gt;
&lt;p&gt;For most of the history of data processing on computers, people needed to spell out the structure of their data before they could actually start accumulating data. For example, when using relational databases, you can&amp;rsquo;t add a row to a table unless you (or someone) has already specified all the columns that are going to be in that table and all of their types. In fact, you probably had to do this for all the database&amp;rsquo;s other tables as well, because the tables aren&amp;rsquo;t really ready until their relationships have all been straightened out through the process of &lt;a href=&#34;https://en.wikipedia.org/wiki/Database_normalization&#34;&gt;normalization&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The rise of NoSQL databases—especially &lt;a href=&#34;https://www.mongodb.com/blog/post/why-schemaless&#34;&gt;MongoDB&lt;/a&gt;—and the fact that schemas were optional for XML got developers excited about the ability to add any data of any structure they wished to a dataset. Since then, blogs have been full of debates about the value of developing with vs. without schemas. Not enough people appreciate the wonderful compromise offered by RDF knowledge graphs, where partial schemas can give you the best of both worlds, so I wanted to demonstrate that.&lt;/p&gt;
&lt;h1 id=&#34;finding-a-big-mess-of-rdf&#34;&gt;Finding a big mess of RDF&lt;/h1&gt;
&lt;p&gt;I wanted to start with an RDF dataset that was bigger and more complex than I needed so that I could show how a schema for just a subset of it could help to get only the parts that I wanted. On a page for the &lt;a href=&#34;https://serpapi.com/youtube-search-api&#34;&gt;YouTube offering&lt;/a&gt; of a &lt;a href=&#34;https://serpapi.com&#34;&gt;search engine API company&lt;/a&gt; I found a 26K sample of JSON that their API would return on a search for &amp;ldquo;Star Wars&amp;rdquo;, so I used AtomGraph&amp;rsquo;s &lt;a href=&#34;http://www.bobdc.com/blog/json2rdf/&#34;&gt;JSON2RDF&lt;/a&gt; to convert that to an RDF file that I called &lt;a href=&#34;http://bobdc.com/miscfiles/ytstarwars.ttl&#34;&gt;ytstarwars.ttl&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This turned the JSON&amp;rsquo;s unnamed containers into a lot of triples with blank nodes in the RDF. The structural relationships of these triples were easier to see after a look at the JSON, which had some header data and then an array named &lt;code&gt;movie_results&lt;/code&gt; with JSON objects about movies that each had &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;description&lt;/code&gt;, and several other properties. After that was a similar array named &lt;code&gt;video_results&lt;/code&gt; that had objects with &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; properties and others (not identical to the movie object properties) that I didn&amp;rsquo;t care about.&lt;/p&gt;
&lt;p&gt;After telling JSON2RDF to use a base URI of &lt;code&gt;http://bobdc.com/ns/pschemademo&lt;/code&gt; in its output I got a lot of RDF triples with predicates of &lt;code&gt;http://bobdc.com/ns/pschemademo#video_results&lt;/code&gt; (hereafter, &lt;code&gt;t:video_results&lt;/code&gt;), blank node subjects that represented the array, and blank node objects that represented the videos themselves. To describe the individual videos, the RDF included triples with these videos as subjects and predicate-object pairs like (&lt;code&gt;t:title&lt;/code&gt; &amp;ldquo;Star Wars: The Empire Strikes Back&amp;rdquo;) and (&lt;code&gt;t:link&lt;/code&gt; &amp;ldquo;&lt;a href=&#34;https://www.youtube.com/watch?v=Ooh3k8cJDBg&#34;&gt;https://www.youtube.com/watch?v=Ooh3k8cJDBg&lt;/a&gt;&amp;rdquo;).&lt;/p&gt;
&lt;p&gt;It sounds messy but isn&amp;rsquo;t too bad when you flip back and forth between the JSON and the RDF that JSON2RDF created. The fun part was transforming this RDF into something simpler and cleaner—not with the SPARQL CONSTRUCT queries that I would typically use to turn one set of RDF into another, but with a schema and inferencing.&lt;/p&gt;
&lt;h1 id=&#34;transforming-with-a-schema&#34;&gt;Transforming with a schema&lt;/h1&gt;
&lt;p&gt;I started the schema with just this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# pschema1.ttl
@prefix t:    &amp;lt;http://bobdc.com/ns/pschemademo#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;  .

t:Video a rdfs:Class .
t:title rdfs:domain t:Video .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The first triple declares &lt;code&gt;t:Video&lt;/code&gt; to be a class.&lt;/p&gt;
&lt;p&gt;We often use the &lt;code&gt;rdfs:domain&lt;/code&gt; property to say &amp;ldquo;this property is associated with this class&amp;rdquo;, which is a typical thing to do in a data model, but the second triple above actually does more than that: it says that if a resource has a &lt;code&gt;t:title&lt;/code&gt; property, then an inferencing parser should infer that this resource is an instance of the &lt;code&gt;t:Video&lt;/code&gt; class. (Or, in triple terms: if the parser finds a triple with &lt;code&gt;t:title&lt;/code&gt; as its predicate, then infer a new triple saying that the found triple&amp;rsquo;s subject is an instance of the specified class.)&lt;/p&gt;
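&lt;p&gt;(A toy sketch of that rule in plain Python, treating triples as tuples; the &lt;code&gt;infer_domain&lt;/code&gt; function and the shortened prefixed names are mine for illustration, not part of any real RDFS engine:)&lt;/p&gt;

```python
# Toy illustration of the rdfs:domain rule (not a real RDFS engine).
# Triples are (subject, predicate, object) tuples; the prefixed names
# stand in for the full URIs used in the post.

def infer_domain(data, schema):
    """If the schema says (p, rdfs:domain, C), then every data triple
    (s, p, o) lets us infer (s, rdf:type, C) about its subject."""
    domains = {p: c for (p, pred, c) in schema if pred == "rdfs:domain"}
    return {(s, "rdf:type", domains[p]) for (s, p, o) in data if p in domains}

schema = {("t:title", "rdfs:domain", "t:Video")}
data = {("_:b1", "t:title", "LEGO Star Wars The Build Zone")}
print(infer_domain(data, schema))  # {('_:b1', 'rdf:type', 't:Video')}
```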
&lt;p&gt;Several of the &lt;a href=&#34;https://github.com/stain/jena-docker/tree/master/jena&#34;&gt;command line utilities&lt;/a&gt; that come with &lt;a href=&#34;https://jena.apache.org/&#34;&gt;Apache Jena&lt;/a&gt; let you use an &lt;code&gt;--rdfs&lt;/code&gt; switch to point to a vocabulary file of triples to use for inferencing. Here&amp;rsquo;s how I used Jena&amp;rsquo;s &lt;code&gt;riot&lt;/code&gt; utility to parse the Turtle version of the YouTube Star Wars query result with inferencing based on the schema above:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;riot --rdfs=pschema1.ttl ytstarwars.ttl &amp;gt; temp.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The result is a copy of the input with triples like these added:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;_:Ba465efcc265d609003ef1776e61da647 rdf:type t:Video .
_:Ba465efcc265d609003ef1776e61da647 t:title &amp;#34;LEGO® Star Wars™ The Build Zone&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In addition to videos in the search results, there were also movie results from the &lt;code&gt;movie_results&lt;/code&gt; array, so let&amp;rsquo;s declare the same &lt;code&gt;rdfs:Class&lt;/code&gt; and &lt;code&gt;rdfs:domain&lt;/code&gt; triples for them and then do more inferencing&amp;hellip;&lt;/p&gt;
&lt;p&gt;But there&amp;rsquo;s a problem. Movie results also have &lt;code&gt;t:title&lt;/code&gt; properties, and the schema above says that anything with a &lt;code&gt;t:title&lt;/code&gt; is a video result. How can the schema distinguish between videos and movies, and how can we say that both videos and movies have titles?&lt;/p&gt;
&lt;p&gt;I mentioned earlier that the RDF created by AtomGraph includes triples with predicates of &lt;code&gt;t:video_results&lt;/code&gt;, blank node subjects that represent the video results array, and blank node objects that represent the members of the array—the videos themselves. It also includes similar &lt;code&gt;t:movie_results&lt;/code&gt; triples to store movies.&lt;/p&gt;
&lt;p&gt;The first draft of the schema above used RDF&amp;rsquo;s &lt;code&gt;rdfs:domain&lt;/code&gt; property to say that if a triple has a particular predicate then the resource represented by its subject is an instance of a particular class. The second draft uses a different part of the RDFS vocabulary: &lt;code&gt;rdfs:range&lt;/code&gt;.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# pschema2.ttl
@prefix t:    &amp;lt;http://bobdc.com/ns/pschemademo#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;  .

t:Video a rdfs:Class .
t:Movie a rdfs:Class .

t:video_results rdfs:range t:Video . 
t:movie_results rdfs:range t:Movie . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Unlike the &lt;code&gt;rdfs:domain&lt;/code&gt; property, the &lt;code&gt;rdfs:range&lt;/code&gt; property tells the inferencing engine that if a particular property is used as a triple&amp;rsquo;s predicate, then that triple&amp;rsquo;s &lt;em&gt;object&lt;/em&gt; is a member of a particular class. The &lt;code&gt;t:video_results&lt;/code&gt; triple in this new schema tells the inferencing engine that when it sees the triple &lt;code&gt;{_:blankNode1 t:video_results _:blankNode2}&lt;/code&gt; in the input, it should create the triple &lt;code&gt;{_:blankNode2 a t:Video}&lt;/code&gt;. The other &lt;code&gt;rdfs:range&lt;/code&gt; triple in the schema does something similar to say that the objects of &lt;code&gt;t:movie_results&lt;/code&gt; triples are instances of &lt;code&gt;t:Movie&lt;/code&gt;.&lt;/p&gt;
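&lt;p&gt;(Sketched the same toy way in plain Python, with triples as tuples; the &lt;code&gt;infer_range&lt;/code&gt; function is mine for illustration, not part of any real RDFS engine:)&lt;/p&gt;

```python
# Toy illustration of the rdfs:range rule (a sketch, not a real engine).
# Unlike rdfs:domain, which types a triple's subject, rdfs:range types
# its object.

def infer_range(data, schema):
    """If the schema says (p, rdfs:range, C), then every data triple
    (s, p, o) lets us infer (o, rdf:type, C) about its object."""
    ranges = {p: c for (p, pred, c) in schema if pred == "rdfs:range"}
    return {(o, "rdf:type", ranges[p]) for (s, p, o) in data if p in ranges}

schema = {("t:video_results", "rdfs:range", "t:Video"),
          ("t:movie_results", "rdfs:range", "t:Movie")}
data = {("_:array1", "t:video_results", "_:video1"),
        ("_:array2", "t:movie_results", "_:movie1")}
for triple in sorted(infer_range(data, schema)):
    print(triple)
# ('_:movie1', 'rdf:type', 't:Movie')
# ('_:video1', 'rdf:type', 't:Video')
```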
&lt;p&gt;The first two triples in the new schema declare those two classes, but strictly speaking this isn&amp;rsquo;t necessary. If the schema says that &lt;code&gt;_:blankNode1&lt;/code&gt; is a member of a particular class, then the inference engine will infer that that class exists. It&amp;rsquo;s still worth declaring the classes, though, because an important reason to have schemas in the first place is to show the structure of the data to people using that data so that they can get more out of it.&lt;/p&gt;
&lt;p&gt;Running a similar &lt;code&gt;riot&lt;/code&gt; command with the new schema then creates new triples such as the following:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;_:B43a50d34335d3e6c8db6403bc5bea2cf a t:Movie .
_:B2f3a9c7d55b4e5ab6272a20db6a16b97 a t:Video . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;How do we show that the title, description, and link properties in the triples generated by AtomGraph apply to videos and movies but not necessarily to other classes that may come up in this data? With another incremental modeling step: we&amp;rsquo;ll make the Movie and Video classes subclasses of another class (in this case, &lt;a href=&#34;https://schema.org/CreativeWork&#34;&gt;CreativeWork&lt;/a&gt; from schema.org; I may as well take advantage of an existing standard to make the data more interoperable with other applications) and declare that the properties go with that superclass:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# pschema3.ttl
@prefix t:    &amp;lt;http://bobdc.com/ns/pschemademo#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;  .
@prefix s:    &amp;lt;http://schema.org/&amp;gt; .

t:Video a rdfs:Class ;
        rdfs:subClassOf s:CreativeWork . 

t:Movie a rdfs:Class ;
        rdfs:subClassOf s:CreativeWork . 

t:video_results rdfs:range t:Video . 
t:movie_results rdfs:range t:Movie . 

t:title rdfs:domain s:CreativeWork .
t:link rdfs:domain s:CreativeWork .
t:description rdfs:domain s:CreativeWork .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here are some of the triples generated by &lt;code&gt;riot&lt;/code&gt; from that schema, with blank node names and &lt;code&gt;t:description&lt;/code&gt; values shortened to fit here better:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;_:Ba2f a t:Video .
_:Ba2f a s:CreativeWork .
_:Ba2f t:title &amp;#34;2020 Portrayed by Star Wars&amp;#34; .
_:Ba2f t:link &amp;#34;https://www.youtube.com/watch?v=L8Sezzl7_zU&amp;#34; .
_:Ba2f t:description &amp;#34;A Parody of Star Wars in which...&amp;#34; .

_:B166 a t:Movie .
_:B166 a s:CreativeWork .
_:B166 t:title &amp;#34;Star Wars: The Empire Strikes Back&amp;#34; .
_:B166 t:link &amp;#34;https://www.youtube.com/watch?v=Ooh3k8cJDBg&amp;#34; .
_:B166 t:description &amp;#34;Discover the conflict between good and ...&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;There is a lot more modeling that I could do with this data. I could take greater advantage of the schema.org ontology and maybe &lt;a href=&#34;https://www.dublincore.org/specifications/dublin-core/dcmi-terms&#34;&gt;Dublin Core&lt;/a&gt; as well so that my data interoperates with other data and applications better. The remainder of the data converted by AtomGraph has more properties and classes which I may or may not care about. If I do, I can add more to my schema; if I don&amp;rsquo;t, I&amp;rsquo;m done.&lt;/p&gt;
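&lt;p&gt;(The subclass propagation behind those &lt;code&gt;s:CreativeWork&lt;/code&gt; triples can also be sketched as a toy in plain Python, with triples as tuples; the &lt;code&gt;infer_superclasses&lt;/code&gt; function is mine for illustration, not part of any real RDFS engine:)&lt;/p&gt;

```python
# Toy illustration of the rdfs:subClassOf rule (a sketch, not a real
# engine): membership in a class propagates up to its superclasses.

def infer_superclasses(data, schema):
    """If (s, rdf:type, C) holds and the schema says
    (C, rdfs:subClassOf, D), infer (s, rdf:type, D); loop so that
    chains of subclasses propagate all the way up."""
    supers = {c: d for (c, pred, d) in schema if pred == "rdfs:subClassOf"}
    triples = set(data)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(triples):
            if p == "rdf:type" and o in supers:
                new = (s, "rdf:type", supers[o])
                if new not in triples:
                    triples.add(new)
                    changed = True
    return triples - set(data)  # just the newly inferred triples

schema = {("t:Video", "rdfs:subClassOf", "s:CreativeWork")}
data = {("_:Ba2f", "rdf:type", "t:Video")}
print(infer_superclasses(data, schema))
# {('_:Ba2f', 'rdf:type', 's:CreativeWork')}
```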
&lt;p&gt;The value of inferencing from schemas is really just a bonus to this exercise. The original key points I meant to prove here are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A little schema can provide a little value right away.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Incrementally building on it can provide more and more value.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Your schema doesn&amp;rsquo;t need to cover all of your input data.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In my &lt;a href=&#34;../knowledgegraphs/&#34;&gt;last blog entry&lt;/a&gt; I wrote about the excellent &amp;ldquo;Knowledge Graphs&amp;rdquo; paper (&lt;a href=&#34;https://arxiv.org/pdf/2003.02320.pdf&#34;&gt;pdf&lt;/a&gt;) written by some experts in many related topics as a product of a &lt;a href=&#34;https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=18371&#34;&gt;Schloss Dagstuhl conference&lt;/a&gt; in 2018. One bit of that paper that I quoted is very relevant to this blog entry as well:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Graphs allow maintainers to postpone the definition of a schema, allowing the data – and its scope – to evolve in a more flexible manner than typically possible in a relational setting, particularly for capturing incomplete knowledge.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This idea of letting the data and its schema evolve in a more flexible manner is especially great for data integration projects. My example here started off with a (somewhat) big mess of RDF; if you&amp;rsquo;re working with more than one RDF dataset—maybe with some converted from other formats such as JSON or relational databases—then the use of RDFS to identify little subsets of those datasets &lt;em&gt;and to specify relationships between components of those subsets&lt;/em&gt; can help your knowledge graph and the applications that use it become useful a lot sooner.&lt;/p&gt;
&lt;p&gt;It works at the other end of the scale as well. For proof of concept work, a small bit of data with a small schema can help to prove your concept. From there, incrementally adding to this data and schema can get those who saw the proved concept more and more interested as you build it up. This agile approach goes over well with software developers, who have good reasons to be suspicious of starting off with a large complex schema. (I actually consider a schema with no corresponding data to only be a schema proposal: how do we know that the schema is doing a good job? The academic world is full of these, although they are more often known as ontologies.)&lt;/p&gt;
&lt;p&gt;Note that I did all of this without any SPARQL. I would probably use some SPARQL as one more step to pull out the inferred triples instead of keeping all those original triples about the JSON file&amp;rsquo;s structure that AtomGraph generated, but that would just be a convenience. The main work of transforming the data subset that I had into the model that I wanted was still performed with the RDFS model.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve written another example of how incremental schema development can benefit an application in &lt;a href=&#34;../driving-hadoop-data-integratio/&#34;&gt;Driving Hadoop data integration with standards-based models instead of code&lt;/a&gt;. (Note the subtitle: &amp;ldquo;RDFS Models!&amp;rdquo;) The main point at the time was to show how this could all work on a Hadoop infrastructure. I took RDF generated from two different employee databases with two different structures, built a small model that integrated subsets of them, ran a script that performed the integration, expanded the model, and ran the same script to perform a larger integration with no changes to the script itself. Hadoop or no Hadoop, this provides another nice demonstration of how RDFS inferencing with gradually growing schemas can help you take advantage of existing datasets that were not originally designed for your application.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2021">2021</category>
      
      <category domain="https://www.bobdc.com//categories/rdfs">RDFS</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/knowledge-graphs">knowledge-graphs</category>
      
    </item>
    
    <item>
      <title>Knowledge Graphs!</title>
      <link>https://www.bobdc.com/blog/knowledgegraphs/</link>
      <pubDate>Sun, 20 Dec 2020 11:45:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/knowledgegraphs/</guid>
      
      
      <description><div>Semantic Linked Knowledge Web Data Graphs?</div><div>&lt;img id=&#34;idm45478314451696&#34; src=&#34;https://www.bobdc.com/img/main/knowledgegraphrdf.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Google Knowledge Graph with RDF triples&#34;/&gt;
&lt;p&gt;For several years I thought of &amp;ldquo;knowledge graphs&amp;rdquo; as the buzzphrase that had partially replaced &amp;ldquo;Linked Data&amp;rdquo;, which was the buzzphrase that had partially replaced &amp;ldquo;Semantic Web&amp;rdquo;. In a &lt;a href=&#34;http://www.bobdc.com/blog/selling-rdf-technology-to-big/&#34;&gt;2012 blog entry&lt;/a&gt; I explained how Hadoop and the new-at-the-time NoSQL databases had convinced me that even if a technology has a funny name, selling it based on the problems it solves makes more sense and ages better than selling a buzz phrase vision and then, if that goes well, describing the technology that enables that vision. (In &lt;a href=&#34;http://www.bobdc.com/blog/coming-soon-new-expanded-editi/&#34;&gt;another blog entry&lt;/a&gt; I described how the second edition of my book &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt; had &amp;ldquo;55% more pages! 23% fewer mentions of the semantic web!&amp;rdquo;) In other words, I&amp;rsquo;ve had time to get more suspicious of buzz phrase visions over the years.&lt;/p&gt;
&lt;h1 id=&#34;the-hot-new-thing&#34;&gt;The hot new thing&lt;/h1&gt;
&lt;p&gt;I also knew that RDF-related vendors have been talking about knowledge graph capabilities for several years, but these same vendors were also talking about Semantic Web and Linked Data capabilities before that, so I thought that they were just rebranding with the new buzz phrase as a marketing strategy. Recently, though, I realized how far the excitement about knowledge graphs had spread independently of that community. My initial surprise was Ben Lorica&amp;rsquo;s &lt;a href=&#34;https://thedataexchange.media/building-and-deploying-knowledge-graphs/&#34;&gt;interview with Mayank Kejriwal&lt;/a&gt; on his &amp;ldquo;Data Exchange&amp;rdquo; podcast about knowledge graphs. Then, when I &lt;a href=&#34;https://twitter.com/bobdc/status/1317507842291748865&#34;&gt;tweeted&lt;/a&gt; about it, &lt;a href=&#34;https://twitter.com/pacoid/status/1317531289495375872&#34;&gt;Paco Nathan&lt;/a&gt; recommended that I join the &lt;a href=&#34;https://www.knowledgegraph.tech/&#34;&gt;Knowledge Graph Conference&lt;/a&gt; Slack group. I&amp;rsquo;d been aware of Lorica and Nathan&amp;rsquo;s work for years but had given up on RDF-like technology making much of a blip on their radar.&lt;/p&gt;
&lt;p&gt;I joined the Slack group and found old friends and some new ones there. When I asked the group about a good definition of &amp;ldquo;knowledge graph&amp;rdquo; I was a bit inundated, especially with pointers to vendor explanations. Because it is a bandwagon buzzphrase for our time, many vendors are shouting &amp;ldquo;That currently hot thing? Yeah! That&amp;rsquo;s what we do!&amp;rdquo; (even the SEO sharks have &lt;a href=&#34;https://www.google.com/search?q=%22KNOWLEDGE+GRAPH%22+SEO&#34;&gt;smelled blood in this water&lt;/a&gt;) so I was less interested in the vendor perspectives on a good definition. In that Slack thread, Tomas Deely pointed me to the paper simply titled &amp;ldquo;Knowledge Graphs&amp;rdquo; (&lt;a href=&#34;https://arxiv.org/pdf/2003.02320.pdf&#34;&gt;pdf&lt;/a&gt;) written by &lt;a href=&#34;https://twitter.com/juansequeda&#34;&gt;@juansequeda&lt;/a&gt;, Antoine Zimmermann (&lt;a href=&#34;https://twitter.com/MonsieurAZ&#34;&gt;@MonsieurAZ&lt;/a&gt;), and 14 other people whose names were less familiar to me. As Juan &lt;a href=&#34;https://twitter.com/bobdc/status/1326546671304536077&#34;&gt;explained to me&lt;/a&gt;, the paper came out of a &lt;a href=&#34;https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=18371&#34;&gt;Schloss Dagstuhl conference&lt;/a&gt; in 2018.&lt;/p&gt;
&lt;h1 id=&#34;a-serious-informative-review-of-knowledge-graph-technology&#34;&gt;A serious, informative review of knowledge graph technology&lt;/h1&gt;
&lt;p&gt;It was nice to see the formal discipline of this paper—for example, its description of &amp;ldquo;the distinction between nodes/edges and entities/relations&amp;rdquo;—when compared with all the me-too vendor definitions of knowledge graphs floating around. After 5 or 6 years of my looking at knowledge graphs through RDF-colored glasses this paper gave me a broader perspective. Its introduction tells us:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The goal of this tutorial paper is to motivate and give a comprehensive introduction to knowledge graphs: to describe their foundational data models and how they can be queried; to discuss representations relating to schema, identity, and context; to discuss deductive and inductive ways to make knowledge explicit; to present a variety of techniques that can be used for the creation and enrichment of graph-structured data; to describe how the quality of knowledge graphs can be discerned and how they can be refined; to discuss standards and best practices by which knowledge graphs can be published; and to provide an overview of existing knowledge graphs found in practice.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That&amp;rsquo;s a lot of material to cover, so the paper is long. After I read the 78-page main body of the 132-page paper, Appendix A on page 108 (after 30 pages of 583 footnote references) in particular gave me the perspective that I was looking for on both the long-term and recent histories of knowledge graphs as well as the relative roles of RDF and non-RDF technologies along the way. I very strongly recommend that people interested in knowledge graphs start with this five-page appendix. (Note: I read and took notes about the paper a few weeks ago and just noticed that all of my numbers earlier in this paragraph were off. I then saw the &amp;ldquo;11 Dec 2020&amp;rdquo; date stamp on the first page and realized that it has been revised since I read it. I&amp;rsquo;ve tried to update my numbers and quotes here to reflect the latest version.)&lt;/p&gt;
&lt;p&gt;The short version of the history of knowledge graphs begins in 2012 (the year I blogged about reducing my usage of the term &amp;ldquo;semantic web&amp;rdquo;!) when an engineering SVP at Google published &lt;a href=&#34;https://blog.google/products/search/introducing-knowledge-graph-things-not/&#34;&gt;Introducing the Knowledge Graph: things, not strings&lt;/a&gt;. Sections 2 and 3 of the Schloss Dagstuhl Knowledge Graph paper&amp;rsquo;s Appendix A are titled &amp;ldquo;&amp;lsquo;Knowledge Graphs&amp;rsquo;: Pre 2012&amp;rdquo; and &amp;ldquo;&amp;lsquo;Knowledge Graphs&amp;rsquo;: 2012 Onwards&amp;rdquo; because this Google article was such a key event in knowledge graph history. Appendix A gives perspective on what makes which research—both before and after 2012—relevant to whatever is now considered knowledge graph technology. Section 3 also sorts out various classes of &amp;ldquo;knowledge graph&amp;rdquo; definitions, providing good context on the paper&amp;rsquo;s own definition given in both its introduction and in its summary at the end: &amp;ldquo;a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent relations between these entities&amp;rdquo;. We can all think of followup questions for that definition, but for 28 words it&amp;rsquo;s pretty good.&lt;/p&gt;
&lt;p&gt;Section 3 of the appendix includes an important, lower-level supplement to this definition, addressing the question of what makes a graph data structure a knowledge graph: &amp;ldquo;We refer to a knowledge graph as a data graph potentially enhanced with representations of schema, identity, context, ontologies and/or rules&amp;rdquo;.&lt;/p&gt;
&lt;h1 id=&#34;reading-through-rdf-colored-glasses&#34;&gt;Reading through RDF-colored glasses&lt;/h1&gt;
&lt;p&gt;All of these potential enhancements are covered in some detail in the main body of the paper. Although that coverage often goes for pages without mentioning any RDF-related technology, when I see (through my admittedly RDF-colored glasses) discussions of schema, identity, and ontologies around a tourism example of named node-edge-node triples, I of course see lots of RDF. Here&amp;rsquo;s an example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Graphs allow maintainers to postpone the definition of a schema, allowing the data – and its scope – to evolve in a more flexible manner than typically possible in a relational setting, particularly for capturing incomplete knowledge. Unlike (other) NoSQL models, specialised graph query languages support not only standard relational operators (joins, unions, projections, etc.), but also navigational operators for recursively finding entities connected through arbitrary-length paths.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There&amp;rsquo;s no mention of RDF or SPARQL there, but it certainly lists many of their key capabilities. (I&amp;rsquo;ll be discussing the wonderful possibilities of partial RDF schemas, as a compromise between no schemas and fully detailed ones, in an upcoming blog entry.) It goes on:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Standard knowledge representation formalisms – such as ontologies and rules – can be employed to define and reason about the semantics of the terms used to label and describe the nodes and edges in the graph.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It was interesting how that and many other parts of the paper discussed capabilities that OWL has provided since RDF&amp;rsquo;s early years. OWL has a smaller profile in the RDF world than it used to (people who once thought that OWL might help to define data structures that could help maintain data quality have turned to &lt;a href=&#34;http://www.bobdc.com/blog/the-w3c-standard-constraint-la/&#34;&gt;SHACL&lt;/a&gt;, because that&amp;rsquo;s not really what OWL was for, and these users&amp;rsquo; confusion over all the different OWL profiles didn’t help) so it was interesting to see how much the Schloss Dagstuhl Knowledge Graph paper discussed ontologies, Description Logics (the DL in &lt;a href=&#34;http://www.bobdc.com/blog/the-dl-in-owl-dl/&#34;&gt;OWL-DL&lt;/a&gt;), T-Boxes, A-Boxes, Individuals, and especially entailment and related topics where OWL can contribute plenty.&lt;/p&gt;
&lt;h1 id=&#34;graph-and-non-graph-technologies&#34;&gt;Graph and non-graph technologies&lt;/h1&gt;
&lt;p&gt;Something else that made me think of knowledge graphs as a vague buzzphrase was the way people used the term to reference technologies that are quite separate from the use of graph data structures: relational databases, text indexing, named entity recognition and other areas that fall under the currently overlapping umbrellas of machine learning and artificial intelligence.&lt;/p&gt;
&lt;p&gt;Some of those do have direct applications to graphs, such as the use of &lt;a href=&#34;http://www.bobdc.com/blog/semantic-web-semantics-vs-vect/&#34;&gt;embeddings&lt;/a&gt; with triples, and the Schloss Dagstuhl paper covers these technologies. It has good reasons for this, describing them as techniques by which knowledge graphs can be &amp;ldquo;enriched from diverse sources of legacy data that may range from plain text to structured formats (and anything in between).&amp;rdquo; The paper&amp;rsquo;s conclusion sums up the relationship among these technologies nicely:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Research on knowledge graphs can become a confluence of techniques arising
from different areas with the common objective of maximising the knowledge – and thus value
– that can be distilled from diverse sources at large scale using a graph-based data abstraction.
Pursuing this objective will benefit from expertise on graph databases, knowledge representation,
logic, machine learning, graph algorithms and theory, ontology engineering, data quality, natural
language processing, information extraction, privacy and security, and more besides.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(A side note on embeddings, RDF and knowledge graphs: The RDF2VEC algorithm used to do embeddings with RDF has been around since 2016, and you can find many discussions about it since then that refer to &lt;a href=&#34;https://www.google.com/search?q=rdf2vec%22rdf+graph+embeddings%22&#34;&gt;RDF graph embeddings&lt;/a&gt;. More recent discussions of it, though, have titles like &lt;a href=&#34;https://towardsdatascience.com/how-to-create-representations-of-entities-in-a-knowledge-graph-using-pyrdf2vec-82e44dad1a0&#34;&gt;How to Create Representations of Entities in a Knowledge Graph using pyRDF2Vec&lt;/a&gt;. It&amp;rsquo;s another example of an RDF thing that&amp;rsquo;s been around for a while now being described as a knowledge graph thing because of the current cachet of the term.)&lt;/p&gt;
&lt;h1 id=&#34;knowledge-graphs-and-rdf&#34;&gt;Knowledge graphs and RDF&lt;/h1&gt;
&lt;p&gt;Section 10.2 of the Schloss Dagstuhl Knowledge Graph paper, &amp;ldquo;Enterprise Knowledge Graphs&amp;rdquo;, includes footnoted mentions of over a dozen brand-name companies who have discussed their knowledge graph initiatives. I did a quick skim of all the referenced works to check for references to RDF technology and found that &lt;a href=&#34;https://www.thomsonreuters.com/en/press-releases/2017/october/thomson-reuters-launches-first-of-its-kind-knowledge-graph-feed.html&#34;&gt;Thomson Reuters&lt;/a&gt; has got RDF in that mix, which didn&amp;rsquo;t surprise me. The big surprise for me was Pinterest not only using RDF but &lt;a href=&#34;https://arxiv.org/pdf/1907.02106.pdf&#34;&gt;using OWL&lt;/a&gt;. Put a pin in &lt;a href=&#34;https://www.pinterest.com/shannonalford/owls/&#34;&gt;that&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;Discussions of the work at the other companies didn&amp;rsquo;t mention RDF, but most were fairly high-level discussions, so I&amp;rsquo;m guessing that some of them use RDF and some don&amp;rsquo;t. (While the &lt;a href=&#34;https://www.astrazeneca.com/what-science-can-do/labtalk-blog/uncategorized/how-data-and-ai-are-helping-unlock-the-secrets-of-disease.html&#34;&gt;referenced AstraZeneca article&lt;/a&gt; didn&amp;rsquo;t mention RDF, I know that as a customer of &lt;a href=&#34;https://allegrograph.com/customers/astrazeneca/&#34;&gt;Allegrograph&lt;/a&gt;, &lt;a href=&#34;https://www.ontotext.com/knowledgehub/case-studies/astrazeneca-early-hypotheses-testing-linked-data/&#34;&gt;Ontotext&lt;/a&gt; and TopQuadrant—for whom I did training at AstraZeneca—they have been RDF fans for a while, especially because of the data integration possibilities.)&lt;/p&gt;
&lt;p&gt;Knowledge graphs are not as synonymous with RDF as those of us with the aforementioned glasses might like to think. In fact, the knowledge graph world currently looks bigger than that, and it&amp;rsquo;s easy enough to picture companies like Google, Facebook, and eBay defining their own data structures and schema languages to build graphs with no reference to the relevant W3C standards. I don&amp;rsquo;t think this is necessarily a bad thing; it looks like a pretty big tent.&lt;/p&gt;
&lt;h1 id=&#34;the-vision-thing&#34;&gt;The vision thing&lt;/h1&gt;
&lt;p&gt;I described how I became suspicious of selling RDF technology by building a buzz phrase vision around it and starting the marketing pitch with that. The pleasant surprise in my study of the knowledge graph world is that it was built around currently important ideas and needs, not any specific technology, and RDF-related technology turns out to provide excellent, standardized, widely implemented open source and commercial support for the implementation of knowledge graphs. In other words, this newer vision came along fairly independently of RDF and happens to be a great fit for it, so I&amp;rsquo;ll just try to be grateful. I&amp;rsquo;ll still be a bit self-conscious when I insert the phrase &amp;ldquo;knowledge graph&amp;rdquo; into a technology discussion that I would have written anyway even if knowledge graphs weren&amp;rsquo;t so hot—as if I were jumping on the bandwagon—but it&amp;rsquo;s a pretty nice bandwagon, and RDF people have a lot that we can contribute to it.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2020">2020</category>
      
      <category domain="https://www.bobdc.com//categories/knowledge-graphs">knowledge-graphs</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">rdf</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">rdf/owl</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic-web</category>
      
      <category domain="https://www.bobdc.com//categories/linked-data">linked-data</category>
      
    </item>
    
    <item>
      <title>Using SPARQL to combine Wikidata and OSM triples</title>
      <link>https://www.bobdc.com/blog/osmpluswikidata/</link>
      <pubDate>Sun, 22 Nov 2020 12:15:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/osmpluswikidata/</guid>
      
      
      <description><div>Linking that data.</div><div>&lt;p&gt;&lt;a href=&#39;https://www.amnh.org/&#39;&gt; &lt;img width=&#39;200&#39; class=&#39;centered&#39; src=&#39;http://commons.wikimedia.org/wiki/Special:FilePath/American%20Museum%20of%20Natural%20History%20New%20York%20City.jpg&#39;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Last month in &lt;a href=&#34;../geosparqlgraphdb/&#34;&gt;GeoSPARQL queries on OSM Data in GraphDB&lt;/a&gt; I showed how to use SPARQL to retrieve triples about Manhattan museums from OpenStreetMap&amp;rsquo;s SPARQL endpoint. Then, after loading the triples into Ontotext&amp;rsquo;s free GraphDB triplestore, I showed how GraphDB&amp;rsquo;s support for the GeoSPARQL standard let me query for all the museums within a mile of the Museum of Modern Art. The OSM data doesn&amp;rsquo;t include pictures of the museums, but I mentioned that it does include the museums&amp;rsquo; Wikidata URIs, so today we&amp;rsquo;ll see how to use those URIs to retrieve the images from Wikidata and connect them to the data retrieved from OSM. The result of this process includes the images you see here, each linking to the pictured museum&amp;rsquo;s website.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#39;http://www.folkartmuseum.org&#39;&gt; &lt;img width=&#39;200&#39;  class=&#39;centered&#39; src=&#39;http://commons.wikimedia.org/wiki/Special:FilePath/The%20American%20Folk%20Art%20Museum.JPG&#39;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Before I get to that I wanted to show a nice query that Ontotext founder Atanas Kiryakov showed me after I published that last blog entry. I had used curl to send a SPARQL CONSTRUCT query to OSM&amp;rsquo;s endpoint and save the triples in a local Turtle file. Once I had that file I loaded it into GraphDB and ran the query about museums near MoMA there. Atanas&amp;rsquo;s query uses the SPARQL SERVICE keyword to do the retrieval from within GraphDB so that all the steps that I did can happen with one query:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX geof: &amp;lt;http://www.opengis.net/def/function/geosparql/&amp;gt;
PREFIX uom:  &amp;lt;http://www.opengis.net/def/uom/OGC/1.0/&amp;gt;
PREFIX osmt: &amp;lt;https://wiki.openstreetmap.org/wiki/Key:&amp;gt;
PREFIX osmm: &amp;lt;https://www.openstreetmap.org/meta/&amp;gt;

SELECT ?museum ?museumName ?metersFromMoma where {
    SERVICE &amp;lt;https://sophox.org/sparql&amp;gt; {
        ?moma   osmt:official_name &amp;#34;The Museum of Modern Art&amp;#34; ;
                osmm:loc ?momaLoc .
        ?museum osmt:tourism &amp;#34;museum&amp;#34; ;
                osmt:name ?museumName ;
                osmm:loc ?museumLoc .     }   
    BIND(round(geof:distance(?museumLoc,?momaLoc, uom:metre)) AS ?metersFromMoma)
    FILTER(?metersFromMoma &amp;lt; 1610)  # Only those less than a mile away.
    FILTER(?museum != ?moma)        # Don&amp;#39;t bother showing MoMA itself.   
} ORDER BY ?metersFromMoma
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;His query uses no features that are specific to GraphDB, so this query would work with any SPARQL engine that supports the GeoSPARQL standard—which, in this case, means supporting that &lt;code&gt;geof:distance()&lt;/code&gt; function call. GraphDB was the first triplestore I found that had this support.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#39;https://momath.org/&#39;&gt; &lt;img width=&#39;200&#39; class=&#39;centered&#39; src=&#39;http://commons.wikimedia.org/wiki/Special:FilePath/National%20Museum%20of%20Mathematics%2011%20East%2026th%20Street%20entrance.jpg&#39;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To get pictures of the retrieved museums, I created a variation on Atanas&amp;rsquo;s query that retrieved triples about the Manhattan museums and inserted them into the active local repository:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX osmt: &amp;lt;https://wiki.openstreetmap.org/wiki/Key:&amp;gt;

INSERT { ?museum ?p ?o } WHERE
{
    SERVICE &amp;lt;https://sophox.org/sparql&amp;gt; {
    ?museum osmt:addr:city &amp;#34;New York&amp;#34;;
            osmt:tourism &amp;#34;museum&amp;#34;;
            ?p ?o .
    }   
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The following query then showed me that each museum&amp;rsquo;s &lt;code&gt;osmt:wikidata&lt;/code&gt; value in that locally stored data was a Wikidata entity URI such as &lt;code&gt;http://www.wikidata.org/entity/Q636942&lt;/code&gt; for the International Center of Photography:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX osmt: &amp;lt;https://wiki.openstreetmap.org/wiki/Key:&amp;gt;
SELECT * WHERE {
 ?museum osmt:addr:city &amp;#34;New York&amp;#34;;
         osmt:tourism &amp;#34;museum&amp;#34;;
         osmt:wikidata ?wikidataID . 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#39;https://www.icp.org/&#39;&gt; &lt;img width=&#39;200&#39; class=&#39;centered&#39; src=&#39;http://commons.wikimedia.org/wiki/Special:FilePath/Intnl%20Cenf%20Photog%2043%20jeh.JPG&#39;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you look at the &lt;a href=&#34;https://www.wikidata.org/wiki/Q636942&#34;&gt;Wikidata page for the ICP&lt;/a&gt; you&amp;rsquo;ll see that it includes a picture of it, and if you click on the &amp;ldquo;image&amp;rdquo; property name there you&amp;rsquo;ll see that this is property &lt;a href=&#34;https://www.wikidata.org/wiki/Property:P18&#34;&gt;P18&lt;/a&gt; in Wikidata. So, my next query took each of the Wikidata ID values of the museums and used the SERVICE keyword to send them off to Wikidata where it used them to retrieve image URLs, which it stored locally:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX osmt: &amp;lt;https://wiki.openstreetmap.org/wiki/Key:&amp;gt;
PREFIX wdt: &amp;lt;http://www.wikidata.org/prop/direct/&amp;gt;

INSERT { ?museum wdt:P18 ?imageURL} WHERE {
  ?museum osmt:addr:city &amp;#34;New York&amp;#34;;
          osmt:tourism &amp;#34;museum&amp;#34;;
          osmt:wikidata ?wikidataID . 
  SERVICE &amp;lt;https://query.wikidata.org/sparql&amp;gt; {
    ?wikidataID wdt:P18 ?imageURL. 
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(As always, I first ran the query above with the CONSTRUCT keyword instead of INSERT just to make sure that I was properly asking for what I was trying to get.)&lt;/p&gt;
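&lt;p&gt;That dry run is the INSERT query above with just the one keyword changed, so its results can be inspected without touching the local repository:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX osmt: &amp;lt;https://wiki.openstreetmap.org/wiki/Key:&amp;gt;
PREFIX wdt: &amp;lt;http://www.wikidata.org/prop/direct/&amp;gt;

CONSTRUCT { ?museum wdt:P18 ?imageURL} WHERE {
  ?museum osmt:addr:city &amp;#34;New York&amp;#34;;
          osmt:tourism &amp;#34;museum&amp;#34;;
          osmt:wikidata ?wikidataID . 
  SERVICE &amp;lt;https://query.wikidata.org/sparql&amp;gt; {
    ?wikidataID wdt:P18 ?imageURL. 
  }
}
&lt;/code&gt;&lt;/pre&gt;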
&lt;p&gt;&lt;a href=&#39;https://www.paleycenter.org/&#39;&gt; &lt;img width=&#39;200&#39; class=&#39;centered&#39; src=&#39;http://commons.wikimedia.org/wiki/Special:FilePath/Museum%20of%20Television%20and%20Radio%202006.jpg&#39;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The OSM data that I pulled included website URLs for most of the museums, so I queried the data I had aggregated from the two endpoints to list the websites and image URLs for museums within a mile of MoMA (actually, within 2 miles to give me a nicer choice of pictures to include here):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX osmt: &amp;lt;https://wiki.openstreetmap.org/wiki/Key:&amp;gt;
PREFIX geof: &amp;lt;http://www.opengis.net/def/function/geosparql/&amp;gt;
PREFIX uom:  &amp;lt;http://www.opengis.net/def/uom/OGC/1.0/&amp;gt;
PREFIX wdt: &amp;lt;http://www.wikidata.org/prop/direct/&amp;gt;
PREFIX osmm: &amp;lt;https://www.openstreetmap.org/meta/&amp;gt;

SELECT ?website ?imageURL WHERE {
   ?moma   osmt:official_name &amp;#34;The Museum of Modern Art&amp;#34; ;
           osmm:loc ?momaLoc .
   ?museum wdt:P18 ?imageURL ;
           osmt:website ?website ;
           osmm:loc ?museumLoc . 
   BIND(round(geof:distance(?museumLoc,?momaLoc, uom:metre)) AS ?metersFromMoma)
   FILTER(?metersFromMoma &amp;lt; 3220)  # Only those less than 2 miles away.
   FILTER(?museum != ?moma)   # Don&amp;#39;t bother showing MoMA itself.   
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#39;https://www.guggenheim.org/new-york&#39;&gt; &lt;img class=&#39;centered&#39; width=&#39;200&#39; src=&#39;http://commons.wikimedia.org/wiki/Special:FilePath/NYC%20-%20Guggenheim%20Museum.jpg&#39;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When displaying query results, GraphDB adds a handy &amp;ldquo;Download as&amp;rdquo; button, so I saved a tab-separated value version of that query&amp;rsquo;s results and used the ancient Linux utility &lt;a href=&#34;https://en.wikipedia.org/wiki/Sed&#34;&gt;sed&lt;/a&gt; to wrap the values in a bit of HTML:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;sed -E &amp;#34;s/(.+)\t&amp;lt;(.+)&amp;gt;/\&amp;lt;a href=&amp;#39;\1&amp;#39;&amp;gt; \
&amp;lt;img width=&amp;#39;200&amp;#39; src=&amp;#39;\2&amp;#39;\/&amp;gt;&amp;lt;\/a&amp;gt;/&amp;#34; query-result.tsv &amp;gt; temp.html
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I could then copy the bits of HTML from the resulting file to the text file I&amp;rsquo;m typing now so that the images you see can be links to the home pages.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re reading this more than a few months after November of 2020 and the URLs of any of those images have changed, they&amp;rsquo;ll show up as broken links. With any application that uses data from remote sources, we have to consider various factors when making the decision whether to dynamically grab certain data when necessary or grab it once and store it locally for future use. Isn&amp;rsquo;t it nice how SPARQL and the widely-implemented open source and commercial RDF tools out there give us so many options when we make these decisions?&lt;/p&gt;
&lt;p&gt;&lt;a href=&#39;https://www.nyhistory.org/&#39;&gt; &lt;img width=&#39;200&#39; class=&#39;centered&#39; src=&#39;http://commons.wikimedia.org/wiki/Special:FilePath/NJ%20Loyalists%20in%20N-Y%20HS%20hall%20jeh.jpg&#39;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Have you ever pulled data from two different endpoints to answer a question that neither endpoint could answer by itself? Let me know at &lt;a href=&#34;https://twitter.com/bobdc&#34;&gt;@bobdc&lt;/a&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2020">2020</category>
      
      <category domain="https://www.bobdc.com//categories/openstreetmap">OpenStreetMap</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">Wikidata</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>GeoSPARQL queries on OSM Data in GraphDB</title>
      <link>https://www.bobdc.com/blog/geosparqlgraphdb/</link>
      <pubDate>Sun, 25 Oct 2020 11:40:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/geosparqlgraphdb/</guid>
      
      
      <description><div>Or, Querying geospatial data with SPARQL Part 2</div><div>&lt;img id=&#34;idm45478314451696&#34; src=&#34;https://www.bobdc.com/img/main/OSMSPARQL.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;OSM and SPARQL logo&#34;/&gt;
&lt;p&gt;Over a year ago, in &lt;a href=&#34;../geosparql1/&#34;&gt;Querying geospatial data with SPARQL: Part 1&lt;/a&gt;, I described my dream of pulling geospatial data down from OpenStreetMap, loading it into a local triplestore, and then querying it with queries that conformed to the GeoSPARQL standard. At the time, I tried several triplestores and data sources and never quite got there. When I tried it recently with Ontotext&amp;rsquo;s free version of &lt;a href=&#34;https://www.ontotext.com/products/graphdb/&#34;&gt;GraphDB&lt;/a&gt;, it all turned out to be quite easy.&lt;/p&gt;
&lt;p&gt;For some background, read that blog entry up through the paragraph beginning &amp;ldquo;The geosparql.org website has some preloaded data&amp;hellip;&amp;rdquo; The rest of the entry describes my only somewhat successful attempts to do geospatial queries with Blazegraph and Parliament and how I looked forward to Apache Jena&amp;rsquo;s growing GeoSPARQL support. (A few years earlier I wrote a bit about GeoSPARQL in &lt;a href=&#34;../visualizing-dbpedia-geographic/&#34;&gt;Visualizing DBpedia geographic data with some help from SPARQL&lt;/a&gt;.)&lt;/p&gt;
&lt;h1 id=&#34;graphdb&#34;&gt;GraphDB&lt;/h1&gt;
&lt;p&gt;The GraphDB page that I link to above includes a chart that shows that the free version does plenty, and most importantly, doesn&amp;rsquo;t expire or limit the amount of data you load. Once I downloaded it, installed it, started it up, and had it running at http://localhost:7200, its web-based interface had a tutorial to  &amp;ldquo;(1) Create a repository (2) Load a sample dataset (3) Run a SPARQL query&amp;rdquo; so I went through those steps. When you use GraphDB&amp;rsquo;s form to create a new repository, you&amp;rsquo;ll see that the &amp;ldquo;Rulesets&amp;rdquo; field has a default value of &amp;ldquo;RDFS-Plus (Optimized)&amp;rdquo; and offers 10 other choices, including several OWL choices and an &amp;ldquo;Upload custom ruleset&amp;rdquo; option. The form also includes a &amp;ldquo;Supports SHACL validation&amp;rdquo; checkbox and other options, so these were all great to see.&lt;/p&gt;
&lt;p&gt;Before trying GraphDB with geospatial data I wanted to test out its support for inferencing and for RDF* and SPARQL*. I had a nice short example ready to go at my blog entry &lt;a href=&#34;../rdf-and-sparql&#34;&gt;RDF* and SPARQL*: Reification can be pretty cool&lt;/a&gt; after the paragraph beginning &amp;ldquo;Blazegraph lets you do inferencing, so I couldn’t resist mixing that with RDF* and SPARQL*.&amp;rdquo; Treating two triples as resources themselves (thanks, RDF*!), the sample data in that example makes one triple an instance of &lt;code&gt;d:Class2&lt;/code&gt; and the other an instance of &lt;code&gt;d:Class3&lt;/code&gt;, and then it makes both of those classes subclasses of &lt;code&gt;d:Class1&lt;/code&gt; without creating any instances of &lt;code&gt;d:Class1&lt;/code&gt;. The query that follows this sample data doesn&amp;rsquo;t just ask for the instances of &lt;code&gt;d:Class1&lt;/code&gt;, which GraphDB&amp;rsquo;s RDFS-Plus support will find in its subclasses; it asks for the subject, predicate, and object of each of these instances. (Thanks, SPARQL*!) It all worked fine in GraphDB.&lt;/p&gt;
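&lt;p&gt;In outline, that test looked something like the following sketch; the resource names and the &lt;code&gt;d:&lt;/code&gt; namespace here are hypothetical stand-ins, not the exact ones from that earlier entry. First the data:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix d:    &amp;lt;http://learningsparql.com/ns/demo#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .

# Each embedded triple is itself typed as a class instance (thanks, RDF*!)
&amp;lt;&amp;lt; d:x d:p d:y &amp;gt;&amp;gt; a d:Class2 .
&amp;lt;&amp;lt; d:y d:p d:z &amp;gt;&amp;gt; a d:Class3 .

d:Class2 rdfs:subClassOf d:Class1 .
d:Class3 rdfs:subClassOf d:Class1 .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and then a query that relies on RDFS inferencing to recognize the two embedded triples as instances of &lt;code&gt;d:Class1&lt;/code&gt; and return each one&amp;rsquo;s subject, predicate, and object:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX d: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt;

SELECT ?s ?p ?o WHERE {
  &amp;lt;&amp;lt; ?s ?p ?o &amp;gt;&amp;gt; a d:Class1 .
}
&lt;/code&gt;&lt;/pre&gt;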
&lt;h1 id=&#34;using-geosparql-with-graphdb&#34;&gt;Using GeoSPARQL with GraphDB&lt;/h1&gt;
&lt;p&gt;In my &amp;ldquo;Part 1&amp;rdquo; blog entry I described how a database manager&amp;rsquo;s ability to deal properly with geospatial data usually requires an add-on. GraphDB does use what they call a plugin for this, but there&amp;rsquo;s no need to download and plug it in yourself; it&amp;rsquo;s already in GraphDB, and you turn it on by adding a triple to the repository that sets &lt;code&gt;geoSparql:enabled&lt;/code&gt; to true for some resource, as described in &lt;a href=&#34;https://graphdb.ontotext.com/documentation/9.4/free/geosparql-support.html#usage&#34;&gt;their GeoSPARQL documentation&lt;/a&gt;. I got all of that page&amp;rsquo;s &lt;a href=&#34;https://graphdb.ontotext.com/documentation/9.4/free/geosparql-support.html#geosparql-examples&#34;&gt;GeoSPARQL examples&lt;/a&gt; to work easily enough after loading the data that it pointed to.&lt;/p&gt;
&lt;p&gt;In Part 1 I also wrote &amp;ldquo;Because I just love converting triples from one namespace to another so that I can use new tools and standards with them, I hoped to get some OSM triples and convert them to the right namespaces to enable geospatial queries on them using a local triplestore.&amp;rdquo; Having gotten the GeoSPARQL examples mentioned above to work in GraphDB I had a model to use when converting the OSM triples, and then I got a nice surprise: I didn&amp;rsquo;t have to convert them!&lt;/p&gt;
&lt;p&gt;I pulled all the triples about museums in &amp;ldquo;New York&amp;rdquo; from the OpenStreetMap SPARQL endpoint with the following simple query:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX osmt: &amp;lt;https://wiki.openstreetmap.org/wiki/Key:&amp;gt;

CONSTRUCT { ?museum ?p ?o }
WHERE {
  ?museum osmt:addr:city &amp;#34;New York&amp;#34;;
          osmt:tourism &amp;#34;museum&amp;#34;;
          ?p ?o .
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(Despite requesting &amp;ldquo;New York&amp;rdquo; museums, the results all seemed to be in Manhattan. An &lt;code&gt;osmt:addr:city&lt;/code&gt; value of &amp;ldquo;Brooklyn&amp;rdquo; got other museums.)&lt;/p&gt;
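&lt;p&gt;That Brooklyn retrieval is just the query above with one literal changed:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX osmt: &amp;lt;https://wiki.openstreetmap.org/wiki/Key:&amp;gt;

CONSTRUCT { ?museum ?p ?o }
WHERE {
  ?museum osmt:addr:city &amp;#34;Brooklyn&amp;#34;;
          osmt:tourism &amp;#34;museum&amp;#34;;
          ?p ?o .
}
&lt;/code&gt;&lt;/pre&gt;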
&lt;p&gt;After storing that query in the file &lt;code&gt;manhattanMuseums.rq&lt;/code&gt;, the following curl command (split at the \ for display here) retrieved the triples and stored them in the file &lt;code&gt;manhattanMuseums.ttl&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;curl --data-urlencode &amp;#34;query@manhattanMuseums.rq&amp;#34; \
 https://sophox.org/sparql -H &amp;#34;Accept: text/turtle&amp;#34; &amp;gt; manhattanMuseums.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(On October 25th when I first published this I thought that their SPARQL endpoint was down, but it turned out that my re-testing of the curl call was failing because of my own dumb typo.)&lt;/p&gt;
&lt;p&gt;Here are two triples that it retrieved about one museum that I highly recommend:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;osmnode:368061660 osmm:loc &amp;#34;Point(-73.9900266 40.7187837)&amp;#34;^^geo:wktLiteral ;
	&amp;lt;https://wiki.openstreetmap.org/wiki/Key:name&amp;gt; &amp;#34;Lower East Side Tenement Museum&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;h1 id=&#34;why-no-need-to-convert-the-data&#34;&gt;Why no need to convert the data?&lt;/h1&gt;
&lt;p&gt;Here is the cool part that meant that I didn&amp;rsquo;t have to convert any triples before loading &lt;code&gt;manhattanMuseums.ttl&lt;/code&gt; into GraphDB and issuing standard GeoSPARQL queries on it: while SPARQL has a perfectly decent selection of &lt;a href=&#34;https://www.w3.org/TR/sparql11-query/#operandDataTypes&#34;&gt;data types&lt;/a&gt;, you can define your own, and section 8.5.1 of the &lt;a href=&#34;https://www.ogc.org/standards/geosparql&#34;&gt;GeoSPARQL specification&lt;/a&gt; defines the &lt;a href=&#34;http://www.opengis.net/ont/geosparql#wktLiteral&#34;&gt;http://www.opengis.net/ont/geosparql#wktLiteral&lt;/a&gt; datatype for specifying geospatial coordinates. As you can see in the Tenement Museum example above, the OSM triples use that type, so I was all set.&lt;/p&gt;
&lt;p&gt;In Part 1 I also wrote &amp;ldquo;A proper geospatial query for something like all the museums within a mile of the &lt;a href=&#34;https://www.moma.org/&#34;&gt;Museum of Modern Art&lt;/a&gt; is more complicated because of the effect of the earth’s curvature.&amp;rdquo; It&amp;rsquo;s not so complicated with proper GeoSPARQL support because I can call the &lt;code&gt;geof:distance&lt;/code&gt; function, which is not supported by OpenStreetMap&amp;rsquo;s SPARQL endpoint but is supported by GraphDB as part of its GeoSPARQL support. I loaded &lt;code&gt;manhattanMuseums.ttl&lt;/code&gt; into GraphDB and ran the following query:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;
PREFIX geof: &amp;lt;http://www.opengis.net/def/function/geosparql/&amp;gt;
PREFIX uom:  &amp;lt;http://www.opengis.net/def/uom/OGC/1.0/&amp;gt;
PREFIX osmt: &amp;lt;https://wiki.openstreetmap.org/wiki/Key:&amp;gt;
PREFIX osmm: &amp;lt;https://www.openstreetmap.org/meta/&amp;gt; 

SELECT ?museumName ?metersFromMoma
WHERE  {
   ?moma   osmt:official_name &amp;#34;The Museum of Modern Art&amp;#34; ;
           osmm:loc ?momaLoc .
   ?museum osmt:tourism &amp;#34;museum&amp;#34; ;
           osmt:name ?museumName ;
           osmm:loc ?museumLoc . 
    # Find the distance from each museum to MoMA and save it
    BIND(round(geof:distance(?museumLoc,?momaLoc, uom:metre)) 
        AS ?metersFromMoma)
    FILTER(?metersFromMoma &amp;lt; 1610)  # Only those less than a mile away.
    FILTER(?museum != ?moma)        # Don&amp;#39;t bother showing MoMA itself.
}
ORDER BY ?metersFromMoma
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(I tried pulling address data as well, but not all museums had that, especially the ones that were close to MoMA.) With that query pasted into the file &lt;code&gt;museumsNearMoma.rq&lt;/code&gt;, the following pulled a TSV version of the results from my locally running copy of GraphDB&amp;hellip;&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;curl --header &amp;#34;Accept: text/tab-separated-values&amp;#34; --data-urlencode \
  &amp;#34;query@museumsNearMoma.rq&amp;#34; http://localhost:7200/repositories/OSMManhattanData
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;so that I could paste them here:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;?museumName	?metersFromMoma
Paley Center for Media	88
Museum of Arts and Design	766
International Center of Photography	827
National Geographic Encounter - Ocean Odyssey	925
American Folk Art Museum	1350
Frick Collection	1399
Asia Society	1450
Mount Vernon Hotel Museum	1503
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;GeoSPARQL has a lot more for GIS geeks than the &lt;code&gt;geof:distance&lt;/code&gt; function, so check out the spec for that. Also, after I wrote the first draft of this blog entry, I found out &lt;a href=&#34;https://twitter.com/opengeospatial/status/1313836646614343688&#34;&gt;on Twitter&lt;/a&gt; about a new document from the Open Geospatial Consortium, the standards group responsible for GeoSPARQL: &lt;a href=&#34;http://docs.ogc.org/wp/19-078r1/19-078r1.html&#34;&gt;OGC Benefits of Representing Spatial Data Using Semantic and Graph Technologies&lt;/a&gt;. It lists nice use cases that show the benefits of semantic technologies, describes the use cases addressed by GeoSPARQL, and proposes some extensions to that specification.&lt;/p&gt;
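&lt;p&gt;As one small, untested sketch of those other functions: the spec also defines boolean simple-features predicates such as &lt;code&gt;geof:sfWithin&lt;/code&gt;, so a query along these lines (the bounding polygon coordinates here are invented for illustration) would list museums inside a rectangle rather than within a radius:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX geof: &amp;lt;http://www.opengis.net/def/function/geosparql/&amp;gt;
PREFIX osmt: &amp;lt;https://wiki.openstreetmap.org/wiki/Key:&amp;gt;
PREFIX osmm: &amp;lt;https://www.openstreetmap.org/meta/&amp;gt;
PREFIX geo:  &amp;lt;http://www.opengis.net/ont/geosparql#&amp;gt;

SELECT ?museumName WHERE {
  ?museum osmt:tourism &amp;#34;museum&amp;#34; ;
          osmt:name ?museumName ;
          osmm:loc ?museumLoc .
  # A made-up rectangle over part of Midtown (longitude latitude order).
  FILTER(geof:sfWithin(?museumLoc,
      &amp;#34;POLYGON((-74.00 40.74, -73.97 40.74, -73.97 40.77, -74.00 40.77, -74.00 40.74))&amp;#34;^^geo:wktLiteral))
}
&lt;/code&gt;&lt;/pre&gt;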
&lt;p&gt;There is also an excellent Linked Data/Knowledge Graph angle to my example above, especially for &lt;a href=&#34;https://en.wikipedia.org/wiki/GLAM_(industry_sector)&#34;&gt;GLAM&lt;/a&gt; researchers: because the OSM data includes triples like this additional one about the Tenement Museum,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;osmnode:368061660 &amp;lt;https://wiki.openstreetmap.org/wiki/Key:wikidata&amp;gt; wd:Q901533 .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;you can connect up the geospatial data in OSM with triples from Wikidata to aggregate even more cool data about the entities in OSM. And, you can do it all in a local, free triplestore!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2020">2020</category>
      
      <category domain="https://www.bobdc.com//categories/openstreetmap">OpenStreetMap</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/geosparql">GeoSPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/gis">gis</category>
      
      <category domain="https://www.bobdc.com//categories/linked-data">linked-data</category>
      
    </item>
    
    <item>
      <title>Using SPARQL to do quick and dirty joins of CSV data</title>
      <link>https://www.bobdc.com/blog/sparqlcsvjoin/</link>
      <pubDate>Sun, 27 Sep 2020 12:03:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparqlcsvjoin/</guid>
      
      
      <description><div>Or data with other delimiters.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/join.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;SPARQL and CSV logos&#34;/&gt;
&lt;p&gt;I recently needed to join two datasets at work, cross-referencing one property in a spreadsheet with another in a JSON file. I used a combination of &lt;a href=&#34;https://stedolan.github.io/jq/&#34;&gt;&lt;code&gt;jq&lt;/code&gt;&lt;/a&gt;, &lt;code&gt;perl&lt;/code&gt;, &lt;code&gt;sort&lt;/code&gt;, &lt;a href=&#34;https://www.man7.org/linux/man-pages/man1/uniq.1.html&#34;&gt;&lt;code&gt;uniq&lt;/code&gt;&lt;/a&gt;, and&amp;hellip; I won&amp;rsquo;t go into details.&lt;/p&gt;
&lt;p&gt;I wondered later if it would have been easier if I had used tarql (which I&amp;rsquo;ve blogged about &lt;a href=&#34;../tarql/&#34;&gt;before&lt;/a&gt;) to convert it all to RDF and then done the join with a SPARQL query. It turned out to be quite easy. A single SPARQL conversion query to run with tarql, with one line changed per dataset that I applied it to, was all I needed to create the RDF that let me do all the joins I wanted with additional simple queries. This will be even easier in the future as I re-use the conversion query with other datasets that I want to join.&lt;/p&gt;
&lt;p&gt;To demonstrate, I will show how I used this approach to join three CSV files: a list of student names and IDs, a list of course names and IDs, and a list of student and course IDs that shows who took which courses.&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t include field name headers in the data files because tarql would use them as property names and I wanted to make my scripts more generic by letting tarql use its default generic names of &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt;, &lt;code&gt;c&lt;/code&gt;, and so on through the alphabet for dataset property names.&lt;/p&gt;
&lt;p&gt;Here are the data files, starting with &lt;code&gt;students.txt&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;s1001,Craig Ellis
s1002,Jane Jones
s1003,Richard Mutt
s1004,Cindy Marshall
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;courses.txt:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;c2001,Linear Algebra I
c2002,Impressionists and Post-Impressionists
c2003,Intro to Theravada Buddhism
c2004,Democracy in the Gilded Age
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;studentCourse.txt:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;s1002,c2001
s1002,c2004
s1003,c2001
s1003,c2004
s1004,c2001
s1004,c2003
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I had two goals that would require joins: to list the student names next to the names of the courses they took, without showing any IDs, and then to list the course names with the number of students enrolled in each. The first step was to convert the delimited files to RDF with tarql; I could then write short queries to fulfill the two goals.&lt;/p&gt;
&lt;p&gt;I used the query below to convert the &lt;code&gt;students.txt&lt;/code&gt; file to RDF. The &lt;code&gt;?u a t:student&lt;/code&gt; triple pattern in the CONSTRUCT clause creates a triple saying &amp;ldquo;this row of data represents an instance of this class&amp;rdquo; so that the join queries will know which data represents what kinds of things. Modifying this script to handle other data types merely requires changing the object of this one triple pattern. For example, the query that converts &lt;code&gt;courses.txt&lt;/code&gt; to Turtle has &lt;code&gt;t:course&lt;/code&gt; in that triple pattern instead of &lt;code&gt;t:student&lt;/code&gt;.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# constructAllStudents.rq
PREFIX t: &amp;lt;http://learningsparql.com/ns/tarql/&amp;gt;

CONSTRUCT {
   ?u a t:student .

   ?u t:a ?a .
   ?u t:b ?b .
   ?u t:c ?c .
   ?u t:d ?d .
   ?u t:e ?e .
   ?u t:f ?f .
   # As many as you want. Can be more than the number of input columns.
} 
 WHERE {
   BIND (UUID() AS ?u)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This command line tells tarql to use that query to create the Turtle file for that dataset:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;  tarql -H constructAllStudents.rq students.txt &amp;gt; students.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I used similar command lines with the slight variations described above on that CONSTRUCT query to create &lt;code&gt;courses.ttl&lt;/code&gt; and &lt;code&gt;studentCourse.ttl&lt;/code&gt;.&lt;/p&gt;
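&lt;p&gt;For example, with the one-line variation saved under a hypothetical name such as &lt;code&gt;constructAllCourses.rq&lt;/code&gt; (identical to &lt;code&gt;constructAllStudents.rq&lt;/code&gt; except that its first triple pattern is &lt;code&gt;?u a t:course .&lt;/code&gt;), the corresponding command line would be:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;  tarql -H constructAllCourses.rq courses.txt &amp;gt; courses.ttl
&lt;/code&gt;&lt;/pre&gt;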
&lt;p&gt;Here is the SPARQL query that uses the data from those three Turtle files to join the student names with the course names. Your JOIN query will look a little different from mine, but not too different, because the use of properties such as &lt;code&gt;t:a&lt;/code&gt;, &lt;code&gt;t:b&lt;/code&gt;, and &lt;code&gt;t:c&lt;/code&gt; that correspond to tarql variables like &lt;code&gt;?a&lt;/code&gt; and &lt;code&gt;?b&lt;/code&gt; (instead of more specific names from a data file header line) let me make the query more generic.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# joinThem.rq

PREFIX t: &amp;lt;http://learningsparql.com/ns/tarql/&amp;gt;

SELECT ?studentName ?courseName WHERE {

   ?student a t:student ;
            t:a ?studentID ;
            t:b ?studentName .

   ?course a t:course ; 
            t:a ?courseID ;
            t:b ?courseName .
   
   ?class a t:studentCourse ; 
           t:a ?studentID ;
           t:b ?courseID .
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;a href=&#34;https://jena.apache.org/documentation/query/index.html&#34;&gt;arq&lt;/a&gt; SPARQL processor&amp;rsquo;s ability to accept more than one &lt;code&gt;--data&lt;/code&gt; argument let me use a single command to run this join query with the three Turtle files as input:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;arq --query joinThem.rq --data courses.ttl --data students.ttl --data studentCourse.ttl
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here is the result:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;----------------------------------------------------
| studentName      | courseName                    |
====================================================
| &amp;#34;Richard Mutt&amp;#34;   | &amp;#34;Linear Algebra I&amp;#34;            |
| &amp;#34;Richard Mutt&amp;#34;   | &amp;#34;Democracy in the Gilded Age&amp;#34; |
| &amp;#34;Jane Jones&amp;#34;     | &amp;#34;Linear Algebra I&amp;#34;            |
| &amp;#34;Jane Jones&amp;#34;     | &amp;#34;Democracy in the Gilded Age&amp;#34; |
| &amp;#34;Cindy Marshall&amp;#34; | &amp;#34;Linear Algebra I&amp;#34;            |
| &amp;#34;Cindy Marshall&amp;#34; | &amp;#34;Intro to Theravada Buddhism&amp;#34; |
----------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This data enables joins for other purposes as well. This next query joins the data and then shows course names with the number of students enrolled in each one:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# coursePopularity.rq

PREFIX t: &amp;lt;http://learningsparql.com/ns/tarql/&amp;gt;

SELECT ?courseName (COUNT(*) as ?students)
WHERE {

   ?student a t:student ;
            t:a ?studentID ;
            t:b ?studentName .

   ?course a t:course ; 
            t:a ?courseID ;
            t:b ?courseName .
   
   ?class a t:studentCourse ; 
           t:a ?studentID ;
           t:b ?courseID .
}

GROUP BY ?courseName
ORDER BY DESC(?students)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The command to run this just substitutes the new query for the previous one on the command line used earlier:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;arq --query coursePopularity.rq --data courses.ttl --data students.ttl --data studentCourse.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;And here is the result:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;--------------------------------------------
| courseName                    | students |
============================================
| &amp;#34;Linear Algebra I&amp;#34;            | 3        |
| &amp;#34;Democracy in the Gilded Age&amp;#34; | 2        |
| &amp;#34;Intro to Theravada Buddhism&amp;#34; | 1        |
--------------------------------------------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I want to reiterate that the trick described here is strictly for quick-and-dirty joins. Calling the first property in all the datasets &lt;code&gt;a&lt;/code&gt;, the second one &lt;code&gt;b&lt;/code&gt;, the third &lt;code&gt;c&lt;/code&gt;, and so on is just a convenience to reduce the amount of query editing needed. If I were going to convert data like this for long-term use, I would use more descriptive names for each property (maybe even take advantage of property names in header rows) and add some more modeling triples that define classes, properties, and their relationships.&lt;/p&gt;
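As a sketch of what that longer-term version might look like: if `students.txt` began with a header line of `id,name`, then dropping tarql's `-H` flag would bind the columns to `?id` and `?name`, so the query could use more descriptive property names (the `d:` namespace here is made up for illustration):

```sparql
# Hypothetical: descriptive properties driven by a header row of "id,name"
PREFIX d: <http://learningsparql.com/ns/demo#>

CONSTRUCT {
   ?u a d:Student ;
      d:studentID ?id ;
      d:name ?name .
}
WHERE {
   BIND (UUID() AS ?u)
}
```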
&lt;p&gt;I&amp;rsquo;m starting to think of &lt;code&gt;tarql&lt;/code&gt; and &lt;code&gt;arq&lt;/code&gt; as members of the Linux command toolbox that includes venerable old tools like &lt;code&gt;sort&lt;/code&gt; and &lt;code&gt;uniq&lt;/code&gt; as well as recent tools such as &lt;code&gt;jq&lt;/code&gt; and &lt;code&gt;xmllint&lt;/code&gt; that I&amp;rsquo;m seeing in more standard Linux distributions. (There&amp;rsquo;s actually one called &lt;a href=&#34;https://linux.die.net/man/1/join&#34;&gt;&lt;code&gt;join&lt;/code&gt;&lt;/a&gt; that can do simple joins with two files, but no more than two.) I can mix and match these tools to perform many different tasks with many different kinds of data, with no need to crank up a server or a memory-intensive GUI tool.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Cropped photo of &amp;ldquo;JOIN&amp;rdquo; sign by Marcel van Schooten  via &lt;a href=&#34;https://www.flickr.com/photos/mvs/2726291360/in/photolist-59UXrs-AjAhi9-2j6NNxw-2if6SEK-2hrE2vk-59QJqB-59UXB1-59QJfD-59QJ4n-59UXd7-59UXaG-59UXoU-2imbSbT-59UXvQ-59QJm2-59UXmE-BH1Vtx-ebmbsH-6VfRaa-9f8Ckf-s3k6my-LcZE5m-S9L8dG-izJEJ9-9UwU19-P2nxMK-dUELft-8ChTRX-2hkY38S-2ixdWLS-opqjdc-69YrFP-2g3sJnF-22ekDfV-6FNeta-UqocvK-XrmL5J-2iN2wLZ-VrLJHd-F7k7jf-2hT2PXk-2djTWk8-244sMY8-7MNL4-svpQAK-Y5Ckcq-9TRAb1-iB35PL-5daoZo-XW1oEZ&#34;&gt;flickr&lt;/a&gt; &lt;a href=&#34;https://creativecommons.org/licenses/by/2.0/&#34;&gt;(CC BY 2.0)&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2020">2020</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/csv">CSV</category>
      
    </item>
    
    <item>
      <title>Generating MODS XML from RDF with Go templates</title>
      <link>https://www.bobdc.com/blog/rdf2modsxml/</link>
      <pubDate>Sun, 30 Aug 2020 12:20:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdf2modsxml/</guid>
      
      
      <description><div>Using a built-in Go(lang) feature to drive an RDF application.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/rdfgo.png&#34; width=&#34;200px&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;RDF, MODS, and Go logos&#34;/&gt;
&lt;p&gt;I had heard that &lt;a href=&#34;https://golang.org/&#34;&gt;Go&lt;/a&gt; (also known as &amp;ldquo;golang&amp;rdquo;) was an &lt;a href=&#34;http://pypl.github.io/PYPL.html&#34;&gt;increasingly popular&lt;/a&gt; newish programming language before I &lt;a href=&#34;http://www.bobdc.com/blog/changing-my-blogs-domain-name/&#34;&gt;migrated my blog&lt;/a&gt; from being generated by handmade XSLT scripts on snee.com to using the &lt;a href=&#34;https://gohugo.io/&#34;&gt;Hugo&lt;/a&gt; platform to generate it on bobdc.com. Hugo is written in Go, which was invented at Google (get it?) by three people, two of whom had contributed to the development of C, Unix, and important related technology at Bell Labs. Go provides an excellent basis for a website generation system because, although it prides itself on a fairly minimal core feature set, it provides templating of output with its standard libraries. As I wrote when I described the website migration, I never had to learn the programming language to get the website up and running, but I tweaked many &lt;a href=&#34;https://gohugo.io/templates&#34;&gt;Hugo templates&lt;/a&gt; to customize the website&amp;rsquo;s appearance.&lt;/p&gt;
&lt;p&gt;This made me wonder whether Go and its templates would be a good way to generate content from RDF. Short answer: yes. After learning some Go I wrote a program that reads in triples, loads them into an appropriate data structure, and then hands that off to a template for output. Once I&amp;rsquo;d written the program, most of my work consisted of building up the template text file with little need to go back and tweak and recompile the Go code.&lt;/p&gt;
&lt;p&gt;My demo project was to convert journal publishing RDF metadata into &lt;a href=&#34;https://www.loc.gov/standards/mods/&#34;&gt;MODS&lt;/a&gt; XML. The Metadata Object Description Schema standard is hosted at the Library of Congress and is very popular for library metadata. It has &lt;a href=&#34;https://www.loc.gov/standards/mods/modsrdf-primer.html&#34;&gt;its own RDF vocabulary&lt;/a&gt;, but a &lt;a href=&#34;https://wiki.lyrasis.org/display/samvera/MODS+and+RDF+Descriptive+Metadata+Subgroup&#34;&gt;MODS to RDF Working Group&lt;/a&gt; decided to &amp;ldquo;consider a range of widely-adopted RDF namespaces, rather than pursuing a straight XML-to-RDF approach using the MODS RDF Ontology or proposing a new formal ontology&amp;rdquo;. This quote comes from their &amp;ldquo;MODS to RDF Mapping Recommendations&amp;rdquo; (&lt;a href=&#34;https://t.co/L20MBi0BBs&#34;&gt;pdf&lt;/a&gt;), which describes how to use Dublin Core, Library of Congress, schema.org, Europeana, and other RDF vocabularies to express MODS metadata.&lt;/p&gt;
&lt;p&gt;This idea appealed to me because I see great potential  in modeling relationships between the rich metadata standards of the publishing and library worlds in order to help people take better advantage of combinations of these standards. Using RDFS (or more high-powered modeling tools such as OWL or SHACL, but maybe just RDFS), a  single system can more easily support multiple standards because it knows that if another system expects a &lt;a href=&#34;https://id.loc.gov/ontologies/bibframe.html#c_Title&#34;&gt;BIBFRAME title&lt;/a&gt; but the host system stores book titles as &lt;a href=&#34;https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/title&#34;&gt;Dublin Core titles&lt;/a&gt;, RDFS triples defining these as equivalent can help to automate the delivery of whatever the destination system wants.&lt;/p&gt;
&lt;p&gt;In the publishing world, metadata delivery is more likely to be in XML because the content itself is often in XML. (Many people forget: that&amp;rsquo;s &lt;a href=&#34;http://www.bobdc.com/blog/a-brief-opinionated-history-of/&#34;&gt;why XML was invented&lt;/a&gt;.) So, after RDF-based tools reap the benefits of the modeling described above, eventual delivery often needs to be in XML.&lt;/p&gt;
&lt;p&gt;Shortly after I started this I described my plan for using Go templates to my daughter and she told me about the &lt;a href=&#34;https://jinja.palletsprojects.com/en/2.11.x/&#34;&gt;Jinja Python template library&lt;/a&gt;. Using that would have made this all much easier for me, because I already know Python and a &lt;a href=&#34;https://pypi.org/project/rdflib/&#34;&gt;nice RDF library for it&lt;/a&gt;, but I wanted to try it with Go specifically because templating is a standard part of the language as opposed to a community add-on library. (For embedding RDF values in templated XML, slicker proprietary alternatives to my novice Go coding are also available from MarkLogic and TopQuadrant.)&lt;/p&gt;
&lt;h1 id=&#34;the-goal&#34;&gt;The goal&lt;/h1&gt;
&lt;p&gt;For converting semi-realistic publishing RDF to MODS XML I took &lt;a href=&#34;https://www.loc.gov/standards/mods/userguide/examples.html#journal_article&#34;&gt;this sample journal XML&lt;/a&gt; from the MODS website and then wrote out an RDF version of that metadata using the MODS to RDF Mapping Recommendations mentioned above. I copied all of these triples with a new subject and slight changes to their objects so that they could play the role of metadata about a dummy second document; this let me test whether my program could output MODS data for multiple documents. Finally, I used &lt;a href=&#34;http://xmlsoft.org/xmllint.html&#34;&gt;&lt;code&gt;xmllint&lt;/code&gt;&lt;/a&gt; to validate the result against the &lt;a href=&#34;http://www.loc.gov/standards/mods/v3/mods-3-3.xsd&#34;&gt;MODS XML Schema&lt;/a&gt; to ensure that the result was valid MODS XML.&lt;/p&gt;
&lt;h1 id=&#34;writing-and-running-the-go-code&#34;&gt;Writing and running the Go code&lt;/h1&gt;
&lt;p&gt;Many resources for learning Go are available. &lt;a href=&#34;https://tour.golang.org/welcome/1&#34;&gt;This tour&lt;/a&gt; was fine for me. I postponed reading &lt;a href=&#34;https://golang.org/doc/code.html&#34;&gt;How to Write Go Code&lt;/a&gt; because it looked to be more about large-scale systems, but I should have read it earlier on to better understand how to import the RDF library (or, in Go terminology, &amp;ldquo;package&amp;rdquo;) that  I used.&lt;/p&gt;
&lt;p&gt;Somewhere in the middle of this project I started reading the book &lt;a href=&#34;http://www.gopl.io/&#34;&gt;The Go Programming Language&lt;/a&gt;, which was co-authored by Brian Kernighan—another Bell Labs alum with plenty of impressive UNIX-related accomplishments to his credit, including co-authoring the seminal book “The C Programming Language” with Dennis Ritchie. (I had to look up that book&amp;rsquo;s title just now because everyone has referred to it as “The K&amp;amp;R” since it was published in 1978.) I’d been considering a return trip to the K&amp;amp;R recently but don’t need to now because the Go book is more or less the modern version of that book for a modern version of C. The book&amp;rsquo;s website includes the complete tutorial chapter, and I highly recommend it. I am tempted to put this wonderful line from the tutorial on my blog’s template so that it shows up underneath all of my blog posts: “In the interests of keeping code samples to a reasonable size, our early examples are intentionally somewhat cavalier about error handling”.&lt;/p&gt;
&lt;p&gt;Having written the original SQL page for the &lt;a href=&#34;https://learnxinyminutes.com/&#34;&gt;Learn X in Y minutes&lt;/a&gt; site, I should have thought to look at its &lt;a href=&#34;https://learnxinyminutes.com/docs/go/&#34;&gt;Go page&lt;/a&gt; sooner. It&amp;rsquo;s a concise, handy resource that gives you a broad tour of the language quickly.&lt;/p&gt;
&lt;p&gt;Go has clear roots in C. It&amp;rsquo;s easier, though, with no pointer arithmetic or malloc memory management to worry about. I was surprised at how often I wanted to make it do something I hadn&amp;rsquo;t done with it before and got it to work by the second try.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://golang.org/pkg/&#34;&gt;standard Go packages&lt;/a&gt; include one for creating text templates and one for HTML templates. The HTML one includes some extra bits to protect against code injection and does not pass along &lt;code&gt;&amp;lt;!--&lt;/code&gt; HTML and XML comments &lt;code&gt;--&amp;gt;&lt;/code&gt; in the template to the output. I didn&amp;rsquo;t notice any other differences and found the HTML one to be fine for generating MODS XML.&lt;/p&gt;
&lt;p&gt;While Go packages are available to ease the querying of SPARQL endpoints, there is currently no Go equivalent of Python&amp;rsquo;s RDFlib, which has its own SPARQL engine. I used the &lt;a href=&#34;https://github.com/knakk/rdf&#34;&gt;knakk/rdf&lt;/a&gt; Go package to read the triples out of the disk files that provide my program&amp;rsquo;s input. (As a bonus, this package can read several different RDF serializations.) My program was really just a variation on the &lt;a href=&#34;https://github.com/knakk/rdf2rdf/blob/master/rdf2rdf.go&#34;&gt;sample program&lt;/a&gt; that came with that package, so there are parts of my Go program that I don&amp;rsquo;t completely understand, but hey, it works. This package does have &lt;a href=&#34;https://godoc.org/github.com/knakk/rdf&#34;&gt;godoc documentation&lt;/a&gt; available.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://goinbigdata.com/example-of-using-templates-in-golang/&#34;&gt;This Yury Pitsishin blog post&lt;/a&gt; was a good way to get started with templates. You can define the templates within the Go source code but will more typically put them in a separate document. This offers the benefit of letting you tune the output without recompiling the conversion code. The Go code&amp;rsquo;s template definition uses the &lt;code&gt;Template.ParseFiles&lt;/code&gt; method to specify the external file to use as a template, and then in the code the defined template&amp;rsquo;s &lt;code&gt;Execute&lt;/code&gt; method passes along a data structure that has been populated in the program to use with the output template.&lt;/p&gt;
&lt;p&gt;(Because this blog entry is getting long and my current Go skills are not something to show off, I&amp;rsquo;m not including my sample code here. You can find the Go code, template, sample input, and sample output &lt;a href=&#34;https://github.com/bobdc/misc/tree/master/rdf2modsxml&#34;&gt;on github&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;The data structure that my program passes to my template is a map of maps that I called &lt;code&gt;docsMetadata&lt;/code&gt; because it stores the metadata for a set of documents. A Go map is like a Python dictionary, letting you store and retrieve things using a key. The &lt;code&gt;docsMetadata&lt;/code&gt; keys are subject URIs—three different subject URIs used in a given &lt;code&gt;docsMetadata&lt;/code&gt; instance would be specifying metadata for three different documents—and the things stored with them are maps whose keys are predicate URIs. Those keys give access to simple arrays (well, actually, &amp;ldquo;slices&amp;rdquo;, which are Go&amp;rsquo;s dynamic version of arrays) so that I can store more than one value for a given subject-predicate combination such as the following two triples from my sample input:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;&amp;lt;https://example.org/objects/1&amp;gt; dce:subject &amp;#34;College librarians--Recruiting&amp;#34; .
&amp;lt;https://example.org/objects/1&amp;gt; dce:subject &amp;#34;College librarians--United States&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Outside of the &lt;code&gt;dce:subject&lt;/code&gt; values my demo rarely uses any slice entries beyond the first one. The MODS schema does allow multiple values for many of its metadata properties, though, so this provided a simple way to store more than one publisher address, media type, or other property if necessary. Also, the knakk RDF package does let you check whether a triple&amp;rsquo;s object is a URI, a literal value, or a blank node, so I could have done some fancier RDF processing of those. As a proof-of-concept demo I thought it best to just treat them all as strings for now.&lt;/p&gt;
&lt;h1 id=&#34;developing-the-mods-xml-template&#34;&gt;Developing the MODS XML template&lt;/h1&gt;
&lt;p&gt;You can put pretty much any text you want in a Go template. Nested pairs of curly braces store codes that give instructions to the compiled Go program about what to do with the data being passed in; often, this means inserting a particular component of that data structure. If the program passes an &lt;code&gt;Employee&lt;/code&gt; data structure that has a &lt;code&gt;Name&lt;/code&gt; field, then when the program sees &amp;ldquo;&amp;lt;p&amp;gt;Hello, &lt;b&gt;{{.Name}}&lt;/b&gt;&amp;lt;/p&amp;gt;&amp;rdquo; in the template it will replace the curly brace expression with the value of the &lt;code&gt;Name&lt;/code&gt; field. If you edit that part of the template file to say &amp;ldquo;&amp;lt;p&amp;gt;Hello, {{.Name}}&lt;b&gt; at {{.Address}}&lt;/b&gt;&amp;lt;/p&amp;gt;&amp;rdquo; you can then run the program and see the new version of the output with no need to recompile the program.&lt;/p&gt;
&lt;p&gt;Go&amp;rsquo;s templating language includes special keywords for tasks like conditional formatting. For example, if you don&amp;rsquo;t have &lt;code&gt;Address&lt;/code&gt; values for all of the employees, you could format the phrase above to only include &amp;quot; at &amp;quot; and the address value if there actually is an address value: &amp;ldquo;&amp;lt;p&amp;gt;Hello, {{.Name}}&lt;b&gt;{{if .Address}}&lt;/b&gt; at {{.Address}}&lt;b&gt;{{end}}&lt;/b&gt;&amp;lt;/p&amp;gt;&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;You can see more of the special template codes at &lt;a href=&#34;https://curtisvermeeren.github.io/2017/09/14/Golang-Templates-Cheatsheet&#34;&gt;Golang Templates Cheatsheet&lt;/a&gt;. An important one for my MODS project was &lt;code&gt;range&lt;/code&gt;, which lets you iterate over multiple values for a given property. I used it for the &lt;code&gt;dce:subject&lt;/code&gt; values mentioned above and also to enclose nearly the whole template so that I could output metadata about multiple journal documents. Because I was passing a map of maps to the template, referencing just the right bits was not as simple as pulling a Name value out of an Employee data structure, but it wasn&amp;rsquo;t too bad.&lt;/p&gt;
&lt;p&gt;One downside to working with these templates is the cryptic error messages caused by template problems. A missing curly brace could lead to an error message of &amp;ldquo;panic: runtime error: invalid memory address or nil pointer dereference&amp;rdquo; with no line number pointing to the template problem or other helpful information. Instead of celebrating Brian Kernighan&amp;rsquo;s cavalier approach to error handling in examples I should probably dig further into Go&amp;rsquo;s &lt;a href=&#34;https://blog.golang.org/error-handling-and-go&#34;&gt;facilities for that&lt;/a&gt;.&lt;/p&gt;
&lt;h1 id=&#34;running-it&#34;&gt;Running it&lt;/h1&gt;
&lt;p&gt;The knakk sample program uses an &lt;code&gt;in&lt;/code&gt; command line parameter to indicate the input file, so mine does too:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;rdf2modsxml -in modsjournals2.ttl &amp;gt; modsjournals2.xml
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The output had a lot of extra blank lines, which ultimately don&amp;rsquo;t matter in XML, but I sometimes ran the program like this to remove them:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;rdf2modsxml -in modsjournals2.ttl | awk &amp;#39;NF &amp;gt; 0&amp;#39; &amp;gt; modsjournals2.xml
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(Raise your hand if you know what the &amp;ldquo;k&amp;rdquo; in &amp;ldquo;awk&amp;rdquo; &lt;a href=&#34;https://en.wikipedia.org/wiki/AWK&#34;&gt;stands for&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Was it valid MODS XML? As I mentioned above, I used &lt;code&gt;xmllint&lt;/code&gt; (which seems to be part of most standard Linux distributions now and can be downloaded for Windows or MacOS) to validate the result against the &lt;a href=&#34;http://www.loc.gov/standards/mods/v3/mods-3-3.xsd&#34;&gt;MODS XML Schema&lt;/a&gt;.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;xmllint --schema mods-3-4.xsd modsjournals2.xml --noout
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;code&gt;--noout&lt;/code&gt; parameter tells &lt;code&gt;xmllint&lt;/code&gt; not to show any of the content and to just indicate whether the XML document conforms to the schema or not. The output of my rdf2modsxml program did conform.&lt;/p&gt;
&lt;h1 id=&#34;go-and-rdf-and-publishing-and-library-metadata&#34;&gt;Go and RDF and publishing and library metadata&lt;/h1&gt;
&lt;p&gt;If the development of a useful new tool requires the writing of code that imports libraries and then needs to be compiled to a binary version, that can be asking a bit much of people who are not full-time software developers. If that code is fairly simple (with a package to do the most difficult part already available) and the main work of using the tool consists of just editing a separate text file, then I think that the use of Go templates for RDF application development offers some real promise. With Hugo as a model, this could obviously be done to use RDF data in applications destined for browsers; I was especially happy to see that it works to generate XML that conforms to an important standard unrelated to HTML.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m also going to start being braver about messing around with the Hugo templates used to generate this blog!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2020">2020</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>The HTML interface to your SPARQL endpoint is not your SPARQL endpoint</title>
      <link>https://www.bobdc.com/blog/endpointandcurl/</link>
      <pubDate>Sun, 19 Jul 2020 11:15:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/endpointandcurl/</guid>
      
      
      <description><div>Remember what the &#39;P&#39; in &#39;SPARQL&#39; stands for.</div><div>&lt;blockquote id=&#34;id103368&#34; class=&#34;pullquote&#34;&gt;If you have interesting data, we want to use it in application development!&lt;/blockquote&gt;
&lt;p&gt;Something that happens to me now and then: I&amp;rsquo;ll hear that an organization with a lot of interesting data (science, music, whatever) makes the data available on a SPARQL endpoint. I send my browser to the URL listed as the SPARQL endpoint and I see a web form. I enter a simple query on the web form to retrieve a few random triples, click the form&amp;rsquo;s button, and the results of my query appear. Then I enter fancier queries to &lt;a href=&#34;../exploring-a-sparql-endpoint/&#34;&gt;explore the endpoint&amp;rsquo;s data&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then, if there is a clear indication of an endpoint URL that is different from their form&amp;rsquo;s URL, I append &lt;code&gt;?query=&lt;/code&gt; and an escaped version of a simple query to it so that I can &lt;a href=&#34;../curling-sparql/&#34;&gt;send the query to the endpoint with curl&lt;/a&gt;. If I see no clear indication of an endpoint URL that is different from this form&amp;rsquo;s URL, I&amp;rsquo;ll look around the website a bit for it, and if I still have no luck I&amp;rsquo;ll try using the form&amp;rsquo;s URL and several variations on it. (Below are some hints on these variations.)&lt;/p&gt;
&lt;p&gt;Sometimes I just can&amp;rsquo;t find a working endpoint URL. There are sites out there advertising a SPARQL endpoint where the only way to send a query to the endpoint is via the HTML form interface. I won&amp;rsquo;t name specific sites here, but it&amp;rsquo;s definitely a pattern I&amp;rsquo;ve noticed.&lt;/p&gt;
&lt;p&gt;&amp;ldquo;SPARQL&amp;rdquo; stands for &amp;ldquo;SPARQL &lt;em&gt;Protocol&lt;/em&gt; and RDF Query Language&amp;rdquo;. The &lt;a href=&#34;https://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/&#34;&gt;SPARQL 1.1 Protocol specification&lt;/a&gt; tells us &amp;ldquo;This document specifies the SPARQL Protocol; it describes a means for conveying SPARQL queries and updates to a SPARQL processing service and returning the results via HTTP to the entity that requested them.&amp;rdquo; It also tells us that a SPARQL Protocol service is &amp;ldquo;[a]n HTTP server that services HTTP requests and sends back HTTP responses for SPARQL Protocol operations. The URI at which a SPARQL Protocol service listens for requests is generally known as a SPARQL endpoint&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;An &amp;ldquo;endpoint&amp;rdquo; that doesn&amp;rsquo;t support this protocol is not a SPARQL endpoint.  Curl provides many ways to send a query via HTTP and then process the results—my mention of it above links to something I wrote with several examples—and it&amp;rsquo;s a great way to test a proper endpoint.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not about curl, though; curl is just a great way to explore a service&amp;rsquo;s HTTP support. Any modern programming language supports HTTP, which means that you should be able to write a program in any of these languages that sends a request to a SPARQL endpoint and then processes the result &lt;em&gt;without needing any special SPARQL or RDF library&lt;/em&gt;. (Of course, there are many such libraries to make this processing even easier.) The curl utility just provides a convenient way to do quick and dirty tests of a SPARQL endpoint from the command line. The ability to do this from the command line, and from within a programming language that provides HTTP support, means that you can automate the execution of these queries and then mix and match the results with other processing to create cool applications. If the only way to issue SPARQL queries against your data is to enter a query on a web form and then click a button, then I can&amp;rsquo;t use your data in this kind of application development. If you have interesting data, we want to use it in application development!&lt;/p&gt;
&lt;h2 id=&#34;finding-the-endpoint&#34;&gt;Finding the endpoint&lt;/h2&gt;
&lt;p&gt;Ideally, the announcement of the endpoint tells you both the URL for the endpoint where you send HTTP requests and the URL for a web form front end to that endpoint. For example, &lt;a href=&#34;https://wiki.dbpedia.org/&#34;&gt;DBpedia&lt;/a&gt;&amp;rsquo;s endpoint is at &lt;code&gt;http://dbpedia.org/sparql&lt;/code&gt; and the web form interface is at &lt;code&gt;http://dbpedia.org/snorql/&lt;/code&gt;, where it uses a UI tool called &amp;ldquo;snorql.&amp;rdquo; Note that the snorql form says &amp;ldquo;SPARQL Explorer for &lt;a href=&#34;http://dbpedia.org/sparql&#34;&gt;http://dbpedia.org/sparql&lt;/a&gt;&amp;rdquo; right at the top. That&amp;rsquo;s the kind of clarity about the relationship between the form and the endpoint that I want to see more of out there. The &lt;a href=&#34;https://yago-knowledge.org/sparql&#34;&gt;yago&lt;/a&gt; endpoint form also does this nicely.&lt;/p&gt;
&lt;p&gt;Some places use the same URL for both the endpoint and the web interface to the endpoint, such as the European Bioinformatics Institute&amp;rsquo;s endpoint at &lt;a href=&#34;https://www.ebi.ac.uk/rdf/services/sparql&#34;&gt;https://www.ebi.ac.uk/rdf/services/sparql&lt;/a&gt;, the AGROVOC Thesaurus endpoint at &lt;a href=&#34;http://agrovoc.uniroma2.it/sparql&#34;&gt;http://agrovoc.uniroma2.it/sparql&lt;/a&gt;, and the JazzCats one at &lt;a href=&#34;http://cdhr-linkeddata.anu.edu.au/jazzcats-sparql/sparql&#34;&gt;http://cdhr-linkeddata.anu.edu.au/jazzcats-sparql/sparql&lt;/a&gt;. Using the same URL doesn&amp;rsquo;t mean that the HTML interface to the SPARQL endpoint is the same as the endpoint itself; their HTTP servers check whether a &lt;code&gt;query&lt;/code&gt; parameter was passed with the URL, and if not, they deliver the HTML page with the web form.&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;ll notice how these endpoint URLs all end in &lt;code&gt;/sparql&lt;/code&gt;. Not all SPARQL endpoints do, but it&amp;rsquo;s a nice convention. If a SPARQL endpoint web form is at &lt;a href=&#34;http://www.example.com&#34;&gt;http://www.example.com&lt;/a&gt; and I see no clear indication of an endpoint URL, I&amp;rsquo;ll try &lt;a href=&#34;http://www.example.com/sparql&#34;&gt;http://www.example.com/sparql&lt;/a&gt; as an endpoint by appending a query parameter with a &lt;a href=&#34;https://www.motobit.com/util/url-decoder.asp&#34;&gt;URL-escaped&lt;/a&gt; version of a very simple query such as &amp;ldquo;SELECT * WHERE { ?s ?p ?o } LIMIT 5&amp;rdquo;. With curl, I can then test it with this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;curl http://example.com/sparql?query=SELECT%20*%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%205
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If that doesn&amp;rsquo;t work (for example, if the curl request gets you nothing or the HTML of an error message page) and the URL begins with &amp;ldquo;http://&amp;rdquo;, try adding an &amp;ldquo;s&amp;rdquo; after the &amp;ldquo;p&amp;rdquo;. Once you do get a SPARQL result set from an endpoint, it&amp;rsquo;s typically XML of the query results, and you can start exploring ways to get other formats such as &lt;a href=&#34;https://www.w3.org/TR/2013/REC-sparql11-results-json-20130321/&#34;&gt;JSON&lt;/a&gt; or &lt;a href=&#34;https://www.w3.org/TR/2013/REC-sparql11-results-csv-tsv-20130321/&#34;&gt;TSV&lt;/a&gt;. (Again, see my &lt;a href=&#34;../curling-sparql/&#34;&gt;curling SPARQL&lt;/a&gt; post for a quick tour of some possibilities.)&lt;/p&gt;
&lt;p&gt;You can also email the people running the site and say &amp;ldquo;Hey! Great data! I enjoyed entering queries on your form! Does your site have a SPARQL endpoint that supports the SPARQL protocol so that I can get the data with curl and other HTTP tools instead of just using a browser to see rendered HTML of the results?&amp;rdquo; It&amp;rsquo;s one of the reasons that I&amp;rsquo;m writing this blog entry—so I can just point to this long-winded explanation of the difference instead of trying to do a short summary in another email to one of those sites.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/curl">curl</category>
      
      <category domain="https://www.bobdc.com//categories/linked-data">linked-data</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Converting CSV to RDF with Tarql</title>
      <link>https://www.bobdc.com/blog/tarql/</link>
      <pubDate>Sun, 21 Jun 2020 11:00:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/tarql/</guid>
      
      
<description><div>Quick and easy and, if you like, streaming.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/csv2rdf.png&#34; width=&#34;200px&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;CSV to RDF&#34;/&gt;
&lt;!-- regarding https://twitter.com/namedgraph/status/1271065806693003266 I didn&#39;t cover https://github.com/AtomGraph/CSV2RDF because there is no executable file there. You have to build it.  --&gt; 
&lt;p&gt;I have seen several tools for converting spreadsheets to RDF over the years. They typically try to cover so many different cases that learning how to use them has taken more effort than just writing a short perl script that uses the &lt;code&gt;split()&lt;/code&gt; function, so that&amp;rsquo;s what I usually ended up doing. (Several years ago I did come up with &lt;a href=&#34;../converting-csv-to-rdf/&#34;&gt;another way&lt;/a&gt; that was more of a cute trick with Turtle syntax.)&lt;/p&gt;
&lt;p&gt;A year or two ago I learned about &lt;a href=&#34;https://github.com/tarql/tarql/&#34;&gt;Tarql&lt;/a&gt;, which lets you query delimited files as if they were RDF triples, and I definitely liked it. It seemed so simple, though, that I didn&amp;rsquo;t think it was worth a whole blog post. Recently, however, I was chatting with &lt;a href=&#34;https://www.semanticarts.com/team/#dave&#34;&gt;Dave McComb&lt;/a&gt; of Semantic Arts and learned that this simple utility often plays a large role in the work they do for their clients, so I played some more with Tarql. I also interviewed &lt;a href=&#34;https://www.semanticarts.com/team/#boris&#34;&gt;Boris Pelakh&lt;/a&gt; of Semantic Arts about what kinds of tasks they use Tarql for in their customer work.&lt;/p&gt;
&lt;p&gt;I downloaded Tarql from &lt;a href=&#34;https://github.com/tarql/tarql/releases&#34;&gt;https://github.com/tarql/tarql/releases&lt;/a&gt;, unzipped it,  found a shell script and batch file in a &lt;code&gt;bin&lt;/code&gt; subdirectory of the unzipped version, and was ready to run it.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll just jump in with a simple example before discussing the various possibilities. Here is a file I called &lt;code&gt;test1.csv&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;name,quantity,description,available
widget,3,for framing the blivets,false
blivet,2,needed for widgets,true
&amp;#34;like, wow&amp;#34;,4,testing the CSV parsing,true
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here is a sample query to run against it:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# test1.rq
SELECT ?name ?quantity ?available
WHERE {}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;From the command line I tell Tarql to run the &lt;code&gt;test1.rq&lt;/code&gt; query with &lt;code&gt;test1.csv&lt;/code&gt; as input:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;tarql test1.rq test1.csv
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here is the result:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;--------------------------------------
| name        | quantity | available |
======================================
| &amp;#34;widget&amp;#34;    | &amp;#34;3&amp;#34;      | &amp;#34;false&amp;#34;   |
| &amp;#34;blivet&amp;#34;    | &amp;#34;2&amp;#34;      | &amp;#34;true&amp;#34;    |
| &amp;#34;like, wow&amp;#34; | &amp;#34;4&amp;#34;      | &amp;#34;true&amp;#34;    |
--------------------------------------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The first thing I like here is that the comma in &amp;ldquo;like, wow&amp;rdquo; doesn&amp;rsquo;t cause the problems that I had when using the perl &lt;code&gt;split()&lt;/code&gt; function, which split lines at every comma—even the quoted ones. (Perl has a library to get around that, but finding it and installing it was too much trouble for such a simple task.)&lt;/p&gt;
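Just to show what proper quote-aware CSV parsing looks like, here is a little standard-library Python sketch with the tricky row inlined (an illustration, not part of Tarql):

```python
import csv, io

# Two of the test1.csv rows; the csv module respects the quotes
# around "like, wow" instead of splitting at every comma.
data = '''name,quantity,description,available
widget,3,for framing the blivets,false
"like, wow",4,testing the CSV parsing,true
'''
rows = list(csv.reader(io.StringIO(data)))
print(rows[2][0])  # -> like, wow  (the quoted comma survives as one field)
```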
&lt;p&gt;If the query above had specified the dataset with the FROM keyword, like this,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# test2.rq
SELECT ?name ?quantity ?available

FROM &amp;lt;file:test1.csv&amp;gt;
WHERE {}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;then I wouldn&amp;rsquo;t have to mention the data source on the command line,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;tarql test2.rq
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;and I would get the same result.&lt;/p&gt;
&lt;p&gt;To really turn the data into triples, we can use a CONSTRUCT query. The following does this with the same data and, because Tarql treats everything as a string, it casts the &lt;code&gt;quantity&lt;/code&gt; values to integers and the &lt;code&gt;available&lt;/code&gt; values to Booleans:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX ex:  &amp;lt;http://www.learningsparql.com/ns/example/&amp;gt;
PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;

CONSTRUCT { 
   ?u ex:name ?name ;
      ex:quantity ?q ;
      ex:available ?a . 
}
FROM &amp;lt;file:test1.csv&amp;gt;
WHERE { 
  BIND (UUID() AS ?u) 
  BIND (xsd:integer(?quantity) AS ?q)
  BIND (xsd:boolean(?available) AS ?a)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here is the result:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix xsd:  &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix ex:  &amp;lt;http://www.learningsparql.com/ns/example/&amp;gt; .
@prefix rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .

&amp;lt;urn:uuid:8a6ad6dc-1b2d-4900-a63f-d25286379a0a&amp;gt;
        ex:name       &amp;#34;widget&amp;#34; ;
        ex:quantity   3 ;
        ex:available  false .

&amp;lt;urn:uuid:66ddf7f2-8c37-4ecb-86cf-056234aad317&amp;gt;
        ex:name       &amp;#34;blivet&amp;#34; ;
        ex:quantity   2 ;
        ex:available  true .

&amp;lt;urn:uuid:c8db5512-3772-4193-a172-525181a712de&amp;gt;
        ex:name       &amp;#34;like, wow&amp;#34; ;
        ex:quantity   4 ;
        ex:available  true .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The &lt;a href=&#34;http://tarql.github.io/&#34;&gt;Tarql documentation&lt;/a&gt; shows a lot more options and its &lt;a href=&#34;http://tarql.github.io/examples/&#34;&gt;examples&lt;/a&gt; page shows several cool things. And, of course, you have the full power of SPARQL to manipulate the data that you&amp;rsquo;re pulling from tables; one example is my use of the &lt;a href=&#34;https://www.w3.org/TR/sparql11-query/#func-uuid&#34;&gt;&lt;code&gt;UUID()&lt;/code&gt;&lt;/a&gt; function in the CONSTRUCT query above. Another nice example is a &lt;a href=&#34;https://gist.github.com/jaw111/902a03f40eca46b685a1096fda1d3542&#34;&gt;federated query&lt;/a&gt; with Tarql that &lt;a href=&#34;https://www.twitter.com/wohnjalker&#34;&gt;John Walker&lt;/a&gt; put together.&lt;/p&gt;
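For comparison, the UUID minting and datatype casts in that CONSTRUCT query can be sketched in plain standard-library Python (just an illustration with shortened ex: names, not how Tarql works internally):

```python
import csv, io, uuid

def row_to_triples(row):
    # Mint a urn:uuid: subject the way UUID() does, and cast the
    # quantity and available values like xsd:integer and xsd:boolean.
    s = "urn:uuid:" + str(uuid.uuid4())
    return [(s, "ex:name", row["name"]),
            (s, "ex:quantity", int(row["quantity"])),
            (s, "ex:available", row["available"] == "true")]

reader = csv.DictReader(io.StringIO(
    "name,quantity,description,available\nwidget,3,for framing the blivets,false\n"))
triples = [t for r in reader for t in row_to_triples(r)]
```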
&lt;p&gt;The original version of Tarql is among many contributions that &lt;a href=&#34;http://richard.cyganiak.de/&#34;&gt;Richard Cyganiak&lt;/a&gt; has made to RDF-related software over the years. As he told me in an email,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I started the project in 2013 when I was still at NUI Galway (formerly DERI), with large contributions from my then-colleague Fadi Maali, and Emir Munoz from Fujitsu Labs. We were working with open data from a number of government data catalogs at the time, and this data often came as CSV files. Tarql started out as a quick hack to help with ingesting that data into our RDF-based tools. The hack proved quite successful. But to this day, Tarql is really just a thin wrapper around Apache Jena&amp;rsquo;s ARQ query engine. All the hard work happens there.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;One important point in the design is that it can stream. That is, only a small part of input and output need to be kept in-memory at any given time. That makes it work well on large CSV files. Again, Jena made it possible by providing building blocks that support streaming operation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;It&amp;rsquo;s a testament to the design of SPARQL, really. The syntax is so nice and concise, and the underlying model so flexible, that it can be adapted to quite different tasks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Because Richard pointed out that it can stream, I wanted to show this alternative to my first command line above, which does the same thing but takes its input from stdin:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;tarql --stdin test1.rq &amp;lt; test1.csv
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I asked Semantic Arts&amp;rsquo; Boris Pelakh a few things about the role that Tarql plays in the work that Semantic Arts does for their customers and it turns out that it&amp;rsquo;s a pretty big role.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Boris, to start, tell me a little about what Semantic Arts does and where Tarql fits in.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Semantic Arts provides consulting services, helping companies transform their data models to a semantic graph paradigm, while helping them achieve data harmonization and improve comprehension and efficiency. We use Tarql to transform tabular data (either spreadsheets or SQL exports) into RDF for further processing. It is an essential part of our ETL process.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Where do the tables come from that you&amp;rsquo;re feeding to Tarql? From customer data or from tables that Semantic Arts staff develop as part of their research into the company?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;It is primarily customer data—bulk CSV, XLS, or SQL exports. In almost all our engagements, the customers already have a large volume of data, either in relational databases or some sort of data warehouse using something like Hadoop or S3. We have used Tarql to transform transaction data, asset inventories, dataset metadata, and so forth.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;What do you do with the Tarql output?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;It is generally loaded into our local AllegroGraph store during the development process or the client&amp;rsquo;s chosen triple store during production. We then help our clients build semantic applications on top of that triple store. I have also set up ETL pipelines where Tarql runs in EC2 instances and uses S3 to load the generated RDF into Neptune for a scalable solution.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;I believe Semantic Arts helps customers come up with some overall business process schemas or related artifacts; having this data in a triplestore like Allegrograph probably helps a lot with that.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Absolutely. In several engagements we were able to run graph analytics on the imported data for insights as well as running validation, either via SHACL or SPARQL, to help improve data quality.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;What kinds of roles does that RDF play in the deliverables that you eventually provide to the customer?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In our view, RDF provides all the best features of both relational and property graph databases, and is an ideal foundation for an enterprise data system.  We help our customers migrate their siloed data into a unified, semantic model defined by an enterprise ontology that we help build. Then, we develop a semantic application stack (APIs and UIs) that take advantage of the newly enriched data.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;So triples in Allegrograph provide the raw material for what eventually ends up as the enterprise ontology.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;While we use AllegroGraph internally, we do not mandate a specific triple store to our customers, instead working with their preferred infrastructure. We have worked with Stardog, AWS Neptune, and MarkLogic, among others. But yes, the instance data created via Tarql, along with classes and properties defined in Protege, provided a unified enterprise ontology for the customer to use.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Tarql provides a lot of potential command line switches to use. Are there interesting ones that you feel many people miss out on?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;--dedup&lt;/code&gt; option added in Tarql 1.2 (I believe) helps reduce the size of the generated RDF by avoiding the generation of duplicate triples. Tuning the deduplication window size is a careful compromise between the memory footprint of the transform and the output size, and is tuned per pipeline. The support for &lt;code&gt;apf:strSplit&lt;/code&gt;, which allows for the generation of multiple RDF result sets from a single input line, has also been helpful in the past, though that is internal to the query and not a command line option.&lt;/p&gt;
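To picture why that window size is a compromise, here is a toy Python sketch of windowed deduplication (my illustration, not Tarql&amp;rsquo;s actual code): a duplicate inside the window gets dropped, but one arriving after the window has scrolled past slips through into the output:

```python
from collections import OrderedDict

def dedup(items, window):
    # Remember only the last `window` distinct items seen;
    # a bigger window means fewer duplicates but more memory.
    seen = OrderedDict()
    for item in items:
        if item in seen:
            continue
        seen[item] = True
        if len(seen) > window:
            seen.popitem(last=False)  # forget the oldest item
        yield item

out = list(dedup(["a", "b", "a", "c", "d", "e", "a"], window=3))
print(out)  # -> ['a', 'b', 'c', 'd', 'e', 'a']: the last 'a' slipped past the window
```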
&lt;p&gt;&lt;em&gt;Tarql has some &amp;ldquo;magic&amp;rdquo; predicates?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Yes. For example, &lt;code&gt;tarql:expandPrefix&lt;/code&gt; is very useful when minting URIs in a CONSTRUCT query.  It avoids hard-coding of namespaces into the transformation, lending flexibility and ease of maintenance. Also, the magic &lt;code&gt;?ROWNUM&lt;/code&gt; variable that Tarql provides into the bindings is nice for generating unique IRIs when the data set does not have unique keys.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Boris is also working on a Python implementation of Tarql called &lt;a href=&#34;https://github.com/RDFLib/pyTARQL&#34;&gt;pyTARQL&lt;/a&gt; that looks like it could be useful for a lot of developers.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2020">2020</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>SPARQL in a Jupyter Notebook</title>
      <link>https://www.bobdc.com/blog/jupytersparql/</link>
      <pubDate>Sun, 31 May 2020 11:15:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/jupytersparql/</guid>
      
      
      <description><div>For real this time.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/jupyterSparql.png&#34; width=&#34;200px&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Jupyter and SPARQL logos&#34;/&gt;
&lt;p&gt;A few years ago I wrote a blog post titled &lt;a href=&#34;../sparql-in-a-jupyter-aka-ipytho/&#34;&gt;SPARQL in a Jupyter (a.k.a. IPython) notebook: With just a bit of Python to frame it all&lt;/a&gt;. It described how &lt;a href=&#34;https://jupyter.org/&#34;&gt;Jupyter&lt;/a&gt; notebooks, which have become increasingly popular in the data science world, are an excellent way to share executable code and the results and documentation of that code. Not only do these notebooks make it easy to package all of this in a very presentable way; they also make it easy for your reader to tweak the code in a local copy of your notebook, run the new version, and see the result. This is an especially effective way to help someone understand how a given block of code works.&lt;/p&gt;
&lt;p&gt;When these notebooks were first invented they were known as IPython (&amp;ldquo;Interactive Python&amp;rdquo;) notebooks. At the time, all the executable code was Python, but since then the renaming to &amp;ldquo;Jupyter&amp;rdquo; has been accompanied by support for more and more languages—even &lt;a href=&#34;https://vatlab.github.io/sos-docs/doc/user_guide/multi_kernel_notebook.html&#34;&gt;multiple languages in the same notebook&lt;/a&gt;. It wasn&amp;rsquo;t supporting SPARQL yet when I wrote the post described above, but my &amp;ldquo;just a bit of Python to frame it all&amp;rdquo; automated the handoff of SPARQL queries to the &lt;a href=&#34;https://github.com/RDFLib/rdflib&#34;&gt;rdflib&lt;/a&gt; Python library so that ideally even someone who didn&amp;rsquo;t know Python could enter SPARQL queries into a notebook and see the results as part of the notebook.&lt;/p&gt;
&lt;p&gt;The wait for the real thing is over. &lt;a href=&#34;https://github.com/paulovn&#34;&gt;Paulo Villegas&lt;/a&gt; has released a SPARQL kernel for Jupyter notebooks that lets us run queries natively, and I have been having some fun with it. The project&amp;rsquo;s &lt;a href=&#34;https://github.com/paulovn/sparql-kernel&#34;&gt;sparql-kernel&lt;/a&gt; git repository has good documentation in its readme file. There&amp;rsquo;s no need to clone the project; the following three commands installed the sparqlkernel files locally for me, installed those into my copy of Jupyter, and then started up Jupyter.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;pip install sparqlkernel
jupyter sparqlkernel install --user
jupyter notebook
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;At this point I was looking at Jupyter in my browser, and when I clicked the &amp;ldquo;New&amp;rdquo; button to create a new notebook I saw SPARQL as a choice right under &amp;ldquo;Python 3&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;While the SPARQL processing in my earlier post about Jupyter was handled by rdflib, this SPARQL kernel functions more as a very nice interface to a SPARQL endpoint that you specify. Or endpoints that you specify—as we&amp;rsquo;ll see, it&amp;rsquo;s very easy to switch between endpoints in one notebook. You specify the endpoint to talk to using a Jupyter &amp;ldquo;magic&amp;rdquo; command, which is a special command that begins with a percent sign.&lt;/p&gt;
&lt;p&gt;Once I was set up with this, I created a new notebook titled &lt;a href=&#34;https://github.com/bobdc/misc/blob/master/JupyterSPARQL/Jupyter%20and%20SPARQL%20and%20Dort%20or%20Dordrecht.ipynb&#34;&gt;Jupyter and SPARQL and Dort or Dordrecht&lt;/a&gt; where you can read and see the various steps I took to retrieve triples from two different endpoints about a famous J.M.W. Turner painting. (Another great thing about Jupyter: github understands it well enough to host the notebooks so that they look the same as they look in a browser pointing at a local Jupyter server. Sometimes when I follow the link to my new notebook, after a minute it tells me &amp;ldquo;Sorry, something went wrong&amp;rdquo; and displays a &amp;ldquo;reload&amp;rdquo; button, and then after clicking that button it usually works pretty quickly.) You can see the results of my queries right in the notebook, and if you download it and install Jupyter and sparql-kernel you can modify the queries and rerun them yourself. (For the notebook&amp;rsquo;s last query you&amp;rsquo;d need a triplestore such as &lt;a href=&#34;https://jena.apache.org/documentation/fuseki2/&#34;&gt;Fuseki&lt;/a&gt; running locally at localhost:3030. It doesn&amp;rsquo;t even have to have any data in it; as you&amp;rsquo;ll see in my new notebook, I used Fuseki to execute a federated query across the other two endpoints.)&lt;/p&gt;
&lt;p&gt;While creating my new notebook, sometimes I was about to plug a new query into it and thought &amp;ldquo;I should put this query into its own file and send it off to the endpoint &lt;a href=&#34;http://www.bobdc.com/blog/curling-sparql/&#34;&gt;with curl&lt;/a&gt; just to make sure it works properly&amp;rdquo; because that&amp;rsquo;s such a reflex reaction for me. For trying out queries and iteratively tuning them, though, doing them right in the notebook is much easier than editing a text file and sending it off to the endpoint with a shell command, because I can see the query and results (or errors) right there in the same glance. Despite being a diehard &lt;a href=&#34;https://www.gnu.org/software/emacs/&#34;&gt;Emacs&lt;/a&gt; guy I&amp;rsquo;m pretty confident that this will be my new routine from now on. When I develop multiple related queries in parallel, although I love Emacs&amp;rsquo; &lt;a href=&#34;https://github.com/ljos/sparql-mode&#34;&gt;sparql-mode&lt;/a&gt; (which also hooks up to an endpoint and shows your result right with your query), I still have to keep track of which query is in which buffer. In a Jupyter notebook, I can put nicely-formatted text blocks before and after each query to describe what each query is supposed to do and to annotate my progress with each query.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t want to write things here that are redundant with my new notebook about the Turner painting or with my earlier blog entry about Jupyter, so I encourage you to read the latter if you&amp;rsquo;d like to learn more about why Jupyter notebooks are so great and the former if you want to see the new powers that sparql-kernel adds to Jupyter for SPARQL users. I know I&amp;rsquo;m going to be a much more regular user of this nice tool.&lt;/p&gt;
&lt;p&gt;(Note: just yesterday I learned that Jupyter&amp;rsquo;s competitor &lt;a href=&#34;https://zeppelin.apache.org/&#34;&gt;Apache Zeppelin&lt;/a&gt; also has a &lt;a href=&#34;https://zeppelin.apache.org/docs/0.9.0-preview1/interpreter/sparql.html&#34;&gt;SPARQL plugin&lt;/a&gt;, so that is something to check out as well.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2020">2020</category>
      
      <category domain="https://www.bobdc.com//categories/jupyter">Jupyter</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Living in a materialized world</title>
      <link>https://www.bobdc.com/blog/materializing/</link>
      <pubDate>Sun, 26 Apr 2020 11:15:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/materializing/</guid>
      
      
      <description><div>Managing inferenced triples with named graphs.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/harrisonAlbumCover.png&#34; width=&#34;200px&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Living in the Material World album cover&#34;/&gt;
&lt;p&gt;I&amp;rsquo;ve often thought that named graphs could provide an infrastructure for managing inferenced triples, and a &lt;a href=&#34;https://twitter.com/linkedktk/status/1231975320703688704&#34;&gt;recent Twitter exchange&lt;/a&gt; with Adrian Gschwend inspired me to follow through with a little demo.&lt;/p&gt;
&lt;p&gt;Before I describe this demo I&amp;rsquo;m going to review some basic ideas about RDF inferencing and database denormalization. Then I&amp;rsquo;ll describe one approach to managing your own inferencing with an RDF version of database denormalization.&lt;/p&gt;
&lt;h1 id=&#34;inferencing&#34;&gt;Inferencing&lt;/h1&gt;
&lt;p&gt;As I wrote in the &amp;ldquo;What Is Inferencing?&amp;rdquo; section of the &amp;ldquo;RDF Schema, OWL, and Inferencing&amp;rdquo; chapter of my book &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt;, &amp;ldquo;Webster&amp;rsquo;s New World College Dictionary defines &amp;lsquo;infer&amp;rsquo; as &amp;rsquo;to conclude or decide from something known or assumed.&amp;rsquo; When you do RDF inferencing, your existing triples are the &amp;lsquo;something known,&amp;rsquo; and your inference tools will infer new triples from them.&amp;rdquo; If you have triples saying that Lassie is an instance of dog, and dog is a subclass of mammal, and mammal is a subclass of animal, then an inferencing tool such as a SPARQL engine that implements RDFS will recognize the implications of the &lt;code&gt;rdfs:subClassOf&lt;/code&gt; predicate used to make the last two statements. This means that if you query for all instances of mammal or animal it will include Lassie in the result.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;Using SPARQL to Do Your Inferencing&amp;rdquo; section of that same chapter shows how a query like the following can implement some inferencing for this RDFS property if your SPARQL engine doesn&amp;rsquo;t have this feature built in:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 

CONSTRUCT { ?instance a ?super . }
WHERE { 
  ?instance a ?subclass . 
  ?subclass rdfs:subClassOf ?super . 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You&amp;rsquo;d need to write such rules for all of the parts of RDFS and OWL that you wanted to implement—and even that might not be enough. Once the query above created a triple saying that Lassie is a mammal, it would be done, but a proper inferencing engine would then infer from that new triple that Lassie is also an animal.&lt;/p&gt;
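That &amp;ldquo;keep applying the rule until nothing new appears&amp;rdquo; behavior is a fixpoint computation; here is a small illustrative Python sketch of the subclass rule (classes and instances are just strings here, not real RDF terms):

```python
def infer_types(instances, subclass_of):
    # instances: set of (instance, class) facts;
    # subclass_of: set of (subclass, superclass) statements.
    # Re-apply the rdfs:subClassOf rule until no new facts appear.
    facts = set(instances)
    while True:
        new = {(i, sup) for (i, c) in facts
               for (sub, sup) in subclass_of if sub == c} - facts
        if not new:
            return facts
        facts |= new

facts = infer_types({("Lassie", "dog")},
                    {("dog", "mammal"), ("mammal", "animal")})
print(("Lassie", "animal") in facts)  # -> True: this took two passes, not one
```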
&lt;p&gt;The above technique can still be useful for simple inferencing like implementation of the &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; property for data integration, as long as your subproperties don&amp;rsquo;t have subproperties, so I&amp;rsquo;ll call this technique &amp;ldquo;one-pass inferencing.&amp;rdquo; (I wrote about the implementation of similar inferencing in &lt;a href=&#34;http://www.bobdc.com/blog/driving-hadoop-data-integratio/&#34;&gt;Driving Hadoop data integration with standards-based models instead of code&lt;/a&gt;.)&lt;/p&gt;
&lt;h1 id=&#34;database-denormalization&#34;&gt;Database denormalization&lt;/h1&gt;
&lt;p&gt;To oversimplify a bit, relational database &lt;a href=&#34;https://en.wikipedia.org/wiki/Database_normalization&#34;&gt;normalization&lt;/a&gt; is the process of working out which properties should be stored in which tables to avoid redundancy, because redundancy generally leads to inefficiency. When you store your customers&amp;rsquo; addresses and information about each item that they ordered, you don&amp;rsquo;t want these in the same table; if one customer ordered three different items, then storing a copy of the address with all three items would take up unnecessary space and make it more difficult to update the address if that customer moves. If you store a unique customer number with the address in the customers table and also with each of the customer&amp;rsquo;s orders in a separate orders table, then when you want to list customer addresses with the items that each customer ordered, you tell the database system to do a &lt;a href=&#34;https://en.wikipedia.org/wiki/Relational_algebra#Joins_and_join-like_operators&#34;&gt;join&lt;/a&gt; of the tables using the customer number to cross-reference the information.&lt;/p&gt;
&lt;p&gt;Sometimes day-to-day operations of a large database system require millions of complex joins to fulfill common requests. This can lead a database administrator to introduce some redundancy in certain tables to increase the efficiency of these requests. We call this &lt;a href=&#34;https://en.wikipedia.org/wiki/Denormalization&#34;&gt;denormalization&lt;/a&gt;. Because of the potential problems of these redundancies, this requires careful management—perhaps clearing out and repopulating the denormalized tables every night at 2AM.&lt;/p&gt;
&lt;p&gt;Storing RDF triples that could otherwise be inferred dynamically, or &amp;ldquo;materializing&amp;rdquo; those triples, is similar. They&amp;rsquo;re considered redundant because if you have all the information necessary to infer a certain piece of information, why store that information in your dataset? Because repeated inferencing of that information will require repeated usage of compute power to perform the same task. When you&amp;rsquo;re doing SPARQL queries this also limits your choice of SPARQL processors, because different SPARQL processors support different levels of inferencing depending on their support for RDFS and different OWL profiles. Many can&amp;rsquo;t do any inferencing at all.&lt;/p&gt;
&lt;p&gt;Because you can do your own one-pass inferencing with CONSTRUCT queries (and with INSERT queries if you are using a triplestore that supports SPARQL UPDATE), you can do  your own materializing to get the effects of denormalization.&lt;/p&gt;
&lt;h1 id=&#34;using-named-graphs-to-manage-materialized-triples&#34;&gt;Using named graphs to manage materialized triples&lt;/h1&gt;
&lt;p&gt;The rest of this assumes that you are familiar with querying and updating of SPARQL named graphs. To be honest, I use these rarely enough that I re-read the &amp;ldquo;Named Graphs&amp;rdquo; section of my book&amp;rsquo;s &amp;ldquo;Updating Data with SPARQL&amp;rdquo; chapter as a review before I assembled the steps below.&lt;/p&gt;
&lt;p&gt;I mentioned above how the manager of a relational database might have to clear out and repopulate the denormalized tables periodically so that their information stays synchronized with the canonical data. With RDF, we can store materialized triples in named graphs to enable a similar effect. The steps below walk through one possible scenario for this using the &lt;a href=&#34;https://jena.apache.org/documentation/fuseki2/&#34;&gt;Jena Fuseki&lt;/a&gt; triplestore.&lt;/p&gt;
&lt;p&gt;Imagine that my company has two subsidiaries, company1 and company2, that use different schemas to keep track of their employees, and I&amp;rsquo;m using RDFS inferencing to treat all that data as if it conformed to the same schema. Here is a sample of company1 data:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# company1.ttl

@prefix c1d: &amp;lt;http://learningsparql.com/ns/company1/data#&amp;gt; . 
@prefix c1m: &amp;lt;http://learningsparql.com/ns/company1/model#&amp;gt; . 

c1d:rich c1m:firstName &amp;#34;Richard&amp;#34; . 
c1d:rich c1m:lastName &amp;#34;Mutt&amp;#34; . 
c1d:rich c1m:phone &amp;#34;342-667-9256&amp;#34; . 

c1d:jane c1m:firstName &amp;#34;Jane&amp;#34; . 
c1d:jane c1m:lastName &amp;#34;Smith&amp;#34; . 
c1d:jane c1m:phone &amp;#34;546-700-2543&amp;#34; . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here is some company2 data:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# company2.ttl 

@prefix c2d: &amp;lt;http://learningsparql.com/ns/company2/data#&amp;gt; . 
@prefix c2m: &amp;lt;http://learningsparql.com/ns/company2/model#&amp;gt; . 

c2d:i432 c2m:firstname &amp;#34;Nanker Phelge&amp;#34; . 
c2d:i432 c2m:surname &amp;#34;Mutt&amp;#34; . 
c2d:i432 c2m:homephone &amp;#34;879-334-5234&amp;#34; . 

c2d:i245 c2m:firstname &amp;#34;Cindy&amp;#34; . 
c2d:i245 c2m:surname &amp;#34;Marshall&amp;#34; . 
c2d:i245 c2m:homephone &amp;#34;634-452-4678&amp;#34; . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The two datasets use different properties in different namespaces (such as &lt;code&gt;c1m:lastName&lt;/code&gt; vs. &lt;code&gt;c2m:surname&lt;/code&gt;) to keep track of the same kinds of information.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;upload files&amp;rdquo; tab of Fuseki&amp;rsquo;s web-based interface includes a &amp;ldquo;Destination graph name&amp;rdquo; field with a prompt of &amp;ldquo;Leave blank for default graph&amp;rdquo;. I specified a graph name of company1 when I uploaded company1.ttl and Fuseki gave this graph a full name of http://localhost:3030/myDataset/data/company1 because it was running on the default port of 3030 on my computer. (All of my queries below define &lt;code&gt;d:&lt;/code&gt; as a prefix for http://localhost:3030/myDataset/data/, so I&amp;rsquo;ll use that to save some typing here.)&lt;/p&gt;
&lt;p&gt;After uploading company2.ttl into a &lt;code&gt;d:company2&lt;/code&gt; named graph, I uploaded the following bit of modeling into a named graph called &lt;code&gt;d:model&lt;/code&gt;. It names the company1 and company2 properties as subproperties of equivalent schema.org properties.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# integrationModel.ttl

@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix c1m: &amp;lt;http://learningsparql.com/ns/company1/model#&amp;gt; . 
@prefix c2m: &amp;lt;http://learningsparql.com/ns/company2/model#&amp;gt; . 
@prefix schema: &amp;lt;http://schema.org/&amp;gt; . 

c1m:firstName rdfs:subPropertyOf schema:givenName . 
c1m:lastName rdfs:subPropertyOf schema:familyName . 
c1m:phone rdfs:subPropertyOf schema:telephone . 

c2m:firstname rdfs:subPropertyOf schema:givenName . 
c2m:surname rdfs:subPropertyOf schema:familyName . 
c2m:homephone  rdfs:subPropertyOf schema:telephone . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If I loaded all of the above triples into a triplestore that could do inferencing, I could query for &lt;code&gt;schema:givenName&lt;/code&gt;, &lt;code&gt;schema:familyName&lt;/code&gt;, and &lt;code&gt;schema:telephone&lt;/code&gt; values right away and get all of the company1 and company2 data with that one query. For this example, though, I&amp;rsquo;m going to show how to do one-pass inferencing to set the stage for a query that can retrieve all that data using the schema.org property names.&lt;/p&gt;
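&lt;p&gt;With an inferencing triplestore, that single query could be as simple as this sketch, which just asks for the schema.org versions of the properties:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX schema: &amp;lt;http://schema.org/&amp;gt;

SELECT ?person ?given ?family ?phone
WHERE
{
  ?person schema:givenName  ?given ;
          schema:familyName ?family ;
          schema:telephone  ?phone .
}
&lt;/code&gt;&lt;/pre&gt;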
&lt;p&gt;The next step was to do that inferencing—that is, to create the inferred triples. Before updating data in a triplestore with an INSERT command, it&amp;rsquo;s good to do a CONSTRUCT query to double-check that you&amp;rsquo;ll be creating what you had hoped to, so I ran the following query. It looks in a dataset&amp;rsquo;s default graph and any named graphs for resources that have properties that are subproperties of other properties and then creates triples using those superproperties:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX d: &amp;lt;http://localhost:3030/myDataset/data/&amp;gt;

CONSTRUCT  { ?s ?superProp ?o }
WHERE
{
   { ?s ?p ?o }
   UNION
   { GRAPH ?g { ?s ?p ?o } }
   GRAPH d:model {?p rdfs:subPropertyOf ?superProp } .
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;For example, when this query sees that &lt;code&gt;c2m:firstname&lt;/code&gt; is a subproperty of &lt;code&gt;schema:givenName&lt;/code&gt; and that &lt;code&gt;c2d:i245&lt;/code&gt; has a &lt;code&gt;c2m:firstname&lt;/code&gt; of &amp;ldquo;Cindy&amp;rdquo;, it constructs a triple saying that &lt;code&gt;c2d:i245&lt;/code&gt; has a &lt;code&gt;schema:givenName&lt;/code&gt; of &amp;ldquo;Cindy&amp;rdquo;. In other words, it expresses the original fact using a schema.org property in addition to the property from company2&amp;rsquo;s schema.&lt;/p&gt;
&lt;p&gt;The complete result of this query showed all of the company1 and company2 data, but expressed with the schema.org properties instead of the companies&amp;rsquo; original ones. Being the result of a CONSTRUCT query, though, these triples are temporary.&lt;/p&gt;
&lt;p&gt;I was then ready to run the INSERT version of this query so that the new triples would become part of my dataset. That is, I was ready to do the actual inferencing. The following similar query inserts those triples into their own &lt;code&gt;d:inferredData&lt;/code&gt; named graph so that when the time comes to update this redundant data, it will be simple to clean out these materialized triples.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX d: &amp;lt;http://localhost:3030/myDataset/data/&amp;gt;

INSERT { GRAPH d:inferredData  { ?s ?superProp ?o }}
WHERE
{
  { ?s ?p ?o }
  UNION
  { GRAPH ?g { ?s ?p ?o } }
  GRAPH d:model {?p rdfs:subPropertyOf ?superProp } .
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I used this next query to see if the triples I had seen with the recent CONSTRUCT query all got added to this new &lt;code&gt;d:inferredData&lt;/code&gt; graph by the INSERT request. They had:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX d: &amp;lt;http://localhost:3030/myDataset/data/&amp;gt;

SELECT ?s ?p ?o
WHERE
{ 
   GRAPH d:inferredData { ?s ?p ?o } 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;At this point I had integrated data from the two companies to conform to a common, standard model, and I could proceed with all the benefits of this arrangement as I queried across the two sets of employees by using the shared schema.&lt;/p&gt;
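&lt;p&gt;For example, this query (a sketch using the same prefixes as before) lists everyone from both companies using the shared schema.org property names:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX schema: &amp;lt;http://schema.org/&amp;gt;
PREFIX d: &amp;lt;http://localhost:3030/myDataset/data/&amp;gt;

SELECT ?given ?family ?phone
WHERE
{
  GRAPH d:inferredData
  {
    ?person schema:givenName  ?given ;
            schema:familyName ?family ;
            schema:telephone  ?phone .
  }
}
ORDER BY ?family
&lt;/code&gt;&lt;/pre&gt;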
&lt;p&gt;But, let&amp;rsquo;s say that Jane Smith changes her contact number from 546-700-2543 to 546-111-2222. This gets updated in the original company1 data in the &lt;code&gt;d:company1&lt;/code&gt; named graph with the following update request:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX d: &amp;lt;http://localhost:3030/myDataset/data/&amp;gt;
PREFIX c1d: &amp;lt;http://learningsparql.com/ns/company1/data#&amp;gt; 
PREFIX c1m: &amp;lt;http://learningsparql.com/ns/company1/model#&amp;gt; 

DELETE
{ GRAPH d:company1 { c1d:jane c1m:phone &amp;#34;546-700-2543&amp;#34; . } }
INSERT
{ GRAPH d:company1 { c1d:jane c1m:phone &amp;#34;546-111-2222&amp;#34; . } }
WHERE
{ GRAPH d:company1 { c1d:jane c1m:phone &amp;#34;546-700-2543&amp;#34; . } }
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;If I query the schema.org version of the data for Jane&amp;rsquo;s phone number I will still get her old one. This is easy enough to fix; first I blow away all the materialized triples,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX d: &amp;lt;http://localhost:3030/myDataset/data/&amp;gt;
DROP GRAPH d:inferredData
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;and then I regenerate up-to-date versions with the same INSERT command I used earlier. Problem solved. (If I have terabytes of triples of employee data, this DROP GRAPH followed by a new inferencing pass is the part that I&amp;rsquo;d do at 2AM each morning.)&lt;/p&gt;
&lt;h1 id=&#34;applying-these-steps&#34;&gt;Applying These Steps&lt;/h1&gt;
&lt;p&gt;I did all this by going to various Fuseki screens and pasting queries in. Fuseki has a nice feature in which after you run any query—even an update query—it shows you the URL and the &lt;a href=&#34;https://curl.haxx.se/&#34;&gt;curl&lt;/a&gt; command that would execute the same operation. This lets you string together these steps in a shell script that automates their execution, which would be handy for a production application. Instead of pasting all those queries into web forms I could just run that script, or, in the case of the 2AM updates, have a cron job run the script as I slept.&lt;/p&gt;
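&lt;p&gt;A minimal sketch of such a script, assuming that the DROP GRAPH and INSERT requests above are saved in files called &lt;code&gt;dropInferred.ru&lt;/code&gt; and &lt;code&gt;insertInferred.ru&lt;/code&gt; (those file names, and the update endpoint URL, are just my local choices, not anything Fuseki requires):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;#!/bin/sh
# refreshInferredData.sh: drop the materialized triples, then
# regenerate them from the current source data.
curl --data-urlencode &amp;#34;update@dropInferred.ru&amp;#34; http://localhost:3030/myDataset/update
curl --data-urlencode &amp;#34;update@insertInferred.ru&amp;#34; http://localhost:3030/myDataset/update
&lt;/code&gt;&lt;/pre&gt;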
&lt;p&gt;For a production application, there are a few other things I might change. For example, if there were millions of triples of company1 data and millions of triples of company2 data I might do the inferencing over just one or the other instead of everything at once. Assuming that they got updated on different schedules (because they are, after all, different companies) this would skip some unnecessary processing.&lt;/p&gt;
&lt;p&gt;The ultimate lesson is that while named graphs are not particularly popular in typical SPARQL usage, they can be useful for managing large collections of triples in which different sets of triples play different roles, and the materialization of inferred triples is one nice example.&lt;/p&gt;
&lt;p&gt;Are you using named graphs for any production application? Let me know at &lt;a href=&#34;https://twitter.com/bobdc&#34;&gt;@bobdc&lt;/a&gt; or at &lt;a href=&#34;https://twitter.com/learningsparql&#34;&gt;@learningsparql&lt;/a&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2020">2020</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Querying Wikidata for data that you just entered yourself</title>
      <link>https://www.bobdc.com/blog/editingwikidata/</link>
      <pubDate>Sun, 29 Mar 2020 12:08:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/editingwikidata/</guid>
      
      
      <description><div>After about four minutes.</div><div>&lt;p&gt;Last month in &lt;a href=&#34;../wd2so/&#34;&gt;Populating a Schema.org dataset from Wikidata&lt;/a&gt; I talked about pulling data out of Wikidata and using it to create Schema.org triples, and I hinted about the possibility of updating Wikidata data directly. The SPARQL fun of this is to then perform queries against Wikidata and to see your data edits reflected within a few minutes. I was pleasantly surprised at how quickly edits showed up in query results, so I thought I would demo it with a little video.&lt;/p&gt;
&lt;p&gt;I had hoped that a video of a single unbroken shot could show me edit some data and then query for it and see the  edits reflected. As it turned out, it wasn&amp;rsquo;t updated in the back end database quickly enough for that, so you don&amp;rsquo;t see the edit reflected in the query I made right after performing the edit in the video. As you&amp;rsquo;ll see in the screenshot below, the new data did show up about four minutes later.&lt;/p&gt;
&lt;p&gt;Here is my four-minute video that would have been about seven minutes if, after editing data and trying immediately to query Wikidata&amp;rsquo;s SPARQL endpoint for the new data, I had kept recording and kept querying until I saw the edit reflected in the query result.&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/HfpdS_5omi8&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;(One quick apology: not minding my Ps and Qs, I said &amp;ldquo;pname&amp;rdquo; at 1:01 when I meant to say &lt;a href=&#34;https://www.w3.org/TR/1999/REC-xml-names-19990114/#NT-QName&#34;&gt;qname&lt;/a&gt;, and even that wasn&amp;rsquo;t quite right; I was just talking about the URI&amp;rsquo;s local name, which would need a prefix to be a proper qname.)&lt;/p&gt;
&lt;p&gt;As you see in the video, I queried for Keith Richards&amp;rsquo; roles in the Rolling Stones, used the web interface to add &amp;ldquo;songwriter&amp;rdquo; as an additional role, and queried right away to see if this value showed up. It didn&amp;rsquo;t, and the &lt;code&gt;date&lt;/code&gt; command showed that I was checking this at 11:09:52.&lt;/p&gt;
&lt;p&gt;After I  finished recording the video I created a shell script called &lt;code&gt;temp1.sh&lt;/code&gt; with the curl command that sent the &lt;code&gt;kr.rq&lt;/code&gt; SPARQL query to Wikidata&amp;rsquo;s endpoint and a &lt;code&gt;date&lt;/code&gt; command to show when this happened. Once I saw that this two-line script worked, I added two more lines to make it a perpetual loop so that I could watch it and see what time &amp;ldquo;songwriter&amp;rdquo; showed up as one of Keith&amp;rsquo;s roles. As soon as I started up the looping version for the first time (11:13, as you can see below) it turned out that I didn&amp;rsquo;t need the loop: the available data was apparently updated just as I made that last edit to the script.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/keithQueryTerminal.jpg&#34; border=&#34;0&#34;  /&gt;
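&lt;p&gt;Reconstructing that looping script as a sketch (the 30-second pause and the exact ordering here are guesses, not necessarily what my script looked like), it was something like this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;#!/bin/sh
# Poll the Wikidata endpoint with the kr.rq query, timestamping each pass.
while true
do
  date
  curl --data-urlencode &amp;#34;query@kr.rq&amp;#34; -H &amp;#34;Accept: text/tab-separated-values&amp;#34; https://query.wikidata.org/bigdata/namespace/wdq/sparql
  sleep 30
done
&lt;/code&gt;&lt;/pre&gt;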
&lt;p&gt;Here is the query if you&amp;rsquo;d like to try it yourself:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# following two lines should be executed as one if you use curl for this:
# curl --data-urlencode &amp;#34;query@kr.rq&amp;#34; -H &amp;#34;Accept: text/tab-separated-values&amp;#34;  
# https://query.wikidata.org/bigdata/namespace/wdq/sparql

SELECT ?roleName WHERE {
  wd:Q189599 p:P361 ?roleStatement .          # Keith Richards has-role
  ?roleStatement rdf:type wikibase:BestRank ; # The best role statement!
                 pq:P2868 ?role .             # subject-has-role ?role.
  ?role rdfs:label ?roleName . 
  FILTER ( lang(?roleName) = &amp;#34;en&amp;#34; )
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I&amp;rsquo;m sure that sometimes it will take longer than four minutes and sometimes it may be quicker, but that&amp;rsquo;s not a lot of time to wait, and it was fun seeing how my edit to this wonderful growing database was available to a SPARQL query sent to the database&amp;rsquo;s endpoint just a few minutes later.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/angiesingle.jpg&#34; border=&#34;0&#34; width=&#34;280&#34; style=&#34;display: block; margin-left: auto; margin-right: auto; &#34; /&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2020">2020</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">sparql</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">wikidata</category>
      
    </item>
    
    <item>
      <title>Populating a Schema.org dataset from Wikidata</title>
      <link>https://www.bobdc.com/blog/wd2so/</link>
      <pubDate>Sun, 23 Feb 2020 11:00:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/wd2so/</guid>
      
      
      <description><div>Rock and Roll!</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/wp2so.jpg&#34; border=&#34;0&#34; width=&#34;400&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; /&gt;
&lt;p&gt;As the &lt;a href=&#34;https://schema.org/&#34;&gt;Schema.org&lt;/a&gt; vocabulary gets applied to more and more data and the data in &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:Main_Page&#34;&gt;Wikidata&lt;/a&gt; grows and grows, it&amp;rsquo;s only natural to think about the possibilities of creating Schema.org datasets that are populated from Wikidata.&lt;/p&gt;
&lt;p&gt;From the Wikidata side, the &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:Schema.org&#34;&gt;Wikidata:Schema.org&lt;/a&gt; page provides an excellent discussion of the relationship between the two efforts. To summarize some key points: Schema.org is structurally much simpler than Wikidata to ease adoption, but because Schema.org provides no entity identifiers (for example, identifiers for specific people and places) &amp;ldquo;Schema.org is considering to encourage the use of &lt;a href=&#34;https://www.wikidata.org/wiki/Q2013&#34;&gt;Wikidata&lt;/a&gt; as a common entity base for the target of the &lt;a href=&#34;http://schema.org/sameAs&#34;&gt;schema:sameAs&lt;/a&gt; relation (not to be confused with &lt;a href=&#34;https://www.w3.org/TR/owl-ref/#sameAs-def&#34;&gt;owl:sameAs&lt;/a&gt;).&amp;rdquo;&lt;/p&gt;
&lt;p&gt;From the Schema.org side, &lt;a href=&#34;https://github.com/schemaorg/schemaorg/issues/280&#34;&gt;https://github.com/schemaorg/schemaorg/issues/280&lt;/a&gt; has some discussion about the mapping of Schema.org to the Wikidata model. It&amp;rsquo;s mostly about modeling the relationships between common classes and properties—important tasks if you want to automate large-scale conversion between the two models. The &lt;a href=&#34;https://github.com/okfn-brasil/schemaOrg-Wikidata-Map/blob/master/docs/quering-Wikidata.md&#34;&gt;schemaOrg-Wikidata-Map&lt;/a&gt; page, &amp;ldquo;for issue-280&amp;rsquo;s working group subsidy and reference&amp;rdquo;, has some good ideas for creating those mappings.&lt;/p&gt;
&lt;p&gt;In a recent &lt;a href=&#34;https://twitter.com/danbri/status/1205210324435193856&#34;&gt;Twitter thread&lt;/a&gt; about Wikidata &lt;a href=&#34;https://www.twitter.com/danbri&#34;&gt;Dan Brickley&lt;/a&gt; asked me if I was &amp;ldquo;interested in cooking up clever queries to help slurp out subsets&amp;rdquo;. Yes! The query below pulls out almost 21,000 Wikidata triples of album and musician data for bands with a genre of rock and roll (or, in Wikidata terms, bands with a &lt;code&gt;wdt:P136&lt;/code&gt; of &lt;code&gt;wd:Q11399&lt;/code&gt;). Wikidata currently has this kind of data for about 530 bands.&lt;/p&gt;
&lt;p&gt;As with any mapping from one data model to another, some properties let you simply substitute a new name for an old name but others require judgment calls and some model traversal to get at what you want. I wanted to point out a domain-specific data model traversal issue I came across and a more general Wikidata one that will be an issue for people working with any data domain, not just rock and roll bands.&lt;/p&gt;
&lt;p&gt;The domain-specific issues are important because while there are dreams of a generalized mapping between Wikidata and Schema.org, these two schemas both cover so much territory that it&amp;rsquo;s just not feasible. Here is my small example: while the Kinks studio album &amp;ldquo;Face to Face&amp;rdquo; is an instance of &amp;ldquo;album&amp;rdquo; in Wikidata (&lt;code&gt;wd:Q675825 wdt:P31 wd:Q482994&lt;/code&gt;), the Rolling Stones studio album &amp;ldquo;Beggars Banquet&amp;rdquo; is an instance of studio album (&lt;code&gt;wd:Q339065 wdt:P31 wd:Q208569&lt;/code&gt;) which is a subclass of album (&lt;code&gt;wd:Q208569 wdt:P279 wd:Q482994&lt;/code&gt;), as are live album (&lt;a href=&#34;https://www.wikidata.org/wiki/Q209939&#34;&gt;&lt;code&gt;wd:Q209939&lt;/code&gt;&lt;/a&gt;) and compilation album (&lt;a href=&#34;https://www.wikidata.org/wiki/Q222910&#34;&gt;&lt;code&gt;wd:Q222910&lt;/code&gt;&lt;/a&gt;). Because of this, my query that pulls out Wikidata triples to convert to Schema.org must look for instances of album and instances of subclasses of album. If the SPARQL engine could do inferencing, I could just ask for instances of album, because an instance of a subclass is an instance of its superclass, but this SPARQL engine won&amp;rsquo;t do inferencing. Schema.org actually does have a &lt;a href=&#34;https://schema.org/MusicAlbumProductionType&#34;&gt;&lt;code&gt;schema:MusicAlbumProductionType&lt;/code&gt;&lt;/a&gt; class whose instances such as &lt;code&gt;schema:StudioAlbum&lt;/code&gt;, &lt;code&gt;schema:LiveAlbum&lt;/code&gt;, and &lt;code&gt;schema:CompilationAlbum&lt;/code&gt; could store this distinction between various types of albums, but this doesn&amp;rsquo;t change the fact that Wikidata lists the studio album &amp;ldquo;Beggars Banquet&amp;rdquo; as an instance of &amp;ldquo;studio album&amp;rdquo; but the studio album &amp;ldquo;Face to Face&amp;rdquo; as an instance of the studio album superclass &amp;ldquo;album&amp;rdquo;.  
(Coming soon: how to correct the Wikidata data!)&lt;/p&gt;
&lt;p&gt;Wikidata&amp;rsquo;s SPARQL engine has enough to do without doing inferencing; my query asks for a lot, and getting it to run in under 60 seconds to avoid a timeout took some rearrangement of triple patterns here and there to make it more efficient. I was surprised that I got away with including an OPTIONAL graph pattern and still kept everything under 60 seconds.&lt;/p&gt;
&lt;p&gt;The use of UNION also helped retrieve the albums despite their different relationships to the data model. You&amp;rsquo;ll see that I UNIONed a third expression in there, which brings me to a key aspect of the Wikidata data model that queries must deal with: &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/DataModel/Primer#Statements&#34;&gt;statements&lt;/a&gt;. Instead of having a triple saying that the work is an album, certain albums have triples saying that there are statements claiming that they are albums. (I&amp;rsquo;m not 100% sure about my wording describing the role of statements here and I&amp;rsquo;m open to correction.) This gives the query a bit more indirection to follow. Because Wikidata may have multiple statements about a topic, a query can request the highest ranked of these: we want the one that is an instance of  &lt;code&gt;wikibase:BestRank&lt;/code&gt;.&lt;/p&gt;
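&lt;p&gt;As a quick sketch of the difference, here are the direct form and the statement form of &amp;ldquo;this work is an album&amp;rdquo; as triple patterns, using the standard Wikidata prefixes:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# the direct claim:
?album wdt:P31 wd:Q482994 .             # instance of album

# the same claim through a statement node:
?album p:P31 ?statement .               # album has an instance-of statement
?statement ps:P31 wd:Q482994 ;          # ...whose value is album
           rdf:type wikibase:BestRank . # ...and which is highest ranked
&lt;/code&gt;&lt;/pre&gt;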
&lt;p&gt;Whether you&amp;rsquo;re modeling rock and roll bands or commodity prices, the structure of these statements and availability of classes such as &lt;code&gt;wikibase:BestRank&lt;/code&gt; will play a role in your programmatic access to Wikidata data. Removing the levels of indirection added by these statements will be typical of any mapping of Wikidata data to simpler models such as Schema.org. My query for band data also references Wikidata statements in order to request information about each album&amp;rsquo;s release date and each member&amp;rsquo;s role within the band—for example, that &lt;a href=&#34;https://www.wikidata.org/wiki/Q189599&#34;&gt;Keith Richards&lt;/a&gt; has the role &amp;ldquo;lead guitarist&amp;rdquo; with the Rolling Stones. (I would not rank this statement&amp;rsquo;s claim very highly; when Richards was paired with Brian Jones originally and with Ron Wood since 1976, the lack of clear lead and rhythm guitar roles was always an important part of the band&amp;rsquo;s sound, and when paired with Mick Taylor, Taylor was the lead guitarist.) Wikidata had minimal data about rock and roll band member roles, so I gingerly put the request in the OPTIONAL graph pattern mentioned above.&lt;/p&gt;
&lt;p&gt;Here is the query. Note the use of comments to explain the meaning of each cryptic Wikidata prefixed name for easier readability.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# rockAndRollBandData.rq: retrieve personnel and album data about
# bands with a genre of rock and roll from Wikidata and output triples
# that use the schema.org model.

# From the command line (but executed on a single line): 

# curl --data-urlencode &amp;#34;query@rockAndRollBandData.rq&amp;#34;  
#   -H &amp;#34;Accept: text/turtle&amp;#34; 
#   https://query.wikidata.org/bigdata/namespace/wdq/sparql

PREFIX schema: &amp;lt;http://schema.org/&amp;gt; 
PREFIX wd:     &amp;lt;http://www.wikidata.org/entity/&amp;gt;
PREFIX wdt:    &amp;lt;http://www.wikidata.org/prop/direct/&amp;gt;
PREFIX rdfs:   &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;

CONSTRUCT {
  
   ?band   a schema:MusicGroup ;
           schema:name ?bandName ; 
           schema:musicGroupMember ?member ;
           schema:albums ?album .   

   ?album  a schema:MusicAlbum ;
           schema:name ?albumTitle ;
           schema:datePublished ?releaseDate . 

   ?member schema:name ?memberName ;
           schema:roleName ?roleName .  

}
WHERE {
   ?band wdt:P136 wd:Q11399 ;            # band has genre of rock and roll
         rdfs:label ?bandName ;
         wdt:P527 ?member  .             # band has-part ?member
   FILTER ( lang(?bandName) = &amp;#34;en&amp;#34; )

   ?member rdfs:label ?memberName .
   FILTER ( lang(?memberName) = &amp;#34;en&amp;#34; )
   OPTIONAL {                                     # Member&amp;#39;s role. 
      ?member p:P361 ?roleStatement .             # part-of role statement.
      ?roleStatement rdf:type wikibase:BestRank ; # The best role statement!
                     pq:P2868 ?role .             # subject-has-role ?role.
      ?role rdfs:label ?roleName . 
      FILTER ( lang(?roleName) = &amp;#34;en&amp;#34; )
   }

   { ?album wdt:P31 wd:Q482994 . }       # instance of album (wd:Q482994)
   UNION
   { ?album wdt:P31 ?albumSubclass .     # or a subclass of that such as
     ?albumSubclass wdt:P279 wd:Q482994 . # live or compilation album
   }
   UNION 
   { ?album wdt:P31 ?albumSubclass .
     ?albumSubclass p:P279 ?albumClassStatement .   # subclass of
     ?albumClassStatement ps:P279 wd:Q482994 ;
                          rdf:type wikibase:BestRank . 
   }

   ?album wdt:P175 ?band ;                      # has performer
          rdfs:label ?albumTitle ;
          p:P577 ?releaseDateStatement .        # publication date   
  
   FILTER ( lang(?albumTitle) = &amp;#34;en&amp;#34; )

   ?releaseDateStatement ps:P577 ?releaseDate ; # release date as ISO 8601
          rdf:type wikibase:BestRank .          # Only the best!

}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I would provide a link to the results, but you can run it yourself with the &lt;a href=&#34;https://curl.haxx.se/&#34;&gt;curl&lt;/a&gt; command shown in the query&amp;rsquo;s header if you store the query in a file called &lt;code&gt;rockAndRollBandData.rq&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Once I had the Schema.org version it was fun to query that with queries that were much simpler than what would have been necessary with Wikidata. For example, the following asks this extracted data who has been a member of more than one band and what the bands were:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX schema: &amp;lt;http://schema.org/&amp;gt; 
PREFIX rdfs:   &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;

SELECT ?member ?group1 ?group2  WHERE {
  
  ?groupURI1 a schema:MusicGroup ;
             schema:name ?group1 ;
             schema:musicGroupMember ?memberURI . 

  ?groupURI2 a schema:MusicGroup ;
             schema:name ?group2 ;
             schema:musicGroupMember ?memberURI . 
  
  ?memberURI schema:name ?member .

  FILTER(?groupURI1 != ?groupURI2)
  
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It would make an interesting class project to retrieve a larger, more complex set of data from Wikidata and then map it to a model such as Schema.org. The coordination of the participants&amp;rsquo; activity (and triples) would be good work experience for everyone involved, and the project could result in something valuable to a particular domain&amp;rsquo;s community. This could include the development of procedures for the updating of their locally stored version as Wikidata evolves, as well as for updates to the source Wikidata data itself when there are gaps for that domain. (Again, coming soon: more on that latter issue!)&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re doing something like this on your own or with a group, let me know. I&amp;rsquo;d love to hear about it.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2020">2020</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">sparql</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">wikidata</category>
      
      <category domain="https://www.bobdc.com//categories/schema.org">schema.org</category>
      
    </item>
    
    <item>
      <title>One-click replacement of an IMDb page with the corresponding Wikipedia page</title>
      <link>https://www.bobdc.com/blog/imdb2wp/</link>
      <pubDate>Sun, 19 Jan 2020 11:03:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/imdb2wp/</guid>
      
      
      <description><div>With some Python, JavaScript, and of course, SPARQL.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/imdb2wp.png&#34; border=&#34;0&#34; width=&#34;400&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; /&gt;
&lt;p&gt;I &lt;a href=&#34;https://twitter.com/bobdc/status/1203428231484981248&#34;&gt;recently tweeted&lt;/a&gt; &amp;ldquo;I find that @imdb  is so crowded with ads that’s it’s easier to use Wikipedia to look up movies and actors and directors and their careers. And then there’s that Wikidata SPARQL endpoint!&amp;rdquo; Instead of just &lt;a href=&#34;https://quoteinvestigator.com/2017/03/19/candle/&#34;&gt;cursing the darkness&lt;/a&gt;, I decided to light a little SPARQL-Python-JavaScript candle, and it was remarkably easy.&lt;/p&gt;
&lt;p&gt;Drag this bookmarklet link to your browser&amp;rsquo;s bookmarks bar:  &lt;a href=&#39;javascript:function imdb2wp(currentURL) {newURL = currentURL.replace(/.+imdb.com\/.*?\/(.+?)\/.*/,&#34;http://learningsparql.com/cgi/imdb2wp.cgi?imdbID=$1&#34;); window.location.href = newURL; }; imdb2wp(location.href)&#39;&gt;imdb2wp&lt;/a&gt;. Then, when you&amp;rsquo;re looking at the IMDb page of a person, movie, or television show, the link should take you right to the Wikipedia page for that entity.&lt;/p&gt;
&lt;p&gt;The key to it all is the impressive number of non-Wikidata identifiers that Wikidata has been adding. If you look at the IMDb page of, for example, the movie &lt;a href=&#34;https://www.imdb.com/title/tt0064652/&#34;&gt;Medium Cool&lt;/a&gt;, in its URL of &lt;code&gt;https://www.imdb.com/title/tt0064652/&lt;/code&gt; you&amp;rsquo;ll see the movie&amp;rsquo;s IMDb identifier tt0064652. If you look at the movie&amp;rsquo;s &lt;a href=&#34;https://www.wikidata.org/wiki/Q1284125&#34;&gt;Wikidata page&lt;/a&gt;, you&amp;rsquo;ll see that IMDb ID stored there. You won&amp;rsquo;t see the URL of its English Wikipedia page, but that&amp;rsquo;s easy enough to look up with the IMDb ID in the following SPARQL query:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?wppage WHERE {
   ?subject wdt:P345 &amp;#39;tt0064652&amp;#39; .
   ?wppage schema:about ?subject .
   FILTER(contains(str(?wppage),&amp;#39;//en.wikipedia&amp;#39;))
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3Fwppage%20WHERE%20%7B%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%0A%3Fsubject%20wdt%3AP345%20%27tt0064652%27%20.%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%0A%20%20%3Fwppage%20schema%3Aabout%20%3Fsubject%20.%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%0A%20%20FILTER%28contains%28str%28%3Fwppage%29%2C%27%2F%2Fen.wikipedia%27%29%29%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%0A%7D&#34;&gt;Try it yourself.&lt;/a&gt; (Of course, a different filter condition can tell the query to find the corresponding Wikipedia page in language other than English.)&lt;/p&gt;
&lt;p&gt;How does the click on the browser&amp;rsquo;s bookmark bar execute the SPARQL query with the appropriate IMDb ID? Last August in &lt;a href=&#34;http://www.bobdc.com/blog/htmlform/&#34;&gt;Custom HTML form front end, SPARQL endpoint back end&lt;/a&gt; I wrote about an application in which the end user enters the name of a cocktail ingredient, clicks the search button, and then (after a SPARQL query asks Wikidata for drinks that have that ingredient) that user sees a web page displaying those drinks with links to their Wikipedia pages. This new script, in Python this time, is also a CGI script. It accepts a parameter, plugs that parameter into a SPARQL query, sends the query off to the Wikidata endpoint, and then uses the result to give users what they want:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#!/usr/bin/env python&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# imdb2wp.cgi:go to Wikipedia page for a movie or &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# person based on their IMDB ID value. Sample call:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# http://learningsparql.com/cgi/imdb2wp.cgi?imdbID=nm0000598&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; sys
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Following needed for hosted version to find SPARQLWrapper library&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;sys&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;path&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;append(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;/home/bobdc/lib/python/&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; SPARQLWrapper &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; SPARQLWrapper, JSON
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; cgi
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;form &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; cgi&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;FieldStorage() 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;imdbID &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; form&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;getvalue(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;imdbID&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;sparql &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; SPARQLWrapper(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://query.wikidata.org/sparql&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# SPARQL query of Wikidata asks for the Wikipedia &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# page of whatever has this IMDB ID.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;queryString &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;SELECT ?wppage WHERE {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;?subject wdt:P345 &amp;#39;IMDB-ID&amp;#39; . 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;  ?wppage schema:about ?subject .
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;  FILTER(contains(str(?wppage),&amp;#39;//en.wikipedia&amp;#39;))
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;queryString &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; queryString&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;replace(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;IMDB-ID&amp;#34;&lt;/span&gt;,imdbID)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;sparql&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;setQuery(queryString)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;sparql&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;setReturnFormat(JSON)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;try&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  results &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; sparql&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;query()&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;convert()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  requestGood &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;except&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;Exception&lt;/span&gt;, e:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  results &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; str(e)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  requestGood &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Content-type: text/html&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\n\n&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; requestGood &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#66d9ef&#34;&gt;False&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  print &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;h1&amp;gt;Problem communicating with the server&amp;lt;/h1&amp;gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  print &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;p&amp;gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; results &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;/p&amp;gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;elif&lt;/span&gt; (len(results[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;results&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;bindings&amp;#34;&lt;/span&gt;]) &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  print &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;p&amp;gt;No results found.&amp;lt;/p&amp;gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; result &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; results[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;results&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;bindings&amp;#34;&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    wppage &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; result[&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;wppage&amp;#34;&lt;/span&gt;][&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;value&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;meta http-equiv=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;Refresh&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt; content=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;0;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; wppage &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34; &lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Note how short the script is even with its comments, white space, and error handling. As its header comment tells us, the script is called as a web service. If you replace the sample call to it shown there with the Medium Cool ID of tt0064652, you&amp;rsquo;ll get the URL &lt;a href=&#34;http://learningsparql.com/cgi/imdb2wp.cgi?imdbID=tt0064652&#34;&gt;http://learningsparql.com/cgi/imdb2wp.cgi?imdbID=tt0064652&lt;/a&gt;, which, as you can see by clicking it, calls the script that sends you to the movie&amp;rsquo;s Wikipedia page. The script stores the passed value in an &lt;code&gt;imdbID&lt;/code&gt; variable and then inserts it into a query that looks just like the one hard-coded for &amp;ldquo;Medium Cool&amp;rdquo; above. Then, the script sends the query off to the Wikidata SPARQL endpoint.&lt;/p&gt;
&lt;p&gt;At a similar point in the Perl script that lists which cocktails have the entered ingredients, the script displays some HTML showing the results. The imdb2wp script does not render a page with results but instead sends back a &lt;a href=&#34;https://en.wikipedia.org/wiki/Meta_refresh&#34;&gt;meta refresh page&lt;/a&gt;. (I only recently learned that that was the actual name for these, and it is an excellent name.) This just sends the user to the Wikipedia page found by the SPARQL query.&lt;/p&gt;
&lt;p&gt;How does the single click call the CGI script? The &lt;a href=&#34;https://en.wikipedia.org/wiki/Bookmarklet&#34;&gt;bookmarklet&amp;rsquo;s&lt;/a&gt; URL is actually a bit of JavaScript that pulls the IMDb ID from the displayed page&amp;rsquo;s URL, appends it to &lt;a href=&#34;http://learningsparql.com/cgi/imdb2wp.cgi?imdbID=&#34;&gt;http://learningsparql.com/cgi/imdb2wp.cgi?imdbID=&lt;/a&gt;, and sends the browser off to the result. So to review:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When viewing an IMDb page, you click the bookmarklet.&lt;/li&gt;
&lt;li&gt;JavaScript in the bookmarklet calls the CGI script with the IMDb ID.&lt;/li&gt;
&lt;li&gt;The CGI script plugs the IMDb ID into a SPARQL query and uses that query to ask Wikidata for the entity&amp;rsquo;s Wikipedia URL.&lt;/li&gt;
&lt;li&gt;The CGI script redirects you to that URL.&lt;/li&gt;
&lt;/ol&gt;
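The bookmarklet itself is a snippet of JavaScript, but the ID extraction it performs is simple string work. A rough Python equivalent, just for illustration (the regex and function name here are mine, not taken from the actual bookmarklet), might look like this:

```python
import re

def extract_imdb_id(url):
    """Pull an IMDb title (tt...) or name (nm...) ID out of a page URL.

    Returns None if the URL does not look like an IMDb title or name page.
    """
    match = re.search(r"/(?:title|name)/((?:tt|nm)\d+)", url)
    return match.group(1) if match else None

# The bookmarklet appends the extracted ID to the CGI script's URL:
base = "http://learningsparql.com/cgi/imdb2wp.cgi?imdbID="
imdb_id = extract_imdb_id("https://www.imdb.com/title/tt0064652/")
redirect_url = base + imdb_id if imdb_id else None
```

In the real bookmarklet the same extraction and concatenation happen in the browser, and assigning the result to `window.location` is what sends you on your way.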
&lt;p&gt;SPARQL is just a query language—a syntax for describing what to do with a certain kind of data. The real value is in the data that we can query with SPARQL, and Wikidata is becoming more and more valuable. I found it surprisingly easy to use some otherwise old-fashioned (and standardized!) technologies to go from complaining about IMDb to actually doing something about the annoyance.&lt;/p&gt;
&lt;p&gt;This is just a taste of the many possibilities we’ll see from Wikidata&amp;rsquo;s storage of so many standard identifiers for real-world entities. Whatever domain you work in or want to work in, take a look at what kind of identifiers and other data Wikidata stores about that domain&amp;rsquo;s entities and you may very well be inspired to do something no one else has done in that domain by using SPARQL and scripting to mix and match that data with other data in that domain. Let me know if you do!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2020">2020</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">sparql</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">wikidata</category>
      
    </item>
    
    <item>
      <title>Ancient Mesopotamian metadata</title>
      <link>https://www.bobdc.com/blog/firstmetadata/</link>
      <pubDate>Sun, 29 Dec 2019 10:56:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/firstmetadata/</guid>
      
      
      <description><div>4,000 years old!</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/tabletBasketLabels.jpg&#34; border=&#34;0&#34; width=&#34;400&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Cuneiform tablet basket labels&#34;/&gt;
&lt;p&gt;In an &lt;a href=&#34;https://www.newyorker.com/magazine/2019/10/14/can-a-machine-learn-to-write-for-the-new-yorker&#34;&gt;October 14th article&lt;/a&gt; in the New Yorker about the use of Artificial Intelligence to generate prose, John Seabrook wrote: &amp;ldquo;A recent exhibition on the written word at the British Library dates the emergence of cuneiform writing to the fourth millennium B.C.E., in Mesopotamia&amp;rdquo;. That got me thinking about some notes I once took on the early history of metadata, and I wondered if there was any scholarship to show that the earliest metadata is as old as the earliest writing. Not quite, but cuneiform tablets of metadata from the early second millennium B.C.E. are still some pretty old metadata.&lt;/p&gt;
&lt;p&gt;First, how do I define &amp;ldquo;metadata&amp;rdquo;? The classic definition &amp;ldquo;data about data&amp;rdquo; is a bit vague; a movie review is data about data, but it&amp;rsquo;s not metadata. I would define metadata as data—ideally, structured data—recorded to aid in the navigation of other data. I was going to say &amp;ldquo;navigation and retrieval and maintenance&amp;rdquo;, but you can&amp;rsquo;t efficiently retrieve or maintain data that you have difficulty finding, so it all builds from navigation. As a working definition I think this covers most uses of metadata.&lt;/p&gt;
&lt;p&gt;I followed a footnote from the 2000 book &lt;a href=&#34;https://www.amazon.com/Great-Libraries-Antiquity-Renaissance/dp/1584560185&#34;&gt;The Great Libraries: From Antiquity to the Renaissance&lt;/a&gt; to the article &lt;a href=&#34;https://books.google.com/books/about/Archive_and_Library_Technique_in_Ancient.html?id=o5ENIgAACAAJ&#34;&gt;Archive and Library Technique in Ancient Mesopotamia&lt;/a&gt; published by Danish researcher Mogens Weitemeyer in the International Library Review journal &lt;a href=&#34;https://www.worldcat.org/title/libri-international-library-review/oclc/50370733&#34;&gt;Libri&lt;/a&gt; in 1956. The article&amp;rsquo;s main point is to explore the idea of a &amp;ldquo;library&amp;rdquo; as opposed to an &amp;ldquo;archive&amp;rdquo; as these terms may apply to a particular archaeological site. To describe one particular set of cuneiform tablets that led to a library vs. archive debate among scholars, Weitemeyer wrote&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Some small tablets from the III Dynasty of Ur (a few somewhat older) found in Lagash, Umma (Djoha), Puzurish-Dagan (Drehem), and Ur tell us about the way in which the archive tablets were stored. At the left edge of the small tablets there are two holes comparatively near each other. From one hole to the other extended a strand of reed (thin like bast), the impression of which is still clearly visible in the clay (Fig. 4b). By means of this reed-strand the small tablet was fastened to a container of tablets. This appears from the first line of the small tablet, which reads, in Sumerian, gá-dub-ba (dub=tablet, gá=container), i.e. tablet container. Hence, the small tablets were no doubt labels, attached to the receptacles and indicating their contents.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&amp;ldquo;Figure 4b&amp;rdquo; refers to the label tablet on the right in the picture above. Weitemeyer went on to point out how you can see the pattern from the basketwork in the tablet on the left of the picture. He also went on to say&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The labels first stated that the receptacle was a tablet basket; then followed information about the contents of the tablets, e.g. legal verdicts, accounts, receipts and expenses. At the end was an indication of the period covered; in most cases the period was one year, in some cases the beginning year (or month) and the finishing year (or month) were indicated.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Each small tablet had information about a larger dataset (the content of the container it was attached to) to help people determine whether the information they needed was in that container. Not only is this clearly metadata, but with the apparently regular practice of indicating the period covered by the referenced data at the end of the small tablet&amp;rsquo;s description, this metadata even has some structure to it. Recording the date range covered by a set of described data has continued to be a pretty classic piece of metadata, and with the &lt;a href=&#34;https://en.wikipedia.org/wiki/Third_Dynasty_of_Ur&#34;&gt;Third Dynasty of Ur&lt;/a&gt; being 4,000 years ago, that&amp;rsquo;s some pretty old structured metadata.&lt;/p&gt;
&lt;p&gt;I have been researching the history of metadata on and off for a few years and may write up some more of what I found in future blog entries. (The next stop would be Mycenaean Greece.) It has been fun to find that the idea of metadata, which we consider to be so modern today, has actually been around for literally thousands of years.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2019">2019</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>Avoiding accidental cross products in SPARQL queries</title>
      <link>https://www.bobdc.com/blog/crossproducts/</link>
      <pubDate>Sun, 17 Nov 2019 09:30:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/crossproducts/</guid>
      
      
      <description><div>Because one can sneak into your query when you didn&#39;t want  it.</div><div>&lt;blockquote class=&#34;pullquote&#34;&gt;Check the variables in your triple patterns that are connecting up sets of triples with other sets. They may not be doing a good job of it. &lt;/blockquote&gt;
&lt;p&gt;Have you ever written a SPARQL query that returned a suspiciously large number of results, especially with too many combinations of values? You may have accidentally requested a cross product. I have spent too much time debugging queries where this turned out to be the problem, so I wanted to talk about avoiding it.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s look at a simple example. The following RDF triples show the names of three people and the departments where they work:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;@prefix d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .

d:emp1 d:name &amp;#34;jane&amp;#34; .
d:emp2 d:name &amp;#34;joe&amp;#34; .
d:emp3 d:name &amp;#34;pat&amp;#34; .

d:emp1 d:dept &amp;#34;shipping&amp;#34; .
d:emp2 d:dept &amp;#34;receiving&amp;#34; .
d:emp3 d:dept &amp;#34;accounting&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The following SPARQL query attempts to list each person and their department:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt; 

SELECT ?name ?dept WHERE {
  ?employee d:name ?name .
  ?emp d:dept ?dept .
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The result of this query somehow shows that all the employees work in all the departments:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-------------------------
| name   | dept         |
=========================
| &amp;#34;pat&amp;#34;  | &amp;#34;shipping&amp;#34;   |
| &amp;#34;pat&amp;#34;  | &amp;#34;receiving&amp;#34;  |
| &amp;#34;pat&amp;#34;  | &amp;#34;accounting&amp;#34; |
| &amp;#34;jane&amp;#34; | &amp;#34;shipping&amp;#34;   |
| &amp;#34;jane&amp;#34; | &amp;#34;receiving&amp;#34;  |
| &amp;#34;jane&amp;#34; | &amp;#34;accounting&amp;#34; |
| &amp;#34;joe&amp;#34;  | &amp;#34;shipping&amp;#34;   |
| &amp;#34;joe&amp;#34;  | &amp;#34;receiving&amp;#34;  |
| &amp;#34;joe&amp;#34;  | &amp;#34;accounting&amp;#34; |
-------------------------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Why? Experienced SPARQL users probably already saw the problem: the query&amp;rsquo;s first triple pattern says &amp;ldquo;find any triples where the predicate is &lt;code&gt;d:name&lt;/code&gt; and store the subject in &lt;code&gt;?employee&lt;/code&gt; and the object in &lt;code&gt;?name&lt;/code&gt;&amp;rdquo;. The second triple pattern should ask for the department of any employee that we found in the first triple pattern (&lt;code&gt;?employee&lt;/code&gt;). Instead, it&amp;rsquo;s just asking for all triples with &lt;code&gt;d:dept&lt;/code&gt; as the predicate and binding the subject and object to the &lt;code&gt;?emp&lt;/code&gt; and &lt;code&gt;?dept&lt;/code&gt; variables, which have nothing to do with the first triple pattern. If the second triple pattern had used the variable name &lt;code&gt;?employee&lt;/code&gt; instead of &lt;code&gt;?emp&lt;/code&gt;, the query would have asked for resources that matched both triple patterns, and would have given this result:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-------------------------
| name   | dept         |
=========================
| &amp;#34;pat&amp;#34;  | &amp;#34;accounting&amp;#34; |
| &amp;#34;jane&amp;#34; | &amp;#34;shipping&amp;#34;   |
| &amp;#34;joe&amp;#34;  | &amp;#34;receiving&amp;#34;  |
-------------------------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I got three times as many results as I wanted because I created the new variable name &lt;code&gt;?emp&lt;/code&gt; when I should have re-used the existing &lt;code&gt;?employee&lt;/code&gt;. Avoiding such variable name sloppiness is why some programming languages force you to declare variables. It&amp;rsquo;s also why others that don&amp;rsquo;t, such as &lt;a href=&#34;https://www.typescriptlang.org/&#34;&gt;JavaScript&lt;/a&gt; and &lt;a href=&#34;https://perldoc.perl.org/strict.html&#34;&gt;Perl&lt;/a&gt;, offer optional add-ins that force this extra bit of housekeeping.&lt;/p&gt;
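The effect of sharing (or not sharing) a variable between the two triple patterns can be mimicked in a few lines of Python. The dictionaries below stand in for the employee triples above, and the join logic is a deliberately simplified sketch of the idea, not how a real SPARQL engine is implemented:

```python
# Toy illustration of why a shared variable turns a cross product into a join.
names = {"d:emp1": "jane", "d:emp2": "joe", "d:emp3": "pat"}
depts = {"d:emp1": "shipping", "d:emp2": "receiving", "d:emp3": "accounting"}

# Disjoint variables (?employee vs. ?emp): every name pairs with every dept.
cross_product = [(n, d) for n in names.values() for d in depts.values()]

# Shared variable (?employee in both patterns): pair only matching subjects.
join = [(names[s], depts[s]) for s in names if s in depts]

print(len(cross_product))  # 9
print(len(join))           # 3
```

Three names times three departments gives nine rows; requiring the same subject in both patterns gives the three rows we actually wanted.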
&lt;p&gt;When the Franz &lt;a href=&#34;https://allegrograph.com/&#34;&gt;AllegroGraph&lt;/a&gt; triplestore sees a cross product, it can issue a &lt;a href=&#34;https://franz.com/agraph/support/documentation/current/sparql-reference.html#query-warnings&#34;&gt;query warning&lt;/a&gt; called &lt;code&gt;warn-bgp-cross-product&lt;/code&gt;; I&amp;rsquo;ll bet that has saved their developers a lot of wasted time. The documentation for this warning has a nice summary of what causes cross products: &amp;ldquo;there are patterns in the query that have disjoint sets of variables which will cause the SPARQL engine to find all possible matches between the sets which can lead to very large solution sets&amp;rdquo;. (Some &lt;a href=&#34;https://www.cs.colostate.edu/~cs430dl/yr2016su/more_examples/Ch2/Relational%20algebra%20-%20cross%20product%20and%20natural%20join.pdf&#34;&gt;pdf&lt;/a&gt; class notes for a Colorado State University database class show how this works with relational databases.)&lt;/p&gt;
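The core of that "disjoint sets of variables" check can be sketched in Python: if two triple patterns share no variables, their solutions can only combine as a cross product. The pattern representation and function below are my own simplification for illustration, not AllegroGraph's implementation:

```python
# Sketch of the kind of check behind a cross-product warning: two triple
# patterns with no variable in common can only combine as a cross product.
def shares_variables(pattern_a, pattern_b):
    """Report whether two triple patterns have at least one variable in common."""
    vars_a = {term for term in pattern_a if term.startswith("?")}
    vars_b = {term for term in pattern_b if term.startswith("?")}
    return bool(vars_a & vars_b)

p1 = ("?employee", "d:name", "?name")
p2 = ("?emp", "d:dept", "?dept")       # disjoint from p1: cross product likely
p3 = ("?employee", "d:dept", "?dept")  # shares ?employee with p1: a join

print(shares_variables(p1, p2))  # False
print(shares_variables(p1, p3))  # True
```

A real engine has to consider whole groups of patterns connected through chains of shared variables, but the pairwise check captures the warning's basic idea.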
&lt;p&gt;In my example cross product above, note that the offending variable names are not mentioned in the SELECT statement and therefore are not in the results. I have found that this can add plenty to the time it takes to identify a cross product as the source of a problem, because these mismatched variables are like cogs that are not meshing together correctly deep inside a machine where you can&amp;rsquo;t see them very well. This is especially true in a larger, more complex query; my query above is a small toy example to make the problem as clear as possible.&lt;/p&gt;
&lt;p&gt;One larger, more complex query where this happened was the second SPARQL query in my &lt;a href=&#34;http://www.bobdc.com/blog/docembeddings/&#34;&gt;Document analysis with machine learning&lt;/a&gt; blog entry last month. Not only did it cost me extra hours of work; the results were so bloated that &lt;a href=&#34;https://jena.apache.org/documentation/query/index.html&#34;&gt;arq&lt;/a&gt; was running out of memory, so I started doing the query in &lt;a href=&#34;http://bobdc.com/blog/trying-out-blazegraph/&#34;&gt;Blazegraph&lt;/a&gt; instead. When I noticed the same cosine similarity figure coming up with dozens of recipe pairings, this was the first warning that I had a cross product problem, just like with the repetitive patterns of all employees working in all departments above. I had no problem running the query with arq once I found the mismatched variable names and straightened out the cross product problem.&lt;/p&gt;
&lt;p&gt;So, if you see such repetition and get suspicious, check the variables in your triple patterns that are connecting up sets of triples with other sets. They may not be doing a good job of it.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2019">2019</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Document analysis with machine learning</title>
      <link>https://www.bobdc.com/blog/docembeddings/</link>
      <pubDate>Sun, 27 Oct 2019 11:00:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/docembeddings/</guid>
      
      
      <description><div>Cookbook recipes!</div><div>&lt;blockquote class=&#34;pullquote&#34;&gt;For people doing digital humanities work, the possibilities in the document embeddings corner of the machine learning world look especially promising.&lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;ve been thinking about which machine learning tools can contribute the most to the field of &lt;a href=&#34;https://en.wikipedia.org/wiki/Digital_humanities&#34;&gt;digital humanities&lt;/a&gt;, and an obvious candidate is document embeddings. I&amp;rsquo;ll describe what these are below but I&amp;rsquo;ll start with the fun part: after using some document embedding Python scripts to compare the roughly 560  &lt;a href=&#34;https://en.wikibooks.org/wiki/Category:Recipes&#34;&gt;Wikibooks recipes&lt;/a&gt; to each other, I created an &lt;a href=&#34;http://www.bobdc.com/miscfiles/similarRecipes.html&#34;&gt;If you liked&amp;hellip;&lt;/a&gt; web page that shows, for each recipe, what other recipes were calculated to be most similar to that recipe.&lt;/p&gt;
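Once the pairwise similarity scores exist, building the "If you liked..." listing is a small sorting job. Here is a Python sketch with made-up scores and recipe names; in the real page the scores came from comparing the recipes' document embedding vectors:

```python
# Build an "If you liked..." listing from pairwise similarity scores.
# The scores and recipe names below are invented for illustration.
similarities = {
    ("Pancakes", "Waffles"): 0.91,
    ("Pancakes", "Goulash"): 0.22,
    ("Waffles", "Goulash"): 0.25,
}

def most_similar(recipe, scores, k=2):
    """Return up to k other recipes, most similar first."""
    pairs = []
    for (a, b), score in scores.items():
        if a == recipe:
            pairs.append((score, b))
        elif b == recipe:
            pairs.append((score, a))
    return [name for score, name in sorted(pairs, reverse=True)[:k]]

print(most_similar("Pancakes", similarities))  # ['Waffles', 'Goulash']
```

Running this for every recipe, with the top few matches kept for each, gives the structure of the generated web page.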
&lt;p&gt;In &lt;a href=&#34;http://www.bobdc.com/blog/semantic-web-semantics-vs-vect/&#34;&gt;Semantic web semantics vs. vector embedding machine learning semantics&lt;/a&gt; I wrote about how neural networks can assign vectors of values to words based on the relationships among words in a given text corpus. Once these word vectors are &amp;ldquo;embedded&amp;rdquo; in a common vector space, relationships between those vectors can reflect the semantics of the words. The classic examples are asking a system that has done this for a decent-sized corpus of English text &amp;ldquo;king is to queen as man is to what&amp;rdquo; or &amp;ldquo;London is to England as Berlin is to what&amp;rdquo;. By comparing the calculated vectors, it&amp;rsquo;s relatively easy for a system to answer &amp;ldquo;woman&amp;rdquo; to the first question and &amp;ldquo;Germany&amp;rdquo; to the second.&lt;/p&gt;
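The analogy arithmetic can be demonstrated with toy vectors. The four-dimensional vectors below are invented for the example (real word embeddings have hundreds of dimensions learned from a corpus), but the vector math is the same:

```python
import numpy as np

# Invented toy vectors for the "king is to queen as man is to ?" arithmetic.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.8, 0.1, 0.7]),
    "woman": np.array([0.1, 0.1, 0.8, 0.7]),
    "apple": np.array([0.5, 0.5, 0.5, 0.1]),
}

def cosine(a, b):
    """Cosine similarity of two vectors: 1.0 means pointing the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# king - queen ~= man - ?, so find the vector closest to man - king + queen.
target = vectors["man"] - vectors["king"] + vectors["queen"]
answer = max((w for w in vectors if w not in ("man", "king", "queen")),
             key=lambda w: cosine(vectors[w], target))
print(answer)  # woman
```

With real embeddings the arithmetic works the same way; the system answers "woman" or "Germany" because the offsets between related word vectors end up roughly parallel.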
&lt;p&gt;That post also mentioned how we can assign vectors to other things besides words. Plenty of code is available to generate and work with document embeddings, so I tried this with the &lt;a href=&#34;https://github.com/zalandoresearch/flair&#34;&gt;flair&lt;/a&gt; Python NLP framework available on GitHub. For an introduction to flair, I recommend the GitHub page&amp;rsquo;s tutorial and the article &lt;a href=&#34;https://towardsdatascience.com/text-classification-with-state-of-the-art-nlp-library-flair-b541d7add21f&#34;&gt;Text Classification with State of the Art NLP Library — Flair&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To generate document embedding vectors for the Wikibooks recipes and then compare them all with each other, I based my demo script below on the &lt;a href=&#34;https://github.com/swapnilg915/cosine_similarity_using_embeddings/blob/master/flair_embeddings.py&#34;&gt;flair&lt;/a&gt; example at the &lt;a href=&#34;https://github.com/swapnilg915/cosine_similarity_using_embeddings&#34;&gt;cosine_similarity_using_embeddings&lt;/a&gt; git repo. My demo shown here does just a few recipes, for reasons explained further down, and outputs RDF about the similarity scores it calculated so that I could perform SPARQL queries about those similarities.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#!/usr/bin/env python     &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Read Wikibook recipes, calculate document vectors for each, calculate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# all cosine similarity pairings, and output RDF about the result. Recipes&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# were downloaded from https://en.wikibooks.org/wiki/Category:Recipes, tags&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# were stripped, and then &amp;lt;title&amp;gt;&amp;lt;/title&amp;gt; and &amp;lt;url&amp;gt;&amp;lt;/url&amp;gt; tags added to each.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; glob
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; re
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; pickle
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; flair.embeddings &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; Sentence, StackedEmbeddings, FlairEmbeddings,WordEmbeddings
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; time
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; numpy &lt;span style=&#34;color:#66d9ef&#34;&gt;as&lt;/span&gt; np
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; regex &lt;span style=&#34;color:#66d9ef&#34;&gt;as&lt;/span&gt; re
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; nltk.corpus &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; stopwords
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; nltk.tokenize &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; word_tokenize
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;from&lt;/span&gt; nltk.stem &lt;span style=&#34;color:#f92672&#34;&gt;import&lt;/span&gt; WordNetLemmatizer
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Most of this code is based on&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# https://github.com/swapnilg915/cosine_similarity_using_embeddings/blob/master/flair_embeddings.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# initialize embeddings&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;glove_embedding &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; WordEmbeddings(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;glove&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;flair_embedding_forward &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; FlairEmbeddings(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;news-forward&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;flair_embedding_backward &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; FlairEmbeddings(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;news-backward&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;FlairEmbeddings&lt;/span&gt;(object):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; __init__(self):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            self&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;stop_words &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; list(stopwords&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;words(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;english&amp;#39;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            self&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;lemmatizer &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; WordNetLemmatizer()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            self&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;stacked_embeddings &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; StackedEmbeddings(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    embeddings&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;[flair_embedding_forward, flair_embedding_backward])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;word_token&lt;/span&gt;(self, tokens, lemma&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;False&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            tokens &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; str(tokens)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            tokens &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;sub(&lt;span style=&#34;color:#e6db74&#34;&gt;r&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;([\w].)([\~\!\@\#\$\%\^\&amp;amp;\*\(\)\-\+\[\]\{\}\/&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;\&amp;#39;\:\;])([\s\w].)&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\\&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;1 &lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\\&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;2 &lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\\&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;3&amp;#34;&lt;/span&gt;, tokens)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            tokens &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;sub(&lt;span style=&#34;color:#e6db74&#34;&gt;r&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;\s+&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34; &amp;#34;&lt;/span&gt;, tokens)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; lemma:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34; &amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;join([self&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;lemmatizer&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;lemmatize(token, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;v&amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; token &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; \
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                     word_tokenize(tokens&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;lower()) &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; token &lt;span style=&#34;color:#f92672&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; self&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;stop_words \
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                     &lt;span style=&#34;color:#f92672&#34;&gt;and&lt;/span&gt; token&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;isalpha()])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34; &amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;join([token &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; token &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; word_tokenize(tokens&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;lower()) &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; token &lt;span style=&#34;color:#f92672&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; \
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                                     self&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;stop_words &lt;span style=&#34;color:#f92672&#34;&gt;and&lt;/span&gt; token&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;isalpha()])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;cos_sim&lt;/span&gt;(self, a, b):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; np&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;inner(a, b) &lt;span style=&#34;color:#f92672&#34;&gt;/&lt;/span&gt; (np&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;linalg&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;norm(a) &lt;span style=&#34;color:#f92672&#34;&gt;*&lt;/span&gt; (np&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;linalg&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;norm(b)))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#a6e22e&#34;&gt;getFlairEmbedding&lt;/span&gt;(self, text):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            sentence &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; Sentence(text)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            self&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;stacked_embeddings&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;embed(sentence)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;return&lt;/span&gt; np&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;mean([np&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;array(token&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;embedding) &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; token &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; sentence], axis&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#################&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; __name__ &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;__main__&amp;#39;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    recipeDirectory &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#75715e&#34;&gt;# For this demo, just get the recipes whose titles begin with &amp;#34;J&amp;#34;. &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    filenameArray &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; glob&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;glob(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;/home/bob/temp/wprecipes/data/g-p/Cookbook:J*&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;# start: &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; time&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;strftime(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;%H:%M:%S&amp;#39;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    recipeDataArray &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; []   &lt;span style=&#34;color:#75715e&#34;&gt;# each entry will be an array with the following entries so&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#75715e&#34;&gt;# that they can be referenced like this: recipeDataArray[3][recipeTitleField]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    recipeTitleField &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    urlField &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    recipeField &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    recipeEmbeddingField &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    obj &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; FlairEmbeddings()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; file &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; filenameArray:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        recipeContent &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        input &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; open(file, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;r&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; line &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; input:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;title&amp;gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; line): 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                title &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;sub(&lt;span style=&#34;color:#e6db74&#34;&gt;r&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;^\s*&amp;lt;title&amp;gt;&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;,line)  &lt;span style=&#34;color:#75715e&#34;&gt;# Remove title tags. &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                title &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;sub(&lt;span style=&#34;color:#e6db74&#34;&gt;r&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\s*&amp;lt;/title&amp;gt;\s*&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;,title) 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            recipeContent &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; recipeContent &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; line
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; (&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;lt;url&amp;gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; line): 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                url &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;sub(&lt;span style=&#34;color:#e6db74&#34;&gt;r&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;^\s*&amp;lt;url&amp;gt;&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;,line)  &lt;span style=&#34;color:#75715e&#34;&gt;# Remove url tags.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                url &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; re&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;sub(&lt;span style=&#34;color:#e6db74&#34;&gt;r&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;\s*&amp;lt;/url&amp;gt;\s*&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;,url) 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#75715e&#34;&gt;##print(file + &amp;#39;: &amp;#39; + url)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        input&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;close()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        recipeDataArray&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;append([title,url,recipeContent])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;# starting to calculate embeddings: &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; time&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;strftime(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;%H:%M:%S&amp;#39;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#75715e&#34;&gt;# Calculate and save embeddings&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;for&lt;/span&gt; r &lt;span style=&#34;color:#f92672&#34;&gt;in&lt;/span&gt; recipeDataArray:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        recipeEmbedding &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; obj&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;getFlairEmbedding(r[recipeField])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        r&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;append(recipeEmbedding)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;# starting comparisons: &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; time&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;strftime(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;%H:%M:%S&amp;#39;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#75715e&#34;&gt;# output header of RDF&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;@prefix d: &amp;lt;http://learningsparql.com/data#&amp;gt; .&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;@prefix m: &amp;lt;http://learningsparql.com/model#&amp;gt; .&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;@prefix dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt; .&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#75715e&#34;&gt;# Find the cosine similarity of all the combinations&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    recipesToCompare &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; len(recipeDataArray)  &lt;span style=&#34;color:#75715e&#34;&gt;# or some small number for tests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    i1 &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#66d9ef&#34;&gt;while&lt;/span&gt; i1 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; recipesToCompare:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        title &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; recipeDataArray[i1][recipeTitleField]&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;replace(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#ae81ff&#34;&gt;\\&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#75715e&#34;&gt;# Output a triple with the recipe&amp;#39;s title. &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;lt;&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; recipeDataArray[i1][urlField] &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;gt;  dc:title &amp;#34;&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; title &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;#34; .&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        i2 &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; i1 &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#66d9ef&#34;&gt;while&lt;/span&gt; i2 &lt;span style=&#34;color:#f92672&#34;&gt;&amp;lt;&lt;/span&gt; recipesToCompare:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#75715e&#34;&gt;# output triples like [ m:doc recipeN, recipeN+1 ; m:recipeCosineSim 0.8249611 ] &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            recipeCosineSim &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; \
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            obj&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;cos_sim(recipeDataArray[i1][recipeEmbeddingField],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        recipeDataArray[i2][recipeEmbeddingField])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;[ m:doc &amp;lt;&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; recipeDataArray[i1][urlField] &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                  &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;gt;, &amp;lt;&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; recipeDataArray[i2][urlField] &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                  &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;&amp;gt; ; m:recipeCosineSim &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; str(recipeCosineSim) &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39; ] . &amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            i2 &lt;span style=&#34;color:#f92672&#34;&gt;+=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        i1 &lt;span style=&#34;color:#f92672&#34;&gt;+=&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;print(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;# finished: &amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;+&lt;/span&gt; time&lt;span style=&#34;color:#f92672&#34;&gt;.&lt;/span&gt;strftime(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;%H:%M:%S&amp;#39;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;On my Dell XPS 13 9350 laptop it took about 30 seconds to calculate each embedding. For 546 recipes that works out to several hours, and my laptop was running very hot after the first half hour.  (This got &lt;a href=&#34;https://www.youtube.com/watch?v=SGyOaCXr8Lw#t=0m31s&#34;&gt;Start Me Up&lt;/a&gt; completely stuck in my head throughout the experiment.) The script above demonstrates the steps at a small scale, but to create the full &amp;ldquo;If you liked&amp;hellip;&amp;rdquo; recipe comparison page I did the following. (You can find the scripts and query results for this in &lt;a href=&#34;https://github.com/bobdc/misc/tree/master/recipeDocEmbeddings&#34;&gt;my own&lt;/a&gt; github repository.)&lt;/p&gt;
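For the record, the arithmetic behind "several hours" is simple; the 30-second figure is the rough per-recipe average from the runs above:

```python
# Rough runtime estimate: about 30 seconds per embedding, 546 recipes.
seconds_per_embedding = 30
recipe_count = 546

total_hours = seconds_per_embedding * recipe_count / 3600
print(f'{total_hours:.2f} hours')  # 4.55 hours
```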
&lt;p&gt;Instead of reading all the recipes, calculating their embeddings, and calculating their similarities in one run, I split the script above in half. The first half performed three steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Read a third of the recipes. Without the &amp;ldquo;J&amp;rdquo; in &lt;code&gt;data/g-p/Cookbook:J*&lt;/code&gt; above, that&amp;rsquo;s the middle third of the recipe collection; the other two thirds were in &lt;code&gt;data/a-f&lt;/code&gt; and &lt;code&gt;data/q-z&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Calculate embeddings for each recipe.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Store the resulting array in a &lt;a href=&#34;https://docs.python.org/2/library/pickle.html&#34;&gt;Python pickle file&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
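Step 3 is just a `pickle.dump()` call; here's a minimal sketch of the save-and-reload round trip. The filename and the single stand-in entry are made up for illustration, and in the real run the embedding is the numpy array returned by `getFlairEmbedding`:

```python
import pickle

import numpy as np

# One stand-in [title, url, recipe text, embedding] entry; the real
# embedding comes from getFlairEmbedding in the script above.
recipeDataArray = [
    ['Cookbook:Jambalaya',
     'https://en.wikibooks.org/wiki/Cookbook:Jambalaya',
     'recipe text goes here',
     np.array([0.1, 0.2, 0.3])],
]

# Save the batch so a later script can skip the slow embedding step.
with open('recipes-g-p.pkl', 'wb') as f:
    pickle.dump(recipeDataArray, f)

# The second script reads it back like this:
with open('recipes-g-p.pkl', 'rb') as f:
    reloaded = pickle.load(f)
```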
&lt;p&gt;(If I pursue this further I plan to do all of that in one batch on an AWS EC2 instance. Machine learning in the cloud is a topic you hear about often, and when you can fry an egg on your own laptop it starts to look especially appealing.)&lt;/p&gt;
&lt;p&gt;After running that first script on the three batches of recipes, my second script read the pickle files that the three runs created into one big &lt;code&gt;recipeDataArray&lt;/code&gt; array and then did the &amp;ldquo;Find the cosine similarity of all the combinations&amp;rdquo; part of the script above on that array. Even with 546 recipes, that only took two seconds. It&amp;rsquo;s nice to know that although a linear increase in the number of documents to compare means a quadratic increase in the number of comparisons to make, the calculation of each pair&amp;rsquo;s similarity is so quick that the quadratic increase is not a big deal, at least at this scale. (Some of the embedding vectors didn&amp;rsquo;t come out because, according to the error messages, the input was apparently an empty string. This resulted in about 1% of the cosine similarity figures coming out as &amp;ldquo;nan&amp;rdquo;, or Not a Number, values. If I were doing this for a paying client I would find the input that caused these problems and do something about it, but for a fun personal demo I just removed the offending vectors before moving on to the next step.)&lt;/p&gt;
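The merge-and-clean step of that second script can be sketched like this. The pickle filename pattern is made up, and the NaN check is one way to detect the bad vectors, not necessarily how the original script did it:

```python
import glob
import pickle

import numpy as np

recipeEmbeddingField = 3  # same field layout as the script above

def load_batches(pattern):
    # Read each batch's pickle file into one big array.
    merged = []
    for name in sorted(glob.glob(pattern)):
        with open(name, 'rb') as f:
            merged.extend(pickle.load(f))
    return merged

def drop_bad_embeddings(recipeDataArray):
    # Remove entries whose embedding came out as NaN values.
    return [r for r in recipeDataArray
            if not np.isnan(r[recipeEmbeddingField]).any()]

# A linear increase in documents means a quadratic increase in pairings:
n = 546
print(n * (n - 1) // 2)  # 148785 cosine similarity calculations
```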
&lt;p&gt;After this script output RDF about the recipe similarities I could then explore the results. The following excerpt from that RDF gives you the flavor of what the queries had to work with. Each comparison is a blank node connecting up information about what two documents were compared and their comparison score. The &lt;code&gt;dc:title&lt;/code&gt; triples show the actual titles of recipes:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;[ m:doc 
  &amp;lt;https://en.wikibooks.org/wiki/Cookbook:Apple_Raisin_Oat_Muffins&amp;gt;,
  &amp;lt;https://en.wikibooks.org/wiki/Cookbook:Chewy_Ginger_Cookies&amp;gt; ; 
  m:recipeCosineSim 0.90684676 ] . 
  
&amp;lt;https://en.wikibooks.org/wiki/Cookbook:Apple_Raisin_Oat_Muffins&amp;gt; 
   dc:title &amp;#34;Cookbook:Apple Raisin Oat Muffins&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The following SPARQL query lists all the pairings in ascending order of cosine similarity:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX m: &amp;lt;http://learningsparql.com/model#&amp;gt; 
PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt; 

SELECT ?score ?title1 ?title2 WHERE {
  ?comparison m:doc ?recipe1, ?recipe2 ;
              m:recipeCosineSim ?score .
  ?recipe1 dc:title ?title1 .
  ?recipe2 dc:title ?title2 .
  FILTER (?recipe1 != ?recipe2)
}
ORDER BY ?score
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It turns out that the two most similar recipes are &lt;a href=&#34;https://en.wikibooks.org/wiki/Cookbook:Wonton_Soup&#34;&gt;Wonton Soup&lt;/a&gt; and the &lt;a href=&#34;https://en.wikibooks.org/wiki/Cookbook:Egg_Roll&#34;&gt;Egg Roll&lt;/a&gt; recipe, with a cosine similarity score of 0.9928804. The pairing of &lt;a href=&#34;https://en.wikibooks.org/wiki/Cookbook:Pork_Pot_Pie&#34;&gt;Pork Pot Pie&lt;/a&gt; and &lt;a href=&#34;https://en.wikibooks.org/wiki/Cookbook:Chicken_Pot_Pie_II&#34;&gt;Chicken Pot Pie II&lt;/a&gt; came in second. (I was relieved to see that there was no Chicken Pot Pie I, because if there had been and it wasn&amp;rsquo;t more similar to its sequel than the Pork Pot Pie, then the whole model&amp;rsquo;s ability to determine similarities would be much more questionable. As you&amp;rsquo;ll see below, it&amp;rsquo;s questionable anyway, but there are actions I can take to try to improve it.)&lt;/p&gt;
&lt;p&gt;A slight variation on the above query created the basis of my &amp;ldquo;If you liked&amp;hellip;&amp;rdquo; page. It sorts the results by recipe title and then by descending similarity score, so the recipes most similar to each one appear first.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX m: &amp;lt;http://learningsparql.com/model#&amp;gt; 
PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;
PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;	

# A comparison object looks like this:
# [ m:doc &amp;lt;https://en.wikibooks.org/wiki/Cookbook:Apple_Raisin_Oat_Muffins&amp;gt;, 
#   &amp;lt;https://en.wikibooks.org/wiki/Cookbook:Adobo&amp;gt; ; m:recipeCosineSim 0.8590696 ] .

SELECT ?score ?doc1URL ?doc1title ?doc2URL ?doc2title WHERE {
   ?comparison m:doc ?doc1URL, ?doc2URL; m:recipeCosineSim ?score .
   ?doc1URL dc:title ?doc1title .
   ?doc2URL dc:title ?doc2title .
  
  FILTER(?doc1URL != ?doc2URL)
  
  # Experimenting with the cutoff value led to the figure of .975:
  # .92: 189970 result lines; .95: 76866; .97: 8562; .98: 544; .975: 2604
  FILTER(?score &amp;gt; .975)
}
ORDER BY ?doc1title desc(?score)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;When I ran this with &lt;a href=&#34;https://jena.apache.org/documentation/query/index.html&#34;&gt;arq&lt;/a&gt; I asked it for tab-separated value output, and I wrote a perl script to convert that to HTML. I also did a little hand-editing at the top of the HTML file to add the introduction, and that&amp;rsquo;s what you see at the  &lt;a href=&#34;http://www.bobdc.com/miscfiles/similarRecipes.html&#34;&gt;If you liked&amp;hellip;&lt;/a&gt; page. This web page is hopefully useful because you can easily look up a given recipe and find out which others are most similar to it. All recipe names there are links to the original recipes; their URLs were easy to store throughout the various steps because I treated each recipe&amp;rsquo;s URL as the URI for that document. This whole Linked Data thing is pretty useful sometimes!&lt;/p&gt;
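The actual conversion was a perl script, but the TSV-to-HTML idea can be sketched in a few lines of Python. The column layout and the markup here are my guesses, not the real script:

```python
import html

# Hypothetical rows in the shape of arq's tab-separated output:
# score, first recipe title, second recipe title.
tsv = (
    "0.9928804\tCookbook:Wonton Soup\tCookbook:Egg Roll\n"
    "0.9912\tCookbook:Pork Pot Pie\tCookbook:Chicken Pot Pie II\n"
)

rows = [line.split("\t") for line in tsv.splitlines()]
items = "\n".join(
    f"<li>{html.escape(t1)} &rarr; {html.escape(t2)} ({score})</li>"
    for score, t1, t2 in rows
)
page = f"<html><body><ul>\n{items}\n</ul></body></html>"
```

In the real page each title would also carry its recipe URL as a link, which is easy because the URL travels with the title through every step.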
&lt;p&gt;You can find the perl script and queries, along with the scripts I used to pull down and prepare the recipe data, in the git repository I created.&lt;/p&gt;
&lt;p&gt;A great feature of the Wikibooks recipe collection is how many different cuisines are represented, so I was hoping for some interesting cross-cultural pairings, but so many of the pairings make so little sense that it&amp;rsquo;s difficult to take the non-obvious ones seriously. This starts with the very first one on the list: I have no idea why it rates &lt;a href=&#34;https://en.wikibooks.org/wiki/Cookbook:Macaroni_and_Cheese&#34;&gt;Macaroni and Cheese&lt;/a&gt; as the closest thing to &lt;a href=&#34;https://en.wikibooks.org/wiki/Cookbook:B%C3%A1nh_M%C3%AC&#34;&gt;Bánh mì&lt;/a&gt;.  Ranking &lt;a href=&#34;https://en.wikibooks.org/wiki/Cookbook:Guacamole&#34;&gt;Guacamole&lt;/a&gt; as very similar to &lt;a href=&#34;https://en.wikibooks.org/wiki/Cookbook:A_Nice_Cup_of_Tea&#34;&gt;A Nice Cup of Tea&lt;/a&gt; is even worse.&lt;/p&gt;
&lt;p&gt;This reminds us that, as VCs throw their money at AI startups that promise easy, plug-and-play machine learning, we must remember what machine learning people call the &amp;ldquo;no free lunch&amp;rdquo; principle: no single model is going to do everything well. Getting good results means tweaking parameters for how tools do their work, and knowing what tweaks to make requires some study.&lt;/p&gt;
&lt;p&gt;I have ideas for experiments to get the cosine similarity scores to make more intuitive sense. Many of the values set in the &amp;ldquo;initialize embeddings&amp;rdquo; and &amp;ldquo;class FlairEmbeddings&amp;rdquo; sections of the script above were one set of choices among several possibilities. (I didn&amp;rsquo;t make those choices myself; I just copied them from other examples.) For example, instead of using news-forward and news-backward as the character-level language models, I could have selected from the other choices described in the &lt;a href=&#34;https://github.com/zalandoresearch/flair/blob/master/flair/embeddings.py&#34;&gt;source code&lt;/a&gt;. For the word embeddings that the flair document embeddings build on, the same source code shows alternatives to &lt;a href=&#34;https://nlp.stanford.edu/projects/glove/&#34;&gt;GloVe&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;But the time it takes to calculate all of those document embeddings makes it difficult to churn quickly through different combinations of initialization settings. I started up a few small, cheap AWS instances from &lt;a href=&#34;https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html&#34;&gt;Amazon Machine Images&lt;/a&gt; and was unable to install flair on them, so my next line of research is to keep looking for an image that works for this. (I&amp;rsquo;d appreciate any suggestions&amp;hellip;)&lt;/p&gt;
&lt;p&gt;Still, the fact that I could take someone else&amp;rsquo;s &lt;a href=&#34;https://github.com/swapnilg915/cosine_similarity_using_embeddings/blob/master/flair_embeddings.py&#34;&gt;63-line script&lt;/a&gt;, modify it a bit, and use machine learning to create an HTML index of recipe similarity for a good-sized yet diverse cookbook shows that getting and then tweaking such tools is within reach for anyone with a collection of documents. For people doing digital humanities work, the possibilities in the document embeddings corner of the machine learning world look especially promising.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2019">2019</category>
      
      <category domain="https://www.bobdc.com//categories/ai-and-machine-learning">AI and machine learning</category>
      
      <category domain="https://www.bobdc.com//categories/digital-humanities">digital humanities</category>
      
    </item>
    
    <item>
      <title>Converting JSON to RDF</title>
      <link>https://www.bobdc.com/blog/json2rdf/</link>
      <pubDate>Sun, 22 Sep 2019 11:00:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/json2rdf/</guid>
      
      
      <description><div>Any JSON at all.</div><div>&lt;blockquote class=&#34;pullquote&#34;&gt;The real payoff of easy conversion of JSON to RDF is the ease with which you can then integrate that data with other datasets.&lt;/blockquote&gt;
&lt;p&gt;When I was at &lt;a href=&#34;https://www.topquadrant.com&#34;&gt;TopQuadrant&lt;/a&gt;, I learned that their &lt;a href=&#34;https://sparqlmotion.org&#34;&gt;SPARQLMotion&lt;/a&gt; scripting language had a module that could convert JSON to RDF. This had nothing to do with &lt;a href=&#34;http://www.bobdc.com/blog/json-ld/&#34;&gt;JSON-LD&lt;/a&gt;—it worked with any JSON at all, using blank nodes to indicate the grouping of data within arbitrary structures.&lt;/p&gt;
&lt;p&gt;Because this tool is only available to paying TopQuadrant customers (or those in the first 30 days of the trial version of TopBraid Composer Maestro Edition), I&amp;rsquo;ve kept my eye out for a free tool that would do this, and I was happy to see AtomGraph&amp;rsquo;s &lt;a href=&#34;https://github.com/AtomGraph/JSON2RDF&#34;&gt;JSON2RDF&lt;/a&gt; on github. I had to build the binary myself, but this was easy enough after a quick install of the &lt;a href=&#34;https://maven.apache.org&#34;&gt;maven&lt;/a&gt; build tool. As the JSON2RDF github readme file tells us, &lt;code&gt;mvn clean install&lt;/code&gt; is all you need to build a jar file. A &lt;a href=&#34;https://www.docker.com&#34;&gt;Docker&lt;/a&gt; image is also available.&lt;/p&gt;
&lt;p&gt;I could then run it on a &lt;code&gt;myinput.json&lt;/code&gt; input file to create a &lt;code&gt;myoutput.ttl&lt;/code&gt; file with this command line:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;java -jar json2rdf-1.0.0-SNAPSHOT-jar-with-dependencies.jar http://example.com/test# &amp;lt; myinput.json &amp;gt; myoutput.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;As you&amp;rsquo;ll see in the sample output below, the converter uses the URL provided in the command line as the base URI for the properties in the output.&lt;/p&gt;
&lt;p&gt;To test it I ran that command line using the following handmade JSON as input:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{ 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;color&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;red&amp;#34;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;amount&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;,                      
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;arrayTest&amp;#34;&lt;/span&gt;: [&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;north&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;south&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;east&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;west&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#ae81ff&#34;&gt;3&lt;/span&gt;,&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;escaped \/string&amp;#34;&lt;/span&gt;],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;boolTest&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#66d9ef&#34;&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;nullTest&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#66d9ef&#34;&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;addressBookEntry&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;first&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Richard&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;last&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Mutt&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;address&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;street&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;1 Main St&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;city&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Springfield&amp;#34;&lt;/span&gt; ,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;zip&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;10045&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Here is the output that AtomGraph&amp;rsquo;s JSON2RDF created:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;_:B6bba &amp;lt;http://example.com/test#color&amp;gt; &amp;#34;red&amp;#34; .
_:B6bba &amp;lt;http://example.com/test#amount&amp;gt; &amp;#34;3&amp;#34;^^&amp;lt;http://www.w3.org/2001/XMLSchema#int&amp;gt; .
_:B6bba &amp;lt;http://example.com/test#arrayTest&amp;gt; &amp;#34;north&amp;#34; .
_:B6bba &amp;lt;http://example.com/test#arrayTest&amp;gt; &amp;#34;south&amp;#34; .
_:B6bba &amp;lt;http://example.com/test#arrayTest&amp;gt; &amp;#34;east&amp;#34; .
_:B6bba &amp;lt;http://example.com/test#arrayTest&amp;gt; &amp;#34;west&amp;#34; .
_:B6bba &amp;lt;http://example.com/test#arrayTest&amp;gt; &amp;#34;3&amp;#34;^^&amp;lt;http://www.w3.org/2001/XMLSchema#int&amp;gt; .
_:B6bba &amp;lt;http://example.com/test#arrayTest&amp;gt; &amp;#34;escaped /string&amp;#34; .
_:B6bba &amp;lt;http://example.com/test#boolTest&amp;gt; &amp;#34;true&amp;#34;^^&amp;lt;http://www.w3.org/2001/XMLSchema#boolean&amp;gt; .
_:B6bba &amp;lt;http://example.com/test#addressBookEntry&amp;gt; _:Bcd68.

_:Bcd68 &amp;lt;http://example.com/test#first&amp;gt; &amp;#34;Richard&amp;#34; .
_:Bcd68 &amp;lt;http://example.com/test#last&amp;gt; &amp;#34;Mutt&amp;#34; .
_:Bcd68 &amp;lt;http://example.com/test#address&amp;gt; _:B9a02 .

_:B9a02 &amp;lt;http://example.com/test#street&amp;gt; &amp;#34;1 Main St&amp;#34; .
_:B9a02 &amp;lt;http://example.com/test#city&amp;gt; &amp;#34;Springfield&amp;#34; .
_:B9a02 &amp;lt;http://example.com/test#zip&amp;gt; &amp;#34;10045&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(To make it easier to read on this page I replaced the original blank node identifiers created by JSON2RDF with shorter versions and added two carriage returns.) You can see that the converter handled the data types, the escaped string, and the nested structures just fine. This output also provides a nice lesson in how, although the simplicity of the RDF data model means that any data collection is a flat list of triples, you can still represent more complex data structures with very little trouble.&lt;/p&gt;
&lt;p&gt;That was a hand-curated example. To test it on something from the wild, I grabbed the following from the &lt;a href=&#34;https://www.mongodb.com/json-and-bson&#34;&gt;JSON and BSON&lt;/a&gt; page of mongodb.com:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;_id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;name&amp;#34;&lt;/span&gt; : { &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;first&amp;#34;&lt;/span&gt; : &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;John&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;last&amp;#34;&lt;/span&gt; : &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Backus&amp;#34;&lt;/span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;contribs&amp;#34;&lt;/span&gt; : [ &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Fortran&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;ALGOL&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Backus-Naur Form&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;FP&amp;#34;&lt;/span&gt; ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;awards&amp;#34;&lt;/span&gt; : [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;award&amp;#34;&lt;/span&gt; : &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;W.W. McDowell Award&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;year&amp;#34;&lt;/span&gt; : &lt;span style=&#34;color:#ae81ff&#34;&gt;1967&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;by&amp;#34;&lt;/span&gt; : &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;IEEE Computer Society&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }, {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;award&amp;#34;&lt;/span&gt; : &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Draper Prize&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;year&amp;#34;&lt;/span&gt; : &lt;span style=&#34;color:#ae81ff&#34;&gt;1993&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;by&amp;#34;&lt;/span&gt; : &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;National Academy of Engineering&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;JSON2RDF turned it into this (again, with blank node identifiers replaced with shorter versions for easier reading):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;_:Bcd72 &amp;lt;http://example.com/test#_id&amp;gt; &amp;#34;1&amp;#34;^^&amp;lt;http://www.w3.org/2001/XMLSchema#int&amp;gt; .
_:Bcd72 &amp;lt;http://example.com/test#name&amp;gt; _:Be87 .
_:Be87 &amp;lt;http://example.com/test#first&amp;gt; &amp;#34;John&amp;#34; .
_:Be87 &amp;lt;http://example.com/test#last&amp;gt; &amp;#34;Backus&amp;#34; .
_:Bcd72 &amp;lt;http://example.com/test#contribs&amp;gt; &amp;#34;Fortran&amp;#34; .
_:Bcd72 &amp;lt;http://example.com/test#contribs&amp;gt; &amp;#34;ALGOL&amp;#34; .
_:Bcd72 &amp;lt;http://example.com/test#contribs&amp;gt; &amp;#34;Backus-Naur Form&amp;#34; .
_:Bcd72 &amp;lt;http://example.com/test#contribs&amp;gt; &amp;#34;FP&amp;#34; .
_:Bcd72 &amp;lt;http://example.com/test#awards&amp;gt; _:Bbc13 .
_:Bbc13 &amp;lt;http://example.com/test#award&amp;gt; &amp;#34;W.W. McDowell Award&amp;#34; .
_:Bbc13 &amp;lt;http://example.com/test#year&amp;gt; &amp;#34;1967&amp;#34;^^&amp;lt;http://www.w3.org/2001/XMLSchema#int&amp;gt; .
_:Bbc13 &amp;lt;http://example.com/test#by&amp;gt; &amp;#34;IEEE Computer Society&amp;#34; .
_:Bcd72 &amp;lt;http://example.com/test#awards&amp;gt; _:Ba9d .
_:Ba9d &amp;lt;http://example.com/test#award&amp;gt; &amp;#34;Draper Prize&amp;#34; .
_:Ba9d &amp;lt;http://example.com/test#year&amp;gt; &amp;#34;1993&amp;#34;^^&amp;lt;http://www.w3.org/2001/XMLSchema#int&amp;gt; .
_:Ba9d &amp;lt;http://example.com/test#by&amp;gt; &amp;#34;National Academy of Engineering&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I ran this SPARQL query against those triples to find awards from after 1990,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX e: &amp;lt;http://example.com/test#&amp;gt;

SELECT ?awardName ?year WHERE {
   ?award e:year ?year ;
  e:award ?awardName  .
  
  FILTER (?year &amp;gt; 1990)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;and got this result:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-------------------------------------------------------------------
| awardName      | year                                           |
===================================================================
| &amp;#34;Draper Prize&amp;#34; | &amp;#34;1993&amp;#34;^^&amp;lt;http://www.w3.org/2001/XMLSchema#int&amp;gt; |
-------------------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This is still a rather artificial example. Before converting that JSON about John Backus I could have just queried it directly with a tiny bit of JavaScript or an even tinier &lt;a href=&#34;https://stedolan.github.io/jq/&#34;&gt;jq&lt;/a&gt; expression. The real payoff of easy conversion of JSON to RDF is the ease with which you can then integrate that data with other datasets. With the vast amount of JSON data out there, this means that there is even more data to take advantage of in RDF-based applications.&lt;/p&gt;
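For instance, answering the same "awards after 1990" question directly against the JSON, with no RDF involved, really does take only a few lines of Python (a jq one-liner would be shorter still):

```python
import json

# The John Backus sample data from above, abbreviated to the relevant parts.
backus = json.loads("""
{
  "name": {"first": "John", "last": "Backus"},
  "awards": [
    {"award": "W.W. McDowell Award", "year": 1967, "by": "IEEE Computer Society"},
    {"award": "Draper Prize", "year": 1993, "by": "National Academy of Engineering"}
  ]
}
""")

# Same filter as the SPARQL query's FILTER (?year > 1990).
late_awards = [(a["award"], a["year"])
               for a in backus["awards"] if a["year"] > 1990]
```

The point is not that this is hard in JSON; it's that once the data is RDF, the same query syntax works across this dataset and any others you merge with it.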
&lt;p&gt;For example, imagine that you have two different MongoDB JSON datasets designed independently by two different developers. Merging these into a single JSON dataset so that you can treat the combination as a whole that is greater than the sum of its parts is going to be a lot of ETL work. With the data in RDF, you only need a CONSTRUCT query for each dataset to rename some properties. (A few class, subclass, and subproperty declarations might be handy for a little data modeling, but these are optional.) Then, you just append one set of transformed triples to the other and you&amp;rsquo;ve got a single dataset.&lt;/p&gt;
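One of those renaming CONSTRUCT queries might look something like this; the dataset-specific and shared namespaces here are invented for illustration:

```sparql
PREFIX ds1:    <http://example.com/dataset1#>
PREFIX shared: <http://example.com/shared#>

# Copy dataset 1's triples, renaming its properties
# to the shared vocabulary used by the merged dataset.
CONSTRUCT {
  ?entry shared:givenName  ?first ;
         shared:familyName ?last .
}
WHERE {
  ?entry ds1:first ?first ;
         ds1:last  ?last .
}
```

A similar query for the second dataset maps its own property names to the same shared vocabulary, and then the two sets of output triples can simply be concatenated.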
&lt;p&gt;Two more notes about AtomGraph&amp;rsquo;s JSON2RDF:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Make sure to read through all the readme information on AtomGraph&amp;rsquo;s &lt;a href=&#34;https://github.com/AtomGraph/JSON2RDF&#34;&gt;github page&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;As with SPARQLMotion&amp;rsquo;s ConvertJSONToRDF module, AtomGraph&amp;rsquo;s utility is part of a collection of tools that they make available to pipeline together for application development. Unlike SPARQLMotion, it&amp;rsquo;s open source and can be run from the command line, so in the old-fashioned Unix sense of the word &amp;ldquo;pipeline&amp;rdquo; it can be connected up to tools from other developers as well, such as the aforementioned jq.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2019">2019</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/json">JSON</category>
      
    </item>
    
    <item>
      <title>Custom HTML form front end, SPARQL endpoint back end</title>
      <link>https://www.bobdc.com/blog/htmlform/</link>
      <pubDate>Sun, 25 Aug 2019 09:00:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/htmlform/</guid>
      
      
      <description><div>Your website&#39;s users sending SPARQL queries, even if they haven&#39;t heard of SPARQL.</div><div>&lt;img id=&#34;idm45478314451696&#34; src=&#34;https://www.bobdc.com/img/main/negronisparql.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Negroni and SPARQL logo&#34;/&gt;
&lt;p&gt;In a &lt;a href=&#34;https://twitter.com/Joanne_Paul_/status/1153268516541358080&#34;&gt;recent Twitter exchange&lt;/a&gt;, &lt;a href=&#34;https://twitter.com/Joanne_Paul_&#34;&gt;Dr Joanne Paul&lt;/a&gt; asked &amp;ldquo;Does/can this exist? A website where I enter a title (eg. &amp;rsquo;earl of pembroke&amp;rsquo;) and a year (eg. 1553) and it spits out who held that title in that year (in this case, William Herbert).&amp;rdquo; &lt;a href=&#34;https://twitter.com/MichWatsonOz&#34;&gt;Michelle Watson&lt;/a&gt; &lt;a href=&#34;https://twitter.com/MichWatsonOz/status/1153281549896339456&#34;&gt;replied&lt;/a&gt; &amp;ldquo;I bet you could probably write SPARQL query to Wikipedia that would come close to doing that. Not sure how you&amp;rsquo;d embed that into a webpage though.&amp;rdquo; I &lt;a href=&#34;https://twitter.com/bobdc/status/1155482790651138049&#34;&gt;replied&lt;/a&gt; to that: &amp;ldquo;Have an HTML form that hands the entered values to a CGI script (Perl or Python or whatever) that plugs the values into a SPARQL query, sends that off to Wikipedia, and formats the result as HTML&amp;rdquo; and then &amp;ldquo;See pages 285 - 291 of my book &amp;ldquo;Learning SPARQL&amp;rdquo; for an example that uses Python and IMDB. The Python script is at &lt;a href=&#34;http://www.learningsparql.com/1steditionexamples/ex364-cgi.txt&#34;&gt;http://www.learningsparql.com/1steditionexamples/ex364-cgi.txt&lt;/a&gt; .&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I thought I&amp;rsquo;d done a simple example on my blog outside of the book and couldn&amp;rsquo;t find it, so I&amp;rsquo;m doing another one here because it&amp;rsquo;s so easy. Instead of a Python CGI &lt;a href=&#34;https://en.wikipedia.org/wiki/Common_Gateway_Interface&#34;&gt;(Common Gateway Interface)&lt;/a&gt; script calling linkedmdb.org like I did in the book, I wrote a Perl CGI script that calls Wikidata. Instead of having the end user enter the names of two directors on a form and then listing all the actors who have been in films by both directors, like I did in the Python example, in my new one the end user enters the name of a cocktail ingredient and clicks a button. Then, a dynamic web page lists the cocktails that use that ingredient with links to each cocktail&amp;rsquo;s Wikipedia page. (The example in the book called a SPARQL  server at the Linked Movie Database, which doesn&amp;rsquo;t seem to work anymore anyway.) Either way, the key is that the person entering the query criteria is simply filling out a form and they don&amp;rsquo;t need to know anything about the technology on the back end.&lt;/p&gt;
&lt;p&gt;Before creating such a query, I had to ask: does Wikidata have the data  I need to determine which drinks have which ingredients? Wikipedia infoboxes are usually the quickest way to assess whether the data you need is available in a structured form. If you look at the Wikipedia page for a &lt;a href=&#34;https://en.wikipedia.org/wiki/Negroni&#34;&gt;Negroni&lt;/a&gt;, the infobox lists the ingredients in a fairly structured way, which usually means that the data is available in Wikidata with enough structure to query it. The infobox also shows that a Negroni is an &lt;a href=&#34;https://en.wikipedia.org/wiki/List_of_IBA_official_cocktails&#34;&gt;IBA (International Bartenders Association) Official Cocktail&lt;/a&gt;, or in data modeling terms, it&amp;rsquo;s an instance of a class that we can query for. (The narrative text of the page also has an excellent origin story about how Count Camillo Negroni inspired the drink&amp;rsquo;s creation in 1919 and how Orson Welles had something clever to say about the drink 28 years later.)&lt;/p&gt;
&lt;p&gt;The basic steps for creating a web form that calls a SPARQL endpoint:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Write a SPARQL query that requests a specific example of the thing you want from the endpoint. My query asked for cocktails where &amp;ldquo;bitters&amp;rdquo; was an ingredient.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a web page with an HTML form where the end user can enter the value or values that will customize the query.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add the SPARQL query to a CGI script that takes the values passed from the web form, plugs them into the appropriate places in the query, sends the query off to the endpoint, and then displays the result as HTML.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The results of steps 1 and 3 end up in the same CGI file, and the result of step 2 is so small and simple (526 bytes, even with a dash of CSS) that you should take a quick look at my SPARQL cocktail query &lt;a href=&#34;http://bobdc.com/miscfiles/sparqlcocktail.html&#34;&gt;HTML form&lt;/a&gt; and its source before I describe the CGI file. As you’ll see, when the user clicks the form’s &amp;ldquo;search&amp;rdquo; button, the form passes the entered value to the script in a &lt;code&gt;q&lt;/code&gt; variable.&lt;/p&gt;
&lt;p&gt;Here is the Perl CGI script:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;#!/usr/bin/perl

# sample call: http://www.bobdc.com/cgi/sparqlcocktail.cgi?q=scotch

require sparql;  # Assumes that sparql.pm is in this directory; comes
# from https://github.com/swh/Perl-SPARQL-client-library

use strict;
use CGI;

# Usage of Perl-SPARQL-client-library based on test.pl included with it
my $params = CGI-&amp;gt;new;
my $searchTerm = $params-&amp;gt;param(&amp;#39;q&amp;#39;);

my $sparql = sparql-&amp;gt;new();
my $endpoint = &amp;#39;https://query.wikidata.org/sparql&amp;#39;;

# Prefixes used in query don&amp;#39;t need declarations
# because the endpoint has all of these predeclared. 
my $query = &amp;#39;
SELECT ?cocktailName ?wikipediaURL ?ingredientName WHERE {
  BIND (&amp;#34;SEARCHTERM&amp;#34; AS ?searchTerm )
  # ?cocktail instance of IBA official cocktail, 
  ?cocktail wdt:P31 wd:Q2536409 ;  
          # material used ?ingredient,
          wdt:P186 ?ingredient ;    
          rdfs:label ?cocktailName . 
  ?ingredient rdfs:label ?ingredientName . 
  FILTER (lang(?ingredientName) = &amp;#34;en&amp;#34;)
  FILTER (lang(?cocktailName) = &amp;#34;en&amp;#34;)
  # substring query so that &amp;#34;lime&amp;#34; finds &amp;#34;lime juice&amp;#34;, &amp;#34;lime wedge&amp;#34;, etc.
  FILTER(contains(lcase(?ingredientName),lcase(?searchTerm)))
  ?wikipediaURL schema:about ?cocktail . 
  FILTER(contains(str(?wikipediaURL),&amp;#34;/en.wikipedia.org&amp;#34;))
}
ORDER BY ?cocktailName 
&amp;#39;;

# Insert the search term into the query
$query =~ s/SEARCHTERM/$searchTerm/;

# Perform the query
my $queryResult = $sparql-&amp;gt;query($endpoint,$query);

# Output the result as HTML
print &amp;#34;Content-type: text/html\n\n&amp;#34;;
print &amp;#34;&amp;lt;html&amp;gt;&amp;lt;head&amp;gt;&amp;lt;title&amp;gt;SPARQL Cocktails Results&amp;lt;/title&amp;gt;\n&amp;#34;;
print &amp;#34;&amp;lt;style type=&amp;#39;text/css&amp;#39;&amp;gt; * { font-family: arial,helvetica}&amp;lt;/style&amp;gt;\n&amp;#34;;
print &amp;#34;&amp;lt;/head&amp;gt;&amp;lt;body&amp;gt;\n&amp;#34;;

if (scalar(@{$queryResult}) == 0) {
    print &amp;#34;No drinks found with $searchTerm as an ingredient.\n&amp;#34;;
}
else {
    print &amp;#34;&amp;lt;h2&amp;gt;Cocktails with $searchTerm as an ingredient&amp;lt;/h2&amp;gt;\n&amp;#34;;
    for my $row (@{$queryResult}) {
        my $wikipediaURL = $row-&amp;gt;{&amp;#39;wikipediaURL&amp;#39;};
        my $cocktailName = $row-&amp;gt;{&amp;#39;cocktailName&amp;#39;};
        my $ingredientName = $row-&amp;gt;{&amp;#39;ingredientName&amp;#39;};

        # Remove delimiters and language tags.
        $wikipediaURL =~ s/&amp;lt;(.+)&amp;gt;/$1/;
        $cocktailName =~ s/\&amp;#34;(.+)\&amp;#34;\@en/$1/;
        $ingredientName =~ s/\&amp;#34;(.+)\&amp;#34;\@en/$1/;

        print &amp;#34;&amp;lt;p&amp;gt;&amp;lt;a href=&amp;#39;$wikipediaURL&amp;#39;&amp;gt;$cocktailName&amp;lt;/a&amp;gt;:&amp;#34;;
        print &amp;#34; $ingredientName&amp;lt;/p&amp;gt;\n&amp;#34;;
    }
}
print &amp;#34;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;\n&amp;#34;;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The SPARQL query is stored in the Perl variable &lt;code&gt;$query&lt;/code&gt;, and the script takes the &lt;code&gt;q&lt;/code&gt; value passed from the form and replaces the string &amp;ldquo;SEARCHTERM&amp;rdquo; in the SPARQL query with that value.&lt;/p&gt;
&lt;p&gt;The workings of the query are described by comments within it. It uses &lt;code&gt;sparql.pm&lt;/code&gt; from the &lt;a href=&#34;https://github.com/swh/Perl-SPARQL-client-library&#34;&gt;Perl-SPARQL-client-library&lt;/a&gt; library that &lt;a href=&#34;http://steve.harris.name/&#34;&gt;Steve Harris&lt;/a&gt; (a.k.a. &lt;a href=&#34;https://twitter.com/theno23&#34;&gt;@theno23&lt;/a&gt;) added there six years ago. It&amp;rsquo;s nice that when Steve&amp;rsquo;s library passes the query to the endpoint, the comments cause no problems—I have seen libraries that pass SPARQL queries to endpoints without the carriage returns so that embedded comments screw up the parsing of the query. So, my comments describing how the query works are right in the query instead of here.&lt;/p&gt;
&lt;p&gt;CGI scripts have been around since the 1990s and played an important role in the web evolving from static web pages to something more interactive and dynamic. They still work, as you can see, and make it easy to automate the use of SPARQL endpoints for people who&amp;rsquo;ve never heard of SPARQL or RDF. The layers of UI technology that have been developed since, typically  as JavaScript libraries, can of course be incorporated here so that a modern responsive interface can take advantage of SPARQL endpoints on the back end such as Wikidata as well.&lt;/p&gt;
&lt;p&gt;If you write an HTML form and CGI script that sends a SPARQL query to a SPARQL endpoint such as Wikidata, let me know. I&amp;rsquo;d love to see it!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2019">2019</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/cgi">CGI</category>
      
      <category domain="https://www.bobdc.com//categories/perl">Perl</category>
      
    </item>
    
    <item>
      <title>Converting sqlite browser cookies to Turtle and querying them with SPARQL</title>
      <link>https://www.bobdc.com/blog/sqlite/</link>
      <pubDate>Sun, 28 Jul 2019 10:00:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/sqlite/</guid>
      
      
      <description><div>Because you have more SQLite data than you realized.</div><div>&lt;img id=&#34;idm45478314451696&#34; src=&#34;https://www.bobdc.com/img/main/sparqlsqllite.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Negroni and SPARQL logo&#34;/&gt;
&lt;p&gt;There is a reasonable chance that you&amp;rsquo;ve never heard of &lt;a href=&#34;https://sqlite.org/index.html&#34;&gt;SQLite&lt;/a&gt; and are unaware that this database management program and many database files in its format may be stored on all of your computing devices. Firefox and Chrome in particular use it to keep track of your cookies and, as I&amp;rsquo;ve recently learned, many other things. Of course I want to query all that data with SPARQL, so I wrote some short simple scripts to convert these tables of data to Turtle.&lt;/p&gt;
&lt;p&gt;From a Linux, Windows, or MacOS command prompt (or from the prompt that the excellent &lt;a href=&#34;https://termux.com/&#34;&gt;termux&lt;/a&gt; app adds to Android phones), type &lt;code&gt;sqlite3&lt;/code&gt; to get to the SQLite prompt. If you enter &lt;code&gt;sqlite3 someFileName&lt;/code&gt; it opens that file if it&amp;rsquo;s an SQLite database or creates one with that name if it doesn&amp;rsquo;t exist. From the SQLite prompt, the &lt;code&gt;.quit&lt;/code&gt; command quits SQLite, &lt;code&gt;.tables&lt;/code&gt; lists tables, and &lt;code&gt;.help&lt;/code&gt; tells you about the other commands. Other than that, at the prompt you can enter the typical SQL commands to create tables, insert data into them, as well as to query, update, and delete data. (I did a blog entry titled &lt;a href=&#34;http://www.bobdc.com/blog/my-sql-quick-reference/&#34;&gt;My SQL quick reference&lt;/a&gt; several years ago and have since contributed an &lt;a href=&#34;https://learnxinyminutes.com/docs/sql/&#34;&gt;updated version of it&lt;/a&gt; to the excellent &lt;a href=&#34;https://learnxinyminutes.com/&#34;&gt;Learn X in Y Minutes&lt;/a&gt; site.)&lt;/p&gt;
&lt;p&gt;A search of my hard disk found dozens and dozens of files whose names ended with &lt;code&gt;.sqlite&lt;/code&gt;. I believe it&amp;rsquo;s an older convention to end SQLite database filenames with &lt;code&gt;.db&lt;/code&gt;, and I had some Chrome and Firefox files with that as well. The &lt;code&gt;~/.config/google-chrome/Default&lt;/code&gt; directory had many files that didn&amp;rsquo;t have &lt;code&gt;.sqlite&lt;/code&gt; or &lt;code&gt;.db&lt;/code&gt; extensions but still turned out to be SQLite files.&lt;/p&gt;
&lt;p&gt;SQLite can execute a series of commands stored in a script. For example, my &lt;code&gt;tableList.scr&lt;/code&gt; file has just these two lines,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;.tables
.quit
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;and from my operating system command line I can quickly list the tables in the &lt;code&gt;cookies.sqlite&lt;/code&gt; database file that I found in a Firefox directory with this command line:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;sqlite3 cookies.sqlite &amp;lt; tableList.scr 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The result shows that &lt;code&gt;cookies.sqlite&lt;/code&gt; has just one table: &lt;code&gt;moz_cookies&lt;/code&gt;.&lt;/p&gt;
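&lt;p&gt;As a quick sketch of my own (not part of the original scripts), the same table listing can be done from Python&amp;rsquo;s built-in &lt;code&gt;sqlite3&lt;/code&gt; module by reading the &lt;code&gt;sqlite_master&lt;/code&gt; catalog table that every SQLite database file contains:&lt;/p&gt;

```python
# Sketch: list the tables in an SQLite file with Python's standard
# sqlite3 module instead of the sqlite3 command-line shell.
import sqlite3

def list_tables(db_path):
    """Return the names of all tables in the given SQLite database file."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    names = [row[0] for row in rows]
    conn.close()
    return names
```

&lt;p&gt;Pointed at a copy of that Firefox file, this should return just the one &lt;code&gt;moz_cookies&lt;/code&gt; name.&lt;/p&gt;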
&lt;p&gt;Once I know what tables are in an SQLite file, my &lt;code&gt;sqliteToTSV.sh&lt;/code&gt; script pulls the data from a named table within that file and saves it as a Turtle file so that I can query it with SPARQL. (You can find all of the scripts and queries that I wrote for this in &lt;a href=&#34;https://github.com/bobdc/misc/tree/master/sqliterdf&#34;&gt;github&lt;/a&gt;.) If you pass the database filename and table name to this shell script like this,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt; sqliteToTSV.sh cookies.sqlite moz_cookies
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;it first creates an  SQLite script that executes an SQL SELECT command to save everything in that table to a tab-separated value file. It then runs a Perl script that converts the TSV file to Turtle. (This shell script should work fine as a Windows batch file with minimal modifications.)&lt;/p&gt;
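&lt;p&gt;As a rough sketch of that first step (the real one is the shell script in the repo; the function name here is made up for the example), the same dump can be written with Python&amp;rsquo;s standard &lt;code&gt;sqlite3&lt;/code&gt; and &lt;code&gt;csv&lt;/code&gt; modules:&lt;/p&gt;

```python
# Sketch of the table-to-TSV step: select everything from one table and
# write it out with tab separators, column headers on the first line.
import csv
import sqlite3

def dump_table_to_tsv(db_path, table, tsv_path):
    conn = sqlite3.connect(db_path)
    cur = conn.execute("SELECT * FROM " + table)
    with open(tsv_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        # Header row comes from the cursor's column descriptions.
        writer.writerow([col[0] for col in cur.description])
        writer.writerows(cur)
    conn.close()
```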
&lt;p&gt;The Perl script would be especially short if I hadn&amp;rsquo;t found escaped JSON data in some SQLite column values and binary data in others, so I had the script check for those and just output stub labels instead of trying to do anything useful with them. (Note that for all the SQLite files that I played with, I actually played with copies in a new directory, not the originals created by applications such as the browsers.)&lt;/p&gt;
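&lt;p&gt;The heart of that TSV-to-Turtle step is small. Here is a sketch of my own, in Python rather than Perl and without those JSON and binary checks, of turning tabular rows into Turtle blank nodes in the square-bracket style that you will see below; it assumes that the &lt;code&gt;m:&lt;/code&gt; prefix gets declared at the top of the output file:&lt;/p&gt;

```python
# Sketch: one Turtle blank node per table row, with properties named
# after the column headers. Assumes the m: prefix is declared elsewhere
# in the output file; string values only, with minimal escaping.
def rows_to_turtle(headers, rows, prefix="m"):
    chunks = []
    for row in rows:
        props = []
        for name, value in zip(headers, row):
            value = str(value).replace("\\", "\\\\").replace('"', '\\"')
            props.append('   %s:%s "%s"' % (prefix, name, value))
        chunks.append("[\n" + " ;\n".join(props) + " ] .")
    return "\n".join(chunks)
```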
&lt;p&gt;The remaining ASCII data still offers plenty of interesting things to look at. My files in the git repo include an &lt;code&gt;ffCookiesHosts.rq&lt;/code&gt; SPARQL query that counts how many Firefox cookies come from each base domain and outputs a list sorted by the counts in descending order. Here are the first few lines of output:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;-------------------------------------
| baseDomain              | cookies |
=====================================
| &amp;#34;google.com&amp;#34;            | 57      |
| &amp;#34;pubmatic.com&amp;#34;          | 41      |
| &amp;#34;tremorhub.com&amp;#34;         | 32      |
| &amp;#34;verizon.com&amp;#34;           | 29      |
| &amp;#34;cnn.com&amp;#34;               | 22      |
| &amp;#34;verizonwireless.com&amp;#34;   | 21      |
| &amp;#34;nfl.com&amp;#34;               | 19      |
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I&amp;rsquo;m not a big NFL guy, but I do remember that when having some Internet trouble the technician and I were using that site to check connectivity. The big surprises for me were the high scores of two names that I didn&amp;rsquo;t recognize: pubmatic.com and tremorhub.com. The tremorhub.com domain name redirects to telaria.com, some company that manages &amp;ldquo;premium&amp;rdquo; video advertising, which sounds just like the kind of company that would dump cookies on  your hard disk without telling you. The pubmatic.com site is about &amp;ldquo;monetization&amp;rdquo; of &amp;ldquo;content&amp;rdquo;, so they too look like a cookie-dumping ad tech firm.&lt;/p&gt;
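&lt;p&gt;The &lt;code&gt;ffCookiesHosts.rq&lt;/code&gt; query itself is in the repo. For comparison, the same count-and-sort can be done directly against the SQLite table with a SQL GROUP BY; here is a sketch in Python, using &lt;code&gt;baseDomain&lt;/code&gt;, the relevant column in Firefox&amp;rsquo;s &lt;code&gt;moz_cookies&lt;/code&gt; table:&lt;/p&gt;

```python
# Sketch: count Firefox cookies per base domain straight from SQL,
# most-cookied domains first, like the SPARQL query's output above.
import sqlite3

def cookie_counts(db_path):
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT baseDomain, COUNT(*) AS cookies FROM moz_cookies "
        "GROUP BY baseDomain ORDER BY cookies DESC").fetchall()
    conn.close()
    return rows
```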
&lt;p&gt;The &lt;code&gt;googleCookiesHosts.rq&lt;/code&gt; query in the git repo performs a similar query on data from the &lt;code&gt;cookies&lt;/code&gt; table in the &lt;code&gt;~/.config/google-chrome/Default/Cookies&lt;/code&gt; SQLite database. Its output listed rubiconproject.com as a leading cookie depositor along with sites that I actually visit often; they&amp;rsquo;re another ad tech firm that I haven&amp;rsquo;t heard of but has clearly been dumping plenty of cookies onto my hard disk.&lt;/p&gt;
&lt;p&gt;I started looking into this so that I could do SPARQL queries about these deposited cookies and it was interesting to see how many other kinds of SQLite files I had. That same &lt;code&gt;google-chrome/Default&lt;/code&gt; directory has a &lt;code&gt;History&lt;/code&gt; SQLite database file with 11 tables, including &lt;code&gt;keyword_search_terms&lt;/code&gt; and &lt;code&gt;visits&lt;/code&gt;. (Not all the files in that directory are SQLite files, so the lack of file extensions to indicate which ones are SQLite files is a bit annoying.) After conversion to Turtle, the &lt;code&gt;keyword_search_terms&lt;/code&gt; data had triples like this, showing that it had stored my search terms in both the entered case and in lower case:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;[
   m:keyword_id &amp;#34;2&amp;#34; ;
   m:url_id &amp;#34;18899&amp;#34; ;
   m:lower_term &amp;#34;coca y sus exploradores lo añoro&amp;#34; ;
   m:term &amp;#34;Coca y Sus Exploradores Lo Añoro&amp;#34; ] .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;a href=&#34;https://www.thunderbird.net/en-US/&#34;&gt;Thunderbird&lt;/a&gt;, &lt;a href=&#34;https://www.skype.com/en/&#34;&gt;Skype&lt;/a&gt;, and even &lt;a href=&#34;https://ipython.org/&#34;&gt;iPython&lt;/a&gt; have also deposited SQLite files on my Ubuntu laptop&amp;rsquo;s hard disk.&lt;/p&gt;
&lt;p&gt;If I were writing a script to use in an application that used that &lt;code&gt;keyword_search_terms&lt;/code&gt; data, then instead of representing the values with blank nodes, I&amp;rsquo;d probably give the triples above a subject built on that &lt;code&gt;m:url_id&lt;/code&gt; value. When converting SQL or CSV or other tabular data to Turtle in the past, I&amp;rsquo;ve usually generated URIs to be the subjects; I finally realized that doing it as blank nodes with square brackets, like the example above, is a nice clean way to represent a row from tabular data and a little less trouble.&lt;/p&gt;
&lt;p&gt;One note about date formats: the Google cookies (and maybe more SQLite files) store dates in a strange format that I could not work out how to convert to proper ISO 8601 format in my Perl script. I found an explanation on &lt;a href=&#34;https://stackoverflow.com/questions/19429577/converting-the-date-within-the-places-sqlite-file-in-firefox-to-a-datetime&#34;&gt;Stack Overflow&lt;/a&gt; of how to convert the date formats as part of an SQL SELECT statement that retrieves the data. You could even use the same logic in SQLite to convert the dates within (a copy of!) the database file itself so that my generic Turtle extraction script would pull out more readable dates. For example, the following example of an SQL UPDATE command does this to the &lt;code&gt;lastAccessed&lt;/code&gt; column of a &lt;code&gt;moz_cookies&lt;/code&gt; table:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;UPDATE moz_cookies SET lastAccessed = datetime(lastAccessed / 1000000, &amp;#39;unixepoch&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Overall, it&amp;rsquo;s cool to see how much data is spread around our hard disks using SQLite so that, after some simple scripting, we can explore it with SPARQL.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2019">2019</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/turtle">Turtle</category>
      
      <category domain="https://www.bobdc.com//categories/sql">SQL</category>
      
      <category domain="https://www.bobdc.com//categories/sqlite">SQLite</category>
      
    </item>
    
    <item>
      <title>Querying geospatial data with SPARQL</title>
      <link>https://www.bobdc.com/blog/geosparql1/</link>
      <pubDate>Sun, 30 Jun 2019 10:00:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/geosparql1/</guid>
      
      
      <description><div>Part 1.</div><div>&lt;img id=&#34;idm45478314451696&#34; src=&#34;https://www.bobdc.com/img/main/OSMSPARQL.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;OSM and SPARQL logo&#34;/&gt;
&lt;p&gt;&lt;a href=&#34;https://www.openstreetmap.org/&#34;&gt;OpenStreetMap&lt;/a&gt;, or &amp;ldquo;OSM&amp;rdquo; to geospatial folk, is a crowd-sourced online map that has made tremendous achievements in its role as the Wikipedia of geospatial data. (The &lt;a href=&#34;https://en.wikipedia.org/wiki/OpenStreetMap&#34;&gt;Wikipedia page for OpenStreetMap&lt;/a&gt; is really worth a skim to learn more about its impressive history.) OSM offers a free alternative to commercial mapping systems out there—and you better believe that the commercial mapping systems are reading that great free data into their own databases.&lt;/p&gt;
&lt;p&gt;OSM provides a &lt;a href=&#34;https://sophox.org/&#34;&gt;SPARQL endpoint&lt;/a&gt; and a &lt;a href=&#34;https://wiki.openstreetmap.org/wiki/SPARQL_examples&#34;&gt;nice page of example queries&lt;/a&gt;. With their endpoint, the following query lists the names and addresses of  all the museums in New York City (or, in RDF terms, everything with an &lt;code&gt;osmt:addr:city&lt;/code&gt; value of &amp;ldquo;New York&amp;rdquo; and an &lt;code&gt;osmt:tourism&lt;/code&gt; value of &amp;ldquo;museum&amp;rdquo;):&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT ?name ?housenumber ?street 
WHERE {
   ?museum osmt:addr:city &amp;#34;New York&amp;#34;;
      osmt:tourism &amp;#34;museum&amp;#34;;
      osmm:loc ?loc ;
      osmt:name ?name ;
      osmt:addr:housenumber ?housenumber ;
      osmt:addr:street ?street .
      # The following tells it to only get museums south of the Javits Center
      # FILTER(geof:latitude(?loc) &amp;lt; 40.758289)
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;You can try it &lt;a href=&#34;https://sophox.org/#SELECT%20%3Fname%20%3Fhousenumber%20%3Fstreet%20%0AWHERE%20%7B%0A%3Fmuseum%20osmt%3Aaddr%3Acity%20%22New%20York%22%3B%0A%20%20%20%20osmt%3Atourism%20%22museum%22%3B%0A%20%20%20osmm%3Aloc%20%3Floc%20%3B%0A%20%20%20osmt%3Aname%20%3Fname%20%3B%0A%20%20%20osmt%3Aaddr%3Ahousenumber%20%3Fhousenumber%20%3B%0A%20%20%20osmt%3Aaddr%3Astreet%20%3Fstreet%20.%0A%20%20%20%23%20The%20following%20tells%20it%20to%20only%20get%20museums%20south%20of%20the%20Javits%20Center%0A%20%20%20%23%20FILTER%28geof%3Alatitude%28%3Floc%29%20%3C%2040.758289%29%0A%7D%0A%20%20%20%0A&#34;&gt;here&lt;/a&gt;. As I write this, it returns 32 results, and if you uncomment the filter condition to only get museums south of that latitude, it returns 17. That filter condition is just a taste of actually using geospatial data; the &lt;code&gt;osmm:loc&lt;/code&gt; value has a type of &lt;code&gt;http://www.opengis.net/ont/geosparql#wktLiteral&lt;/code&gt; and takes a form like &lt;code&gt;Point(-73.9900266 40.7187837)&lt;/code&gt;. As you can see, the filter uses the &lt;code&gt;geof:latitude()&lt;/code&gt; function to pull the latitude value out of the &lt;code&gt;Point&lt;/code&gt; value.&lt;/p&gt;
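&lt;p&gt;Outside of SPARQL, pulling the latitude out of one of those WKT literals is simple string surgery. A little sketch of my own in Python:&lt;/p&gt;

```python
# Sketch: extract the latitude from a WKT literal such as
# "Point(-73.9900266 40.7187837)". WKT points are written as
# (longitude latitude), so the latitude is the second number.
def wkt_point_latitude(wkt):
    inside = wkt[wkt.index("(") + 1 : wkt.index(")")]
    return float(inside.split()[1])
```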
&lt;p&gt;This is a very basic level of geospatial data use. A proper geospatial query for something like all the museums within a mile of the Museum of Modern Art is more complicated because of the effect of the earth&amp;rsquo;s curvature. Although OSM stores each entity&amp;rsquo;s latitude and longitude values, its query engine doesn&amp;rsquo;t support such queries. (The &lt;a href=&#34;https://wiki.openstreetmap.org/wiki/Sophox#How_OSM_data_is_stored&#34;&gt;How OSM Data is Stored&lt;/a&gt; documentation of their SPARQL endpoint is good if you want to explore their SPARQL endpoint more.)&lt;/p&gt;
&lt;p&gt;The ability to execute real geospatial queries typically comes from an add-in to most databases. For example, if you already use Oracle for your relational data, you pay extra for &lt;a href=&#34;https://www.oracle.com/database/technologies/spatialandgraph.html&#34;&gt;Oracle Spatial&lt;/a&gt;. If you&amp;rsquo;re using the open source &lt;a href=&#34;https://www.postgresql.org/&#34;&gt;PostgreSQL&lt;/a&gt; relational database, you get the open source &lt;a href=&#34;https://postgis.net/&#34;&gt;PostGIS&lt;/a&gt; add-in. Even little &lt;a href=&#34;https://sqlite.org/index.html&#34;&gt;SQLite&lt;/a&gt; has &lt;a href=&#34;https://www.gaia-gis.it/fossil/libspatialite/index&#34;&gt;SpatiaLite&lt;/a&gt;. (If you&amp;rsquo;re storing massive amounts of data using &lt;a href=&#34;https://accumulo.apache.org/&#34;&gt;Apache Accumulo&lt;/a&gt; on a Hadoop platform, the add-in would be the open source &lt;a href=&#34;https://www.geomesa.org/&#34;&gt;GeoMesa&lt;/a&gt; suite developed at my employer &lt;a href=&#34;http://www.ccri.com/&#34;&gt;CCRi&lt;/a&gt;. Being around this project has taught me a lot about the issues of geospatial processing.)&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://linkedgeodata.org/About&#34;&gt;LinkedGeoData.org&lt;/a&gt; project from the University of Leipzig&amp;rsquo;s Agile Knowledge Engineering and Semantic Web (AKSW) research group &amp;ldquo;uses the information collected by the OpenStreetMap project and makes it available as an RDF knowledge base according to the Linked Data principles&amp;rdquo;. It includes a &lt;a href=&#34;http://linkedgeodata.org/sparql&#34;&gt;SPARQL endpoint&lt;/a&gt;, but I could find no documentation or examples of geospatial extensions to SPARQL. The endpoint is currently up and running, but the &amp;ldquo;About/News&amp;rdquo; page shows no activity on the project since May of last year. (A query for resources with an &lt;code&gt;rdfs:label&lt;/code&gt; of &amp;ldquo;Grand Central Station&amp;rdquo; returned the URIs &lt;a href=&#34;http://linkedgeodata.org/triplify/node291087340&#34;&gt;http://linkedgeodata.org/triplify/node291087340&lt;/a&gt; and &lt;a href=&#34;http://linkedgeodata.org/triplify/way189853520&#34;&gt;http://linkedgeodata.org/triplify/way189853520&lt;/a&gt;, both of which returned &lt;a href=&#34;https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#5xx_Server_errors&#34;&gt;HTTP 500&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;A standardized extension for SPARQL called &lt;a href=&#34;https://en.wikipedia.org/wiki/OGC_GeoSPARQL&#34;&gt;GeoSPARQL&lt;/a&gt; specifies how to make queries about spatial information in which you can do things like specify criteria in terms of miles or kilometers, and a SPARQL engine that supports this standard will do the necessary trigonometry to give you the right answers. GeoSPARQL is sponsored by the &lt;a href=&#34;https://www.opengeospatial.org/&#34;&gt;Open Geospatial Consortium&lt;/a&gt;, who is also responsible for other popular geospatial standards such as the Web Feature Service and Web Map Service standards for REST API access to geospatial data. I have used both often at work. Looking at their &lt;a href=&#34;https://www.opengeospatial.org/docs/is&#34;&gt;standards&lt;/a&gt; page, I only just now learned that they are also the standards body behind &lt;a href=&#34;https://www.opengeospatial.org/standards/kml&#34;&gt;KML&lt;/a&gt;. Their &lt;a href=&#34;http://defs.opengis.net/elda-common/ogc-def/resource?uri=http://www.opengis.net/def/function/geosparql/&amp;amp;_format=html&#34;&gt;GeoSPARQL Functions&lt;/a&gt; page documents the extension functions. (I have co-workers who understand what the mathematical concept of a &amp;ldquo;convex hull&amp;rdquo; is; I have tried with little success.)&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://www.geosparql.org/&#34;&gt;geosparql.org website&lt;/a&gt; has some preloaded data where you can try GeoSPARQL queries. I wanted to explore the possibilities of using a geospatial SPARQL extension, ideally GeoSPARQL, with data that I could control. Because I &lt;a href=&#34;http://www.bobdc.com/blog/json2skos/&#34;&gt;just love&lt;/a&gt; converting triples from one namespace to another so that I can use new tools and standards with them, I hoped to get some OSM triples and convert them to the right namespaces to enable geospatial queries on them using a local triplestore. I decided that a simpler first step would be to pull down some triples from geosparql.org and load those into the local triplestore, because I already knew that those would work with standard GeoSPARQL queries.&lt;/p&gt;
&lt;p&gt;The two downloadable triplestores that I could find that claimed geospatial support were &lt;a href=&#34;https://www.blazegraph.com/&#34;&gt;Blazegraph&lt;/a&gt; and &lt;a href=&#34;http://parliament.semwebcentral.org/&#34;&gt;Parliament&lt;/a&gt;. (Blazegraph&amp;rsquo;s 2010 slides &amp;ldquo;Geospatial RDF Data&amp;rdquo; (&lt;a href=&#34;https://www.blazegraph.com/whitepapers/bigdata_geospatial.pdf&#34;&gt;pdf&lt;/a&gt;) provide a good introduction to issues of geospatial indexing.) I got the sample query to run against their sample data as described on their &lt;a href=&#34;https://wiki.blazegraph.com/wiki/index.php/GeoSpatial&#34;&gt;Querying Geospatial Data&lt;/a&gt; page, but I had no luck when I tried to modify it to work with data that I had loaded into it. The &lt;code&gt;geo:predicate&lt;/code&gt; triple in their sample query seems to be necessary, but I wasn&amp;rsquo;t querying for both location and time like their example does, and although I tried different objects for a triple using this property I couldn&amp;rsquo;t get it to work and gave up. (Since Amazon Neptune&amp;rsquo;s acqui-hire of most if not all of the Blazegraph staff, it doesn&amp;rsquo;t seem to be under active development anyway.)&lt;/p&gt;
&lt;p&gt;Parliament comes from Raytheon subsidiary BBN, a company with a &lt;a href=&#34;https://en.wikipedia.org/wiki/BBN_Technologies&#34;&gt;long history&lt;/a&gt; in important computer technology. This triplestore  promised not just geospatial support but support for the GeoSPARQL standard. I got Parliament up and running locally and found a localhost page about indexes that showed that the data I was using did not have a geospatial index, and I saw no way to create one. Their five-year-old User Guide (&lt;a href=&#34;http://parliament.semwebcentral.org/ParliamentUserGuide.pdf&#34;&gt;PDF&lt;/a&gt;) has a &amp;ldquo;Configuring Indexes&amp;rdquo; section consisting of the four words &amp;ldquo;Yet to be written&amp;rdquo;. I gave up on Parliament after some LinkedIn searches showed that the main people attached to the project are no longer at BBN.&lt;/p&gt;
&lt;p&gt;In the middle of all this research I learned some great news: &lt;a href=&#34;https://jena.apache.org/&#34;&gt;Apache Jena&lt;/a&gt; had &lt;a href=&#34;https://jena.apache.org/documentation/query/spatial-query.html&#34;&gt;some geospatial support&lt;/a&gt; that required the use of Lucene or Solr, the use of a custom querying vocabulary, and a lot of manual index configuration, but they are now &lt;a href=&#34;https://github.com/galbiston/geosparql-jena&#34;&gt;ramping up code development&lt;/a&gt; on direct support for the GeoSPARQL standard. That&amp;rsquo;s why this blog entry has a subtitle of &amp;ldquo;Part 1&amp;rdquo;, and I look forward to trying out GeoSPARQL in a locally running copy of Jena&amp;rsquo;s Fuseki server and then writing Part 2. (I&amp;rsquo;m going to be patient as I wait for it to be included in the binary release of Fuseki—or to put it another way, I&amp;rsquo;m too lazy to set up the environment to build it from the current &lt;a href=&#34;https://gitbox.apache.org/repos/asf?p=jena.git&#34;&gt;source&lt;/a&gt;.) And, once a &lt;a href=&#34;https://www.w3.org/community/sparql-12/&#34;&gt;SPARQL 1.2&lt;/a&gt; Recommendation gets closer and I update my book &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt;, I thought it would be a good idea to cover GeoSPARQL, so I&amp;rsquo;ll be happy to see support for SPARQL&amp;rsquo;s standardized geospatial extension in the triplestore that is already used in many of the book&amp;rsquo;s examples.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2019">2019</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/geosparql">GeoSPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/gis">gis</category>
      
    </item>
    
    <item>
      <title>Converting JSON-LD schema.org RDF to other vocabularies</title>
      <link>https://www.bobdc.com/blog/json2skos/</link>
      <pubDate>Sun, 12 May 2019 11:20:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/json2skos/</guid>
      
      
<description><div>So that we can use tools designed around those vocabularies.</div><div>
&lt;blockquote class=&#34;pullquote&#34;&gt;Once you&#39;ve got data in any standardized RDF syntax, you can convert it to use whatever namespaces you want.&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&#34;http://www.bobdc.com/blog/json-ld/&#34;&gt;Last month&lt;/a&gt; I wrote about how we can treat the growing amount of JSON-LD in the world as RDF. By &amp;ldquo;treat&amp;rdquo; I mean &amp;ldquo;query it with SPARQL and use it with the wide choice of RDF application development tools out there&amp;rdquo;. While I did demonstrate that JSON-LD does just fine with URIs from outside of the  &lt;a href=&#34;https://schema.org/&#34;&gt;schema.org&lt;/a&gt; vocabulary, the vast majority of JSON-LD out there uses schema.org.&lt;/p&gt;
&lt;p&gt;Some people fret about the &amp;ldquo;one schema to rule them all&amp;rdquo; approach. I don&amp;rsquo;t worry so much because one of the great things about RDF is that once you&amp;rsquo;ve got data in any standardized RDF syntax, you can convert it to use whatever namespaces you want. Today we&amp;rsquo;ll see how I did this so that I could load JSON-LD schema.org metadata from my blog into a SKOS visualization tool.&lt;/p&gt;
&lt;p&gt;I also mentioned last month that the Hugo platform that I recently started using for my blog, in its default configuration, automatically generates JSON-LD metadata about my blog entries. The old Movable Type platform that I formerly used let me assign categories and tags to the entries, so when I migrated the old entries I brought those along.&lt;/p&gt;
&lt;p&gt;Here is an excerpt of some metadata from one of my blog entries after I converted it from JSON-LD to Turtle:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;[ a                         schema:BlogPosting ;
  schema:author             &amp;#34;Bob DuCharme&amp;#34; ;
  schema:datePublished      &amp;#34;2019-02-24 10:45:30 -0500 EST&amp;#34;^^schema:Date ;
  schema:description        &amp;#34;A quick reference.&amp;#34; ;
  schema:headline           &amp;#34;curling SPARQL&amp;#34; ;
  schema:inLanguage         &amp;#34;en&amp;#34; ;
  schema:keywords           &amp;#34;SPARQL&amp;#34; , &amp;#34;curl&amp;#34; , &amp;#34;Blog&amp;#34; ;
  schema:name               &amp;#34;curling SPARQL&amp;#34; ;
  schema:url                &amp;lt;http://www.bobdc.com/blog/curling-sparql/&amp;gt;
] .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I wanted to convert that to RDF that met three conditions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Instead of using a blank node as the subject, use the &lt;code&gt;schema:url&lt;/code&gt; value included with the data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Define SKOS concepts for each &lt;code&gt;schema:keywords&lt;/code&gt; value that the metadata uses.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use &lt;a href=&#34;https://en.wikipedia.org/wiki/Dublin_Core&#34;&gt;Dublin Core&lt;/a&gt; properties to connect as much of the metadata as possible, including the SKOS concepts, to the posting.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This SPARQL query made this all quite straightforward:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# convertTriples.rq

PREFIX schema: &amp;lt;http://schema.org/&amp;gt;
PREFIX dc:  &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;
PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;

CONSTRUCT {
  ?url   dc:title ?name ;
         dc:creator ?author ;
         dc:description ?description ;
         dc:subject ?kwURI . 
  ?kwURI a skos:Concept ;
         skos:prefLabel ?keyword . 
}
WHERE {
  ?entry schema:url ?url ;
         schema:name ?name ;
         schema:author ?author ;
         schema:description ?description .
  OPTIONAL {
    ?entry schema:keywords ?keyword .
    BIND(URI(concat(&amp;#34;http://bobdc.com/tags/&amp;#34;,?keyword))
         AS ?kwURI)
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The WHERE clause, as always, grabs the needed values. Instead of assuming that every blog entry has keywords assigned, I put the part that handles those inside of an OPTIONAL clause.&lt;/p&gt;
&lt;p&gt;That part also creates a URI from each &lt;code&gt;schema:keywords&lt;/code&gt; value to be the identity for the SKOS concept built from that keyword. To do that, I originally concatenated the values onto the base URI &lt;code&gt;http://bobdc.com/blog/kwords/&lt;/code&gt; that I just made up, but when I noticed that Hugo creates pages for each keyword at &lt;code&gt;http://www.bobdc.com/tags/&lt;/code&gt; I realized something nice: instead of a base URI that I made up from scratch, using the Hugo-generated one would give me dereferenceable URIs for the SKOS concepts. (For example, my earlier blog entries about DBpedia are assigned the keyword &amp;ldquo;dbpedia&amp;rdquo;, which becomes the URI &lt;a href=&#34;http://www.bobdc.com/tags/dbpedia&#34;&gt;http://www.bobdc.com/tags/dbpedia&lt;/a&gt;  of a concept about DBpedia, and you can click that URI to see a list of those blog entries.) So I used that as the base URI when creating the URI for each new SKOS concept.&lt;/p&gt;
&lt;p&gt;The CONSTRUCT clause follows through on the tasks listed in my bulleted list above.&lt;/p&gt;
&lt;p&gt;SKOS is typically used to arrange topics into hierarchies, so that if for example your SKOS vocabulary says that &amp;ldquo;collie&amp;rdquo; has a broader value of &amp;ldquo;dog&amp;rdquo; and you&amp;rsquo;re looking for articles about dogs, you can retrieve all the ones tagged with &amp;ldquo;dog&amp;rdquo; or with any of the values in the SKOS subtree below &amp;ldquo;dog&amp;rdquo; such as &amp;ldquo;collie&amp;rdquo;. After running the query above I had a list of concepts with no hierarchy, so I created one. Of course there are GUI tools that let you click and drag to turn such a list into a hierarchy; the use of these tools  is one of the reasons for converting the schema.org keyword metadata into SKOS metadata. Instead of using one of these tools, though, I found it simpler to just type out a text file with lines like this,&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;XSLT broader XML 
XBRL broader XML 
mysql broader SQL 
audio broader music 
bass broader music 
D2RQ broader RDF 
RDFa broader RDF 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;and then, after typing in some namespace declarations at the very top, doing a few global replacements to turn it into this:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;bt:XSLT skos:broader bt:XML . 
bt:XBRL skos:broader bt:XML . 
bt:mysql skos:broader bt:SQL . 
bt:audio skos:broader bt:music . 
bt:bass skos:broader bt:music . 
bt:D2RQ skos:broader bt:RDF . 
bt:RDFa skos:broader bt:RDF . 
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Useful data modeling can sometimes be simple.&lt;/p&gt;
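Those global replacements are simple enough to script. Here is a minimal Python sketch of the same transformation; the prefix declarations and the function name are illustrative assumptions, not part of the original workflow:

```python
# Convert lines like "XSLT broader XML" into Turtle triples like
# "bt:XSLT skos:broader bt:XML ." -- a sketch of the global replacements
# described above. The prefix declarations here are assumptions.
def to_turtle(lines):
    header = [
        "@prefix bt:   <http://www.bobdc.com/tags/> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
    ]
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        child, rel, parent = line.split()
        triples.append(f"bt:{child} skos:{rel} bt:{parent} .")
    return "\n".join(header + triples)

print(to_turtle(["XSLT broader XML", "mysql broader SQL"]))
```

A text editor's global replace does the same job, of course; a script just makes the step repeatable when the keyword list grows.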
&lt;p&gt;I added these triples to the result of the CONSTRUCT query above and loaded the resulting SKOS into the wonderful &lt;a href=&#34;http://labs.sparna.fr/skos-play/&#34;&gt;SKOS Play!&lt;/a&gt; site&amp;rsquo;s visualizer. (&lt;a href=&#34;https://en.wikipedia.org/wiki/Cosplay&#34;&gt;Pun&lt;/a&gt; intended?) My not-very-controlled-vocabulary had a lot of orphan elements, so to make a nicer visualization I used the following query to pull,  from the result of the earlier CONSTRUCT query, only SKOS concepts taking part in some hierarchy:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;# getChildrenAndParents.rq

PREFIX skos:  &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;

CONSTRUCT {
  ?child ?childP ?childO .
  ?parent ?parentP ?parentO .
}
WHERE {
  ?child skos:broader ?parent .
  ?child ?childP ?childO .
  ?parent ?parentP ?parentO .
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Nothing OPTIONAL in that query!&lt;/p&gt;
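The payoff of that broader/narrower hierarchy, retrieving items tagged with a concept or with anything below it in the tree, can be sketched in a few lines of Python. The pairs here are hypothetical sample data, not my real tag hierarchy:

```python
# Given (child, parent) skos:broader pairs, collect a concept plus all
# of its descendants -- i.e. everything a search on the broader term
# should also match. The pairs are hypothetical sample data.
from collections import defaultdict

def concept_and_narrower(pairs, root):
    children = defaultdict(list)
    for child, parent in pairs:
        children[parent].append(child)
    result, stack = set(), [root]
    while stack:
        concept = stack.pop()
        if concept not in result:
            result.add(concept)
            stack.extend(children[concept])
    return result

pairs = [("collie", "dog"), ("beagle", "dog"), ("dog", "mammal")]
print(sorted(concept_and_narrower(pairs, "dog")))
# A search for "dog" also matches items tagged "collie" or "beagle".
```

In SPARQL the same expansion is usually done with a property path such as `skos:broader*`, which is what makes the hierarchy worth building in the first place.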
&lt;p&gt;With the results of the first &lt;code&gt;convertTriples.rq&lt;/code&gt; CONSTRUCT query in a file called &lt;code&gt;convertedTriples.ttl&lt;/code&gt; and the additional &lt;code&gt;skos:broader&lt;/code&gt; triples in the &lt;code&gt;additionalModeling.ttl&lt;/code&gt; file, I had the  &lt;a href=&#34;https://jena.apache.org/documentation/query/cmds.html&#34;&gt;Jena arq command line tool&lt;/a&gt; run this new query on the combined data to create something to load into SKOS Play:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;arq --query getChildrenAndParents.rq --data convertedTriples.ttl --data additionalModeling.ttl &amp;gt; conceptTrees.ttl
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;arq&amp;rsquo;s ability to accept multiple &lt;code&gt;--data&lt;/code&gt; arguments (potentially each using different RDF syntaxes!) can be very handy sometimes.&lt;/p&gt;
&lt;p&gt;On the SKOS Play &lt;a href=&#34;http://labs.sparna.fr/skos-play/upload&#34;&gt;Play&lt;/a&gt; page, I used the local file option to upload the &lt;code&gt;conceptTrees.ttl&lt;/code&gt; file created by the arq command line shown above. (The page includes some options that look fun to play with: Infer on subclasses and subproperties, Handle SKOS-XL properties, and Transform an OWL ontology to SKOS.)&lt;/p&gt;
&lt;p&gt;When I clicked the page&amp;rsquo;s orange Next button, the site parsed my uploaded file, told me how many concepts it found, and offered some options for how to display it. I went with the default Visualize option of Tree Visualization, which displayed the &lt;code&gt;skosplay:allData&lt;/code&gt; node you see on the left below and the first row of nodes to the right of that. Clicking blue nodes displays their child nodes, and you can see the result after I clicked the RDF and XML ones.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/skosplay1.png&#34; alt=&#34;SKOS Play image&#34; /&gt;&lt;/p&gt;
&lt;p&gt;SKOS concepts use URIs as their identity and &lt;code&gt;skos:prefLabel&lt;/code&gt; values to show human-readable values in as many languages as you like. You can see that the SKOS Play diagram uses &lt;code&gt;skos:prefLabel&lt;/code&gt; values when available, and the full URLs at the top of my diagram show that a few concepts still need &lt;code&gt;skos:prefLabel&lt;/code&gt; values. (The &lt;code&gt;convertTriples.rq&lt;/code&gt; query created them for most concepts.) It&amp;rsquo;s a nice example of how such tools can help us identify ways to improve our data, but of course a query for concepts that lack &lt;code&gt;skos:prefLabel&lt;/code&gt; values would be easy enough.&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t even do anything with the triples that I converted to use the Dublin Core vocabulary, but as a long-standing popular standard, there are plenty of tools out there that can work with it. They can help to make an even better case that if schema.org JSON-LD triples don&amp;rsquo;t conform to the vocabulary that you want to use, just convert them!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2019">2019</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/json">JSON</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Exploring JSON-LD</title>
      <link>https://www.bobdc.com/blog/json-ld/</link>
      <pubDate>Sun, 21 Apr 2019 11:20:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/json-ld/</guid>
      
      
      <description><div>And of course, querying it with SPARQL.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/json-ld-data.png&#34; width=&#34;200px&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;JSON-LD logo&#34;/&gt;
&lt;p&gt;I paid little attention to JSON-LD until recently. I just thought of it as another RDF serialization format that, because it&amp;rsquo;s valid JSON, had more appeal to people normally uninterested in RDF. Dan Brickley&amp;rsquo;s December tweet that &amp;ldquo;&lt;a href=&#34;https://twitter.com/danbri/status/1078760462723022849&#34;&gt;JSON-LD is much more widely used than Turtle&lt;/a&gt;&amp;rdquo; inspired me to look a little harder at the JSON-LD ecosystem, and I found a lot of great things. To summarize: the amount of JSON-LD data out there is exploding, and we can query it with SPARQL, so it offers many new possibilities for RDF-based applications.&lt;/p&gt;
&lt;h2 id=&#34;json-ld-structure&#34;&gt;JSON-LD structure&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&#34;https://json-ld.org/primer/latest/&#34;&gt;primer on the json-ld.org site&lt;/a&gt; is a good way to get a quick introduction to the syntax. The W3C&amp;rsquo;s &lt;a href=&#34;https://www.w3.org/2013/dwbp/wiki/RDF_AND_JSON-LD_UseCases&#34;&gt;RDF AND JSON-LD UseCases&lt;/a&gt; document has a  &lt;a href=&#34;https://www.w3.org/2013/dwbp/wiki/RDF_AND_JSON-LD_UseCases#Differences_with_RDF&#34;&gt;Differences with RDF&lt;/a&gt; section that provides a nice summary for people coming to JSON-LD from the RDF world.&lt;/p&gt;
&lt;p&gt;To get to know the JSON-LD syntax, I created a Turtle file with examples of some trickier RDF features and then converted it to JSON-LD to see what it looked like. My Turtle:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;    @prefix ab:   &amp;lt;http://learningsparql.com/ns/sample#&amp;gt; .
    @prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
    @prefix v:    &amp;lt;http://www.w3.org/2006/vcard/&amp;gt; .
    @prefix dc:   &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt; .
    
    # Sample comment: I wish I could get Hugo to do syntax highlighting of Turtle!
    
    ab:i432 ab:firstName     &amp;#34;Richard&amp;#34; ;
            ab:lastName      &amp;#34;Mutt&amp;#34; ;
            ab:startYear     2013 ;
            ab:officer       true ;
            ab:reportsTo     ab:i193 ;
            ab:linkedIn      &amp;lt;https://www.linkedin.com/in/rmutt&amp;gt; ;
            ab:address       _:b1 .
    
    _:b1    ab:city          &amp;#34;Springfield&amp;#34; ;
            ab:streetAddress &amp;#34;32 Main St.&amp;#34; .
    
    ab:i193 ab:firstName     &amp;#34;Joan&amp;#34; ;
            ab:lastName      &amp;#34;Jones&amp;#34; ;
            v:title &amp;#34;Director&amp;#34;@en ; 
            v:title &amp;#34;Directeur&amp;#34;@fr .
            
    &amp;lt;urn:isbn:123456789X&amp;gt; dc:creator ab:i193 ;
            dc:title &amp;#34;Chicken Soup for the JSON-LD Soul&amp;#34; . 
    
    ab:firstName rdfs:label  &amp;#34;first name&amp;#34; .
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;It describes employee Richard Mutt with values of several different types, including an object property to identify his boss and a blank node to hold together the details of his address. Triples about his boss list her job title in both English and French; they also show her as the author of a book whose name is specified with a &amp;ldquo;title&amp;rdquo; property in a different namespace from the property identifying her job title.&lt;/p&gt;
&lt;p&gt;The Jena command line utilities that I currently have installed don&amp;rsquo;t write JSON-LD, although as we&amp;rsquo;ll see they can read it. (2021 update: they write it now as well, for example with &lt;a href=&#34;http://www.bobdc.com/blog/jenagems/#riot&#34;&gt;riot&lt;/a&gt; &lt;code&gt;--syntax=jsonld&lt;/code&gt;.) So I used the &lt;a href=&#34;http://www.easyrdf.org/converter&#34;&gt;easyrdf.org&lt;/a&gt; website to convert the Turtle sample above to JSON-LD. I&amp;rsquo;m tempted to include a screen shot of the result: it was a dense mass without a single carriage return, showing that the &lt;a href=&#34;https://json-ld.org/&#34;&gt;JSON-LD&lt;/a&gt; home page&amp;rsquo;s assertion that JSON-LD is &amp;ldquo;easy for humans to read and write&amp;rdquo; should be qualified with &amp;ldquo;if you add carriage returns and indenting in all the right places&amp;rdquo;. Of course, just about any programming or markup language is easy for humans to read and write if you add white space in all the right places, so this does not make JSON-LD special. (I do find it amusing when a set of software developers generalizes from themselves to their entire species.)&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://stedolan.github.io/jq/&#34;&gt;jq&lt;/a&gt; utility nicely converted the easyrdf output into something easier for humans to read. Here is the result:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;_:b0&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#city&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@value&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Springfield&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#streetAddress&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@value&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;32 Main St.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#firstName&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://www.w3.org/2000/01/rdf-schema#label&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@value&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;first name&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#i193&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#firstName&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@value&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Joan&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#lastName&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@value&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Jones&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://www.w3.org/2006/vcard/title&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@value&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Director&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@language&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;en&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@value&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Directeur&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@language&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;fr&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#i432&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#firstName&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@value&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Richard&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#lastName&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@value&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Mutt&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#startYear&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@value&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#ae81ff&#34;&gt;2013&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#officer&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@value&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#66d9ef&#34;&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#reportsTo&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#i193&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#linkedIn&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://www.linkedin.com/in/rmutt&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#address&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;_:b0&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;https://www.linkedin.com/in/rmutt&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;urn:isbn:123456789X&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://purl.org/dc/elements/1.1/creator&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@id&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;http://learningsparql.com/ns/sample#i193&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;http://purl.org/dc/elements/1.1/title&amp;#34;&lt;/span&gt;: [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#f92672&#34;&gt;&amp;#34;@value&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Chicken Soup for the JSON-LD Soul&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ]
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Except for JSON&amp;rsquo;s inability to store comments, the converted version shows that JSON-LD managed to represent all the tricky RDF bits that I included in the input.&lt;/p&gt;
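If jq isn't handy, Python's standard library can do the same reformatting of a dense single-line JSON-LD dump. This sketch uses an abbreviated stand-in for the easyrdf output:

```python
# Pretty-print minified JSON (such as the single-line easyrdf output)
# the way jq does, using only the standard library.
import json

# An abbreviated stand-in for the dense easyrdf conversion result.
dense = '[{"@id":"_:b0","http://learningsparql.com/ns/sample#city":[{"@value":"Springfield"}]}]'
pretty = json.dumps(json.loads(dense), indent=2)
print(pretty)
```

The command-line equivalent is `python -m json.tool`, which reads from standard input just like jq does.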
&lt;p&gt;With the &lt;a href=&#34;https://jena.apache.org/documentation/query/cmds.html&#34;&gt;Jena arq command line tool&lt;/a&gt; I successfully executed the following SPARQL query against the JSON-LD data above:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;CONSTRUCT { ?s ?p ?o } WHERE
{ ?s ?p ?o }
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The query simply asks for all the triples. My arq command line asked for output in the default format of Turtle, and it worked fine.&lt;/p&gt;
&lt;p&gt;There are two bits of big news here for RDF people evaluating JSON-LD:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I round-tripped some fairly complex RDF in and out of JSON-LD with no loss of anything but the comment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I performed a SPARQL query on JSON-LD. This demonstrates that the exploding amount of JSON-LD out there is available for use in RDF applications.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;sparql-queries-of-public-json-ld&#34;&gt;SPARQL queries of public JSON-LD&lt;/h2&gt;
&lt;p&gt;Next I queried some real-world data. The &lt;a href=&#34;https://www.overstock.com/&#34;&gt;overstock.com&lt;/a&gt; website has rich JSON-LD data about all of its products and even includes some nice JSON-LD in its search results pages. After searching the site for &amp;ldquo;headphones&amp;rdquo; and pulling the JSON-LD from the &lt;a href=&#34;https://www.overstock.com/headphones,/k,/results.html&#34;&gt;first page of search results&lt;/a&gt;, I wrote a script to pull the JSON-LD for the 60 or so products listed there.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.bobdc.com/miscfiles/overstockComHeadphones.ttl&#34;&gt;aggregated data&lt;/a&gt; has 8,808 triples with 27 different predicates. If you do a View Source on the web page of a &lt;a href=&#34;https://www.overstock.com/Electronics/Mpow-Jaws-Wireless-Bluetooth-4.1-Stereo-Headset-Universal-Headphone-with-Hands-Free-Calling-for-iPhone-Other-Bluetooth-Devices/14783413/product.html?refccid=C6P6EYFJ6I62ZNKNRHZHC2PPEY&amp;amp;searchidx=56&#34;&gt;typical entry&lt;/a&gt; from the headphones list (search for &amp;ldquo;ld+json&amp;rdquo;) you&amp;rsquo;ll see that its JSON-LD provides more than just a product name and images—it includes a full paragraph of description, pricing, reviews, availability, and more.&lt;/p&gt;
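A script for that job can be short. This is a hedged sketch using only Python's standard library; the sample HTML is a made-up stand-in for an actual product page:

```python
# Pull the contents of <script type="application/ld+json"> elements out
# of an HTML page, using only the standard library.
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks.append(json.loads(data))

# Made-up stand-in for a real product page.
html = ('<html><script type="application/ld+json">'
        '{"@type": "Product", "name": "Headphones"}</script></html>')
parser = JSONLDExtractor()
parser.feed(html)
print(parser.blocks[0]["name"])
```

Real pages often hold several `ld+json` blocks, which is why the sketch collects a list rather than stopping at the first one.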
&lt;p&gt;The following query of that data requests the price and name (but not description) of any headphones under $30 that include &amp;ldquo;Bluetooth&amp;rdquo; in their description:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; 
PREFIX s:   &amp;lt;http://schema.org/&amp;gt; 

SELECT ?price ?name WHERE {
   ?i a s:Product ;
   s:name ?name ;
   s:offers ?offer ; 
   s:description ?description .
  
   ?offer s:price ?price .
   FILTER(contains(?description,&amp;#34;Bluetooth&amp;#34;)) 
   FILTER(xsd:decimal(?price) &amp;lt; 30) 
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Here is the result:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;--------------------------------------------------------------------------------------------------------------------------------------------
| price   | name                                                                                                                           |
============================================================================================================================================
| &amp;#34;16.49&amp;#34; | &amp;#34;BL1 Mini Bluetooth Monaural Headphone Stereo Wireless Stealth Business Wireless Bluetooth 4.1 Headphones&amp;#34;                     |
| &amp;#34;11.24&amp;#34; | &amp;#34;Mini Wireless Bluetooth 4.0 Stereo In-Ear Headset (Black)&amp;#34;                                                                    |
| &amp;#34;13.49&amp;#34; | &amp;#34;Mpow EM 13 Mini Wireless Earbud, Bluetooth V4.1 Invisible Earphone&amp;#34;                                                           |
| &amp;#34;21.99&amp;#34; | &amp;#34;X18 Wireless Bluetooth Earbuds Headphones Stereo Sound Built-in 6.0 Noise Cancelling Mic&amp;#34;                                     |
| &amp;#34;20.99&amp;#34; | &amp;#34;Mpow Bluetooth Headphones V4.1 Wireless Sport Headphones Noise Cancelling In-ear Stereo Earbuds 8-hour Playing Time with Mic&amp;#34; |
--------------------------------------------------------------------------------------------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Of course, all of the properties use the schema.org vocabulary, but RDFS gives us ways to map this data to other, more specialized vocabularies. I&amp;rsquo;ll show some of that next time; meanwhile, the casting of the price above from a string to a decimal value is one taste of how SPARQL can turn the data into something more useful.&lt;/p&gt;
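The same cast-then-filter logic can be mirrored outside of SPARQL. This Python sketch works over hypothetical schema.org-style records rather than the real Overstock data:

```python
# Mirror the SPARQL query's logic: keep products whose description
# mentions "Bluetooth" and whose string-typed price, cast to a decimal,
# is under 30. The records below are hypothetical sample data.
from decimal import Decimal

products = [
    {"name": "Mini Wireless Bluetooth 4.0 Stereo In-Ear Headset",
     "description": "Bluetooth headset", "price": "11.24"},
    {"name": "Studio Headphones",
     "description": "Wired studio cans", "price": "89.99"},
]

matches = [p["name"] for p in products
           if "Bluetooth" in p["description"]
           and Decimal(p["price"]) < 30]
print(matches)
```

Using Decimal rather than float matches the spirit of the `xsd:decimal` cast: prices compare exactly rather than through binary floating point.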
&lt;h2 id=&#34;a-bright-future&#34;&gt;A bright future&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s been a pleasant surprise to see how many different sites include JSON-LD these days. The Hugo website generation framework that I wrote about migrating to &lt;a href=&#34;http://www.bobdc.com/blog/changing-my-blogs-domain-name/&#34;&gt;last month&lt;/a&gt; adds JSON-LD metadata by default, so my new blog website had JSON-LD before I even knew it did. I&amp;rsquo;ve also been surprised by how popular JSON-LD is with the search engine optimization crowd—a &lt;a href=&#34;https://www.google.com/search?q=%22json-ld%22+seo&amp;amp;oq=%22json-ld%22+seo&#34;&gt;Google search&lt;/a&gt; for &lt;code&gt;JSON-LD SEO&lt;/code&gt; gets over 200,000 hits, and many don&amp;rsquo;t even mention RDF. They just see it as a way to add metadata that Google&amp;rsquo;s crawlers are more likely to notice.&lt;/p&gt;
&lt;p&gt;While I&amp;rsquo;m currently only interested in JSON-LD as a growing source of data that I can query with SPARQL, there are some interesting things happening with the syntax and structure of JSON-LD itself. &lt;a href=&#34;https://twitter.com/gkellogg&#34;&gt;Gregg Kellogg&lt;/a&gt;&amp;rsquo;s &lt;a href=&#34;https://www.w3.org/Data/events/data-ws-2019/assets/lightning/GreggKellogg.pptx&#34;&gt;JSON-LD 1.1 Update&lt;/a&gt; gives a nice overview of the additions to JSON-LD that are being considered. I certainly plan to play with it more.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2019">2019</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/json">JSON</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/json-ld">json-ld</category>
      
    </item>
    
    <item>
      <title>Changing my blog&#39;s domain name and platform</title>
      <link>https://www.bobdc.com/blog/changing-my-blogs-domain-name/</link>
      <pubDate>Sun, 24 Mar 2019 09:00:48 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/changing-my-blogs-domain-name/</guid>
      
      
      <description><div>New look, new domain name.</div><div>&lt;img id=&#34;idm45478314451696&#34; src=&#34;https://www.bobdc.com/img/main/hugologo.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Hugo logo&#34;/&gt;
&lt;p&gt;For too long I&amp;rsquo;ve postponed the migration of my blog to something more phone-friendly. I accumulated many notes about doing this, and I also wanted to move more of my online life from the snee.com domain to bobdc.com. When someone recently asked me about changing the stylesheet (I have dug and dug in the aforementioned notes but can&amp;rsquo;t remember who and will add their name here if I ever find it) I thought I&amp;rsquo;d take a deep breath and follow through with this. This is the last new blog entry you&amp;rsquo;ll see on the snee.com domain; you&amp;rsquo;ll also find it at &lt;a href=&#34;http://www.bobdc.com/blog&#34;&gt;bobdc.com/blog&lt;/a&gt; along with converted versions of all my other blog entries since I started on snee.com/bobdc.blog in 2005. I will continue my blog on bobdc.com/blog after this entry.&lt;/p&gt;
&lt;p&gt;The conversion of the old entries was most of the work, but with some Perl and XSLT and &lt;a href=&#34;https://pandoc.org/&#34;&gt;pandoc&lt;/a&gt; and spit and duct tape I got the legacy content into pretty good shape for the new platform.&lt;/p&gt;
&lt;p&gt;Of course, the platform choice was a geeky thing to agonize over. I finally went with &lt;a href=&#34;https://gohugo.io/&#34;&gt;Hugo&lt;/a&gt;, a Go-based static site generator. (I never had to learn the &lt;a href=&#34;https://golang.org/&#34;&gt;Go&lt;/a&gt; programming language, but it looks cool enough.)&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a bit scary to think of the high percentage of the world&amp;rsquo;s blog entries that are created by data entry into web forms that then use a bunch of PHP to manage that content&amp;rsquo;s storage in relational databases. Having spent much of my career helping people store non-tabular content in standards-based non-tabular storage tools, I definitely wanted to get away from using PHP and relational database managers for narrative content, so I researched various static site generators before settling on Hugo.&lt;/p&gt;
&lt;p&gt;Simple web sites like my &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;learningsparql.com&lt;/a&gt; and &lt;a href=&#34;http://www.datascienceglossary.org&#34;&gt;datascienceglossary.org&lt;/a&gt; sites are just plain static sites: HTML files that I edit as necessary. A static site generator lets you store content separate from the styling and then generates HTML for your site based on the combination. If you want to change your website&amp;rsquo;s layout or styling, you edit the CSS or whatever and then regenerate the HTML. (The version of MovableType that I used on snee.com actually did static site generation, but all the styling was managed with a mess of old PHP. I haven&amp;rsquo;t upgraded it in ten years because the &lt;a href=&#34;https://www.bobdc.com/blog/upgrading-to-movable-type-4&#34;&gt;last time I did&lt;/a&gt; it broke so much.) A selling point of Hugo is that it does this very quickly&amp;ndash;or, to use the now-clichéd phrase that they prefer, &amp;ldquo;&lt;a href=&#34;https://www.google.com/search?q=%22blazingly+fast%22+hugo&#34;&gt;blazingly fast&lt;/a&gt;&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;I knew about &lt;a href=&#34;https://jekyllrb.com/&#34;&gt;Jekyll&lt;/a&gt; and &lt;a href=&#34;http://www.sphinx-doc.org/en/master/&#34;&gt;Sphinx&lt;/a&gt; from work because both are used for &lt;a href=&#34;https://www.geomesa.org/&#34;&gt;geomesa.org&lt;/a&gt;. After researching alternatives I decided that I liked the available Hugo &lt;a href=&#34;https://themes.gohugo.io/&#34;&gt;themes&lt;/a&gt; the most. The Hugo documentation isn&amp;rsquo;t very good, but the people on the &lt;a href=&#34;https://discourse.gohugo.io/&#34;&gt;discussion forum&lt;/a&gt; are very helpful, sometimes answering within minutes. If there is any interest I may write a blog entry about the important Hugo techniques I had to track down to customize my blog because they were not written up in an easily findable place.&lt;/p&gt;
&lt;p&gt;You store your Hugo content separately from the styling using Hugo&amp;rsquo;s &lt;a href=&#34;https://gohugo.io/content-management/formats/&#34;&gt;own variation&lt;/a&gt; of &lt;a href=&#34;https://en.wikipedia.org/wiki/Markdown&#34;&gt;markdown&lt;/a&gt;. As a longstanding XML guy ever since it was a &lt;a href=&#34;https://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language&#34;&gt;four-letter word&lt;/a&gt;, I have ranted about what&amp;rsquo;s wrong with markdown&amp;ndash;or, as I should say, &amp;ldquo;the markdowns&amp;rdquo;&amp;ndash; but it works for what I want to do in my blog and you can embed just about any sensible HTML you want in places where markdown falls short. I would have preferred a static site generator where the content I wrote for each new blog entry conformed to some simple XHTML profile but I just couldn&amp;rsquo;t find anything with good themes and the right level of automation.&lt;/p&gt;
&lt;p&gt;In the lower-right of my &lt;a href=&#34;http://snee.com/bobdc.blog/&#34;&gt;snee.com blog&lt;/a&gt; you&amp;rsquo;ll see four variations on Atom and RSS feeds. Offering more than one Atom or RSS feed seems to be difficult in Hugo, so my new blog&amp;rsquo;s &lt;a href=&#34;http://www.bobdc.com/blog/atom.xml&#34;&gt;Atom feed&lt;/a&gt; has summaries and links to the original postings and the new blog&amp;rsquo;s &lt;a href=&#34;http://www.bobdc.com/blog/index.xml&#34;&gt;RSS&lt;/a&gt; feed has the full entries. I will be setting the snee.com ones to redirect to the bobdc.com ones shortly, but you can just subscribe to the new ones now if you like.&lt;/p&gt;
&lt;p&gt;So, I apologize for the lack of phone-friendliness of my blog for the last few years and hope you enjoy the new more &lt;a href=&#34;https://en.wikipedia.org/wiki/Responsive_web_design&#34;&gt;responsive&lt;/a&gt; version of my blog.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2019">2019</category>
      
      <category domain="https://www.bobdc.com//categories/blogging-about-blogging">blogging about blogging</category>
      
    </item>
    
    <item>
      <title>curling SPARQL</title>
      <link>https://www.bobdc.com/blog/curling-sparql/</link>
      <pubDate>Sun, 24 Feb 2019 10:45:30 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/curling-sparql/</guid>
      
      
      <description><div>A quick reference.</div><div>&lt;p&gt;I&amp;rsquo;ve been using the &lt;a href=&#34;https://curl.haxx.se/&#34;&gt;curl&lt;/a&gt; utility to retrieve data from SPARQL endpoints for years, but I still have trouble remembering some of the important syntax, so I jotted down a quick reference for myself and I thought I&amp;rsquo;d share it. I also added some background.&lt;/p&gt;
&lt;h2 id=&#34;idm45504284143088&#34;&gt;Quick reference&lt;/h2&gt;
&lt;p&gt;Submit a URL-encoded SPARQL query on the operating system command line to the endpoint &lt;code&gt;http://edan.si.edu/saam/sparql&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl &amp;quot;http://edan.si.edu/saam/sparql?query=SELECT%20*%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%208&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Quoting the URL isn&amp;rsquo;t always necessary, but won&amp;rsquo;t hurt. Omitting it may hurt if some of the characters mean something special to your operating system&amp;rsquo;s command line interpreter.)&lt;/p&gt;
&lt;p&gt;Submit the same query stored in the file query1.rq:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl --data-urlencode &amp;quot;query@query1.rq&amp;quot; http://edan.si.edu/saam/sparql
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There is no need to escape the query in the file, because the &lt;code&gt;--data-urlencode&lt;/code&gt; parameter tells curl to do so.&lt;/p&gt;
&lt;p&gt;The above queries return the data in whatever format the endpoint&amp;rsquo;s system administrators chose as the default. You can pass a request header to specify that you want a particular format. The following requests comma-separated values:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -H &amp;quot;Accept: text/csv&amp;quot; --data-urlencode &amp;quot;query@query1.rq&amp;quot;  http://edan.si.edu/saam/sparql
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Other possible content types are &lt;code&gt;application/sparql-results+json&lt;/code&gt;, &lt;code&gt;application/sparql-results+xml&lt;/code&gt;, and &lt;code&gt;text/tab-separated-values&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The above examples all use a SELECT query. A CONSTRUCT query requests triples, so instead of CSV or one of the other tabular formats you want an RDF serialization such as Turtle:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -H &amp;quot;Accept: text/turtle&amp;quot; --data-urlencode &amp;quot;query@query2.rq&amp;quot;  http://edan.si.edu/saam/sparql
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Other possible content types for CONSTRUCT queries are &lt;code&gt;application/rdf+xml&lt;/code&gt;, &lt;code&gt;application/rdf+json&lt;/code&gt;, and, for N-Triples, &lt;code&gt;text/plain&lt;/code&gt;. The &lt;a href=&#34;https://github.com/bio2rdf/bio2rdf-scripts/wiki/REST-API&#34;&gt;bio2rdf github page&lt;/a&gt; has good long lists for both SELECT and CONSTRUCT content types, although not all endpoints will support all of the listed types. (It lists &lt;code&gt;text/plain&lt;/code&gt; for N-Triples, but you&amp;rsquo;re better off using &lt;code&gt;application/n-triples&lt;/code&gt;.)&lt;/p&gt;
&lt;h2 id=&#34;idm45504284132128&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;curl lets you submit many kinds of HTTP requests to HTTP servers. It comes with macOS and most Linux distributions, and if you don&amp;rsquo;t have it on your Windows machine, you can &lt;a href=&#34;https://curl.haxx.se/windows/&#34;&gt;download&lt;/a&gt; it.&lt;/p&gt;
&lt;p&gt;If you enter &lt;code&gt;curl&lt;/code&gt; with no parameters other than a URL, like this,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl http://www.learningsparql.com
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;it does the same HTTP GET that a browser would do. This has the same effect as doing a browser View Source on that web page.&lt;/p&gt;
&lt;p&gt;It gets more interesting when you&amp;rsquo;re not pointing curl at a static web page like &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;http://www.learningsparql.com&lt;/a&gt; but at a dynamic resource such as a SPARQL endpoint. A SPARQL endpoint is usually identified with a URL ending with &lt;code&gt;/sparql&lt;/code&gt;. I tested everything shown above with these endpoint URLs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;https://query.wikidata.org/bigdata/namespace/wdq/sparql&lt;/code&gt;, the SPARQL endpoint for Wikidata.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;http://localhost:3030/myDataset/sparql&lt;/code&gt;, the SPARQL endpoint for a local instance of &lt;a href=&#34;https://jena.apache.org/documentation/fuseki2/&#34;&gt;Apache Jena Fuseki&lt;/a&gt;. This is the triplestore that I described in the &amp;ldquo;Updating Data with SPARQL&amp;rdquo; chapter of my book &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt; because, for a server that accepts SPARQL UPDATE commands, it&amp;rsquo;s so easy to get up and running. Before running the queries against this endpoint I created a dataset on this running instance with the clever name of myDataset and loaded some triples into it. As you can see, a Fuseki endpoint URL includes the dataset name.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;http://edan.si.edu/saam/sparql&lt;/code&gt;, the SPARQL endpoint for the Smithsonian Institution. I used this one in the examples here because it&amp;rsquo;s the shortest of the three endpoint URLs that I used for testing.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The simplest way to send a query to a SPARQL endpoint is to add &lt;code&gt;query=[your URL-encoded query]&lt;/code&gt; to the end of the endpoint&amp;rsquo;s URL as with the very first example above. You can paste the resulting URL into the address bar of a web browser so that the browser will retrieve the query results from the endpoint, but curl lets you retrieve the results from a command line so that you can save the returned data and use it as part of an application.&lt;/p&gt;
&lt;p&gt;URL encoding is the process of taking characters that might screw up the parsing of the URL and converting each to a percent sign followed by two hexadecimal digits for each byte of the character&amp;rsquo;s encoding&amp;ndash;most often, converting each space to %20. For example, the escaped version of the query &lt;code&gt;SELECT * WHERE {?s ?p ?o} LIMIT 8&lt;/code&gt; that I used in the examples above is &lt;code&gt;SELECT%20*%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%208&lt;/code&gt;. Most programming languages offer built-in functions to do this; I usually paste one of these queries into a form on a website like &lt;a href=&#34;https://meyerweb.com/eric/tools/dencoder/&#34;&gt;this one&lt;/a&gt; and then copy the result after having the form do the conversion.&lt;/p&gt;
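&lt;p&gt;If you&amp;rsquo;d rather not paste queries into a web form, Python&amp;rsquo;s standard library can do the conversion. This little sketch reproduces the escaped query shown above (the asterisk doesn&amp;rsquo;t need escaping, so it&amp;rsquo;s listed as safe):&lt;/p&gt;

```python
from urllib.parse import quote

query = "SELECT * WHERE {?s ?p ?o} LIMIT 8"
# Letters and digits are never encoded; the spaces, braces, and question
# marks become %20, %7B/%7D, and %3F respectively.
print(quote(query, safe="*"))
# prints SELECT%20*%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%208
```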
&lt;p&gt;When you add the escaped query to a SPARQL endpoint URL such as the Smithsonian one and enter the result as a parameter to curl at your command line, like this,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl http://edan.si.edu/saam/sparql?query=SELECT%20*%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%208
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;it should retrieve a &lt;a href=&#34;https://www.w3.org/TR/sparql11-results-json/&#34;&gt;SPARQL Query Results JSON Format&lt;/a&gt; version of the data requested by that query, because that&amp;rsquo;s the default format for that endpoint.&lt;/p&gt;
&lt;p&gt;I actually don&amp;rsquo;t escape queries and add them to a curl command line often. When I&amp;rsquo;m refining a query by iteratively editing and running it, re-encoding the URL each time can be a pain, so I usually store the query in a text file (query1.rq for the sample SELECT query above and query2.rq for the CONSTRUCT query) and tell curl to URL-encode the file&amp;rsquo;s contents and send the result off to the SPARQL endpoint.&lt;/p&gt;
&lt;p&gt;If I keep the file with the query in a text editor, I can refine it, save it, and run the same command over and over without worrying about escaping each revision of the query. (Because my editor is Emacs, I could actually send the query to the endpoint using Emacs &lt;a href=&#34;https://www.emacswiki.org/emacs/SPARQLMode&#34;&gt;SPARQLMode&lt;/a&gt;, but today&amp;rsquo;s topic is curl.)&lt;/p&gt;
&lt;p&gt;The curl website has plenty of &lt;a href=&#34;https://curl.haxx.se/docs/&#34;&gt;documentation&lt;/a&gt;, but you can learn a lot with just this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  curl --help
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Among the many, many options, some useful ones are &lt;code&gt;-o&lt;/code&gt; to redirect output to a file and &lt;code&gt;-L&lt;/code&gt; for &amp;ldquo;follow location hints&amp;rdquo; (that is, if the server has instructions to redirect a request for a given URL to something else, take the hint). Another is &lt;code&gt;-I&lt;/code&gt; for &amp;ldquo;Show document info only&amp;rdquo;: it sends an HTTP HEAD request to get information about the requested &amp;ldquo;document&amp;rdquo; without actually retrieving it, which is useful for debugging. The classic &lt;code&gt;-v&lt;/code&gt; for &amp;ldquo;verbose&amp;rdquo; is also handy for debugging.&lt;/p&gt;
&lt;p&gt;Take a look at the available options, experiment with some SPARQL endpoints, and soon you&amp;rsquo;ll be using &amp;ldquo;curl&amp;rdquo; as a verb (for example, &amp;ldquo;I tried to curl it but I didn&amp;rsquo;t have the right certs&amp;rdquo;&amp;ndash;see the &lt;code&gt;-E&lt;/code&gt; command line option for more on that) and you won&amp;rsquo;t be talking about hairstyling, arm exercises, or sliding round stones across the ice.&lt;/p&gt;
&lt;p&gt;(I just learned about &lt;a href=&#34;http://blog.mynarz.net/2015/05/curling-sparql-http-graph-store-protocol.html&#34;&gt;Curling SPARQL HTTP Graph Store protocol&lt;/a&gt; by &lt;a href=&#34;https://twitter.com/jindrichmynarz?lang=en&#34;&gt;@jindrichmynarz&lt;/a&gt;, so if you&amp;rsquo;ve gotten this far, you&amp;rsquo;ll like that too.)&lt;/p&gt;
&lt;img id=&#34;idm45504284108544&#34; src=&#34;https://www.bobdc.com/img/main/curling.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;curling lamp&#34; width=&#34;320&#34;/&gt;
&lt;p&gt;&lt;em&gt;Curling image by Greg Scheckter via &lt;a href=&#34;https://www.flickr.com/photos/gregthebusker/4767909833/&#34;&gt;Flickr&lt;/a&gt;, CC &lt;a href=&#34;https://creativecommons.org/licenses/by/2.0/&#34;&gt;some rights reserved&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Comments? Just tweet to @bobdc for now, because Google+ is &lt;a href=&#34;https://smallbiztrends.com/2019/02/google-plus-shutting-down.html&#34;&gt;shutting down&lt;/a&gt;. I will be moving my blog to a new more phone-responsive platform shortly and I&amp;rsquo;m researching options for hosted comments.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2019">2019</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Querying machine learning distributional semantics with SPARQL</title>
      <link>https://www.bobdc.com/blog/querying-machine-learning-dist/</link>
      <pubDate>Sun, 20 Jan 2019 09:57:40 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/querying-machine-learning-dist/</guid>
      
      
      <description><div>Bringing together my two favorite kinds of semantics.</div><div>&lt;blockquote id=&#34;idm45622484840592&#34; class=&#34;pullquote&#34;&gt;I recommend the paper to anyone interested in SPARQL or the embedding vectors side of machine learning. They seem to have a productive future together.&lt;/blockquote&gt;
&lt;p&gt;When I wrote &lt;a href=&#34;https://www.bobdc.com/blog/semantic-web-semantics-vs-vect&#34;&gt;Semantic web semantics vs. vector embedding machine learning semantics&lt;/a&gt;, I described how distributional semantics&amp;ndash;whose machine learning implementations are very popular in modern natural language processing&amp;ndash;are quite different from the kind of semantics that RDF people usually talk about. I recently learned of a fascinating project that brings RDF technology and distributional semantics together, letting our SPARQL query logic take advantage of entity similarity as rated by machine learning models.&lt;/p&gt;
&lt;p&gt;To review a little from that blog entry: machine learning implementations of distributional semantics can identify some of the meanings of words by analyzing their relationships with other words in a set of training data. For example, after analyzing the distribution of terms in a large enough text corpus, such a system can answer the question &amp;ldquo;woman is to man as queen is to what?&amp;rdquo; Along with the answer of &amp;ldquo;king&amp;rdquo;, discussions of this technology typically bring up other examples such as the questions &amp;ldquo;walking is to walked as swimming is to what?&amp;rdquo; (an especially nice one because &amp;ldquo;swim&amp;rdquo; is an irregular verb) and &amp;ldquo;London is to England as Berlin is to what?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;These examples are a bit oversimplified. Instead of such a straightforward answer, an implementation such as &lt;a href=&#34;https://github.com/dav/word2vec&#34;&gt;word2vec&lt;/a&gt; typically responds with a list of scored words. If the analyzed corpus is large enough, asking word2vec to complete the analogy &amp;ldquo;woman man queen&amp;rdquo; will get you a list of words with &amp;ldquo;king&amp;rdquo; having the highest score. In my experiments, this list format was a nice touch for the &amp;ldquo;london england berlin&amp;rdquo; case: while germany had the highest score, prussia had the second highest, and Berlin was the capital of Prussia for a few centuries.&lt;/p&gt;
&lt;p&gt;word2vec doesn&amp;rsquo;t actually compare the strings &amp;ldquo;london&amp;rdquo; and &amp;ldquo;england&amp;rdquo; and &amp;ldquo;berlin&amp;rdquo;. It uses &lt;a href=&#34;https://en.wikipedia.org/wiki/Cosine_similarity&#34;&gt;cosine similarity&lt;/a&gt; to compare vectors that were assigned to each word as a result of the training step done with the input corpus&amp;ndash;the machine &amp;ldquo;learning&amp;rdquo; part. Then, it looks for vectors whose similarity to the berlin vector is comparable to the similarity between the london and england vectors.&lt;/p&gt;
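&lt;p&gt;Cosine similarity itself is easy to compute: it&amp;rsquo;s the dot product of two vectors divided by the product of their magnitudes. A minimal Python sketch with made-up three-dimensional vectors (real word2vec vectors typically have a few hundred dimensions):&lt;/p&gt;

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their magnitudes."""
    dot = sum(a * b for a, b in zip(u, v))
    mag_u = math.sqrt(sum(a * a for a in u))
    mag_v = math.sqrt(sum(b * b for b in v))
    return dot / (mag_u * mag_v)

# Made-up vectors: "london" and "england" point in similar directions,
# so their cosine similarity is close to 1.
london = [0.9, 0.3, 0.1]
england = [0.8, 0.4, 0.2]
print(cosine_similarity(london, england))
```

&lt;p&gt;A value near 1 means the vectors point in nearly the same direction; word2vec ranks candidate words by exactly this kind of score.&lt;/p&gt;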
&lt;p&gt;Some of the most interesting work in machine learning of the past few years has built on the use of vectors to represent entities other than words. The popular &lt;a href=&#34;https://medium.com/scaleabout/a-gentle-introduction-to-doc2vec-db3e8c0cce5e&#34;&gt;doc2vec&lt;/a&gt; (originally implemented by my &lt;a href=&#34;http://www.ccri.com&#34;&gt;CCRi&lt;/a&gt; co-worker Tim Emerick) does it with documents, and others have done it with &lt;a href=&#34;https://arxiv.org/abs/1603.00982&#34;&gt;audio clips&lt;/a&gt; and images.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s one thing to pick out an entity and then ask for a list of entities whose vectors are similar to that of the selected entity. Researchers at King Abdullah University of Science and Technology, the University of Birmingham, and Maastricht University have collaborated to take this further by mixing in some SPARQL. Their paper &lt;a href=&#34;https://www.biorxiv.org/content/early/2018/11/07/463778&#34;&gt;Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings&lt;/a&gt; describes &amp;ldquo;a general framework for integrating structured data and their vector space representations [that] allows jointly querying vector functions such as computing similarities (cosine, correlations) or classifications with machine learning models within a single SPARQL query&amp;rdquo;. They have made their implementation available as a Docker image and also put up a SPARQL endpoint with their sample data and SPARQL extensions.&lt;/p&gt;
&lt;p&gt;Vec2SPARQL lets you use SPARQL to move beyond simple comparison of vector similarity scores to combine SPARQL&amp;rsquo;s abilities with this. As they write,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For example, once feature vectors are extracted from images, meta-data that is associated with the images (such as geo-locations, image types, author, or similar) could be queried using SPARQL and &lt;em&gt;combined&lt;/em&gt; with the semantic queries over the feature vectors extracted from the images themselves. Such a combination would, for example, allow to identify the images authored by person &lt;em&gt;a&lt;/em&gt; that are most similar to an image of author &lt;em&gt;b&lt;/em&gt;; it can enable similarity- or analogy-based search and retrieval in precisely delineated subsets; or, when feature learning is applied to structured datasets, can combine similarity search and link prediction based on knowledge graph embeddings with structured queries based on SPARQL.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The paper&amp;rsquo;s authors extended &lt;a href=&#34;https://jena.apache.org/documentation/query/&#34;&gt;Apache Jena ARQ&lt;/a&gt; (the open source cross-platform command line SPARQL processor that I recommend in my book &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt;) with two new functions that make it easier to work with these vectors. The &lt;code&gt;similarity(?x,?y)&lt;/code&gt; function lets you compute the similarity of two vectors so that you can use the result in a &lt;code&gt;FILTER&lt;/code&gt;, &lt;code&gt;BIND&lt;/code&gt;, or &lt;code&gt;SELECT&lt;/code&gt; statement. For example, you might use it in a &lt;code&gt;FILTER&lt;/code&gt; statement to only retrieve resources whose similarity to a particular resource was above a specified threshold. Their &lt;code&gt;mostSimilar(?x,n)&lt;/code&gt; function asks for the &lt;code&gt;n&lt;/code&gt; most similar entities to the one passed as the first argument.&lt;/p&gt;
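&lt;p&gt;Based on the paper&amp;rsquo;s description, a query that uses these functions might look roughly like this (a hypothetical sketch: the &lt;code&gt;ex:&lt;/code&gt; prefix, the property names, and the threshold are all invented for illustration; only &lt;code&gt;similarity()&lt;/code&gt; comes from Vec2SPARQL):&lt;/p&gt;

```sparql
# Hypothetical sketch; not the paper's actual data model.
PREFIX ex: &lt;http://example.org/&gt;
SELECT ?image ?score WHERE {
  ?image ex:author ex:personA ;
         ex:featureVector ?v .
  ex:referenceImage ex:featureVector ?refV .
  BIND (similarity(?refV, ?v) AS ?score)
  FILTER (?score &gt; 0.8)
}
ORDER BY DESC(?score)
```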
&lt;p&gt;Their paper discusses two applications of Vec2SPARQL, in which they &amp;ldquo;demonstrate using biomedical, clinical, and bioinformatics use cases how [their] approach can enable new kinds of queries and applications that combine symbolic processing and retrieval of information through sub-symbolic semantic queries within vector spaces&amp;rdquo;. As they described the first of their two examples,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;hellip;we can use Vec2SPARQL to perform queries of a knowledge graph of mouse genes, diseases and phenotypes and incorporate Vec2SPARQL similarity functions&amp;hellip; Our aim in this use case is to find mouse gene associations with human diseases by prioritizing them using their phenotypic similarity, and simultaneously restrict the similarity comparisons to genes and diseases with specific properties (such as being associated with a particular phenotype).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The paper describes where they got their data and how they prepared it, and it shows a brief but expressive query that let them achieve their goal.&lt;/p&gt;
&lt;p&gt;In their second example, after assigning vectors to over 112,000 human chest x-ray images that also included gender, age, and diagnosis metadata, they could query for image similarity and also add filters to these queries such as combinations of age range and gender to find other patterns of similarity.&lt;/p&gt;
&lt;p&gt;The paper goes into greater detail on the data used for their samples and the similarity measures that they used. It also points to their &lt;a href=&#34;https://github.com/bio-ontology-research-group/vec2sparql&#34;&gt;source code on github&lt;/a&gt; and a &amp;ldquo;SPARQL endpoint&amp;rdquo; at &lt;a href=&#34;http://sparql.bio2vec.net/&#34;&gt;http://sparql.bio2vec.net/&lt;/a&gt; that is really more of a SPARQL endpoint query form. (The actual endpoint is at &lt;code&gt;http://sparql.bio2vec.net/patient_embeddings/query&lt;/code&gt;, and I successfully &lt;a href=&#34;https://twitter.com/coolmaksat/status/1079594129997348864&#34;&gt;sent a query there&lt;/a&gt; with curl.)&lt;/p&gt;
&lt;p&gt;For an academic paper, &amp;ldquo;Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings&amp;rdquo; is quite readable. (Although I didn&amp;rsquo;t have the right biology background to closely follow all the discussions of their sample query data, I could just about handle the math as shown.) I recommend the paper to anyone interested in SPARQL or the embedding vectors side of machine learning. They seem to have a productive future together.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2019">2019</category>
      
      <category domain="https://www.bobdc.com//categories/ai-and-machine-learning">AI and machine learning</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Playing with wdtaxonomy</title>
      <link>https://www.bobdc.com/blog/playing-with-wdtaxonomy/</link>
      <pubDate>Sun, 23 Dec 2018 09:51:49 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/playing-with-wdtaxonomy/</guid>
      
      
      <description><div>Those queries from my last blog entry? Never mind!</div><div>&lt;p&gt;After I wrote about &lt;a href=&#34;https://www.bobdc.com/blog/extracting-rdf-data-models-fro&#34;&gt;Extracting RDF data models from Wikidata&lt;/a&gt; in my blog last month, &lt;a href=&#34;https://twitter.com/Ettore_Rizza&#34;&gt;Ettore Rizza&lt;/a&gt; &lt;a href=&#34;https://twitter.com/Ettore_Rizza/statuses/1064428103068467200&#34;&gt;suggested&lt;/a&gt; that I check out &lt;a href=&#34;https://www.npmjs.com/package/wikidata-taxonomy&#34;&gt;wdtaxonomy&lt;/a&gt;, which extracts taxonomies from Wikidata by retrieving the kinds of data that my blog entry&amp;rsquo;s sample queries retrieved, and it then displays the results as a tree. After playing with it, I&amp;rsquo;m tempted to tell everyone who read that blog entry to ignore the example queries I included, because you can learn a lot more from wdtaxonomy.&lt;/p&gt;
&lt;p&gt;The queries in that blog entry might still give you some useful perspective on how SPARQL can retrieve triples from &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:Main_Page&#34;&gt;Wikidata&lt;/a&gt; that express tree-ish relationships between the concepts of a given domain that have Wikipedia pages&amp;ndash;whether you want to call that a taxonomy or an ontology&amp;ndash;but I was just dabbling, while wdtaxonomy is a full-featured serious application for this.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://jakobvoss.de/&#34;&gt;Jakob Voss&lt;/a&gt; designed wdtaxonomy as both a command line utility and as an NPM &lt;a href=&#34;https://www.npmjs.com/package/wikidata-taxonomy#user-content-usage-as-module&#34;&gt;module&lt;/a&gt; that you can reference from applications. I tried the command line version and had a lot of fun. To try it with my periodic table element example that I wrote about last month, I started by entering &amp;ldquo;wdtaxonomy Q11344&amp;rdquo; (using the same local name for the &lt;a href=&#34;https://www.wikidata.org/wiki/Q11344&#34;&gt;Wikidata identifier&lt;/a&gt; that I used before) and the results were impressive.&lt;/p&gt;
&lt;p&gt;wdtaxonomy typically outputs a text-based tree with various information about the nodes of the tree. Instead of pasting a sample here, I&amp;rsquo;m showing a screen shot of the beginning of the output so that you can see the nice color coding:&lt;/p&gt;
&lt;img id=&#34;idm45289136971824&#34; width=&#34;320&#34; src=&#34;https://www.bobdc.com/img/main/wdtaxonomy1.png&#34;/&gt;
&lt;p&gt;The wdtaxonomy readthedocs.io &lt;a href=&#34;https://wdtaxonomy.readthedocs.io/en/latest/&#34;&gt;documentation&lt;/a&gt; lists over two dozen command line options that you can use to customize the output. (Entering &amp;ldquo;wdtaxonomy&amp;rdquo; alone at the command line gives a good summary.) My favorite is &lt;code&gt;-s&lt;/code&gt;, which shows you the SPARQL query that wdtaxonomy would use to retrieve the requested information from Wikidata. Here is what that gives you when you add it to the Q11344 command line I entered above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ wdtaxonomy -s Q11344
  SELECT ?item ?broader ?itemLabel ?instances ?sites WITH {
    SELECT DISTINCT ?item { ?item wdt:P279* wd:Q11344 }
  } AS %items WHERE { 
    INCLUDE %items .
    OPTIONAL { ?item wdt:P279 ?broader } .
    {
      SELECT ?item (count(distinct ?element) as ?instances) {
        INCLUDE %items.
        OPTIONAL { ?element wdt:P31 ?item }
      } GROUP BY ?item
    }
    {
      SELECT ?item (count(distinct ?site) as ?sites) {
        INCLUDE %items.
        OPTIONAL { ?site schema:about ?item }
      } GROUP BY ?item
    }
    SERVICE wikibase:label {
      bd:serviceParam wikibase:language &amp;quot;en&amp;quot;
    }
  }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(The INCLUDE keyword used in this query is a Blazegraph and Anzo &lt;a href=&#34;https://wiki.blazegraph.com/wiki/index.php/NamedSubquery&#34;&gt;extension&lt;/a&gt; to the SPARQL standard.) Combining this &lt;code&gt;-s&lt;/code&gt; option with other options, such as &lt;code&gt;-i&lt;/code&gt; to include instances or &lt;code&gt;-d&lt;/code&gt; to include item descriptions, shows what SPARQL query the tool would generate to retrieve this additional information. It&amp;rsquo;s a great opportunity to learn more about SPARQL, about the Wikidata data model, and about their relationship. (I have worried that this data model would scare off people who are new to SPARQL&amp;ndash;that if their first data set to query was Wikidata, they might think that the complexity of the necessary queries was because of SPARQL and not because of Wikidata&amp;ndash;but when I see all the great activity on Twitter around the use of SPARQL with Wikidata lately, I don&amp;rsquo;t worry so much anymore.)&lt;/p&gt;
&lt;p&gt;The ability to get at the generated SPARQL queries is also a huge help to my original goal of retrieving triples that let me store an RDFS/OWL ontology or a SKOS taxonomy about Wikipedia entities. I can change the SELECT part to a CONSTRUCT clause to create triples that use the variables bound in wdtaxonomy&amp;rsquo;s WHERE clauses. wdtaxonomy (or rather, Jakob) has done the difficult work of assembling the necessary query logic and we can just take it and use it.&lt;/p&gt;
&lt;p&gt;Some of the other command line options I liked include &lt;code&gt;-U&lt;/code&gt; to get full URIs and &lt;code&gt;-r&lt;/code&gt; to get superclasses of the named entity instead of its subclasses. I encourage everyone interested in SPARQL and Wikidata to install wdtaxonomy and start playing with it. Especially with that &lt;code&gt;-s&lt;/code&gt; option!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2018">2018</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">Wikidata</category>
      
    </item>
    
    <item>
      <title>Extracting RDF data models from Wikidata</title>
      <link>https://www.bobdc.com/blog/extracting-rdf-data-models-fro/</link>
      <pubDate>Sun, 18 Nov 2018 09:41:46 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/extracting-rdf-data-models-fro/</guid>
      
      
      <description><div>That&#39;s &#34;models&#34;, plural.</div><div>&lt;blockquote id=&#34;idm46211189631152&#34; class=&#34;pullquote&#34;&gt; Their avoidance of the standard model vocabularies is not a big deal, and we should be glad that they make this available in RDF at all.&lt;/blockquote&gt;
&lt;p&gt;Some people complain when an RDF dataset lacks a documented data model. A great thing about RDF and SPARQL is that if you want to know what kind of modeling might have been done for a dataset, &lt;em&gt;you just look&lt;/em&gt;, even if they&amp;rsquo;re using non-(W3C-)standard modeling structures. They&amp;rsquo;re still using triples, so you look at the triples.&lt;/p&gt;
&lt;p&gt;If I know that there is an entity &lt;code&gt;x:thing23&lt;/code&gt; in a dataset, I&amp;rsquo;m going to query for &lt;code&gt;{x:thing23 ?p ?o}&lt;/code&gt; and see what information there is about that entity. Hopefully I will find an &lt;code&gt;rdf:type&lt;/code&gt; triple saying that it&amp;rsquo;s a member of a class. If not, maybe it uses some other home-grown way to indicate class membership; either way, you can then start querying to find out about the class&amp;rsquo;s relationships to properties and other classes, and you&amp;rsquo;ve got a data model. What if it doesn&amp;rsquo;t use RDFS to describe these modeling structures and their relationships? A CONSTRUCT query will convert it to a data model that does.&lt;/p&gt;
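&lt;p&gt;To make that last point concrete, here is a small Python sketch (not SPARQL, and with invented predicate names) of what such a CONSTRUCT query effectively does: map home-grown class-membership predicates onto their RDFS equivalents so that standard tools can use them:&lt;/p&gt;

```python
# Sketch of what a normalizing CONSTRUCT query does: rewrite home-grown
# modeling predicates as their W3C standard equivalents. All the x: names
# here are made up for illustration.
MAPPING = {
    "x:isA": "rdf:type",
    "x:kindOf": "rdfs:subClassOf",
}

def normalize(triples):
    """Return the triples with any home-grown predicates replaced."""
    return [(s, MAPPING.get(p, p), o) for s, p, o in triples]

data = [
    ("x:thing23", "x:isA", "x:Widget"),
    ("x:Widget", "x:kindOf", "x:Product"),
]
for triple in normalize(data):
    print(triple)
```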
&lt;p&gt;And, if &lt;code&gt;{x:thing23 ?p ?o}&lt;/code&gt; triples don&amp;rsquo;t indicate any class membership, just seeing what the &lt;code&gt;?p&lt;/code&gt; values are tells you something about the data model. If certain entities use certain properties for their predicates, and other entities use a list that overlaps with that, you&amp;rsquo;ve learned more about relationships between sets of entities in the dataset. All of these things can be investigated with simple queries.&lt;/p&gt;
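&lt;p&gt;That kind of exploration doesn&amp;rsquo;t even need a triple store. As a rough illustration (all the names below are invented), grouping each subject&amp;rsquo;s predicate signature in a few lines of Python already suggests which entities play the same role in an undocumented data model:&lt;/p&gt;

```python
# Sketch: inferring an implicit data model from raw triples by grouping
# the predicates used with each subject. Subjects sharing a predicate
# signature probably belong to the same implicit class. Toy data.
from collections import defaultdict

triples = [
    ("x:thing23", "x:partNumber", "88-3921"),
    ("x:thing23", "x:supplier", "x:acme"),
    ("x:thing24", "x:partNumber", "88-4410"),
    ("x:thing24", "x:supplier", "x:acme"),
    ("x:acme", "x:homePage", "http://acme.example/"),
]

# Collect the set of predicates used with each subject.
signatures = defaultdict(set)
for s, p, o in triples:
    signatures[s].add(p)

# Invert: subjects grouped by identical predicate signature.
roles = defaultdict(list)
for subject, preds in signatures.items():
    roles[frozenset(preds)].append(subject)

for preds, subjects in roles.items():
    print(sorted(preds), ":", sorted(subjects))
```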
&lt;p&gt;Wikidata offers tons of great data and modeling for us RDF people, but it wasn&amp;rsquo;t designed for us. They created their own model and then expressed the model and instance data in RDF, and I&amp;rsquo;m not going to complain; can you imagine how cool it would be if Google did the same with their knowledge graph? (When I &lt;a href=&#34;https://twitter.com/bobdc/status/1051140112543875072&#34;&gt;tweeted&lt;/a&gt; &amp;ldquo;Handy Wikidata hints for people who have been using RDF and SPARQL since before Wikidata was around: use wdt:P31 instead of rdf:type and wdt:P279 instead of rdfs:subClassOf&amp;rdquo;, &lt;a href=&#34;https://twitter.com/mark_l_watson&#34;&gt;Mark Watson&lt;/a&gt; replied that he liked my sense of humor. While I hadn&amp;rsquo;t meant to be funny I do appreciate &lt;em&gt;his&lt;/em&gt; sense of humor.) As I&amp;rsquo;ve worked at understanding Wikidata&amp;rsquo;s documentation about their mapping to RDF I&amp;rsquo;ve had fun just querying around to understand the structures. Again: this is one of the key reasons that RDF and SPARQL are great! Because we can do that!&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/blog/sparql-full-text-wikipedia-sea&#34;&gt;Last month&lt;/a&gt; I described how you can find the subclass tree under a given class in Wikidata and since then I&amp;rsquo;ve done further exploration of how to pull data models out of Wikidata. Note that I say &amp;ldquo;models&amp;rdquo; and not &amp;ldquo;model&amp;rdquo;. &lt;a href=&#34;https://twitter.com/datao&#34;&gt;Olivier Rossel&lt;/a&gt; recently &lt;a href=&#34;https://twitter.com/datao/statuses/1056911654879969280&#34;&gt;referred to&lt;/a&gt; extracting the data model of Wikidata (my translation from his French), but I worry that looking for &amp;ldquo;the&amp;rdquo; grand RDF data model of Wikidata might set someone up for disappointment. I think that looking for data models to suit various projects will be more productive. (Olivier and I discussed this further in the &amp;ldquo;Handy Wikidata hints&amp;rdquo; thread mentioned above.)&lt;/p&gt;
&lt;p&gt;The following query builds on the one I did last month to either get a class tree below a given one or to get its superclasses instead. It creates triples that express the classes and their relationships using W3C standard properties.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;CONSTRUCT {
  ?class a owl:Class . 
  ?class rdfs:subClassOf ?superclass . 
  ?class rdfs:label ?classLabel . 
  ?property rdfs:domain ?class . 
  ?property rdfs:label ?propertyLabel .
}
WHERE {
  BIND(wd:Q11344 AS ?mainClass) .    # Q11344 chemical element; Q1420 automobile

  
  # Pick one or the other of the following two triple patterns. 
  ?class wdt:P279* ?mainClass.     # Find subclasses of the main class. 
  #?mainClass wdt:P279* ?class.     # Find superclasses of the main class. 

  
  ?class wdt:P279 ?superclass .     # So we can create rdfs:subClassOf triples
  ?class rdfs:label ?classLabel.
  OPTIONAL {
    ?class wdt:P1963 ?property.
    ?property rdfs:label ?propertyLabel.
    FILTER((LANG(?propertyLabel)) = &amp;quot;en&amp;quot;)
    }
  FILTER((LANG(?classLabel)) = &amp;quot;en&amp;quot;)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Because the query uses prefixes that Wikidata already understands, I didn&amp;rsquo;t need to declare any.) When run in the Wikidata query service form, there are too many triples to see at once, so I put the query into a subtreeClasses.rq file and ran it with curl from the command line like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl --data-urlencode &amp;quot;query@subtreeClasses.rq&amp;quot; https://query.wikidata.org/sparql -H &amp;quot;Accept: text/turtle&amp;quot;  &amp;gt; chemicalElementSubClasses.ttl
&lt;/code&gt;&lt;/pre&gt;
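&lt;p&gt;If you&amp;rsquo;d rather stay in Python than shell out to curl, the same request can be assembled with nothing but the standard library. This is only a sketch&amp;ndash;the query string below is a placeholder for the contents of subtreeClasses.rq, and the request is built but never actually sent:&lt;/p&gt;

```python
# Sketch: the same POST request that the curl command makes, built with
# Python's standard library. The query here is a stand-in for the real
# subtreeClasses.rq contents; urlopen(req) is what would send it.
from urllib.parse import urlencode
from urllib.request import Request

query = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"  # placeholder query

req = Request(
    "https://query.wikidata.org/sparql",
    data=urlencode({"query": query}).encode("utf-8"),  # like --data-urlencode
    headers={"Accept": "text/turtle"},                 # like -H "Accept: ..."
)
print(req.get_method())  # POST, because a data payload is attached
```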
&lt;p&gt;Loading the result into TopBraid Composer Free edition (available &lt;a href=&#34;https://www.topquadrant.com/downloads/topbraid-composer-install/&#34;&gt;here&lt;/a&gt;; the Free edition is a choice on the Product dropdown list) showed a class tree of the result like this:&lt;/p&gt;
&lt;img id=&#34;idm46211189613984&#34; width=&#34;280&#34; src=&#34;https://www.bobdc.com/img/main/elementClassTree.png&#34;/&gt;
&lt;p&gt;(It&amp;rsquo;s tempting to add an entry for &lt;a href=&#34;https://en.wikipedia.org/wiki/Professor_Frink&#34;&gt;Frinkonium&lt;/a&gt; as a subclass of &amp;ldquo;hypothetical chemical element&amp;rdquo;.) I understand that the Wikimedia Foundation had their reasons for not describing their models with the standard vocabularies, but this shows the value of using the standards: interoperability with other tools. It also shows that the Foundation&amp;rsquo;s avoidance of the standard model vocabularies is not a big deal, and that we should be glad that they make this available in RDF at all, because the sheer fact that it&amp;rsquo;s in RDF makes it easy to convert to whatever RDF we want with a CONSTRUCT query. (Again, imagine if Google did this with any portion of their knowledge graph&amp;hellip;)&lt;/p&gt;
&lt;p&gt;The query above also looks for properties for those classes so that it can express those in the output with the RDFS vocabulary. It didn&amp;rsquo;t find many, but this bears further investigation. &lt;a href=&#34;https://query.wikidata.org/#SELECT%20DISTINCT%20%3FconstraintLabel%20WHERE%20%7B%0A%20%20wd%3AQ11344%20wdt%3AP1963%20%3Fproperty.%0A%20%20%3Fproperty%20wdt%3AP2302%20%3Fconstraint.%0A%20%20%3Fconstraint%20rdfs%3Alabel%20%3FconstraintLabel.%0A%20%20FILTER%28%28LANG%28%3FconstraintLabel%29%29%20%3D%20%22en%22%29%0A%7D%0AORDER%20BY%20%3FconstraintLabel%0A&#34;&gt;This query&lt;/a&gt; shows that in addition to the chemical element class having properties, there are constraints on those properties described with triples, so there&amp;rsquo;s a lot more that can be done here to pull richer models out of Wikidata and then express them in more standard vocabularies.&lt;/p&gt;
&lt;p&gt;And of course there&amp;rsquo;s the possibility of pulling out instance data to go with these models. Queries for that would be easy enough to assemble but you might end up with so much data that Wikidata times out before giving it to you; you could use the techniques I described in &lt;a href=&#34;https://www.bobdc.com/blog/pipelining-sparql-queries-in-m&#34;&gt;Pipelining SPARQL queries in memory with the rdflib Python library&lt;/a&gt; to retrieve instance URIs and then retrieve the additional triples about those instances in batches of queries that use the VALUES keywords.&lt;/p&gt;
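&lt;p&gt;As a rough sketch of that batching step (the URIs and batch size below are made up), a little Python can chunk a list of instance identifiers into VALUES clauses ready to paste into follow-up queries:&lt;/p&gt;

```python
# Sketch: splitting a list of Wikidata instance URIs into SPARQL VALUES
# clauses so each follow-up query stays small enough to avoid timeouts.
# The URIs and batch size are invented for illustration.

def values_batches(uris, batch_size):
    """Yield one VALUES clause per batch of URIs, using wd: prefixed names."""
    for i in range(0, len(uris), batch_size):
        batch = uris[i:i + batch_size]
        # Turn each full entity URI into its wd: prefixed local name.
        bindings = " ".join("wd:" + uri.rsplit("/", 1)[-1] for uri in batch)
        yield "VALUES ?instance { " + bindings + " }"

uris = ["http://www.wikidata.org/entity/Q%d" % n for n in range(1, 8)]
for clause in values_batches(uris, 3):
    print(clause)
```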
&lt;p&gt;Lots of data instances of rich models, all transformed to conform to the W3C standards so that they work with lots of open source and commercial tools&amp;ndash;the possibilities are pretty impressive. If anyone pulls datasets like this out of Wikidata for their field, let me know about it!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2018">2018</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">Wikidata</category>
      
    </item>
    
    <item>
      <title>SPARQL full-text Wikipedia searching and Wikidata subclass inferencing</title>
      <link>https://www.bobdc.com/blog/sparql-full-text-wikipedia-sea/</link>
      <pubDate>Sun, 28 Oct 2018 12:37:19 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparql-full-text-wikipedia-sea/</guid>
      
      
      <description><div>Wikipedia querying techniques inspired by a recent paper.</div><div>&lt;img id=&#34;idm9&#34; src=&#34;https://www.bobdc.com/img/main/MilhaudBacharach.png&#34; width=&#34;300&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Milhaud and Bacharach&#34;/&gt;
&lt;p&gt;I found all kinds of interesting things in the article &amp;ldquo;Getting the Most out of Wikidata: Semantic Technology Usage in Wikipedia&amp;rsquo;s Knowledge Graph&amp;rdquo; (&lt;a href=&#34;https://iccl.inf.tu-dresden.de/w/images/5/5a/Malyshev-et-al-Wikidata-SPARQL-ISWC-2018.pdf&#34;&gt;pdf&lt;/a&gt;) by Stanislav Malyshev of the Wikimedia Foundation and four co-authors from the Technical University of Dresden. I wanted to highlight two particular things that I will find useful in the future and then I&amp;rsquo;ll list a few more.&lt;/p&gt;
&lt;p&gt;Before I cover them, I wanted to mention that I&amp;rsquo;ve really grown to appreciate the little diamond icon in the upper-left of the Wikidata &lt;a href=&#34;https://query.wikidata.org&#34;&gt;query form&lt;/a&gt;. As I refine queries on that form, the queries typically get messier and messier, so the ability to clean it all up with one click is very convenient.&lt;/p&gt;
&lt;h2 id=&#34;idm15&#34;&gt;Full text searching of Wikipedia with SPARQL&lt;/h2&gt;
&lt;p&gt;The paper&amp;rsquo;s &amp;ldquo;Custom SPARQL Extensions&amp;rdquo; section describes several extensions, including the MediaWiki Web API. The &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI&#34;&gt;Wikidata Query Service/User Manual/MWAPI&lt;/a&gt; page describes how you can call the &lt;a href=&#34;https://www.mediawiki.org/wiki/API:Main_page&#34;&gt;MediaWiki API&lt;/a&gt; search functions by using special property functions (that is, properties that instruct the query engine to execute certain special functions).&lt;/p&gt;
&lt;p&gt;This API is definitely one of those topics where reviewing the examples will get you started more quickly than trying to read the actual documentation. Their first SPARQL query search example, &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI#Find_all_entities_with_labels_%22cheese%22_and_get_their_types&#34;&gt;Find all entities with labels &amp;ldquo;cheese&amp;rdquo; and get their types&lt;/a&gt;, searches Wikipedia for entries that have &amp;ldquo;cheese&amp;rdquo; in one of their labels such as the page title or alternative names.&lt;/p&gt;
&lt;p&gt;The key difference in the &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI#Find_articles_in_Wikipedia&#34;&gt;Find articles in Wikipedia&lt;/a&gt; example that follows the first cheese example is that its fifth line uses the property function &lt;code&gt;mwapi:srsearch&lt;/code&gt; as a predicate instead of &lt;code&gt;mwapi:search&lt;/code&gt;, telling the query to search the contents of all of the English (note the &amp;ldquo;.en&amp;rdquo; on the fourth line) Wikipedia pages. You can try that example yourself to do a full-text search for &amp;ldquo;cheese&amp;rdquo;. I did a similar search for &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%2a%20WHERE%20%7B%0A%20%20SERVICE%20wikibase%3Amwapi%20%7B%0A%20%20%20%20%20%20bd%3AserviceParam%20wikibase%3Aapi%20%22Search%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20wikibase%3Aendpoint%20%22en.wikipedia.org%22%20.%0A%20%20%20%20%20%20bd%3AserviceParam%20mwapi%3Asrsearch%20%22Darius%20Milhaud%20Burt%20Bacharach%22%20.%0A%20%20%20%20%20%20%3Ftitle%20wikibase%3AapiOutput%20mwapi%3Atitle%20.%0A%20%20%7D%0A%7D&#34;&gt;Darius Milhaud Burt Bacharach&lt;/a&gt; because I&amp;rsquo;ve recently been fascinated by the connections between Milhaud, a French composer who rose to prominence in the 1920s as a member of &lt;a href=&#34;https://en.wikipedia.org/wiki/Les_Six&#34;&gt;Les Six&lt;/a&gt;, and Bacharach, one of the greatest pop songwriters of the 1960s. (Listening to some Milhaud once, it struck me as odd that his use of horns would remind me of some Bacharach songs and arrangements until I found out that the author of &amp;ldquo;The Look of Love&amp;rdquo;, &amp;ldquo;Walk on By&amp;rdquo;, and &amp;ldquo;I Say a Little Prayer&amp;rdquo; studied with Milhaud in the 1940s at McGill University.) 
This query certainly doesn&amp;rsquo;t need the &amp;ldquo;LIMIT 20&amp;rdquo; at the end like the full-text search for &amp;ldquo;cheese&amp;rdquo; does, because these two guys don&amp;rsquo;t get mentioned on the same page as often as cheese gets mentioned, but it does return an interesting set of pages.&lt;/p&gt;
&lt;h2 id=&#34;idm28&#34;&gt;Subclass inferencing with Wikidata&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m still surprised at how many people use RDF without adding any schema information, or worse, without using schema information that&amp;rsquo;s already there. Wikidata provides plenty for us, and while the Blazegraph instance used as the back end to its SPARQL engine does not have its RDFS &lt;a href=&#34;https://www.bobdc.com/blog/trying-out-blazegraph&#34;&gt;inferencing&lt;/a&gt; capabilities turned on&amp;ndash;understandably, because queries that take advantage of this ask more of a processor and could therefore hamper scalability&amp;ndash;a nice property path trick does let us ask for all the instances of a particular class and of its subclasses. This wasn&amp;rsquo;t even mentioned in the &amp;ldquo;Getting the Most out of Wikidata&amp;rdquo; paper, but a mention of how Wikidata uses &lt;code&gt;owl:ObjectProperty&lt;/code&gt; inspired me to dig more into its data modeling, and I came up with this.&lt;/p&gt;
&lt;p&gt;The following (try it &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%28count%28%2a%29%20as%20%3Finstances%29%20WHERE%20%20%7B%0A%20%20%3Finstance%20wdt%3AP31%20wd%3AQ473708%20%20%20%20%20%23%20Instance%20has%20a%20type%20of%20%22home%20computers%22%0A%7D%0A&#34;&gt;here&lt;/a&gt;) shows that Wikidata currently has data about 125 instances of home computer models:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT (count(*) as ?instances) WHERE  {
  ?instance wdt:P31 wd:Q473708     # Instance has a type of &amp;quot;home computers&amp;quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This next query (try it &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%28COUNT%28%2a%29%20AS%20%3Finstances%29%20WHERE%20%7B%0A%20%20%3Finstance%20wdt%3AP31%20%3Fclass.%0A%20%20%3Fclass%20wdt%3AP279%20wd%3AQ473708.%0A%7D%0A&#34;&gt;here&lt;/a&gt;) shows that there are 28 instances of classes that are a direct subclass of &amp;ldquo;home computers&amp;rdquo;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT (COUNT(*) AS ?instances) WHERE {
  ?instance wdt:P31 ?class.
  ?class wdt:P279 wd:Q473708.     # wdt:P279: subclass of 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Merely adding the property path asterisk operator to &lt;code&gt;wdt:P279&lt;/code&gt; tells the query engine to find instances of the home computer class and also instances of any class in the subclass tree below it (try it &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%28COUNT%28%2a%29%20AS%20%3Finstances%29%20WHERE%20%7B%0A%20%20%3Finstance%20wdt%3AP31%20%3Fclass.%0A%20%20%3Fclass%20wdt%3AP279%2a%20wd%3AQ473708.%0A%7D%0A&#34;&gt;here&lt;/a&gt;) and it finds 154 of them:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT (COUNT(*) AS ?instances) WHERE {
  ?instance wdt:P31 ?class.
  ?class wdt:P279* wd:Q473708.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As with regular expressions, the asterisk means &amp;ldquo;0 or more steps away,&amp;rdquo; so that instances of wd:Q473708 would be counted along with instances of classes from its subclass tree. Using a plus sign instead would have meant &amp;ldquo;1 or more steps away,&amp;rdquo; so that query would not have counted direct instances of wd:Q473708 itself.&lt;/p&gt;
&lt;p&gt;The ability to use class relationships to identify potentially useful data is just one example of how schema metadata adds value to data. And, we get more than just these additional instances; we get additional class names that tell us more about these instances. For example, we can find that the &lt;a href=&#34;https://www.wikidata.org/wiki/Q55267838&#34;&gt;Thomson MO5-CnAM 43737&lt;/a&gt; computer is an instance of the class &lt;a href=&#34;https://www.wikidata.org/wiki/Q2396081&#34;&gt;Thomson MO5&lt;/a&gt;, which is a subclass of &lt;a href=&#34;https://www.wikidata.org/wiki/Q3095025&#34;&gt;MOTO Gamme&lt;/a&gt;, which is a subclass of &lt;a href=&#34;https://www.wikidata.org/wiki/Q473708&#34;&gt;home computer&lt;/a&gt;.&lt;/p&gt;
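&lt;p&gt;If the asterisk semantics still seem abstract, here is a toy Python model of them (the class and instance names are invented): computing the subclass closure by hand shows exactly why the starred query counts more instances than the plus version would:&lt;/p&gt;

```python
# Sketch: what wdt:P279* versus wdt:P279+ computes, modeled as a plain
# transitive closure over a toy class tree. All names are made up.

subclass_of = {            # child -> parent: one wdt:P279 step
    "ThomsonMO5": "MOTOGamme",
    "MOTOGamme": "HomeComputer",
    "Amiga500": "HomeComputer",
}
instance_of = {            # instance -> class: one wdt:P31 step
    "unit1": "ThomsonMO5",
    "unit2": "Amiga500",
    "unit3": "HomeComputer",
}

def classes_under(root):
    """All classes reachable by zero or more subclass steps down from root."""
    found = {root}                       # zero steps: the root itself
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

star = classes_under("HomeComputer")     # P279*: includes the root class
plus = star - {"HomeComputer"}           # P279+: at least one step away

count_star = sum(1 for c in instance_of.values() if c in star)
count_plus = sum(1 for c in instance_of.values() if c in plus)
print(count_star, count_plus)            # the plus count misses unit3
```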
&lt;h2 id=&#34;idm49&#34;&gt;And more&lt;/h2&gt;
&lt;p&gt;Some other nice things I learned about in the paper:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The use of &lt;code&gt;wikibase:around&lt;/code&gt; and &lt;code&gt;wikibase:box&lt;/code&gt; for additional kinds of geographic queries in addition to the ability to search within a city&amp;rsquo;s limits as I described &lt;a href=&#34;https://www.bobdc.com/blog/dividing-and-conquering-sparql&#34;&gt;in July&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A list of additional endpoints that you can use in federated queries sent to Wikidata.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Support for Blazegraph&amp;rsquo;s graph traversal features.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Multiple live Grafana dashboards about Wikidata usage such as data about &lt;a href=&#34;https://grafana.wikimedia.org/dashboard/db/wikidata-special-entitydata?refresh=30m&amp;amp;orgId=1&#34;&gt;agents and formats requested&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you&amp;rsquo;re interested in SPARQL, Wikidata, or especially the combination, you&amp;rsquo;ll learn some fascinating things from this paper.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2018">2018</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">Wikidata</category>
      
    </item>
    
    <item>
      <title>Panic over &#34;superhuman&#34; AI</title>
      <link>https://www.bobdc.com/blog/panic-over-superhuman-ai/</link>
      <pubDate>Sun, 23 Sep 2018 11:27:48 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/panic-over-superhuman-ai/</guid>
      
      
      <description><div>Robot overlords not on the way.</div><div>&lt;p&gt;&lt;a href=&#34;https://www.imdb.com/title/tt2145829/&#34;&gt;&lt;img id=&#34;idm45830488598352&#34; src=&#34;https://www.bobdc.com/img/main/robotoverlords.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Robot Overlords movie poster&#34; width=&#34;320&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When someone describes their worries about AI taking over the world, I usually think to myself &amp;ldquo;I recently bookmarked a good article about why this is silly and I should point this person to it&amp;rdquo;, but in that instant I can&amp;rsquo;t remember what the article was. I recently re-read a few and thought I&amp;rsquo;d summarize them here in case anyone wants to point their friends to some sensible discussions of why such worries are unfounded.&lt;/p&gt;
&lt;h2 id=&#34;the-impossibility-of-intelligence-explosionhttpsmediumcomfrancoischolletthe-impossibility-of-intelligence-explosion-5be4a9eda6ec-by-françois-chollet&#34;&gt;&lt;a href=&#34;https://medium.com/@francois.chollet/the-impossibility-of-intelligence-explosion-5be4a9eda6ec&#34;&gt;The impossibility of intelligence explosion&lt;/a&gt; by François Chollet&lt;/h2&gt;
&lt;p&gt;Chollet is an AI researcher at Google and the author of the Keras deep learning framework and the Manning books &amp;ldquo;Deep Learning with Python&amp;rdquo; and &amp;ldquo;Deep Learning with R&amp;rdquo;. Like some of the other articles covered here, his piece takes on the idea that we will someday build an AI system that can build a better one on its own, and then that one will build a better one, and so on until the &lt;a href=&#34;https://en.wikipedia.org/wiki/Technological_singularity&#34;&gt;singularity&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;His outline gives you a general idea of his line of reasoning; the bulleted lists in his last two sections are also good:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A flawed reasoning that stems from a misunderstanding of intelligence&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Intelligence is situational&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Our environment puts a hard limit on our individual intelligence&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Most of our intelligence is not in our brain, it is externalized as our civilization&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An individual brain cannot implement recursive intelligence augmentation&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What we know about recursively self-improving systems&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Conclusions&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One especially nice paragraph:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In particular, there is no such thing as &amp;ldquo;general&amp;rdquo; intelligence. On an abstract level, we know this for a fact via the &amp;ldquo;no free lunch&amp;rdquo; theorem &amp;ndash; stating that no problem-solving algorithm can outperform random chance across all possible problems. If intelligence is a problem-solving algorithm, then it can only be understood with respect to a specific problem. In a more concrete way, we can observe this empirically in that all intelligent systems we know are highly specialized. The intelligence of the AIs we build today is hyper specialized in extremely narrow tasks &amp;ndash; like playing Go, or classifying images into 10,000 known categories. The intelligence of an octopus is specialized in the problem of being an octopus. The intelligence of a human is specialized in the problem of being human.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;idm45830488584656&#34;&gt;&lt;a href=&#34;https://www.theguardian.com/technology/2018/jul/25/ai-artificial-intelligence-social-media-bots-wrong&#34;&gt;&amp;lsquo;The discourse is unhinged&amp;rsquo;: how the media gets AI alarmingly wrong&lt;/a&gt; by Oscar Schwartz&lt;/h2&gt;
&lt;p&gt;This &lt;a href=&#34;https://www.theguardian.com&#34;&gt;Guardian&lt;/a&gt; piece focuses on how the media encourages silly thinking about the future of AI. As the article&amp;rsquo;s subtitle tells us,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Social media has allowed self-proclaimed &amp;lsquo;AI influencers&amp;rsquo; who do nothing more than paraphrase Elon Musk to cash in on this hype with low-quality pieces. The result is dangerous.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Much of the article focuses on the efforts of Zachary Lipton, a machine learning assistant professor at Carnegie Mellon, to call out bad journalism on the topic. One example is an article that I was also guilty of taking too seriously: Fast Company&amp;rsquo;s &lt;a href=&#34;https://www.fastcompany.com/90132632/ai-is-inventing-its-own-perfect-languages-should-we-let-it&#34;&gt;AI Is Inventing Languages Humans Can&amp;rsquo;t Understand. Should We Stop It?&lt;/a&gt; The actual &amp;ldquo;language&amp;rdquo; was just overly repetitive sentences made possible by recursive grammar rules, which I had &lt;a href=&#34;http://www.snee.com/bob/docs/solfish.pdf&#34;&gt;experienced myself&lt;/a&gt; many years ago doing a LISP-based project for a Natural Language Processing course. Schwartz quotes the Sun article &lt;a href=&#34;https://www.thesun.co.uk/tech/4141624/facebook-robots-speak-in-their-own-language/&#34;&gt;Facebook shuts off AI experiment after two robots begin speaking in their OWN language only they can understand&lt;/a&gt; as saying that the incident &amp;ldquo;closely resembled the plot of The Terminator in which a robot becomes self-aware and starts waging a war on humans&amp;rdquo;. (The Sun article also says &amp;ldquo;Experts have called the incident exciting but also incredibly scary&amp;rdquo;; according to the Guardian article, &amp;ldquo;These findings were considered to be fairly interesting by other experts in the field, but not totally surprising or groundbreaking&amp;rdquo;.)&lt;/p&gt;
&lt;p&gt;Schwartz&amp;rsquo;s piece describes how the term &amp;ldquo;electronic brain&amp;rdquo; is as old as electronic computers, and how overhyped media coverage of machines that &amp;ldquo;think&amp;rdquo; as far back as the 1940s led to inflated expectations about AI that greatly contributed to the several &lt;a href=&#34;https://en.wikipedia.org/wiki/AI_winter&#34;&gt;AI winters&lt;/a&gt; we&amp;rsquo;ve had since then.&lt;/p&gt;
&lt;h2 id=&#34;idm45830488576080&#34;&gt;&lt;a href=&#34;https://www.ben-evans.com/benedictevans/2018/06/22/ways-to-think-about-machine-learning-8nefy&#34;&gt;Ways to Think About Machine Learning&lt;/a&gt; by Benedict Evans&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re going to read only one of the articles I describe here all the way through, I recommend this one. I don&amp;rsquo;t listen to every episode of the &lt;a href=&#34;https://a16z.com/podcasts/&#34;&gt;a16z podcast&lt;/a&gt;, but I do listen to every one that includes Benedict Evans (this week&amp;rsquo;s episode, on &lt;a href=&#34;https://a16z.com/2018/09/17/hallway-conversation-tesla-disruption/&#34;&gt;Tesla and the Nature of Disruption&lt;/a&gt;, was typically excellent), and I have subscribed to his &lt;a href=&#34;https://www.ben-evans.com/newsletter/&#34;&gt;newsletter&lt;/a&gt; for years. He&amp;rsquo;s a sharp guy with sensible attitudes about how technologies and societies fit together and where it may lead.&lt;/p&gt;
&lt;p&gt;One theme of many of the articles I describe here is the false notion that intelligence is a single thing that can be measured on a one-dimensional scale. As Evans puts it,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This gets to the heart of the most common misconception that comes up in talking about machine learning - that it is in some way a single, general purpose thing, on a path to HAL 9000, and that Google or Microsoft have each built *one*, or that Google &amp;lsquo;has all the data&amp;rsquo;, or that IBM has an actual thing called &amp;lsquo;Watson&amp;rsquo;. Really, this is always the mistake in looking at automation: with each wave of automation, we imagine we&amp;rsquo;re creating something anthropomorphic or something with general intelligence. In the 1920s and 30s we imagined steel men walking around factories holding hammers, and in the 1950s we imagined humanoid robots walking around the kitchen doing the housework. We didn&amp;rsquo;t get robot servants - we got washing machines.&lt;/p&gt;
&lt;p&gt;Washing machines &lt;em&gt;are&lt;/em&gt; robots, but they&amp;rsquo;re not &amp;lsquo;intelligent&amp;rsquo;. They don&amp;rsquo;t know what water or clothes are. Moreover, they&amp;rsquo;re not general purpose even in the narrow domain of washing - you can&amp;rsquo;t put dishes in a washing machine, nor clothes in a dishwasher (or rather, you can, but you won&amp;rsquo;t get the result you want). They&amp;rsquo;re just another kind of automation, no different conceptually to a conveyor belt or a pick-and-place machine. Equally, machine learning lets us solve classes of problem that computers could not usefully address before, but each of those problems will require a different implementation, and different data, a different route to market, and often a different company. Each of them is a piece of automation. Each of them is a washing machine.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After bringing up relational databases as a point of comparison for what new technology can do (&amp;ldquo;Relational databases gave us Oracle, but they also gave us SAP, and SAP and its peers gave us global just-in-time supply chains - they gave us Apple and Starbucks&amp;rdquo;), he asks &amp;ldquo;What, then, are the washing machines of machine learning, for real companies?&amp;rdquo; He offers some good suggestions, some of which can be summarized as &amp;ldquo;AI will allow the automation of more things&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;He also discusses low-hanging fruit for what new things AI may automate. As an excellent followup to that, I recommend Kathryn Hume&amp;rsquo;s Harvard Business Review article &lt;a href=&#34;https://hbr.org/2017/10/how-to-spot-a-machine-learning-opportunity-even-if-you-arent-a-data-scientist&#34;&gt;How to Spot a Machine Learning Opportunity, Even If You Aren&amp;rsquo;t a Data Scientist&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;idm45830488564368&#34;&gt;&lt;a href=&#34;https://www.wired.com/2017/04/the-myth-of-a-superhuman-ai/&#34;&gt;The Myth of a Superhuman AI&lt;/a&gt; by Kevin Kelly&lt;/h2&gt;
&lt;p&gt;In this Wired article by one of the magazine&amp;rsquo;s founders, after a discussion of some of the panicky scenarios out there, we read that &amp;ldquo;buried in this scenario of a takeover of superhuman artificial intelligence are five assumptions which, when examined closely, are not based on any evidence&amp;rdquo;. He lists them, then lists five &amp;ldquo;heresies [that] have more evidence to support them&amp;rdquo;; these five provide the structure for the rest of his piece:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Intelligence is not a single dimension, so &amp;ldquo;smarter than humans&amp;rdquo; is a meaningless concept.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Humans do not have general purpose minds, and neither will AIs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Emulation of human thinking in other media will be constrained by cost.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Dimensions of intelligence are not infinite.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Intelligences are only one factor in progress.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A good point about how artificial general intelligence is not something to worry about makes a nice analogy with artificial flight:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;When we invented artificial flying we were inspired by biological modes of flying, primarily flapping wings. But the flying we invented &amp;ndash; propellers bolted to a wide fixed wing &amp;ndash; was a new mode of flying unknown in our biological world. It is alien flying. Similarly, we will invent whole new modes of thinking that do not exist in nature. In many cases they will be new, narrow, &amp;ldquo;small,&amp;rdquo; specific modes for specific jobs &amp;ndash; perhaps a type of reasoning only useful in statistics and probability.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(This reminds me of Evans writing &amp;ldquo;We didn&amp;rsquo;t get robot servants - we got washing machines&amp;rdquo;.) Another good metaphor is Kelly&amp;rsquo;s comparison of attitudes about superhuman AI with &lt;a href=&#34;https://en.wikipedia.org/wiki/Cargo_cult&#34;&gt;cargo cults&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It is possible that superhuman AI could turn out to be another cargo cult. A century from now, people may look back to this time as the moment when believers began to expect a superhuman AI to appear at any moment and deliver them goods of unimaginable value. Decade after decade they wait for the superhuman AI to appear, certain that it must arrive soon with its cargo.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;idm45830488554880&#34;&gt;&lt;a href=&#34;https://www.businessinsider.com/myths-misconceptions-about-artificial-intelligence-2015-9&#34;&gt;19 A.I. experts reveal the biggest myths about robots&lt;/a&gt; by Guia Marie Del Prado&lt;/h2&gt;
&lt;p&gt;This Business Insider piece is almost three years old but still relevant. Most of the experts it quotes are actual computer scientist professors, so you get much more sober assessments than you&amp;rsquo;ll see in the panicky articles out there. Here&amp;rsquo;s a good one from Berkeley computer scientist Stuart Russell:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The most common misconception is that what AI people are working towards is a conscious machine, that until you have a conscious machine there&amp;rsquo;s nothing to worry about. It&amp;rsquo;s really a red herring.&lt;/p&gt;
&lt;p&gt;To my knowledge, nobody, no one who is publishing papers in the main field of AI, is even working on consciousness. I think there are some neuroscientists who are trying to understand it, but I&amp;rsquo;m not aware that they&amp;rsquo;ve made any progress.&lt;/p&gt;
&lt;p&gt;As far as AI people, nobody is trying to build a conscious machine, because no one has a clue how to do it, at all. We have less clue about how to do that than we have about building a faster-than-light spaceship.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From Pieter Abbeel, another Berkeley computer scientist:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In robotics there is something called Moravec&amp;rsquo;s Paradox: &amp;ldquo;It is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;This is well appreciated by researchers in robotics and AI, but can be rather counter-intuitive to people not actively engaged in the field.&lt;/p&gt;
&lt;p&gt;Replicating the learning capabilities of a toddler could very well be the most challenging problem for AI, even though we might not typically think of a one-year-old as the epitome of intelligence.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I was happy to see the article quote NYU&amp;rsquo;s Ernie Davis, whose AI class I took over 20 years ago while working on my master&amp;rsquo;s degree there. (Reviewing my class notebook I see a lot of LISP and Prolog code, so things have changed a lot.)&lt;/p&gt;
&lt;p&gt;This article implicitly offers a nice guideline for deciding when to take predictions about the future of AI seriously: is the person making them a computer scientist familiar with the actual work going on lately? If they&amp;rsquo;re experts in other fields engaging in science fiction riffing (or as the Guardian article put it more cleverly, paraphrasing Elon Musk), take it all with a big grain of salt.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t mean to imply that the progress of technologies labeled as &amp;ldquo;Artificial Intelligence&amp;rdquo; has no potential problems to worry about. Just as automobiles and chain saws and a lot of other technology invented over the years can do harm as well as good, the new power brought by advanced processors, storage, and memory can be misused intentionally or accidentally, so it&amp;rsquo;s important to think through all kinds of scenarios when planning for the future. In fact, this is all the more reason not to worry about sentient machines: as the Guardian piece quotes Lipton, &amp;ldquo;There are policymakers earnestly having meetings to discuss the rights of robots when they should be talking about discrimination in algorithmic decision making. But this issue is terrestrial and sober, so not many people take an interest.&amp;rdquo; Sensible stuff to keep in mind.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2018">2018</category>
      
      <category domain="https://www.bobdc.com//categories/ai-and-machine-learning">AI and machine learning</category>
      
      <category domain="https://www.bobdc.com//categories/technology-future">technology, future</category>
      
    </item>
    
    <item>
      <title>Pipelining SPARQL queries in memory with the rdflib Python library</title>
      <link>https://www.bobdc.com/blog/pipelining-sparql-queries-in-m/</link>
      <pubDate>Mon, 27 Aug 2018 08:55:23 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/pipelining-sparql-queries-in-m/</guid>
      
      
      <description><div>Using retrieved data to make more queries.</div><div>&lt;img id=&#34;idm45434532866288&#34; src=&#34;https://www.bobdc.com/img/main/pipelines.jpg&#34; width=&#34;320&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;&#34;/&gt;
&lt;p&gt;Last month in &lt;a href=&#34;https://www.bobdc.com/blog/dividing-and-conquering-sparql&#34;&gt;Dividing and conquering SPARQL endpoint retrieval&lt;/a&gt; I described how you can avoid timeouts for certain kinds of SPARQL endpoint queries by first querying for the resources that you want to know about and then querying for more data about those resources a subset at a time using the VALUES keyword. (The example query retrieved data, including the latitude and longitude, about points within a specified city.) I built my demo with some shell scripts, some Perl scripts, and a bit of spit and glue.&lt;/p&gt;
&lt;p&gt;I started playing with &lt;a href=&#34;https://github.com/RDFLib/rdflib&#34;&gt;RDFLib&amp;rsquo;s&lt;/a&gt; SPARQL capabilities a few years ago as I put together the demo for &lt;a href=&#34;https://www.bobdc.com/blog/driving-hadoop-data-integratio&#34;&gt;Driving Hadoop data integration with standards-based models instead of code&lt;/a&gt;. I was pleasantly surprised to find out how easily it could run a CONSTRUCT query on triples stored in memory and then pass the result on to one or more additional queries, letting you pipeline a series of such queries with no disk I/O. Applying these techniques to replace my shell scripts and Perl scripts from last month showed me that these same techniques could be used for all kinds of RDF applications.&lt;/p&gt;
&lt;p&gt;When I was at &lt;a href=&#34;https://www.topquadrant.com/&#34;&gt;TopQuadrant&lt;/a&gt; I got to know &lt;a href=&#34;https://www.topquadrant.com/technology/sparqlmotion/&#34;&gt;SPARQLMotion&lt;/a&gt;, their (proprietary) drag-and-drop system for pipelining components that can do this sort of thing. RDFLib offers several graph manipulation methods that can extend what I&amp;rsquo;ve done here to do many additional SPARQLMotion-ish things. When I recently &lt;a href=&#34;https://twitter.com/bobdc/status/1023234084091453440&#34;&gt;asked&lt;/a&gt; about other pipeline component-based RDF development tools out there, I learned of &lt;a href=&#34;https://etl.linkedpipes.com/&#34;&gt;Linked Pipes ETL&lt;/a&gt;, &lt;a href=&#34;https://github.com/usc-isi-i2/Web-Karma&#34;&gt;Karma&lt;/a&gt;, &lt;a href=&#34;https://github.com/StataBS/ld-pipeline&#34;&gt;ld-pipeline&lt;/a&gt;, &lt;a href=&#34;https://github.com/vivo-project/VIVO-Harvester&#34;&gt;VIVO Harvester&lt;/a&gt;, &lt;a href=&#34;http://silkframework.org/&#34;&gt;Silk&lt;/a&gt;, &lt;a href=&#34;https://github.com/UnifiedViews&#34;&gt;UnifiedViews&lt;/a&gt;, and a PoolParty &lt;a href=&#34;https://www.poolparty.biz/unifiedviews/&#34;&gt;framework around Unified Views&lt;/a&gt;. I hope to check out as many of them as I can in the future, but with the functions I&amp;rsquo;ve written for my new Python script, I can now accomplish so much with so little Python code that my motivation to go looking beyond that is diminishing&amp;ndash;especially considering that when doing it this way, I have all of Python&amp;rsquo;s abilities to manipulate strings and data structures standing by in case I need them.&lt;/p&gt;
&lt;p&gt;For me, the two most basic RDF tasks to augment the general Python capabilities are retrieval of triples from a remote endpoint for local storage and querying of locally stored triples. RDFLib makes the latter easy. For the former I was &lt;a href=&#34;https://twitter.com/bobdc/status/1018127733459767296&#34;&gt;looking for a library&lt;/a&gt;, but Jindřich Mynarz &lt;a href=&#34;https://twitter.com/jindrichmynarz/status/1018134809544134656&#34;&gt;pointed out&lt;/a&gt; that no specialized library was necessary; he even showed me the &lt;a href=&#34;https://gist.github.com/jindrichmynarz/a947af0712682e0d23584719f0b6b400&#34;&gt;basic code&lt;/a&gt; to make it happen. (I swear I had tried a few times before posting the question on Twitter, so the brevity and elegance of his example were a bit embarrassing for me.)&lt;/p&gt;
&lt;p&gt;You can find my new Python script to replace last month&amp;rsquo;s work &lt;a href=&#34;https://github.com/bobdc/misc/blob/master/pythonsparql/pipelining.py&#34;&gt;on github&lt;/a&gt;. More than half of it is made up of the actual SPARQL queries being stored in variables. This is a good thing, because it means that the Python instructions (to retrieve triples from the endpoint, to load up the local graph with retrieved triples, to query that graph, and to build and then run new queries based on those query results) all together take up less than half of the script. In other words, the script is more about the queries than about the code to execute them.&lt;/p&gt;
&lt;p&gt;The main part of the script isn&amp;rsquo;t very long:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# 1. Get the qnames for the geotagged entities within the city and store in graph g. 


queryRetrieveGeoPoints = queryRetrieveGeoPoints.replace(&amp;quot;CITY-QNAME&amp;quot;,cityQname)
url = endpoint + &amp;quot;?&amp;quot; + urllib.urlencode({&amp;quot;query&amp;quot;: queryRetrieveGeoPoints})
g.parse(url)
logging.info(&#39;Triples in graph g after queryRetrieveGeoPoints: &#39; + str(len(g)))


# 2. Take the subjects in graph g and create queries with a VALUES clause 
#    of up to maxValues of the subjects. 


subjectQueryResults = g.query(queryListSubjects)
splitAndRunRemoteQuery(&amp;quot;querySubjectData&amp;quot;,subjectQueryResults,
                       entityDataQueryHeader,entityDataQueryFooter)


# 3. See what classes are used and get their names and those of their superclasses.
classList = g.query(listClassesQuery)
splitAndRunRemoteQuery(&amp;quot;queryGetClassInfo&amp;quot;,classList,
                       queryGetClassesHeader,queryGetClassesFooter)


# 4. See what objects need labels and get them.
objectsThatNeedLabel = g.query(queryObjectsThatNeedLabel)
splitAndRunRemoteQuery(&amp;quot;queryObjectsThatNeedLabel&amp;quot;,objectsThatNeedLabel,
                       queryGetObjectLabelsHeader,queryGetObjectLabelsFooter)


print(g.serialize(format = &amp;quot;n3&amp;quot;))   # (Actually Turtle, which is what we want, not n3.)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;splitAndRunRemoteQuery&lt;/code&gt; function was one I wrote based on my prototype from last month.&lt;/p&gt;
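&lt;p&gt;The batching half of that function can be sketched in a few lines of plain Python (a hypothetical reconstruction, not the actual code from the script; the real function also sends each generated query to the endpoint and loads the results into the graph):&lt;/p&gt;

```python
# Hypothetical sketch of the query-splitting step: wrap successive batches
# of subject URIs in VALUES clauses between a query header and footer.
def build_batched_queries(subjects, header, footer, max_values=50):
    queries = []
    for i in range(0, len(subjects), max_values):
        batch = subjects[i:i + max_values]
        values = "\n".join("<{0}>".format(uri) for uri in batch)
        queries.append(header + "\nVALUES ?s {\n" + values + "\n}\n" + footer)
    return queries

uris = ["http://www.wikidata.org/entity/Q{0}".format(n) for n in range(120)]
qs = build_batched_queries(uris, "CONSTRUCT { ?s ?p ?o } WHERE {", "?s ?p ?o . }")
print(len(qs))  # 120 subjects in batches of 50 -> 3 queries
```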
&lt;p&gt;I first used RDFLib over &lt;a href=&#34;https://www.xml.com/pub/a/2003/02/12/rdflib.html&#34;&gt;15 years ago&lt;/a&gt;, when SPARQL hadn&amp;rsquo;t even been invented yet. Hardcore RDFLib fans will prefer the greater efficiency of its native functions over the use of SPARQL queries, but my goal here was to have SPARQL 1.1 queries drive all the action, and RDFLib supports this very nicely. Its native functions also offer additional capabilities that bring it closer to some of the pipelining things I remember from SPARQLMotion. For example, the &lt;a href=&#34;https://rdflib.readthedocs.io/en/stable/intro_to_graphs.html#set-operations-on-rdflib-graphs&#34;&gt;set operations on graphs&lt;/a&gt; let you perform actions such as unions, intersections, differences, and XORs of graphs, which can be handy when mixing and matching data from multiple sources to massage that data into a single cleaned-up dataset&amp;ndash;just the kind of thing that makes RDF so great in the first place.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Picture by &lt;a href=&#34;https://www.flickr.com/photos/mikecogh/&#34;&gt;Michael Coghlan&lt;/a&gt; on &lt;a href=&#34;https://www.flickr.com/photos/mikecogh/11429811244/&#34;&gt;Flickr&lt;/a&gt; (CC BY-SA 2.0)&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2018">2018</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Dividing and conquering SPARQL endpoint retrieval</title>
      <link>https://www.bobdc.com/blog/dividing-and-conquering-sparql/</link>
      <pubDate>Sun, 22 Jul 2018 11:52:42 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/dividing-and-conquering-sparql/</guid>
      
      
      <description><div>With the VALUES keyword.</div><div>&lt;img id=&#34;idm46294214072368&#34; src=&#34;https://www.bobdc.com/img/main/neonvalues.png&#34; border=&#34;0&#34; width=&#34;300&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;VALUES neon sign&#34;/&gt;
&lt;p&gt;When I &lt;a href=&#34;https://www.bobdc.com/blog/sparql-11s-new-values-keyword&#34;&gt;first tried SPARQL&amp;rsquo;s VALUES keyword&lt;/a&gt; (at which point it was pretty new to SPARQL, having only recently been added to SPARQL 1.1) I demoed it with a fairly artificial example. I &lt;a href=&#34;http://www.snee.com/bobdc.blog/2013/07/using-values-to-map-values-in.html&#34;&gt;later found&lt;/a&gt; that it solved one particular problem for me by letting me create a little lookup table. Recently, it gave me huge help in one of the most classic SPARQL development problems of all: how to retrieve so much data from an endpoint that the first attempts at that retrieval resulted in timeouts.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries&#34;&gt;Wikidata:SPARQL query service/queries&lt;/a&gt; page includes an excellent Wikidata query to &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries#Query_to_find_latitudes_and_longitudes_for_places_in_Paris&#34;&gt;find latitudes and longitudes for places in Paris&lt;/a&gt;. You can easily modify this query to retrieve from places within other cities, and I wanted to build on this query to make it retrieve additional available data about those places as well. While &lt;a href=&#34;https://www.bobdc.com/blog/the-wikidata-data-model-and-yo&#34;&gt;accounting for the indirection in the Wikidata query model&lt;/a&gt; made this a little more complicated, it wasn&amp;rsquo;t much trouble to write.&lt;/p&gt;
&lt;p&gt;The expanded query worked great for a city like Charlottesville, where I live, but for larger cities, the query was just asking for too much information from the endpoint and timed out. My new idea was to first ask for roughly the same information that the Paris query above does, and to then request additional data about those entities a batch at a time with a series of queries that use the VALUES keyword to specify each batch. (I&amp;rsquo;ve pasted a sample query requesting one batch below.)&lt;/p&gt;
&lt;p&gt;It worked just fine. I put all the queries and other relevant files in a &lt;a href=&#34;http://snee.com/bobdc.blog/files/dividAndConquerWithVALUES.zip&#34;&gt;zip file&lt;/a&gt; for people who want to check it out, but it&amp;rsquo;s probably not worth looking at too closely, because in a month or two I&amp;rsquo;ll be replacing it with a Python version that does everything more efficiently. It&amp;rsquo;s still worth explaining the steps in this version&amp;rsquo;s shell script driver file, because the things I worked out for this prototype effort&amp;ndash;despite its Perl scripting and extensive disk I/O&amp;ndash;mean that the Python version should come together pretty quickly. That&amp;rsquo;s what prototypes are for!&lt;/p&gt;
&lt;h2 id=&#34;idm46294214061168&#34;&gt;The driver shell script&lt;/h2&gt;
&lt;p&gt;Before running the shell script, you specify the Wikidata local name of the city to query near the top of the &lt;code&gt;getCityEntities.rq&lt;/code&gt; SPARQL query file. (This is easier than it sounds&amp;ndash;for example, to do it for Charlottesville, go to its &lt;a href=&#34;https://en.wikipedia.org/wiki/Charlottesville,_Virginia&#34;&gt;Wikipedia page&lt;/a&gt; and click &lt;a href=&#34;https://www.wikidata.org/wiki/Q123766&#34;&gt;Wikidata item&lt;/a&gt; in the menu on the left to find that Q123766 is the local name.)&lt;/p&gt;
&lt;p&gt;Once that&amp;rsquo;s done, running the zip file&amp;rsquo;s &lt;code&gt;getCityData.sh&lt;/code&gt; shell script executes these main steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It uses a &lt;a href=&#34;https://curl.haxx.se/&#34;&gt;curl&lt;/a&gt; command to send the &lt;code&gt;getCityEntities.rq&lt;/code&gt; CONSTRUCT query to the &lt;a href=&#34;https://query.wikidata.org/sparql&#34;&gt;https://query.wikidata.org/sparql&lt;/a&gt; endpoint. The curl command saves the resulting triples in a file called &lt;code&gt;cityEntities.ttl&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It uses &lt;a href=&#34;https://jena.apache.org/documentation/query/index.html&#34;&gt;ARQ&lt;/a&gt; to run the &lt;code&gt;listSubjects.rq&lt;/code&gt; query on the new &lt;code&gt;cityEntities.ttl&lt;/code&gt; file, specifying that the result should be a TSV file.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The results of &lt;code&gt;listSubjects.rq&lt;/code&gt; get piped to a Perl script called &lt;code&gt;makePart2Queries.pl&lt;/code&gt;. This creates a series of CONSTRUCT query files that ask Wikidata for data about entities listed in a VALUES section. It puts 50 entries in each file&amp;rsquo;s VALUES section; this figure of 50 is stored in a &lt;code&gt;$maxLines&lt;/code&gt; variable in &lt;code&gt;makePart2Queries.pl&lt;/code&gt; where it can be reset if the endpoint is still timing out. This step also adds lines to a shell script called &lt;code&gt;callTempQueries.sh&lt;/code&gt;, where each line uses curl to call one of the queries that uses VALUES to request a batch of data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;getCityData.sh&lt;/code&gt; next runs the &lt;code&gt;callTempQueries.sh&lt;/code&gt; shell script to execute all of these new queries, storing the resulting triples in the file &lt;code&gt;tempCityData.ttl&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;code&gt;tempCityData.ttl&lt;/code&gt; file has plenty of good data, but it can be used to get additional relevant data, so the script&amp;rsquo;s next line runs a query that creates a TSV file with a list of all of the classes found in &lt;code&gt;tempCityData.ttl&lt;/code&gt; triples of the form {?instance wdt:P31 ?class}. (wdt:P31 is the Wikidata equivalent of &lt;code&gt;rdf:type&lt;/code&gt;, indicating that a resource is an instance of a particular class.) That TSV file then drives the creation of a query that gets sent to the SPARQL endpoint to ask about the classes&amp;rsquo; parent and grandparent classes, and that data gets added to &lt;code&gt;tempCityData.ttl&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Another ARQ call in the script uses a local query to check for triple objects in the &lt;a href=&#34;http://www.wikidata.org/entity/&#34;&gt;http://www.wikidata.org/entity/&lt;/a&gt; namespace that don&amp;rsquo;t have &lt;code&gt;rdfs:label&lt;/code&gt; values and get them&amp;ndash;or at least, get the English ones, but it&amp;rsquo;s easy to fix if you want labels in different or additional languages.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The script runs one final ARQ query on &lt;code&gt;tempCityData.ttl&lt;/code&gt;: the classic &lt;code&gt;SELECT * WHERE {?s ?p ?o}&lt;/code&gt;. This request for all the triples actually tidies up the Turtle data a bit, storing all the triples with common subjects together. It puts the result in &lt;code&gt;cityData.ttl&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
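&lt;p&gt;The heart of step 3&amp;ndash;chunking a list of subject URIs into VALUES clauses of up to 50 entries&amp;ndash;can be sketched like this (a hypothetical Python equivalent of part of &lt;code&gt;makePart2Queries.pl&lt;/code&gt;, not the actual Perl code):&lt;/p&gt;

```python
# Sketch of the batching done by makePart2Queries.pl: group subject URIs
# (as read from ARQ's TSV output) into VALUES clauses of up to 50 entries.
max_lines = 50  # plays the role of $maxLines in makePart2Queries.pl

def values_clauses(subject_uris, batch_size=max_lines):
    for i in range(0, len(subject_uris), batch_size):
        batch = subject_uris[i:i + batch_size]
        yield "VALUES ?s { " + " ".join(batch) + " }"

subjects = ["<http://www.wikidata.org/entity/Q{0}>".format(n) for n in range(105)]
clauses = list(values_clauses(subjects))
print(len(clauses))  # 105 subjects -> batches of 50, 50, and 5
```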
&lt;p&gt;One running theme of some of the shell script&amp;rsquo;s steps is the retrieval of labels associated with &lt;a href=&#34;https://www.w3.org/TeamSubmission/turtle/#qname&#34;&gt;qnames&lt;/a&gt;. Wikidata has a lot of triples like {wd:Q69040 wd:P361 wd:Q16950} that are just three qnames, so retrieved data will have more value to applications if people and processes can find out what each qname refers to.&lt;/p&gt;
&lt;p&gt;The main shell script has other housekeeping steps such as recording of the start and end times and deletion of the temporary files. I had more ideas for things to add, but I&amp;rsquo;ll save those for the Python version.&lt;/p&gt;
&lt;p&gt;The Python version won&amp;rsquo;t just be a more efficient version of my use of VALUES to do batch retrievals of data that might otherwise time out. It will demonstrate, more nicely, something that only gets hinted at in this mess of shell and Perl scripts: the ability to automate the generation of SPARQL queries that build on the results of previously executed queries so that they can all work together as a pipeline to drive increasingly sophisticated RDF application development.&lt;/p&gt;
&lt;p&gt;Here is a sample of one of the queries created to request data about one batch of entities within the specified city:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX p: &amp;lt;http://www.wikidata.org/prop/&amp;gt; 
PREFIX wgs84: &amp;lt;http://www.w3.org/2003/01/geo/wgs84_pos#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 
PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; 


CONSTRUCT
{ ?s ?p ?o. 
  ?s ?p1 ?o1 . 
  ?s wgs84:lat ?lat . 
  ?s wgs84:long ?long .
  ?p rdfs:label ?pname .
  ?s wdt:P31 ?class .   
}
WHERE {
  VALUES ?s {
&amp;lt;http://www.wikidata.org/entity/Q42537129&amp;gt;
&amp;lt;http://www.wikidata.org/entity/Q30272197&amp;gt;
# about 48 more of those here...
}
  # wdt:P131 means &#39;located in the administrative territorial entity&#39; .
  ?s wdt:P131+ ?geoEntityWikidataID .  
      ?s p:P625 ?statement . # coordinate-location statement
  ?statement psv:P625 ?coordinate_node .
  ?coordinate_node wikibase:geoLatitude ?lat .
  ?coordinate_node wikibase:geoLongitude ?long .


  # Reduce the indirection used by Wikidata triples. Based on Tommy Potter query
  # at http://www.snee.com/bobdc.blog/2017/04/the-wikidata-data-model-and-yo.html.
  ?s ?directClaimP ?o .                   # Get the truthy triples. 
  ?p wikibase:directClaim ?directClaimP . # Find the wikibase properties linked
  ?p rdfs:label ?pname .                  # to the truthy triples&#39; predicates.


  # the following VALUES clause is actually faster than just
  # having specific triple patterns for those 3 p1 values.
  ?s ?p1 ?o1 .
  VALUES ?p1 {
    schema:description
    rdfs:label        
    skos:altLabel
  }


  ?s wdt:P31 ?class . # Class membership. Pull this and higher level classes out in later query.

  
  # If only English names desired
  FILTER (isURI(?o1) || lang(?o1) = &#39;en&#39; )
  # For English + something else, follow this pattern: 
  # FILTER (isURI(?o1) || lang(?o1) = &#39;en&#39; || lang(?o1) = &#39;de&#39;)


  FILTER(lang(?pname) = &#39;en&#39;)
}  
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;Neon sign picture by &lt;a href=&#34;https://www.flickr.com/photos/jeremybrooks/&#34;&gt;Jeremy Brooks&lt;/a&gt; on &lt;a href=&#34;https://www.flickr.com/photos/jeremybrooks/2633307750/&#34;&gt;Flickr&lt;/a&gt; (CC BY-NC 2.0)&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2018">2018</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Running and querying my own Wikibase instance</title>
      <link>https://www.bobdc.com/blog/running-and-querying-my-own-wi/</link>
      <pubDate>Sun, 17 Jun 2018 11:17:14 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/running-and-querying-my-own-wi/</guid>
      
      
      <description><div>Querying it, of course, with SPARQL.</div><div>&lt;blockquote id=&#34;idm46076157065344&#34; class=&#34;pullquote&#34;&gt;Many of us have waited years for an open-source framework that makes the development of web-based RDF applications as easy as Ruby on Rails does for web-based SQL applications. This dockerized version of Wikibase looks like a big step in this direction.&lt;/blockquote&gt;
&lt;p&gt;When Dario Taraborelli &lt;a href=&#34;https://twitter.com/ReaderMeter/status/959921584159866881&#34;&gt;tweeted&lt;/a&gt; about how quickly he got a local wikibase instance and SPARQL endpoint up and running with &lt;a href=&#34;https://github.com/wmde/wikibase-docker&#34;&gt;wikibase-docker&lt;/a&gt;, he inspired me to give it a shot, and it was surprisingly easy and fun.&lt;/p&gt;
&lt;p&gt;I have minimal experience with docker. As instructed by wikibase-docker&amp;rsquo;s &lt;a href=&#34;https://github.com/wmde/wikibase-docker/blob/master/README-compose.md&#34;&gt;README&lt;/a&gt; page, I installed docker and docker-compose. (When I got to the &lt;a href=&#34;https://docs.docker.com/get-started/#test-docker-installation&#34;&gt;Test Docker Installation&lt;/a&gt; part of the &lt;a href=&#34;https://docs.docker.com/get-started/&#34;&gt;Get Started, Part 1: Orientation and setup&lt;/a&gt; page for setting up docker, the hello-world app gave me a &amp;ldquo;permission denied&amp;rdquo; problem, but &lt;a href=&#34;https://techoverflow.net/2017/03/01/solving-docker-permission-denied-while-trying-to-connect-to-the-docker-daemon-socket/&#34;&gt;this solution&lt;/a&gt; described at Techoverflow solved it. I did have to reboot, as it suggested.)&lt;/p&gt;
&lt;p&gt;Continuing along with the wikibase-docker README, when I clicked &amp;ldquo;http://localhost:8181&amp;rdquo; under &lt;a href=&#34;https://github.com/wmde/wikibase-docker/blob/master/README-compose.md#user-content-accessing-your-wikibase-instance-and-the-query-service-ui&#34;&gt;Accessing your Wikibase instance and the Query Service UI&lt;/a&gt; it was pretty cool to see my own local running instance of the wiki:&lt;/p&gt;
&lt;img id=&#34;idm46076157056352&#34; src=&#34;https://www.bobdc.com/img/main/wikidata1.png&#34; width=&#34;400&#34;/&gt;
&lt;p&gt;Moving along in the README, I clicked &amp;ldquo;Create a new item&amp;rdquo; before I clicked &amp;ldquo;Create a new property&amp;rdquo;, but when I saw that the new item&amp;rsquo;s property list offered no choices, I realized that I should define some properties before creating any items. Properties and items can have names, aliases, and descriptions in a wide choice of spoken languages, and Wikibase includes a nice choice of data types.&lt;/p&gt;
&lt;p&gt;After defining a property and creating items that had a value for that property, the &amp;ldquo;Query Service UI @ http://localhost:8282&amp;rdquo; link on the README led to a web form where I could enter a SPARQL query. I entered &lt;code&gt;SELECT * WHERE { ?s ?p ?o}&lt;/code&gt; and saw the default triples that were part of the store as well as triples about the items and property that I had created.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;Get an RDF dump from wikibase&amp;rdquo; docker command on the README page did just fine. Reviewing the triples in its output, I saw that the created entities fit the Wikidata data model described at &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/DataModel/Primer&#34;&gt;Wikibase/DataModel/Primer&lt;/a&gt;, which I wrote about at &lt;a href=&#34;https://www.bobdc.com/blog/the-wikidata-data-model-and-yo&#34;&gt;The Wikidata data model and your SPARQL queries&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It took me some time (and a &lt;a href=&#34;https://twitter.com/bobdc/status/1001176880798752768&#34;&gt;tweet&lt;/a&gt;) to realize that the &amp;ldquo;Query Service Backend (Behind a proxy)&amp;rdquo; URL listed on the README file was the URL for the SPARQL endpoint. The first query I tried after that worked with no problem:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl http://localhost:8989/bigdata/sparql?query=SELECT%20DISTINCT%20%3Fp%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D
&lt;/code&gt;&lt;/pre&gt;
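&lt;p&gt;(A side note that isn&amp;rsquo;t from the README: if you&amp;rsquo;d rather not URL-encode queries by hand, curl&amp;rsquo;s &lt;code&gt;-G&lt;/code&gt; and &lt;code&gt;--data-urlencode&lt;/code&gt; options do the percent-encoding for you. This is the same query as above, just easier to read:)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -G http://localhost:8989/bigdata/sparql \
     --data-urlencode 'query=SELECT DISTINCT ?p WHERE { ?s ?p ?o }'
&lt;/code&gt;&lt;/pre&gt;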
&lt;p&gt;It was also easy to access this server from my phone across my home wifi when I substituted the machine&amp;rsquo;s name or IP address for &amp;ldquo;localhost&amp;rdquo; in the URLs above. The web interface was the same on a phone as on a big screen; the MediaWiki project&amp;rsquo;s &lt;a href=&#34;https://www.mediawiki.org/wiki/Manual:Mobiles,_tablets_and_responsive_design&#34;&gt;Mobiles, tablets and responsive design&lt;/a&gt; manual page describes some options for extending the interface. If someone out there is looking for UI work and has some time on their hands, contributing some phone and tablet responsiveness to this open source project would be a great line on their résumé.&lt;/p&gt;
&lt;p&gt;And finally, while the docker version of this is quick to get up and running, if you&amp;rsquo;re going far with your own MediaWiki installation, you&amp;rsquo;ll want to look over the &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/Installation&#34;&gt;Installation instructions&lt;/a&gt; for the regular, non-docker version.&lt;/p&gt;
&lt;p&gt;After I did these experiments and wrote my first draft of this, I discovered the medium.com posting &lt;a href=&#34;https://medium.com/@thisismattmiller/wikibase-for-research-infrastructure-part-1-d3f640dfad34&#34;&gt;Wikibase for Research Infrastructure &amp;ndash; Part 1&lt;/a&gt; by Pratt Institute librarian and researcher &lt;a href=&#34;http://thisismattmiller.com/&#34;&gt;Matt Miller&lt;/a&gt;. His piece describes a nice use case of following through on creating a Wikibase application and points to some handy Python scripts for automating the creation of classes and other structures from spreadsheets. His use case happens to be one of my favorite available RDF-related data sources: the &lt;a href=&#34;https://linkedjazz.org/&#34;&gt;Linked Jazz Project&lt;/a&gt;. I look forward to Part 2.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s great to have such a comprehensive system running on my local machine, complete with a web interface that lets non-RDF people create and edit any data they want and, for the RDF people, a SPARQL interface to let them pull and manipulate that data. For more serious dataset development, the MediaWiki project includes some &lt;a href=&#34;https://www.mediawiki.org/wiki/Extension:Page_Forms/Quick_start_guide#The_easy_way_-_Special:CreateClass_2&#34;&gt;helpful documentation&lt;/a&gt; about how to define your own classes and associated properties and forms. (&lt;em&gt;July 20th note: that page is actually about &lt;a href=&#34;https://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki&#34;&gt;Semantic MediaWiki&lt;/a&gt;, which I played around with a few years ago&amp;ndash;apparently I didn&amp;rsquo;t keep my notes on that and Wikibase as organized as I should have.&lt;/em&gt;)&lt;/p&gt;
&lt;p&gt;Many of us have waited years for an open-source framework that makes the development of web-based RDF applications as easy as &lt;a href=&#34;https://www.bobdc.com/blog/joining-the-ruby-on-rails-chor&#34;&gt;Ruby on Rails&lt;/a&gt; does for web-based SQL applications. The dockerized version of Wikibase looks like a big step in this direction.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2018">2018</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">Wikidata</category>
      
    </item>
    
    <item>
      <title>RDF* and SPARQL*</title>
      <link>https://www.bobdc.com/blog/rdf-and-sparql/</link>
      <pubDate>Mon, 28 May 2018 09:36:59 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdf-and-sparql/</guid>
      
      
      <description><div>Reification can be pretty cool.</div><div>&lt;img id=&#34;idm45504699000944&#34; src=&#34;https://www.bobdc.com/img/main/rdfrdf.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;160&#34; alt=&#34;triple within a triple&#34;/&gt;
&lt;p&gt;After I posted &lt;a href=&#34;https://www.bobdc.com/blog/reification-is-a-red-herring&#34;&gt;Reification is a red herring (and you don&amp;rsquo;t need property graphs to assign data to individual relationships)&lt;/a&gt; last month, I had an amusingly difficult time explaining to my wife how that would generate so much Twitter activity. This month I wanted to make it clear that I&amp;rsquo;m not opposed to reification in and of itself, and I wanted to describe the fun I&amp;rsquo;ve been having playing with Olaf Hartig and Bryan Thompson&amp;rsquo;s RDF* and SPARQL* extensions to these standards to make reification more elegant.&lt;/p&gt;
&lt;p&gt;In that post, I said that in many years of using RDF I&amp;rsquo;ve never needed to use reification because, for most use cases where it was a candidate solution, I was better off using RDFS to declare classes and properties that reflected the use case domain instead of going right to the standard reification syntax (awkward in any standardized serialization) that let me create triples about triples. My soapbox ranting in that post focused on the common argument that the property graph approach of systems like Tinkerpop and Neo4j is better than RDF because achieving similar goals in RDF would require reification; as I showed, it doesn&amp;rsquo;t.&lt;/p&gt;
&lt;p&gt;But, reification can still be very useful, especially in the world of metadata. (I am slightly jealous of the &lt;a href=&#34;https://journals.ala.org/index.php/lrts/article/view/5557/6839&#34;&gt;metadata librarians&lt;/a&gt; of the world for having the word &amp;ldquo;metadata&amp;rdquo; in their job title&amp;ndash;it sounds even cooler &lt;a href=&#34;https://cbpq.qc.ca/offre-demploi/bibliothecaire-aux-metadonnees&#34;&gt;in Canada&lt;/a&gt;: Bibliothécaire aux métadonnées.) If metadata is data about data, and more and more of the Information Science world is taking advantage of linked data technologies, then triples about triples are bound to be useful for provenance, curation, and all kinds of scholarship about datasets.&lt;/p&gt;
&lt;p&gt;The conclusion of my blog post mentioned how, just as I was finishing it up, I discovered Olaf Hartig and Bryan Thompson&amp;rsquo;s 2014 paper &lt;a href=&#34;https://arxiv.org/pdf/1406.3399.pdf&#34;&gt;Foundations of an Alternative Approach to Reification in RDF&lt;/a&gt; and Blazegraph&amp;rsquo;s implementation of it. I decided to play with this a bit in Blazegraph in order to get a hands-on appreciation of what was possible, and I like it. (Olaf recently &lt;a href=&#34;https://twitter.com/olafhartig/statuses/995040531947409408&#34;&gt;mentioned on Twitter&lt;/a&gt; that these capabilities are being added into &lt;a href=&#34;https://jena.apache.org/&#34;&gt;Apache Jena&lt;/a&gt; as well, so this isn&amp;rsquo;t just a Blazegraph thing.)&lt;/p&gt;
&lt;p&gt;As I described in &lt;a href=&#34;https://www.bobdc.com/blog/trying-out-blazegraph&#34;&gt;Trying out Blazegraph&lt;/a&gt; two years ago, it&amp;rsquo;s pretty simple to download the Blazegraph jar, start it up, load RDF data, and query it. For my RDF* experiments, I started up Blazegraph and created a Blazegraph namespace with a mode of rdr and then did my first few experiments there.&lt;/p&gt;
&lt;p&gt;I started with the examples in Olaf&amp;rsquo;s slides &lt;a href=&#34;http://olafhartig.de/slides/RDFStarInvitedTalkWSP2018.pdf&#34;&gt;RDF* and SPARQL*: An Alternative Approach to Statement-Level Metadata in RDF&lt;/a&gt;. To make the slides visually cleaner, he left out full URIs and prefixes, so I added some to properly see the querying in action. I loaded his slide 15 data into my new Blazegraph namespace, specifying a format of Turtle-RDR. The double brackets that you see here are the RDF* extension that lets us create triples that are themselves resources that we can use as subjects and objects of other triples:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d: &amp;lt;http://www.learningsparql.com/ns/data/&amp;gt; .
&amp;lt;&amp;lt;d:Kubrick d:influencedBy d:Welles&amp;gt;&amp;gt; d:significance 0.8 ;
      d:source &amp;lt;https://nofilmschool.com/2013/08/films-directors-that-influenced-stanley-kubrick&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This data tells us that the triple about Kubrick being influenced by Welles has a significance of 0.8 and a source at an article on nofilmschool.com.&lt;/p&gt;
&lt;p&gt;I then executed the following query, based on Olaf&amp;rsquo;s from slide 16, with no problem:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d: &amp;lt;http://www.learningsparql.com/ns/data/&amp;gt; 
SELECT ?x WHERE {
  &amp;lt;&amp;lt;?x d:influencedBy d:Welles&amp;gt;&amp;gt; d:significance ?sig .
  FILTER (?sig &amp;gt; 0.7)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this case, the use of the double angle brackets is the SPARQL* extension that lets us do the same thing that this syntax does in RDF*. This query asks for whoever was named as being influenced by Welles in statements that have a significance greater than 0.7. The query worked just fine in Blazegraph.&lt;/p&gt;
&lt;p&gt;SPARQL* also lets you query for the components of triples that are being treated as independent resources. From Olaf&amp;rsquo;s slide 17, this next query asks for whoever was influenced by Welles and the significance and source of any returned statements, and it worked fine with the data above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d: &amp;lt;http://www.learningsparql.com/ns/data/&amp;gt; 
SELECT ?x ?sig ?src WHERE {
  &amp;lt;&amp;lt;?x d:influencedBy d:Welles&amp;gt;&amp;gt; d:significance ?sig ;
  d:source ?src .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;His slide 18 query returns the same result as that one, but takes the syntax a bit further by binding the triple pattern about someone influenced by Welles to a variable and then querying for that:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d: &amp;lt;http://www.learningsparql.com/ns/data/&amp;gt; 
SELECT ?x ?sig ?src WHERE {
  BIND(&amp;lt;&amp;lt;?x d:influencedBy d:Welles&amp;gt;&amp;gt; AS ?t)
  ?t  d:significance ?sig ;
      d:source ?src .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Moving on to more easy experiments, I found that all the examples on the Blazegraph page &lt;a href=&#34;https://wiki.blazegraph.com/wiki/index.php/Reification_Done_Right&#34;&gt;Reification Done Right&lt;/a&gt; worked exactly as shown there. That page also provides some nice background for ways to use RDF* and SPARQL* in Blazegraph.&lt;/p&gt;
&lt;p&gt;Blazegraph lets you do inferencing, so I couldn&amp;rsquo;t resist mixing that with RDF* and SPARQL*. I had to create a new Blazegraph namespace that not only had a Mode of rdr but also had the &amp;ldquo;Inference&amp;rdquo; box checked upon creation, and then I loaded this data:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d:    &amp;lt;http://www.learningsparql.com/ns/data/&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .


&amp;lt;&amp;lt;d:s1 d:p1 d:o1&amp;gt;&amp;gt; a d:Class2 .
&amp;lt;&amp;lt;d:s2 d:p2 d:o2&amp;gt;&amp;gt; a d:Class3 .


d:Class2 rdfs:subClassOf d:Class1 . 
d:Class3 rdfs:subClassOf d:Class1 . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This data creates two triples that are themselves resources, one an instance of Class2 and the other an instance of Class3. Two final triples tell us that each of those classes is a subclass of Class1. The following query asked for triples that are instances of Class1, despite the data having no explicit triples about Class1 instances, and Blazegraph did the inferencing and found both of them:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d: &amp;lt;http://www.learningsparql.com/ns/data/&amp;gt; 
SELECT ?x ?y ?z WHERE {
   &amp;lt;&amp;lt;?x ?y ?z&amp;gt;&amp;gt; a d:Class1 . 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After doing this inferencing, I was thinking that OWL metadata and inferencing about such triples should open up a lot of new possibilities, but I realized that none of those possibilities are necessarily new: they&amp;rsquo;ll just be easier to implement than they would have been using the old method of reification that used four triples to represent one. Still, being easier to implement counts for plenty, and I think that metadata librarians and other people doing work to build value around existing triples now have a reasonable syntax and some nice tools to explore this.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2018">2018</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>Reification is a red herring</title>
      <link>https://www.bobdc.com/blog/reification-is-a-red-herring/</link>
      <pubDate>Sun, 22 Apr 2018 10:14:15 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/reification-is-a-red-herring/</guid>
      
      
      <description><div>And you don&#39;t need property graphs to assign data to individual relationships.</div><div>&lt;blockquote id=&#34;idm45510006274096&#34; class=&#34;pullquote&#34;&gt;RDF&#39;s very simple subject-predicate-object data model is a building block that you can use to build other models that can make your applications even better.&lt;/blockquote&gt;
&lt;p&gt;I recently &lt;a href=&#34;https://twitter.com/bobdc/status/972847454071779328&#34;&gt;tweeted&lt;/a&gt; that the ZDNet article &lt;a href=&#34;http://www.zdnet.com/article/back-to-the-future-does-graph-database-success-hang-on-query-language/&#34;&gt;Back to the future: Does graph database success hang on query language?&lt;/a&gt; was the best overview of the graph database world(s) that I&amp;rsquo;d seen so far, and I also warned that such &amp;ldquo;overviews&amp;rdquo; are often just Neo4j employees plugging their own product. (The Neo4j company is actually called Neo Technology.) The most extreme example of this is the O&amp;rsquo;Reilly book &lt;a href=&#34;https://neo4j.com/lp/book-graph-databases&#34;&gt;Graph Databases&lt;/a&gt;, which is free because it&amp;rsquo;s being given away by its three authors&amp;rsquo; common employer: Neo Technology! The book would have been more accurately titled &amp;ldquo;Building Graph Applications with Cypher&amp;rdquo;, the Neo4j query language. This 238-page book on graph databases manages to mention SPARQL and Gremlin only twice each. The ZDNet article above does a much more balanced job of covering RDF and SPARQL, Gremlin and Tinkerpop, and Cypher and Neo4j.&lt;/p&gt;
&lt;p&gt;The DZone article &lt;a href=&#34;https://dzone.com/articles/rdf-triple-stores-vs-labeled-property-graphs-whats&#34;&gt;RDF Triple Stores vs. Labeled Property Graphs: What&amp;rsquo;s the Difference?&lt;/a&gt; is by another Neo employee, field engineer &lt;a href=&#34;https://neo4j.com/blog/contributor/jesus-barrasa/&#34;&gt;Jesús Barrasa&lt;/a&gt;. It doesn&amp;rsquo;t mention Tinkerpop or Gremlin at all, but does a decent job of describing the different approach that property graph databases such as Neo4j and Tinkerpop take in describing graphs of nodes and edges when compared with RDF triplestores. Its straw man arguments about RDF&amp;rsquo;s supposed deficiencies as a data model reminded me of a common theme I&amp;rsquo;ve seen over the years.&lt;/p&gt;
&lt;p&gt;The fundamental thing that most people don&amp;rsquo;t get about RDF, including many people who are successfully using it to get useful work done, is that &lt;em&gt;RDF&amp;rsquo;s very simple subject-predicate-object data model is a building block that you can use to build other models that can make your applications even better&lt;/em&gt;. Just because RDF doesn&amp;rsquo;t require the use of schemas doesn&amp;rsquo;t mean that it can&amp;rsquo;t use them; the RDF Schema Language lets you declare classes, properties, and information about these that you can use to &lt;a href=&#34;https://www.bobdc.com/blog/using-sparql-queries-from-nati&#34;&gt;drive user interfaces&lt;/a&gt;, to enable more efficient and readable queries, and to do all the other things that people typically use schemas for. Even better, you can develop a schema for the subset of the data you care about (as opposed to being forced to choose between a schema for the whole data set or no schema at all, as with XML), which is great for data integration projects, and then build your schema up from there.&lt;/p&gt;
&lt;p&gt;Barrasa writes of property graphs that &amp;ldquo;[t]he important thing to remember here is that both the nodes and relationships have an internal structure, which differentiates this model from the RDF model. By internal structure, I mean this set of key-value pairs that describe them.&amp;rdquo; This is the first important difference between RDF and property graphs: in the latter, nodes and edges can each have their own separate set (implemented as an array in Neo4j) of key-value pairs. Of course, nodes in RDF don&amp;rsquo;t need this; to say that the node for Jack has an attribute-value pair of (hireDate, &amp;ldquo;2017-04-12&amp;rdquo;), we simply make another triple with Jack as the subject and these as the predicate and object.&lt;/p&gt;
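&lt;p&gt;For example (with prefixes and property names made up for illustration), Jack&amp;rsquo;s hire date is just one more triple alongside whatever relationship triples he appears in:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d: &amp;lt;http://learningsparql.com/ns/data/&amp;gt; .
@prefix m: &amp;lt;http://learningsparql.com/ns/model/&amp;gt; .


d:Jane m:reportsTo d:Jack .          # an edge between two nodes
d:Jack m:hireDate  &amp;quot;2017-04-12&amp;quot; .    # the RDF version of a key-value pair
&lt;/code&gt;&lt;/pre&gt;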
&lt;p&gt;Describing the other key difference, Barrasa writes that while the nodes of property graphs have unique identifiers, &amp;ldquo;[i]n the same way, edges, or connections between nodes&amp;ndash;which we call relationships&amp;ndash;have an ID&amp;rdquo;. Property graph edges are unique at the instance level; if Jane reportsTo Jack and Jack reportsTo Jill, the two reportsTo relationships here each have their own unique identifier and their own set of key-value pairs to store information about each edge.&lt;/p&gt;
&lt;p&gt;He writes that in RDF &amp;ldquo;[t]he predicate will represent an edge&amp;ndash;a relationship&amp;ndash;and the object will be another node or a literal value. But here, from the point of view of the graph, that&amp;rsquo;s going to be another vertex.&amp;rdquo; Not necessarily, at least for the literal values; these represent the values in RDF&amp;rsquo;s equivalent of the key-value pairs&amp;ndash;the non-relationship information being attached to a node such as (hireDate, &amp;ldquo;2017-04-12&amp;rdquo;) above. This ability is why a node doesn&amp;rsquo;t need its own internal key-value data structure.&lt;/p&gt;
&lt;p&gt;He begins his list of differences between property graphs and RDF with the big one mentioned above: &amp;ldquo;Difference #1: RDF Does Not Uniquely Identify Instances of Relationships of the Same Type,&amp;rdquo; which is certainly true. But, his example, which he describes as &amp;ldquo;an RDF graph in which Dan cannot like Ann three times&amp;rdquo;, is very artificial.&lt;/p&gt;
&lt;p&gt;One of his &amp;ldquo;RDF workarounds&amp;rdquo; for using RDF to describe that Dan liked Ann three times is reification, in which we convert each triple to four triples: one saying that a given resource is an RDF statement, a second identifying the resource&amp;rsquo;s subject, a third naming the predicate, and a fourth naming the object. This way, the statement itself has identity, and we can add additional information about it as triples that use the statement&amp;rsquo;s identifier as a subject and additional predicates and objects as key-value pairs such as (time, &amp;ldquo;2018-03-04T11:43:00&amp;rdquo;) to show when a particular &amp;ldquo;like&amp;rdquo; took place. Barrasa writes &amp;ldquo;This is quite ugly&amp;rdquo;; I agree, and it can also do bad things to storage requirements.&lt;/p&gt;
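&lt;p&gt;To make the ugliness concrete, here&amp;rsquo;s a sketch (with my own made-up d: and m: names) of a single reified &amp;ldquo;Dan likes Ann&amp;rdquo; statement in Turtle, using the rdf:Statement, rdf:subject, rdf:predicate, and rdf:object terms from the RDF vocabulary:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix d:   &amp;lt;http://learningsparql.com/ns/data/&amp;gt; .
@prefix m:   &amp;lt;http://learningsparql.com/ns/model/&amp;gt; .


d:like1 a rdf:Statement ;        # triple 1: this resource is a statement
        rdf:subject   d:Dan ;    # triple 2: its subject
        rdf:predicate m:likes ;  # triple 3: its predicate
        rdf:object    d:Ann ;    # triple 4: its object
        m:time &amp;quot;2018-03-04T11:43:00&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that those four triples only describe a statement; they don&amp;rsquo;t even assert that Dan likes Ann.&lt;/p&gt;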
&lt;p&gt;In my 15 years of working with RDF, I have never felt the need to use reification. It&amp;rsquo;s funny how the 2004 &lt;a href=&#34;https://www.w3.org/TR/rdf-primer/&#34;&gt;RDF Primer 1.0&lt;/a&gt; has a &lt;a href=&#34;https://www.w3.org/TR/rdf-primer/#reification&#34;&gt;section on reification&lt;/a&gt; but the 2014 &lt;a href=&#34;https://www.w3.org/TR/rdf11-primer/&#34;&gt;RDF Primer 1.1&lt;/a&gt; (of which I am proud to be listed in the &lt;a href=&#34;https://www.w3.org/TR/rdf11-primer/#section-Acknowledgments&#34;&gt;Acknowledgments&lt;/a&gt;) doesn&amp;rsquo;t even mention reification: simpler modeling techniques were available, so reification was rarely if ever used.&lt;/p&gt;
&lt;p&gt;By &amp;ldquo;modeling techniques&amp;rdquo; I mean &amp;ldquo;declaring and then using a model&amp;rdquo;, although in RDF, you don&amp;rsquo;t even have to declare it. If you want to keep track of separate instances of employees, or games, or buildings, you can declare any of these as a class and then create instances of it; similarly, if you want to keep track of separate instances of a particular relationship, declare a class for that relationship and then create instances of it.&lt;/p&gt;
&lt;p&gt;How would we apply this to Barrasa&amp;rsquo;s example, where he wants to keep track of information about Likes? We use a class called Like, where each instance identifies who liked who. (When I first wrote that previous sentence, I wrote that we can &lt;em&gt;declare&lt;/em&gt; a class called Like, but again, we don&amp;rsquo;t need to declare it to use it. Declaring it is better for serious applications where multiple developers must work together, because part of the point of a schema is to give everyone a common frame of reference about the data they&amp;rsquo;re working with.) The instance could also identify the date and time of the Like, comments associated with it, and anything else you wanted to add as a set of key-value pairs for each Like instance that is implemented as just more triples.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s an example. After optional declarations of the relevant class and properties associated with it, the following has four Likes showing who liked who when and a &amp;ldquo;foo&amp;rdquo; value to demonstrate the association of arbitrary metadata with that Like.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d:    &amp;lt;http://learningsparql.com/ns/data/&amp;gt; .
@prefix m:    &amp;lt;http://learningsparql.com/ns/model/&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; . 


# Optional schema.
m:Like  a rdfs:Class .          # A class...
m:liker rdfs:domain m:Like .    # and properties that go with this class.
m:liked rdfs:domain m:Like .
m:foo   rdfs:domain m:Like .


[] a m:Like ;
   m:liker d:Dan ;
   m:liked d:Ann ;
   m:time &amp;quot;2018-03-04T11:43:00&amp;quot; ;
   m:foo &amp;quot;bar&amp;quot; .


[] a m:Like ;
   m:liker d:Dan ;
   m:liked d:Ann ;
   m:time &amp;quot;2018-03-04T11:58:00&amp;quot; ;
   m:foo &amp;quot;baz&amp;quot; .


[] a m:Like ;
   m:liker d:Dan ;
   m:liked d:Ann ;
   m:time &amp;quot;2018-03-04T12:04:00&amp;quot; ;
   m:foo &amp;quot;bat&amp;quot; .


[] a m:Like ;
   m:liker d:Ann ;
   m:liked d:Dan ;
   m:time &amp;quot;2018-03-04T12:06:00&amp;quot; ;
   m:foo &amp;quot;bam&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Instead of making up specific identifiers for each Like, I made them blank nodes so that the RDF processing software will generate identifiers and keep track of them.&lt;/p&gt;
&lt;p&gt;As to Barrasa&amp;rsquo;s use case of counting how many times Dan liked Ann, it&amp;rsquo;s pretty easy with SPARQL:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d: &amp;lt;http://learningsparql.com/ns/data/&amp;gt; 
PREFIX m: &amp;lt;http://learningsparql.com/ns/model/&amp;gt;


SELECT (count(*) AS ?likeCount) WHERE {
  ?like a m:Like ;
        m:liker d:Dan ;
        m:liked d:Ann .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(This query would actually work with just the &lt;code&gt;m:liker&lt;/code&gt; and &lt;code&gt;m:liked&lt;/code&gt; triple patterns, but as with the &lt;a href=&#34;https://twitter.com/bobdc/status/952910713751769088&#34;&gt;example that I tweeted to Dan Brickley about&lt;/a&gt;, declaring your RDF resources as instances of classes can lay the groundwork for more efficient and readable queries.) Here is &lt;a href=&#34;https://jena.apache.org/documentation/query/&#34;&gt;ARQ&lt;/a&gt;&amp;rsquo;s output for this query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-------------
| likeCount |
=============
| 3         |
-------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&amp;rsquo;s get a little fancier. Instead of counting all of Dan&amp;rsquo;s likes of Ann, we&amp;rsquo;ll just list the ones from before noon on March 4, sorted by their foo values:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d: &amp;lt;http://learningsparql.com/ns/data/&amp;gt; 
PREFIX m: &amp;lt;http://learningsparql.com/ns/model/&amp;gt;


SELECT ?fooValue ?time WHERE {
  ?like a m:Like ;
        m:liker d:Dan ;
        m:liked d:Ann ;
        m:time ?time ;
        m:foo ?fooValue .
FILTER (?time &amp;lt; &amp;quot;2018-03-04T12:00&amp;quot;)
}
ORDER BY ?fooValue
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And here is ARQ&amp;rsquo;s result for this query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;------------------------------------
| fooValue | time                  |
====================================
| &amp;quot;bar&amp;quot;    | &amp;quot;2018-03-04T11:43:00&amp;quot; |
| &amp;quot;baz&amp;quot;    | &amp;quot;2018-03-04T11:58:00&amp;quot; |
------------------------------------
&lt;/code&gt;&lt;/pre&gt;
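&lt;p&gt;One aside about that FILTER: the time values in my sample data are plain string literals, so the comparison is lexical. That happens to give the right answer here because these ISO 8601 timestamps all share the same date prefix; if the data typed the values as xsd:dateTime (with the xsd: prefix declared in both the data and the query), the same comparison would work on actual date and time values:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# In the data:
m:time &amp;quot;2018-03-04T11:43:00&amp;quot;^^xsd:dateTime ;


# In the query:
FILTER (?time &amp;lt; &amp;quot;2018-03-04T12:00:00&amp;quot;^^xsd:dateTime)
&lt;/code&gt;&lt;/pre&gt;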
&lt;p&gt;After working through a similar example for modeling flights between New York and San Francisco, Barrasa begins a sentence &amp;ldquo;Because we can&amp;rsquo;t create such a simple model in RDF&amp;hellip;&amp;rdquo; This is ironic; the RDF model is simpler than the Labeled Property Graph model, because it&amp;rsquo;s all subject-predicate-object triples without the use of additional data structures attached to the graph nodes and edges. His RDF version would have been much simpler if he had just created instances of a class called Flight, because again, while the base model of RDF is the simple triple, more complex models can easily be created by declaring classes, properties, and information about those classes and properties&amp;ndash;which we can do by just creating new triples!&lt;/p&gt;
&lt;p&gt;To summarize, complaints about RDF that focus on reification are so 2004, and they are a red herring, because they distract from the greater power that RDF&amp;rsquo;s modeling abilities bring to application development.&lt;/p&gt;
&lt;p&gt;A funny thing happened after writing all this, though. As part of my plans to look into Tinkerpop and Gremlin and potential connections to RDF as a next step, I was looking into Stardog and Blazegraph&amp;rsquo;s common support of both. I found a Blazegraph page called &lt;a href=&#34;https://wiki.blazegraph.com/wiki/index.php/Reification_Done_Right&#34;&gt;Reification Done Right&lt;/a&gt; where I learned of Olaf Hartig and Bryan Thompson&amp;rsquo;s 2014 paper &lt;a href=&#34;https://arxiv.org/pdf/1406.3399.pdf&#34;&gt;Foundations of an Alternative Approach to Reification in RDF&lt;/a&gt;. If Blazegraph has implemented their ideas, then there is a lot of potential there. And if the Blazegraph folks brought this with them to Amazon Neptune, that would be even more interesting, although apparently that hasn&amp;rsquo;t shown up yet.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2018">2018</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Album &#34;Gin &amp; Heptatonic&#34; by my band The Heptatonic Jazz Quintet</title>
      <link>https://www.bobdc.com/blog/album-gin-heptatonic-by-my-ban/</link>
      <pubDate>Sun, 25 Mar 2018 12:52:28 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/album-gin-heptatonic-by-my-ban/</guid>
      
      
      <description><div>Now available on the big streaming services.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.heptatonic.com&#34;&gt;&lt;img id=&#34;idm45698402796880&#34; src=&#34;https://www.bobdc.com/img/main/frontCover400x400.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Gin &amp; Heptatonic cover&#34; width=&#34;260&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;(I promise to go back to writing about RDF and related technology with my next entry, which is tentatively titled &amp;ldquo;Reification is a red herring: you don&amp;rsquo;t need property graphs to assign data to individual relationships.&amp;rdquo;)&lt;/p&gt;
&lt;p&gt;Along with the jazz bass playing that I&amp;rsquo;ve been working on since 2003, I&amp;rsquo;ve written a few jazz tunes to try with the people I played with, so I recently got together some of my favorite local musicians and recorded an album of these songs. As soon as I told my wife that I planned to call the band &amp;ldquo;The Heptatonic Jazz Quintet&amp;rdquo; she suggested calling the album &amp;ldquo;Gin &amp;amp; Heptatonic&amp;rdquo;, and I couldn&amp;rsquo;t argue with that. (A &lt;a href=&#34;https://en.wikipedia.org/wiki/Heptatonic_scale&#34;&gt;heptatonic scale&lt;/a&gt; is a scale with seven notes, like most scales in Western music. And of course, beginning with &amp;ldquo;hep&amp;rdquo; makes it a great name for a jazz band. I was &lt;a href=&#34;https://twitter.com/bobdc/status/718815744218284032&#34;&gt;thrilled&lt;/a&gt; to grab the domain name &lt;a href=&#34;http://www.heptatonic.com&#34;&gt;heptatonic.com&lt;/a&gt; for only $12.) The music is mostly hard bop, swing, and variations on those.&lt;/p&gt;
&lt;p&gt;My brother &lt;a href=&#34;http://mcylinder.com/index.php?title=Main_Page&#34;&gt;Peter&lt;/a&gt; produced the album and did the excellent &lt;a href=&#34;http://www.concordmusicgroup.com/labels/prestige/&#34;&gt;Prestige&lt;/a&gt; and &lt;a href=&#34;http://www.bluenote.com/&#34;&gt;Blue Note&lt;/a&gt;-inspired front cover using a &lt;a href=&#34;https://www.flickr.com/photos/jenny-pics/9568936573&#34;&gt;picture&lt;/a&gt; that I found in a Flickr search for Creative Commons CC BY 2.0 images. I did the &lt;a href=&#34;http://heptatonic.com/img/backCoverMockup634x634.png&#34;&gt;back cover&lt;/a&gt; myself with a deep dive into &lt;a href=&#34;https://www.gimp.org/&#34;&gt;GIMP&lt;/a&gt;. (On the topic of open source Linux-Windows-Mac software that played a role, I love the &lt;a href=&#34;https://musescore.com/&#34;&gt;MuseScore&lt;/a&gt; scoring program and used it for lead sheets, MIDI demos, and horn arrangements.)&lt;/p&gt;
&lt;p&gt;Two songs have lyrics. I knew that the album&amp;rsquo;s closing song &amp;ldquo;Let&amp;rsquo;s&amp;rdquo; required greater lyrical skills than I was capable of, so for that I called in my old New York music friend Philip Shelley. His illustrious musical career included the production of a &lt;a href=&#34;http://www.snee.com/music/ha/&#34;&gt;demo&lt;/a&gt; of the last serious rock band I was in many years ago, and he wrote a song on the other demo. (You can read more about my limited New York rock career in an &lt;a href=&#34;https://www.bobdc.com/blog/me-as-80s-new-york-lead-guitar&#34;&gt;older blog entry&lt;/a&gt;.) Because no one in the quintet had any singing ambitions, for those two songs we got special guest &lt;a href=&#34;https://www.youtube.com/watch?v=yv3mGENY6BY&#34;&gt;Dick Orange&lt;/a&gt;, a popular local singer who specializes in &amp;ldquo;the great American songbook&amp;rdquo;, which generally means songs made famous by Frank Sinatra.&lt;/p&gt;
&lt;p&gt;It was interesting to learn about the current infrastructure of getting music out where people can hear it. A &lt;a href=&#34;http://www.musicforpicture.com/&#34;&gt;former business partner of my brother&amp;rsquo;s&lt;/a&gt; recommended &lt;a href=&#34;http://www.tunecore.com&#34;&gt;TuneCore&lt;/a&gt;, so I had them print a hundred CDs and, more importantly, take care of the music publishing administration and distribute the album to &lt;a href=&#34;https://open.spotify.com/album/6uLlvZnQWTnU6WA1uqbj3M&#34;&gt;Spotify&lt;/a&gt;, &lt;a href=&#34;https://listen.tidal.com/album/85804598&#34;&gt;Tidal&lt;/a&gt;, &lt;a href=&#34;https://www.amazon.com/Gin-Heptatonic-Jazz-Quintet/dp/B07BF3P9KV&#34;&gt;Amazon&lt;/a&gt;, Apple Music, iTunes, and other services. (I can&amp;rsquo;t provide you with Apple Music or iTunes links to the album; just search for &amp;ldquo;heptatonic&amp;rdquo; from inside of your favorite Apple walled garden.)&lt;/p&gt;
&lt;p&gt;So if you like jazz, please check out the album and &amp;ldquo;Like&amp;rdquo; the band&amp;rsquo;s &lt;a href=&#34;https://www.facebook.com/heptatonicjazzquintet&#34;&gt;Facebook page&lt;/a&gt;. If you&amp;rsquo;re in the Charlottesville, Virginia, area on June 1st, come to our CD Release Party at Cville Coffee, which has wine and beer in addition to coffee.&lt;/p&gt;
&lt;p&gt;And I promise: next I&amp;rsquo;ll go back to blogging about triples!&lt;/p&gt;
&lt;img id=&#34;idm45698402775072&#34; src=&#34;https://www.bobdc.com/img/main/hjqwithdick.jpg&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Heptatonic Jazz Quintet with Dick Orange&#34; width=&#34;360&#34;/&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2018">2018</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Playing jazz bass</title>
      <link>https://www.bobdc.com/blog/playing-jazz-bass/</link>
      <pubDate>Sun, 25 Feb 2018 12:15:11 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/playing-jazz-bass/</guid>
      
      
      <description><div>A brief crash course.</div><div>&lt;p&gt;I enjoy writing short tutorials to get people started on something that may have seemed intimidating to them before, and I thought it might be fun to write up something that isn&amp;rsquo;t related to software but that I have thought a lot about in the last 15 years: jazz bass playing.&lt;/p&gt;
&lt;p&gt;A few basic patterns that you can repeat over nearly any chord will get you pretty far. Any rock or classical bass player should be able to pick these up quickly. It should also work for any guitar player, because both electric and upright basses are tuned like the low four strings of a guitar. (Of course, the upright lacks frets, so you have to put your left hand&amp;rsquo;s fingers where the frets would be.) This crash course can be useful to keyboard players as well, who can treat it as a guide to what to play with their left hand for jazz tunes.&lt;/p&gt;
&lt;p&gt;You can think of just about all jazz as being composed of 7th chords: major 7th, minor 7th, dominant 7th, and, less often, diminished 7th or half-diminished chords. These each consist of four notes, and the distances between the notes are what make them sound different&amp;ndash;for example, the first two notes of a major 7th are a major third apart, and in a minor 7th they&amp;rsquo;re a minor third apart. Jazz musicians who see a three-note triad chord like D minor may just add the seventh anyway, treating it as a D minor 7th. For a dominant 7th such as G7 in the key of C, jazz musicians since the advent of bebop in the 1940s sometimes add more notes to the chord, such as the 9th, 11th, and 13th notes of the root note&amp;rsquo;s scale. They may even shift some of those added notes up or down a half step, so that you see a fancy chord name like G7#9. As a bass player, just think of that as a G7. To summarize, it&amp;rsquo;s simplest to think of it all as 7th chords.&lt;/p&gt;
&lt;p&gt;There are some classic patterns that bass players typically play over these 7th chords, and if you learn a few of them and the notes of the chords, you can play simple jazz bass lines. Guitar players know that if they play the notes of an A minor 7th chord and then move their left hand one fret up the neck and do the same thing, they&amp;rsquo;ll be playing a Bb minor 7th, so learning how to play all the chords means learning only a few patterns that you can play up and down the neck. The same applies to these jazz bassline patterns.&lt;/p&gt;
&lt;p&gt;A walking jazz bass line is nearly all quarter notes, so when you see &amp;ldquo;1357&amp;rdquo; below, for a given chord in a given bar played in 4/4 time, you would play these four notes as quarter notes: the root of the chord (the 1), the 3rd, the 5th, and the 7th. For example, over an A minor 7th chord, 1357 would mean playing A C E G.&lt;/p&gt;
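&lt;p&gt;If it helps to see that mapping spelled out in code, here is a minimal sketch (just an illustration I&amp;rsquo;m adding here, with note names simplified to sharps only, no flats) that turns a pattern like &amp;ldquo;1357&amp;rdquo; into note names for a given chord root and quality:&lt;/p&gt;

```javascript
// Minimal sketch: compute the notes of a bass pattern like "1357"
// from a chord's root and quality. Note names use sharps only.
const CHROMATIC = ['C','C#','D','D#','E','F','F#','G','G#','A','A#','B'];

// Semitone offsets for each scale-degree digit used in the patterns;
// the 3 and 7 shift depending on the chord quality.
const OFFSETS = {
  minor7:    {1: 0, 2: 2, 3: 3, 5: 7, 7: 10, 8: 12},
  major7:    {1: 0, 2: 2, 3: 4, 5: 7, 7: 11, 8: 12},
  dominant7: {1: 0, 2: 2, 3: 4, 5: 7, 7: 10, 8: 12}
};

function bassPattern(root, quality, pattern) {
  const start = CHROMATIC.indexOf(root);
  return pattern.split('').map(
    digit => CHROMATIC[(start + OFFSETS[quality][digit]) % 12]
  );
}

console.log(bassPattern('A', 'minor7', '1357'));  // [ 'A', 'C', 'E', 'G' ]
```

&lt;p&gt;Running &lt;code&gt;bassPattern('G', 'dominant7', '1353')&lt;/code&gt; gives G B D B: the 1353 pattern over a G7 chord.&lt;/p&gt;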
&lt;p&gt;For each of these patterns, we&amp;rsquo;ll look at how you would play them on the first four bars of the jazz standard Autumn Leaves. (Compare &lt;a href=&#34;https://www.youtube.com/watch?v=ZEMCeymW1Ow&#34;&gt;Nat King Cole&amp;rsquo;s version&lt;/a&gt; with &lt;a href=&#34;https://www.youtube.com/watch?v=rsz6TE6t7-A&#34;&gt;Miles Davis&amp;rsquo;s&lt;/a&gt;; Miles&amp;rsquo;s fifty-second intro delays the start of the actual song a bit.)&lt;/p&gt;
&lt;h2 id=&#34;idm45853255691392&#34;&gt;1357&lt;/h2&gt;
&lt;p&gt;This is probably the most important pattern, but not the one you&amp;rsquo;ll use the most. It&amp;rsquo;s just an arpeggio of the chord&amp;ndash;that is, the playing of each note of the 7th chord from the root up. It&amp;rsquo;s an important pattern to practice with any given song because it helps you to really understand the song&amp;rsquo;s structure. Over the first four bars of Autumn Leaves, this pattern would look like this on a bass staff (click the play button underneath it to hear the bass line with a piano and drums generated by the excellent open source scoring program &lt;a href=&#34;https://musescore.org/en&#34;&gt;MuseScore&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img id=&#34;idm45853255688944&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/1357.png&#34;/&gt;&lt;br /&gt;
&lt;audio id=&#34;idm45853255688240&#34; controls=&#34;controls&#34;&gt; &lt;source id=&#34;idm45853255687664&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/1357.mp3&#34;/&gt; &lt;/audio&gt;&lt;/p&gt;
&lt;p&gt;Repeating the same pattern for four bars is not something you&amp;rsquo;d want to do when playing with other people, but for this pattern it&amp;rsquo;s something worth doing for an entire song while practicing on your own because it helps you to get to know the song&amp;rsquo;s chords better.&lt;/p&gt;
&lt;h2 id=&#34;idm45853255685904&#34;&gt;1353&lt;/h2&gt;
&lt;p&gt;This one is so useful that I use it too often when I&amp;rsquo;m on automatic pilot. You can&amp;rsquo;t go wrong with it. I mentioned above that the main difference between a major seventh chord and a minor seventh chord is the &amp;ldquo;3&amp;rdquo; note; this pattern really brings that out while still hitting the most important notes of the chord from a bass player&amp;rsquo;s perspective&amp;ndash;the root and the fifth&amp;ndash;on the crucial first and third beats of the bar. Here it is over the start of Autumn Leaves:&lt;/p&gt;
&lt;p&gt;&lt;img id=&#34;idm45853255684160&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/1353.png&#34;/&gt;&lt;br /&gt;
&lt;audio id=&#34;idm45853255683456&#34; controls=&#34;controls&#34;&gt; &lt;source id=&#34;idm45853255682928&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/1353.mp3&#34;/&gt; &lt;/audio&gt;&lt;/p&gt;
&lt;h2 id=&#34;idm45853255681840&#34;&gt;1155&lt;/h2&gt;
&lt;p&gt;This seems almost too simple, but it sounds great if you give it a strong swing feel on a song like Duke Ellington&amp;rsquo;s &lt;a href=&#34;https://www.youtube.com/watch?v=TrytKuC3Z_o&#34;&gt;Satin Doll&lt;/a&gt;. Here it is over Autumn Leaves:&lt;/p&gt;
&lt;p&gt;&lt;img id=&#34;idm45853255680144&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/1155.png&#34;/&gt;&lt;br /&gt;
&lt;audio id=&#34;idm45853255679440&#34; controls=&#34;controls&#34;&gt; &lt;source id=&#34;idm45853255678912&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/1155.mp3&#34;/&gt; &lt;/audio&gt;&lt;/p&gt;
&lt;h2 id=&#34;idm45853255677824&#34;&gt;1231&lt;/h2&gt;
&lt;p&gt;The 2nd note of the chord&amp;rsquo;s scale is not a chord tone, but here it leads to a chord tone on the crucial third beat. This is the first pattern we&amp;rsquo;ve seen that doesn&amp;rsquo;t always have either a 1 or a 5 on the first and third beat; the 3 on the third beat brings out the color of the chord more. In Autumn Leaves:&lt;/p&gt;
&lt;p&gt;&lt;img id=&#34;idm45853255676704&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/1231.png&#34;/&gt;&lt;br /&gt;
&lt;audio id=&#34;idm45853255676000&#34; controls=&#34;controls&#34;&gt; &lt;source id=&#34;idm45853255675472&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/1231.mp3&#34;/&gt; &lt;/audio&gt;&lt;/p&gt;
&lt;h2 id=&#34;idm45853255674640&#34;&gt;1235&lt;/h2&gt;
&lt;p&gt;Similar to the last one, and similarly useful. In Autumn Leaves:&lt;/p&gt;
&lt;p&gt;&lt;img id=&#34;idm45853255673760&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/1235.png&#34;/&gt;&lt;br /&gt;
&lt;audio id=&#34;idm45853255673056&#34; controls=&#34;controls&#34;&gt; &lt;source id=&#34;idm45853255672528&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/1235.mp3&#34;/&gt; &lt;/audio&gt;&lt;/p&gt;
&lt;h2 id=&#34;idm45853255671440&#34;&gt;1875&lt;/h2&gt;
&lt;p&gt;The 8 here really refers to the 1, but an octave higher. This is our first pattern with a 7th in it. In Autumn Leaves:&lt;/p&gt;
&lt;p&gt;&lt;img id=&#34;idm45853255670512&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/1875.png&#34;/&gt;&lt;br /&gt;
&lt;audio id=&#34;idm45853255669808&#34; controls=&#34;controls&#34;&gt; &lt;source id=&#34;idm45853255669280&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/1875.mp3&#34;/&gt; &lt;/audio&gt;&lt;/p&gt;
&lt;p&gt;If you replace each quarter note in that with two swung eighth notes, you&amp;rsquo;d have a classic Chicago blues bass line, although major seventh chords don&amp;rsquo;t come up in Chicago blues very often:&lt;/p&gt;
&lt;p&gt;&lt;img id=&#34;idm45853255667984&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/11887755.png&#34;/&gt;&lt;br /&gt;
&lt;audio id=&#34;idm45853255667280&#34; controls=&#34;controls&#34;&gt; &lt;source id=&#34;idm45853255666752&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/11887755.mp3&#34;/&gt; &lt;/audio&gt;&lt;/p&gt;
&lt;p&gt;(John Paul Jones&amp;rsquo; bass line in Led Zeppelin&amp;rsquo;s &lt;a href=&#34;https://www.youtube.com/watch?v=wEPog_WdPE4&#34;&gt;How Many More Times&lt;/a&gt; is a variation on this: 1 8757 1 8 7 5.)&lt;/p&gt;
&lt;h2 id=&#34;idm45853255664432&#34;&gt;8753&lt;/h2&gt;
&lt;p&gt;Going down from the root of the chord through the chord&amp;rsquo;s other notes is also great. Again, you have the 1 (an octave higher this time) and the 5 on the first and third beat. In Autumn Leaves:&lt;/p&gt;
&lt;p&gt;&lt;img id=&#34;idm45853255663424&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/8753.png&#34;/&gt;&lt;br /&gt;
&lt;audio id=&#34;idm45853255662720&#34; controls=&#34;controls&#34;&gt; &lt;source id=&#34;idm45853255662192&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/8753.mp3&#34;/&gt; &lt;/audio&gt;&lt;/p&gt;
&lt;h2 id=&#34;idm45853255661104&#34;&gt;Half bars&lt;/h2&gt;
&lt;p&gt;Jazz songs typically have one chord per bar. There are songs ranging from &lt;a href=&#34;https://www.youtube.com/watch?v=5G7UIeYGq0k&#34;&gt;I Got Rhythm&lt;/a&gt; (and the hundreds of songs based on it) to John Coltrane&amp;rsquo;s &lt;a href=&#34;https://www.youtube.com/watch?v=30FTr6G53VU&#34;&gt;Giant Steps&lt;/a&gt; that are mostly two chords per bar, but in most jazz you&amp;rsquo;ll see one chord per bar with the occasional two-chord bar at the end of a four- or eight-bar phrase. If you play the chord notes 13, 15, or 85 over each half bar, you&amp;rsquo;ll be fine. Here are the first four bars of &amp;ldquo;I Got Rhythm&amp;rdquo; using 13 13 15 85 13 85 15 85:&lt;/p&gt;
&lt;p&gt;&lt;img id=&#34;idm45853255658352&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/IGotRhythm.png&#34;/&gt;&lt;br /&gt;
&lt;audio id=&#34;idm45853255657648&#34; controls=&#34;controls&#34;&gt; &lt;source id=&#34;idm45853255657120&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/IGotRhythm.mp3&#34;/&gt; &lt;/audio&gt;&lt;/p&gt;
&lt;h2 id=&#34;idm45853255656032&#34;&gt;Putting some together&lt;/h2&gt;
&lt;p&gt;Good bass playing mixes and matches these (and more) patterns. Below I&amp;rsquo;ve written out a bass line for the first eight bars of Autumn Leaves, labeling which of the patterns above is used in each bar:&lt;/p&gt;
&lt;p&gt;&lt;img id=&#34;idm45853255655024&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/AutumnLeaves8bars.png&#34;/&gt;&lt;br /&gt;
&lt;audio id=&#34;idm45853255654304&#34; controls=&#34;controls&#34;&gt; &lt;source id=&#34;idm45853255653776&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/AutumnLeaves8bars.mp3&#34;/&gt; &lt;/audio&gt;&lt;/p&gt;
&lt;p&gt;Note how all the patterns listed above start with the root note of the chord. This is a solid, dependable thing to do, and it greatly aids the jazz bass player&amp;rsquo;s job of showing the others what chord is being played. A step toward more advanced bass playing is getting away from this&amp;ndash;for example, starting on the 3 or the 5 of the chord&amp;ndash;while still making it clear to the rest of the group exactly which chord is happening. (They should already know, but still, you and the drummer and the piano or guitar player are providing the cake of which the other players&amp;rsquo; solos are the frosting.)&lt;/p&gt;
&lt;p&gt;Using more non-chord tones, the way 1231 and 1235 do above, is also a way to move past beginner status, as is moving beyond playing four quarter notes for every bar. As a first step to moving beyond the patterns above, try substituting 8 for 1 in more of the patterns, and try coming up with your own combinations of 1, 3, 5, 7, and 8. And listen to great bass players. My favorites are Paul Chambers and Ray Brown, but if you listen to older, pre-bebop jazz, you&amp;rsquo;ll hear these simple patterns come up more often.&lt;/p&gt;
&lt;img id=&#34;idm45853255650304&#34; width=&#34;500&#34; src=&#34;https://www.bobdc.com/img/main/bassCrashCourse/JayeBobVictor.jpg&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; alt=&#34;Jaye, Bob, and Victor of Jazz Collective #9&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2018">2018</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>JavaScript SPARQL</title>
      <link>https://www.bobdc.com/blog/javascript-sparql/</link>
      <pubDate>Sun, 28 Jan 2018 09:35:35 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/javascript-sparql/</guid>
      
      
      <description><div>With rdfstore-js.</div><div>&lt;blockquote id=&#34;idm45782206154720&#34; class=&#34;pullquote&#34;&gt;... all in the world&#39;s most popular programming language.&lt;/blockquote&gt;
&lt;p&gt;I finally had a chance to play with &lt;a href=&#34;https://github.com/antoniogarrote/rdfstore-js&#34;&gt;rdfstore-js&lt;/a&gt; by &lt;a href=&#34;https://twitter.com/antoniogarrote&#34;&gt;Antonio Garrote&lt;/a&gt; and it was all pretty straightforward. I already had node.js installed, so a simple &lt;code&gt;npm install rdfstore&lt;/code&gt; installed his library. Then, I was ready to include the library in a JavaScript script that would read some RDF and query it with SPARQL. I just ran my script &lt;a href=&#34;https://nodejs.org/api/cli.html&#34;&gt;from the command line&lt;/a&gt;, but node.js fans know that they can take advantage of this library&amp;rsquo;s features in much more interesting application architectures. (Before I go on, I wanted to mention that after I tweeted yesterday that this blog entry was coming, &lt;a href=&#34;https://twitter.com/andyseaborne&#34;&gt;Andy Seaborne&lt;/a&gt; reminded me about Apache Jena&amp;rsquo;s ability to load and run JavaScript functions. I tried the example from the feature&amp;rsquo;s &lt;a href=&#34;http://jena.staging.apache.org/documentation/query/javascript-functions.html&#34;&gt;home page&lt;/a&gt; and it worked great right out of the box.)&lt;/p&gt;
&lt;p&gt;My sample script starts with a function I wrote for general-purpose output of SPARQL SELECT queries, then creates an &lt;code&gt;rdfstore&lt;/code&gt; object and saves a query that will be used twice later in the script. After loading some RDF data about my book &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt; from the OCLC&amp;rsquo;s &lt;a href=&#34;https://www.worldcat.org/&#34;&gt;Worldcat&lt;/a&gt; online library catalog into the rdfstore, it runs the saved query against the loaded data to list ISBN numbers. The script then loads data about another book, runs the same query, and you can see the additional ISBN numbers in the new output.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Utility function for outputting SELECT results
function outputSPARQLResults(results) {
    for (row in results) {
        printedLine = &#39;&#39;
        for (column in results[row]) {
            printedLine = printedLine + results[row][column].value + &#39; &#39;
        }
        console.log(printedLine)
    }
}


// Create an rdfstore
var rdfstore = require(&#39;rdfstore&#39;) 


// Define a query to execute.
var listISBNs = &#39;PREFIX s: &amp;lt;http://schema.org/&amp;gt; \
PREFIX ls: &amp;lt;http://learningsparql.com/ns/data#&amp;gt; \
PREFIX wco: &amp;lt;http://www.worldcat.org/title/-/oclc/&amp;gt; \
PREFIX wci: &amp;lt;http://worldcat.org/isbn/&amp;gt; \
SELECT ?isbn \
FROM ls:g1 WHERE { ?book s:isbn ?isbn } &#39;


rdfstore.create(function(err, store) {   // no error handling

   
    store.execute(
        // Load data about the book Learning SPARQL into named graph g1 in the rdfstore.
        &#39;LOAD &amp;lt;http://worldcat.org/oclc/890467322.ttl&amp;gt; \
        INTO GRAPH &amp;lt;http://learningsparql.com/ns/data#g1&amp;gt;&#39;, function(err) {


            store.setPrefix(&#39;s&#39;, &#39;http://schema.org/&#39;)
            store.setPrefix(&#39;ls&#39;, &#39;http://learningsparql.com/ns/data#&#39;)
            store.setPrefix(&#39;wco&#39;, &#39;http://www.worldcat.org/title/-/oclc/&#39;)
            store.setPrefix(&#39;wci&#39;, &#39;http://worldcat.org/isbn/&#39;)

           
        store.execute(listISBNs, function(err, results) {
                console.log(&amp;quot;=== ISBN value ===&amp;quot;)
                outputSPARQLResults(results)
        })
        }
    )


    store.execute(
        // Load data about the book &amp;quot;XML: The Annotated Specification&amp;quot; into the same graph
        &#39;LOAD &amp;lt;http://worldcat.org/oclc/40768745.ttl&amp;gt; \
        INTO GRAPH &amp;lt;http://learningsparql.com/ns/data#g1&amp;gt;&#39;, function(err) {
        store.execute(listISBNs, function(err, results) {
                console.log(&amp;quot;\n=== ISBN values after adding 2nd book&#39;s data ===&amp;quot;)
                outputSPARQLResults(results)
        })
        }
    )

    
})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The script produces this output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;=== ISBN value ===
9781449371432 
1449371434 


=== ISBN values after adding 2nd book&#39;s data ===
9781449371432 
1449371434 
9780130826763 
0130826766
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I loaded the data into a named graph because the library documentation&amp;rsquo;s sample query for loading remote data did. I briefly tried loading the data into the default graph, but had no luck; I&amp;rsquo;m all for the use of named graphs, anyway. I also tried deleting triples from and inserting them into the &lt;code&gt;g1&lt;/code&gt; named graph and then querying again to see the results, and I didn&amp;rsquo;t have much luck there either (no error messages&amp;ndash;I just didn&amp;rsquo;t see the query results I expected after the deletion and insertion), but my minimal understanding of node.js asynchronous behavior was probably to blame. The library&amp;rsquo;s &lt;a href=&#34;https://github.com/antoniogarrote/rdfstore-js&#34;&gt;github page&lt;/a&gt; shows that it does support INSERT and DELETE queries.&lt;/p&gt;
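&lt;p&gt;My best guess about what went wrong: independent &lt;code&gt;store.execute()&lt;/code&gt; calls are started without waiting for each other, so nothing guarantees that a query runs after the update it depends on. The sketch below (plain node.js, no rdfstore-js, with a stand-in for the store call) shows the general fix of starting the second operation inside the first one&amp;rsquo;s callback:&lt;/p&gt;

```javascript
// Sketch of serializing two async operations with nested callbacks.
// asyncOp stands in for something like rdfstore-js's store.execute().
function asyncOp(name, delayMs, done) {
  setTimeout(() => done(name), delayMs);
}

const order = [];

// Even though "second" would finish sooner on its own (5ms vs 20ms),
// nesting it in the first callback guarantees it runs afterward.
asyncOp('first', 20, (a) => {
  order.push(a);
  asyncOp('second', 5, (b) => {
    order.push(b);
    console.log(order.join(','));  // first,second
  });
});
```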
&lt;p&gt;I wouldn&amp;rsquo;t use this library&amp;rsquo;s triplestore for ongoing production maintenance of a set of triples, anyway; I see it as a great lightweight way to grab triples from one or more sources and then perform SPARQL queries on those triples to look for subsets and patterns that can contribute to an application, all in the world&amp;rsquo;s &lt;a href=&#34;http://www.businessinsider.com/the-9-most-popular-programming-languages-according-to-the-facebook-for-programmers-2017-10/#1-javascript-15&#34;&gt;most popular programming language&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The rdfstore-js github page also shows that it offers many ways to query and manipulate the loaded data that, for a JavaScript programmer, would be more direct. If Antonio&amp;rsquo;s ultimate goal was to bring RDF to JavaScript developers, I won&amp;rsquo;t complain; I&amp;rsquo;m just glad that he brought a useful JavaScript library to RDF (and SPARQL) developers.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2018">2018</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>SPARQL and Amazon Web Service&#39;s Neptune database</title>
      <link>https://www.bobdc.com/blog/sparql-and-amazon-web-services/</link>
      <pubDate>Sun, 31 Dec 2017 09:53:14 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparql-and-amazon-web-services/</guid>
      
      
      <description><div>Promising news for large-scale RDF development.</div><div>&lt;p&gt;Amazon recently &lt;a href=&#34;https://yukon.aws.amazon.com/rds/gdb?region=us-east-1#&#34;&gt;announced&lt;/a&gt; Neptune as an AWS service. As its &lt;a href=&#34;https://aws.amazon.com/neptune/&#34;&gt;home page&lt;/a&gt; describes it,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Amazon Neptune is a fast, scalable graph database service. Neptune efficiently stores and navigates highly connected data. Its query processing engine is optimized for leading graph query languages, Apache TinkerPop™ Gremlin and the W3C&amp;rsquo;s RDF SPARQL. Neptune provides high performance through the open and standard APIs of these graph frameworks. And, Neptune is fully managed, so you no longer need to worry about database management tasks such as hardware provisioning, software patching, setup, configuration, or backups.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Apart from the practical aspects of the scalable yet convenient use of RDF and SPARQL that Neptune will enable, it&amp;rsquo;s exciting to see such a high-profile acknowledgment of SPARQL as a serious development tool. &lt;a href=&#34;http://sparql.club/&#34;&gt;Many organizations&lt;/a&gt; already knew this, but judging from the reaction to the Neptune announcement on Twitter, many more people are finally understanding this.&lt;/p&gt;
&lt;blockquote id=&#34;idm139982883658048&#34; class=&#34;pullquote&#34;&gt;It&#39;s exciting to see such a high-profile acknowledgment of SPARQL as a serious development tool. &lt;/blockquote&gt;
&lt;p&gt;Rumors have been flying that the &lt;a href=&#34;https://www.blazegraph.com/&#34;&gt;Blazegraph&lt;/a&gt; triplestore may play some role in Amazon&amp;rsquo;s new graph store. As Stardog CEO &lt;a href=&#34;https://twitter.com/kendall&#34;&gt;Kendall Clark&lt;/a&gt; wrote on &lt;a href=&#34;https://news.ycombinator.com/item?id=15809687&#34;&gt;ycombinator recently&lt;/a&gt;, &amp;ldquo;Amazon acquired the domains, etc. Many former Blazegraph engineers are now Amazon Neptune engineers according to LinkedIn, etc. It was rumored widely in the graph db world fwiw.&amp;rdquo; Yahoo Knowledge Graph science and data lead &lt;a href=&#34;https://twitter.com/nicolastorzec&#34;&gt;Nicolas Torzec&lt;/a&gt; responded to Kendall&amp;rsquo;s comment with a link showing that &lt;a href=&#34;https://www.trademarkia.com/blazegraph-86498414.html&#34;&gt;Amazon now owns the Blazegraph trademark&lt;/a&gt;. (Blazegraph&amp;rsquo;s website hasn&amp;rsquo;t shown much activity in a while, with the latest post on their &lt;a href=&#34;https://www.blazegraph.com/press/&#34;&gt;Press&lt;/a&gt; page being from May of last year.)&lt;/p&gt;
&lt;p&gt;May of last year was also when I wrote &lt;a href=&#34;https://www.bobdc.com/blog/trying-out-blazegraph&#34;&gt;Trying out Blazegraph&lt;/a&gt; about my positive experiences with this graph store, and after the recent announcement I &lt;a href=&#34;https://twitter.com/bobdc/status/936995695655116800&#34;&gt;tweeted&lt;/a&gt; that if Blazegraph was part of Neptune, it would be very cool if that included Blazegraph&amp;rsquo;s inferencing. Pavel Klinov replied by &lt;a href=&#34;https://twitter.com/klinovp/status/937599682582319105&#34;&gt;pointing out&lt;/a&gt; a &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E&#34;&gt;Neptune announcement video&lt;/a&gt; where they explicitly say that inferencing is not supported.&lt;/p&gt;
&lt;p&gt;This hour-long &amp;ldquo;AWS re:Invent 2017: NEW LAUNCH! Deep dive on Amazon Neptune&amp;rdquo; video included some other interesting points. Because Neptune supports property graphs via TinkerPop as well as SPARQL, early in the video the speaker provides some &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=8m20s&#34;&gt;background on property graphs versus RDF&lt;/a&gt;. He devotes a good portion of his presentation to talking through an SQL query for people who are unfamiliar with graph databases and then covering comparable SPARQL and TinkerPop Gremlin queries.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=4m51s&#34;&gt;plug from Thomson Reuters&lt;/a&gt; early in the video was nice to see, coming from a large, well-known organization that has been taking SPARQL seriously for a while. Later in the video, &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=21m50s&#34;&gt;one slide&amp;rsquo;s&lt;/a&gt; use of Thomson Reuters&amp;rsquo; &lt;a href=&#34;https://permid.org/&#34;&gt;PermID&lt;/a&gt; vocabulary alongside the geonames vocabulary in the same triple was especially nice to see. While the extent of RDF&amp;rsquo;s usage continues to be a pleasant surprise for me, I&amp;rsquo;m also surprised by how many people use it only for the simplicity of the triples data model&amp;ndash;they&amp;rsquo;re missing the data integration power that comes from mixing and matching the wide variety of existing vocabularies (and hence data sources) with their own data.&lt;/p&gt;
&lt;p&gt;The video&amp;rsquo;s &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=26m&#34;&gt;second speaker&lt;/a&gt; talks more about Neptune&amp;rsquo;s enterprise features such as fast failover, encryption at rest and in transit, and backup and restore, which are all great things to see in a cloud-based triplestore. Neptune offers a lot of room; as this speaker &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=30m58s&#34;&gt;mentions&lt;/a&gt;, &amp;ldquo;Storage volumes are not required to be statically allocated; they actually grow automatically up to a maximum size of 64 terabytes.&amp;rdquo; The ability to &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=36m10s&#34;&gt;restore a dataset to its state from a previous point in time&lt;/a&gt; also sounds very useful.&lt;/p&gt;
&lt;p&gt;Once the speakers &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=38m07s&#34;&gt;started taking questions&lt;/a&gt;, it looked to me like there were more questions about RDF and SPARQL than there were about TinkerPop and Gremlin. The former included the &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=41m32s&#34;&gt;question about inferencing&lt;/a&gt;, which got a response (as Pavel had pointed out to me) of &amp;ldquo;we do not have in-database inference currently&amp;hellip; we are very interested in use cases for inferencing.&amp;rdquo; They also &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=39m45s&#34;&gt;said&lt;/a&gt; that Neptune&amp;rsquo;s underlying graph engine was custom-built by Amazon as a graph system, which left me more curious about the potential role of Blazegraph in the released version of Neptune. (Maybe &amp;ldquo;by Amazon&amp;rdquo; includes former Blazegraph engineers.)&lt;/p&gt;
&lt;p&gt;Some more interesting facts from the question and answer session:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=42m30s&#34;&gt;Timeouts&lt;/a&gt; of SPARQL end points can be configured.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;They have &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=43m&#34;&gt;tested it&lt;/a&gt; with pretty close to a hundred billion triples. (Who remembers the &lt;a href=&#34;http://km.aifb.kit.edu/projects/btc-2014/&#34;&gt;Billion Triples Challenge&lt;/a&gt;?)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=43m19s&#34;&gt;Neptune supports release 1.1&lt;/a&gt; of SPARQL &lt;a href=&#34;https://www.w3.org/TR/sparql11-query/&#34;&gt;Query&lt;/a&gt; and &lt;a href=&#34;https://www.w3.org/TR/2013/REC-sparql11-update-20130321/&#34;&gt;Update&lt;/a&gt;, and the endpoint supports 1.1 of the SPARQL &lt;a href=&#34;https://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/&#34;&gt;Protocol&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It supports &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=44m06s&#34;&gt;named graphs&lt;/a&gt;, which will be particularly handy for managing multiple datasets when dealing with data at that scale.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;While the preview configuration of Neptune &lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=49m30s&#34;&gt;does not allow federated SPARQL queries for security reasons&lt;/a&gt;, they &amp;ldquo;do see a lot of use cases for SPARQL federation.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;While Neptune currently doesn&amp;rsquo;t support &amp;ldquo;&lt;a href=&#34;https://www.youtube.com/watch?v=6o1Ezf6NZ_E#t=51m02s&#34;&gt;schema concepts or constraints in the graph schema&lt;/a&gt;,&amp;rdquo; it &amp;ldquo;is something that [they] have on their roadmap.&amp;rdquo; The Amazon rep first responded to this question by asking if the questioner was talking about something like &lt;a href=&#34;https://www.bobdc.com/blog/validating-rdf-data-with-shacl&#34;&gt;SHACL&lt;/a&gt;; although they do not currently support this, just hearing him mention SHACL showed me that this great new standard is gaining some mindshare out there.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
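&lt;p&gt;That protocol support means any standard SPARQL 1.1 Protocol client should work against a Neptune endpoint. As a rough sketch (the endpoint URL below is just a placeholder, not a real Neptune address), a query request is an HTTP GET with a URL-encoded &lt;code&gt;query&lt;/code&gt; parameter:&lt;/p&gt;

```javascript
// Build a SPARQL 1.1 Protocol query request. The endpoint URL passed
// in below is a placeholder; a real Neptune cluster has its own address.
function sparqlRequest(endpoint, query) {
  return {
    url: endpoint + '?query=' + encodeURIComponent(query),
    headers: { Accept: 'application/sparql-results+json' }
  };
}

const req = sparqlRequest(
  'https://example.org/sparql',
  'SELECT ?s WHERE { ?s ?p ?o } LIMIT 1'
);
console.log(req.url);
```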
&lt;p&gt;I&amp;rsquo;m looking forward to playing with SPARQL on AWS Neptune and will certainly be reporting back about my experiences here.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2017">2017</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>SPARQL queries of Beatles recording sessions</title>
      <link>https://www.bobdc.com/blog/sparql-queries-of-beatles-reco/</link>
      <pubDate>Sun, 19 Nov 2017 10:40:53 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparql-queries-of-beatles-reco/</guid>
      
      
      <description><div>Who played what when?</div><div>&lt;img id=&#34;idm140697882958784&#34; src=&#34;https://www.bobdc.com/img/main/sparqlbeatles.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;SPARQL and Beatles logos&#34;/&gt;
&lt;p&gt;While listening to the song &lt;a href=&#34;https://www.youtube.com/watch?v=ERoS6y5zE0Y&#34;&gt;Dear Life&lt;/a&gt; on the new Beck album, I wondered who played the piano on the Beatles&amp;rsquo; &lt;a href=&#34;https://www.youtube.com/watch?v=xT4a5RYLBSw&#34;&gt;Martha My Dear&lt;/a&gt;. A web search found the website &lt;a href=&#34;https://www.beatlesbible.com&#34;&gt;Beatles Bible&lt;/a&gt;, where the &lt;a href=&#34;https://www.beatlesbible.com/songs/martha-my-dear/&#34;&gt;Martha My Dear&lt;/a&gt; page showed that it was Paul.&lt;/p&gt;
&lt;p&gt;This was not a big surprise, but one pleasant surprise was how that page listed absolutely everyone who played on the song and what they played. For example, a musician named Leon Calvert played both trumpet and flugelhorn. The site&amp;rsquo;s &lt;a href=&#34;https://www.beatlesbible.com/songs/&#34;&gt;Beatles&amp;rsquo; Songs&lt;/a&gt; page links to pages for every song, listing everyone who played on them, with very few exceptions&amp;ndash;for example, for giant Phil Spector productions like &lt;a href=&#34;https://www.beatlesbible.com/songs/the-long-and-winding-road/&#34;&gt;The Long and Winding Road&lt;/a&gt;, it does list all the instruments, but not who played them. On the other hand, for the orchestra on &lt;a href=&#34;https://www.beatlesbible.com/songs/a-day-in-the-life/&#34;&gt;A Day in the Life&lt;/a&gt;, it lists the individual names of all 12 violin players, all 4 violists, and the other 25 or so musicians who joined the Fab Four for that.&lt;/p&gt;
&lt;p&gt;An especially nice surprise on this website was how syntactically consistent the listings were, leading me to think &amp;ldquo;with some curl commands, python scripting, and some regular expressions, I could, &lt;a href=&#34;https://www.youtube.com/watch?v=F3kky9yMm14#t=0m05s&#34;&gt;dare I say it&lt;/a&gt;, convert all these listings to an RDF database of everyone who played on everything, then do some really cool SPARQL queries!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;So I did, and the RDF is available in the file &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/BeatlesMusicians.ttl&#34;&gt;BeatlesMusicians.ttl&lt;/a&gt;. The great part about having this is the ability to query across the songs to find out things such as how many different people played a given instrument on Beatles recordings or what songs a given person may have played on, regardless of instrument. In a pop music geek kind of way, it&amp;rsquo;s been kind of exciting to think that I could ask and answer questions about the Beatles that may have never been answered before.&lt;/p&gt;
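&lt;p&gt;The conversion step can be sketched in a few lines of Python. Everything below is illustrative rather than taken from my actual scripts: the credits-line format, the helper names, and the one-letter prefixes are all stand-ins.&lt;/p&gt;

```python
import re

# Hypothetical credits line of the form "Name: instrument, instrument".
# (Illustrative only; the real Beatles Bible markup is more varied.)
CREDIT = re.compile(r"^([A-Za-z' .-]+): (.+)$")

def to_camel(text):
    """Turn 'Martha My Dear' into 'MarthaMyDear' for a Turtle local name."""
    parts = re.split(r"[^A-Za-z0-9]+", text)
    return "".join(p[:1].upper() + p[1:] for p in parts if p)

def credits_to_turtle(song, lines):
    """Emit one triple per (song, instrument, musician) combination."""
    triples = []
    for line in lines:
        match = CREDIT.match(line)
        if not match:
            continue
        musician = "m:" + to_camel(match.group(1))
        for instrument in match.group(2).split(", "):
            triples.append("t:{} i:{} {} .".format(
                to_camel(song), instrument.replace(" ", ""), musician))
    return triples

print(credits_to_turtle("Martha My Dear", ["Leon Calvert: trumpet, flugelhorn"]))
```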
&lt;p&gt;Here are three typical triples. All of these resources have corresponding rdfs:label values to make query output look nicer:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;t:HereComesTheSun i:Moogsynthesiser  m:GeorgeHarrison .
t:EleanorRigby    i:cello            m:NormanJones, m:DerekSimpson .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here are some of the queries I entered.&lt;/p&gt;
&lt;h2 id=&#34;idm140697882945360&#34;&gt;Who ever played piano for the Beatles, and on how many songs?&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;PREFIX  i:     &amp;lt;http://learningsparql.com/ns/instrument/&amp;gt; 
PREFIX  rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 


SELECT ?pianistName (COUNT(?pianist) AS ?pianistCount)  WHERE {
  ?song i:piano ?pianist .
  ?pianist rdfs:label ?pianistName . 
}
GROUP BY ?pianistName
ORDER BY DESC(?pianistCount)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-------------------------------------
| pianistName        | pianistCount |
=====================================
| &amp;quot;Paul McCartney&amp;quot;   | 60           |
| &amp;quot;George Martin&amp;quot;    | 22           |
| &amp;quot;John Lennon&amp;quot;      | 16           |
| &amp;quot;John &#39;Duff&#39; Lowe&amp;quot; | 2            |
| &amp;quot;Chris Thomas&amp;quot;     | 1            |
| &amp;quot;Kenny Powell&amp;quot;     | 1            |
| &amp;quot;Mal Evans&amp;quot;        | 1            |
| &amp;quot;Ringo Starr&amp;quot;      | 1            |
-------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Paul&amp;rsquo;s number one spot is no surprise, and these results and other data support the assertion that George Martin truly was the fifth Beatle. Seeing &lt;a href=&#34;https://en.wikipedia.org/wiki/Chris_Thomas_(record_producer)&#34;&gt;Chris Thomas&lt;/a&gt; there was a surprise to me; he went on to produce the Sex Pistols album, the first three Pretenders albums, the second through fifth Roxy Music albums, and many more classics. And we have to wonder: &amp;ldquo;what song had Ringo on piano?&amp;rdquo; As we&amp;rsquo;ll see, that was easy enough to query.&lt;/p&gt;
&lt;p&gt;This variation on the query above is slightly broader, because it looks for people who played any instruments with the string &amp;ldquo;piano&amp;rdquo; in their name:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX  rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 


SELECT ?pianistName (COUNT(?pianist) AS ?pianistCount)  WHERE {
  ?instrument rdfs:label ?instrumentName . 
  FILTER(contains(?instrumentName,&amp;quot;piano&amp;quot;))
  ?song ?instrument ?pianist . 
  ?pianist rdfs:label ?pianistName . 
}
GROUP BY ?pianistName
ORDER BY DESC(?pianistCount)


-------------------------------------
| pianistName        | pianistCount |
=====================================
| &amp;quot;Paul McCartney&amp;quot;   | 67           |
| &amp;quot;George Martin&amp;quot;    | 22           |
| &amp;quot;John Lennon&amp;quot;      | 20           |
| &amp;quot;Billy Preston&amp;quot;    | 6            |
| &amp;quot;Chris Thomas&amp;quot;     | 2            |
| &amp;quot;John &#39;Duff&#39; Lowe&amp;quot; | 2            |
| &amp;quot;Kenny Powell&amp;quot;     | 1            |
| &amp;quot;Mal Evans&amp;quot;        | 1            |
| &amp;quot;Nicky Hopkins&amp;quot;    | 1            |
| &amp;quot;Ringo Starr&amp;quot;      | 1            |
-------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This raises Paul and John&amp;rsquo;s numbers and adds Nicky Hopkins (who also did important piano work for the Stones, the Kinks, and the Who) and Billy Preston, who in addition to the &lt;a href=&#34;https://www.youtube.com/watch?v=p6gKe9Fr2ok#t=1m15s&#34;&gt;electric piano on Get Back&lt;/a&gt;, apparently played on five other songs. (The increase in numbers isn&amp;rsquo;t all from electric pianos, but also from the &lt;a href=&#34;https://en.wikipedia.org/wiki/Pianet&#34;&gt;pianet&lt;/a&gt; that John and Paul each played once or twice.)&lt;/p&gt;
&lt;h2 id=&#34;idm140697882935536&#34;&gt;What song had Ringo on piano?&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;PREFIX  rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 
PREFIX  i:     &amp;lt;http://learningsparql.com/ns/instrument/&amp;gt; 
PREFIX  m:     &amp;lt;http://learningsparql.com/ns/musician/&amp;gt; 


SELECT ?song WHERE {
  ?songURI i:piano m:RingoStarr .
  ?songURI rdfs:label ?song .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result is a White Album song that Ringo apparently wrote himself:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;----------------------------
| song                     |
============================
| &amp;quot;Don&#39;t Pass Me By&amp;quot;       |
----------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;idm140697882933024&#34;&gt;Who were all the cellists the Beatles ever used, and on what songs?&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;PREFIX  i:     &amp;lt;http://learningsparql.com/ns/instrument/&amp;gt; 
PREFIX  rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 


SELECT ?name ?songTitle WHERE {
  ?song i:cello ?musician .
  ?song rdfs:label ?songTitle .
  ?musician rdfs:label ?name . 
}
ORDER BY ?name
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-------------------------------------------------------
| name                  | songTitle                   |
=======================================================
| &amp;quot;Alan Dalziel&amp;quot;        | &amp;quot;A Day In The Life&amp;quot;         |
| &amp;quot;Alan Dalziel&amp;quot;        | &amp;quot;She&#39;s Leaving Home&amp;quot;        |
| &amp;quot;Alex Nifosi&amp;quot;         | &amp;quot;A Day In The Life&amp;quot;         |
| &amp;quot;Allen Ford&amp;quot;          | &amp;quot;Within You Without You&amp;quot;    |
| &amp;quot;Bram Martin&amp;quot;         | &amp;quot;I Am The Walrus&amp;quot;           |
| &amp;quot;Dennis Vigay&amp;quot;        | &amp;quot;A Day In The Life&amp;quot;         |
| &amp;quot;Dennis Vigay&amp;quot;        | &amp;quot;She&#39;s Leaving Home&amp;quot;        |
| &amp;quot;Derek Simpson&amp;quot;       | &amp;quot;Eleanor Rigby&amp;quot;             |
| &amp;quot;Derek Simpson&amp;quot;       | &amp;quot;Strawberry Fields Forever&amp;quot; |
| &amp;quot;Eldon Fox&amp;quot;           | &amp;quot;Glass Onion&amp;quot;               |
| &amp;quot;Eldon Fox&amp;quot;           | &amp;quot;I Am The Walrus&amp;quot;           |
| &amp;quot;Eldon Fox&amp;quot;           | &amp;quot;Piggies&amp;quot;                   |
| &amp;quot;Francisco Gabarro&amp;quot;   | &amp;quot;A Day In The Life&amp;quot;         |
| &amp;quot;Francisco Gabarro&amp;quot;   | &amp;quot;Yesterday&amp;quot;                 |
| &amp;quot;Frederick Alexander&amp;quot; | &amp;quot;Martha My Dear&amp;quot;            |
| &amp;quot;Jack Holmes&amp;quot;         | &amp;quot;All You Need Is Love&amp;quot;      |
| &amp;quot;John Hall&amp;quot;           | &amp;quot;Strawberry Fields Forever&amp;quot; |
| &amp;quot;Lionel Ross&amp;quot;         | &amp;quot;All You Need Is Love&amp;quot;      |
| &amp;quot;Lionel Ross&amp;quot;         | &amp;quot;I Am The Walrus&amp;quot;           |
| &amp;quot;Norman Jones&amp;quot;        | &amp;quot;Eleanor Rigby&amp;quot;             |
| &amp;quot;Norman Jones&amp;quot;        | &amp;quot;Strawberry Fields Forever&amp;quot; |
| &amp;quot;Peter Beavan&amp;quot;        | &amp;quot;Within You Without You&amp;quot;    |
| &amp;quot;Peter Willison&amp;quot;      | &amp;quot;Blue Jay Way&amp;quot;              |
| &amp;quot;Reginald Kilbey&amp;quot;     | &amp;quot;Glass Onion&amp;quot;               |
| &amp;quot;Reginald Kilbey&amp;quot;     | &amp;quot;Martha My Dear&amp;quot;            |
| &amp;quot;Reginald Kilbey&amp;quot;     | &amp;quot;Piggies&amp;quot;                   |
| &amp;quot;Reginald Kilbey&amp;quot;     | &amp;quot;Within You Without You&amp;quot;    |
| &amp;quot;Terry Weil&amp;quot;          | &amp;quot;I Am The Walrus&amp;quot;           |
| &amp;quot;Uncredited&amp;quot;          | &amp;quot;Let It Be&amp;quot;                 |
-------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I have no reason to recognize any of the names here, but when I sent the URL of the &lt;a href=&#34;https://www.beatlesbible.com/songs/shes-leaving-home/&#34;&gt;She&amp;rsquo;s Leaving Home&lt;/a&gt; page to a friend who was a London session string player in the sixties, he said that the members of the double string quartet on that song were very top people and that some were friends of his.&lt;/p&gt;
&lt;h2 id=&#34;idm140697882924080&#34;&gt;Who played on how many songs, period?&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;PREFIX  rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 


SELECT ?playerName (COUNT(?player) AS ?playerCount)  WHERE {
  ?song ?instrument ?player . 
  ?player rdfs:label ?playerName . 
}
GROUP BY ?playerName
# don&#39;t bother with people who only played on one song
HAVING (COUNT(?player) &amp;gt; 1)        
ORDER BY DESC(?playerCount)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;--------------------------------------
| playerName           | playerCount |
======================================
| &amp;quot;Paul McCartney&amp;quot;     | 678         |
| &amp;quot;John Lennon&amp;quot;        | 576         |
| &amp;quot;George Harrison&amp;quot;    | 502         |
| &amp;quot;Ringo Starr&amp;quot;        | 412         |
| &amp;quot;Uncredited&amp;quot;         | 58          |
| &amp;quot;George Martin&amp;quot;      | 45          |
| &amp;quot;Unknown&amp;quot;            | 28          |
| &amp;quot;Mal Evans&amp;quot;          | 16          |
| &amp;quot;Pete Best&amp;quot;          | 14          |
| &amp;quot;Billy Preston&amp;quot;      | 11          |
| &amp;quot;Tony Sheridan&amp;quot;      | 8           |
| &amp;quot;Chris Thomas&amp;quot;       | 6           |
| &amp;quot;John Underwood&amp;quot;     | 5           |
| &amp;quot;Neil Aspinall&amp;quot;      | 5           |
| &amp;quot;Sidney Sax&amp;quot;         | 5           |
| &amp;quot;Yoko Ono&amp;quot;           | 5           |
| &amp;quot;David Mason&amp;quot;        | 4           |
| &amp;quot;Jeff Lynne&amp;quot;         | 4           |
| &amp;quot;Reginald Kilbey&amp;quot;    | 4           |
| &amp;quot;Eldon Fox&amp;quot;          | 3           |
| &amp;quot;Eric Bowie&amp;quot;         | 3           |
| &amp;quot;Erich Gruenberg&amp;quot;    | 3           |
| &amp;quot;Harry Klein&amp;quot;        | 3           |
| &amp;quot;Henry Datyner&amp;quot;      | 3           |
| &amp;quot;Leon Calvert&amp;quot;       | 3           |
| &amp;quot;Neil Sanders&amp;quot;       | 3           |
| &amp;quot;Pattie Harrison&amp;quot;    | 3           |
| &amp;quot;Rex Morris&amp;quot;         | 3           |
| &amp;quot;Stuart Sutcliffe&amp;quot;   | 3           |
| &amp;quot;Alan Civil&amp;quot;         | 2           |
| &amp;quot;Alan Dalziel&amp;quot;       | 2           |
| &amp;quot;Andy White&amp;quot;         | 2           |
| &amp;quot;Bill Povey&amp;quot;         | 2           |
| &amp;quot;Brian Jones&amp;quot;        | 2           |
| &amp;quot;Colin Hanton&amp;quot;       | 2           |
| &amp;quot;Dennis Vigay&amp;quot;       | 2           |
| &amp;quot;Dennis Walton&amp;quot;      | 2           |
| &amp;quot;Derek Simpson&amp;quot;      | 2           |
| &amp;quot;Derek Watkins&amp;quot;      | 2           |
| &amp;quot;Eric Clapton&amp;quot;       | 2           |
| &amp;quot;Francisco Gabarro&amp;quot;  | 2           |
| &amp;quot;Fred Lucas&amp;quot;         | 2           |
| &amp;quot;Freddy Clayton&amp;quot;     | 2           |
| &amp;quot;Geoff Emerick&amp;quot;      | 2           |
| &amp;quot;Gordon Pearce&amp;quot;      | 2           |
| &amp;quot;Irene King&amp;quot;         | 2           |
| &amp;quot;Jack Greene&amp;quot;        | 2           |
| &amp;quot;Jack Rothstein&amp;quot;     | 2           |
| &amp;quot;John &#39;Duff&#39; Lowe&amp;quot;   | 2           |
| &amp;quot;Johnnie Scott&amp;quot;      | 2           |
| &amp;quot;Jurgen Hess&amp;quot;        | 2           |
| &amp;quot;Keith Cummings&amp;quot;     | 2           |
| &amp;quot;Kenneth Essex&amp;quot;      | 2           |
| &amp;quot;Leo Birnbaum&amp;quot;       | 2           |
| &amp;quot;Lionel Ross&amp;quot;        | 2           |
| &amp;quot;Mahapurush Misra&amp;quot;   | 2           |
| &amp;quot;Marianne Faithfull&amp;quot; | 2           |
| &amp;quot;Mick Jagger&amp;quot;        | 2           |
| &amp;quot;Mike Redway&amp;quot;        | 2           |
| &amp;quot;Norman Jones&amp;quot;       | 2           |
| &amp;quot;Norman Lederman&amp;quot;    | 2           |
| &amp;quot;Norman Smith&amp;quot;       | 2           |
| &amp;quot;Other musicians&amp;quot;    | 2           |
| &amp;quot;Pat Whitmore&amp;quot;       | 2           |
| &amp;quot;Ralph Elman&amp;quot;        | 2           |
| &amp;quot;Ronald Thomas&amp;quot;      | 2           |
| &amp;quot;Stephen Shingles&amp;quot;   | 2           |
| &amp;quot;Tony Gilbert&amp;quot;       | 2           |
| &amp;quot;Tony Tunstall&amp;quot;      | 2           |
| &amp;quot;Tristan Fry&amp;quot;        | 2           |
| &amp;quot;Victor Spinetti&amp;quot;    | 2           |
--------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;No big surprises in the top 10, but there definitely are some after that. For example&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;idm140697882915408&#34;&gt;What 4 Beatles tracks did ELO founder Jeff Lynne play on?&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;PREFIX  rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 
PREFIX  m:     &amp;lt;http://learningsparql.com/ns/musician/&amp;gt; 


SELECT ?instrument ?songName WHERE {
  ?song ?instrumentURI m:JeffLynne .
  ?song rdfs:label ?songName .
  ?instrumentURI rdfs:label ?instrument . 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Apparently, after John died, he sang and played overdubs with Paul, George, and Ringo on some of John&amp;rsquo;s demos to create &amp;ldquo;new&amp;rdquo; Beatles material to go with the &lt;a href=&#34;https://en.wikipedia.org/wiki/The_Beatles_Anthology&#34;&gt;Anthology&lt;/a&gt; documentary and rereleases.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---------------------------------------
| instrument       | songName         |
=======================================
| &amp;quot;backing vocals&amp;quot; | &amp;quot;Real Love&amp;quot;      |
| &amp;quot;guitar&amp;quot;         | &amp;quot;Real Love&amp;quot;      |
| &amp;quot;harmony vocals&amp;quot; | &amp;quot;Free As A Bird&amp;quot; |
| &amp;quot;guitar&amp;quot;         | &amp;quot;Free As A Bird&amp;quot; |
---------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you look through the big list of musicians above, you&amp;rsquo;ll probably want to plug more names into that last query. For example, any Beatles or Eric Clapton fan knows that he played the guitar solo on &lt;a href=&#34;https://www.youtube.com/watch?v=D-dONCnY_Yg#t=2m00s&#34;&gt;While My Guitar Gently Weeps&lt;/a&gt;, but why does he get a &amp;ldquo;2&amp;rdquo; up there? It turns out that he and some other big names sang backing vocals on &lt;a href=&#34;https://www.beatlesbible.com/songs/all-you-need-is-love/&#34;&gt;All You Need Is Love&lt;/a&gt;.&lt;/p&gt;
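&lt;p&gt;For example, assuming that Eric Clapton&amp;rsquo;s resource follows the same naming pattern as the musicians shown above (&lt;code&gt;m:EricClapton&lt;/code&gt; is my guess at the local name), a one-line change to that last query lists his credits:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX  rdfs:  &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 
PREFIX  m:     &amp;lt;http://learningsparql.com/ns/musician/&amp;gt; 


SELECT ?instrument ?songName WHERE {
  ?song ?instrumentURI m:EricClapton .
  ?song rdfs:label ?songName .
  ?instrumentURI rdfs:label ?instrument . 
}
&lt;/code&gt;&lt;/pre&gt;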
&lt;p&gt;Let me know what kinds of queries and results you come up with!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2017">2017</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>An HTML form trick to add some convenience to life</title>
      <link>https://www.bobdc.com/blog/an-html-form-trick-to-add-some/</link>
      <pubDate>Sun, 29 Oct 2017 10:07:24 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/an-html-form-trick-to-add-some/</guid>
      
      
      <description><div>With a little JavaScript as needed.</div><div>&lt;p&gt;On the computers that I use the most, the browser home page is an HTML file with links to my favorite pages and a &amp;ldquo;single&amp;rdquo; form that lets me search the sites that I search the most. I can enter a search term in the field for any of the sites, press Enter, and then that site gets searched. The two tricks that I use to create these fields have been handy enough that I thought I&amp;rsquo;d share them in case they&amp;rsquo;re useful to others.&lt;/p&gt;
&lt;p&gt;I quote the word &amp;ldquo;single&amp;rdquo; above because it appears to be a single form but is actually multiple little forms in the HTML. Here is an example with four of my entries; enter something into any of the fields and press Enter to see what I mean:&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;wikipedia    &lt;form id=&#34;idm139672803255216&#34; action=&#34;http://en.wikipedia.org/wiki/Special:Search&#34;&gt; &lt;input id=&#34;idm139672803254608&#34; type=&#34;text&#34; name=&#34;search&#34; autofocus=&#34;autofocus&#34;/&gt; &lt;/form&gt;
youtube      &lt;form id=&#34;idm139672803252864&#34; action=&#34;http://www.youtube.com/results&#34;&gt; &lt;input id=&#34;idm139672803252320&#34; type=&#34;text&#34; name=&#34;search_query&#34;/&gt; &lt;/form&gt;
dictionary   &lt;form id=&#34;idm139672803250800&#34; name=&#34;dictionaryform&#34; action=&#34;javascript:window.location.href = &#39;http://www.dictionary.com/browse/&#39;.concat(document.dictionaryform[&#39;queryword&#39;].value)&#34;&gt; &lt;input id=&#34;idm139672803249888&#34; type=&#34;text&#34; name=&#34;queryword&#34;/&gt; &lt;/form&gt;
whois        &lt;form id=&#34;idm139672803248416&#34; name=&#34;whoisform&#34; action=&#34;javascript:window.location.href = &#39;http://www.whois.com/whois/&#39;.concat(document.whoisform[&#39;domain&#39;].value)&#34;&gt; &lt;input id=&#34;idm139672803247520&#34; type=&#34;text&#34; name=&#34;domain&#34;/&gt; &lt;/form&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;The first two fields search the way most search forms do, by passing a search string as a parameter to some back-end process. To add one of these fields to my form, I just had to look at the source of the actual website&amp;rsquo;s search form to find out what variable it was passing to what URL and then reproduce that in a little form around that field in my home page file. For Wikipedia, I set the form&amp;rsquo;s &lt;code&gt;action&lt;/code&gt; attribute to &amp;ldquo;&lt;a href=&#34;http://en.wikipedia.org/wiki/Special:Search&#34;&gt;http://en.wikipedia.org/wiki/Special:Search&lt;/a&gt;&amp;rdquo; and the &lt;code&gt;input&lt;/code&gt; element&amp;rsquo;s &lt;code&gt;name&lt;/code&gt; attribute to &amp;ldquo;search&amp;rdquo;. This way, if I enter &amp;ldquo;foobar&amp;rdquo; in my version of their search field above, the form creates the URL &lt;a href=&#34;https://en.wikipedia.org/wiki/Special:Search?search=foobar&#34;&gt;https://en.wikipedia.org/wiki/Special:Search?search=foobar&lt;/a&gt; to perform the search, and it works. (The &lt;code&gt;input&lt;/code&gt; element of the Wikipedia form also has its &lt;code&gt;autofocus&lt;/code&gt; attribute set to &amp;ldquo;autofocus&amp;rdquo; so that when a browser displays the page, the cursor is in that field, and I can then just press Tab a few times to quickly get to the others.) For YouTube there&amp;rsquo;s a different URL and the search parameter variable name is &amp;ldquo;search_query&amp;rdquo;, so I set the &lt;code&gt;name&lt;/code&gt; attribute on that second little form&amp;rsquo;s &lt;code&gt;input&lt;/code&gt; element to have that value.&lt;/p&gt;
&lt;p&gt;The third and fourth input fields above search websites with a more RESTful interface, so instead of passing a value in a particular variable name to a CGI script, they just construct a URL with the search term at the end. From within a form, this is actually trickier to do than the CGI way, because some JavaScript must be embedded in the form&amp;rsquo;s &lt;code&gt;action&lt;/code&gt; attribute to concatenate the entered value onto the appropriate URL and then send the browser to the resulting URL. You can see how this is done with a View Source of this blog entry. (Note how verbose the JavaScript way to grab that form value is&amp;ndash;I&amp;rsquo;d appreciate any suggestions for a simpler way.) You&amp;rsquo;ll also see that, to send the browser to the appropriate destination, the form sets the &lt;code&gt;href&lt;/code&gt; property of the &lt;code&gt;window.location&lt;/code&gt; object to the new URL.&lt;/p&gt;
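&lt;p&gt;Both URL styles can be mimicked outside the browser. Here is a rough sketch with Python&amp;rsquo;s &lt;code&gt;urllib&lt;/code&gt; standing in for the form machinery; the helper names are mine:&lt;/p&gt;

```python
from urllib.parse import quote, urlencode

def cgi_style(base, params):
    """Wikipedia/YouTube style: the search term travels as a query parameter."""
    return base + "?" + urlencode(params)

def rest_style(base, term):
    """Dictionary/whois style: the search term is appended to the path."""
    return base + quote(term)

print(cgi_style("https://en.wikipedia.org/wiki/Special:Search", {"search": "foobar"}))
print(rest_style("http://www.dictionary.com/browse/", "foobar"))
```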
&lt;p&gt;Just about all the search forms I&amp;rsquo;ve found fall into one of these two categories, so for my master search forms at home and at work I&amp;rsquo;ve also added fields to search Google Maps, JIRA, Amazon, and more. You can see three more examples at the end of my entry from last April, &lt;a href=&#34;https://www.bobdc.com/blog/the-wikidata-data-model-and-yo&#34;&gt;The Wikidata data model and your SPARQL queries&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It all makes a nice example of doing a little fun scripting, instead of real work, to save upwards of minutes a day!&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://xkcd.com/1205/&#34;&gt;&lt;img id=&#34;idm139672803236800&#34; src=&#34;https://imgs.xkcd.com/comics/is_it_worth_the_time.png&#34; border=&#34;0&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;xkcd cartoon&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2017">2017</category>
      
    </item>
    
    <item>
      <title>Understanding activation functions better</title>
      <link>https://www.bobdc.com/blog/understanding-activation-funct/</link>
      <pubDate>Sun, 17 Sep 2017 13:11:14 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/understanding-activation-funct/</guid>
      
      
      <description><div>And making neural networks look a little less magic.</div><div>&lt;p&gt;&lt;a href=&#34;https://dataskeptic.com/blog/episodes/2017/activation-functions&#34;&gt;&lt;img id=&#34;idm139816220203472&#34; width=&#34;240&#34; src=&#34;https://www.bobdc.com/img/main/activationfunctions.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;activation function graphs&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Trying to get my data science and machine learning knowledge more caught up with my colleagues at &lt;a href=&#34;http://www.ccri.com&#34;&gt;CCRi&lt;/a&gt;, I have been regularly listening to the podcasts &lt;a href=&#34;http://www.thetalkingmachines.com/&#34;&gt;Talking Machines&lt;/a&gt; and &lt;a href=&#34;http://lineardigressions.com/&#34;&gt;Linear Digressions&lt;/a&gt;. One colleague recently recommended &lt;a href=&#34;https://dataskeptic.com/&#34;&gt;Data Skeptic&lt;/a&gt;, which I had tried before and didn&amp;rsquo;t get hooked on, but after listening to their episode on &lt;a href=&#34;https://dataskeptic.com/blog/episodes/2017/activation-functions&#34;&gt;Activation Functions&lt;/a&gt; I am now hooked. I am so hooked that I am going back through their four-year history and listening to all of their episodes marked &amp;ldquo;[MINI]&amp;rdquo;; these are shorter episodes focused on single specific important concepts, like the activation function episode.&lt;/p&gt;
&lt;p&gt;In my blog entry &lt;a href=&#34;https://www.bobdc.com/blog/a-modern-neural-network-in-11&#34;&gt;A modern neural network in 11 lines of Python&lt;/a&gt; last December, I quoted Per Harald Borgen&amp;rsquo;s &lt;a href=&#34;https://medium.com/learning-new-stuff/how-to-learn-neural-networks-758b78f2736e#.qkx5pzw2b&#34;&gt;Learning How To Code Neural Networks&lt;/a&gt;, where he says that backpropagation &amp;ldquo;essentially means that you look at how wrong the network guessed, and then adjust the networks weights accordingly.&amp;rdquo; I now have a better understanding of a key design decision made when adjusting those weights. (All corrections to my explanations below are welcome.)&lt;/p&gt;
&lt;p&gt;You can&amp;rsquo;t adjust the weights with just any old number. For one thing, they usually have to fit within a certain range. If your input value is between 1 and 5000 and the adjustment function expects a number between 0 and 1, you could divide the number by 5000 before passing it along, but that won&amp;rsquo;t give your model much help adjusting its future guesses. Division is linear, which means that if you plot a graph where the function&amp;rsquo;s inputs are the x values and the outputs for each input are the corresponding y values, the result is a straight line. (&lt;a href=&#34;https://stackoverflow.com/questions/9782071/why-must-a-nonlinear-activation-function-be-used-in-a-backpropagation-neural-net&#34;&gt;Some technical definitions&lt;/a&gt; of linearity consider that one to be overly simplified, but the &lt;a href=&#34;https://en.wikipedia.org/wiki/Linearity&#34;&gt;Wikipedia entry&lt;/a&gt; is pretty close.) Combining linear functions just gives you another linear function, and a neural network&amp;rsquo;s goal is to converge on a value, which requires non-linearity. As Alan Richmond wrote in &lt;a href=&#34;http://python3.codes/a-neural-network-in-python-part-2-activation-functions-bias-sgd-etc/&#34;&gt;A Neural Network in Python, Part 2: activation functions, bias, SGD, etc.&lt;/a&gt;, without non-linearity, &amp;ldquo;adding layers adds nothing that couldn&amp;rsquo;t be done with just one layer,&amp;rdquo; and those extra layers are what give deep learning its depth.&lt;/p&gt;
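&lt;p&gt;The &amp;ldquo;adding layers adds nothing&amp;rdquo; point is easy to check for yourself: compose two linear functions (the coefficients here are arbitrary) and you get back another linear function.&lt;/p&gt;

```python
def f(x):
    return 2 * x + 3    # one linear "layer": scale and shift

def g(x):
    return 0.5 * x - 1  # another linear "layer"

def h(x):
    return g(f(x))      # two layers of purely linear processing

# h collapses to a single linear function: h(x) = 0.5*(2x + 3) - 1 = x + 0.5,
# so the second layer added nothing that one layer could not do alone.
for x in (-2.0, 0.0, 3.0):
    assert h(x) == x + 0.5
print("g(f(x)) is just x + 0.5")
```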
&lt;p&gt;So, squeezing the input value down within a particular range won&amp;rsquo;t be enough. The sigmoid function that I described last December maps the input value to an S-curve so that greater positive values and lower negative values affect the output less than input values that are closer to 0. Ultimately, it does return a value between 0 and 1, and that&amp;rsquo;s what the 11-lines-of-Python network used to adjust weights in its earlier layer.&lt;/p&gt;
&lt;p&gt;For some situations, though, instead of a value between 0 and 1, a value between -1 and 1 might be more useful&amp;ndash;for example, if there is a potential need to adjust a weight downward. The &lt;a href=&#34;https://en.wikipedia.org/wiki/Hyperbolic_function&#34;&gt;hyperbolic tangent function&lt;/a&gt; also returns values that follow an S-curve, but they fall between -1 and 1. (While the regular tangent function you may have learned about in trigonometry class is built around a circle, the hyperbolic tangent function, or tanh, is built around a hyperbola. I don&amp;rsquo;t completely understand the difference, but when I look at a &lt;a href=&#34;https://www.varsitytutors.com/hotmath/hotmath_help/topics/graphing-tangent-function&#34;&gt;graph of the regular tangent&lt;/a&gt; function, I have a much more difficult time picturing how it would be helpful for tweaking a weight&amp;rsquo;s value.)&lt;/p&gt;
&lt;p&gt;When you choose between a sigmoid function, a tanh function, and one of the other alternatives mentioned below, you&amp;rsquo;re choosing an &lt;a href=&#34;https://en.wikipedia.org/wiki/Activation_function&#34;&gt;activation function&lt;/a&gt;. The best choice depends on what you&amp;rsquo;re trying to do with your data, and the knowledge of what each can do for you is an important part of the model-building process. (The need for this knowledge when building a machine learning model is one reason that machine learning cannot be commoditized as easily as many people claim; see the &amp;ldquo;MLaaS dies a second death&amp;rdquo; section of Bradford Cross&amp;rsquo;s &lt;a href=&#34;http://www.bradfordcross.com/blog/2017/3/3/five-ai-startup-predictions-for-2017&#34;&gt;Five AI Startup Predictions for 2017&lt;/a&gt; for an excellent discussion of some related issues.)&lt;/p&gt;
&lt;p&gt;The Data Skeptic podcast episode covers two other possible activation functions: a step function, which only outputs 0 or 1, and the &lt;a href=&#34;https://en.wikipedia.org/wiki/Rectifier_(neural_networks)&#34;&gt;Rectified Linear Unit&lt;/a&gt; (ReLU) function, which sets negative values to 0 and leaves others alone. ReLU activation functions come up in a &lt;a href=&#34;https://github.com/temerick/pytorch-lnl/blob/master/PyTorch%20lunch%20and%20learn.ipynb&#34;&gt;Jupyter notebook&lt;/a&gt; that accompanied the CCRi blog entry &lt;a href=&#34;http://www.ccri.com/2017/05/31/deep-learning-pytorch-jupyter-notebook/&#34;&gt;Deep Learning with PyTorch in a Jupyter notebook&lt;/a&gt; that I wrote last May, and they also appear in an earlier, more detailed draft of a recent CCRi blog entry that I edited called &lt;a href=&#34;http://www.ccri.com/2017/08/25/deep-reinforcement-learning-win-battleship/&#34;&gt;Deep Reinforcement Learning-of how to win at Battleship&lt;/a&gt;. Both times, I had no idea what a ReLU function was. Now I do; maybe I am catching up with these colleagues after all.&lt;/p&gt;
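&lt;p&gt;All four activation functions mentioned above are short enough to write out. This is just the shape of each function, with none of the surrounding network plumbing:&lt;/p&gt;

```python
import math

def sigmoid(x):
    """Squashes any input into the range (0, 1) along an S-curve."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """The same S shape, but the output falls between -1 and 1."""
    return math.tanh(x)

def step(x):
    """Outputs only 0 or 1."""
    return 1.0 if x > 0 else 0.0

def relu(x):
    """Sets negative values to 0 and leaves the others alone."""
    return max(0.0, x)

for fn in (sigmoid, tanh, step, relu):
    print(fn.__name__, [round(fn(x), 3) for x in (-2.0, 0.0, 2.0)])
```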
&lt;p&gt;If you want a better understanding of the choices developers make when designing neural network models to solve specific problems, I strongly recommend listening to the Data Skeptic podcast episode on activation functions, which is only 14 minutes. I especially liked its cornbread cooking examples, where questions of how much you might adjust the amount of different ingredients provided excellent examples of which activation functions would push numbers where you wanted them.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Images from Data Skeptic podcast page are &lt;a href=&#34;https://creativecommons.org/licenses/by-nc-sa/4.0/&#34;&gt;CC-BY-NC-SA&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2017">2017</category>
      
      <category domain="https://www.bobdc.com//categories/ai-and-machine-learning">AI and machine learning</category>
      
    </item>
    
    <item>
      <title>Validating RDF data with SHACL</title>
      <link>https://www.bobdc.com/blog/validating-rdf-data-with-shacl/</link>
      <pubDate>Sun, 20 Aug 2017 10:54:36 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/validating-rdf-data-with-shacl/</guid>
      
      
      <description><div>Setting some constraints--then violating them!</div><div>&lt;img id=&#34;idm140654375893984&#34; src=&#34;https://www.bobdc.com/img/main/shackles.jpg&#34; width=&#34;250&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;shackles&#34;/&gt;
&lt;p&gt;Last month, in &lt;a href=&#34;https://www.bobdc.com/blog/the-w3c-standard-constraint-la&#34;&gt;The W3C standard constraint language for RDF: SHACL&lt;/a&gt;, I described the history of this new standard that lets us define constraints on RDF data and an &lt;a href=&#34;https://github.com/TopQuadrant/shacl#user-content-command-line-usage&#34;&gt;open source tool&lt;/a&gt; that lets us identify where such constraints were violated. The presence of the standard, along with tools that implement it, will be a big help to the use of RDF in production environments.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a lot you can do with SHACL&amp;ndash;enough that the full collection of features available and their infrastructure can appear a bit complicated. I wanted to create some simple constraints for some simple data and then use the &lt;code&gt;shaclvalidate.sh&lt;/code&gt; tool to identify which parts of the data violated which constraints, and it went very nicely.&lt;/p&gt;
&lt;p&gt;I started by going through a &lt;a href=&#34;http://www.topquadrant.com/technology/shacl/tutorial/&#34;&gt;TopQuadrant tutorial&lt;/a&gt; that builds some SHACL exercises using their TopBraid Composer GUI tool (free edition available by selecting &amp;ldquo;Free Edition&amp;rdquo; from the &amp;ldquo;Product&amp;rdquo; field on the &lt;a href=&#34;http://www.topquadrant.com/downloads/topbraid-composer-install/&#34;&gt;TopBraid Composer Installation&lt;/a&gt; page). Then, after I examined the triples that Composer generated when I followed the tutorial&amp;rsquo;s steps, I created my own new example called &lt;code&gt;employees.ttl&lt;/code&gt; to run with &lt;code&gt;shaclvalidate.sh&lt;/code&gt;. (To make my example as stripped-down as possible, I used a &lt;a href=&#34;https://www.gnu.org/software/emacs/&#34;&gt;text editor&lt;/a&gt; for this, not Composer.) You can download my file &lt;a href=&#34;http://snee.com/bobdc.blog/files/employees.ttl&#34;&gt;right here&lt;/a&gt;; below I describe the file a few lines at a time to show what I was doing and how the pieces fit together.&lt;/p&gt;
&lt;p&gt;I started off with declarations for prefixes, a class, and a few properties for that class:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix hr: &amp;lt;http://learningsparql.com/ns/humanResources#&amp;gt; .
@prefix d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix sh: &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .


#### Regular RDFS modeling ####


hr:Employee a rdfs:Class .


hr:name
   rdf:type rdf:Property ;
   rdfs:domain hr:Employee .


hr:hireDate
   rdf:type rdf:Property ;
   rdfs:domain hr:Employee ;
   rdfs:range xsd:date .


hr:jobGrade
   rdf:type rdf:Property ;
   rdfs:domain hr:Employee ;
   rdfs:range xsd:integer .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There is nothing new and interesting there, but it&amp;rsquo;s worth reviewing why these declarations are useful: so that applications using instances of this class know more about it and can do more with it. For example, when generating a form to let users edit Employee instances, an application noting that &lt;code&gt;hr:hireDate&lt;/code&gt; has an &lt;code&gt;rdfs:range&lt;/code&gt; of &lt;code&gt;xsd:date&lt;/code&gt; might provide a date-picking widget on the form instead of just providing a text field to fill out. (And, if the application sees an additional property for this class declared someday, it can automatically generate a field for the new property on the edit form, so that this model really is driving application behavior.) These &lt;code&gt;rdfs:range&lt;/code&gt; values are &lt;em&gt;not&lt;/em&gt; there so that an automated process can check whether that instance data conforms to these types, although some applications may have done that. This is the hole that SHACL fills, as we will see below.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  #### Additional SHACL modeling ####


hr:Employee
# Following two lines are an alternative to the line above
#hr:EmployeeShape
#  sh:targetClass hr:Employee ;
   a sh:NodeShape ;
   sh:property hr:nameShape ;
   sh:property hr:jobGradeShape .


hr:nameShape
   sh:path hr:name ;
   sh:datatype xsd:string ;
   sh:minCount 1 ;
   sh:maxCount 1 .


hr:jobGradeShape
   sh:path hr:jobGrade ;
   sh:datatype xsd:integer ;
   sh:minCount 1 ;
   sh:maxCount 1 ;
   sh:minInclusive 1;
   sh:maxInclusive 7 .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The SHACL vocabulary is associated here with the prefix &lt;code&gt;sh:&lt;/code&gt;. Some of the best documentation of this vocabulary is right where it should be&amp;ndash;in &lt;code&gt;rdfs:comment&lt;/code&gt; values of the class and property declarations in &lt;a href=&#34;https://www.w3.org/ns/shacl.ttl&#34;&gt;https://www.w3.org/ns/shacl.ttl&lt;/a&gt;. (As we&amp;rsquo;ll see, &lt;a href=&#34;https://www.w3.org/TR/shacl/&#34;&gt;the spec itself&lt;/a&gt; is also a good place to find out what&amp;rsquo;s what.)&lt;/p&gt;
&lt;p&gt;Above, we see that &lt;code&gt;hr:Employee&lt;/code&gt;, which had already been declared to be an &lt;code&gt;rdfs:Class&lt;/code&gt;, is also declared to be an &lt;code&gt;sh:NodeShape&lt;/code&gt;. To quote a few of the &lt;code&gt;shacl.ttl&lt;/code&gt; vocabulary file&amp;rsquo;s &lt;code&gt;rdfs:comment&lt;/code&gt; values, &amp;ldquo;a shape is a collection of constraints that may be targeted for certain nodes,&amp;rdquo; &amp;ldquo;a node shape is a shape that specifies constraint [sic] that need to be met with respect to focus nodes,&amp;rdquo; and (quoting &lt;a href=&#34;https://www.w3.org/TR/shacl/#focusNodes&#34;&gt;the spec&lt;/a&gt; this time) &amp;ldquo;an RDF term that is validated against a shape using the triples from a data graph is called a focus node.&amp;rdquo; So, declaring &lt;code&gt;hr:Employee&lt;/code&gt; to also be a &lt;code&gt;sh:NodeShape&lt;/code&gt; lets it serve as a collection of constraints for certain nodes.&lt;/p&gt;
&lt;p&gt;Note the commented-out alternative lines after that first one. Instead of making the existing &lt;code&gt;hr:Employee&lt;/code&gt; class also serve as a collection of constraints for instances of that class, we could declare a separate new resource as an instance of &lt;code&gt;sh:NodeShape&lt;/code&gt; (in the commented-out example, a new instance called &lt;code&gt;hr:EmployeeShape&lt;/code&gt;) and go on to define the constraints there. How would the validator know that &lt;code&gt;hr:EmployeeShape&lt;/code&gt; was storing constraints for the &lt;code&gt;hr:Employee&lt;/code&gt; class? Because, as the last commented-out line shows, its &lt;code&gt;sh:targetClass&lt;/code&gt; property would point to the &lt;code&gt;hr:Employee&lt;/code&gt; class. (Thanks to my former TopQuadrant colleague Holger for helping me to understand how that works.)&lt;/p&gt;
&lt;p&gt;After naming the place to store the constraints, we create some using the SHACL vocabulary&amp;rsquo;s &lt;code&gt;sh:property&lt;/code&gt; property. The &lt;code&gt;rdfs:comment&lt;/code&gt; for this property in &lt;code&gt;shacl.ttl&lt;/code&gt; tells us that it &amp;ldquo;Links a shape to its property shapes.&amp;rdquo; In the SHACL files created by TopBraid Composer, it links to property shapes grouped together with blank nodes, but as you can see above, I pointed them at shapes for the Employee name and jobGrade properties that have their own URIs.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;hr:nameShape&lt;/code&gt; and &lt;code&gt;hr:jobGradeShape&lt;/code&gt; property shapes above are pretty self-explanatory. To require that exactly one value for each be included with each instance of &lt;code&gt;hr:Employee&lt;/code&gt;, I gave each an &lt;code&gt;sh:minCount&lt;/code&gt; and a &lt;code&gt;sh:maxCount&lt;/code&gt; value of 1. The property shapes also have data types specified, and unlike the &lt;code&gt;rdfs:range&lt;/code&gt; specifications for these properties above, these will be used for validation. For &lt;code&gt;hr:jobGradeShape&lt;/code&gt;, I also added &lt;code&gt;sh:minInclusive&lt;/code&gt; and &lt;code&gt;sh:maxInclusive&lt;/code&gt; values to restrict any data values to be from 1 to 7.&lt;/p&gt;
&lt;p&gt;The last part of &lt;code&gt;employees.ttl&lt;/code&gt; has four instances of &lt;code&gt;hr:Employee&lt;/code&gt;. The first meets all the defined constraints:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;d:e1
   a hr:Employee;
   hr:name &amp;quot;Barry Wom&amp;quot; ;
   hr:hireDate &amp;quot;2017-06-03&amp;quot; ;
   hr:jobGrade 6 .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When I comment out the other three instances and run shaclvalidate on the file, it gives me back a validation report, in the form of triples, about how everything is cool:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  @prefix sh:    &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .


[ a            sh:ValidationReport ;
  sh:conforms  true
] .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The next instance lacks the required &lt;code&gt;hr:jobGrade&lt;/code&gt; value:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;d:e2
   a hr:Employee;
   hr:name &amp;quot;Ron Nasty&amp;quot; ;
   hr:hireDate &amp;quot;2017-08-11&amp;quot; . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After I uncommented this instance in &lt;code&gt;employees.ttl&lt;/code&gt;, shaclvalidate told me this about it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d:     &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix sh:    &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .
@prefix hr:    &amp;lt;http://learningsparql.com/ns/humanResources#&amp;gt; .


[ a            sh:ValidationReport ;
  sh:conforms  false ;
  sh:result    [ a                             sh:ValidationResult ;
                 sh:focusNode                  d:e2 ;
                 sh:resultMessage              &amp;quot;Less than 1 values&amp;quot; ;
                 sh:resultPath                 hr:jobGrade ;
                 sh:resultSeverity             sh:Violation ;
                 sh:sourceConstraintComponent  sh:MinCountConstraintComponent ;
                 sh:sourceShape                hr:jobGradeShape
               ]
] .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As I mentioned last month, returning these validation reports as triples makes it easier to plug the process into a larger automated workflow, and here we see that when constraints are violated, the triples include information to incorporate into that larger workflow&amp;ndash;for example, to build a message to display in a pop-up message box. You could also query accumulated validation reports with SPARQL to identify patterns in which kinds of violations happened and how often.&lt;/p&gt;
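&lt;p&gt;As a rough sketch of that last idea (assuming that the accumulated validation reports have all been loaded into one graph), a query along these lines would total up the violations by constraint component and property:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX sh: &amp;lt;http://www.w3.org/ns/shacl#&amp;gt;

SELECT ?component ?path (COUNT(?result) AS ?violationCount)
WHERE {
  ?result a sh:ValidationResult ;
          sh:sourceConstraintComponent ?component ;
          sh:resultPath ?path .
}
GROUP BY ?component ?path
ORDER BY DESC(?violationCount)
&lt;/code&gt;&lt;/pre&gt;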
&lt;p&gt;The third employee tests the SHACL validator&amp;rsquo;s ability to detect data type violations, because the &lt;code&gt;hr:jobGrade&lt;/code&gt; value is not an integer:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  d:e3
   a hr:Employee;
   hr:name &amp;quot;Stig O&#39;Hara&amp;quot; ;
   hr:hireDate &amp;quot;2017-03-14&amp;quot; ;
   hr:jobGrade 3.14 .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;shaclvalidate does just fine with that:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d:     &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix sh:    &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .
@prefix hr:    &amp;lt;http://learningsparql.com/ns/humanResources#&amp;gt; .


[ a            sh:ValidationReport ;
  sh:conforms  false ;
  sh:result    [ a                             sh:ValidationResult ;
                 sh:focusNode                  d:e3 ;
                 sh:resultMessage              &amp;quot;Value does not have datatype xsd:integer&amp;quot; ;
                 sh:resultPath                 hr:jobGrade ;
                 sh:resultSeverity             sh:Violation ;
                 sh:sourceConstraintComponent  sh:DatatypeConstraintComponent ;
                 sh:sourceShape                hr:jobGradeShape ;
                 sh:value                      3.14
               ]
] .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The last employee instance tests the SHACL validator&amp;rsquo;s ability to detect a value that falls outside of a specified range, because &lt;code&gt;hr:jobGrade&lt;/code&gt; is greater than 7:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;d:e4
   a hr:Employee;
   hr:name &amp;quot;Dirk McQuickly&amp;quot; ;
   hr:hireDate &amp;quot;2017-01-08&amp;quot; ;
   hr:jobGrade 8 .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This isn&amp;rsquo;t a problem either:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d:     &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix sh:    &amp;lt;http://www.w3.org/ns/shacl#&amp;gt; .
@prefix hr:    &amp;lt;http://learningsparql.com/ns/humanResources#&amp;gt; .


[ a            sh:ValidationReport ;
  sh:conforms  false ;
  sh:result    [ a                             sh:ValidationResult ;
                 sh:focusNode                  d:e4 ;
                 sh:resultMessage              &amp;quot;Value is not &amp;lt;= 7&amp;quot; ;
                 sh:resultPath                 hr:jobGrade ;
                 sh:resultSeverity             sh:Violation ;
                 sh:sourceConstraintComponent  sh:MaxInclusiveConstraintComponent ;
                 sh:sourceShape                hr:jobGradeShape ;
                 sh:value                      8
               ]
] .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I deliberately picked simple examples to see how difficult they would be to implement, and as with many powerful software systems, my only problem was navigating the detailed documentation of the architecture and many features to find the parts that I wanted.&lt;/p&gt;
&lt;p&gt;What other built-in constraints are available besides &lt;code&gt;sh:datatype&lt;/code&gt;, &lt;code&gt;sh:minCount&lt;/code&gt;, &lt;code&gt;sh:maxCount&lt;/code&gt;, &lt;code&gt;sh:minInclusive&lt;/code&gt;, and &lt;code&gt;sh:maxInclusive&lt;/code&gt;? See for yourself in section 4 of the spec: &lt;a href=&#34;https://www.w3.org/TR/shacl/#core-components&#34;&gt;Core Constraint Components&lt;/a&gt;. (For a nice quick skim of the available constraints, just look through that section&amp;rsquo;s entries in the spec&amp;rsquo;s &lt;a href=&#34;https://www.w3.org/TR/shacl/#table-of-contents&#34;&gt;table of contents&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;ve done much work with RDF, you&amp;rsquo;re going to enjoy this.&lt;/p&gt;
&lt;p&gt;1912 farm and garden supply catalog image courtesy of &lt;a href=&#34;https://www.flickr.com/photos/internetarchivebookimages/16045403204/&#34;&gt;flickr&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2017">2017</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>The W3C standard constraint language for RDF: SHACL</title>
      <link>https://www.bobdc.com/blog/the-w3c-standard-constraint-la/</link>
      <pubDate>Sun, 30 Jul 2017 10:46:06 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-w3c-standard-constraint-la/</guid>
      
      
      <description><div>A brief history of the new standard and some toys to play with it.</div><div>&lt;p&gt;&lt;a href=&#34;https://www.slideshare.net/cygri/shacl-shaping-the-big-ball-of-data-mud&#34;&gt;&lt;img id=&#34;idm140501143665392&#34; src=&#34;https://www.bobdc.com/img/main/rcshacl.jpg&#34; width=&#34;360&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Richard Cyganiak SHACL slide&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Many people have complained about how the Web Ontology Language, or &lt;a href=&#34;https://www.w3.org/OWL/&#34;&gt;OWL&lt;/a&gt;, wasn&amp;rsquo;t a very good constraint language for RDF data. They didn&amp;rsquo;t realize that it wasn&amp;rsquo;t designed to be a constraint language, in which you define the structure of a dataset as a guide to applications so that these applications know what to expect. OWL was designed to do other things, and we finally have the W3C standard RDF constraint language we&amp;rsquo;ve been waiting for, but before we discuss it, a little history puts it in better context.&lt;/p&gt;
&lt;p&gt;For nearly all computer applications ever, there has been some ability to define what should be in the data, such as the columns of a relational table and their data types, the elements and attributes of a set of XML documents, or the classes of data that an object-oriented program is working with and their attributes. Data was usually not even added to these data sets until it conformed to the descriptions. These data definitions are known as prescriptive schemas, but OWL&amp;rsquo;s goal was to provide descriptive schemas: metadata about existing data sets, typically from the web, so that you could infer new knowledge about the resources that you found. (When I mention OWL, assume that I&amp;rsquo;m including its base layer &lt;a href=&#34;https://www.w3.org/TR/rdf-schema/&#34;&gt;RDFS&lt;/a&gt; as well.)&lt;/p&gt;
&lt;p&gt;When building large RDF applications, though, prescriptive schemas can provide some benefits, and while OWL can do a bit of this, it can&amp;rsquo;t do it very well. And, the OWL tools that can check whether constraints have been violated are fairly big and heavy because of all of their additional inferencing capabilities. So people complained. (For a good overview of the cool things that OWL is currently being used for, see &lt;a href=&#34;http://videolectures.net/eswc2016_hendler_wither_OWL/&#34;&gt;Jim Hendler&amp;rsquo;s 2016 ESWC talk&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;At my former employer &lt;a href=&#34;http://www.topquadrant.com&#34;&gt;TopQuadrant&lt;/a&gt;, principal engineer Holger Knublauch developed a triples-based constraint language for RDF called &lt;a href=&#34;http://spinrdf.org/&#34;&gt;SPIN&lt;/a&gt;, for &amp;ldquo;SPARQL Inferencing Notation.&amp;rdquo; It took advantage of SPARQL&amp;rsquo;s ability to define constraints&amp;ndash;basically, you would query for things you didn&amp;rsquo;t want to see in the data, like an &lt;code&gt;invoice&lt;/code&gt; instance with no &lt;code&gt;approvedBy&lt;/code&gt; value, and if you found any, you knew where the constraint was violated. SPIN provided a structure for storing these queries as metadata about a dataset, and it was very useful in TopQuadrant&amp;rsquo;s customer work. I wrote about it here in &lt;a href=&#34;https://www.bobdc.com/blog/a-rules-language-for-rdf&#34;&gt;2009&lt;/a&gt; and in &lt;a href=&#34;http://www.snee.com/bobdc.blog/2010/03/is-spin-the-schematron-of-rdf.html&#34;&gt;2010&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It was so useful that eventually some of these customers, as well as TopQuadrant and some TopQuadrant colleagues at other companies, started a W3C working group to develop a new constraint language that built on the ideas of SPIN: the Shapes Constraint Language, or &lt;a href=&#34;https://www.w3.org/TR/shacl/&#34;&gt;SHACL&lt;/a&gt;. (Get it? &amp;ldquo;Shackle&amp;rdquo;? Constraints?) SHACL &lt;a href=&#34;https://www.w3.org/blog/news/archives/6421&#34;&gt;is now a Recommendation&lt;/a&gt;: an official W3C standard just like HTML, XML, CSS, and the RDF standards.&lt;/p&gt;
&lt;p&gt;The TopQuadrant page &lt;a href=&#34;https://www.topquadrant.com/shacl-features-and-specifications/&#34;&gt;An Overview of SHACL Features and Specifications&lt;/a&gt; gives a nice overview of the components of SHACL and their relationships, and I will be digging deeper into that in the coming weeks.&lt;/p&gt;
&lt;p&gt;Some more great recent SHACL news is the &lt;a href=&#34;https://lists.w3.org/Archives/Public/public-rdf-shapes/2017Jul/0002.html&#34;&gt;availability&lt;/a&gt; of an API with command line tools to try SHACL out. I&amp;rsquo;ve been playing with the shaclvalidate.sh shell script tool (a Windows batch file is also included), which reads a file of triples that include constraints and instance data and then lists any violated constraints. A form-based &lt;a href=&#34;http://shacl.org/playground/&#34;&gt;SHACL playground&lt;/a&gt; is also available; I took the sample constraints and the Turtle version of the sample data available on that page, combined them into a single file, and fed that file to shaclvalidate.sh. The validation report that it created pointed out that the &lt;code&gt;schema:Person&lt;/code&gt; instance&amp;rsquo;s death date was earlier than its birth date, thereby violating one of the defined constraints. These reports are themselves sets of triples, making it easier to plug this validation process into a larger workflow.&lt;/p&gt;
&lt;p&gt;The open source SHACL API that Holger created is &lt;a href=&#34;https://github.com/TopQuadrant/shacl&#34;&gt;available on github&lt;/a&gt;. The week that he released the command line tools I was actually trying to code up a SHACL command line validator myself around the API (with much kind help to my atrophied Java skills from Andy Seaborne), so I was very glad to see Holger release something that saved me from further Java coding.&lt;/p&gt;
&lt;p&gt;Holger&amp;rsquo;s API includes many &lt;a href=&#34;https://github.com/TopQuadrant/shacl/tree/master/src/test/resources/sh/tests&#34;&gt;test cases&lt;/a&gt; that I know will teach me a lot about SHACL&amp;rsquo;s capabilities. For example, &lt;a href=&#34;https://github.com/TopQuadrant/shacl/blob/master/src/test/resources/sh/tests/sparql/component/propertyValidator-select-001.test.ttl&#34;&gt;one test&lt;/a&gt; demonstrates the ability to define a constraint with a SPARQL query, one of the original inspirations for this constraint language, and I have already successfully run this test with the validation shell script. SPARQL-based constraints are less necessary in SHACL than you might think, because the core of SHACL is a vocabulary to define many common constraint conditions, but it&amp;rsquo;s still great to see them, because they add so much flexibility to the constraints that you can define&amp;ndash;for example, you could specify that an &lt;code&gt;approvedBy&lt;/code&gt; value is only required if &lt;code&gt;invoiceAmount&lt;/code&gt; is greater than a certain value.&lt;/p&gt;
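&lt;p&gt;As a rough sketch of what that last example might look like (the &lt;code&gt;ex:&lt;/code&gt; invoice properties are made up, and I&amp;rsquo;ve left out the &lt;code&gt;sh:prefixes&lt;/code&gt; declaration that the spec requires for prefixes used inside the query string), a SPARQL-based constraint flags each focus node that its &lt;code&gt;SELECT&lt;/code&gt; query returns:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ex:InvoiceShape
   a sh:NodeShape ;
   sh:targetClass ex:Invoice ;
   sh:sparql [
      a sh:SPARQLConstraint ;
      sh:message &amp;quot;Invoices over 1000 need an approvedBy value&amp;quot; ;
      sh:select &amp;quot;&amp;quot;&amp;quot;
         SELECT $this
         WHERE {
           $this ex:invoiceAmount ?amount .
           FILTER (?amount &amp;gt; 1000)
           FILTER NOT EXISTS { $this ex:approvedBy ?approver }
         }
      &amp;quot;&amp;quot;&amp;quot; ;
   ] .
&lt;/code&gt;&lt;/pre&gt;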
&lt;p&gt;I&amp;rsquo;m looking forward to playing more with SHACL. A good next step for anyone interested is to review the slides titled &lt;a href=&#34;https://www.slideshare.net/cygri/shacl-shaping-the-big-ball-of-data-mud&#34;&gt;Shaping the Big Ball of Data Mud: W3C&amp;rsquo;s Shapes Constraint Language (SHACL)&lt;/a&gt; that TopQuadrant&amp;rsquo;s Richard Cyganiak gave to the Lotico Berlin Semantic Web meetup last November. I&amp;rsquo;ve copied his excellent conclusion slide above.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2017">2017</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Creating Wide CSV files with SPARQL</title>
      <link>https://www.bobdc.com/blog/creating-wide-csv-files-with-s/</link>
      <pubDate>Sun, 25 Jun 2017 09:47:13 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/creating-wide-csv-files-with-s/</guid>
      
      
      <description><div>Lots of columns and commas, but all in the right place.</div><div>&lt;blockquote id=&#34;idm139864526871584&#34; class=&#34;pullquote&#34;&gt;I was a bit proud that I came up with this simple way to make sure all the values came out in the right places in this fairly complicated target output.&lt;/blockquote&gt;
&lt;p&gt;I recently decided to copy my address book, which I have in an RDF file, to Google Contacts. The basic steps are pretty straightforward:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;In Google Contacts, create an entry with test data in every field: TestGiveName, TestFamilyName, &lt;a href=&#34;mailto:testemail@whatever.com&#34;&gt;testemail@whatever.com&lt;/a&gt;, and so forth.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Export the contacts as a &lt;a href=&#34;https://en.wikipedia.org/wiki/Comma-separated_values&#34;&gt;CSV&lt;/a&gt; file. The currently default &amp;ldquo;preview&amp;rdquo; version of Google Contacts doesn&amp;rsquo;t allow this yet, but you can &amp;ldquo;go to old version&amp;rdquo; and then find &lt;strong&gt;Export&lt;/strong&gt; on the &lt;strong&gt;More&lt;/strong&gt; drop-down menu.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the exported CSV, move the test entry created in step 1 to the second line, just under the field names.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Using the field names and test entry as a guide, write a SPARQL query that returns the relevant information from the RDF address book file in the order shown in the exported file.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Execute the query, requesting CSV output.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Replace the query output&amp;rsquo;s header row with the header row from the original exported file and then import the result into Google contacts.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Step 4 seemed a bit intimidating. With something like 88 columns in step 2&amp;rsquo;s exported CSV, I knew that messing up one comma (for example, putting the 47th piece of information after the 47th comma instead of before it) would mess up all the information after it. I have made plenty of mistakes like this when creating wide-body CSV before.&lt;/p&gt;
&lt;p&gt;I had a great idea, though, that made it much simpler: I created the SELECT statement from the first line of the exported CSV. I copied that line to a text editor, replaced the spaces in the field names with underscores, removed the hyphens (not allowed in SPARQL variable names), and then replaced each comma with a space and a question mark to turn the name after it into a variable name. Finally, I manually added a question mark to the very first name (the global replace in the previous step didn&amp;rsquo;t do that because there was no comma there) and added the word SELECT before it, and I had the SELECT statement that my query needed.&lt;/p&gt;
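&lt;p&gt;For example (with a few made-up field names standing in for Google&amp;rsquo;s), a header row that begins like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Name,Given Name,Additional Name,E-mail 1 - Value
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;would turn into this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?Name ?Given_Name ?Additional_Name ?Email_1__Value
&lt;/code&gt;&lt;/pre&gt;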
&lt;p&gt;This way, before I&amp;rsquo;d even begun implementing the logic to pull each piece of data out of the address book RDF, I knew that when I did they would come out in the right places.&lt;/p&gt;
&lt;p&gt;Adding two bits of that logic to a WHERE clause gave me this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX  v: &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt;


SELECT ?Name ?Given_Name ?Additional_Name ?Family_Name ?Yomi_Name
       ?Given_Name_Yomi ?Additional_Name_Yomi ?Family_Name_Yomi ?Name_Prefix
       ?Name_Suffix ?Initials ?Nickname ?Short_Name ?Maiden_Name ?Birthday
        # 21 more lines of variable names
       ?Custom_Field_2__Value ?Custom_Field_3__Type ?Custom_Field_3__Value
WHERE {
          ?entry v:family-name ?Family_Name . 
          ?entry v:given-name  ?Given_Name .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When I ran &lt;a href=&#34;https://jena.apache.org/documentation/query/&#34;&gt;arq&lt;/a&gt; with this command,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;arq --query addrbook2csv.rq --data addrbook.rdf --results=CSV
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;it gave me CSV output with the &lt;code&gt;?Family_Name&lt;/code&gt; and &lt;code&gt;?Given_Name&lt;/code&gt; values right where they needed to be for Google Contacts to import them properly.&lt;/p&gt;
&lt;p&gt;I wish I could say that the rest of the query development was just a matter of adding triple patterns like the &lt;code&gt;?Family_Name&lt;/code&gt; and &lt;code&gt;?Given_Name&lt;/code&gt; ones shown above, but it got more complicated because of the ad hoc structure of my address book data. I needed a UNION, lots of OPTIONAL blocks, and even some nested OPTIONAL blocks that I&amp;rsquo;m not proud of. Still, I was a bit proud that I came up with this simple way to make sure that all the values came out in the right places in this fairly complicated target output.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2017">2017</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Instead of writing SPARQL queries for Wikipedia--query for them!</title>
      <link>https://www.bobdc.com/blog/instead-of-writing-sparql-quer/</link>
      <pubDate>Mon, 29 May 2017 10:11:18 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/instead-of-writing-sparql-quer/</guid>
      
      
      <description><div>Queries as data to help you get at more data.</div><div>&lt;p&gt;&lt;a href=&#34;https://commons.wikimedia.org/wiki/Category:Portraits_with_fruits&#34;&gt;&lt;img id=&#34;idm139674224380352&#34; height=&#34;140&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;10px&#34; src=&#34;https://www.bobdc.com/img/main/352px-Portrait_of_Cornelis_Cornelisz_Schellinger_%281551-1635%29.jpg&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s say, hypothetically, that you want to execute a SPARQL query that lists all of Wikimedia&amp;rsquo;s portraits with fruit. Wikimedia does have a &lt;a href=&#34;https://commons.wikimedia.org/wiki/Category:Portraits_with_fruits&#34;&gt;category&lt;/a&gt; for this, so what would be the quickest way to come up with the query?&lt;/p&gt;
&lt;p&gt;If you click the &lt;a href=&#34;https://www.wikidata.org/wiki/Q29789760&#34;&gt;Wikidata item&lt;/a&gt; link on this category&amp;rsquo;s page, you&amp;rsquo;ll see all the data about it that you can retrieve with a SPARQL query to the Wikidata endpoint, as I&amp;rsquo;ve described in my last few blog entries. The cool thing for this particular resource is that one property is called &lt;a href=&#34;https://www.wikidata.org/wiki/Property:P3921&#34;&gt;Wikidata SPARQL query equivalent&lt;/a&gt;, and its value is the query that will retrieve a list of the portraits with fruit. In other words, Wikidata has a triple that looks like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;subject&lt;/strong&gt;:     &lt;code&gt;wd:Q29789760&lt;/code&gt; (the Wikidata category &amp;ldquo;portraits with fruit&amp;rdquo;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;predicate&lt;/strong&gt;:   &lt;code&gt;p:P3921&lt;/code&gt; (&amp;ldquo;Wikidata SPARQL query equivalent&amp;rdquo;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;object&lt;/strong&gt;:      &lt;code&gt;SELECT DISTINCT ?item WHERE { ?item wdt:P31/wdt:P279* wd:Q838948 . ?item wdt:P136/wdt:P31?/wdt:P279* wd:Q134307 . ?item wdt:P180/wdt:P31?/wdt:P279* wd:Q3314483 . }&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Paste that object value into the &lt;a href=&#34;https://query.wikidata.org/&#34;&gt;Wikidata query service&lt;/a&gt;, and you can run it to get a list of the portraits.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://commons.wikimedia.org/wiki/Category:Portraits_with_fruits&#34;&gt;&lt;img id=&#34;idm139674224368480&#34; height=&#34;140&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;10px&#34; src=&#34;https://www.bobdc.com/img/main/Aase_Bye_1932.jpg&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That may seem like a lot of trouble to get this list, but that&amp;rsquo;s not really the point. This query gives you a head start in developing more sophisticated queries on the topic.&lt;/p&gt;
&lt;p&gt;When I wondered how many Wikimedia resources used this predicate, I found that the ones using it were easier to understand if they also had an rdfs:label value. So, I entered this query to count the subjects that had both:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT (count(*) as ?count) WHERE { 
  ?s wdt:P3921 ?o ;
     rdfs:label ?label .
  }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href=&#34;https://commons.wikimedia.org/wiki/Category:Portraits_with_fruits&#34;&gt;&lt;img id=&#34;idm139674224364448&#34; height=&#34;140&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;10px&#34; src=&#34;https://www.bobdc.com/img/main/376px-Felix_Esterl_-_Frau_des_Kunstlers_mit_Fruchtteller_-_1925.jpeg&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Two weeks ago there were 316, but as I write this there are almost a hundred more, so the number is growing at a good pace.&lt;/p&gt;
&lt;p&gt;The idea of a SPARQL query as an object in an RDF triple is not new. It&amp;rsquo;s part of the &lt;a href=&#34;https://www.w3.org/TR/shacl/&#34;&gt;Shapes Constraint Language&lt;/a&gt; (SHACL), as demonstrated by &lt;a href=&#34;https://github.com/w3c/data-shapes/blob/gh-pages/data-shapes-test-suite/tests/sparql/node/sparql-001.ttl&#34;&gt;one of its test cases&lt;/a&gt;. SHACL is a W3C specification that lets you specify constraints on data&amp;ndash;for example, to validate that certain properties are required for instances of a particular class and that others are optional. (This is a lot more difficult using OWL.) I&amp;rsquo;ll be looking at SHACL more closely in the coming months; meanwhile, I&amp;rsquo;ll be keeping an eye on the SPARQL queries being added to Wikidata where we can retrieve them with our own SPARQL queries.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://commons.wikimedia.org/wiki/Category:Portraits_with_fruits&#34;&gt;&lt;img id=&#34;idm139674224358768&#34; height=&#34;140&#34; border=&#34;0&#34; vspace=&#34;10px&#34; src=&#34;https://www.bobdc.com/img/main/Dido_Elizabeth_Belle.jpg&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34;/&gt;&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2017">2017</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">Wikidata</category>
      
    </item>
    
    <item>
      <title>The Wikidata data model and your SPARQL queries</title>
      <link>https://www.bobdc.com/blog/the-wikidata-data-model-and-yo/</link>
      <pubDate>Sun, 23 Apr 2017 09:43:01 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-wikidata-data-model-and-yo/</guid>
      
      
      <description><div>Reference works to get you taking advantage of the fancy parts quickly.</div><div>&lt;blockquote id=&#34;idm140623751257040&#34; class=&#34;pullquote&#34;&gt;RDF standards were used to describe the Wikibase model that was developed independently of W3C standards.&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/blog/wikidatas-excellent-sample-spa&#34;&gt;Last month&lt;/a&gt; I promised that I would dig further into the Wikidata data model, its mapping to RDF, and how we can take advantage of this with SPARQL queries. I had been trying to understand the structure of the data based on the RDF classes and properties I saw and the documentation that I could find, and some of the vocabulary discussing these issues confused me&amp;ndash;for example, RDF is about describing resources, but I was seeing lots of references to entities, which can mean slightly different things in different branches of computer science. But, as Daniel Kinzler &lt;a href=&#34;https://lists.wikimedia.org/pipermail/wikidata/2017-March/010418.html&#34;&gt;explained&lt;/a&gt; to me, &amp;ldquo;The Wikidata (or technically, Wikibase) data model is not defined in terms of RDF&amp;rdquo;; RDF standards were used to describe the Wikibase model that was developed independently of W3C standards.&lt;/p&gt;
&lt;p&gt;Wikibase, as described by its &lt;a href=&#34;http://wikiba.se/&#34;&gt;home page&lt;/a&gt;, &amp;ldquo;is a collection of applications and libraries for creating, managing and sharing structured data&amp;hellip;Wikibase was developed for and is used by Wikidata, the free knowledge base and Wikipedia, the encyclopedia that anyone can edit.&amp;rdquo; The same page describes Wikidata as one of the &amp;ldquo;projects powered by Wikibase&amp;rdquo;, along with the &lt;a href=&#34;http://www.eagle-network.eu/wiki/index.php/Main_Page&#34;&gt;europeana eagle project&lt;/a&gt; and &lt;a href=&#34;https://data.droidwiki.org/wiki/Hauptseite&#34;&gt;Droid wiki&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/DataModel&#34;&gt;Wikibase/DataModel&lt;/a&gt; document is fairly long and detailed, and I would suggest starting instead with the &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/DataModel/Primer&#34;&gt;Wikibase/DataModel/Primer&lt;/a&gt;. The Primer describes how &amp;ldquo;Entities are the basic elements of the knowledge base&amp;rdquo; and how &amp;ldquo;there are two predefined kinds of Entities: Items and Properties&amp;rdquo; (both of which RDF people consider to be resources). The document goes on to describe the information that can be associated with items and properties.&lt;/p&gt;
&lt;p&gt;I had originally found their &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format&#34;&gt;RDF Dump Format&lt;/a&gt; document abstruse and confusing, but it was easier to follow after I read the Wikibase data model primer because I had a better idea of the dump format&amp;rsquo;s basis. It&amp;rsquo;s even easier to follow if you just skim the Dump Format document to get a general idea of what it covers and then go to the &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual&#34;&gt;Wikidata query service/User Manual&lt;/a&gt;, where you&amp;rsquo;ll get an even faster start querying Wikidata. (Their &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples&#34;&gt;sample queries&lt;/a&gt; that I &lt;a href=&#34;https://www.bobdc.com/blog/wikidatas-excellent-sample-spa&#34;&gt;described last month&lt;/a&gt; also help a lot.) The User Manual describes the declared prefixes, some nice tricks for taking advantage of different kinds of labels, how to work with geo data, available endpoints that you can federate into your queries, and more. It also provides more context for understanding the Dump Format document.&lt;/p&gt;
&lt;p&gt;The Data Model document &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/DataModel/Primer#Statements&#34;&gt;describes&lt;/a&gt; the fundamental role of &lt;em&gt;statements&lt;/em&gt; in the Wikibase data model. (Longstanding members of the RDF community will enjoy Kingsley Idehen&amp;rsquo;s &lt;a href=&#34;https://lists.wikimedia.org/pipermail/wikidata/2017-March/thread.html#10456&#34;&gt;continuation&lt;/a&gt; of my thread with Daniel, in which Kingsley insists that Wikidata is a collection of reified RDF statements, and Daniel says that, well, no, not really. They eventually agree to disagree.) The RDF Dump Format document describes two &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Statement_types&#34;&gt;statement types&lt;/a&gt; that are important to how we treat Wikidata as an RDF repository but are also potentially very confusing. The first type is known as a &lt;a href=&#34;https://en.wikipedia.org/wiki/Truthiness&#34;&gt;truthy&lt;/a&gt; statement, or &amp;ldquo;direct claim&amp;rdquo;; these are simple triples that assert facts. The other statement type is the full statement, which is used to &amp;ldquo;represent all data about the statement in the system&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;As one way to quickly recognize the difference, Wikimedia usually uses specific namespaces in specific places in both truthy and full statements. For example, the namespace &lt;a href=&#34;http://www.wikidata.org/prop/direct/&#34;&gt;http://www.wikidata.org/prop/direct/&lt;/a&gt;, which is abbreviated using the prefix &lt;code&gt;wdt:&lt;/code&gt;, is usually used for the predicate of a truthy statement. (The Dump format document has a nice list of all of these in the &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Predicates&#34;&gt;Predicates&lt;/a&gt; section. As you work with this data, you&amp;rsquo;ll often go back to the &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Prefixes_used&#34;&gt;Prefixes used&lt;/a&gt; section of the RDF Dump Format and also the &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Full_list_of_prefixes&#34;&gt;Full list of prefixes&lt;/a&gt; section that follows it.)&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s an example of the two kinds of statements that Daniel provided me: the triple {&lt;code&gt;wd:Q64 wdt:P1376 wd:Q183&lt;/code&gt;} is a truthy triple saying that Berlin is the capital of Germany. Here is the full version of that statement:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;wds:Q64-43CCD3D6-F52E-4742-B0E3-BCA671B69D2C a wikibase:Statement,
                 wikibase:BestRank ;
   wikibase:rank wikibase:PreferredRank ;
   ps:P1376 wd:Q183 ;
   prov:wasDerivedFrom wdref:ba76a7c0f885fa85b10368696ab4ac89680aa073 .

wdref:ba76a7c0f885fa85b10368696ab4ac89680aa073 a wikibase:Reference ;
   pr:P248 wd:Q451546 ;
   pr:P958 &amp;quot;Artikel 2 (1)&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To understand this better, I wanted to see this for a different statement: the fact that bebop musician Tommy Potter played the bass. First, I clicked the &amp;ldquo;Wikidata item&amp;rdquo; link on Potter&amp;rsquo;s &lt;a href=&#34;https://en.wikipedia.org/wiki/Tommy_Potter&#34;&gt;Wikipedia page&lt;/a&gt; and substituted /entity/ for /wiki/, as I described in my February blog entry &lt;a href=&#34;https://www.bobdc.com/blog/getting-to-know-wikidata&#34;&gt;Getting to know Wikidata&lt;/a&gt;, to get the URI that represents him: &lt;a href=&#34;http://www.wikidata.org/entity/Q1369941&#34;&gt;http://www.wikidata.org/entity/Q1369941&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;However, finding the triple about the instrument that he played wasn&amp;rsquo;t as simple as you might think. A &lt;a href=&#34;https://query.wikidata.org/embed.html#SELECT%20*%20WHERE%20%7bwd%3aQ1369941%20%3fp%20%3fo%7d&#34;&gt;query for {wd:Q1369941 ?p ?o}&lt;/a&gt; (using the prefix substitution for brevity) retrieves all the triples about him, but they&amp;rsquo;re the &amp;ldquo;truthy&amp;rdquo; ones, in which the predicates are known as direct claim predicates. Three of these triples described him as a Jazzbassist, a contrebassiste de jazz, and a contrabbassista statunitense, but none listed the &amp;ldquo;bass&amp;rdquo; as the instrument that he played in any language. Querying the direct claim predicates themselves&amp;ndash;for example, to see whether they have &lt;code&gt;rdfs:label&lt;/code&gt; values in different languages&amp;ndash;showed very little information. It turned out that each direct claim predicate appears as the &lt;em&gt;object&lt;/em&gt; of a triple whose predicate is &lt;code&gt;wikibase:directClaim&lt;/code&gt; and whose subject is the actual Wikidata data model property. When I queried for triples that had these data model properties as subjects, I found plenty of information about them.&lt;/p&gt;
&lt;p&gt;To put these relationships to use, I entered the following query to find out more about Tommy Potter:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?pname ?o ?olabel WHERE 
{
  wd:Q1369941 ?directClaimP ?o .          # Get the truthy triples.
  ?p wikibase:directClaim ?directClaimP . # Find the Wikibase properties linked
  ?p rdfs:label ?pname .                  # to the truthy triples&#39; predicates
  FILTER ( lang(?pname) = &amp;quot;en&amp;quot; )          # and their labels, in English.
  OPTIONAL {
     ?o rdfs:label ?olabel  
     FILTER ( lang(?olabel) = &amp;quot;en&amp;quot; )
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;a href=&#34;https://query.wikidata.org/embed.html#SELECT%20%3Fpname%20%3Fo%20%3Folabel%20WHERE%20%0A%7B%0A%20%20wd%3AQ1369941%20%3FdirectClaimP%20%3Fo%20.%0A%20%20%3Fp%20wikibase%3AdirectClaim%20%3FdirectClaimP%20.%0A%20%20%3Fp%20rdfs%3Alabel%20%3Fpname%20.%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%0A%20%20FILTER%20%28%20lang%28%3Fpname%29%20%3D%20%22en%22%20%29%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%20%3Fo%20rdfs%3Alabel%20%3Folabel%20%20%0A%20%20%20%20%20FILTER%20%28%20lang%28%3Folabel%29%20%3D%20%22en%22%20%29%0A%20%20%7D%0A%7D&#34;&gt;result of this query&lt;/a&gt; is a mostly human-readable statement of facts about him. You could substitute the URI for just about any Wikidata entity as the subject in that first triple pattern to see information about that entity. You could also view the property names in other languages besides English, which is a big advantage of the Wikibase data model.&lt;/p&gt;
&lt;p&gt;If you send your browser to the &lt;a href=&#34;http://www.wikidata.org/entity/Q1369941&#34;&gt;http://www.wikidata.org/entity/Q1369941&lt;/a&gt; URI that represents Potter, you will get redirected to a Wikidata page with a nicely formatted human-readable version of data about Potter at &lt;a href=&#34;https://www.wikidata.org/wiki/Q1369941&#34;&gt;https://www.wikidata.org/wiki/Q1369941&lt;/a&gt;. On the other hand, if you add .ttl (or .nt or .rdf) to the end of the /entity/ version of the URI, you&amp;rsquo;ll get RDF of all the data about Potter, including the full representations with triples that include predicates such as &lt;code&gt;wikibase:BestRank&lt;/code&gt; and &lt;code&gt;prov:wasDerivedFrom&lt;/code&gt;, just like the full version of the data above about Berlin being the capital of Germany.&lt;/p&gt;
&lt;p&gt;After looking at the full data about Potter, some queries to find out more about it often found less than what I expected. I eventually learned from the &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#WDQS_data_differences&#34;&gt;WDQS data differences&lt;/a&gt; section of the RDF Dump Format document that &amp;ldquo;Data nodes (&lt;code&gt;wdata:Q2&lt;/code&gt;) are not stored&amp;hellip; This is done for performance reasons.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;After all this exploration, I still haven&amp;rsquo;t gotten to the kinds of structural queries I&amp;rsquo;ve been planning on&amp;ndash;for example, looking for instances based on their class&amp;rsquo;s relationship(s) to other classes. The Stack Exchange question &lt;a href=&#34;https://opendata.stackexchange.com/questions/9591/how-to-include-sub-classes-in-a-wikidata-sparql-query-example-when-querying&#34;&gt;How to include sub-classes in a Wikidata SPARQL query?&lt;/a&gt;, which has a solid answer, looks pretty inspirational. I&amp;rsquo;m looking forward to playing with it.&lt;/p&gt;
&lt;p&gt;Meanwhile, as you use SPARQL to play with Wikidata, you&amp;rsquo;re going to see a lot of cryptic resource names, like &lt;code&gt;wdt:P279&lt;/code&gt; in the Stack Exchange answer, and you&amp;rsquo;ll wonder what their human-readable name is. I created the form below to help me with the prefixes I used the most. You can use this form yourself (for example, enter P279 in the &lt;code&gt;wdt:&lt;/code&gt; field and press Enter), but you&amp;rsquo;d probably be best off copying it from this page&amp;rsquo;s source into your own page that you can customize.&lt;/p&gt;
&lt;p&gt;It turns out that &lt;code&gt;wdt:P279&lt;/code&gt; means &amp;ldquo;subclass of&amp;rdquo;. This is something I&amp;rsquo;ll certainly be getting to know better in the future.&lt;/p&gt;
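For the curious, the lookup forms below just concatenate the local name that you enter onto the Wikidata entity namespace (all three forms use that namespace). A rough Python equivalent of their JavaScript:

```python
# Rough equivalent of the lookup forms below: concatenate the local name
# onto the Wikidata entity namespace to get a URL you can browse to.
ENTITY_NS = "http://www.wikidata.org/entity/"

def lookup_url(local_name):
    return ENTITY_NS + local_name

print(lookup_url("P279"))  # http://www.wikidata.org/entity/P279
```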
&lt;hr /&gt;
&lt;p&gt;&lt;code&gt;wd:&lt;/code&gt;    &lt;form id=&#34;idm140623751218144&#34; name=&#34;wdform&#34; action=&#34;javascript:window.location.href = &#39;http://www.wikidata.org/entity/&#39;.concat(document.wdform[&#39;localName&#39;].value)&#34;&gt; &lt;input id=&#34;idm140623751217200&#34; type=&#34;text&#34; name=&#34;localName&#34;/&gt; &lt;/form&gt;
&lt;code&gt;wdt:&lt;/code&gt;   &lt;form id=&#34;idm140623751215680&#34; name=&#34;wdtform&#34; action=&#34;javascript:window.location.href = &#39;http://www.wikidata.org/entity/&#39;.concat(document.wdtform[&#39;localName&#39;].value)&#34;&gt; &lt;input id=&#34;idm140623751214784&#34; type=&#34;text&#34; name=&#34;localName&#34;/&gt; &lt;/form&gt;
&lt;code&gt;p:&lt;/code&gt;     &lt;form id=&#34;idm140623751213296&#34; name=&#34;pform&#34; action=&#34;javascript:window.location.href = &#39;http://www.wikidata.org/entity/&#39;.concat(document.pform[&#39;localName&#39;].value)&#34;&gt; &lt;input id=&#34;idm140623751212400&#34; type=&#34;text&#34; name=&#34;localName&#34;/&gt; &lt;/form&gt;&lt;/p&gt;
&lt;hr /&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2017">2017</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">Wikidata</category>
      
    </item>
    
    <item>
      <title>Wikidata&#39;s excellent sample SPARQL queries</title>
      <link>https://www.bobdc.com/blog/wikidatas-excellent-sample-spa/</link>
      <pubDate>Sun, 26 Mar 2017 12:40:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/wikidatas-excellent-sample-spa/</guid>
      
      
      <description><div>Learning about the data, its structure, and more.</div><div>&lt;p&gt;&lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Children_of_Genghis_Khan&#34;&gt;&lt;img id=&#34;idm139673944676400&#34; width=&#34;300&#34; src=&#34;https://www.bobdc.com/img/main/wikidatagkhan.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;part of Khan and descendants graph&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/blog/getting-to-know-wikidata&#34;&gt;Last month&lt;/a&gt; I finally got to know &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:Main_Page&#34;&gt;Wikidata&lt;/a&gt; more and saw that it has a lot of great stuff to explore. I&amp;rsquo;ve continued to explore the data and its model using two strategies: exploring the ontology built around the data and playing with the sample queries.&lt;/p&gt;
&lt;p&gt;Exploring the ontology takes some work. I&amp;rsquo;ll describe the resources available for this (and the ontology!) in greater detail when I have a better handle on it all. For sample queries, I have my own queries that I use to explore a dataset, as I described in the &amp;ldquo;Exploring the Data&amp;rdquo; section of the &lt;a href=&#34;http://learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt; chapter &amp;ldquo;A SPARQL Cookbook&amp;rdquo;, but the wise people behind Wikidata have done much better than this by giving us a page of &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples&#34;&gt;sample queries&lt;/a&gt; that highlight some of the data and syntax available.&lt;/p&gt;
&lt;p&gt;The sample queries range from simple to complex, and each has a &amp;ldquo;Try it!&amp;rdquo; link that loads the query into the &lt;a href=&#34;https://query.wikidata.org/&#34;&gt;query form&lt;/a&gt;. (Before you get too far into the list of queries, note that the &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format&#34;&gt;RDF Dump Format&lt;/a&gt; documentation page, which I will describe more next time, has a &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Full_list_of_prefixes&#34;&gt;list of the URIs represented by the prefixes in the queries&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Here are some that I particularly liked after my brief tour:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The second example query, for data about &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Horses&#34;&gt;Horses&lt;/a&gt;, is a good example of the excellent commenting that you will find in many of the sample queries.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Recent_Events&#34;&gt;Recent Events&lt;/a&gt; query nicely demonstrates how Wikidata models time and how a query can use that to identify events within a particular time window&amp;ndash;in the case of this sample query, between 0 and 31 days ago.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Popular_eye_colors&#34;&gt;Popular eye colors&lt;/a&gt; one demonstrates the use of &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Default_views&#34;&gt;Default views&lt;/a&gt;&amp;ndash;special comment directives that the Wikidata Query Service understands as instructions for how to present the data. The eye color query&amp;rsquo;s directive of &amp;ldquo;#defaultView:BubbleChart&amp;rdquo; means that running the query on &lt;a href=&#34;https://query.wikidata.org&#34;&gt;https://query.wikidata.org&lt;/a&gt; will (quickly!) give you this:&lt;/p&gt;
&lt;img id=&#34;idm139673944660704&#34; width=&#34;400&#34; src=&#34;https://www.bobdc.com/img/main/wikidata2eyecolors.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;result of eye color query below&#34;/&gt;
&lt;p&gt;&lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Popular_surnames_among_humans&#34;&gt;Popular surnames among humans&lt;/a&gt; creates another nice bubble chart.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Even_more_cats.2C_with_pictures&#34;&gt;Even more cats, with pictures&lt;/a&gt; query that follows the eye color one uses an ImageGrid defaultView to create the following, finally filling the gap between &amp;ldquo;SPARQL&amp;rdquo; and &amp;ldquo;cat pictures&amp;rdquo; that has bedeviled web technology for so long:&lt;/p&gt;
&lt;img id=&#34;idm139673944722400&#34; width=&#34;400&#34; src=&#34;https://www.bobdc.com/img/main/wikidatacats.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;result of the cat pictures query&#34;/&gt;
&lt;p&gt;The remaining six defaultViews also look like a lot of fun.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Children_of_Genghis_Khan&#34;&gt;Children of Genghis Khan&lt;/a&gt; sample query uses the Graph defaultView to display Khan&amp;rsquo;s children and grandchildren, with images of them when available, in a graph that lets you zoom and drag nodes around. A piece of it is shown above. The Music Genres query after that is similar. The line graph resulting from the &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Number_of_bands_by_year_and_genre&#34;&gt;Number of bands by year and genre&lt;/a&gt; query is also interesting.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After getting this far, I hadn&amp;rsquo;t even seen 10% of the sample queries, but I did find the answer to my original question about how to get to know the range of possibilities with SPARQL queries of Wikidata better. (One more nice sample query that I wanted to mention is not on the samples page but on the &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual&#34;&gt;User Manual&lt;/a&gt; one: an example of &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Geospatial_search&#34;&gt;Geospatial searches&lt;/a&gt; that lists airports within 100km of Berlin.)&lt;/p&gt;
&lt;p&gt;To really learn about how Wikidata executes SPARQL queries, the &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_optimization&#34;&gt;SPARQL query service/query optimization&lt;/a&gt; page provides good background on how &lt;a href=&#34;https://www.bobdc.com/blog/trying-out-blazegraph&#34;&gt;Blazegraph&lt;/a&gt;, the triplestore and query engine that Wikidata&amp;rsquo;s SPARQL endpoint uses, goes about executing the queries. (I found it pretty gutsy of this page&amp;rsquo;s authors to add a &amp;ldquo;Try it!&amp;rdquo; link after a &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_optimization#A_query_that_has_difficulties&#34;&gt;sample query&lt;/a&gt; that the page itself says will time out.) As I wrote in the &amp;ldquo;Query Efficiency and Debugging&amp;rdquo; chapter of &amp;ldquo;Learning SPARQL&amp;rdquo;, query engines often optimize for you. Their methods for doing so are how these query engines try to distinguish themselves from each other, so learning more about the one that you&amp;rsquo;re using is worth it when you&amp;rsquo;re dealing with large-scale data like Wikidata. The &amp;ldquo;SPARQL query service/query optimization&amp;rdquo; page also describes how adding an &lt;code&gt;explain&lt;/code&gt; keyword to the query URL will get you a report on how it parses and optimizes your query.&lt;/p&gt;
&lt;p&gt;As much as I&amp;rsquo;d like to keep playing with the sample queries, I&amp;rsquo;m going to dig into the Wikidata data model and its mapping to RDF next. Watch this space&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2017">2017</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">Wikidata</category>
      
    </item>
    
    <item>
      <title>Getting to know Wikidata</title>
      <link>https://www.bobdc.com/blog/getting-to-know-wikidata/</link>
      <pubDate>Sun, 26 Feb 2017 10:23:49 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/getting-to-know-wikidata/</guid>
      
      
      <description><div>First (SPARQL-oriented) steps.</div><div>&lt;img id=&#34;idm139800651418576&#34; src=&#34;https://www.bobdc.com/img/main/wikidatasparql.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Wikidata and SPARQL logos&#34;/&gt;
&lt;p&gt;I&amp;rsquo;ve written &lt;a href=&#34;https://www.google.com/search?q=inurl%3Asnee.com%2Fbobdc.blog+dbpedia&amp;amp;oq=inurl%3Asnee.com%2Fbobdc.blog+dbpedia/&#34;&gt;so often&lt;/a&gt; about DBpedia here that a few times I considered writing a book about it. As I saw &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:Main_Page&#34;&gt;Wikidata&lt;/a&gt; get bigger and bigger, I kept postponing the day when I would dig in and learn more about this Wikipedia sibling project. I&amp;rsquo;ve finally done this, starting with a few basic steps and one extra fun one:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Learn how to hit the SPARQL endpoint from an operating system command line with &lt;a href=&#34;https://curl.haxx.se/&#34;&gt;curl&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Explore, if available, the web form front end to the endpoint&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Learn how to find the identifier for whatever I like (a band, a person, a concept) so that I can create queries about it&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Automate the finding of the identifier when looking at a Wikipedia page&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;idm139800651465968&#34;&gt;Wikidata SPARQL queries from the command line&lt;/h2&gt;
&lt;p&gt;For that first task, you can append an &lt;a href=&#34;http://www.utilities-online.info/urlencode/&#34;&gt;escaped&lt;/a&gt; version of your query to &lt;code&gt;https://query.wikidata.org/sparql?query=&lt;/code&gt; and pass that to curl. For example, doing it with the query &amp;ldquo;SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 10&amp;rdquo; gives you this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl https://query.wikidata.org/sparql?query=SELECT%20DISTINCT%20%3Fp%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D%20LIMIT%2010
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That command line retrieves the result in the default &lt;a href=&#34;https://www.w3.org/TR/rdf-sparql-XMLres/&#34;&gt;XML format&lt;/a&gt;. curl&amp;rsquo;s &lt;code&gt;-H&lt;/code&gt; option lets you add HTTP header information to your request; for example, adding &lt;code&gt;-H &amp;quot;Accept: text/csv&amp;quot;&lt;/code&gt; after &lt;code&gt;curl&lt;/code&gt; on the command line above retrieves a CSV version of the result set instead of XML.&lt;/p&gt;
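The same request is easy to script. Here is a minimal sketch using only Python's standard library that percent-encodes the query and sets the same Accept header that curl's -H option did; the actual network call is left commented out:

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

query = "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 10"
# Percent-encode the query and append it to the endpoint URL, just as
# in the curl example above.
url = "https://query.wikidata.org/sparql?query=" + quote(query, safe="")
req = Request(url, headers={"Accept": "text/csv"})  # ask for CSV results
# print(urlopen(req).read().decode("utf-8"))  # uncomment to run the query
```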
&lt;h2 id=&#34;idm139800651385200&#34;&gt;Web form front end for entering Wikidata SPARQL queries&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://query.wikidata.org/&#34;&gt;https://query.wikidata.org/&lt;/a&gt; is one of the nicest web forms I&amp;rsquo;ve ever seen for entering SPARQL queries. It offers color coding, auto-completion, and drop-down menus of tools, prefixes, and help.&lt;/p&gt;
&lt;p&gt;When I enter a query like the one above into this form and click the &lt;em&gt;Run&lt;/em&gt; button, the form runs the query and shows a URL in the browser&amp;rsquo;s address bar that incorporates the query. Pasting that full URL into another browser address bar takes me to the query form and enters that query (see &lt;a href=&#34;https://query.wikidata.org/#SELECT%20DISTINCT%20%3Fp%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D%20LIMIT%2010&#34;&gt;this&lt;/a&gt; for an example), but doesn&amp;rsquo;t execute it the way &lt;a href=&#34;http://dbpedia.org/snorql/?query=SELECT+DISTINCT+%3Fp+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D+LIMIT+10&#34;&gt;DBpedia does&lt;/a&gt; in the same situation&amp;ndash;with the Wikidata form, you still need to click that &lt;em&gt;Run&lt;/em&gt; button. If anyone knows of some parameter that I can add to the Wikidata URL to make this happen, I&amp;rsquo;d love to hear about it; I could then use it to replace the delivery of the handful of JSON in the scriptlet described &lt;a href=&#34;#automating&#34;&gt;below&lt;/a&gt;. &lt;em&gt;March 4 update: I have learned from &lt;a href=&#34;https://twitter.com/JonasMKress/statuses/836905093215760384&#34;&gt;Jonas M. Kress&lt;/a&gt; that appending the escaped query to &amp;ldquo;&lt;a href=&#34;https://query.wikidata.org/embed.html&#34;&gt;https://query.wikidata.org/embed.html&lt;/a&gt;#&amp;rdquo; gives you a URL that will execute the query directly, &lt;a href=&#34;https://query.wikidata.org/embed.html#SELECT%20DISTINCT%20%3Fp%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D%20LIMIT%2010&#34;&gt;like this&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;idm139800651379152&#34;&gt;Finding the identifier for a resource starting at its Wikipedia page&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Feb 27 update: it looks like I went to a lot of unnecessary trouble when I should have paid closer attention to the Wikipedia pages themselves, which now have a &amp;ldquo;Wikidata item&amp;rdquo; link on the left. I learned about this from &lt;a href=&#34;https://twitter.com/atomotic&#34;&gt;Raffaele Messuti&lt;/a&gt;, who also &lt;a href=&#34;https://twitter.com/atomotic/status/836139354548490240&#34;&gt;told me&lt;/a&gt; that a Ctrl+option+g keystroke will do the same thing. This keystroke combination didn&amp;rsquo;t work for me using a Das Keyboard under Ubuntu with either Chrome or Firefox, but may for you. The important thing is the nice link from every Wikipedia page to the corresponding Wikidata page, although you&amp;rsquo;ll want to substitute &amp;ldquo;/entity/&amp;rdquo; for &amp;ldquo;/wiki/&amp;rdquo; in the Wikidata URL to get the actual entity URI.&lt;/em&gt;&lt;/p&gt;
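That /wiki/-to-/entity/ substitution is simple enough to script. A one-function sketch:

```python
# Turn a Wikidata page URL (the "Wikidata item" link target) into the
# entity URI by swapping /wiki/ for /entity/.
def entity_uri(page_url):
    return page_url.replace("/wiki/", "/entity/", 1)

print(entity_uri("https://www.wikidata.org/wiki/Q3947"))
# https://www.wikidata.org/entity/Q3947
```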
&lt;p&gt;When viewing a Wikipedia page for something, you can usually find that thing&amp;rsquo;s DBpedia URI by rearranging the Wikipedia URL a little. &lt;a href=&#34;https://www.bobdc.com/blog/from-a-wikipedia-page-to-the-c&#34;&gt;Almost six years ago&lt;/a&gt; I automated this in a scriptlet that takes a browser from a Wikipedia page to the DBpedia URI for the page&amp;rsquo;s subject in one click.&lt;/p&gt;
&lt;p&gt;The usage of the English terms from the Wikipedia URLs in the corresponding DBpedia URIs worked pretty well for a bottom-up, easily crowd-sourced bootstrapping of the DBpedia URI design, but the English basis and the problems introduced by the occasional use of punctuation are not ideal. The Wikidata team did more initial design of the URI structure and went with the best practice of not incorporating actual names. (My favorite explication of this practice is on slides &lt;a href=&#34;https://www.slideshare.net/reduxd/beyond-the-polar-bear/41-Choosing_a_Nice_URL_design&#34;&gt;41&lt;/a&gt; and &lt;a href=&#34;https://www.slideshare.net/reduxd/beyond-the-polar-bear/42-The_resultant_URLbr_httpwwwbbccoukprogrammesb00t8wp0br_The&#34;&gt;42&lt;/a&gt; of &lt;a href=&#34;https://www.slideshare.net/reduxd/beyond-the-polar-bear/&#34;&gt;this BBC slide deck.&lt;/a&gt;) For example, while the DBpedia URI for &amp;ldquo;house&amp;rdquo; is &lt;code&gt;http://dbpedia.org/resource/House&lt;/code&gt;, the Wikidata one is &lt;code&gt;http://www.wikidata.org/entity/Q3947&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So if we can&amp;rsquo;t go from a Wikipedia page to a Wikidata URI by manipulating a string version of the Wikipedia URL, how do we do it? The &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format&#34;&gt;Wikibase/Indexing/RDF Dump Format&lt;/a&gt; page explains a lot about the structure of the data, and its &lt;a href=&#34;https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Sitelinks&#34;&gt;Sitelinks&lt;/a&gt; section describes how a triple with a predicate of &lt;code&gt;schema:about&lt;/code&gt; links a Wikipedia page to the Wikidata URI for the entity being described. If I want to know the URI for the concept of House and I know the concept&amp;rsquo;s Wikipedia URL, I can enter the query &amp;ldquo;SELECT ?uri WHERE { &lt;a href=&#34;https://en.wikipedia.org/wiki/House&#34;&gt;https://en.wikipedia.org/wiki/House&lt;/a&gt; schema:about ?uri }&amp;rdquo;. (You can try it in the Wikidata query form by clicking &lt;a href=&#34;https://query.wikidata.org/#SELECT%20%3Furi%20WHERE%20%7B%20%3Chttps%3A%2F%2Fen.wikipedia.org%2Fwiki%2FHouse%3E%20schema%3Aabout%20%3Furi%20%7D&#34;&gt;here&lt;/a&gt;.)&lt;/p&gt;
&lt;h2 id=&#34;idm139800651368800&#34;&gt;Automating that&lt;/h2&gt;
&lt;p&gt;To go from a Wikipedia page to a Wikidata URI in one click, I needed to embed a SPARQL query about the page&amp;rsquo;s &lt;code&gt;schema:about&lt;/code&gt; value in a &lt;a href=&#34;https://en.wikipedia.org/wiki/Scriptlet&#34;&gt;scriptlet&lt;/a&gt; that would send the query to the Wikidata SPARQL endpoint. (I would have liked to send it to the query form and execute that, but as I described above, I couldn&amp;rsquo;t work out how to trigger the running of the query from the submitted URL.) I did get this to work, and you can drag this link to your Chrome bookmarks bar: &lt;a href=&#34;javascript:location.href=(%22https://query.wikidata.org/sparql?query=SELECT%20%3Furi%20WHERE%20%7B%3C%22%20+%20encodeURIComponent(location.href.replace(/_/g,%22%2520%22))%20+%20%22%3E%20schema%3Aabout%20%3Furi%7D&amp;amp;format=json%22)&#34;&gt;wp -&amp;gt; wikidata&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The scriptlet is a bit limited, though:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;It returns a small handful of JSON instead of just the URI, which I would have preferred.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When used with Chrome, it displays the JSON in the browser. In a brief test with Firefox, the browser offered to download the JSON instead of displaying it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I mentioned above how Wikipedia and DBpedia use English words in their URL identifiers, and this often includes disambiguation language, so the scriptlet doesn&amp;rsquo;t work on those. For example, adding the string &amp;ldquo;Asteroid&amp;rdquo; to the base URL &amp;ldquo;&lt;a href=&#34;https://en.wikipedia.org/wiki/&#34;&gt;https://en.wikipedia.org/wiki/&lt;/a&gt;&amp;rdquo; will give you the Wikipedia URL for the English-language page describing minor planets, and if you&amp;rsquo;re looking at the &lt;a href=&#34;https://en.wikipedia.org/wiki/Asteroid&#34;&gt;Wikipedia page for that&lt;/a&gt; my new scriptlet will work just fine. However, if you add the string &amp;ldquo;Rock&amp;rdquo; to the same base URL, you get the URL for a Wikipedia &lt;a href=&#34;https://en.wikipedia.org/wiki/Rock&#34;&gt;disambiguation page&lt;/a&gt;. If you are viewing the Wikipedia page for &lt;a href=&#34;https://en.wikipedia.org/wiki/Rock_(geology)&#34;&gt;Rock (geology)&lt;/a&gt;, my scriptlet&amp;rsquo;s little bit of string manipulation that constructs a SPARQL query to send to the Wikidata endpoint won&amp;rsquo;t have enough to go on.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The scriptlet is about 180 characters of JavaScript that does the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;For the current location in the browser (that is, the URL of the displayed Wikipedia page) replace any underscores with %2520. This is the escaped version of the escaped version of a space character, which I discovered is necessary through trial and error.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Escape the remainder of that URL as necessary.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Insert the result into a SPARQL query of the form &lt;code&gt;SELECT ?uri WHERE {&amp;lt;escaped-url&amp;gt; schema:about ?uri}&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a SPARQL endpoint GET request URL by appending all that to &amp;ldquo;&lt;a href=&#34;https://query.wikidata.org/sparql?query=%22&#34;&gt;https://query.wikidata.org/sparql?query=&amp;quot;&lt;/a&gt; and add &amp;ldquo;&amp;amp;format=json&amp;rdquo; at the end. (I tried &amp;ldquo;&amp;amp;format=csv&amp;rdquo; but instead of displaying the result Chrome offered to download it.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Set &lt;code&gt;location.href&lt;/code&gt; to the result. This &amp;ldquo;sends&amp;rdquo; the browser to the constructed URL, which should then display the result of the query in JSON.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
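&lt;p&gt;To make those steps more concrete, here is a rough Python equivalent of the scriptlet&amp;rsquo;s string manipulation (just a sketch, not the scriptlet itself; the function name is made up):&lt;/p&gt;

```python
from urllib.parse import quote

def wikidata_query_url(wikipedia_url):
    """Sketch of the scriptlet's logic: build a Wikidata SPARQL endpoint
    GET URL asking for the schema:about value of a Wikipedia page.
    (Hypothetical function name; not part of the original scriptlet.)"""
    # Step 1: underscores in the Wikipedia URL must end up as %2520 in
    # the final URL, so turn them into %20 here; escaping the whole
    # query below turns that %20 into %2520.
    page = wikipedia_url.replace("_", "%20")
    # Step 3: insert the page URL into the SPARQL query.
    query = "SELECT ?uri WHERE {<" + page + "> schema:about ?uri}"
    # Steps 2 and 4: escape the query, append it to the endpoint URL,
    # and ask for JSON results.
    return ("https://query.wikidata.org/sparql?query="
            + quote(query, safe="") + "&format=json")

print(wikidata_query_url("https://en.wikipedia.org/wiki/House"))
```

In the browser, step 5 is just assigning this result to `location.href`; printing the URL here is enough to check the string manipulation.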
&lt;p&gt;Once I could find the URIs to represent the resources I was interested in, it was time to start querying for information about them. In my next blog entry, I&amp;rsquo;ll talk about exploring Wikidata and its RDF-related resources with SPARQL. There are definitely some great features there.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2017">2017</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/wikidata">Wikidata</category>
      
    </item>
    
    <item>
      <title>Brand-name companies using SPARQL: the sparql.club</title>
      <link>https://www.bobdc.com/blog/brand-name-companies-using-spa/</link>
      <pubDate>Sun, 22 Jan 2017 09:37:51 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/brand-name-companies-using-spa/</guid>
      
      
      <description><div>Disney! Apple! Amazon! MasterCard!</div><div>&lt;p&gt;Since I wrote &lt;a href=&#34;https://www.bobdc.com/blog/experience-in-sparql-a-plus&#34;&gt;&amp;ldquo;Experience in SPARQL a plus&amp;rdquo;&lt;/a&gt; about SPARQL appearances in job postings almost three years ago, I still find myself &lt;a href=&#34;https://twitter.com/LearningSPARQL/status/731115175856594944&#34;&gt;pointing people to it&lt;/a&gt; to show them that SPARQL is not some academic theoretical thing but a popular tool in production use at well-known companies.&lt;/p&gt;
&lt;p&gt;On the job listing site &lt;a href=&#34;http://www.indeed.com&#34;&gt;indeed.com&lt;/a&gt;, I have a saved search for SPARQL mentions. The daily email of new search hits that this sends me typically lists a few entries for companies that I have heard of and some for companies that I haven&amp;rsquo;t. Every now and then I&amp;rsquo;ll pick out one to tweet about on &lt;a href=&#34;http://twitter.com/learningsparql&#34;&gt;@learningsparql&lt;/a&gt;, although I don&amp;rsquo;t do it nearly as often as I could.&lt;/p&gt;
&lt;p&gt;Between this ongoing stream of new job postings, the increasing age of that blog posting, and my ownership (inspired by Paul Ford&amp;rsquo;s &lt;a href=&#34;http://tilde.club/&#34;&gt;tilde.club&lt;/a&gt;) of the domain name &lt;a href=&#34;http://sparql.club/&#34;&gt;sparql.club&lt;/a&gt;, I thought it would be fun to keep an updated list there so that I can point the SPARQL haters at it.&lt;/p&gt;
&lt;p&gt;So the next time you see someone making ridiculous claims about SPARQL not catching on, tell them to check out the members of the &lt;a href=&#34;http://sparql.club/&#34;&gt;sparql.club&lt;/a&gt;!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2017">2017</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>A modern neural network in 11 lines of Python</title>
      <link>https://www.bobdc.com/blog/a-modern-neural-network-in-11/</link>
      <pubDate>Thu, 22 Dec 2016 07:52:56 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-modern-neural-network-in-11/</guid>
      
      
      <description><div>And a great learning tool for understanding neural nets.</div><div>&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Perceptron&#34;&gt;&lt;img id=&#34;idm140324229593680&#34; src=&#34;https://www.bobdc.com/img/main/Mark_I_perceptron.jpeg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;the mark I Perceptron&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When you learn new technology, it&amp;rsquo;s common to hear &amp;ldquo;don&amp;rsquo;t worry about the low-level details&amp;ndash;use the tools!&amp;rdquo; That&amp;rsquo;s a good long-term strategy, but when you learn the lower-level details of how the tools work, it gives you a fuller understanding of what they can do for you. I decided to go through Andrew Trask&amp;rsquo;s &lt;a href=&#34;http://iamtrask.github.io/2015/07/12/basic-python-network/&#34;&gt;A Neural Network in 11 lines of Python&lt;/a&gt; to really learn how every line worked, and it&amp;rsquo;s been very helpful. I had to review some matrix math and look up several &lt;a href=&#34;http://www.numpy.org/&#34;&gt;numpy&lt;/a&gt; function calls that he uses, but it was worth it.&lt;/p&gt;
&lt;p&gt;My title here refers to it as a &amp;ldquo;modern neural network&amp;rdquo; because while neural nets have been around since the 1950s, the use of backpropagation, a sigmoid function and the sigmoid&amp;rsquo;s derivative in Andrew&amp;rsquo;s script highlight the advances that have made neural nets so popular in machine learning today. For some excellent background on how we got from Frank Rosenblatt&amp;rsquo;s 1957 hard-wired &lt;a href=&#34;https://en.wikipedia.org/wiki/Perceptron&#34;&gt;Mark I Perceptron&lt;/a&gt; (pictured here) to how derivatives and backpropagation addressed the limitations of these early neural nets, see Andrey Kurenkov&amp;rsquo;s &lt;a href=&#34;http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning/&#34;&gt;A &amp;lsquo;Brief&amp;rsquo; History of Neural Nets and Deep Learning, Part 1&lt;/a&gt;. The story includes a bit more drama than you might expect, with early AI pioneers Marvin Minsky and Seymour Papert convincing the community that limitations in the perceptron model would prevent neural nets from getting very far. I also recommend Michael Nielsen&amp;rsquo;s &lt;a href=&#34;http://neuralnetworksanddeeplearning.com/chap1.html&#34;&gt;Using neural nets to recognize handwritten digits&lt;/a&gt;, in particular the part on &lt;a href=&#34;http://neuralnetworksanddeeplearning.com/chap1.html#perceptrons&#34;&gt;perceptrons&lt;/a&gt;, which gives further background on that part of Kurenkov&amp;rsquo;s &amp;ldquo;Brief History,&amp;rdquo; and then Nielsen&amp;rsquo;s &lt;a href=&#34;http://neuralnetworksanddeeplearning.com/chap1.html#sigmoid_neurons&#34;&gt;sigmoid neurons&lt;/a&gt; part that follows it and describes how these limitations were addressed.&lt;/p&gt;
&lt;p&gt;Andrew&amp;rsquo;s 11-line neural network, with its lack of comments and whitespace, is more for show. The 42-line version that follows it is easier to understand and includes a great line-by-line explanation. Below are some of my own additional notes that I made as I dissected and played with his code. Often, I&amp;rsquo;m just restating something he already wrote but in my own words to try to understand it better. Hereafter, when I refer to his script, I mean the 42-line one.&lt;/p&gt;
&lt;p&gt;I took his advice of trying the script in an IPython (&lt;a href=&#34;https://www.bobdc.com/blog/sparql-in-a-jupyter-aka-ipytho&#34;&gt;Jupyter&lt;/a&gt;) notebook, where it was a lot easier to change some numbers (for example, the number of iterations in the main &lt;code&gt;for&lt;/code&gt; loop) and to add print statements that told me more about what was happening to the variables through the training step iterations. After playing with this a bit and reviewing his piece again, I realized that many of my experiments were things that he suggests in his bulleted list that begins with &amp;ldquo;Compare l1 after the first iteration and after the last iteration.&amp;rdquo; That whole list is good advice for learning more about how the script works.&lt;/p&gt;
&lt;p&gt;Beneath his script and above his line-by-line description he includes a chart explaining each variable&amp;rsquo;s role. As you read through the line-by-line description, I encourage you to refer back to that chart often.&lt;/p&gt;
&lt;p&gt;I have minimal experience with the numpy library, but based on the functions from Andrew&amp;rsquo;s script that I looked up, it seems typical that if you take a numpy function that does something to a number and pass it a data structure such as an array or matrix filled with numbers, it will do that thing to all the numbers and return the data structure.&lt;/p&gt;
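&lt;p&gt;A quick illustration of that behavior with &lt;code&gt;np.exp()&lt;/code&gt;:&lt;/p&gt;

```python
import numpy as np

# Passed a single number, np.exp() returns a single number...
print(np.exp(0))   # 1.0
# ...passed an array, it exponentiates every element and
# returns an array of the same shape.
print(np.exp(np.array([0.0, 1.0])))
```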
&lt;p&gt;Line 23 of Andrew&amp;rsquo;s script initializes the weights that tell the neural net how much attention to pay to the input at each neuron. Ultimately, a neural net&amp;rsquo;s job is to tune these weights based on what it sees in how input (in this script&amp;rsquo;s case, the rows of &lt;code&gt;X&lt;/code&gt;) corresponds to output (the values of &lt;code&gt;y&lt;/code&gt;) so that when it later sees new input it will hopefully output the right things. When this script starts, it has no idea what values to use as weights, so it puts random values in, but not completely random&amp;ndash;as Andrew writes, they should have a mean of 0. The &lt;code&gt;np.random.random((x,y))&lt;/code&gt; function returns a matrix of &lt;code&gt;x&lt;/code&gt; rows of &lt;code&gt;y&lt;/code&gt; random numbers between 0 and 1, so &lt;code&gt;2*np.random.random((3,1))&lt;/code&gt; returns 3 rows with 1 number each between 0 and 2, and the &amp;ldquo;- 1&amp;rdquo; added to that makes them random numbers between -1 and 1.&lt;/p&gt;
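&lt;p&gt;A quick check of that arithmetic (seeding the random number generator so the draws are repeatable):&lt;/p&gt;

```python
import numpy as np

np.random.seed(1)                 # fixed seed for repeatable draws
raw = np.random.random((3, 1))    # 3 rows of 1 number each, in [0, 1)
syn0 = 2 * raw - 1                # scaled to [-1, 1), mean 0
print(syn0.shape)                 # (3, 1)
print(((syn0 >= -1) & (syn0 < 1)).all())  # True
```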
&lt;p&gt;&lt;code&gt;np.dot()&lt;/code&gt; returns dot products. I found the web page &lt;a href=&#34;https://www.mathsisfun.com/algebra/matrix-multiplying.html&#34;&gt;How to multiply matrices&lt;/a&gt; (that is, how to find their dot product) helpful in reviewing something I hadn&amp;rsquo;t thought about in a while. You can reproduce that page&amp;rsquo;s &amp;ldquo;Multiplying a Matrix by a Matrix&amp;rdquo; example using numpy with this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np

matrix1 = np.array([[1,2,3],[4,5,6]])
matrix2 = np.array([[7,8],[9,10],[11,12]])
print(np.dot(matrix1,matrix2))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The four lines of code in Andrew&amp;rsquo;s main loop perform three tasks:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;predict the output based on the input (&lt;code&gt;l0&lt;/code&gt;) and the current set of weights (&lt;code&gt;syn0&lt;/code&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;check how far off the predictions were&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;use that information to update the weights before proceeding to the next iteration&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you increase the number of iterations, you&amp;rsquo;ll see that first step get closer and closer to predicting an output of [[0][0][1][1]] in its final passes.&lt;/p&gt;
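&lt;p&gt;Condensing the script down to just those three tasks (my own abbreviated restatement of Andrew&amp;rsquo;s code, with a fixed random seed so the run is repeatable) looks like this:&lt;/p&gt;

```python
import numpy as np

def nonlin(x, deriv=False):
    # the sigmoid function, or its slope at x when deriv=True
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

X = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])  # input rows
y = np.array([[0,0,1,1]]).T                         # desired outputs
np.random.seed(1)
syn0 = 2 * np.random.random((3,1)) - 1              # random weights, mean 0

for _ in range(10000):
    l0 = X
    l1 = nonlin(np.dot(l0, syn0))            # 1. predict
    l1_error = y - l1                        # 2. how far off were we?
    l1_delta = l1_error * nonlin(l1, True)   # weight the error by confidence
    syn0 += np.dot(l0.T, l1_delta)           # 3. update the weights

print(np.round(l1).ravel())                  # [0. 0. 1. 1.]
```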
&lt;p&gt;Line 29 does its prediction by calculating the dot product of the input and the weights and then passing the result (a 4 x 1 matrix like [[-4.98467345] [-5.19108471] [ 5.39603866] [ 5.1896274 ]], as I learned from one of those extra &lt;code&gt;print&lt;/code&gt; statements I mentioned) to the sigmoid function named &lt;code&gt;nonlin()&lt;/code&gt; that is defined at the beginning of the script. If you graphed the values potentially returned by this function, they would not fall in a line (it&amp;rsquo;s &amp;ldquo;&lt;strong&gt;nonlin&lt;/strong&gt;ear&amp;rdquo;) but along an S (sigmoid) curve. Looking at the &lt;a href=&#34;https://en.wikipedia.org/wiki/Sigmoid_function&#34;&gt;Sigmoid function Wikipedia page&lt;/a&gt; shows that the expression &lt;code&gt;1/(1+np.exp(-x))&lt;/code&gt; that Andrew&amp;rsquo;s &lt;code&gt;nonlin()&lt;/code&gt; function uses to calculate the function&amp;rsquo;s return value (if the optional &lt;code&gt;deriv&lt;/code&gt; parameter has a value of False) corresponds to the formula shown near the top of the Wikipedia page. This &lt;code&gt;nonlin()&lt;/code&gt; function takes any number and returns a number between 0 and 1; as Andrew writes, &amp;ldquo;We use it to convert numbers to probabilities.&amp;rdquo; For example, if you pass a 0 to the function (or look at an S curve graph) you&amp;rsquo;ll see that the function returns .5; if you pass it a 4 or higher it returns a number very close to 1, and if you pass it a -4 or lower it returns a number very close to 0. The &lt;a href=&#34;https://docs.scipy.org/doc/numpy/reference/generated/numpy.exp.html&#34;&gt;&lt;code&gt;np.exp()&lt;/code&gt;&lt;/a&gt; function used within that expression calculates the exponential of the passed value&amp;ndash;or all the values in an array or matrix, returning the same data structure. 
For example, &lt;code&gt;np.exp(1)&lt;/code&gt; returns &lt;code&gt;e&lt;/code&gt;, the base of the &lt;a href=&#34;https://en.wikipedia.org/wiki/Natural_logarithm&#34;&gt;natural logarithm&lt;/a&gt;, which is about 2.718.&lt;/p&gt;
&lt;p&gt;Line 29 calls that function and stores the returned matrix in the &lt;code&gt;l1&lt;/code&gt; variable. Reviewing the variable chart, this is the &amp;ldquo;Second Layer of the Network, otherwise known as the hidden layer.&amp;rdquo; Line 32 then subtracts the &lt;code&gt;l1&lt;/code&gt; matrix from &lt;code&gt;y&lt;/code&gt; (the array of answers that it was hoping to get) and stores the difference in &lt;code&gt;l1_error&lt;/code&gt;. (Subtracting matrices follows the basic pattern of &lt;code&gt;np.array([[5],[4],[3]]) - np.array([[1],[1],[1]]) = np.array([[4],[3],[2]])&lt;/code&gt;.)&lt;/p&gt;
&lt;p&gt;Remember how line 23 assigned random values to the weights? After line 32 executes, the &lt;code&gt;l1_error&lt;/code&gt; matrix has clues about how to tune those weights, so as the comments in lines 34 and 35 say, the script multiplies how much it missed (&lt;code&gt;l1_error&lt;/code&gt;) by the slope of the sigmoid at the values in &lt;code&gt;l1&lt;/code&gt;. We find that slope by passing &lt;code&gt;l1&lt;/code&gt; to the same &lt;code&gt;nonlin()&lt;/code&gt; function, but this time, setting the &lt;code&gt;deriv&lt;/code&gt; parameter to &lt;code&gt;True&lt;/code&gt; to get that slope. (See &amp;ldquo;using the derivatives&amp;rdquo; in Kurenkov&amp;rsquo;s &lt;a href=&#34;http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning/&#34;&gt;A &amp;lsquo;Brief&amp;rsquo; History&lt;/a&gt; for an explanation of why derivatives played such a big role in helping neural nets move beyond the simple perceptron models.) As Andrew writes, &amp;ldquo;When we multiply the &amp;lsquo;slopes&amp;rsquo; by the error, we are &lt;strong&gt;reducing the error of high confidence predictions&lt;/strong&gt;&amp;rdquo; (his emphasis). In other words, we&amp;rsquo;re putting more faith in those high confidence predictions when we create the data that will be used to update the weights.&lt;/p&gt;
&lt;p&gt;The script stores the result of multiplying the error by the slope in the &lt;code&gt;l1_delta&lt;/code&gt; variable and then uses the dot product of that and &lt;code&gt;l0&lt;/code&gt; (from the variable table: &amp;ldquo;First Layer of the Network, specified by the input data&amp;rdquo;) to update the weights stored in &lt;code&gt;syn0&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Per Harald Borgen&amp;rsquo;s &lt;a href=&#34;https://medium.com/learning-new-stuff/how-to-learn-neural-networks-758b78f2736e#.qkx5pzw2b&#34;&gt;Learning How To Code Neural Networks&lt;/a&gt; (which begins with an excellent description of the relationship of a neuron&amp;rsquo;s inputs to its weights and goes on to talk about how useful Andrew&amp;rsquo;s &amp;ldquo;A Neural Network in 11 lines of Python&amp;rdquo; is) says that backpropagation &amp;ldquo;essentially means that you look at how wrong the network guessed, and then adjust the networks weights accordingly.&amp;rdquo; When someone on Quora &lt;a href=&#34;https://www.quora.com/Which-is-your-favorite-Machine-Learning-Algorithm&#34;&gt;asked Yann LeCun&lt;/a&gt; (director of AI research at Facebook and one of the &lt;a href=&#34;http://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf&#34;&gt;Three Kings&lt;/a&gt; of Deep Learning) &amp;ldquo;Which is your favorite Machine Learning Algorithm?&amp;rdquo; his answer was a single eight-letter word: &amp;ldquo;backprop.&amp;rdquo; Backpropagation is that important to why neural nets have become so fundamental in so many modern computer applications, so the updating of &lt;code&gt;syn0&lt;/code&gt; in line 39 is crucial here.&lt;/p&gt;
&lt;p&gt;And that&amp;rsquo;s it for the neural net training code. After the first iteration, the weighting values in &lt;code&gt;syn0&lt;/code&gt; will be a bit less random, and after 9,999 more iterations, they&amp;rsquo;ll be a lot closer to where you want them. I found that adding the following lines after line 29 gave me a better idea of what was happening in the &lt;code&gt;l1&lt;/code&gt; variable at the beginning and end of the script&amp;rsquo;s execution:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   if (iter &amp;lt; 4 or iter &amp;gt; 9997):
        print(&amp;quot;np.dot(l0,syn0) at iteration &amp;quot; + str(iter) + &amp;quot;: &amp;quot; + str(np.dot(l0,syn0)))
        print(&amp;quot;l1 = &amp;quot; + str(l1))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(One note for people using Python 3, like I did: in addition to adding the parentheses in calls to the &lt;code&gt;print&lt;/code&gt; function, the main &lt;code&gt;for&lt;/code&gt; loop had to say just &amp;ldquo;range&amp;rdquo; instead of &amp;ldquo;xrange&amp;rdquo;. More on this at &lt;a href=&#34;http://stackoverflow.com/questions/15014310/why-is-there-no-xrange-function-in-python3&#34;&gt;stackoverflow&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;These new lines showed that after the second iteration, &lt;code&gt;l1&lt;/code&gt; had these values, rounded to two decimal places here: [[ 0.26] [ 0.36] [ 0.23] [ 0.32 ]]. As Andrew&amp;rsquo;s output shows, at the very end, &lt;code&gt;l1&lt;/code&gt; equals [[ 0.00966449] [ 0.00786506] [ 0.99358898] [ 0.99211957]], so it got a lot closer to the [0,0,1,1] that it was shooting for. How can you make it get even closer? By increasing the iteration count to be greater than 10,000.&lt;/p&gt;
&lt;p&gt;For some real fun, I added the following after the script&amp;rsquo;s last line, because if you&amp;rsquo;re going to train a neural net on some data, why not then try the trained network (that is, the set of tuned weights) on some other data to see how well it performs? After all, Andrew does write &amp;ldquo;All of the learning is stored in the syn0 matrix.&amp;rdquo;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;X1 = np.array([ [0,1,1], [1,1,0], [1,0,1],[1,1,1] ])  
x1prediction = nonlin(np.dot(X1,syn0))
print(x1prediction)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first two rows of my new input are different from those in the training data. The &lt;code&gt;x1prediction&lt;/code&gt; variable ended up as [[ 0.00786466] [ 0.9999225 ] [ 0.99358931] [ 0.99211997]], which was great to see. Rounded, these are 0, 1, 1, and 1, so the neural net knew that for those first two rows of data&amp;ndash;which it hadn&amp;rsquo;t seen before&amp;ndash;the output should be the first value from each.&lt;/p&gt;
&lt;p&gt;Everything I describe here is from part 1 of Andrew&amp;rsquo;s exposition, &amp;ldquo;A Tiny Toy Network.&amp;rdquo; Part 2, &amp;ldquo;A Slightly Harder Problem&amp;rdquo; has a script that is eight lines longer (four lines if you don&amp;rsquo;t count white space and comments) and I plan to dig into that next, because among other things, it has a more explicit demo of backpropagation.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Image courtesy of &lt;a href=&#34;https://en.wikipedia.org/wiki/Perceptron&#34;&gt;Wikipedia&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2016">2016</category>
      
      <category domain="https://www.bobdc.com//categories/ai-and-machine-learning">AI and machine learning</category>
      
    </item>
    
    <item>
      <title>Pulling RDF out of MySQL</title>
      <link>https://www.bobdc.com/blog/pulling-rdf-out-of-mysql/</link>
      <pubDate>Sun, 13 Nov 2016 10:09:53 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/pulling-rdf-out-of-mysql/</guid>
      
      
      <description><div>With a command line option and a very short stylesheet.</div><div>&lt;img id=&#34;idm45376093529616&#34; src=&#34;https://www.bobdc.com/img/main/mysqlrdflogos.png&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; alt=&#34;MySQL and RDF logos&#34; width=&#34;120&#34;/&gt;
&lt;p&gt;When I wrote the blog posting &lt;a href=&#34;https://www.bobdc.com/blog/my-sql-quick-reference&#34;&gt;My SQL quick reference&lt;/a&gt; last month, I showed how you can pass an SQL query to MySQL from the operating system command line when starting up MySQL, and also how adding a &lt;code&gt;-B&lt;/code&gt; switch requests a tab-separated version of the data. I did not mention that &lt;code&gt;-X&lt;/code&gt; requests it in XML, and that this XML is simple enough that a fifteen-line XSLT 1.0 stylesheet can convert any such output to RDF.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve written before about how tools like the open source &lt;a href=&#34;http://d2rq.org/&#34;&gt;D2RQ&lt;/a&gt; and Capsenta&amp;rsquo;s &lt;a href=&#34;https://capsenta.com/&#34;&gt;Ultrawrap&lt;/a&gt; provide middleware layers that let you send SPARQL queries to relational databases&amp;ndash;and to &lt;a href=&#34;http://online.liebertpub.com/doi/full/10.1089/big.2012.0004#_i4&#34;&gt;combinations of relational databases from different vendors&lt;/a&gt;, which is where the real fun begins. This command line stylesheet trick gives you a simpler, more lightweight way to pull the relational data you want into an RDF file where you can use it with SPARQL or any other RDF tool.&lt;/p&gt;
&lt;p&gt;If you have MySQL and &lt;a href=&#34;https://linux.die.net/man/1/xsltproc&#34;&gt;xsltproc&lt;/a&gt; installed, you can do it all with a single command at the operating system prompt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mysql -u someuser --password=someuserpw -X -e &#39;USE employees; SELECT * FROM employees LIMIT 5&#39; | xsltproc mysql2ttl.xsl -
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Two notes about that command line: 1. don&amp;rsquo;t miss that hyphen at the very end, which tells xsltproc to read from standard in. 2. I added the LIMIT part for faster testing because the &lt;code&gt;employees&lt;/code&gt; table has 30,024 rows. To come up with that number of 30,024, I had to look at my last blog entry to remember how to count the table&amp;rsquo;s rows, so writing out that quick reference has already paid off for me.) The XML returned by MySQL looks like this, with data from subsequent rows following a similar pattern:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  &amp;lt;resultset statement=&amp;quot;SELECT * FROM employees LIMIT 5&amp;quot;
      xmlns:xsi=&amp;quot;http://www.w3.org/2001/XMLSchema-instance&amp;quot;&amp;gt;
  &amp;lt;row&amp;gt;
    &amp;lt;field name=&amp;quot;emp_no&amp;quot;&amp;gt;10001&amp;lt;/field&amp;gt;
    &amp;lt;field name=&amp;quot;first_name&amp;quot;&amp;gt;Georgi&amp;lt;/field&amp;gt;
    &amp;lt;field name=&amp;quot;last_name&amp;quot;&amp;gt;Facello&amp;lt;/field&amp;gt;
    &amp;lt;field name=&amp;quot;birth_date&amp;quot;&amp;gt;1953-09-02&amp;lt;/field&amp;gt;
    &amp;lt;field name=&amp;quot;gender&amp;quot;&amp;gt;M&amp;lt;/field&amp;gt;
    &amp;lt;field name=&amp;quot;hire_date&amp;quot;&amp;gt;1986-06-26&amp;lt;/field&amp;gt;
    &amp;lt;field name=&amp;quot;department&amp;quot;&amp;gt;Development&amp;lt;/field&amp;gt;
  &amp;lt;/row&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I thought the inclusion of the query as an attribute of the &lt;code&gt;resultset&lt;/code&gt; element was a nice touch. The following XSLT stylesheet converts any such XML to Turtle RDF; you&amp;rsquo;ll want to adjust the prefix declarations to use URIs more appropriate to your data:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&amp;quot;1.0&amp;quot; xmlns:xsl=&amp;quot;http://www.w3.org/1999/XSL/Transform&amp;quot;&amp;gt;


&amp;lt;xsl:output method=&amp;quot;text&amp;quot;/&amp;gt;


&amp;lt;xsl:template match=&amp;quot;resultset&amp;quot;&amp;gt;
  @prefix v: &amp;lt;http://learningsparql.com/ns/myVocabURI/&amp;gt; . 
  @prefix d: &amp;lt;http://learningsparql.com/ns/myDataURI/&amp;gt; . 
      &amp;lt;xsl:apply-templates/&amp;gt;
    &amp;lt;/xsl:template&amp;gt;


        &amp;lt;xsl:template match=&amp;quot;row&amp;quot;&amp;gt;
d:&amp;lt;xsl:value-of select=&amp;quot;count(preceding-sibling::row) + 1&amp;quot;/&amp;gt; 
          &amp;lt;xsl:apply-templates/&amp;gt; . 
        &amp;lt;/xsl:template&amp;gt;


    &amp;lt;xsl:template match=&amp;quot;field&amp;quot;&amp;gt;
      v:&amp;lt;xsl:value-of select=&amp;quot;@name&amp;quot;/&amp;gt; &amp;quot;&amp;lt;xsl:value-of select=&amp;quot;.&amp;quot;/&amp;gt;&amp;quot; ;
    &amp;lt;/xsl:template&amp;gt;


&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
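&lt;p&gt;If xsltproc isn&amp;rsquo;t handy, the same transformation is simple enough to sketch with Python&amp;rsquo;s standard library (the function name is my own, and the prefix URIs are the same placeholders used in the stylesheet):&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

def mysql_xml_to_turtle(xml_text):
    """Mimic the stylesheet: one d:N subject per row element,
    one v:fieldname triple per field element."""
    lines = [
        "@prefix v: <http://learningsparql.com/ns/myVocabURI/> .",
        "@prefix d: <http://learningsparql.com/ns/myDataURI/> .",
    ]
    root = ET.fromstring(xml_text)
    for i, row in enumerate(root.findall("row"), start=1):
        lines.append("d:%d" % i)  # subject from the row's position
        for field in row.findall("field"):
            lines.append('  v:%s "%s" ;' % (field.get("name"), field.text))
        lines.append("  .")       # end this subject's triples
    return "\n".join(lines)

sample = """<resultset statement="SELECT * FROM employees LIMIT 1">
  <row>
    <field name="emp_no">10001</field>
    <field name="first_name">Georgi</field>
  </row>
</resultset>"""
print(mysql_xml_to_turtle(sample))
```

As with the stylesheet, a trailing semicolon before each period is legal Turtle, and real data would also need quote and backslash escaping in the literal values, which the stylesheet doesn&amp;rsquo;t do either.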
&lt;p&gt;The result includes some extra blank lines that I could suppress with &lt;code&gt;xsl:text&lt;/code&gt; elements wrapping certain bits of the stylesheet, but a Turtle parser doesn&amp;rsquo;t care, so neither do I:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  d:1

    
      v:emp_no &amp;quot;10001&amp;quot; ;

    

    
      v:first_name &amp;quot;Georgi&amp;quot; ;

    

    
      v:last_name &amp;quot;Facello&amp;quot; ;

    

    
      v:birth_date &amp;quot;1953-09-02&amp;quot; ;

    

    
      v:gender &amp;quot;M&amp;quot; ;

    

    
      v:hire_date &amp;quot;1986-06-26&amp;quot; ;

    

    
      v:department &amp;quot;Development&amp;quot; ;

    
   . 
&lt;/code&gt;&lt;/pre&gt;
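If you&amp;rsquo;d rather not run an XSLT processor, the same resultset/row/field walk can be sketched with Python&amp;rsquo;s standard library. This is my own illustrative sketch, not part of the original pipeline; the element and attribute names match the resultset XML shown above, and the prefix URIs are the placeholders from the stylesheet:

```python
import xml.etree.ElementTree as ET

def resultset_to_turtle(xml_text):
    """Convert resultset/row/field XML to Turtle, mirroring the XSLT above."""
    root = ET.fromstring(xml_text)
    lines = [
        "@prefix v: <http://learningsparql.com/ns/myVocabURI/> .",
        "@prefix d: <http://learningsparql.com/ns/myDataURI/> .",
    ]
    for i, row in enumerate(root.findall("row"), start=1):
        lines.append("d:%d" % i)  # subject built from the row's position
        for field in row.findall("field"):
            lines.append('    v:%s "%s" ;' % (field.get("name"), field.text or ""))
        lines.append("    .")
    return "\n".join(lines)

sample = """<resultset statement="SELECT * FROM employees LIMIT 1">
  <row>
    <field name="emp_no">10001</field>
    <field name="first_name">Georgi</field>
  </row>
</resultset>"""
print(resultset_to_turtle(sample))
```

Like the stylesheet output, the generated Turtle leans on the fact that a trailing semicolon before the closing period is legal.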
&lt;p&gt;You can customize the stylesheet for specific input data. For example, the URIs in your triple subjects could build on an ID value selected from the data instead of building on the position of the XML &lt;code&gt;row&lt;/code&gt; element, as I did. As another customization, instead of outputting all triple objects as strings, you could insert this template rule into the XSLT stylesheet to output the two date fields typed as actual dates, as long as you remembered to also add an &lt;code&gt;xsd&lt;/code&gt; prefix declaration at the top of the stylesheet:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    &amp;lt;xsl:template match=&amp;quot;field[@name=&#39;birth_date&#39; or @name=&#39;hire_date&#39;]&amp;quot;&amp;gt;
      v:&amp;lt;xsl:value-of select=&amp;quot;@name&amp;quot;/&amp;gt; &amp;quot;&amp;lt;xsl:value-of select=&amp;quot;.&amp;quot;/&amp;gt;&amp;quot;^^xsd:date ;
    &amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or, you could leave the XSLT stylesheet in its generic form and convert the data types using a SPARQL query further down your processing pipeline with something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX v: &amp;lt;http://learningsparql.com/ns/myVocabURI/&amp;gt; 
PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;


CONSTRUCT {
  ?row v:birth_date ?bdate ;
       v:hire_date ?hdate . 
}
WHERE {
  ?row v:birth_date ?bdateString ;
  v:hire_date ?hdateString . 
  BIND(xsd:date(?bdateString) AS ?bdate)
  BIND(xsd:date(?hdateString) AS ?hdate)
}
&lt;/code&gt;&lt;/pre&gt;
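A third option is to do the typing while generating the Turtle in the first place. A minimal sketch of that idea in Python (my own illustration, not from the original pipeline; the field names and the &lt;code&gt;xsd&lt;/code&gt; prefix are the ones used above):

```python
from datetime import date

def turtle_object(name, value):
    """Emit a Turtle object, typing the two date fields as xsd:date."""
    if name in ("birth_date", "hire_date"):
        date.fromisoformat(value)        # validate YYYY-MM-DD before typing it
        return '"%s"^^xsd:date' % value
    return '"%s"' % value                # everything else stays a plain string

print(turtle_object("hire_date", "1986-06-26"))  # "1986-06-26"^^xsd:date
print(turtle_object("gender", "M"))              # "M"
```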
&lt;p&gt;However you choose to do it, the nice thing is that you have lots of options for grabbing the massive amounts of data stored in the many MySQL databases out there and then using that data as triples with a variety of lightweight, open source software.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2016">2016</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>My SQL quick reference</title>
      <link>https://www.bobdc.com/blog/my-sql-quick-reference/</link>
      <pubDate>Sun, 30 Oct 2016 11:49:36 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/my-sql-quick-reference/</guid>
      
      
      <description><div>Pun intended.</div><div>&lt;p&gt;&lt;a href=&#34;https://www.flickr.com/photos/duncan/8749152201/in/photolist-6e4LKh-8ari4m-oGnN-um4W4-6z6vtc-4YqUM-ek8G8K-bPndYP-5fnqw-98uvoe-aiE4fq-62uR1H-5pMuiD-mrnJR-dc3BvJ-bPnpj2-7ARZNf-ynemY-e8UZD8-aiE4aN-4wcs8A-3nyAxk-doabzD-aiE4cQ-2erzo1-9rFiwj-9rFimf-9rCp5p-7bGugW-9rCndH-9LxX1-9rFmg3-9rCqw8-6nijTX-9rFkA7-7bCGsc-9rFkf1-9rFi6o-9rFnPN-7rGTAc-8uRMpb-9rFiHY-9rCpAB-9rFhCm-9rCrc8-9rFhRs-6nifBe-9rCnMD-6nnYzL-9rCpqe&#34;&gt;&lt;img id=&#34;idm46017889960000&#34; src=&#34;https://www.bobdc.com/img/main/sql.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;SQL graffiti&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I sometimes go many months with no need to use SQL, so over the years I&amp;rsquo;ve developed my own quick reference to remind me how to do basic tasks when necessary. Most SQL quick reference sheets out there try to pack as much different syntax as they can in a small space, but mine focuses on what the basic tasks are and how to do them. I hope that someone finds it useful.&lt;/p&gt;
&lt;p&gt;Most of my SQL experience has been with MySQL, and I separated what I believe are the standard SQL parts below from the MySQL-specific ones. Corrections welcome. If you really want to know where SQL implementations differ from the standard, &lt;a href=&#34;http://troels.arvin.dk/db/rdbms/&#34;&gt;Comparison of different SQL implementations&lt;/a&gt; is an excellent, detailed reference on what&amp;rsquo;s different from one implementation to another.&lt;/p&gt;
&lt;p&gt;I tested all the SELECT commands shown with the &lt;a href=&#34;https://dev.mysql.com/doc/employee/en/&#34;&gt;MySQL employee sample database&lt;/a&gt; that I downloaded from &lt;a href=&#34;https://github.com/datacharmer/test_db&#34;&gt;github&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;(I also later converted this to be the &lt;a href=&#34;https://learnxinyminutes.com/docs/sql/&#34;&gt;SQL&lt;/a&gt; page for the wonderful &lt;a href=&#34;https://learnxinyminutes.com/&#34;&gt;Learn X in Y Minutes&lt;/a&gt; site; that page has since been translated to Spanish, Italian, Russian, Turkish, and Chinese!)&lt;/p&gt;
&lt;h2 id=&#34;idm46017889953792&#34;&gt;Standard SQL&lt;/h2&gt;
&lt;p&gt;Enter these at the SQL command line. I don&amp;rsquo;t think semicolons are necessary after every one of these commands, but I find it simplest to just always add them. SQL is not case-sensitive about keywords, and I tend to enter them in lower-case, but I&amp;rsquo;m showing them in the conventional upper-case here because it makes it easier to distinguish them from database, table, and column names.&lt;/p&gt;
&lt;table id=&#34;idm46017889952592&#34; border=&#34;1&#34; style=&#34;border: 1px solid; border-collapse: collapse; border-spacing: 10px; text-align: left;&#34;&gt;
&lt;tr id=&#34;idm46017889951728&#34; &gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889951600&#34; width=&#34;30%&#34;&gt;quit to return to the operating system command line&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889950992&#34;&gt;&lt;tt id=&#34;idm46017889950816&#34;&gt;quit;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889950400&#34;&gt;&lt;td style=&#34;padding: 8px; background: white; &#34; id=&#34;idm46017889950272&#34;&gt;list available databases&lt;/td&gt;
&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889949840&#34;&gt;&lt;tt id=&#34;idm46017889949712&#34;&gt;# comments start with a pound sign&lt;/tt&gt;&lt;br id=&#34;idm46017889949360&#34;/&gt;
&lt;tt id=&#34;idm46017889949104&#34;&gt;SHOW DATABASES;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889948688&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889948560&#34;&gt;select the database named &lt;tt id=&#34;idm46017889948256&#34;&gt;employees&lt;/tt&gt; to use&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889947808&#34;&gt;&lt;tt id=&#34;idm46017889947680&#34;&gt;USE employees;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889947264&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889947136&#34;&gt;create a new database called &lt;tt id=&#34;idm46017889946832&#34;&gt;someDatabase&lt;/tt&gt;&lt;/td&gt;
&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889946416&#34;&gt;&lt;tt id=&#34;idm46017889946288&#34;&gt;# database and table names are case-sensitive&lt;/tt&gt;&lt;br id=&#34;idm46017889945968&#34;/&gt;
&lt;tt id=&#34;idm46017889945712&#34;&gt;CREATE DATABASE someDatabase;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889945280&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889945152&#34;&gt;delete database &lt;tt id=&#34;idm46017889944864&#34;&gt;someDatabase&lt;/tt&gt;&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889944576&#34;&gt;&lt;tt id=&#34;idm46017889944448&#34;&gt;DROP DATABASE someDatabase;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889944016&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889943888&#34;&gt;create a table called &lt;tt id=&#34;idm46017889943600&#34;&gt;tablename1&lt;/tt&gt;, with the two columns shown, for the database currently in use&lt;/td&gt;
&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889942976&#34;&gt;&lt;tt id=&#34;idm46017889942848&#34;&gt;# lots of other options available for how you specify the columns...&lt;/tt&gt;&lt;br id=&#34;idm46017889942512&#34;/&gt;
&lt;tt id=&#34;idm46017889942256&#34;&gt;CREATE TABLE  tablename1 (&#39;fname&#39; VARCHAR(20),&#39;lname&#39; VARCHAR(20));&lt;br/&gt;
# The apostrophes in the line above should be backticks (`). &lt;br/&gt;
# Hugo&#39;s rendering engine won&#39;t let me put them there. 
&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889941792&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889941664&#34;&gt;insert a row of data into the table &lt;tt id=&#34;idm46017889941360&#34;&gt;tablename1&lt;/tt&gt;&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889941072&#34;&gt;&lt;tt id=&#34;idm46017889940944&#34;&gt;INSERT INTO tablename1 VALUES(&#39;Richard&#39;,&#39;Mutt&#39;);&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889940496&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889940368&#34;&gt;delete the table &lt;tt id=&#34;idm46017889940080&#34;&gt;tablename1&lt;/tt&gt;&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889939792&#34;&gt;&lt;tt id=&#34;idm46017889939664&#34;&gt;DROP TABLE tablename1;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889939248&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889939120&#34;&gt;show all data in the &lt;tt id=&#34;idm46017889938832&#34;&gt;departments&lt;/tt&gt; table&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889938384&#34;&gt;&lt;tt id=&#34;idm46017889938256&#34;&gt;SELECT * FROM departments;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889937824&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889937696&#34;&gt;show just the &lt;code id=&#34;idm46017889937408&#34;&gt;dept_no&lt;/code&gt; and &lt;code id=&#34;idm46017889936960&#34;&gt;dept_name&lt;/code&gt; columns from the &lt;tt id=&#34;idm46017889936512&#34;&gt;departments&lt;/tt&gt; table&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889936064&#34;&gt;&lt;tt id=&#34;idm46017889935936&#34;&gt;SELECT dept_no, dept_name FROM departments;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889935488&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889935360&#34;&gt;just get the first 5 rows from table &lt;tt id=&#34;idm46017889935056&#34;&gt;departments&lt;/tt&gt;&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889934768&#34;&gt;&lt;tt id=&#34;idm46017889934640&#34;&gt;SELECT * FROM departments LIMIT 5;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889934208&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889934080&#34;&gt;show dept_name column values in table &lt;tt id=&#34;idm46017889933776&#34;&gt;departments&lt;/tt&gt; where dept_name has the substring &#34;en&#34;&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889933312&#34;&gt;&lt;tt id=&#34;idm46017889933184&#34;&gt;SELECT dept_name FROM departments WHERE dept_name LIKE &#34;%en%&#34;;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889932336&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889932208&#34;&gt;show all columns from table &lt;tt id=&#34;idm46017889931952&#34;&gt;departments&lt;/tt&gt; where the dept_name column starts with an &#34;S&#34; and has exactly 4  characters after it&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889931440&#34;&gt;&lt;tt id=&#34;idm46017889931312&#34;&gt;SELECT * FROM departments WHERE dept_name LIKE &#34;S____&#34;;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889930864&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889930736&#34;&gt;Select &lt;tt id=&#34;idm46017889930448&#34;&gt;title&lt;/tt&gt; values from the &lt;tt id=&#34;idm46017889930000&#34;&gt;titles&lt;/tt&gt; table but don&#39;t show duplicates&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889929536&#34;&gt;&lt;tt id=&#34;idm46017889929408&#34;&gt;SELECT DISTINCT title FROM titles;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889928976&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889928848&#34;&gt;Same as above, but sorted (case-sensitive) by the &lt;tt id=&#34;idm46017889928528&#34;&gt;title&lt;/tt&gt; values&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889928080&#34;&gt;&lt;tt id=&#34;idm46017889927952&#34;&gt;SELECT DISTINCT title FROM titles ORDER BY title;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889927504&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889927376&#34;&gt;Count the rows in the &lt;tt id=&#34;idm46017889927088&#34;&gt;departments&lt;/tt&gt; table&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889926640&#34;&gt;&lt;tt id=&#34;idm46017889926512&#34;&gt;SELECT count(*) FROM departments;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889926080&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889925952&#34;&gt;Count the rows in the &lt;tt id=&#34;idm46017889925664&#34;&gt;departments&lt;/tt&gt; table that have &#34;en&#34; as a substring of the &lt;tt id=&#34;idm46017889925184&#34;&gt;dept_name&lt;/tt&gt; value&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889924736&#34;&gt;&lt;tt id=&#34;idm46017889924608&#34;&gt;SELECT count(*) FROM departments WHERE dept_name LIKE &#34;%en%&#34;;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889924144&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889924016&#34;&gt;In &lt;tt id=&#34;idm46017889923712&#34;&gt;tablename1&lt;/tt&gt;, change the &lt;code id=&#34;idm46017889923264&#34;&gt;fname&lt;/code&gt; value to &#34;John&#34; for all rows that have an &lt;code id=&#34;idm46017889922784&#34;&gt;lname&lt;/code&gt; value of &#34;Mutt&#34;&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889922336&#34;&gt;&lt;tt id=&#34;idm46017889922208&#34;&gt;UPDATE tablename1 SET fname=&#34;John&#34; WHERE lname=&#34;Mutt&#34;;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889921760&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889921632&#34;&gt;delete all rows from the &lt;tt id=&#34;idm46017889921328&#34;&gt;tablename1&lt;/tt&gt; table&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889920880&#34;&gt;&lt;tt id=&#34;idm46017889920752&#34;&gt;DELETE FROM tablename1;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889920336&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889920208&#34;&gt;delete rows from the &lt;tt id=&#34;idm46017889919920&#34;&gt;tablename1&lt;/tt&gt; table where the &lt;tt id=&#34;idm46017889919472&#34;&gt;lname&lt;/tt&gt; value begins with &#34;M&#34;&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889919024&#34;&gt;&lt;tt id=&#34;idm46017889918896&#34;&gt;DELETE FROM tablename1 WHERE lname like &#34;M%&#34;;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
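If you want to try the standard data-manipulation commands above without a MySQL server handy, Python&amp;rsquo;s built-in sqlite3 module accepts most of the same syntax (server-oriented commands such as SHOW DATABASES and USE are not part of it):

```python
import sqlite3

# In-memory database; no server or login required.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tablename1 (fname VARCHAR(20), lname VARCHAR(20))")
cur.execute("INSERT INTO tablename1 VALUES('Richard','Mutt')")
cur.execute("UPDATE tablename1 SET fname='John' WHERE lname='Mutt'")
print(cur.execute("SELECT * FROM tablename1").fetchall())  # [('John', 'Mutt')]
cur.execute("DELETE FROM tablename1 WHERE lname LIKE 'M%'")
print(cur.execute("SELECT count(*) FROM tablename1").fetchall())  # [(0,)]
```

Note that sqlite prefers single quotes around string literals, where the MySQL examples above use double quotes.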
&lt;h2 id=&#34;idm46017889918320&#34;&gt;MySQL-specific SQL prompt commands&lt;/h2&gt;
&lt;table id=&#34;idm46017889917888&#34; border=&#34;1&#34; style=&#34;border: 1px solid; border-collapse: collapse; border-spacing: 10px; text-align: left;&#34;&gt;
&lt;tr id=&#34;idm46017889917072&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889916944&#34; width=&#34;30%&#34;&gt;list the tables in the currently selected database&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889916384&#34;&gt;&lt;tt id=&#34;idm46017889916256&#34;&gt;SHOW TABLES;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889915840&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889915712&#34;&gt;Describe the columns in table &lt;tt id=&#34;idm46017889915408&#34;&gt;departments&lt;/tt&gt; (handy before doing SELECT statements to see column names and types)&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889914912&#34;&gt;&lt;tt id=&#34;idm46017889914784&#34;&gt;DESCRIBE departments;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889914368&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889914240&#34;&gt;run the SQL commands stored in the file myscript.sql&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889913920&#34;&gt;&lt;tt id=&#34;idm46017889913792&#34;&gt;SOURCE myscript.sql;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889913376&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889913248&#34;&gt;Load a local csv file (enabling this may require  &lt;tt id=&#34;idm46017889912928&#34;&gt;--local-infile&lt;/tt&gt; with the mysql startup command or the adjustment of a config file)&lt;/td&gt;
&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889912304&#34;&gt;&lt;tt id=&#34;idm46017889912176&#34;&gt;# Enter the following as one command&lt;br id=&#34;idm46017889911872&#34;/&gt;
LOAD DATA LOCAL INFILE &#39;/some/path/names.csv&#39; INTO TABLE tablename1 COLUMNS TERMINATED BY &#39;,&#39;;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889911376&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889911248&#34;&gt;Create new user &lt;tt id=&#34;idm46017889910960&#34;&gt;jane&lt;/tt&gt; with password &lt;tt id=&#34;idm46017889910512&#34;&gt;janepw&lt;/tt&gt;, then grant her access to everything&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889910048&#34;&gt;&lt;tt id=&#34;idm46017889909920&#34;&gt;CREATE USER &#39;jane&#39; IDENTIFIED BY &#39;janepw&#39;;&lt;br id=&#34;idm46017889909600&#34;/&gt;
GRANT ALL ON *.* TO &#39;jane&#39;;
&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h2 id=&#34;idm46017889909040&#34;&gt;Handy MySQL commands from the operating system prompt&lt;/h2&gt;
&lt;p&gt;There are often multiple ways to execute some of the following tasks, but these work for me. Treat all as single-line commands.&lt;/p&gt;
&lt;table id=&#34;idm46017889908064&#34; border=&#34;1&#34; style=&#34;border: 1px solid; border-collapse: collapse; border-spacing: 10px; text-align: left;&#34;&gt;
&lt;tr id=&#34;idm46017889907248&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889907120&#34; width=&#34;30%&#34;&gt;start up MySQL with a single command (which includes the plain text password, which is not a good idea for any kind of production system)&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889906464&#34;&gt;&lt;tt id=&#34;idm46017889906336&#34;&gt;mysql -u someuser --password=somepassword&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
  &lt;tr id=&#34;idm46017889905888&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889905760&#34;&gt;Run a script of SQL commands from the operating system command line and then return to the command line; the -t option formats SELECT output as tables (without it, output is tab-delimited)&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889905328&#34;&gt;&lt;tt id=&#34;idm46017889905200&#34;&gt;mysql -u someuser --password=somepassword -t &amp;lt; employees.sql&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
  &lt;tr id=&#34;idm46017889904592&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889904464&#34;&gt;create a file of SQL commands to recreate the database employees (with the employees demo database, this created a 168MB file)&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889904064&#34;&gt;&lt;tt id=&#34;idm46017889903936&#34;&gt;mysqldump -u someuser --password=somepassword employees &amp;gt; makeemployees.sql&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889903456&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889903328&#34;&gt;Run a SQL command (or more than one using a semicolon to separate them) from the operating system prompt&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889902944&#34;&gt;&lt;tt id=&#34;idm46017889902816&#34;&gt;mysql -u someuser --password=somepassword -e &#39;USE employees; SELECT * FROM departments&#39;
&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;idm46017889902320&#34;&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889902192&#34;&gt;Same as above, but getting output as tab-separated values--only difference is to add &lt;tt id=&#34;idm46017889901744&#34;&gt;-B&lt;/tt&gt; for &#34;batch&#34; mode&lt;/td&gt;&lt;td style=&#34;padding: 8px; background: white;&#34; id=&#34;idm46017889901280&#34;&gt;&lt;tt id=&#34;idm46017889901152&#34;&gt;mysql -u someuser --password=somepassword -B -e &#39;USE employees; SELECT  * FROM departments&#39;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;h2 id=&#34;idm46017889900144&#34;&gt;Other handy tricks, as covered in the MySQL documentation&lt;/h2&gt;
&lt;p&gt;The MySQL documentation&amp;rsquo;s &lt;a href=&#34;http://dev.mysql.com/doc/refman/en/examples.html&#34;&gt;Examples of Common Queries&lt;/a&gt; covers many additional useful tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Maximum Value for a Column&lt;/li&gt;
&lt;li&gt;The Row Holding the Maximum of a Certain Column&lt;/li&gt;
&lt;li&gt;Maximum of Column per Group&lt;/li&gt;
&lt;li&gt;The Rows Holding the Group-wise Maximum of a Certain Column&lt;/li&gt;
&lt;li&gt;Using User-Defined Variables&lt;/li&gt;
&lt;li&gt;Using Foreign Keys&lt;/li&gt;
&lt;li&gt;Searching on Two Keys&lt;/li&gt;
&lt;li&gt;Calculating Visits Per Day&lt;/li&gt;
&lt;li&gt;Using AUTO_INCREMENT&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that my URL for the link to this information doesn&amp;rsquo;t include a version number, but gets redirected by mysql.com to the URL for the latest release&amp;rsquo;s version of this documentation, as documentation URLs should do.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://creativecommons.org/licenses/by-nc/2.0/&#34;&gt;CC BY-NC&lt;/a&gt; &lt;a href=&#34;https://www.flickr.com/photos/duncan/8749152201/in/photolist-6e4LKh-8ari4m-oGnN-um4W4-6z6vtc-4YqUM-ek8G8K-bPndYP-5fnqw-98uvoe-aiE4fq-62uR1H-5pMuiD-mrnJR-dc3BvJ-bPnpj2-7ARZNf-ynemY-e8UZD8-aiE4aN-4wcs8A-3nyAxk-doabzD-aiE4cQ-2erzo1-9rFiwj-9rFimf-9rCp5p-7bGugW-9rCndH-9LxX1-9rFmg3-9rCqw8-6nijTX-9rFkA7-7bCGsc-9rFkf1-9rFi6o-9rFnPN-7rGTAc-8uRMpb-9rFiHY-9rCpAB-9rFhCm-9rCrc8-9rFhRs-6nifBe-9rCnMD-6nnYzL-9rCpqe&#34;&gt;photo&lt;/a&gt; by &lt;a href=&#34;https://www.flickr.com/photos/duncan/&#34;&gt;duncan&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2016">2016</category>
      
      <category domain="https://www.bobdc.com//categories/data-science">data science</category>
      
    </item>
    
    <item>
      <title>Semantic web semantics vs. vector embedding machine learning semantics</title>
      <link>https://www.bobdc.com/blog/semantic-web-semantics-vs-vect/</link>
      <pubDate>Sun, 25 Sep 2016 11:01:39 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-semantics-vs-vect/</guid>
      
      
      <description><div>It&#39;s all semantics.</div><div>&lt;img id=&#34;idm140335378958656&#34; src=&#34;https://www.bobdc.com/img/main/homersemantics.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Home and semantics&#34; width=&#34;300&#34;/&gt;
&lt;p&gt;When I presented &amp;ldquo;intro to the semantic web&amp;rdquo; slides in &lt;a href=&#34;http://www.topquadrant.com&#34;&gt;TopQuadrant&lt;/a&gt; product training classes, I described how people talking about &amp;ldquo;semantics&amp;rdquo; in the context of semantic web technology mean something specific, but that other claims for computerized semantics (especially, in many cases, &amp;ldquo;semantic search&amp;rdquo;) were often vague attempts to use the word as a marketing term. Since joining &lt;a href=&#34;http://www.ccri.com&#34;&gt;CCRi&lt;/a&gt;, though, I&amp;rsquo;ve learned plenty about machine learning applications that use semantics to get real work done (often, &amp;ldquo;semantic search&amp;rdquo;), and they can do some great things.&lt;/p&gt;
&lt;h2 id=&#34;idm140335378953440&#34;&gt;Semantic Web semantics&lt;/h2&gt;
&lt;p&gt;To review the semantic web sense of &amp;ldquo;semantics&amp;rdquo;: RDF gives us a way to state facts using {subject, predicate, object} triples. RDFS and OWL give us vocabularies to describe the resources referenced in these triples, and the descriptions can record semantics about those resources that let us get more out of the data. Of course, the descriptions themselves are triples, letting us say things like &lt;code&gt;{ex:Employee rdfs:subClassOf ex:Person}&lt;/code&gt;, which tells us that any instance of the &lt;code&gt;ex:Employee&lt;/code&gt; class is also an instance of &lt;code&gt;ex:Person&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;That example indicates some of the semantics of what it means to be an employee, but people familiar with object-oriented development take that ability for granted. OWL can take the recording of semantics well beyond that. For example, because properties themselves are resources, when I say &lt;code&gt;{dm:locatedIn  rdf:type owl:TransitiveProperty}&lt;/code&gt;, I&amp;rsquo;m encoding some of the meaning of the &lt;code&gt;dm:locatedIn&lt;/code&gt; property in a machine-readable way: I&amp;rsquo;m saying that it&amp;rsquo;s transitive, so that if &lt;code&gt;{x:resource1 dm:locatedIn x:resource2}&lt;/code&gt; and &lt;code&gt;{x:resource2 dm:locatedIn x:resource3}&lt;/code&gt;, we can infer that &lt;code&gt;{x:resource1 dm:locatedIn x:resource3}&lt;/code&gt;.&lt;/p&gt;
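That inference pattern is simple enough to sketch in a few lines of Python. This is only my illustration of what &lt;code&gt;owl:TransitiveProperty&lt;/code&gt; entails, using the toy triples from the paragraph above, and not how a real reasoner implements it:

```python
# Toy triples from the dm:locatedIn example; tuples stand in for RDF triples.
triples = {
    ("x:resource1", "dm:locatedIn", "x:resource2"),
    ("x:resource2", "dm:locatedIn", "x:resource3"),
}

def transitive_closure(triples, prop):
    """Add every triple that declaring prop transitive entails."""
    inferred = set(triples)
    changed = True
    while changed:                       # repeat until no new triples appear
        changed = False
        for s, p, o in list(inferred):
            for s2, p2, o2 in list(inferred):
                if p == p2 == prop and o == s2 and (s, prop, o2) not in inferred:
                    inferred.add((s, prop, o2))
                    changed = True
    return inferred

print(("x:resource1", "dm:locatedIn", "x:resource3") in
      transitive_closure(triples, "dm:locatedIn"))  # True
```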
&lt;p&gt;A tool that understands what &lt;code&gt;owl:TransitiveProperty&lt;/code&gt; means will let me get more out of my data. My blog entry &lt;a href=&#34;https://www.bobdc.com/blog/trying-out-blazegraph&#34;&gt;Trying Out Blazegraph&lt;/a&gt; from earlier this year showed how I took advantage of OWL metadata to query for all the furniture in a particular building even though the dataset had no explicit data about any resources being furniture or any resources being in that building other than some rooms.&lt;/p&gt;
&lt;p&gt;This is all built on very explicit semantics: we use triples to say things about resources so that people and applications can understand and do more with those resources. The interesting semantics work in the machine learning world is more about inferring semantic relationships.&lt;/p&gt;
&lt;h2 id=&#34;idm140335378944400&#34;&gt;Semantics and embedded vector spaces&lt;/h2&gt;
&lt;p&gt;(All suggestions for corrections to this section are welcome.) Machine learning is essentially the use of data-driven algorithms that perform better as they have more data to work with, &amp;ldquo;learning&amp;rdquo; from this additional data. For example, Netflix can make better recommendations to you now than they could ten years ago because the additional accumulated data about what you like to watch and what other people with similar tastes have also watched gives Netflix more to go on when making these recommendations.&lt;/p&gt;
&lt;p&gt;The world of &lt;a href=&#34;https://en.wikipedia.org/wiki/Distributional_semantics&#34;&gt;distributional semantics&lt;/a&gt; shows that analysis of what words appear with what other words, in what order, can tell us a lot about these words and their relationships—if you analyze enough text. Let&amp;rsquo;s say we begin by using a neural network to assign a vector of numbers to each word. This creates a collection of vectors known as a &amp;ldquo;vector space&amp;rdquo;; adding vectors to this space is known as &amp;ldquo;embedding&amp;rdquo; them. Performing linear algebra on these vectors can provide insight about the relationships between the words that the vectors represent. In the most popular example, the mathematical relationship between the vectors for the words &amp;ldquo;king&amp;rdquo; and &amp;ldquo;queen&amp;rdquo; is very similar to the relationship between the vectors for &amp;ldquo;man&amp;rdquo; and &amp;ldquo;woman&amp;rdquo;. This diagram from the TensorFlow tutorial &lt;a href=&#34;https://www.tensorflow.org/versions/r0.10/tutorials/word2vec/index.html&#34;&gt;Vector Representations of Words&lt;/a&gt; shows that other identified relationships include grammatical and geographical ones:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.tensorflow.org/versions/r0.10/tutorials/word2vec/index.html&#34;&gt;&lt;img id=&#34;idm140335378939248&#34; src=&#34;https://www.bobdc.com/img/main/linear-relationships.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;TensorFlow diagram about inferred word relationships&#34; width=&#34;500&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
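A toy illustration of that vector arithmetic, with hand-picked two-dimensional &amp;ldquo;embeddings&amp;rdquo; standing in for real learned vectors (real word2vec vectors have hundreds of dimensions; cosine similarity picks the nearest word):

```python
import math

# Hand-picked 2-d vectors chosen to illustrate the arithmetic, not learned ones.
vec = {
    "king":  [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man":   [0.3, 0.8],
    "woman": [0.3, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# king - man + woman should land near queen
analogy = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]
best = max(vec, key=lambda word: cosine(vec[word], analogy))
print(best)  # queen
```

Real implementations also exclude the three query words from the candidate answers, which this sketch skips.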
&lt;p&gt;The popular open source &lt;a href=&#34;https://github.com/dav/word2vec&#34;&gt;word2vec&lt;/a&gt; implementation of this developed at Google includes a script that lets you do analogy queries. (The TensorFlow tutorial mentioned above uses word2vec; another great way to get hands-on experience with word vectors is Radim Rehurek&amp;rsquo;s &lt;a href=&#34;https://radimrehurek.com/gensim/tutorial.html&#34;&gt;gensim tutorial&lt;/a&gt;.) I installed word2vec on an Ubuntu machine easily enough, started up the demo-analogy.sh script, and it prompted me to enter three words. I entered &amp;ldquo;king queen father&amp;rdquo; to ask it &amp;ldquo;king is to queen as father is to what?&amp;rdquo; It gave me a list of 40 word-score pairs with these at the top:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;     mother    0.698822
    husband    0.553576
     sister    0.552917
        her    0.548955
grandmother    0.529910
       wife    0.526212
    parents    0.512507
   daughter    0.509455
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Entering &amp;ldquo;london england berlin&amp;rdquo; produced a list that began with this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   germany     0.522487
   prussia     0.482481
   austria     0.447184
    saxony     0.435668
   bohemia     0.429096
westphalia     0.407746
     italy     0.406134
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I entered &amp;ldquo;run ran walk&amp;rdquo; in the hope of seeing &amp;ldquo;walked&amp;rdquo; but got a list that began like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   hooray      0.446358
    rides      0.445045
ninotchka      0.444158
searchers      0.442369
   destry      0.435961
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It did a pretty good job with most of these, but obviously not a great job throughout. The past tense of walk is definitely not &amp;ldquo;hooray&amp;rdquo;, but these inferences were based on a training data set of 96 megabytes, which isn&amp;rsquo;t very large. A Google search on phrases from the text8 input file included with word2vec for this demo shows that it&amp;rsquo;s probably part of a &lt;a href=&#34;http://www.mattmahoney.net/dc/textdata&#34;&gt;2006 Wikipedia dump&lt;/a&gt; used for text compression tests and other processes that need a non-trivial text collection. More serious applications of word2vec often read much larger Wikipedia subsets as training data, and of course you&amp;rsquo;re not limited to using Wikipedia data: the exploration of other datasets that use a variety of spoken languages and scripts is one of the most interesting aspects of these early days of the use of this technology.&lt;/p&gt;
&lt;p&gt;The one-to-one relationships shown in the TensorFlow diagrams above make the inferred relationships look more magical than they are. As you can see from the results of my queries, word2vec finds the words that are closest to what you asked for and lists them with their scores, and you may have several with good scores or none. Your application can just pick the result with the highest score, but you might want to first set an acceptable cutoff value so that you don&amp;rsquo;t take the &amp;ldquo;hooray&amp;rdquo; inference too seriously.&lt;/p&gt;
&lt;p&gt;On the other hand, if you just pick the single result with the highest score, you might miss some good inferences, because while Berlin is the capital of Germany, it was also the capital of Prussia for over 200 years, so I was happy to see that get the second-highest score there—although, if we put too much faith in a score of 0.482481 (or even of 0.522487) we&amp;rsquo;re going to get some &amp;ldquo;king queen father&amp;rdquo; answers that we don&amp;rsquo;t want. Again, a bigger training data set would help there.&lt;/p&gt;
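&lt;p&gt;The vector arithmetic behind these analogy queries is easy to sketch. The toy three-dimensional vectors below are hand-made stand-ins for real word2vec output, but the scoring logic (find the words whose vectors are closest to b - a + c, then keep only scores above a cutoff) is the same basic idea:&lt;/p&gt;

```python
import math

# Toy word vectors standing in for real word2vec output.
vectors = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "father": [0.2, 0.9, 0.1],
    "mother": [0.2, 0.1, 0.9],
    "walked": [0.7, 0.5, 0.5],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def analogy(a, b, c, cutoff=0.0):
    # "a is to b as c is to ?": score every other word by its
    # similarity to (b - a + c) and keep scores above the cutoff.
    target = [vb - va + vc for va, vb, vc in
              zip(vectors[a], vectors[b], vectors[c])]
    scored = [(w, cosine(v, target)) for w, v in vectors.items()
              if w not in (a, b, c)]
    return sorted([p for p in scored if p[1] >= cutoff],
                  key=lambda p: p[1], reverse=True)

print(analogy("king", "queen", "father"))  # "mother" scores highest
```

&lt;p&gt;With these toy vectors, &amp;ldquo;mother&amp;rdquo; comes out on top; raising the cutoff is how an application would avoid taking a low-scoring &amp;ldquo;hooray&amp;rdquo;-style inference too seriously.&lt;/p&gt;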
&lt;p&gt;If you look at the &lt;a href=&#34;https://github.com/dav/word2vec/blob/master/scripts/demo-analogy.sh&#34;&gt;demo-analogy.sh&lt;/a&gt; script itself, you&amp;rsquo;ll see various parameters that you can tweak when creating the vector data. The use of larger training sets is not the only thing that can improve the results above, and machine learning expertise means not only getting to know the algorithms that are available but also learning how to tune parameters like these.&lt;/p&gt;
&lt;p&gt;The script is simple enough that I could easily revise it to make it read some other file instead of the text8 one included with it. I set it to read the &lt;a href=&#34;https://en.wikipedia.org/wiki/Summa_Theologica&#34;&gt;Summa Theologica&lt;/a&gt;, in which St. Thomas Aquinas laid out all the theology of the Catholic Church, as I made grand plans for Big Question analogy queries like &amp;ldquo;man is to soul as God is to what?&amp;rdquo; My eventual query results were a lot more like the &amp;ldquo;run ran walk hooray&amp;rdquo; results above than anything sensible, with low scores for what it did find. With my text file of the complete Summa Theologica weighing in at 17 megabytes, I was clearly hoping for too much from it. I do have ideas for other input to try, and I encourage you to try it for yourself.&lt;/p&gt;
&lt;p&gt;An especially exciting thing about the use of embedding vectors to identify potentially previously unknown relationships is that it&amp;rsquo;s not limited to use on text. You can use it with images, video, audio, and any other machine readable data, and at CCRi, we have. (I&amp;rsquo;m using the marketing &amp;ldquo;we&amp;rdquo; here; if you&amp;rsquo;ve read this far you&amp;rsquo;re familiar with all of my hands-on experience with embedding vectors.)&lt;/p&gt;
&lt;h2 id=&#34;idm140335378921232&#34;&gt;Embedding vector space semantics and semantic web semantics&lt;/h2&gt;
&lt;p&gt;Can there be any connection between these two &amp;ldquo;semantic&amp;rdquo; technologies? RDF-based models are designed to take advantage of explicit semantics, and a program like word2vec can infer semantic relationships and make them explicit. Modifications to the scripts included with word2vec could output OWL or SKOS triples that enumerate relationships between identified resources, making a nice contribution to the many systems using SKOS taxonomies and thesauruses. Another possibility is that if you can train a machine learning model with instances (for example, labeled pictures of dogs and cats) that are identified with declared classes in an ontology, then running the model on new data can do classifications that take advantage of the ontology—for example, after identifying new cat and dog pictures, a query for mammals can find them.&lt;/p&gt;
&lt;p&gt;Going the other way, machine learning systems designed around unstructured text can often do even more with structured text, where it&amp;rsquo;s easier to find what you want, and I&amp;rsquo;ve learned at CCRi that RDF (if not RDFS or OWL) is much more popular among such applications than I realized. Large taxonomies such as those of the &lt;a href=&#34;http://id.loc.gov/download/&#34;&gt;Library of Congress&lt;/a&gt;, &lt;a href=&#34;http://wiki.dbpedia.org/&#34;&gt;DBpedia&lt;/a&gt;, and &lt;a href=&#34;https://www.wikidata.org/wiki/Wikidata:Main_Page&#34;&gt;Wikidata&lt;/a&gt; have lots of synonyms, explicit subclass relationships, and sometimes even definitions, and they can contribute a great deal to these applications.&lt;/p&gt;
&lt;p&gt;A well-known success story in combining the two technologies is IBM&amp;rsquo;s Watson. The paper &lt;a href=&#34;http://www.aclweb.org/anthology/W13-3413&#34;&gt;Semantic Technologies in IBM Watson&lt;/a&gt; describes the technologies used in Watson and how these technologies formed the basis of a seminar course given at Columbia University; distributional semantics, semantic web technology, and DBpedia all play a role. Frederick Giasson and Mike Bergman&amp;rsquo;s &lt;a href=&#34;http://www.mkbergman.com/1981/cognonto-is-on-the-hunt-for-big-ai-game/&#34;&gt;Cognonto&lt;/a&gt; also looks like an interesting project to connect machine learning to large collections of triples. I&amp;rsquo;m sure that other interesting combinations are happening around the world, especially considering the amount of open source software available in both areas.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2016">2016</category>
      
      <category domain="https://www.bobdc.com//categories/ai-and-machine-learning">AI and machine learning</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Converting between MIDI and RDF: readable MIDI and more fun with RDF</title>
      <link>https://www.bobdc.com/blog/converting-between-midi-and-rd/</link>
      <pubDate>Sun, 28 Aug 2016 12:24:21 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/converting-between-midi-and-rd/</guid>
      
      
      <description><div>Listen to my fun!</div><div>&lt;img id=&#34;idm140614557973488&#34; src=&#34;https://www.bobdc.com/img/main/midirdf.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;MIDI and RDF logos&#34;/&gt;
&lt;p&gt;When I first heard about Albert Meroño-Peñuela and Rinke Hoekstra&amp;rsquo;s &lt;a href=&#34;https://github.com/albertmeronyo/midi2rdf&#34;&gt;midi2rdf&lt;/a&gt; project, which converts back and forth between the venerable &lt;a href=&#34;https://en.wikipedia.org/wiki/MIDI&#34;&gt;Musical Instrument Digital Interface&lt;/a&gt; binary format and RDF, I thought it seemed like an interesting academic exercise. Thinking about it more, I realized that it makes a great contribution to both the MIDI world and to musical RDF geeks.&lt;/p&gt;
&lt;p&gt;MIDI has been the standard protocol for integrating synthesizers and related musical equipment together since the 1980s. I&amp;rsquo;ve only recently thrown out a book with the MIDI specs that I&amp;rsquo;ve owned for nearly that long because, as with so many other technical specifications, they&amp;rsquo;re now available &lt;a href=&#34;https://www.midi.org/specifications&#34;&gt;online&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Meroño-Peñuela and Hoekstra&amp;rsquo;s midi2rdf lets you convert between MIDI files and Turtle RDF. I love the title of their ESWC 2016 paper on it, &amp;ldquo;The Song Remains the Same&amp;rdquo; (&lt;a href=&#34;http://2016.eswc-conferences.org/sites/default/files/papers/Accepted%20Posters%20and%20Demos/ESWC2016_DEMO_The_Song_Remains_the_Same.pdf&#34;&gt;pdf&lt;/a&gt;)&amp;ndash;I was pretty young when Led Zeppelin&amp;rsquo;s &lt;a href=&#34;https://www.youtube.com/watch?v=8w3emvHepgU&#34;&gt;Houses of the Holy&lt;/a&gt; album came out, but I remember it vividly. The song remains the same because the project&amp;rsquo;s midi2rdf and rdf2midi scripts provide lossless round-trip conversion between the two formats, which makes it a very valuable tool: it gives us a text file serialization of MIDI based on a published standard, making MIDI downright readable. Looking at these RDF files and spending no serious time with the MIDI spec, I worked out which resources and properties were doing what and used this to create my own MIDI files.&lt;/p&gt;
&lt;p&gt;As a somewhat musical RDF geek, this was a lot of fun. I wrote Python scripts to generate different Turtle files of different kinds of random music, then converted them to MIDI so that I could listen to them. (You can find it all in &lt;a href=&#34;https://github.com/bobdc/misc/tree/master/midirdffun&#34;&gt;github&lt;/a&gt;.) The use of random functions means that running the same script several times creates different variations on the music. Below you will find links to MP3 versions of what I called fakeBebop and two versions of some whole-tone piano music that I generated, along with the MIDI and RDF files that go with them.&lt;/p&gt;
&lt;p&gt;Each MIDI file (and its RDF equivalent) starts with some setup data to identify information such as the sounds that it will play and the tempo. Instead of learning all those setup details for my program to generate, I used the excellent Linux/Mac/Windows open source &lt;a href=&#34;https://musescore.com/&#34;&gt;MuseScore&lt;/a&gt; music scoring program to generate a MIDI file with just a few notes of whatever instruments I wanted and then converted that to RDF. (This ability to convert in both directions is an important part of the value of the midi2rdf package.) Then, keeping the setup part of that RDF, I deleted the actual notes and had my script copy the setup part and then append newly generated notes to it.&lt;/p&gt;
&lt;p&gt;In RDF terms, the note generation meant two things: adding a pair of &lt;code&gt;mid:NoteOnEvent&lt;/code&gt; resources (one to start playing a note and one to stop) and then adding references to those events onto a musical track listing the events to execute. So, for example, the first &lt;code&gt;mid:NoteOnEvent&lt;/code&gt; in the following pair defines the start of a note at pitch 69, which is A above middle C on a piano. The &lt;code&gt;mid:channel&lt;/code&gt; of 0 had been defined in the setup part, and the &lt;code&gt;mid:tick&lt;/code&gt; value specifies how long the note will play until the next &lt;code&gt;mid:NoteOnEvent&lt;/code&gt;. (I was too lazy to look up how the &lt;code&gt;mid:tick&lt;/code&gt; values relate to elapsed time and picked some through trial and error.) The &lt;code&gt;mid:velocity&lt;/code&gt; values essentially turn the note on and off.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;p2:event0104 a mid:NoteOnEvent ;
    mid:channel 0 ;
    mid:pitch 69 ;
    mid:tick 400 ;
    mid:velocity 80 .


p2:event0105 a mid:NoteOnEvent ;
    mid:channel 0 ;
    mid:pitch 69 ;
    mid:tick 500 ;
    mid:velocity 0 .
&lt;/code&gt;&lt;/pre&gt;
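&lt;p&gt;As background (midi2rdf itself never needs this), MIDI pitch numbers map to frequencies by the standard equal-temperament formula, with pitch 69 coming out as concert A at 440 Hz:&lt;/p&gt;

```python
def midi_pitch_to_hz(pitch):
    # Equal temperament: pitch 69 is A above middle C at 440 Hz,
    # and each half step multiplies the frequency by 2**(1/12).
    return 440.0 * 2 ** ((pitch - 69) / 12)

print(midi_pitch_to_hz(69))            # 440.0
print(round(midi_pitch_to_hz(60), 2))  # middle C: 261.63
```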
&lt;p&gt;As my script outputs noteOn events after the setup part, it appends references to them onto a string in memory that begins like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mid:pianoHeadertrack01 a mid:Track ;
    mid:hasEvent p2:event0000,
        p2:event0001,
        p2:event0002,
        p2:event0003,
        # etc. until you finish with a period
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After outputting all the &lt;code&gt;mid:NoteOnEvent&lt;/code&gt; events, the script outputs this string. (While the triples in this resource are technically unordered, rdf2midi seemed to assume that the event names are &amp;ldquo;event&amp;rdquo; followed by a zero-padded number. When an early version of my first script didn&amp;rsquo;t do this, the notes got played in an odd order. Maybe it&amp;rsquo;s just playing them in alphabetic sort order.)&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s all for just one track. My fakeBebop script does this for three tracks: a bass track playing fairly random quarter notes in the range of an upright bass, a muted trumpet track playing fairly random triplet-feel eighth notes (sometimes with a rest substituted), and a percussion track repeating a standard bebop ride cymbal pattern. You can see some generated Turtle RDF at &lt;a href=&#34;https://github.com/bobdc/misc/blob/master/midirdffun/fakeBebop.ttl&#34;&gt;fakeBebop.ttl&lt;/a&gt;, the MIDI file generated from the Turtle file by midi2rdf at &lt;a href=&#34;https://github.com/bobdc/misc/blob/master/midirdffun/fakeBebop.mid&#34;&gt;fakeBebop.mid&lt;/a&gt;, and listen to what it sounds like at &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/fakeBebop.mp3&#34;&gt;fakeBebop.mp3&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By &amp;ldquo;fairly random&amp;rdquo; I mean a random note within 5 half steps (a perfect fourth) of the previous note. Without any melodies beyond this random selection of notes, I think it still sounds a bit beboppy because, as the early bebop pioneers added more complex scales to the simple major and minor scales played by earlier jazz musicians, it all got more chromatic.&lt;/p&gt;
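&lt;p&gt;My actual scripts are in the github repo linked above; the sketch below just shows the general shape of the approach, with zero-padded event names, paired note-on/note-off &lt;code&gt;mid:NoteOnEvent&lt;/code&gt; resources, and pitches that wander at most five half steps per move. (The &lt;code&gt;mid:track01&lt;/code&gt; name and tick values here are illustrative stand-ins, not copied from my scripts.)&lt;/p&gt;

```python
import random

def random_walk_notes(count, start_pitch=57, low=40, high=76, seed=None):
    # Emit Turtle for paired mid:NoteOnEvent resources: one event starts
    # each note (velocity 80) and its partner stops it (velocity 0).
    rng = random.Random(seed)
    pitch = start_pitch
    event_names = []
    blocks = []
    for i in range(count):
        for offset, tick, velocity in ((0, 400, 80), (1, 500, 0)):
            name = "p2:event%04d" % (2 * i + offset)  # zero-padded names
            event_names.append(name)
            blocks.append(
                "%s a mid:NoteOnEvent ;\n"
                "    mid:channel 0 ;\n"
                "    mid:pitch %d ;\n"
                "    mid:tick %d ;\n"
                "    mid:velocity %d .\n" % (name, pitch, tick, velocity))
        # Next pitch: within 5 half steps, clamped to the instrument's range.
        pitch = min(high, max(low, pitch + rng.randint(-5, 5)))
    # The track resource lists every event that it should play.
    blocks.append("mid:track01 a mid:Track ;\n    mid:hasEvent " +
                  ",\n        ".join(event_names) + " .\n")
    return "\n".join(blocks)

print(random_walk_notes(4, seed=1))
```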
&lt;p&gt;I have joked with my brother about how if you quietly play random notes on a piano with both hands using the same whole tone scale, it can sound a bit like Debussy, who was one of the &lt;a href=&#34;https://en.wikipedia.org/wiki/List_of_pieces_which_use_the_whole_tone_scale&#34;&gt;early users&lt;/a&gt; of this scale. My wholeTonePianoQuarterNotes.py script follows logic similar to the fakeBebop script but outputs two piano tracks that correspond to a piano player&amp;rsquo;s left and right hands and use the same whole tone scale. You can see some generated Turtle RDF at &lt;a href=&#34;https://github.com/bobdc/misc/blob/master/midirdffun/wholeTonePianoQuarterNotes.ttl&#34;&gt;wholeTonePianoQuarterNotes.ttl&lt;/a&gt;, the MIDI file generated from that by rdf2midi at &lt;a href=&#34;https://github.com/bobdc/misc/blob/master/midirdffun/wholeTonePianoQuarterNotes.mid&#34;&gt;wholeTonePianoQuarterNotes.mid&lt;/a&gt;, and hear what it sounds like at &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/wholeTonePianoQuarterNotes.mp3&#34;&gt;wholeTonePianoQuarterNotes.mp3&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Before doing the whole tone piano quarter notes script I did one with random note durations, so it sounds like something from a bit later in the twentieth century. Generated Turtle RDF: &lt;a href=&#34;https://github.com/bobdc/misc/blob/master/midirdffun/wholeTonePiano.ttl&#34;&gt;wholeTonePiano.ttl&lt;/a&gt;; MIDI file generated by rdf2midi: &lt;a href=&#34;https://github.com/bobdc/misc/blob/master/midirdffun/wholeTonePiano.mid&#34;&gt;wholeTonePiano.mid&lt;/a&gt;; MP3: &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/wholeTonePiano.mp3&#34;&gt;wholeTonePiano.mp3&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I can think of all kinds of ideas for additional experiments, such as redoing the two piano experiments with the four voices of a string quartet or having the fakeBebop one generate &lt;a href=&#34;http://www.jazzguitar.be/jazz_chord_progressions.html&#34;&gt;common jazz chord progressions&lt;/a&gt; and typical licks over them. (Speaking of string quartets and Debussy, I love that &lt;a href=&#34;http://www.popisms.com/TelevisionCommercial/127626/Apple-Commercial-for-Apple-iPad-Pro-2016.aspx&#34;&gt;Apple iPad Pro ad&lt;/a&gt; that NBC showed so often during the recent Olympics.) It would also be interesting to try some experiments with &lt;a href=&#34;https://en.wikipedia.org/wiki/Black_MIDI&#34;&gt;Black MIDI&lt;/a&gt; (or perhaps &amp;ldquo;Black RDF&amp;rdquo;!). If I had pursued these ideas, I wouldn&amp;rsquo;t be writing this blog entry right now, because I had to cut myself off at some point.&lt;/p&gt;
&lt;p&gt;I &lt;a href=&#34;http://thebridgepai.org/benjamin-obrien-concert-supercollider-workshop/&#34;&gt;recently learned&lt;/a&gt; about &lt;a href=&#34;http://supercollider.github.io/&#34;&gt;Supercollider&lt;/a&gt;, an open source Windows/Mac/Linux IDE with its own programming language that several serious electronic music composers use for generating music, and I could easily picture spending all of my free time playing with that. At least midi2rdf&amp;rsquo;s RDF basis gave me the excuse of having a work-related angle as I wrote scripts to generate odd music. Although I was just slapping together some demo code for fun, I do think that midi2rdf&amp;rsquo;s ability to provide lossless round-trip conversion between a popular old binary music format and a readable standardized format has a lot of potential to help people doing music with computers.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2016">2016</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>SPARQL in a Jupyter (a.k.a. IPython) notebook</title>
      <link>https://www.bobdc.com/blog/sparql-in-a-jupyter-aka-ipytho/</link>
      <pubDate>Sun, 31 Jul 2016 10:15:07 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparql-in-a-jupyter-aka-ipytho/</guid>
      
      
      <description><div>With just a bit of Python to frame it all.</div><div>&lt;p&gt;In a recent blog entry for my employer titled &lt;a href=&#34;http://www.ccri.com/2016/06/28/geomesa-analytics-jupyter-notebook/&#34;&gt;GeoMesa analytics in a Jupyter notebook&lt;/a&gt;, I wrote&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As described on its &lt;a href=&#34;http://jupyter.org/&#34;&gt;home page&lt;/a&gt;, “The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.” Once you install the open source Jupyter server on your machine, you can create notebooks, share them with others, and learn from notebooks created by others. (You can also learn from others’ notebooks without installing Jupyter locally if those notebooks are hosted on a shared server.)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;An animated GIF below that passage shows a sample mix of formatted text and executable Python code in a short Jupyter notebook, and it also demonstrates how code blocks can be tweaked, run in place, and build on previous code blocks. The blog entry goes on to describe how we at CCRi embedded Scala code in a Jupyter notebook to demonstrate the use of Apache Spark with the Hadoop-based &lt;a href=&#34;http://www.geomesa.org/&#34;&gt;GeoMesa&lt;/a&gt; spatio-temporal database to perform data analysis and visualization.&lt;/p&gt;
&lt;p&gt;Jupyter supports over 40 languages besides Scala and Python, but not SPARQL. I realized recently, though, that with a minimum of Python code (Python being the original language for these notebooks; &amp;ldquo;Jupyter&amp;rdquo; was originally called &amp;ldquo;IPython&amp;rdquo;) someone who hardly knows Python can enter and run SPARQL queries in a Jupyter notebook.&lt;/p&gt;
&lt;p&gt;I created a Jupyter notebook that you can download and try yourself called &lt;a href=&#34;https://github.com/bobdc/misc/blob/master/JupyterSPARQL/JupyterSPARQLFun.ipynb&#34;&gt;JupyterSPARQLFun&lt;/a&gt;. If you look at the raw version of the file you&amp;rsquo;ll see a lot of JSON, but if you follow that link you&amp;rsquo;ll see that github renders the notebook the same way that a Jupyter server does, so you can read through the notebook and see all the formatted explanations with the code and the results.&lt;/p&gt;
&lt;p&gt;If you did download the notebook and run it on a Jupyter server (and installed the rdflib and RDFClosure python libraries), you could edit the cells that have executable code, rerun them, and see the results, just like in the animated GIF mentioned above. In the case of this notebook, you&amp;rsquo;d be doing SPARQL manipulation of an RDF graph from your copy of the notebook. (I used the &lt;a href=&#34;https://www.continuum.io/downloads&#34;&gt;Anaconda&lt;/a&gt; Jupyter distribution. It was remarkably difficult to find out from their website how to start up Jupyter, but I did find out from the &lt;a href=&#34;https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html&#34;&gt;Jupyter Notebook Beginner Guide&lt;/a&gt; that you just enter &amp;ldquo;jupyter notebook&amp;rdquo; at the command line. When working with a notebook, you&amp;rsquo;ll also find &lt;a href=&#34;http://johnlaudun.org/20131228-ipython-notebook-keyboard-shortcuts/&#34;&gt;this list of keyboard shortcuts&lt;/a&gt; to be handy.)&lt;/p&gt;
&lt;p&gt;I won&amp;rsquo;t go into great detail here about what&amp;rsquo;s in the JupyterSPARQLFun notebook, because much of the point of these notebooks is that their ability to mix formatted text with executable code lets people take explanation of code to a new level. So, to find out how I got SPARQL and inferencing working in the notebook, I recommend that you just read the explanations and code that I put in it.&lt;/p&gt;
&lt;p&gt;I mentioned above how you can learn from others’ notebooks; some nice examples accompany the &lt;a href=&#34;https://www.youtube.com/watch?v=elojMnjn4kk&amp;amp;list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&#34;&gt;Data School Machine Learning videos&lt;/a&gt; on YouTube. These videos demonstrate various concepts by adding and running code within notebooks, adding explanatory text as well along the way. Because I could download the &lt;a href=&#34;https://github.com/justmarkham/scikit-learn-videos&#34;&gt;finished notebooks&lt;/a&gt; created in the videos, I could run all the example code myself, in place, with no need to copy it from one place and paste it to another. I could also tweak the code samples to try different variations, which made for some much more hands-on learning of the machine learning concepts being demonstrated.&lt;/p&gt;
&lt;p&gt;That experience really showed me the power of Jupyter notebooks, and it&amp;rsquo;s great to see that with just a little setup Python code, we can do SPARQL querying and RDF inferencing inside these notebooks as well.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/bobdc/misc/blob/master/JupyterSPARQL/JupyterSPARQLFun.ipynb&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/jupytersparql1.png&#34; width=&#34;640&#34; border=&#34;0&#34;   style=&#34;display: block; margin-left: auto; margin-right: auto; &#34;/&gt;&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2016">2016</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Emoji SPARQL😝!</title>
      <link>https://www.bobdc.com/blog/emoji-sparql/</link>
      <pubDate>Sun, 12 Jun 2016 11:46:31 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/emoji-sparql/</guid>
      
      
      <description><div>If emojis have Unicode code points, then we can...</div><div>&lt;p&gt;I knew that emojis have Unicode code points, but it wasn&amp;rsquo;t until I saw &lt;a href=&#34;http://i.imgur.com/Tb26fCb.jpg&#34;&gt;this goofy picture&lt;/a&gt; in a chat room at work that I began to wonder about using emojis in RDF data and SPARQL queries. I have since learned that the relevant specs are fine with it, but as with the simple display of emojis on non-mobile devices, the tools you use to work with these characters (and the tools used to build those tools) aren&amp;rsquo;t always as cooperative as you&amp;rsquo;d hope.&lt;/p&gt;
&lt;p&gt;After hunting around a bit among these tools, I did have some fun with this. Black and white emojis, as shown in the &lt;strong&gt;Browser&lt;/strong&gt; column of the unicode.org &lt;a href=&#34;http://unicode.org/emoji/charts/emoji-list.html&#34;&gt;Emoji Data&lt;/a&gt; page, display with no problem in my Ubuntu terminal window and in web page forms, but I wanted the full-color emojis from that page&amp;rsquo;s &lt;strong&gt;Sample&lt;/strong&gt; column. The Emacs &lt;a href=&#34;https://github.com/iqbalansari/emacs-emojify&#34;&gt;Emojify mode&lt;/a&gt; did the trick, so what you see below are screen shots from there.&lt;/p&gt;
&lt;img id=&#34;idm45192433970336&#34; src=&#34;https://www.bobdc.com/img/main/sparqlemoji1.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;sample RDF with emoji&#34; width=&#34;200&#34;/&gt;
&lt;p&gt;I started by converting that same unicode.org web page (as opposed to the site&amp;rsquo;s much larger &lt;a href=&#34;http://unicode.org/emoji/charts/full-emoji-list.html&#34;&gt;Full Emoji Data&lt;/a&gt; page) to a Turtle file called emoji-list.ttl with a short perl script. (You can find both in github at &lt;a href=&#34;https://github.com/bobdc/misc/tree/master/emojirdf&#34;&gt;emojirdf&lt;/a&gt;.) On the right, you can see triples from that web page&amp;rsquo;s row about the french fries emoji. For the keywords assigned to each character, the Emoji Data web page has links, so it was tempting to use the link destinations as URI values for the &lt;code&gt;lse:annotation&lt;/code&gt; values instead of strings, but some of those link destinations have local names like &lt;a href=&#34;http://unicode.org/emoji/charts/emoji-annotations.html#+1&#34;&gt;+1&lt;/a&gt;, which won&amp;rsquo;t make for nice URIs in RDF triples.&lt;/p&gt;
&lt;p&gt;I thought about augmenting my emoji-list.ttl file to turn it into an emoji ontology. I first dutifully searched for &amp;ldquo;emoji rdf&amp;rdquo; on Google (which asked me &amp;ldquo;did you mean emoji pdf? emoji def?&amp;rdquo;) to avoid the reinvention of any wheels. The most promising search result was an &lt;a href=&#34;https://github.com/oarrabi/EmojiOntology&#34;&gt;Emoji Ontology&lt;/a&gt; that adds some interesting metadata to the emojis, but its &lt;a href=&#34;https://github.com/oarrabi/EmojiOntology/blob/master/deliverables/FinalEmoji.owl&#34;&gt;Final emoji ontology in OWL/XML format&lt;/a&gt; has little to do with OWL or even RDF, and I didn&amp;rsquo;t feel like writing the XSLT to convert its additional metadata to proper RDF.&lt;/p&gt;
&lt;p&gt;With no proper emoji ontology already available, I thought more about creating my own by adding triples that would arrange the emojis into a hierarchical ontology or taxonomy. This would let me say that the ant 🐜 and the honeybee 🐝 are both insects, and that the ox 🐂 and the many, many cats are mammals, and then I could query for animals and see them all or query for insects and see just first two. This would add little, though, because the existing annotation values already serve as a non-hierarchical tagging system that identifies insects, so I could just query for those &lt;code&gt;lse:annotation&lt;/code&gt; values.&lt;/p&gt;
&lt;p&gt;Some of these annotation values led to some fun queries of the emoji-list.ttl file. I used Dave Beckett&amp;rsquo;s &lt;a href=&#34;http://librdf.org/rasqal/roqet.html&#34;&gt;Redland roqet&lt;/a&gt; as a query processor, telling it to give me CSV data that I redirected to a file. Here&amp;rsquo;s a query asking for the character and label of any emojis that have both &amp;ldquo;face&amp;rdquo; and &amp;ldquo;cold&amp;rdquo; in their annotation values:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX lse:  &amp;lt;http://learningsparq.com/emoji/&amp;gt; 
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;


SELECT ?char ?label
WHERE 
{ 
  ?s lse:annotation &#39;face&#39;, &#39;cold&#39; ;
     rdfs:label ?label ;
     lse:char ?char .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It returned this result, showing that &amp;ldquo;cold&amp;rdquo; can refer to both low temperature and wintertime sniffles:&lt;/p&gt;
&lt;img id=&#34;idm45192433957120&#34; src=&#34;https://www.bobdc.com/img/main/sparqlemoji2.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;result of first SPARQL emoji query&#34; width=&#34;400&#34;/&gt;
&lt;p&gt;This next query uses emojis in string data to ask which annotations have tagged both the alien head and one of the moon face emojis:&lt;/p&gt;
&lt;img id=&#34;idm45192433954448&#34; src=&#34;https://www.bobdc.com/img/main/sparqlemoji3.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;SPARQL query&#34; width=&#34;400&#34;/&gt;
&lt;p&gt;(Apparently, Emacs SPARQL mode thinks that the &amp;ldquo;not&amp;rdquo; in &amp;ldquo;annotation&amp;rdquo; is the SPARQL keyword, because it resets the substring&amp;rsquo;s font color.) Here is the query result; note that, as is typical with many query tools, the first row is the variable name, not a returned value:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;annotation
face
nature
space
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Emoji Unicode code points (most fall between x1F300 and x1F6FF) lie within the x10000-xEFFFF range that SPARQL spec productions &lt;a href=&#34;https://www.w3.org/TR/sparql11-query/#rPN_CHARS_BASE%20&#34;&gt;164&lt;/a&gt; - 166 say is legal for use in variable names. The following query requests the satellite dish character&amp;rsquo;s annotation values and stores them in a variable whose three-character name is three emojis:&lt;/p&gt;
&lt;img id=&#34;idm45192433949744&#34; src=&#34;https://www.bobdc.com/img/main/sparqlemoji4.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;SPARQL emoji query&#34; width=&#34;400&#34;/&gt;
&lt;p&gt;Here is our result:&lt;/p&gt;
&lt;img id=&#34;idm45192433947168&#34; src=&#34;https://www.bobdc.com/img/main/sparqlemoji5.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;SPARQL query result&#34; width=&#34;130&#34;/&gt;
&lt;p&gt;This is actually why I used roqet—the Java-based SPARQL engines that I first tried may have implemented the spec faithfully, but some layer of the Java tooling underneath them couldn&amp;rsquo;t handle the full extent of Unicode in every place where it should.&lt;/p&gt;
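&lt;p&gt;Checking where a given character falls relative to that astral range from the SPARQL grammar&amp;rsquo;s PN_CHARS_BASE production takes only a line or two of Python (letters like &amp;ldquo;a&amp;rdquo; are of course permitted in variable names by other parts of the production):&lt;/p&gt;

```python
def in_sparql_astral_range(ch):
    # PN_CHARS_BASE in the SPARQL 1.1 grammar includes [#x10000-#xEFFFF],
    # which covers the emoji blocks used in these queries.
    return 0x10000 <= ord(ch) <= 0xEFFFF

print(in_sparql_astral_range("\U0001F4E1"))  # satellite antenna emoji: True
print(in_sparql_astral_range("a"))           # BMP letter, outside this range: False
```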
&lt;p&gt;Emojis in RDF data are not limited to quoted strings. When I told roqet to run a query against this next Turtle file, which uses emoji characters as prefixes and as subject and predicate local names in its one triple, it had no problem:&lt;/p&gt;
&lt;img id=&#34;idm45192433943456&#34; src=&#34;https://www.bobdc.com/img/main/sparqlemoji6.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Turtle file with emoji properties&#34; width=&#34;400&#34;/&gt;
&lt;p&gt;This final query went even further, and roqet had no problem with it: it defines a bowl of spaghetti emoji as a namespace prefix and then, using emojis for the variable names, asks for the subjects and objects of any triples that have the predicate from the one triple in the Turtle file above.&lt;/p&gt;
&lt;img id=&#34;idm45192433940592&#34; src=&#34;https://www.bobdc.com/img/main/sparqlemoji7.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Turtle file with emoji properties&#34; width=&#34;400&#34;/&gt;
&lt;p&gt;Of course, it&amp;rsquo;s difficult to read, and the fact that running the query and even just displaying it required me to dig around for the right combination of tools doesn&amp;rsquo;t speak well for the use of emojis in queries. Besides being a fun exercise, though, the experience and the result—that it all ultimately worked—provided a nice testament to the design of the Unicode, RDF, and SPARQL standards.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2016">2016</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Trying out Blazegraph</title>
      <link>https://www.bobdc.com/blog/trying-out-blazegraph/</link>
      <pubDate>Tue, 17 May 2016 08:17:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/trying-out-blazegraph/</guid>
      
      
      <description><div>Especially inferencing.</div><div>&lt;p&gt;I&amp;rsquo;ve been hearing more about the &lt;a href=&#34;https://www.blazegraph.com/&#34;&gt;Blazegraph&lt;/a&gt; triplestore (well, &amp;ldquo;graph database with RDF support&amp;rdquo;), especially its &lt;a href=&#34;https://www.blazegraph.com/product/gpu-accelerated/&#34;&gt;support&lt;/a&gt; for running on &lt;a href=&#34;https://en.wikipedia.org/wiki/Graphics_processing_unit&#34;&gt;GPUs&lt;/a&gt;, and because they also advertise some degree of RDFS and OWL support, I wanted to see how quickly I could try that after downloading the community edition. It was pretty quick.&lt;/p&gt;
&lt;p&gt;Downloading from the &lt;a href=&#34;https://www.blazegraph.com/download/&#34;&gt;main download page&lt;/a&gt; with my Ubuntu machine got me an rpm file, but I found it simpler to download the jar file version that I could start as a server from the command line as described on the &lt;a href=&#34;https://wiki.blazegraph.com/wiki/index.php/NanoSparqlServer&#34;&gt;Nano SPARQL Server&lt;/a&gt; page. I found the jar file (and several other download options) on the &lt;a href=&#34;https://sourceforge.net/projects/bigdata/files/bigdata/2.1.0/&#34;&gt;sourceforge page&lt;/a&gt; for release 2.1.&lt;/p&gt;
&lt;p&gt;The jar file&amp;rsquo;s startup message tells you the URL for the web-based interface to the Nano SPARQL Server, shown here:&lt;/p&gt;
&lt;img id=&#34;idm45368802998704&#34; src=&#34;https://www.bobdc.com/img/main/blazegraph1.jpg&#34; width=&#34;640&#34;/&gt;
&lt;p&gt;At this point, uploading some RDF on the UPDATE tab and issuing SPARQL queries on the QUERY tab was easy. I was more interested in sending it SPARQL queries that could take advantage of RDFS and OWL inferencing, so after a &lt;a href=&#34;https://sourceforge.net/p/bigdata/mailman/bigdata-developers/?viewmonth=201605&#34;&gt;little help&lt;/a&gt; from Blazegraph Chief Scientist Bryan Thompson via their mailing list (with a quick answer on a Saturday) I learned how: I had to first create a namespace on the NAMESPACES tab with the &lt;strong&gt;Inference&lt;/strong&gt; checkbox checked. The same form also offers checkboxes for &lt;strong&gt;Isolatable indexes&lt;/strong&gt;, &lt;strong&gt;Full text index&lt;/strong&gt;, and &lt;strong&gt;Enable geospatial&lt;/strong&gt; when configuring a new namespace. I found this typical of how Blazegraph lets you configure it to take advantage of more powerful features while leaving the out-of-box configuration simple and easy to use.&lt;/p&gt;
&lt;p&gt;For finer-grained namespace configuration, after you select checkboxes and click the &lt;strong&gt;Create namespace&lt;/strong&gt; button, a dialog box lets you edit the configuration details, with each of these lines explained in the Blazegraph &lt;a href=&#34;https://wiki.blazegraph.com/wiki/index.php/Main_Page&#34;&gt;documentation&lt;/a&gt;:&lt;/p&gt;
&lt;img id=&#34;idm45368802992480&#34; src=&#34;https://www.bobdc.com/img/main/blazegraph2.jpg&#34; width=&#34;600&#34;/&gt;
&lt;p&gt;I wanted to check Blazegraph&amp;rsquo;s support for &lt;code&gt;owl:TransitiveProperty&lt;/code&gt;, because this is such a basic, useful OWL class, as well as its ability to do subclass inferencing. I created some data about chairs, desks, rooms, and buildings, specifying which chairs and desks were in which rooms and which rooms were in which buildings, and also made &lt;code&gt;dm:locatedIn&lt;/code&gt; a transitive property:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix dm: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt; .
@prefix owl: &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt; .
@prefix rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .


dm:Room rdfs:subClassOf owl:Thing .
dm:Building rdfs:subClassOf owl:Thing .
dm:Furniture rdfs:subClassOf owl:Thing .
dm:Chair rdfs:subClassOf dm:Furniture .
dm:Desk rdfs:subClassOf dm:Furniture .


dm:locatedIn a owl:TransitiveProperty. 


d:building100 rdf:type dm:Building .
d:building200 rdf:type dm:Building .
d:room101 rdf:type dm:Room ; dm:locatedIn d:building100 . 
d:room102 rdf:type dm:Room ; dm:locatedIn d:building100 . 
d:room201 rdf:type dm:Room ; dm:locatedIn d:building200 . 
d:room202 rdf:type dm:Room ; dm:locatedIn d:building200 . 


d:chair15 rdf:type dm:Chair ; dm:locatedIn d:room101 . 
d:chair23 rdf:type dm:Chair ; dm:locatedIn d:room101 . 
d:chair35 rdf:type dm:Chair ; dm:locatedIn d:room202 . 
d:desk22 rdf:type dm:Desk ; dm:locatedIn d:room101 . 
d:desk59 rdf:type dm:Desk ; dm:locatedIn d:room202 . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following query asks for furniture in building 100. No triples above will match either of the query&amp;rsquo;s two triple patterns, so a SPARQL engine that can&amp;rsquo;t do inferencing won&amp;rsquo;t return anything. I wanted the query engine to infer that if chair 15 is a Chair, and Chair is a subclass of Furniture, then chair 15 is Furniture; also, if that furniture is in room 101 and room 101 is in building 100, then that furniture is in building 100.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dm: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt; 
PREFIX d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt; 
SELECT ?furniture
WHERE 
{ 
  ?furniture a dm:Furniture .
  ?furniture dm:locatedIn d:building100 . 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We need the first triple pattern because the data above includes triples saying that rooms 101 and 102 are located in building 100, so those would have bound to &lt;code&gt;?furniture&lt;/code&gt; in the second triple pattern if the first triple pattern wasn&amp;rsquo;t there. This is a nice example of why declaring resources as instances of specific classes, while not necessary in RDF, does a favor to anyone who will query that data—it makes it easier for them to specify more detail about exactly what data they want.&lt;/p&gt;
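&lt;p&gt;To make the two inference steps concrete, here is a toy forward-chaining sketch in Python. It is illustrative only, with short hardcoded names standing in for the URIs above; a real RDFS+ engine like Blazegraph's generalizes this to arbitrary class hierarchies and transitive properties:&lt;/p&gt;

```python
# Toy sketch of the two inferences described above: rdfs:subClassOf
# (Chair and Desk are Furniture) and owl:TransitiveProperty (locatedIn
# chains from furniture to room to building).
subclass_of = {"Chair": "Furniture", "Desk": "Furniture"}
instance_of = {
    "chair15": "Chair", "chair23": "Chair", "chair35": "Chair",
    "desk22": "Desk", "desk59": "Desk",
}
located_in = {
    "room101": "building100", "room102": "building100",
    "room201": "building200", "room202": "building200",
    "chair15": "room101", "chair23": "room101", "chair35": "room202",
    "desk22": "room101", "desk59": "room202",
}

def types_of(thing):
    """The instance's class plus every superclass (subclass inference)."""
    t = instance_of.get(thing)
    while t is not None:
        yield t
        t = subclass_of.get(t)

def locations_of(thing):
    """Direct location plus everything reachable (transitive inference)."""
    place = located_in.get(thing)
    while place is not None:
        yield place
        place = located_in.get(place)

# The analog of the SPARQL query: furniture located (perhaps indirectly)
# in building 100.
answer = sorted(
    x for x in located_in
    if "Furniture" in types_of(x) and "building100" in locations_of(x)
)
print(answer)   # ['chair15', 'chair23', 'desk22']
```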
&lt;p&gt;When using this query and data in a namespace (in the Blazegraph sense of the term) configured to do inferencing, Blazegraph executed the query against the original triples plus the inferred triples and listed the furniture in building 100:&lt;/p&gt;
&lt;img id=&#34;idm45368802983120&#34; src=&#34;https://www.bobdc.com/img/main/blazegraph3.jpg&#34;/&gt;
&lt;p&gt;Several years ago I &lt;a href=&#34;https://www.bobdc.com/blog/selling-rdf-technology-to-big&#34;&gt;backed off&lt;/a&gt; from discussions of the &amp;ldquo;semantic web&amp;rdquo; as a buzzphrase tying together technology around RDF-related standards because I felt that the phrase was not aging well and that the technology could be sold on its own without the buzzphrase, but the example above really does show semantics at work. Saying that &lt;code&gt;dm:locatedIn&lt;/code&gt; is a transitive property stores some semantics about that property, and these extra semantics let me get more out of the data set: they let me query for which furniture is in which building, even though the data has no explicit facts about furniture being in buildings. (Saying that Desk and Chair are subclasses of Furniture also stores semantics about all three terms, but that won&amp;rsquo;t be as interesting to a typical developer with object-oriented experience.)&lt;/p&gt;
&lt;p&gt;Blazegraph calls their subset of OWL &lt;a href=&#34;https://wiki.blazegraph.com/wiki/index.php/InferenceAndTruthMaintenance#Triples_Modes&#34;&gt;RDFS+&lt;/a&gt;, which was &lt;a href=&#34;https://sourceforge.net/p/bigdata/mailman/message/35069336/&#34;&gt;inspired by&lt;/a&gt; Jim Hendler and Dean Allemang&amp;rsquo;s RDFS+ superset of RDF that added in OWL&amp;rsquo;s most useful bits. (It&amp;rsquo;s similar but not identical to AllegroGraph&amp;rsquo;s &lt;a href=&#34;http://franz.com/agraph/support/documentation/current/reasoner-tutorial.html&#34;&gt;RDFS++&lt;/a&gt; profile, which has the same goal.) Blazegraph&amp;rsquo;s &lt;a href=&#34;https://www.blazegraph.com/product/product-description/&#34;&gt;Product description&lt;/a&gt; page describes which parts of OWL it supports, and their &lt;a href=&#34;https://wiki.blazegraph.com/wiki/index.php/InferenceAndTruthMaintenance#Configuring_Inference&#34;&gt;Inference And Truth Maintenance&lt;/a&gt; page describes more.&lt;/p&gt;
&lt;p&gt;A few other interesting things about Blazegraph as a triplestore and query engine:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The &lt;a href=&#34;https://wiki.blazegraph.com/wiki/index.php/REST_API&#34;&gt;REST&lt;/a&gt; interface offers access to a wide range of features.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Queries can include &lt;a href=&#34;https://wiki.blazegraph.com/wiki/index.php/QueryHints&#34;&gt;Query Hints&lt;/a&gt; to optimize how the SPARQL engine executes them, which will be handy if you plan on scaling way up.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I saw no direct references to &lt;a href=&#34;https://www.bobdc.com/blog/visualizing-dbpedia-geographic&#34;&gt;GeoSPARQL&lt;/a&gt; in the Blazegraph documentation, but they recently &lt;a href=&#34;http://www.pressreleaserocket.net/blazegraph-2-1-0-graph-database-now-enables-geospatial-searching-and-pubchem-data-processing/443176/%20&#34;&gt;announced&lt;/a&gt; support for geospatial SPARQL queries. (I&amp;rsquo;ve been learning a lot about working with geospatial data at Hadoop scale with &lt;a href=&#34;http://www.ccri.com/case-studies/geomesa/&#34;&gt;GeoMesa&lt;/a&gt;.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
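&lt;p&gt;As a taste of that REST interface, a SPARQL query is just an HTTP POST to the server's endpoint. This Python sketch assumes the out-of-the-box defaults of the standalone jar (port 9999 and a namespace named kb); check your server's startup message for the actual URL:&lt;/p&gt;

```python
# Sketch of querying the Nano SPARQL Server over plain HTTP. The endpoint
# path and port below are assumptions based on the default standalone-jar
# setup, not something to rely on without checking your own installation.
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:9999/blazegraph/namespace/kb/sparql"  # assumed default

query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
body = urlencode({"query": query}).encode("ascii")
request = Request(ENDPOINT, data=body, headers={
    "Content-Type": "application/x-www-form-urlencoded",
    "Accept": "application/sparql-results+json",
})
print(body)   # the form-encoded request body

def run():
    # Only attempt the network call when a server is actually running.
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```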
&lt;p&gt;Blazegraph&amp;rsquo;s main selling points seem to be speed and scalability (for example, see its &lt;a href=&#34;https://wiki.blazegraph.com/wiki/index.php/ClusterGuide&#34;&gt;Scaleout Cluster&lt;/a&gt; mode), and I didn&amp;rsquo;t play with those at all, but I liked seeing that SPARQL querying with inferencing support can take advantage of such new hotness technology as GPUs. It will be interesting to see where Blazegraph takes it.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2016">2016</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>Playing with a proximity beacon</title>
      <link>https://www.bobdc.com/blog/playing-with-a-proximity-beaco/</link>
      <pubDate>Sat, 23 Apr 2016 08:30:34 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/playing-with-a-proximity-beaco/</guid>
      
      
      <description><div>Nine-dollar devices send URLs to your phone over Bluetooth.</div><div>&lt;p&gt;I&amp;rsquo;ve been hearing about proximity beacons lately and thought it would be fun to try one of these inexpensive devices that broadcast a URL for a range of just a few meters via Bluetooth Low Energy (a.k.a. &lt;a href=&#34;https://en.wikipedia.org/wiki/Bluetooth_low_energy&#34;&gt;BLE&lt;/a&gt;, which I assume is pronounced &amp;ldquo;bleh&amp;rdquo;). Advocates often cite the use case of how a beacon device located near a work of art in a museum might broadcast a URL pointing to a web page about it—for example, one near Robert Rauschenberg&amp;rsquo;s &lt;em&gt;Bed&lt;/em&gt; in New York&amp;rsquo;s Museum of Modern Art could broadcast the URL &lt;a href=&#34;http://moma.org/collection/works/78712&#34;&gt;http://moma.org/collection/works/78712&lt;/a&gt;, their web site&amp;rsquo;s page with information about the work. When the appropriate app on your phone (or perhaps your phone&amp;rsquo;s operating system) saw this, it would alert you to the availability of this localized information.&lt;/p&gt;
&lt;img id=&#34;idm45746389029872&#34; src=&#34;http://snee.com/bobdc.blog/img/beaconfun1.jpg&#34; width=&#34;40%&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;beacon in phone charger&#34;/&gt;
&lt;p&gt;You can find these beacons for as little as &lt;a href=&#34;http://www.amazon.com/Radius-Networks-RadBeacon-Dot-Technology/dp/B00JJ4P864&#34;&gt;$14&lt;/a&gt;, and even cheaper on eBay, where colorful &lt;a href=&#34;http://www.ebay.com/sch/i.html?_nkw=ibeacon+eddystone+wristband+bracelet&#34;&gt;bracelet versions&lt;/a&gt; can cost less than $10. Most need batteries, typically the kind you put in a watch, so to avoid this I got a &lt;a href=&#34;http://www.amazon.com/RadBeacon-USB-Proximity-Eddystone-Technology/dp/B00R9NUQOG&#34;&gt;RadBeacon USB&lt;/a&gt; from Radius Networks that draws its power from any USB port where you plug it in. At the right you can see mine plugged into a conference swag phone recharger.&lt;/p&gt;
&lt;p&gt;I also chose this one because it supports Google&amp;rsquo;s &lt;a href=&#34;https://github.com/google/eddystone&#34;&gt;Eddystone&lt;/a&gt; open beacon format, Apple&amp;rsquo;s &lt;a href=&#34;https://developer.apple.com/ibeacon/&#34;&gt;iBeacon&lt;/a&gt; format, and Radius Network&amp;rsquo;s &lt;a href=&#34;http://altbeacon.org/&#34;&gt;AltBeacon&lt;/a&gt;. I haven&amp;rsquo;t dug into the pros and cons of these different formats yet; I just wanted something that was likely to work out of the box with both my Samsung S6 Android phone and my wife&amp;rsquo;s iPhone. The RadBeacon USB did fine.&lt;/p&gt;
&lt;p&gt;You configure it with a phone app built for that particular beacon product line. The Android &lt;a href=&#34;https://play.google.com/store/apps/details?id=com.radiusnetworks.radbeacon&amp;amp;hl=en&#34;&gt;RadBeacon app&lt;/a&gt; generally worked, although I often had to press &amp;ldquo;Apply&amp;rdquo; several times and restart Bluetooth before new settings would actually take hold. Its &lt;a href=&#34;https://radiusnetworks.zendesk.com/hc/en-us/articles/205022884-How-do-I-configure-Eddystone-Developer-Kit-beacons-%20&#34;&gt;documentation&lt;/a&gt; shows the kinds of properties it lets you set, such as the URL to broadcast and the Transmit Power (which affects the battery life and the distance that the URL is broadcast—in a museum, you want people receiving the URL of the painting in front of them, not the one twenty feet to the left of it).&lt;/p&gt;
&lt;p&gt;I had set mine to the URL of a &lt;a href=&#34;http://snee.com/beacontest/&#34;&gt;sample web page&lt;/a&gt; that I created for this purpose. While I was waiting for my RadBeacon to arrive in the mail, Dan Brickley &lt;a href=&#34;https://twitter.com/danbri/status/713455538785415170&#34;&gt;tweeted&lt;/a&gt; the mobiForge article &lt;a href=&#34;https://mobiforge.com/design-development/eddystone-beacon-technology-and-the-physical-web&#34;&gt;Eddystone beacon technology and the Physical Web&lt;/a&gt;, and I learned a lot from it about which components of my web page would be picked up by an app that received the broadcast URL.&lt;/p&gt;
&lt;p&gt;After I configured the beacon, the open source &lt;a href=&#34;https://play.google.com/store/apps/details?id=physical_web.org.physicalweb&amp;amp;hl=en&#34;&gt;physical web&lt;/a&gt; app found it and displayed the following on my Samsung S6:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/beaconfun2.png&#34;&gt;&lt;img id=&#34;idm45746389014208&#34; src=&#34;https://www.bobdc.com/img/main/beaconfun2.png&#34; border=&#34;0&#34; width=&#34;40%&#34; style=&#34;display:block;margin: 0 auto;&#34; alt=&#34;screenshot of physical web app&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Tapping the blue title took the phone to the web page. This all worked the same, with the same app, on my wife&amp;rsquo;s iPhone.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t want to have to bring such an app to the foreground every time I want to check for nearby beacons, so I was glad to see that the app also added something to my phone&amp;rsquo;s notifications list:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/beaconfun3.png&#34;&gt;&lt;img id=&#34;idm45746389011040&#34; src=&#34;https://www.bobdc.com/img/main/beaconfun3.png&#34; border=&#34;0&#34; width=&#34;40%&#34; style=&#34;display:block;margin: 0 auto;&#34; alt=&#34;screenshot of Android notifications&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Touching the notification sent the phone to the referenced web page.&lt;/p&gt;
&lt;p&gt;Both notifications above show what the app pulled from my &lt;a href=&#34;http://snee.com/beacontest/&#34;&gt;sample web page&lt;/a&gt;: the content of the &lt;code&gt;head&lt;/code&gt; element&amp;rsquo;s &lt;code&gt;title&lt;/code&gt; element and the value of the &lt;code&gt;content&lt;/code&gt; attribute from the &lt;code&gt;meta&lt;/code&gt; element that had a &lt;code&gt;name&lt;/code&gt; attribute value of &amp;ldquo;description&amp;rdquo;. They also displayed the hastily-drawn favicon image I created for the web page.&lt;/p&gt;
&lt;p&gt;A beacon won&amp;rsquo;t broadcast just any URI that you want, because the allowable length is somewhat limited. (This could vary by beacon product.) The article mentioned above describes the role of URL shorteners in the architecture. Still, the idea of such inexpensive hardware using URIs to identify things brings a nice semantic web touch to an Internet of Things architecture.&lt;/p&gt;
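&lt;p&gt;To see why the length limit bites so quickly, here is a Python sketch of the Eddystone-URL compression scheme as I understand it; the specific byte codes are my reading of the spec and worth verifying against the spec itself. One byte encodes the URL scheme, common substrings like &amp;ldquo;.com/&amp;rdquo; collapse to single bytes, and after that only about 17 bytes remain for everything else:&lt;/p&gt;

```python
# Sketch of Eddystone-URL compression. The scheme-prefix and expansion
# byte values are my reading of the Eddystone spec (an assumption to
# verify there), and the 17-byte budget is what the advertising frame
# leaves for the encoded URL body.
SCHEMES = {"http://www.": 0, "https://www.": 1, "http://": 2, "https://": 3}
EXPANSIONS = {".com/": 0, ".org/": 1, ".edu/": 2, ".net/": 3,
              ".info/": 4, ".biz/": 5, ".gov/": 6}
MAX_ENCODED_BYTES = 17

def encode_url(url):
    # Longest matching scheme prefix becomes a single byte.
    for scheme in sorted(SCHEMES, key=len, reverse=True):
        if url.startswith(scheme):
            rest = url[len(scheme):]
            out = bytearray([SCHEMES[scheme]])
            break
    else:
        raise ValueError("unsupported scheme")
    # Known substrings compress to one byte; everything else is literal.
    while rest:
        for text, code in EXPANSIONS.items():
            if rest.startswith(text):
                out.append(code)
                rest = rest[len(text):]
                break
        else:
            out.append(ord(rest[0]))
            rest = rest[1:]
    if len(out) - 1 > MAX_ENCODED_BYTES:
        raise ValueError("URL too long to broadcast; use a shortener")
    return bytes(out)

print(len(encode_url("http://snee.com/beacontest/")))   # 17 bytes total
```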
&lt;p&gt;One experiment I tried was the use of &lt;a href=&#34;https://sourceforge.net/projects/tagtool/&#34;&gt;Audio Tag Tool&lt;/a&gt; to add every metadata field available to an MP3. I then configured my beacon to broadcast that MP3&amp;rsquo;s URL, but none of the metadata showed up on my phone&amp;rsquo;s display. I thought that the idea of location-specific audio might be interesting. (You could also implement location-specific audio with much older technology—for example, &lt;a href=&#34;http://www.victor-victrola.com/&#34;&gt;Victrolas&lt;/a&gt;—but the ability to control the audio from a central server could lead to interesting possibilities.)&lt;/p&gt;
&lt;p&gt;The museum use case for beacons is nice and cultured, but I wonder about the attraction of a technology whose real main use case for now is to pump ads at people. (When was the last time you scanned a &lt;a href=&#34;http://blog.hubspot.com/marketing/qr-codes-dead&#34;&gt;QR code&lt;/a&gt; with your phone?) I say &amp;ldquo;for now&amp;rdquo; because I remain hopeful that creative people will come up with more interesting things to do with these, especially if they dig into the &lt;a href=&#34;https://developers.google.com/beacons/proximity/guides&#34;&gt;Eddystone&lt;/a&gt;, &lt;a href=&#34;https://developer.apple.com/ibeacon/&#34;&gt;iBeacon&lt;/a&gt;, and &lt;a href=&#34;https://altbeacon.github.io/android-beacon-library/&#34;&gt;AltBeacon&lt;/a&gt; APIs. For example, you could add features to your own apps to check for or even act as beacons, communicating with other beacons and apps around your phone whether these devices had Internet connections or not. The Opera browser&amp;rsquo;s use of schema.org metadata stored in web pages referenced by beacons is also promising, and I know that Dan is putting more thought into what role schema.org can play.&lt;/p&gt;
&lt;p&gt;The idea of the broadcast URL showing up as a notification on your phone that you can follow or ignore is much simpler than starting up a special app on your phone and then pointing the phone at one corner of a poster, which the QR enthusiasts thought we&amp;rsquo;d be happier to do. The short article &lt;a href=&#34;http://unacast.com/5-common-misconceptions-about-beacons-and-proximity-marketing/&#34;&gt;5 Common Misconceptions About Beacons and Proximity Marketing&lt;/a&gt; gives a good perspective on where beacons can fit into the communications ecosystem in general and the world of marketing in particular. The article is from one of several companies building a business model around advertising via beacons, but like I said above, I hope that the APIs inspire other uses for them as well.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2016">2016</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Adding custom menus to Google docs</title>
      <link>https://www.bobdc.com/blog/adding-custom-menus-to-google/</link>
      <pubDate>Sun, 20 Mar 2016 12:00:15 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/adding-custom-menus-to-google/</guid>
      
      
      <description><div>Using Google Apps Script, but unfortunately not in Google apps.</div><div>&lt;img id=&#34;idm45531756847456&#34; src=&#34;https://www.bobdc.com/img/main/googleappsmenu.png&#34; alt=&#34;Google apps menu&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;300&#34;/&gt;
&lt;p&gt;I&amp;rsquo;ve been using Google Docs more because at work it&amp;rsquo;s great for collaboration, and also, for shopping lists and notes to myself, I can easily edit the same documents from my phone, tablet, and laptop. I found out that it&amp;rsquo;s pretty easy to add menus that perform custom functions, so I created a few menu choices&amp;hellip; and then found out that they weren&amp;rsquo;t available on my phone or tablet. Still, it&amp;rsquo;s good to know how easy it is to automate a few things.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://developers.google.com/apps-script/guides/docs#the_basics&#34;&gt;Extending Google Docs&lt;/a&gt; is a good introduction to getting started. Picking &lt;strong&gt;Script Editor&lt;/strong&gt; from the &lt;strong&gt;Tools&lt;/strong&gt; menu puts you into this editor with an empty function waiting for you to fill it in or, more likely, to replace it with code you copied from web pages such as &amp;ldquo;Extending Google Docs.&amp;rdquo; &lt;a href=&#34;https://developers.google.com/apps-script/&#34;&gt;Google Apps Script&lt;/a&gt; is basically Javascript, and I had an easy time searching for any code that I wanted to plug in.&lt;/p&gt;
&lt;p&gt;For example, when writing a note about something, I sometimes want to add a date-time stamp to show exactly when I made a particular note, because if it&amp;rsquo;s ongoing research it&amp;rsquo;s easier to see my progress leading up to where I left off. (I&amp;rsquo;ve had my &lt;code&gt;.emacs&lt;/code&gt; file set up to let me add this with Alt+D for years.) To add a &lt;strong&gt;timestamp&lt;/strong&gt; menu choice to Google Docs, I replaced the blank function in the script editor with menu code based on what you see in &lt;a href=&#34;https://developers.google.com/apps-script/guides/menus#custom_menus_in_google_docs_sheets_or_forms&#34;&gt;Custom Menus in Google Apps&lt;/a&gt;, and then I added a line to insert the current date and time at the cursor using the format &amp;ldquo;Sun Mar 13 2016 10:40:33 GMT-0400 (EDT).&amp;rdquo; I&amp;rsquo;d prefer the terser ISO 8601 format, and I found a function to convert it, but the function wants to know what time zone you&amp;rsquo;re in, and the simpler &lt;code&gt;Date()&lt;/code&gt; function that creates the more verbose form already knows.&lt;/p&gt;
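&lt;p&gt;The timezone wrinkle is not unique to Apps Script. For comparison (this is Python rather than Apps Script, so it is an illustration and not a drop-in fix), here is how to get the terser ISO 8601 stamp in Python, where astimezone() with no argument discovers the local zone on its own:&lt;/p&gt;

```python
from datetime import datetime, timezone

# Local time with the zone discovered automatically, no hard-coded "EDT":
stamp = datetime.now().astimezone().isoformat(timespec="seconds")
print(stamp)       # e.g. 2016-03-13T10:40:33-04:00

# Or sidestep time zones entirely by stamping in UTC:
utc_stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
print(utc_stamp)
```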
&lt;p&gt;When I read something on my tablet and I&amp;rsquo;m taking notes, I often paste blocks of text into a Google docs document. To remember which parts are large verbatim blocks of someone else&amp;rsquo;s writing, I enclose them in &lt;code&gt;&amp;lt;blockquote&amp;gt;&amp;lt;/blockquote&amp;gt;&lt;/code&gt; tags. My second new menu item inserts this string and then moves the cursor between those tags so that if I have something in my copy-paste buffer I can just paste it right there. The &amp;ldquo;utilities&amp;rdquo; menu that I added also demonstrates how to add a menu separator and a submenu that pops up a message box.&lt;/p&gt;
&lt;p&gt;The code is all shown below. If I want to share these features across multiple documents, to be honest, the simplest way I&amp;rsquo;ve found is to paste this code into the script editor for each of the other documents. This is not, if I may string together some buzzwords, a scalable code maintenance solution.&lt;/p&gt;
&lt;p&gt;These are known as &amp;ldquo;bound&amp;rdquo; scripts because they&amp;rsquo;re bound to specific documents. You can also create &lt;a href=&#34;https://developers.google.com/apps-script/guides/standalone&#34;&gt;standalone&lt;/a&gt; scripts, which I hoped would be a way to store shared code that could be referenced from multiple documents, but you actually run them independently of the documents to perform tasks that are not tied to any specific document such as, in the example on that page, searching Google Drive for documents meeting certain conditions.&lt;/p&gt;
&lt;p&gt;If you have a script that adds choices to a document and you want to use it from multiple documents, you must publish it. As the &lt;a href=&#34;https://developers.google.com/apps-script/add-ons/publish&#34;&gt;Publishing an Add-on&lt;/a&gt; web page says,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Publishing add-ons allows them to be used by other users in their own documents. Public add-ons require a review before publication, although if you are a member of a private Google Apps domain, you can publish just for users within your domain without a review. You can also publish an add-on for domain-wide installation, which lets a domain admins find &lt;em&gt;[sic]&lt;/em&gt;, authorize and install your add-on on behalf of all users within their domain.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There&amp;rsquo;s even an add-on &lt;a href=&#34;https://docs.google.com/document/d/1FrqgxH_kh44rLj0374uYBlG1n1HxqEfFVL3XI_MhzJA/edit?addon_store%20&#34;&gt;store&lt;/a&gt; with offerings available from some recognizable brand names.&lt;/p&gt;
&lt;p&gt;I never did find a way to create a single script that I could share among my own documents without going through some approval process. In an even greater disappointment, I found that the menu I created was &lt;a href=&#34;https://www.quora.com/How-do-I-run-a-Google-Apps-Script-spreadsheet-on-a-mobile-phone&#34;&gt;not available when editing that same document on my phone or tablet&lt;/a&gt;, which was much of the point of creating them. In other words, this part of Google Apps script doesn&amp;rsquo;t work with Google apps.&lt;/p&gt;
&lt;p&gt;Still, skimming the &lt;a href=&#34;https://developers.google.com/apps-script/reference/calendar/&#34;&gt;Apps Script Reference&lt;/a&gt; for available methods to call when customizing for Google Docs, spreadsheets, calendars, and more shows that there&amp;rsquo;s a lot to play with, and I didn&amp;rsquo;t even try a standalone script. If this ever works on phones and tablets, I will definitely be digging back into the reference material again.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;function onOpen() {
  var ui = DocumentApp.getUi();
  // Or DocumentApp or FormApp.
  ui.createMenu(&#39;utilities&#39;)
      .addItem(&#39;timestamp&#39;, &#39;insertTimestamp&#39;)
      .addItem(&#39;blockquote&#39;, &#39;insertBqTags&#39;)
      .addSeparator()
      .addSubMenu(ui.createMenu(&#39;Sub-menu&#39;)
          .addItem(&#39;Second item&#39;, &#39;menuItem2&#39;))
      .addToUi();
}


function insertTimestamp() {
  DocumentApp.getUi() ; 
  var doc = DocumentApp.getActiveDocument(); 
  var body = doc.getBody();
  // The following gives me ISO format, which I prefer, but unlike Date(), 
  // needs to be told the time zone 
  // var timestamp = Utilities.formatDate(new Date(), &amp;quot;EDT&amp;quot;, &amp;quot;yyyy-MM-dd&#39;T&#39;HH:mm:ss&amp;quot;); 
  var timestamp = new Date();
  // https://developers.google.com/apps-script/reference/document/document#getcursor
  // has error-checking code for the following that would make it more robust.
  var cursor = DocumentApp.getActiveDocument().getCursor();
  var element = cursor.insertText(timestamp);
}


function insertBqTags() {
  DocumentApp.getUi() ;
  var doc = DocumentApp.getActiveDocument(); 
  var body = doc.getBody();
  var cursor = DocumentApp.getActiveDocument().getCursor();
  var insertedText = cursor.insertText(&amp;quot;&amp;lt;blockquote&amp;gt;&amp;lt;/blockquote&amp;gt;&amp;quot;);
  var position = doc.newPosition(insertedText, 12);
  doc.setCursor(position); 
}


function menuItem2() {
  DocumentApp.getUi() // Or DocumentApp or FormApp.
     .alert(&#39;You clicked the second menu item!&#39;);
}
&lt;/code&gt;&lt;/pre&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2016">2016</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>&#34;Readings in Database Systems&#34;: wisdom from Michael Stonebraker</title>
      <link>https://www.bobdc.com/blog/readings-in-database-systems-w/</link>
      <pubDate>Sat, 27 Feb 2016 11:03:23 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/readings-in-database-systems-w/</guid>
      
      
      <description><div>and two other guys--updated and free online.</div><div>&lt;img id=&#34;idm45348838695216&#34; src=&#34;https://www.bobdc.com/img/main/stonebraker.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Michael Stonebraker&#34;/&gt;
&lt;p&gt;As I &lt;a href=&#34;https://twitter.com/bobdc/status/621727213528948736&#34;&gt;tweeted&lt;/a&gt; last July, I always learn so much about both the past and future of database computing from recent Turing Award winner &lt;a href=&#34;https://en.wikipedia.org/wiki/Michael_Stonebraker&#34;&gt;Michael Stonebraker&lt;/a&gt;. I recently learned that the latest edition of &lt;a href=&#34;http://www.redbook.io/all-chapters.html&#34;&gt;Readings in Database Systems&lt;/a&gt;, also known as the &amp;ldquo;Red Book,&amp;rdquo; is available for free online under a Creative Commons license—or at least the introductions to the readings are. With most of these being by Stonebraker, and quite up-to-date, I consider these 43 pages required reading for anyone interested in database technology.&lt;/p&gt;
&lt;p&gt;The serious student should find and read the actual papers, but I learned plenty from the introductions by Stonebraker and his co-editors Peter Bailis and Joe Hellerstein. (Ben Lorica&amp;rsquo;s recent &lt;a href=&#34;https://www.oreilly.com/ideas/metadata-services-can-lead-to-performance-and-organizational-improvements&#34;&gt;podcast interview&lt;/a&gt; with Hellerstein is also worth a listen.) For example, after reading the introduction to chapter 4, I now have a much better understanding of the advantages of column stores over more traditional row stores, and chapter 12 helped me to understand the history of Data Warehouses and the role of &lt;a href=&#34;https://en.wikipedia.org/wiki/Extract,_transform,_load&#34;&gt;ETL&lt;/a&gt; much better.&lt;/p&gt;
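&lt;p&gt;The column-store advantage is easy to see even in a toy Python sketch: when records are stored column by column, an aggregate over one attribute scans one contiguous array instead of touching every record. (This leaves out compression, the other big column-store win the book covers.)&lt;/p&gt;

```python
# The same three records in the two layouts the Red Book contrasts.
rows = [                       # row store: each record stored together
    {"id": 1, "price": 9.99, "qty": 3},
    {"id": 2, "price": 4.50, "qty": 1},
    {"id": 3, "price": 12.00, "qty": 7},
]
columns = {                    # column store: each attribute stored together
    "id":    [1, 2, 3],
    "price": [9.99, 4.50, 12.00],
    "qty":   [3, 1, 7],
}

# Row layout: every record must be visited to read one field.
total_row = sum(r["qty"] for r in rows)
# Column layout: one contiguous array is scanned, nothing else is touched.
total_col = sum(columns["qty"])
print(total_row, total_col)    # 11 11
```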
&lt;p&gt;This is the fifth edition of the book, published in 2015, so it is very current, as you can see from the way it treats MapReduce as past history. They published the &lt;a href=&#34;http://redbook.cs.berkeley.edu/bib1.html&#34;&gt;first edition&lt;/a&gt; in 1988, so this has clearly been a long-term project, and it&amp;rsquo;s interesting to see which twentieth century papers appear in the new fifth edition—for example, Sergey Brin and Larry Page&amp;rsquo;s 1998 classic &lt;a href=&#34;http://infolab.stanford.edu/~backrub/google.html&#34;&gt;The Anatomy of a Large-scale Hypertextual Web Search Engine&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Several of Stonebraker&amp;rsquo;s more opinionated assertions were enough fun to read that they tempted me to start a fake Twitter account, modeled on the hilarious &lt;a href=&#34;https://twitter.com/boredelonmusk&#34;&gt;@boredElonMusk&lt;/a&gt;, that I would call @crankyMikeStonebraker. It would feature real quotes from the Red Book such as these:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;SQL will be the COBOL of 2020, a language we are stuck with that everybody will complain about.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;[JSON] is a disaster in the making as a general hierarchical data format.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;“I consider ODBC among the worst interfaces on the planet.”&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The rest of the world is seeing what Google figured out earlier; Map-Reduce is not an architecture with any broad scale applicability.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The MapReduce crowd has turned into a SQL crowd and Map-Reduce, as an interface, is history.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;Just because Google thinks something is a good idea does not mean you should adopt it.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;We begin with a sad truth. Most data science platforms are file-based and have nothing to do with DBMSs.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;the new buzzword is master data management (MDM)&amp;hellip; MDM is the opposite of business agility.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While the very title of &amp;ldquo;Readings in Database Systems&amp;rdquo; will make some people&amp;rsquo;s eyes glaze over, bits like these make it much more fun to read than many would expect, especially if you care at all about the role that database systems play in modern applications.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Photo of Michael Stonebraker by &lt;a href=&#34;https://www.flickr.com/photos/dcoetzee/&#34;&gt;D Coetzee&lt;/a&gt; via &lt;a href=&#34;https://www.flickr.com/photos/dcoetzee/4673939138/in/photolist-882agy-87XXa8-ocTrpE-bo2GTX&#34;&gt;flickr&lt;/a&gt; (CC0)&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2016">2016</category>
      
      <category domain="https://www.bobdc.com//categories/nosql">NoSQL</category>
      
      <category domain="https://www.bobdc.com//categories/data-science">data science</category>
      
      <category domain="https://www.bobdc.com//categories/technology-future">technology, future</category>
      
      <category domain="https://www.bobdc.com//categories/technology-past">technology, past</category>
      
    </item>
    
    <item>
      <title>The past and present of hypertext</title>
      <link>https://www.bobdc.com/blog/the-past-and-present-of-hypert/</link>
      <pubDate>Sun, 17 Jan 2016 10:58:51 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-past-and-present-of-hypert/</guid>
      
      
      <description><div>You know, links in the middle of sentences.</div><div>&lt;p&gt;I&amp;rsquo;ve been thinking lately about the visionary optimism of the days when people dreamed of the promise of large-scale hypertext systems. I&amp;rsquo;m pretty sure they didn&amp;rsquo;t mean linkless content down the middle of a screen with columns of ads to the left and right of it, which is much of what we read off of screens these days. I certainly don&amp;rsquo;t want to start one of those rants of &amp;ldquo;the World Wide Web is deficient because it&amp;rsquo;s missing features X and Y, which by golly we had in the HyperThingie™ system that I helped design back in the 80s, and the W3C should have paid more attention to us&amp;rdquo; because I&amp;rsquo;ve seen too many of those. The web got so popular because Tim Berners-Lee found such an excellent balance between which features to incorporate and which (for example, central link management) to skip.&lt;/p&gt;
&lt;p&gt;The idea of &lt;a href=&#34;https://en.wikipedia.org/wiki/Inline_linking&#34;&gt;inline links&lt;/a&gt;, in which words and phrases in the middle of sentences link to other documents related to those words and phrases, was considered an exciting thing back when we got most of our information from printed paper. A hypertext system had links between the documents stored in that system, and the especially exciting thing about a &amp;ldquo;world wide&amp;rdquo; hypertext system was that any document could link to any other document in the world.&lt;/p&gt;
&lt;p&gt;But who does that in 2016? The reason I&amp;rsquo;ve been thinking more about the past and present of hypertext (a word that, sixteen years into the twenty-first century, is looking a bit quaint) is that since adding a few links to something I was writing at work recently, I&amp;rsquo;ve been more mindful of which major web sites include how many inline links and how many of those links go to other sites. For example, while reading the article &lt;a href=&#34;http://blogs.scientificamerican.com/cross-check/bayes-s-theorem-what-s-the-big-deal/&#34;&gt;Bayes&amp;rsquo;s Theorem: What&amp;rsquo;s the Big Deal?&lt;/a&gt; on &lt;a href=&#34;http://www.scientificamerican.com/&#34;&gt;Scientific American&amp;rsquo;s site&lt;/a&gt; recently, I found myself thinking &amp;ldquo;good for you guys, with all those useful links to other web sites right in the body of your article!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;To get some idea of relative proportions of internal links, external links, and linkless text on today&amp;rsquo;s successful websites, I went to a &lt;a href=&#34;http://www.ebizmba.com/articles/blogs&#34;&gt;top 15 most popular blogs list&lt;/a&gt; and did some random checking of articles on these sites. (An exercise for the reader to make up for my haphazard skimming: write some scripts to scrape some editorial content from each site, count the internal and external links, and produce a bar chart.) Because these are professionally managed sites, I imagine that management at some of them encourages links to other articles on the same site and discourages links to other sites as a matter of policy, because they want to keep their readers looking at their advertisers&amp;rsquo; ads.&lt;/p&gt;
&lt;p&gt;There is a gray area between internal and external links: linking to other sites that are part of the same organization, such as the many links in a &lt;a href=&#34;http://www.businessinsider.com/who-won-900-million-powerball-2016-1&#34;&gt;Business Insider article&lt;/a&gt; to &lt;a href=&#34;http://www.techinsider.io/&#34;&gt;Tech Insider&lt;/a&gt; articles, or the many links between members of the &lt;a href=&#34;https://en.wikipedia.org/wiki/Gawker_Media#List_of_Gawker_Media_weblogs&#34;&gt;Gawker Media&lt;/a&gt; stable, which is heavily represented in the top 15.&lt;/p&gt;
&lt;p&gt;Of those top 15:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://www.huffingtonpost.com/&#34;&gt;Huffington Post&lt;/a&gt;: a mix of internal and external links, but their number of external links fits with their business model of being a hub of other sites&amp;rsquo; content.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;All about the internal links: &lt;a href=&#34;http://www.tmz.com/&#34;&gt;TMZ&lt;/a&gt;, &lt;a href=&#34;http://mashable.com/&#34;&gt;Mashable&lt;/a&gt;, &lt;a href=&#34;http://gawker.com/&#34;&gt;Gawker&lt;/a&gt;, &lt;a href=&#34;http://www.thedailybeast.com/&#34;&gt;The Daily Beast&lt;/a&gt;, &lt;a href=&#34;http://www.engadget.com/&#34;&gt;Engadget&lt;/a&gt;, &lt;a href=&#34;http://jezebel.com/&#34;&gt;Jezebel&lt;/a&gt; (where most external links are to their Gawker Media sibling &lt;a href=&#34;http://gawker.com/&#34;&gt;Gawker&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://deadspin.com/&#34;&gt;Deadspin&lt;/a&gt;: a reasonable percentage of external links.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Gawker Media&amp;rsquo;s video game site &lt;a href=&#34;http://kotaku.com/&#34;&gt;Kotaku&lt;/a&gt;: long stretches of text with no links, and others with both internal and external links.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://techcrunch.com/&#34;&gt;TechCrunch&lt;/a&gt;: mostly internal links, with several to Gizmodo, even though TechCrunch is an AOL site and Gizmodo a Gawker Media site.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Gawker Media&amp;rsquo;s &lt;a href=&#34;http://www.lifehacker.com&#34;&gt;lifehacker&lt;/a&gt;, which is probably the site I visit most of all those listed here: external links if an article describes the external site&amp;rsquo;s article, company, or product, but otherwise, internal links.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://perezhilton.com/&#34;&gt;Perez Hilton&lt;/a&gt;: mostly internal links; external links tend to be redirected via goo.gl, I suppose so that Mr. Hilton&amp;rsquo;s people can track which external links get clicked.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Gawker Media&amp;rsquo;s &lt;a href=&#34;http://gizmodo.com/&#34;&gt;Gizmodo&lt;/a&gt;: plenty of external links, even to non-Gawker sites, for a gadget site that I assume is mostly interested in helping advertisers sell gadgets.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://www.cheezburger.com/&#34;&gt;Cheezburger&lt;/a&gt;: textual content not much of an issue here.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;rsquo;m guessing that there is no policy across all of Gawker Media about the use of links, but that each of their major properties has some sort of policy in place. (For an interesting, explicit enumeration of one carefully managed site&amp;rsquo;s linking policy, see the guidelines at &lt;a href=&#34;http://www.ibm.com/developerworks/library/styleguidelines/#links&#34;&gt;IBM Developer Works&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;One particularly link-rich bit of content that I read regularly is &lt;a href=&#34;http://tinyletter.com/data-is-plural/letters/data-is-plural-2016-01-06-edition&#34;&gt;Data is Plural&lt;/a&gt;, which ironically is delivered via email—a technology that had a firm foothold in the Internet before Berners-Lee came up with the Web, and which most young people today only use to communicate with us old people.&lt;/p&gt;
&lt;p&gt;Who even thinks about hypertext as hypertext anymore? A quick look at the former &lt;a href=&#34;https://en.wikipedia.org/wiki/Usenet&#34;&gt;Usenet newsgroup&lt;/a&gt; (and now &lt;a href=&#34;https://groups.google.com/forum/#!overview&#34;&gt;Google Group&lt;/a&gt;) &lt;a href=&#34;https://groups.google.com/forum/#!forum/alt.hypertext&#34;&gt;alt.hypertext&lt;/a&gt; shows an average of about one new message or comment per month for the last few years, including spam. (Compare January of 1998, when the newsgroup had 39 topics with one or more postings in that one month.) The most recent topic shown is titled &amp;ldquo;NCSA Mosaic for X 0.10 available&amp;rdquo; from Marc Andreessen, posted—I thought—last month, making me think &amp;ldquo;isn&amp;rsquo;t he a bit busy for Mosaic these days?&amp;rdquo; It turned out that last month someone added a comment to his original 1993 post. A relatively recent new topic is &lt;a href=&#34;https://medium.com/@ftrain&#34;&gt;Paul Ford&amp;rsquo;s&lt;/a&gt; &lt;a href=&#34;https://groups.google.com/forum/#!topic/alt.hypertext/H5c9yfm-t3A&#34;&gt;January 2014 query&lt;/a&gt; &amp;ldquo;Do documents have a chance? Or is the future more and smarter optimized applications?&amp;rdquo; Actually, that makes a solid answer to my question that began this paragraph: Paul Ford, and I&amp;rsquo;m really looking forward to his &lt;a href=&#34;http://www.amazon.com/Secret-Lives-Web-Pages/dp/0374261113/&#34;&gt;upcoming book&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/afternoonAStory.jpg&#34;&gt;&lt;img id=&#34;idm140153782618704&#34; src=&#34;https://www.bobdc.com/img/main/afternoonAStory.jpg&#34; border=&#34;0&#34; width=&#34;320&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;&#39;Afternoon, A Story&#39; package&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The hypertext &amp;ldquo;novel&amp;rdquo; I bought in 1994 for $25&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2016">2016</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>My new job</title>
      <link>https://www.bobdc.com/blog/my-new-job-1/</link>
      <pubDate>Sun, 20 Dec 2015 09:27:39 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/my-new-job-1/</guid>
      
      
      <description><div>Lots of cutting edge technologies, 18 minutes from my home.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.ccri.com&#34;&gt;&lt;img id=&#34;idm140447919716000&#34; src=&#34;https://www.bobdc.com/img/main/ccrilogo.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;CCRi logo&#34; width=&#34;180&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I recently began a new full-time position as a technical writer at Commonwealth Computer Research, Inc., more commonly known as &lt;a href=&#34;http://www.ccri.com/&#34;&gt;CCRi&lt;/a&gt;. CCRi was doing large-scale data science long before the term &amp;ldquo;data science&amp;rdquo; became so popular; &lt;a href=&#34;http://www.ccri.com/about/#panel-29-2-0-3&#34;&gt;one company founder&lt;/a&gt; also directs the University of Virginia&amp;rsquo;s &lt;a href=&#34;https://dsi.virginia.edu/&#34;&gt;Data Science Institute&lt;/a&gt;. They also do a lot of work with distributed machine learning and other cutting edge technologies, especially in the area of geospatial analytics. The chance to work with so many different interesting new technologies and smart people—engineering and math PhD&amp;rsquo;s tend to be the norm instead of the exception—right here in Charlottesville, after telecommuting for over eight years, was just too good to pass up.&lt;/p&gt;
&lt;p&gt;Having recently grown to over 80 employees, CCRi has gotten large enough that it&amp;rsquo;s become difficult for everyone there to know about all the technology and projects going on in other parts of the company. Part of my role will be to help with that, documenting these things so that it&amp;rsquo;s easier for people to find connections between the different existing and new efforts underway. I&amp;rsquo;ll also be helping them with marketing and business development.&lt;/p&gt;
&lt;p&gt;RDF and SPARQL do play a role in some of the projects there, mostly using the &lt;a href=&#34;https://wiki.apache.org/incubator/RyaProposal&#34;&gt;Rya&lt;/a&gt; triplestore because of its use of Apache &lt;a href=&#34;https://accumulo.apache.org/&#34;&gt;Accumulo&lt;/a&gt; for storage. Accumulo is a key-value pair NoSQL database built on Hadoop whose design is based on Google&amp;rsquo;s BigTable database, and it plays an important part in several CCRi projects.&lt;/p&gt;
&lt;p&gt;One of the biggest projects at CCRi is GeoMesa, which is described by its &lt;a href=&#34;http://www.ccri.com/case-studies/geomesa/&#34;&gt;product page&lt;/a&gt; as &amp;ldquo;an open-source solution maintained and supported by CCRi for storing, indexing, querying, transforming, and visualizing spatio-temporal data at scale in Accumulo.&amp;rdquo; For a start, it adds to Accumulo what &lt;a href=&#34;http://postgis.net/&#34;&gt;PostGIS&lt;/a&gt; adds to &lt;a href=&#34;http://www.postgresql.org/&#34;&gt;PostgreSQL&lt;/a&gt;: datatypes, functions, and more features that make it easy to store and query geospatial data. Going beyond that, GeoMesa lets you store spatio-temporal data, so that event timestamps can play a role in applications that use GeoMesa. &lt;a href=&#34;http://kafka.apache.org/&#34;&gt;Apache Kafka&lt;/a&gt; provides GeoMesa with some nice infrastructure for handling real time streaming data. For example, it was used to create this animated U.S. map of tweets over the 2015 Super Bowl week.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.youtube.com/watch?v=ugZdvnFrh4Q&#34;&gt;&lt;img id=&#34;idm140447919702816&#34; src=&#34;https://www.bobdc.com/img/main/superbowltweets.png&#34; width=&#34;420&#34; height=&#34;236&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As alternatives to using Accumulo for storage, GeoMesa can also use &lt;a href=&#34;https://hbase.apache.org/&#34;&gt;Apache HBase&lt;/a&gt; and &lt;a href=&#34;https://cloud.google.com/bigtable/docs/&#34;&gt;Google Cloud BigTable&lt;/a&gt;, the public version of Google&amp;rsquo;s internal Bigtable storage system. After Google heard about this, they contacted CCRi about a partnership, which was exciting enough in this town for a local TV station to run the news story shown below. That video is fun, but if you only have a minute and a half to watch a video about GeoMesa, I recommend the &lt;a href=&#34;https://www.youtube.com/watch?v=S7VTAoP0bu4&#34;&gt;GeoMesa on Google BigTable&lt;/a&gt; one, which shows off some of the excellent visualizations that are possible.&lt;/p&gt;
&lt;p&gt;In addition to products like GeoMesa and &lt;a href=&#34;http://www.ccri.com/case-studies/&#34;&gt;others&lt;/a&gt; that you can see on the website, the company does applied research, often for government agencies. (I&amp;rsquo;m learning a lot about those—did you know that the U.S. has an &lt;a href=&#34;http://www.iarpa.gov/index.php/about-iarpa/anticipating-surprise&#34;&gt;Office for Anticipating Surprise&lt;/a&gt;?) In this era of Big Data, the question sometimes comes up of how to best make use of all this data now that tools for working with such large quantities of it have become more easily available. CCRi&amp;rsquo;s &lt;a href=&#34;http://www.ccri.com/capabilities/&#34;&gt;capabilities&lt;/a&gt; such as predictive analytics, optimization, and text analysis are helping customers get more out of this data in settings ranging from international sales patterns to battlefields. If anyone wants to contact me to learn more, I&amp;rsquo;d be happy to set them up with the right people to tell them about the kinds of services CCRi offers.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.newsplex.com/home/headlines/Local-Company-Earns-Partnership-with-Google--303871131.html&#34;&gt;&lt;img id=&#34;idm140447919694288&#34; src=&#34;https://www.bobdc.com/img/main/googlenewsplex.png&#34; width=&#34;420&#34; height=&#34;236&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
      <category domain="https://www.bobdc.com//categories/ai-and-machine-learning">AI and machine learning</category>
      
      <category domain="https://www.bobdc.com//categories/gis">GIS</category>
      
    </item>
    
    <item>
      <title>13 ways to make your writing look more professional</title>
      <link>https://www.bobdc.com/blog/13-ways-to-make-your-writing-l/</link>
      <pubDate>Tue, 17 Nov 2015 14:35:41 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/13-ways-to-make-your-writing-l/</guid>
      
      
      <description><div>Simple copyediting things.</div><div>&lt;blockquote id=&#34;idm139759380511536&#34; class=&#34;pullquote&#34;&gt;The nice thing about these is that, unlike with truly good writing, no skill and very little work is required to put them into practice. They’re all just a matter of paying attention.&lt;/blockquote&gt;
&lt;p&gt;I’ve done some copyediting as part of my job, especially with marketing material. Certain basic mistakes come up so often that I made a list that I’ve been tempted to give to whoever gave me the original content and say “please make sure that it doesn’t have any of these problems first!” I didn’t, but for those who are interested, following these simple rules will make your writing look more professional. The nice thing about these is that, unlike with truly good writing, no skill and very little work is required to put them into practice. They’re all just a matter of paying attention.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Never give someone something to read that you haven’t spell checked. If it has typos that a spell checker would have caught, it’s like saying “my time is so much more valuable than yours that I couldn’t bother doing this simple, mechanical two-minute task before giving this to you.” If you’re writing with a tool that doesn’t have a spell checker, paste the text into Microsoft Word or LibreOffice and look for the red squiggly lines. If a spell checker doesn’t recognize a company name and you’re not 100% sure of its spelling, take ten seconds to check it on their website, especially if someone from that company may see the piece.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Only put one space after a period, question mark, or exclamation mark ending a sentence, not two. People used two in the days of manual typewriters for hard copy manuscripts that would be submitted to typesetters, but as with the carriage returns that we formerly added to the end of every single line on typewriters, we now leave it up to the computer to decide how much spacing is appropriate. If you put two spaces after a period, your word processor will put too much space there.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In something published by an American company, punctuation at the end of a quoted phrase goes inside the quotes, “like this,” not outside, “like this”. In the UK they do it outside. This is a stickier issue with technical writing, where you may be referring to specific strings of quoted text; for example, if I write that a password is “swordfish”, I don’t want readers thinking that the comma is part of the password. The important thing is to be consistent within a document.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In a bulleted or numbered list, either end all the bullets with punctuation that treats the bullets as complete sentences or end none of them that way. Don’t do this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Go out the front door&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pull the mail out of the mailbox.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Bring the mail back inside&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Leave the mail on the dining room table.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The items of a list like that should be grammatically consistent: all complete sentences or all grammatically consistent phrases (for example, all noun phrases) with no complete sentences. For example, if the first item says “Easier setup and installation” and the second says “Wide choice of reports,” then no other items in that list should be complete sentences.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Put consistent spacing around em dashes and don&amp;rsquo;t confuse them with hyphens. A hyphen is the keyboard character that usually connects words being used together as a single adjective as in user-friendly interface or in-memory database. An em dash (named for being the width of the letter &amp;ldquo;m&amp;rdquo;) is used for appositive phrases. It&amp;rsquo;s often written with two hyphens&lt;code&gt;--&lt;/code&gt;like this&lt;code&gt;--&lt;/code&gt;which Microsoft Word and LibreOffice will convert to an em dash character. In HTML, you can enter &lt;code&gt;&amp;amp;mdash;&lt;/code&gt; or just paste the character from somewhere else. (An en dash is a bit narrower and used for date ranges. Handy hint when you&amp;rsquo;re unloading your last few tiles at the end of a Scrabble game: both em and en are legal words.) Em dashes should either have a space on both sides — like that, or on neither side—like that. Pick one spacing convention and make sure that all the em dashes in a given document are spaced consistently.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Some phrases may or may not use initial caps, like Artificial Intelligence. If you do, capitalize it consistently throughout a document. Don’t refer to Artificial Intelligence in the first paragraph and artificial intelligence in the fourth. Also, with phrases that may or may not be written as one word, pick one and be consistent; don’t write “filename” in one paragraph and “file name” further on in the document. (Early drafts of this blog post made this mistake with “spellcheck.”)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We use apostrophes to stand in for a missing letter in a contraction (such as standing in for the “o” from “is not” in “isn’t”) or for the possessive, as in Jim’s car, so never ever use “it’s” as a possessive—“it’s” can only be used as a contraction for “it is.” Don&amp;rsquo;t use an apostrophe and an “s” to indicate a plural. (Some people make exceptions for numbers like 1990’s and abbreviations such as M.D.&amp;rsquo;s.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use English instead of Latin abbreviations: “for example” instead of “e.g.” and “that is” instead of “i.e.” Instead of saying “etc.,” introduce a list with “such as” to indicate that the list is incomplete and that there are probably more entries. For example, say “baseball teams such as the Mets, Yankees, and Red Sox” and not “Mets, Yankees, Red Sox, etc.”.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the age of the web, underlining means hypertext link. Don&amp;rsquo;t use it for anything else because it clutters a layout. (In the old days, it was an indication to a typesetter to italicize text.) For emphasis, use bold or italics. For example: &lt;em&gt;Never use an apostrophe and an “s” to indicate a plural.&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Check that all the links work. As with spell checking, this is best done (or redone) just before sending a document off to someone, because if you do it and then make many other edits, those edits may introduce new problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If a product name is trademarked, only put the trademark symbol after the first mention of the product in a document. Here is what &lt;a href=&#34;http://www.forbes.com/sites/work-in-progress/2014/03/12/when-and-how-do-i-have-to-use-trademark-symbols/&#34;&gt;one intellectual property attorney&lt;/a&gt; tells us:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In written documents — in articles, press releases, promotional materials, and the like — it is only necessary to use a symbol with the first instance of the mark, or with the most prominent placement of the mark. It is a common misconception that each and every instance of the mark should bear a trademark symbol. Overuse creates visual clutter and may detract from the aesthetic appeal of the piece. Provided there is at least one conspicuous use of the TM, SM, or ® on the face of the writing, do not be afraid to eliminate superfluous markings.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Don&amp;rsquo;t say &amp;ldquo;and/or.&amp;rdquo; In general, the use of slashes to indicate indecision is a bad idea. Decide on one or the other, or rewrite the sentence.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;2016-12-23 update: In her article &lt;a href=&#34;https://medium.freecodecamp.com/lessons-from-a-years-worth-of-hiring-data-dacf4e7668d4&#34;&gt;Lessons from a year&amp;rsquo;s worth of hiring data&lt;/a&gt;, Aline Lerner demonstrates her surprising finding that the fewer grammatical and spelling mistakes software developers made on their resumes, the more likely they were to be worth hiring. See her &lt;a href=&#34;https://medium.freecodecamp.com/lessons-from-a-years-worth-of-hiring-data-dacf4e7668d4#228d&#34;&gt;Number of errors&lt;/a&gt; example in which many of the rules I&amp;rsquo;ve listed above are broken.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
      <category domain="https://www.bobdc.com//categories/documenting-software">documenting software</category>
      
    </item>
    
    <item>
      <title>Data wrangling, feature engineering, and dada</title>
      <link>https://www.bobdc.com/blog/data-wrangling-feature-enginee/</link>
      <pubDate>Sat, 17 Oct 2015 09:58:52 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/data-wrangling-feature-enginee/</guid>
      
      
      <description><div>And surrealism, and impressionism...</div><div>&lt;img id=&#34;id125024&#34; src=&#34;https://www.bobdc.com/img/main/object2bedestroyed.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Man Ray assemblage&#34; width=&#34;300&#34;/&gt;
&lt;p&gt;In my &lt;a href=&#34;http://www.datascienceglossary.org&#34;&gt;data science glossary&lt;/a&gt;, the entry for &lt;a href=&#34;http://datascienceglossary.org/#datawrangling&#34;&gt;data wrangling&lt;/a&gt; gives this example: &amp;ldquo;If you have 900,000 &lt;em&gt;birthYear&lt;/em&gt; values of the format yyyy-mm-dd and 100,000 of the format mm/dd/yyyy and you write a Perl script to convert the latter to look like the former so that you can use them all together, you&amp;rsquo;re doing data wrangling.&amp;rdquo; Data wrangling isn&amp;rsquo;t always cleanup of messy data, but can also be more creative, downright fun work that qualifies as what machine learning people call &amp;ldquo;feature engineering,&amp;rdquo; which Charles L. Parker &lt;a href=&#34;http://blog.bigml.com/2013/02/21/everything-you-wanted-to-know-about-machine-learning-but-were-too-afraid-to-ask-part-two/&#34;&gt;described&lt;/a&gt; as &amp;ldquo;when you use your knowledge about the data to create fields that make machine learning algorithms work better.&amp;rdquo; In other words, you&amp;rsquo;re creating new fields (or features, or properties, or attributes, depending on your modeling frame of mind) from existing data to let systems do more with that data.&lt;/p&gt;
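&lt;p&gt;The glossary&amp;rsquo;s &lt;em&gt;birthYear&lt;/em&gt; example imagines a Perl script; a minimal Python sketch of the same wrangling step (the function name and sample values here are mine, not from the glossary) might look like this:&lt;/p&gt;

```python
import re

# Rewrite mm/dd/yyyy values as yyyy-mm-dd so the whole column is uniform.
def normalize_date(value):
    """Convert an mm/dd/yyyy string to yyyy-mm-dd; pass other values through."""
    m = re.fullmatch(r"(\d{2})/(\d{2})/(\d{4})", value)
    if m:
        month, day, year = m.groups()
        return f"{year}-{month}-{day}"
    return value

mixed = ["1978-03-14", "07/04/1976"]
print([normalize_date(v) for v in mixed])
# ['1978-03-14', '1976-07-04']
```

&lt;p&gt;The point is the shape of the task, not the language: detect the minority format, rewrite it to match the majority one, and leave everything else alone.&lt;/p&gt;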
&lt;p&gt;New York&amp;rsquo;s &lt;a href=&#34;http://www.moma.org/&#34;&gt;Museum of Modern Art&lt;/a&gt; released metadata about their complete collection on &lt;a href=&#34;https://github.com/MuseumofModernArt/collection&#34;&gt;github&lt;/a&gt;, and I recently had a great time doing some data wrangling with it. I managed to transform the data so that it could answer interesting questions such as &amp;ldquo;who are the youngest painters in MoMA&amp;rsquo;s collection?&amp;rdquo; and &amp;ldquo;on average, which country&amp;rsquo;s painters make the biggest paintings?&amp;rdquo; Neither of these questions could be answered with a query against their original data.&lt;/p&gt;
&lt;p&gt;I enjoyed working with this data so much because I went to MoMA pretty regularly during my years in New York City. In addition to iconic paintings such as Picasso&amp;rsquo;s &lt;a href=&#34;http://www.moma.org/collection/works/79766?locale=en&#34;&gt;Demoiselles d&amp;rsquo;Avignon&lt;/a&gt;, Dalí&amp;rsquo;s &lt;a href=&#34;http://www.moma.org/collection/works/79018?locale=en&#34;&gt;Persistence of Memory&lt;/a&gt;, and van Gogh&amp;rsquo;s &lt;a href=&#34;http://www.moma.org/collection/works/79802?locale=en&#34;&gt;The Starry Night&lt;/a&gt;, they have many key works by my own favorites such as Marcel Duchamp and Man Ray. My wife and I were members there for several years, which let us go to the members&amp;rsquo; special openings of some exhibits, and through a friend of hers we sometimes got to go to the more exclusive pre-members&amp;rsquo; openings where we&amp;rsquo;d see celebrities such as Chuck Close and David Bowie.&lt;/p&gt;
&lt;h2 id=&#34;id124724&#34;&gt;The data&lt;/h2&gt;
&lt;p&gt;The data on github is a comma-separated value file with 123,920 rows and 14 columns that have labels across the top such as &amp;ldquo;ArtistBio&amp;rdquo;, &amp;ldquo;Medium&amp;rdquo;, and &amp;ldquo;Dimensions&amp;rdquo;. The feature engineering fun comes from looking in the more descriptive fields to find patterns that identify pieces of data that can be stored on their own with more structure so that they&amp;rsquo;re easier to query. For example, the smaller of their two Monet &lt;a href=&#34;http://www.moma.org/collection/works/80298?locale=en&#34;&gt;Water Lilies&lt;/a&gt; paintings has a &amp;ldquo;Dimensions&amp;rdquo; value of &amp;ldquo;6&amp;rsquo; 6 1/2&amp;rdquo; x 19&amp;rsquo; 7 1/2&amp;quot; (199.5 x 599 cm)&amp;quot; and Man Ray&amp;rsquo;s assemblage &lt;a href=&#34;http://www.moma.org/collection/works/81209?locale=en&#34;&gt;Indestructible Object (or Object to Be Destroyed)&lt;/a&gt; has a value of &amp;ldquo;8 7/8 x 4 3/8 x 4 5/8&amp;rdquo; (22.5 x 11 x 11.6 cm)&amp;quot;. Along with that optional third dimension, other variations in this column include the use of the symbol &amp;ldquo;×&amp;rdquo; instead of the letter &amp;ldquo;x&amp;rdquo; and descriptive additions such as &amp;ldquo;Approx.&amp;rdquo; (174 works) or &amp;ldquo;irregular&amp;rdquo; (101).&lt;/p&gt;
&lt;p&gt;I wrote a Python script that churned through this data and used regular expressions to pull individual pieces of information from several different fields. (&lt;a href=&#34;https://en.wikipedia.org/wiki/Regular_expression&#34;&gt;Regular expressions&lt;/a&gt;, also known as regexes, offer ways to look for patterns in data such as &amp;ldquo;four numeric digits followed by optional space, a hyphen, optional space, and then either two or four digits&amp;rdquo;. O&amp;rsquo;Reilly has a &lt;a href=&#34;http://shop.oreilly.com/product/0636920012337.do&#34;&gt;whole book&lt;/a&gt; about them.) For the Dimensions field, my script pulled out the metric width, height, and, if included, the depth and descriptive note. My script, available with the resulting data on &lt;a href=&#34;https://github.com/bobdc/momacsv2rdf&#34;&gt;github&lt;/a&gt;, converts all the input fields and new data to RDF so that I could query it with SPARQL. For example, when writing the previous paragraph, I knocked out some quick SPARQL queries to find that the script had pulled &amp;ldquo;Approx.&amp;rdquo; from the Dimensions data 174 times and &amp;ldquo;irregular&amp;rdquo; 101 times.&lt;/p&gt;
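The extraction performed on the Dimensions column can be sketched in a few lines of Python. This is a simplified illustration rather than the actual code from the github repository; the pattern, the field names, and the assumption that the first metric number is the height are mine:

```python
import re

# A simplified sketch of the kind of regex used on the Dimensions
# column; the pattern and field names here are my own, not the actual
# code from the github repository. The cm values sit inside
# parentheses, separated by "x" or the multiplication sign, with an
# optional third number for depth.
DIM_RE = re.compile(r'\(([\d.]+)\s*[x×]\s*([\d.]+)(?:\s*[x×]\s*([\d.]+))?\s*cm\)')

def parse_dimensions(value):
    m = DIM_RE.search(value)
    if m is None:
        return None
    height, width, depth = m.groups()
    # assuming MoMA lists height before width, as these examples suggest
    return {'height_cm': float(height),
            'width_cm': float(width),
            'depth_cm': float(depth) if depth else None}

print(parse_dimensions('8 7/8 x 4 3/8 x 4 5/8 (22.5 x 11 x 11.6 cm)'))
```

A fuller version would also pull descriptive additions such as "Approx." and "irregular" into their own properties.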
&lt;p&gt;I considered also outputting the results to a new CSV table with additional columns for the extracted properties, but when an artist like Elizabeth Catlett is listed as both American and Mexican, I wanted to output these two separate facts about her, which would require two columns or a separate artist nationality table to handle artists with multiple values for this field. This would be a pain with table-based data, but of course, it&amp;rsquo;s not an issue with RDF.&lt;/p&gt;
&lt;p&gt;Artist nationalities came from the CSV file&amp;rsquo;s ArtistBio column, which had simple descriptions such as &amp;ldquo;(Swiss, born 1943)&amp;rdquo; and more complex ones such as &amp;ldquo;(French and Swiss, born Switzerland 1944)&amp;rdquo; and &amp;ldquo;(American, born Germany. 1886-1969)&amp;rdquo;. For each work&amp;rsquo;s artist, my Python script&amp;rsquo;s regular expressions pulled out nationality values, where they were born if specified, their birth years, and their death years (if specified) into separate RDF triples.&lt;/p&gt;
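The ArtistBio parsing can be sketched the same way. Again, this is a hypothetical simplification, not the repository's actual regular expressions:

```python
import re

# Hypothetical sketch, not the repository's actual code: pull
# nationality, optional birthplace, birth year, and optional death year
# out of ArtistBio values like "(American, born Germany. 1886-1969)".
BIO_RE = re.compile(
    r'\((?P<nationality>[^,)]+)'                         # "American", "French and Swiss"
    r'(?:,\s*born(?:\s+(?P<birthplace>[A-Za-z ]+?))?)?'  # optional birthplace
    r'[.,]?\s*(?P<born>\d{4})'                           # birth year
    r'(?:\s*-\s*(?P<died>\d{4}))?\)'                     # optional death year
)

def parse_bio(bio):
    m = BIO_RE.search(bio)
    if m is None:
        return None
    fields = m.groupdict()
    # "French and Swiss" becomes two separate nationality values, each
    # of which would then become its own RDF triple
    fields['nationality'] = fields['nationality'].split(' and ')
    return fields

print(parse_bio('(French and Swiss, born Switzerland 1944)'))
```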
&lt;p&gt;Not counting the header row and blank cells, the MoMA CSV file has 1,625,710 pieces of information in it. The resulting RDF has 2,364,277 triples; most of the difference comes from the new facts extracted from the descriptive fields, which is what makes the converted data richer.&lt;/p&gt;
&lt;h2 id=&#34;id127395&#34;&gt;Queries to play with the new data&lt;/h2&gt;
&lt;p&gt;I could make many interesting queries against the original CSV values that were converted to triples with no manipulation, but the value of this feature engineering is clearer if we look at queries that take advantage of the new, extracted data. (For those interested in the geekier details, each bullet below links to the actual SPARQL query and results.) You&amp;rsquo;ll see that a common theme among the queries is doing a bit of arithmetic with numeric values extracted from the more descriptive CSV values, such as multiplying height by width to determine a work&amp;rsquo;s area.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog/files/momaqueries.html#q1&#34;&gt;What&amp;rsquo;s the single largest painting?&lt;/a&gt; At 798,972 square cm, James Rosenquist&amp;rsquo;s &lt;a href=&#34;&#34;&gt;F-111&lt;/a&gt;. I knew of and had seen this work, but didn&amp;rsquo;t realize until looking at his &lt;a href=&#34;https://en.wikipedia.org/wiki/James_Rosenquist&#34;&gt;Wikipedia page&lt;/a&gt; just now that F-111 was how this important sixties pop artist first came to the art world&amp;rsquo;s attention.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog/files/momaqueries.html#q2&#34;&gt;What&amp;rsquo;s the largest photograph?&lt;/a&gt; Mariah Robertson&amp;rsquo;s &lt;a href=&#34;http://www.moma.org/collection/works/163921&#34;&gt;11&lt;/a&gt;, which uses a thirty-inch-wide one-hundred-foot roll of photographs as part of a three-dimensional work. (I might not consider this a &amp;ldquo;Photograph&amp;rdquo;, but that is its Classification value in the original CSV data.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog/files/momaqueries.html#q3&#34;&gt;What&amp;rsquo;s the largest three-dimensional work?&lt;/a&gt; The 1994 installation &lt;a href=&#34;http://www.moma.org/collection/works/163921&#34;&gt;Stations&lt;/a&gt; by Bill Viola, who first became known as a video artist. (The piece includes five video projections.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog/files/momaqueries.html#q4&#34;&gt;How many painters come from each country?&lt;/a&gt; No surprise that the U.S. leads with 494 artists, followed by France, Germany, Britain, and other European countries until you get to Argentina in seventh place and Japan in eighth. The full list has 52 countries, and I thought Argentina&amp;rsquo;s high placement was interesting; off the top of my head I can&amp;rsquo;t name a single artist from that country.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog/files/momaqueries.html#q5&#34;&gt;What&amp;rsquo;s the average painting size by country?&lt;/a&gt; This query filters out countries with fewer than eleven paintings in the collection to increase the chance of getting a representative sampling, and again it&amp;rsquo;s not a surprise that the U.S. leads with an average painting size of 28,244 square cm. (I&amp;rsquo;m sure Rosenquist helped here.) The next few are Germany, Britain, Japan, and Italy, all with average sizes over 20,000 square cm. The Russians have the smallest paintings, with their 32 averaging 6,758 square cm. I&amp;rsquo;m sure that closer analysis would find smaller or larger sizes to be favored by particular artists who are well-represented in MoMA&amp;rsquo;s collection, skewing the averages for their countries.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog/files/momaqueries.html#q6&#34;&gt;What are the oldest pieces in the collection and who made them?&lt;/a&gt; Besides a brocade from 1600 by &amp;ldquo;unknown&amp;rdquo;, there are four &amp;ldquo;Black basalt with glazed interior&amp;rdquo; works dated 1768 such as this &lt;a href=&#34;http://www.moma.org/collection/works/91978&#34;&gt;sugar bowl&lt;/a&gt;. These are pretty old for a museum of modern art, but if you look at any of them you&amp;rsquo;ll see why they fit right into the collection. And, they&amp;rsquo;re credited to a familiar name: Josiah Wedgwood, founder of the &lt;a href=&#34;https://www.wedgwood.com/&#34;&gt;company that bears his name&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog/files/momaqueries.html#q7&#34;&gt;Who are the five youngest painters with work in the collection?&lt;/a&gt; One work apparently co-credited to two artists gives us a total of six names, all born in the eighties, and none of whom I&amp;rsquo;ve heard of.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most of these queries focus on work in specific media because broader versions often ran into data anomalies that led to odd answers. For example, a query for the work in the collection that took the longest to create showed several photographs that apparently took over a hundred years to make. I assume that the elapsed time represented the span between the exposure of the negatives and the creation of the prints in MoMA&amp;rsquo;s collection. A query for the oldest living artist seemed simple enough&amp;ndash;just look for the earliest birth year with no corresponding death year&amp;ndash;but it turned out that there was no death date recorded for one artist born in 1731. (Sometimes the data has question marks as a birth or death date, but I didn&amp;rsquo;t want to store those in a property that I&amp;rsquo;d use to perform arithmetic.) A query about the youngest artist in the whole collection found that it was someone named &amp;ldquo;Technology will save us&amp;rdquo; born in 2012&amp;ndash;clearly a collective founded in that year and not a person. Also, since all artist names and information are properties of a &amp;ldquo;work&amp;rdquo;, an artist whose name is spelled two different ways will be treated as two different artists in the current setup.&lt;/p&gt;
&lt;p&gt;Other odd answers led to tweaks to the regular expressions and other logic in the data conversion and queries, but at some point, unless someone&amp;rsquo;s paying you otherwise, you&amp;rsquo;ve got to quit and make the best you can of what you have. (On this topic, I highly recommend Jeni Tennison&amp;rsquo;s classic &lt;a href=&#34;https://theodi.org/blog/five-stages-of-data-grief&#34;&gt;Five Stages of Data Grief&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Even if my script doesn&amp;rsquo;t create perfect data about every work in MoMA&amp;rsquo;s collection, the data it creates still offers plenty to query. I think it demonstrates pretty nicely how data wrangling techniques such as the use of regular expressions&amp;ndash;in addition to cleaning up messes such as badly formatted data&amp;ndash;can do the kind of feature engineering that improves a dataset and makes it even more useful.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Photo of Man Ray&amp;rsquo;s &amp;ldquo;Indestructible Object (or Object to be Destroyed)&amp;rdquo; by &lt;a href=&#34;https://www.flickr.com/photos/chrisjb/&#34;&gt;Chris Barker&lt;/a&gt; via &lt;a href=&#34;https://www.flickr.com/photos/chrisjb/3980831927&#34;&gt;Flickr&lt;/a&gt; (&lt;a href=&#34;https://creativecommons.org/licenses/by-nc-nd/2.0/&#34;&gt;CC BY-NC-ND 2.0&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/data-science">data science</category>
      
    </item>
    
    <item>
      <title>My data science glossary</title>
      <link>https://www.bobdc.com/blog/my-data-science-glossary/</link>
      <pubDate>Sat, 19 Sep 2015 10:23:04 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/my-data-science-glossary/</guid>
      
      
      <description><div>Complete with a dot org domain name.</div><div>&lt;img id=&#34;id128296&#34; src=&#34;https://www.bobdc.com/img/main/glossary.jpg&#34; border=&#34;0&#34; width=&#34;240&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;glossary in dictionary&#34;/&gt;
&lt;p&gt;Lately I&amp;rsquo;ve been studying up on the math and technology associated with data science because there are so many interesting things going on. Despite taking many notes, I found myself learning certain important terms, seeing them again later, and then thinking &amp;ldquo;What was that again? P-values? Huh?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;So, I turned a portion of my notes into a glossary to make these things easy to look up when I wanted to remember them. I decided that I may as well publish this glossary in case others found it helpful, or if they had suggestions or corrections. And, when I found that the domain name datascienceglossary.org wasn&amp;rsquo;t taken, I couldn&amp;rsquo;t resist grabbing it.&lt;/p&gt;
&lt;p&gt;Now it&amp;rsquo;s up and ready for the world: &lt;a href=&#34;http://www.datascienceglossary.org&#34;&gt;datascienceglossary.org&lt;/a&gt;. I also took the opportunity to try out &lt;a href=&#34;http://getbootstrap.com/&#34;&gt;Bootstrap&lt;/a&gt; to see how easily it might make my new little website look presentable on Android and Apple phones and tablets in addition to bigger screens. It was pretty easy, especially after I found their &lt;a href=&#34;http://getbootstrap.com/css/&#34;&gt;documentation page&lt;/a&gt;. (In the past, I&amp;rsquo;ve found that many CSS frameworks that are supposed to make your life easier have horrible documentation, if any&amp;ndash;&amp;ldquo;just look at our fabulous examples&amp;rdquo; isn&amp;rsquo;t enough; if the class values that we&amp;rsquo;re supposed to assign to our HTML elements are packed with cryptic little abbreviations, then &lt;em&gt;tell us what all the abbreviations stand for&lt;/em&gt;.)&lt;/p&gt;
&lt;p&gt;I hope my data science glossary is useful to some people. I know it will be useful to me, especially the next time I forget what &amp;ldquo;P-value&amp;rdquo; means.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
      <category domain="https://www.bobdc.com//categories/data-science">data science</category>
      
    </item>
    
    <item>
      <title>Querying machine learning movie ratings data with SPARQL</title>
      <link>https://www.bobdc.com/blog/querying-machine-learning-movi/</link>
      <pubDate>Sat, 22 Aug 2015 10:10:21 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/querying-machine-learning-movi/</guid>
      
      
      <description><div>Well, movie ratings data popular with machine learning people.</div><div>&lt;blockquote id=&#34;id118468&#34; class=&#34;pullquote&#34;&gt;I hope that more people using R, pandas, and other popular tools associated with data science projects appreciate what a nice addition SPARQL can be to their tool box.&lt;/blockquote&gt;
&lt;p&gt;While watching an excellent &lt;a href=&#34;https://vimeo.com/59324550&#34;&gt;video&lt;/a&gt; about the &lt;a href=&#34;http://pandas.pydata.org/&#34;&gt;pandas&lt;/a&gt; python data analysis library recently, I learned about how the University of Minnesota&amp;rsquo;s &lt;a href=&#34;http://grouplens.org/&#34;&gt;grouplens&lt;/a&gt; project has made a large amount of movie rating data from the &lt;a href=&#34;https://movielens.org/&#34;&gt;movielens&lt;/a&gt; website available. Their &lt;a href=&#34;http://grouplens.org/datasets/movielens/&#34;&gt;download&lt;/a&gt; page lets you pull down 100,000, one million, ten million, or 100 million ratings, including data about the people doing the rating and the movies they rated.&lt;/p&gt;
&lt;p&gt;This dataset is popular in the machine learning world: a &lt;a href=&#34;https://www.google.com/search?q=movielens+%22machine+learning%22&#34;&gt;Google search on &amp;ldquo;movielens &amp;lsquo;machine learning&amp;rsquo;&amp;rdquo;&lt;/a&gt; gets over 33 thousand hits, with over ten percent being in scholarly articles. I thought it would be fun to query this data with SPARQL, so I downloaded the 1 million rating set, wrote some short perl scripts to convert the ratings, users, and movies &amp;ldquo;tables&amp;rdquo; to turtle RDF, and was off and running.&lt;/p&gt;
&lt;h2 id=&#34;id118281&#34;&gt;The data&lt;/h2&gt;
&lt;p&gt;I put &amp;ldquo;tables&amp;rdquo; in quotes above because while most people like to think of data in terms of tables, the data about the movies themselves was not strictly a normalized table. As the &lt;a href=&#34;http://files.grouplens.org/datasets/movielens/ml-1m-README.txt&#34;&gt;README&lt;/a&gt; file tells us, each line has the structure &amp;ldquo;MovieID::Title::Genres&amp;rdquo;, in which Genres is a pipe-delimited list of one or more genres selected from the list in the README file. Here&amp;rsquo;s one example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;3932::Invisible Man, The (1933)::Horror|Sci-Fi
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The potential presence of more than one genre value in that last column means that this table&amp;rsquo;s data is not fully normalized, but speaking as an RDF guy, we don&amp;rsquo;t need no stinkin&amp;rsquo; normalization. A short perl script converted that line into the following turtle:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;gldm:i3932 rdfs:label &amp;quot;The Invisible Man&amp;quot; ;
   a schema:Movie ;
   dcterms:type &amp;quot;Horror&amp;quot; ;
   dcterms:type &amp;quot;Sci-Fi&amp;quot; ;
   schema:datePublished &amp;quot;1933&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As you can see, my perl script also moved the word &amp;ldquo;The&amp;rdquo; in the film&amp;rsquo;s title back where it belonged and pulled the release date out into its own triple, which let me query for things like the effect of a movie&amp;rsquo;s age on its popularity among viewers. Although the 3,883 movies listed went back to 1919, most were from the 1990s.&lt;/p&gt;
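As a sketch of that conversion in Python (the actual scripts here are Perl, and the prefix declarations are assumed to be written elsewhere in the output):

```python
# A Python re-sketch, not the original Perl, of converting one MovieLens
# "MovieID::Title::Genres" line to Turtle.
def movie_to_turtle(line):
    movie_id, title, genres = line.rstrip('\n').split('::')
    # Pull the year out of "Invisible Man, The (1933)"...
    title, year = title.rsplit(' (', 1)
    year = year.rstrip(')')
    # ...and move a trailing ", The" back to the front of the title
    if title.endswith(', The'):
        title = 'The ' + title[:-5]
    triples = [f'gldm:i{movie_id} rdfs:label "{title}" ;',
               '   a schema:Movie ;']
    for genre in genres.split('|'):
        triples.append(f'   dcterms:type "{genre}" ;')
    triples.append(f'   schema:datePublished "{year}" .')
    return '\n'.join(triples)

print(movie_to_turtle('3932::Invisible Man, The (1933)::Horror|Sci-Fi'))
```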
&lt;p&gt;Something else from the 1990s was the movie file&amp;rsquo;s Latin 1 encoding, so I used the &lt;a href=&#34;http://www.gnu.org/savannah-checkouts/gnu/libiconv/documentation/libiconv-1.13/iconv.1.html&#34;&gt;iconv&lt;/a&gt; utility to convert it to UTF-8 before running the script that turned it into turtle so that a title such as &amp;ldquo;Not Love, Just Frenzy (Más que amor, frenesí)&amp;rdquo; wouldn&amp;rsquo;t get mangled along the way.&lt;/p&gt;
&lt;p&gt;A simpler perl script converted user descriptions of the format &amp;ldquo;UserID::Gender::Age::Occupation::Zip-code&amp;rdquo; to triples like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;gldu:i48 a schema:Person ;
   schema:gender &amp;quot;M&amp;quot; ;
   glschema:age glda:i25 ;
   schema:jobTitle gldo:i4 ;
   schema:postalCode &amp;quot;92107&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I created a ratingsSchemaAndCodeLists.ttl file to assign the age range and job title labels shown in the README file to the age and jobTitle values with triples like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;glda:i25 rdfs:label &amp;quot;25-34&amp;quot; . 
gldo:i4 a schema:jobTitle ;
   rdfs:label &amp;quot;college/grad student&amp;quot; . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, a third perl script converted ratings lines of the format &amp;ldquo;UserID::MovieID::Rating::Timestamp&amp;rdquo; to triples grouped together with blank nodes like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[
  a schema:Review ;
  schema:author gldu:i1 ;
  schema:about gldm:i661 ;
  schema:reviewRating 3 ;
  dcterms:date &amp;quot;2000-12-31&amp;quot; 
] .
&lt;/code&gt;&lt;/pre&gt;
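A Python sketch of this last conversion; it assumes the MovieLens timestamps are seconds since the Unix epoch rendered as a UTC date, which may differ in detail from the Perl original:

```python
import datetime

# Sketch (assumed, not the original Perl): one MovieLens rating line
# "UserID::MovieID::Rating::Timestamp" becomes a blank-node Review.
def rating_to_turtle(line):
    user, movie, rating, ts = line.strip().split('::')
    # interpret the timestamp as Unix seconds and keep only the date
    date = datetime.datetime.fromtimestamp(
        int(ts), datetime.timezone.utc).strftime('%Y-%m-%d')
    return ('[\n'
            '  a schema:Review ;\n'
            f'  schema:author gldu:i{user} ;\n'
            f'  schema:about gldm:i{movie} ;\n'
            f'  schema:reviewRating {rating} ;\n'
            f'  dcterms:date "{date}" \n'
            '] .')

print(rating_to_turtle('1::661::3::978300760'))
```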
&lt;p&gt;The scripts and the ratingsSchemaAndCodeLists.ttl file are available on &lt;a href=&#34;https://github.com/bobdc/movielens2rdf&#34;&gt;github&lt;/a&gt;, and you can see the queries described below and their results at &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html&#34;&gt;movieLensQueries.html&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;id120755&#34;&gt;The queries&lt;/h2&gt;
&lt;p&gt;I mentioned that most of the movies were from the 1990s; the &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q1results&#34;&gt;results&lt;/a&gt; of &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q1&#34;&gt;query 1&lt;/a&gt; show the actual number of rated movies by release year.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q2&#34;&gt;Query 2&lt;/a&gt; listed the movie genres sorted by the average ratings they received. The &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q2results&#34;&gt;results&lt;/a&gt; put Film-Noir, Documentary, War, and Drama in the top four spots. Does that make these four genres the most popular? Perhaps, if you measure popularity by assigned ratings, but if you measure it by the movies that people actually choose to see (or, more accurately, to rate), as &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q3&#34;&gt;query 3&lt;/a&gt; does, the &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q3results&#34;&gt;results&lt;/a&gt; reveal that the four most popular genres to see are Comedy, Drama, Action, and Thrillers, with Film-Noir and Documentary ranking in the bottom two spots.&lt;/p&gt;
&lt;p&gt;Breaking ratings down by age group makes things more interesting. &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q4&#34;&gt;Query 4&lt;/a&gt; asks for average ratings by age group, and the &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q4results&#34;&gt;results&lt;/a&gt; show a strong correlation between age and ratings: while movie viewers aged 18-24 give slightly lower ratings than those under 18&amp;ndash;it is a cynical age to be&amp;ndash;from there on up, the older the viewers, the higher the average ratings.&lt;/p&gt;
&lt;p&gt;What are each age group&amp;rsquo;s favorite genres by rating and by attendance? &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q5&#34;&gt;Query 5&lt;/a&gt; asks for attendance figures and average ratings broken down by age group and genres. In the &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q5aresults&#34;&gt;first version of these results&lt;/a&gt;, sorted by rating, we see that most age groups give the highest average ratings to Film-Noir, Documentary, and War movies, in that order, except the two oldest groups, who rate War movies higher than Documentaries, and the youngest group, whose average rating for Documentary films puts them behind Film-Noir, War, and Drama.&lt;/p&gt;
&lt;p&gt;With the same results &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q5bresults&#34;&gt;sorted by attendance&lt;/a&gt; within each age group, we see that the three age groups under 35 prefer to watch Comedy, Drama, and Action movies, in that order. Most people 35 and older would rather watch Drama than Comedies, with Action in third place for them as well.&lt;/p&gt;
&lt;p&gt;I was curious whether a movie&amp;rsquo;s age affected viewers&amp;rsquo; choices of what to see and their ratings&amp;ndash;for example, when watching a movie that you&amp;rsquo;ve heard about for a few years, are you more likely to assume that it&amp;rsquo;s good because it hasn&amp;rsquo;t faded away? &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q6&#34;&gt;Query 6&lt;/a&gt; lists the average ratings given to movies by movie type if the movie was seen more than five years after release. In &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q6results&#34;&gt;these results&lt;/a&gt;, Film-Noir is once again at the top, but the average rating of War movies puts them above Documentaries, and Mysteries climb from seventh to fourth place.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q7&#34;&gt;Query 7&lt;/a&gt; asks the same thing about movies that were ten years old when viewed. &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q7results&#34;&gt;These results&lt;/a&gt; show Mysteries climbing to third place and pushing Documentaries down to fourth, so it appears that Mysteries age better than Documentaries. (Nothing ages better than Film-Noir, whose average ratings go up with age, but remember that they&amp;rsquo;re not nearly as popular to watch as the other genres; people who like them just like them more.)&lt;/p&gt;
&lt;p&gt;Finally, &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q8&#34;&gt;Query 8&lt;/a&gt; asks for the average ratings and total attendance by age group for the movies that were more than ten years old when viewed. Comparing the &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q8aresults&#34;&gt;results sorted by rating&lt;/a&gt; with the same figures calculated for all movies (the &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q5aresults&#34;&gt;first query 5 results&lt;/a&gt;), we see that it&amp;rsquo;s the older movie viewers driving the higher ratings of older Mysteries over Documentaries&amp;ndash;the ratings of the 199 movie viewers aged 18-24 actually put Documentaries at the top of their list of older movies. The same results &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q8bresults&#34;&gt;sorted by attendance&lt;/a&gt; were remarkably similar to the &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/movieLensQueries.html#q5bresults&#34;&gt;query 5 version&lt;/a&gt; that took all the movies into account.&lt;/p&gt;
&lt;h2 id=&#34;id121016&#34;&gt;And more queries&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s easy to think of more questions to ask; we haven&amp;rsquo;t even asked about specific movies and their roles in the ratings. For example, what were these older Documentaries that the 18-24 year-old viewers liked so much? Perhaps there was some breakout hit that skewed the averages by being more popular than Documentaries typically are. Do viewers&amp;rsquo; genders or job titles affect their choice of movies to see or the ratings they gave them? If you&amp;rsquo;re wondering, or thinking of new queries, you can download the data from the grouplens link above, convert it to turtle with my perl scripts, and query away.&lt;/p&gt;
&lt;p&gt;With more recent ratings and movies, these kinds of explorations of the data could be used to plan advertising budgets or a film festival program. I mostly found it fun as a way to use SPARQL to explore a set of data that was not designed to be represented in RDF, but was very easy to convert, and I hope that more people using R, pandas, and other popular tools associated with data science projects appreciate what a great addition SPARQL can be to their tool box.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
      <category domain="https://www.bobdc.com//categories/ai-and-machine-learning">AI and machine learning</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Visualizing DBpedia geographic data</title>
      <link>https://www.bobdc.com/blog/visualizing-dbpedia-geographic/</link>
      <pubDate>Wed, 15 Jul 2015 08:34:43 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/visualizing-dbpedia-geographic/</guid>
      
      
      <description><div>With some help from SPARQL.</div><div>&lt;a href=&#39;https://www.bobdc.com/img/main/astrobirthplaces.png&#39;&gt;
  &lt;img id=&#34;id139196&#34; src=&#34;https://www.bobdc.com/img/main/astrobirthplaces.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;400&#34; alt=&#34;US astronaut birth places&#34;/&gt;
  &lt;/a&gt;
&lt;p&gt;I&amp;rsquo;ve been learning about &lt;a href=&#34;https://en.wikipedia.org/wiki/Geographic_information_system&#34;&gt;Geographical Information System&lt;/a&gt; (GIS) data lately. More and more projects and businesses are doing interesting things by associating new kinds of data with specific latitude/longitude pairs; this data might be about &lt;a href=&#34;http://geohealth.us/index.html&#34;&gt;air quality&lt;/a&gt;, &lt;a href=&#34;http://www.zillow.com&#34;&gt;real estate prices&lt;/a&gt;, or &lt;a href=&#34;https://www.uber.com/&#34;&gt;the make and model of the nearest Uber car&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://wiki.dbpedia.org/&#34;&gt;DBpedia&lt;/a&gt; has a lot of latitude and longitude data, and SPARQL queries let you associate it with other data. Because you can retrieve these query results as CSV files, and many GIS packages can read CSV data, you can do a lot of similar interesting things yourself.&lt;/p&gt;
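For example, DBpedia's endpoint (a Virtuoso server) accepts a "format" parameter for choosing a results format, so building a CSV request with Python's standard library looks roughly like this; the query here is a trivial stand-in, not the astronaut query below:

```python
import urllib.parse

# Sketch: build a request for SPARQL results from DBpedia as CSV. The
# endpoint URL is DBpedia's public one; the query is a placeholder.
ENDPOINT = 'http://dbpedia.org/sparql'
query = 'SELECT ?s WHERE { ?s a dbo:Astronaut } LIMIT 5'
url = ENDPOINT + '?' + urllib.parse.urlencode(
    {'query': query, 'format': 'text/csv'})
# urllib.request.urlopen(url).read() would then fetch the CSV bytes
print(url)
```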
&lt;p&gt;A &lt;a href=&#34;http://dbpedia.org/snorql/?query=SELECT+*+WHERE+%7B%0D%0A++%3Fastronaut+dcterms%3Asubject+category%3AAmerican_astronauts+%3B%0D%0A+++++++++++++dbpedia-owl%3AbirthYear+%3FbirthYear+%3B+%0D%0A++++++++++++++dbpedia2%3Anationality+%3AUnited_States+.++%0D%0A%7D%0D%0AORDER+BY+%3FbirthYear%0D%0A&#34;&gt;query of DBpedia data about American astronauts&lt;/a&gt; shows that the oldest one was born in 1918 and the youngest one was born in 1979. I wondered whether, over time, there were any patterns in what part of the country they came from, and I managed to combine a DBpedia SPARQL query with an open-source GIS visualization package to create the map shown here.&lt;/p&gt;
&lt;p&gt;The following query asks for the birth year and latitude and longitude of the birthplace of each American astronaut:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT (MAX(?latitude) AS ?maxlat) (MAX(?longitude) AS ?maxlong) 
       ?astronaut (substr(str(MAX(?birthYear)),1,4) AS ?by) 
  WHERE {
  ?astronaut dcterms:subject category:American_astronauts ;
             dbpedia-owl:birthPlace ?birthPlace ;
             dbpedia-owl:birthYear ?birthYear ; 
              dbpedia2:nationality :United_States .  
  ?birthPlace geo:lat ?latitude ;
              geo:long ?longitude . 
}
GROUP BY ?astronaut
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(The query has no prefix declarations because it uses the ones built into DBpedia. Also, because some places have more than one pair of geo:lat and geo:long values, I found it simplest to just take the maximum value of each to get one pair for each astronaut.) The following shows the first few lines of the result when I asked for CSV:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;quot;maxlat&amp;quot;,&amp;quot;maxlong&amp;quot;,&amp;quot;astronaut&amp;quot;,&amp;quot;by&amp;quot;
37.195,-93.2861,&amp;quot;http://dbpedia.org/resource/Janet_L._Kavandi&amp;quot;,&amp;quot;1959&amp;quot;
42.6461,-83.2925,&amp;quot;http://dbpedia.org/resource/Brent_W._Jett,_Jr.&amp;quot;,&amp;quot;1958&amp;quot;
40.1,-75.0997,&amp;quot;http://dbpedia.org/resource/John-David_F._Bartoe&amp;quot;,&amp;quot;1944&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href=&#34;http://www.qgis.org/en/site/&#34;&gt;QGIS&lt;/a&gt; Desktop is an open-source tool for working with GIS data that, among other things, lets you visualize data. The data can come from disk files or from several other sources, including the &lt;a href=&#34;http://postgis.net/&#34;&gt;PostGIS&lt;/a&gt; add-on to the &lt;a href=&#34;http://www.postgresql.org/&#34;&gt;PostgreSQL&lt;/a&gt; database, which lets you scale up pretty far in the amount of data you can work with.&lt;/p&gt;
&lt;p&gt;Using QGIS to create the image above, I first loaded the &lt;a href=&#34;https://en.wikipedia.org/wiki/Shapefile&#34;&gt;shapefile&lt;/a&gt; (actually a collection of files, including an old-fashioned dBase dbf file) from the &lt;a href=&#34;https://www.census.gov/cgi-bin/geo/shapefiles2014/main&#34;&gt;US Census website&lt;/a&gt; with outlines of the individual states of the United States.&lt;/p&gt;
&lt;p&gt;GIS visualization is often about layering of data such as state boundaries, altitude data, and roads to see the combined effects; &lt;a href=&#34;https://commons.wikimedia.org/wiki/File:Uber_screenshot.PNG&#34;&gt;those little cars&lt;/a&gt; in your phone&amp;rsquo;s Uber app would look kind of silly if the roads and your current location weren&amp;rsquo;t shown with them. For my experiment, the census shapefile was my first layer, and QGIS Desktop&amp;rsquo;s &amp;ldquo;Add Delimited Text Layer&amp;rdquo; feature let me add the results of my SPARQL query about astronaut data as another layer. One tricky bit for us GIS novices is that these tools usually ask you to specify a &lt;a href=&#34;http://gis.stackexchange.com/questions/23690/is-wgs84-itself-a-coordinate-reference-system&#34;&gt;Coordinate Reference System&lt;/a&gt; for any set of data, typically as an &lt;a href=&#34;http://www.epsg-registry.org/&#34;&gt;EPSG&lt;/a&gt; number, and there are a lot of those out there. I used EPSG 4269.&lt;/p&gt;
&lt;p&gt;At first, QGIS added in all the astronaut birthplace locations as little black circles filled with the same shade of green. It had also set the default fill color of the US map to green, so I reset that to white in the dialog box for configuring that layer&amp;rsquo;s properties. Then, in the astronaut data layer&amp;rsquo;s properties, I found that instead of using identical symbols to represent each point on the map, I could pick &amp;ldquo;Graduated&amp;rdquo; and specify a &amp;ldquo;color ramp&amp;rdquo; that QGIS would use to assign color values according to the values in the property that I selected for this: &lt;code&gt;by&lt;/code&gt;, or birth year, which you&amp;rsquo;ll recognize from the fourth column of the sample CSV output above. QGIS looked at the lowest and highest of these values and offered to assign the following colors to &lt;code&gt;by&lt;/code&gt; values in the ranges shown, and I just accepted the default:&lt;/p&gt;
&lt;img id=&#34;id141520&#34; src=&#34;https://www.bobdc.com/img/main/qgiscolors.png&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34; border=&#34;0&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;400&#34; alt=&#34;QGIS color configuration&#34;/&gt;
&lt;p&gt;(While the earlier query showed a few astronauts born in 1978 and 1979, the range here only goes up to 1977 because I now see that some geographic coordinates in DBpedia are specified with &lt;code&gt;dbpprop:latitude&lt;/code&gt; and &lt;code&gt;dbpprop:longitude&lt;/code&gt; instead of &lt;code&gt;geo:lat&lt;/code&gt; and &lt;code&gt;geo:long&lt;/code&gt;, so if I was redoing this I&amp;rsquo;d revise the query to take those into account.)&lt;/p&gt;
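&lt;p&gt;A revised query could match either vocabulary with a &lt;code&gt;UNION&lt;/code&gt;. The fragment below is only a sketch: the variable names are placeholders, and the prefix declarations and the rest of the query are assumed to match the original query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Match coordinates recorded with either vocabulary.
{ ?birthplace geo:lat ?latitude ;
              geo:long ?longitude . }
UNION
{ ?birthplace dbpprop:latitude ?latitude ;
              dbpprop:longitude ?longitude . }
&lt;/code&gt;&lt;/pre&gt;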
&lt;p&gt;If you look at a &lt;a href=&#39;https://www.bobdc.com/img/main/astrobirthplaces.png&#39;&gt;larger image of the map above&lt;/a&gt;, you&amp;rsquo;ll see that many early astronauts came from the midwest, and then over time, they gradually came from the four corners of the continental US. Why so many from the New York City area and none from Wyoming? Is there something in New York more conducive to producing astronauts than the wide-open spaces of Wyoming? Yes: there are more people there, so the odds are that more astronauts will come from there. See &lt;a href=&#34;https://xkcd.com/1138/&#34;&gt;this excellent xkcd cartoon&lt;/a&gt; for more on this principle.&lt;/p&gt;
&lt;p&gt;I only scratched the surface of what QGIS can do. I found &lt;a href=&#34;https://www.youtube.com/watch?v=WAbOR_E2xtI&#34;&gt;this video from the Vermont Center for Geographic Info&lt;/a&gt; to be an excellent introduction. I learned from it and the book &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1617291390/bobducharmeA/&#34;&gt;PostGIS in Action&lt;/a&gt; that an important set of features that GIS systems such as QGIS add is the automation of some of the math involved in computing distances and areas, which is not simple geometry because it takes place on the curved surface of the earth. A package like PostGIS adds specialized datatypes and functions to a general-purpose database like PostgreSQL to do the more difficult parts of the geography math. This lets your SQL queries do proximity analysis and other GIS tasks as well as hand such data off to a visualization tool such as QGIS. (The open-source &lt;a href=&#34;http://www.geomesa.org/&#34;&gt;GeoMesa&lt;/a&gt; database adds similar features to &lt;a href=&#34;https://accumulo.apache.org/&#34;&gt;Apache Accumulo&lt;/a&gt; and &lt;a href=&#34;https://en.wikipedia.org/wiki/BigTable&#34;&gt;Google BigTable&lt;/a&gt; for more Hadoop-scale applications.)&lt;/p&gt;
&lt;p&gt;The great news for SPARQL users is that a GIS extension called &lt;a href=&#34;https://en.wikipedia.org/wiki/GeoSPARQL&#34;&gt;GeoSPARQL&lt;/a&gt; does something similar. You can try it out at the &lt;a href=&#34;http://geosparql.org/&#34;&gt;geosparql.org&lt;/a&gt; website. For example, entering the following query there will list all the airports within 10 miles of New York City:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX spatial:&amp;lt;http://jena.apache.org/spatial#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX geo:&amp;lt;http://www.w3.org/2003/01/geo/wgs84_pos#&amp;gt;
PREFIX gn:&amp;lt;http://www.geonames.org/ontology#&amp;gt;


SELECT ?name
WHERE {
  ?object spatial:nearby(40.712700 -74.005898 10 &#39;mi&#39;).
  ?object a &amp;lt;http://www.lotico.com/ontology/Airport&amp;gt; ;
  gn:name ?name 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(The data uses a fairly broad definition of &amp;ldquo;airport,&amp;rdquo; including heliports and seaplane bases.) I have not played with any GeoSPARQL implementations outside of geosparql.org, but the &lt;a href=&#34;http://parliament.semwebcentral.org/&#34;&gt;Parliament&lt;/a&gt; one mentioned on the GeoSPARQL wikipedia page looks interesting. I have not played much with the &lt;a href=&#34;http://sisinflab.poliba.it/semanticweb/lod/losm/&#34;&gt;Linked Open Street Map SPARQL endpoint&lt;/a&gt;, but it also looks great for people who are interested in GIS and SPARQL.&lt;/p&gt;
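&lt;p&gt;For comparison, engines that implement the OGC GeoSPARQL standard express this kind of proximity test with filter functions over geometry literals instead of a single magic property like &lt;code&gt;spatial:nearby&lt;/code&gt;. The following is a hedged sketch, not something I have run; it assumes the airport data models geometries with &lt;code&gt;geo:hasGeometry&lt;/code&gt; and &lt;code&gt;geo:asWKT&lt;/code&gt;, which may not match any particular endpoint:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX geo:  &amp;lt;http://www.opengis.net/ont/geosparql#&amp;gt;
PREFIX geof: &amp;lt;http://www.opengis.net/def/function/geosparql/&amp;gt;
PREFIX uom:  &amp;lt;http://www.opengis.net/def/uom/OGC/1.0/&amp;gt;
PREFIX gn:   &amp;lt;http://www.geonames.org/ontology#&amp;gt;

SELECT ?name
WHERE {
  ?airport a &amp;lt;http://www.lotico.com/ontology/Airport&amp;gt; ;
           gn:name ?name ;
           geo:hasGeometry/geo:asWKT ?wkt .
  # 10 miles is roughly 16,093 metres.
  FILTER(geof:distance(?wkt,
                       &amp;quot;POINT(-74.005898 40.712700)&amp;quot;^^geo:wktLiteral,
                       uom:metre) &amp;lt; 16093)
}
&lt;/code&gt;&lt;/pre&gt;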
&lt;p&gt;Whether you try out GeoSPARQL or not, combining DBpedia&amp;rsquo;s ability to associate such a broad range of data with geographic coordinates and the ability of GIS visualization tools like QGIS to work with that data (especially the ability to visualize the associated data—in my case, the color coding of astronaut birth years) gives you a vast new category of cool things you can do with SPARQL.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
      <category domain="https://www.bobdc.com//categories/dbpedia">DBpedia</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Artificial Intelligence, then (1960) and now</title>
      <link>https://www.bobdc.com/blog/artificial-intelligence-then-1/</link>
      <pubDate>Sat, 20 Jun 2015 10:50:25 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/artificial-intelligence-then-1/</guid>
      
      
      <description><div>Especially machine learning.</div><div>&lt;blockquote id=&#34;id104247&#34; class=&#34;pullquote&#34;&gt;It&#39;s fascinating how relevant much of [this 1960 paper] still is today, especially considering the limited computing power available 55 years ago.&lt;/blockquote&gt;
&lt;p&gt;Earlier this month I &lt;a href=&#34;https://twitter.com/bobdc/status/606169477496061953&#34;&gt;tweeted&lt;/a&gt; &amp;ldquo;When people write about AI like it&amp;rsquo;s this brand new thing, should I be amused, feel old, or both?&amp;rdquo; The tweet linked to a recent Harvard Business Review article called &lt;a href=&#34;https://hbr.org/2015/05/data-scientists-dont-scale&#34;&gt;Data Scientists Don&amp;rsquo;t Scale&lt;/a&gt; about the things that Artificial Intelligence is currently doing, which just happened to be the things that the author of the article&amp;rsquo;s automated prose-generation company is doing.&lt;/p&gt;
&lt;p&gt;The article provided absolutely no historical context to this phrase that has thrilled, annoyed, and fascinated people since the term was first coined by John McCarthy in 1955. (For a little historical context, this was two years after Dwight Eisenhower succeeded Harry Truman as President of the United States. Three years later, McCarthy invented Lisp—a programming language that, besides providing the basis of other popular languages such as Scheme and the currently very hot Clojure, is still used today.) I recently came across a link to the seminal 1960 paper &lt;a href=&#34;https://web.media.mit.edu/~minsky/papers/steps.html&#34;&gt;Steps Toward Artificial Intelligence&lt;/a&gt; by AI pioneer Marvin Minsky, who was there at the beginning in 1955, and so I read it on a long plane ride. It&amp;rsquo;s fascinating how relevant much of it still is today, especially when you take into account the limited computing power available 55 years ago.&lt;/p&gt;
&lt;p&gt;After enumerating the five basic categories of &amp;ldquo;making computers solve really difficult problems&amp;rdquo; (search, pattern-recognition, learning, planning, and induction), the paper mentions several algorithms that are still considered to be basic tools in Machine Learning toolboxes: &lt;a href=&#34;https://www.youtube.com/watch?v=kOFBnKDGtJM&#34;&gt;hill climbing&lt;/a&gt;, &lt;a href=&#34;http://en.wikipedia.org/wiki/Naive_Bayes_classifier&#34;&gt;naive Bayesian classification&lt;/a&gt;, &lt;a href=&#34;http://en.wikipedia.org/wiki/Perceptron&#34;&gt;perceptrons&lt;/a&gt;, &lt;a href=&#34;https://www.udacity.com/course/machine-learning-reinforcement-learning--ud820&#34;&gt;reinforcement learning&lt;/a&gt;, and &lt;a href=&#34;https://en.wikipedia.org/wiki/Artificial_neural_network&#34;&gt;neural nets&lt;/a&gt;. He mentions that one part of Bayesian classification &amp;ldquo;can be made by a simple network device&amp;rdquo; that he illustrates with this diagram:&lt;/p&gt;
&lt;img id=&#34;id104104&#34; src=&#34;https://www.bobdc.com/img/main/minskybayes.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;some description&#34;/&gt;
&lt;p&gt;It&amp;rsquo;s wild to consider that the software possibilities were so limited at the time that implementing some of these ideas was easier by just building specialized hardware. Minsky also describes the implementation of a certain math game by a network of resistors as designed by Claude Shannon (who I was happy to hear mentioned in the season 1 finale of &lt;a href=&#34;http://www.imdb.com/title/tt2575988/&#34;&gt;Silicon Valley&lt;/a&gt;):&lt;/p&gt;
&lt;img id=&#34;id104068&#34; src=&#34;https://www.bobdc.com/img/main/shannonresistors.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;some description&#34;/&gt;
&lt;p&gt;Minsky&amp;rsquo;s paper also references the work of B.F. Skinner, of &lt;a href=&#34;https://en.wikipedia.org/wiki/Operant_conditioning_chamber&#34;&gt;Skinner box&lt;/a&gt; fame, when describing reinforcement learning, and it cites Noam Chomsky when describing inductive learning. I mention these two together because this past week I also read an interview that took place just three years ago titled &lt;a href=&#34;http://www.theatlantic.com/technology/archive/2012/11/noam-chomsky-on-where-artificial-intelligence-went-wrong/261637/?single_page=true&#34;&gt;Noam Chomsky on Where Artificial Intelligence Went Wrong&lt;/a&gt;. Describing those early days of AI research, the interview&amp;rsquo;s introduction tells us how&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Some of McCarthy&amp;rsquo;s colleagues in neighboring departments, however, were more interested in how intelligence is implemented in humans (and other animals) first. Noam Chomsky and others worked on what became cognitive science, a field aimed at uncovering the mental representations and rules that underlie our perceptual and cognitive abilities. Chomsky and his colleagues had to overthrow the then-dominant paradigm of behaviorism, championed by Harvard psychologist B.F. Skinner, where animal behavior was reduced to a simple set of associations between an action and its subsequent reward or punishment. The undoing of Skinner&amp;rsquo;s grip on psychology is commonly marked by Chomsky&amp;rsquo;s 1959 critical review of Skinner&amp;rsquo;s book &lt;a href=&#34;http://www.chomsky.info/articles/1967----.htm&#34;&gt;Verbal Behavior&lt;/a&gt;, a book in which Skinner attempted to explain linguistic ability using behaviorist principles.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The introduction goes on to describe a 2011 symposium at MIT on &amp;ldquo;Brains, Minds and Machines,&amp;rdquo; which &amp;ldquo;was meant to inspire multidisciplinary enthusiasm for the revival of the scientific question from which the field of artificial intelligence originated: how does intelligence work?&amp;rdquo;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Noam Chomsky, speaking in the symposium, wasn&amp;rsquo;t so enthused. Chomsky critiqued the field of AI for adopting an approach reminiscent of behaviorism, except in more modern, computationally sophisticated form. Chomsky argued that the field&amp;rsquo;s heavy use of statistical techniques to pick regularities in masses of data is unlikely to yield the explanatory insight that science ought to offer. For Chomsky, the &amp;ldquo;new AI&amp;rdquo; — focused on using statistical learning techniques to better mine and predict data — is unlikely to yield general principles about the nature of intelligent beings or about cognition.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The whole interview is worth reading. I&amp;rsquo;m not saying that I completely agree with Chomsky or completely disagree (as Google&amp;rsquo;s Peter Norvig has in an essay that has the excellent URL &lt;a href=&#34;http://norvig.com/chomsky.html&#34;&gt;http://norvig.com/chomsky.html&lt;/a&gt; but gets a little ad hominem when he starts comparing Chomsky to Bill O&amp;rsquo;Reilly), only that Minsky&amp;rsquo;s 1960 paper and Chomsky&amp;rsquo;s 2012 interview, taken together, provide a good perspective on where AI came from and the path it took to the roles it plays today.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll close with this nice quote from a discussion in Minsky&amp;rsquo;s paper of what exactly &amp;ldquo;intelligence&amp;rdquo; is and whether machines are capable of it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Programmers, too, know that there is never any &amp;ldquo;heart&amp;rdquo; in a program. There are high-level routines in each program, but all they do is dictate that &amp;ldquo;if such-and-such, then transfer to such-and-such a subroutine.&amp;rdquo; And when we look at the low-level subroutines, which &amp;ldquo;actually do the work,&amp;rdquo; we find senseless loops and sequences of trivial operations, merely carrying out the dictates of their superiors. The intelligence in such a system seems to be as intangible as becomes the meaning of a single common word when it is thoughtfully pronounced over and over again.&lt;/p&gt;
&lt;/blockquote&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
      <category domain="https://www.bobdc.com//categories/ai-and-machine-learning">AI and machine learning</category>
      
    </item>
    
    <item>
      <title>SPARQL: the video</title>
      <link>https://www.bobdc.com/blog/sparql-the-video/</link>
      <pubDate>Sun, 03 May 2015 16:15:07 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparql-the-video/</guid>
      
      
      <description><div>Well, a video, but a lot of important SPARQL basics in a short period of time.</div><div>&lt;p&gt;&lt;a href=&#34;https://www.youtube.com/watch?v=FvGndkpa4K0&#34;&gt;&lt;img id=&#34;id146827&#34; src=&#34;https://www.bobdc.com/img/main/sparqlvideo1still.jpg&#34; width=&#34;320&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;SPARQL in 11 minutes&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;While doing training for a TopQuadrant customer recently, the schedule led to my having ten minutes to explain the basics of writing SPARQL queries. I think I did OK, but on the plane home I thought harder about what to put in those ten minutes, which led to my making the video &lt;a href=&#34;https://www.youtube.com/watch?v=FvGndkpa4K0&#34;&gt;SPARQL in 11 minutes&lt;/a&gt;. While the video is 11 minutes and 14 seconds long, between the opening part about RDF and the plug for &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt; at the end, the SPARQL introduction is less than eight minutes.&lt;/p&gt;
&lt;p&gt;After explaining what RDF triples are and how they&amp;rsquo;re represented in Turtle, the video walks through some simple SELECT queries and how they work with the data. This leads up to a CONSTRUCT query and a list of other things that people will find useful if they learn more about SPARQL. I had a lot of fun making the video&amp;rsquo;s &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/KorgMonotronPanningStereo.mp3&#34;&gt;SPARQL engine noise&lt;/a&gt; with my &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=B00684KFFW/bobducharmeA/&#34;&gt;Korg Monotron synthesizer&lt;/a&gt; and also making more traditional music for the introduction and ending.&lt;/p&gt;
&lt;p&gt;I hope this video is helpful for people who are new to SPARQL. The other SPARQL videos on YouTube are mostly real-time classroom lectures. My favorite is an ad for what seems like a Dutch cable TV provider that has nothing to do with the query language but has the excellent domain name &lt;a href=&#34;http://www.sparql.nl&#34;&gt;sparql.nl&lt;/a&gt;. If you skip ahead to 1:03 of &lt;a href=&#34;https://www.youtube.com/watch?v=BZxMJ5s3WiU#t=1m03s&#34;&gt;this ad&lt;/a&gt; for the company, you&amp;rsquo;ll see a finger snap turn into a swirl of flames and then their shining &amp;ldquo;sparql&amp;rdquo; logo, all with the most dramatic music possible. My production values were not quite that high, but higher than most of the other SPARQL videos you&amp;rsquo;ll find on YouTube.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.youtube.com/watch?v=BZxMJ5s3WiU#t=1m03s&#34;&gt;&lt;img id=&#34;id146706&#34; src=&#34;https://www.bobdc.com/img/main/sparqlnl.jpg&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; width=&#34;240&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;some description&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
    </item>
    
    <item>
      <title>Running Spark GraphX algorithms on Library of Congress subject heading SKOS</title>
      <link>https://www.bobdc.com/blog/running-spark-graphx-algorithm/</link>
      <pubDate>Sun, 12 Apr 2015 09:55:45 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/running-spark-graphx-algorithm/</guid>
      
      
      <description><div>Well, one algorithm, but a very cool one.</div><div>&lt;img id=&#34;id116247&#34; src=&#34;https://www.bobdc.com/img/main/GraphXLoCSKOS.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;160&#34; alt=&#34;GraphX LoC SKOS logos&#34;/&gt;
&lt;p&gt;&lt;em&gt;(This blog entry has also been published on the &lt;a href=&#34;https://databricks.com/blog/2015/04/14/running-spark-graphx-algorithms-on-library-of-congress-subject-heading-skos.html&#34;&gt;databricks company blog&lt;/a&gt;.)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Last month, in &lt;a href=&#34;https://www.bobdc.com/blog/spark-and-sparql-rdf-graphs-an&#34;&gt;Spark and SPARQL; RDF Graphs and GraphX&lt;/a&gt;, I described how Apache Spark has emerged as a more efficient alternative to MapReduce for distributing computing jobs across clusters. I also described how Spark&amp;rsquo;s GraphX library lets you do this kind of computing on graph data structures and how I had some ideas for using it with RDF data. My goal was to use RDF technology on GraphX data and vice versa to demonstrate how they could help each other, and I demonstrated the former with a Scala program that output some GraphX data as RDF and then showed some SPARQL queries to run on that RDF.&lt;/p&gt;
&lt;p&gt;Today I&amp;rsquo;m demonstrating the latter by reading in a well-known RDF dataset and executing GraphX&amp;rsquo;s Connected Components algorithm on it. This algorithm collects nodes into groupings that connect to each other but not to any other nodes. In classic Big Data scenarios, this helps applications perform tasks such as the identification of subnetworks of people within larger networks, giving clues about which products or cat videos to suggest to those people based on what their friends liked.&lt;/p&gt;
&lt;p&gt;The US Library of Congress has been working on their &lt;a href=&#34;http://id.loc.gov/authorities/subjects.html&#34;&gt;Subject Headings&lt;/a&gt; metadata since 1898, and it&amp;rsquo;s available in SKOS RDF. Many of the subjects include &amp;ldquo;related&amp;rdquo; values; for example, you can see that the subject &lt;a href=&#34;http://id.loc.gov/authorities/subjects/sh85027617.html&#34;&gt;Cocktails&lt;/a&gt; has related values of &lt;a href=&#34;http://id.loc.gov/authorities/subjects/sh85027615.html&#34;&gt;Cocktail parties&lt;/a&gt; and &lt;a href=&#34;http://id.loc.gov/authorities/subjects/sh2009010761.html&#34;&gt;Happy hours&lt;/a&gt;, and that Happy hours has related values of &lt;a href=&#34;http://id.loc.gov/authorities/subjects/sh93000452.html&#34;&gt;Bars (Drinking establishments)&lt;/a&gt;, &lt;a href=&#34;http://id.loc.gov/authorities/subjects/sh85113249.html&#34;&gt;Restaurants&lt;/a&gt;, and Cocktails. So, while it includes skos:related triples that indirectly link Cocktails to Restaurants, it has none that link these to the subject of &lt;a href=&#34;http://id.loc.gov/authorities/subjects/sh85125961.html&#34;&gt;Space stations&lt;/a&gt;, so the Space stations subject is not part of the same Connected Components subgraph as the Cocktails subject.&lt;/p&gt;
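&lt;p&gt;These &lt;code&gt;skos:related&lt;/code&gt; links are ordinary triples, so before bringing GraphX into it you can explore them with plain SPARQL. For example, a query along these lines (assuming the downloaded SKOS data has been loaded into some SPARQL engine) lists the subjects directly related to Cocktails:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;

SELECT ?relatedLabel
WHERE {
  # sh85027617 is the &amp;quot;Cocktails&amp;quot; subject heading.
  &amp;lt;http://id.loc.gov/authorities/subjects/sh85027617&amp;gt;
      skos:related ?related .
  ?related skos:prefLabel ?relatedLabel .
}
&lt;/code&gt;&lt;/pre&gt;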
&lt;p&gt;After reading the Library of Congress Subject Header RDF into a GraphX graph and running the Connected Components algorithm on the skos:related connections, here are some of the groupings I found near the beginning of the output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;quot;Hiding places&amp;quot; 
&amp;quot;Secrecy&amp;quot; 
&amp;quot;Loneliness&amp;quot; 
&amp;quot;Solitude&amp;quot; 
&amp;quot;Privacy&amp;quot; 
--------------------------
&amp;quot;Cocktails&amp;quot; 
&amp;quot;Bars (Drinking establishments)&amp;quot; 
&amp;quot;Cocktail parties&amp;quot; 
&amp;quot;Restaurants&amp;quot; 
&amp;quot;Happy hours&amp;quot; 
--------------------------
&amp;quot;Space stations&amp;quot; 
&amp;quot;Space colonies&amp;quot; 
&amp;quot;Large space structures (Astronautics)&amp;quot; 
&amp;quot;Extraterrestrial bases&amp;quot; 
--------------------------
&amp;quot;Inanna (Sumerian deity)&amp;quot; 
&amp;quot;Ishtar (Assyro-Babylonian deity)&amp;quot; 
&amp;quot;Astarte (Phoenician deity)&amp;quot; 
--------------------------
&amp;quot;Cross-cultural orientation&amp;quot; 
&amp;quot;Cultural competence&amp;quot; 
&amp;quot;Multilingual communication&amp;quot; 
&amp;quot;Intercultural communication&amp;quot; 
&amp;quot;Technical assistance--Anthropological aspects&amp;quot; 
--------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(You can find the &lt;a href=&#34;http://snee.com/bobdc.blog/files/readLoCSH.out&#34;&gt;complete output here&lt;/a&gt;, a 565K file.) People working with RDF-based applications already know that this kind of data can help to enhance search. For example, someone searching for media about &amp;ldquo;Space stations&amp;rdquo; will probably also be interested in media filed under &amp;ldquo;Space colonies&amp;rdquo; and &amp;ldquo;Extraterrestrial bases&amp;rdquo;. This data can also help other applications, and now, it can help distributed applications that use Spark.&lt;/p&gt;
&lt;h2 id=&#34;id116235&#34;&gt;Storing RDF in GraphX data structures&lt;/h2&gt;
&lt;p&gt;First, as I mentioned in the earlier blog entry, GraphX development currently means coding with the Scala programming language, so I have been learning Scala. My old friend from XML days &lt;a href=&#34;http://www.contakt.org/&#34;&gt;Tony Coates&lt;/a&gt; wrote &lt;a href=&#34;http://www.contakt.org/Blog/Post/13/A-Scala-API-for-RDF-Processing&#34;&gt;A Scala API for RDF Processing&lt;/a&gt;, which takes better advantage of native Scala data structures than I ever could, and the &lt;a href=&#34;https://github.com/w3c/banana-rdf&#34;&gt;banana-rdf Scala library&lt;/a&gt; also looks interesting, but although I was using Scala, my main interest was storing RDF in Spark GraphX data structures, not in Scala particularly.&lt;/p&gt;
&lt;p&gt;The basic Spark data structure is the Resilient Distributed Dataset, or RDD. The graph data structure used by GraphX is a combination of an RDD for vertices and one for edges. Each of these RDDs can have additional information; the Spark website&amp;rsquo;s &lt;a href=&#34;https://spark.apache.org/docs/1.1.1/graphx-programming-guide.html#example-property-graph&#34;&gt;Example Property Graph&lt;/a&gt; includes (name, role) pairs with its vertices and descriptive property strings with its edges. The obvious first step for storing RDF in a GraphX graph would be to store predicates in the edges RDD, subjects and resource objects in the vertices RDD, and literal properties as extra information in these RDDs like the (name, role) pairs and edge description strings in the Spark website&amp;rsquo;s Example Property Graph.&lt;/p&gt;
&lt;p&gt;But, as I also wrote last time, a hardcore RDF person would ask &lt;a href=&#34;https://www.bobdc.com/blog/spark-and-sparql-rdf-graphs-an#id106263&#34;&gt;these questions&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;What about properties of edges? For example, what if I wanted to say that an &lt;code&gt;xp:advisor&lt;/code&gt; property was an &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; the Dublin Core property &lt;code&gt;dc:contributor&lt;/code&gt;?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The ability to assign properties such as a name of &amp;ldquo;rxin&amp;rdquo; and a role of &amp;ldquo;student&amp;rdquo; to a node like 3L is nice, but what if I don&amp;rsquo;t have a consistent set of properties that will be assigned to every node—for example, if I&amp;rsquo;ve aggregated person data from two different sources that don&amp;rsquo;t use all the same properties to describe these persons?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Example Property Graph can store these (name, role) pairs with the vertices because that RDD is declared as &lt;code&gt;RDD[(VertexId, (String, String))]&lt;/code&gt;. Each vertex will have two strings stored with it; no more and no less. It&amp;rsquo;s a data structure, but you can also think of it as a prescriptive schema, and the second bullet above is asking how to get around that.&lt;/p&gt;
&lt;p&gt;I got around both issues by storing the data in three data structures—the two RDDs described above and one more:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;For the vertex RDD, along with the required long integer that must be stored as each vertex&amp;rsquo;s identifier, I only stored one extra piece of information: the URI associated with that RDF resource. I did this for the subjects, the predicates (which may not be &amp;ldquo;vertices&amp;rdquo; in the GraphX sense of the word, but damn it, they&amp;rsquo;re resources that can be the subjects or objects of triples if I want them to), and the relevant objects. After reading the triple { &lt;code&gt;&amp;lt;http://id.loc.gov/authorities/subjects/sh85027617&amp;gt; &amp;lt;http://www.w3.org/2004/02/skos/core#related&amp;gt; &amp;lt;http://id.loc.gov/authorities/subjects/sh2009010761&amp;gt;&lt;/code&gt;} from the Library of Congress data, the program will create three vertices in this RDD whose node identifiers might be 1L, 2L, and 3L, with each of the triple&amp;rsquo;s URIs stored with one of these RDD vertices.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For the edge RDD, along with the required two long integers identifying the vertices at the start and end of the edge, each of my edges also stores the URI of the relevant predicate as the &amp;ldquo;description&amp;rdquo; of the edge. The edge for the triple above would be (1L, 3L, &lt;code&gt;http://www.w3.org/2004/02/skos/core#related&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To augment the graph data structure created from the two RDDs above, I created a third RDD to store literal property values. Each entry stores the long integer representing the vertex of the resource that has the property, a long integer representing the property (the integer assigned to that property in the vertex RDD), and a string representing the property value. For the triple { &lt;code&gt;&amp;lt;http://id.loc.gov/authorities/subjects/sh2009010761&amp;gt; &amp;lt;http://www.w3.org/2004/02/skos/core#prefLabel&amp;gt; &amp;quot;Happy hours&amp;quot;&lt;/code&gt;} it might store (3L, 4L, &amp;ldquo;Happy hours&amp;rdquo;), assuming that 4L had been stored as the internal identifier for the skos:prefLabel property. To run the Connected Components algorithm and then output the preferred label of each member of each subgraph, I didn&amp;rsquo;t need this RDD, but it does open up many possibilities for what you can do with RDF in a Spark GraphX program.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;id118737&#34;&gt;Creating a report on Library of Congress Subject Heading connected components&lt;/h2&gt;
&lt;p&gt;After loading up these data structures (plus another one that allows quick lookups of preferred labels) my program below applies the GraphX Connected Components algorithm to the subset of the graph that uses the skos:related property to connect vertices such as &amp;ldquo;Cocktails&amp;rdquo; and &amp;ldquo;Happy hours&amp;rdquo;. Iterating through the results, it uses them to load a hash map with a list for each subgraph of connected components. Then, it goes through each of these lists, printing the label associated with each member of each subgraph and a string of hyphens to show where each list ends, as you can see in the excerpt above.&lt;/p&gt;
&lt;p&gt;I won&amp;rsquo;t go into more detail about what&amp;rsquo;s in my program because I commented it pretty heavily. (I do have to thank my friend Tony, mentioned above, for helping me past one point where I was stuck on a Scala scoping issue. Also, as I&amp;rsquo;ve warned before, my coding style will probably make experienced Scala programmers choke on their Red Bull. I&amp;rsquo;d be happy to hear about suggested improvements.)&lt;/p&gt;
&lt;p&gt;After getting the program to run properly with a small subset of the data, I ran it on the 1 GB subjects-skos-2014-0306.nt file that I downloaded from the Library of Congress with its 7,705,147 triples. Spark lets applications scale up by giving you an infrastructure to distribute program execution across multiple machines, but the 8GB on my single machine wasn&amp;rsquo;t enough to run this, so I used two grep commands to create a version of the data that only had the skos:related and skos:prefLabel triples. At this point I had a total of 439,430 triples. Because my code didn&amp;rsquo;t account for blank nodes, I removed the 385 triples that used them, leaving 439,045 to work with in a 60MB file. This ran successfully and you can follow the link shown earlier to see the complete output.&lt;/p&gt;
&lt;h2 id=&#34;id118755&#34;&gt;Other GraphX algorithms to run on your RDF data&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;https://spark.apache.org/docs/latest/graphx-programming-guide.html#graph-algorithms&#34;&gt;Other GraphX algorithms&lt;/a&gt; besides Connected Components include Page Rank and Triangle Counting. &lt;a href=&#34;http://en.wikipedia.org/wiki/Graph_theory&#34;&gt;Graph theory&lt;/a&gt; is an interesting world, in which my favorite phrase so far is &amp;ldquo;&lt;a href=&#34;http://en.wikipedia.org/wiki/Strangulated_graph&#34;&gt;strangulated graph&lt;/a&gt;&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;One of the greatest things about RDF and Linked Data technology is the &lt;a href=&#34;http://linkeddata.org/&#34;&gt;growing amount&lt;/a&gt; of interesting data being made publicly available, and with new tools such as these algorithms to work with this data—tools that can be run on inexpensive, scalable clusters faster than typical Hadoop MapReduce jobs—there are a lot of great possibilities.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//////////////////////////////////////////////////////////////////
// readLoCSH.scala: read Library of Congress Subject Headings into
// Spark GraphX graph and apply connectedComponents algorithm to those
// connected by skos:related property.


import scala.io.Source 
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import scala.collection.mutable.ListBuffer
import scala.collection.mutable.HashMap


object readLoCSH {


    val componentLists = HashMap[VertexId, ListBuffer[VertexId]]()
    val prefLabelMap =  HashMap[VertexId, String]()


    def main(args: Array[String]) {
        val sc = new SparkContext(&amp;quot;local&amp;quot;, &amp;quot;readLoCSH&amp;quot;, &amp;quot;127.0.0.1&amp;quot;)


        // regex pattern for end of triple
        val tripleEndingPattern = &amp;quot;&amp;quot;&amp;quot;\s*\.\s*$&amp;quot;&amp;quot;&amp;quot;.r    
        // regex pattern for language tag
        val languageTagPattern = &amp;quot;@[\\w-]+&amp;quot;.r    


        // Parameters of GraphX Edge are subject, object, and predicate
        // identifiers. RDF traditionally does (s, p, o) order but in GraphX
        // it&#39;s (edge start node, edge end node, edge description).


        // Scala beginner hack: I couldn&#39;t figure out how to declare an empty
        // array of Edges and then append Edges to it (or how to declare it
        // as a mutable ArrayBuffer, which would have been even better), but I
        // can append to an array started like the following, and will remove
        // the first Edge when creating the RDD.


        var edgeArray = Array(Edge(0L,0L,&amp;quot;http://dummy/URI&amp;quot;))
        var literalPropsTriplesArray = new Array[(Long,Long,String)](0)
        var vertexArray = new Array[(Long,String)](0)


        // Read the Library of Congress n-triples file
        //val source = Source.fromFile(&amp;quot;sampleSubjects.nt&amp;quot;,&amp;quot;UTF-8&amp;quot;)  // shorter for testing
        val source = Source.fromFile(&amp;quot;PrefLabelAndRelatedMinusBlankNodes.nt&amp;quot;,&amp;quot;UTF-8&amp;quot;)


        val lines = source.getLines.toArray


        // When parsing the data we read, use this map to check whether each
        // URI has come up before.
        var vertexURIMap = new HashMap[String, Long];


        // Parse the data into triples.
        var triple = new Array[String](3)
        var nextVertexNum = 0L
        for (i &amp;lt;- 0 until lines.length) {
            // Space in next line needed for line after that. 
            lines(i) = tripleEndingPattern.replaceFirstIn(lines(i),&amp;quot; &amp;quot;)  
            triple = lines(i).mkString.split(&amp;quot;&amp;gt;\\s+&amp;quot;)       // split on &amp;quot;&amp;gt; &amp;quot;
            // Variables have the word &amp;quot;triple&amp;quot; in them because &amp;quot;object&amp;quot; 
            // by itself is a Scala keyword.
            val tripleSubject = triple(0).substring(1)   // substring() call
            val triplePredicate = triple(1).substring(1) // to remove &amp;quot;&amp;lt;&amp;quot;
            if (!(vertexURIMap.contains(tripleSubject))) {
                vertexURIMap(tripleSubject) = nextVertexNum
                nextVertexNum += 1
            }
            if (!(vertexURIMap.contains(triplePredicate))) {
                vertexURIMap(triplePredicate) = nextVertexNum
                nextVertexNum += 1
            }
            val subjectVertexNumber = vertexURIMap(tripleSubject)
            val predicateVertexNumber = vertexURIMap(triplePredicate)


            // If the first character of the third part is a &amp;lt;, it&#39;s a URI;
            // otherwise, a literal value. (Needs more code to account for
            // blank nodes.)
            if (triple(2)(0) == &#39;&amp;lt;&#39;) { 
                val tripleObject = triple(2).substring(1)   // Lose that &amp;lt;.
                if (!(vertexURIMap.contains(tripleObject))) {
                    vertexURIMap(tripleObject) = nextVertexNum
                    nextVertexNum += 1
                }
                val objectVertexNumber = vertexURIMap(tripleObject)
                edgeArray = edgeArray :+
                    Edge(subjectVertexNumber,objectVertexNumber,triplePredicate)
            }
            else {
                literalPropsTriplesArray = literalPropsTriplesArray :+
                    (subjectVertexNumber,predicateVertexNumber,triple(2))
            }
        }


        // Switch value and key for vertexArray that we&#39;ll use to create the
        // GraphX graph.
        for ((k, v) &amp;lt;- vertexURIMap) vertexArray = vertexArray :+  (v, k)   


        // We&#39;ll be looking up a lot of prefLabels, so create a hashmap for them. 
        for (i &amp;lt;- 0 until literalPropsTriplesArray.length) {
            if (literalPropsTriplesArray(i)._2 ==
                vertexURIMap(&amp;quot;http://www.w3.org/2004/02/skos/core#prefLabel&amp;quot;)) {
                // Lose the language tag.
                val prefLabel =
                    languageTagPattern.replaceFirstIn(literalPropsTriplesArray(i)._3,&amp;quot;&amp;quot;)
                prefLabelMap(literalPropsTriplesArray(i)._1) = prefLabel;
            }
        }


        // Create RDDs and Graph from the parsed data.


        // vertexRDD Long: the GraphX longint identifier. String: the URI.
        val vertexRDD: RDD[(Long, String)] = sc.parallelize(vertexArray)


        // edgeRDD String: the URI of the triple predicate. Trimming off the
        // first Edge in the array because it was only used to initialize it.
        val edgeRDD: RDD[Edge[(String)]] =
            sc.parallelize(edgeArray.slice(1,edgeArray.length))


        // literalPropsTriples Long, Long, and String: the subject and predicate
        // vertex numbers and the literal value that the predicate is
        // associating with the subject.
        val literalPropsTriplesRDD: RDD[(Long,Long,String)] =
            sc.parallelize(literalPropsTriplesArray)


        val graph: Graph[String, String] = Graph(vertexRDD, edgeRDD)


        // Create a subgraph based on the vertices connected by SKOS &amp;quot;related&amp;quot;
        // property.
        val skosRelatedSubgraph =
            graph.subgraph(t =&amp;gt; t.attr ==
                           &amp;quot;http://www.w3.org/2004/02/skos/core#related&amp;quot;)


        // Find connected components  of skosRelatedSubgraph.
        val ccGraph = skosRelatedSubgraph.connectedComponents() 


        // Fill the componentLists hashmap.
        skosRelatedSubgraph.vertices.leftJoin(ccGraph.vertices) {
        case (id, u, comp) =&amp;gt; comp.get
        }.foreach
        { case (id, startingNode) =&amp;gt; 
          {
              // Add id to the list of components with a key of comp.get
              if (!(componentLists.contains(startingNode))) {
                  componentLists(startingNode) = new ListBuffer[VertexId]
              }
              componentLists(startingNode) += id
          }
        }


        // Output a report on the connected components. 
        println(&amp;quot;------  connected components in SKOS \&amp;quot;related\&amp;quot; triples ------\n&amp;quot;)
        for ((component, componentList) &amp;lt;- componentLists){
            if (componentList.size &amp;gt; 1) { // don&#39;t bother with lists of only 1
                for(c &amp;lt;- componentList) {
                    println(prefLabelMap(c));
                }
                println(&amp;quot;--------------------------&amp;quot;)
            }
        }


        sc.stop
    }
}
&lt;/code&gt;&lt;/pre&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
    </item>
    
    <item>
      <title>Spark and SPARQL; RDF Graphs and GraphX</title>
      <link>https://www.bobdc.com/blog/spark-and-sparql-rdf-graphs-an/</link>
      <pubDate>Sun, 29 Mar 2015 12:24:38 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/spark-and-sparql-rdf-graphs-an/</guid>
      
      
      <description><div>Some interesting possibilities for working together.</div><div>&lt;img id=&#34;id104233&#34; src=&#34;https://www.bobdc.com/img/main/graphxrdf.png&#34; width=&#34;160&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;GraphX and RDF logos&#34;/&gt;
&lt;p&gt;In &lt;a href=&#34;http://ibmdatamag.com/2015/03/spark-is-the-new-black/&#34;&gt;Spark Is the New Black&lt;/a&gt; in IBM Data Magazine, I recently wrote about how popular the Apache Spark framework is for both Hadoop and non-Hadoop projects these days, and how for many people it goes so far as to replace one of Hadoop&amp;rsquo;s fundamental components: MapReduce. (I still have trouble writing &amp;ldquo;Spar&amp;rdquo; without writing &amp;ldquo;ql&amp;rdquo; after it.) While waiting for that piece to be copyedited, I came across &lt;a href=&#34;http://svds.com/post/5-reasons-why-spark-matters-business&#34;&gt;5 Reasons Why Spark Matters to Business&lt;/a&gt; by my old XML.com editor Edd Dumbill and &lt;a href=&#34;http://www.infoworld.com/article/2897287/big-data/5-reasons-to-turn-to-spark-for-big-data-analytics.html&#34;&gt;5 reasons to turn to Spark for big data analytics&lt;/a&gt; in InfoWorld, giving me a total of 10 reasons that Spark&amp;hellip; is getting hotter.&lt;/p&gt;
&lt;p&gt;I originally became interested in Spark because one of its key libraries is &lt;a href=&#34;http://spark.apache.org/graphx/&#34;&gt;GraphX&lt;/a&gt;, Spark&amp;rsquo;s API for working with graphs of nodes and arcs. The &amp;ldquo;GraphX: Unifying Data-Parallel and Graph-Parallel Analytics&amp;rdquo; paper by GraphX&amp;rsquo;s inventors (&lt;a href=&#34;https://amplab.cs.berkeley.edu/wp-content/uploads/2014/02/graphx.pdf&#34;&gt;pdf&lt;/a&gt;) has a whole section on RDF as related work, saying &amp;ldquo;we adopt some of the core ideas from the RDF work including the triples view of graphs.&amp;rdquo; The possibility of using such a hot new Big Data technology with RDF was intriguing, so I decided to look into it.&lt;/p&gt;
&lt;p&gt;I thought it would be interesting to output a typical GraphX graph as RDF so that I could perform SPARQL queries on it that were not typical of GraphX processing, and then to go the other way: read a good-sized RDF dataset into GraphX and do things with it that would not be typical of SPARQL processing. I have had some success at both, so I think that RDF and GraphX systems have much to offer each other.&lt;/p&gt;
&lt;p&gt;This wouldn&amp;rsquo;t have been very difficult if I hadn&amp;rsquo;t been learning the Scala programming language as I went along, but GraphX libraries are not available for Python or Java yet, so what you see below is essentially my first Scala program. A huge help in my attempts to learn Scala, Spark, and GraphX were the &lt;a href=&#34;https://www.sics.se/~amir/files/download/dic/&#34;&gt;class handouts&lt;/a&gt; of Swedish Institute of Computer Science senior researcher Amir H. Payberah. I just stumbled across them in some web searches while trying to get a Scala GraphX program to compile, and his PDFs introducing &lt;a href=&#34;https://www.sics.se/~amir/files/download/dic/scala.pdf&#34;&gt;Scala&lt;/a&gt;, &lt;a href=&#34;https://www.sics.se/~amir/files/download/dic/spark.pdf&#34;&gt;Spark&lt;/a&gt;, and &lt;a href=&#34;https://www.sics.se/~amir/files/download/dic/graph_processing.pdf&#34;&gt;graph processing&lt;/a&gt; (especially the GraphX parts) lit a lot of &amp;ldquo;a-ha&amp;rdquo; lightbulbs for me, even though I had already looked through several introductions to Scala and Spark. He has since encouraged me to share the link to course materials for his &lt;a href=&#34;https://www.sics.se/~amir/cloud14/&#34;&gt;current course on cloud computing&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;While I had a general idea of &lt;em&gt;how&lt;/em&gt; &lt;a href=&#34;http://en.wikipedia.org/wiki/Functional_programming&#34;&gt;functional programming languages&lt;/a&gt; worked, one of the lightbulbs that Dr. Payberah&amp;rsquo;s work lit for me was &lt;em&gt;why&lt;/em&gt; they&amp;rsquo;re valuable, at least in the case of using Spark from Scala: Spark provides higher-order functions that can hand off your own functions and data to structures that can be stored in distributed memory. This allows the kinds of interactive and iterative (for example, machine learning) tasks that generally don&amp;rsquo;t work well with Hadoop&amp;rsquo;s batch-oriented MapReduce model. Apparently, for tasks that would work fine with MapReduce, Spark versions also run much faster because their better use of memory lets them avoid all the disk I/O that is typical of MapReduce jobs.&lt;/p&gt;
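Plain Scala collections make the higher-order-function idea concrete. This little stand-in example is mine, not from any Spark code, but it has the same shape as Spark&amp;rsquo;s RDD operations, which apply your functions across a cluster instead of a local list:

```scala
// Higher-order functions: map and reduce take our own functions as
// arguments and apply them across the data. Spark's RDDs offer the
// same operations, distributed across a cluster.
object HigherOrderSketch {
  // Our own function, handed off to the higher-order map function.
  def double(n: Int): Int = n * 2

  def main(args: Array[String]): Unit = {
    val data = List(1, 2, 3, 4)
    val doubled = data.map(double)    // List(2, 4, 6, 8)
    val total = doubled.reduce(_ + _) // 20
    println(s"doubled: $doubled total: $total")
  }
}
```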
&lt;p&gt;Spark lets you use this distributed memory by providing a data structure called a Resilient Distributed Dataset, or RDD. When you store your data in RDDs, you can let Spark take care of their distribution across a computing cluster. GraphX lets you store a set of nodes, arcs, and—crucially for us RDF types—extra information about each in RDDs. To output a &amp;ldquo;typical&amp;rdquo; GraphX graph structure as RDF, I took the &lt;a href=&#34;https://spark.apache.org/docs/1.1.1/graphx-programming-guide.html#example-property-graph&#34;&gt;Example Property Graph&lt;/a&gt; example in the Apache Spark &lt;a href=&#34;https://spark.apache.org/docs/1.1.1/graphx-programming-guide.html&#34;&gt;GraphX Programming Guide&lt;/a&gt; and expanded it a bit. (If experienced Scala programmers don&amp;rsquo;t gag when they see my program, they will in my next installment, where I show how I read RDF into GraphX RDDs. Corrections welcome.)&lt;/p&gt;
&lt;p&gt;My Scala program below, like the Example Property Graph mentioned above, creates an RDD called &lt;code&gt;users&lt;/code&gt; of nodes about people at a university and an RDD called &lt;code&gt;relationships&lt;/code&gt; that stores information about edges that connect the nodes. RDDs use long integers such as the 3L and 7L values shown below as identifiers for the nodes, and you&amp;rsquo;ll see that they can store additional information about nodes—for example, that node 3L is named &amp;ldquo;rxin&amp;rdquo; and has the title &amp;ldquo;student&amp;rdquo;—as well as additional information about edges—for example, that the user represented by 5L has an &amp;ldquo;advisor&amp;rdquo; relationship to user 3L. I added a few extra nodes and edges to give the eventual SPARQL queries a little more to work with.&lt;/p&gt;
&lt;p&gt;Once the node and edge RDDs are defined, the program creates a graph from them. After that, I added code to output RDF triples about node relationships to other nodes (or, in RDF parlance, object property triples) using a base URI that I defined at the top of the program to convert identifiers to URIs when necessary. This produced triples such as &lt;code&gt;&amp;lt;http://snee.com/xpropgraph#istoica&amp;gt; &amp;lt;http://snee.com/xpropgraph#colleague&amp;gt; &amp;lt;http://snee.com/xpropgraph#franklin&amp;gt;&lt;/code&gt; in the output. Finally, the program outputs non-relationship values (literal properties), producing triples such as &lt;code&gt;&amp;lt;http://snee.com/xpropgraph#rxin&amp;gt; &amp;lt;http://snee.com/xpropgraph#role&amp;gt; &amp;quot;student&amp;quot;&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD


object ExamplePropertyGraph {
    def main(args: Array[String]) {
        val baseURI = &amp;quot;http://snee.com/xpropgraph#&amp;quot;
    val sc = new SparkContext(&amp;quot;local&amp;quot;, &amp;quot;ExamplePropertyGraph&amp;quot;, &amp;quot;127.0.0.1&amp;quot;)


        // Create an RDD for the vertices
        val users: RDD[(VertexId, (String, String))] =
            sc.parallelize(Array(
                (3L, (&amp;quot;rxin&amp;quot;, &amp;quot;student&amp;quot;)),
                (7L, (&amp;quot;jgonzal&amp;quot;, &amp;quot;postdoc&amp;quot;)),
                (5L, (&amp;quot;franklin&amp;quot;, &amp;quot;prof&amp;quot;)),
                (2L, (&amp;quot;istoica&amp;quot;, &amp;quot;prof&amp;quot;)),
                // Following lines are new data
                (8L, (&amp;quot;bshears&amp;quot;, &amp;quot;student&amp;quot;)),
                (9L, (&amp;quot;nphelge&amp;quot;, &amp;quot;student&amp;quot;)),
                (10L, (&amp;quot;asmithee&amp;quot;, &amp;quot;student&amp;quot;)),
                (11L, (&amp;quot;rmutt&amp;quot;, &amp;quot;student&amp;quot;)),
                (12L, (&amp;quot;ntufnel&amp;quot;, &amp;quot;student&amp;quot;))
            ))
        // Create an RDD for edges
        val relationships: RDD[Edge[String]] =
            sc.parallelize(Array(
                Edge(3L, 7L, &amp;quot;collab&amp;quot;),
                Edge(5L, 3L, &amp;quot;advisor&amp;quot;),
                Edge(2L, 5L, &amp;quot;colleague&amp;quot;),
                Edge(5L, 7L, &amp;quot;pi&amp;quot;),
                // Following lines are new data
                Edge(5L, 8L, &amp;quot;advisor&amp;quot;),
                Edge(2L, 9L, &amp;quot;advisor&amp;quot;),
                Edge(5L, 10L, &amp;quot;advisor&amp;quot;),
                Edge(2L, 11L, &amp;quot;advisor&amp;quot;)
            ))
        // Build the initial Graph
        val graph = Graph(users, relationships)


        // Output object property triples
        graph.triplets.foreach( t =&amp;gt; println(
            s&amp;quot;&amp;lt;$baseURI${t.srcAttr._1}&amp;gt; &amp;lt;$baseURI${t.attr}&amp;gt; &amp;lt;$baseURI${t.dstAttr._1}&amp;gt; .&amp;quot;
        ))


        // Output literal property triples
        users.foreach(t =&amp;gt; println(
            s&amp;quot;&amp;quot;&amp;quot;&amp;lt;$baseURI${t._2._1}&amp;gt; &amp;lt;${baseURI}role&amp;gt; \&amp;quot;${t._2._2}\&amp;quot; .&amp;quot;&amp;quot;&amp;quot;
        ))


        sc.stop


    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The program writes out the RDF with full URIs for every resource, but I&amp;rsquo;m showing a Turtle version here that uses prefixes to help it fit on this page better:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix xp: &amp;lt;http://snee.com/xpropgraph#&amp;gt; . 


xp:istoica  xp:colleague xp:franklin .
xp:istoica  xp:advisor   xp:nphelge .
xp:istoica  xp:advisor   xp:rmutt .
xp:rxin     xp:collab    xp:jgonzal .
xp:franklin xp:advisor   xp:rxin .
xp:franklin xp:pi        xp:jgonzal .
xp:franklin xp:advisor   xp:bshears .
xp:franklin xp:advisor   xp:asmithee .
xp:rxin     xp:role      &amp;quot;student&amp;quot; .
xp:jgonzal  xp:role      &amp;quot;postdoc&amp;quot; .
xp:franklin xp:role      &amp;quot;prof&amp;quot; .
xp:istoica  xp:role      &amp;quot;prof&amp;quot; .
xp:bshears  xp:role      &amp;quot;student&amp;quot; .
xp:nphelge  xp:role      &amp;quot;student&amp;quot; .
xp:asmithee xp:role      &amp;quot;student&amp;quot; .
xp:rmutt    xp:role      &amp;quot;student&amp;quot; .
xp:ntufnel  xp:role      &amp;quot;student&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;My first SPARQL query of the RDF asked this: for each person with advisees, how many do they have?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX xp: &amp;lt;http://snee.com/xpropgraph#&amp;gt;


SELECT ?person (COUNT(?advisee) AS ?advisees)
WHERE {
  ?person xp:advisor ?advisee
}
GROUP BY ?person
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here is the result:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;--------------------------
| person      | advisees |
==========================
| xp:franklin | 3        |
| xp:istoica  | 2        |
--------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The next query asks about the roles of rxin&amp;rsquo;s collaborators:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX xp: &amp;lt;http://snee.com/xpropgraph#&amp;gt;


SELECT ?collaborator ?role
WHERE {
  xp:rxin xp:collab ?collaborator . 
  ?collaborator xp:role ?role . 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As it turns out, there&amp;rsquo;s only one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;----------------------------
| collaborator | role      |
============================
| xp:jgonzal   | &amp;quot;postdoc&amp;quot; |
----------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Does nphelge have a relationship to any prof, and if so, who and what relationship?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX xp: &amp;lt;http://snee.com/xpropgraph#&amp;gt;


SELECT ?person ?relationship
WHERE {


  ?person xp:role &amp;quot;prof&amp;quot; . 


  { xp:nphelge ?relationship ?person }
  UNION
  { ?person ?relationship xp:nphelge }


}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And here is our answer:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-----------------------------
| person     | relationship |
=============================
| xp:istoica | xp:advisor   |
-----------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A hardcore RDF person will have two questions about the sample data:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;What about properties of edges? For example, what if I wanted to say that an &lt;code&gt;xp:advisor&lt;/code&gt; property was an &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; the Dublin Core property &lt;code&gt;dc:contributor&lt;/code&gt;?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The ability to assign properties such as a name of &amp;ldquo;rxin&amp;rdquo; and a role of &amp;ldquo;student&amp;rdquo; to a node like 3L is nice, but what if I don&amp;rsquo;t have a consistent set of properties that will be assigned to every node—for example, if I&amp;rsquo;ve aggregated person data from two different sources that don&amp;rsquo;t use all the same properties to describe these persons?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither of those was difficult with GraphX, and next month I&amp;rsquo;ll show my approach. I&amp;rsquo;ll also show how I applied that approach to let a GraphX program read in any RDF and then perform GraphX operations on it.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
    </item>
    
    <item>
      <title>Driving Hadoop data integration with standards-based models instead of code</title>
      <link>https://www.bobdc.com/blog/driving-hadoop-data-integratio/</link>
      <pubDate>Fri, 13 Feb 2015 13:43:02 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/driving-hadoop-data-integratio/</guid>
      
      
      <description><div>RDFS models!</div><div>&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note:&lt;/strong&gt; I wrote this blog entry to accompany the IBM Data Magazine piece mentioned in the first paragraph, so for people following the link from there this goes into a little more detail on what RDF, triples, and SPARQL are than I normally would on this blog. I hope that readers already familiar with these standards will find the parts about doing the inferencing on a Hadoop cluster interesting.&lt;/em&gt;&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/rdfhadoop.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;RDF and Hadoop logos&#34; /&gt;
&lt;p&gt;In a short piece in &lt;a href=&#34;http://ibmdatamag.com/&#34;&gt;IBM Data Magazine&lt;/a&gt; (migrated, since then, to the &lt;a href=&#34;http://www.ibmbigdatahub.com/&#34;&gt;IBM Big Data &amp;amp; Analytics Hub&lt;/a&gt;) titled &lt;a href=&#34;http://www.ibmbigdatahub.com/blog/scale-data-integration-data-models-and-inferencing&#34;&gt;Scale up Your Data Integration with Data Models and Inferencing&lt;/a&gt;, I give a high-level overview of why the use of W3C standards-based models can provide a more scalable alternative to using code-driven transformations when integrating data from multiple sources:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;When driving this process with code generated from models (instead of with the models themselves), the code grows more brittle as it evolves, and the original models turn into out-of-date system documentation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mature commercial and open-source tools are available to infer, for example, that a &lt;code&gt;LastName&lt;/code&gt; value from one database and a &lt;code&gt;last_name&lt;/code&gt; value from another can both be treated as values of &lt;code&gt;FamilyName&lt;/code&gt; from a central canonical data model.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;After running such a conversion with these models, modifying the conversion to accommodate additional input data often means simply expanding the unifying model, with no need for new code.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It can work on a Hadoop cluster with little more than a brief Python script to drive it all.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here, we&amp;rsquo;ll look at an example of &lt;em&gt;how&lt;/em&gt; this can work. I&amp;rsquo;m going to show how I used these techniques to integrate data from the SQL Server sample &lt;a href=&#34;http://northwinddatabase.codeplex.com/&#34;&gt;Northwind&lt;/a&gt; database&amp;rsquo;s &amp;ldquo;Employees&amp;rdquo; table with data from the Oracle sample &lt;a href=&#34;https://docs.oracle.com/cd/E11882_01/server.112/e40540/tablecls.htm#CBBJICEB&#34;&gt;HR&lt;/a&gt; database&amp;rsquo;s &amp;ldquo;EMPLOYEES&amp;rdquo; table. These use different names for similar properties, and we&amp;rsquo;ll identify the relationships between those properties in a model that uses a W3C standard modeling language. Next, a Python script will use this model to combine data from the two different employee tables into one dataset that conforms to a common model. Finally, we&amp;rsquo;ll see that a small addition to the model, with no new code added to the Python script, lets the script integrate additional data from the different databases. And, we&amp;rsquo;ll do this all on a Hadoop cluster.&lt;/p&gt;
&lt;h2 id=&#34;the-data-and-the-model&#34;&gt;The data and the model&lt;/h2&gt;
&lt;p&gt;RDF represents facts in three-part {entity, property name, property value} statements known as triples. We could, for example, say that employee 4 has a FirstName value of &amp;ldquo;Margaret&amp;rdquo;, but RDF requires that the entity and property name identifiers be URIs to ensure that they&amp;rsquo;re completely unambiguous. URIs usually look like URLs, but instead of being Uniform Resource Locators, they&amp;rsquo;re Uniform Resource Identifiers, merely identifying resources instead of naming a location for them. This means that while some of them might look like web addresses, pasting them into a web browser&amp;rsquo;s address bar won&amp;rsquo;t necessarily get you a web page. (RDF also encourages you to represent property values as URIs, making it easier to connect triples into graphs that can be traversed and queried. Doing this to connect triples from different sources is another area where RDF shines in data integration work.)&lt;/p&gt;
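The three-part shape of a triple is easy to picture as a record with three fields. This Scala sketch is purely illustrative (the case class and its names are mine, not part of any RDF tooling); it holds the employee 4 FirstName fact as full URIs:

```scala
// Illustrative only: a triple as a three-part record. The subject and
// predicate must be URIs; the object may be a URI or a literal value.
case class Triple(subject: String, predicate: String, obj: String)

object TripleExample {
  // The "employee 4 has a FirstName of Margaret" fact from the text.
  val margaret = Triple(
    "http://snee.com/vocab/SQLServerNorthwind#employees_4",
    "http://snee.com/vocab/schema/SQLServerNorthwind#employees_FirstName",
    "Margaret"
  )

  def main(args: Array[String]): Unit = println(margaret)
}
```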
&lt;p&gt;The use of domain names in URIs, as with Java package names, lets an organization control the naming conventions around their resources. When I used &lt;a href=&#34;http://d2rq.org/d2r-server&#34;&gt;D2R&lt;/a&gt;—an open source middleware tool that can extract data from popular relational database packages—to pull the employees tables from the Northwind and HR databases, I had it build identifiers around my own snee.com domain name. Doing this, it created entity-name-value triples such as &lt;code&gt;{&amp;lt;http://snee.com/vocab/SQLServerNorthwind#employees_4&amp;gt; &amp;lt;http://snee.com/vocab/schema/SQLServerNorthwind#employees_FirstName&amp;gt; &amp;quot;Margaret&amp;quot;}&lt;/code&gt;. A typical fact pulled out of the HR database was &lt;code&gt;{&amp;lt;http://snee.com/vocab/OracleHR#employees_191&amp;gt; &amp;lt;http://snee.com/vocab/schema/OracleHR#employees_first_name&amp;gt; &amp;quot;Randall&amp;quot;}&lt;/code&gt;, which tells us that employee 191 in that database has a first_name value of &amp;ldquo;Randall&amp;rdquo;. If the HR database also had an employee number 4 or used a column name of first_name, the use of the URIs would leave no question as to which employee or property was being referenced by each triple.&lt;/p&gt;
&lt;p&gt;It was simplest to have D2R pull the entire tables, so in addition to the first and last names of each employee, I had it pull all the other data in the Northwind and HR employee tables. To integrate this data, we&amp;rsquo;ll start with just the first and last names, and then we&amp;rsquo;ll see how easy it is to broaden the scope of our data integration.&lt;/p&gt;
&lt;p&gt;RDF offers several syntaxes for recording triples. &lt;a href=&#34;http://www.w3.org/TR/rdf-syntax-grammar/&#34;&gt;RDF/XML&lt;/a&gt; was the first to become standardized, but has fallen from popularity as simpler alternatives became available. The simplest syntax, called &lt;a href=&#34;http://www.w3.org/TR/n-triples/&#34;&gt;N-Triples&lt;/a&gt;, spells out one triple per line with full URIs and a period at the end, just like a sentence stating a fact would end with a period. Below you can see some of the data about employee 122 from the HREmployees.nt file that I pulled from the HR database&amp;rsquo;s employees table. (For this and the later N-Triples examples, I&amp;rsquo;ve added carriage returns to each line to more easily fit them here.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;http://snee.com/vocab/OracleHR#employees_122&amp;gt; 
&amp;lt;http://snee.com/vocab/schema/OracleHR#employees_department_id&amp;gt; 
&amp;lt;http://snee.com/vocab/OracleHR#departments_50&amp;gt; .

&amp;lt;http://snee.com/vocab/OracleHR#employees_122&amp;gt; 
&amp;lt;http://snee.com/vocab/schema/OracleHR#employees_first_name&amp;gt; &amp;quot;Payam&amp;quot; .

&amp;lt;http://snee.com/vocab/OracleHR#employees_122&amp;gt; 
&amp;lt;http://snee.com/vocab/schema/OracleHR#employees_hire_date&amp;gt; 
&amp;quot;1995-05-01&amp;quot;^^&amp;lt;http://www.w3.org/2001/XMLSchema#date&amp;gt; .

&amp;lt;http://snee.com/vocab/OracleHR#employees_122&amp;gt;
&amp;lt;http://snee.com/vocab/schema/OracleHR#employees_last_name&amp;gt; &amp;quot;Kaufling&amp;quot; .

&amp;lt;http://snee.com/vocab/OracleHR#employees_122&amp;gt; 
&amp;lt;http://snee.com/vocab/schema/OracleHR#employees_phone_number&amp;gt; &amp;quot;650.123.3234&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The NorthwindEmployees.nt file pulled by D2R represents the Northwind employees with the same syntax as the HREmployees.nt file but uses URIs appropriate for that data, with &amp;ldquo;SQLServerNorthwind&amp;rdquo; in their base URI instead of &amp;ldquo;OracleHR&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;For a target canonical integration model, I chose the &lt;a href=&#34;http://schema.org/docs/schema_org_rdfa.html&#34;&gt;schema.org&lt;/a&gt; model designed by a consortium of major search engines for the embedding of machine-readable data into web pages. The following shows the schemaOrgPersonSchema.ttl file, where I&amp;rsquo;ve stored an excerpt of the schema.org model describing the Person class using the W3C standard RDF Schema (RDFS) language. I&amp;rsquo;ve added carriage returns to some of the &lt;code&gt;rdfs:comment&lt;/code&gt; values to fit them here:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix schema: &amp;lt;http://schema.org/&amp;gt; .
@prefix rdfs:   &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix dc:     &amp;lt;http://purl.org/dc/terms/&amp;gt; .
@prefix owl:    &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt; .
@prefix rdf:    &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .

schema:Person a             rdfs:Class;
        rdfs:label          &amp;quot;Person&amp;quot;;
        dc:source           &amp;lt;http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_rNews&amp;gt;;
        rdfs:comment        &amp;quot;A person (alive, dead, undead, or fictional).&amp;quot;;
        rdfs:subClassOf     schema:Thing;
        owl:equivalentClass &amp;lt;http://xmlns.com/foaf/0.1/Person&amp;gt; .

schema:familyName a           rdf:Property ;
        rdfs:comment          &amp;quot;Family name. In the U.S., the last name of an Person. 
          This can be used along with givenName instead of the Name property.&amp;quot; ;
        rdfs:label            &amp;quot;familyName&amp;quot; ;
        schema:domainIncludes schema:Person ;
        schema:rangeIncludes  schema:Text .

schema:givenName a           rdf:Property ;
       rdfs:comment          &amp;quot;Given name. In the U.S., the first name of a Person. 
         This can be used along with familyName instead of the Name property.&amp;quot; ;
       rdfs:label            &amp;quot;givenName&amp;quot; ;
       schema:domainIncludes schema:Person ;
       schema:rangeIncludes  schema:Text .

schema:telephone a           rdf:Property ;
       rdfs:comment          &amp;quot;The telephone number.&amp;quot; ;
       rdfs:label            &amp;quot;telephone&amp;quot; ;
       schema:domainIncludes schema:ContactPoint , schema:Organization , 
                             schema:Person , schema:Place ;
       schema:rangeIncludes  schema:Text .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that the RDFS &amp;ldquo;language&amp;rdquo; is really just a set of properties and classes to use in describing data models, not a syntax. I could have done this with the N-Triples syntax mentioned earlier, but this excerpt from schema.org uses RDF&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/TR/turtle/&#34;&gt;Turtle&lt;/a&gt; syntax to describe the class and properties. Turtle is similar to N-Triples but offers a few shortcuts to reduce verbosity:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;You can declare prefixes to stand in for common parts of URIs, so that &lt;code&gt;rdfs:label&lt;/code&gt; means the same thing as &lt;code&gt;&amp;lt;http://www.w3.org/2000/01/rdf-schema#label&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A semicolon means &amp;ldquo;here comes another triple with the same subject as the last one&amp;rdquo;, letting you list multiple facts about a particular resource without repeating the resource&amp;rsquo;s URI or prefixed name.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The keyword &amp;ldquo;a&amp;rdquo; stands in for the prefixed name &lt;code&gt;rdf:type&lt;/code&gt;, so that the first line after the prefix declarations above says that the resource &lt;code&gt;schema:Person&lt;/code&gt; has a type of &lt;code&gt;rdfs:Class&lt;/code&gt; (that is, that it&amp;rsquo;s an instance of the &lt;code&gt;rdfs:Class&lt;/code&gt; class and is therefore a class itself). The first line about &lt;code&gt;schema:familyName&lt;/code&gt; says that it has an &lt;code&gt;rdf:type&lt;/code&gt; of &lt;code&gt;rdf:Property&lt;/code&gt;, and so forth.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
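To see how much work the prefix shortcut saves, here is a tiny plain-Python sketch of the expansion. The expand_qname helper is purely illustrative; an RDF library such as RDFLib handles this for you.

```python
# Minimal sketch of Turtle-style prefix expansion.
# The prefix table mirrors the declarations in schemaOrgPersonSchema.ttl.
PREFIXES = {
    "schema": "http://schema.org/",
    "rdfs":   "http://www.w3.org/2000/01/rdf-schema#",
}

def expand_qname(qname, prefixes=PREFIXES):
    """Expand a prefixed name like 'rdfs:label' to its full URI."""
    prefix, local = qname.split(":", 1)
    return prefixes[prefix] + local

print(expand_qname("rdfs:label"))
# -> http://www.w3.org/2000/01/rdf-schema#label
print(expand_qname("schema:Person"))
# -> http://schema.org/Person
```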
&lt;p&gt;Although Turtle is now the most popular syntax for representing RDF, I used N-Triples for the employee instance data because the use of one line per triple, with no dependencies on prefix declarations or anything else on previous lines, means that a Hadoop system can split up an N-Triples file at any line breaks that it wants to without hurting the integrity of the data.&lt;/p&gt;
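Here is a quick plain-Python illustration of that property, with a few made-up triples (shorthand prefixed names stand in for the full bracketed URIs to keep the lines short): because every line stands alone, a split at any line break produces two chunks that together still contain every statement.

```python
# Each N-Triples line is an independent statement, so splitting a file
# at any line break and processing the pieces separately loses nothing.
ntriples = """\
ex:a ex:p "one" .
ex:b ex:p "two" .
ex:c ex:p "three" .
ex:d ex:p "four" .
"""

lines = [line for line in ntriples.splitlines() if line.strip()]

# Split the file at an arbitrary line boundary, as Hadoop might.
chunk1, chunk2 = lines[:2], lines[2:]

# The union of the chunks' statements equals the original set.
assert set(chunk1) | set(chunk2) == set(lines)
```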
&lt;p&gt;What if schema.org couldn&amp;rsquo;t accommodate my complete canonical model? For example, it has no Employee class; what if I wanted to add one that has a hireDate property as well as the other properties shown above? I could simply add triples saying that Employee was a subclass of &lt;code&gt;schema:Person&lt;/code&gt; and that hireDate was a property associated with my new class.&lt;/p&gt;
&lt;p&gt;I wouldn&amp;rsquo;t add these modifications directly to the file storing the schema.org model, but instead put them in a separate file so that I could manage local customizations separately from the published standard. (The ability to combine different RDF datasets that use the same syntax—regardless of their respective data models—by just concatenating the files is another reason that RDF is popular for data integration.) This is the same strategy I used to describe my canonical model integration information, storing the following four triples in the integrationModel.ttl file to describe the relationship of the relevant HR and Northwind properties to the schema.org model:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix rdfs:     &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; . 
@prefix schema:   &amp;lt;http://schema.org/&amp;gt; . 
@prefix oraclehr: &amp;lt;http://snee.com/vocab/schema/OracleHR#&amp;gt; .
@prefix nw:       &amp;lt;http://snee.com/vocab/schema/SQLServerNorthwind#&amp;gt; .

oraclehr:employees_first_name rdfs:subPropertyOf schema:givenName  . 
oraclehr:employees_last_name  rdfs:subPropertyOf schema:familyName . 
nw:employees_FirstName        rdfs:subPropertyOf schema:givenName  . 
nw:employees_LastName         rdfs:subPropertyOf schema:familyName . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Note that in RDF, any resource that can be represented by a URI can have properties assigned to it, including properties themselves. This file uses this ability to say that the two &lt;code&gt;oraclehr&lt;/code&gt; properties and the two &lt;code&gt;nw&lt;/code&gt; properties shown each have an &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; value.) At this point, with my schemaOrgPersonSchema.ttl file storing the excerpt of schema.org that models a Person and my integrationModel.ttl file modeling the relationships between schema:Person and the Northwind and HR input data, I have all the data modeling I need to drive a simple data integration.&lt;/p&gt;
&lt;h2 id=&#34;the-python-script-and-the-hadoop-cluster&#34;&gt;The Python script and the Hadoop cluster&lt;/h2&gt;
&lt;p&gt;Hadoop&amp;rsquo;s streaming interface lets you implement MapReduce logic in any programming language that can read from standard input and write to standard output. Because I knew of a Python library that could do RDFS inferencing, I wrote the following mapper routine in Python:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/usr/bin/python

# employeeInferencing.py: read employee data and models relating it to 
# schema.org, then infer and output schema.org version of relevant facts.

# sample execution:
# cat NorthwindEmployees.nt HREmployees.nt | employeeInferencing.py &amp;gt; temp.ttl

# Reads ntriples from stdin and writes ntriples results to 
# stdout so that it can be used as a streaming Hadoop task. 

import sys
import rdflib
import RDFClosure

diskFileGraph = rdflib.Graph()        # Graph to store data and models

# Read the data from standard input
streamedInput = &amp;quot;&amp;quot;
for line in sys.stdin:
    streamedInput += line
diskFileGraph.parse(data=streamedInput,format=&amp;quot;nt&amp;quot;)

# Read the modeling information
diskFileGraph.parse(
  &amp;quot;http://snee.com/rdf/inferencingDataIntegration/schemaOrgPersonSchema.ttl&amp;quot;,
  format=&amp;quot;turtle&amp;quot;)
diskFileGraph.parse(
  &amp;quot;http://snee.com/rdf/inferencingDataIntegration/integrationModel.ttl&amp;quot;,
  format=&amp;quot;turtle&amp;quot;)

# Do the inferencing
RDFClosure.DeductiveClosure(RDFClosure.RDFS_Semantics).expand(diskFileGraph)

# Use a SPARQL query to extract the data that we want to return: any
# statements whose properties are associated with the schema:Person
# class. (Note that standard RDFS would use rdfs:domain for this, but
# schema.org uses schema:domainIncludes.)

queryForPersonData = &amp;quot;&amp;quot;&amp;quot;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt; 
CONSTRUCT { ?subject ?personProperty ?object }
WHERE { 
  ?personProperty schema:domainIncludes schema:Person .
  ?subject ?personProperty ?object .
}&amp;quot;&amp;quot;&amp;quot;

personData = diskFileGraph.query(queryForPersonData)

# Add the query results to a graph that we can output.
personDataGraph  = rdflib.Graph()
for row in personData:
    personDataGraph.add(row)

# Send the result to standard out.
personDataGraph.serialize(sys.stdout, format=&amp;quot;nt&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After importing the sys library to allow reading from standard input and writing to standard output, the script imports two more libraries: &lt;a href=&#34;https://github.com/RDFLib&#34;&gt;RDFLib&lt;/a&gt;, the most popular Python library for working with RDF, and RDFClosure from the related &lt;a href=&#34;https://github.com/RDFLib/OWL-RL&#34;&gt;OWL-RL&lt;/a&gt; project, which can do inferencing from RDFS modeling statements as well as inferencing that uses the Web Ontology Language (OWL), a more expressive superset of RDFS. (Other available tools for doing RDFS and OWL inferencing include TopQuadrant&amp;rsquo;s TopSPIN engine, Ontotext&amp;rsquo;s OWLIM, and Clark &amp;amp; Parsia&amp;rsquo;s Pellet.) After initializing &lt;code&gt;diskFileGraph&lt;/code&gt; as a graph to store the triples that the script will work with, the script reads any N-Triples data fed to it via standard input into this graph and then reads in the schemaOrgPersonSchema.ttl and integrationModel.ttl files of modeling data described above. These files are identified by &lt;a href=&#34;http://snee.com/rdf/inferencingDataIntegration/schemaOrgPersonSchema.ttl&#34;&gt;http://snee.com/rdf/inferencingDataIntegration/schemaOrgPersonSchema.ttl&lt;/a&gt; and &lt;a href=&#34;http://snee.com/rdf/inferencingDataIntegration/integrationModel.ttl&#34;&gt;http://snee.com/rdf/inferencingDataIntegration/integrationModel.ttl&lt;/a&gt;, which are not just URIs in the RDF sense but actual URLs: send your browser to either one and you&amp;rsquo;ll find copies of those files stored at those locations. That&amp;rsquo;s where the script reads them from.&lt;/p&gt;
&lt;p&gt;Next, the script computes the &lt;a href=&#34;http://en.wikipedia.org/wiki/Deductive_closure&#34;&gt;deductive closure&lt;/a&gt; of the triples aggregated from standard input and the modeling information. For example, when it sees the triple &lt;code&gt;{&amp;lt;http://snee.com/vocab/OracleHR#employees_122&amp;gt; &amp;lt;http://snee.com/vocab/schema/OracleHR#employees_last_name&amp;gt; &amp;quot;Kaufling&amp;quot;}&lt;/code&gt; and the triple &lt;code&gt;{oraclehr:employees_last_name  rdfs:subPropertyOf schema:familyName}&lt;/code&gt;, it infers the new triple &lt;code&gt;{&amp;lt;http://snee.com/vocab/OracleHR#employees_122&amp;gt; schema:familyName &amp;quot;Kaufling&amp;quot;}&lt;/code&gt;. Because the inference engine&amp;rsquo;s job is to infer new triples based on all the relevant ones it can find, newly inferred triples may make new inferences possible, so it continues inferencing until there is nothing new that it can infer from the existing set—it has achieved closure.&lt;/p&gt;
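A minimal sketch of that rule in plain Python (this is not how RDFClosure works internally, and the extra ex:name level is made up to show why the engine must loop until nothing new appears):

```python
RDFS_SUBPROP = "rdfs:subPropertyOf"

def rdfs_subproperty_closure(triples):
    """Repeatedly apply the rdfs:subPropertyOf rule until fixpoint."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        # Collect all (subproperty, superproperty) pairs currently known.
        subprops = [(s, o) for s, p, o in triples if p == RDFS_SUBPROP]
        for s, p, o in list(triples):
            for sub, sup in subprops:
                if p == sub and (s, sup, o) not in triples:
                    triples.add((s, sup, o))
                    changed = True
    return triples

data = {
    ("oraclehr:employees_122", "oraclehr:employees_last_name", "Kaufling"),
    ("oraclehr:employees_last_name", RDFS_SUBPROP, "schema:familyName"),
    # Hypothetical extra level, to show that inferred triples can
    # themselves trigger further inferences:
    ("schema:familyName", RDFS_SUBPROP, "ex:name"),
}
closed = rdfs_subproperty_closure(data)
assert ("oraclehr:employees_122", "schema:familyName", "Kaufling") in closed
assert ("oraclehr:employees_122", "ex:name", "Kaufling") in closed
```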
&lt;p&gt;At this point, the script will have all of the original triples that it read in plus the new ones that it inferred, but I&amp;rsquo;m going to assume that applications using data conforming to the canonical model are only interested in that data and not in all the other input. To extract the relevant subset, the script runs a query in SPARQL, the query language from the RDF family of W3C standards. As with SQL, it&amp;rsquo;s common to see SPARQL queries that begin with SELECT statements listing columns of data to return, but this Python script uses a CONSTRUCT query instead, which returns triples instead of columns of data. The query&amp;rsquo;s WHERE clause identifies the triples that the query wants by using &amp;ldquo;triple patterns&amp;rdquo;, or triples that include variables as wildcards to describe the kinds of triples to look for, and the CONSTRUCT part describes what should be in the triples that get returned.&lt;/p&gt;
&lt;p&gt;In this case, the triples to return are any whose predicate value has a &lt;code&gt;schema:domainIncludes&lt;/code&gt; value of &lt;code&gt;schema:Person&lt;/code&gt;—in other words, any property associated with the &lt;code&gt;schema:Person&lt;/code&gt; class. As the comment in the code says, it&amp;rsquo;s more common for RDFS and OWL models to use the standard &lt;code&gt;rdfs:domain&lt;/code&gt; property to associate properties with classes, but this can get messy when associating a particular property with multiple classes, so the schema.org project defined their own &lt;code&gt;schema:domainIncludes&lt;/code&gt; property for this.&lt;/p&gt;
&lt;p&gt;This SPARQL query could be extended to implement additional logic if necessary. For example, if one database had separate &lt;code&gt;lastName&lt;/code&gt; and &lt;code&gt;firstName&lt;/code&gt; fields and another had a single &lt;code&gt;name&lt;/code&gt; field with values of the form &amp;ldquo;Smith, John&amp;rdquo;, then string manipulation functions in the SPARQL query could concatenate the &lt;code&gt;lastName&lt;/code&gt; and &lt;code&gt;firstName&lt;/code&gt; values with a comma or split the &lt;code&gt;name&lt;/code&gt; value at the comma to create new values. This brings the script past strict model-based mapping to include transformation, but most independently-developed data models don&amp;rsquo;t line up neatly enough to describe their relationships with nothing but simple mappings.&lt;/p&gt;
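The transformation logic itself is simple; here it is sketched in plain Python with made-up field values. (In the actual SPARQL query, the same effect would come from string functions such as CONCAT, STRBEFORE, and STRAFTER.)

```python
def split_name(name):
    """Split a 'Smith, John' style value into (familyName, givenName)."""
    family, given = [part.strip() for part in name.split(",", 1)]
    return family, given

def join_name(family, given):
    """Combine separate fields into a single 'Smith, John' style value."""
    return f"{family}, {given}"

assert split_name("Smith, John") == ("Smith", "John")
assert join_name("Smith", "John") == "Smith, John"
```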
&lt;p&gt;The data returned by the query and stored in the &lt;code&gt;personData&lt;/code&gt; variable is not one of RDFLib&amp;rsquo;s &lt;code&gt;Graph()&lt;/code&gt; structures like the &lt;code&gt;diskFileGraph&lt;/code&gt; instance that it has been working with throughout the script, so the script creates a new instance called &lt;code&gt;personDataGraph&lt;/code&gt; and adds the data from &lt;code&gt;personData&lt;/code&gt; to it. Once this is done, all that&amp;rsquo;s left is to output this graph&amp;rsquo;s contents to standard out in the N-Triples format, identified as &amp;ldquo;nt&amp;rdquo; in the call to the &lt;code&gt;serialize&lt;/code&gt; method.&lt;/p&gt;
&lt;p&gt;In a typical Hadoop job, the data returned by the mapper routine is further processed by a reducer routine, but to keep this example simple I created a dummyReducer.py script that merely copied the returned data through unchanged:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/usr/bin/python
# dummyReducer.py: just copy stdin to stdout

import sys

for line in sys.stdin:
    sys.stdout.write(line)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;running-it-expanding-the-model-and-running-it-again&#34;&gt;Running it, expanding the model, and running it again&lt;/h2&gt;
&lt;p&gt;With my two Python scripts, my two modeling files, and one file of data from each of the two databases&amp;rsquo; employee tables, I had everything I needed to have Hadoop integrate the data to the canonical model using RDFS inferencing. I set up a four-node Hadoop cluster using the steps described in &lt;a href=&#34;http://letsdobigdata.wordpress.com/2014/01/13/setting-up-hadoop-multi-node-cluster-on-amazon-ec2-part-1/&#34;&gt;part 1&lt;/a&gt; and &lt;a href=&#34;http://letsdobigdata.wordpress.com/2014/01/13/setting-up-hadoop-1-2-1-multi-node-cluster-on-amazon-ec2-part-2/&#34;&gt;part 2&lt;/a&gt; of Hardik Pandya&amp;rsquo;s &amp;ldquo;Setting up Hadoop multi-node cluster on Amazon EC2&amp;rdquo;, formatted the distributed file system, and copied the NorthwindEmployees.nt and HREmployees.nt files to the &lt;code&gt;/data/employees&lt;/code&gt; directory on that file system. Because the employeeInferencing.py script would be passed to the slave nodes to run on the subsets of input data sent to those nodes, I also installed the RDFLib and OWL-RL Python modules that this script needed on the slave nodes. Then, with the Python scripts stored in &lt;code&gt;/home/ubuntu/dataInt/&lt;/code&gt; on the cluster&amp;rsquo;s master node, I was ready to run the job with the following command (split over six lines here to fit on this page) on the master node:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar 
  -file /home/ubuntu/dataInt/employeeInferencing.py 
  -mapper /home/ubuntu/dataInt/employeeInferencing.py 
  -file /home/ubuntu/dataInt/dummyReducer.py 
  -reducer /home/ubuntu/dataInt/dummyReducer.py 
  -input /data/employees/* -output /data/myOutputDir
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After running that, the following copied the result from the distributed file system to a run1.nt file in my local filesystem:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hadoop dfs -cat /data/myOutputDir/part-00000 &amp;gt; outputCopies/run1.nt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here are a few typical lines from run1.nt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;http://snee.com/vocab/OracleHR#employees_100&amp;gt; 
&amp;lt;http://schema.org/familyName&amp;gt; &amp;quot;King&amp;quot; .   

&amp;lt;http://snee.com/vocab/OracleHR#employees_100&amp;gt; 
&amp;lt;http://schema.org/givenName&amp;gt; &amp;quot;Steven&amp;quot; .  

&amp;lt;http://snee.com/vocab/SQLServerNorthwind#employees_2&amp;gt; 
&amp;lt;http://schema.org/familyName&amp;gt; &amp;quot;Fuller&amp;quot; . 

&amp;lt;http://snee.com/vocab/SQLServerNorthwind#employees_2&amp;gt; 
&amp;lt;http://schema.org/givenName&amp;gt; &amp;quot;Andrew&amp;quot; .  
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The entire file is all &lt;code&gt;schema:givenName&lt;/code&gt; and &lt;code&gt;schema:familyName&lt;/code&gt; triples about the resources from the Oracle HR and SQL Server Northwind databases.&lt;/p&gt;
&lt;p&gt;This isn&amp;rsquo;t much so far: the output has only the first and last name values from the two source databases. But here&amp;rsquo;s where it gets more interesting. I added the following two lines to the copy of integrationModel.ttl stored on the snee.com server:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;oraclehr:employees_phone_number rdfs:subPropertyOf schema:telephone .  
nw:employees_HomePhone          rdfs:subPropertyOf schema:telephone . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then, with no changes to the Python scripts or anything else, re-running the same command on the Hadoop master node (with a new output directory parameter) produces a result with lines like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;http://snee.com/vocab/OracleHR#employees_100&amp;gt; 
&amp;lt;http://schema.org/familyName&amp;gt; &amp;quot;King&amp;quot; .

&amp;lt;http://snee.com/vocab/OracleHR#employees_100&amp;gt; 
&amp;lt;http://schema.org/givenName&amp;gt; &amp;quot;Steven&amp;quot; .

&amp;lt;http://snee.com/vocab/OracleHR#employees_100&amp;gt; 
&amp;lt;http://schema.org/telephone&amp;gt; &amp;quot;515.123.4567&amp;quot; .

&amp;lt;http://snee.com/vocab/SQLServerNorthwind#employees_2&amp;gt; 
&amp;lt;http://schema.org/givenName&amp;gt; &amp;quot;Andrew&amp;quot; .

&amp;lt;http://snee.com/vocab/SQLServerNorthwind#employees_2&amp;gt; 
&amp;lt;http://schema.org/familyName&amp;gt; &amp;quot;Fuller&amp;quot; .

&amp;lt;http://snee.com/vocab/SQLServerNorthwind#employees_2&amp;gt; 
&amp;lt;http://schema.org/telephone&amp;gt; &amp;quot;(206) 555-9482&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Expanding the scope of the data integration required no new coding in the Python script—just an expansion of the integration model. The integration is truly being driven by the model, and not by procedural transformation code. And, adding a completely new data source wouldn&amp;rsquo;t be any more trouble than adding the phone data was above; you only need to identify which properties of the new data source correspond to which properties of the canonical data model.&lt;/p&gt;
&lt;h2 id=&#34;modeling-more-complex-relationships-for-more-complex-mapping&#34;&gt;Modeling more complex relationships for more complex mapping&lt;/h2&gt;
&lt;p&gt;All the inferencing so far has been done with just one property from the RDFS standard: &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt;. RDFS offers additional modeling constructs that let you do more. As I mentioned earlier, schema.org does not define an Employee class, but if my application needs one, I can use RDFS to define it in my own namespace as a subclass of &lt;code&gt;schema:Person&lt;/code&gt;. Also, the Northwind employee data has an &lt;code&gt;nw:employees_HireDate&lt;/code&gt; property that I&amp;rsquo;d like to associate with my new class. I can do both of these by adding these two triples to integrationModel.ttl, shown here with a prefix declaration to make the triples shorter:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix emp: &amp;lt;http://snee.com/vocab/employees#&amp;gt; .
emp:Employee rdfs:subClassOf schema:Person . 
nw:employees_HireDate rdfs:domain emp:Employee .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The SPARQL query in employeeInferencing.py only looked for properties associated with instances of &lt;code&gt;schema:Person&lt;/code&gt;, so after expanding that a bit to request the Employee and class membership triples as well, running the inferencing script shows us that the RDFClosure engine has inferred these new triples about Andrew Fuller:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;http://snee.com/vocab/SQLServerNorthwind#employees_2&amp;gt; 
&amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&amp;gt; 
&amp;lt;http://snee.com/vocab/employees#Employee&amp;gt; .

&amp;lt;http://snee.com/vocab/SQLServerNorthwind#employees_2&amp;gt; 
&amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&amp;gt; 
&amp;lt;http://schema.org/Person&amp;gt; .

&amp;lt;http://snee.com/vocab/SQLServerNorthwind#employees_2&amp;gt; 
&amp;lt;http://snee.com/vocab/schema/SQLServerNorthwind#employees_HireDate&amp;gt; 
&amp;quot;1992-08-14T00:00:00&amp;quot;^^&amp;lt;http://www.w3.org/2001/XMLSchema#dateTime&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In other words, because he has an &lt;code&gt;nw:employees_HireDate&lt;/code&gt; value, it inferred that he is an instance of the class &lt;code&gt;emp:Employee&lt;/code&gt;, and because that&amp;rsquo;s a subclass of &lt;code&gt;schema:Person&lt;/code&gt;, we see that he is also a member of that class.&lt;/p&gt;
&lt;p&gt;The W3C&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/2001/sw/wiki/OWL&#34;&gt;OWL&lt;/a&gt; standard adds additional properties beyond those defined by RDFS to further describe your data, as well as special classes and the ability to define your own classes to use in describing your data. For example, if the HR database&amp;rsquo;s departments table had a &lt;code&gt;related&lt;/code&gt; property so that you could specify that the shipping department is related to the receiving department, then specifying in our integration model that &lt;code&gt;{oraclehr:related rdf:type owl:SymmetricProperty}&lt;/code&gt; would tell the RDFClosure engine that this property is symmetric and that it should infer that the receiving department is related to the shipping department. (When telling RDFClosure&amp;rsquo;s DeductiveClosure method to do OWL inferencing in addition to RDFS inferencing, pass it an RDFS_OWLRL_Semantics parameter instead of RDFS_Semantics.)&lt;/p&gt;
&lt;p&gt;OWL also includes an &lt;code&gt;owl:inverseOf&lt;/code&gt; property that can help with data integration. For example, imagine that the Northwind database had an &lt;code&gt;nw:manages&lt;/code&gt; property that let you say things like &lt;code&gt;{emp:jack nw:manages emp:shippingDepartment}&lt;/code&gt;, but the HR database identified the relationship in the opposite direction with an &lt;code&gt;oraclehr:managedBy&lt;/code&gt; relationship used in triples of the form &lt;code&gt;{emp:receivingDepartment oraclehr:managedBy emp:jill}&lt;/code&gt;. When you tell an OWL engine that these two properties are the inverse of each other with the triple &lt;code&gt;{oraclehr:managedBy owl:inverseOf nw:manages}&lt;/code&gt;, it will infer from the triples above that &lt;code&gt;{emp:shippingDepartment oraclehr:managedBy emp:jack}&lt;/code&gt; and that &lt;code&gt;{emp:jill nw:manages emp:receivingDepartment}&lt;/code&gt;.&lt;/p&gt;
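What that inference amounts to, sketched in plain Python using the hypothetical triples from the paragraph above:

```python
def apply_inverse_of(triples, inverse_pairs):
    """For each (p, q) declared as inverses, infer (o, q, s) from
    (s, p, o) and (o, p, s) from (s, q, o)."""
    inferred = set(triples)
    for p, q in inverse_pairs:
        for s, pred, o in triples:
            if pred == p:
                inferred.add((o, q, s))
            elif pred == q:
                inferred.add((o, p, s))
    return inferred

data = {
    ("emp:jack", "nw:manages", "emp:shippingDepartment"),
    ("emp:receivingDepartment", "oraclehr:managedBy", "emp:jill"),
}
# The declaration {oraclehr:managedBy owl:inverseOf nw:manages}:
inferred = apply_inverse_of(data, [("oraclehr:managedBy", "nw:manages")])
assert ("emp:shippingDepartment", "oraclehr:managedBy", "emp:jack") in inferred
assert ("emp:jill", "nw:manages", "emp:receivingDepartment") in inferred
```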
&lt;p&gt;When processing of the input is distributed over multiple nodes, as with a Hadoop cluster, this inferencing has some limitations. For example, the &lt;code&gt;owl:TransitiveProperty&lt;/code&gt; class lets me say that an &lt;code&gt;ex:locatedIn&lt;/code&gt; property is transitive by using a triple such as &lt;code&gt;{ex:locatedIn rdf:type owl:TransitiveProperty}&lt;/code&gt;. Then, when an OWL engine sees that &lt;code&gt;{ex:chair38 ex:locatedIn ex:room47}&lt;/code&gt; and that &lt;code&gt;{ex:room47 ex:locatedIn ex:building6}&lt;/code&gt;, it can infer that &lt;code&gt;{ex:chair38 ex:locatedIn ex:building6}&lt;/code&gt;. When distributing the processing across a Hadoop cluster, however, the &lt;code&gt;{ex:chair38 ex:locatedIn ex:room47}&lt;/code&gt; triple may get sent to one node and the &lt;code&gt;{ex:room47 ex:locatedIn ex:building6}&lt;/code&gt; triple to another, so neither will have enough information to infer which building the chair is in. So, when you review the RDFS and OWL standards for properties and classes that you can use to describe the data that you want to integrate on a distributed Hadoop system, keep in mind which of these can do their inferencing based on a single triple of instance data input and which require multiple triples. (The Reduce step of a MapReduce job, where above I just put a dummy script to copy the data through, would be a potential place to do additional inferencing based on the output of the mapping steps done on the distributed Hadoop nodes.)&lt;/p&gt;
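The problem is easy to demonstrate with a toy transitive-closure function in plain Python: run over the whole dataset it finds the chair's building, but run over two single-triple partitions separately it cannot.

```python
def transitive_closure(triples, prop):
    """Infer (a, prop, c) whenever (a, prop, b) and (b, prop, c) hold."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        for a, p1, b in list(triples):
            for b2, p2, c in list(triples):
                if p1 == p2 == prop and b == b2 and (a, prop, c) not in triples:
                    triples.add((a, prop, c))
                    changed = True
    return triples

t1 = ("ex:chair38", "ex:locatedIn", "ex:room47")
t2 = ("ex:room47", "ex:locatedIn", "ex:building6")
goal = ("ex:chair38", "ex:locatedIn", "ex:building6")

# Processed together, the inference succeeds...
assert goal in transitive_closure({t1, t2}, "ex:locatedIn")

# ...but if each triple goes to a different Hadoop node, neither node
# can make the inference on its own.
partial = (transitive_closure({t1}, "ex:locatedIn")
           | transitive_closure({t2}, "ex:locatedIn"))
assert goal not in partial
```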
&lt;h2 id=&#34;other-tools-for-working-with-rdf-on-hadoop&#34;&gt;Other tools for working with RDF on Hadoop&lt;/h2&gt;
&lt;p&gt;There have been other projects for taking advantage of the RDF data model on Hadoop before I tried this, and there are more coming along. At ApacheCon Europe in 2012, Cloudera&amp;rsquo;s Paolo Castagna (formerly of Kasabi, Talis, and HP Labs in Bristol, which is quite an RDF pedigree) gave a talk titled &amp;ldquo;Handling RDF data with tools from the Hadoop ecosystem&amp;rdquo; (&lt;a href=&#34;http://archive.apachecon.com/eu2012/presentations/07-Wednesday/L2L-Web_Infra/aceu-2012-handling-rdf-data-with-tools-from-the-hadoop-ecosystem.pdf&#34;&gt;slides PDF&lt;/a&gt;) where he mostly covered the application of popular Hadoop tools to N-Triples files, but he also described his &lt;a href=&#34;https://github.com/castagna/jena-grande&#34;&gt;jena-grande&lt;/a&gt; project to mix the &lt;a href=&#34;https://jena.apache.org/&#34;&gt;Apache Jena&lt;/a&gt; RDF library with these tools. At the 2014 ApacheCon, YarcData&amp;rsquo;s Rob Vesse gave a talk titled &amp;ldquo;Quadrupling Your Elephants: RDF and The Hadoop Ecosystem&amp;rdquo; (&lt;a href=&#34;http://events.linuxfoundation.org/sites/events/files/slides/Quadrupling%20your%20Elephants_0.pdf&#34;&gt;slides PDF&lt;/a&gt;), which reviewed tools for using RDF on Hadoop and described the Jena Hadoop RDF tools project, which has since been renamed as &lt;a href=&#34;http://jena.apache.org/documentation/hadoop/&#34;&gt;Jena Elephas&lt;/a&gt;. (Rob &lt;a href=&#34;https://twitter.com/RobVesse/status/535452136025628672&#34;&gt;described&lt;/a&gt; Paolo&amp;rsquo;s jena-grande as a &amp;ldquo;useful reference &amp;amp; inspiration in developing the new stuff&amp;rdquo;.)&lt;/p&gt;
&lt;p&gt;The kind of scripting that I did with Hadoop&amp;rsquo;s streaming interface is a great way to get Hadoop tasks up and running quickly, but more serious Hadoop applications are typically written in Java, as I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/hadoop#id120676&#34;&gt;described&lt;/a&gt; in a recent blog entry, and by bringing the full power of Jena to this kind of development, Elephas will open up some great new possibilities for taking advantage of the RDF data model (and SPARQL, and RDFS, and OWL) on Hadoop. I&amp;rsquo;m definitely looking forward to seeing where that leads.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
    </item>
    
    <item>
      <title>R (and SPARQL), part 2</title>
      <link>https://www.bobdc.com/blog/r-and-sparql-part-2/</link>
      <pubDate>Tue, 20 Jan 2015 08:32:54 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/r-and-sparql-part-2/</guid>
      
      
      <description><div>Retrieve data from a SPARQL endpoint, graph it and more, then automate it.</div><div>&lt;blockquote id=&#34;id104228&#34; class=&#34;pullquote&#34;&gt;In the future whenever I use SPARQL to retrieve numeric data I&#39;ll have some much more interesting ideas about what I can do with that data. &lt;/blockquote&gt;
&lt;p&gt;In &lt;a href=&#34;https://www.bobdc.com/blog/r-and-sparql-part-1&#34;&gt;part 1&lt;/a&gt; of this series, I discussed the history of &lt;a href=&#34;http://www.r-project.org/&#34;&gt;R&lt;/a&gt;, the programming language and environment for statistical computing and graph generation, and why it&amp;rsquo;s become so popular lately. The many libraries that people have contributed to it are a key reason for its popularity, and the SPARQL one inspired me to learn some R to try it out. Part 1 showed how to load this library, retrieve a SPARQL result set, and perform some basic statistical analysis of the numbers in the result set. After I published it, it was good to see its &lt;a href=&#34;https://plus.google.com/101006505484718936507/posts/ETSaKzd6hQe&#34;&gt;comments section&lt;/a&gt; fill up with a nice list of projects out there that combine R and SPARQL.&lt;/p&gt;
&lt;p&gt;If you executed the sample commands from Part 1 and saved your session when quitting out of R (or in the case of what I was doing last week, RGui), all of the variables set in that session will be available for the commands described here. Today we&amp;rsquo;ll look at a few more commands for analyzing the data, how to plot points and regression lines, and how to automate it all so that you can quickly perform the same analysis on different SPARQL result sets. Again, corrections welcome.&lt;/p&gt;
&lt;p&gt;My original goal was to find out how closely the number of employees in the companies making up the Dow Jones Industrial Average &lt;a href=&#34;http://en.wikipedia.org/wiki/Correlation_and_dependence&#34;&gt;correlated&lt;/a&gt; with the net income, which we can find out with R&amp;rsquo;s &lt;code&gt;cor()&lt;/code&gt; function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; cor(queryResult$netIncome,queryResult$numEmployees)
[1] 0.1722887
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A correlation figure close to 1 or -1 indicates a strong correlation (a negative correlation indicates that one variable&amp;rsquo;s values tend to go in the opposite direction of the other&amp;rsquo;s—for example, if incidence of a certain disease goes down as the use of a particular vaccine goes up) and 0 indicates no correlation. The correlation of 0.1722887 is much closer to 0 than it is to 1 or -1, so we see very little correlation here. (Once we automate this series of steps, we&amp;rsquo;ll find stronger correlations when we focus on specific industries.)&lt;/p&gt;
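If you're curious what cor() is computing, here is the Pearson correlation coefficient sketched in plain Python; R of course does all of this (and much more) for you.

```python
import math

def pearson_cor(xs, ys):
    """Pearson correlation coefficient, as computed by R's cor()."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# A perfectly linear relationship has correlation 1.0...
assert math.isclose(pearson_cor([1, 2, 3, 4], [2, 4, 6, 8]), 1.0)
# ...and reversing the direction gives -1.0.
assert math.isclose(pearson_cor([1, 2, 3, 4], [8, 6, 4, 2]), -1.0)
```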
&lt;h2 id=&#34;id104120&#34;&gt;More graphing&lt;/h2&gt;
&lt;p&gt;We&amp;rsquo;re going to graph the relationship between the employee and net income figures, and then we&amp;rsquo;ll tell R to draw a straight line that fits as closely as possible to the pattern created by the plotted values. This is called a &lt;a href=&#34;https://en.wikipedia.org/wiki/Linear_regression&#34;&gt;linear regression model&lt;/a&gt;, and before we do that we tell R to calculate some data necessary for this task with the &lt;a href=&#34;https://stat.ethz.ch/R-manual/R-patched/library/stats/html/lm.html&#34;&gt;lm()&lt;/a&gt; (&amp;ldquo;linear model&amp;rdquo;) function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; myLinearModelData &amp;lt;- lm(queryResult$numEmployees~queryResult$netIncome) 
&lt;/code&gt;&lt;/pre&gt;
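For a single predictor, the model data that lm() computes boils down to an intercept and a slope from ordinary least squares. Here is a plain-Python sketch of that calculation:

```python
import math

def least_squares(xs, ys):
    """Ordinary least-squares fit: return (intercept, slope), like the
    coefficients of R's lm(y ~ x) for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Points lying exactly on y = 2x + 1 recover that slope and intercept.
intercept, slope = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
assert math.isclose(slope, 2.0)
assert math.isclose(intercept, 1.0)
```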
&lt;p&gt;Next, we draw the graph:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; plot(queryResult$netIncome,queryResult$numEmployees,xlab=&amp;quot;net income&amp;quot;,
   ylab=&amp;quot;# of employees&amp;quot;, main=&amp;quot;Dow Jones Industrial Average companies&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As with the histogram that we saw in Part 1, R offers many ways to control the graph&amp;rsquo;s appearance, and add-in libraries let you do even more. (Try a &lt;a href=&#34;https://www.google.com/search?q=fancy+r+plots&amp;amp;source=lnms&amp;amp;tbm=isch&amp;amp;sa=X&#34;&gt;Google image search on &amp;ldquo;fancy R plots&amp;rdquo;&lt;/a&gt; to get a feel for the possibilities.) In the call to &lt;code&gt;plot()&lt;/code&gt; I included three parameters to set a main title and labels for the X and Y axes, and we see these in the result:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/djiasansregression.jpg&#34;&gt;&lt;img id=&#34;id104180&#34; src=&#34;https://www.bobdc.com/img/main/djiasansregression.jpg&#34; width=&#34;320&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;DJIA plot&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We can see more intuitively what the &lt;code&gt;cor()&lt;/code&gt; function already told us: there is minimal correlation between employee counts and net income among the companies comprising the Dow Jones Industrial Average.&lt;/p&gt;
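&lt;p&gt;As a reminder, that correlation figure comes from a call along these lines, assuming the &lt;code&gt;queryResult&lt;/code&gt; data frame from earlier is still loaded:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; cor(queryResult$netIncome,queryResult$numEmployees)
&lt;/code&gt;&lt;/pre&gt;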
&lt;p&gt;Let&amp;rsquo;s put the data that we stored in &lt;code&gt;myLinearModelData&lt;/code&gt; to use. The &lt;a href=&#34;http://stat.ethz.ch/R-manual/R-devel/library/graphics/html/abline.html&#34;&gt;abline()&lt;/a&gt; function can use it to add a regression line to our plot:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; abline(myLinearModelData)  
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/djiawithregrline.jpg&#34;&gt;&lt;img id=&#34;id106625&#34; src=&#34;https://www.bobdc.com/img/main/djiawithregrline.jpg&#34; width=&#34;320&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;DJIA plot with regression line&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When you type in function calls such as &lt;code&gt;sd(queryResult$numEmployees)&lt;/code&gt; and &lt;code&gt;cor(queryResult$netIncome,queryResult$numEmployees)&lt;/code&gt;, R prints the return values as output, but you can also use those return values in other operations. In the following, I&amp;rsquo;ve replotted the graph with the &lt;code&gt;cor()&lt;/code&gt; function call&amp;rsquo;s result used in a subtitle for the graph, concatenated onto the string &amp;ldquo;correlation: &amp;rdquo; with R&amp;rsquo;s &lt;code&gt;paste()&lt;/code&gt; function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; plot(queryResult$netIncome,queryResult$numEmployees,xlab=&amp;quot;net income&amp;quot;,
   ylab=&amp;quot;# of employees&amp;quot;, main=&amp;quot;Dow Jones Industrial Average companies&amp;quot;,
   sub=paste(&amp;quot;correlation: &amp;quot;,cor(queryResult$numEmployees,
             queryResult$netIncome),sep=&amp;quot;&amp;quot;))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(The &lt;code&gt;paste()&lt;/code&gt; function&amp;rsquo;s &lt;code&gt;sep&lt;/code&gt; argument here shows that we don&amp;rsquo;t want any separator between our concatenated pieces. I&amp;rsquo;m guessing that &lt;code&gt;paste()&lt;/code&gt; is more typically used to create delimited data files.) R puts the subtitle at the image&amp;rsquo;s bottom:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/djiawithsubtitle.jpg&#34;&gt;&lt;img id=&#34;id106685&#34; src=&#34;https://www.bobdc.com/img/main/djiawithsubtitle.jpg&#34; width=&#34;320&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;DJIA plot with subtitle&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Instead of plotting the graph on the screen, we can tell R to send it to a JPEG, BMP, PNG, or TIFF file. Calling a &lt;a href=&#34;http://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/png.html&#34;&gt;graphics device&lt;/a&gt; function such as &lt;code&gt;jpeg()&lt;/code&gt; before doing the plot tells R to send the results to a file, and &lt;code&gt;dev.off()&lt;/code&gt; turns off the &amp;ldquo;device&amp;rdquo; that writes to the image file.&lt;/p&gt;
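&lt;p&gt;As a minimal sketch (the output path here is just an example), the pattern looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; jpeg(&amp;quot;c:/temp/myplot.jpg&amp;quot;)   # open a JPEG graphics device
&amp;gt; hist(queryResult$netIncome)      # drawn into the file, not on the screen
&amp;gt; dev.off()                        # close the device, completing the file
&lt;/code&gt;&lt;/pre&gt;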
&lt;h2 id=&#34;id106712&#34;&gt;Automating it&lt;/h2&gt;
&lt;p&gt;Now we know nearly enough commands to create a useful script. The remainder are just string manipulation functions that I found easy enough to look up when I needed them, although having a string concatenation command called &lt;code&gt;paste()&lt;/code&gt; is another example of the odd R terminology that I warned about last week. Here is my script:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;library(SPARQL) 


category &amp;lt;- &amp;quot;Companies_in_the_Dow_Jones_Industrial_Average&amp;quot;
#category &amp;lt;- &amp;quot;Electronics_companies_of_the_United_States&amp;quot;
#category &amp;lt;- &amp;quot;Financial_services_companies_of_the_United_States&amp;quot;


query &amp;lt;- &amp;quot;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX dcterms: &amp;lt;http://purl.org/dc/terms/&amp;gt;
PREFIX dbo: &amp;lt;http://dbpedia.org/ontology/&amp;gt;
PREFIX dbpprop: &amp;lt;http://dbpedia.org/property/&amp;gt;
PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;
SELECT ?label ?numEmployees ?netIncome  
WHERE {
  ?s dcterms:subject &amp;lt;http://dbpedia.org/resource/Category:DUMMY-CATEGORY-NAME&amp;gt; ;
     rdfs:label ?label ;
     dbo:netIncome ?netIncomeDollars ;
     dbpprop:numEmployees ?numEmployees . 
     BIND(replace(?numEmployees,&#39;,&#39;,&#39;&#39;) AS ?employees)  # lose commas
     FILTER ( lang(?label) = &#39;en&#39; )
     FILTER(contains(?netIncomeDollars,&#39;E&#39;))
     # Following because DBpedia types them as dbpedia:datatype/usDollar
     BIND(xsd:float(?netIncomeDollars) AS ?netIncome)
     # Original query on following line had two 
     # slashes, but R needed both escaped.
     FILTER(!(regex(?numEmployees,&#39;\\\\d+&#39;)))
}
ORDER BY ?numEmployees&amp;quot;


query &amp;lt;- sub(pattern=&amp;quot;DUMMY-CATEGORY-NAME&amp;quot;,replacement=category,x=query)


endpoint &amp;lt;- &amp;quot;http://dbpedia.org/sparql&amp;quot;
resultList &amp;lt;- SPARQL(endpoint,query)
queryResult &amp;lt;- resultList$results 
correlationLegend=paste(&amp;quot;correlation: &amp;quot;,cor(queryResult$numEmployees,
                         queryResult$netIncome),sep=&amp;quot;&amp;quot;)
myLinearModelData &amp;lt;- lm(queryResult$numEmployees~queryResult$netIncome) 
plotTitle &amp;lt;- chartr(old=&amp;quot;_&amp;quot;,new=&amp;quot; &amp;quot;,x=category)
outputFilename &amp;lt;- paste(&amp;quot;c:/temp/&amp;quot;,category,&amp;quot;.jpg&amp;quot;,sep=&amp;quot;&amp;quot;)
jpeg(outputFilename)
plot(queryResult$netIncome,queryResult$numEmployees,xlab=&amp;quot;net income&amp;quot;,
     ylab=&amp;quot;number of employees&amp;quot;, main=plotTitle,cex.main=.9,
     sub=correlationLegend)
abline(myLinearModelData) 
dev.off()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Instead of hardcoding the URI of the industry category whose data I wanted, my script has DUMMY-CATEGORY-NAME, a placeholder string that it replaces with the &lt;code&gt;category&lt;/code&gt; value assigned at the script&amp;rsquo;s beginning. The category value here is &amp;ldquo;Companies_in_the_Dow_Jones_Industrial_Average&amp;rdquo;, with two other potential &lt;code&gt;category&lt;/code&gt; values commented out so that we can easily try them later. (R, like SPARQL, uses the # character for comments.) I also used the &lt;code&gt;category&lt;/code&gt; value to create the output filename.&lt;/p&gt;
&lt;p&gt;An additional embellishment to the sequence of commands that we entered manually is that the script stores the plot title in a &lt;code&gt;plotTitle&lt;/code&gt; variable, replacing the underscores in the category name with spaces. Because this sometimes resulted in titles that were too wide for the plot image, I added &lt;code&gt;cex.main=.9&lt;/code&gt; as a &lt;code&gt;plot()&lt;/code&gt; argument to reduce the title&amp;rsquo;s size.&lt;/p&gt;
&lt;p&gt;With the script stored in /temp/myscript.R, entering the following at the R prompt runs it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;source(&amp;quot;/temp/myscript.R&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If I don&amp;rsquo;t have an R interpreter up and running, I can run the script from the operating system command line by calling Rscript, which is included with R:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Rscript /temp/myscript.R
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After it runs, my /temp directory has this Companies_in_the_Dow_Jones_Industrial_Average.jpg file in it:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/Companies_in_the_Dow_Jones_Industrial_Average.jpg&#34;&gt;&lt;img id=&#34;id106837&#34; src=&#34;https://www.bobdc.com/img/main/Companies_in_the_Dow_Jones_Industrial_Average.jpg&#34; width=&#34;320&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;DJIA plot from script&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When I uncomment the script&amp;rsquo;s second &lt;code&gt;category&lt;/code&gt; assignment line instead of the first and run the script again, it creates the file Electronics_companies_of_the_United_States.jpg:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/Electronics_companies_of_the_United_States.jpg&#34;&gt;&lt;img id=&#34;id106858&#34; src=&#34;https://www.bobdc.com/img/main/Electronics_companies_of_the_United_States.jpg&#34; width=&#34;320&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;data on U.S. electronics companies&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s better correlation this time: almost .5. Two outliers stretch the plot&amp;rsquo;s scale, so R crowds enough points into the lower left to make a bit of a blotch; I found with experimentation that the &lt;code&gt;plot()&lt;/code&gt; command offers parameters (&lt;code&gt;xlim&lt;/code&gt; and &lt;code&gt;ylim&lt;/code&gt;) to display only the points within a particular range of values on the horizontal or vertical axis, making it easier to show a zoomed view.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s what we get when querying about Financial_services_companies_of_the_United_States:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/Financial_services_companies_of_the_United_States.jpg&#34;&gt;&lt;img id=&#34;id106922&#34; src=&#34;https://www.bobdc.com/img/main/Financial_services_companies_of_the_United_States.jpg&#34; width=&#34;320&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;data on U.S. electronics companies&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We see the strongest correlation yet: over .84. I suppose that at financial services companies, hiring more people is more likely to increase revenue than in other sectors because you can provide (and charge for) a higher volume of services. This is only a theory, but that&amp;rsquo;s why people use statistical analysis packages: to look for patterns that can suggest theories. It&amp;rsquo;s great to know that such a powerful open-source package can do this with data retrieved from SPARQL endpoints.&lt;/p&gt;
&lt;p&gt;If I were going to run this script from the operating system command line regularly, then instead of setting the &lt;code&gt;category&lt;/code&gt; value at the beginning of the script, I would &lt;a href=&#34;https://cwcode.wordpress.com/2013/04/16/the-joys-of-rscript/&#34;&gt;pass it to Rscript as an argument with the script name&lt;/a&gt;.&lt;/p&gt;
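&lt;p&gt;A minimal sketch of that approach, assuming the rest of the script stays the same, replaces the script&amp;rsquo;s first assignment with a call to R&amp;rsquo;s &lt;code&gt;commandArgs()&lt;/code&gt; function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;args &amp;lt;- commandArgs(trailingOnly=TRUE)  # just the arguments after the script name
category &amp;lt;- args[1]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then something like &lt;code&gt;Rscript /temp/myscript.R Electronics_companies_of_the_United_States&lt;/code&gt; would pass the category in.&lt;/p&gt;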
&lt;h2 id=&#34;id106949&#34;&gt;Learning more about R&lt;/h2&gt;
&lt;p&gt;Because of R&amp;rsquo;s age and academic roots, there is a lot of stray documentation around, often in LaTeXish-looking PDFs from several years ago. Many introductions to R are aimed at people in a specific field, and I suppose my blog entries here fall in this category.&lt;/p&gt;
&lt;p&gt;The best short, modern tour of R that I&amp;rsquo;ve found recently is Sharon Machlis&amp;rsquo;s six-part series beginning at &lt;a href=&#34;http://www.computerworld.com/article/2497143/business-intelligence-beginner-s-guide-to-r-introduction.html&#34;&gt;Beginner&amp;rsquo;s Guide to R: Introduction&lt;/a&gt;. Part six points to many other places to learn about R, ranging from blog entries to complete books to videos; reviewing the list now, I see more entries that I hadn&amp;rsquo;t noticed before and that look worth investigating.&lt;/p&gt;
&lt;p&gt;Her list is where I learned about Jeffrey M. Stanton&amp;rsquo;s &lt;a href=&#34;https://itunes.apple.com/us/book/introduction-to-data-science/id529088127?mt=11&#34;&gt;Introduction to Data Science&lt;/a&gt;, an excellent introduction both to data science and to the use of R for common data science analysis tasks. The link here goes to an iTunes version of the book, but there&amp;rsquo;s also a &lt;a href=&#34;http://surface.syr.edu/cgi/viewcontent.cgi?article=1165&amp;amp;context=istpub&#34;&gt;PDF&lt;/a&gt; version, which I read from beginning to end.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://en.wikibooks.org/wiki/R_Programming&#34;&gt;R Programming Wikibook&lt;/a&gt; makes a good quick reference work, especially when you need a particular function for something; see the table of contents down its right side. I found myself going back to the &lt;a href=&#34;https://en.wikibooks.org/wiki/R_Programming/Text_Processing&#34;&gt;Text Processing&lt;/a&gt; page there several times. The four-page &amp;ldquo;R Reference Card&amp;rdquo; (&lt;a href=&#34;http://cran.r-project.org/doc/contrib/Short-refcard.pdf&#34;&gt;pdf&lt;/a&gt;) by Tom Short is also worth printing out.&lt;/p&gt;
&lt;p&gt;Last week I mentioned John D. Cook&amp;rsquo;s &lt;a href=&#34;http://www.johndcook.com/blog/r_language_for_programmers/&#34;&gt;R language for programmers&lt;/a&gt;, a blog entry that will help anyone familiar with typical modern programming languages get over a few initial small humps more quickly when learning R.&lt;/p&gt;
&lt;p&gt;I described Machlis&amp;rsquo;s six-part series as &amp;ldquo;short&amp;rdquo; because there are so many full-length books on R out there, including free ones like Stanton&amp;rsquo;s and several offerings from O&amp;rsquo;Reilly and Manning. I&amp;rsquo;ve read the first few chapters of Manning&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1935182390/bobducharmeA&#34;&gt;R in Action&lt;/a&gt; by Robert Kabacoff and find it very helpful so far. Apparently a new edition is coming out in March, so if you&amp;rsquo;re thinking of buying it you may want to wait or else get the &lt;a href=&#34;http://www.manning.com/kabacoff2/&#34;&gt;early access edition&lt;/a&gt;. Manning&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1617291560/bobducharmeA&#34;&gt;Practical Data Science with R&lt;/a&gt; also looks good, but it assumes a bit of R background (in fact, it recommends &amp;ldquo;R in Action&amp;rdquo; as a starting point), so a real beginner to this area would be better off starting with Stanton&amp;rsquo;s free book mentioned above.&lt;/p&gt;
&lt;p&gt;O&amp;rsquo;Reilly has several books on R, including an &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0596809158/bobducharmeA&#34;&gt;R Cookbook&lt;/a&gt; whose very task-oriented table of contents is worth skimming, as well as an accompanying &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1449316956/bobducharmeA&#34;&gt;R Graphics Cookbook&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I know that I&amp;rsquo;ll be going back to several of these books and web pages, because in the future whenever I use SPARQL to retrieve numeric data I&amp;rsquo;ll have some much more interesting ideas about what I can do with that data.&lt;/p&gt;
&lt;img id=&#34;id107116&#34; src=&#34;https://www.bobdc.com/img/main/fancyrplots.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;fancy R plots on Google&#34; width=&#34;480&#34;/&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
    </item>
    
    <item>
      <title>R (and SPARQL), part 1</title>
      <link>https://www.bobdc.com/blog/r-and-sparql-part-1/</link>
      <pubDate>Tue, 13 Jan 2015 08:26:20 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/r-and-sparql-part-1/</guid>
      
      
<description><div>Or, R for RDF people.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.r-project.org/&#34;&gt;R&lt;/a&gt; is a programming language and environment for statistical computing and graph generation that, despite roots reaching back to the 1970s, has gotten hot lately because it&amp;rsquo;s an open-source, cross-platform tool that brings a lot to the world of Data Science, a recently popular field often associated with the &lt;a href=&#34;https://www.bobdc.com/blog/hadoop#id123170&#34;&gt;analytics&lt;/a&gt; aspect of the drive towards Big Data. The large, active community around R has developed many add-on libraries, including one for working with data retrieved from SPARQL endpoints, so I thought I&amp;rsquo;d get to know R well enough to try that library. I first learned about this library from &lt;a href=&#34;http://www.programmingr.com/content/sparql-with-r/&#34;&gt;SPARQL with R in Less than 5 Minutes&lt;/a&gt;, which describes Semantic Web and Linked Data concepts to people familiar with R in order to demonstrate what they can do together; my goal here is to explain R to people familiar with RDF for the same reason. (Corrections to any misuse of statistical terminology are welcome.)&lt;/p&gt;
&lt;blockquote id=&#34;id119548&#34; class=&#34;pullquote&#34;&gt;an open-source, cross-platform tool that brings a lot to the world of Data Science&lt;/blockquote&gt;
&lt;p&gt;R has also been called &amp;ldquo;GNU S,&amp;rdquo; and first appeared in 1993 as an implementation of a statistical programming language developed at Bell Labs in 1976 known as &lt;a href=&#34;http://en.wikipedia.org/wiki/S_%28programming_language%29&#34;&gt;S&lt;/a&gt;. (This is cuter if you know that the C programming language was also developed at Bell Labs as a &lt;a href=&#34;http://en.wikipedia.org/wiki/C_%28programming_language%29#History&#34;&gt;successor&lt;/a&gt; to a language called B.) Its commercial competition includes &lt;a href=&#34;http://www.stata.com/&#34;&gt;Stata&lt;/a&gt;, &lt;a href=&#34;http://www.sas.com&#34;&gt;SAS&lt;/a&gt;, and &lt;a href=&#34;http://www-01.ibm.com/software/analytics/spss/&#34;&gt;SPSS&lt;/a&gt;, all of which have plenty to fear from R as its power and reputation grow while its cost stays at zero. According to a &lt;a href=&#34;http://www.nature.com/news/programming-tools-adventures-with-r-1.16609&#34;&gt;recent article in Nature&lt;/a&gt; on R&amp;rsquo;s growing popularity among scientists, &amp;ldquo;In the past decade, R has caught up with and overtaken the market leaders.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Downloading and installing R on a Windows machine gave me an icon that opened up the RGui windowed environment, which contains a console window where you enter commands that add other windows within RGui as needed for graphics. (The distribution also includes an executable that you can run from your operating system command line; as we&amp;rsquo;ll see next week, you can use this to run scripts as well.) Most discussions of R recommend the open source &lt;a href=&#34;http://www.rstudio.com/&#34;&gt;RStudio&lt;/a&gt; as a more serious IDE for R development, but RGui was enough for me to play around.&lt;/p&gt;
&lt;p&gt;Some of R&amp;rsquo;s syntax is a bit awkward in places, possibly because of its age—some of its source code is written in Fortran, and it actually &lt;a href=&#34;http://www.r-bloggers.com/fortran-and-r-speed-things-up/&#34;&gt;lets you call Fortran subroutines&lt;/a&gt;. I found some of its terminology to be awkward as well, but probably because it was designed for statisticians and not for programmers accustomed to typical modern programming languages. I highly recommend the quick tour of syntax quirks in &lt;a href=&#34;http://www.johndcook.com/blog/r_language_for_programmers/&#34;&gt;R language for programmers&lt;/a&gt; by John D. Cook for such people when they&amp;rsquo;re getting started with R.&lt;/p&gt;
&lt;p&gt;For example, where I think of a table or a spreadsheet as consisting of rows and columns, R describes a data frame of observations and variables, meaning essentially the same thing. Of the simpler structures that come up in R, a vector is a one-dimensional set (I almost said &amp;ldquo;array&amp;rdquo; or &amp;ldquo;list&amp;rdquo; instead of &amp;ldquo;set&amp;rdquo; but these have different, specific meanings in R) of values of the same type, a matrix is a two-dimensional version, and an array extends this to three or more dimensions. A data frame looks like a matrix but &amp;ldquo;columns can be different modes&amp;rdquo; (that is, different properties and types), as described on the &lt;a href=&#34;http://www.statmethods.net/input/datatypes.html&#34;&gt;Data types&lt;/a&gt; page of the Quick-R website. The same page says that &amp;ldquo;data frames are the main structures you&amp;rsquo;ll use to store datasets,&amp;rdquo; which makes sense when you consider their similarity to spreadsheets, relational database tables, and, in the RDF world, SPARQL result sets.&lt;/p&gt;
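&lt;p&gt;A few lines at the R prompt make these distinctions concrete (a toy sketch with made-up values):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; v &amp;lt;- c(10, 20, 30)         # vector: one dimension, one mode
&amp;gt; m &amp;lt;- matrix(1:6, nrow=2)   # matrix: two dimensions, one mode
&amp;gt; df &amp;lt;- data.frame(label=c(&amp;quot;a&amp;quot;,&amp;quot;b&amp;quot;), n=c(1,2))  # columns with different modes
&lt;/code&gt;&lt;/pre&gt;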
&lt;p&gt;I don&amp;rsquo;t want to make too much of what may look like quirky terminology and syntax to people accustomed to other modern programming languages. I have come to appreciate the way R makes the most popular statistical operations so easy to carry out—even easier than Excel or &lt;a href=&#34;http://www.libreoffice.org/discover/calc/&#34;&gt;LibreOffice Calc&lt;/a&gt;, which have a surprising number of basic statistical operations built in.&lt;/p&gt;
&lt;h2 id=&#34;id119458&#34;&gt;Retrieving data from a SPARQL endpoint&lt;/h2&gt;
&lt;p&gt;Below I&amp;rsquo;ve walked through a session of commands entered at an R command line; you can paste them into an R session yourself, omitting the &amp;gt; prompt shown before each command. Let&amp;rsquo;s say that, using data retrieved from DBpedia, I&amp;rsquo;m wondering if there&amp;rsquo;s a correlation between the number of employees and the amount of net income in a given set of companies. (I only used U.S. companies to make it easier to compare income figures.) Typically, companies with more employees have more net income, but do the two figures correlate more closely in some industries than in others? R lets you quantify and graph this correlation very easily, and along the way we&amp;rsquo;ll see a few other things that it can do.&lt;/p&gt;
&lt;p&gt;To start, I install the SPARQL package with this command, which starts up a wizard that loads it from a remote mirror:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; install.packages(&amp;quot;SPARQL&amp;quot;)  
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After R installed the package, I loaded it for use in this session. The &lt;code&gt;help()&lt;/code&gt; function can tell us more about an installed package:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; library(SPARQL)
&amp;gt; help(package=&amp;quot;SPARQL&amp;quot;) 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;help()&lt;/code&gt; function pops up a browser window with documentation of the topic passed as an argument. You can pass any function name to &lt;code&gt;help()&lt;/code&gt; as well, so you can enter something like &lt;code&gt;help(library)&lt;/code&gt; or even &lt;code&gt;help(help)&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;id121918&#34;&gt;Analyzing the result&lt;/h2&gt;
&lt;p&gt;The next command uses R&amp;rsquo;s &lt;code&gt;&amp;lt;-&lt;/code&gt; assignment operator to assign a big multi-line string to the variable &lt;code&gt;query&lt;/code&gt;. The string holds a SPARQL query that will be sent to DBpedia; you can &lt;a href=&#34;http://dbpedia.org/snorql/?query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+dcterms%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E%0D%0APREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0D%0APREFIX+dbpprop%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2F%3E%0D%0APREFIX+xsd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0D%0ASELECT+%3Flabel+%3FnumEmployees+%3FnetIncome++%0D%0AWHERE+%7B%0D%0A++%3Fs+dcterms%3Asubject+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3ACompanies_in_the_Dow_Jones_Industrial_Average%3E+%3B%0D%0A+++++rdfs%3Alabel+%3Flabel+%3B%0D%0A+++++dbo%3AnetIncome+%3FnetIncomeDollars+%3B%0D%0A+++++dbpprop%3AnumEmployees+%3FnumEmployees+.+%0D%0A+++++BIND%28replace%28%3FnumEmployees%2C%27%2C%27%2C%27%27%29+AS+%3Femployees%29++%23+lose+commas%0D%0A+++++FILTER+%28+lang%28%3Flabel%29+%3D+%27en%27+%29%0D%0A+++++FILTER%28contains%28%3FnetIncomeDollars%2C%27E%27%29%29%0D%0A+++++%23+Following+because+DBpedia+types+them+as+dbpedia%3Adatatype%2FusDollar%0D%0A+++++BIND%28xsd%3Afloat%28%3FnetIncomeDollars%29+AS+%3FnetIncome%29%0D%0A+++++%23+original+query+on+following+line+had+two+slashes%2C+but+%0D%0A+++++%23+R+needed+both+escaped%0D%0A+++++FILTER%28!%28regex%28%3FnumEmployees%2C%27%5C%5Cd%2B%27%29%29%29%0D%0A%7D%0D%0AORDER+BY+%3FnumEmployees&#34;&gt;run the same query on DBpedia&amp;rsquo;s SNORQL interface&lt;/a&gt; to get a preview of the data (the query sent by that link is slightly different—see the last SPARQL comment in the query below):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; query &amp;lt;- &amp;quot;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX dcterms: &amp;lt;http://purl.org/dc/terms/&amp;gt;
PREFIX dbo: &amp;lt;http://dbpedia.org/ontology/&amp;gt;
PREFIX dbpprop: &amp;lt;http://dbpedia.org/property/&amp;gt;
PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;
SELECT ?label ?numEmployees ?netIncome  
WHERE {
  ?s dcterms:subject &amp;lt;http://dbpedia.org/resource/Category:Companies_in_the_Dow_Jones_Industrial_Average&amp;gt; ;
     rdfs:label ?label ;
     dbo:netIncome ?netIncomeDollars ;
     dbpprop:numEmployees ?numEmployees . 
     BIND(replace(?numEmployees,&#39;,&#39;,&#39;&#39;) AS ?employees)  # lose commas
     FILTER ( lang(?label) = &#39;en&#39; )
     FILTER(contains(?netIncomeDollars,&#39;E&#39;))
     # Following because DBpedia types them as dbpedia:datatype/usDollar
     BIND(xsd:float(?netIncomeDollars) AS ?netIncome)
     # original query on following line had two slashes, but 
     # R needed both escaped
     FILTER(!(regex(?numEmployees,&#39;\\\\d+&#39;)))
}
ORDER BY ?numEmployees&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The query asks for the net income and employee count figures for companies that comprise the Dow Jones Industrial Average. The SPARQL comments within the query describe the query&amp;rsquo;s steps in more detail.&lt;/p&gt;
&lt;p&gt;Next, we assign the endpoint&amp;rsquo;s URL to the &lt;code&gt;endpoint&lt;/code&gt; variable and call the SPARQL package&amp;rsquo;s &lt;code&gt;SPARQL()&lt;/code&gt; function to send the query to that endpoint, storing the result in a &lt;code&gt;resultList&lt;/code&gt; variable:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; endpoint &amp;lt;- &amp;quot;http://dbpedia.org/sparql&amp;quot;
&amp;gt; resultList &amp;lt;- SPARQL(endpoint,query)
&amp;gt; typeof(resultList)
[1] &amp;quot;list&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The third command there, and R&amp;rsquo;s output, show that &lt;code&gt;resultList&lt;/code&gt; has a type of list, which is described on the &lt;a href=&#34;http://www.statmethods.net/input/datatypes.html&#34;&gt;Data types&lt;/a&gt; page mentioned earlier as an &amp;ldquo;ordered collection of objects (components). A list allows you to gather a variety of (possibly unrelated) objects under one name.&amp;rdquo; (Compare this with a vector, where everything must have the same type, or in R-speak, the same mode.)&lt;/p&gt;
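&lt;p&gt;A quick toy illustration of the difference (my own example, not part of the query results): a vector coerces everything to a single mode, while a list lets each component keep its own type:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; c(1, &amp;quot;two&amp;quot;)       # vector: the number gets coerced to a character string
[1] &amp;quot;1&amp;quot;   &amp;quot;two&amp;quot;
&amp;gt; typeof(list(1, &amp;quot;two&amp;quot;))  # a list keeps each element&#39;s own type
[1] &amp;quot;list&amp;quot;
&lt;/code&gt;&lt;/pre&gt;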
&lt;p&gt;The next command uses the very handy &lt;code&gt;summary()&lt;/code&gt; function to learn more about what the &lt;code&gt;SPARQL()&lt;/code&gt; function put into the &lt;code&gt;resultList&lt;/code&gt; variable:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; summary(resultList)
           Length Class      Mode
results    3      data.frame list
namespaces 0      -none-     NULL
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It shows a list of two things: our query results and an empty list of namespaces. Because we don&amp;rsquo;t care about the empty list of namespaces, we&amp;rsquo;ll make it easier to work with the &lt;code&gt;results&lt;/code&gt; part by pulling it out and storing it in its own &lt;code&gt;queryResult&lt;/code&gt; variable using the &lt;code&gt;$&lt;/code&gt; operator to identify the part of &lt;code&gt;resultList&lt;/code&gt; that we want. Then, we use the &lt;code&gt;str()&lt;/code&gt; function to learn more about what&amp;rsquo;s in there:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; queryResult &amp;lt;- resultList$results 
&amp;gt; str(queryResult)
&#39;data.frame&#39;:   27 obs. of  3 variables:
 $ label       : chr  &amp;quot;\&amp;quot;Visa Inc.\&amp;quot;@en&amp;quot; &amp;quot;\&amp;quot;The Travelers Companies\&amp;quot;@en&amp;quot; ...
 $ numEmployees: int  8500 30500 32900 44000 62800 64600 70000 ...
 $ netIncome   : num  2.14e+09 2.47e+09 8.04e+09 2.22e+09 5.36e+09 ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output tells us that it&amp;rsquo;s a data frame, mentioned earlier as &amp;ldquo;the main structures you&amp;rsquo;ll use to store datasets,&amp;rdquo; with 27 obs[ervations] and 3 variables (that is, rows and columns).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;summary()&lt;/code&gt; function tells us some great stuff about a data frame—a set of information that would be much more work to retrieve if the same data was loaded into a spreadsheet program:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; summary(queryResult)
    label            numEmployees       netIncome        
 Length:27          Min.   :   8500   Min.   :2.144e+09  
 Class :character   1st Qu.:  72500   1st Qu.:4.863e+09  
 Mode  :character   Median : 107600   Median :8.040e+09  
                    Mean   : 205227   Mean   :1.050e+10  
                    3rd Qu.: 171711   3rd Qu.:1.530e+10  
                    Max.   :2200000   Max.   :3.258e+10  
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The SPARQL query&amp;rsquo;s SELECT statement asked for the label, numEmployees, and netIncome values, and we see some interesting information about the values returned for these, especially the numeric ones: the minimum, maximum, and mean (average) values of each, as well as the boundary values you get if you split the returned values as evenly as possible into four groups, known in statistics as &lt;a href=&#34;https://en.wikipedia.org/wiki/Quartile&#34;&gt;quartiles&lt;/a&gt;. The first quartile value marks the boundary between the bottom quarter and the next quarter, the median splits the values in half, and the third quartile splits the top quarter from the third one.&lt;/p&gt;
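&lt;p&gt;If you want those quartile boundaries on their own, R&amp;rsquo;s &lt;code&gt;quantile()&lt;/code&gt; function returns them directly (assuming the &lt;code&gt;queryResult&lt;/code&gt; data frame from above):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; quantile(queryResult$numEmployees)
&lt;/code&gt;&lt;/pre&gt;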
&lt;p&gt;We can very easily ask for the variance—a measure of how far apart all the values are spread from the mean—as well as the standard deviation, a useful measurement for describing how far any specific value is from the mean:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; var(queryResult$numEmployees)
[1] 167791342395
&amp;gt; sd(queryResult$numEmployees)
[1] 409623.4
&lt;/code&gt;&lt;/pre&gt;
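&lt;p&gt;The standard deviation is just the square root of the variance, which is easy to confirm at the prompt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;gt; sqrt(var(queryResult$numEmployees))
[1] 409623.4
&lt;/code&gt;&lt;/pre&gt;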
&lt;h2 id=&#34;id122364&#34;&gt;Our first plot: a histogram&lt;/h2&gt;
&lt;p&gt;For our first step into graphics, we&amp;rsquo;ll create a histogram, which illustrates the distribution of values. As with all R graphics, there are plenty of parameters available to control the image&amp;rsquo;s appearance, but we can get a pretty useful histogram by sticking with the defaults:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hist(queryResult$netIncome) 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When running this interactively, RGui opens up a new window and displays the image there:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/rguihistogram.jpg&#34;&gt;&lt;img id=&#34;id122382&#34; src=&#34;https://www.bobdc.com/img/main/rguihistogram.jpg&#34; width=&#34;600&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;histogram generated with R&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/blog/r-and-sparql-part-2&#34;&gt;Next week&lt;/a&gt; we&amp;rsquo;ll learn how to plot the specific points in the data, how to make the graph titles look nicer, and how to quantify the correlation between the two sets of values. (If you&amp;rsquo;ve been entering the commands shown here, then when you quit R with the &lt;code&gt;quit()&lt;/code&gt; command or by picking Exit from RGui&amp;rsquo;s File menu, it offers to save your workspace image; all of the variables set in a session like this will then still be available the next time you start it up.) We&amp;rsquo;ll also see how to automate this series of steps to make it easier to generate a graph, with the correlation figure included, as a JPEG file. This automation will make it easier to graph the results and find the correlation figures for different industries. Finally, I&amp;rsquo;ll list the best resources I found for learning R—there are a lot of them out there, of wildly varying quality.&lt;/p&gt;
&lt;p&gt;Meanwhile, you can gaze at this R plot of a Mandelbrot set from &lt;a href=&#34;http://en.wikipedia.org/wiki/R_%28programming_language%29&#34;&gt;R&amp;rsquo;s Wikipedia page&lt;/a&gt;, which includes all the commands necessary to generate it:&lt;/p&gt;
&lt;img id=&#34;id122437&#34; src=&#34;https://www.bobdc.com/img/main/Mandelbrot_Creation_Animation.gif&#34; width=&#34;320&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Mandelbrot image generated with R&#34;/&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2015">2015</category>
      
    </item>
    
    <item>
      <title>Hadoop</title>
      <link>https://www.bobdc.com/blog/hadoop/</link>
      <pubDate>Sat, 13 Dec 2014 09:13:36 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/hadoop/</guid>
      
      
      <description><div>What it is and how people use it: my own summary.</div><div>&lt;p&gt;&lt;a href=&#34;http://hadoop.apache.org/&#34;&gt;&lt;img id=&#34;id120623&#34; src=&#34;https://www.bobdc.com/img/main/220px-Hadoop_logo.svg.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Hadoop logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The web offers plenty of introductions to what &lt;a href=&#34;http://hadoop.apache.org/&#34;&gt;Hadoop&lt;/a&gt; is about. After reading up on it and trying it out a bit, I wanted to see if I could sum up what I see as the main points as concisely as possible. Corrections welcome.&lt;/p&gt;
&lt;p&gt;Hadoop is an open source Apache project consisting of several modules. The key ones are the Hadoop Distributed File System (whose acronym is &lt;a href=&#34;http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F&#34;&gt;trademarked&lt;/a&gt;, apparently) and MapReduce. The HDFS lets you distribute storage across multiple systems and MapReduce lets you distribute processing across multiple systems by performing your &amp;ldquo;Map&amp;rdquo; logic on the distributed nodes and then the &amp;ldquo;Reduce&amp;rdquo; logic to gather up the results of the map processes on the master node that&amp;rsquo;s driving it all.&lt;/p&gt;
&lt;p&gt;This ability to spread out storage and processing makes it easier to do large-scale processing without requiring large-scale hardware. You can spread the processing across whatever boxes you have lying around or across virtual machines on a cloud platform that you spin up for only as long as you need them. This ability to inexpensively scale up has made Hadoop one of the most popular technologies associated with the buzzphrase &amp;ldquo;Big Data.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;id120673&#34;&gt;Writing Hadoop applications&lt;/h2&gt;
&lt;p&gt;Hardcore Hadoop usage often means writing the map and reduce tasks in Java programs that must import special Hadoop libraries and play by Hadoop rules; see the source of the Apache Hadoop Wiki&amp;rsquo;s &lt;a href=&#34;http://wiki.apache.org/hadoop/WordCount&#34;&gt;Word Count&lt;/a&gt; program for an example. (Word count programs are ubiquitous in Hadoop primers.) Then, once you&amp;rsquo;ve started up the Hadoop &lt;a href=&#34;http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/#starting-your-single-node-cluster&#34;&gt;background processes&lt;/a&gt;, you can use Hadoop command line utilities to indicate the JAR file with your map and reduce logic and where on the HDFS to look for input and to put output. While your program runs, you can check on its progress with web interfaces to the various background processes.&lt;/p&gt;
&lt;p&gt;Instead of coding and compiling your own JAR file, one nice option is to use the hadoop-streaming-*.jar one that comes with the Hadoop distribution to hand off the processing to scripts you&amp;rsquo;ve written in just about any language that can read from standard input and write to standard output. There&amp;rsquo;s no need for these scripts to import any special Hadoop libraries. I found it very easy to go through Michael G. Noll&amp;rsquo;s &lt;a href=&#34;http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/&#34;&gt;Writing an Hadoop MapReduce Program in Python&lt;/a&gt; tutorial (creating yet another word count program) after first doing his &lt;a href=&#34;http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/&#34;&gt;Running Hadoop on Ubuntu Linux (Single-Node Cluster)&lt;/a&gt; tutorial to set up a small Hadoop environment. (If you try one of the many Hadoop tutorials you can find on the web, make sure to run the same version of Hadoop that the tutorial&amp;rsquo;s author did. The 2.* Hadoop releases are different enough from the 1.* ones that if you try to set up a distributed file system and share processing across it using a recent release while following instructions written using a 1.* release, there are more opportunities for problems. I had good luck with Hardik Pandya&amp;rsquo;s &amp;ldquo;How to Set Up a Multi-Node Hadoop Cluster on Amazon EC2,&amp;rdquo; split into &lt;a href=&#34;http://java.dzone.com/articles/how-set-multi-node-hadoop&#34;&gt;Part 1&lt;/a&gt; and &lt;a href=&#34;http://letsdobigdata.wordpress.com/2014/01/13/setting-up-hadoop-1-2-1-multi-node-cluster-on-amazon-ec2-part-2/&#34;&gt;Part 2&lt;/a&gt;, when I used the same release that he did.)&lt;/p&gt;
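&lt;p&gt;To make the streaming model concrete, here is a minimal Python sketch of the usual word count pair of scripts. The function names and sample text are my own, not from the tutorials; in a real job, each function would be a standalone script reading standard input and printing to standard output, with Hadoop handling the sort between the two phases.&lt;/p&gt;

```python
# Minimal word count sketch of the two scripts hadoop-streaming expects.
# In a real job, mapper() and reducer() would each be a standalone script
# reading sys.stdin and printing to stdout; Hadoop sorts the mapper output
# by key before the reduce phase begins.
from itertools import groupby

def mapper(lines):
    # Map phase: emit one "word<TAB>1" record per word seen.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    # Reduce phase: input arrives sorted by key, so consecutive records
    # for the same word can be summed with groupby.
    parsed = (line.rsplit("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda rec: rec[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    # Simulate Hadoop's map -> shuffle/sort -> reduce pipeline locally.
    sample = ["the quick brown fox", "the lazy dog"]
    for record in reducer(sorted(mapper(sample))):
        print(record)
```

&lt;p&gt;Simulating the pipeline locally like this (map, then sort, then reduce) is also a handy way to debug streaming scripts before submitting them to a cluster.&lt;/p&gt;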
&lt;h2 id=&#34;id120450&#34;&gt;Hadoop&amp;rsquo;s native scripting environments&lt;/h2&gt;
&lt;p&gt;Instead of writing your own applications, you can take advantage of the increasing number of native Hadoop scripting languages that shield you from the lower-level parts. Several popular ones build on &lt;a href=&#34;https://cwiki.apache.org/confluence/display/Hive/HCatalog&#34;&gt;HCatalog&lt;/a&gt;, a layer built on top of the HDFS. As the Hortonworks Hadoop tutorial &lt;a href=&#34;http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/&#34;&gt;Hello World! – An introduction to Hadoop with Hive and Pig&lt;/a&gt; puts it, &amp;ldquo;The function of HCatalog is to hold location and metadata about the data in a Hadoop cluster. This allows scripts and MapReduce jobs to be decoupled from data location and metadata like the schema. Additionally since HCatalog supports many tools, like Hive and Pig, the location and metadata can be shared between tools.&amp;rdquo; You can work with HCatalog &lt;a href=&#34;https://cwiki.apache.org/confluence/display/Hive/HCatalog+CLI&#34;&gt;directly&lt;/a&gt;, but it&amp;rsquo;s more common to use these other tools that are built on top of it, and you&amp;rsquo;ll often see HCatalog mentioned in discussions of those tools. (For example, the same tutorial refers to the need to register a file with HCatalog before Hive or Pig can use it.)&lt;/p&gt;
&lt;p&gt;Apache &lt;a href=&#34;https://hive.apache.org/&#34;&gt;Hive&lt;/a&gt;, according to its home page, &amp;ldquo;facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.&amp;rdquo; You can &lt;a href=&#34;https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive&#34;&gt;start up Hive&lt;/a&gt; and enter HiveQL commands at its prompt or you can &lt;a href=&#34;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli&#34;&gt;pass it scripts&lt;/a&gt; instead of using it interactively. If you know the basics of SQL, you&amp;rsquo;ll be off and running pretty quickly. The 4:33 video &lt;a href=&#34;https://www.youtube.com/watch?v=iaZcgAwBbS0&#34;&gt;Demonstration of Apache Hive&lt;/a&gt; by Rob Kerr gives a nice short introduction to writing and running Hive scripts.&lt;/p&gt;
&lt;p&gt;Apache &lt;a href=&#34;http://pig.apache.org/&#34;&gt;Pig&lt;/a&gt; is another Hadoop utility that takes advantage of HCatalog. The &amp;ldquo;Pig Latin&amp;rdquo; scripting language is less SQL-like (but straightforward enough) and lets you create data structures on the fly so that you can pipeline data through a series of steps. You can run its commands &lt;a href=&#34;http://pig.apache.org/docs/r0.14.0/start.html#interactive-mode&#34;&gt;interactively&lt;/a&gt; at its grunt shell or in &lt;a href=&#34;http://pig.apache.org/docs/r0.14.0/start.html#batch-mode&#34;&gt;batch mode&lt;/a&gt; from the operating system command line.&lt;/p&gt;
&lt;p&gt;When should you use Hive and when should you use Pig? It&amp;rsquo;s a common topic of discussion; a Google search for &lt;a href=&#34;https://www.google.com/search?q=%22pig+vs.+hive%22&#34;&gt;&amp;ldquo;pig vs. hive&amp;rdquo;&lt;/a&gt; gets over 2,000 hits. Sometimes it&amp;rsquo;s just a matter of convention at a particular shop. The stackoverflow thread &lt;a href=&#34;http://stackoverflow.com/questions/3356259/difference-between-pig-and-hive-why-have-both/6924095&#34;&gt;Difference between Pig and Hive? Why have both?&lt;/a&gt; has some good points as well as pointers to more detailed discussions, including a &lt;a href=&#34;https://developer.yahoo.com/blogs/hadoop/comparing-pig-latin-sql-constructing-data-processing-pipelines-444.html&#34;&gt;Yahoo developer network discussion&lt;/a&gt; that doesn&amp;rsquo;t mention Hive by name but has a good description of the basics of Pig and how it compares to an SQL approach.&lt;/p&gt;
&lt;blockquote id=&#34;id123024&#34; class=&#34;pullquote&#34;&gt;You know what would be cool? A Hive adapter for D2R.&lt;/blockquote&gt;
&lt;p&gt;Hive and Pig are both very big in the Hadoop world, but plenty of other such tools are coming along. The home page of Apache &lt;a href=&#34;https://storm.apache.org/&#34;&gt;Storm&lt;/a&gt; tells us that it &amp;ldquo;makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.&amp;rdquo; Apache &lt;a href=&#34;https://spark.apache.org/&#34;&gt;Spark&lt;/a&gt; provides Java, Scala, and Python APIs and promises greater speed and an ability to layer on top of many different classes of data sources as its main advantages. There are other tools, but I mention these two because according to the recent &lt;a href=&#34;http://www.oreilly.com/data/free/files/2014-data-science-salary-survey.pdf&#34;&gt;O&amp;rsquo;Reilly 2014 Data Science Salary Survey&lt;/a&gt;, &amp;ldquo;Storm and Spark users earn the highest median salary&amp;rdquo; of all the data science tools they surveyed. Neither is restricted to use with Hadoop, but the big players described below advertise support for one or both as advantages of their Hadoop distributions.&lt;/p&gt;
&lt;p&gt;Another popular tool in the Hadoop ecosystem is Apache &lt;a href=&#34;http://hbase.apache.org/&#34;&gt;HBase&lt;/a&gt;, the most well-known of the column-oriented NoSQL databases. It can sit on top of HDFS, and its tables can host both input and output for MapReduce jobs.&lt;/p&gt;
&lt;h2 id=&#34;id123126&#34;&gt;The big players&lt;/h2&gt;
&lt;p&gt;The companies &lt;a href=&#34;http://www.cloudera.com&#34;&gt;Cloudera&lt;/a&gt;, &lt;a href=&#34;http://hortonworks.com&#34;&gt;Hortonworks&lt;/a&gt;, and &lt;a href=&#34;https://www.mapr.com/&#34;&gt;MapR&lt;/a&gt; have gotten famous and made plenty of money selling and supporting packaged Hadoop distributions that include additional tools to make them easier to set up and use than the Apache downloads. After hearing that Hortonworks stayed closer to the open source philosophy than the others, I tried their distribution and found that it includes many additional web-based tools to shield you from the command line. For example, it lets you enter Hive and Pig Latin commands into IDE-ish windows designed around these tools, and it includes a graphical drag-and-drop file browser interface to the HDFS. I found the tutorials in the &amp;ldquo;Hello World&amp;rdquo; section of their &lt;a href=&#34;http://hortonworks.com/tutorials/&#34;&gt;Tutorials&lt;/a&gt; page to be very helpful. I have no experience with the other two companies, but a Google search on &lt;a href=&#34;https://www.google.com/search?q=cloudera+hortonworks+mapr&#34;&gt;cloudera hortonworks mapr&lt;/a&gt; finds a &lt;em&gt;lot&lt;/em&gt; of discussions out there comparing the three.&lt;/p&gt;
&lt;p&gt;Pre-existing big IT names such as IBM and Microsoft have also jumped into the Hadoop market; when you do a Google search for just &lt;a href=&#34;https://www.google.com/search?q=hadoop&#34;&gt;hadoop&lt;/a&gt;, it&amp;rsquo;s interesting to see how much various companies have apparently paid for Google AdWords placement.&lt;/p&gt;
&lt;h2 id=&#34;id123170&#34;&gt;Hadoop&amp;rsquo;s future&lt;/h2&gt;
&lt;p&gt;One of Hadoop&amp;rsquo;s main uses so far has been to batch process large amounts of data (usually data that fits into one giant table, such as server or transaction logs) to harvest summary data that can be handed off to analytics packages. This is why &lt;a href=&#34;http://www.sas.com&#34;&gt;SAS&lt;/a&gt; and &lt;a href=&#34;http://www.pentaho.com&#34;&gt;Pentaho&lt;/a&gt;, who do not have their own Hadoop distributions, have paid for good Google AdWords placement when you search for &amp;ldquo;hadoop&amp;rdquo;—they want you to use their products for the analytics part.&lt;/p&gt;
&lt;p&gt;A hot area of growth seems to be the promise of using Hadoop for more real-time processing, which is driving the escalation in Storm and Spark&amp;rsquo;s popularity. Even in batch processing, there are still plenty of new opportunities in the Hadoop world as people adapt more kinds of data for use with the growing tool set. The &amp;ldquo;one giant table&amp;rdquo; representation is usually necessary to ease the splitting up of your data for distribution across multiple nodes; with my RDF hat on, I think there are some interesting possibilities for representing complex data structures in Hadoop using the &lt;a href=&#34;http://www.w3.org/TR/n-triples/&#34;&gt;N-Triples&lt;/a&gt; RDF syntax, which will still look like one giant three- (or four-) column table to Hadoop.&lt;/p&gt;
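&lt;p&gt;Here is a rough Python sketch (with invented sample triples) of why N-Triples splits so cleanly: each line is a complete subject/predicate/object statement, so a mapper can treat the file as a three-column table just as it would a log file.&lt;/p&gt;

```python
# A rough sketch (invented sample data) of treating N-Triples as a
# three-column table: every line is a complete statement, so the file
# can be split across Hadoop nodes as easily as a server log.
ntriples = [
    '<urn:emp1> <http://xmlns.com/foaf/0.1/name> "Jane Smith" .',
    '<urn:emp1> <http://example.org/worksFor> <urn:dept3> .',
    '<urn:emp2> <http://example.org/worksFor> <urn:dept3> .',
]

def columns(line):
    # The subject and predicate never contain spaces; the object may,
    # so split at most twice and trim the trailing " ." terminator.
    s, p, o = line.split(" ", 2)
    return s, p, o.rstrip(" .")

# A map-style pass over the "table": count employees per department.
dept_counts = {}
for s, p, o in (columns(line) for line in ntriples):
    if p == "<http://example.org/worksFor>":
        dept_counts[o] = dept_counts.get(o, 0) + 1

print(dept_counts)
```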
&lt;p&gt;Cloudera&amp;rsquo;s Paolo Castagna has done some work in this direction, as described in his presentation &amp;ldquo;Handling RDF data with tools from the Hadoop ecosystem&amp;rdquo; (&lt;a href=&#34;http://archive.apachecon.com/eu2012/presentations/07-Wednesday/L2L-Web_Infra/aceu-2012-handling-rdf-data-with-tools-from-the-hadoop-ecosystem.pdf&#34;&gt;pdf&lt;/a&gt;). A more recent presentation &lt;a href=&#34;http://www.slideshare.net/RobVesse/quadrupling-your-elephants-rdf-and-the-hadoop-ecosystem&#34;&gt;Quadrupling your Elephants: RDF and the Hadoop Ecosystem&lt;/a&gt; by YarcData&amp;rsquo;s Rob Vesse shows some interesting work as well, including the beginnings of some Jena-based tools for processing RDF with Hadoop. There has been some work at the University of Freiburg on SPARQL query processing using Hadoop (&lt;a href=&#34;https://github.com/lidingpku/iswc2014/blob/master/paper/87960161-sempala-interactive-sparql-query-processing-on-hadoop.pdf?raw=true&#34;&gt;pdf&lt;/a&gt;), and &lt;a href=&#34;http://sparqlcity.com/&#34;&gt;SPARQL City&lt;/a&gt; also offers a SPARQL front end to Hadoop-based storage. (If anyone&amp;rsquo;s looking for a semantic web project idea, you know what would be cool? A Hive adapter for &lt;a href=&#34;http://d2rq.org/d2r-server&#34;&gt;D2R&lt;/a&gt;.) I think there&amp;rsquo;s a very bright future for the cross-pollination of all of these tools.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2014">2014</category>
      
      <category domain="https://www.bobdc.com//categories/big-data">big data</category>
      
    </item>
    
    <item>
      <title>Querying aggregated Walmart and BestBuy data with SPARQL</title>
      <link>https://www.bobdc.com/blog/querying-aggregated-walmart-an/</link>
      <pubDate>Sun, 09 Nov 2014 09:35:56 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/querying-aggregated-walmart-an/</guid>
      
      
<description><div>From structured data in their web pages!</div><div>&lt;img id=&#34;id124993&#34; src=&#34;https://www.bobdc.com/img/main/walmartbestbuysparql.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Walmart and BestBuy product data queried with SPARQL&#34;/&gt;
&lt;p&gt;The combination of &lt;a href=&#34;http://www.w3.org/TR/microdata/&#34;&gt;microdata&lt;/a&gt; and &lt;a href=&#34;http://schema.org/&#34;&gt;schema.org&lt;/a&gt; seems to have hit a sweet spot that has helped both to get a lot of traction. I&amp;rsquo;ve been learning more about microdata recently, but even before I did, I found that the W3C&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/2012/pyMicrodata/&#34;&gt;Microdata to RDF Distiller&lt;/a&gt; written by Ivan Herman would convert microdata stored in web pages into RDF triples, making it possible to query this data with SPARQL. With major retailers such as Walmart and BestBuy making such data available on—as far as I can tell—every single product&amp;rsquo;s web page, some interesting queries become possible for comparing prices and other information from the two vendors.&lt;/p&gt;
&lt;p&gt;I extracted the data describing six external USB drives from both walmart.com and bestbuy.com, limiting myself to models that were available on both websites. (Instead of pulling it separately from the twelve individual web pages, it would have been nice to automate this a bit more. I did sign up for &lt;a href=&#34;https://developer.walmartlabs.com/&#34;&gt;Walmart&amp;rsquo;s API program&lt;/a&gt;, which was easy to try out, but the part of the API that lets you query products by category is &amp;ldquo;restricted, and is available on a request basis&amp;rdquo; according to their &lt;a href=&#34;https://developer.walmartlabs.com/docs/read/Home&#34;&gt;Data Feed API home page&lt;/a&gt;, so I didn&amp;rsquo;t bother. If I was going to pursue this further I would enroll in &lt;a href=&#34;https://developer.bestbuy.com/apis&#34;&gt;BestBuy&amp;rsquo;s Developer Program&lt;/a&gt; as well.) After using the Distiller form to do this several times, I downloaded its Python script from the &lt;a href=&#34;https://github.com/RDFLib/pymicrodata&#34;&gt;pymicrodata github page&lt;/a&gt; and found it easy to run locally.&lt;/p&gt;
&lt;p&gt;You can see a Turtle file of aggregated &lt;a href=&#34;http://snee.com/bobdc.blog/files/walmartplusbestbuy.ttl&#34;&gt;Walmart plus Bestbuy data here&lt;/a&gt;. Because of some slight differences in how they treated certain bits of data, I was tempted to clean up the aggregated data before querying it, but I really wanted to write queries that would work on the data in its native form, so I put the cleanup steps right in the queries.&lt;/p&gt;
&lt;p&gt;The various queries that I wrote led up to this one, which lists all the products by model number and price for easy comparison:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX schema: &amp;lt;http://schema.org/&amp;gt; 
PREFIX xsd:    &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; 


SELECT ?productName ?modelNumber ?price ?sellerName 
WHERE {
   ?product a schema:Product . 
   ?product schema:name ?productNameVal . 
   # str() to strip any language tags
   BIND(str(?productNameVal) AS ?productName)
   ?product schema:model ?modelNumberVal . 
   BIND(str(?modelNumberVal) AS ?modelNumber)
   ?product schema:offers ?offer . 
   ?offer a schema:Offer . 
   ?offer schema:price ?priceVal . 
   # Remove $ and cast to decimal
   BIND(xsd:decimal(replace(?priceVal,&amp;quot;\$&amp;quot;,&amp;quot;&amp;quot;)) AS ?price)
   ?offer schema:seller ?seller. 
   # In case there&#39;s a level of indirection for seller name
   OPTIONAL {
    ?seller schema:name ?sellerSchemaName . 
   }
   BIND(str(coalesce(?sellerSchemaName,?seller)) AS ?sellerName )
}
ORDER BY ?modelNumber ?price
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each comment in the query describes how it accounts for some difference between the Walmart microdata and the BestBuy microdata—for example, the BestBuy data included a dollar sign with prices, but the Walmart data did not.&lt;/p&gt;
&lt;p&gt;After running the query, requesting &lt;a href=&#34;http://www.w3.org/TR/rdf-sparql-XMLres/&#34;&gt;XML&lt;/a&gt; output, and then running a little XSLT on that output, I ended up with the table shown below.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Product Name                                                                                                    Model Number         Price    Seller Name
---------------------------------------------------------------------------------------------------------------------------------------------------------
Buffalo - DriveStation Axis Velocity 2TB External USB 3.0/2.0 Hard Drive                                        HD-LX2.0TU3          106.99   BestBuy
Buffalo Technology DriveStation Axis Velocity 2TB USB 3.0 External Hard Drive with Hardware Encryption, Black   HD-LX2.0TU3          108.25   Walmart.com
Buffalo Technology DriveStation Axis Velocity 2TB USB 3.0 External Hard Drive with Hardware Encryption, Black   HD-LX2.0TU3          129.45   pcRUSH
Buffalo Technology DriveStation Axis Velocity 2TB USB 3.0 External Hard Drive with Hardware Encryption, Black   HD-LX2.0TU3          143.69   Tonzof

Toshiba - Canvio Basics 1 TB External Hard Drive                                                                HDTB210XK3BA         68.60    Buy.com
Toshiba 1TB Canvio Basics USB 3.0 External Hard Drive                                                           HDTB210XK3BA         73.84    pcRUSH
Toshiba 1TB Canvio Basics USB 3.0 External Hard Drive                                                           HDTB210XK3BA         99.0     Walmart.com

Toshiba Canvio Basics 2TB USB 3.0 External Hard Drive                                                           HDTB220XK3CA         103.14   Walmart.com
Toshiba - Canvio Basics Hard Drive                                                                              HDTB220XK3CA         108.57   Buy.com

Seagate - Backup Plus Slim 1TB External USB 3.0/2.0 Portable Hard Drive - Black                                 STDR1000100          69.99    BestBuy
Seagate Backup Plus 1TB Slim Portable External Hard Drive, Black                                                STDR1000100          89.99    Walmart.com

WD - My Book 3TB External USB 3.0 Hard Drive - Black                                                            WDBFJK0030HBK-NESN   128.99   BestBuy
WD My Book 3TB USB 3.0 External Hard Drive                                                                      WDBFJK0030HBK-NESN   129.99   Walmart.com

WD - My Book 4TB External USB 3.0 Hard Drive - Black                                                            WDBFJK0040HBK-NESN   149.99   BestBuy
WD My Book 4TB USB 3.0 External Hard Drive                                                                      WDBFJK0040HBK-NESN   169.99   Walmart.com
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Vendors other than Walmart and BestBuy on the list were included in the Walmart data.&lt;/p&gt;
&lt;p&gt;Unfortunately, since I pulled the data that I was working with on October 15th, Walmart seems to have changed their web pages so that the W3C Microdata to RDF Distiller doesn&amp;rsquo;t find the data in them anymore. I still see schema.org microdata in the source of a page like &lt;a href=&#34;http://www.walmart.com/ip/Toshiba-Retail-Hard-Drives-HDTB210XK3BA-1tb-Canvio-Basics-Usb-3.-0/36467233&#34;&gt;this Walmart page for an external hard drive&lt;/a&gt;, but I guess it&amp;rsquo;s arranged differently. Perhaps they didn&amp;rsquo;t want people using standards-based technology to automate the process of finding out that BestBuy&amp;rsquo;s external hard drives usually cost less, or at least did in mid-October. A random check of products on other websites showed that the Distiller could pull useful data out of pages on &lt;a href=&#34;http://www.target.com&#34;&gt;target.com&lt;/a&gt;, &lt;a href=&#34;http://www.llbean.com&#34;&gt;llbean.com&lt;/a&gt;, and &lt;a href=&#34;http://www.marksandspencer.com/&#34;&gt;marksandspencer.com&lt;/a&gt;, so plenty of other major retailers are providing schema.org microdata in their product web pages.&lt;/p&gt;
&lt;p&gt;The important thing is that, even before I knew anything about the structure and syntax of microdata, a publicly available open source program let me pull and aggregate data from different big box stores&amp;rsquo; web sites so that I could query the combination with SPARQL. As more and more brand-name retailers make such data available, some very interesting applications will become possible.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2014">2014</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Dropping OPTIONAL blocks from SPARQL CONSTRUCT queries</title>
      <link>https://www.bobdc.com/blog/dropping-optional-blocks-from/</link>
      <pubDate>Mon, 06 Oct 2014 19:42:01 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/dropping-optional-blocks-from/</guid>
      
      
      <description><div>And retrieving those triples much, much faster.</div><div>&lt;img id=&#34;id123899&#34; src=&#34;https://www.bobdc.com/img/main/animals.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;animals taxonomy&#34;/&gt;
&lt;p&gt;While preparing a demo for the upcoming &lt;a href=&#34;http://www.taxonomybootcamp.com/2014/Tuesday.aspx&#34;&gt;Taxonomy Boot Camp&lt;/a&gt; conference, I hit upon a trick for revising SPARQL CONSTRUCT queries so that they don&amp;rsquo;t need OPTIONAL blocks. As I wrote in the new &amp;ldquo;Query Efficiency and Debugging&amp;rdquo; chapter in the second edition of &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt;, &amp;ldquo;Academic papers on SPARQL query optimization agree: OPTIONAL is the guiltiest party in slowing down queries, adding the most complexity to the job that the SPARQL processor must do to find the relevant data and return it.&amp;rdquo; My new trick not only made the retrieval much faster; it also made it possible to retrieve a lot more data from a remote endpoint.&lt;/p&gt;
&lt;p&gt;First, let&amp;rsquo;s look at a simple version of the use case. DBpedia has a lot of &lt;a href=&#34;http://www.w3.org/TR/2009/REC-skos-reference-20090818/&#34;&gt;SKOS&lt;/a&gt; taxonomy data in it, and at Taxonomy Boot Camp I&amp;rsquo;m going to show how you can pull down and use that data. Now, imagine that a little animal taxonomy like the one shown in the illustration here is stored on an endpoint and I want to write a query to retrieve all the triples showing preferred labels and &amp;ldquo;has broader&amp;rdquo; values up to three levels down from the Mammal concept, assuming that the taxonomy uses SKOS to represent its structure (the &lt;code&gt;v:&lt;/code&gt; prefix in the queries that follow stands for the taxonomy&amp;rsquo;s own vocabulary namespace). The following query asks for all three levels of the taxonomy below Mammal, but it won&amp;rsquo;t get the whole taxonomy:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
CONSTRUCT {
  ?level1 skos:prefLabel ?level1label . 
  ?level2 skos:broader ?level1 ;
          skos:prefLabel ?level2label . 
  ?level3 skos:broader ?level2 ;
          skos:prefLabel ?level3label . 
}
WHERE {
  ?level1 skos:broader v:Mammal ;
          skos:prefLabel ?level1label . 
  ?level2 skos:broader ?level1 ;
          skos:prefLabel ?level2label .
  ?level3 skos:broader ?level2 ;
          skos:prefLabel ?level3label . 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As with any SPARQL query, it&amp;rsquo;s only going to return triples for which &lt;em&gt;all&lt;/em&gt; the triple patterns in the WHERE clause match. While Horse may have a broader value of Mammal and therefore match the triple pattern &lt;code&gt;{?level1 skos:broader v:Mammal}&lt;/code&gt;, there are no nodes that have Horse as a broader value, so there will be no match for &lt;code&gt;{?level2 skos:broader v:Horse}&lt;/code&gt;. So, the Horse triples won&amp;rsquo;t be in the output. The same thing will happen with the Cat triples; only the Dog ones, which go down three levels below Mammal, will match the graph pattern in the WHERE clause above.&lt;/p&gt;
&lt;p&gt;If we want a CONSTRUCT query that retrieves all the triples of the subtree under Mammal, we need a way to retrieve the Horse and Cat concepts and any descendants they have, even if they have no descendants, and OPTIONAL makes this possible. The following will do this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
CONSTRUCT {
  ?level1 skos:prefLabel ?level1label . 
  ?level2 skos:broader ?level1 ;
          skos:prefLabel ?level2label . 
  ?level3 skos:broader ?level2 ;
          skos:prefLabel ?level3label . 
}
WHERE {
  ?level1 skos:broader v:Mammal ;
          skos:prefLabel ?level1label . 
  OPTIONAL {
    ?level2 skos:broader ?level1 ;
            skos:prefLabel ?level2label .
  }
  OPTIONAL {
    ?level3 skos:broader ?level2 ;
            skos:prefLabel ?level3label . 
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The problem: this doesn&amp;rsquo;t scale. When I sent a nearly identical query to DBpedia to ask for the triples representing the hierarchy three levels down from &amp;lt;&lt;a href=&#34;http://dbpedia.org/page/Category:Mammals&#34;&gt;http://dbpedia.org/resource/Category:Mammals&lt;/a&gt;&amp;gt;, it timed out after 20 minutes, because the two OPTIONAL graph patterns gave DBpedia too much work to do.&lt;/p&gt;
&lt;p&gt;As a review, let&amp;rsquo;s restate the problem: we want the identified concept and the preferred labels and broader values of concepts up to three levels down from that concept, but without using the OPTIONAL keyword. How can we do this?&lt;/p&gt;
&lt;p&gt;By asking for each level in a separate query. When I split the DBpedia version of the query above into the following three queries, each retrieved its data in under a second, retrieving a total of 2,597 triples representing a taxonomy of 1,107 concepts:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# query 1
PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
CONSTRUCT {
  &amp;lt;http://dbpedia.org/resource/Category:Mammals&amp;gt; a skos:Concept . 
  ?level1 a skos:Concept ;
          skos:broader &amp;lt;http://dbpedia.org/resource/Category:Mammals&amp;gt; ;
          skos:prefLabel ?level1label .  
}
WHERE {
  ?level1 skos:broader &amp;lt;http://dbpedia.org/resource/Category:Mammals&amp;gt; ;
          skos:prefLabel ?level1label .  
}


# query 2
PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
CONSTRUCT {
  ?level2 a skos:Concept ;
          skos:broader ?level1 ;  
          skos:prefLabel ?level2label .  
}
WHERE {
  ?level1 skos:broader &amp;lt;http://dbpedia.org/resource/Category:Mammals&amp;gt; .
  ?level2 skos:broader ?level1 ;  
            skos:prefLabel ?level2label .  
}


# query 3
PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
CONSTRUCT {
  ?level3 a skos:Concept ;
          skos:broader ?level2 ;  
          skos:prefLabel ?level3label .  
}
WHERE {
  ?level2 skos:broader/skos:broader &amp;lt;http://dbpedia.org/resource/Category:Mammals&amp;gt; .
  ?level3 skos:broader ?level2 ;  
          skos:prefLabel ?level3label .  
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Going from timing out after 20 minutes to successful execution in under 3 seconds is quite a performance improvement. Below, you can see how the beginning of a small piece of this taxonomy looks in TopQuadrant&amp;rsquo;s &lt;a href=&#34;http://www.topquadrant.com/products/topbraid-enterprise-vocabulary-net/&#34;&gt;TopBraid EVN&lt;/a&gt; vocabulary manager. At the first level down, you can only see &lt;a href=&#34;https://en.wikipedia.org/wiki/Category:Afrosoricida&#34;&gt;Afrosoricida&lt;/a&gt;, &lt;a href=&#34;https://en.wikipedia.org/wiki/Category:Australosphenida&#34;&gt;Australosphenida&lt;/a&gt;, and &lt;a href=&#34;https://en.wikipedia.org/wiki/Category:Bats&#34;&gt;Bats&lt;/a&gt; in the picture; I then drilled down three more levels from there to show that &lt;a href=&#34;https://en.wikipedia.org/wiki/Category:Fictional_bats&#34;&gt;Fictional bats&lt;/a&gt; has the single subcategory &lt;a href=&#34;https://en.wikipedia.org/wiki/Category:Silverwing&#34;&gt;Silverwing&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As you can tell from the Mammals URI in the queries above, these taxonomy concepts are categories, and each category has at least one member (for example, &lt;a href=&#34;http://en.wikipedia.org/wiki/Category:Bats_as_food&#34;&gt;Bats as food&lt;/a&gt;) in Wikipedia and is therefore represented as triples in DBpedia, ready for you to retrieve with SPARQL CONSTRUCT queries. I didn&amp;rsquo;t retrieve any instance triples here, but it&amp;rsquo;s great to know that they&amp;rsquo;re available, and that this technique for avoiding CONSTRUCT graph patterns will serve me for much more than SKOS taxonomy work.&lt;/p&gt;
&lt;p&gt;There has been plenty of talk lately on Twitter and in blogs about how it&amp;rsquo;s not a good idea for important applications to have serious dependencies on public SPARQL endpoints such as DBpedia. (Orri Erling has one of the most level-headed discussions of this that I&amp;rsquo;ve seen in &lt;a href=&#34;http://www.openlinksw.com/weblog/oerling/?id=1815&#34;&gt;SEMANTiCS 2014 (part 3 of 3): Conversations&lt;/a&gt;; in my posting &lt;a href=&#34;https://www.bobdc.com/blog/semantic-web-journal-article-o&#34;&gt;Semantic Web Journal article on DBpedia&lt;/a&gt; on this blog I described a great article that lists other options.) There&amp;rsquo;s all this great data to use in DBpedia, and besides spinning up an Amazon Web Services image with your own copy of DBpedia, as Orri suggests, you can pull down the data you need and store it locally while the endpoint is up. If you&amp;rsquo;re unsure about the structure and connections of the data you&amp;rsquo;re pulling down, OPTIONAL graph patterns seem like an obvious fix, but this trick for splitting up CONSTRUCT queries to avoid OPTIONAL graph patterns means that you can pull down a lot more data a lot more efficiently.&lt;/p&gt;
&lt;h2 id=&#34;i1&#34;&gt;&lt;a href=&#34;https://www.youtube.com/watch?v=eAzhz29eVec&#34;&gt;Stickin&amp;rsquo; to the UNION&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;October 16th update:&lt;/em&gt; Once I split out the pieces of the original query into separate files, it should have occurred to me to at least try joining them back up into a single query with UNION instead of OPTIONAL, but it didn&amp;rsquo;t. Luckily for me, &lt;a href=&#34;https://twitter.com/wohnjalker&#34;&gt;John Walker&lt;/a&gt; suggested in the &lt;a href=&#34;https://plus.google.com/u/1/101006505484718936507/posts/R73sqAwMPzk&#34;&gt;comments&lt;/a&gt; for this blog entry that I try this, so I did. It worked great, with the benefit of being simpler to read and maintain than using a collection of queries to retrieve a single set of triples. This version only took three seconds to retrieve the triples:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
CONSTRUCT {
  &amp;lt;http://dbpedia.org/resource/Category:Mammals&amp;gt; a skos:Concept . 
  ?level1 a skos:Concept ;
          skos:broader &amp;lt;http://dbpedia.org/resource/Category:Mammals&amp;gt; ;
          skos:prefLabel ?level1label .  
  ?level2 a skos:Concept ;
          skos:broader ?level1 ;  
          skos:prefLabel ?level2label .  
  ?level3 a skos:Concept ;
          skos:broader ?level2 ;  
          skos:prefLabel ?level3label .
}
WHERE {
  ?level1 skos:broader &amp;lt;http://dbpedia.org/resource/Category:Mammals&amp;gt; ;
          skos:prefLabel ?level1label .  
  {
    ?level2 skos:broader ?level1 ;  
            skos:prefLabel ?level2label .
  }
  UNION
  {
    ?level2 skos:broader ?level1 .
    ?level3 skos:broader ?level2 ;  
            skos:prefLabel ?level3label .  
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are two lessons here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If you&amp;rsquo;ve figured out a way to do something better, don&amp;rsquo;t be too satisfied too quickly—keep trying to make it even better.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;UNION is going to be useful in more situations than I originally thought it would.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;img id=&#34;id126274&#34; src=&#34;https://www.bobdc.com/img/main/mammalsDbpedia.png&#34; border=&#34;0&#34; alt=&#34;animals taxonomy&#34;/&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2014">2014</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>A schemaless computer database in 1965</title>
      <link>https://www.bobdc.com/blog/a-schemaless-computer-database/</link>
      <pubDate>Sat, 13 Sep 2014 11:09:09 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-schemaless-computer-database/</guid>
      
      
      <description><div>To enable flexible metadata aggregation, among other things.</div><div>&lt;img id=&#34;id117341&#34; src=&#34;https://www.bobdc.com/img/main/varLengthRecordLayout.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;figure 3&#34; width=&#34;360&#34;/&gt;
&lt;p&gt;I&amp;rsquo;ve been reading up on America&amp;rsquo;s post-war attempt to keep up the accelerated pace of R&amp;amp;D that began during World War II. This effort led to an infrastructure that made accomplishments such as the moon landing and the Internet possible; it also led to some very dry literature, and I&amp;rsquo;m mostly interested in what new metadata-related techniques were developed to track and share the products of the research as they led to development.&lt;/p&gt;
&lt;p&gt;One dry bit of literature is the proceedings of the 1965 &lt;a href=&#34;http://www.worldcat.org/title/toward-a-national-information-system-second-annual-national-colloquium-on-information-retrieval-april-23-24-1965-philadelphia-pennsylvania/oclc/675171&#34;&gt;Toward a National Information System: Second Annual National Colloquium On Information Retrieval&lt;/a&gt;. The conference was sponsored by the American Documentation Institute, which had a big role in the post-war information sharing work, as well as by the University of Pennsylvania&amp;rsquo;s Moore School of Electrical Engineering (where Eckert and Mauchly built &lt;a href=&#34;http://en.wikipedia.org/wiki/ENIAC&#34;&gt;ENIAC&lt;/a&gt; and its successor &lt;a href=&#34;http://en.wikipedia.org/wiki/EDVAC&#34;&gt;EDVAC&lt;/a&gt;) and some ACM chapters.&lt;/p&gt;
&lt;p&gt;In a chapter on how the North American Aviation company (now part of Boeing) revamped their practices for sharing information among divisions, I came across this description of some very flexible metadata storage:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;All bibliographic information contained in both the corporate and divisional Electronic Data Processing (EDP) subsystems is retained permanently on magnetic tape in the form of variable length records containing variable length fields. Each field, with the exception of sort keys, consists of three adjacent field parts: field character count, field identification, and field text (see Figure 3). There are several advantages to this format: it is extremely compact, thereby reducing computer read-write time; it provides for definition and consequent addition of new types of fields of bibliographic information without reformatting extant files; and its flexibility allows conversion of files from other indexing abstracting services.&lt;/p&gt;
&lt;/blockquote&gt;
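&lt;p&gt;The layout that the quotation describes is easy to model. This little Python sketch is my own reconstruction; the chapter gives no exact widths, so the three-digit character count and two-character field IDs are guesses for illustration:&lt;/p&gt;

```python
# A sketch of the self-describing field layout from the 1965 paper:
# each field is (character count, field identification, field text).
# The 3-digit count and 2-character IDs are illustrative guesses;
# the chapter doesn't give the exact encoding.

def encode_field(field_id: str, text: str) -> str:
    # 3-digit field character count, 2-character field ID, then the text.
    return f"{len(text):03d}{field_id}{text}"

def decode_record(record: str) -> dict:
    fields = {}
    pos = 0
    while pos < len(record):
        count = int(record[pos:pos + 3])              # field character count
        field_id = record[pos + 3:pos + 5]            # field identification
        fields[field_id] = record[pos + 5:pos + 5 + count]  # field text
        pos += 5 + count
    return fields

# A record can gain a brand-new field type without reformatting old data.
record = (encode_field("TI", "Toward a National Information System")
          + encode_field("AU", "North American Aviation"))
print(decode_record(record))
```

Because each field carries its own identification and length, a new field type can be added to new records without touching any extant files, which is exactly the advantage the quotation claims.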
&lt;p&gt;I especially like that &amp;ldquo;it provides for definition and consequent addition of new types of fields of bibliographic information without reformatting extant files.&amp;rdquo; This reminds me of &lt;a href=&#34;http://www.slideshare.net/bobdc/semantic-web-standards-and-the-variety-v-of-big-data/7&#34;&gt;one slide&lt;/a&gt; in my presentation last month at the &lt;a href=&#34;http://semtechbizsj2014.semanticweb.com&#34;&gt;Semantic Technology and Business&lt;/a&gt; / &lt;a href=&#34;http://nosql2014.dataversity.net/&#34;&gt;NoSQL Now!&lt;/a&gt; conferences, where my talk was on a track shared by both, about how a key advantage of schemaless NoSQL databases is the ability to add a new value for a new property to a data set with no need for the schema evolution steps that can be so painful in a relational database.&lt;/p&gt;
&lt;p&gt;Moore&amp;rsquo;s law has led to less of a reliance on arranging data in tables to allow the efficient retrieval of that data. The various NoSQL options have explored new ways to do this, and it was great to see that one aerospace company was doing it 49 years ago. Of course, retrieving data from &lt;a href=&#34;http://en.wikipedia.org/wiki/Magnetic_tape_data_storage&#34;&gt;magnetic tape&lt;/a&gt; is less efficient than modern alternatives, but it was a big step past the use of piles of punched cards, and pretty modern for its time, as you can see from the tape spools in the picture of EDVAC&amp;rsquo;s gleaming successor below. Tabular representation of data long predates relational databases (&lt;a href=&#34;http://en.wikipedia.org/wiki/Hierarchical_database_model&#34;&gt;hierarchical&lt;/a&gt; and &lt;a href=&#34;http://en.wikipedia.org/wiki/Network_model&#34;&gt;network&lt;/a&gt; databases also stored sets of entities as tables, but with much less flexibility), so I thought it was cool that someone had implemented such a flexible model so long ago, especially to represent metadata, with a use case that we often see now with RDF: to allow &amp;ldquo;conversion of files from other indexing abstracting services&amp;rdquo;—in other words, to accommodate the aggregation of metadata from other sources that may not have structured their data the same way that yours is structured.&lt;/p&gt;
&lt;img id=&#34;id117169&#34; src=&#34;https://www.bobdc.com/img/main/Univac_9400.jpg&#34; border=&#34;0&#34; vspace=&#34;30px&#34; alt=&#34;Univac 9400&#34; width=&#34;640&#34;/&gt;
&lt;p&gt;Univac photo by &lt;a href=&#34;http://www.technikum29.de/en/computer/univac9400&#34;&gt;H. Müller&lt;/a&gt; &lt;a href=&#34;http://creativecommons.org/licenses/by-sa/2.5&#34;&gt;CC-BY-SA-2.5&lt;/a&gt;, &lt;a href=&#34;http://commons.wikimedia.org/wiki/File%3AUnivac_9400.jpg&#34;&gt;via Wikimedia Commons&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2014">2014</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
      <category domain="https://www.bobdc.com//categories/technology-past">technology, past</category>
      
    </item>
    
    <item>
      <title>Exploring a SPARQL endpoint</title>
      <link>https://www.bobdc.com/blog/exploring-a-sparql-endpoint/</link>
      <pubDate>Sun, 24 Aug 2014 13:03:27 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/exploring-a-sparql-endpoint/</guid>
      
      
      <description><div>In this case, semanticweb.org.</div><div>&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/iswcsparqlpapers.png&#34;&gt;&lt;img id=&#34;id124989&#34; src=&#34;https://www.bobdc.com/img/main/iswcsparqlpapers.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;300&#34; alt=&#34;graph of ISWC SPARQL papers&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In the second edition of my book &lt;a href=&#34;http://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt;, a new chapter titled &amp;ldquo;A SPARQL Cookbook&amp;rdquo; includes a section called &amp;ldquo;Exploring the Data,&amp;rdquo; which features useful queries for looking around a dataset that you know little or nothing about. I was recently wondering about the data available at the SPARQL endpoint &lt;a href=&#34;http://data.semanticweb.org/sparql&#34;&gt;http://data.semanticweb.org/sparql&lt;/a&gt;, so to explore it I put several of the queries from this section of the book to work.&lt;/p&gt;
&lt;p&gt;An important lesson here is how easy SPARQL and RDF make it to explore a dataset that you know nothing about. If you don&amp;rsquo;t know about the properties used, or whether any schema or schemas were used and how much they were used, you can just query for this information. Most hypertext links below will execute the queries they describe using semanticweb.org&amp;rsquo;s &lt;a href=&#34;http://data.semanticweb.org/snorql&#34;&gt;SNORQL interface&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I started with what is generally my favorite query, &lt;a href=&#34;http://data.semanticweb.org/snorql/?query=SELECT+DISTINCT+%3Fp+WHERE+%7B%0D%0A++%3Fs+%3Fp+%3Fo%0D%0A%7D%0D%0A&#34;&gt;listing which predicates are used in the data&lt;/a&gt;, because that&amp;rsquo;s the quickest way to get a flavor for what kind of data is available. Several of the predicates that got listed immediately told me some interesting things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;rdfs:subClassOf&lt;/code&gt; shows me that there&amp;rsquo;s probably some structure worth exploring.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;dcterms:subject&lt;/code&gt; (and &lt;code&gt;dc:subject&lt;/code&gt;) shows that things have probably been tagged with keywords.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;iCal properties such as &lt;code&gt;dtstart&lt;/code&gt; show that events are recorded.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;FOAF properties show that there is probably information about people.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;dcterms:title&lt;/code&gt;, &lt;code&gt;swrc:booktitle&lt;/code&gt;, &lt;code&gt;dc:title&lt;/code&gt;, &lt;code&gt;swrc:title&lt;/code&gt;, and &lt;code&gt;swrc:subtitle&lt;/code&gt; show me that works are covered.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;An RDF dataset may or may not have explicit structure, and the use of &lt;code&gt;rdfs:subClassOf&lt;/code&gt; in this data showed me that there was, so my next query asked &lt;a href=&#34;http://data.semanticweb.org/snorql/?query=SELECT+*+WHERE+%7B%0D%0A++%3Fsubclass+rdfs%3AsubClassOf+%3Fsuperclass%0D%0A%7D%0D%0A&#34;&gt;what classes were subclasses of what classes&lt;/a&gt; so that I could get an overview of how much structure the dataset included. The result showed me that the ontology seemed to be mostly in the swc namespace, which turns out to be the semanticweb.org conference ontology. The site does include &lt;a href=&#34;http://data.semanticweb.org/ns/swc/swc_2009-05-09.html&#34;&gt;nice documentation for this ontology&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The use of the FOAF vocabulary showed me that there are probably people described, but if the properties &lt;code&gt;foaf:name&lt;/code&gt;, &lt;code&gt;foaf:lastName&lt;/code&gt;, &lt;code&gt;foaf:familyName&lt;/code&gt;, &lt;code&gt;foaf:family_name&lt;/code&gt;, and &lt;code&gt;foaf:surname&lt;/code&gt; are all in there, which should I try first? A quick &lt;a href=&#34;http://data.semanticweb.org/snorql/?query=SELECT+%3Fs+%3Fp+WHERE+%7B%0D%0A++%3Fs+%3Fp+%22DuCharme%22%0D%0A%7D%0D%0A&#34;&gt;ego search&lt;/a&gt; showed &lt;code&gt;foaf:family_name&lt;/code&gt; being used. It also showed that the URI used to represent me is &lt;a href=&#34;http://data.semanticweb.org/person/bob-ducharme&#34;&gt;http://data.semanticweb.org/person/bob-ducharme&lt;/a&gt;, and because they&amp;rsquo;ve published this data as linked data, sending a browser to that URL showed that it described me as a member of the &lt;a href=&#34;http://iswc2010.semanticweb.org/&#34;&gt;2010 ISWC&lt;/a&gt; program committee.&lt;/p&gt;
&lt;p&gt;It also showed me to be a proud instance of the &lt;code&gt;foaf:Person&lt;/code&gt; class, so I did a query to find out &lt;a href=&#34;http://data.semanticweb.org/snorql/?query=SELECT+%28COUNT%28DISTINCT+%3Fperson%29+AS+%3Fcount%29+WHERE+%7B%0D%0A++%3Fperson+a+foaf%3APerson.+%0D%0A%7D%0D%0A&#34;&gt;how many persons there were in all&lt;/a&gt;: 10,982.&lt;/p&gt;
&lt;p&gt;Given the domain of the ontology and the reason that I was listed, I guessed that it was all about ISWC conferences, so I listed the &lt;code&gt;dc:title&lt;/code&gt; values to see what would show up. The query took long enough that I added a LIMIT keyword to create a &lt;a href=&#34;http://data.semanticweb.org/snorql/?query=SELECT+*+WHERE+%7B%0D%0A++%3Fwork+dc%3Atitle+%3Ftitle%0D%0A%7D%0D%0ALIMIT+100%0D%0A&#34;&gt;politer version&lt;/a&gt; of that query. Looking at the &lt;a href=&#34;http://data.semanticweb.org/snorql/?describe=http%3A%2F%2Fdata.semanticweb.org%2Fconference%2Fiswc-aswc%2F2007%2Ftracks%2Fdoctoral-consortium%2Fpapers%2F905&#34;&gt;complete data&lt;/a&gt; for one work showed all kinds of interesting information, including an &lt;code&gt;swrc:year&lt;/code&gt; value to indicate the year of this paper&amp;rsquo;s conference. A &lt;a href=&#34;http://data.semanticweb.org/snorql/?query=SELECT+DISTINCT+%3Fyear+WHERE+%7B%0D%0A+++%3Fs+swrc%3Ayear+%3Fyear%0D%0A%7D%0D%0A&#34;&gt;list of all year values&lt;/a&gt; showed a range from 2001 right up to 2014, so it&amp;rsquo;s nice to see that they&amp;rsquo;re keeping the data up to date.&lt;/p&gt;
&lt;p&gt;Next, I &lt;a href=&#34;http://data.semanticweb.org/snorql/?query=SELECT+DISTINCT+%3Fyear+%3Ftitle+%7B%0D%0A+++%3Fpaper+dc%3Atitle+%3Ftitle+.+%0D%0A+++FILTER%28contains%28%3Ftitle%2C%22SPARQL%22%29%29%0D%0A+++%3Fpaper+swrc%3Ayear+%3Fyear+.+%0D%0A%7D%0D%0AORDER+BY+%3Fyear%0D%0A&#34;&gt;listed all papers that mention &amp;ldquo;SPARQL&amp;rdquo; in their title&lt;/a&gt;, with their years. After listing &lt;a href=&#34;http://data.semanticweb.org/snorql/?query=SELECT+%3Fyear+%28count%28%3Fyear%29+AS+%3FSPARQLPapers%29+%7B%0D%0A+++%3Fpaper+dc%3Atitle+%3Ftitle+.+%0D%0A+++FILTER%28contains%28%3Ftitle%2C%22SPARQL%22%29%29%0D%0A+++%3Fpaper+swrc%3Ayear+%3Fyear+.+%0D%0A%7D%0D%0AGROUP+BY+%3Fyear&#34;&gt;the number of papers with SPARQL in their title each year&lt;/a&gt;, I used &lt;a href=&#34;http://dev.data2000.no/sgvizler/&#34;&gt;sgvizler&lt;/a&gt; (which I &lt;a href=&#34;https://www.bobdc.com/blog/making-charts-out-of-sparql-qu&#34;&gt;described here last September&lt;/a&gt;) to create the chart of these figures shown above.&lt;/p&gt;
&lt;p&gt;The use of &lt;code&gt;dcterms:subject&lt;/code&gt; and &lt;code&gt;dc:subject&lt;/code&gt; was interesting because these add some pretty classic metadata for navigating content. Listing triples that used either, I included LIMIT 100 to be polite to the server in case these properties were used a lot. They are. &lt;a href=&#34;http://data.semanticweb.org/snorql/?query=SELECT+*+WHERE+%7B%0D%0A++%3Fs+dc%3Asubject+%3Fsubject%0D%0A%7D%0D%0ALIMIT+100&#34;&gt;Doing this with &lt;code&gt;dc:subject&lt;/code&gt;&lt;/a&gt; shows subjects such as &amp;ldquo;ontology alignment&amp;rdquo; and &amp;ldquo;controlled natural language&amp;rdquo; assigned to articles. &lt;a href=&#34;http://data.semanticweb.org/snorql/?query=SELECT+*+WHERE+%7B%0D%0A++%3Fs+dcterms%3Asubject+%3Fsubject%0D%0A%7D%0D%0ALIMIT+100&#34;&gt;Doing it with dcterms:subject&lt;/a&gt; showed it used more the way I might use &lt;code&gt;rdf:type&lt;/code&gt;, indicating that something is an instance of a particular class: for example, &lt;code&gt;swc:Chair&lt;/code&gt; and &lt;code&gt;swc:Delegate&lt;/code&gt; each have &lt;code&gt;dcterms:subject&lt;/code&gt; values of &lt;code&gt;http://dbpedia.org/resource/Role&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;My interest in taxonomies (spurred by my work with TopQuadrant&amp;rsquo;s &lt;a href=&#34;http://www.topquadrant.com/products/topbraid-enterprise-vocabulary-net/&#34;&gt;TopBraid EVN&lt;/a&gt;) led me to look harder at the &lt;code&gt;dc:subject&lt;/code&gt; values. They&amp;rsquo;re string values, and not instances of something like &lt;a href=&#34;http://www.w3.org/TR/2009/REC-skos-reference-20090818/#concepts&#34;&gt;&lt;code&gt;skos:Concept&lt;/code&gt;&lt;/a&gt;, so they have no hierarchical relationship or other metadata themselves. I&amp;rsquo;m guessing that this is because key phrases assigned to conference papers are more of a folksonomy, in which people can make up their own key phrases as they wish. Still, either some people must have been aware of other key phrases in use or some were added automatically: while &lt;a href=&#34;http://data.semanticweb.org/snorql/?query=SELECT+%28count%28DISTINCT+%3Fsubject%29+AS+%3FsubjectCount%29+WHERE+%7B%0D%0A++%3Fs+dc%3Asubject+%3Fsubject%0D%0A%7D%0D%0A%0D%0A&#34;&gt;counting how many different ones there were&lt;/a&gt; came up with 3,594, a &lt;a href=&#34;http://data.semanticweb.org/snorql/?query=SELECT+%28count%28%3Fsubject%29+AS+%3FsubjectCount%29+%3Fsubject+WHERE+%7B%0D%0A++%3Fs+dc%3Asubject+%3Fsubject%0D%0A%7D%0D%0AGROUP+BY+%3Fsubject%0D%0AHAVING+%28%3FsubjectCount+%3E+100%29%0D%0AORDER+BY+DESC%28%3FsubjectCount%29%0D%0A%0D%0A&#34;&gt;query to see which were the most popular&lt;/a&gt; showed that &amp;ldquo;Corpus (creation, annotation, etc.)&amp;rdquo; was far and away the most used, with 506 papers having that subject.&lt;/p&gt;
&lt;p&gt;I could go on. Call me a SPARQL geek, but I really enjoy looking around a data set like this, especially when (as the presence of the papers for ISWC 2014 shows) the data is kept up to date. For people interested in any aspect of semantic web technology, the ability to look around this particular dataset and count up which data falls into which patterns is a great resource.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2014">2014</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>When did linking begin?</title>
      <link>https://www.bobdc.com/blog/when-did-linking-begin/</link>
      <pubDate>Sun, 20 Jul 2014 09:40:55 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/when-did-linking-begin/</guid>
      
      
      <description><div>Pointing somewhere with a dereferenceable address, in the twelfth (or maybe fifth) century.</div><div>&lt;p&gt;&lt;a href=&#34;http://2008.igem.org/Team:Bologna/Team&#34;&gt;&lt;img id=&#34;id138094&#34; src=&#34;https://www.bobdc.com/img/main/Uniantica.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;300&#34; alt=&#34;University of Bologna woodcut&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;As I have &lt;a href=&#34;https://www.bobdc.com/blog/a-nineteenth-century-linking-a&#34;&gt;once before&lt;/a&gt;, I&amp;rsquo;m republishing an entry from an O&amp;rsquo;Reilly blog I had from 2003 to 2005 on topics related to linking. I&amp;rsquo;ve been reading up on early concepts of metadata lately—I particularly recommend Ann Blair&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0300165390/bobducharmeA/&#34;&gt;Too Much to Know: Managing Scholarly Information before the Modern Age&lt;/a&gt;—and have recently found another interesting reference to the &amp;ldquo;Regulae Iuris&amp;rdquo; book mentioned below. When I wrote this, I was more interested in hypertext issues, and if I was going to change anything to update this piece, I would change the word &amp;ldquo;traverse&amp;rdquo; to &amp;ldquo;dereference,&amp;rdquo; but all the points are still meaningful.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Works about linking often claim that it&amp;rsquo;s been around for thousands of years, and then they give examples that are no more than a few centuries old. I can only find one reference to something more than a thousand years old that qualifies as a link: Peter Stein&amp;rsquo;s 1966 work &amp;ldquo;Regulae Iuris: from Juristic Rules to Legal Maxims&amp;rdquo; describes some late fifth-century lecture notes on a commentary by the legal scholar Ulpian. The notes mention that confirmation of a particular point can be found in the Regulae (&amp;ldquo;Rules&amp;rdquo;) of the third-century Roman jurist (and student of Ulpian) Modestinus, &amp;ldquo;seventeen regulae from the end, in the regula beginning &amp;lsquo;Dotis&amp;rsquo;&amp;hellip;&amp;rdquo;. The citation&amp;rsquo;s explicit identification of the point in the cited work where the material could be found makes it the earliest link that I know of.&lt;/p&gt;
&lt;p&gt;Other than Stein&amp;rsquo;s tantalizing example, all of my research points to the 12th century as the beginning of linking. In a 1938 work on the medieval scholars of Bologna, Italy, who studied what remained of ancient Roman law, Hermann Kantorowicz wrote that in &amp;ldquo;the eleventh century&amp;hellip;titles of law books are cited without indicating the passage, books of the Code are numbered, and the name of the law book is considered a sufficient reference.&amp;rdquo; He uses this to build his argument that a particular work described in his essay is from the eleventh century and not the twelfth, as other scholars had argued. Apparently, it was common knowledge in Kantorowicz&amp;rsquo;s field that twelfth century Bolognese scholars would reference a written law using the name of the law book, the rubric heading, and the first few words of the law itself. (Referencing of particular chapters and sections by their first few words was common at the time; the use of chapter, section, and page numbers didn&amp;rsquo;t begin until the following century.)&lt;/p&gt;
&lt;p&gt;Italian legal scholars trying to organize and make sense of the massive amounts of accumulated Roman law contributed a great deal to the mechanics of the cross-referencing that provide many of the earliest examples of linking. The medievalist husband and wife team Richard and Mary Rouse also found some in their research into evolving scholarship techniques in the great universities of England and France (that is, Oxford, Cambridge, and the Sorbonne), and they described Gilbert of Poitiers&amp;rsquo;s innovative twelfth-century mechanism for addressing specific parts of his work on the psalms: he added a selection of Greek letters and other symbols down the side of each page to identify concepts such as the Penitential Psalms or the Passion and Resurrection. If you found the symbol for the Passion and Resurrection in the margin of Psalm 2 with a little 8 next to it (actually, a little &amp;ldquo;viii&amp;rdquo;—they weren&amp;rsquo;t using Arabic numerals quite yet), it would tell you that the next discussion of this concept appeared in Psalm 8. Once you found the same symbol on one of the eighth psalm&amp;rsquo;s pages, you might find a little &amp;ldquo;xii&amp;rdquo; with it to show that the next discussion of the same concept was in Psalm 12. This addressing system made it possible for someone preparing a sermon on the Passion and Resurrection to easily find the relevant material in the Psalms. (In fact, the demand for aids to sermon preparation was one of the main forces behind the development of new research tools, as clergymen were encouraged to go out and compete with the burgeoning heretic movements for the hearts and minds of the people.)&lt;/p&gt;
&lt;p&gt;The use of information addressing systems really got rolling in the thirteenth-century English and French universities, as scholarly monks developed concordances, subject indexes, and page numbers for both Christian religious works and the classic ancient Greek works that they learned about from their contact with the Arabic world. In fact, this is where Arabic numerals begin to appear in Europe; page numbering was one of the early drivers of their adoption.&lt;/p&gt;
&lt;p&gt;Quoting of one work by another was certainly around long before the twelfth century, but if an author doesn&amp;rsquo;t identify an address for his source, his reference can&amp;rsquo;t be traversed, so it&amp;rsquo;s not really a link. Before the twelfth century, religious works had a long tradition of quoting and discussing other works, but in many traditions (for example, Islam, Theravada Buddhism, and Vedic Hinduism) memorization of complete religious works was so common that telling someone where to look within a work was unnecessary. If one Muslim scholar said to another &amp;ldquo;In the words of the Prophet&amp;hellip;&amp;rdquo; he didn&amp;rsquo;t need to name the sura of the Qur&amp;rsquo;an that the quoted words came from; he could assume that his listener already knew. Describing such allusions as &amp;ldquo;links&amp;rdquo; adds heft to claims that linking is thousands of years old, but a link that doesn&amp;rsquo;t provide an address for its destination can&amp;rsquo;t be traversed, and a link that can&amp;rsquo;t be traversed isn&amp;rsquo;t much of a link. Such claims also diminish the tremendous achievements of the 12th-century scholars who developed new techniques to navigate the accumulating amounts of recorded information they were studying.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2014">2014</category>
      
      <category domain="https://www.bobdc.com//categories/legal-publishing">legal publishing</category>
      
      <category domain="https://www.bobdc.com//categories/technology-past">technology, past</category>
      
    </item>
    
    <item>
      <title>Integrating hiphop vocabulary scores with other relevant data—then querying it</title>
      <link>https://www.bobdc.com/blog/integrating-hiphop-vocabulary/</link>
      <pubDate>Tue, 10 Jun 2014 08:41:16 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/integrating-hiphop-vocabulary/</guid>
      
      
      <description><div>With a little JSON + DBpedia integration.</div><div>&lt;p&gt;&lt;a href=&#34;http://rappers.mdaniels.com.s3-website-us-east-1.amazonaws.com/&#34;&gt;&lt;img id=&#34;id117356&#34; src=&#34;https://www.bobdc.com/img/main/rappervocabs.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;rapper vocabularies chart&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;About a month ago, media outlets ranging from &lt;a href=&#34;http://www.npr.org/2014/05/05/309840473/yeezy-or-the-bard-whos-the-best-wordsmith-in-hip-hop&#34;&gt;NPR&lt;/a&gt; to &lt;a href=&#34;http://www.rollingstone.com/music/news/aesop-rock-gza-have-largest-vocabularies-in-hip-hop-says-new-study-20140505&#34;&gt;Rolling Stone&lt;/a&gt; to Britain&amp;rsquo;s &lt;a href=&#34;http://www.dailymail.co.uk/sciencetech/article-2621331/Wu-Tang-Clan-bigger-vocabulary-SHAKESPEARE-Infographic-ranks-rappers-use-English-language.html&#34;&gt;Daily Mail&lt;/a&gt; reported on how a &amp;ldquo;designer, coder, and data scientist&amp;rdquo; named Matt Daniels had analyzed the number of unique words in samples of work by Shakespeare, Herman Melville, and 85 rappers. He then published a &lt;a href=&#34;http://rappers.mdaniels.com.s3-website-us-east-1.amazonaws.com/&#34;&gt;chart and article&lt;/a&gt; about how their scores related to each other. The highest score went to &lt;a href=&#34;http://aesoprock.com/&#34;&gt;Aesop Rock&lt;/a&gt;, who I thought I&amp;rsquo;d heard of but hadn&amp;rsquo;t—I was confusing him with &lt;a href=&#34;http://www.asvpxrocky.com/&#34;&gt;A$AP Rocky&lt;/a&gt;, who was not included in the survey.&lt;/p&gt;
&lt;p&gt;The chart and discussion were interesting, but what I really wanted to see was the complete list of subjects with their scores, and after searching around the web a bit I found that it was under my nose the whole time—the chart is dynamically generated from JSON embedded in his web page. So, I converted that JSON to RDF, then used some SPARQL to retrieve additional data about each rapper from DBpedia such as their record labels, the years their careers began, any subject keywords assigned to them, and the abstracts that summarize their careers. (You&amp;rsquo;ll find more details on the procedure for doing this below; the resulting integrated data is available for you to query &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/rapperDataIntegrated.ttl&#34;&gt;here&lt;/a&gt; as a Turtle file.) Combining this additional data with the vocabulary scores let me do some interesting queries and provided an excellent example of how RDF and SPARQL let you perform ad hoc data integration, combining different data sets into aggregates that let you identify new patterns and other information.&lt;/p&gt;
&lt;p&gt;For example, of all record labels with more than four rappers associated with them, I found that MCA&amp;rsquo;s roster had the highest average vocabulary score at 5472.5, well above the overall average of 4624. Who are these artists? Another simple query showed their names and scores:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GZA              6426
The Roots        5803
Killah Priest    5737
Blackalicious    5480
Big Daddy Kane   4768
Rakim            4621
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(As Daniels pointed out, members of the Wu-Tang Clan tend to have higher scores, so GZA and Killah Priest are a big help to MCA&amp;rsquo;s average score.)&lt;/p&gt;
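&lt;p&gt;A GROUP BY query along these lines can produce such by-label averages. This is only a sketch of the general shape; the property names &lt;code&gt;d:recordLabel&lt;/code&gt; and &lt;code&gt;d:danielsScore&lt;/code&gt; are placeholders for illustration, not necessarily the ones in the actual data:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt;

SELECT ?label (AVG(?score) AS ?avgScore)
WHERE {
  ?rapper d:recordLabel ?label ;
          d:danielsScore ?score .
}
GROUP BY ?label
HAVING (COUNT(?rapper) &amp;gt; 4)
ORDER BY DESC(?avgScore)
&lt;/code&gt;&lt;/pre&gt;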
&lt;p&gt;The &lt;code&gt;dcterms:subject&lt;/code&gt; values assigned to the rappers in DBpedia provide the most interesting opportunities for exploration. In fact, it turned out that I didn&amp;rsquo;t even need to pull down the record label values, because the labels each have corresponding &lt;code&gt;dcterms:subject&lt;/code&gt; values. For example, each of the artists listed above has a &lt;code&gt;dcterms:subject&lt;/code&gt; value of &lt;a href=&#34;http://dbpedia.org/resource/Category:MCA_Records_artists&#34;&gt;http://dbpedia.org/resource/Category:MCA_Records_artists&lt;/a&gt; along with their other &lt;code&gt;dcterms:subject&lt;/code&gt; values.&lt;/p&gt;
&lt;p&gt;Of the subject categories with more than four rappers, here are several interesting ones with high average scores, ranked by number of members in the category:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;                                           count   avg score
Members of the Nation of Gods and Earths   13      5117
Underground rappers                        8       5849
People from Brooklyn                       7       5323
MCA Records artists                        7       5401
Rappers from Long Island                   6       5160
Alternative hip hop groups                 5       5286
Wu-Tang Clan members                       5       5611
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I hadn&amp;rsquo;t heard of the Nation of Gods and Earths, also known as the &lt;a href=&#34;https://en.wikipedia.org/wiki/Five-Percent_Nation&#34;&gt;Five-Percent Nation&lt;/a&gt;; again, we have Wu-Tang skewing the numbers up. After I saw the high averages for &amp;ldquo;People from Brooklyn&amp;rdquo; and &amp;ldquo;Rappers from Long Island&amp;rdquo; but no mention of Staten Island, I clicked around and found out that only about half of Wu-Tang came from the borough in which they were based, which I never knew before.&lt;/p&gt;
&lt;p&gt;Here are some interesting low scoring categories. Again, remember that the overall average score is 4624:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;                                                     count   avg score
Participants in American reality television series   8       4108
People convicted of drug offenses                    7       3741
American philanthropists                             6       4022
American shooting survivors                          5       4025
American fashion businesspeople                      5       4110
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, the data collection itself isn&amp;rsquo;t very scientific; what constitutes an &amp;ldquo;alternative&amp;rdquo; rapper? A less successful artist popular with music nerds? &amp;ldquo;People convicted of drug offenses&amp;rdquo; seems like a more cut and dried category, but remember that data from a Wikipedia page is not an authoritative source for such facts.&lt;/p&gt;
&lt;p&gt;As with the list of MCA artists above, a simple query of the data can tell you who falls in each of these categories, so pull down the data from the link above and have fun querying it. If you&amp;rsquo;re interested in how I did the integration, read on.&lt;/p&gt;
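&lt;p&gt;Such a category membership query might look something like the following sketch, with &lt;code&gt;d:danielsName&lt;/code&gt; and &lt;code&gt;d:danielsScore&lt;/code&gt; standing in as placeholders for the actual property names:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d:       &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
PREFIX dcterms: &amp;lt;http://purl.org/dc/terms/&amp;gt;

SELECT ?name ?score
WHERE {
  ?rapper dcterms:subject
      &amp;lt;http://dbpedia.org/resource/Category:Underground_rappers&amp;gt; ;
          d:danielsName ?name ;
          d:danielsScore ?score .
}
ORDER BY DESC(?score)
&lt;/code&gt;&lt;/pre&gt;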
&lt;h2 id=&#34;id119932&#34;&gt;Integrating the data&lt;/h2&gt;
&lt;p&gt;Upon seeing that Daniels includes a score for Ghostface Killah, it&amp;rsquo;s easy to ask DBpedia for all the &lt;code&gt;{ &amp;lt;http://dbpedia.org/resource/Ghostface_Killah&amp;gt; ?p ?o }&lt;/code&gt; triples. It&amp;rsquo;s not as simple for many other artists, though, for several reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Some rappers use stage names that are &lt;a href=&#34;https://en.wikipedia.org/wiki/Common_(entertainer)&#34;&gt;common&lt;/a&gt; phrases and words, so putting that name at the end of &amp;ldquo;http://dbpedia.org/resource/&amp;rdquo; won&amp;rsquo;t necessarily get you data about them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Tricky spellings and punctuation are pretty common in hip-hop names. For example, Jay Z originally spelled his name &lt;a href=&#34;http://rapgenius.com/Jay-z-dead-presidents-ii-lyrics#note-7214&#34;&gt;with a hyphen&lt;/a&gt; but later &lt;a href=&#34;http://www.newsday.com/entertainment/pop-cult-1.811972/jay-z-drops-hyphen-from-his-name-reports-say-1.5724389&#34;&gt;dropped it&lt;/a&gt;, much as LexisNexis did &lt;a href=&#34;http://newsbreaks.infotoday.com/NewsBreaks/LexisNexis-Undertakes-Modest-Rebranding-Effort-17590.asp&#34;&gt;twelve years earlier&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Daniels sometimes included qualifications in names (&amp;ldquo;GZA (only solo albums)&amp;rdquo;), included or didn&amp;rsquo;t include the word &amp;ldquo;The&amp;rdquo; that was in the DBpedia name (&amp;ldquo;Roots&amp;rdquo; vs. &amp;ldquo;The Roots&amp;rdquo;) or just spelled their names wrong, such as omitting the final &amp;ldquo;t&amp;rdquo; from &amp;ldquo;Missy Elliott.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Dropping parenthesized qualifications was easy enough. Even better, DBpedia often has the data necessary to find the page based on a slightly wrong name, and the techniques I described in &lt;a href=&#34;https://www.bobdc.com/blog/normalizing-company-names-with&#34;&gt;Normalizing company names with SPARQL and DBpedia&lt;/a&gt; worked for most of them. This is not a minor point: &lt;em&gt;even when the names aren&amp;rsquo;t quite right, sending the right SPARQL queries to DBpedia can still retrieve valuable data about them.&lt;/em&gt; This has applications in all kinds of domains.&lt;/p&gt;
&lt;p&gt;You can find the scripts and queries mentioned below in &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/rapperrdf.zip&#34;&gt;rapperrdf.zip&lt;/a&gt;. The rapperdata.js file is taken directly from the source of Daniels&amp;rsquo; web page, and loads his data into an array. Another JavaScript file, rappervocab.js, loads rapperdata.js and outputs Turtle RDF of the rappers&amp;rsquo; scores and the Daniels versions of their names. (If you&amp;rsquo;re using the &lt;a href=&#34;http://www.topquadrant.com/technology/topbraid-platform-overview/&#34;&gt;TopBraid platform&lt;/a&gt; and working with JSON, there&amp;rsquo;s an excellent SPARQLMotion module to automate the conversion of any JSON to RDF.) I used Rhino to run the JavaScript, as I described in &lt;a href=&#34;https://www.bobdc.com/blog/javascript-from-the-command-li&#34;&gt;Javascript from the command line&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Another short script called rapperValuesList.js reads the same data and creates the list of names that I inserted as a VALUES list into the retrieveRapperData.rq SPARQL query that actually retrieves the relevant data from DBpedia. (VALUES is a great SPARQL technique for saying &amp;ldquo;I need data about this list of specific things,&amp;rdquo; as I&amp;rsquo;ve written &lt;a href=&#34;https://www.bobdc.com/blog/using-values-to-map-values-in&#34;&gt;here before&lt;/a&gt;.) This SPARQL query uses the SERVICE keyword to send the request off to DBpedia and does a CONSTRUCT to save the triples. It uses the &amp;ldquo;Normalizing company names&amp;rdquo; trick mentioned above to see if the Daniels name with the parenthesized part stripped out is either the &amp;ldquo;official&amp;rdquo; &lt;code&gt;rdfs:label&lt;/code&gt; value for a resource or otherwise attached to something that gets redirected to that.&lt;/p&gt;
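&lt;p&gt;Greatly condensed, the retrieval query&amp;rsquo;s shape looks something like this sketch, with a two-name VALUES list standing in for the full one and &lt;code&gt;dbo:wikiPageRedirects&lt;/code&gt; handling the &amp;ldquo;redirected to that&amp;rdquo; case:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX dbo:  &amp;lt;http://dbpedia.org/ontology/&amp;gt;

CONSTRUCT { ?rapper rdfs:label ?name }
WHERE {
  SERVICE &amp;lt;http://dbpedia.org/sparql&amp;gt; {
    VALUES ?name { &amp;quot;GZA&amp;quot;@en &amp;quot;Rakim&amp;quot;@en }
    { ?rapper rdfs:label ?name }
    UNION
    { ?redirect rdfs:label ?name ;
                dbo:wikiPageRedirects ?rapper . }
  }
}
&lt;/code&gt;&lt;/pre&gt;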
&lt;p&gt;Of the 81 artists in Daniels&amp;rsquo; list, there were 12 whose names couldn&amp;rsquo;t be looked up even with the redirect trick in retrieveRapperData.rq. To account for these, I created extraRapperDanielsNames.ttl with a text editor to link Daniels&amp;rsquo; names for these 12 extra rappers to their DBpedia resource URIs such as &lt;a href=&#34;http://dbpedia.org/resource/Common_(entertainer)&#34;&gt;http://dbpedia.org/resource/Common_(entertainer)&lt;/a&gt;, which I had to look up manually. The retrieveExtraRapperData.rq query then uses that to retrieve the same data about those 12.&lt;/p&gt;
&lt;p&gt;The queries only retrieve the start year, record label, abstract, and subjects about the artists because they all had those values. Retrieving data that only some of them have (such as the birth year, which you don&amp;rsquo;t have for bands like The Roots) would mean using the OPTIONAL keyword, and DBpedia said that my query would take too long when I tried that—I&amp;rsquo;m sure the big VALUES part has a lot to do with that.&lt;/p&gt;
&lt;p&gt;The integrateRapperData.rq query reads the extraRapperDanielsNames.ttl data and the data created by rappervocab.js, retrieveRapperData.rq, and retrieveExtraRapperData.rq, and then creates the final product: rapperDataIntegrated.ttl.&lt;/p&gt;
&lt;h2 id=&#34;id120061&#34;&gt;Querying the data&lt;/h2&gt;
&lt;p&gt;Next was the fun part: executing queries to explore that integrated data. The zip file includes queries to find the following information from rapperDataIntegrated.ttl:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;averageScore.rq&lt;/strong&gt;: the overall average Daniels score&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;averageScoreByLabel.rq&lt;/strong&gt;: the average score by record label for labels with more than four artists associated with them&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;subjectReport.rq&lt;/strong&gt;: the average score by subject associated with the rappers for all subjects (like &amp;ldquo;Underground rappers&amp;rdquo; and &amp;ldquo;American philanthropists&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MCAArtists.rq&lt;/strong&gt;: the MCA artists&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;JamaicanDescent.rq&lt;/strong&gt;: the name, Daniels score, and abstract of &amp;ldquo;American rappers of Jamaican descent&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That last one can provide a template for the creation of other queries about who falls into which subject categories.&lt;/p&gt;
&lt;p&gt;Linking this data with other data about the artists from some of the blue parts of the &lt;a href=&#34;http://lod-cloud.net/versions/2011-09-19/lod-cloud_colored.html&#34;&gt;Linked Data Cloud&lt;/a&gt; such as DBTune or the BBC would provide some even more interesting possibilities. As one taste, &lt;a href=&#34;http://linkedbrainz.org/sparql?default-graph-uri=&amp;amp;query=SELECT+*+WHERE+%7B%0D%0A%23++%3Fs+foaf%3Aname+%22Missy+Elliott%22+.+%0D%0A%3Chttp%3A%2F%2Fmusicbrainz.org%2Fartist%2Fa0b8cb9e-7532-45fe-a74c-30e7c4009a39%23_%3E+%3Fp+%3Fo+.+%0D%0A%7D%0D%0A&amp;amp;format=text%2Fhtml&amp;amp;timeout=0&amp;amp;debug=on&#34;&gt;this link&lt;/a&gt; has a SPARQL query that retrieves all the MusicBrainz data about Missy Elliott.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2014">2014</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>&#34;Experience in SPARQL a plus&#34;</title>
      <link>https://www.bobdc.com/blog/experience-in-sparql-a-plus/</link>
      <pubDate>Fri, 09 May 2014 09:06:17 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/experience-in-sparql-a-plus/</guid>
      
      
      <description><div>The long tail story of SPARQL success: appearances in job postings.</div><div>&lt;img id=&#34;id116278&#34; src=&#34;https://www.bobdc.com/img/main/sparqlcompanies.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;logos of companies hiring SPARQL talent&#34;/&gt;
&lt;p&gt;When people talk about semantic web or linked data success stories, they usually talk about the big, well-known projects such as those at BestBuy, the BBC, NASA, life sciences companies, the whole vocabulary and taxonomy management industry, and the growing use of DBpedia by a range of companies. I&amp;rsquo;ve always found that a company&amp;rsquo;s job postings provide interesting clues about their potential technology directions, and the increasing references to SPARQL in these postings is another positive trend. These fly further under most radars than the projects mentioned earlier, but their volume adds up to a real long tail, in the &lt;a href=&#34;https://en.wikipedia.org/wiki/Long_tail&#34;&gt;Chris Anderson&lt;/a&gt; sense of the word.&lt;/p&gt;
&lt;p&gt;For a while now, I&amp;rsquo;ve had a &lt;a href=&#34;http://www.indeed.com/jobs?q=sparql=&#34;&gt;saved search for appearances of &amp;ldquo;SPARQL&amp;rdquo;&lt;/a&gt; on the job posting site indeed.com so that I could occasionally mention companies looking for SPARQL experience in the &lt;a href=&#34;https://twitter.com/learningsparql&#34;&gt;Twitter feed&lt;/a&gt; for my book &lt;a href=&#34;http://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt;. I&amp;rsquo;ve mostly limited it to high-profile, brand name companies because there are really too many to mention them all; last Saturday&amp;rsquo;s email from indeed.com listed six positions ranging from Xerox (who has two positions open) to Axius Technologies in Hoboken, who I&amp;rsquo;ve never heard of.&lt;/p&gt;
&lt;p&gt;I thought it would be fun to review the names of the companies I&amp;rsquo;ve tweeted about and make a list of the most well-known ones, and as you can see below, it&amp;rsquo;s an impressive list. Sometimes these companies bury the mention of SPARQL deep down in their descriptions of duties for the Java developer or &amp;ldquo;solution architect&amp;rdquo; that they seek, but others, like Xerox, say right in the job title that they want a &lt;a href=&#34;http://www.indeed.com/viewjob?jk=b35171606a45dbd3&#34;&gt;Database-SPARQL Developer&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;What does this mean? It means that the use of RDF and SPARQL is really getting traction at the grass roots level, as large and small companies move beyond side projects for investigating the technology to projects that require RDF and SPARQL enough to influence their hiring budgets. That&amp;rsquo;s some nice progress.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Accenture                            Morgenthaler Life Science
Amazon                               NBC Entertainment
AstraZeneca                          Nokia
Bank of America                      Northrop Grumman
Boeing                               Orbis
Boston Public Library                Pearson
Children&amp;rsquo;s Hospital of Los Angeles   Pitney Bowes
Columbia University&amp;rsquo;s Lamont-        Reed Elsevier
Doherty Earth Observatory
Comcast                              SAIC
Craig Venter Institute               SAP
Deloitte                             Sears
Elsevier                             Siemens
Eli Lilly                            Socrata
Goldman Sachs                        Sony
Google                               Stanford University
Harvard Medical School               Thomson Reuters
IBM Global Business Services         Turner Broadcasting
JP Morgan Chase                      Vodafone
Lockheed Martin                      Xerox
Los Alamos National Laboratory       Yahoo
Mayo Clinic                          Yale University library
Microsoft                            Zoominfo
&lt;/code&gt;&lt;/pre&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2014">2014</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>RDF lists and SPARQL</title>
      <link>https://www.bobdc.com/blog/rdf-lists-and-sparql/</link>
      <pubDate>Mon, 21 Apr 2014 08:35:38 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdf-lists-and-sparql/</guid>
      
      
<description><div>Not great, but not terrible, and a bit better with SPARQL 1.1</div><div>&lt;blockquote id=&#34;id117366&#34; class=&#34;pullquote&#34;&gt;I have yet to ever say to myself &#34;what I need here is an RDF collection, which I will implement with lots of &lt;code&gt;rdf:first&lt;/code&gt; and &lt;code&gt;rdf:rest&lt;/code&gt; triples!&#34;&lt;/blockquote&gt;
&lt;p&gt;The fact that RDF expresses everything using the same simple three-part data structure has usually been a great strength, but in the case of ordered lists (or &lt;a href=&#34;http://www.w3.org/TR/rdf11-mt/#rdf-collections&#34;&gt;RDF collections&lt;/a&gt;) it&amp;rsquo;s pretty messy. The &lt;a href=&#34;http://www.w3.org/TR/rdf11-mt/#rdf-collections&#34;&gt;specification&lt;/a&gt; defines a LISP-like way of using triples to identify, for each position in a list, what the first member is and what list holds the rest of the members after that. When saying &amp;ldquo;and here are the rest&amp;rdquo; for every member of the list, you don&amp;rsquo;t want to have to come up with a unique URI for each one, so datasets typically use blank nodes for these placeholders, and you can end up with a lot of them.&lt;/p&gt;
&lt;p&gt;Putting all this together, you could represent the list (&amp;ldquo;one&amp;rdquo;, &amp;ldquo;two&amp;rdquo;, &amp;ldquo;three&amp;rdquo;, &amp;ldquo;four&amp;rdquo;, &amp;ldquo;five&amp;rdquo;) with these triples:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; . 
@prefix d:   &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .


d:myList d:contents _:b1 .


_:b1 rdf:first &amp;quot;one&amp;quot; .
_:b1 rdf:rest _:b2 .


_:b2 rdf:first &amp;quot;two&amp;quot; .
_:b2 rdf:rest _:b3 .


_:b3 rdf:first &amp;quot;three&amp;quot; .
_:b3 rdf:rest _:b4 .


_:b4 rdf:first &amp;quot;four&amp;quot; .
_:b4 rdf:rest _:b5 .


_:b5 rdf:first &amp;quot;five&amp;quot; .
_:b5 rdf:rest rdf:nil .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Turtle and SPARQL include syntax that lets you write out a more human-readable version without explicit blank nodes and with the list represented as, well, a list. The following is the equivalent of the example above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .


d:myList d:contents (&amp;quot;one&amp;quot; &amp;quot;two&amp;quot; &amp;quot;three&amp;quot; &amp;quot;four&amp;quot; &amp;quot;five&amp;quot;) 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To do much with these lists, though, especially in SPARQL, you still have to think in terms of &lt;code&gt;rdf:first&lt;/code&gt; and &lt;code&gt;rdf:rest&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To be honest, I&amp;rsquo;ve never found much need to do anything with RDF lists, but after seeing recent references to them—or, in Manu Sporny&amp;rsquo;s case, &lt;a href=&#34;http://manu.sporny.org/2014/json-ld-origins-2/&#34;&gt;the lack of them&lt;/a&gt;—I thought I&amp;rsquo;d play around a bit to see how difficult it was in SPARQL to do four basic list tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Retrieve the Nth member of a list&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Retrieve all the members of a list&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Insert a new member at a specified position&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Delete a member from a specified position&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Update after posting my original entry: Andy Seaborne pointed me to his 2011 blog entry &lt;a href=&#34;http://seaborne.blogspot.co.uk/2011/03/updating-rdf-lists-with-sparql.html&#34;&gt;Updating RDF Lists with SPARQL&lt;/a&gt;, which includes SPARQL queries covering several additional cases. Also, more from Joshua Taylor at &lt;a href=&#34;http://stackoverflow.com/questions/17523804/is-it-possible-to-get-the-position-of-an-element-in-an-rdf-collection-in-sparql/17530689#17530689&#34;&gt;stackoverflow&lt;/a&gt;, thanks to Paula Gearon.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I found that SPARQL 1.1&amp;rsquo;s property paths made it easier to concisely address a specific list member without lots of triple patterns, and of course without SPARQL 1.1 update there would be no insertion or deletion of list members. (I&amp;rsquo;m happy to take suggestions on improving the queries.)&lt;/p&gt;
&lt;h2 id=&#34;id117303&#34;&gt;Retrieving the Nth member&lt;/h2&gt;
&lt;p&gt;The following query retrieves the third member from the list defined above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d:   &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;


SELECT ?item
WHERE {
  d:myList d:contents/rdf:rest{2}/rdf:first ?item
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you think of it as zero-based counting, it&amp;rsquo;s simple: you just plug the number of the member you&amp;rsquo;re interested in into the curly braces. Using ARQ, the query returns this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-----------
| item    |
===========
| &amp;quot;three&amp;quot; |
-----------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But&amp;hellip; after writing and testing that, I remembered that the ability to specify a specific number of repeated property path steps by putting a number between curly braces was dropped in the &lt;a href=&#34;http://www.w3.org/TR/2012/WD-sparql11-query-20120724&#34;&gt;24 July 2012&lt;/a&gt; Working Draft of the SPARQL 1.1 Query spec, so it&amp;rsquo;s not proper SPARQL. It works just the same when you replace &lt;code&gt;rdf:rest{2}&lt;/code&gt; with &lt;code&gt;rdf:rest/rdf:rest&lt;/code&gt;, which is a minor change, but specifying every step like that will be a pain if you want to retrieve the twenty-third member of the list.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve replaced the &lt;code&gt;rdf:rest{2}&lt;/code&gt; that was in the original draft of the insert and delete queries below with &lt;code&gt;rdf:rest/rdf:rest&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;id119694&#34;&gt;Retrieving all the members&lt;/h2&gt;
&lt;p&gt;The following retrieves all of the list items. As an added bonus, ARQ displayed them in order, but that was just luck, and not something to count on, because stored triples have no order.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d:   &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;


SELECT ?item
WHERE {
  d:myList d:contents/rdf:rest*/rdf:first ?item
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, changing the first line to &lt;code&gt;SELECT (count(?item) AS ?items)&lt;/code&gt; would give you the number of members in the list, which is also handy.&lt;/p&gt;
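&lt;p&gt;In full, that counting version looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d:   &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;

SELECT (COUNT(?item) AS ?items)
WHERE {
  d:myList d:contents/rdf:rest*/rdf:first ?item
}
&lt;/code&gt;&lt;/pre&gt;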
&lt;h2 id=&#34;id119712&#34;&gt;Inserting a new member at a specific position&lt;/h2&gt;
&lt;p&gt;The main work is breaking the link where the insertion will take place and then linking the new member in.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d:   &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;
DELETE {
  ?insertionPoint rdf:rest ?rest . 
}
INSERT {
  _:b1 rdf:first &amp;quot;threePointFive&amp;quot; ; rdf:rest ?rest . 
  ?insertionPoint rdf:rest _:b1 . 
}
WHERE {
  d:myList d:contents/rdf:rest/rdf:rest/rdf:first ?item .
  ?insertionPoint rdf:first ?item ; rdf:rest ?rest . 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here is how the dataset looks after using TopBraid Composer to run this query on the data above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
d:myList
  d:contents (
      &amp;quot;one&amp;quot;
      &amp;quot;two&amp;quot;
      &amp;quot;three&amp;quot;
      &amp;quot;threePointFive&amp;quot;
      &amp;quot;four&amp;quot;
      &amp;quot;five&amp;quot;
    ) ;
.
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;id119730&#34;&gt;Deleting a member from a specified position&lt;/h2&gt;
&lt;p&gt;The following deletes the third item from the list. As with the previous query, the main work is breaking the link and creating a new one across the gap where the deleted item was:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d:   &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;
DELETE {
  ?previousMember rdf:rest ?deletionPoint .
  ?deletionPoint rdf:rest ?rest .
  ?s ?p ?item .
  ?item ?p2 ?o .
}
INSERT {
  ?previousMember rdf:rest ?rest .
}
WHERE {
  d:myList d:contents/rdf:rest/rdf:rest/rdf:first ?item .
  ?deletionPoint rdf:first ?item ; rdf:rest ?rest .
  ?previousMember rdf:rest ?deletionPoint .
  ?s ?p ?item .
  OPTIONAL { ?item ?p2 ?o . }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running this update request after running the insertion one before it results in a dataset that looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
d:myList
  d:contents (
      &amp;quot;one&amp;quot;
      &amp;quot;two&amp;quot;
      &amp;quot;threePointFive&amp;quot;
      &amp;quot;four&amp;quot;
      &amp;quot;five&amp;quot;
    ) ;
.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So we know it worked.&lt;/p&gt;
&lt;h2 id=&#34;id119818&#34;&gt;Taking it further&lt;/h2&gt;
&lt;p&gt;I won&amp;rsquo;t remember the syntax of these queries without reviewing them as written here, but I know that I can copy them from here and paste them elsewhere with minor modifications to perform these basic list manipulation goals.&lt;/p&gt;
&lt;p&gt;On the other hand, in the work I&amp;rsquo;ve done with RDF and SPARQL, I have yet to say to myself &amp;ldquo;what I need here is an RDF collection, which I will implement with lots of &lt;code&gt;rdf:first&lt;/code&gt; and &lt;code&gt;rdf:rest&lt;/code&gt; triples!&amp;rdquo; So, the exercise above seems a bit academic. (In fact, my original goals above look like a homework assignment; for extra credit, modify the queries so that the targets can be specified based on their values and not their positions.) If I need to order some instances in RDF, I&amp;rsquo;m more likely to give them some property I can use to sort them. I&amp;rsquo;d love to hear pointers from anyone about places where using &lt;code&gt;rdf:first&lt;/code&gt; and &lt;code&gt;rdf:rest&lt;/code&gt; addressed a data modeling issue better than any alternative would.&lt;/p&gt;
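&lt;p&gt;For example, instead of a collection, I might just model the data like the following (with made-up property names) and then ORDER BY the position values in queries:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .

d:member1 d:itemValue &amp;quot;one&amp;quot; ;   d:position 1 .
d:member2 d:itemValue &amp;quot;two&amp;quot; ;   d:position 2 .
d:member3 d:itemValue &amp;quot;three&amp;quot; ; d:position 3 .
&lt;/code&gt;&lt;/pre&gt;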
&lt;p&gt;Still, the queries above show that maybe RDF collections are not as bad as I originally thought, and that SPARQL 1.1 property paths can make certain tasks more straightforward to achieve.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2014">2014</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Easier querying of strings with RDF 1.1</title>
      <link>https://www.bobdc.com/blog/easier-querying-of-strings-wit/</link>
      <pubDate>Sat, 08 Mar 2014 10:09:38 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/easier-querying-of-strings-wit/</guid>
      
      
      <description><div>In which a spoonful of syntactic sugar makes the string querying go down a bit easier.</div><div>&lt;blockquote id=&#34;id115168&#34; class=&#34;pullquote&#34;&gt;If it looks and walks and talks like a string...&lt;/blockquote&gt;
&lt;p&gt;The &lt;a href=&#34;http://www.w3.org/blog/news/archives/3701&#34;&gt;recent publication of RDF 1.1 specifications&lt;/a&gt; fifteen years and three days after &lt;a href=&#34;http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/&#34;&gt;RDF 1.0&lt;/a&gt; became a Recommendation has not added many new features to RDF, although it has made a few new syntaxes official, and there were no new documents about the SPARQL query language. The new Recommendations did clean up a few odds and ends, and one bit of cleanup officially removes an annoying impediment to straightforward querying of strings.&lt;/p&gt;
&lt;p&gt;Near the beginning of chapter 5 of my book &lt;a href=&#34;http://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt;, I wrote&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Discussions are currently underway at the W3C about potentially doing away with the concept of the plain literal and just making &lt;code&gt;xsd:string&lt;/code&gt; the default datatype, so that &lt;code&gt;&amp;quot;this&amp;quot;&lt;/code&gt; and &lt;code&gt;&amp;quot;this&amp;quot;^^xsd:string&lt;/code&gt; would mean the same thing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When dealing with the difference between simple literals and those that were explicitly cast as &lt;code&gt;xsd:string&lt;/code&gt; values, casting in one direction or the other with the &lt;code&gt;str()&lt;/code&gt; and &lt;code&gt;xsd:string()&lt;/code&gt; functions gave us a workaround, but once all the query engines catch up with RDF 1.1 we won&amp;rsquo;t have to work around this anymore.&lt;/p&gt;
&lt;p&gt;The 2011 document &lt;a href=&#34;http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/LanguageTaggedLiteralDatatypeProposal&#34;&gt;StringLiterals/LanguageTaggedStringDatatypeProposal&lt;/a&gt; describes the problem in more detail, but here&amp;rsquo;s a short example. Imagine that you want to query for the author of one of the works listed in these triples:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix dc:  &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt; .
@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .
@prefix ls:  &amp;lt;http://learningsparql.com/id#&amp;gt; . 


ls:i1001 dc:creator &amp;quot;Jane Austen&amp;quot; ;
         dc:title &amp;quot;Persuasion&amp;quot; .
ls:i1002 dc:creator &amp;quot;Nathaniel Hawthorne&amp;quot; ;
         dc:title &amp;quot;The Scarlet Letter&amp;quot;^^xsd:string .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, let&amp;rsquo;s say you want to know who wrote &amp;ldquo;The Scarlet Letter&amp;rdquo; and you enter this query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dc:  &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt; 
PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; 


SELECT ?author WHERE { 
  ?work  dc:title &amp;quot;The Scarlet Letter&amp;quot; ; 
         dc:creator ?author . 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using a SPARQL engine that was strictly compliant with RDF 1.0, this query wouldn&amp;rsquo;t find anything, because the &lt;code&gt;dc:title&lt;/code&gt; value of &lt;code&gt;ls:i1002&lt;/code&gt; is the typed literal &lt;code&gt;&amp;quot;The Scarlet Letter&amp;quot;^^xsd:string&lt;/code&gt; and not the untyped string that the query was looking for. If a similar query asked for the author of &lt;code&gt;&amp;quot;Persuasion&amp;quot;^^xsd:string&lt;/code&gt;, it wouldn&amp;rsquo;t find anything, because the query is looking for a string that has been explicitly typed as an &lt;code&gt;xsd:string&lt;/code&gt;, and in the data the value is an untyped literal.&lt;/p&gt;
&lt;p&gt;This, in fact, is what happens with release 2.6.4 of Sesame, the version currently on my hard disk. Sesame is now up to &lt;a href=&#34;http://www.openrdf.org/news.jsp#sesame-2.7.10&#34;&gt;2.7.10&lt;/a&gt;, and, seeing the change coming, may have accounted for it by now. ARQ and the TopBraid platform stopped distinguishing between simple literals and typed string literals several years ago.&lt;/p&gt;
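&lt;p&gt;For engines that still make the distinction, the &lt;code&gt;str()&lt;/code&gt; workaround mentioned above lets one query match both the plain and the typed form:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt; 

SELECT ?author WHERE { 
  ?work dc:title ?title ; 
        dc:creator ?author . 
  FILTER (str(?title) = &amp;quot;The Scarlet Letter&amp;quot;)
}
&lt;/code&gt;&lt;/pre&gt;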
&lt;p&gt;Treating the simple literal and typed string versions of a string as the same thing is now officially what&amp;rsquo;s supposed to happen. According to &lt;a href=&#34;http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-Graph-Literal&#34;&gt;section 3.3&lt;/a&gt; of the new RDF 1.1 Concepts and Abstract Syntax Recommendation, &amp;ldquo;Simple literals are syntactic sugar for abstract syntax literals with the datatype IRI &lt;code&gt;http://www.w3.org/2001/XMLSchema#string&lt;/code&gt;&amp;rdquo;. In other words, if it looks and walks and talks like a string, treat it like a string.&lt;/p&gt;
&lt;p&gt;With this update, there&amp;rsquo;s nothing to hold back other SPARQL engines from treating simple literals and typed string literals the same way. This is going to make the development of a lot of SPARQL queries a little bit simpler.&lt;/p&gt;
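&lt;p&gt;The RDF 1.1 rule boils down to a small normalization step at the level of individual literals. Here is an illustrative Python sketch of the idea; it is not code from Sesame, ARQ, or any other engine mentioned above, and the function name is my own:&lt;/p&gt;

```python
# Illustrative sketch of the RDF 1.1 rule: a simple literal is syntactic
# sugar for a literal typed as xsd:string, so an engine can normalize every
# literal to a (lexical form, datatype IRI) pair before comparing them.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

def normalize_literal(lexical, datatype=None):
    """Return a comparable (lexical, datatype) pair, defaulting to xsd:string."""
    return (lexical, datatype if datatype is not None else XSD_STRING)

# The untyped literal in a query now matches the typed literal in the data:
assert normalize_literal("The Scarlet Letter") == \
       normalize_literal("The Scarlet Letter", XSD_STRING)
```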
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2014">2014</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Querying my own MP3, image, and other file metadata with SPARQL</title>
      <link>https://www.bobdc.com/blog/querying-my-own-mp3-image-and/</link>
      <pubDate>Sun, 09 Feb 2014 11:31:20 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/querying-my-own-mp3-image-and/</guid>
      
      
      <description><div>And a standard part of Ubuntu.</div><div>&lt;p&gt;Ubuntu has a utility called &lt;a href=&#34;https://wiki.ubuntu.com/Tracker&#34;&gt;Tracker&lt;/a&gt; that makes it easy to search your hard disk, a bit like the old &lt;a href=&#34;https://en.wikipedia.org/wiki/Google_Desktop&#34;&gt;Google Desktop&lt;/a&gt; with a few extra features. One extra feature ranks among the coolest SPARQL applications I&amp;rsquo;ve ever seen: the ability to execute SPARQL queries against data extracted from files on your hard disk.&lt;/p&gt;
&lt;img id=&#34;id115177&#34; src=&#34;https://www.bobdc.com/img/main/IMG_5257.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Anarchy paper lantern&#34; width=&#34;220&#34;/&gt;
&lt;p&gt;To install it, I did a &lt;code&gt;sudo apt-get install&lt;/code&gt; of &lt;code&gt;tracker-gui&lt;/code&gt; to get the base parts of tracker and then did a similar installation of &lt;code&gt;tracker-utils&lt;/code&gt; to get the SPARQL query utility. Next, I added the Ubuntu applications &amp;ldquo;Desktop search&amp;rdquo; and &amp;ldquo;Search and indexing&amp;rdquo; and used the latter to index 94 GB of MP3s and some image files. The indexing took a few hours. (&lt;code&gt;tracker-control -S&lt;/code&gt; was a handy command for checking on the indexing progress.) The worldofgnome.org page &lt;a href=&#34;http://worldofgnome.org/indexing-preferences-in-gnome-3-8/&#34;&gt;Indexing preferences in GNOME 3.8&lt;/a&gt; was helpful for understanding the indexing options.&lt;/p&gt;
&lt;p&gt;Once the file metadata is indexed, the &lt;code&gt;tracker-sparql&lt;/code&gt; command-line utility lets you query it. For example, the following runs the query stored in bea.spq against the metadata:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tracker-sparql -f bea.spq
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(The &lt;a href=&#34;http://manpages.ubuntu.com/manpages/natty/man1/tracker-sparql.1.html&#34;&gt;tracker-sparql help&lt;/a&gt; said that I was also supposed to include &lt;code&gt;-q&lt;/code&gt; to show that it was a SPARQL query, but it seemed to work fine without this command line switch.) The following shows bea.spq, a query for artist names that begin with &amp;ldquo;Bea&amp;rdquo;, allowing for an optional &amp;ldquo;The &amp;rdquo; before that:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX nmm: &amp;lt;http://www.tracker-project.org/temp/nmm#&amp;gt;
SELECT DISTINCT ?artistName WHERE {
  ?artist a nmm:Artist .
  ?artist nmm:artistName ?artistName .
  FILTER(regex(?artistName,&amp;quot;^(The )?Bea&amp;quot;))
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here is the output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Results:
  Beachwood Sparks
  Beastie Boys/Beck/Dust Brothers
  Beastie Boys/Dust Brothers
  Beatles
  The Beach Boys
  The Beastie Boys
  The Beatles
  The Beatniks
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One frustrating thing about tracker-sparql is that it rejects certain queries because, as it tells us, &amp;ldquo;Unrestricted predicate variables not supported.&amp;rdquo; In my experience, this meant that a triple pattern couldn&amp;rsquo;t have a variable in the predicate position if it also had one in the subject position. For example, I know that the Dust Brothers have worked with the Beastie Boys and with Beck separately, but I&amp;rsquo;d never heard of all of them working together, and I couldn&amp;rsquo;t enter a query to find out which work was created by the artist with a nmm:artistName value of &amp;ldquo;Beastie Boys/Beck/Dust Brothers&amp;rdquo;. I did try dc:contributor, nmm:performer, and some other properties used to connect an artist to a work, but had no luck. (My guess: it was some sort of remix that combined a few Dust Brothers works.)&lt;/p&gt;
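&lt;p&gt;The following sketch shows the general shape of the kind of query that gets rejected; the variable &lt;code&gt;?p&lt;/code&gt; in the predicate position, combined with the variable subject &lt;code&gt;?work&lt;/code&gt;, is the problem (this illustrates the pattern, not the exact query I ran):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX nmm: &amp;lt;http://www.tracker-project.org/temp/nmm#&amp;gt;
SELECT ?work WHERE {
  ?artist nmm:artistName &amp;quot;Beastie Boys/Beck/Dust Brothers&amp;quot; .
  ?work ?p ?artist .
}
&lt;/code&gt;&lt;/pre&gt;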
&lt;p&gt;This was a fun query, asking what values of &amp;ldquo;genre&amp;rdquo; were stored in my MP3s:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?genre WHERE
{
  ?work nfo:genre ?genre
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The results:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Results:
  Jazz
  Rock
  Classical
  New Wave
  Avantgarde
  Pop
  Salsa
  Blues
  Soundtrack
  RETRO SWING
  Swing
  Country
  Other
  Sound Clip
  jazz
  Latin
  Lo-Fi
  Rock &amp;amp; Roll
  Hip-Hop
  Techno-Industrial
  Euro-Techno
  Booty Bass
  Alternative
  Reggae
  Indian
  Podcast
  Electronic
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This can lead to a real rabbit hole of additional queries as I wonder &amp;ldquo;what do I have in &lt;em&gt;that&lt;/em&gt; category?&amp;rdquo; but I&amp;rsquo;ll spare you that part.&lt;/p&gt;
&lt;p&gt;tracker-sparql has a few command line options that are shortcuts to common queries for exploring a dataset. For example, &lt;code&gt;-c&lt;/code&gt; lists classes, and gave me a list of 230. A query for distinct &lt;code&gt;rdf:type&lt;/code&gt; values showed only 67 being used in my file metadata, so I assume that &lt;code&gt;-c&lt;/code&gt; refers to classes that are declared in an internal schema. The tracker-stats utility shows how many instances each class has. (The &amp;ldquo;SEE ALSO&amp;rdquo; section of the &lt;a href=&#34;http://manpages.ubuntu.com/manpages/oneiric/man1/tracker-store.1.html&#34;&gt;help page for tracker-store&lt;/a&gt; had the best list I could find of the various tracker utilities.)&lt;/p&gt;
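&lt;p&gt;A query along these lines is enough for that &lt;code&gt;rdf:type&lt;/code&gt; comparison:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?type WHERE {
  ?resource a ?type .
}
&lt;/code&gt;&lt;/pre&gt;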
&lt;p&gt;The tracker indexer also pulls fairly typical metadata out of image files. Unfortunately, it doesn&amp;rsquo;t pull latitude and longitude data out when present, but it does let you add and query tag values in images. I played with this using the image file above, which shows a paper lantern with the anarchy symbol that I saw in San Francisco&amp;rsquo;s Chinatown during the 2010 Semantic Technologies conference. Using the &lt;code&gt;tracker-tag&lt;/code&gt; utility, I added a tag to the image like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tracker-tag --add=anarchy /my/path/semtech/2010/pics/IMG_5257.jpg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This added the following triples to the dataset:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix nao:  &amp;lt;http://www.semanticdesktop.org/ontologies/2007/08/15/nao#&amp;gt; . 
@prefix tr:   &amp;lt;http://www.tracker-project.org/ontologies/tracker#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; . 
@prefix rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; . 
@prefix nao:  &amp;lt;http://www.semanticdesktop.org/ontologies/2007/08/15/nao#&amp;gt; . 


&amp;lt;urn:uuid:5aa32bbc-7f08-da08-3bbd-8ae6650411fb&amp;gt; nao:hasTag  
  &amp;lt;urn:uuid:a49c693c-d439-529b-8e27-296d589e905c&amp;gt; . 


&amp;lt;urn:uuid:a49c693c-d439-529b-8e27-296d589e905c&amp;gt;
  tr:added &amp;quot;2014-01-18T22:31:44Z&amp;quot; ;
  tr:modified 7170 ;
  rdf:type rdfs:Resource ;
  rdf:type  nao:Tag ;
  nao:prefLabel &amp;quot;anarchy&amp;quot; . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first triple says that the image resource has a particular tag, and the remaining triples tell us about that tag. It was nice to see that the tag is a resource and not just a string, so it can be renamed without losing its relationships with tagged resources. It also means that the tag itself can have additional metadata assigned to it such as &lt;code&gt;skos:broader&lt;/code&gt; values to create a taxonomy hierarchy. And of course, there are all kinds of possibilities for SPARQL queries about what is tagged with what. (It would be fun to pull a set of nao:Tag resource triples into &lt;a href=&#34;http://www.topquadrant.com/products/topbraid-enterprise-vocabulary-net/&#34;&gt;TopBraid EVN&lt;/a&gt; and really turn them into a proper SKOS taxonomy.)&lt;/p&gt;
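&lt;p&gt;For example, a query along these lines (a sketch built from the NAO properties shown above, not a query taken from the Tracker documentation) would list everything tagged &amp;ldquo;anarchy&amp;rdquo;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX nao: &amp;lt;http://www.semanticdesktop.org/ontologies/2007/08/15/nao#&amp;gt;
SELECT ?resource WHERE {
  ?resource nao:hasTag ?tag .
  ?tag nao:prefLabel &amp;quot;anarchy&amp;quot; .
}
&lt;/code&gt;&lt;/pre&gt;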
&lt;p&gt;A few random closing notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I tried a few SPARQL 1.1 features like BIND and contains() with no luck, but the tracker-sparql help page does show that the count() function and SPARQL UPDATE are supported. I tried adding a triple with an UPDATE request, but I didn&amp;rsquo;t get it to work. If it was possible to add arbitrary triples about existing resources, we could store additional data about them such as the &lt;code&gt;skos:broader&lt;/code&gt; values mentioned above and triples about the latitude and longitude where the picture was taken, which &lt;a href=&#34;http://www.sno.phy.queensu.ca/~phil/exiftool/&#34;&gt;ExifTool&lt;/a&gt; can extract from image files. Apache Tika, which I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/pull-rdf-metadata-out-of-jpegs&#34;&gt;written about here before&lt;/a&gt;, would also be great to throw into the mix.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It&amp;rsquo;s interesting that the resources were identified with URNs instead of URLs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Adrian Perez blog post &lt;a href=&#34;http://perezdecastro.org/2012/some-tracker-sparql-bits.html&#34;&gt;Some Tracker + SPARQL bits&lt;/a&gt; has some good tips, and it points to two blog entries by Adrien Bustany that describe some nice predicate functions built into Tracker&amp;rsquo;s SPARQL engine.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It was nice to see the &lt;a href=&#34;http://www.semanticdesktop.org/ontologies/&#34;&gt;Nepomuk&lt;/a&gt; ontology used here. Talk about a semantic desktop! (Since writing the first draft of this, I have learned that the next generation of Nepomuk is &lt;a href=&#34;http://community.kde.org/Baloo#Why_change_Nepomuk.3F&#34;&gt;not using RDF&lt;/a&gt;, which I was sorry to hear.) It would be nice to see a schema for the Tracker-specific classes and properties; the &lt;a href=&#34;http://www.tracker-project.org/ontologies&#34;&gt;http://www.tracker-project.org/ontologies&lt;/a&gt; base URI used for some of the namespaces currently doesn&amp;rsquo;t go anywhere. (If someone can point me to such a schema, I&amp;rsquo;d be happy to update this.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The metadata that the indexer pulled from a PDF on my hard disk included the complete text of the PDF stored using the &lt;code&gt;nie:plainTextContent&lt;/code&gt; property. That could be very useful for searches and text extraction.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If I limited myself to SPARQL queries about my own MP3s, this dataset could keep me busy for hours. Assigning, querying, and curating tags would also be a lot of fun to play with; while I assigned one to a JPEG file above, they can be assigned to any resource. For example, imagine running some text analytics on &lt;code&gt;nie:plainTextContent&lt;/code&gt; values to come up with tag values to assign to that PDF. And, if music files have an artist property and PDFs have a plainTextContent property, there are probably plenty of other properties that are specific to certain file types and reveal interesting things about them, especially when queried with SPARQL to find patterns among the values of the files in your own collection.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2014">2014</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Storing and querying RDF in Neo4j</title>
      <link>https://www.bobdc.com/blog/storing-and-querying-rdf-in-ne/</link>
      <pubDate>Tue, 07 Jan 2014 08:56:28 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/storing-and-querying-rdf-in-ne/</guid>
      
      
<description><div>Hands-on experience with another NoSQL database manager.</div><div>&lt;p&gt;In the &lt;a href=&#34;https://en.wikipedia.org/wiki/NoSQL#Taxonomy&#34;&gt;typical classification&lt;/a&gt; of NoSQL databases, the &amp;ldquo;graph&amp;rdquo; category is one that was not covered in the &amp;ldquo;NoSQL Databases for RDF: An Empirical Evaluation&amp;rdquo; paper that I described in my &lt;a href=&#34;https://www.bobdc.com/blog/storing-and-querying-rdf-in-no&#34;&gt;last blog entry&lt;/a&gt;. (Several were &amp;ldquo;column-oriented&amp;rdquo; databases, which I always thought sounded like triple stores—the &amp;ldquo;table&amp;rdquo; part of the way people describe these always sounded to me like a stretched metaphor designed to appeal to relational database developers.) A triplestore is a graph database, and Brazilian software developer &lt;a href=&#34;http://www.linkedin.com/pub/paulo-roberto-costa-leite/24/154/749&#34;&gt;Paulo Roberto Costa Leite&lt;/a&gt; has developed a &lt;a href=&#34;http://neo4j-contrib.github.io/sparql-plugin/&#34;&gt;SPARQL plugin&lt;/a&gt; for Neo4j, the most popular of the NoSQL graph databases. This gave me enough incentive to install Neo4j and play with it and the SPARQL plugin.&lt;/p&gt;
&lt;blockquote id=&#34;id118461&#34; class=&#34;pullquote&#34;&gt;While this plugin has a ways to go before people can get serious work done with it, it&#39;s still a great start and fun to play with.&lt;/blockquote&gt;
&lt;p&gt;To quote Neo4j&amp;rsquo;s &lt;a href=&#34;http://www.neo4j.org/&#34;&gt;home page&lt;/a&gt;, it&amp;rsquo;s &amp;ldquo;a robust (fully ACID) transactional property graph database. Due to its graph data model, Neo4j is highly agile and blazing fast. For connected data operations, Neo4j runs a thousand times faster than relational databases.&amp;rdquo; According to the popular NoSQL introduction &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1934356921/bobducharmeA/&#34;&gt;Seven Databases in Seven Weeks&lt;/a&gt;, Neo4j &amp;ldquo;can store tens of billions of nodes and as many edges.&amp;rdquo; The ability to distribute a database across a cluster is another thing that makes Neo4j popular.&lt;/p&gt;
&lt;p&gt;From what I can tell, at least on Windows, you don&amp;rsquo;t want the installer version of Neo4j on its &lt;a href=&#34;http://www.neo4j.org/download&#34;&gt;download&lt;/a&gt; page, because that doesn&amp;rsquo;t create a plugins directory where you can add the SPARQL one, so get the zip version. I got release 1.9.5 of that one.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t know much about Neo4j except some basics that I read in the &amp;ldquo;Seven Databases&amp;rdquo; book, so please forgive any basic misunderstandings or big deviations from standard Neo4j practices. Once I installed it and started it up with bin\neo4j.bat, I sent a browser to the main screen at http://localhost:7474 to make sure that I had installed it properly. This all worked fine; installation was really just a matter of unzipping, once I determined the right distribution to unzip.&lt;/p&gt;
&lt;p&gt;To install the SPARQL plugin, I downloaded the distribution zip file from its &lt;a href=&#34;https://github.com/paulrocost/sparqlPlugin-Neo4j&#34;&gt;github page&lt;/a&gt; (not to be confused with the project&amp;rsquo;s &lt;a href=&#34;https://github.com/neo4j-contrib/sparql-plugin&#34;&gt;github page&lt;/a&gt;, which has the source), unzipped that inside of the neo4j-community-1.9.5\plugins folder, and restarted neo4j (that is, I shut it down with a ^C in the terminal window that it created when I started it up, then started it again the same way I did originally).&lt;/p&gt;
&lt;h2 id=&#34;id118284&#34;&gt;Inserting data&lt;/h2&gt;
&lt;p&gt;I like to use &lt;a href=&#34;http://curl.haxx.se/&#34;&gt;curl&lt;/a&gt; to test RESTful (or REST-ish) interfaces, and found that I had better luck interacting with Neo4j by using curl from the &lt;a href=&#34;http://www.cygwin.com/&#34;&gt;cygwin&lt;/a&gt; sh shell under Windows than using it with the native Windows command line prompt. Following some examples in the SPARQL plugin&amp;rsquo;s documentation, I tried the following, which successfully inserted some data. (Assume that all curl command lines shown here were actually executed as a single line.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST -H Content-Type:application/json -H Accept:application/json 
  --data-binary @sampledata.txt 
  http://localhost:7474/db/data/ext/SPARQLPlugin/graphdb/insert_quad 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The sampledata.txt file named in that command line had this in it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{ 
  &amp;quot;s&amp;quot; : &amp;quot;http://neo4j.org#jim&amp;quot;,  
  &amp;quot;p&amp;quot; : &amp;quot;http://neo4j.org#knows&amp;quot;,  
  &amp;quot;o&amp;quot; : &amp;quot;http://neo4j.org#mitch&amp;quot;,  
  &amp;quot;c&amp;quot; : &amp;quot;http://neo4j.org&amp;quot; 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that it&amp;rsquo;s inserting a quad, not a triple, with &amp;ldquo;c&amp;rdquo; being a named graph. I&amp;rsquo;m guessing that the &amp;ldquo;c&amp;rdquo; stands for &amp;ldquo;context&amp;rdquo; because the plugin uses a lot of &lt;a href=&#34;http://www.openrdf.org/&#34;&gt;Sesame&lt;/a&gt; jar files.&lt;/p&gt;
&lt;p&gt;The following successfully performed a similar insertion, with the quad specified directly on the command line:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST -H Content-Type:application/json -H 
   Accept:application/json 
   http://localhost:7474/db/data/ext/SPARQLPlugin/graphdb/insert_quad  
   -d &#39;{  &amp;quot;s&amp;quot; : &amp;quot;http://neo4j.org#joe&amp;quot;,  &amp;quot;p&amp;quot; : &amp;quot;http://neo4j.org#knows&amp;quot;,  
   &amp;quot;o&amp;quot; : &amp;quot;http://neo4j.org#sara&amp;quot;,  &amp;quot;c&amp;quot; : &amp;quot;http://neo4j.org&amp;quot;}&#39;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This worked to insert a literal string,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST -H Content-Type:application/json -H Accept:application/json 
  http://localhost:7474/db/data/ext/SPARQLPlugin/graphdb/insert_quad -d 
  &#39;{  &amp;quot;s&amp;quot; : &amp;quot;http://neo4j.org#joe&amp;quot;,  &amp;quot;p&amp;quot; : &amp;quot;http://learningsparql.com/ns/data#lastName&amp;quot;, 
  &amp;quot;o&amp;quot; : &amp;quot;\&amp;quot;Schmoe\&amp;quot;&amp;quot;,  &amp;quot;c&amp;quot; : &amp;quot;http://learningsparql.com/ns/data#test1/&amp;quot;}&#39;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and this inserted a value with an explicit type:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST -H Content-Type:application/json -H Accept:application/json 
  http://localhost:7474/db/data/ext/SPARQLPlugin/graphdb/insert_quad  -d 
  &#39;{  &amp;quot;s&amp;quot; : &amp;quot;http://neo4j.org#joe&amp;quot;,  &amp;quot;p&amp;quot; : &amp;quot;http://learningsparql.com/ns/data#hireDate&amp;quot;, 
  &amp;quot;o&amp;quot; : &amp;quot;\&amp;quot;2012-11-09\&amp;quot;^^&amp;lt;http://www.w3.org/2001/XMLSchema#date&amp;gt;&amp;quot;,  &amp;quot;c&amp;quot; : 
  &amp;quot;http://learningsparql.com/ns/data#test1/&amp;quot;}&#39;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;id120764&#34;&gt;Querying&lt;/h2&gt;
&lt;p&gt;With this SPARQL query stored in neo4jquery1.json,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &amp;quot;query&amp;quot; : &amp;quot;SELECT * WHERE { ?s &amp;lt;http://neo4j.org#knows&amp;gt; ?o .}&amp;quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I entered this at the cygwin sh prompt,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST -H Content-Type:application/json -H Accept:application/json  
   --data-binary @neo4jquery1.json 
   http://localhost:7474/db/data/ext/SPARQLPlugin/graphdb/execute_sparql
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and got this result:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[ {
  &amp;quot;s&amp;quot; : &amp;quot;http://neo4j.org#jane&amp;quot;,
  &amp;quot;o&amp;quot; : &amp;quot;http://neo4j.org#jim&amp;quot;
}, {
  &amp;quot;s&amp;quot; : &amp;quot;http://neo4j.org#joe&amp;quot;,
  &amp;quot;o&amp;quot; : &amp;quot;http://neo4j.org#sara&amp;quot;
} ]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I found it best to execute queries from a stored file like that, because although JSON won&amp;rsquo;t let me spread a string (in this case, the query itself) across multiple lines, it was still a little easier than packing it into a curl command line with the other parameters.&lt;/p&gt;
&lt;p&gt;A similar command line executed this query, which specifies the named graph whose triples should be returned:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &amp;quot;query&amp;quot; : &amp;quot;SELECT * WHERE { GRAPH &amp;lt;http://neo4j.org&amp;gt;  {?s ?p ?o }  }&amp;quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I tried a few random SPARQL 1.1 features such as BIND and COUNT, and they worked fine. Because most of the Sesame JAR files say &amp;ldquo;2.6.10,&amp;rdquo; which is only a little more than a year old, I&amp;rsquo;m guessing that the support of the SPARQL 1.1 query language is pretty complete.&lt;/p&gt;
&lt;p&gt;The plugin currently does not support the &lt;a href=&#34;http://www.w3.org/TR/sparql11-update/&#34;&gt;SPARQL UPDATE&lt;/a&gt; language. Deleting the data inserted above would require the use of &lt;a href=&#34;http://components.neo4j.org/neo4j-server/1.4/rest.html&#34;&gt;native Neo4j commands&lt;/a&gt;, which would require you to know the internal Neo4j identifiers used for the nodes and edges that represent RDF resources and predicates. Perhaps a bit ironically to RDF people, these identifiers are URIs, but they will rarely be universally unique; for example, my URI &lt;a href=&#34;http://neo4j.org&#34;&gt;http://neo4j.org&lt;/a&gt;#mitch was actually stored with the URI http://localhost:7474/db/data/node/7, a URI that very likely refers to other resources on other Neo4j installations that use the default system name and port number of localhost:7474. (I assume that much of Paulo&amp;rsquo;s work in building the query plugin was mapping from the SPARQL URI references to the internal Neo4j references.)&lt;/p&gt;
&lt;h2 id=&#34;id120927&#34;&gt;The plugin, JSON, and the future&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ve probably noticed that all input and output for this SPARQL plugin is JSON: you send data and queries to Neo4j embedded in JSON, and your results come back as JSON, though not in &lt;a href=&#34;http://www.w3.org/TR/sparql11-results-json/&#34;&gt;the W3C SPARQL Query Results JSON Format&lt;/a&gt;. This use of JSON isn&amp;rsquo;t specific to Paulo&amp;rsquo;s plugin, but a default of the &lt;a href=&#34;http://docs.neo4j.org/chunked/stable/rest-api.html&#34;&gt;Neo4j REST API&lt;/a&gt;, which currently provides the context for all SPARQL-oriented communication with a Neo4j server. While the plugin&amp;rsquo;s &lt;a href=&#34;http://neo4j-contrib.github.io/sparql-plugin/&#34;&gt;documentation&lt;/a&gt; refers to an endpoint, it&amp;rsquo;s not a SPARQL endpoint in the sense of supporting the &lt;a href=&#34;http://www.w3.org/TR/sparql11-protocol/&#34;&gt;SPARQL Protocol&lt;/a&gt; (the &amp;ldquo;P&amp;rdquo; in &amp;ldquo;SPARQL&amp;rdquo;); for now, it&amp;rsquo;s an endpoint with its own interface for accepting SPARQL queries and delivering results.&lt;/p&gt;
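&lt;p&gt;Converting the plugin&amp;rsquo;s output to the W3C result format wouldn&amp;rsquo;t be hard. Here is a hypothetical Python sketch of such a conversion (the function name is mine); it assumes that every binding value is an IRI, because the plugin&amp;rsquo;s JSON, as shown above, doesn&amp;rsquo;t say whether a value is an IRI or a literal:&lt;/p&gt;

```python
# Hypothetical sketch: convert the plugin's JSON array of variable bindings
# into the W3C SPARQL Query Results JSON Format. Assumes every value is an
# IRI, since the plugin's output doesn't distinguish IRIs from literals.
def to_w3c_results(rows):
    variables = sorted({var for row in rows for var in row})
    bindings = [
        {var: {"type": "uri", "value": value} for var, value in row.items()}
        for row in rows
    ]
    return {"head": {"vars": variables}, "results": {"bindings": bindings}}

# The first result row from the query above:
plugin_rows = [{"s": "http://neo4j.org#jane", "o": "http://neo4j.org#jim"}]
w3c = to_w3c_results(plugin_rows)
```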
&lt;p&gt;The insert_quad and execute_sparql methods shown above are currently the only two that the plugin offers, and as you might guess from the singular form of &amp;ldquo;insert_quad,&amp;rdquo; it can only insert one at a time. For now, inserting multiple quads will mean either multiple calls to this method or digging down into the lower levels of the plugin.&lt;/p&gt;
&lt;p&gt;So, while this plugin has a ways to go before people can get serious work done with it, it&amp;rsquo;s still a great start and fun to play with. I don&amp;rsquo;t want to finish this with a discussion of the RDF features that it&amp;rsquo;s missing, but instead with some mentions of the cool Neo4j things that would be great to try with RDF. I&amp;rsquo;ve already mentioned the ease with which data can apparently be distributed across clusters; another is Neo4j&amp;rsquo;s built-in &lt;a href=&#34;http://docs.neo4j.org/chunked/stable/rest-api-graph-algos.html&#34;&gt;shortest path&lt;/a&gt; algorithm(s), something I&amp;rsquo;ve always wanted for an RDF store.&lt;/p&gt;
&lt;p&gt;I look forward to Paulo&amp;rsquo;s future work, and I&amp;rsquo;d like to thank him for helping this Neo4j neophyte get this far with Neo4j and with his plugin.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;6/23/14 update: I have just discovered Michael B&amp;rsquo;s &lt;a href=&#34;http://michaelbloggs.blogspot.de/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html&#34;&gt;Importing ttl (Turtle) ontologies in Neo4j&lt;/a&gt; from over a year ago. It describes things mostly in terms of Java source code, so I&amp;rsquo;m not about to jump on it and try it out right away, but it will make a good resource for people interested in using RDF in Neo4j. And, the fact that he&amp;rsquo;s an IBM employee makes it more interesting.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;1/19/2015 update: You may also be interested in my recent &lt;a href=&#34;https://twitter.com/jimwebber/status/555332116352098305&#34;&gt;Twitter exchange&lt;/a&gt; with Neo4j&amp;rsquo;s chief scientist after he said that Neo4j supports SPARQL and pointed to Paulo&amp;rsquo;s library and this blog entry.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2014">2014</category>
      
      <category domain="https://www.bobdc.com//categories/nosql">NoSQL</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Storing (and querying) RDF in NoSQL database managers</title>
      <link>https://www.bobdc.com/blog/storing-and-querying-rdf-in-no/</link>
      <pubDate>Wed, 04 Dec 2013 08:36:31 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/storing-and-querying-rdf-in-no/</guid>
      
      
      <description><div>Interesting progress, carefully measured.</div><div>&lt;blockquote id=&#34;id134844&#34; class=&#34;pullquote&#34;&gt; &#34;...we are confident that NoSQL databases will present an ever growing opportunity to store and manage RDF data in the cloud.&#34;&lt;/blockquote&gt;
&lt;p&gt;A little over a year ago, in a blog entry titled &lt;a href=&#34;https://www.bobdc.com/blog/sparql-and-big-data-and-nosql&#34;&gt;SPARQL and Big Data (and NoSQL)&lt;/a&gt;, I wrote this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What I&amp;rsquo;d love to see, and have heard about tentative steps toward, would be SPARQL endpoints for some of these NoSQL database systems. The &lt;a href=&#34;http://d2rq.org/&#34;&gt;D2RQ&lt;/a&gt; and &lt;a href=&#34;http://www.w3.org/TR/r2rml/&#34;&gt;R2RML&lt;/a&gt; work have accomplished things that should be easier for graph-oriented NoSQL databases like Neo4J and, if I understand the quote above [from Edd Dumbill&amp;rsquo;s &lt;a href=&#34;http://shop.oreilly.com/product/0636920025559.do&#34;&gt;Planning for Big Data&lt;/a&gt;] correctly, for column-oriented NoSQL databases as well. Google searches on SPARQL and either Hadoop, Neo4J, HBase, or Cassandra show that some people have been discussing and even doing a bit of coding on several of these.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Discussions and bits of coding are nice, but I recently found something much better in a paper titled &amp;ldquo;NoSQL Databases for RDF: An Empirical Evaluation&amp;rdquo; (&lt;a href=&#34;http://ribs.csres.utexas.edu/nosqlrdf/nosqlrdf_iswc2013.pdf&#34;&gt;pdf&lt;/a&gt;)—a methodical comparison of the storage and querying of RDF in different NoSQL systems. This &lt;a href=&#34;http://iswc2013.semanticweb.org/&#34;&gt;ISWC 2013&lt;/a&gt; paper, written by ten authors from four universities in four countries, included this in its abstract:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This work is, to the best of our knowledge, the first systematic attempt at characterizing and comparing NoSQL stores for RDF processing. In the following, we describe four different NoSQL stores and compare their key characteristics when running standard RDF benchmarks on a popular cloud infrastructure using both single-machine and distributed deployments.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The paper then describes the storage and querying of RDF using &lt;a href=&#34;http://hbase.apache.org/&#34;&gt;HBase&lt;/a&gt; with &lt;a href=&#34;http://jena.apache.org/&#34;&gt;Jena&lt;/a&gt; for querying, HBase with &lt;a href=&#34;http://hive.apache.org/&#34;&gt;Hive&lt;/a&gt; as the query engine (with Jena&amp;rsquo;s ARQ to parse the queries before converting them to HiveQL), &lt;a href=&#34;https://code.google.com/p/cumulusrdf/&#34;&gt;CumulusRDF&lt;/a&gt; (&lt;a href=&#34;http://cassandra.apache.org/&#34;&gt;Cassandra&lt;/a&gt; with Sesame), and &lt;a href=&#34;http://www.couchbase.com/&#34;&gt;Couchbase&lt;/a&gt;. The study also includes the &lt;a href=&#34;http://4store.org/&#34;&gt;4store&lt;/a&gt; triplestore so that the authors could compare their NoSQL storage benchmarks with those of a native RDF triplestore. (As you might guess from its name, 4store is actually a quad store—and speaking of quads, while adding links to this paragraph, I found that fully four technologies listed here are their own separate Apache projects.)&lt;/p&gt;
&lt;p&gt;The benchmarks and testing environments are all rigorously documented in the paper. You can read these details yourself, so I&amp;rsquo;ll skip ahead to the end of their conclusion: &amp;ldquo;we are confident that NoSQL databases will present an ever growing opportunity to store and manage RDF data in the cloud.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t recognize many of the authors&amp;rsquo; names, but I certainly recognized the name of &lt;a href=&#34;http://www.juansequeda.com/&#34;&gt;Juan Sequeda&lt;/a&gt; of the University of Texas and Capsenta. His PhD work at UT that led to Capsenta&amp;rsquo;s Ultrawrap product makes Juan about the most qualified person I can think of to perform this kind of methodical review of the potential value of NoSQL database managers for storing and querying RDF, so I&amp;rsquo;m glad that he and his co-authors on the paper are doing this. Additional good news is that they&amp;rsquo;ve made &amp;ldquo;all results, as well as [their] source code, how-to guides, and EC2 images to rerun [their] experiments&amp;rdquo; available on their project&amp;rsquo;s &lt;a href=&#34;http://ribs.csres.utexas.edu/nosqlrdf&#34;&gt;web site&lt;/a&gt; for others to build on, and it looks like they have continued that work since publishing the paper. I look forward to further reports from them as efforts to store RDF in NoSQL database managers move forward.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>Using SPARQL queries from native Android apps</title>
      <link>https://www.bobdc.com/blog/using-sparql-queries-from-nati/</link>
      <pubDate>Sat, 09 Nov 2013 09:13:54 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/using-sparql-queries-from-nati/</guid>
      
      
      <description><div>With a free, kid-friendly development kit.</div><div>&lt;img id=&#34;id123906&#34; src=&#34;https://www.bobdc.com/img/main/appinventorrdf.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;App Inventor and RDF logos&#34;/&gt;
&lt;p&gt;Google once developed a simple environment called Google App Inventor for easy development of native Android apps. After they announced that they would discontinue support and open source it in 2011, the MIT Center for Mobile Learning picked it up, so it&amp;rsquo;s now the &lt;a href=&#34;http://appinventor.mit.edu/&#34;&gt;MIT App Inventor&lt;/a&gt;. (Its &lt;a href=&#34;http://en.wikipedia.org/wiki/App_Inventor_for_Android&#34;&gt;Wikipedia page&lt;/a&gt; has a nice summary of its history.) I played with it a bit and found it pretty easy to build apps for my phone, even an app that used an RDFS model to drive a user interface. My simple experiments only scratched the surface of what was possible using SPARQL and RDF as part of a mobile app, and much more sophisticated work is on the way from our friends at the &lt;a href=&#34;http://tw.rpi.edu/&#34;&gt;Tetherless World Constellation&lt;/a&gt; group.&lt;/p&gt;
&lt;p&gt;To get a flavor of how application development works with this toolkit, flip through some of the &lt;a href=&#34;http://appinventor.mit.edu/explore/tutorials.html&#34;&gt;tutorials&lt;/a&gt;, especially the &lt;a href=&#34;http://appinventor.mit.edu/explore/content/hellopurr.html&#34;&gt;Hello Purr&lt;/a&gt; one, where they recommend that you start. After installing an App Inventor tool on your phone, you log in with a Google ID to a web-based application that lets you design your screens by dragging in and configuring various components. From there, you download a Java application called the blocks editor, where you configure programming logic. With a wi-fi connection from the machine running those to your phone, you can try out your app on your phone as you work on it with the screen designer and blocks editor. The &lt;a href=&#34;http://beta.appinventor.mit.edu/learn/reference/&#34;&gt;documentation&lt;/a&gt; tells you more about the available components and blocks.&lt;/p&gt;
&lt;p&gt;The screen designer lets you add pick lists to your app, and because their choices can be configured dynamically (and because an App Inventor web component lets you do HTTP GETs, and because there are plenty of string manipulation functions), I wrote an app that sets the pick list choices with the results of a SPARQL query. It&amp;rsquo;s a nice example of model-driven development—something we talk about a lot at TopQuadrant—in which an application&amp;rsquo;s behavior is driven by a model stored in an RDFS schema or an OWL ontology.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve written more about how my little phone app works below, but first I wanted to say a little more about the Tetherless World Constellation&amp;rsquo;s work on new App Inventor semantic web components for use in these applications, because these will allow much more sophisticated use of RDF than my demo does. Others at MIT have already built a &lt;a href=&#34;http://web.mit.edu/newsoffice/2013/building-disaster-relief-phone-apps-0930.html&#34;&gt;Disaster relief phone app&lt;/a&gt; with it.&lt;/p&gt;
&lt;p&gt;Instead of logging in to the web-based screen design app mentioned above, using these new semantic web components currently requires the use of a specialized version of this application hosted at tw.rpi.edu. Apparently these new components are on their way to inclusion in &lt;a href=&#34;http://appinventor.mit.edu/explore/tutorial-version/app-inventor-2.html&#34;&gt;App Inventor 2&lt;/a&gt;, so I&amp;rsquo;m really looking forward to that. You can learn more about how these extensions work from a YouTube video of a presentation by Tetherless World&amp;rsquo;s Evan Patton titled &lt;a href=&#34;http://www.youtube.com/watch?v=lR8s6AtO24Q&#34;&gt;Extending the MIT AppInventor with Semantic Web Techno&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My little application lets you pick an item of clothing, the size, and the color, and then it sends a string of the selected data off to a script on another server.&lt;/p&gt;
&lt;img id=&#34;id123736&#34; src=&#34;https://www.bobdc.com/img/main/appinventor1.png&#34; height=&#34;400&#34;/&gt;
&lt;p&gt;The choice of items, sizes, and colors comes from the model below, stored on a SPARQL endpoint:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix ps:   &amp;lt;http://snee.com/ns/demos/productSchema#&amp;gt; . 
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; . 

ps:Product a rdfs:Class . 

ps:Color   a rdfs:Class . 

ps:Size    a rdfs:Class . 

ps:color a rdf:Property ;
         rdfs:domain ps:Product ;
         rdfs:range ps:Color .

ps:size a rdf:Property ;
        rdfs:domain ps:Product ;
        rdfs:range ps:Size .

ps:tshirt  a ps:Product ; rdfs:label &amp;quot;T-shirt&amp;quot; . 
ps:sweater a ps:Product ; rdfs:label &amp;quot;sweater&amp;quot; . 
ps:pants   a ps:Product ; rdfs:label &amp;quot;pants&amp;quot; . 

ps:black a ps:Color ; rdfs:label &amp;quot;black&amp;quot; . 
ps:blue  a ps:Color ; rdfs:label &amp;quot;blue&amp;quot; . 
ps:white a ps:Color ; rdfs:label &amp;quot;white&amp;quot; . 

ps:small  a ps:Size ; rdfs:label &amp;quot;small&amp;quot; . 
ps:medium a ps:Size ; rdfs:label &amp;quot;medium&amp;quot; . 
ps:large  a ps:Size ; rdfs:label &amp;quot;large&amp;quot; . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When you touch the color button on the interface, the app displays the choices from the model:&lt;/p&gt;
&lt;img id=&#34;id123860&#34; src=&#34;https://www.bobdc.com/img/main/appinventor2.png&#34; height=&#34;400&#34;/&gt;
&lt;p&gt;Selecting one displays that value on the button:&lt;/p&gt;
&lt;img id=&#34;id123843&#34; src=&#34;https://www.bobdc.com/img/main/appinventor3.png&#34; height=&#34;400&#34;/&gt;
&lt;p&gt;After you select an item, color, and size, touching the Submit button sends the selected data off to another web server with an HTTP GET.&lt;/p&gt;
&lt;p&gt;The interesting part of this app (at least, to RDF geeks) is clearer when we change the model that specifies the interface details—for example, by adding a new instance to the data model&amp;rsquo;s Color class on the server with the SPARQL endpoint:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ps:red a ps:Color ; rdfs:label &amp;quot;red&amp;quot; . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After clicking the app&amp;rsquo;s Refresh button (or shutting down and restarting the app), the next time you press the Select Color button you&amp;rsquo;ll see the new choice of colors reflected:&lt;/p&gt;
&lt;img id=&#34;id126289&#34; src=&#34;https://www.bobdc.com/img/main/appinventor4.png&#34; height=&#34;400&#34;/&gt;
&lt;p&gt;Here&amp;rsquo;s how it works: upon startup or when pressing the Refresh button, the app sends the following query to the SPARQL endpoint to find out how instances of the Product class are modeled, requesting the result as comma-separated values. To do this, the query asks for all the properties associated with the Product class (that is, which properties have an &lt;code&gt;rdfs:domain&lt;/code&gt; of &lt;code&gt;ps:Product&lt;/code&gt;) and what their potential values are (that is, what the instances of the class specified as each property&amp;rsquo;s &lt;code&gt;rdfs:range&lt;/code&gt; value are):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX ps:   &amp;lt;http://snee.com/ns/demos/productSchema#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 

SELECT ?list ?listItem
WHERE 
{
  {
    ?property rdfs:domain ps:Product ;
              rdfs:range ?range . 
    ?propertyValue a ?range ;
                   rdfs:label ?listItem .
    BIND(strafter(str(?property),&amp;quot;#&amp;quot;) AS ?list)
  }
  UNION
  { 
    ?item a ps:Product ; rdfs:label ?listItem .
    BIND(&amp;quot;item&amp;quot; AS ?list)
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;App Inventor blocks offer plenty of options for parsing out the CSV, so my app uses some of these to find the values it needs in the query result and then uses those values to set each list picker widget&amp;rsquo;s choices.&lt;/p&gt;
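&lt;p&gt;With the model above, the comma-separated result that comes back would look something like this (an illustrative reconstruction; the actual row order depends on the endpoint):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;list,listItem
color,black
color,blue
color,white
size,small
size,medium
size,large
item,T-shirt
item,sweater
item,pants
&lt;/code&gt;&lt;/pre&gt;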
&lt;a name=&#34;multilanguage&#34;/&gt;
&lt;p&gt;If the data had included the value labels in multiple languages, like this, the query above would need a small addition to retrieve only the English versions of the labels:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ps:black a ps:Color ; rdfs:label &amp;quot;black&amp;quot;@en ; rdfs:label &amp;quot;negro&amp;quot;@es . 
ps:blue  a ps:Color ; rdfs:label &amp;quot;blue&amp;quot;@en  ; rdfs:label &amp;quot;azul&amp;quot;@es .  
ps:white a ps:Color ; rdfs:label &amp;quot;white&amp;quot;@en ; rdfs:label &amp;quot;blanco&amp;quot;@es .  
&lt;/code&gt;&lt;/pre&gt;
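&lt;p&gt;The addition could be as small as one FILTER at the end of the query&amp;rsquo;s WHERE clause, where it applies to the labels matched in both branches of the UNION (a sketch that assumes every label carries a language tag):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  FILTER(lang(?listItem) = &amp;quot;en&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;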
&lt;p&gt;A slight change to that version of the query could have it retrieve the Spanish labels instead, making it easy to create an Android app with configurable multi-language support. (I could use the same technique of using RDF and SPARQL to change the rest of the text on the form—for example, to change the button label &amp;ldquo;Select Size&amp;rdquo; to say &amp;ldquo;Selecciona Tamaño.&amp;rdquo;) Because App Inventor lets you dynamically assemble the URL containing the SPARQL query before sending it to the endpoint, the app could modify the query to retrieve either English or Spanish labels based on whether the user picked &amp;ldquo;English&amp;rdquo; or &amp;ldquo;español&amp;rdquo; from a new &amp;ldquo;Select language&amp;rdquo; button that would be easy to add.&lt;/p&gt;
&lt;p&gt;It would have been even nicer if, instead of hardcoding my form with Item, Color, and Size fields, those could have been auto-generated in the form based on which properties the query found that had an &lt;code&gt;rdfs:domain&lt;/code&gt; value of &lt;code&gt;ps:Product&lt;/code&gt;. This is the sort of thing that the Tetherless World extension will allow. (TopQuadrant&amp;rsquo;s TopBraid platform has always made this possible, but native phone apps are not a current target.)&lt;/p&gt;
&lt;p&gt;I want to reiterate two key points about App Inventor: first, it&amp;rsquo;s very easy to use, drawing a lot on MIT research into programming for kids using environments such as &lt;a href=&#34;http://scratch.mit.edu/&#34;&gt;Scratch&lt;/a&gt;. (Young people continue to be a &lt;a href=&#34;http://appinventor.mit.edu/explore/news/teens-show-app-inventor-apps-technovation-world-pitch-competition.html&#34;&gt;big target&lt;/a&gt; for App Inventor developer evangelism.) Second—and this is especially impressive considering that we&amp;rsquo;re talking about a programming environment that&amp;rsquo;s so easy to use—it&amp;rsquo;s creating &lt;em&gt;native apps&lt;/em&gt;. You&amp;rsquo;re not creating scripts that require some runtime thing to execute; you can create .apk files that anyone with an Android phone can install and use. I think that this is pretty exciting, and the ability to work RDF-based technology into the mix makes it even more exciting.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Lou Reed</title>
      <link>https://www.bobdc.com/blog/lou-reed/</link>
      <pubDate>Mon, 28 Oct 2013 08:35:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/lou-reed/</guid>
      
      
      <description><div>And New York City.</div><div>&lt;img id=&#34;id122829&#34; src=&#34;https://www.bobdc.com/img/main/loureed.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Lou Reed&#34; width=&#34;200&#34;/&gt;
&lt;p&gt;(To listen to while you read this: &lt;a href=&#34;http://www.youtube.com/watch?v=vAEbOdnRUc0&#34;&gt;The Blue Mask&lt;/a&gt;.) New York City helped to define who Lou Reed was, but since I first became aware of him in the mid-seventies, Lou Reed played a big part in defining what New York City was to me. It&amp;rsquo;s difficult for me to picture the city without him.&lt;/p&gt;
&lt;p&gt;The possibility of actually seeing him in public there was part of the fun. Once, while my wife and I were attending the lesser-known Shakespeare play &amp;ldquo;Coriolanus&amp;rdquo; at the BAM Harvey theater, I went to the men&amp;rsquo;s room during the intermission, and on my way out there was a line of guys waiting to get in, with Lou halfway back in the line.&lt;/p&gt;
&lt;p&gt;My better Lou story took place at Rudy&amp;rsquo;s Music Stop sometime in the mid-eighties. 48th Street was known for its music stores, and while there were big famous ones like Sam Ash and Manny&amp;rsquo;s, when I worked near that neighborhood and was in a band I usually went to Rudy&amp;rsquo;s Music Stop, a smaller one, to get guitar strings and so forth. They specialized in Schecter guitars, one of which I had (and still have), and Reed was Schecter&amp;rsquo;s most famous customer. One day on my lunch break I went there to look at some pickups I was thinking of adding to my guitar, and Reed was sitting on a fold-up chair, alone in a store that would have been crowded with seven people in it. You know how sometimes you see a celebrity in day-to-day life and you&amp;rsquo;re not sure whether it&amp;rsquo;s really the person you think it is? With Lou Reed, there was absolutely no question who was sitting there with a black T-shirt and black jeans faded to two different shades of gray.&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t want to be a gushing fan boy and tried to act like a cool New York musician guy, with plenty of inspiration for this three feet away from me. There was an empty, open guitar case across the counter of the store&amp;rsquo;s main glass case, and I had to lean down to peer under it at the pickups that I was interested in, and Reed jumped up and said &amp;ldquo;Oh, let me move that for you.&amp;rdquo; I guess he was waiting for them to do some work on the case&amp;rsquo;s contents in the back room; I took the opportunity to say that I had seen him the previous April at the Ritz (after it moved to the former Studio 54—it was his Blue Mask tour) and that he had an amazing band with him: &lt;a href=&#34;http://en.wikipedia.org/wiki/Robert_Quine&#34;&gt;Robert Quine&lt;/a&gt;, another hero of mine, on lead guitar; Fred Maher on drums, and Fernando Saunders on bass. Lou said &amp;ldquo;Thanks, man&amp;rdquo; and I left it at that.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve certainly heard stories of him being an asshole to people, but I&amp;rsquo;ll never forget him jumping up to move his guitar case for me. I&amp;rsquo;ll also never forget how I found &amp;ldquo;White Light White Heat&amp;rdquo; in a local record store cutout bin when I was 16 and thought &amp;ldquo;this is Lou Reed&amp;rsquo;s old band, before he put out &amp;lsquo;Walk on the Wild Side&amp;rsquo; and &amp;lsquo;Rock and Roll Animal&amp;rsquo;&amp;rdquo; and how I brought it home, put it on, and learned—as I learned from William Burroughs around the same time—that there was a much bigger world out there than I had imagined. And I&amp;rsquo;ll never forget how, since then, Reed put out enough great music to guarantee his historical importance even if there had never been a Velvet Underground: Street Hassle, The Blue Mask, Magic and Loss, and the solo albums before I discovered the Velvet Underground: Transformer, Coney Island Baby, Berlin, and all the ones in between. I saw him perform live four times, and he was always mesmerizing and always rocked very, very hard.&lt;/p&gt;
&lt;p&gt;In the simplified history of rock and roll, Bob Dylan showed everyone that lyrics could be about more than cars, girls, and school. From his study with Delmore Schwartz at Syracuse University, Reed had already figured that much out, and when that background got paired with John Cale&amp;rsquo;s La Monte Young and John Cage influences in a loud dissonant band playing at Andy Warhol parties, it made people of that decade and every decade since rethink the possibilities of what rock and roll could be. Reed continued to produce great songs, lyrics, music, and guitar playing in each of those decades, and it&amp;rsquo;s sad to think that we in general and New York City in particular won&amp;rsquo;t have him anymore.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Lou Reed picture by &lt;a href=&#34;http://www.flickr.com/photos/streamofconsciousness/&#34;&gt;Mike McGrath&lt;/a&gt;, &lt;a href=&#34;http://creativecommons.org/licenses/by-nc-nd/2.0/&#34;&gt;Creative Commons CC BY-NC-ND 2.0&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Linked Open Data Cloud: The Animated GIF!</title>
      <link>https://www.bobdc.com/blog/linked-open-data-cloud-the-ani/</link>
      <pubDate>Thu, 17 Oct 2013 08:52:10 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/linked-open-data-cloud-the-ani/</guid>
      
      
<description><div>My first animated GIF.</div><div>&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/LODCloud.gif&#34;&gt;&lt;img id=&#34;id128268&#34; src=&#34;https://www.bobdc.com/img/main/LODCloud.gif&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;animated GIF of Linked Open Data cloud diagrams&#34; width=&#34;300&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve had a new respect for animated GIFs since reading Anil Dash&amp;rsquo;s blog posting &lt;a href=&#34;http://dashes.com/anil/2011/07/animated-gifs-triumphant.html&#34;&gt;Animated GIFs triumphant&lt;/a&gt;. When I found that I could create them with &lt;a href=&#34;http://www.gimp.org/&#34;&gt;gimp&lt;/a&gt;, a program on my top five list of software to install on a brand new machine, I couldn&amp;rsquo;t resist trying to make one. I have plenty of PowerPoint presentations where a series of slides show the growth of the &lt;a href=&#34;http://lod-cloud.net/&#34;&gt;Linked Data Cloud&lt;/a&gt;, so I made the animated GIF you see here of the &lt;a href=&#34;http://lod-cloud.net/#history&#34;&gt;available diagrams&lt;/a&gt;. Click it to see the full-sized version.&lt;/p&gt;
&lt;p&gt;While the hilarious &lt;a href=&#34;http://www.youtube.com/show/yousuckatphotoshop&#34;&gt;You Suck at Photoshop&lt;/a&gt; videos are a deliberate parody, the YouTube video &lt;a href=&#34;http://www.youtube.com/watch?v=4yhc1-I0CIY&#34;&gt;Animated Gif Tutorial GIMP 2.6&lt;/a&gt; seems like an unintentional parody, with someone using a jigsaw and other crashing noises in the background, but it did show me what I had to do. (For one thing, I had to get more comfortable using gimp layers.)&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re wondering why there haven&amp;rsquo;t been any new diagrams since September of 2011, the answer is good news: the network of available linked open data sites just got too big to fit into such a diagram. People are making new linked data sources available all the time, both small and experimental and &lt;a href=&#34;https://www.ebi.ac.uk/rdf/&#34;&gt;large and robust-looking&lt;/a&gt;. Lately a lot of people have been complaining about the existence of unreliable public SPARQL endpoints out there; I prefer to concentrate on the ones that work, like the new EMBL-EBI one. (The regular web has plenty of dead sites too. And, I don&amp;rsquo;t build applications with dynamic dependencies on public endpoints. If they have data that will be useful to me, I use SPARQL queries to pull that data down where I can store it locally. I mean, duh.)&lt;/p&gt;
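&lt;p&gt;Pulling data down for local use typically means a SPARQL CONSTRUCT query, because its result is itself RDF triples that can be loaded into a local triplestore. Here is a sketch of the idea (the particular slice of DBpedia data that it asks for is just an illustrative example):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dbo:  &amp;lt;http://dbpedia.org/ontology/&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;

# copy some city names and populations for local storage
CONSTRUCT { ?city rdfs:label ?name ;
                  dbo:populationTotal ?population . }
WHERE {
  ?city a dbo:City ;
        rdfs:label ?name ;
        dbo:populationTotal ?population .
  FILTER(lang(?name) = &amp;quot;en&amp;quot;)
}
LIMIT 1000
&lt;/code&gt;&lt;/pre&gt;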
&lt;p&gt;My animation shows a very exciting period in the history of the growth of linked data. And, for an added bonus, I&amp;rsquo;m sure its transition from black and white to color reminds you of the &lt;a href=&#34;http://www.youtube.com/watch?v=x6D8PAGelN8&#34;&gt;corresponding moment&lt;/a&gt; in &amp;ldquo;The Wizard of Oz,&amp;rdquo; right?&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
      <category domain="https://www.bobdc.com//categories/linked-data">linked-data</category>
      
    </item>
    
    <item>
      <title>Making charts out of SPARQL query results with sgvizler</title>
      <link>https://www.bobdc.com/blog/making-charts-out-of-sparql-qu/</link>
      <pubDate>Sun, 22 Sep 2013 09:21:28 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/making-charts-out-of-sparql-qu/</guid>
      
      
      <description><div>Embed a query in your HTML, name an endpoint, and pick a chart type.</div><div>&lt;p&gt;I finally got around to trying sgvizler, and I wish I&amp;rsquo;d done so earlier. Once your HTML page references the sgvizler JavaScript and CSS, you can specify a query to send to any SPARQL endpoint you want and then see a chart of the query results on that web page. Scroll down a bit on sgvizler&amp;rsquo;s &lt;a href=&#34;https://code.google.com/p/sgvizler/&#34;&gt;Google code home page&lt;/a&gt; and you&amp;rsquo;ll see a nice range of available chart types.&lt;/p&gt;
&lt;p&gt;After I &lt;a href=&#34;https://code.google.com/p/sgvizler/downloads/list&#34;&gt;downloaded&lt;/a&gt; and unzipped the sgvizler distribution (a file that, before unzipping, was all of 72K in size) I had a directory with a few files and an &lt;code&gt;example&lt;/code&gt; subdirectory. One of the files was sgvizler.html, which displays a &lt;a href=&#34;http://data.semanticweb.org/snorql/&#34;&gt;SNORQL&lt;/a&gt;-like form where you can enter queries to send off to the &lt;a href=&#34;http://sws.ifi.uio.no/sparql/world&#34;&gt;http://sws.ifi.uio.no/sparql/world&lt;/a&gt; endpoint. The page includes a lot of JavaScript code where you can change the endpoint and other parameters; I had some trouble figuring this out and was happy to find that the files in the &lt;code&gt;example&lt;/code&gt; subdirectory were much more minimal and only required the setting of attributes on an HTML &lt;code&gt;div&lt;/code&gt; element to configure.&lt;/p&gt;
&lt;p&gt;Based on those examples, I created a simple web page using sgvizler that creates a &lt;a href=&#34;http://snee.com/sparql/sgvizler/examples/USComputerCompanies.html&#34;&gt;Revenue and operating income of US computer companies&lt;/a&gt; chart using data from DBpedia. If you follow the link in the previous sentence, you&amp;rsquo;ll see the image dynamically generated. Here&amp;rsquo;s a screen shot:&lt;/p&gt;
&lt;img id=&#34;id118496&#34; width=&#34;600&#34; src=&#34;https://www.bobdc.com/img/main/sgvizlerdbp.png&#34; alt=&#34;sgvizler graph using TopBraid Composer as an endpoint&#34;/&gt;
&lt;p&gt;If you follow that link and do a View Source you&amp;rsquo;ll see that very little was required besides the URL of the endpoint and the query itself. The HTML &lt;code&gt;head&lt;/code&gt; element has some &lt;code&gt;link&lt;/code&gt; and &lt;code&gt;script&lt;/code&gt; elements that point to appropriate sgvizler JavaScript and CSS files, and then the actual chart is specified with an empty &lt;code&gt;div&lt;/code&gt; element that enumerates details with various attributes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; &amp;lt;div id=&amp;quot;sgvzl_example_query&amp;quot; 
   data-sgvizler-endpoint=&amp;quot;http://dbpedia.org/sparql&amp;quot;
   data-sgvizler-chart=&amp;quot;gColumnChart&amp;quot;
   data-sgvizler-chart-options=&amp;quot;title=Revenue and Operating Income of US Computer Companies (revenue &amp;gt; $1B)|
                                vAxis.title=US Dollars|chartArea.left=150&amp;quot;
   data-sgvizler-loglevel=&amp;quot;2&amp;quot;
   style=&amp;quot;width:1200px; height:400px;&amp;quot;         
   data-sgvizler-query=&#39;
     PREFIX dbo: &amp;lt;http://dbpedia.org/ontology/&amp;gt;
     PREFIX dct: &amp;lt;http://purl.org/dc/terms/&amp;gt;
     SELECT ?name ?revenue ?operatingIncome
     WHERE {
       ?company rdfs:label ?taggedName ;
               dct:subject &amp;lt;http://dbpedia.org/resource/Category:Computer_companies_of_the_United_States&amp;gt; ;
               dbo:revenue ?revenueFloat ;
               dbo:operatingIncome ?opIncomeFloat . 
        BIND(xsd:integer(?revenueFloat) AS ?revenue) . 
        BIND(xsd:integer(?opIncomeFloat) AS ?operatingIncome) .
        BIND(str(?taggedName)AS ?name)
        FILTER(lang(?taggedName) = &amp;quot;en&amp;quot;)
        FILTER(?revenue &amp;gt; 1000000000)
     }
     ORDER BY ?revenue
&#39;&amp;gt;&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The use of the different attributes is documented on their &lt;a href=&#34;https://code.google.com/p/sgvizler/wiki/UsingSgvizler&#34;&gt;UsingSgvizler&lt;/a&gt; Google code page. Note how the actual SPARQL query is just the value of one more attribute: &lt;code&gt;data-sgvizler-query&lt;/code&gt;. (Also note that each &amp;lt; character in the query must be represented as the entity reference &lt;code&gt;&amp;amp;lt;&lt;/code&gt; because it&amp;rsquo;s inside of an attribute value.)&lt;/p&gt;
&lt;p&gt;Because sgvizler builds on &lt;a href=&#34;https://developers.google.com/chart/?csw=1&#34;&gt;Google charts&lt;/a&gt;, the options for the &lt;code&gt;data-sgvizler-chart-options&lt;/code&gt; attribute depend on which chart type you select in the &lt;code&gt;data-sgvizler-chart&lt;/code&gt; attribute; see the sgvizler home page for named examples of the options. I picked gColumnChart for image above and found options like &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;vAxis.title&lt;/code&gt; at the Google Charts &lt;a href=&#34;https://google-developers.appspot.com/chart/interactive/docs/gallery/columnchart&#34;&gt;Visualization: Column Chart&lt;/a&gt; page. (I haven&amp;rsquo;t tried any of the animation or interactivity options, and I&amp;rsquo;m not sure which sgvizler supports, but they sound like fun.)&lt;/p&gt;
&lt;p&gt;With my USComputerCompanies.html file sitting on my hard disk and the appropriate sgvizler files in its parent directory, I could do a File/Open from Chrome or Firefox and the generated image displayed just fine. It turns out that the image doesn&amp;rsquo;t necessarily display when opening an HTML page like this if the SPARQL endpoint that it references is local, as opposed to being remote like DBpedia. I think this is because of the browser&amp;rsquo;s same-origin restrictions, which limit the HTTP requests that scripts on a page loaded from a file:/// URL (such as the AJAX calls that sgvizler makes through jQuery) are allowed to make. To make it work with a local endpoint, the key is to point your browser at a local web server and open the HTML file with an http:// URL instead of using File/Open to open it as if it were a file:/// URL. As far as I could tell, this was necessary with any HTML file that used sgvizler to send a SPARQL query to an endpoint at http://localhost.&lt;/p&gt;
&lt;p&gt;For example, &lt;a href=&#34;http://www.topquadrant.com/products/TB_Composer.html&#34;&gt;TopBraid Composer&lt;/a&gt; Maestro Edition can act as a local SPARQL endpoint, but sgvizler wouldn&amp;rsquo;t create a chart of results retrieved from http://localhost:8083/tbl/sparql when I displayed my tbtest.html file by selecting File/Open in my browser. When I put tbtest.html in a tbl-www\sgvtest\sample subdirectory of a project called sandbox and the appropriate sgvizler files in the tbl-www\sample directory, sgvizler had no problem when I sent a browser to http://localhost:8083/tbl/data/sandbox/sgvtest/sample/tbtest.html, and displayed this graph of which schools had more than one attendee from the Kennedys extended family:&lt;/p&gt;
&lt;img id=&#34;id120783&#34; width=&#34;600&#34; src=&#34;https://www.bobdc.com/img/main/sgvizlertbc.png&#34; alt=&#34;sgvizler graph using TopBraid Composer as an endpoint&#34;/&gt;
&lt;p&gt;A different example: using &lt;a href=&#34;http://www.openrdf.org/&#34;&gt;Sesame&lt;/a&gt; as a local endpoint, I created a sesameTest.html file that sent a SPARQL query to the Sesame endpoint http://localhost:8080/openrdf-sesame/repositories/myRepo. When I stored the HTML file in webapps\ROOT\sgvtest\samples in the Tomcat directory where I was running Sesame (ROOT being where you&amp;rsquo;d store files being delivered by Tomcat acting as a regular web server outside of the Sesame servlet), I opened up the web page as http://localhost:8080/sgvtest/samples/sesameTest.html and sgvizler generated the test graph from the results of that page&amp;rsquo;s SPARQL query just fine, unlike when I opened the same file with File/Open.&lt;/p&gt;
&lt;p&gt;All my tests showed one chart at a time, but some examples on the sgvizler web site show how easy it is to display &lt;a href=&#34;http://sgvizler.googlecode.com/svn/release/0.5/example/exNPD2.html&#34;&gt;multiple graphs at once&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Between the range of available charts, the extra attributes available to customize each chart type&amp;rsquo;s appearance, the fact that I eventually got it to work with every SPARQL endpoint that I tried it with, and the ability to set everything up by merely entering values in &lt;code&gt;div&lt;/code&gt; attributes in an HTML page with no JavaScript wrangling necessary, I am very impressed with sgvizler. I look forward to using it more.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Semantic Web Journal article on DBpedia</title>
      <link>https://www.bobdc.com/blog/semantic-web-journal-article-o/</link>
      <pubDate>Sun, 25 Aug 2013 10:31:15 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-journal-article-o/</guid>
      
      
      <description><div>DBpedia: more impressive all the time.</div><div>&lt;p&gt;&lt;a href=&#34;http://dbpedia.org/About&#34;&gt;&lt;img id=&#34;id140279&#34; src=&#34;https://www.bobdc.com/img/main/dbpedia_logo.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;DBpedia logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It took me a while to finally sit down and read the &lt;a href=&#34;http://www.semantic-web-journal.net/&#34;&gt;Semantic Web Journal&lt;/a&gt; paper &amp;ldquo;DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia&amp;rdquo; &lt;a href=&#34;http://www.semantic-web-journal.net/system/files/swj499.pdf&#34;&gt;(pdf)&lt;/a&gt;, but I&amp;rsquo;m glad I did, and I wanted to summarize a few things I learned from it.&lt;/p&gt;
&lt;p&gt;Near the beginning the paper has a good summary of what DBpedia is working from:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Wikipedia articles consist mostly of free text, but also comprise various types of structured information in the form of wiki markup. Such information includes infobox templates, categorisation information, images, geo-coordinates, links to external web pages, disambiguation pages, redirects between pages, and links across different language editions of Wikipedia. The DBpedia extraction framework extracts this structured information from Wikipedia and turns it into a rich knowledge base.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That&amp;rsquo;s a rich knowledge base that is represented in RDF so that we can query it with SPARQL and treat it as Linked Data.&lt;/p&gt;
&lt;p&gt;According to the article, the DBpedia project began in 2006, and four years later began an effort to develop &amp;ldquo;an ontology schema and mappings from Wikipedia infobox properties to this ontology&amp;hellip; This significantly increases the quality of the raw Wikipedia infobox data by typing resources, merging name variations and assigning specific datatypes to the values.&amp;rdquo; Like DBpedia (and Wikipedia), this development is a community-based effort. The people working on it use the &lt;a href=&#34;http://mappings.dbpedia.org/index.php/Main_Page&#34;&gt;DBpedia Mappings Wiki&lt;/a&gt;, a set of tools that includes a Mapping Validator, an Extraction Tester, and a Mapping Tool.&lt;/p&gt;
&lt;p&gt;I always described DBpedia as an RDF representation of Wikipedia infobox data, but this ontology work is only one example of how it does more than just provide SPARQL access to infobox data. As the infobox data evolves, the work of mapping it to an ontology is never done, so the available properties reflect the differences. For example, I had wondered about the difference between the properties &lt;a href=&#34;http://dbpedia.org/property/birthPlace&#34;&gt;http://dbpedia.org/property/birthPlace&lt;/a&gt; and &lt;a href=&#34;http://dbpedia.org/ontology/birthPlace&#34;&gt;http://dbpedia.org/ontology/birthPlace&lt;/a&gt;, and these two excerpts from the paper&amp;rsquo;s bulleted list about URI schemes explain it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://dbpedia.org/property/&#34;&gt;http://dbpedia.org/property/&lt;/a&gt; (prefix dbp) for representing properties extracted from the raw infobox extraction (cf. Section 2.3), e.g. dbp:population.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://dbpedia.org/ontology/&#34;&gt;http://dbpedia.org/ontology/&lt;/a&gt; (prefix dbo) for representing the DBpedia ontology (cf. Section 2.4), e.g. dbo:populationTotal.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;So, while there has been work to develop a DBpedia ontology, if some infobox field doesn&amp;rsquo;t fit the ontology, they don&amp;rsquo;t throw it out; they define a property for it in the &lt;a href=&#34;http://dbpedia.org/property/&#34;&gt;http://dbpedia.org/property/&lt;/a&gt; namespace. Of course, this doesn&amp;rsquo;t completely answer my original question, because if the ontology includes a &lt;a href=&#34;http://dbpedia.org/ontology/birthPlace&#34;&gt;http://dbpedia.org/ontology/birthPlace&lt;/a&gt; property, that sounds like a good place to store the value that had been stored using &lt;a href=&#34;http://dbpedia.org/property/birthPlace&#34;&gt;http://dbpedia.org/property/birthPlace&lt;/a&gt;. However, comparing the ontology/birthPlace values with the property/birthPlace values for some resources reveals that they don&amp;rsquo;t always line up perfectly, and the alignment can&amp;rsquo;t always be automated—just because the two URIs have the same local name doesn&amp;rsquo;t mean that they refer to the same thing—so the project stores all the values until a human can get to each resource to review these issues.&lt;/p&gt;
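&lt;p&gt;For example, to compare the two sets of values yourself, a query along these lines (the choice of resource is just an illustration, and the results depend on the data loaded at the time) can be sent to the DBpedia SPARQL endpoint:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dbp: &amp;lt;http://dbpedia.org/property/&amp;gt;
PREFIX dbo: &amp;lt;http://dbpedia.org/ontology/&amp;gt;

SELECT ?rawValue ?ontologyValue
WHERE {
  &amp;lt;http://dbpedia.org/resource/Edgar_Allan_Poe&amp;gt; dbp:birthPlace ?rawValue ;
                                                 dbo:birthPlace ?ontologyValue . 
}
&lt;/code&gt;&lt;/pre&gt;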
&lt;p&gt;I also didn&amp;rsquo;t realize just how much modeling has been done. The diagram in Figure 3 of the paper illustrates some subclass, domain, and range relationships between various classes and properties such as the PopulatedPlace class. &lt;a href=&#34;http://dbpedia.org/snorql/?query=SELECT+*+WHERE+%7B%0D%0A++%3Fs+%3Fp+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FPopulatedPlace%3E%0D%0A++FILTER%28+%3Fp+%21%3D+rdf%3Atype%29%0D%0A%7D&#34;&gt;This SPARQL query&lt;/a&gt; shows not only that this class has six subclasses, but also that many properties have it as a domain or range. When I downloaded the T-BOX ontology that contains this modeling from DBpedia&amp;rsquo;s &lt;a href=&#34;http://dbpedia.org/Ontology&#34;&gt;DBpedia Ontology&lt;/a&gt; page and brought it up in TopBraid Composer, it looked great:&lt;/p&gt;
&lt;img id=&#34;id140232&#34; src=&#34;https://www.bobdc.com/img/main/dbpediapaper1.png&#34; alt=&#34;dbpedia_3.8.owl in TopBraid Composer&#34; width=&#34;640&#34;/&gt;
&lt;p&gt;(Apparently, the property aircraftHelicopterAttack has a domain of MilitaryUnit and a range of MeanOfTransportation.) Another interesting point about this ontology appears later in the paper: &amp;ldquo;The DBpedia 3.8 ontology contains 45 equivalent class and 31 equivalent property links pointing to &lt;a href=&#34;http://schema.org&#34;&gt;http://schema.org&lt;/a&gt; terms,&amp;rdquo; so it can enhance the value of collections of data using this increasingly popular vocabulary.&lt;/p&gt;
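&lt;p&gt;For reference, the PopulatedPlace query linked above decodes to this (the rdf prefix, which snorql supplies automatically, is declared here to make the query self-contained):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;

SELECT * WHERE {
  ?s ?p &amp;lt;http://dbpedia.org/ontology/PopulatedPlace&amp;gt;
  FILTER( ?p != rdf:type )
}
&lt;/code&gt;&lt;/pre&gt;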
&lt;p&gt;In RDF, object property values are more valuable than literal values because they can lead to additional data (hence the first &lt;a href=&#34;http://www.w3.org/DesignIssues/LinkedData.html&#34;&gt;principle&lt;/a&gt; of Linked Data: &amp;ldquo;Use URIs as names for things&amp;rdquo;), so it was nice to read about this step in DBpedia&amp;rsquo;s data preparation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If an infobox contains a string value that is not linked to another Wikipedia article, the extraction framework searches for hyperlinks in the same Wikipedia article that have the same anchor text as the infobox value string. If such a link exists, the target of that link is used to replace the string value in the infobox. This method further increases the number of object property assertions in the DBpedia ontology.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It was also interesting to see how DBpedia makes changesets available to mirrors:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Whenever a Wikipedia article is processed, we get two disjoint sets of triples. A set for the added triples, and another set for the deleted triples. We write those two sets into N-Triples files, compress them, and publish the compressed files as changesets. If another DBpedia Live mirror wants to synchronise with the DBpedia Live endpoint, it can just download those files, decompress and integrate them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Section 6.5 of the paper explains how popular this practice is, complete with a graph of synchronization requests.&lt;/p&gt;
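&lt;p&gt;As a sketch of what such a changeset pair might look like (the file names and the triples here are made up for illustration), each of the two sets is just an N-Triples file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# 000001.added.nt
&amp;lt;http://dbpedia.org/resource/Lyon&amp;gt; &amp;lt;http://dbpedia.org/ontology/populationTotal&amp;gt; &amp;quot;479803&amp;quot; .

# 000001.removed.nt
&amp;lt;http://dbpedia.org/resource/Lyon&amp;gt; &amp;lt;http://dbpedia.org/ontology/populationTotal&amp;gt; &amp;quot;474946&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A mirror applying these files deletes the triples in the &amp;ldquo;removed&amp;rdquo; set and asserts the ones in the &amp;ldquo;added&amp;rdquo; set.&lt;/p&gt;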
&lt;p&gt;DBpedia also uses a lot more Natural Language Processing techniques than I realized, providing some nice connections between the two different senses of the term &amp;ldquo;semantic web.&amp;rdquo; Section 2.6 of the paper (&amp;ldquo;NLP Extraction&amp;rdquo;) describes some fascinating additional work done beyond the straight mapping of infobox fields to RDF. Natural Language Processing technology is used to create datasets of topic signatures, grammatical gender, localizations, and thematic concepts based on analysis of the Wikipedia unstructured free text paragraphs. The thematic concepts one is especially interesting:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The thematic concepts data set relies on Wikipedia’s category system to capture the idea of a ‘theme’, a subject that is discussed in its articles. Many of the categories in Wikipedia are linked to an article that describes the main topic of that category. We rely on this information to mark DBpedia entities and concepts that are ‘thematic’, that is, they are the center of discussion for a category.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I tried to find an example of such a theme tying together some entities and concepts, but had no luck; I&amp;rsquo;d be happy to list a few here if someone can point me in the right direction.&lt;/p&gt;
&lt;p&gt;Section 7.1 describes further NLP work such as the use of &lt;a href=&#34;http://wiki.dbpedia.org/Datasets/NLP&#34;&gt;specialized NLP data sets&lt;/a&gt; &amp;ldquo;to estimate the ambiguity of phrases, to help select unambiguous identifiers for ambiguous phrases, or to provide alternative names for entities, just to mention a few examples.&amp;rdquo; It also describes &lt;a href=&#34;https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki&#34;&gt;DBpedia Spotlight&lt;/a&gt;, which is&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;hellip;an open source tool including a free web service that detects mentions of DBpedia resources in text&amp;hellip; The main advantage of this system is its comprehensiveness and flexibility, allowing one to configure it based on quality measures such as prominence, contextual ambiguity, topical pertinence and disambiguation confidence, as well as the DBpedia ontology. The resources that should be annotated can be specified by a list of resource types or by more complex relationships within the knowledge base described as SPARQL queries.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is the first significant free tool I&amp;rsquo;ve heard of that can annotate free text with RDF metadata based on analysis of that text since Reuters Calais&amp;rsquo; free service became available &lt;a href=&#34;https://www.bobdc.com/blog/having-fun-with-reuters-calais&#34;&gt;over five years ago&lt;/a&gt;. I definitely look forward to playing with that.&lt;/p&gt;
&lt;p&gt;A few more fun facts from the paper:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I had wondered about DBpedia&amp;rsquo;s relationship to &lt;a href=&#34;http://meta.wikimedia.org/wiki/Wikidata&#34;&gt;Wikidata&lt;/a&gt;, so I was happy to read that in &amp;ldquo;future versions, DBpedia will include more raw data provided by Wikidata and add services such as Linked Data/SPARQL endpoints, RDF dumps, linking and ontology mapping for Wikidata.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I had heard that DBpedia was one of the datasets used when IBM&amp;rsquo;s Watson system won the quiz show Jeopardy, but seeing it in this paper made it a little more official for me.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In 2010, the DBpedia team replaced the PHP-based extraction framework with one written in Scala, the functional, object-oriented JVM language developed at the École Polytechnique Fédérale de Lausanne.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I won&amp;rsquo;t summarize it here, but the paper includes information on usage of DBpedia by spoken language as well as the hardware in use and the maximum number and amount of requests allowed from a given IP address.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The conclusion of the paper says that it &amp;ldquo;demonstrated that DBpedia matured and improved significantly in the last years in particular also in terms of coverage, usability, and data quality.&amp;rdquo; I agree!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
      <category domain="https://www.bobdc.com//categories/dbpedia">DBpedia</category>
      
    </item>
    
    <item>
      <title>Using VALUES to map values in a SPARQL query</title>
      <link>https://www.bobdc.com/blog/using-values-to-map-values-in/</link>
      <pubDate>Mon, 01 Jul 2013 19:07:37 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/using-values-to-map-values-in/</guid>
      
      
<description><div>The VALUES keyword: even better than I thought.</div><div>&lt;p&gt;&lt;em&gt;Note: Ebook versions of the &amp;ldquo;raw, unedited&amp;rdquo; version of the &lt;a href=&#34;https://www.bobdc.com/blog/coming-soon-new-expanded-editi&#34;&gt;new expanded edition&lt;/a&gt; of my book &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt; are &lt;a href=&#34;http://shop.oreilly.com/product/0636920030829.do&#34;&gt;now available&lt;/a&gt; on O&amp;rsquo;Reilly&amp;rsquo;s website, and the cooked, edited version (not much different, really) should be available in all formats within a few days. While this edition adds coverage of the VALUES keyword, I came up with the example below too late to include it.&lt;/em&gt;&lt;/p&gt;
&lt;blockquote id=&#34;id111923&#34; class=&#34;pullquote&#34;&gt;&#34;if I could just define a little mapping table...&#34;&lt;/blockquote&gt;
&lt;p&gt;I recently had to map a few values within a SPARQL query. I didn&amp;rsquo;t want to do a heavily nested IF() function, and thought &amp;ldquo;if I could just define a little mapping table&amp;hellip;&amp;rdquo; and then realized that I can, using a new SPARQL 1.1 keyword that I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/sparql-11s-new-values-keyword&#34;&gt;already written about&lt;/a&gt;: VALUES.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s say I want to output the names of the people in the following data, not with their associated airport codes, but with the names of those airports&amp;rsquo; cities.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix dm: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt; .


d:i0432 dm:firstName &amp;quot;Richard&amp;quot; ;
        dm:airport &amp;quot;CHO&amp;quot; . 


d:i9771 dm:firstName &amp;quot;Cindy&amp;quot; ;
        dm:airport &amp;quot;RIC&amp;quot; . 


d:i8301 dm:firstName &amp;quot;Craig&amp;quot; ;
        dm:airport &amp;quot;LYH&amp;quot; . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What I would really do is use triples associating airport codes with city names to drive the lookup, but storing the lookup information within the query gives some sense of how powerful the VALUES keyword can be for something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dm: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt; 


SELECT ?first ?city
WHERE {


  ?person dm:firstName ?first ;
          dm:airport ?airport . 


  VALUES (?airport ?city) {
    ( &amp;quot;CHO&amp;quot; &amp;quot;Charlottesville&amp;quot; )
    ( &amp;quot;RIC&amp;quot; &amp;quot;Richmond&amp;quot; )
    ( &amp;quot;LYH&amp;quot; &amp;quot;Lynchburg&amp;quot; )
  }


}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Running that query on the data above produces this result:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---------------------------------
| first     | city              |
=================================
| &amp;quot;Richard&amp;quot; | &amp;quot;Charlottesville&amp;quot; |
| &amp;quot;Cindy&amp;quot;   | &amp;quot;Richmond&amp;quot;        |
| &amp;quot;Craig&amp;quot;   | &amp;quot;Lynchburg&amp;quot;       |
---------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each row in my VALUES table had only two values, and they were both strings, but you can have any number you like—with any types, including URIs. This makes for a lot of possibilities.&lt;/p&gt;
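&lt;p&gt;For example, a row can mix literal and URI values, and SPARQL 1.1&amp;rsquo;s UNDEF keyword leaves a position unbound (the example.org URI below is made up for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;VALUES (?airport ?city ?cityResource) {
  ( &amp;quot;CHO&amp;quot; &amp;quot;Charlottesville&amp;quot; &amp;lt;http://example.org/place/Charlottesville&amp;gt; )
  ( &amp;quot;RIC&amp;quot; &amp;quot;Richmond&amp;quot;        UNDEF )
}
&lt;/code&gt;&lt;/pre&gt;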
&lt;p&gt;So, the next time you&amp;rsquo;re thinking of adding a heavily nested IF() function to your SPARQL query to account for several possible values of something, consider using SPARQL&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/TR/2012/WD-sparql11-query-20120724/&#34;&gt;newest&lt;/a&gt; keyword. A VALUES table is easier to create, use, and read.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Coming soon: new, expanded edition of &#34;Learning SPARQL&#34;</title>
      <link>https://www.bobdc.com/blog/coming-soon-new-expanded-editi/</link>
      <pubDate>Sun, 02 Jun 2013 19:44:41 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/coming-soon-new-expanded-editi/</guid>
      
      
      <description><div>55% more pages! 23% fewer mentions of the semantic web!</div><div>&lt;img id=&#34;id110795&#34; src=&#34;https://www.bobdc.com/img/main/ls2nded.gif&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;cover of Learning SPARQL&#39;s 2nd ed.&#34;/&gt;
&lt;p&gt;I&amp;rsquo;m very pleased to announce that O&amp;rsquo;Reilly will make the second, expanded edition of my book &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt; available sometime in late June or early July. The &lt;a href=&#34;http://shop.oreilly.com/product/0636920030829.do&#34;&gt;early release&lt;/a&gt; &amp;ldquo;raw and unedited&amp;rdquo; version should be available this week.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve updated the book to account for the final version of the SPARQL 1.1 specs, but the main additions are four new chapters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Query Efficiency and Debugging&lt;/em&gt;: Things to keep in mind that can help your queries run more efficiently as you work with growing volumes of data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Working with SPARQL Query Result Formats&lt;/em&gt;: How your applications can take advantage of the XML, JSON, CSV, and TSV formats defined by the W3C for SPARQL processors to return query results.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;RDF Schema, OWL, and Inferencing&lt;/em&gt;: How SPARQL can take advantage of the metadata that RDF Schemas, OWL ontologies, and SPARQL rules can add to your data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;A SPARQL Cookbook&lt;/em&gt;: A set of SPARQL queries and update requests that can be useful in a wide variety of situations.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;rsquo;ve also expanded the Application Development chapter quite a bit.&lt;/p&gt;
&lt;p&gt;Preliminary reviewers have especially liked the cookbook chapter, and I learned a great deal researching, writing, and having the query efficiency chapter tech reviewed. I&amp;rsquo;m eager for others to see all the new chapters.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve also made some &lt;a href=&#34;http://oreilly.com/catalog/errata.csp?isbn=0636920020547&#34;&gt;corrections&lt;/a&gt;, improved the index, and many passive sentences were converted to the active voice (or rather, I converted many passive sentences&amp;hellip;).&lt;/p&gt;
&lt;p&gt;Having a lot more to it, the new edition will cost a little more, but if you bought an electronic version of the first edition, you can get the second edition in the same format for 40% off. At this week&amp;rsquo;s semtech conference in San Francisco, I&amp;rsquo;ll have some &lt;a href=&#34;http://us.moo.com/&#34;&gt;moo cards&lt;/a&gt; that give you 40% off the printed book or 50% off the ebook, so if you see me just ask for one.&lt;/p&gt;
&lt;p&gt;I joke about the book&amp;rsquo;s 23% reduction in mentions of the semantic web (and incremental reduction in mentions of &amp;ldquo;linked data&amp;rdquo;), as contrasted with the page count going up 53%, because of my &lt;a href=&#34;https://www.bobdc.com/blog/selling-rdf-technology-to-big&#34;&gt;recent belief&lt;/a&gt; that SPARQL and other RDF-related technologies can be sold on their own merits instead of being sold as the implementation of a vision that people must first buy into. Let people select the technology that they feel is best—even if it has a strange name like Hadoop or MongoDB or SPARQL—to implement the visionary buzzphrase that is getting their project funded, whether it&amp;rsquo;s &amp;ldquo;Big Data&amp;rdquo; or &amp;ldquo;Semantic Web&amp;rdquo; or whatever new buzzphrase will be hot two years from now and first noticed by Gartner two years after that. I think SPARQL and the associated standards have a huge amount to offer all of these new visions of ways to do more with data and metadata.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>A nineteenth-century linking application</title>
      <link>https://www.bobdc.com/blog/a-nineteenth-century-linking-a/</link>
      <pubDate>Wed, 01 May 2013 08:38:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-nineteenth-century-linking-a/</guid>
      
      
      <description><div>An encore presentation.</div><div>&lt;p&gt; 
&lt;a href=&#39;https://www.bobdc.com/img/main/AlabamaShepards.jpg&#39;&gt;&lt;img id=&#34;id115156&#34; src=&#34;https://www.bobdc.com/img/main/AlabamaShepards.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Alabama Shepards&#34; width=&#34;200&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;From early 2003 to late 2005 I wrote a blog on oreillynet.com that I called &lt;a href=&#34;http://www.snee.com/xml/tal.html&#34;&gt;Thinking About Linking&lt;/a&gt;. The &lt;a href=&#34;http://www.oreillynet.com/xml/blog/2005/11/beyond_linking.html&#34;&gt;last entry&lt;/a&gt; summarizes what I covered and my experiences with that blog, but today I wanted to republish my favorite entry from that blog on the tenth anniversary of its original publication. It&amp;rsquo;s the same as the 2003 version except that I updated one link. On the right: a page from Shepard&amp;rsquo;s 1902 &amp;ldquo;Shepard&amp;rsquo;s Alabama Citations,&amp;rdquo; which I bought on ebay. (My comment below about link typing would certainly need updating now, given my experience with RDF.)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Frank Shepard was a salesman for a Chicago legal publisher. Shortly after the American Civil War, he noticed that when one court case overruled, criticized, or otherwise cited another, lawyers often jotted a note about it in the margin of the reporter volume with the cited case&amp;rsquo;s text. For example, upon learning that the judge in the case known as “La Bourgogne” (210 U.S. 95) made a negative reference to the “Moore v. American Transportation Company” (65 U.S. 1) case, a lawyer might turn to page 1 in volume 65 of the U.S. Supreme Court case reporter and write “210 U.S. 95, negative” in the margin next to the Moore case. This way, if the Moore case ever came up in court, the lawyer would have a better idea of its exact value.&lt;/p&gt;
&lt;p&gt;Shepard had an idea: if he printed gummed labels for each case listing the cases that cited it, he could save the lawyers the trouble of writing in these references by hand. He built a business out of selling these inter-case links to the legal profession and named the company after himself: &lt;a href=&#34;http://www.lexisnexis.com/shepards/&#34;&gt;Shepard&amp;rsquo;s&lt;/a&gt;. (Full disclosure: since Reed Elsevier acquired Shepard&amp;rsquo;s in the mid-1990s, Shepard&amp;rsquo;s Citations has been a product of my employer, LexisNexis. Other than some occasional XSLT advice to the folks in Colorado Springs, where Shepard&amp;rsquo;s has been based since 1947, I don&amp;rsquo;t do any work on that particular product.) In one sense, the stickers they produced in 1873 were already more sophisticated than web links, because if more than one case had cited the same case, the sticker for that case added a one-to-many link to it.&lt;/p&gt;
&lt;p&gt;To help the lawyers quickly learn why one case had been cited by another, Shepard&amp;rsquo;s started including &lt;a href=&#34;http://www.informationliteracy.org/builder/view/1344/16060&#34;&gt;one-letter codes&lt;/a&gt; to show that the citing case had overruled, criticized, modified, or applied some other treatment to the cited case. Now their links had link types: indications about the nature of the links to give a clue about why they might be worth traversing.&lt;/p&gt;
&lt;p&gt;The stickers, or “Adhesive Annotations,” became very popular. While sitting on the Massachusetts Supreme Judicial Court, future United States Supreme Court Justice Oliver Wendell Holmes Jr. wrote “I regard Shepard&amp;rsquo;s Massachusetts Annotations as the most thorough labor-saving device that has ever been brought to my attention. No one owning a set of reports can afford to be without one.”&lt;/p&gt;
&lt;p&gt;Before the nineteenth century came to a close, the company began producing alternatives to the sticker collections: bound books that listed, for each case, the cases that cited it and codes describing the citing case&amp;rsquo;s treatment. Today, we call this separation of the links from the linked resources “out-of-line links.”&lt;/p&gt;
&lt;p&gt;The books became so popular that their inventor&amp;rsquo;s last name became a verb. Any lawyer or law student knows that to &lt;a href=&#34;http://www.lectlaw.com/files/lwr17.htm&#34;&gt;Shepardize&lt;/a&gt; a case is to find out all relevant cases that cite it. Of course, automating the storage and lookup of these links is much easier with software, and it&amp;rsquo;s all online now. When you view a case using LexisNexis, clicking the “Shepardize” link displays a list of citing cases with links to the full text of those cases. This saves a lot of running around a law library, which was how the links were followed for the first century of their existence. (LexisNexis&amp;rsquo;s chief competitor, WestLaw, has a competing on-line product called KeyCite.)&lt;/p&gt;
&lt;p&gt;The success of Frank Shepard&amp;rsquo;s invention tells us several things about linking:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Link typing can add real value to a linking application.&lt;/strong&gt; If a lawyer who&amp;rsquo;s going to bring up a case in court Shepardizes it and sees only codes for positive treatment, there&amp;rsquo;s little need to look up the citing cases. If other cases criticized the case to be cited, however, it&amp;rsquo;s his job to find out why. (Too bad it&amp;rsquo;s &lt;a href=&#34;http://www.oreillynet.com/pub/wlg/3094&#34;&gt;so difficult&lt;/a&gt; to find other examples of link typing adding obvious value!)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Out-of-line links can sometimes be more useful than in-line links.&lt;/strong&gt; The web and other hypertext systems leading up to it have conditioned many to think of a link as something that connects the resource they&amp;rsquo;re looking at to a single other resource somewhere else, but links can be more than that. Shepard&amp;rsquo;s customers found that having all the citation links in a single set of books instead of as a set of stickers to be spread around hundreds of volumes can make the research go much more quickly, especially with the treatment codes added to the link identifiers to give clues about whether the links are worth traversing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;It&amp;rsquo;s not about the technology, but about the information.&lt;/strong&gt; Just as a well-written song can work well when performed by different bands, a good linking application can still have value when implemented using different technologies.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
      <category domain="https://www.bobdc.com//categories/legal-publishing">legal publishing</category>
      
      <category domain="https://www.bobdc.com//categories/technology-past">technology, past</category>
      
    </item>
    
    <item>
      <title>Appreciating SPARQL property paths more</title>
      <link>https://www.bobdc.com/blog/appreciating-sparql-property-p/</link>
      <pubDate>Wed, 17 Apr 2013 08:46:59 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/appreciating-sparql-property-p/</guid>
      
      
      <description><div>More and more useful.</div><div>&lt;blockquote id=&#34;id140291&#34; class=&#34;pullquote&#34;&gt;I had been thinking of property paths as something that could slow down queries, and Paul&#39;s experience was that the property path version was more efficient.&lt;/blockquote&gt;
&lt;p&gt;I have &lt;a href=&#34;https://www.bobdc.com/blog/playing-more-with-sparql-11-pr&#34;&gt;played with&lt;/a&gt; SPARQL 1.1&amp;rsquo;s new property paths features and described them in &lt;a href=&#34;http://www.learningsparql.com/&#34;&gt;my book&lt;/a&gt;, and I&amp;rsquo;ve felt that I understood them for a while, but two recent occasions have helped me to appreciate them even more.&lt;/p&gt;
&lt;p&gt;First, to prepare for the talk I&amp;rsquo;m giving at the Semantic Technology &amp;amp; Business Conference on &lt;a href=&#34;http://semtechbizsf2013.semanticweb.com/sessionPop.cfm?confid=70&amp;amp;proposalid=5096&#34;&gt;Enhancing Searches with Semantic Technology&lt;/a&gt;, at one point my demo app needed to find a SKOS concept that has either a skos:prefLabel or a skos:hiddenLabel value of a particular string. At first I thought I&amp;rsquo;d need a UNION query, like this,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
SELECT ?c
WHERE {
 ?c a skos:Concept .
 {?c skos:prefLabel &amp;quot;motrin&amp;quot;@en }
 UNION
 {?c  skos:hiddenLabel &amp;quot;motrin&amp;quot;@en }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;but then I realized that the alternative path operator could make it much terser: just two triple patterns in the query, with the second one&amp;rsquo;s predicate expression essentially saying &amp;ldquo;a predicate of &lt;code&gt;skos:prefLabel&lt;/code&gt; or of &lt;code&gt;skos:hiddenLabel&lt;/code&gt;&amp;rdquo;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
SELECT  ?c
WHERE {
 ?c a skos:Concept .
 ?c skos:prefLabel|skos:hiddenLabel &amp;quot;motrin&amp;quot;@en . 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second occasion for appreciating property paths more was reading the recent Paul Groth blog posting &lt;a href=&#34;http://thinklinks.wordpress.com/2013/04/03/5-heuristics-for-writing-better-sparql-queries/&#34;&gt;5 heuristics for writing better SPARQL queries&lt;/a&gt;, which recommended that we &amp;ldquo;use property paths to replace connected triple patterns where the object of one triple pattern is the subject of another.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d seen examples of the XPath-like property paths, like the &lt;code&gt;foaf:knows/foaf:name&lt;/code&gt; one in the &lt;a href=&#34;http://www.w3.org/TR/sparql11-query/#propertypath-examples&#34;&gt;SPARQL 1.1 Query Recommendation&lt;/a&gt;, but I hadn&amp;rsquo;t realized their value for replacing triple patterns where the object of one triple pattern is the subject of another that has a different predicate, and I&amp;rsquo;ve written a lot of those. For example, to find the four-step connection between &lt;code&gt;d:a&lt;/code&gt; and &lt;code&gt;d:e&lt;/code&gt; in the following,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .
@prefix dm: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt; .


d:a dm:prop1 d:b . 
d:b dm:prop2 d:c . 
d:c dm:prop3 d:d . 
d:d dm:prop4 d:e . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I would have written a SPARQL graph pattern that looked pretty much like the four triples that you see there, but with variables substituted for &lt;code&gt;d:b&lt;/code&gt;, &lt;code&gt;d:c&lt;/code&gt;, and &lt;code&gt;d:d&lt;/code&gt;. Paul&amp;rsquo;s blog entry made me realize that I could simply write this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?s ?o
WHERE
{ ?s dm:prop1/dm:prop2/dm:prop3/dm:prop4 ?o }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What makes this interesting is that I had been thinking of property paths as something that could slow down queries, and Paul&amp;rsquo;s experience was that the property path version was more efficient. Of course, I was generalizing too much—the property path &lt;code&gt;*&lt;/code&gt; and &lt;code&gt;+&lt;/code&gt; operators, while very handy, essentially say &amp;ldquo;and then keep looking for more,&amp;rdquo; which can really increase the search space and execution time. I suppose I was also still hearing the ringing in my ears of the alarm sounded by the paper &lt;a href=&#34;http://users.dcc.uchile.cl/~jperez/papers/www2012.pdf&#34;&gt;Counting Beyond a Yottabyte, or how SPARQL 1.1 Property Paths will Prevent Adoption of the Standard&lt;/a&gt; (pdf), but that too was focusing on a subset of property paths options unrelated to the path format that Paul was discussing. (After the release of that paper and before SPARQL 1.1&amp;rsquo;s ascent to Recommendation status, the SPARQL Working Group &lt;a href=&#34;http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2012Apr/0003.html&#34;&gt;did make adjustments&lt;/a&gt; to certain property path features to address the paper&amp;rsquo;s concerns.)&lt;/p&gt;
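&lt;p&gt;With the little dataset above, for example, this one-triple-pattern query uses &lt;code&gt;+&lt;/code&gt; over an alternative path to ask for everything reachable from &lt;code&gt;d:a&lt;/code&gt; in one or more steps, which should bind &lt;code&gt;?o&lt;/code&gt; to &lt;code&gt;d:b&lt;/code&gt;, &lt;code&gt;d:c&lt;/code&gt;, &lt;code&gt;d:d&lt;/code&gt;, and &lt;code&gt;d:e&lt;/code&gt;—the processor keeps traversing until it finds no more matches:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
PREFIX dm: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt;

SELECT ?o
WHERE { d:a (dm:prop1|dm:prop2|dm:prop3|dm:prop4)+ ?o }
&lt;/code&gt;&lt;/pre&gt;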
&lt;p&gt;In my formerly extensive use of XSLT, I never got to the point where I couldn&amp;rsquo;t stand being limited to XSLT 1.0, even though 2.0 became a Recommendation in 2007. (I know that Jeni Tennison got to that point &lt;a href=&#34;http://www.jenitennison.com/blog/node/57&#34;&gt;about 2007&lt;/a&gt;, if not earlier.) Now that it&amp;rsquo;s been almost four weeks since the SPARQL 1.1 specs became Recommendations, I already have a difficult time being limited to SPARQL 1.0, which is still the case with some endpoints; there&amp;rsquo;s just so much great stuff in 1.1.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
    </item>
    
    <item>
      <title>In publishing? Listen to WFMU&#39;s &#34;Radio Free Culture&#34; podcast</title>
      <link>https://www.bobdc.com/blog/in-publishing-listen-to-wfmus/</link>
      <pubDate>Thu, 21 Mar 2013 09:04:05 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/in-publishing-listen-to-wfmus/</guid>
      
      
      <description><div>A new radio show (and podcast) has some great observations about the future of content creation and distribution.</div><div>&lt;img id=&#34;id159940&#34; src=&#34;https://www.bobdc.com/img/main/WFMU-logo-blog.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;some description&#34; width=&#34;200&#34;/&gt;
&lt;p&gt;People who listen to Jersey City freeform radio station &lt;a href=&#34;http://wfmu.org/&#34;&gt;WFMU&lt;/a&gt; tend to be a bit fanatical about it. The &lt;a href=&#34;http://en.wikipedia.org/wiki/WFMU&#34;&gt;Wikipedia page&lt;/a&gt; on the station quotes the New York Times referring to them as &amp;ldquo;a station whose name has become like a secret handshake among a certain tastemaking cognoscenti.&amp;rdquo; It&amp;rsquo;s not only because of the range of their musical eclecticism, which is an easy game for college and other non-profit radio stations to play; the depth of their commitment and their role in the music and art scenes of New York City and beyond has been impressive for over forty years.&lt;/p&gt;
&lt;p&gt;They have a new show called &lt;a href=&#34;http://wfmu.org/playlists/FC&#34;&gt;Radio Free Culture&lt;/a&gt; which is also available as a &lt;a href=&#34;http://wfmu.org/podcast/FC.xml&#34;&gt;podcast&lt;/a&gt;. Different FMU hosts from different shows take turns with this one, so it&amp;rsquo;s apparently on at different, unpredictable times, but the MP3s of past shows are all sitting there waiting for you. Some of the hosts are better than others, but as with the rest of the station, the unpredictability is part of the fun.&lt;/p&gt;
&lt;p&gt;The discussions are often about music, but not exclusively so, and still—the music industry has already been through stages that the movie and &amp;ldquo;print&amp;rdquo; publishing industries are only now sliding into, and distribution of files of content is distribution of files of content. Roles in creating and publicizing that content, the potential value of redistributors (for example, record companies or publishers), and especially issues about who pays who for what, at which stage of creation or distribution, are topics that the podcast returns to regularly.&lt;/p&gt;
&lt;p&gt;The most recent show, on &lt;a href=&#34;http://wfmu.org/playlists/shows/49593&#34;&gt;February 25th&lt;/a&gt;, interviewed several people involved with the Future of Music Coalition&amp;rsquo;s &lt;a href=&#34;http://money.futureofmusic.org/&#34;&gt;Artist Revenue Streams&lt;/a&gt; project, which gathered data on how musicians make money today, with statistics about how 25 categories of musicians make money from 48 potential income streams. Of course, the number and relative importance of the different categories and streams have evolved over time, and the reaction to the FMC&amp;rsquo;s work shows that there hasn&amp;rsquo;t been much serious data gathering in this area before: a lot of organizations are very interested in using their work.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://wfmu.org/playlists/shows/48835&#34;&gt;December 31st&lt;/a&gt; show has a fascinating interview with MIT PhD candidate &lt;a href=&#34;http://mako.cc/&#34;&gt;Benjamin Mako Hill&lt;/a&gt; about the implications for our culture of the fact that the song &amp;ldquo;Happy Birthday to You&amp;rdquo; is copyrighted. Did you know that if a group of people sitting around a table in a restaurant or a bunch of kids in a summer camp sing this song without getting permission first, they are technically violating U.S. copyright laws? Warner Music Group collects literally millions of dollars every year from higher-profile performances of the song. (The second half of this particular show is people calling in to talk about their worst birthday ever, and I didn&amp;rsquo;t make it all the way through that.)&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://wfmu.org/playlists/shows/48729&#34;&gt;December 24th&lt;/a&gt; show features talks from the FMU-sponsored Radiovision festival about &amp;ldquo;piracy&amp;rdquo; in its many meanings, with a particularly good talk by Anna Troberg, who once fought against content bootlegging but eventually became the leader of Sweden&amp;rsquo;s quite successful &lt;a href=&#34;http://en.wikipedia.org/wiki/Pirate_Party_(Sweden)&#34;&gt;Pirate political party&lt;/a&gt; after getting to understand their values better.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://wfmu.org/playlists/shows/47985&#34;&gt;October 29th&lt;/a&gt; discussion about live streaming public protests with independent journalist and video broadcaster Tim Pool, as well as several other more music-focused shows, often return to the issues of how new technology makes it easier to create and distribute content, but how larger infrastructures are necessary to build an audience for that content—infrastructures once only provided by traditional publishing companies but now involving social media networks as well. Of course, the roles, relationships, and relative need for publishers and social media networks are further fuel for discussion.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve known people involved in diverse aspects of many kinds of publishing, and I think that WFMU&amp;rsquo;s &amp;ldquo;Radio Free Culture&amp;rdquo; can teach all of us a lot about the range, direction, and magnitude of many of the current forces affecting how people create, distribute, pay for, and get paid for content now and in the future. (Further discussions of these topics as they relate to specific episodes are available at the show&amp;rsquo;s &lt;a href=&#34;http://freemusicarchive.org/tag/radio_free_culture/&#34;&gt;Free Music Archive&lt;/a&gt; page.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>&#34;RDF and SPARQL&#34; article published in &#34;Big Data&#34; journal</title>
      <link>https://www.bobdc.com/blog/rdf-and-sparql-article-publish/</link>
      <pubDate>Fri, 15 Feb 2013 08:53:45 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdf-and-sparql-article-publish/</guid>
      
      
      <description><div>Or: RDF, SPARQL, and Big Data, part 3.</div><div>&lt;p&gt;&lt;a href=&#34;http://online.liebertpub.com/toc/big/1/1&#34;&gt;&lt;img id=&#34;id118426&#34; src=&#34;https://www.bobdc.com/img/main/bigdatacover.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;[Big Data cover]&#34; width=&#34;120&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A few months ago here I wrote &lt;a href=&#34;https://www.bobdc.com/blog/sparql-and-big-data-and-nosql&#34;&gt;SPARQL and Big Data (and NoSQL): How to pursue the common ground?&lt;/a&gt; followed by &lt;a href=&#34;http://www.snee.com/bobdc.blog/2012/11/selling-rdf-technology-to-big.html&#34;&gt;Selling RDF technology to Big Data&lt;/a&gt;, where I put forth some theories about how to describe the value of RDF technology to people who may or may not have heard of it but were clearly interested in the hot buzz phrase &amp;ldquo;Big Data.&amp;rdquo; To practice what I preached—or perhaps to just preach to a new audience—I submitted an article to the new academic journal &lt;a href=&#34;http://www.liebertpub.com/overview/big-data/611/&#34;&gt;Big Data&lt;/a&gt;. They&amp;rsquo;ve just published their &lt;a href=&#34;http://online.liebertpub.com/toc/big/1/1&#34;&gt;first issue&lt;/a&gt;, which includes my article &amp;ldquo;What Do RDF and SPARQL bring to Big Data Projects?&amp;rdquo; (&lt;a href=&#34;http://online.liebertpub.com/doi/pdfplus/10.1089/big.2012.0004&#34;&gt;pdf&lt;/a&gt;). The same issue also has an interesting article by journal editor Edd Dumbill and the first of what will be a regular column by Jim Hendler; I haven&amp;rsquo;t checked out the other articles yet.&lt;/p&gt;
&lt;p&gt;My article provides a basic introduction to RDF and SPARQL, playing up the RDF support by IBM&amp;rsquo;s DB2, Oracle&amp;rsquo;s Spatial product, and Cray&amp;rsquo;s uRiKA, because these companies are well-known brand names in the world of large-scale data processing. And, except for a reference to the Linked Open Data Cloud and the possibility of private linked data clouds, there is no mention of Linked Data and no use of the phrase &amp;ldquo;semantic web,&amp;rdquo; in keeping with my ideas described in &amp;ldquo;Selling RDF technology to Big Data.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The paperwork I filled out on the way to having the article published led me to believe that this would be one of those expensive, tightly-controlled academic journals, but it looks like it&amp;rsquo;s being published with a Creative Commons CC-BY license, which was great to see. I look forward to their future issues.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Finding Europeana audio with SPARQL</title>
      <link>https://www.bobdc.com/blog/finding-europeana-audio-with-s/</link>
      <pubDate>Sun, 13 Jan 2013 11:06:26 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/finding-europeana-audio-with-s/</guid>
      
      
      <description><div>And video!</div><div>&lt;blockquote id=&#34;id115175&#34; class=&#34;pullquote&#34;&gt;As a SPARQL geek&#39;s alternative to YouTube, the 166,872 video resources with an &lt;code&gt;edm:type&lt;/code&gt; value of &#34;VIDEO&#34; look like a tempting way to kill some time.&lt;/blockquote&gt;
&lt;p&gt;When I first heard about the &lt;a href=&#34;http://europeana.ontotext.com/&#34;&gt;SPARQL endpoint&lt;/a&gt; for the &lt;a href=&#34;http://www.europeana.eu/portal/&#34;&gt;Europeana&lt;/a&gt; aggregation of data about European cultural artifacts, the first example I heard about was an MP3 audio file of a &lt;a href=&#34;http://www.europeana.eu/portal/record/92056/BD9D5C6C6B02248F187238E9D7CC09EAF17BEA59.html&#34;&gt;Slovenian version of O sole mio&lt;/a&gt;. I happened to be in the middle of packing for a family visit over Christmas and immediately &lt;a href=&#34;https://twitter.com/bobdc/status/282920371085271040&#34;&gt;tweeted&lt;/a&gt; &amp;ldquo;Lots of holiday stuff to do, but the new Ontotext Europeana SPARQL endpoint points to MP3s! So tempting&amp;hellip;&amp;rdquo; This past Sunday morning I finally made some time to explore it more, and I found 6,219 audio files.&lt;/p&gt;
&lt;p&gt;The following query pulls down data about 100 of them (which 100 you pull depends on the OFFSET value), and &lt;a href=&#34;http://snee.com/bobdc.blog/files/EuropeanaSPARQL2HTML.xsl&#34;&gt;this XSLT stylesheet&lt;/a&gt; converts a SPARQL XML query result version of the results to a simple HTML file that shows the title, creator, and source of each one, with the title being a hypertext link to the audio file itself. Following some of these links, I found folk music, classical music, interviews, and plenty of Finnish spoken word material where I had no idea what they were saying.&lt;/p&gt;
&lt;p&gt;Here is the query itself:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX edm: &amp;lt;http://www.europeana.eu/schemas/edm/&amp;gt;
PREFIX ore: &amp;lt;http://www.openarchives.org/ore/terms/&amp;gt;
PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt; 


SELECT ?title ?mediaURL ?creator ?source WHERE {
  ?resource edm:type &amp;quot;SOUND&amp;quot; ;
            ore:proxyIn ?proxy ;
            dc:title ?title ;
            dc:creator ?creator ;
            dc:source ?source . 
  ?proxy edm:isShownBy ?mediaURL . 
 }
OFFSET 600
LIMIT 100
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href=&#34;http://europeana.ontotext.com/sparql?query=PREFIX+edm%3A+%3Chttp%3A%2F%2Fwww.europeana.eu%2Fschemas%2Fedm%2F%3E%0D%0APREFIX+ore%3A+%3Chttp%3A%2F%2Fwww.openarchives.org%2Fore%2Fterms%2F%3E%0D%0APREFIX+dc%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E+%0D%0A%0D%0ASELECT+%3Ftitle+%3FmediaURL+%3Fcreator+%3Fsource+WHERE+%7B%0D%0A++%3Fresource+edm%3Atype+%22SOUND%22+%3B%0D%0A++++++++++++ore%3AproxyIn+%3Fproxy+%3B%0D%0A++++++++++++dc%3Atitle+%3Ftitle+%3B%0D%0A++++++++++++dc%3Acreator+%3Fcreator+%3B%0D%0A++++++++++++dc%3Asource+%3Fsource+.+%0D%0A++%3Fproxy+edm%3AisShownBy+%3FmediaURL+.+%0D%0A+%7D%0D%0AOFFSET+3000%0D%0ALIMIT+100&amp;amp;_implicit=false&amp;amp;implicit=true&amp;amp;_equivalent=false&amp;amp;equivalent=true&amp;amp;_form=%2Fsparql&#34;&gt;This link&lt;/a&gt; runs the query with an offset of 3000, and &lt;a href=&#34;http://snee.com/bobdc.blog/files/europeana.html&#34;&gt;this web page&lt;/a&gt; shows the result of running the stylesheet on the query results when run with an offset of 600 as above. As you&amp;rsquo;ll see and hear by following that page&amp;rsquo;s links, that batch seems to be mostly Norwegian folk music.&lt;/p&gt;
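&lt;p&gt;If XSLT isn&amp;rsquo;t handy, the stylesheet&amp;rsquo;s job—turning the SPARQL Query Results XML Format into a linked HTML list—can be sketched with Python&amp;rsquo;s standard library. This is my own rough illustration, not the stylesheet itself, and the sample result document here is made up:&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
SRX = "{http://www.w3.org/2005/sparql-results#}"

def results_to_html(results_xml):
    """Render each result row as a list item whose title links to the media URL."""
    root = ET.fromstring(results_xml)
    items = []
    for result in root.iter(SRX + "result"):
        # Map each variable name to the text of its uri/literal child.
        row = {b.get("name"): b[0].text
               for b in result.iter(SRX + "binding")}
        items.append('<li><a href="%s">%s</a> (%s)</li>'
                     % (row["mediaURL"], row["title"], row.get("creator", "?")))
    return "<ul>\n%s\n</ul>" % "\n".join(items)

# A tiny hand-written sample in the same format the endpoint returns.
sample = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <results>
    <result>
      <binding name="title"><literal>O sole mio</literal></binding>
      <binding name="creator"><literal>Unknown</literal></binding>
      <binding name="mediaURL"><uri>http://example.org/osolemio.mp3</uri></binding>
    </result>
  </results>
</sparql>"""

print(results_to_html(sample))
```

&lt;p&gt;Each &lt;code&gt;binding&lt;/code&gt; element&amp;rsquo;s name attribute matches a variable from the query&amp;rsquo;s SELECT list, so the same approach extends to whatever other variables you ask the endpoint for.&lt;/p&gt;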
&lt;p&gt;A few notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;As I mentioned in the tweet, it&amp;rsquo;s running Ontotext&amp;rsquo;s OWLIM triplestore. This made it the first large public endpoint that I&amp;rsquo;ve seen with SPARQL 1.1 support, which was great to see. I didn&amp;rsquo;t need any 1.1 features for the query above, but did for others on my way there—for example, to find out that there were 6,219 audio files.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;About half of the audio URLs had &amp;ldquo;mp3&amp;rdquo; at the end. When I tried some of the audio URLs that didn&amp;rsquo;t, they seemed to play audio just fine, but there may be some that don&amp;rsquo;t link to playable audio.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The proxy parts of the query deal with a level of indirection that was necessary because the site federates data from other sites. &lt;a href=&#34;http://pro.europeana.eu/edm-documentation&#34;&gt;Documentation&lt;/a&gt; of the data model is available (well, it isn&amp;rsquo;t as of the morning of January 13th, but Google has a cached copy), but I got to the query above by various hit-and-miss experiments starting with one that looked for resources whose names ended with &amp;ldquo;.mp3&amp;rdquo;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;a href=&#34;http://europeana.ontotext.com/sparql&#34;&gt;web-based front end&lt;/a&gt; to the Europeana SPARQL endpoint did some nice parentheses matching and color-coding of syntax as I entered queries. It doesn&amp;rsquo;t compare with TopBraid Composer&amp;rsquo;s SPARQL view, which has command completion and other IDE-oriented features, but it was impressive for a field on a web form.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There is plenty more metadata available in addition to the title, creator, and source that my query requests for each resource; I encourage you to try variations on the query to explore it. Other possible &lt;code&gt;edm:type&lt;/code&gt; values are TEXT, IMAGE, VIDEO, and 3D. (The two 3D resources were a 70-meg two-page PDF and a 59-meg eight-page one, each showing a church in Cyprus. Viewed with Adobe Reader, some of the images could be rotated, I think.)&lt;/p&gt;
&lt;p&gt;As a SPARQL geek&amp;rsquo;s alternative to YouTube, the 166,872 resources with an &lt;code&gt;edm:type&lt;/code&gt; value of &amp;ldquo;VIDEO&amp;rdquo; are a tempting way to kill some time. Just substitute &amp;ldquo;VIDEO&amp;rdquo; for &amp;ldquo;SOUND&amp;rdquo; in the query above and you&amp;rsquo;ll be off and running. (Don&amp;rsquo;t forget that LIMIT keyword, though—be polite and don&amp;rsquo;t ask for too much at once.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2013">2013</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Normalizing company names with SPARQL and DBpedia</title>
      <link>https://www.bobdc.com/blog/normalizing-company-names-with/</link>
      <pubDate>Wed, 05 Dec 2012 07:46:10 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/normalizing-company-names-with/</guid>
      
      
      <description><div>Wikipedia page redirection data, waiting for you to query it.</div><div>&lt;p&gt;&lt;a href=&#34;http://ww2.odu.edu/ao/news/index.php?todo=details&amp;amp;id=23751&#34;&gt;&lt;img id=&#34;id143548&#34; src=&#34;https://www.bobdc.com/img/main/bigblue.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;[ODU mascot]&#34; width=&#34;160&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you send your browser to &lt;a href=&#34;http://en.wikipedia.org/wiki/Big_Blue&#34;&gt;http://en.wikipedia.org/wiki/Big_Blue&lt;/a&gt;, you&amp;rsquo;ll end up at IBM&amp;rsquo;s page, because Wikipedia knows that this nickname usually refers to this company. (&lt;a href=&#34;http://dbpedia.org/snorql/?query=SELECT+*+WHERE+%7B%0D%0A%3Fs+%3Fp+%22Big+Blue%22%40en+.+%0D%0A%7D&#34;&gt;Apparently&lt;/a&gt;, it&amp;rsquo;s also a nickname for several high schools and universities.) This data pointing from nicknames to official names is also stored in DBpedia, which means that we can use SPARQL queries to normalize company names. You can use the same technique to normalize other kinds of names—for example, trying to send your browser to &lt;a href=&#34;http://en.wikipedia.org/wiki/Bobby_Kennedy&#34;&gt;http://en.wikipedia.org/wiki/Bobby_Kennedy&lt;/a&gt; will actually send it to &lt;a href=&#34;http://en.wikipedia.org/wiki/Robert_F._Kennedy&#34;&gt;http://en.wikipedia.org/wiki/Robert_F._Kennedy&lt;/a&gt;—but a query that sticks to one domain will have a simpler job. &lt;a href=&#34;https://www.bobdc.com/blog/the-dl-in-owl-dl&#34;&gt;Description Logics&lt;/a&gt; and all that.&lt;/p&gt;
&lt;p&gt;The query below can be run with any SPARQL client that supports 1.1. I wanted it to cover these three cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Run it with an unofficial company name such as Big Blue, Apple Computer, or Kodak, and it should return the official company name.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run it with an official company name such as IBM, Apple, Inc., or Eastman Kodak, and it should return that name.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run it with something that isn&amp;rsquo;t a company, such as Snee, and it shouldn&amp;rsquo;t return anything.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The query&amp;rsquo;s first BIND statement sets the name to check (including a language tag, because DBpedia is pretty consistent about using those) in the &lt;code&gt;?inputName&lt;/code&gt; variable, and the SERVICE keyword sends the part of the query enclosed in the braces that follow it off to DBpedia&amp;rsquo;s SPARQL endpoint.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX dbpo: &amp;lt;http://dbpedia.org/ontology/&amp;gt;
SELECT ?name 
WHERE {
  BIND(&amp;quot;Big Blue&amp;quot;@en AS ?inputName) 
  SERVICE &amp;lt;http://dbpedia.org/sparql&amp;gt; 
  {
    ?s rdfs:label ?inputName .
    {
      ?s dbpo:wikiPageRedirects ?actualResource .
      ?actualResource a dbpo:Company . 
      ?actualResource rdfs:label ?redirectsTo . 
      FILTER ( lang(?redirectsTo) = &amp;quot;en&amp;quot; )
    }
    UNION
    { ?s a dbpo:Company . }
  }
  BIND(STR(COALESCE(?redirectsTo,?inputName)) AS ?name)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After finding a resource (&lt;code&gt;?s&lt;/code&gt;) that has the bound value as an rdfs:label value, DBpedia returns the UNION of two graph patterns. The first checks whether this resource is supposed to redirect to another &lt;code&gt;dbpo:Company&lt;/code&gt; resource, and if so, stores the English &lt;code&gt;rdfs:label&lt;/code&gt; of that resource in the variable &lt;code&gt;?redirectsTo&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If that graph pattern doesn&amp;rsquo;t return anything because &lt;code&gt;?s&lt;/code&gt; doesn&amp;rsquo;t have a &lt;code&gt;dbpo:wikiPageRedirects&lt;/code&gt; property, but DBpedia does know that it&amp;rsquo;s a &lt;code&gt;dbpo:Company&lt;/code&gt;, the graph pattern after the UNION keyword will match.&lt;/p&gt;
&lt;p&gt;After DBpedia returns any bound variables, the local client uses the COALESCE function to bind &lt;code&gt;?redirectsTo&lt;/code&gt; to the &lt;code&gt;?name&lt;/code&gt; variable if &lt;code&gt;?redirectsTo&lt;/code&gt; got bound, and otherwise binds &lt;code&gt;?inputName&lt;/code&gt; to it. (Because COALESCE is a new SPARQL 1.1 feature and DBpedia doesn&amp;rsquo;t support any of 1.1 that I know of yet, this part has to be done locally.) If nothing got bound, then there was no such company listed in DBpedia.&lt;/p&gt;
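&lt;p&gt;To summarize the logic of the three cases, here is a toy Python version of the same redirect-then-coalesce idea. The dictionaries are hand-entered stand-ins for DBpedia&amp;rsquo;s redirect and company-type data (my own illustration, not the real triples):&lt;/p&gt;

```python
# Toy stand-ins for dbpo:wikiPageRedirects triples and for
# resources typed as dbpo:Company (sample entries only).
redirects = {"Big Blue": "IBM",
             "Apple Computer": "Apple Inc.",
             "Kodak": "Eastman Kodak"}
companies = {"IBM", "Apple Inc.", "Eastman Kodak"}

def normalize_company_name(name):
    """Mirror the query's UNION + COALESCE logic: follow a redirect
    to a company, accept an official company name as-is, and return
    None for anything that is neither."""
    if name in redirects and redirects[name] in companies:
        return redirects[name]   # the ?redirectsTo branch
    if name in companies:
        return name              # the branch after UNION
    return None                  # no such company found

print(normalize_company_name("Big Blue"))   # IBM
print(normalize_company_name("IBM"))        # IBM
print(normalize_company_name("Snee"))       # None
```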
&lt;p&gt;I tested this with both ARQ and TopBraid Composer. With TBC (including the free version), it was fun to put the whole query into a SPIN function that I called normalizeCompanyName, so that I could make calls such as normalizeCompanyName(&amp;ldquo;Kodak&amp;rdquo;) or normalizeCompanyName(&amp;ldquo;Apple, Inc.&amp;rdquo;) in the middle of other SPARQL queries.&lt;/p&gt;
&lt;p&gt;It took me a lot of tweaking to get the query above to work the way I wanted it to, and I wouldn&amp;rsquo;t be surprised if it can be improved. I&amp;rsquo;d love to hear any suggestions.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/dbpedia">DBpedia</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Selling RDF technology to Big Data</title>
      <link>https://www.bobdc.com/blog/selling-rdf-technology-to-big/</link>
      <pubDate>Mon, 12 Nov 2012 08:51:59 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/selling-rdf-technology-to-big/</guid>
      
      
      <description><div>A clue: what we&#39;re selling is just that—RDF technology.</div><div>&lt;p&gt;I think I&amp;rsquo;ve figured it out. (This is a follow-up to my previous post &lt;a href=&#34;https://www.bobdc.com/blog/sparql-and-big-data-and-nosql&#34;&gt;SPARQL and Big Data (and NoSQL): How to pursue the common ground?&lt;/a&gt;) Here&amp;rsquo;s how to sell the Semantic Web and Linked Data visions to the Big Data folk: don&amp;rsquo;t. Sell them on RDF technology.&lt;/p&gt;
&lt;blockquote id=&#34;id104242&#34; class=&#34;pullquote&#34;&gt;Instead of telling these people about the Semantic Web or Linked Data visions, we should show them how we have technology that fulfills the vision that&#39;s apparently captured their imaginations.&lt;/blockquote&gt;
&lt;p&gt;The process of selling a set of technologies usually means selling a vision, getting people psyched about that vision, and then telling them about the technology that implements that vision. For RDF technology (by which I mean RDF, SPARQL, and optionally, RDFS and OWL), the vision for many years was the Semantic Web. Some people in that community eventually decided that an easier vision to sell was Linked Data. (Linked Data may not always include RDF technology—when Tim Berners-Lee added &amp;ldquo;(RDF*, SPARQL)&amp;rdquo; to his &lt;a href=&#34;http://www.w3.org/DesignIssues/LinkedData.html&#34;&gt;list of Linked Data principles&lt;/a&gt;, it became the &lt;a href=&#34;http://en.wikipedia.org/wiki/Filioque&#34;&gt;filioque&lt;/a&gt; controversy of the Linked Data community—but the boundaries of this or other sets of technologies I&amp;rsquo;m discussing are not the issue here. The point is, it&amp;rsquo;s very common to use the Linked Data vision to sell people on the value of using URIs, triples, and SPARQL together.)&lt;/p&gt;
&lt;p&gt;Big Data is itself a vision. Note how it&amp;rsquo;s spelled in initial caps, like &amp;ldquo;Semantic Web&amp;rdquo; and &amp;ldquo;Linked Data,&amp;rdquo; and features prominently in sales pitches from large and small system vendors. The 166-page IBM educational/marketing PDF &amp;ldquo;Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data&amp;rdquo; (available &lt;a href=&#34;http://www-01.ibm.com/software/info/rte/bdig/bdwa-7-post.html&#34;&gt;here&lt;/a&gt; with registration) is mostly about the Big Data vision: the issues, the common use cases that can now be handled, and in general, the possibilities. Instead of trying to sell Big Data people on one or two of our overlapping visions, we should be showing them the connections between our technology and the vision that they&amp;rsquo;re already sold on.&lt;/p&gt;
&lt;p&gt;Hadoop and NoSQL are currently the technologies being used to implement this vision. Hadoop is a software framework for certain kinds of distributed applications; its &lt;a href=&#34;http://en.wikipedia.org/wiki/MapReduce&#34;&gt;MapReduce&lt;/a&gt; algorithm is also implemented by several of the NoSQL database managers. &amp;ldquo;NoSQL&amp;rdquo; is a blanket term for a family of database management technologies that were developed independently of each other with no particular standards or organization to coordinate between them (other than not being SQL), so a new addition to the family is not going to look like some odd appendage to a seamless whole. In the book &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1934356921/bobducharmeA/&#34;&gt;Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement&lt;/a&gt; that I just read, the example database managers are PostgreSQL, Riak, HBase, MongoDB, CouchDB, Neo4J, and Redis. (While some people and organizations do work RDF into their NoSQL discussions, it&amp;rsquo;s not mentioned anywhere in this book.)&lt;/p&gt;
&lt;p&gt;Besides PostgreSQL, the other database managers covered by the book are all considered to be NoSQL systems. Each has techniques for addressing the Big Data vision, which Edd Dumbill, IBM, and many others summarize by discussing the three Vs: Volume, Velocity, and Variety. For reasons described in my last blog entry, RDF technology is excellent for addressing all of the Variety issues, and reading &amp;ldquo;Seven Databases&amp;rdquo; has further convinced me of this. Velocity, and to some extent Volume (more on this one below), are issues for a platform to address, not a set of standards, so for that you need to talk to an RDF-related platform vendor such as &lt;a href=&#34;http://www.topquadrant.com&#34;&gt;TopQuadrant&lt;/a&gt;. (We&amp;rsquo;d be happy to discuss your requirements.)&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a cliché in engineering-related sales that you have to focus on customer requirements. It&amp;rsquo;s also a sales cliché that talking about technical details will bore the people who write the checks. IBM and other such companies are putting big money into marketing Big Data solutions because they&amp;rsquo;ve found suit-wearing, check-writing managers who feel that their requirements line up with the promises of these solutions. Instead of telling these people about the Semantic Web or Linked Data visions, we should show them how we have (standardized!) technology that fulfills the vision that&amp;rsquo;s apparently captured their imaginations.&lt;/p&gt;
&lt;p&gt;Reading the &amp;ldquo;7 Databases&amp;rdquo; book, I realized that the &lt;a href=&#34;http://en.wikipedia.org/wiki/CAP_theorem&#34;&gt;CAP theorem&lt;/a&gt;, although based on some technical issues, is also part of the Big Data vision. If I understand it correctly, the basic idea is this: database administrators have always wanted Consistency, Availability, and Partition tolerance in their databases, but a distributed database can only do two of these well at a time. By deciding that you can work around subpar performance for one if you get great performance from the other two, new possibilities emerge—possibilities that wouldn&amp;rsquo;t have occurred to earlier generations of database administrators who strained to optimize all three. For example, if you give up the need to have all data on all nodes be consistent with all the data on the other nodes all the time (and you include steps to have it eventually become consistent, just not all the time), you can get increased availability (as long as one server is running, the database will return something) and partition tolerance (loss of communication between nodes won&amp;rsquo;t affect the system).&lt;/p&gt;
&lt;p&gt;One point I didn&amp;rsquo;t make in my last posting is that the ease with which you can distribute and aggregate RDF triples in any combination gives you a lot more flexibility in how you implement your own two-out-of-three CAP theorem tradeoff. This should make it easier to store triples using one of the distributed NoSQL platforms; I don&amp;rsquo;t know of any definitive steps in this direction yet, but as I said before, Google searches show bits of work here and there.&lt;/p&gt;
&lt;p&gt;This potential good fit of triples to the new possibilities opened up by the CAP theorem hadn&amp;rsquo;t occurred to me when I wrote my last blog entry, but by further study of the vision associated with the hot new data processing goals, I found another connection between that vision and the technology that we RDF types are offering. Which is just what we should be doing: identifying connections between what our technology can do and what these customers need. Especially customers who are pumped up about the latest big technology vision.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>SPARQL and Big Data (and NoSQL)</title>
      <link>https://www.bobdc.com/blog/sparql-and-big-data-and-nosql/</link>
      <pubDate>Wed, 24 Oct 2012 22:15:13 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparql-and-big-data-and-nosql/</guid>
      
      
      <description><div>How to pursue the common ground?</div><div>&lt;p&gt;I think it&amp;rsquo;s obvious that SPARQL and other RDF-related technologies have plenty to offer to the overlapping worlds of Big Data and NoSQL, but this doesn&amp;rsquo;t seem as obvious to people who focus on those areas. For example, the program for this week&amp;rsquo;s &lt;a href=&#34;http://strataconf.com/stratany2012/public/schedule/full/public&#34;&gt;Strata&lt;/a&gt; conference makes no mention of RDF or SPARQL. The more I look into it, the more I see that this flexible, standardized data model and query language align very well with what many of those people are trying to do.&lt;/p&gt;
&lt;blockquote id=&#34;id115166&#34; class=&#34;pullquote&#34;&gt;If there&#39;s just enough structure to get a toehold and build from there, your data is minimally structured.&lt;/blockquote&gt;
&lt;p&gt;But, we semantic web types can&amp;rsquo;t blame them for not noticing. If you build a better mouse trap, the world won&amp;rsquo;t necessarily beat a path to your door, because they have to find out about your mouse trap and what it does better. This requires marketing, which requires talking to those people in language that they understand, so I&amp;rsquo;ve been reading up on Big Data and NoSQL in order to better appreciate what they&amp;rsquo;re trying to do and how.&lt;/p&gt;
&lt;p&gt;A great place to start is the excellent (free!) booklet &lt;a href=&#34;http://shop.oreilly.com/product/0636920025559.do&#34;&gt;Planning for Big Data&lt;/a&gt; by &lt;a href=&#34;http://eddology.com/&#34;&gt;Edd Dumbill&lt;/a&gt;. (Others contributed a few chapters.) For a start, he describes data that &amp;ldquo;doesn&amp;rsquo;t fit the strictures of your database architectures&amp;rdquo; as a good candidate for Big Data approaches. That&amp;rsquo;s a good start for us. Here are a few longer quotes that I found interesting, starting with these two paragraphs from the section titled &amp;ldquo;Ingesting and Cleaning&amp;rdquo; after a discussion about collecting data from multiple different sources (something else that RDF and SPARQL are good at):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Once the data is collected, it must be ingested. In traditional business intelligence (BI) parlance, this is known as Extract, Transform, and Load (ETL): the act of putting the right information into the correct tables of a database schema and manipulating certain fields to make them easier to work with.&lt;/p&gt;
&lt;p&gt;One of the distinguishing characteristics of big data, however, is that the data is often unstructured. That means we don’t know the inherent schema of the information before we start to analyze it. We may still transform the information — replacing an IP address with the name of a city, for example, or anonymizing certain fields with a one-way hash function — but we may hold onto the original data and only define its structure as we analyze it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;With my long history as an XML guy (which is how I know Edd, the former editor of XML.com), I know that ideas about &amp;ldquo;structured&amp;rdquo; vs. &amp;ldquo;unstructured&amp;rdquo; data are very relative—one person&amp;rsquo;s structured data is another person&amp;rsquo;s unstructured data, especially if the first person is an XML guy and the second is an RDBMS person—and that the term &amp;ldquo;semi-structured&amp;rdquo; becomes the compromise adjective. I&amp;rsquo;ll coin a new term that seems to get no relevant Google hits: &amp;ldquo;minimally structured&amp;rdquo;—if there&amp;rsquo;s just enough structure to get a toehold and build from there, your data is minimally structured. And RDFS is excellent if we want to &amp;ldquo;define [data&amp;rsquo;s] structure as we analyze it&amp;rdquo;. This can be done very incrementally, and OWL can take you many increments further.&lt;/p&gt;
&lt;p&gt;Some of that minimal structure can be inferred and made explicit; for example, if you have data about people&amp;rsquo;s genders and about who is the parent of whom, you can infer father and mother relationships (and grandfather, and aunt, and&amp;hellip;) and even classes by defining a Grandfather class as the set of instances that have a gender of male and have children who have children. I might say that this is creating new information, but a relational database person would say that it&amp;rsquo;s not—it&amp;rsquo;s just making implicit information explicit. Relational database people put &lt;a href=&#34;http://en.wikipedia.org/wiki/Database_normalization&#34;&gt;a lot of effort&lt;/a&gt; into avoiding the explicit storage of information that can otherwise be inferred, but a relational database is a very closed world, so new possibilities of things to infer within a given set of data don&amp;rsquo;t come up often. Accumulation of RDF from multiple sources can be much more dynamic, making it easier to create new wholes that are greater than the sum of their parts (made greater by this kind of inferencing), which opens up new possibilities for patterns to find in different combinations of data.&lt;/p&gt;
&lt;p&gt;Another quote from Edd&amp;rsquo;s book:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Even where there’s not a radical data type mismatch, a disadvantage of the relational database is the static nature of its schemas. In an agile, exploratory environment, the results of computations will evolve with the detection and extraction of more signals. Semi-structured NoSQL databases meet this need for flexibility: they provide enough structure to organize data, but do not require the exact schema of the data before storing it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So do triplestores, which give you the best of both worlds: with no need for a schema, you can accumulate data and query it using a standardized query language, and then, if you want, incrementally add schema metadata (often based on query results) to aid further queries.&lt;/p&gt;
&lt;p&gt;Another quote on this topic:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;NoSQL databases are frequently called “schemaless,” because they don’t have the formal schema associated with relational databases. The lack of a formal schema, which typically has to be designed before any code is written, means that schemaless databases are a better fit for current software development practices, such as agile development. Starting from the simplest thing that could possibly work and iterating quickly in response to customer input doesn’t fit well with designing an all-encompassing data schema at the start of the project. It’s impossible to predict how data will be used, or what additional data you’ll need as the project unfolds.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Again, all very easy with RDF-based technology, where in addition to the choices of &amp;ldquo;assemble a big schema before you start developing&amp;rdquo; and &amp;ldquo;just blow off schemas, because they impair flexibility&amp;rdquo; you can work with a middle ground of little bits of schema metadata added when you need them as you go along.&lt;/p&gt;
&lt;p&gt;From what I&amp;rsquo;ve heard of the various classes of NoSQL databases, graph-oriented ones like &lt;a href=&#34;http://neo4j.org/&#34;&gt;Neo4J&lt;/a&gt; sound the closest to triplestores, which are also storing &lt;a href=&#34;http://www.w3.org/TR/rdf-mt/#graphdefs&#34;&gt;graphs&lt;/a&gt;. This description of another class of NoSQL databases really caught my attention, though:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Cassandra and HBase are usually called column-oriented databases, though a better term is a “sparse row store.” In these databases, the equivalent to a relational “table” is a set of rows, identified by a key. Each row consists of an unlimited number of columns; columns are essentially keys that let you look up values in the row. Columns can be added at any time, and columns that are unused in a given row don’t occupy any storage. NULLs don’t exist.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is the &amp;ldquo;equivalent to a relational &amp;lsquo;table&amp;rsquo;&amp;rdquo;? It sounds more like the equivalent to a set of triples grouped by subject. Properties (predicates) are essentially keys that let you look up values associated with subjects; you can add property name/value pairs to a subject at any time, because they don&amp;rsquo;t depend on some schema, and properties that aren&amp;rsquo;t used for a given resource don&amp;rsquo;t occupy any storage. (And NULLs don&amp;rsquo;t exist.)&lt;/p&gt;
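&lt;p&gt;(A toy sketch of the parallel, in plain Python rather than anything Cassandra- or HBase-specific: subjects as row keys, predicates as column names, and no storage at all for absent properties.)&lt;/p&gt;

```python
# Toy illustration, plain Python only: triples grouped by subject behave
# like "sparse rows". Subjects are row keys, predicates are column names,
# and a property that is absent for a subject takes up no storage at all.
triples = [
    ("m40392", "description", "breakfast"),
    ("m40392", "amount", 6.53),
    ("m40393", "description", "lunch"),
    # m40393 has no amount: nothing stored, and no NULL placeholder either
]

rows = {}
for subject, predicate, obj in triples:
    rows.setdefault(subject, {})[predicate] = obj

print(rows["m40392"]["amount"])    # 6.53
print("amount" in rows["m40393"])  # False
```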
&lt;p&gt;What I&amp;rsquo;d love to see, and have heard about tentative steps toward, would be SPARQL endpoints for some of these NoSQL database systems. The &lt;a href=&#34;http://d2rq.org/&#34;&gt;D2RQ&lt;/a&gt; and &lt;a href=&#34;http://www.w3.org/TR/r2rml/&#34;&gt;R2RML&lt;/a&gt; work has accomplished things that should be easier for graph-oriented NoSQL databases like Neo4J and, if I understand the quote above correctly, for column-oriented NoSQL databases as well. Google searches on SPARQL and either Hadoop, Neo4J, HBase, or Cassandra show that some people have been discussing and even doing a bit of coding on several of these. (In addition to the column- and graph-oriented NoSQL databases, another category is the &amp;ldquo;document-oriented&amp;rdquo; ones, so AllegroGraph&amp;rsquo;s &lt;a href=&#34;http://www.franz.com/agraph/support/documentation/4.7/mongo-interface.html&#34;&gt;interface to MongoDB&lt;/a&gt; is an excellent sign of progress in this direction.) What can we do to encourage more of this kind of interaction?&lt;/p&gt;
&lt;p&gt;I have a lot more research to do, so I just started reading Eric Redmond and Jim Wilson&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1934356921/bobducharmeA/&#34;&gt;Seven Databases in Seven Weeks&lt;/a&gt;. I will report back on further ideas I have. Meanwhile I&amp;rsquo;d appreciate hearing anyone else&amp;rsquo;s opinions on how Big Data and NoSQL technology and standards-based semantic technology can better take advantage of what each other has to offer.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>SPARQL 1.1&#39;s new VALUES keyword</title>
      <link>https://www.bobdc.com/blog/sparql-11s-new-values-keyword/</link>
      <pubDate>Sat, 29 Sep 2012 16:42:52 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparql-11s-new-values-keyword/</guid>
      
      
      <description><div>New ways to filter search results.</div><div>&lt;p&gt;&lt;a href=&#34;http://youtu.be/1ajUJNm2oFM&#34;&gt;&lt;img id=&#34;id82399&#34; src=&#34;https://www.bobdc.com/img/main/IggyNewValues.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;15px&#34; alt=&#34;[Iggy New Values cover]&#34; width=&#34;160&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;SPARQL 1.1&amp;rsquo;s new BIND keyword lets you assign a value to a variable, and the &lt;a href=&#34;http://www.w3.org/TR/2012/WD-sparql11-query-20120724/&#34;&gt;even newer&lt;/a&gt; VALUES keyword lets you create tables of values, giving you new options when filtering query results. As the July 24th draft of the SPARQL 1.1 query spec (where the keyword first appeared) tells us, VALUES &amp;ldquo;replaces and generalizes BINDINGS&amp;rdquo;, a keyword from earlier drafts of the SPARQL 1.1 spec. The &lt;a href=&#34;https://repository.apache.org/content/groups/snapshots/org/apache/jena/apache-jena/2.7.4-SNAPSHOT/&#34;&gt;ARQ 2.7.4 snapshot&lt;/a&gt; supports the VALUES keyword, so I played with it a bit.&lt;/p&gt;
&lt;p&gt;The following query ignores any input you pass to it (make sure to pass some anyway if you&amp;rsquo;re using command line ARQ, which complains if you don&amp;rsquo;t include a &lt;code&gt;--data&lt;/code&gt; parameter) and demonstrates how you can create a table of values. This example populates the table with qnames and literal values, but you can use any kinds of RDF values you want:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dm: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt;


SELECT * WHERE { 
     VALUES (?color ?direction) {
     ( dm:red  &amp;quot;north&amp;quot; )
     ( dm:blue  &amp;quot;west&amp;quot; )
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here&amp;rsquo;s the result:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-----------------------
| color   | direction |
=======================
| dm:red  | &amp;quot;north&amp;quot;   |
| dm:blue | &amp;quot;west&amp;quot;    |
-----------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This result isn&amp;rsquo;t particularly exciting, but it shows how simple it is to create a two-dimensional table in a SPARQL query. To see what VALUES can add to our queries, we&amp;rsquo;ll use the following dataset:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix e: &amp;lt;http://learningsparql.com/ns/expenses#&amp;gt; .
@prefix d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt; .


d:m40392 e:description &amp;quot;breakfast&amp;quot; ;
         e:date &amp;quot;2011-10-14&amp;quot; ;
         e:amount 6.53 . 


d:m40393 e:description &amp;quot;lunch&amp;quot; ;
         e:date &amp;quot;2011-10-14&amp;quot; ;
         e:amount 11.13 . 


d:m40394 e:description &amp;quot;dinner&amp;quot; ;
         e:date &amp;quot;2011-10-14&amp;quot; ;
         e:amount 28.30 . 


d:m40395 e:description &amp;quot;breakfast&amp;quot; ;
         e:date &amp;quot;2011-10-15&amp;quot; ;
         e:amount 4.32 . 


d:m40396 e:description &amp;quot;lunch&amp;quot; ;
         e:date &amp;quot;2011-10-15&amp;quot; ;
         e:amount 9.45 . 


d:m40396 e:description &amp;quot;lunch&amp;quot; ;
         e:date &amp;quot;2011-10-15&amp;quot; ;
         e:amount 6.20 . 


d:m40397 e:description &amp;quot;dinner&amp;quot; ;
         e:date &amp;quot;2011-10-15&amp;quot; ;
         e:amount 31.45 . 


d:m40398 e:description &amp;quot;breakfast&amp;quot; ;
         e:date &amp;quot;2011-10-16&amp;quot; ;
         e:amount 6.65 . 


d:m40399 e:description &amp;quot;lunch&amp;quot; ;
         e:date &amp;quot;2011-10-16&amp;quot; ;
         e:amount 10.00 . 


d:m40400 e:description &amp;quot;dinner&amp;quot; ;
         e:date &amp;quot;2011-10-16&amp;quot; ;
         e:amount 25.05 . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As a baseline, we&amp;rsquo;ll start with a simple query that asks for the values of all the dataset&amp;rsquo;s properties without using the VALUES keyword:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# filename: values1.rq


PREFIX e: &amp;lt;http://learningsparql.com/ns/expenses#&amp;gt; 


SELECT ?description ?date ?amount
WHERE
{
  ?meal e:description ?description ;
        e:date ?date ;
        e:amount ?amount . 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When run with the dataset above, this query lists all the description, date, and amount values:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---------------------------------------
| description | date         | amount |
=======================================
| &amp;quot;dinner&amp;quot;    | &amp;quot;2011-10-16&amp;quot; | 25.05  |
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-16&amp;quot; | 10.00  |
| &amp;quot;breakfast&amp;quot; | &amp;quot;2011-10-16&amp;quot; | 6.65   |
| &amp;quot;dinner&amp;quot;    | &amp;quot;2011-10-15&amp;quot; | 31.45  |
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-15&amp;quot; | 6.20   |
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-15&amp;quot; | 9.45   |
| &amp;quot;breakfast&amp;quot; | &amp;quot;2011-10-15&amp;quot; | 4.32   |
| &amp;quot;dinner&amp;quot;    | &amp;quot;2011-10-14&amp;quot; | 28.30  |
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-14&amp;quot; | 11.13  |
| &amp;quot;breakfast&amp;quot; | &amp;quot;2011-10-14&amp;quot; | 6.53   |
---------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This next version of the query adds a VALUES clause saying that we&amp;rsquo;re only interested in results that have &amp;ldquo;lunch&amp;rdquo; or &amp;ldquo;dinner&amp;rdquo; in the &lt;code&gt;?description&lt;/code&gt; value:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# filename: values2.rq


PREFIX e: &amp;lt;http://learningsparql.com/ns/expenses#&amp;gt; 


SELECT ?description ?date ?amount
WHERE
{
  ?meal e:description ?description ;
        e:date ?date ;
        e:amount ?amount . 
  VALUES ?description { &amp;quot;lunch&amp;quot; &amp;quot;dinner&amp;quot; }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Note that, in this case, the VALUES data structure being created is one-dimensional, not two; this is still a step up from the BIND keyword&amp;rsquo;s ability to assign only a single value to a variable at a time.) With the same meal expense data, this new query&amp;rsquo;s output is the same as that of the preceding query minus the &amp;ldquo;breakfast&amp;rdquo; result rows, although in a different order:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---------------------------------------
| description | date         | amount |
=======================================
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-16&amp;quot; | 10.00  |
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-15&amp;quot; | 6.20   |
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-15&amp;quot; | 9.45   |
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-14&amp;quot; | 11.13  |
| &amp;quot;dinner&amp;quot;    | &amp;quot;2011-10-16&amp;quot; | 25.05  |
| &amp;quot;dinner&amp;quot;    | &amp;quot;2011-10-15&amp;quot; | 31.45  |
| &amp;quot;dinner&amp;quot;    | &amp;quot;2011-10-14&amp;quot; | 28.30  |
---------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This query&amp;rsquo;s VALUES clause could go after the SELECT clause&amp;rsquo;s closing curly brace, instead of before it, and it wouldn&amp;rsquo;t affect the results. (This won&amp;rsquo;t always be the case with the VALUES clause in GROUP BY and federated queries.)&lt;/p&gt;
&lt;p&gt;This next query of the same data creates a two-dimensional table to use for filtering output results:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# filename: values3.rq


PREFIX e: &amp;lt;http://learningsparql.com/ns/expenses#&amp;gt; 


SELECT ?description ?date ?amount
WHERE
{
  ?meal e:description ?description ;
        e:date ?date ;
        e:amount ?amount . 


  VALUES (?date ?description) {
         (&amp;quot;2011-10-15&amp;quot; &amp;quot;lunch&amp;quot;) 
         (&amp;quot;2011-10-16&amp;quot; &amp;quot;dinner&amp;quot;)
  } 


}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After retrieving all the meal data, this query only passes along the results that have either a &lt;code&gt;?date&lt;/code&gt; value of &amp;ldquo;2011-10-15&amp;rdquo; and a &lt;code&gt;?description&lt;/code&gt; value of &amp;ldquo;lunch&amp;rdquo; or a &lt;code&gt;?date&lt;/code&gt; value of &amp;ldquo;2011-10-16&amp;rdquo; and a &lt;code&gt;?description&lt;/code&gt; value of &amp;ldquo;dinner&amp;rdquo;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---------------------------------------
| description | date         | amount |
=======================================
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-15&amp;quot; | 6.20   |
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-15&amp;quot; | 9.45   |
| &amp;quot;dinner&amp;quot;    | &amp;quot;2011-10-16&amp;quot; | 25.05  |
---------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(It looks like someone had two lunches on October 15th.)&lt;/p&gt;
&lt;p&gt;When you use VALUES to create a data table, you don&amp;rsquo;t have to assign a value to every position. The UNDEF keyword acts as a wildcard, accepting any value that may come up there. The following variation on the preceding query asks for any result rows with &amp;ldquo;lunch&amp;rdquo; as the &lt;code&gt;?description&lt;/code&gt; value, regardless of the &lt;code&gt;?date&lt;/code&gt; value, and also for any result rows with a &lt;code&gt;?date&lt;/code&gt; value of &amp;ldquo;2011-10-16&amp;rdquo;, regardless of the &lt;code&gt;?description&lt;/code&gt; value:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# filename: values4.rq


PREFIX e: &amp;lt;http://learningsparql.com/ns/expenses#&amp;gt; 


SELECT ?description ?date ?amount
WHERE
{
  ?meal e:description ?description ;
        e:date ?date ;
        e:amount ?amount . 


  VALUES (?date ?description) {
         (UNDEF &amp;quot;lunch&amp;quot;) 
         (&amp;quot;2011-10-16&amp;quot; UNDEF) 
  }


}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output of this query has more rows than the previous query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---------------------------------------
| description | date         | amount |
=======================================
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-16&amp;quot; | 10.00  |
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-15&amp;quot; | 6.20   |
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-15&amp;quot; | 9.45   |
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-14&amp;quot; | 11.13  |
| &amp;quot;dinner&amp;quot;    | &amp;quot;2011-10-16&amp;quot; | 25.05  |
| &amp;quot;lunch&amp;quot;     | &amp;quot;2011-10-16&amp;quot; | 10.00  |
| &amp;quot;breakfast&amp;quot; | &amp;quot;2011-10-16&amp;quot; | 6.65   |
---------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When you saw the descriptions of what each of these queries did, it may have occurred to you that all of these query conditions could have been specified without the VALUES keyword (for example, with a FILTER IN clause in the values2.rq query, although that would only work to replace a one-dimensional VALUES setting). That&amp;rsquo;s true, but I was using a small amount of data to demonstrate different ways to use the new keyword. When you work with larger amounts of data and especially with more complex filtering conditions, VALUES offers an extra layer of result filtering that can give you more control over your final search results with very little extra code in your query.&lt;/p&gt;
&lt;p&gt;(Thanks to Andy Seaborne for reviewing this before publication.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>IBM&#39;s DB2 as a triplestore</title>
      <link>https://www.bobdc.com/blog/ibms-db2-as-a-triplestore/</link>
      <pubDate>Wed, 29 Aug 2012 19:45:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/ibms-db2-as-a-triplestore/</guid>
      
      
      <description><div>Surprisingly easy to set up and use, but requiring lots of Java coding for any real application development.</div><div>&lt;p&gt;I thought it was pretty big news for the semantic web world when IBM announced that release 10.1 of their venerable &lt;a href=&#34;http://www-01.ibm.com/software/data/db2/&#34;&gt;DB2&lt;/a&gt; database manager could function as an RDF triplestore, but it seems that few others—not even, apparently, IBM staff responsible for marketing semantic technology—agreed with me. More on this below.&lt;/p&gt;
&lt;img id=&#34;id104266&#34; src=&#34;https://www.bobdc.com/img/main/db2rdf.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;RDF on DB2&#34; width=&#34;180&#34;/&gt;
&lt;p&gt;IBM &lt;a href=&#34;http://en.wikipedia.org/wiki/Edgar_F._Codd&#34;&gt;invented relational databases&lt;/a&gt;, and DB2 has been their main relational database product for almost twenty years. It runs on mainframes, PCs, Linux, the iSeries (descendants of the AS/400) and other platforms. (Although DB2 has also worked as an XML repository since 2006, with support for XQuery and XPath, I have not been aware of any shops using it for that instead of, say, &lt;a href=&#34;http://www.marklogic.com/&#34;&gt;MarkLogic&lt;/a&gt; or &lt;a href=&#34;http://www.existsolutions.com/&#34;&gt;eXist&lt;/a&gt;. I assume it&amp;rsquo;s used for more transaction-oriented XML as opposed to content for publishing.) In addition to functioning as a triplestore, DB2 10.1 supports SPARQL 1.0 and &lt;a href=&#34;http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.swg.im.dbclient.rdf.doc/doc/c0060566.html&#34;&gt;a few of the more SQL-friendly features&lt;/a&gt; of SPARQL 1.1.&lt;/p&gt;
&lt;p&gt;I found the &lt;a href=&#34;http://www-01.ibm.com/software/data/db2/express/download.html&#34;&gt;free version of DB2&lt;/a&gt; for Windows to be fairly easy to download and install. I didn&amp;rsquo;t have to do anything special to get my downloaded copy to support RDF; after I finished the default installation, my hard disk had a &lt;code&gt;\Program Files\IBM\SQLLIB\rdf&lt;/code&gt; directory with a &lt;code&gt;lib&lt;/code&gt; subdirectory full of jar files and a set of batch files that call the jar files in a &lt;code&gt;bin&lt;/code&gt; subdirectory.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.swg.im.dbclient.rdf.doc/doc/c0059661.html&#34;&gt;RDF application development for IBM data servers&lt;/a&gt; appears to be the main documentation page for DB2&amp;rsquo;s RDF support, but I used the developerWorks tutorial &lt;a href=&#34;http://www.ibm.com/developerworks/data/tutorials/dm-1205rdfdb210/section3.html&#34;&gt;Resource description framework application development in DB2 10 for Linux, UNIX, and Windows&lt;/a&gt; as my guide to getting started—in particular, to find out about the Jena and ARQ jar files to add to the &lt;code&gt;rdf/lib&lt;/code&gt; directory to make everything work properly.&lt;/p&gt;
&lt;p&gt;The tutorial has you using &amp;ldquo;IBM Data Studio&amp;rdquo;, their Eclipse-based DB2 administration interface, after you finish your initial setup, and I couldn&amp;rsquo;t get certain menu choices described by the article to show up in the copy of Data Studio that I downloaded, but with some generous email help from the article&amp;rsquo;s lead author, Mario Briggs, I ultimately managed to do everything I wanted to without Data Studio.&lt;/p&gt;
&lt;p&gt;(The developerWorks article is actually just Part 1, and I look forward to Part 2. Remember, though, that the article is more oriented toward explaining RDF to DB2 users than vice versa, and it also assumes that your main use of DB2&amp;rsquo;s RDF storage will be from Java code that you write yourself. I limited myself to the batch files in the &lt;code&gt;bin&lt;/code&gt; directory and two that Mario sent, and did manage to load and query some data.)&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;Prerequisites for creating RDF stores&amp;rdquo; section of the tutorial article lists some very technical setup details to perform, but step 2 after that describes a script that takes care of these steps for you—for example, by creating the DB2 database RDFSAMPL that each of my command line examples below refer to. (Note that the script is called dbsetup.sql, not setup.sql, as the article currently says. Also, in Windows 7, you can&amp;rsquo;t do this in just any command line window, but must do it from one opened by right-clicking a command line window icon and picking &amp;ldquo;Run as administrator&amp;rdquo;.) That was not the first time that I did something specified by the article, saw that it didn&amp;rsquo;t work, and then read in the paragraphs after that about changes to make to the displayed command to make it work with my configuration. So, if you get stuck in the tutorial, read ahead a little before you get too frustrated.&lt;/p&gt;
&lt;p&gt;If you run a batch file from &lt;code&gt;\Program Files\IBM\SQLLIB\rdf\bin&lt;/code&gt; with no parameters, it displays help about the available parameters, so that will tell you more details about the steps that I executed below.&lt;/p&gt;
&lt;p&gt;Once I had the RDFSAMPL database defined using the dbsetup.sql script, running the following command from the &lt;code&gt;bin&lt;/code&gt; directory mentioned above created an RDF store in RDFSAMPL named myrdfstore (I had set the password values when I first installed DB2):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;createrdfstore myrdfstore -db RDFSAMPL -user db2admin -password mydb2password
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;bin&lt;/code&gt; directory includes a createrdfstoreandloader.bat batch file to create and load data at once, but I usually used the loadrdfstore.bat batch file (available &lt;a href=&#34;http://snee.com/bobdc.blog/files/loadrdfstore.bat.txt&#34;&gt;here&lt;/a&gt; with &amp;ldquo;.txt&amp;rdquo; added to the filename for easier downloading) that Mario sent me. For example, this next command loaded some data into that RDF store and gave a report about how many triples were loaded:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;loadrdfstore myrdfstore -db RDFSAMPL -user db2admin -password mydb2password \temp\ex029.rdf
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Right now, DB2 can load RDF/XML and ntriples files, but not Turtle. As far as I can tell, without custom Java coding there is currently no way to add triples to an RDF store that already has triples in it or to add triples to named graphs. See the &lt;a href=&#34;http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/index.jsp?topic=%2Fcom.ibm.swg.im.dbclient.rdf.doc%2Fdoc%2Ft0059630.html&#34;&gt;documentation&lt;/a&gt; for more on the relevant Java libraries and calls.&lt;/p&gt;
&lt;p&gt;Another short yet crucial batch file that Mario sent me was queryrdfstore (available &lt;a href=&#34;http://snee.com/bobdc.blog/files/queryrdfstore.bat.txt&#34;&gt;here&lt;/a&gt;). This next command uses it to run the query shown and displays the results along with a count of the milliseconds it took:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;queryrdfstore myrdfstore -db RDFSAMPL -user db2admin -password mydb2password &amp;quot;SELECT DISTINCT ?p WHERE { ?s ?p ?o }&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Keep in mind that the files that Mario sent me may not work with future versions of DB2&amp;rsquo;s RDF support; that&amp;rsquo;s why they were left out of the basic distribution. I&amp;rsquo;m sure they&amp;rsquo;ll have some sort of equivalent.) Instead of a quoted query, you can supply the name of a file with the SPARQL query stored in it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;queryrdfstore myrdfstore -db RDFSAMPL -user db2admin -password mydb2password myquery1.rq
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For now it looks like IBM isn&amp;rsquo;t that interested in selling DB2 and its RDF triplestore features to the semantic web crowd. For example, shortly before the big &lt;a href=&#34;http://semtechbizsf2012.semanticweb.com/&#34;&gt;Semantic Technologies Conference&lt;/a&gt; last June in San Francisco, semanticweb.com&amp;rsquo;s Eric Franzon interviewed IBM Director of Strategy and Marketing for Database Software and Systems Bernie Spang in an article titled &lt;a href=&#34;http://semanticweb.com/rdf-support-in-ibms-db2_b28098&#34;&gt;RDF Support in IBM’s DB2&lt;/a&gt;. Spang talked more in big picture terms, which is his job, and the article concludes by pointing out that IBM is a Gold Sponsor of the San Francisco conference. However, when I went to the IBM booth at the conference to ask about the RDF triplestore support in DB2, the two guys in the booth were genuinely surprised to hear that this had been added to DB2. (They were there to sell IBM&amp;rsquo;s &lt;a href=&#34;http://www-01.ibm.com/software/ecm/&#34;&gt;Enterprise Content Management&lt;/a&gt; product.) They did give me some excellent wind-up IBM robots, though.&lt;/p&gt;
&lt;p&gt;When I see a title of &amp;ldquo;&lt;a href=&#34;https://www.ibm.com/developerworks/mydeveloperworks/blogs/nlp/entry/db2_rdf_nosql_graph_support13?lang=en&#34;&gt;DB2-RDF (NoSQL Graph) Support in DB2 LUW 10.1&lt;/a&gt;&amp;rdquo; on another page on the developerWorks site, I can better see the logic of IBM&amp;rsquo;s approach: they&amp;rsquo;re saying &amp;ldquo;Hey, we can do NoSQL&amp;rdquo;, a message that can appeal to a bigger audience than a marketing effort focused on us semantic web geeks, especially when you consider the huge base of existing DB2 users who are wondering about the new database technologies getting the most buzz lately.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m still very happy that IBM chose to go with a W3C standards-based approach to supporting NoSQL graph databases. I especially appreciate this direction because a lot of the NoSQL crowd seems unaware of what RDF and SPARQL technology can offer them. (Why, and what can we do about it? That&amp;rsquo;s another blog entry, but feel free to add comments here with your own theories.) I just think it&amp;rsquo;s great that I can store and query RDF on my laptop using one of the most respected database management packages without spending a dime, and that if I really want to scale up, I can do it with the same software on an IBM mainframe.&lt;/p&gt;
&lt;img id=&#34;id106749&#34; src=&#34;https://www.bobdc.com/img/main/ibmrobots.png&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;windup IBM schwag robots&#34; width=&#34;320&#34;/&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>Properties</title>
      <link>https://www.bobdc.com/blog/properties/</link>
      <pubDate>Tue, 31 Jul 2012 09:05:39 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/properties/</guid>
      
      
      <description><div>Children&#39;s edition.</div><div>&lt;p&gt;Going through some old files, I found a homework assignment that my younger daughter did seven or eight years ago. When doing RDF-related data modeling you put a lot of thought into properties, and I remember getting a kick out of this introduction to the concept when she brought it home.&lt;/p&gt;
&lt;img id=&#34;id129378&#34; src=&#34;https://www.bobdc.com/img/main/properties.jpg&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34; alt=&#34;Properties homework&#34;/&gt;
&lt;p&gt;The smiley face shows that on a later page she did well on the worksheet that evaluated how well she understood this. By now, I think she would understand &lt;a href=&#34;http://www.w3.org/TR/owl-ref/#Property&#34;&gt;datatype vs. object properties&lt;/a&gt; if she was interested, but so far she&amp;rsquo;s not.&lt;/p&gt;
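&lt;p&gt;(For the curious: the short version of that distinction is that a datatype property has a literal as its value, while an object property points at another resource. A two-triple Turtle sketch, with an &lt;code&gt;ex:&lt;/code&gt; namespace made up for illustration:)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix owl: &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt; .
@prefix ex:  &amp;lt;http://example.com/ns#&amp;gt; .

ex:weight a owl:DatatypeProperty . # e.g. ex:rock ex:weight &amp;quot;2.2&amp;quot; .
ex:partOf a owl:ObjectProperty .   # e.g. ex:leg ex:partOf ex:chair .
&lt;/code&gt;&lt;/pre&gt;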
&lt;p&gt;The second page, not shown here, is titled &amp;ldquo;Properties Can Change,&amp;rdquo; a topic that continues to vex data modelers. In my own future data modeling, in addition to asking myself things like &amp;ldquo;is there a popular subproperty of rdfs:label that I should be using here?&amp;rdquo; I will also make a point of asking myself &amp;ldquo;Is it hard as a rock or as soft as a dream?&amp;rdquo;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Reclaiming my picture metadata from flickr</title>
      <link>https://www.bobdc.com/blog/reclaiming-my-picture-metadata/</link>
      <pubDate>Tue, 26 Jun 2012 19:54:44 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/reclaiming-my-picture-metadata/</guid>
      
      
      <description><div>Surprise: by converting multiple sources of data to triples and then running a SPARQL query.</div><div>&lt;blockquote id=&#34;id128256&#34; class=&#34;pullquote&#34;&gt;...a pretty nice example of how triples and SPARQL can make quick and dirty data integration easy even when the data in question isn&#39;t necessarily stored as triples.&lt;/blockquote&gt;
&lt;p&gt;We should give &lt;a href=&#34;http://www.flickr.com/photos/bobdc&#34;&gt;flickr&lt;/a&gt; some credit for providing an API that lets us download the metadata we&amp;rsquo;ve entered about our pictures (for example, titles, descriptions, and membership in custom sets such as &lt;a href=&#34;http://www.flickr.com/photos/bobdc/sets/72157627751552582/&#34;&gt;XML Summer School 2011&lt;/a&gt; or &lt;a href=&#34;http://www.flickr.com/photos/bobdc/sets/72157594523962443/&#34;&gt;Artsier Stuff&lt;/a&gt;) but that metadata all refers to pictures on flickr&amp;rsquo;s servers. What if I want to use &lt;a href=&#34;http://www.blurb.com/&#34;&gt;blurb.com&lt;/a&gt; to print a hardcopy album of one of these sets? Do I have to download that set&amp;rsquo;s pictures from flickr, even though I already have them on a hard disk, because I don&amp;rsquo;t know which ones on my hard disk correspond to the ones in that set on the flickr server?&lt;/p&gt;
&lt;p&gt;As it turns out, no. The general question is this: how do I connect metadata that I&amp;rsquo;ve entered on flickr.com with the files on my local hard disk? Assuming that I never took two different pictures in the same millisecond, I can use the date-time stamp stored inside of each JPEG image file as a unique ID (or, in more OWLish terms, as an inverse functional property, although I didn&amp;rsquo;t actually use owl:InverseFunctionalProperty anywhere and just let SPARQL do the work), so here&amp;rsquo;s what I did:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;I used the flickr API to download the metadata about all the pictures that I have stored there, including set membership. This data was all in XML, so I then used some XSLT to convert that to Turtle RDF.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I used Apache &lt;a href=&#34;http://tika.apache.org/&#34;&gt;Tika&lt;/a&gt; (an open source toolkit I&amp;rsquo;ve written about here &lt;a href=&#34;https://www.bobdc.com/blog/pull-rdf-metadata-out-of-jpegs&#34;&gt;before&lt;/a&gt;) to pull out metadata about all the pictures on my hard disk as JSON. (I could have asked Tika for &lt;a href=&#34;http://www.snee.com/bobdc.blog/2008/10/new-xmp-spec.html&#34;&gt;XMP&lt;/a&gt;, which would give me RDF, but asking for JSON gets you more data.) I then used some JavaScript to convert this JSON to Turtle RDF. For the file \My Pictures\2012-01-12\IMG_5907.jpg, I created an IMG_5907.jpg.ttl file where the subject of all the triples is the URI &lt;a href=&#34;http://www.snee.com/bob/pics/id/2012-01-12/IMG_5907.jpg&#34;&gt;http://www.snee.com/bob/pics/id/2012-01-12/IMG_5907.jpg&lt;/a&gt;.&lt;/p&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I loaded all this RDF into a triplestore and then ran the query shown below, which (in this case) showed me the URIs for the image files on my hard disk that corresponded to each picture stored in my flickr &amp;ldquo;Artsier Stuff&amp;rdquo; set:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;!-- --&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dc:   &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX bf:   &amp;lt;http://snee.com/ns/flickr#&amp;gt;
PREFIX exif: &amp;lt;http://www.w3.org/2003/12/exif/ns#&amp;gt; 
SELECT * WHERE {
  ?ps a bf:Photoset ;
      dc:title &amp;quot;Artsier Stuff&amp;quot; ;
      rdfs:member ?memberPic .
      ?memberPic dc:title ?picTitle ;
      bf:dateTaken ?flickrDate. 
   OPTIONAL { ?diskPic exif:dateTimeOriginal ?flickrDate . }
}
ORDER BY ?flickrDate
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The query finds the bf:dateTaken value of each picture from that set, then looks for a local disk file with that same date-time stamp. I put that last bit in an OPTIONAL pattern because I wasn&amp;rsquo;t sure whether it would successfully find local versions of all the files, and wanted to see which ones it had trouble with. As it turned out, it didn&amp;rsquo;t have trouble with any of them, which was great to see.&lt;/p&gt;
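&lt;p&gt;If some flickr pictures had lacked a matching disk file, a small variation on the query would have listed just those problem cases: keep the OPTIONAL pattern, but filter for solutions where it found nothing to bind to &lt;code&gt;?diskPic&lt;/code&gt;. (A hypothetical variation on the query above, using the same prefixes and data:)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dc:   &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX bf:   &amp;lt;http://snee.com/ns/flickr#&amp;gt;
PREFIX exif: &amp;lt;http://www.w3.org/2003/12/exif/ns#&amp;gt; 
SELECT ?picTitle ?flickrDate WHERE {
  ?ps a bf:Photoset ;
      dc:title &amp;quot;Artsier Stuff&amp;quot; ;
      rdfs:member ?memberPic .
  ?memberPic dc:title ?picTitle ;
      bf:dateTaken ?flickrDate . 
  OPTIONAL { ?diskPic exif:dateTimeOriginal ?flickrDate . }
  FILTER (!bound(?diskPic))
}
ORDER BY ?flickrDate
&lt;/code&gt;&lt;/pre&gt;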
&lt;p&gt;Finding those URIs was handy for gathering up local copies of pictures from a given set. Other queries could retrieve the title, description, and other data associated with any set of flickr pictures and show the disk files that they went with.&lt;/p&gt;
&lt;p&gt;The whole thing was a nice example of how triples and SPARQL can make quick and dirty data integration easy even when the data in question isn&amp;rsquo;t necessarily stored as triples. As an added bonus, the metadata remains meaningful even if I stop paying my subscription fee to flickr and lose access to metadata for all but 200 pictures, which is what happens when you scale back to a free flickr account.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Trying out SPARQL 1.1&#39;s COPY and MOVE operations</title>
      <link>https://www.bobdc.com/blog/trying-out-sparql-11s-copy-and/</link>
      <pubDate>Sun, 03 Jun 2012 11:13:04 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/trying-out-sparql-11s-copy-and/</guid>
      
      
      <description><div>Copying and moving triples between graphs, named or otherwise.</div><div>&lt;p&gt;SPARQL 1.1 Update&amp;rsquo;s COPY and MOVE operations let you copy and move triples between named graphs or between the default graph and a named graph. These operations first appeared in the May 2011 &lt;a href=&#34;http://www.w3.org/TR/2011/WD-sparql11-update-20110512/&#34;&gt;SPARQL 1.1 Update&lt;/a&gt; draft, but with the recent &lt;a href=&#34;https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-fuseki/0.2.2-SNAPSHOT/&#34;&gt;0.2.2 snapshot release of Fuseki&lt;/a&gt; I find I can try their full range of capabilities a little more than I could with the 0.2.1 incubating release of Fuseki.&lt;/p&gt;
&lt;p&gt;The spec&amp;rsquo;s descriptions of &lt;a href=&#34;http://www.w3.org/TR/sparql11-update/#copy&#34;&gt;COPY&lt;/a&gt; and &lt;a href=&#34;http://www.w3.org/TR/sparql11-update/#move&#34;&gt;MOVE&lt;/a&gt; show that neither truly adds anything to the SPARQL Update language; each is a shortcut for a wordier update request that combines DROP and INSERT operations. I still think these shortcuts will be handy.&lt;/p&gt;
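&lt;p&gt;For example, by my reading of the spec, an operation like &lt;code&gt;COPY DEFAULT TO &amp;lt;g&amp;gt;&lt;/code&gt; amounts to a DROP and INSERT pair along these lines (a sketch of the equivalence, not a request you would normally type yourself):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# roughly what COPY DEFAULT TO &amp;lt;g&amp;gt; expands to:
DROP SILENT GRAPH &amp;lt;g&amp;gt; ;
INSERT { GRAPH &amp;lt;g&amp;gt; { ?s ?p ?o } }
WHERE  { ?s ?p ?o }
&lt;/code&gt;&lt;/pre&gt;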
&lt;p&gt;To try them, I first ran the update request at &lt;a href=&#34;http://www.learningsparql.com/examples/ex338.ru&#34;&gt;http://www.learningsparql.com/examples/ex338.ru&lt;/a&gt; in Fuseki to create some data to serve as a baseline. This update request inserts two triples in the default graph, two in the named graph &lt;code&gt;d:g1&lt;/code&gt;, and two in the named graph &lt;code&gt;d:g2&lt;/code&gt;. Running the &lt;a href=&#34;http://www.learningsparql.com/examples/ex332.rq&#34;&gt;query that lists all triples in all graphs&lt;/a&gt; got me this (with prefixes substituted for the original base URIs to more easily fit the output on this page):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---------------------------------
| g    | s   | p      | o       |
=================================
|      | d:x | dm:tag | &amp;quot;one&amp;quot;   |
|      | d:x | dm:tag | &amp;quot;two&amp;quot;   |
| d:g1 | d:x | dm:tag | &amp;quot;three&amp;quot; |
| d:g1 | d:x | dm:tag | &amp;quot;four&amp;quot;  |
| d:g2 | d:x | dm:tag | &amp;quot;five&amp;quot;  |
| d:g2 | d:x | dm:tag | &amp;quot;six&amp;quot;   |
---------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The COPY operation copies triples from one graph into another, replacing any existing triples in the destination graph. (To quote the spec, &amp;ldquo;If the destination graph does not exist, it will be created.&amp;rdquo;) The following update request copies the triples from the default graph to graph &lt;code&gt;d:g2&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
COPY DEFAULT TO d:g2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Pretty simple. After running it, the query that lists all graphs and triples shows that the &amp;ldquo;five&amp;rdquo; and &amp;ldquo;six&amp;rdquo; triples are gone from the &lt;code&gt;d:g2&lt;/code&gt; graph, which now holds copies of the &amp;ldquo;one&amp;rdquo; and &amp;ldquo;two&amp;rdquo; triples that also remain in the default graph:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---------------------------------
| g    | s   | p      | o       |
=================================
|      | d:x | dm:tag | &amp;quot;one&amp;quot;   |
|      | d:x | dm:tag | &amp;quot;two&amp;quot;   |
| d:g1 | d:x | dm:tag | &amp;quot;three&amp;quot; |
| d:g1 | d:x | dm:tag | &amp;quot;four&amp;quot;  |
| d:g2 | d:x | dm:tag | &amp;quot;one&amp;quot;   |
| d:g2 | d:x | dm:tag | &amp;quot;two&amp;quot;   |
---------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The MOVE operation moves triples from one graph to another, also replacing existing triples in the destination graph. Again, if the destination graph doesn&amp;rsquo;t exist, it will be created. The following update request moves the triples in graph &lt;code&gt;d:g2&lt;/code&gt; to graph &lt;code&gt;d:g1&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d: &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
MOVE d:g2 TO d:g1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When run against the result of the COPY update request above, the result shows that there&amp;rsquo;s nothing left in &lt;code&gt;d:g2&lt;/code&gt; and that &lt;code&gt;d:g1&lt;/code&gt; has the triples that used to be in &lt;code&gt;d:g2&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-------------------------------
| g    | s   | p      | o     |
===============================
|      | d:x | dm:tag | &amp;quot;one&amp;quot; |
|      | d:x | dm:tag | &amp;quot;two&amp;quot; |
| d:g1 | d:x | dm:tag | &amp;quot;one&amp;quot; |
| d:g1 | d:x | dm:tag | &amp;quot;two&amp;quot; |
-------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can see more variations on these two operations in the &lt;a href=&#34;http://www.w3.org/2009/sparql/docs/tests/summary.html&#34;&gt;SPARQL 1.1 Test Suite&lt;/a&gt;, but the tests are basically different combinations of moving triples between default and named graphs, pre-existing or otherwise.&lt;/p&gt;
&lt;p&gt;Like I said, neither of these operations adds anything to SPARQL Update that couldn&amp;rsquo;t be done without them, but I would venture a guess that by making it easier to manipulate the relationships between triples and named graphs, the SPARQL Working Group is encouraging developers to use named graphs more as part of their application architectures. I look forward to asking some of them about this at the &lt;a href=&#34;http://semtechbizsf2012.semanticweb.com/&#34;&gt;Semantic Technology and Business conference&lt;/a&gt; in San Francisco this week. And now it&amp;rsquo;s off to the airport&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Reuse? Ha!</title>
      <link>https://www.bobdc.com/blog/reuse-ha/</link>
      <pubDate>Mon, 28 May 2012 13:10:35 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/reuse-ha/</guid>
      
      
      <description><div>Reuse is Good, especially when you reuse my work.</div><div>&lt;img id=&#34;id116264&#34; src=&#34;https://www.bobdc.com/img/main/re-use-ha.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;reusable container&#34; width=&#34;300&#34;/&gt;
&lt;p&gt;I laughed when I found the container shown here in our house, because it demonstrates an all-too-common attitude about software reuse, right down to the sanctimonious tone: everyone agrees that reuse and recycling are good, &lt;em&gt;so you should reuse this thing that we custom-designed for our particular needs.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Now, maybe the container is built of recycled plastic, in which case it has some practice behind its preaching, but the idea of demonstrating your commitment to the lofty principle that Reuse is Good (friendly to Planet Earth! With a capital &amp;ldquo;P&amp;rdquo; and &amp;ldquo;E&amp;rdquo;!) by insisting that others reuse your bespoke work is common in software development. There&amp;rsquo;s a good reason for this: finding code that suits your needs is often more work than just writing the code that does the job you need done. The developers who insist that others should reuse their fabulous code often didn&amp;rsquo;t think beyond their own specific needs when designing it, skipping the steps of generalizing the tasks performed to a wider range of related needs and of course documenting their work so that people understand &lt;em&gt;how&lt;/em&gt; to fit their work to other needs.&lt;/p&gt;
&lt;p&gt;This has been an issue since people began preaching about source code reuse over thirty years ago, and it drove much of the popularity of object-oriented analysis and design. It&amp;rsquo;s interesting that this is less of a problem with semantic web technology, for two reasons that I see:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Reuse of pieces of other work is much easier. If I just want to use a little bit of your ontology or schema or vocabulary, I can, and with RDF at the bottom layer of all of this, aggregation of pieces from multiple otherwise uncoordinated sources is much easier than it is with other technologies that advocate reuse, particularly programming and markup languages.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We can do retroactive reuse. Let&amp;rsquo;s say I declare and use my own &lt;code&gt;bd:photographer&lt;/code&gt; property for image metadata. Later, I notice that Dublin Core has a creator property, and only then decide that mine should have been a subproperty of that. I can just add the triple &lt;code&gt;bd:photographer rdfs:subPropertyOf dc:creator&lt;/code&gt; to my data and still reap the benefits of reuse: applications that don&amp;rsquo;t know about my &lt;code&gt;bd:photographer&lt;/code&gt; property but do know about the more famous Dublin Core one will have a clue about the semantics of my property (that is, that a photographer is a kind of creator of an image) and can treat my property as a stand-in for the &lt;code&gt;dc:creator&lt;/code&gt; property.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
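&lt;p&gt;That second kind of reuse really is just one extra triple. A minimal Turtle sketch (the &lt;code&gt;bd:&lt;/code&gt; namespace and &lt;code&gt;bd:pic123&lt;/code&gt; here are made up for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix dc:   &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt; .
@prefix bd:   &amp;lt;http://www.snee.com/ns/demo#&amp;gt; .

# my existing data, using my own property:
bd:pic123 bd:photographer &amp;quot;Bob DuCharme&amp;quot; .

# the one retroactive triple that connects it to Dublin Core:
bd:photographer rdfs:subPropertyOf dc:creator .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An application that understands &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; can now treat &lt;code&gt;bd:pic123&lt;/code&gt; as having a &lt;code&gt;dc:creator&lt;/code&gt; of &amp;ldquo;Bob DuCharme&amp;rdquo; without ever having heard of my property.&lt;/p&gt;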
&lt;p&gt;Essentially, the ease with which we can loosely join small pieces of ontologies and RDF schema makes it much easier to use semantic technology to form a semantic web.&lt;/p&gt;
&lt;p&gt;(A side note on object-oriented work: today everyone thinks that &lt;a href=&#34;http://en.wikipedia.org/wiki/Paradigm_shift&#34;&gt;paradigm shift&lt;/a&gt; just means &amp;ldquo;big change.&amp;rdquo; When science historian Thomas Kuhn coined the term, he was describing a gradual fading away of research in a given problem area as people worked on new areas and the other one got left behind. Has object-oriented analysis and design faded away as an active area for computer science researchers since its heyday in the eighties? I think so. Try going to the Association for Computing Machinery&amp;rsquo;s &lt;a href=&#34;http://www.oopsla.org&#34;&gt;www.oopsla.org&lt;/a&gt; website and you&amp;rsquo;ll be redirected to the &amp;ldquo;Systems, Programming, Languages and Applications: Software for Humanity&amp;rdquo; conference that has folded it in. That conference has an &lt;a href=&#34;http://splashcon.org/2011/program/oopsla-research-papers&#34;&gt;OOPSLA track&lt;/a&gt;, but in the titles of the 61 papers presented in that track in 2011, the word &amp;ldquo;object&amp;rdquo; only comes up three times. Of course, object-oriented principles drive the code development of Java and several currently popular programming languages, so these principles are still going very strong, but I find it interesting that active research in the area has faded to such a small fraction of where it used to be.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Simple federated queries with RDF</title>
      <link>https://www.bobdc.com/blog/simple-federated-queries-with/</link>
      <pubDate>Sun, 29 Apr 2012 18:22:31 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/simple-federated-queries-with/</guid>
      
      
      <description><div>A few more triples to identify some relationships, and you&#39;re all set.</div><div>&lt;blockquote id=&#34;id128269&#34; class=&#34;pullquote&#34;&gt;Easy aggregation without conversion is where semantic web technology shines the brightest.&lt;/blockquote&gt;
&lt;p&gt;Once, at an &lt;a href=&#34;http://xmlsummerschool.com/&#34;&gt;XML Summer School&lt;/a&gt; session, I was giving a talk about semantic web technology to a group that included several presenters from other sessions. This included &lt;a href=&#34;http://www.ltg.ed.ac.uk/~ht/&#34;&gt;Henry Thompson&lt;/a&gt;, who I&amp;rsquo;ve known since the SGML days. He was still a bit skeptical about RDF, and said that RDF was in the same situation as XML—that if he and I stored similar information using different vocabularies, we&amp;rsquo;d still have to convert his to use the same vocabulary as mine or vice versa before we could use our data together. I told him he was wrong—that easy aggregation without conversion is where semantic web technology shines the brightest.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve finally put together an example. Let&amp;rsquo;s say that I want to query across his address book and my address book together for the first name, last name, and email address of anyone whose email address ends with &amp;ldquo;.org&amp;rdquo;. Imagine that his address book uses the &lt;a href=&#34;http://www.w3.org/TR/vcard-rdf/&#34;&gt;vCard&lt;/a&gt; vocabulary and the Turtle syntax and looks like this,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# addressBookA.ttl


@prefix v:   &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt; .
@prefix aba: &amp;lt;http://learningsparql.com/ns/abookA/data#&amp;gt; .        


aba:rick v:given-name &amp;quot;Richard&amp;quot; ;
         v:family-name &amp;quot;Mutt&amp;quot; ; 
         v:email &amp;quot;rick@selavy.org&amp;quot; . 


aba:al   v:given-name &amp;quot;Alan&amp;quot; ;
         v:family-name &amp;quot;Smithee&amp;quot; ; 
         v:email &amp;quot;alan@paramount.com&amp;quot; . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and mine uses the &lt;a href=&#34;http://xmlns.com/foaf/spec/&#34;&gt;FOAF&lt;/a&gt; vocabulary and looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# addressBookB.ttl 


@prefix foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; .
@prefix abb: &amp;lt;http://learningsparql.com/ns/abookB/data#&amp;gt; .        


abb:bill   foaf:givenName &amp;quot;Billy&amp;quot; ;
           foaf:familyName &amp;quot;Shears&amp;quot; ; 
           foaf:mbox &amp;quot;bill@northernsongs.org&amp;quot; . 


abb:nate foaf:givenName &amp;quot;Nanker&amp;quot; ;
           foaf:familyName &amp;quot;Phelge&amp;quot; ; 
           foaf:mbox &amp;quot;nate@abkco.com&amp;quot; . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that, in addition to the property names being different in the two address books, his properties, my properties, his data, and my data come from four different namespaces.&lt;/p&gt;
&lt;p&gt;A simple CONSTRUCT query would convert one address book to use the same vocabulary that the other uses—my book &lt;a href=&#34;http://www.learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt; includes a &lt;a href=&#34;http://www.learningsparql.com/examples/ex194.rq&#34;&gt;query that does this&lt;/a&gt; to convert an address book from the book&amp;rsquo;s demo namespace to vCard—but to address Henry&amp;rsquo;s question, I wanted to show how we can query across the two address books with no need for conversion. The key is a little bit of &lt;a href=&#34;http://www.w3.org/TR/rdf-schema/&#34;&gt;RDFS&lt;/a&gt; to define appropriate relationships between the properties used by the two address books:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# mapping.ttl


@prefix foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; .
@prefix v:    &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix ab:   &amp;lt;http://learningsparql.com/ns/addressbook#&amp;gt; .


foaf:givenName  rdfs:subPropertyOf ab:firstName . 
v:given-name    rdfs:subPropertyOf ab:firstName . 


foaf:familyName rdfs:subPropertyOf ab:lastName . 
v:family-name   rdfs:subPropertyOf ab:lastName . 


foaf:mbox       rdfs:subPropertyOf ab:email . 
v:email         rdfs:subPropertyOf ab:email . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I could have used this mapping.ttl file to say that the FOAF properties were subproperties of the vCard ones (or vice versa) and gotten a similar result, but because these are two independent standards that I had nothing to do with, I didn&amp;rsquo;t feel right making assertions about their relationship, even if it was for a specialized local application. Instead, I declared properties from both to be subproperties of similar ones in an address book namespace that I created myself. With these &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; triples added to the mix, a &lt;code&gt;foaf:givenName&lt;/code&gt; value and a &lt;code&gt;v:given-name&lt;/code&gt; value are both &lt;code&gt;ab:firstName&lt;/code&gt; values, so I can just query for that, and the same goes for the values of the other properties:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#dotorg.rq


PREFIX ab: &amp;lt;http://learningsparql.com/ns/addressbook#&amp;gt; 


SELECT ?email ?fn ?ln WHERE { 
?s ab:firstName ?fn ;
   ab:lastName ?ln ;
   ab:email ?email . 
   FILTER (regex(?email, &amp;quot;\\.org$&amp;quot;)) .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There is a catch: the query will only find those values if I query for them with a tool that knows what &lt;code&gt;rdfs:subPropertyOf&lt;/code&gt; means. One such tool is the OWL reasoner &lt;a href=&#34;http://clarkparsia.com/pellet&#34;&gt;Pellet&lt;/a&gt;. Pellet&amp;rsquo;s command line interface only accepts one data file as an argument, and I needed to combine the two address book files and the mapping file, so I executed the query with a two-line script that first concatenated the three files together (did I mention that RDF is easy to aggregate?):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cat addressBookA.ttl addressBookB.ttl mapping.ttl &amp;gt; combo.ttl
pellet query -q dotorg.rq combo.ttl
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here is Pellet&amp;rsquo;s answer. It found one email address in each of the two address books that ended with &amp;ldquo;org&amp;rdquo;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Query Results (2 answers):
email                    | fn        | ln
===============================================
&amp;quot;rick@selavy.org&amp;quot;        | &amp;quot;Richard&amp;quot; | &amp;quot;Mutt&amp;quot;
&amp;quot;bill@northernsongs.org&amp;quot; | &amp;quot;Billy&amp;quot;   | &amp;quot;Shears&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In &lt;a href=&#34;http://topquadrant.com/products/TB_Composer.html&#34;&gt;TopBraid Composer&lt;/a&gt;, including the free edition, the simplest way to combine these data files is to create another one that imports the ones you want to query together. I created one called addressbooks.ttl and dragged the three relevant files into the &lt;strong&gt;Imports&lt;/strong&gt; view for that file:&lt;/p&gt;
&lt;img id=&#34;id130657&#34; src=&#34;https://www.bobdc.com/img/main/tbcfeabook1.png&#34; alt=&#34;TopBraid Composer import view&#34; width=&#34;500&#34;/&gt;
&lt;p&gt;(Before I explain the fourth included file: the saved addressbooks.ttl file imports the others using the standard &lt;code&gt;owl:imports&lt;/code&gt; property. Because of this, Pellet can do the same query as above on that &amp;ldquo;single&amp;rdquo; addressbooks.ttl file, because Pellet certainly knows what &lt;code&gt;owl:imports&lt;/code&gt; means. It&amp;rsquo;s always nice to work with a set of tools that play nice together because they conform to the same standards.)&lt;/p&gt;
&lt;p&gt;In order to infer the extra triples implied by the relationships specified in mapping.ttl, such as that &lt;code&gt;aba:rick&lt;/code&gt; has an &lt;code&gt;ab:firstName&lt;/code&gt; value of &amp;ldquo;Richard&amp;rdquo;, TopBraid Composer can use several different inference engines. The TopSPIN inferencing engine is included in all editions, including the free one, and does inferencing based on &lt;a href=&#34;http://www.w3.org/Submission/2011/02/&#34;&gt;SPARQL Inferencing Notation&lt;/a&gt; rules. The fourth file imported above, rdfsplus.ttl, contains rules (stored as triples) that implement RDFS Plus, a superset of RDFS developed by &lt;a href=&#34;http://workingontologist.org/&#34;&gt;Jim Hendler and Dean Allemang&lt;/a&gt; that has a few extra OWL constructs thrown in. (Other SPIN rule sets are available, such as one that implements &lt;a href=&#34;http://www.w3.org/TR/owl2-profiles/#OWL_2_RL&#34;&gt;OWL RL&lt;/a&gt;.) Once you run TopSPIN inferencing on addressbooks.ttl&amp;rsquo;s complete set of triples, running the query above in TopBraid Composer&amp;rsquo;s SPARQL view returns the same result as the Pellet command line query earlier:&lt;/p&gt;
&lt;img id=&#34;id130699&#34; src=&#34;https://www.bobdc.com/img/main/tbcfeabook2.png&#34; alt=&#34;TopBraid Composer SPARQL query and results&#34; width=&#34;500&#34;/&gt;
&lt;p&gt;&lt;a href=&#34;http://answers.semanticweb.com/questions/3534/rdfs-reasoning-support-and-sparql&#34;&gt;Other tools with inferencing support&lt;/a&gt; tend to be triple stores such as AllegroGraph, OWLIM (whose reasoning engine is another option in some versions of TopBraid Composer), Stardog, and Virtuoso. The use of a triplestore with this approach instead of three files loaded into memory together will obviously let you scale up to do it with larger amounts of data.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a nice little trick that builds on the SPIN principle of letting SPARQL do the work: although &lt;a href=&#34;http://incubator.apache.org/jena/documentation/query/&#34;&gt;ARQ&lt;/a&gt; can&amp;rsquo;t do any inferencing, SPARQL 1.1 lets you build a form of inferencing right into your query. This revision of the original query uses &lt;a href=&#34;http://www.w3.org/TR/sparql11-query/#propertypaths&#34;&gt;property paths&lt;/a&gt; to find first and last name and email address values specified with any subproperties of &lt;code&gt;ab:firstName&lt;/code&gt;, &lt;code&gt;ab:lastName&lt;/code&gt;, and &lt;code&gt;ab:email&lt;/code&gt; among the triples at hand:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# arqdotorg.rq


PREFIX ab: &amp;lt;http://learningsparql.com/ns/addressbook#&amp;gt; 
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; 


SELECT ?email ?fn ?ln WHERE { 
?firstNameProp rdfs:subPropertyOf* ab:firstName . 
?lastNameProp rdfs:subPropertyOf* ab:lastName . 
?emailProp rdfs:subPropertyOf* ab:email . 
?s ?firstNameProp ?fn ;
   ?lastNameProp ?ln ;
   ?emailProp ?email . 
   FILTER (regex(?email, &amp;quot;\\.org$&amp;quot;)) .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following command line gets the same result set that the earlier arrangements got:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;arq.bat --query arqdotorg.rq --data addressBookA.ttl --data addressBookB.ttl --data mapping.ttl
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Implementing the inferencing logic as part of your query like this is only going to scale up so far, but it can still be handy pretty often.&lt;/p&gt;
&lt;p&gt;Overall, there are two important lessons here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;In terms of work, the setups I&amp;rsquo;ve described may look comparable to building and running a simple conversion routine, but once the mapping setup is done, it&amp;rsquo;s done. If Henry or I add a new entry with &lt;a href=&#34;mailto:stigohara@rutles.org&#34;&gt;stigohara@rutles.org&lt;/a&gt; as the email address to either address book, rerunning the query with any of these setups will find it. A big bonus is that we can each continue to use and edit our address books just as we did before, and we can still run these cross-address-book queries with no need to convert anything to anything else.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A little RDFS was all it took. Years ago &lt;a href=&#34;https://www.bobdc.com/blog/rdfs-without-rdfowl&#34;&gt;I wondered&lt;/a&gt; if anyone used RDFS without OWL, and lately the answer is a more and more emphatic Yes. The &lt;code&gt;owl:imports&lt;/code&gt; trick above was one approach to aggregating the necessary triples, but it played no role in the mapping between the two address books that made the query of the two together possible.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So Henry: RDF and related technologies can be very useful, and the list of well-known XML people who have come to realize this is very impressive. In fact, several of them are giving XML and/or RDF presentations at this year&amp;rsquo;s &lt;a href=&#34;http://xmlsummerschool.com&#34;&gt;XML Summer School&lt;/a&gt; in Oxford this September!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Playing with SPARQL Graph Store HTTP Protocol</title>
      <link>https://www.bobdc.com/blog/playing-with-sparql-graph-stor/</link>
      <pubDate>Sat, 31 Mar 2012 10:16:38 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/playing-with-sparql-graph-stor/</guid>
      
      
      <description><div>GETting, POSTing, PUTting, and DELETEing named graphs.</div><div>&lt;p&gt;One of the new SPARQL 1.1 specifications is the &lt;a href=&#34;http://www.w3.org/TR/sparql11-http-rdf-update/&#34;&gt;SPARQL 1.1 Graph Store HTTP Protocol&lt;/a&gt;, which is currently still a W3C Working Draft. According to its abstract, it &amp;ldquo;describes the use of HTTP operations for the purpose of managing a collection of graphs in the REST architectural style.&amp;rdquo; Recent releases of &lt;a href=&#34;http://www.openrdf.org/&#34;&gt;Sesame&lt;/a&gt; support it, so I used that to try out some of the operations described by this spec. I managed to do GET, PUT, POST, and DELETE operations with individual named graphs, so that was fun, in an RDF geek kind of way.&lt;/p&gt;
&lt;blockquote id=&#34;id133719&#34; class=&#34;pullquote&#34;&gt;Adding and deleting triples at the named graph level of granularity (as opposed to the triple level) will also make more sense for data publishing workflows where sets of data are added and deleted as a unit.&lt;/blockquote&gt;
&lt;p&gt;As this Working Draft often points out, you can also perform most if not all of these operations with a query sent to a SPARQL endpoint. Hardcore RESTafarians will prefer the new HTTP protocol way, though, because it uses basic HTTP operations with URIs that name resources (in this case, graphs of triples) and the operations to perform on them, instead of the more implementation-detail-oriented practice of embedding queries in URLs.&lt;/p&gt;
&lt;p&gt;Adding and deleting triples at the named graph level of granularity (as opposed to the triple level) will also make more sense for data publishing workflows in which sets of data—probably with their own metadata about things like &lt;a href=&#34;http://www.w3.org/2011/prov/wiki/ProvenanceRDFNamedGraph&#34;&gt;provenance&lt;/a&gt;—are added and deleted as a unit. For example, if you&amp;rsquo;re a data publisher and I&amp;rsquo;m one of your providers, I would send you a set of data to replace the current set that you&amp;rsquo;re offering from my organization, which you may have distinguished from your other data offerings in your triplestore by keeping the data from my company in its own named graph.&lt;/p&gt;
&lt;p&gt;Then again, maybe not enough people will agree, finding instead that UPDATE queries are good enough to achieve their goals. Ultimately, support for the Graph Store HTTP Protocol across the spectrum of semantic web tools will probably be tied to the extent of customer demand for it. At the very least I would expect all triplestores to support it shortly after it becomes a Recommendation, if not before.&lt;/p&gt;
&lt;p&gt;To test drive these operations, I used &lt;a href=&#34;http://curl.haxx.se/&#34;&gt;cURL&lt;/a&gt; from the command line. cURL is part of Linux and Mac OS, and a free version for Windows is available. Your favorite programming language should also offer ways to perform GET, PUT, POST, and DELETE operations—if not natively, then with some add-in library.&lt;/p&gt;
&lt;p&gt;Everything below works, but not necessarily in the best way possible. I went back and forth between the W3C specification document and the &lt;a href=&#34;http://www.openrdf.org/doc/sesame2/system/ch08.html#d0e638&#34;&gt;Sesame documentation&lt;/a&gt; on the topic a lot (with plenty of searches about cURL command line syntax in between) and I had plenty of both hits and misses. I probably missed better ways to do several of these, and I&amp;rsquo;m open to any suggestions.&lt;/p&gt;
&lt;p&gt;Also, I have no idea what role authorization could play in all of this—you don&amp;rsquo;t want to let just anyone with HTTP access change and delete your data—but this seemed like a nice start at getting to know this new part of the SPARQL standard.&lt;/p&gt;
&lt;h2 id=&#34;id133776&#34;&gt;Setup&lt;/h2&gt;
&lt;p&gt;To start, I created a new repository (a Sesame term, not a W3C standard term) called &lt;code&gt;updatetest&lt;/code&gt;. This will be important below, because the URLs to pass to Sesame must specify the name of the repository to act on.&lt;/p&gt;
&lt;p&gt;Then, on Sesame&amp;rsquo;s SPARQL Update screen, I entered the following to insert some starter data into the &lt;code&gt;updatetest&lt;/code&gt; repository:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d:  &amp;lt;http://learningsparql.com/ns/data#&amp;gt;
PREFIX dm: &amp;lt;http://learningsparql.com/ns/demo#&amp;gt;


INSERT DATA
{
  d:x dm:tag &amp;quot;one&amp;quot; . 
  d:x dm:tag &amp;quot;two&amp;quot; . 


  GRAPH d:g1
  { 
    d:x dm:tag &amp;quot;three&amp;quot; . 
    d:x dm:tag &amp;quot;four&amp;quot; . 
  }


  GRAPH d:g2
  { 
    d:x dm:tag &amp;quot;five&amp;quot; . 
    d:x dm:tag &amp;quot;six&amp;quot; . 
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It adds two triples to the repository&amp;rsquo;s default graph, creates named graphs called &lt;code&gt;d:g1&lt;/code&gt; and &lt;code&gt;d:g2&lt;/code&gt;, and puts two triples in each of those. (If you&amp;rsquo;re new to the use of named graphs or &lt;a href=&#34;http://www.w3.org/TR/sparql11-query/&#34;&gt;SPARQL 1.1 Update&lt;/a&gt;, which is also still in Working Draft status, see my O&amp;rsquo;Reilly book &lt;a href=&#34;http://learningsparql.com/&#34;&gt;Learning SPARQL&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;To check that this update query above had the desired effect, and to see the results of the operations described below, I entered the following query on Sesame&amp;rsquo;s Query screen. It lists all the triples currently in the dataset:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?g ?s ?p ?o
WHERE
{
  { ?s ?p ?o }
  UNION
  { GRAPH ?g { ?s ?p ?o } }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After you execute this query you&amp;rsquo;ll see a URL-escaped version of it embedded in the URL in your browser&amp;rsquo;s address bar. (Don&amp;rsquo;t call that RESTful, though, or the &lt;a href=&#34;https://www.bobdc.com/blog/restful-sparql-queries-of-rdfa#comments&#34;&gt;RESTafarians&lt;/a&gt; will come after you!) If you&amp;rsquo;re going to try many of the examples below, you might want to bookmark the result of this query or keep it in its own browser tab so that you can reload it after trying each command line below to see the command&amp;rsquo;s effect on the data in the &lt;code&gt;updatetest&lt;/code&gt; repository.&lt;/p&gt;
&lt;h2 id=&#34;id136087&#34;&gt;GET&lt;/h2&gt;
&lt;p&gt;The GET examples should work when pasted as the URL into any browser, because a web browser that doesn&amp;rsquo;t support GET isn&amp;rsquo;t much of a browser. I did it with cURL anyway to be consistent with the rest of my examples. The following asks for everything in the &lt;code&gt;updatetest&lt;/code&gt; dataset&amp;rsquo;s default graph, and gets the &amp;ldquo;one&amp;rdquo; and &amp;ldquo;two&amp;rdquo; triples:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?default
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Several command lines that I&amp;rsquo;ve pasted here may reach off to the right where you can&amp;rsquo;t see them because they&amp;rsquo;re too long. I chose not to break them up with carriage returns to make them easier to copy and paste if you want to try them.)&lt;/p&gt;
&lt;p&gt;Sesame returns the triples in the Turtle format, but in true RESTful fashion, you can ask for the result in one of the other &lt;a href=&#34;http://www.openrdf.org/doc/sesame2/system/ch08.html#table-rdf-formats&#34;&gt;formats that Sesame supports&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -H &amp;quot;Accept: application/rdf+xml&amp;quot; http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?default
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The next request asks for all the triples in named graph &lt;code&gt;http://learningsparql.com/ns/data#g1&lt;/code&gt;. Note that graph name characters that might cause problems in URL parameters are escaped in the request:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?graph=http%3A%2F%2Flearningsparql.com%2Fns%2Fdata%23g1
&lt;/code&gt;&lt;/pre&gt;
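&lt;p&gt;If you&amp;rsquo;d rather script these requests than type cURL command lines, the percent-escaping is easy to get from a standard library. Here&amp;rsquo;s a short Python sketch (the helper function name is my own invention) that builds the same request URLs shown above:&lt;/p&gt;

```python
# Build Graph Store Protocol URLs with the graph name percent-encoded,
# matching the escaped cURL examples above.
from urllib.parse import quote

def graph_store_url(service, graph=None):
    """URL for a named graph, or for the default graph when graph is None."""
    if graph is None:
        return service + "?default"
    # safe="" makes quote() escape ':', '/', and '#' as well
    return service + "?graph=" + quote(graph, safe="")

service = "http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service"
print(graph_store_url(service, "http://learningsparql.com/ns/data#g1"))
```

&lt;p&gt;A GET of that URL with your language&amp;rsquo;s HTTP client should return the same Turtle that the cURL version does.&lt;/p&gt;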
&lt;h2 id=&#34;id136130&#34;&gt;PUT&lt;/h2&gt;
&lt;p&gt;An HTTP PUT is a request to put a resource at a particular URL. The idea is to create a new resource at that URL; if something already exists there, the existing resource gets replaced.&lt;/p&gt;
&lt;p&gt;Our first PUT example puts the triples from the file test.ttl into the &lt;code&gt;http://learningsparql.com/ns/data#g2&lt;/code&gt; named graph, replacing any existing ones that may be there. (Note how the command line uses the cURL &lt;code&gt;-X&lt;/code&gt; switch to indicate the operation to perform, the @ character to point to the file with the triples to send, and the &lt;code&gt;-H&lt;/code&gt; switch to send a custom header indicating the MIME type of the data being sent.) If the &lt;code&gt;http://learningsparql.com/ns/data#g2&lt;/code&gt; graph didn&amp;rsquo;t exist, the PUT operation would create it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X PUT -d @test.ttl -H &amp;quot;Content-Type: application/x-turtle&amp;quot; http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?graph=http%3A%2F%2Flearningsparql.com%2Fns%2Fdata%23g2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(For the remainder of these commands, I changed something in test.ttl each time to make sure that I could see, when querying Sesame, that the latest version of the data really had been sent to the repository.) For the next command, I wanted to completely replace all of the &lt;code&gt;updatetest&lt;/code&gt; repository&amp;rsquo;s triples with the ones in test.ttl. Based on the other working examples and the correspondences between the &lt;a href=&#34;http://www.openrdf.org/doc/sesame2/system/ch08.html#d0e638&#34;&gt;Sesame documentation&lt;/a&gt; and the standard, I thought this would work, but it didn&amp;rsquo;t:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X PUT -d @test.ttl  -H &amp;quot;Content-Type: application/x-turtle&amp;quot; http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?default
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This more Sesame-ish URL syntax did work to replace all of the update test repository&amp;rsquo;s triples with the ones in test.ttl:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X PUT -d @test.ttl -H &amp;quot;Content-Type: application/x-turtle&amp;quot; http://localhost:8080/openrdf-sesame/repositories/updatetest/statements
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(After running it, you may want to rerun the INSERT DATA update query above to more easily see the effect of the remaining operations.)&lt;/p&gt;
&lt;h2 id=&#34;id136228&#34;&gt;POST&lt;/h2&gt;
&lt;p&gt;While a PUT command replaces any existing triples at the named URL with the ones being sent, a POST command adds the new ones to the existing ones. The following adds the test.ttl triples to the &lt;code&gt;http://learningsparql.com/ns/data#g2&lt;/code&gt; named graph:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST -d @test.ttl -H &amp;quot;Content-Type: application/x-turtle&amp;quot; http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?graph=http%3A%2F%2Flearningsparql.com%2Fns%2Fdata%23g2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I could not put together a command line that POSTed triples to the default graph, and I didn&amp;rsquo;t see any examples of this in the Sesame documentation.&lt;/p&gt;
&lt;h2 id=&#34;id136249&#34;&gt;DELETE&lt;/h2&gt;
&lt;p&gt;When applied to a named graph, this command&amp;rsquo;s effect is pretty obvious. The following deletes the &lt;code&gt;http://learningsparql.com/ns/data#g2&lt;/code&gt; named graph and all of its triples:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X DELETE http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?graph=http%3A%2F%2Flearningsparql.com%2Fns%2Fdata%23g2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This last command deletes the default graph&amp;rsquo;s triples, leaving named graphs and their triples intact:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X DELETE http://localhost:8080/openrdf-sesame/repositories/updatetest/rdf-graphs/service?default
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Have you tried this new specification&amp;rsquo;s operations with other tools? Does anyone see clear-cut cases where they&amp;rsquo;d rather use this than send the corresponding queries to a SPARQL endpoint, or vice versa? Let me know at &lt;a href=&#34;https://plus.google.com/u/1/101006505484718936507/posts/789QKstz7Z8&#34;&gt;this Google+ post&lt;/a&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>Pull RDF metadata out of JPEGs, MP3s, and more</title>
      <link>https://www.bobdc.com/blog/pull-rdf-metadata-out-of-jpegs/</link>
      <pubDate>Thu, 23 Feb 2012 08:54:10 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/pull-rdf-metadata-out-of-jpegs/</guid>
      
      
      <description><div>With open source Apache software.</div><div>&lt;p&gt;&lt;a href=&#34;http://tika.apache.org/1.0/index.html&#34;&gt;&lt;img id=&#34;id103339&#34; src=&#34;http://tika.apache.org/tika.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Tika logo&#34; width=&#34;200&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been having some fun with Apache &lt;a href=&#34;http://tika.apache.org/&#34;&gt;Tika&lt;/a&gt; lately. According to its homepage, the &amp;ldquo;Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.&amp;rdquo; It has managed to pull some sort of metadata out of just about any file I&amp;rsquo;ve pointed it at; the &lt;a href=&#34;http://tika.apache.org/1.0/formats.html&#34;&gt;list of formats&lt;/a&gt; it can handle includes PDF, JPEG, MP3, ePUB, Flash video files, Microsoft Office files, OpenOffice files, and more. What&amp;rsquo;s especially cool to me is that it can extract RDF.&lt;/p&gt;
&lt;p&gt;Tika can run as a server, or as a GUI window that you drag files onto, but I&amp;rsquo;ve mostly played with the command line version. Running it with an argument of &lt;code&gt;--help&lt;/code&gt; lists the available output options:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;java -jar tika-app-1.0.jar --help
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;-y&lt;/code&gt; option tells it to output XMP data, which comes out as RDF/XML. I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/new-xmp-spec&#34;&gt;written here&lt;/a&gt; about XMP several times before. It&amp;rsquo;s basically an Adobe spec for media metadata expressed as RDF/XML. Being RDF/XML, any semantic web tool should be able to read it. The bad news is that, by explicitly targeting XMP, this Tika output only includes metadata defined in the relevant Adobe namespaces. Specifying the &lt;code&gt;-j&lt;/code&gt; switch instead tells Tika to give you JSON output, and you get a lot more metadata. It would be nice if Tika included an &lt;code&gt;-r&lt;/code&gt; switch to output all the metadata it can find—the same that it outputs when you request JSON output—as RDF/XML or Turtle. They&amp;rsquo;ve obviously already done the hard parts.&lt;/p&gt;
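&lt;p&gt;Until something like that &lt;code&gt;-r&lt;/code&gt; switch exists, converting the JSON output yourself is not much work. The following Python sketch shows the idea; the &lt;code&gt;meta:&lt;/code&gt; namespace and the sample property names are placeholders of my own, not anything Tika actually emits:&lt;/p&gt;

```python
# Turn a flat metadata dictionary (like one parsed from Tika's -j JSON
# output) into Turtle triples describing the source file.
def metadata_to_turtle(file_uri, metadata):
    header = "@prefix meta: <http://example.com/ns/filemeta#> .\n"
    lines = []
    for name, value in sorted(metadata.items()):
        local = "".join(c for c in name if c.isalnum())  # crude QName cleanup
        escaped = str(value).replace('\\', '\\\\').replace('"', '\\"')
        lines.append('<%s> meta:%s "%s" .' % (file_uri, local, escaped))
    return header + "\n".join(lines)

sample = {"Content-Type": "audio/mpeg", "title": "Cheese and Onions"}
print(metadata_to_turtle("file:///music/cheese.mp3", sample))
```

&lt;p&gt;Load the result into any triplestore and you can query across whatever properties each file happened to have.&lt;/p&gt;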
&lt;p&gt;Why is Tika&amp;rsquo;s ability to output media metadata in RDF so interesting to me, especially if it could someday output all the same properties in RDF that it can now output in JSON? Because different media have different metadata properties (for example, an MP3 file has different metadata from a JPEG file) and one of the greatest strengths of the RDF data model is the way it lets you accumulate property-value pairs for resources without knowing which properties you&amp;rsquo;re going to gather in advance. So, let&amp;rsquo;s say I wanted to create an application around a single set of metadata that describes a particular collection of images, music files, and related documentation. Tika plus a few selections from the wide variety of standards-compliant semantic web software out there, such as TopQuadrant&amp;rsquo;s TopBraid platform, would make this almost trivial. Of course, some extra RDFS modeling around the stored properties would add more value, but Tika, a triplestore, and very little else would give you enough to be off and running with a very powerful application.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>A brief, opinionated history of XML</title>
      <link>https://www.bobdc.com/blog/a-brief-opinionated-history-of/</link>
      <pubDate>Wed, 25 Jan 2012 09:01:51 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-brief-opinionated-history-of/</guid>
      
      
      <description><div>From someone who had a front row seat.</div><div>&lt;p&gt;There are a few histories of XML out there, but I still find myself explaining certain points to people surprisingly often, so I thought I&amp;rsquo;d write them down. If you don&amp;rsquo;t want to read this whole thing, I&amp;rsquo;ll put the moral of the story right at the top:&lt;/p&gt;
&lt;blockquote id=&#34;id103361&#34; class=&#34;pullquote&#34;&gt;They didn&#39;t understand that it wasn&#39;t designed to meet their needs. It was designed to make electronic publishing in multiple media easier.&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;XML was designed as a simplified subset of SGML to make electronic publishing in multiple media easier. People found it useful for other things. When some people working on those other things found that XML wasn&amp;rsquo;t perfect for their needs, they complained and complained about how badly designed XML was. They didn&amp;rsquo;t understand that it wasn&amp;rsquo;t designed to meet their needs. It was designed to make electronic publishing in multiple media easier.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&#34;id103388&#34;&gt;Automated typesetting and page layout&amp;hellip;&lt;/h2&gt;
&lt;p&gt;In the 1970s, computerized typesetting made automated page layout much easier, but three guys at IBM named Goldfarb, Mosher, and Lorie got tired of the proprietary nature of the typesetting codes used in these systems, so they came up with a nonproprietary, generic way to store content for automated publishing that would make it easier to convert this content for publication on multiple systems. This became the ISO standard &lt;a href=&#34;http://en.wikipedia.org/wiki/SGML&#34;&gt;SGML&lt;/a&gt;, and the standardized nonproprietary part made it popular among U.S. defense contractors, legal publishers, and other organizations that did large-scale automated publishing.&lt;/p&gt;
&lt;p&gt;When I first got involved, SGML was gaining popularity among publishers creating CD-ROMs and bound books from the same content, because they could create and edit an SGML version and then run scripts to publish that content in the various media. The structure of an SGML document type (for example, the available text elements and element relationships in a set of legal court cases, or the elements and element relationships that you could use in a set of aircraft repair manuals) was specified in something called a &lt;a href=&#34;http://en.wikipedia.org/wiki/Document_Type_Definition&#34;&gt;DTD&lt;/a&gt;, which had its own syntax and was part of the SGML standard. The scripts to convert SGML documents were usually written using a language and engine called Omnimark, which was a proprietary product, but a perl-based alternative was also available.&lt;/p&gt;
&lt;p&gt;When Tim Berners-Lee was wondering how exactly to specify that one of his new hypertext documents had a title here, a subtitle there, and a link in the middle of a paragraph that led to another document, SGML was a logical choice—it was a text-based, flexible, non-proprietary, standardized way to specify document structure with various tools available to help you work with those documents. That&amp;rsquo;s why HTML tags are delimited with angle brackets: because SGML elements were (nearly always) delimited with angle brackets. Dan Connolly sketched out the &lt;a href=&#34;http://lists.w3.org/Archives/Public/www-talk/1992MayJun/0020.html&#34;&gt;first HTML DTD&lt;/a&gt; in 1992.&lt;/p&gt;
&lt;p&gt;SGML&amp;rsquo;s designers couldn&amp;rsquo;t see into the future, so they deliberately made it very flexible. For example, you could use other delimiters for element tags besides angle brackets, but everyone used angle brackets. SGML parsing programs were still required to account for the possibility that a document used other delimiters, and the possibility that many other options had been reset, so these parsers were large and complex, and few were available to choose from. By the mid-90s, enough best practices had developed that Sun Microsystems&amp;rsquo; Jon Bosak had the idea for a simplified, slimmer version of SGML that assumed a lot of default settings and could be parsed by a smaller program—maybe even a program written in Sun&amp;rsquo;s new Java language—and that could be transmitted over the web when necessary. The documents themselves would be easier to share over the web than typical SGML documents, following the example of HTML documents.&lt;/p&gt;
&lt;p&gt;Around this time SGML was considered a niche technology in the electronic publishing industry, and I worked at several jobs where I wrote and modified DTDs and Omnimark scripts to create and maintain document conversion systems. I also went to the relevant SGML conferences, where I got to know several of the people who eventually joined Jon to create the simplified version of SGML. (Many are still friends.) At first this group called their new spec WebSGML, but eventually they named it XML.&lt;/p&gt;
&lt;p&gt;&lt;span id=&#34;whydtds&#34; /&gt;Many people would fail to appreciate the value of one key design decision: as a valid subset of SGML, XML documents could still be processed with Omnimark and other existing SGML tools. This meant that on that day in 1998 when XML became an official W3C standard, we already had plenty of software out there, including programs like Adobe&amp;rsquo;s special SGML edition of FrameMaker, that could process XML documents right away. This gave the new standard a running start, and XML may not have gotten anywhere without it, because those of us using the existing tools didn&amp;rsquo;t have to wait around for new tools for the new standard and then work out how to incorporate them into our publishing workflows. We already had tools and workflows that could take advantage of the new standard.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve heard some people describe certain things that SGML specialists didn&amp;rsquo;t like about XML, but these people don&amp;rsquo;t understand that XML was invented by and for SGML specialists, and it made SGML peoples&amp;rsquo; lives much easier. For one thing, we weren&amp;rsquo;t so dependent on Omnimark anymore; at least one of my former employers switched from SGML to XML just so they could ditch Omnimark. XML&amp;rsquo;s companion standard &lt;a href=&#34;http://en.wikipedia.org/wiki/XSLT&#34;&gt;XSLT&lt;/a&gt; let us convert XML to a variety of formats using robust, free, standardized software, and as the web became a bigger publishing medium we found ourselves writing XSLT stylesheets to convert the same XML documents to print, CD-ROM, and HTML. Electronic publishing had never been so easy.&lt;/p&gt;
&lt;h2 id=&#34;id103576&#34;&gt;&amp;hellip;and beyond&amp;hellip;&lt;/h2&gt;
&lt;p&gt;Then along came the dot com boom. People got excited about how &amp;ldquo;seamless e-commerce&amp;rdquo; would change everything. People would save money as obsolete middlemen were removed from old-fashioned transactions, and people would make lots of money by taking part in this streamlining (selling pick axes during a gold rush) or by automating the buying and selling of products.&lt;/p&gt;
&lt;p&gt;Orders would be transmitted over this fabulous free network known as The Internet instead of over the expensive, proprietary &lt;a href=&#34;http://en.wikipedia.org/wiki/Electronic_Data_Interchange&#34;&gt;EDI&lt;/a&gt; networks. But when my computer sent an order to yours, how exactly would this order be represented? XML provided a good syntax: it was plain text, easy to transmit and parse, and could group labeled pieces of information in fairly arbitrary structures while remaining an open, straightforward standard. (When I say &amp;ldquo;straightforward&amp;rdquo;, I&amp;rsquo;m talking about the &lt;a href=&#34;http://www.w3.org/TR/1998/REC-xml-19980210&#34;&gt;original spec&lt;/a&gt; here, not the collection of related specs that most people are referring to when they complain about the complexity of XML. More on this &lt;a href=&#34;#schema&#34;&gt;below&lt;/a&gt;.) This let people send any combination of information back and forth, regardless of the potential lack of compatibility between the back end systems that the different parties were using.&lt;/p&gt;
&lt;p&gt;So, as an important technology of the dot com boom, XML became trendy, and it was a heady feeling to suddenly be an expert in a trendy technology. I&amp;rsquo;ll never forget hearing it mentioned in a Microsoft ad on a prime time network TV show; sure, it was spoken by the character of a geek who normal people weren&amp;rsquo;t supposed to understand, but still, this subset of a niche technology that my friends helped to invent was mentioned on prime time network TV. Three different XML conference series were running, and they were much better attended than the &lt;a href=&#34;http://www.idealliance.org/events/xtech-2012&#34;&gt;single one&lt;/a&gt; that&amp;rsquo;s left now. The best part was that there was enough money behind some of those conferences to fly most speakers in and put them up in hotels, which got me my first trips to London and Silicon Valley.&lt;/p&gt;
&lt;p&gt;XML wasn&amp;rsquo;t really a perfect fit for ecommerce systems, though. The elements vs. attributes distinction, which publishing systems used to distinguish between content to publish and metadata about that content, didn&amp;rsquo;t have a clear role when describing transactions that weren&amp;rsquo;t content for publishing. XML had some odd data types (NMTOKEN? CDATA?) that only applied to attribute values, instead of traditional data types like integer, string, and boolean that could be applied to content as well as attributes.&lt;/p&gt;
&lt;p&gt;And then there was that strange DTD syntax: if XML was so good at describing structure, why wasn&amp;rsquo;t XML used to describe the structure of a set of documents? The answer is &lt;a href=&#34;#whydtds&#34;&gt;above&lt;/a&gt;, but it didn&amp;rsquo;t get publicized very well, so many people complained about DTD syntax. Everyone agreed that an XML-based schema syntax that provided for traditional data types would be a Good Thing, so various groups came up with &lt;a href=&#34;http://docstore.mik.ua/orelly/xml/schema/appa_03.htm#xmlschema-APP-A-SECT-3.2&#34;&gt;proposals&lt;/a&gt; and the W3C convened a Working Group to review these proposals and come up with a single standard.&lt;/p&gt;
&lt;p&gt;But, in the words of Cyndi Lauper, &lt;a href=&#34;http://www.youtube.com/watch?v=3aK-UjR3Oj4&#34;&gt;money changes everything&lt;/a&gt;. XML itself was assembled by eleven specialists in a niche technology, SGML, who wanted to make standardized electronic publishing simpler, and they managed to stay under most radar systems and come out with something &lt;a href=&#34;http://www.w3.org/TR/1998/REC-xml-19980210&#34;&gt;simple and lean&lt;/a&gt;. However, when the XML Schema Working Group convened, many big and small companies were smelling lots of money and wanted to influence the results. Of the 31 companies that sent representatives to this Working Group (31!), many had little or nothing to do with publishing, electronic or otherwise. There were database vendors such as Microsoft, Informix, Software AG, IBM and Oracle (to be fair, large software companies have always been up there with legal publishers and defense contractors as believers in automated publishing technology; note where SGML got its start). There were successful or aspiring B2B ecommerce vendors such as CommerceOne, Progress Software, and webMethods. Microsoft, Xerox, CommerceOne, IBM, Oracle, Progress Software, and Sun were each interested enough to send two representatives to the committee, so there were a lot of cooks working on this broth.&lt;/p&gt;
&lt;p&gt;The result was a &lt;a href=&#34;http://www.w3.org/TR/#tr_XML_Schema&#34;&gt;three-part specification&lt;/a&gt;: Part 0 was a primer, Part 1 specified how to define document structures, and Part 2 described basic data types and how to extend them. Part 2 is pretty good, and also provides the basis for RDF data typing. Part 1, in my opinion, ended up being an ugly, complicated mess in its attempt to serve so many powerful masters.&lt;/p&gt;
&lt;p&gt;Two members of the original eleven-member XML team, James Clark and Makoto Murata, developed &lt;a href=&#34;http://relaxng.org/&#34;&gt;RELAX NG&lt;/a&gt;, an alternative to Part 1 that was both simpler and more powerful. Clark had written the only open source SGML parser, and the first XSLT processor, and came up with the name &amp;ldquo;XML,&amp;rdquo; among his many other achievements; he&amp;rsquo;s also written some &lt;a href=&#34;http://code.google.com/p/jing-trang/&#34;&gt;great software&lt;/a&gt; to implement RELAX NG and convert between schema formats. RELAX NG never became as popular as XML Schema, because it didn&amp;rsquo;t have the big industry names behind it, and because it was optimized around the original XML use case: describing content for publication.&lt;/p&gt;
&lt;p&gt;Despite a complex syntax, incompatibilities among parsers, an often inscrutable spec, and less expressive power than RELAX NG, the W3C XML Schema specification has become popular because it&amp;rsquo;s a W3C standard that addresses the original main problems of XML for ecommerce: it specifies document structures using XML, it lets you use traditional datatypes, and it has the added bonus for many developers of making it easier to round-trip XML elements to Java data structures. (After railing against the influence of this last part for years, I learned that it was primarily the work of Matthew Fuchs, an old friend I&amp;rsquo;ve known since he was finishing up his Ph.D. in computer science at NYU&amp;rsquo;s &lt;a href=&#34;http://cims.nyu.edu/&#34;&gt;Courant Institute&lt;/a&gt; when I was doing my masters there in the mid-nineties. He was the only other person there who even knew what SGML was.) So, XML Schema continues to be used by many large organizations to store data that doesn&amp;rsquo;t fit neatly into relational tables. In fact, &lt;a href=&#34;http://www.topquadrant.com&#34;&gt;TopQuadrant&lt;/a&gt; has been adding more and more features to the TopBraid platform to make it easier to incorporate such data into a system that uses semantic web standards.&lt;/p&gt;
&lt;h2 id=&#34;id103216&#34;&gt;&amp;hellip;and back.&lt;/h2&gt;
&lt;p&gt;Getting back to the topic of leaner, simpler alternatives for representing information of potentially arbitrary structure, the JavaScript-based &lt;a href=&#34;http://json.org/&#34;&gt;JSON&lt;/a&gt; format started getting popular around 2006. The third paragraph of its &lt;a href=&#34;http://en.wikipedia.org/wiki/JSON&#34;&gt;Wikipedia page&lt;/a&gt; flatly states that &amp;ldquo;it is used primarily to transmit data between a server and web application, serving as an alternative to XML.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;A Google search for &lt;a href=&#34;https://www.google.com/search?q=%22json+replace+xml%22&#34;&gt;&amp;ldquo;json replace xml&amp;rdquo;&lt;/a&gt; gets over 5,000 hits. (That&amp;rsquo;s with the quotes around the search terms, to make Google search for the exact phrase. Without the quotes, it gets almost five million hits.) I like JSON, and see how it can replace many of the uses of XML that have been around since the dot com boom days, but anyone who thinks it can completely replace XML doesn&amp;rsquo;t understand what XML was designed for. Documents with inline markup (or, in XML geekspeak, &amp;ldquo;mixed content&amp;rdquo;—for example, the way the HTML &lt;code&gt;a&lt;/code&gt; element can be in the middle of a sentence within a &lt;code&gt;p&lt;/code&gt; element) would theoretically work fine in JSON, but in practice, it would be too easy to screw it up when editing it with a text editor by accidentally adding or removing a single curly brace. Tools to hide the syntax behind a more intuitive interface may address the issue, but dependence on such tools was something that the original XML designers wanted to avoid. And frankly, when I picture a complex prose document stored in JSON, I hear the ghost of Microsoft&amp;rsquo;s &lt;a href=&#34;http://en.wikipedia.org/wiki/Rich_Text_Format&#34;&gt;RTF&lt;/a&gt; dragging chains through the attic.&lt;/p&gt;
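To make that mixed-content fragility concrete, here is one hypothetical way an inline link in the middle of a sentence might be shoehorned into JSON. The property names and nesting scheme are invented for illustration; there is no standard mapping:

```javascript
// A made-up JSON encoding of the mixed-content sentence
// 'Visit [this site] for more.' where 'this site' is an inline link.
// An ordered array carries the mix of text and child elements; add or
// delete one brace while hand-editing something like this and the
// whole document fails to parse.
var paragraph = {
  p: [
    "Visit ",
    { a: { href: "http://example.com/", children: ["this site"] } },
    " for more."
  ]
};

// Round-tripping through the parser shows the structure survives intact:
var copy = JSON.parse(JSON.stringify(paragraph));
```

The equivalent markup is just an a element sitting inside a p element's text, which is exactly the kind of content the original XML designers had in mind.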
&lt;p&gt;Between JSON&amp;rsquo;s growing role as an inter-computer data format and RELAX NG&amp;rsquo;s foothold in schemas like DocBook and companies like LexisNexis, I see the XML infrastructure getting back to its original use cases, which makes good sense to me. Each year at the &lt;a href=&#34;http://xmlsummerschool.com/&#34;&gt;XML Summer School&lt;/a&gt; in Oxford, it&amp;rsquo;s been very interesting to see the new things people are doing with XML, especially as XQuery-based XML databases like &lt;a href=&#34;http://www.marklogic.com/&#34;&gt;MarkLogic&lt;/a&gt; and &lt;a href=&#34;http://exist.sourceforge.net/&#34;&gt;eXist&lt;/a&gt; grow in power. I&amp;rsquo;ve been chairing the semantic web track at the summer school for the past few years and hardly been involved in XML at all, but it&amp;rsquo;s always great to hear what my old friends are up to. Especially when there&amp;rsquo;s great beer available.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bob/sgmlfree/&#34;&gt;&lt;img id=&#34;id104074&#34; height=&#34;150&#34; src=&#34;https://www.bobdc.com/img/main/sgmlcdsmall.jpg&#34; border=&#34;0&#34; alt=&#34;SGML CD cover&#34;/&gt;&lt;/a&gt;     &lt;a href=&#34;http://www.snee.com/bob/xmlann/&#34;&gt;&lt;img id=&#34;id104092&#34; height=&#34;150&#34; src=&#34;https://www.bobdc.com/img/main/xmlasbig.gif&#34; border=&#34;0&#34; alt=&#34;XML Annotated Spec cover&#34;/&gt;&lt;/a&gt;     &lt;a href=&#34;http://www.snee.com/bob/xsltquickly/&#34;&gt;&lt;img id=&#34;id104109&#34; height=&#34;150&#34; src=&#34;https://www.bobdc.com/img/main/XQcoverSmall.jpg&#34; border=&#34;0&#34; alt=&#34;XSLT Quickly cover&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2012">2012</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Having a Blue Ridge Christmas</title>
      <link>https://www.bobdc.com/blog/having-a-blue-ridge-christmas/</link>
      <pubDate>Fri, 16 Dec 2011 09:56:01 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/having-a-blue-ridge-christmas/</guid>
      
      
      <description><div>They&#39;re playing my song!</div><div>&lt;p&gt;A few months ago I saw a &lt;a href=&#34;http://cvillechristmascd.wordpress.com/about/&#34;&gt;call for contributions&lt;/a&gt; of recordings of original holiday songs for a CD to be called &amp;ldquo;A Charlottesville Songwriters Christmas&amp;rdquo; to benefit a &lt;a href=&#34;http://www.kidpanalley.org/&#34;&gt;local charity&lt;/a&gt;. Around here there seems to be a law that when you name a business you have to name it either Jefferson (whatever), Piedmont (whatever), or Blue Ridge (whatever), so I decided to write a song whose name is a variation on &amp;ldquo;Blue Christmas&amp;rdquo; called &amp;ldquo;Blue Ridge Christmas.&amp;rdquo; I thought about trying to put together a band to record it, but some friends who I&amp;rsquo;ve &lt;a href=&#34;https://www.facebook.com/pages/Jazz-Collective-9/155717025518&#34;&gt;played jazz&lt;/a&gt; with are also in a &lt;a href=&#34;http://soultransitband.com/&#34;&gt;local soul band&lt;/a&gt; with a really great singer (note his &lt;a href=&#34;http://www.jerusalemchurchva.org/&#34;&gt;day job&lt;/a&gt;), so I offered it to them, and they made a great recording of it.&lt;/p&gt;
&lt;p&gt;For the holiday season, the Charlottesville Downtown Business Association made a &lt;a href=&#34;http://youtu.be/uMvhYX63ds4&#34;&gt;video&lt;/a&gt; to encourage people to shop on the downtown mall and they chose this recording as the music. It was fun for me to see it, and it&amp;rsquo;s nice to know that letting my friends hear the song won&amp;rsquo;t mean ripping it from a charity CD and putting it where people can download it. This doesn&amp;rsquo;t quite compare with my &lt;a href=&#34;http://www.mcylinder.com/&#34;&gt;brother&amp;rsquo;s&lt;/a&gt; work for &lt;a href=&#34;http://www.youtube.com/watch?v=Pa0oA5IxwJk&#34;&gt;VW&lt;/a&gt; or &lt;a href=&#34;http://www.youtube.com/watch?v=h5n7bQdW0CQ&#34;&gt;Wendy&amp;rsquo;s&lt;/a&gt;, but it&amp;rsquo;s fun to know that it came out well and that lots of people can see the video—and that the song has had a bit of &lt;a href=&#34;https://www.facebook.com/permalink.php?story_fbid=319707264724842&amp;amp;id=115486258480278&#34;&gt;airplay&lt;/a&gt; on WNRN!&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/uMvhYX63ds4&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Javascript from the command line</title>
      <link>https://www.bobdc.com/blog/javascript-from-the-command-li/</link>
      <pubDate>Mon, 21 Nov 2011 08:46:39 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/javascript-from-the-command-li/</guid>
      
      
      <description><div>In Linux and Windows. (Goodbye Cscript!)</div><div>&lt;p&gt;&lt;a href=&#34;http://www.mozilla.org/rhino/&#34;&gt;&lt;img id=&#34;id103337&#34; src=&#34;https://www.bobdc.com/img/main/rhino.jpg&#34; width=&#34;200&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Mozilla Rhino&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A few years ago I wrote about &lt;a href=&#34;https://www.bobdc.com/blog/windows-command-line-text-proc&#34;&gt;Windows command line text processing with Javascript&lt;/a&gt; using Microsoft&amp;rsquo;s &lt;a href=&#34;http://technet.microsoft.com/en-us/library/bb490887.aspx&#34;&gt;Cscript&lt;/a&gt; utility. I was surprised to find no Linux equivalent, and while I&amp;rsquo;d heard of &lt;a href=&#34;http://www.mozilla.org/rhino/&#34;&gt;Mozilla Rhino&lt;/a&gt;, I had the vague idea that using it meant integrating it into other applications.&lt;/p&gt;
&lt;p&gt;After some hunting, I learned that Rhino includes a jar file that makes it easy to run a script from the command line. Once you have it, running a script named myscript.js is as simple as this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;java -jar js.jar myscript.js
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you&amp;rsquo;re really interested in text processing, you can pipe and redirect the output.&lt;/p&gt;
&lt;p&gt;After I downloaded Rhino and got this to work I searched my hard disk and found that js.jar was already there in several places: with OpenOffice, with Swoop, and with Eclipse (and therefore with TopBraid Composer), so I&amp;rsquo;ve had it right under my nose for years. &lt;a href=&#34;http://www.mcylinder.com/&#34;&gt;My brother&lt;/a&gt; checked his Mac and found that js.jar came with an &lt;a href=&#34;http://cmusphinx.sourceforge.net/sphinx4/&#34;&gt;open source speech recognizer&lt;/a&gt; that he had installed.&lt;/p&gt;
&lt;p&gt;One neat part was that some fairly complex JavaScript scripts that I had run with Cscript ran with js.jar after one minor change that actually improved the scripts: instead of a &lt;code&gt;print()&lt;/code&gt; function for basic text output, Cscript has a &lt;code&gt;WScript.Echo()&lt;/code&gt; call (WScript is a more Windows-oriented version of Cscript), so I had put the following function in my command-line JavaScript scripts:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;function print(OutString) {
  WScript.Echo(OutString);
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because js.jar supports a native &lt;code&gt;print()&lt;/code&gt; function, the only change necessary to any of my scripts was to comment out the three lines above, and js.jar then happily ran my existing scripts.&lt;/p&gt;
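A safer variation on that shim would define print() only when the engine lacks one, so the same script runs unchanged everywhere without commenting anything out. This is just a sketch that assumes console.log() as the fallback; under Cscript the function body would call WScript.Echo() instead:

```javascript
// Hypothetical portability shim: Rhino's js.jar already supplies a global
// print(), so this definition only takes effect on engines that lack one.
if (typeof print === "undefined") {
  var print = function (outString) {
    console.log(outString); // would be WScript.Echo(outString) under Cscript
  };
}
print("hello from the command line");
```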
&lt;p&gt;If you start up js.jar without providing a script name as an argument, you get a js command line. Enter &lt;code&gt;help()&lt;/code&gt; there to see some interesting commands that you can add to your scripts—for example, &lt;code&gt;readUrl()&lt;/code&gt;. (Note that these commands are case-sensitive.)&lt;/p&gt;
&lt;p&gt;I mostly tested this on a Windows machine, but it all worked fine on a machine running the latest Ubuntu.&lt;/p&gt;
&lt;p&gt;The reason I got interested in this recently was that I had just pulled a ton of menu definition JavaScript off a website, with the majority of it being JSON definitions of the website&amp;rsquo;s menu structure. I wanted to store all these definitions in SKOS RDF. Once I added and redefined a few functions in the JavaScript code that I had downloaded, I ran it all and redirected the output to RDF files pretty easily. I&amp;rsquo;m definitely going to have some more fun with this.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Publishing academic research data</title>
      <link>https://www.bobdc.com/blog/publishing-academic-research-d/</link>
      <pubDate>Mon, 17 Oct 2011 13:54:58 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/publishing-academic-research-d/</guid>
      
      
      <description><div>My geeky perspective and some broader perspectives.</div><div>&lt;p&gt;&lt;a href=&#34;http://opencitations.wordpress.com/2011/10/17/the-five-stars-of-online-journal-articles-3/&#34;&gt;&lt;img id=&#34;id103344&#34; src=&#34;https://www.bobdc.com/img/main/5stars.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;200&#34; alt=&#34;David Shotton&#39;s 5 stars of academic publishing&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Along with Jo Rabin&amp;rsquo;s talk that I mentioned here &lt;a href=&#34;https://www.bobdc.com/blog/displaying-sparql-results-on-a&#34;&gt;earlier this month&lt;/a&gt;, another inspirational talk in the recent &lt;a href=&#34;http://xmlsummerschool.com/&#34;&gt;XML Summer School&lt;/a&gt; &lt;a href=&#34;http://xmlsummerschool.com/curriculum-2011/trends-and-transients-2011/&#34;&gt;Trends and Transients&lt;/a&gt; track was &amp;ldquo;Applying XML and semantic technologies to liberate infectious disease data&amp;rdquo; by Oxford University zoology professor &lt;a href=&#34;http://www.zoo.ox.ac.uk/staff/academics/shotton_dm.htm&#34;&gt;David Shotton&lt;/a&gt;. He described how, while assembling a paper on leptospira infection in urban slums, he used data and metadata from the project to create the version described in a separate paper, &lt;a href=&#34;http://ora.ox.ac.uk/objects/uuid%3A3e39b4ec-8cdd-40d6-8648-a5d7b2946bb9&#34;&gt;Semantically enhanced version of a research article from PLoS Neglected Tropical Diseases&lt;/a&gt;. (Note the bottom of that page, where it lets you pull down bibliographic data in your choice of RDF serializations. Also, don&amp;rsquo;t miss the &lt;a href=&#34;http://imageweb.zoo.ox.ac.uk/pub/2008/plospaper/latest/&#34;&gt;semantically enhanced paper&lt;/a&gt; itself, and make sure to click around in it.)&lt;/p&gt;
&lt;p&gt;After his presentation one audience member asked how an academic department with limited resources and technical background could move in this same direction without attempting to reproduce the full infrastructure, and Professor Shotton suggested that they start by putting their research data on the web along with some metadata about it. This got me thinking about Tim Berners-Lee&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/DesignIssues/LinkedData.html&#34;&gt;Linked Data 5 Stars&lt;/a&gt;, a series of incremental steps toward publishing open linked data in machine-readable standardized formats. I raised my hand and suggested to Shotton that, building on his answer to that question, an alternative version of the five stars for academic researchers could provide a valuable guideline for others interested in following in his footsteps. And he&amp;rsquo;s done it! He just published &lt;a href=&#34;http://opencitations.wordpress.com/2011/10/17/the-five-stars-of-online-journal-articles-3/&#34;&gt;The Five Stars of Online Journal Articles&lt;/a&gt; on his blog, which points to a longer version of the article that he&amp;rsquo;s submitted to &lt;a href=&#34;http://www.nature.com/&#34;&gt;Nature&lt;/a&gt;. My original idea was more of a revision of Berners-Lee&amp;rsquo;s original five stars, but Shotton drew on his extensive academic publishing experience to bring in a lot of bigger-picture issues such as peer review and specific repositories that could host such data.&lt;/p&gt;
&lt;p&gt;I had been thinking about the potential of academic researchers publishing data using Linked Data principles before this year&amp;rsquo;s XML Summer School; one reason I started the &lt;a href=&#34;http://www.meetup.com/cvillesemweb/&#34;&gt;Charlottesville Semantic Web Meetup&lt;/a&gt; was to find people at the University of Virginia who were interested in pursuing this. I recently learned about someone else who&amp;rsquo;s been thinking hard about issues around publication of research data: UCLA&amp;rsquo;s &lt;a href=&#34;http://polaris.gseis.ucla.edu/cborgman/Chriss_Site/Welcome.html&#34;&gt;Christine Borgman&lt;/a&gt;, whose paper &lt;a href=&#34;http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1869155&#34;&gt;The Conundrum of Sharing Research Data&lt;/a&gt; appeared in the June issue of the Journal of the American Society for Information Science and Technology. (Click &amp;ldquo;One-Click Download&amp;rdquo; on that page to retrieve the paper itself.)&lt;/p&gt;
&lt;p&gt;As I realized when I read David Shotton&amp;rsquo;s article, I&amp;rsquo;ve been focused on the technical issues, but there are many others to consider. Here are a few quotes from Borgman&amp;rsquo;s abstract:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This article explores the complexities of data, research practices, innovation, incentives, economics, intellectual property, and public policy associated with the data sharing conundrum.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Rationales for sharing data vary along two dimensions: whether motivated by research concerns or by leveraging public investments, and whether intended to serve the interests of researchers who produce data or the interests of potential re-users of data.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Four rationales for sharing research data are identified and positioned on these dimensions. Researchers’ incentives to share their data depend not only on these rationales, but on characteristics of their data and research practices, funding agency policies, and resources for data management. Much more is understood about why researchers do not share data than about when, why, and how researchers do share data, or about when, how, and why researchers or the public reuse data. The model and research agenda are illustrated with examples from the sciences, social sciences, and humanities.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here&amp;rsquo;s one quote from the main body of the article:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If the rewards of big data are to be reaped, then researchers who produce those data must share them, and do so in such a way that the data are interpretable and reusable by others. Underlying this simple statement are thick layers of complexity about the nature of data, research, innovation, and scholarship, incentives and rewards, economics and intellectual property, and public policy.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Her paper goes on to describe these layers. And, I have to love any academic paper that refers to a &amp;ldquo;dirty little secret.&amp;rdquo; I&amp;rsquo;ll let you find that part yourself. While Borgman&amp;rsquo;s paper doesn&amp;rsquo;t get down to the level of data models and serializations for sharing data, if you&amp;rsquo;re at all interested in how Linked Data may benefit the academic research world, her paper is really worth reading.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/linked-data">linked-data</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Displaying SPARQL results on a mobile phone</title>
      <link>https://www.bobdc.com/blog/displaying-sparql-results-on-a/</link>
      <pubDate>Tue, 04 Oct 2011 09:31:53 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/displaying-sparql-results-on-a/</guid>
      
      
      <description><div>Nicely.</div><div>&lt;blockquote id=&#34;id103334&#34; class=&#34;pullquote&#34;&gt;The ability to create mobile-native web apps with SPARQL and simple XSLT stylesheets should open up a lot of possibilities.&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&#34;http://xmlsummerschool.com/faculty-2011/#rabin&#34;&gt;Jo Rabin&lt;/a&gt;&amp;rsquo;s &amp;ldquo;Mobile is not The Future (It’s Now)&amp;rdquo; presentation in the &lt;a href=&#34;http://xmlsummerschool.com/curriculum-2011/trends-and-transients-2011/&#34;&gt;Trends and Transients&lt;/a&gt; portion of this year&amp;rsquo;s &lt;a href=&#34;http://xmlsummerschool.com/&#34;&gt;XML Summer School&lt;/a&gt; (and the reading he suggested, such as &lt;a href=&#34;http://communities-dominate.blogs.com/brands/2011/09/22-percent-changed-their-mind-while-in-the-store-why-every-retailer-needs-a-mobile-strategy.html&#34;&gt;this Tomi Ahonen blog post&lt;/a&gt;) got me thinking much harder about mobile delivery. One of my first ideas was how easy the &lt;a href=&#34;http://jquerymobile.com/&#34;&gt;jQuery Mobile&lt;/a&gt; Javascript library could make it to deliver SPARQL query results, and in less than 30 minutes I wrote an &lt;a href=&#34;http://snee.com/sparql/xslt/SPARQLMobileResults.xsl&#34;&gt;XSLT stylesheet&lt;/a&gt; that can take the &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-XMLres-20080115/&#34;&gt;SPARQL Query Results XML Format&lt;/a&gt; version of any SPARQL query result and use this library to render the results nicely for mobile phones.&lt;/p&gt;
&lt;p&gt;A SPARQL query that SELECTs more than one variable returns a two-dimensional grid of information, but a more one-dimensional display works better on phones, so the initial display created by my stylesheet is a series of buttons that show the values of the first selected variable. Clicking one displays the values that go with it—the values that would have been the rest of its row in a two-dimensional display. Below, on both an LG Ally running Android and on an iPhone, you can see the stylesheet&amp;rsquo;s rendering of DBpedia&amp;rsquo;s results from a query for the name, artist, release date, and URI of albums produced by Timbaland. Below that you can see the same thing on the Ally after I turned the phone sideways. (Click either image to see a larger version.) You can see the results of the query in your own browser, formatted for mobile, &lt;a href=&#34;http://snee.com/sparql/m/timbaland.html&#34;&gt;here&lt;/a&gt;; for context (and to see the actual query) see the &lt;a href=&#34;http://dbpedia.org/snorql/?query=PREFIX+dbpedia-owl%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E+%0D%0APREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0D%0ASELECT+%3FalbumName+%3FartistName+%3FreleaseDate++%3FalbumURL+WHERE+%0D%0A%7B+%3FalbumURL+dbpedia-owl%3Aproducer+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FTimbaland%3E+%3B++++++++%0D%0A++++++++++++dbpedia-owl%3Aartist+%3Fartist+%3B++++++++%0D%0A++++++++++++dbpedia-owl%3AreleaseDate+%3FreleaseDate+%3B++++++++%0D%0A++++++++++++foaf%3Aname+%3FalbumName+.++%0D%0A++%3Fartist+foaf%3Aname+%3FartistName.++++%0D%0A++FILTER+%28+lang%28%3FartistName%29+%3D+%27en%27+%29++++%0D%0A++FILTER+%28+lang%28%3FalbumName%29+%3D+%27en%27+%29%0D%0A%7D%0D%0AORDER+BY+%3FreleaseDate%0D%0A&#34;&gt;DBpedia default display&lt;/a&gt; of the results.&lt;/p&gt;
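The stylesheet does this grouping in XSLT over the XML results format; as a rough sketch of the same idea in JavaScript, here is how the grouping by first variable might look over the analogous SPARQL JSON results layout. The function name and the sample row below are made up, though the row is shaped like a real binding:

```javascript
// Collapse a two-dimensional SPARQL result set into one entry per row,
// keyed by the first selected variable, with the remaining bindings
// stored as the details to reveal when the row's button is pressed.
function groupByFirstVariable(results) {
  var vars = results.head.vars;
  var first = vars[0];
  return results.results.bindings.map(function (row) {
    var details = {};
    vars.slice(1).forEach(function (v) {
      if (row[v]) { details[v] = row[v].value; }
    });
    return { label: row[first] ? row[first].value : "", details: details };
  });
}

// A made-up row shaped like one binding from the Timbaland query:
var demo = {
  head: { vars: ["albumName", "artistName", "releaseDate"] },
  results: { bindings: [
    { albumName: { type: "literal", value: "Shock Value" },
      artistName: { type: "literal", value: "Timbaland" },
      releaseDate: { type: "literal", value: "2007-04-03" } }
  ] }
};
var grouped = groupByFirstVariable(demo);
// grouped[0].label holds the button text; grouped[0].details holds the rest.
```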
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/AndroidAndIPhone.jpg&#34;&gt;&lt;img id=&#34;id103496&#34; src=&#34;https://www.bobdc.com/img/main/AndroidAndIPhone.jpg&#34; border=&#34;0&#34; alt=&#34;Android LG Ally and iPhone showing SPARQL results&#34; width=&#34;300&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/AndroidHorz.jpg&#34;&gt;&lt;img id=&#34;id103515&#34; src=&#34;https://www.bobdc.com/img/main/AndroidHorz.jpg&#34; border=&#34;0&#34; alt=&#34;Horizontal Android LG Ally&#34; width=&#34;300&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For another demo query, I asked DBpedia for the names, revenue figures, foundation year, and descriptions of CRM vendors. Compare the &lt;a href=&#34;http://snee.com/sparql/m/crm.html&#34;&gt;version formatted for mobiles&lt;/a&gt; with &lt;a href=&#34;http://dbpedia.org/snorql/?query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+dct%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E+%0D%0APREFIX+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0D%0ASELECT+%3Fname+%3Ffounded+%3Frevenue+%3Fdescription+WHERE+%7B%0D%0A++%3Fco+dct%3Asubject+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3ACRM_software_companies%3E+%3B%0D%0A++++++rdfs%3Alabel+%3Fname+%3B%0D%0A++++++dbo%3Arevenue+%3Frevenue+%3B%0D%0A++++++rdfs%3Acomment+%3Fdescription+%3B%0D%0A++++++dbo%3AformationYear+%3Ffounded+.%0D%0A++FILTER+%28+lang%28%3Fname%29+%3D+%27en%27+%29%0D%0A++FILTER+%28+lang%28%3Fdescription%29+%3D+%27en%27+%29%0D%0A%7D%0D%0AORDER+BY+%3Fname%0D%0A%0D%0A%0D%0A&#34;&gt;the default DBpedia display&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A few issues to keep in mind:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The display includes variable names with each value to show what that value represents (for example, albumName and releaseDate in the pictures above), but you could customize the stylesheet to display the text any way you like, especially if you planned on using it with a specific dataset. For example, you could omit the variable names or have your query provide &lt;code&gt;rdfs:label&lt;/code&gt; versions of them to use instead.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Long strings of text with no spaces to wrap, like the album URLs in the Timbaland query results, may not look great, but I included the albumURL one in that query just to make sure that my stylesheet would render them as working hypertext links.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If your first variable represents a resource URI instead of a literal value, it won&amp;rsquo;t be a hypertext link in the displayed page, because pressing the button with each result row&amp;rsquo;s first value expands or contracts the display of the rest of the row&amp;rsquo;s values. It makes more sense to have human-readable values and not URIs on the initial display&amp;rsquo;s buttons anyway.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If your query retrieves a lot of data, the stylesheet creates a big HTML file, and the button response may be slow on your phone, especially if the model is as old as my LG Ally.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;rsquo;ve read a little about jQuery, but I didn&amp;rsquo;t need any of what I learned from that reading to create this stylesheet. If you&amp;rsquo;re happy with the effects of a particular jQuery library, using it may mean no more than creating some simple HTML (typically, some &lt;code&gt;ul&lt;/code&gt;, &lt;code&gt;table&lt;/code&gt;, and &lt;code&gt;div&lt;/code&gt; elements) with specific attributes set for them so that the right jQuery code affects the right elements. To design the pages created by my stylesheet, I just viewed the source and followed the model on the &lt;a href=&#34;http://jquerymobile.com/demos/1.0b3/docs/content/content-collapsible-set.html&#34;&gt;collapsible content&lt;/a&gt; page of the jQuery Mobile site.&lt;/p&gt;
&lt;p&gt;The SPARQL Query Results XML Format is a model of elegant simplicity compared with RDF/XML. (Granted, it has a much simpler job to do.) Writing code to process it in any language is usually easy. If you&amp;rsquo;re new to XSLT, then with some bias I can recommend &lt;a href=&#34;http://www.snee.com/bob/xsltquickly/index.html&#34;&gt;a book on XSLT&lt;/a&gt; that has helped many people I know learn it quickly.&lt;/p&gt;
&lt;p&gt;The ability to create mobile-native web apps with SPARQL and simple XSLT stylesheets should open up a lot of possibilities, because semantic web and linked data application architectures ranging from simple batch files to TopBraid&amp;rsquo;s &lt;a href=&#34;http://topquadrant.com/products/SPARQLMotion.html&#34;&gt;SPARQLMotion&lt;/a&gt; let you hand off XML format SPARQL query results to an XSLT processor. (It should work with the SNORQL interface to Linked Data Cloud datasets such as DBpedia, where the input form lets you specify your own XSLT stylesheet to run, but &lt;a href=&#34;http://sourceforge.net/mailarchive/forum.php?thread_name=4E889FF0.40908%40openlinksw.com&amp;amp;forum_name=dbpedia-discussion&#34;&gt;this feature is currently disabled on the DBpedia Virtuoso instance&lt;/a&gt;. It will be great if they enable it or include a similar stylesheet among the installed choices; meanwhile, you can retrieve the XML results and run the XSLT on your own system.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2011-10-05 update:&lt;/strong&gt; with Kingsley Idehen&amp;rsquo;s help, I now know how to query DBpedia with my own (or any other) XSLT stylesheet. Remove the carriage returns from the following and replace the &amp;amp;query parameter value as described:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org
&amp;amp;query=REPLACE-WITH-ESCAPED-QUERY
&amp;amp;format=application%2Fsparql-results%2Bxml&amp;amp;save=display&amp;amp;fname=
&amp;amp;xslt-uri=http://snee.com/sparql/xslt/SPARQLMobileResults.xsl
&lt;/code&gt;&lt;/pre&gt;
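A short JavaScript sketch of assembling a URL on that pattern, letting encodeURIComponent() handle the percent-escaping; the query itself is an arbitrary example:

```javascript
// Build the DBpedia request URL from the template above; each value
// that needs escaping goes through encodeURIComponent().
var query = "SELECT ?name WHERE { ?s rdfs:label ?name } LIMIT 10";
var url = "http://dbpedia.org/sparql" +
  "?default-graph-uri=" + encodeURIComponent("http://dbpedia.org") +
  "&query=" + encodeURIComponent(query) +
  "&format=" + encodeURIComponent("application/sparql-results+xml") +
  "&save=display&fname=" +
  "&xslt-uri=http://snee.com/sparql/xslt/SPARQLMobileResults.xsl";
```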
&lt;p&gt;For example, &lt;a href=&#34;http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&amp;amp;query=PREFIX%20dbo%3A%20%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E%0ASELECT%20%3Fname%20%3Faliases%20%3Fborn%20%3Fdied%20WHERE%20%7B%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FThe_Beatles%3E%20dbo%3AbandMember%20%3FbeatleURL%20.%0A%3FbeatleURL%20%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2FalternativeNames%3E%20%3Faliases%20%3B%0Ardfs%3Alabel%20%3Fname%20%3B%0Adbo%3AbirthDate%20%3Fborn%20.%0AOPTIONAL%20%7B%3FbeatleURL%20dbo%3AdeathDate%20%3Fdied%20.%20%7D%0AFILTER%20(%20lang(%3Fname)%20%3D%20%22en%22%20)%0A%7D%0A&amp;amp;format=application%2Fsparql-results%2Bxml&amp;amp;save=display&amp;amp;fname=&amp;amp;xslt-uri=http://snee.com/sparql/xslt/SPARQLMobileResults.xsl&#34;&gt;this query&lt;/a&gt; asks DBpedia for the Beatles&amp;rsquo; names, aliases, birth dates, and death dates, and formats the results with the stylesheet described above.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note on comments&lt;/strong&gt;: after turning off comments on this blog for a few days because of comment spam, turning them back on seems to have no effect. So, inspired by &lt;a href=&#34;http://www.jenitennison.com/blog/&#34;&gt;Jeni Tennison&lt;/a&gt;, I&amp;rsquo;ll ask you to add any comments to &lt;a href=&#34;https://plus.google.com/101006505484718936507/posts/DDX3fjABLSf&#34;&gt;this Google+ post&lt;/a&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/xslt">XSLT</category>
      
    </item>
    
    <item>
      <title>RDFa can be so simple</title>
      <link>https://www.bobdc.com/blog/rdfa-can-be-so-simple/</link>
      <pubDate>Tue, 16 Aug 2011 08:19:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdfa-can-be-so-simple/</guid>
      
      
      <description><div>Despite claims to the contrary.</div><div>&lt;blockquote id=&#34;id103350&#34; class=&#34;pullquote&#34;&gt;You can write simple, parsable RDFa with very little syntax and trouble. Really.&lt;/blockquote&gt;
&lt;p&gt;I got so tired of hearing people complain about how confusing RDFa is that while I was on hold during a recent phone call I threw together a &lt;a href=&#34;http://rdfdata.org/dat/rdfademo.html&#34;&gt;demo&lt;/a&gt; of just how simple it can be. The document has the two basic kinds of triples: one with a literal for an object, with data typing thrown in for good measure, and one with a resource URI as its object. A View Source of that document will show this in its &lt;code&gt;head&lt;/code&gt; element (namespaces are declared earlier):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    &amp;lt;meta about=&amp;quot;http://www.snee.com/bob/foaf.rdf#bob&amp;quot;
          property=&amp;quot;foaf:givenName&amp;quot;
          content=&amp;quot;Bob&amp;quot;
          datatype=&amp;quot;xsd:string&amp;quot;/&amp;gt;


    &amp;lt;meta about=&amp;quot;http://www.snee.com/bob/foaf.rdf#bob&amp;quot;
          rel=&amp;quot;foaf:homePage&amp;quot;
          href=&amp;quot;http://www.snee.com/bob&amp;quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;a href=&#34;http://www.w3.org/2007/08/pyRdfa/extract?uri=http%3A%2F%2Frdfdata.org%2Fdat%2Frdfademo.html&#34;&gt;This link&lt;/a&gt; will show you the triples as extracted by the W3C&amp;rsquo;s RDFa Distiller and Parser service.&lt;/p&gt;
&lt;p&gt;My little demo doesn&amp;rsquo;t take into account all the swirling attempts to innovate, accommodate, and disassociate various ideas about embedding machine-readable markup that are currently out there (if you want to stay on top of this, read &lt;a href=&#34;http://www.jenitennison.com/blog/&#34;&gt;Jeni Tennison&amp;rsquo;s blog&lt;/a&gt;), but it highlights a principle that is probably older than FORTRAN: parsing data in a particular syntax can be a big job, because the parser must understand the full language, but writing data in a particular language can be simple because you can pick the subset that you prefer to work with.&lt;/p&gt;
&lt;p&gt;RDFa gives you many more options for embedding triples—especially if you want to embed metadata about content that is already part of an HTML page, which seems to be a key original use case, or about the page itself—but you can write simple, parsable RDFa with very little syntax and trouble. Really.&lt;/p&gt;
&lt;p&gt;(&lt;strong&gt;Note on comments&lt;/strong&gt;: after turning off comments on this blog for a few days because of comment spam, turning them back on seems to have no effect. If you send me an email about what I&amp;rsquo;ve written at snee.com (bob), I&amp;rsquo;ll add it and any response here.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>&#34;Learning SPARQL&#34; now available</title>
      <link>https://www.bobdc.com/blog/learning-sparql-now-available/</link>
      <pubDate>Wed, 27 Jul 2011 08:12:53 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/learning-sparql-now-available/</guid>
      
      
      <description><div>In print and ebook formats.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.learningsparql.com&#34;&gt;&lt;img id=&#34;id103338&#34; src=&#34;http://www.learningsparql.com/img/cover.jpg&#34; width=&#34;200&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Learning SPARQL cover&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m very happy to announce that the ebook and print editions of &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;Learning SPARQL&lt;/a&gt; are now &lt;a href=&#34;http://oreilly.com/catalog/0636920020547/&#34;&gt;available from O&amp;rsquo;Reilly&lt;/a&gt;. Print editions are also available from &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1449306594/bobducharmeA/&#34;&gt;amazon.com&lt;/a&gt;, &lt;a href=&#34;http://www.amazon.co.uk/Learning-SPARQL-Bob-DuCharme/dp/1449306594&#34;&gt;amazon.co.uk&lt;/a&gt;, maybe some more Amazons, and &lt;a href=&#34;http://www.barnesandnoble.com/w/learning-sparql-bob-ducharme/1103138225&#34;&gt;Barnes and Noble&lt;/a&gt;. (&lt;a href=&#34;http://www.borders.com/online/store/TitleDetail?sku=1449306594&#34;&gt;Borders&lt;/a&gt; says that it&amp;rsquo;s on backorder, but I wouldn&amp;rsquo;t hold your breath for that.) You can read more about how I came to write the book in an &lt;a href=&#34;https://www.bobdc.com/blog/my-upcoming-oreilly-book-learn&#34;&gt;earlier blog posting&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Right now it&amp;rsquo;s the only complete book on the W3C standard query language for linked data and the semantic web, and as far as I know the only book at all that covers the full range of SPARQL 1.1 features such as the ability to update data. The book steps you through simple examples that can all be performed with free software, and all sample queries, data, and output are available on the book&amp;rsquo;s website. In the words of &lt;a href=&#34;http://datypic.com/&#34;&gt;Priscilla Walmsley&lt;/a&gt;, &amp;ldquo;It&amp;rsquo;s excellent—very well organized and written, a completely painless read. I not only feel like I understand SPARQL now, but I have a much better idea why RDF is useful (I was a little skeptical before!)&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I will continue to post news about the book and about SPARQL on the book&amp;rsquo;s twitter account at &lt;a href=&#34;http://twitter.com/#!/learningsparql&#34;&gt;@LearningSPARQL&lt;/a&gt;. I&amp;rsquo;m not starting a separate blog for the book, so I will continue to blog about SPARQL &lt;a href=&#34;http://www.snee.com/bobdc.blog/metadata/rdf/sparql/&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Linking linked data to U.S. law</title>
      <link>https://www.bobdc.com/blog/linking-linked-data-to-us-law/</link>
      <pubDate>Fri, 08 Jul 2011 08:29:08 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/linking-linked-data-to-us-law/</guid>
      
      
      <description><div>Automating conversion of citations into URLs.</div><div>&lt;p&gt;At a recent &lt;a href=&#34;http://www.w3.org/2011/gld/wiki/F2F1&#34;&gt;W3C Government Linked Data Working Group meeting&lt;/a&gt;, I started thinking more about the role in linked data of laws that are published online. To summarize, you don&amp;rsquo;t want to publish the laws themselves as triples, because they&amp;rsquo;re a bad fit for the triples data model, but as online resources relevant to a lot of issues out there, they make an excellent set of resources to point to, although you may not always get the granularity you want.&lt;/p&gt;
&lt;blockquote id=&#34;id103354&#34; class=&#34;pullquote&#34;&gt;Plenty of government data references laws and related materials.&lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;m discussing U.S. Federal law here, but similar principles should apply both in individual states and in other countries. The main sets of laws here are legislation, code, regulations, and court decisions. (&amp;ldquo;Code&amp;rdquo; refers to laws passed by legislatures, arranged by topic; for example, laws passed about taxes are gathered into the Internal Revenue Code.) If you really want to learn about the various forms of legal material and their relationship, I highly recommend the book &lt;a href=&#34;http://www.amazon.com/Finding-Law-12th-American-Casebooks/dp/0314145796/bobducharmeA/&#34;&gt;Finding the Law&lt;/a&gt;, which I found indispensable when I worked at LexisNexis.&lt;/p&gt;
&lt;p&gt;Most law consists of narrative sentences arranged as paragraphs, often with metadata assigned to certain blocks of it. It&amp;rsquo;s such a good fit for XML that legal publishers were among the first users of XML&amp;rsquo;s predecessor, SGML. (Their use of XML and SGML accounts for a large chunk of my career, and I know that some old XML friends like &lt;a href=&#34;http://seanmcgrath.blogspot.com/&#34;&gt;Sean McGrath&lt;/a&gt; and Dale Waldt continue to make great contributions in this area.) So, while you wouldn&amp;rsquo;t get much benefit splitting these sentences and paragraphs into subjects, predicates, and objects and publishing them as triples, plenty of government data references laws and related materials, and it&amp;rsquo;s more helpful if they can reference them with URLs that lead to the actual laws. To add these URLs with any kind of scalability, you need to find out the common format for citing a document (or, if possible, a point within a document) and an online source of those legal documents whose URLs can be built from that citation format with a regular expression or some other automated tool.&lt;/p&gt;
&lt;p&gt;When creating links to any specific bits of U.S. law, the most valuable book is &lt;a href=&#34;http://www.amazon.com/Bluebook-Uniform-System-Citation/dp/0615361161/bobducharmeA&#34;&gt;The Bluebook: A Uniform System of Citation&lt;/a&gt;. As the subtitle implies, the book describes the normalized way to refer to legal documents and their components. Once you know these, a regular expression can often turn them into a URL that leads a browser right to the part you want. For example, while people often refer to the Supreme Court case outlawing school segregation as &amp;ldquo;Brown v. Board of Education&amp;rdquo;, its official citation is &amp;ldquo;347 U.S. 483&amp;rdquo;, which means &amp;ldquo;the case beginning on page 483 of volume 347 of the official publication of U.S. Supreme Court decisions&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;While there are several sites hosting Supreme Court decisions out there, notably Cornell Law School&amp;rsquo;s &lt;a href=&#34;http://www.law.cornell.edu/supct/&#34;&gt;Legal Information Institute&lt;/a&gt;, the one whose URLs are easiest to construct from a proper Supreme Court citation is justia.com, where the URL for Brown v. Board of Education is &lt;a href=&#34;http://supreme.justia.com/us/347/483/case.html&#34;&gt;http://supreme.justia.com/us/347/483/case.html&lt;/a&gt;. (See also my favorite case, Campbell aka Skyywalker et al v. Acuff Rose Music, Inc. at &lt;a href=&#34;http://supreme.justia.com/us/510/569/case.html&#34;&gt;http://supreme.justia.com/us/510/569/case.html&lt;/a&gt;. Make sure to listen to the relevant work &lt;a href=&#34;http://www.youtube.com/watch?v=65GQ70Rf_8Y&#34;&gt;on YouTube&lt;/a&gt; while you review it.) If you&amp;rsquo;re really interested in linked data and U.S. Supreme Court cases, DBpedia has lots of great metadata for many important cases, as I wrote about in &lt;a href=&#34;https://www.bobdc.com/blog/court-decision-metadata-and-db&#34;&gt;Court decision metadata and DBpedia&lt;/a&gt;.&lt;/p&gt;
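The volume/page pattern makes the conversion mechanical. A rough sketch of the idea (the regular expression and function name are my own guesses at the Bluebook "volume U.S. page" form, not anyone's production code):

```python
import re

def scotus_url(citation):
    """Turn a Bluebook-style Supreme Court citation ("347 U.S. 483")
    into the corresponding justia.com case URL, or None if no match."""
    m = re.search(r"(\d+)\s+U\.S\.\s+(\d+)", citation)
    if not m:
        return None
    volume, page = m.groups()
    return f"http://supreme.justia.com/us/{volume}/{page}/case.html"

print(scotus_url("347 U.S. 483"))
# http://supreme.justia.com/us/347/483/case.html
```

Because it uses a search rather than a full match, the same sketch also picks the citation out of running text like "Brown v. Board of Education, 347 U.S. 483 (1954)".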
&lt;p&gt;To create a URL for other U.S. court systems, you&amp;rsquo;ll have to look up the proper way to cite them in a resource like the Bluebook and then look for versions of that court&amp;rsquo;s cases online with URLs that reflect the citation in a manner that lets you automate the creation of the URL. This is a theme for linking to any kind of law on the web, and you can be sure that developers at the Legal Information Institute, LexisNexis, WestLaw, and other legal publishers have put plenty of time into developing regular expressions to make this happen so that they can turn plain text citations into hypertext links. (It would be great if the LII made their regular expressions public. LexisNexis and WestLaw never would, although they&amp;rsquo;re more interested in keeping such proprietary work away from each other than from us.)&lt;/p&gt;
&lt;p&gt;Legislation can be more complicated, but two excellent resources make it remarkably simple: the Library of Congress&amp;rsquo;s &lt;a href=&#34;http://thomas.loc.gov/home/thomas.php&#34;&gt;THOMAS&lt;/a&gt; system lets you create persistent URLs for legislation using the &lt;a href=&#34;http://thomas.loc.gov/home/handles/help.html&#34;&gt;handle system&lt;/a&gt; (see also &lt;a href=&#34;http://www.handle.net/factsheet.html&#34;&gt;its inventor&amp;rsquo;s web page on it&lt;/a&gt;), which I hadn&amp;rsquo;t heard of before the Government Linked Data meeting. The Law Librarian Blog has a &lt;a href=&#34;http://lawprofessors.typepad.com/law_librarian_blog/2008/10/lc-thomas-imple.html&#34;&gt;nice entry&lt;/a&gt; showing examples of how to use it. &lt;a href=&#34;http://legislink.org/&#34;&gt;LegisLink&lt;/a&gt; is another way to link to legislation, and looks simpler to me. A Legal Information Institute &lt;a href=&#34;http://blog.law.cornell.edu/voxpop/tag/persistent-urls-for-legal-information/&#34;&gt;blog entry&lt;/a&gt; has a good explanation of this, and LegisLink provides an excellent &lt;a href=&#34;http://legislink.org/us&#34;&gt;form&lt;/a&gt; to construct the URLs. These even let you construct links to a specific section of a piece of legislation.&lt;/p&gt;
&lt;p&gt;Granularity is an even bigger issue when linking to code and regulations, which are often broken down into numbered and lettered pieces of pieces of pieces. Ever since I worked at the grandly named &lt;a href=&#34;http://ria.thomsonreuters.com/&#34;&gt;Research Institute of America&lt;/a&gt; (a publisher of hyperlinked U.S. tax law and related information), it&amp;rsquo;s always irked me to see people refer to a pension plan as a 401K, because as subsection k of section 401 of the U.S. Tax Code (title 26 of the U.S. Code), it&amp;rsquo;s more properly written 401(k), or, to use its full name, 26 USC 401(k). The Government Printing Office lets you link directly to section 401, if not subsection k, with the URL &lt;a href=&#34;http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=browse_usc&amp;amp;docid=Cite:+26USC401&#34;&gt;http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=browse_usc&amp;amp;docid=Cite:+26USC401&lt;/a&gt;, and the LII lets you link to it with &lt;a href=&#34;http://www.law.cornell.edu/uscode/26/usc_sec_26_00000401----000-.html&#34;&gt;http://www.law.cornell.edu/uscode/26/usc_sec_26_00000401----000-.html&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s the US Code, which arranges the laws by topic. Regulations are arranged by topic in the CFR, or Code of Federal Regulations. For example, the legal definition of bourbon is in title 27 of the CFR (Alcohol, Tobacco Products and Firearms), Part 5 (Labeling and Advertising of Distilled Spirits), section 22 (The standards of identity), subsection b (Class 2; whisky) subsubsubsection (1)(i). The full citation would be 27 CFR 5.22(b)(1)(i), but I know of no way to link to anything more specific than 27 CFR 5.22: &lt;a href=&#34;http://edocket.access.gpo.gov/cfr_2010/aprqtr/27cfr5.22.htm&#34;&gt;http://edocket.access.gpo.gov/cfr_2010/aprqtr/27cfr5.22.htm&lt;/a&gt;. (Bookmark that on your phone&amp;rsquo;s browser and then bet a Maker&amp;rsquo;s Mark with the next barroom loudmouth that you hear insisting that bourbon must legally be made in Bourbon County, Kentucky. He&amp;rsquo;s wrong. It can be made anywhere in the United States.)&lt;/p&gt;
&lt;p&gt;As you can see, there&amp;rsquo;s some work involved in creating URLs for links to laws, but research for this blog entry led me to new resources like LegisLink that I hadn&amp;rsquo;t heard of before, so I encourage you to let me know if there&amp;rsquo;s anything important that I&amp;rsquo;m missing.&lt;/p&gt;
&lt;p&gt;It was also interesting to see that the LII is involved in &lt;a href=&#34;http://topics.law.cornell.edu/wiki/lexcraft/urn_lex&#34;&gt;efforts&lt;/a&gt; to create an international standard for legal document URIs proposed by some Italian legal researchers. (This is particularly interesting when you consider that Italian legal researchers basically &lt;a href=&#34;http://www.oreillynet.com/xml/blog/2003/05/when_did_linking_begin.html&#34;&gt;invented the concept of linking&lt;/a&gt; 900 years ago.)&lt;/p&gt;
&lt;p&gt;A comment from Frank Bennett of Nagoya University&amp;rsquo;s Faculty of Law:&lt;/p&gt;
&lt;p&gt;These are indeed important developments. The systematic linking of case law and statutory data promise to have a large and positive impact on our access to legal resources. The only point I would take issue with is the reliance on Bluebook citation forms as the rosetta stone for identifying resources. Parsing cites out of plain text is a necessary kludge, given the general absence of meaningful structured metadata from online legal resources (thank you Lexis, thank you WestLaw), but it should be recognized as a kludge.&lt;/p&gt;
&lt;p&gt;To get a lively set of service layers running on top of legal data, the metadata contained in or relevant to a particular case, statutory provision or regulatory provision needs to be readily accessible to calling applications. While it is true that string parsing machinery can be written to a good standard, assuming perfectly regular citation forms and uniform document formats, neither of those constraints applies in the wild. The Bluebook shares the field in North America with the ALWD and the McGill Guide. To make matters worse, the Bluebook specifies citation forms for some foreign legal resources that vary significantly from the native citation forms of the target jurisdictions. Document formats vary as well, so getting an accurate string parse may require special-purpose serialization of the document before applying a string parser to the text &amp;ndash; which may be hundreds of pages in length. Although certainly better than nothing, string parsing is a fragile strategy that would be very cumbersome to standardize and does not scale well.&lt;/p&gt;
&lt;p&gt;Matching rendered cites to URLs is an important prospect, but we won&amp;rsquo;t see significant progress at the application level until the intervening step of producing true structured metadata &amp;ndash; and embedding it in our online resources &amp;ndash; is covered.&lt;/p&gt;
&lt;p&gt;A comment from Augusto Herrmann:&lt;/p&gt;
&lt;p&gt;I just read your interesting article entitled &amp;ldquo;Linking linked data to U.S. law&amp;rdquo;. I&amp;rsquo;d like to point you to a quite successful government project that uses URN for Brazilian legislation. The portal where you can search for legislation is at &lt;a href=&#34;http://www.lexml.gov.br&#34;&gt;http://www.lexml.gov.br&lt;/a&gt; and information about the project can be found on &lt;a href=&#34;http://projeto.lexml.gov.br&#34;&gt;http://projeto.lexml.gov.br&lt;/a&gt; . There you can find the document &lt;a href=&#34;http://projeto.lexml.gov.br/documentacao/Parte-2-LexML-URN.pdf&#34;&gt;&amp;ldquo;Parte 2: LEXML URN&amp;rdquo;&lt;/a&gt; which describes the rules to construct official URN for legislation and court decisions (it&amp;rsquo;s in Portuguese, though). The project started circa 2004 and closely followed the footsteps of the Italian Norme in Rete project. If you aren&amp;rsquo;t yet familiar with it, it&amp;rsquo;s worth a look (see also akomantoso.org and metalex.eu).&lt;/p&gt;
&lt;p&gt;(&lt;strong&gt;Note on comments:&lt;/strong&gt; after turning off comments on this blog for a few days because of comment spam, turning them back on seems to have no effect. If you send me an email about what I&amp;rsquo;ve written at snee.com (bob), I&amp;rsquo;ll add it and any response here.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/legal-publishing">legal publishing</category>
      
    </item>
    
    <item>
      <title>My upcoming O&#39;Reilly book: &#34;Learning SPARQL&#34;</title>
      <link>https://www.bobdc.com/blog/my-upcoming-oreilly-book-learn/</link>
      <pubDate>Wed, 01 Jun 2011 10:07:13 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/my-upcoming-oreilly-book-learn/</guid>
      
      
      <description><div>Querying and Updating with SPARQL 1.1.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.learningsparql.com&#34;&gt;&lt;img id=&#34;id103339&#34; src=&#34;http://www.learningsparql.com/img/cover.jpg&#34; width=&#34;200&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Learning SPARQL cover&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;51 weeks ago at &lt;a href=&#34;http://semtech2010.semanticuniverse.com/&#34;&gt;last year&amp;rsquo;s semtech&lt;/a&gt; I couldn&amp;rsquo;t believe that there was still no book about SPARQL available. I had accumulated notes for such a book, and by that point I&amp;rsquo;d learned enough about SPARQL as a TopQuadrant employee that I decided to start studying the specifications (and especially the 1.1 update) more systematically and write the book myself. (This explains why I&amp;rsquo;ve been writing less on my blog in the last year and &lt;a href=&#34;http://www.snee.com/bobdc.blog/metadata/rdf/sparql/&#34;&gt;writing about SPARQL&lt;/a&gt; more when I do.)&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m proud to announce that I&amp;rsquo;m publishing the book with O&amp;rsquo;Reilly. Print and electronic versions will be available in July at the latest, and we&amp;rsquo;re already planning on releasing an expanded edition with additional new material and any necessary updates once SPARQL 1.1 becomes a Recommendation. Anyone who buys the ebook version of the first edition will get the expanded edition on SPARQL 1.1 at no extra cost.&lt;/p&gt;
&lt;p&gt;As you can tell from the book&amp;rsquo;s cover on the right, the O&amp;rsquo;Reilly animal for this one is the anglerfish—the one with the light that hangs off the front of its head, for the pun on &amp;ldquo;sparkle&amp;rdquo;. (I should really pick up the &lt;a href=&#34;http://www.neatoshop.com/product/Deep-Sea-Anglerfish-LED-Light?tag=2302&#34;&gt;nightlight version&lt;/a&gt; of this lovely fish.)&lt;/p&gt;
&lt;p&gt;From what I&amp;rsquo;ve seen so far, the only coverage of SPARQL in any existing books is a chapter or two in more general books on the semantic web, and I haven&amp;rsquo;t seen any coverage of SPARQL 1.1 in those books just yet. (The second edition of Dean Allemang and Jim Hendler&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0123859654/bobducharmeA/&#34;&gt;Semantic Web for the Working Ontologist&lt;/a&gt;, which is available on Amazon today, covers some SPARQL 1.1 query features, but not SPARQL Update.) &amp;ldquo;Learning SPARQL&amp;rdquo; is the first complete book on SPARQL, and covers both 1.0 and 1.1—including &lt;a href=&#34;http://www.w3.org/TR/sparql11-update/&#34;&gt;SPARQL Update&lt;/a&gt;—with working sample queries and data that you can try yourself with free software.&lt;/p&gt;
&lt;p&gt;I parked the domain name &lt;a href=&#34;http://www.learningsparql.com&#34;&gt;learningsparql.com&lt;/a&gt; some time ago, and now there&amp;rsquo;s a full web site about the book there. For up-to-date information about the book&amp;rsquo;s availability and SPARQL news in general, subscribe to the twitter feed &lt;a href=&#34;http://twitter.com/#!/LearningSPARQL&#34;&gt;@LearningSPARQL&lt;/a&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Semantic web technology at NASA: lower costs and greater productivity</title>
      <link>https://www.bobdc.com/blog/semantic-web-at-nasa-lower-cos/</link>
      <pubDate>Fri, 27 May 2011 17:54:52 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-at-nasa-lower-cos/</guid>
      
      
      <description><div>An inspiring story.</div><div>&lt;p&gt;Ian Jacobs&amp;rsquo;s recent &lt;a href=&#34;http://www.w3.org/QA/2011/05/semantic_web_its_not_rocket_sc.html&#34;&gt;interview with NASA&amp;rsquo;s Jeanne Holm&lt;/a&gt; on the W3C website is an excellent case study of semantic web technology. It&amp;rsquo;s not a long article, so I recommend that you read the whole thing. Here are a few points that caught my eye:&lt;/p&gt;
&lt;img id=&#34;id103366&#34; src=&#34;http://humbabe.arc.nasa.gov/MarsDustWorkshop/NASA_Logo.gif&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;NASA logo&#34; width=&#34;140&#34;/&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;She gives nice hard numbers about money spent and money saved, and notes a downward trend in the costs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;They used publication data to infer social networks and shared expertise and found other related ways to reduce the need for staff data entry.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The use of service agreements encouraged people to share data more easily.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This sharing led to demonstrated serendipitous reuse of data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;They plan to network the vocabularies (she doesn&amp;rsquo;t use this term literally—I know it from a &lt;a href=&#34;http://www.topquadrant.com/solutions/ent_vocab_net.html&#34;&gt;TopQuadrant context&lt;/a&gt;—but she&amp;rsquo;s clearly talking about the same thing).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It was nice to see the credit that she gave to Kendall Clark. With my TopQuadrant hat on, I wish she&amp;rsquo;d mentioned some of the &lt;a href=&#34;http://www.scribd.com/doc/25387652/NASA-Constellation-Program-Ontologies-Ralph-Hodgson-20080320&#34;&gt;extensive work&lt;/a&gt; that Ralph Hodgson has done there, but NASA is a big organization.&lt;/p&gt;
&lt;p&gt;After reading Danny Ayers&amp;rsquo; &lt;a href=&#34;http://dannyayers.com/2011/05/27/Smell-the-coffee&#34;&gt;Smell the coffee&lt;/a&gt; blog post this morning, which wasn&amp;rsquo;t very hopeful about recent progress in the semantic web, I &lt;a href=&#34;http://twitter.com/#!/bobdc/status/74163734284742656&#34;&gt;hoped that&lt;/a&gt; Ian&amp;rsquo;s interview with Jeanne would cheer him up.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Using SPARQL to find the right DBpedia URI</title>
      <link>https://www.bobdc.com/blog/using-sparql-to-find-the-right/</link>
      <pubDate>Tue, 17 May 2011 08:40:52 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/using-sparql-to-find-the-right/</guid>
      
      
      <description><div>Even with the wrong name.</div><div>&lt;img id=&#34;id103352&#34; src=&#34;https://www.bobdc.com/img/main/BobMarly.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Bob Marley&#34;/&gt;
&lt;p&gt;In &lt;a href=&#34;https://www.bobdc.com/blog/pulling-skos-preflabel-and-alt&#34;&gt;Pulling SKOS prefLabel and altLabel values out of DBpedia&lt;/a&gt;, I described how Wikipedia and DBpedia store useful data about alternative names for resources described on Wikipedia, and I showed how you can use these to populate a SKOS dataset&amp;rsquo;s alternative and preferred label properties. Today I want to show how to use these as part of an application that lets you retrieve data even when you don&amp;rsquo;t necessarily have the right name for something—for example, retrieving a picture of Bob Marley using the misspelled version of his name &amp;ldquo;Bob Marly&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://dbpedia.org/page/Bob_Marley&#34;&gt;DBpedia page for Bob Marley&lt;/a&gt; shows that dbpedia:Bob_Marly is one of the dbpedia-owl:wikiPageRedirects values of &lt;a href=&#34;http://dbpedia.org/page/Bob_Marley&#34;&gt;http://dbpedia.org/page/Bob_Marley&lt;/a&gt;. This means that if you send your browser to &lt;a href=&#34;http://en.wikipedia.org/wiki/Bob_Marly&#34;&gt;http://en.wikipedia.org/wiki/Bob_Marly&lt;/a&gt;, you&amp;rsquo;ll end up on &lt;a href=&#34;http://en.wikipedia.org/wiki/Bob_Marley&#34;&gt;http://en.wikipedia.org/wiki/Bob_Marley&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It doesn&amp;rsquo;t show that this redirect URI has the rdfs:label value &amp;ldquo;Bob Marly&amp;rdquo;@en associated with it, and this is the really handy part for retrieving data based on not-quite-right values. Because of this, the following SPARQL query will return the URI &lt;a href=&#34;http://dbpedia.org/resource/Bob_Marley&#34;&gt;http://dbpedia.org/resource/Bob_Marley&lt;/a&gt; whether the quoted literal value is &amp;ldquo;Bob Marly&amp;rdquo; or &amp;ldquo;Bob Marley&amp;rdquo;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# First two PREFIX declarations unnecessary on SNORQL
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX owl: &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt;
PREFIX dbo: &amp;lt;http://dbpedia.org/ontology/&amp;gt;


SELECT ?s WHERE {
  {
    ?s rdfs:label &amp;quot;Bob Marly&amp;quot;@en ;
       a owl:Thing .       
  }
  UNION
  {
    ?altName rdfs:label &amp;quot;Bob Marly&amp;quot;@en ;
             dbo:wikiPageRedirects ?s .
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The graph pattern before the UNION keyword checks whether there is an actual Wikipedia page for the quoted value, and the part after checks whether it&amp;rsquo;s a redirect of something else. Effectively, it will be one or the other; there are only about a dozen labels in DBpedia that can be both.&lt;/p&gt;
&lt;p&gt;To use this in a simple application, I created a &lt;a href=&#34;http://www.snee.com/sparqlforms/findWikipediaImage.html&#34;&gt;form&lt;/a&gt; that, after you enter a name on it, attempts to display a picture of what you entered. Because the redirect data includes common misspellings as well as nicknames, entering &amp;ldquo;Bob Marly&amp;rdquo; will get you a picture of Marley and the URL of the actual resource, as shown beneath the picture above. Other interesting nicknames and misspellings to try are Bob Dillan, Mary Casat, Prince Billy, Big Blue, and Proctor and Gamble. (Warning: DBpedia image data is incorrect for some very well-known people, like Abraham Lincoln and Barack Obama, even when the Wikipedia page has a picture, so you may see the symbol for a broken image link. I had hoped to give the picture above a title of &amp;ldquo;&lt;a href=&#34;http://en.wikipedia.org/wiki/Abe_Lincon&#34;&gt;Abe Lincon&lt;/a&gt;&amp;rdquo;.)&lt;/p&gt;
&lt;p&gt;Because the output creates a specialized web page, I used the technique I described in &lt;a href=&#34;http://www.ibm.com/developerworks/xml/library/x-wikiquery/&#34;&gt;Build Wikipedia query forms with semantic technology&lt;/a&gt; (which can be used with any SPARQL endpoint, not just DBpedia): a CGI Python script stores a SPARQL query, replaces a string in that query with whatever was entered in the form, sends the query off to the endpoint, and then sends HTML based on the result back to the browser. You can see the source &lt;a href=&#34;http://www.snee.com/sparqlforms/findWikipediaImage.txt&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
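The substitute-and-send part of that pattern is easy to sketch on its own, without the CGI wrapper. The placeholder token, function name, and JSON output format below are my own choices for illustration, not necessarily what the linked script uses:

```python
from urllib.parse import urlencode

# Stored query with a placeholder; the rdfs/owl/dbo prefixes are assumed
# to be predeclared by the endpoint.
QUERY_TEMPLATE = """SELECT ?s WHERE {
  { ?s rdfs:label "NAME_GOES_HERE"@en ; a owl:Thing . }
  UNION
  { ?alt rdfs:label "NAME_GOES_HERE"@en ;
         dbo:wikiPageRedirects ?s . }
}"""

def build_request_url(name, endpoint="http://dbpedia.org/sparql"):
    """Substitute the form input into the stored query and build the
    GET URL to send to the SPARQL endpoint."""
    query = QUERY_TEMPLATE.replace("NAME_GOES_HERE", name)
    return endpoint + "?" + urlencode({"query": query, "format": "json"})

url = build_request_url("Bob Marly")
```

Fetching that URL (for example with urllib) returns the query results; the CGI script then wraps whatever comes back in HTML for the browser.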
&lt;p&gt;It&amp;rsquo;s safe to say that this ability to find the right information based on a nickname or common misspelling could add a lot to many applications. Once again, while the most important part of the semantic web is the data—in this case, DBpedia&amp;rsquo;s &lt;a href=&#34;http://dbpedia.org/ontology/wikiPageRedirects&#34;&gt;wikiPageRedirects&lt;/a&gt; values—and not the standards and technologies used to get at the data, the existence of so much useful SPARQL-accessible data should make the SPARQL query language look more and more appealing to people who might have doubted before.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/dbpedia">DBpedia</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>SKOS overview article on IBM developerWorks</title>
      <link>https://www.bobdc.com/blog/skos-overview-article-on-ibm-d/</link>
      <pubDate>Wed, 11 May 2011 10:04:44 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/skos-overview-article-on-ibm-d/</guid>
      
      
      <description><div>SKOS, vocabulary management, the semantic web, and more</div><div>&lt;p&gt;&lt;a href=&#34;http://www.ibm.com/developerworks/xml/library/x-skostaxonomy/index.html&#34;&gt;&lt;img id=&#34;id103341&#34; src=&#34;http://www.ibm.com/developerworks/i/dwwordmark.gif&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;developerWorks logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;2021-09-30: The article referenced below has since been taken off of the IBM developerWorks site, so I republished it &lt;a href=&#34;../skosibm&#34;&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been interested in the SKOS standard for vocabulary management for several years (and written about it &lt;a href=&#34;http://www.snee.com/bobdc.blog/metadata/rdf/skos/&#34;&gt;here&lt;/a&gt; several times), but since we at TopQuadrant first began planning out the &lt;a href=&#34;http://www.topquadrant.com/solutions/ent_vocab_net.html&#34;&gt;Enterprise Vocabulary Net&lt;/a&gt; product, I&amp;rsquo;ve learned a lot more about the theory and practice of using SKOS. I&amp;rsquo;ve recently written up an overview of SKOS and where it fits into vocabulary management and the semantic web, and IBM developerWorks has just published this as &lt;a href=&#34;http://www.ibm.com/developerworks/xml/library/x-skostaxonomy/index.html&#34;&gt;Improve your taxonomy management using the W3C SKOS standard&lt;/a&gt;. I hope it proves useful to people who want to learn more about SKOS.&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-2858&#34;&gt;May 11, 2011 1:50 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Mutt&amp;rdquo; has a specific meaning, so it&amp;rsquo;s a bad example. Lassie is a Rough Collie, Old Yeller is a mutt (a Labrador Retriever / Mastiff cross). &amp;ldquo;Mutt&amp;rdquo; should be an en-US alternative label for the concept whose preferred labels are &amp;ldquo;mongrel&amp;rdquo; (en-GB) and &amp;ldquo;mixed-breed dog&amp;rdquo; (en-US), which would be a hyponym of &amp;ldquo;dog&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;But there&amp;rsquo;s no need to go into all that. Instead, you can just fix the article by replacing &amp;ldquo;mutt&amp;rdquo; with &amp;ldquo;pooch&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ibiblio.org/fred2.0/&#34; title=&#34;http://www.ibiblio.org/fred2.0/&#34;&gt;Simon Spero&lt;/a&gt; on &lt;a href=&#34;#comment-2859&#34;&gt;May 11, 2011 5:35 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;So what can you reliably infer about the relationship between [Bulldog] and [Mammal]?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt; on &lt;a href=&#34;#comment-2860&#34;&gt;May 11, 2011 5:44 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I was going to say that Mammal is a broader term for Bulldog, but &lt;a href=&#34;http://www.w3.org/TR/2009/REC-skos-reference-20090818/#L2810&#34;&gt;http://www.w3.org/TR/2009/REC-skos-reference-20090818/#L2810&lt;/a&gt; says that &amp;quot; the properties skos:broader and skos:narrower are not declared as transitive properties&amp;quot; and that the skos:broaderTransitive property is provided to indicate such a relationship. I could use that if I wanted to more explicitly set up a taxonomy that I was defining to make it clear that I wanted Mammal to be seen as broader than Bulldog.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ibiblio.org/fred2.0/&#34; title=&#34;http://www.ibiblio.org/fred2.0/&#34;&gt;Simon Spero&lt;/a&gt; on &lt;a href=&#34;#comment-2861&#34;&gt;May 11, 2011 6:22 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;But broaderTransitive is not supposed to be asserted, and is not guaranteed to be valid (see the SKOS Primer)&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;https://github.com/innoq/iqvoc/wiki&#34; title=&#34;https://github.com/innoq/iqvoc/wiki&#34;&gt;Thomas Bandholtz&lt;/a&gt; on &lt;a href=&#34;#comment-2868&#34;&gt;May 13, 2011 3:19 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;in this article you mention iQvoc with a link to a German Web page which only describes this SKOS tool. Meanwhile iQvoc 3.0 is available under an Apache 2.0 license at &lt;a href=&#34;https://github.com/innoq/iqvoc/wiki&#34;&gt;https://github.com/innoq/iqvoc/wiki&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Best regards,&lt;br /&gt;
Thomas&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
    </item>
    
    <item>
      <title>Quick and dirty linked data content negotiation</title>
      <link>https://www.bobdc.com/blog/quick-and-dirty-linked-data-co/</link>
      <pubDate>Mon, 09 May 2011 10:32:08 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/quick-and-dirty-linked-data-co/</guid>
      
      
      <description><div>Not even that dirty.</div><div>&lt;p&gt;I&amp;rsquo;ve managed to fill a key gap in the world&amp;rsquo;s supply of Linked Open Data by publishing triples that connect Mad Magazine film parody titles to the DBpedia URIs of the actual films. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;http://dbpedia.org/resource/Judge_Dredd_%28film%29&amp;gt;
      mad:FilmParody
              [ prism:CoverDate &amp;quot;1995-08-00&amp;quot; ;
                prism:issueIdentifier
                        &amp;quot;338&amp;quot; ;
                dc:title &amp;quot;Judge Dreck&amp;quot;
              ] .


&amp;lt;http://dbpedia.org/resource/2001:_A_Space_Odyssey_%28film%29&amp;gt;
      mad:FilmParody
              [ prism:CoverDate &amp;quot;1969-03-00&amp;quot; ;
                prism:issueIdentifier &amp;quot;125&amp;quot; ;
                dc:title &amp;quot;201 Minutes of a Space Idiocy&amp;quot;
              ] .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(To prepare the data, I scraped a &lt;a href=&#34;http://en.wikipedia.org/wiki/List_of_Mad&#39;s_movie_spoofs&#34;&gt;Wikipedia list&lt;/a&gt;, tested the URIs, then hand-corrected a few.) To really make this serious RESTful linked open data, I wanted to make it available as both RDF/XML and Turtle depending on the &lt;code&gt;Accept&lt;/code&gt; value in the header of the HTTP request. All this took was a few lines in the &lt;code&gt;.htaccess&lt;/code&gt; file (which I&amp;rsquo;ve been learning &lt;a href=&#34;https://www.bobdc.com/blog/form-driven-sparql-queries-wit&#34;&gt;more about lately&lt;/a&gt;) in the directory storing the RDF/XML and Turtle versions of the data.&lt;/p&gt;
&lt;p&gt;For example, either of the following two commands retrieves the Turtle version:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;wget --header=&amp;quot;Accept: text/turtle&amp;quot; http://www.rdfdata.org/dat/MadFilmParodies/
curl --header &amp;quot;Accept: text/turtle&amp;quot; -L http://www.rdfdata.org/dat/MadFilmParodies/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Substituting &lt;code&gt;application/rdf+xml&lt;/code&gt; for &lt;code&gt;text/turtle&lt;/code&gt; in either command gets you the RDF/XML version, and omitting the &lt;code&gt;--header&lt;/code&gt; parameter altogether gets you an HTML version.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s the complete &lt;code&gt;.htaccess&lt;/code&gt; file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;RewriteEngine on


RewriteCond %{HTTP_ACCEPT} ^.*text/turtle.*
RewriteRule ^index.html$ http://www.rdfdata.org/dat/MadFilmParodies/MadFilmParodies.ttl [L]
# no luck:
#RewriteRule ^index.html$ http://www.rdfdata.org/dat/MadFilmParodies/MadFilmParodies.ttl [R=303,L]


RewriteCond %{HTTP_ACCEPT} ^.*application/rdf\+xml.*
RewriteRule ^index.html$ http://www.rdfdata.org/dat/MadFilmParodies/MadFilmParodies.rdf [L]


RewriteRule ^index.html$ http://en.wikipedia.org/wiki/List_of_Mad&#39;s_movie_spoofs
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Apache web server where I have this hosted is configured to look for an index.html file in a directory if the requested URL doesn&amp;rsquo;t mention a specific filename, so the three rules here each modify that &amp;ldquo;request&amp;rdquo; to look for something else, depending on what the &lt;code&gt;RewriteCond&lt;/code&gt; line finds in the &lt;code&gt;HTTP_ACCEPT&lt;/code&gt; value. If it finds &amp;ldquo;text/turtle&amp;rdquo;, it sends the Turtle version of my data, and the &lt;code&gt;L&lt;/code&gt; directive tells the Apache mod_rewrite module that is processing these instructions not to look at any more of them.&lt;/p&gt;
&lt;p&gt;The next rule performs the corresponding &lt;code&gt;HTTP_ACCEPT&lt;/code&gt; check and file delivery for an RDF/XML request, and the default behavior if neither of those happens is to deliver an HTML version of the data. (I took the lazy way out and just redirected to the appropriate Wikipedia page instead of creating a new HTML file.) As you can see from the two commented-out lines, I &lt;a href=&#34;http://www.qc4blog.com/?p=934&#34;&gt;had the impression&lt;/a&gt; that adding &lt;code&gt;R=303&lt;/code&gt; in the brackets with the &lt;code&gt;L&lt;/code&gt; would send an HTTP return code of &lt;a href=&#34;http://en.wikipedia.org/wiki/HTTP_303&#34;&gt;303&lt;/a&gt; back to the requester, overriding the default code of &lt;a href=&#34;http://en.wikipedia.org/wiki/HTTP_302&#34;&gt;302&lt;/a&gt;, but never got that to work. If anyone has any suggestions about how to fix this, or whether 303 is even the most appropriate return code, please let me know.&lt;/p&gt;
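The dispatch these rules implement is just first-match pattern testing against the `Accept` header value. Here is a rough sketch of the same logic in Python (hypothetical, not part of the actual setup; the file names match the `.htaccess` above, but the HTML fallback is simplified to a local file name rather than the Wikipedia redirect):

```python
import re

# First-match-wins dispatch, mirroring the RewriteCond/RewriteRule pairs.
# The [L] flag's "stop processing rules" behavior becomes an early return.
RULES = [
    (re.compile(r"text/turtle"), "MadFilmParodies.ttl"),
    (re.compile(r"application/rdf\+xml"), "MadFilmParodies.rdf"),
]

def negotiate(accept_header):
    """Map an HTTP Accept header value to the file to serve."""
    for pattern, filename in RULES:
        if pattern.search(accept_header):
            return filename
    return "index.html"  # default: fall through to the HTML version

negotiate("text/turtle, */*;q=0.8")  # selects the Turtle file
```

Like Apache's substring-style `^.*text/turtle.*` patterns, this ignores `q=` quality values, which is part of what makes the approach quick and a bit dirty.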
&lt;p&gt;From what I&amp;rsquo;ve read on how the syntax of these instructions works, I shouldn&amp;rsquo;t have needed the full URLs for the Turtle and RDF/XML versions of the Mad Film Parody data, because they were in the same directory as the &lt;code&gt;.htaccess&lt;/code&gt; file, but that was the only way I could get this to work.&lt;/p&gt;
&lt;p&gt;Now that I know how to do this, I can do it again for other resources pretty quickly. It took me about five minutes to do it for the little &lt;a href=&#34;http://www.snee.com/ns/madMag/MadFilmParody&#34;&gt;http://www.snee.com/ns/madMag/MadFilmParody&lt;/a&gt; ontology that the data points to. I consider this solution quick and a bit dirty because it requires the maintenance of two copies of the data, but the XML guy in me knows that it would be wrong to perform parallel edits on the two copies, and that I should instead pick one as a master, edit it when necessary, and generate the other from it. If I had to do this on a larger scale, I learned from Brian Sletten at &lt;a href=&#34;http://semtech2010.semanticuniverse.com/sessionPop.cfm?confid=42&amp;amp;proposalid=3065&#34;&gt;last year&amp;rsquo;s semtech&lt;/a&gt; that I should look into &lt;a href=&#34;http://www.1060research.com/netkernel/&#34;&gt;NetKernel&lt;/a&gt;, but it was a good exercise to do it this way to learn what was really going on.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m going to try to get into the habit of doing this for data and ontologies that I create, so I&amp;rsquo;d appreciate any suggestions about tweaking details before any suboptimal aspects of this become habits.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.dccomics.com/mad/about/?action=timeline&#34;&gt;&lt;img id=&#34;id103639&#34; src=&#34;http://www.dccomics.com/mad/i/timeline/jul1990.jpg&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;MAD cover&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://themantics.wordpress.com&#34; title=&#34;http://themantics.wordpress.com&#34;&gt;Ryan&lt;/a&gt; on &lt;a href=&#34;#comment-2854&#34;&gt;May 9, 2011 3:40 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To help maintain a master copy of your RDF and transform into other formats through the command line, I&amp;rsquo;d recommend the rdfcat utility distributed with Jena: &lt;a href=&#34;http://jena.sourceforge.net/javadoc/jena/rdfcat.html&#34;&gt;http://jena.sourceforge.net/javadoc/jena/rdfcat.html&lt;/a&gt; . Personally, I&amp;rsquo;d make Turtle my master format language due to readability and file size, and transform that into XML after editing. Something like this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;java jena.rdfcat MadFilmParodies.ttl -in TTL &amp;gt; parody.rdf&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt; on &lt;a href=&#34;#comment-2855&#34;&gt;May 9, 2011 4:14 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Ryan! I&amp;rsquo;ve used jena.rdfcopy, but never noticed rdfcat before.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Data providers</title>
      <link>https://www.bobdc.com/blog/data-providers/</link>
      <pubDate>Mon, 02 May 2011 08:31:42 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/data-providers/</guid>
      
      
      <description><div>RDF or otherwise.</div><div>&lt;p&gt;While beta testing Talis&amp;rsquo;s Kasabi, I got to wondering about the data publishing market: who out there is hosting raw data, potentially charging for it and passing money along to the data&amp;rsquo;s providers? Poking around, I learned who the key names are. (Corrections welcome.) I accidentally stumbled across a few more when I followed a &lt;a href=&#34;http://twitter.com/#!/xmlgrrl/status/62701417810509824&#34;&gt;tweet&lt;/a&gt; from @xmlgrrl (a.k.a. Eve Maler, a friend of mine in the XML world since it was the SGML world) and started looking at her husband Eli&amp;rsquo;s blog. His posting &lt;a href=&#34;http://www.eliasisrael.com/2011/04/05/ten-services-to-get-your-cloud-startup-off-the-ground-now/&#34;&gt;Ten services to get your cloud startup off the ground now&lt;/a&gt; mentioned a few more companies that provide raw data—one that even provides free RDF. I tagged a few with a &lt;a href=&#34;http://www.delicious.com/bobdc/data&#34;&gt;delicious.com&lt;/a&gt; bookmark, but wanted to write out notes about a few here in order of how interesting they are to a semantic web geek.&lt;/p&gt;
&lt;p&gt;Some general notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The more I studied, the more I found, but I didn&amp;rsquo;t want to spend more than an afternoon on this.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;These sites all let you download data directly. I didn&amp;rsquo;t include sites like &lt;a href=&#34;http://www.data.gov/&#34;&gt;Data.gov&lt;/a&gt; that function more as directories that link to data sources on other sites.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Most of these providers have boosted their numbers of available datasets by including small datasets with as few as 100 records, and by hosting copies of data from the well-known names in the &lt;a href=&#34;http://richard.cyganiak.de/2007/10/lod/&#34;&gt;Linked Data Cloud&lt;/a&gt;. The advertised added value is typically the ease of programmatic access to that data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Despite the title of this blog entry (I was tempted to call it &amp;ldquo;Data resellers&amp;rdquo;, but many make the data available for free), I focused on a narrower case of data providers: the redistributors that gather data from specific, identified places and then make it available publicly with attribution, not actual data sources themselves such as government agencies, university projects, media making their metadata available, and various other circles on the Linked Data Cloud diagram.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If I&amp;rsquo;ve quoted some companies&amp;rsquo; websites more than others, it&amp;rsquo;s because they had &amp;ldquo;About&amp;rdquo; and &amp;ldquo;FAQ&amp;rdquo; pages that were easy to find and actually answered the questions I was wondering about.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The most interesting thing about &lt;a href=&#34;http://blog.kasabi.com/&#34;&gt;&lt;strong&gt;Kasabi&lt;/strong&gt;&lt;/a&gt; in this field is their commitment to providing data according to Linked Data principles, giving you SPARQL endpoints for data sources and the ability to define new APIs around each data source. The current data selection is interesting, considering that Kasabi is still in beta. For now it all looks like data that is freely available elsewhere, but the advantages of retrieving it from them go beyond the ability to use the SPARQL query language. For example, with BestBuy&amp;rsquo;s RDFa spread out across many different dynamically generated pages on bestbuy.com, querying this data from BestBuy&amp;rsquo;s server has a lot of limitations. Kasabi seems to have the BestBuy data aggregated so that their customers have more flexibility in how they query it.&lt;/p&gt;
&lt;blockquote id=&#34;id103435&#34; class=&#34;pullquote&#34;&gt;While disintermediation was a big buzzword of the dot com boom, intermediation is now getting bigger.&lt;/blockquote&gt;
&lt;p&gt;I list &lt;strong&gt;&lt;a href=&#34;http://www.socrata.com/&#34;&gt;Socrata&lt;/a&gt;&lt;/strong&gt; right after Kasabi because RDF is one of their export formats, along with XML, JSON, CSV, XLS, and more. In a business that depends on finding both data providers and data users, their home page makes the clearest case about why someone should work with them as a data provider: they&amp;rsquo;re clearly targeting government agencies who need to fulfill data transparency mandates. (Other providers are certainly targeting this market; just not as clearly.) The &lt;a href=&#34;http://www.socrata.com/company-info/&#34;&gt;company info&lt;/a&gt; page calls them &amp;ldquo;The Leader in Open Data Services for Government&amp;rdquo;. Another paragraph on the homepage makes a nice case for why developers should be interested in their data, and upcoming webinar titles of &amp;ldquo;Launch your own Data.Gov&amp;rdquo; and &amp;ldquo;Open Data as a Service Delivery Platform&amp;rdquo; are also pretty catchy to someone interested in this market.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;http://www.factual.com/&#34;&gt;Factual&lt;/a&gt;&lt;/strong&gt; targets data users more than data providers on their current home page, telling developers &amp;ldquo;Access great data for your web and mobile apps&amp;rdquo;. The only download format I could find was CSV, but with their emphasis on helping developers build apps, they focus more on data delivery through their RESTful &lt;a href=&#34;http://wiki.developer.factual.com/w/page/29670788/Server-API&#34;&gt;API&lt;/a&gt;. According to their &lt;a href=&#34;http://www.factual.com/FAQ&#34;&gt;FAQ&lt;/a&gt;, &amp;ldquo;Factual, Inc. is an open data platform for application developers that leverages large scale aggregation and community exchange&amp;hellip; Factual&amp;rsquo;s hosted data comes from our community of users, developers and partners, and from our powerful data mining tools&amp;hellip; Factual offers several hundred thousand datasets across a variety of topics (with a deep focus in Local) aggregated from multiple sources, made easily accessible for developers to build web and mobile apps&amp;hellip; Our APIs are free to everyone—if you want SLAs or have certain performance requirements, we would charge you a fee based on usage volume. Our downloads are free for smaller developers&amp;rdquo;. A &lt;a href=&#34;http://semantifi.wordpress.com/2010/02/11/data-is-the-future-of-web-latest-validation-from-prominent-investors/&#34;&gt;press release&lt;/a&gt; on Semantifi&amp;rsquo;s web site shows that some big names and big money are behind Factual.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;http://www.infochimps.com&#34;&gt;Infochimps&lt;/a&gt;&lt;/strong&gt; seems to be one of the better-known (and memorable) names in the field. From their &lt;a href=&#34;http://www.infochimps.com/faq&#34;&gt;FAQ&lt;/a&gt;: &amp;ldquo;Infochimps is a place for people to find, share and sell formatted data. Both users and Infochimps employees scrape, parse and format data so that it&amp;rsquo;s easily accessible to you. We take the chimp work out of working with data so you can literally start building cool stuff in minutes&amp;hellip; There is no sign up fee to use Infochimps. Some of the data sets available on our site are free. Some require attribution, and others are available for purchase. The first 100,000 data API calls are free. We offer subscriptions if you would like to use more&amp;hellip; The data sets available through our API are 1.) hosted for you and 2.) scraped on a regular basis. &amp;hellip; Most of our data comes in tsv, csv or yaml format&amp;rdquo;. The part about users scraping, parsing, and formatting highlights another aspect of the business model of some of these companies: crowd-sourcing the labor whenever possible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&#34;http://www.aggdata.com&#34;&gt;AggData&lt;/a&gt;&lt;/strong&gt; sells CSV files, typically of locations of all the stores in a particular chain. For example, a complete list of &lt;a href=&#34;http://www.cinnabon.com/&#34;&gt;Cinnabon&lt;/a&gt; locations, with 454 records, costs $29. The &lt;a href=&#34;http://www.aggdata.com/locations/cinnabon&#34;&gt;description page&lt;/a&gt; for each data set lists the fields and lets you download a sample. Prices that I saw ranged from $9 to $49. According to their &lt;a href=&#34;http://www.aggdata.com/faq&#34;&gt;FAQ&lt;/a&gt;, you order a dataset, and when payment is confirmed they email you a URL for the data that is good for 5 downloads or 120 hours. Being founded in 2006 and therefore the oldest of these companies, AggData is the most low-tech (no APIs here) but it&amp;rsquo;s a lot easier to look at their lists of franchise locations and churches and imagine that data being useful to someone than it is for many of the other data providers. Infochimps lists AggData as a &amp;ldquo;featured data provider&amp;rdquo;, but lists the same prices for the same datasets, so I&amp;rsquo;m not sure whether they&amp;rsquo;re just routing you to the same batches of data or making it available through their own APIs. (I got an Infochimps ID, clicked through for an AggData dataset until it asked me for credit card information, and stopped there.)&lt;/p&gt;
&lt;p&gt;According to their &lt;a href=&#34;http://www.semantifi.com/SemantifiPortal.html&#34;&gt;About&lt;/a&gt; page, &lt;strong&gt;&lt;a href=&#34;http://www.semantifi.com/semantifiHome.action?type=SI&#34;&gt;Semantifi&lt;/a&gt;&lt;/strong&gt; &amp;ldquo;developed a meaning based search platform to search both structured and unstructured content and filed multiple patents&amp;rdquo;. Along with the platform, they say that they have an &amp;ldquo;App Store like marketplace for a community of publishers to build data search apps&amp;rdquo; and that &amp;ldquo;Both Socrata and Factual are quite similar in concept and both lack the technology to search datasets like Semantifi&amp;rdquo;. As far as I could tell, Socrata and Factual have a lot more datasets than Semantifi; the first three Semantifi links that I clicked to look into specific data sets went to an &lt;a href=&#34;http://wiki.semantifi.com/index.php/100_Best_Places_To_Live&#34;&gt;empty wiki page&lt;/a&gt;. (If I was clicking in the wrong place, that&amp;rsquo;s not a great reflection on their site design. Also, with all of the people with hardcore financial markets experience on Semantifi&amp;rsquo;s &lt;a href=&#34;http://www.semantifi.com/Management.htm&#34;&gt;management&lt;/a&gt; page, why do they need Google ads on their home page?) Perhaps Semantifi is less like data providers Socrata and Factual than they think and more like &lt;a href=&#34;http://open.mflask.com/&#34;&gt;Open Data Directory&lt;/a&gt;, which doesn&amp;rsquo;t provide actual data but instead a search engine for data spread out across other sites that they index.&lt;/p&gt;
&lt;p&gt;I wanted to mention one other interesting source of fairly large-scale data to use in applications—when I learned how to add a volume for more disk space to an Amazon EC2 cloud image, I found that some of the volumes I could choose from included data from a choice of public data sets: DBpedia and Freebase dumps, the &lt;a href=&#34;http://www.cs.cmu.edu/~enron/&#34;&gt;Enron email&lt;/a&gt;, US Census, Labor, and economic data, various biological data collections, and more. There is a &lt;a href=&#34;http://aws.amazon.com/datasets?_encoding=UTF8&amp;amp;jiveRedirect=1&#34;&gt;list of such data&lt;/a&gt; on Amazon&amp;rsquo;s website, but it doesn&amp;rsquo;t show all the choices; additional data sets include &lt;a href=&#34;http://ods.openlinksw.com/wiki/main/Main/VirtAWSBBCMusicProgs&#34;&gt;BBC Music and programs data&lt;/a&gt;. If you were going to jump into the data reseller market with the various companies described above, an Amazon image with some of this data would be one logical place to start your company.&lt;/p&gt;
&lt;p&gt;A local friend &lt;a href=&#34;http://twitter.com/#!/dep4b&#34;&gt;Eric Pugh&lt;/a&gt; was recently pointing out to me the irony of how, while disintermediation was a big buzzword of the dot com boom, intermediation is now getting bigger. These data resellers are a good example. If you&amp;rsquo;re going to insert yourself as a middleman between a data provider and a data user, it&amp;rsquo;s a compelling case for either side to use your service if you have a lot of customers on the other side, but before you get there, you need to make your own compelling case to each side. Some of the companies listed above are better at doing this than others, and it will be interesting to see which of them are in business in five years and why they lasted.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://publishmydata.com&#34; title=&#34;http://publishmydata.com&#34;&gt;Bill Roberts&lt;/a&gt; on &lt;a href=&#34;#comment-2845&#34;&gt;May 2, 2011 9:06 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob&lt;/p&gt;
&lt;p&gt;If I may add ourselves to your list, as another company working in the data publishing market, Swirrl has a product &lt;a href=&#34;http://publishmydata.com&#34;&gt;PublishMyData&lt;/a&gt;. Rather than being a data aggregator or marketplace, we are aiming at enabling the data owners to publish it themselves as Linked Data. (We only do Linked Data in the full RDF/SPARQL sense of the term).&lt;/p&gt;
&lt;p&gt;At the moment we&amp;rsquo;re offering a hosted/full-service approach, but will be introducing do-it-yourself options in future. We&amp;rsquo;re currently concentrating on the public sector and on open data - so the data is all free to use and the business model is that the data owner pays to publish. So the data provided via our service could be used directly, or could be picked up and re-offered through one of these intermediary sites.&lt;/p&gt;
&lt;p&gt;But our philosophy is that the best way to get high quality Linked Data online (and so highly re-usable data) is for the data owners to take responsibility for doing it themselves.&lt;/p&gt;
&lt;p&gt;Cheers&lt;/p&gt;
&lt;p&gt;Bill&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt; on &lt;a href=&#34;#comment-2847&#34;&gt;May 2, 2011 7:27 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Bill, looks very cool!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.aggdata.com&#34; title=&#34;http://www.aggdata.com&#34;&gt;Chris Hathaway&lt;/a&gt; on &lt;a href=&#34;#comment-2848&#34;&gt;May 2, 2011 8:16 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hey Bill,&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m the CEO &amp;amp; Founder of AggData. Thanks for the mention in your overview; it&amp;rsquo;s a great review. I did want to clear up a few uncertainties for our description. First, you are correct, Infochimps is currently just a reseller of our data, though we may integrate further with them in the future. And while our main aggdata.com site is pretty straightforward so people can easily get the data they are looking for, we have some broader and more technically involved options for interested clients, and we&amp;rsquo;re planning a public launch of an API very soon. Overall, it&amp;rsquo;s great to see the field growing and the excitement around providing quality data.&lt;/p&gt;
&lt;p&gt;Thanks,&lt;br /&gt;
Chris&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
    </item>
    
    <item>
      <title>Inserting data from a SPARQL endpoint into a relational database</title>
      <link>https://www.bobdc.com/blog/inserting-data-from-a-sparql-e/</link>
      <pubDate>Wed, 27 Apr 2011 09:28:18 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/inserting-data-from-a-sparql-e/</guid>
      
      
<description><div>Via XML.</div><div>&lt;p&gt;Retrieval of triples from relational databases is a popular topic in the semantic web world, but I was recently wondering how much trouble it would be to go in the opposite direction: to retrieve data from a SPARQL endpoint and load it into a relational database. It wasn&amp;rsquo;t much trouble at all. When you retrieve the results in the &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-XMLres-20080115/&#34;&gt;SPARQL query results XML format&lt;/a&gt;, a straightforward XSLT stylesheet can convert it into the necessary SQL INSERT statements. I was able to automate the data retrieval, conversion to INSERT statements, and actual insertion into a MySQL database with a three-line batch file that used no Windows-specific tricks, so I&amp;rsquo;m sure it would work on Linux just as well.&lt;/p&gt;
&lt;p&gt;I used the following SPARQL query to retrieve the name, founding year, and equity, revenue, net income, and operating income figures of companies listed on the New York Stock Exchange according to DBpedia. I used &lt;a href=&#34;http://jena.sourceforge.net/ARQ/&#34;&gt;ARQ&lt;/a&gt; to execute the query, so that after the inner query retrieved the raw data from the &lt;a href=&#34;http://DBpedia.org/sparql&#34;&gt;http://DBpedia.org/sparql&lt;/a&gt; SPARQL endpoint service, the outer query could use ARQ&amp;rsquo;s SPARQL 1.1 support to format the data a bit—mostly, by using the &lt;code&gt;str()&lt;/code&gt; function to strip language and datatype tags.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX do: &amp;lt;http://dbpedia.org/ontology/&amp;gt;


SELECT (str(?name) as ?coName) 
       (substr(str(?formationYearTyped),1,4) as ?formationYear)
       (str(?equityTyped) as ?equity) 
       (str(?revenueTyped) as ?revenue) 
       (str(?netIncomeTyped) as ?netIncome) 
       (str(?operatingIncomeTyped) as ?operatingIncome) 
  WHERE {
  SERVICE &amp;lt;http://DBpedia.org/sparql&amp;gt;
  {
    SELECT * WHERE {
     ?company &amp;lt;http://purl.org/dc/terms/subject&amp;gt; 
     &amp;lt;http://dbpedia.org/resource/Category:Companies_listed_on_the_New_York_Stock_Exchange&amp;gt; .
     ?company rdfs:label ?name . 
    FILTER ( lang(?name) = &amp;quot;en&amp;quot; )
      OPTIONAL { ?company do:formationYear ?formationYearTyped . } 
      OPTIONAL { ?company do:equity ?equityTyped . }
      OPTIONAL { ?company do:revenue ?revenueTyped . } 
      OPTIONAL { ?company do:netIncome  ?netIncomeTyped . } 
      OPTIONAL { ?company do:operatingIncome ?operatingIncomeTyped . } 
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following command line told ARQ to put the results of this query in an XML file called companyData.xml. (Because the query doesn&amp;rsquo;t have the FROM keyword, ARQ needs an input dataset specified, so the command names dummy.ttl as this input even though the query above ignores this file and gets its data from DBpedia using the SERVICE keyword.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;arq --results XML --query getCompanyData.spq --data dummy.ttl &amp;gt; companyData.xml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Next, I ran the following command to apply an XSLT stylesheet to the ARQ output using libxslt&amp;rsquo;s &lt;a href=&#34;http://xmlsoft.org/XSLT/xsltproc2.html&#34;&gt;xsltproc&lt;/a&gt; XSLT processor. (You could use Saxon or Xalan just as easily.) This generated the SQL statements that would add the data to a MySQL database and stored them in the file insertCompanyData.sql:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;xsltproc SPARQLXMLtoSQL.xsl companyData.xml &amp;gt; insertCompanyData.sql
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The XSLT stylesheet is not particularly brief, but there&amp;rsquo;s no customized logic to process the output of the query above other than the use of the query&amp;rsquo;s variable names and the quotes that it adds around the &lt;code&gt;coName&lt;/code&gt; values. (The potential need for quotes depends on whether you&amp;rsquo;re inserting the value into the SQL database as a string.) The trickiest part was having the stylesheet output the string &amp;ldquo;NULL&amp;rdquo; when a value was missing; I used a named template, so it wasn&amp;rsquo;t too tricky.&lt;/p&gt;
&lt;p&gt;If I had many different query results to convert to SQL INSERT statements, I&amp;rsquo;d write a more generalized version of this stylesheet (for example, setting the name of the database and table to receive the data in variables at the top), but if I only had two or three sets of SPARQL query results to deal with, I could adapt this one for each of those pretty quickly:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&amp;quot;1.0&amp;quot;
                xmlns:s=&amp;quot;http://www.w3.org/2005/sparql-results#&amp;quot;
                xmlns:xsl=&amp;quot;http://www.w3.org/1999/XSL/Transform&amp;quot;&amp;gt;


  &amp;lt;xsl:strip-space elements=&amp;quot;*&amp;quot;/&amp;gt;
  &amp;lt;xsl:output method=&amp;quot;text&amp;quot;/&amp;gt;




  &amp;lt;xsl:template match=&amp;quot;s:sparql&amp;quot;&amp;gt;
    USE testdb;
    &amp;lt;xsl:apply-templates/&amp;gt;
  &amp;lt;/xsl:template&amp;gt;




  &amp;lt;xsl:template match=&amp;quot;text()&amp;quot;/&amp;gt; &amp;lt;!-- all values output with xsl:value-of --&amp;gt;




  &amp;lt;xsl:template match=&amp;quot;s:result&amp;quot;&amp;gt;


  &amp;lt;!-- Typical line for this template rule to create 
       (with carriage return added here):
       INSERT INTO company VALUES(
       &amp;quot;Protective Life&amp;quot;,1907,NULL,3.06E9,2.71E8,4.16E8);
   --&amp;gt;
    &amp;lt;xsl:text&amp;gt;INSERT INTO company VALUES(&amp;quot;&amp;lt;/xsl:text&amp;gt;
    &amp;lt;xsl:value-of select=&amp;quot;s:binding[@name=&#39;coName&#39;]/s:literal&amp;quot;/&amp;gt;
    &amp;lt;xsl:text&amp;gt;&amp;quot;,&amp;lt;/xsl:text&amp;gt;


    &amp;lt;xsl:call-template name=&amp;quot;valueOrNULL&amp;quot;&amp;gt;
      &amp;lt;xsl:with-param name=&amp;quot;value&amp;quot;
                      select=&amp;quot;s:binding[@name=&#39;formationYear&#39;]/s:literal&amp;quot;/&amp;gt;
    &amp;lt;/xsl:call-template&amp;gt;
    &amp;lt;xsl:text&amp;gt;,&amp;lt;/xsl:text&amp;gt;


    &amp;lt;xsl:call-template name=&amp;quot;valueOrNULL&amp;quot;&amp;gt;
      &amp;lt;xsl:with-param name=&amp;quot;value&amp;quot;
                      select=&amp;quot;s:binding[@name=&#39;equity&#39;]/s:literal&amp;quot;/&amp;gt;
    &amp;lt;/xsl:call-template&amp;gt;
    &amp;lt;xsl:text&amp;gt;,&amp;lt;/xsl:text&amp;gt;


    &amp;lt;xsl:call-template name=&amp;quot;valueOrNULL&amp;quot;&amp;gt;
      &amp;lt;xsl:with-param name=&amp;quot;value&amp;quot;
                      select=&amp;quot;s:binding[@name=&#39;revenue&#39;]/s:literal&amp;quot;/&amp;gt;
    &amp;lt;/xsl:call-template&amp;gt;
    &amp;lt;xsl:text&amp;gt;,&amp;lt;/xsl:text&amp;gt;


    &amp;lt;xsl:call-template name=&amp;quot;valueOrNULL&amp;quot;&amp;gt;
      &amp;lt;xsl:with-param name=&amp;quot;value&amp;quot;
                      select=&amp;quot;s:binding[@name=&#39;netIncome&#39;]/s:literal&amp;quot;/&amp;gt;
    &amp;lt;/xsl:call-template&amp;gt;
    &amp;lt;xsl:text&amp;gt;,&amp;lt;/xsl:text&amp;gt;


    &amp;lt;xsl:call-template name=&amp;quot;valueOrNULL&amp;quot;&amp;gt;
      &amp;lt;xsl:with-param name=&amp;quot;value&amp;quot;
                      select=&amp;quot;s:binding[@name=&#39;operatingIncome&#39;]/s:literal&amp;quot;/&amp;gt;
    &amp;lt;/xsl:call-template&amp;gt;


    &amp;lt;xsl:text&amp;gt;);&amp;amp;#10;&amp;lt;/xsl:text&amp;gt;
  &amp;lt;/xsl:template&amp;gt;

 


  &amp;lt;xsl:template name=&amp;quot;valueOrNULL&amp;quot;&amp;gt;
    &amp;lt;xsl:param name=&amp;quot;value&amp;quot;/&amp;gt;
    &amp;lt;xsl:choose&amp;gt;
      &amp;lt;xsl:when test=&amp;quot; $value != &#39;&#39; &amp;quot;&amp;gt;
        &amp;lt;xsl:value-of select=&amp;quot;$value&amp;quot;/&amp;gt;
      &amp;lt;/xsl:when&amp;gt;
      &amp;lt;xsl:otherwise&amp;gt;NULL&amp;lt;/xsl:otherwise&amp;gt;
    &amp;lt;/xsl:choose&amp;gt;
  &amp;lt;/xsl:template&amp;gt;




&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To run the created INSERT statements with a MySQL database table, I just did this, substituting my own MySQL username and password:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mysql -u myusername --password=mypass &amp;lt; insertCompanyData.sql
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, the created set of INSERT statements assumes that a database named testdb with a table named company already exists, and that the appropriate columns have been declared for that table.&lt;/p&gt;
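&lt;p&gt;For reference, a minimal table declaration that would accept these INSERT statements might look something like the following sketch. The column types here are guesses based on the sample row shown earlier (an unquoted integer year and financial figures in scientific notation); adjust them to your own needs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;CREATE DATABASE IF NOT EXISTS testdb;
USE testdb;
CREATE TABLE company (
  coName          VARCHAR(100),  -- company name, quoted in the INSERT statements
  formationYear   INT,           -- first four characters of the typed value
  equity          DOUBLE,        -- these four may be NULL when DBpedia had
  revenue         DOUBLE,        -- no value for the query's OPTIONAL patterns
  netIncome       DOUBLE,
  operatingIncome DOUBLE
);
&lt;/code&gt;&lt;/pre&gt;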
&lt;p&gt;After combining the command line calls to arq, xsltproc, and mysql in a three-line batch file, it was fun to see it all happen unattended. For a more serious implementation, you&amp;rsquo;d want to look into using the APIs of the various tools as a more efficient alternative to this kind of scripting, but it&amp;rsquo;s nice to see how much can be done with a little scripting.&lt;/p&gt;
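&lt;p&gt;The batch file itself is nothing more than the three commands shown above, one per line, so that a single invocation runs the whole pipeline (same file names as earlier; your paths and credentials may differ):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;arq --results XML --query getCompanyData.spq --data dummy.ttl &amp;gt; companyData.xml
xsltproc SPARQLXMLtoSQL.xsl companyData.xml &amp;gt; insertCompanyData.sql
mysql -u myusername --password=mypass &amp;lt; insertCompanyData.sql
&lt;/code&gt;&lt;/pre&gt;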
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Form-driven SPARQL queries without scripting</title>
      <link>https://www.bobdc.com/blog/form-driven-sparql-queries-wit/</link>
      <pubDate>Wed, 20 Apr 2011 08:39:27 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/form-driven-sparql-queries-wit/</guid>
      
      
      <description><div>Just two lines in an .htaccess file.</div><div>&lt;p&gt;In a &lt;a href=&#34;http://www.wfmu.org/playlists/shows/39765&#34;&gt;podcast&lt;/a&gt; of a radio show I was listening to recently, the host asserted that 80s rapper Schoolly D had scored most of director Abel Ferrara&amp;rsquo;s films. I was curious about this, so I went to IMDB&amp;rsquo;s &lt;a href=&#34;http://www.imdb.com/name/nm0001206/&#34;&gt;page for Ferrara&lt;/a&gt;, clicked on the first film title, scrolled down, clicked &amp;ldquo;Full cast and crew&amp;rdquo;, checked the music credit, returned to Ferrara&amp;rsquo;s main page, and repeated the last few steps&amp;hellip; until I realized that one SPARQL query could create a single list of Ferrara&amp;rsquo;s films with the film score credit next to each one.&lt;/p&gt;
&lt;p&gt;The following query, when entered on DBpedia&amp;rsquo;s &lt;a href=&#34;http://dbpedia.org/snorql/&#34;&gt;snorql&lt;/a&gt; form, shows that Mr. D is credited with two films, and that Joe Delia is credited with many more:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?title ?scorer WHERE 
{   
  ?director rdfs:label &amp;quot;Abel Ferrara&amp;quot;@en .    
  ?film &amp;lt;http://dbpedia.org/ontology/director&amp;gt; ?director .    
  ?film rdfs:label ?title .   
  FILTER ( lang(?title) = &amp;quot;en&amp;quot; )   
  ?film &amp;lt;http://dbpedia.org/property/music&amp;gt; ?scorer .  
} 
ORDER BY ?scorer
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Further research showed that Delia brought in D to contribute to many of the films for which he is credited. Also, I could have done this with the &lt;a href=&#34;http://www.linkedmdb.org/&#34;&gt;Linked Movie Database&lt;/a&gt; SPARQL endpoint, as I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/sparql-at-the-movies&#34;&gt;written about before&lt;/a&gt;, but I&amp;rsquo;ve been exploring DBpedia&amp;rsquo;s film data more lately.)&lt;/p&gt;
&lt;p&gt;A great way to spread the benefits of SPARQL and semantic web data while keeping the syntax parts under the covers is to create a web form for users to fill out and to insert the entered values into a SPARQL query. I thought that a form where you enter a director&amp;rsquo;s name and then see who scored his or her films would be a nice example of this. In the IBM developerWorks article &lt;a href=&#34;http://www.ibm.com/developerworks/xml/library/x-wikiquery/&#34;&gt;Build Wikipedia query forms with semantic technology&lt;/a&gt;, I described and linked to two such forms; the first listed all the actors who appeared in movies by the two directors whose names you entered in the form (for example, everyone who appeared in films by both Woody Allen and Martin Scorsese), and the other searched album and artist names for strings of text and displayed basic information about the albums it found.&lt;/p&gt;
&lt;p&gt;Both of those forms passed the entered values to python scripts that plugged the values into SPARQL queries before sending these queries off to the appropriate SPARQL endpoints. Recently, though, while reading Tom Heath and Christian Bizer&amp;rsquo;s book &lt;a href=&#34;http://linkeddatabook.com/editions/1.0/&#34;&gt;Linked Data: Evolving the Web into a Global Data Space&lt;/a&gt;, I had a better idea. I&amp;rsquo;ve used &lt;a href=&#34;http://httpd.apache.org/docs/1.3/howto/htaccess.html&#34;&gt;.htaccess&lt;/a&gt; files to redirect an Apache HTTP server from one requested URL to another (for example, when I&amp;rsquo;ve moved a file but don&amp;rsquo;t want to break links that point to it), but I didn&amp;rsquo;t know about the regular expression support in the Apache mod_rewrite module that carries out the .htaccess instructions. It turns out that, because of this feature, I don&amp;rsquo;t even need a script to execute a SPARQL query with values from a web form.&lt;/p&gt;
&lt;p&gt;A form that I put at &lt;a href=&#34;http://snee.com/sparqlforms/directors/filmscores.html&#34;&gt;http://snee.com/sparqlforms/directors/filmscores.html&lt;/a&gt; has a single field where you enter a director&amp;rsquo;s name. When you click the &amp;ldquo;go&amp;rdquo; button, the form&amp;rsquo;s action is &lt;a href=&#34;http://www.snee.com/sparqlforms/directors/composers&#34;&gt;http://www.snee.com/sparqlforms/directors/composers&lt;/a&gt;, so if you enter &amp;ldquo;John Ford&amp;rdquo; the form does an HTTP GET with the URL &lt;a href=&#34;http://www.snee.com/sparqlforms/directors/composers?director=John+Ford&#34;&gt;http://www.snee.com/sparqlforms/directors/composers?director=John+Ford&lt;/a&gt;.&lt;/p&gt;
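&lt;p&gt;The form itself needs nothing exotic; something along the lines of this minimal sketch (assuming a GET form with a single &lt;code&gt;director&lt;/code&gt; field, as described above) is enough:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;form action=&amp;quot;http://www.snee.com/sparqlforms/directors/composers&amp;quot; method=&amp;quot;get&amp;quot;&amp;gt;
  Director: &amp;lt;input type=&amp;quot;text&amp;quot; name=&amp;quot;director&amp;quot;/&amp;gt;
  &amp;lt;input type=&amp;quot;submit&amp;quot; value=&amp;quot;go&amp;quot;/&amp;gt;
&amp;lt;/form&amp;gt;
&lt;/code&gt;&lt;/pre&gt;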
&lt;p&gt;The .htaccess file in the same directory has the following three lines (everything from &amp;ldquo;RewriteRule&amp;rdquo; to the end is one line, split up for easier viewing here):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;RewriteEngine on


RewriteCond %{QUERY_STRING} ^director=(.*)$


RewriteRule ^composers.*$ http://dbpedia.org/sparql?query=
PREFIX+rdfs:+&amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;+
SELECT+?title+?scorer+WHERE+{+?director+rdfs:label+&amp;quot;%1&amp;quot;@en+.+
?film+&amp;lt;http://dbpedia.org/ontology/director&amp;gt;+?director+.+
?film+rdfs:label+?title+.+FILTER+(+lang(?title)+=+&amp;quot;en&amp;quot;+)+
?film+&amp;lt;http://dbpedia.org/property/music&amp;gt;+?scorer+.++}+ORDER+BY+?scorer
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Most of the third &amp;ldquo;line&amp;rdquo; is just an &lt;a href=&#34;http://www.xs4all.nl/~jlpoutre/BoT/Javascript/Utils/endecode.html&#34;&gt;escaped&lt;/a&gt; version of the SPARQL query about who scored Abel Ferrara films. I won&amp;rsquo;t go into details about the syntax of the rest of the three lines because &lt;a href=&#34;http://www.workingwith.me.uk/articles/scripting/mod_rewrite&#34;&gt;this tutorial&lt;/a&gt; explains the basics better than I could and &lt;a href=&#34;http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html&#34;&gt;this bit of Apache documentation&lt;/a&gt; is pretty comprehensive.&lt;/p&gt;
&lt;p&gt;To summarize, RewriteRule gets two expressions as arguments: what to look for and what to replace it with when redirecting your browser or other client. Regular expression matching in the first parameter can use parentheses, and the second expression can refer to these matched expressions with variable references like $1 and $2. HTTP GET parameters like &amp;ldquo;?director=John+Ford&amp;rdquo; are a special case, though—RewriteRule regular expressions won&amp;rsquo;t find them—which is why I have the RewriteCond line above. That matches the value of the director parameter, and the RewriteRule references that with %1 (as distinguished from $1, which would reference something matched in the RewriteRule). This inserts the value into the escaped version of the SPARQL query where I had &amp;ldquo;Abel Ferrara&amp;rdquo; in my original query. The query is part of a URL that executes the query on DBpedia&amp;rsquo;s endpoint, so the user who clicks &amp;ldquo;go&amp;rdquo; on the form will see the list of film titles and music credits. Try the form yourself, and make sure to use a director&amp;rsquo;s official name (for example, &amp;ldquo;Marty Scorsese&amp;rdquo; won&amp;rsquo;t get you anything).&lt;/p&gt;
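&lt;p&gt;As a minimal illustration of that $1/%1 distinction (hypothetical paths, not part of the form described above): %1 holds what the RewriteCond matched in the query string, while $1 holds whatever the RewriteRule&amp;rsquo;s own parentheses matched in the requested path:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;RewriteEngine on
RewriteCond %{QUERY_STRING} ^director=(.*)$
RewriteRule ^films/(.*)$ /lookup/$1?who=%1
&lt;/code&gt;&lt;/pre&gt;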
&lt;p&gt;This kind of URL rewriting is an important technique in Linked Data publishing, where you want to assign sensible, &lt;a href=&#34;http://www.w3.org/Provider/Style/URI.html&#34;&gt;cool&lt;/a&gt; URIs to resources but may have some less cool details in how you actually serve up the resource data. For a larger, more complex application, it&amp;rsquo;s nice to know that I would only need to add two more lines to the .htaccess file for each new form/query combination. This can be a very valuable tool for semantic web application development. (I couldn&amp;rsquo;t get it to work with a local copy of the Apache HTTP server or with the &lt;a href=&#34;http://groups.google.com/group/urlrewrite/browse_thread/thread/6af6552e4d5a800c/&#34;&gt;Url Rewrite Filter&lt;/a&gt; designed to allow the same thing with Tomcat, though, so I may have to go back to the python CGI scripts for local applications.)&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.blogto.com/music/2006/09/schoolly_d_visits_the_t_dot/&#34;&gt;&lt;img id=&#34;id103641&#34; src=&#34;http://www.blogto.com/schoolly.jpg&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; hspace=&#34;30px&#34; width=&#34;240&#34; alt=&#34;Schoolly D poster&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.springboardseo.com&#34; title=&#34;http://www.springboardseo.com&#34;&gt;Matthew&lt;/a&gt; on &lt;a href=&#34;#comment-2832&#34;&gt;April 21, 2011 6:04 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;The first time I heard a Schooly D song in a Ferrara movie (King of New York) I flipped :)&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been going through your post from 2007 on &lt;a href=&#34;https://www.bobdc.com/blog/querying-dbpedia&#34;&gt;querying DBpedia&lt;/a&gt; and see that the chalkboard query no longer works in snorql. Could you tell me why this is?&lt;/p&gt;
&lt;p&gt;thanks for an informative blog!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt; on &lt;a href=&#34;#comment-2833&#34;&gt;April 21, 2011 8:36 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Matthew,&lt;/p&gt;
&lt;p&gt;DBpedia has rearranged some of their vocabulary. I just fixed the queries and description in that post so that the query now works properly.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://vasiliy.faronov.name/&#34; title=&#34;http://vasiliy.faronov.name/&#34;&gt;Vasiliy Faronov&lt;/a&gt; on &lt;a href=&#34;#comment-2834&#34;&gt;April 21, 2011 12:15 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And the user can’t pass a double quote into that string, right?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt; on &lt;a href=&#34;#comment-2835&#34;&gt;April 21, 2011 12:30 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Vasiliy,&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t try that, but it makes sense.&lt;/p&gt;
&lt;p&gt;For more fine-grained control over things like that, a script probably would be better, but a regex guru might be able to work it right into the .htaccess code.&lt;/p&gt;
&lt;p&gt;By amit on &lt;a href=&#34;#comment-2837&#34;&gt;April 21, 2011 11:35 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nice read. Please Try my webapp for querying the semantic data: &lt;a href=&#34;http://WWW.s3space.com&#34;&gt;http://WWW.s3space.com&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Getting started with SPARQL Update</title>
      <link>https://www.bobdc.com/blog/getting-started-with-sparql-up/</link>
      <pubDate>Mon, 04 Apr 2011 09:08:09 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/getting-started-with-sparql-up/</guid>
      
      
      <description><div>Using Fuseki.</div><div>&lt;p&gt;I&amp;rsquo;ve described in earlier postings (&lt;a href=&#34;https://www.bobdc.com/blog/trying-sparql-11-new-query-fea&#34;&gt;[1]&lt;/a&gt;,&lt;a href=&#34;http://www.snee.com/bobdc.blog/2010/10/playing-more-with-sparql-11-pr.html&#34;&gt;[2]&lt;/a&gt;) how I mostly use Jena &lt;a href=&#34;http://jena.sourceforge.net/ARQ/&#34;&gt;ARQ&lt;/a&gt; to play with SPARQL 1.1 queries. To try out the new &lt;a href=&#34;http://www.w3.org/TR/2010/WD-sparql11-update-20101014/&#34;&gt;SPARQL Update&lt;/a&gt; commands, I wanted to use a simple triplestore where I could add, replace, and delete triples, and Jena &lt;a href=&#34;http://openjena.org/wiki/Fuseki&#34;&gt;Fuseki&lt;/a&gt; has turned out to be a very simple way to do this.&lt;/p&gt;
&lt;p&gt;Unzipping the file that I downloaded from the Fuseki &lt;a href=&#34;http://openjena.org/repo-dev/org/openjena/fuseki/0.2.0-SNAPSHOT/&#34;&gt;release 0.2 development snapshot&lt;/a&gt; created a directory with a jar file, some shell scripts, and a few other files. The shell scripts included with this zip file are all Linux-oriented, but looking at them I figured out how to start up the Fuseki server in Windows easily enough. Everything shown below worked in both Windows XP and Ubuntu.&lt;/p&gt;
&lt;p&gt;Running this command in the Fuseki directory lists your options:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;java -jar fuseki-sys.jar --help
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following command line worked great for me to start up the Fuseki server with a 1200 meg Java heap space, a figure I saw in the fuseki-server shell script included with the zip file. It allows users of the server to update data, stores data in a TDB database in the &lt;code&gt;dataDir&lt;/code&gt; subdirectory that I created in the Fuseki directory before running this command, and selects the myDataset dataset:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;java -Xmx1200M -jar fuseki-sys.jar --update --loc=dataDir /myDataset
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After starting the server up, you can send your browser to the main Fuseki screen at &lt;a href=&#34;http://localhost:3030/&#34;&gt;http://localhost:3030&lt;/a&gt;. Click its Control Panel link, then click the &lt;strong&gt;Select&lt;/strong&gt; button to pick the /myDataset dataset, and you&amp;rsquo;ll be on the Fuseki Query screen:&lt;/p&gt;
&lt;img id=&#34;id103414&#34; src=&#34;https://www.bobdc.com/img/main/fusekimainscreen.jpg&#34; width=&#34;540&#34;/&gt;
&lt;p&gt;To insert a bit of data into the triplestore, paste the following into the box in the SPARQL Update part of the form and click the &lt;strong&gt;Perform update&lt;/strong&gt; button under the box:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d:    &amp;lt;http://example.com/ns/data#&amp;gt; 
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;


INSERT DATA
{ 
  d:i1 rdfs:label &amp;quot;one&amp;quot; . 
  d:i2 rdfs:label &amp;quot;two&amp;quot; . 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Fuseki will show a screen saying that your &amp;ldquo;Update succeeded&amp;rdquo;. Click your browser&amp;rsquo;s Back button to return to the main Fuseki screen.&lt;/p&gt;
&lt;p&gt;Next, enter the following classic query in the SPARQL Query part of the Fuseki Query form and click the &lt;strong&gt;Get Results&lt;/strong&gt; button there:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT *
WHERE { ?s ?p ?o }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Fuseki will show you all the triples in your triplestore&amp;rsquo;s default graph—both of them. Click your Back button again, and Fuseki&amp;rsquo;s query form will still have the two queries you entered. You can edit them, comment out certain lines, and do whatever you like as you experiment with inserting, changing, and deleting data in the triplestore.&lt;/p&gt;
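&lt;p&gt;For example, to experiment with deletion, pasting something like this into the same SPARQL Update box should remove one of the two triples inserted above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d:    &amp;lt;http://example.com/ns/data#&amp;gt; 
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;

DELETE DATA
{ 
  d:i1 rdfs:label &amp;quot;one&amp;quot; . 
}
&lt;/code&gt;&lt;/pre&gt;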
&lt;p&gt;Because of the command line options that I used when starting up Fuseki above, the data persists. If you go to the command line window where you started up the Fuseki server and press Ctrl+C to shut it down, then start it up again and try the same SELECT query above, you&amp;rsquo;ll see that the data is still there.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a lot more you can do with Fuseki, but the steps above were all I needed to create an environment where I could try out the commands described in the SPARQL Update spec. As far as I can tell, all the important parts of the spec work in Fuseki, so it&amp;rsquo;s a fine way to get to know this great new addition to SPARQL.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By Scott Henninger on &lt;a href=&#34;#comment-2825&#34;&gt;April 6, 2011 3:19 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Also see &lt;a href=&#34;http://topquadrantblog.blogspot.com/search/label/SPARQL%20endpoint&#34;&gt;http://topquadrantblog.blogspot.com/search/label/SPARQL%20endpoint&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
    </item>
    
    <item>
      <title>From a Wikipedia page to the corresponding DBpedia page in one click</title>
      <link>https://www.bobdc.com/blog/from-a-wikipedia-page-to-the-c/</link>
      <pubDate>Fri, 25 Mar 2011 09:09:27 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/from-a-wikipedia-page-to-the-c/</guid>
      
      
      <description><div>And a similar link for Freebase.</div><div>&lt;p&gt;The following two links won&amp;rsquo;t do much if you click them now, but if you drag them to your bookmarks toolbar, clicking the first one there while viewing a Wikipedia page will take you to the corresponding DBpedia page, and clicking the second while viewing the Freebase page for a particular topic will take you to the page full of RDF for that topic.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;javascript:location.href=(location.href.replace(/en.wikipedia.org%5C/wiki/,%22dbpedia.org%5C/page%22).replace(%22(%22,%22%5C%2%5C8%22).replace(%22)%22,%22%5C%2%5C9%22))&#34;&gt;wp -&amp;gt; dbpedia&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;javascript:location.href=(location.href.replace(/http:%5C/%5C/www.freebase.com%5C/view%5C/en%5C//,%22http:%5C/%5C/rdf.freebase.com%5C/rdf%5C/en.%22))&#34;&gt;freebase rdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;They&amp;rsquo;re both scriptlets, or little bits of Javascript code embedded in links. Each reads the URL of the currently displayed page, does a bit of string manipulation on it, and sends your browser to the resulting URL. I&amp;rsquo;ve had the DBpedia one for a while, but I recently found that DBpedia URLs escape parentheses when Wikipedia URLs don&amp;rsquo;t, so I fixed the scriptlet to account for this. While I was at it, I created the Freebase one, which is much simpler.&lt;/p&gt;
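&lt;p&gt;Unescaped, the core of the DBpedia scriptlet is roughly a one-line string substitution like the following (simplified here to omit the parenthesis handling):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;javascript:location.href=location.href.replace(/en.wikipedia.org\/wiki/,&amp;quot;dbpedia.org/page&amp;quot;)
&lt;/code&gt;&lt;/pre&gt;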
&lt;p&gt;If anyone&amp;rsquo;s interested, I also have scriptlets to go to a site&amp;rsquo;s home page, an &amp;ldquo;up&amp;rdquo; button (cd ..), and a backlink button that searches Google for webpages linking to the currently displayed one.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://tfmorris.blogspot.com&#34; title=&#34;http://tfmorris.blogspot.com&#34;&gt;Tom Morris&lt;/a&gt; on &lt;a href=&#34;#comment-2814&#34;&gt;March 27, 2011 12:43 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Your Freebase script doesn&amp;rsquo;t look like it&amp;rsquo;s using the proper namespace. There&amp;rsquo;s no guarantee that there will be a key in the /en namespace and, if there is, it may only be loosely related to the name of the Wikipedia article. You should reference the /wikipedia/en namespace and make sure the key is properly quoted.&lt;/p&gt;
&lt;p&gt;See my WTF (Wikipedia-to-Freebase) Chrome extension or Zak Dweil&amp;rsquo;s Greasemonkey script for a way to do this.&lt;br /&gt;
&lt;a href=&#34;https://chrome.google.com/extensions/detail/hgmjdmegeidmljpoilgmfeifmiepnbkn&#34;&gt;https://chrome.google.com/extensions/detail/hgmjdmegeidmljpoilgmfeifmiepnbkn&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt; on &lt;a href=&#34;#comment-2815&#34;&gt;March 27, 2011 12:49 PM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;it may only be loosely related to the name of the Wikipedia article&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The idea was to go from the Freebase article to the Freebase page, not from the Wikipedia article to the Freebase page. I was just taking advantage of the commonality I found in URIs when I clicked the RDF links at the bottom of Freebase pages.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.s3space.com&#34; title=&#34;http://www.s3space.com&#34;&gt;Amit&lt;/a&gt; on &lt;a href=&#34;#comment-2829&#34;&gt;April 10, 2011 11:50 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is very useful for creating SPARQL queries. I created a site s3space.com several months ago where users can learn, create and share the SPARQL queries.&lt;/p&gt;
&lt;p&gt;This conversion tool will help many to create more SPARQL queries easily.&lt;/p&gt;
&lt;p&gt;Thanks for awesome tool.&lt;/p&gt;
&lt;p&gt;Regards,&lt;br /&gt;
Amit&lt;br /&gt;
&lt;a href=&#34;http://www.s3space.com&#34;&gt;http://www.s3space.com&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt; on &lt;a href=&#34;#comment-2836&#34;&gt;April 21, 2011 12:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Amit, s3space.com looks pretty cool.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/dbpedia">DBpedia</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Pulling SKOS prefLabel and altLabel values out of DBpedia</title>
      <link>https://www.bobdc.com/blog/pulling-skos-preflabel-and-alt/</link>
      <pubDate>Tue, 22 Feb 2011 09:12:28 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/pulling-skos-preflabel-and-alt/</guid>
      
      
      <description><div>Or, using linked data to build a standards-compliant thesaurus with SPARQL.</div><div>&lt;p&gt;When my TopQuadrant colleague &lt;a href=&#34;http://dallemang.typepad.com/&#34;&gt;Dean Allemang&lt;/a&gt; referred to the use of &lt;a href=&#34;http://dbpedia.org/About&#34;&gt;DBpedia&lt;/a&gt; as a controlled vocabulary, I said &amp;ldquo;Huh?&amp;rdquo; He helped me to realize that if you and I want to refer to the same person, place, or thing, but there&amp;rsquo;s a chance that we might use different names for it, DBpedia&amp;rsquo;s URI for it might make the best identifier for us to both use. For example, if you refer to the nineteenth-century American president and Civil War general Ulysses S. Grant and I refer to him as Ulysses Grant, and then we find out that DBpedia&amp;rsquo;s URI for him is &lt;code&gt;http://dbpedia.org/resource/Ulysses_S._Grant&lt;/code&gt;, I&amp;rsquo;m not going to insist on leaving Grant&amp;rsquo;s middle initial out of the URI.&lt;/p&gt;
&lt;p&gt;Grant once had the nickname &amp;ldquo;Useless S. Grant&amp;rdquo;, and DBpedia can help us here, too. If you try to go to a Wikipedia page for &lt;a href=&#34;http://en.wikipedia.org/wiki/Useless_S._Grant&#34;&gt;http://en.wikipedia.org/wiki/Useless_S._Grant&lt;/a&gt;, instead of sending you an error message, Wikipedia will redirect you to the &lt;a href=&#34;http://en.wikipedia.org/wiki/Ulysses_S._Grant&#34;&gt;http://en.wikipedia.org/wiki/Ulysses_S._Grant&lt;/a&gt; page. DBpedia uses the &lt;code&gt;http://dbpedia.org/ontology/wikiPageRedirects&lt;/code&gt; property to track these redirect values, and a SPARQL query that uses it can list alternative names for things that have Wikipedia entries.&lt;/p&gt;
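&lt;p&gt;For example, this small query (runnable on DBpedia&amp;rsquo;s snorql form, which predeclares the rdfs prefix) lists the redirect names that point at Grant&amp;rsquo;s entry, &amp;ldquo;Useless S. Grant&amp;rdquo; among them:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?altName 
WHERE 
{
  ?alt &amp;lt;http://dbpedia.org/ontology/wikiPageRedirects&amp;gt; 
       &amp;lt;http://dbpedia.org/resource/Ulysses_S._Grant&amp;gt; ;
       rdfs:label ?altName . 
  FILTER ( lang(?altName) = &amp;quot;en&amp;quot; )
}
&lt;/code&gt;&lt;/pre&gt;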
&lt;p&gt;I can use this and one of DBpedia&amp;rsquo;s &lt;a href=&#34;http://en.wikipedia.org/wiki/Wikipedia:Categorization&#34;&gt;Categories&lt;/a&gt; pages to drive a SPARQL query that selects preferred and alternative labels for a group of DBpedia entries at once. If you enter the following query on DBpedia&amp;rsquo;s &lt;a href=&#34;http://dbpedia.org/snorql/&#34;&gt;snorql&lt;/a&gt; form, it will give you a list of the preferred names of all the 19th-century presidents of the United States, as well as other names they might be known by.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?prefLabel ?altLabel 
WHERE 
{
  ?president dcterms:subject 
   &amp;lt;http://dbpedia.org/resource/Category:19th-century_presidents_of_the_United_States&amp;gt; ; 
         rdfs:label ?prefLabel  . 
   ?nickname &amp;lt;http://dbpedia.org/ontology/wikiPageRedirects&amp;gt; ?president ; 
         rdfs:label ?altLabel . 
   FILTER ( lang(?prefLabel) = &amp;quot;en&amp;quot; )
   FILTER ( lang(?altLabel) = &amp;quot;en&amp;quot; )
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The variable names I used will give &lt;a href=&#34;http://en.wikipedia.org/wiki/SKOS&#34;&gt;SKOS&lt;/a&gt; fans a clue where I&amp;rsquo;m going with this: the creation of SKOS triples from this data. The following variation on the SELECT query above declares that the URL for each president on the list of 19th century presidents is a &lt;code&gt;skos:Concept&lt;/code&gt;, and it then assigns &lt;code&gt;skos:prefLabel&lt;/code&gt; and &lt;code&gt;skos:altLabel&lt;/code&gt; values based on the same logic used in the query above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;CONSTRUCT 
{
  ?pres a skos:Concept;
        skos:prefLabel ?prefLabel ;
        skos:altLabel ?altLabel . 
}
WHERE 
{
  ?pres dcterms:subject 
   &amp;lt;http://dbpedia.org/resource/Category:19th-century_presidents_of_the_United_States&amp;gt; ; 
        rdfs:label ?prefLabel . 
   ?alt &amp;lt;http://dbpedia.org/ontology/wikiPageRedirects&amp;gt; ?pres; rdfs:label ?altLabel . 
   FILTER ( lang(?altLabel) = &amp;quot;en&amp;quot; )
   FILTER ( lang(?prefLabel) = &amp;quot;en&amp;quot; )
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When running this query with DBpedia, it creates 300 triples. These include &lt;code&gt;skos:altLabel&lt;/code&gt; values such as &amp;ldquo;The Great Emancipator&amp;rdquo; and &amp;ldquo;Abe Lincoln&amp;rdquo; for Abraham Lincoln (or rather, for the concept &lt;code&gt;http://dbpedia.org/resource/Abraham_Lincoln&lt;/code&gt;, which has a &lt;code&gt;skos:prefLabel&lt;/code&gt; of &amp;ldquo;Abraham Lincoln&amp;rdquo;) as well as popular misspellings such as &amp;ldquo;Abraham Linkin&amp;rdquo; and &amp;ldquo;Presedent Lincon&amp;rdquo;. (If I was going to use this in a production application, I&amp;rsquo;d change the &lt;code&gt;skos:altLabel&lt;/code&gt; values based on misspellings to &lt;a href=&#34;http://www.w3.org/TR/2009/REC-skos-reference-20090818/#L2007&#34;&gt;&lt;code&gt;skos:hiddenLabel&lt;/code&gt;&lt;/a&gt; values.)&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s nice how a single query can pull data from DBpedia to populate a SKOS-based thesaurus with preferred and alternative labels, and it makes a good example of how SPARQL can add value to linked data (in this case, by reshaping the data to conform to a specialized standard).&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.brainwashed.com/brain/brainv08i08.html&#34;&gt;&lt;img id=&#34;id103513&#34; src=&#34;http://www.brainwashed.com/brain/images/ulyssessgrant.jpg&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-2800&#34;&gt;February 23, 2011 5:46 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nice one Bob! Be sure and post anything else you might have on (quasi-) extracting domain vocabs from DBpedia. Seems a lot better than making up names/URIs from scratch - not only because there will be linkage already in place, but also it&amp;rsquo;ll save loads of work in looking for synonyms etc.&lt;/p&gt;
&lt;p&gt;But what I really want to know - what is that guitar he&amp;rsquo;s playing!? Does seem to suit his name (and hairstyle).&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt; on &lt;a href=&#34;#comment-2801&#34;&gt;February 23, 2011 6:00 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Danny, and thanks! No idea about the guitar; I just found that with some searches.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/dbpedia">DBpedia</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>What SKOS-XL adds to SKOS</title>
      <link>https://www.bobdc.com/blog/what-skos-xl-adds-to-skos/</link>
      <pubDate>Tue, 08 Feb 2011 07:47:22 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/what-skos-xl-adds-to-skos/</guid>
      
      
<description><div>Extra flexibility for label metadata.</div><div>&lt;p&gt;In my first few glances at &lt;a href=&#34;http://www.w3.org/TR/skos-reference/skos-xl.html&#34;&gt;SKOS eXtension for Labels&lt;/a&gt;, I didn&amp;rsquo;t quite get it. Recently, though, while looking at a client&amp;rsquo;s requirements document at TopQuadrant, I saw that they wanted to attach metadata to individual terms. I started modeling this in my head and then realized I didn&amp;rsquo;t need to: SKOS-XL already had.&lt;/p&gt;
&lt;blockquote id=&#34;id103319&#34; class=&#34;pullquote&#34;&gt;&#34;Any problem in Computer Science can be solved by another level of indirection.&#34;&lt;/blockquote&gt;
&lt;p&gt;First, why can&amp;rsquo;t you attach metadata to specific terms with the &lt;a href=&#34;http://www.w3.org/TR/2009/REC-skos-reference-20090818/&#34;&gt;base SKOS standard&lt;/a&gt;? Because although SKOS is an ontology for managing controlled vocabularies (and taxonomies, and thesauri), the basic unit of what it manages is not a term, which is what taxonomy management software always managed before, but a concept. This is a Good Thing, because it makes internationalized vocabularies much easier to manage. I can have a single concept with a German preferred label of &amp;ldquo;Spirituosen&amp;rdquo;, a British English preferred label of &amp;ldquo;spirits&amp;rdquo;, an American English preferred label of &amp;ldquo;liquor&amp;rdquo;, and an American alternative label of &amp;ldquo;booze&amp;rdquo;, and they all refer to the same concept. The United Nations Food and Agriculture Organization&amp;rsquo;s &lt;a href=&#34;http://aims.fao.org/website/AGROVOC-Thesaurus/sub&#34;&gt;AGROVOC&lt;/a&gt; thesaurus is a good example of this practice, with dozens of preferred and alternate labels for some concepts.&lt;/p&gt;
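&lt;p&gt;In plain SKOS, all of those labels hang directly off of one concept as language-tagged strings, like this (the concept URI is made up, but the labels are the ones just described):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix :     &amp;lt;http://www.example.com/demo#&amp;gt; .

:concept234 a skos:Concept ;
  skos:prefLabel &amp;quot;Spirituosen&amp;quot;@de ;
  skos:prefLabel &amp;quot;spirits&amp;quot;@en-GB ;
  skos:prefLabel &amp;quot;liquor&amp;quot;@en-US ;
  skos:altLabel  &amp;quot;booze&amp;quot;@en-US .
&lt;/code&gt;&lt;/pre&gt;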
&lt;p&gt;SKOS&amp;rsquo;s extensibility means that you can attach all the metadata you want to a particular concept, but not to one of the terms defined as labels for that concept. This is because, being labels, they&amp;rsquo;re strings. (In spec talk, they&amp;rsquo;re &amp;ldquo;lexical entities&amp;rdquo;, which isn&amp;rsquo;t quite the same thing, but close enough for our purposes.) SKOS is built on RDF, and in RDF triples strings can only be the objects of triples, not the subjects. So how can we assign metadata about the labels themselves, such as the name of the person who added a particular label, or the date it was last updated?&lt;/p&gt;
&lt;p&gt;The Cambridge computer scientist &lt;a href=&#34;http://en.wikipedia.org/wiki/David_Wheeler_(computer_scientist)&#34;&gt;David Wheeler&lt;/a&gt;, who in 1951 became the first person ever to complete a PhD in the field, once said &amp;ldquo;Any problem in Computer Science can be solved by adding another level of indirection&amp;rdquo;. That&amp;rsquo;s what SKOS-XL does: it defines variations on the SKOS &lt;code&gt;skos:prefLabel&lt;/code&gt; and &lt;code&gt;skos:altLabel&lt;/code&gt; properties called &lt;code&gt;skosxl:prefLabel&lt;/code&gt; and &lt;code&gt;skosxl:altLabel&lt;/code&gt; (assuming, as always, that these prefixes have been properly declared). Instead of having strings as their values, these extension properties point to members of the &lt;code&gt;skosxl:Label&lt;/code&gt; class. Members of this class have a &lt;code&gt;skosxl:literalForm&lt;/code&gt; property to identify a string that serves as a label for the concept, and it can have all the additional properties you want.&lt;/p&gt;
&lt;p&gt;The following shows some Turtle syntax for a SKOS-XL representation of the concept described above, with extra &lt;code&gt;:lastEdited&lt;/code&gt; and &lt;code&gt;:myCustomProperty&lt;/code&gt; properties adding metadata to some of the labels:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix skos:   &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix skosxl: &amp;lt;http://www.w3.org/2008/05/skos-xl#&amp;gt; .
@prefix :       &amp;lt;http://www.example.com/demo#&amp;gt; .
@prefix rdf:    &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; . 
@prefix xsd:    &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .


:concept234 rdf:type skos:Concept ;
  skosxl:prefLabel :label1 ;
  skosxl:prefLabel :label2 ;
  skosxl:prefLabel :label3 ;
  skosxl:altLabel  :label4 .


:label1 rdf:type skosxl:Label ; 
  :lastEdited &amp;quot;2011-02-05T10:21:00&amp;quot;^^xsd:dateTime ;
  skosxl:literalForm &amp;quot;Spirituosen&amp;quot;@de .


:label2 rdf:type skosxl:Label ; 
  :lastEdited &amp;quot;2011-02-05T10:28:00&amp;quot;^^xsd:dateTime ;
  :myCustomProperty 2.71828 ;
  skosxl:literalForm &amp;quot;spirits&amp;quot;@en-GB .


:label3 rdf:type skosxl:Label ; 
  :lastEdited &amp;quot;2011-02-05T10:34:00&amp;quot;^^xsd:dateTime ;
  skosxl:literalForm &amp;quot;liquor&amp;quot;@en-US .


:label4 rdf:type skosxl:Label ; 
  :lastEdited &amp;quot;2011-02-05T10:42:00&amp;quot;^^xsd:dateTime ;
  :myCustomProperty 3.1415 ;
  skosxl:literalForm &amp;quot;booze&amp;quot;@en-US .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The general idea is pretty elegant, and having a standardized way to do it prevents me and others from developing our own variations that do the same thing. I&amp;rsquo;m glad I didn&amp;rsquo;t take that model in my head too far.&lt;/p&gt;
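&lt;p&gt;Another benefit of the standardization: when an application only understands base SKOS, a short SPARQL CONSTRUCT query (a sketch, assuming that the skos and skosxl prefixes shown above are declared) can derive the plain SKOS labels from the SKOS-XL ones:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;CONSTRUCT
{
  ?concept skos:prefLabel ?literal .
}
WHERE
{
  ?concept skosxl:prefLabel ?label .
  ?label   skosxl:literalForm ?literal .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same pattern with the altLabel properties recovers the alternative labels.&lt;/p&gt;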
&lt;p&gt;How much use of SKOS-XL have people seen in the real world?&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.semanlink.net&#34; title=&#34;http://www.semanlink.net&#34;&gt;fps&lt;/a&gt; on &lt;a href=&#34;#comment-2791&#34;&gt;February 8, 2011 5:09 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hmm… nothing states that we can deduce the statements involving the standard skos props?&lt;/p&gt;
&lt;p&gt;&amp;ldquo;All problems in computer science can be solved by another level of indirection&amp;hellip; Except for the problem of too many layers of indirection.&amp;rdquo; - or too many indirections to deal with.&lt;/p&gt;
&lt;p&gt;So, couldn&amp;rsquo;t we have a global mechanism that would work for any property whose range is rdfs:Literal ? Something around :&lt;/p&gt;
&lt;p&gt;exlit:ExtendedLiteral a rdfs:Class.&lt;/p&gt;
&lt;p&gt;exlit:extendedLiteral&lt;br /&gt;
a rdfs:Property;&lt;br /&gt;
rdfs:range ExtendedLiteral.&lt;br /&gt;
dc:comment &amp;ldquo;from a statement using this property, one can deduce the statement whose subject is this statement&amp;rsquo;s subject , whose predicate is this statement&amp;rsquo;s object&amp;rsquo;s exlit:property value, and whose object is statement&amp;rsquo;s object&amp;rsquo;s exlit:value .&amp;rdquo;&lt;/p&gt;
&lt;p&gt;exlit:property a rdfs:Property;&lt;br /&gt;
rdfs:domain exlit:ExtendedLiteral;&lt;br /&gt;
rdfs:range rdfs:Property;&lt;br /&gt;
dc:comment &amp;ldquo;the &amp;rsquo;literal property&amp;rsquo;&amp;quot;@en.&lt;/p&gt;
&lt;p&gt;exlit:value a rdfs:Property;&lt;br /&gt;
rdfs:domain exlit:ExtendedLiteral;&lt;br /&gt;
rdfs:range rdfs:Literal;&lt;br /&gt;
dc:comment &amp;ldquo;the literal value&amp;rdquo;@en.&lt;/p&gt;
&lt;p&gt;So, from:&lt;br /&gt;
ex:Me extendedLiteral ex:MyGivenName.&lt;br /&gt;
ex:MyGivenName a ExtendedLiteral;&lt;br /&gt;
exlit:property foaf:givenName;&lt;br /&gt;
exlit:value &amp;ldquo;François-Paul&amp;rdquo;;&lt;br /&gt;
dc:comment &amp;ldquo;My parents first thought of calling me François, like one if my grand fathers, but they resolved to add Paul like the other one, for having them both happy&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;You could deduce ex:me foaf:givenName &amp;ldquo;François-Paul&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Does it make sense ?&lt;/p&gt;
&lt;p&gt;Best&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2792&#34;&gt;February 8, 2011 9:43 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sure, I guess. I&amp;rsquo;m just happy that a way to address this client&amp;rsquo;s needs didn&amp;rsquo;t require new modeling and non-standardized extensions, like I originally thought it would.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.kanzaki.com/&#34; title=&#34;http://www.kanzaki.com/&#34;&gt;masaka&lt;/a&gt; on &lt;a href=&#34;#comment-2793&#34;&gt;February 8, 2011 10:08 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi,&lt;/p&gt;
&lt;p&gt;NDLSH (National Diet Library Subject Heading) uses SKOS-XL heavily to give &amp;lsquo;yomi&amp;rsquo; (transcription) to each label. See for example,&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://id.ndl.go.jp/auth/ndlsh/00574798.ttl&#34;&gt;http://id.ndl.go.jp/auth/ndlsh/00574798.ttl&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2794&#34;&gt;February 9, 2011 8:55 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Masaka, those are very interesting.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
    </item>
    
    <item>
      <title>More streamlined communication</title>
      <link>https://www.bobdc.com/blog/more-streamlined-communication/</link>
      <pubDate>Thu, 06 Jan 2011 09:49:27 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/more-streamlined-communication/</guid>
      
      
<description><div>Teenagers now, Taylorites in 1910.</div><div>&lt;p&gt;In an &lt;a href=&#34;http://itc.conversationsnetwork.org/shows/detail4757.html&#34;&gt;ITConversations podcast&lt;/a&gt; interview with Tim O&amp;rsquo;Reilly and John Battelle, Mark Zuckerberg describes a recent conversation with a teenage relative of his girlfriend. Those of us with teenage kids know that they consider email a bit old-fashioned, and this girl explained to Zuckerberg why: because it&amp;rsquo;s so slow. He was puzzled, thinking that email is practically instantaneous; why was it slow? Because, the girl replied, it&amp;rsquo;s slow to create a message. You look up someone&amp;rsquo;s email address, you write out a subject line, you start your message with some sort of salutation, then you write it, then you sign off at the end, and so forth.&lt;/p&gt;
&lt;p&gt;Obviously, as The Facebook Guy, Zuckerberg is pretty tuned in to how modern teenagers communicate, and he was telling this story to describe the motivation behind whatever Facebook&amp;rsquo;s latest spin on IM is. The story got me thinking back 100 years, though (or 101, now that it&amp;rsquo;s 2011).&lt;/p&gt;
&lt;p&gt;JoAnne Yates&amp;rsquo; excellent 1989 book &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0801846137/bobducharmeA/&#34;&gt;Control through Communication: The Rise of System in American Management&lt;/a&gt; covers part of a topic that I&amp;rsquo;ve been interested in for a while: the change in information management that must have accompanied the industrial revolution. The factories making all those new things had to efficiently keep track of what they made and the parts that went into it if they wanted to make a profit selling those things. Yates&amp;rsquo; book covers several things that could be considered early content management, and Zuckerberg&amp;rsquo;s story reminded me of one part in particular. To quote her book,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Further changes in form were designed to make internal correspondence cheaper and more efficient to type, handle, and file. Writing in 1910 about what he called &amp;ldquo;interhouse correspondence,&amp;rdquo; or correspondence between different locations of a single company, one author recommended several changes in form that would make these documents look less like letters and more like present-day memos. His discussion is worth quoting at length, for it sheds light on the underlying reasons for the changes.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the first place, all unnecessary courtesy, such as &amp;ldquo;Fred Brown &amp;amp; Co.,&amp;rdquo; &amp;ldquo;Gentlemen,&amp;rdquo; &amp;ldquo;yours very truly,&amp;rdquo; and other phrases are omitted entirely. In a business where hundreds and sometimes thousands of interhouse letters are written daily the saving of time is considerable. Next, an expensive letterhead is done away with, and this also is a factor in reducing expense. The blank is made with simply the words, &amp;ldquo;From Chicago,&amp;rdquo; &amp;ldquo;From Atlanta,&amp;rdquo; or whatever may be the name of the town where the letter is written, printed in the upper left-hand corner, and underneath the word, &amp;ldquo;Subject.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;p&gt;The 1910 quote also recommends that internal letters include a serial number and that one letter replying to another should reference its serial number, or as I prefer to think of it, include a link to its unique ID. The book goes on to describe the origins of the memorandum—in later years, &amp;ldquo;memo&amp;rdquo;—which dispensed with the flowery niceties of traditional 19th-century correspondence because, in communication within a company, efficiency was more important than politeness conventions. Putting the message&amp;rsquo;s subject, date, and sender and recipient&amp;rsquo;s names in what we would now call a fielded metadata header made the information easier to digest, file, and receive. (Elsewhere, the book covers a 1902 recommendation that for easier filing and retrieval a piece of internal correspondence should cover no more than one topic—a century before DITA and over 60 years before Information Mapping.) The name of Frederick Taylor, who Dan Brickley mentioned in his &lt;a href=&#34;http://danbri.org/words/2011/01/01/650&#34;&gt;New Years blog posting&lt;/a&gt;, comes up often in Yates&amp;rsquo; book as a big influence on this thinking in general and Du Pont&amp;rsquo;s operations in particular.&lt;/p&gt;
&lt;p&gt;On the one hand, the way kids skip what they see as extraneous information seems to continue this trend. On the other hand, the things that I like about email that the kids don&amp;rsquo;t care about are the things that the Taylorites developed to help manage that content: clearly marked fields of information to make it easier to archive and retrieve the memos.&lt;/p&gt;
&lt;p&gt;Either way, it&amp;rsquo;s always interesting to look at long-term trends in information management by looking earlier than 1970, which computer scientists typically consider to be the stone age. I&amp;rsquo;d love any suggestions about related reading on the topic of information management during the industrial revolution.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://gromgull.net&#34; title=&#34;http://gromgull.net&#34;&gt;Gunnar&lt;/a&gt; on &lt;a href=&#34;#comment-2758&#34;&gt;January 7, 2011 5:50 AM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;On the other hand, the things that I like about email that the kids don&amp;rsquo;t care about are the things that the Taylorites developed to help manage that content: clearly marked fields of information to make it easier to archive and retrieve the memos.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Surely they also like them - at least if they ever have to find a piece of communication again at a later stage. However, they are boring to enter manually, because you know the system could easily enter it for you. After you clicked &amp;ldquo;send message&amp;rdquo; on some facebook page only the subject requires minor thought to fill in - and this could be &amp;ldquo;fixed&amp;rdquo; even without being (very) clever by just taking the first line (like Word suggests the filename when you save a new document).&lt;/p&gt;
&lt;p&gt;I.e. what they do not like is Connolly&amp;rsquo;s Bane - probably just like you :)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/words/&#34; title=&#34;http://danbri.org/words/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-2765&#34;&gt;January 11, 2011 10:12 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
Taylor would&amp;rsquo;ve loved Amazon&amp;rsquo;s Mechanical Turk. And see&lt;br /&gt;
&lt;a href=&#34;http://behind-the-enemy-lines.blogspot.com/2010/12/excerpts-from-communist-manifesto.html&#34;&gt;http://behind-the-enemy-lines.blogspot.com/2010/12/excerpts-from-communist-manifesto.html&lt;/a&gt; for some rather timely Marx quotes&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2011">2011</category>
      
      <category domain="https://www.bobdc.com//categories/technology-past">technology, past</category>
      
    </item>
    
    <item>
      <title>What REST is really about</title>
      <link>https://www.bobdc.com/blog/what-rest-is-really-about/</link>
      <pubDate>Fri, 19 Nov 2010 08:57:18 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/what-rest-is-really-about/</guid>
      
      
      <description><div>According to the primary source document.</div><div>&lt;p&gt;I had thought that &amp;ldquo;RESTful&amp;rdquo; meant &amp;ldquo;easily accessible with an HTTP GET, even when something isn&amp;rsquo;t HTML&amp;rdquo;. Shortly after a RESTafarian &lt;a href=&#34;https://www.bobdc.com/blog/restful-sparql-queries-of-rdfa#comment-2528&#34;&gt;pointed out&lt;/a&gt; that there was more to it than that, I went to Brian Sletten&amp;rsquo;s excellent presentation &lt;a href=&#34;http://semtech2010.semanticuniverse.com/sessionPop.cfm?confid=42&amp;amp;proposalid=3065&#34;&gt;REST: Information Architecture for the 21st Century&lt;/a&gt; at the Semantic Technologies conference and I learned a lot more about what being RESTful implies. During the presentation I asked Brian whether &lt;a href=&#34;http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm&#34;&gt;Roy Fielding&amp;rsquo;s 2000 doctoral thesis&lt;/a&gt; that originally laid out what REST was all about was readable, for a PhD thesis, and he assured me that it was.&lt;/p&gt;
&lt;blockquote id=&#34;id103359&#34; class=&#34;pullquote&#34;&gt;Anyone with a basic understanding of software architecture issues can and should read Fielding&#39;s thesis.&lt;/blockquote&gt;
&lt;p&gt;He was right. Anyone with a basic understanding of software architecture issues can and should read Fielding&amp;rsquo;s thesis. I wish I&amp;rsquo;d read it years ago. I&amp;rsquo;ve copied a few nice quotes here, starting with this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Software architecture research investigates methods for determining how best to partition a system, how components identify and communicate with each other, how information is communicated, how elements of a system can evolve independently, and how all of the above can be described using formal and informal notations.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What the acronym &amp;ldquo;Representational State Transfer&amp;rdquo; really means (emphasis mine):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;REST components perform actions on a resource by using a &lt;strong&gt;representation&lt;/strong&gt; to capture the current or intended &lt;strong&gt;state&lt;/strong&gt; of that resource and &lt;strong&gt;transferring&lt;/strong&gt; that representation between components. A representation is a sequence of bytes, plus representation metadata to describe those bytes. Other commonly used but less precise names for a representation include: document, file, and HTTP message entity, instance, or variant.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A component can select from a choice of representations, and I now better appreciate the important role that content negotiation plays in REST:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This abstract definition of a resource&amp;hellip; provides generality by encompassing many sources of information without artificially distinguishing them by type or implementation [and] allows late binding of the reference to a representation, enabling content negotiation to take place based on characteristics of the request. Finally, it allows an author to reference the concept rather than some singular representation of that concept, thus removing the need to change all existing links whenever the representation changes (assuming the author used the right identifier).&lt;/p&gt;
&lt;/blockquote&gt;
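&lt;p&gt;For example (a sketch with a made-up URL; real servers vary in the formats they offer), two requests can negotiate different representations of the same resource just by varying the Accept header:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;GET /reports/2010-q3 HTTP/1.1
Host: www.example.com
Accept: text/html

GET /reports/2010-q3 HTTP/1.1
Host: www.example.com
Accept: application/rdf+xml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first request asks for a hypertext representation and the second for a data-oriented one, but both use the same identifier, so links to the resource never need to change.&lt;/p&gt;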
&lt;p&gt;With all the talk now of REST interfaces to services that are not necessarily delivering hypertext documents, it&amp;rsquo;s interesting how often the thesis talks about REST being designed around hypermedia. The thesis&amp;rsquo;s introduction refers to it as &amp;ldquo;REST, a novel architectural style for distributed hypermedia systems,&amp;rdquo; and also mentions this,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;REST is defined by four interface constraints: identification of resources; manipulation of resources through representations; self-descriptive messages; and, hypermedia as the engine of application state.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;and this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;REST was originally referred to as the &amp;ldquo;HTTP object model,&amp;rdquo; but that name would often lead to misinterpretation of it as the implementation model of an HTTP server. The name &amp;ldquo;Representational State Transfer&amp;rdquo; is intended to evoke an image of how a well-designed Web application behaves: a network of web pages (a virtual state-machine), where the user progresses through the application by selecting links (state transitions), resulting in the next page (representing the next state of the application) being transferred to the user and rendered for their use.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&amp;ldquo;Resource&amp;rdquo; is a pretty commonly used term, with its position as the &amp;ldquo;R&amp;rdquo; in &amp;ldquo;RDF&amp;rdquo; being only the tip of the iceberg. So what exactly is a resource?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The resource is not the storage object. The resource is not a mechanism that the server uses to handle the storage object. The resource is a conceptual mapping—the server receives the identifier (which identifies the mapping) and applies it to its current mapping implementation (usually a combination of collection-specific deep tree traversal and/or hash tables) to find the currently responsible handler implementation and the handler implementation then selects the appropriate action+response based on the request content. All of these implementation-specific issues are hidden behind the Web interface; their nature cannot be assumed by a client that only has access through the Web interface.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This note on MIME&amp;rsquo;s relationship to HTTP was interesting:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;HTTP inherited its message syntax from MIME in order to retain commonality with other Internet protocols and reuse many of the standardized fields for describing media types in messages. Unfortunately, MIME and HTTP have very different goals, and the syntax is only designed for MIME&amp;rsquo;s goals.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Why shouldn&amp;rsquo;t you treat HTTP as a way to do Remote Procedure Calls? (And what&amp;rsquo;s my new favorite adjective to put in front of &amp;ldquo;scalable&amp;rdquo;?)&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What makes HTTP significantly different from RPC is that the requests are directed to resources using a generic interface with standard semantics that can be interpreted by intermediaries almost as well as by the machines that originate services. The result is an application that allows for layers of transformation and indirection that are independent of the information origin, which is very useful for an Internet-scale, multi-organization, anarchically scalable information system. RPC mechanisms, in contrast, are defined in terms of language APIs, not network-based applications.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;More on his carefully chosen terms &amp;ldquo;representation&amp;rdquo; and &amp;ldquo;transfer&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;HTTP is not designed to be a transport protocol. It is a transfer protocol in which the messages reflect the semantics of the Web architecture by performing actions on resources through the transfer and manipulation of representations of those resources. It is possible to achieve a wide range of functionality using this very simple interface, but following the interface is required in order for HTTP semantics to remain visible to intermediaries.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Keep in mind that this was published ten years ago, about a century in Internet time. It&amp;rsquo;s more relevant than ever, and I recommend that you put it high on your reading list.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://twitter.com/webr3&#34; title=&#34;http://twitter.com/webr3&#34;&gt;Nathan&lt;/a&gt; on &lt;a href=&#34;#comment-2711&#34;&gt;November 20, 2010 6:41 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Exactly Bob, if I had a company, the first tasks for any new starts would be to:&lt;/p&gt;
&lt;p&gt;1) Read the /full/ REST dissertation (and chapter 5&amp;amp;6 twice!).&lt;br /&gt;
2) Read the original design for the world wide web.&lt;br /&gt;
3) Read the early HTTP and HTML specs, and also as many Design Issues as possible.&lt;/p&gt;
&lt;p&gt;Regardless of whether they&amp;rsquo;re a junior developer or a time served senior architect.&lt;/p&gt;
&lt;p&gt;The only point I will add is to remember that each RFC and specification has its own definition of &amp;ldquo;resource&amp;rdquo;, with slight differences throughout; as TimBL recently pointed out, it&amp;rsquo;s not a universal term with universal meaning across all specs.&lt;/p&gt;
&lt;p&gt;Best,&lt;/p&gt;
&lt;p&gt;Nathan&lt;/p&gt;
&lt;p&gt;By Randy on &lt;a href=&#34;#comment-2712&#34;&gt;November 23, 2010 7:13 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you want to see how this line of thinking can be applied to software development in the smallest units of functionality, look at NetKernel (&lt;a href=&#34;http://www.1060research.com/netkernel/)&#34;&gt;http://www.1060research.com/netkernel/)&lt;/a&gt;. That software platform is based on a REST microkernel and allows you to build all of your software this way.&lt;/p&gt;
&lt;p&gt;&amp;ndash; Randy&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt; on &lt;a href=&#34;#comment-2713&#34;&gt;November 23, 2010 8:41 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As a matter of fact, in Brian&amp;rsquo;s talk he discussed NetKernel a lot. It looks pretty cool.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Playing more with SPARQL 1.1 property paths</title>
      <link>https://www.bobdc.com/blog/playing-more-with-sparql-11-pr/</link>
      <pubDate>Fri, 15 Oct 2010 08:58:09 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/playing-more-with-sparql-11-pr/</guid>
      
      
      <description><div>Some fun new features.</div><div>&lt;p&gt;I recently wrote about &lt;a href=&#34;https://www.bobdc.com/blog/trying-sparql-11-new-query-fea&#34;&gt;trying SPARQL 1.1 new query features with ARQ&lt;/a&gt;, and one thing I briefly tried was the new property paths feature. At the time, the query spec only had a placeholder for property paths, but the new version of it released yesterday has a &lt;a href=&#34;http://www.w3.org/TR/2010/WD-sparql11-query-20101014/#propertypaths&#34;&gt;detailed section&lt;/a&gt; on property paths with plenty of examples.&lt;/p&gt;
&lt;p&gt;I had seen the &lt;a href=&#34;http://www.w3.org/2009/sparql/docs/property-paths/Overview.xml&#34;&gt;separate document&lt;/a&gt; where this material was first drafted and tried out its examples (except for the &amp;ldquo;Subproperty&amp;rdquo; and &amp;ldquo;Elements in an RDF collection&amp;rdquo; ones), and they all worked fine with ARQ &lt;a href=&#34;http://sourceforge.net/projects/jena/files/ARQ/ARQ-2.8.5/&#34;&gt;2.8.5&lt;/a&gt;. If you want to try them yourself, &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/proppaths.zip&#34;&gt;this zip file&lt;/a&gt; has the sample data file that I mocked up and the 12 query files. (Thanks to Andy Seaborne for helping me straighten out my data and some of my tests.)&lt;/p&gt;
&lt;p&gt;They gave me more and more ideas for interesting queries that I can do with very little SPARQL code—for example, how to get a subtree of a hierarchy, or how to find nodes that have the same connection to the same nodes that a particular node has (for example, who likes the same bands that John likes, or who has the same friends that Jane has).&lt;/p&gt;
&lt;p&gt;If you didn&amp;rsquo;t see the separate property paths draft document and you&amp;rsquo;re interested in SPARQL, it&amp;rsquo;s definitely worth skimming &lt;a href=&#34;http://www.w3.org/TR/2010/WD-sparql11-query-20101014/#propertypaths&#34;&gt;section 9&lt;/a&gt; of the new query spec draft. There&amp;rsquo;s a lot of neat stuff there.&lt;/p&gt;
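&lt;p&gt;To give the flavor of that &amp;ldquo;who likes the same bands that John likes&amp;rdquo; idea, here&amp;rsquo;s an untested sketch (using a made-up ex: namespace, not one of the queries in the zip file above) that chains a forward path step with an inverse one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX ex: &amp;lt;http://www.example.org/ns#&amp;gt;
SELECT DISTINCT ?person
WHERE {
  # follow ex:likes from John to a band, then back from that band
  ex:John ex:likes/^ex:likes ?person .
  FILTER (?person != ex:John)
}
&lt;/code&gt;&lt;/pre&gt;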
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/words/&#34; title=&#34;http://danbri.org/words/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-2654&#34;&gt;October 15, 2010 1:12 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Re hierarchies, I&amp;rsquo;ve not looked into this yet properly,&amp;hellip; but perhaps it is then a good fit for SKOS data, which builds hierarchies with skos:broader links?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt;{.commenter-profile} on &lt;a href=&#34;#comment-2655&#34;&gt;October 15, 2010 1:17 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Dan,&lt;/p&gt;
&lt;p&gt;Hell yeah! I&amp;rsquo;ve already used it for that on work-related projects, where I&amp;rsquo;ve been getting much deeper into SKOS.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://scardf.org&#34; title=&#34;http://scardf.org&#34;&gt;Hrvoje Simic&lt;/a&gt; on &lt;a href=&#34;#comment-2662&#34;&gt;October 27, 2010 11:39 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When you say &amp;ldquo;how to get a subtree of a hierarchy&amp;rdquo;, do you mean something like in the example:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;?ancestor (ex:motherOf|ex:fatherOf)+&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;or did you have something more complex in mind, like extracting a subgraph (CONSTRUCT form)?&lt;/p&gt;
&lt;p&gt;Hrvoje&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt;{.commenter-profile} on &lt;a href=&#34;#comment-2663&#34;&gt;October 27, 2010 12:27 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Something in between, I suppose, but I don&amp;rsquo;t think that a CONSTRUCT form would be that complex. See &lt;a href=&#34;http://lists.w3.org/Archives/Public/public-esw-thes/2010Oct/0015.html&#34;&gt;http://lists.w3.org/Archives/Public/public-esw-thes/2010Oct/0015.html&lt;/a&gt; for a few SKOS examples that I wrote out. A CONSTRUCT version of example 1 there would be something like&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;CONSTRUCT { ?c ?p ?o }
WHERE {
   ?c skos:broader* i:VariableStars .
   ?c ?p ?o .
}
&lt;/code&gt;&lt;/pre&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Integrate disparate data sources with Semantic Web technology</title>
      <link>https://www.bobdc.com/blog/integrate-disparate-data-sourc/</link>
      <pubDate>Thu, 30 Sep 2010 08:34:47 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/integrate-disparate-data-sourc/</guid>
      
      
      <description><div>A new developerWorks article.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.ibm.com/developerworks/library/x-disprdf/index.html&#34;&gt;&lt;img id=&#34;id103324&#34; src=&#34;https://www.bobdc.com/img/main/dw-home2.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;developerWorks logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve given a presentation to both the &lt;a href=&#34;http://www.meetup.com/semweb-25/&#34;&gt;New York&lt;/a&gt; and &lt;a href=&#34;http://www.meetup.com/semweb-31/&#34;&gt;Washington D.C.&lt;/a&gt; semweb meetups about how useful semantic web technology can be even if your data isn&amp;rsquo;t stored as RDF. I showed a little app that pulls (fake) buy/sell/hold recommendations from an Excel spreadsheet, the latest stock quotes, and DBpedia data about the relevant companies from the appropriate sources, converts this data to RDF as necessary, and then combines it all into a nice-looking HTML report.&lt;/p&gt;
&lt;p&gt;The general architecture is more important than the specific implementation, and to make this clearer I implemented it with all free software and then again with TopQuadrant&amp;rsquo;s TopBraid Composer. I wrote an article about the free implementation that has just gone up on IBM developerWorks as &lt;a href=&#34;http://www.ibm.com/developerworks/library/x-disprdf/index.html&#34;&gt;Integrate disparate data sources with Semantic Web technology&lt;/a&gt;. It&amp;rsquo;s even a &lt;a href=&#34;http://www.ibm.com/developerworks/&#34;&gt;featured article&lt;/a&gt; for a few days.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.oilit.com&#34; title=&#34;http://www.oilit.com&#34;&gt;Neil McNaughton&lt;/a&gt; on &lt;a href=&#34;#comment-2644&#34;&gt;October 5, 2010 2:34 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A great article. Just one thing, I would have liked to see a few lines of the RDF generated from Excel - to see what is being captured and processed.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt;{.commenter-profile} on &lt;a href=&#34;#comment-2647&#34;&gt;October 5, 2010 9:46 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Neil,&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s pretty straightforward, e.g.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  &amp;lt;rdf:RDF
    xmlns:anrecs=&#39;http://www.snee.com/ns/analystRatings#&#39;
    xmlns:rdf=&#39;http://www.w3.org/1999/02/22-rdf-syntax-ns#&#39;
    xml:base=&#39;http://www.snee.com/ns/analystRatings#&#39;&amp;gt;

    &amp;lt;rdf:Description rdf:about=&#39;anrecs:1&#39;&amp;gt;
      &amp;lt;anrecs:analyst&amp;gt;Nick Perkins&amp;lt;/anrecs:analyst&amp;gt;
      &amp;lt;anrecs:tickersymbol&amp;gt;CAT&amp;lt;/anrecs:tickersymbol&amp;gt;
      &amp;lt;anrecs:company&amp;gt;Caterpillar Inc.&amp;lt;/anrecs:company&amp;gt;
      &amp;lt;anrecs:recommendation&amp;gt;SELL&amp;lt;/anrecs:recommendation&amp;gt;
      &amp;lt;anrecs:date-time&amp;gt;2010-07-14T13:36:00&amp;lt;/anrecs:date-time&amp;gt;
      &amp;lt;anrecs:description&amp;gt;Caterpillar has had an interesting quarter. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent sed lectus augue. Suspendisse nisl nisl, pulvinar eu luctus non, sodales non magna. Sed in metus arcu, sit amet ornare nunc. Duis fermentum, nibh quis fermentum sagittis, mi eros porttitor magna, sed dictum tortor quam ut lectus. Praesent eu est augue.&amp;lt;/anrecs:description&amp;gt;
    &amp;lt;/rdf:Description&amp;gt;

  &amp;lt;!-- etc. --&amp;gt;
  &amp;lt;/rdf:RDF&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Les Frères DuCharmes</title>
      <link>https://www.bobdc.com/blog/les-freres-ducharmes/</link>
      <pubDate>Tue, 28 Sep 2010 09:15:06 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/les-freres-ducharmes/</guid>
      
      
      <description><div>Making music with dodgy electronics, in a damp basement, in 1983.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.mcylinder.com/happenings/les-freres-du-charmes/&#34;&gt;&lt;img id=&#34;id103326&#34; src=&#34;http://www.mcylinder.com/wp/wp-content/uploads/2010/08/reel-238x300.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; width=&#34;80px&#34; vspace=&#34;30px&#34; alt=&#34;tape deck graphic&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;My brother Peter wrote a &lt;a href=&#34;http://www.mcylinder.com/happenings/les-freres-du-charmes/&#34;&gt;blog entry&lt;/a&gt; on some musical experiments that he and I did together 27 years ago. I&amp;rsquo;m confident that James &amp;ldquo;LCD Soundsystem&amp;rdquo; Murphy never heard our &amp;ldquo;Stop This Crazy Thing&amp;rdquo;, but his &lt;a href=&#34;http://www.youtube.com/watch?v=OzuFeXYbOOo&#34;&gt;Losing My Edge&lt;/a&gt; 19 years later might give that impression. I guess any fan of &lt;a href=&#34;http://en.wikipedia.org/wiki/99_Records&#34;&gt;99 Records&lt;/a&gt; bands who owned a &lt;a href=&#34;http://www.sonicstate.com/synth/casio_pt-20/&#34;&gt;Casio PT-20&lt;/a&gt; might have done something similar.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Fallback with SPARQL</title>
      <link>https://www.bobdc.com/blog/fallback-with-sparql/</link>
      <pubDate>Wed, 22 Sep 2010 09:09:48 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/fallback-with-sparql/</guid>
      
      
      <description><div>&#34;Use this term if available, else fall back to that one&#34;.</div><div>&lt;p&gt;Last April Richard Cyganiak &lt;a href=&#34;http://twitter.com/cygri/status/11896004026&#34;&gt;tweeted&lt;/a&gt; the following:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;@iand @ldodds &amp;ldquo;use this term if available, else fall back to that one&amp;rdquo; is common when consuming RDF, not well supported by SPARQL or RDFS&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I took this as a challenge (if not as a very pressing one, if I waited this long to follow through). I managed to write a SPARQL query that reads the following data and sets &lt;code&gt;?label&lt;/code&gt; to the &lt;code&gt;skos:prefLabel&lt;/code&gt; value if it&amp;rsquo;s available and otherwise to the &lt;code&gt;rdfs:label&lt;/code&gt; value:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix : &amp;lt;http://rdfdata.org/whatever#&amp;gt; .

:thing1 rdfs:label &amp;quot;Robert&amp;quot;; skos:prefLabel &amp;quot;Bob&amp;quot; .
:thing2 rdfs:label &amp;quot;Jane&amp;quot;.
:thing3 skos:prefLabel &amp;quot;Frank&amp;quot;.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here&amp;rsquo;s the output, using &lt;a href=&#34;http://jena.sourceforge.net/ARQ/&#34;&gt;ARQ&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-----------
| label   |
===========
| &amp;quot;Frank&amp;quot; |
| &amp;quot;Bob&amp;quot;   |
| &amp;quot;Jane&amp;quot;  |
-----------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here&amp;rsquo;s a SPARQL 1.0 version of the query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;

SELECT ?label    # Bind ?label to
WHERE {
  {              # skos:prefLabel if available
    ?s skos:prefLabel ?label .
  }
  UNION          # and rdfs:label if not.
  {
   ?s rdfs:label ?label .
   OPTIONAL { ?s skos:prefLabel ?prefLabel .}
   FILTER (!bound(?prefLabel)) .
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It asks for the union of any &lt;code&gt;skos:prefLabel&lt;/code&gt; values and any &lt;code&gt;rdfs:label&lt;/code&gt; values but to filter out any of the latter that have a &lt;code&gt;skos:prefLabel&lt;/code&gt; property for the same subject. The query is verbose, and the FILTER(!bound()) trick is non-intuitive enough to have inspired two nicer substitutes in SPARQL 1.1: MINUS and FILTER NOT EXISTS. Here&amp;rsquo;s the query with MINUS:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;

SELECT ?label    # Bind ?label to
WHERE {
  {              # skos:prefLabel if available
    ?s skos:prefLabel ?label .
  }
  UNION          # and rdfs:label if not.
  {
   ?s rdfs:label ?label .
   MINUS { ?s skos:prefLabel ?prefLabel }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You could substitute FILTER NOT EXISTS for MINUS there, and it would work the same way with a SPARQL engine that implements 1.1 &lt;a href=&#34;https://www.bobdc.com/blog/trying-sparql-11-new-query-fea&#34;&gt;such as ARQ&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s one less line than the SPARQL 1.0 version, and a bit easier to read, but it&amp;rsquo;s still a verbose way to assign &lt;code&gt;skos:prefLabel&lt;/code&gt; to &lt;code&gt;?label&lt;/code&gt; if it&amp;rsquo;s available and otherwise &lt;code&gt;rdfs:label&lt;/code&gt;. The important thing, though, is that it can be done with standard SPARQL, and that it&amp;rsquo;s a little easier with 1.1.&lt;/p&gt;
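&lt;p&gt;Another SPARQL 1.1 possibility, which I haven&amp;rsquo;t tested, so treat it as a sketch: bind each label to its own variable with OPTIONAL and let the new COALESCE function pick the first one that&amp;rsquo;s bound. Because the two OPTIONAL patterns bind different variables, nothing here depends on their order:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;

SELECT DISTINCT ?s (COALESCE(?preferred, ?plain) AS ?label)
WHERE {
  ?s ?anyProperty ?anyValue .
  OPTIONAL { ?s skos:prefLabel ?preferred }
  OPTIONAL { ?s rdfs:label ?plain }
}
&lt;/code&gt;&lt;/pre&gt;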
&lt;p&gt;Can you improve on this query at all?&lt;/p&gt;
&lt;h2 id=&#34;8-comments&#34;&gt;8 Comments&lt;/h2&gt;
&lt;p&gt;By Damian on &lt;a href=&#34;#comment-2626&#34;&gt;September 22, 2010 10:47 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;[Argh, captcha and validation are killing me]&lt;/p&gt;
&lt;p&gt;What about:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
?s ?p ?o . # or limit to a type
OPTIONAL { ?s skos:prefLabel ?label . }
OPTIONAL { ?s rdfs:label ?label . }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;COALESCE would be nicer, however.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.furia.com&#34; title=&#34;http://www.furia.com&#34;&gt;glenn mcdonald&lt;/a&gt; on &lt;a href=&#34;#comment-2627&#34;&gt;September 22, 2010 11:23 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Seems unfortunate that you have to repeat a whole pattern to get this to work, as the pattern you want in a real-world case could be substantially more complicated than this one. Is there a way to get both labels and then LIMIT 1, inside a subquery?&lt;/p&gt;
&lt;p&gt;[In Thread that would be &amp;ldquo;Subject|(.prefLabel,Label:#1)&amp;rdquo;, although there&amp;rsquo;s also a built-in &amp;ldquo;otherwise&amp;rdquo; feature so this could be just &amp;ldquo;Subject|(.prefLabel;Label)&amp;rdquo;.]&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt;{.commenter-profile} on &lt;a href=&#34;#comment-2632&#34;&gt;September 22, 2010 12:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Damian,&lt;/p&gt;
&lt;p&gt;With ARQ, that gave me &amp;ldquo;Bob&amp;rdquo; twice, so I added ?s and ?p to the select statement and got this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-------------------------------------------------------------------
| s                                    | p              | label   |
===================================================================
| &amp;lt;http://rdfdata.org/whatever#thing3&amp;gt; | skos:prefLabel | &amp;quot;Frank&amp;quot; |
| &amp;lt;http://rdfdata.org/whatever#thing2&amp;gt; | rdfs:label     | &amp;quot;Jane&amp;quot;  |
| &amp;lt;http://rdfdata.org/whatever#thing1&amp;gt; | skos:prefLabel | &amp;quot;Bob&amp;quot;   |
| &amp;lt;http://rdfdata.org/whatever#thing1&amp;gt; | rdfs:label     | &amp;quot;Bob&amp;quot;   |
-------------------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That&amp;rsquo;s close to what I was looking for, but obviously there&amp;rsquo;s a problem&amp;ndash;I think rdfs:label bound ?label to &amp;ldquo;Robert&amp;rdquo; and then it got overwritten with &amp;ldquo;Bob&amp;rdquo; so that there are the two &amp;ldquo;Bob&amp;rdquo; results.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://thefigtrees.net/lee/blog&#34; title=&#34;http://thefigtrees.net/lee/blog&#34;&gt;Lee Feigenbaum&lt;/a&gt; on &lt;a href=&#34;#comment-2633&#34;&gt;September 22, 2010 3:04 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Damian&amp;rsquo;s way is the standard way to do it. The only reason you&amp;rsquo;re getting the duplicates is because you&amp;rsquo;re selecting out the predicate as well and selecting for ?s ?p ?o before the optional.&lt;/p&gt;
&lt;p&gt;if you did:&lt;/p&gt;
&lt;p&gt;?s rdf:type &amp;lt;whatever&amp;gt;&lt;/p&gt;
&lt;p&gt;followed by the optionals, you&amp;rsquo;d get a single result for each as expected.&lt;/p&gt;
&lt;p&gt;By Damian on &lt;a href=&#34;#comment-2634&#34;&gt;September 22, 2010 3:42 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;The explanation for that is nothing to do with my trick, but rather the ?s ?p ?o business before it.&lt;/p&gt;
&lt;p&gt;For this trick to work you need ?s to be bound, so (for demo purposes) I added ?s ?p ?o. What you&amp;rsquo;re seeing is each triple, plus the (correct) label given the subject. There are two triples with thing1 as a subject, hence &amp;ldquo;Bob&amp;rdquo; is returned twice.&lt;/p&gt;
&lt;p&gt;If you add types to the subjects you can try:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;select ?s ?label
{
?s a skos:Concept .
OPTIONAL { ?s skos:prefLabel ?label . }
OPTIONAL { ?s rdfs:label ?label . }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which gives the expected answer.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt;{.commenter-profile} on &lt;a href=&#34;#comment-2635&#34;&gt;September 22, 2010 3:58 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Lee,&lt;/p&gt;
&lt;p&gt;That works if I assign rdf:type values to each of the resources in the data file. I assume there&amp;rsquo;s no other way to do it with the data as shown?&lt;/p&gt;
&lt;p&gt;Also, if I do it like this (with the rdfs:label pattern first),&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;OPTIONAL { ?s rdfs:label ?label . }
OPTIONAL { ?s skos:prefLabel ?label . }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;?label gets bound to &amp;ldquo;Robert&amp;rdquo;, not &amp;ldquo;Bob&amp;rdquo;, I assume because it was looking for an rdfs:label value first. I didn&amp;rsquo;t realize that the order could be used to control things this way. I just looked through section 6 of the 1.1. Query spec and didn&amp;rsquo;t see anything about this; where can I find something in the spec about the effect of ordering the OPTIONAL clauses?&lt;/p&gt;
&lt;p&gt;thanks,&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://thefigtrees.net/lee/blog&#34; title=&#34;http://thefigtrees.net/lee/blog&#34;&gt;Lee Feigenbaum&lt;/a&gt; on &lt;a href=&#34;#comment-2636&#34;&gt;September 22, 2010 4:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The order dependence of OPTIONAL clauses is an artifact of the semantics of OPTIONAL (LeftJoin in the algebra).&lt;/p&gt;
&lt;p&gt;Lee&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt;{.commenter-profile} on &lt;a href=&#34;#comment-2637&#34;&gt;September 22, 2010 10:25 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And now I see, Lee, that your &lt;a href=&#34;http://www.thefigtrees.net/lee/sw/sparql-faq#alternative-predicates&#34;&gt;General SPARQL Discussion&lt;/a&gt; and &lt;a href=&#34;http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_using_spa.html&#34;&gt;this blog post&lt;/a&gt; that you wrote covered the very issue I was wondering about long ago!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Semantic technology: more than the tools</title>
      <link>https://www.bobdc.com/blog/semantic-technology-more-than/</link>
      <pubDate>Thu, 16 Sep 2010 09:42:38 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-technology-more-than/</guid>
      
      
      <description><div>Nice tools, though.</div><div>&lt;p&gt;At one point in the &lt;a href=&#34;http://xmlsummerschool.com/curriculum-2010/semantic-technologies-2010/&#34;&gt;semantic technologies&lt;/a&gt; track of last week&amp;rsquo;s &lt;a href=&#34;http://xmlsummerschool.com&#34;&gt;XML Summer School&lt;/a&gt;, I showed a little application I wrote where you enter the names of two film directors on a &lt;a href=&#34;http://www.snee.com/sparqlforms/commonActors.html&#34;&gt;form&lt;/a&gt;, click the search button, and then see a list of all actors who&amp;rsquo;ve been in movies by both directors. The form calls a CGI script that creates a short SPARQL query, runs it, and generates an HTML page of the results. You can read more about it in the developerWorks article &lt;a href=&#34;http://www.ibm.com/developerworks/xml/library/x-wikiquery/?ca=dgr-lnxw82%20SPARQL-DBpedia&amp;amp;S_TACT=105AGX59&amp;amp;S_CMP=grlnxw82&#34;&gt;Build Wikipedia query forms with semantic technology&lt;/a&gt;; this particular form doesn&amp;rsquo;t use Wikipedia data but &lt;a href=&#34;http://www.imdb.com&#34;&gt;IMDB&lt;/a&gt; data from the &lt;a href=&#34;http://www.linkedmdb.org/&#34;&gt;Linked Movie Database&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote id=&#34;id103360&#34; class=&#34;pullquote&#34;&gt;The semantic web is really about the combination of tools and data.&lt;/blockquote&gt;
&lt;p&gt;The point of the demonstration was that semantic technology isn&amp;rsquo;t about everyone learning SPARQL, but about SPARQL becoming another technology to put behind interfaces such as web forms, just like JavaScript and other scripting languages. After I demoed the form, &lt;a href=&#34;http://en.wikipedia.org/wiki/Michael_Kay_(software_engineer)&#34;&gt;Michael Kay&lt;/a&gt; (famous for XSLT and XQuery work in general and the &lt;a href=&#34;http://saxon.sourceforge.net/&#34;&gt;Saxon&lt;/a&gt; processor in particular) told me that if the data had been available in XML he could have done the same thing with XQuery, and he wondered what exactly SPARQL added. I don&amp;rsquo;t remember exactly what I said, but with excellent hindsight I thought of a much better answer the following day.&lt;/p&gt;
&lt;p&gt;Michael and several speakers from other tracks had come to the semantic web track with a reasonable question on their minds: what does this set of tools offer them that other sets of tools don&amp;rsquo;t? A flippant response to his question about XQuery would be &amp;ldquo;if the data had been available in XML? That&amp;rsquo;s an awfully big if!&amp;rdquo; A better answer would be that while some people focus on the tools, and others on the (linked) data, the semantic web is really about the combination of tools and data. If IMDB offered an SQL interface to their data, I could use that to list all the actors who&amp;rsquo;ve worked with two particular directors, but that too is a very big if. A SPARQL endpoint seems to be the most popular way to expose machine-readable data these days, even if the underlying data is relational. The Linked Movie Database offers the data in a SPARQL endpoint, so SPARQL is the query language for retrieving this information, and the sending of the query and the display of the results were easy with a CGI script.&lt;/p&gt;
&lt;p&gt;So the answer to the question &amp;ldquo;what does this set of tools offer that other sets of tools don&amp;rsquo;t&amp;rdquo; is this: lots of great data to query, not to mention easy ways to convert other data formats for use by these tools. If you want to combine this data from multiple sources from across the web and from behind your own firewall, the ease with which the underlying data model lets you aggregate data is another big advantage. So, tools plus data plus ease of aggregation are what the semantic web has to offer, and that&amp;rsquo;s quite a lot.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://twitter.com/vpenela&#34; title=&#34;http://twitter.com/vpenela&#34;&gt;Víctor Penela&lt;/a&gt; on &lt;a href=&#34;#comment-2623&#34;&gt;September 16, 2010 12:21 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A similar application from a friend of mine: &lt;a href=&#34;http://10k.aneventapart.com/Uploads/310/&#34;&gt;http://10k.aneventapart.com/Uploads/310/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Totally agree on the need for more semantically aware apps, where the goal (the application) is the key, and not the means (the semantic technologies).&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Trying SPARQL 1.1 new query features with ARQ</title>
      <link>https://www.bobdc.com/blog/trying-sparql-11-new-query-fea/</link>
      <pubDate>Wed, 18 Aug 2010 09:27:02 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/trying-sparql-11-new-query-fea/</guid>
      
      
      <description><div>Just about all there.</div><div>&lt;p&gt;When I learned that &lt;a href=&#34;http://tech.groups.yahoo.com/group/jena-dev/message/44671&#34;&gt;release 2.8.5 of ARQ&lt;/a&gt; implements all of SPARQL 1.1 Query (&amp;ldquo;except for corner cases of property paths&amp;rdquo;, and Andy Seaborne recently told me that they&amp;rsquo;ve finished up that part) I decided to try out some of the SPARQL 1.1 features, and it was all pretty easy. I used &lt;a href=&#34;https://www.bobdc.com/blog/using-the-arq-sparql-processor&#34;&gt;ARQ from the command line&lt;/a&gt; and went through &lt;a href=&#34;http://www.slideshare.net/LeeFeigenbaum/sparql2-status&#34;&gt;Lee Feigenbaum&amp;rsquo;s slides on the status of SPARQL 1.1&lt;/a&gt; as a checklist of things to try. For sample queries and data I tried to use the examples in the &lt;a href=&#34;http://www.w3.org/TR/2010/WD-sparql11-query-20100601/&#34;&gt;SPARQL 1.1&lt;/a&gt; spec wherever possible, but sometimes expanded on them a bit.&lt;/p&gt;
&lt;h2 id=&#34;id103344&#34;&gt;Projected expressions&lt;/h2&gt;
&lt;p&gt;When I tried &lt;a href=&#34;http://www.slideshare.net/LeeFeigenbaum/sparql2-status/6&#34;&gt;projected expressions&lt;/a&gt; using the &lt;a href=&#34;http://www.w3.org/TR/2010/WD-sparql11-query-20100601/#CreatingValuesWithExpressions&#34;&gt;spec&amp;rsquo;s example&lt;/a&gt; I got an error because of the sample query&amp;rsquo;s use of the fn:concat function, but when I added the following prefix declaration to the query it worked with no problem:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX fn: &amp;lt;http://www.w3.org/2005/xpath-functions#&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
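&lt;p&gt;With that prefix added, the query looked roughly like this (my reconstruction of the draft&amp;rsquo;s example, so treat the details as approximate):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;
PREFIX fn: &amp;lt;http://www.w3.org/2005/xpath-functions#&amp;gt;

SELECT (fn:concat(?G, &amp;quot; &amp;quot;, ?S) AS ?name)
WHERE { ?P foaf:givenName ?G ; foaf:surname ?S }
&lt;/code&gt;&lt;/pre&gt;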
&lt;h2 id=&#34;id103378&#34;&gt;Aggregates&lt;/h2&gt;
&lt;p&gt;To test &lt;a href=&#34;http://www.slideshare.net/LeeFeigenbaum/sparql2-status/7&#34;&gt;aggregates&lt;/a&gt;, the spec&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/TR/2010/WD-sparql11-query-20100601/#aggregateExample&#34;&gt;example&lt;/a&gt; works, but I added the following two lines to the end of the example&amp;rsquo;s data file so that org2&amp;rsquo;s author had a book with a price greater than 10:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;:auth3 :writesBook :book5 . 
:book5 :price 17 .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This helped to demonstrate the GROUP BY part of the query better.&lt;/p&gt;
&lt;h2 id=&#34;id103415&#34;&gt;Negation&lt;/h2&gt;
&lt;p&gt;The MINUS keyword and the ability to use the NOT operator with EXISTS both provide a cleaner alternative to the FILTER(!bound(?varName)) trick used in SPARQL 1.0 to make missing values part of the retrieval criteria. Lee has a slide on &lt;a href=&#34;http://www.slideshare.net/LeeFeigenbaum/sparql2-status/9&#34;&gt;MINUS&lt;/a&gt;, but not on NOT EXISTS. I tried NOT EXISTS as well because it&amp;rsquo;s new and is grouped together with MINUS in the spec. The spec even has a subsection on &lt;a href=&#34;http://www.w3.org/TR/2010/WD-sparql11-query-20100601/#neg-notexists-minus&#34;&gt;the relationship and difference between NOT EXISTS and MINUS&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Using ARQ, the spec&amp;rsquo;s examples for both &lt;a href=&#34;http://www.w3.org/TR/2010/WD-sparql11-query-20100601/#neg-minus&#34;&gt;MINUS&lt;/a&gt; and &lt;a href=&#34;http://www.w3.org/TR/2010/WD-sparql11-query-20100601/#neg-notexists&#34;&gt;NOT EXISTS&lt;/a&gt; worked just fine.&lt;/p&gt;
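&lt;p&gt;To give the flavor of the new syntax, a query for people with no recorded name (a paraphrase of the spec&amp;rsquo;s example rather than a verbatim copy) can now read:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;
PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;

SELECT ?person
WHERE {
  ?person rdf:type foaf:Person .
  FILTER NOT EXISTS { ?person foaf:name ?name }
}
&lt;/code&gt;&lt;/pre&gt;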
&lt;h2 id=&#34;id103469&#34;&gt;Property paths&lt;/h2&gt;
&lt;p&gt;The spec only has a &lt;a href=&#34;http://www.w3.org/TR/2010/WD-sparql11-query-20100601/#propertypaths&#34;&gt;placeholder for this&lt;/a&gt; for now, but when I made up my own example after looking at Lee&amp;rsquo;s slide it worked fine. Here&amp;rsquo;s my sample data:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix : &amp;lt;http://rdfdata.org/whatever#&amp;gt; .
:jane :knows :frank . 
:frank :knows :sarah . 
:sarah :knows :steve . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The query asks for everyone that jane knows and whoever they know, transitively.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX : &amp;lt;http://rdfdata.org/whatever#&amp;gt; 
SELECT ?person 
WHERE {
  :jane :knows+ ?person .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I may not have perfectly described the semantics of what that plus sign does here (an asterisk is another option), but you get the idea when you see the result:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;----------
| person |
==========
| :frank |
| :sarah |
| :steve |
----------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will let SPARQL-based applications do even more without needing an OWL inference engine, especially when used with the rdfs:subClassOf property.&lt;/p&gt;
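&lt;p&gt;For example, with class data added to a dataset like the one above (an untested sketch with a made-up :Animal class), one path expression retrieves all instances of :Animal or any of its subclasses, however deep the class tree goes, with no inferencing engine involved:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX : &amp;lt;http://rdfdata.org/whatever#&amp;gt;
PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;

SELECT ?instance
WHERE { ?instance rdf:type/rdfs:subClassOf* :Animal }
&lt;/code&gt;&lt;/pre&gt;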
&lt;h2 id=&#34;id103529&#34;&gt;Federated queries and subqueries&lt;/h2&gt;
&lt;p&gt;SPARQL 1.1 uses the same syntax for these that Jena (the framework behind ARQ) always had as an extension, so I&amp;rsquo;ve demonstrated these before in my &lt;a href=&#34;https://www.bobdc.com/blog/federated-sparql-queries&#34;&gt;Federated SPARQL queries&lt;/a&gt; blog entry of last January.&lt;/p&gt;
&lt;h2 id=&#34;id103552&#34;&gt;Time to play more with SPARQL 1.1&lt;/h2&gt;
&lt;p&gt;SPARQL 1.1 is no longer just a specification and something to debate about, but something we can actually play with, so go and do so and let the SPARQL Working Group know what you think. (See the paragraph beginning &amp;ldquo;Comments on this document&amp;hellip;&amp;rdquo; in the &lt;a href=&#34;http://www.w3.org/TR/2010/WD-sparql11-query-20100601/&#34;&gt;Working Draft&lt;/a&gt;.) Personally, I&amp;rsquo;d prefer to see the spec include &lt;a href=&#34;http://www.w3.org/2009/sparql/wiki/Feature:Assignment&#34;&gt;variable assignment&lt;/a&gt;, which is &lt;a href=&#34;http://jena.sourceforge.net/ARQ/assignment.html&#34;&gt;already part of Jena and ARQ&lt;/a&gt; (and Open Anzo) as an extension. I know that SPARQL 1.1&amp;rsquo;s new projected expression and subquery features can be combined for a similar effect, but that&amp;rsquo;s going to be pretty verbose. What do you think?&lt;/p&gt;
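&lt;p&gt;To show what I mean about verbosity, here is a sketch (with made-up :price and :tax properties) of the LET extension in ARQ next to an equivalent built from a subquery with a projected expression:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX : &amp;lt;http://rdfdata.org/whatever#&amp;gt; 

# The LET extension in ARQ (not part of the SPARQL 1.1 draft):
SELECT ?item ?total 
WHERE {
  ?item :price ?p ; :tax ?t .
  LET ( ?total := ?p + ?t )
}

# A similar effect with a subquery and a projected expression:
SELECT ?item ?total 
WHERE {
  { SELECT ?item ((?p + ?t) AS ?total)
    WHERE { ?item :price ?p ; :tax ?t }
  }
}
&lt;/code&gt;&lt;/pre&gt;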
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Converting CSV to RDF</title>
      <link>https://www.bobdc.com/blog/converting-csv-to-rdf/</link>
      <pubDate>Wed, 11 Aug 2010 08:22:22 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/converting-csv-to-rdf/</guid>
      
      
      <description><div>The simplest way yet.</div><div>&lt;p&gt;There are probably dozens of ways to convert comma-separated values to parsable RDF, but I recently came up with one that was so simple that I wanted to share it.&lt;/p&gt;
&lt;p&gt;Here is a sample CSV list:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;quot;red&amp;quot; , &amp;quot;blue&amp;quot;, &amp;quot;gray&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If I put the following before it and a period after it,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix : &amp;lt;http://rdfdata.org/csv#&amp;gt; . :csvList :item 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I get this: parsable RDF using the Turtle syntax.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix : &amp;lt;http://rdfdata.org/csv#&amp;gt; . :csvList :item &amp;quot;red&amp;quot; , &amp;quot;blue&amp;quot;, &amp;quot;gray&amp;quot; . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That&amp;rsquo;s it. It works as a single line like that, but it&amp;rsquo;s easier for human eyes to read if you look at it as an abbreviated version of the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix : &amp;lt;http://rdfdata.org/csv#&amp;gt; .  
:csvList :item &amp;quot;red&amp;quot; .
:csvList :item &amp;quot;blue&amp;quot; .
:csvList :item &amp;quot;gray&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or, &amp;ldquo;the csvList resource has &amp;lsquo;red&amp;rsquo;, &amp;lsquo;blue&amp;rsquo;, and &amp;lsquo;gray&amp;rsquo; as item property values&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;I just made up the URI, subject, and predicate. Your next step would probably be to use SPARQL to convert them to something more appropriate to your application.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve used the semicolon in Turtle and SPARQL many times to avoid repeating a triple&amp;rsquo;s subject for multiple triples. I&amp;rsquo;ve used the comma, which delimits a list of objects that go with the same subject and predicate, less often, and it&amp;rsquo;s the key to the trick here: that a CSV list is already a &lt;a href=&#34;http://www.w3.org/TeamSubmission/turtle/#groups&#34;&gt;part of Turtle syntax&lt;/a&gt;.&lt;/p&gt;
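&lt;p&gt;Side by side, with a made-up :source property for the semicolon example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# A semicolon repeats the subject for a new predicate and object:
:csvList :item &amp;quot;red&amp;quot; ;
         :source &amp;quot;demo&amp;quot; .

# A comma repeats both the subject and the predicate:
:csvList :item &amp;quot;red&amp;quot; , &amp;quot;blue&amp;quot; , &amp;quot;gray&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;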
&lt;p&gt;Converting CSV data to RDF in just about any programming language would be a very short script, and it&amp;rsquo;s easy enough with products such as TopBraid Composer, so I&amp;rsquo;m not interested in accumulating a list of other ways to do it here, unless you can beat mine for simplicity. I just thought it was neat that something as simple as prepending a short string and appending a period would turn a CSV list into legal, parsable RDF.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.flickr.com/photos/jacksnell707/3485929405/&#34;&gt;&lt;img id=&#34;id103304&#34; src=&#34;https://c2.staticflickr.com/4/3664/3485929405_046e5f1d20.jpg&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;1971 Chevrolet Nova (Custom) &#39;937 CSV&#39; 1&#34; width=&#34;280px&#34;/&gt;&lt;/a&gt; (photo: &lt;a href=&#34;http://www.flickr.com/photos/jacksnell707/&#34;&gt;http://www.flickr.com/photos/jacksnell707/&lt;/a&gt; / &lt;a href=&#34;http://creativecommons.org/licenses/by-nc-sa/2.0/deed.en&#34;&gt;CC BY-NC-SA 2.0)&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;7-comments&#34;&gt;7 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.furia.com&#34; title=&#34;http://www.furia.com&#34;&gt;glenn mcdonald&lt;/a&gt; on &lt;a href=&#34;#comment-2599&#34;&gt;August 11, 2010 10:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Cute! Sadly, CSV escapes internal quotes by doubling them, and Turtle requires them to be escaped as \&amp;quot;, so this trick will only work if you have no quotes in your data.&lt;/p&gt;
&lt;p&gt;Of course, it also totally fails to capture any of the semantics of table rows/columns/cells, so it&amp;rsquo;s not like you were going to use it for real!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2600&#34;&gt;August 11, 2010 11:43 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Glenn,&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;re assuming that spreadsheets are the only source of CSV. I&amp;rsquo;ve already used this trick for real, when I was passing a few values from a Javascript script to a SPARQL query that was acting on RDF data combined from several sources.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.furia.com&#34; title=&#34;http://www.furia.com&#34;&gt;glenn mcdonald&lt;/a&gt; on &lt;a href=&#34;#comment-2601&#34;&gt;August 11, 2010 1:28 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Good point. Replace the word &amp;ldquo;real&amp;rdquo; in my comment with &amp;ldquo;whole spreadsheets&amp;rdquo;!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.linkedopenservices.org/&#34; title=&#34;http://www.linkedopenservices.org/&#34;&gt;Barry Norton&lt;/a&gt; on &lt;a href=&#34;#comment-2737&#34;&gt;December 9, 2010 4:31 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, that&amp;rsquo;s neither a list in CSV nor RDF/Turtle.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a row in CSV (being picky, but I&amp;rsquo;ll explain why in a second), but more importantly you&amp;rsquo;ve created a set in RDF, not a list.&lt;/p&gt;
&lt;p&gt;The Turtle list syntax would be:&lt;/p&gt;
&lt;p&gt;:csvRow rdf:value (&amp;ldquo;red&amp;rdquo; &amp;ldquo;blue&amp;rdquo; &amp;ldquo;grey&amp;rdquo;)&lt;/p&gt;
&lt;p&gt;Why does that matter? Because, as much as I agree with your &amp;ldquo;next step would probably be to use SPARQL to convert them to something more appropriate to your application&amp;rdquo; (it&amp;rsquo;s what we do in the JSON2RDF approach of Linked Open Services), you&amp;rsquo;ve lost the structure and can&amp;rsquo;t differentiate between columns in a graph pattern.&lt;/p&gt;
&lt;p&gt;Jumping back to the comment about CSV rows, this is clearer if (instead of having homogeneous data across columns in your source), you had something like:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;red&amp;rdquo;, &amp;ldquo;FF0000&amp;rdquo;&lt;br /&gt;
&amp;ldquo;green&amp;rdquo;, &amp;ldquo;00FF00&amp;rdquo;&lt;br /&gt;
&amp;ldquo;blue&amp;rdquo;, &amp;ldquo;0000FF&amp;rdquo;&lt;br /&gt;
&amp;ldquo;yellow&amp;rdquo;, &amp;ldquo;FFFF00&amp;rdquo;&lt;/p&gt;
&lt;p&gt;You could project this into a list of lists:&lt;/p&gt;
&lt;p&gt;((&amp;ldquo;red&amp;rdquo; &amp;ldquo;FF0000&amp;rdquo;)&lt;br /&gt;
(&amp;ldquo;green&amp;rdquo; &amp;ldquo;00FF00&amp;rdquo;)&lt;br /&gt;
(&amp;ldquo;blue&amp;rdquo; &amp;ldquo;0000FF&amp;rdquo;)&lt;br /&gt;
(&amp;ldquo;yellow&amp;rdquo; &amp;ldquo;FFFF00&amp;rdquo;))&lt;/p&gt;
&lt;p&gt;(A valid Turtle doc being:&lt;/p&gt;
&lt;p&gt;Then you could actually make a construct like:&lt;/p&gt;
&lt;p&gt;CONSTRUCT&lt;br /&gt;
{?item rdfs:label ?colour; rdf:value ?code}&lt;br /&gt;
WHERE&lt;br /&gt;
{[rdf:first ?item] .&lt;br /&gt;
?item rdf:first ?colour; rdf:rest [rdf:first ?code]}&lt;/p&gt;
&lt;p&gt;Leading to:&lt;br /&gt;
[rdfs:label &amp;ldquo;red&amp;rdquo;; rdf:value &amp;ldquo;FF0000&amp;rdquo;] .&lt;br /&gt;
[rdfs:label &amp;ldquo;green&amp;rdquo;; rdf:value &amp;ldquo;00FF00&amp;rdquo;] .&lt;br /&gt;
[rdfs:label &amp;ldquo;blue&amp;rdquo;; rdf:value &amp;ldquo;0000FF&amp;rdquo;] .&lt;br /&gt;
[rdfs:label &amp;ldquo;yellow&amp;rdquo;; rdf:value &amp;ldquo;FFFF00&amp;rdquo;]&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
Ideally, of course, rather than these being blank nodes you would reuse or mint a URI scheme for them, but this requires two new features of SPARQL 1.1 to include in the query.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2738&#34;&gt;December 9, 2010 8:05 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Barry, that&amp;rsquo;s interesting.&lt;/p&gt;
&lt;p&gt;What I did worked for my needs&amp;ndash;it wasn&amp;rsquo;t just a demo, but something in an actual application I was developing for a client&amp;ndash;but I appreciate the clarification of terminology.&lt;/p&gt;
&lt;p&gt;By Barry Norton on &lt;a href=&#34;#comment-2739&#34;&gt;December 9, 2010 8:29 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;No problems. Actually I only realised this was so long ago after I posted. I think you were in the thread about tools to achieve this, including Google Refine?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;&lt;img alt=&#34;Author Profile Page&#34; src=&#34;http://www.snee.com/mt-static/images/comment/mt_logo.png&#34; width=&#34;16&#34; height=&#34;16&#34; /&gt;&lt;/a&gt; on &lt;a href=&#34;#comment-2740&#34;&gt;December 10, 2010 8:46 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;No, I wasn&amp;rsquo;t really looking for extra tools. I just had to hand off a bit of data from some Javascript to TopBraid Composer and was looking for the simplest way to represent it as parsable triples, and I thought it was neat how simple it turned out to be. Neat enough to blog it&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Jazz camp</title>
      <link>https://www.bobdc.com/blog/jazz-camp/</link>
      <pubDate>Tue, 27 Jul 2010 09:04:29 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/jazz-camp/</guid>
      
      
      <description><div>Theory and practice.</div><div>&lt;p&gt;I hadn&amp;rsquo;t planned on writing here about my experience at the &lt;a href=&#34;http://www.summerjazzworkshops.com/about.asp&#34;&gt;Jamey Aebersold Jazz Camp&lt;/a&gt; held at the University of Louisville in Kentucky in the second week of July, but it&amp;rsquo;s so easy to summarize the key lessons I learned about soloing that I thought I&amp;rsquo;d jot them down after all:&lt;/p&gt;
&lt;img id=&#34;id103317&#34; src=&#34;https://www.bobdc.com/img/main/PatHarbison2010Combo.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Pat Harbison&#39;s 2010 Jamey Aebersold combo&#34;/&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The best way to get to where the music in your head comes out of your fingers as you think of it is to record yourself singing a solo along with a given chord progression (a.k.a. scat singing), transcribe what you sang onto music paper, learn it on your instrument, and then repeat.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Get more comfortable with the &lt;a href=&#34;http://en.wikipedia.org/wiki/Bebop_scale&#34;&gt;bebop scales&lt;/a&gt;. These are common scales with an extra note added so that if you play a long string of eighth notes and include that extra note as you go up and down the scale, you&amp;rsquo;re more likely to hit a chord tone on each downbeat. As the name implies, the technique was developed by the main musicians of the early bebop era.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Memorize the melodies of classic bebop tunes, especially the parts over the &lt;a href=&#34;http://en.wikipedia.org/wiki/Ii-V-I_turnaround&#34;&gt;ii-V-I&lt;/a&gt; chord sequences (or even just the ii-V parts) so that you can use those licks over other ii-V-I chord sequences as they come up. They come up a lot.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course none of these are magic wands, but instead guidelines to productive ways to spend your practice time, and I have a lot of practicing to do. Bass players spend a lot of time practicing things to play behind the solos of other players, and my attempts at solos don&amp;rsquo;t measure up to what the horn, piano, and guitar players that I play with are doing. I think I have a better idea how to catch up now.&lt;/p&gt;
&lt;p&gt;The camp is like an intensive week of music school, where you start each day with a theory class, and then have master classes with others who play your instrument, rehearsals with the combo you&amp;rsquo;ve been assigned to, and more classes. Placement in a theory class and combo depends on a written test and audition that you do upon arrival. The faculty has great and important players on each instrument, and each evening ended with a concert of two hours or more by various groups of faculty members.&lt;/p&gt;
&lt;p&gt;Each combo has a faculty member assigned to oversee them. I was happy that mine had trumpet player &lt;a href=&#34;http://www.patharbison.com/&#34;&gt;Pat Harbison&lt;/a&gt;, because as a horn player he put together some cool background harmonies for the group&amp;rsquo;s horn section to play behind the vocals and non-horn solos. I&amp;rsquo;ve put &lt;a href=&#34;http://www.flickr.com/photos/bobdc/sets/72157624446726931/&#34;&gt;pictures&lt;/a&gt; of Pat&amp;rsquo;s combo on flickr and a recording of our recital on my &lt;a href=&#34;http://www.myspace.com/bobdc&#34;&gt;MySpace page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The attendees are an interesting mix—this year roughly half of the 300 or so people were under 21. The picture of our combo is pretty representative (unfortunately, our singer is not in that picture, but you can see her in the flickr pictures). You don&amp;rsquo;t see many people in their twenties or thirties, because it&amp;rsquo;s mostly teenagers and those of us closer to middle-age. Staying in a dorm was a bit of a pain, but cost a lot less than a hotel. My roommate was an alto player from Puerto Rico who is tired of playing salsa, and he&amp;rsquo;s played with some big names such as &lt;a href=&#34;http://en.wikipedia.org/wiki/El_Gran_Combo_de_Puerto_Rico&#34;&gt;El Gran Combo&lt;/a&gt;. (He only played with them as a sub; apparently, to get a full-time position, you have to wait for someone who plays your instrument in the band to die.) I had heard of El Gran Combo before, but never listened to them much, and have been doing so on &lt;a href=&#34;http://listen.grooveshark.com/&#34;&gt;GrooveShark&lt;/a&gt;, my new favorite music site. I highly recommend them.&lt;/p&gt;
&lt;p&gt;On the last day, recital day, the 30 or so combos each play one song in one of two recital halls. It was fun to go back and forth and see different combinations of young and old attendees that I&amp;rsquo;d met during the week.&lt;/p&gt;
&lt;p&gt;During the eight-hour drive back, I noticed that many major bourbon distilleries are grouped together just south of the 70 miles of I-64 between Louisville and Lexington. At first I thought this was an interesting coincidence, but then I remembered that bourbon is named for the Kentucky county where distillers first started to age their corn liquor in charred oak casks, and I was there (more or less—the boundaries have changed over the years). In the nineteenth century, drinkers down the river in New Orleans liked it so much more than the unaged corn whiskey from other places that they started requesting the Bourbon whiskey. I learned a lot more during my tour of the &lt;a href=&#34;http://www.wildturkey.com/&#34;&gt;Wild Turkey&lt;/a&gt; distillery, but skipped the free samples at the end of the tour because of the seven hours of driving I still had ahead of me. It was still nice to get my mind off of scales and chords.&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://culinarygraphics.blogspot.com/&#34; title=&#34;http://culinarygraphics.blogspot.com/&#34;&gt;Julie&lt;/a&gt; on &lt;a href=&#34;#comment-2592&#34;&gt;July 27, 2010 9:40 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I can&amp;rsquo;t believe you couldn&amp;rsquo;t do the tasting at the distilleries! Probably wouldn&amp;rsquo;t be worth the extra hotel night so I&amp;rsquo;m glad your birthday present was waiting at home for you. Not one mention of what you ate, not very DuCharme of you.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://amundsen.com/blog/&#34; title=&#34;http://amundsen.com/blog/&#34;&gt;Mike Amundsen&lt;/a&gt; on &lt;a href=&#34;#comment-2593&#34;&gt;July 27, 2010 10:27 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;i remember working w/ Aebersold audio cassette tapes *years* ago; i even think i had a few of his LPs to practice against!&lt;/p&gt;
&lt;p&gt;brings back fond memories.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2594&#34;&gt;July 27, 2010 10:39 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;At one point during the week I suggested that a turntablist with one of these LPs would make an excellent addition to a recital combo, although I&amp;rsquo;m sure it would have given Mr. Aebersold a heart attack.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-2595&#34;&gt;July 27, 2010 12:24 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Lots of French in this: Louisville, Bourbon, Orleans, and of course du Charme.&lt;/p&gt;
&lt;p&gt;By Harold Carr on &lt;a href=&#34;#comment-2596&#34;&gt;July 27, 2010 11:52 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;What instrument do you play?&lt;/p&gt;
&lt;p&gt;I play bass. Next week I&amp;rsquo;ll be at:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.stanfordjazz.org/education/jazzresidency.html&#34;&gt;http://www.stanfordjazz.org/education/jazzresidency.html&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Replace Facebook with FOAF &#43; twitter &#43; ?</title>
      <link>https://www.bobdc.com/blog/replace-facebook-with-foaf-twi/</link>
      <pubDate>Thu, 17 Jun 2010 17:54:31 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/replace-facebook-with-foaf-twi/</guid>
      
      
      <description><div>Making the first connection.</div><div>&lt;img id=&#34;id103299&#34; src=&#34;https://www.bobdc.com/img/main/twitter2foaf.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;twitter to FOAF image&#34;/&gt;
&lt;p&gt;If we replaced Facebook with a decentralized collection of cooperating services that provide a similar collection of features, obvious candidates for some of these services are FOAF files, twitter, and flickr, but what would coordinate those services? Some have APIs and can store information that lets you make connections between the different services, so I wrote something to make one of those connections.&lt;/p&gt;
&lt;p&gt;Twitter and related services such as identi.ca do an excellent job of replacing Facebook&amp;rsquo;s status updates. FOAF files were supposed to be the RDF geek&amp;rsquo;s ideal way to track friend networks, and while FOAF is probably the most popular vocabulary outside of Dublin Core, actual FOAF files have been used for little more than demos. If we could integrate them into this Facebook-like collection of services that I&amp;rsquo;ve been thinking about, they could become more practically useful, so I came up with a way to find someone&amp;rsquo;s FOAF file based on their twitter ID.&lt;/p&gt;
&lt;h2 id=&#34;id103345&#34;&gt;Looking up a FOAF file using a twitter ID&lt;/h2&gt;
&lt;p&gt;Twitter lets you specify a home page address as part of your profile, and twitter&amp;rsquo;s API can easily find out someone&amp;rsquo;s home page URL. A little RDFa in your home page can point to your FOAF file, and then a short script can take someone&amp;rsquo;s twitter ID, find out their home page URL, and then find out the FOAF file URL from the RDFa in that home page. I wrote a service that takes a twitter ID and returns the FOAF file URL, which you can try yourself with the URL &lt;a href=&#34;http://www.rdfdata.org/cgi/twitterName2FOAFFilename.cgi?twitterID=bobdc&#34;&gt;http://www.rdfdata.org/cgi/twitterName2FOAFFilename.cgi?twitterID=bobdc&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Step one is adding the following RDFa to the body of your home page, substituting your own home page and FOAF file URLs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;div xmlns:foaf=&amp;quot;http://xmlns.com/foaf/0.1/&amp;quot; typeof=&amp;quot;foaf:Person&amp;quot;&amp;gt; 
  &amp;lt;span rel=&amp;quot;foaf:homepage&amp;quot; href=&amp;quot;http://www.snee.com/bob&amp;quot;&amp;gt;&amp;lt;/span&amp;gt; 
  &amp;lt;span rel=&amp;quot;foaf:page&amp;quot; href=&amp;quot;http://www.snee.com/bob/foaf.rdf&amp;quot;&amp;gt;&amp;lt;/span&amp;gt; 
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It basically says &amp;ldquo;the Person with the following home page (which should be the same one named as your home page in your twitter profile) has the following FOAF file&amp;rdquo;. There&amp;rsquo;s nothing twitter-specific here, so the small number of triples that this embeds in your home page could serve many other uses, and of course you can add other information about the home page owner being described.&lt;/p&gt;
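&lt;p&gt;Fed through an RDFa parser, that markup should yield triples along these lines, shown here in Turtle (the exact blank node label will vary by parser):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; .

_:b0 a foaf:Person ;
     foaf:homepage &amp;lt;http://www.snee.com/bob&amp;gt; ;
     foaf:page &amp;lt;http://www.snee.com/bob/foaf.rdf&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;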
&lt;p&gt;Once you&amp;rsquo;ve done this, you should be able to substitute your twitter account name for bobdc in the rdfdata.org URL above and see it return your FOAF file. (If you decide to revise the home page value in your twitter account before testing this, I&amp;rsquo;ve found that the data used by their API is not as recent as you would hope, so testing this with a revised home page value may require some patience.)&lt;/p&gt;
&lt;p&gt;You can try the relevant twitter API call &lt;a href=&#34;http://apiwiki.twitter.com/Twitter-REST-API-Method:-users%C2%A0show&#34;&gt;users show&lt;/a&gt; with &lt;a href=&#34;http://www.theonion.com/&#34;&gt;The Onion&lt;/a&gt; twitter account using the URL &lt;a href=&#34;http://api.twitter.com/1/users/show/theOnion.xml&#34;&gt;http://api.twitter.com/1/users/show/theOnion.xml&lt;/a&gt;. In the returned XML, the &lt;code&gt;url&lt;/code&gt; element holds the user&amp;rsquo;s home page URL. To parse and query the triples in the RDFa, I used the technique described in my &lt;a href=&#34;https://www.bobdc.com/blog/restful-sparql-queries-of-rdfa&#34;&gt;last blog posting&lt;/a&gt;, except that this time I used the excellent &lt;a href=&#34;http://www.dotnetrdf.org/demos/leviathan/&#34;&gt;http://www.dotnetrdf.org/demos/leviathan/&lt;/a&gt; service to both parse the RDFa and query the triples, because sparql.org was down for much of this week.&lt;/p&gt;
&lt;p&gt;If you do add this RDFa to your home page and the service works for you, let others know by tweeting &amp;ldquo;I&amp;rsquo;ve connected my twitter account to my FOAF file #twitter2foaf&amp;rdquo;.&lt;/p&gt;
&lt;h2 id=&#34;id103466&#34;&gt;Adding more Facebook-like features&lt;/h2&gt;
&lt;p&gt;Flickr lets you specify a home page as part of your profile, but I didn&amp;rsquo;t see a way to find this value using &lt;a href=&#34;http://www.flickr.com/services/api/&#34;&gt;flickr&amp;rsquo;s API&lt;/a&gt;, which is about as simple and straightforward as twitter&amp;rsquo;s. Theoretically, the same idea should work with Facebook itself, because they have an API, but their API looks like a real pain compared to the others, and they don&amp;rsquo;t have a dedicated profile field to let a user name a home page outside of Facebook—funny thing!&lt;/p&gt;
&lt;p&gt;If we can use tweets to send money to each other, it shouldn&amp;rsquo;t be that difficult to establish a twitter convention to &amp;ldquo;foaf:friend&amp;rdquo; someone—that is, to tweet &amp;ldquo;I&amp;rsquo;ll add you to my FOAF file at http://my/path/foaf.rdf if you&amp;rsquo;ll add me to yours at http://your/path/foaf.rdf&amp;rdquo;. Automating the addition would be a bit more work, but it&amp;rsquo;s not insurmountable.&lt;/p&gt;
&lt;p&gt;A Google search for &lt;a href=&#34;http://www.google.com/search?q=%22the+next+facebook%22&#34;&gt;&amp;ldquo;the next facebook&amp;rdquo;&lt;/a&gt; gets hundreds of thousands of hits. I&amp;rsquo;d love to see people worry less about replacing Facebook services and more about developing technology to connect existing services into something that can substitute for Facebook. It&amp;rsquo;s great to see how people like Henry Story, with his &lt;a href=&#34;http://esw.w3.org/Foaf%2Bssl&#34;&gt;FOAF+SSL&lt;/a&gt; work, are doing just that, and building on semantic web standards to make it possible.&lt;/p&gt;
&lt;h2 id=&#34;6-comments&#34;&gt;6 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://larlet.com&#34; title=&#34;http://larlet.com&#34;&gt;David Larlet&lt;/a&gt; on &lt;a href=&#34;#comment-2558&#34;&gt;June 18, 2010 6:57 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I agree with you.&lt;/p&gt;
&lt;p&gt;Note that you should accept FOAF files defined in link/meta too (like mine: ), I was a bit surprised that it doesn&amp;rsquo;t work with my twitter ID :)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2562&#34;&gt;June 18, 2010 10:03 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;David,&lt;/p&gt;
&lt;p&gt;Sorry about that. There are different ways of expressing a FOAF value in a home page, and this looks for explicit triples that say that a subject who has this homepage also has a particular FOAF file.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://larlet.com&#34; title=&#34;http://larlet.com&#34;&gt;David Larlet&lt;/a&gt; on &lt;a href=&#34;#comment-2563&#34;&gt;June 18, 2010 11:01 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;No problem about that, it was more a suggestion than a feature request.&lt;/p&gt;
&lt;p&gt;David&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.krisvandenbergh.eu/&#34; title=&#34;http://www.krisvandenbergh.eu/&#34;&gt;Kris Van den Bergh&lt;/a&gt; on &lt;a href=&#34;#comment-2567&#34;&gt;June 22, 2010 1:35 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;Great post! Very interesting stuff. Managing your friends in a decentralized way would be very cool. You already hinted at subsequent steps.&lt;/p&gt;
&lt;p&gt;I was thinking how to &amp;ldquo;FOAF friend&amp;rdquo; someone via Twitter. Let&amp;rsquo;s say we write a twitter app. Do you agree that: 1) both parties should have installed the app. 2) Their FOAF files should be writable on their servers. I don&amp;rsquo;t see another way of doing this. The app itself could use a technology like SPARQL Push.&lt;/p&gt;
&lt;p&gt;Am very eager to hear your thoughts!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2568&#34;&gt;June 22, 2010 1:54 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Kris,&lt;/p&gt;
&lt;p&gt;1) I think that&amp;rsquo;s asking too much. It should be driven by a protocol that can have multiple implementations.&lt;/p&gt;
&lt;p&gt;2) I&amp;rsquo;ve been thinking about this and here&amp;rsquo;s my idea: allowing write access to our individual FOAF files on our individual servers is a tall order, although as I mentioned FOAF+SSL might help. Something easier would be FOAF storage services that understood the request and probably still needed FOAF+SSL to know when there was permission to write to your data.&lt;/p&gt;
&lt;p&gt;What keeps a given service from becoming the new Facebook, with too much centralized control over everyone&amp;rsquo;s data, is that you should be able to download your FOAF file from them and upload it to and use it on another hosting service with minimal trouble. The file on your server would just redirect to the URL used by the service as its identifier for your information.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s easy to download everything you&amp;rsquo;ve entered into del.icio.us; imagine if that was in a standardized format that you could upload to a new service and use the same way. I picture FOAF file hosting services working like that.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://eurweb.blogspot.com/&#34; title=&#34;http://eurweb.blogspot.com/&#34;&gt;Yuriy&lt;/a&gt; on &lt;a href=&#34;#comment-2602&#34;&gt;August 14, 2010 3:56 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think that a combination of vCard + XFN (in RDF format) is better than FOAF.&lt;br /&gt;
OpenSocial does not support FOAF.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>RESTful SPARQL queries of RDFa</title>
      <link>https://www.bobdc.com/blog/restful-sparql-queries-of-rdfa/</link>
      <pubDate>Thu, 03 Jun 2010 10:02:31 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/restful-sparql-queries-of-rdfa/</guid>
      
      
      <description><div>No local parsing or querying software needed.</div><div>&lt;p&gt;Facebook&amp;rsquo;s OpenGraph, Google&amp;rsquo;s Rich Snippets, BestBuy&amp;rsquo;s use of the GoodRelations vocabulary and other recent events are boosting RDFa&amp;rsquo;s popularity for storing machine-readable data in web pages. There are several tools and programming libraries available (not to mention built-in features of development platforms such as TopQuadrant&amp;rsquo;s TopBraid Suite for application development) that let you extract the RDF triples from this RDFa markup and use it, but I recently discovered how easily I can extract this data and perform SPARQL queries on it by just using publicly available, RESTful web services. The web page where the RDFa is embedded doesn&amp;rsquo;t even have to be well-formed HTML.&lt;/p&gt;
&lt;h2 id=&#34;id103333&#34;&gt;Getting the RDF triples out of the RDFa&lt;/h2&gt;
&lt;blockquote id=&#34;id103338&#34; class=&#34;pullquote&#34;&gt;I can say &#34;extract the RDF triples from the RDFa on that web page and then run this SPARQL query against it&#34; *all with a single URL.* &lt;/blockquote&gt;
&lt;p&gt;The W3C&amp;rsquo;s RDFa Distiller and Parser at &lt;a href=&#34;http://www.w3.org/2007/08/pyRdfa/&#34;&gt;http://www.w3.org/2007/08/pyRdfa/&lt;/a&gt; has a form that lets you enter the URL of a web page and set various parameters before clicking the &amp;ldquo;Go!&amp;rdquo; button to see the triples stored in that web page. Once you do this, you&amp;rsquo;ll see the RDF on your browser (a View Source may be necessary) and you&amp;rsquo;ll also see, in your browser&amp;rsquo;s navigation toolbar, the REST URL you would use to have the same program extract the triples without you filling out the form first. (As the page tells you, &amp;ldquo;If you intend to use this service regularly on large scale, consider downloading the package and use it locally.&amp;rdquo;)&lt;/p&gt;
&lt;p&gt;For example, if you go to this form and enter the URL of TopQuadrant&amp;rsquo;s products web page (&lt;a href=&#34;http://www.topquadrant.com/products/TB_Suite.html&#34;&gt;http://www.topquadrant.com/products/TB_Suite.html&lt;/a&gt;), leaving all the other parameters at their default settings, clicking the &amp;ldquo;Go!&amp;rdquo; button will get you RDF/XML of the triples and, in the navigation toolbar, the URL used to retrieve them. I trimmed a few parameters off the URL and entered this shortened version directly into the browser, and it worked: &lt;code&gt;http://www.w3.org/2007/08/pyRdfa/extract?uri=http%3A%2F%2Fwww.topquadrant.com%2Fproducts%2FTB_Suite.html&amp;amp;format=pretty-xml&lt;/code&gt;. I&amp;rsquo;ll come back to this URL below.&lt;/p&gt;
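&lt;p&gt;To sketch that in Python (the endpoint is the distiller URL above; the function name is my own, and the &lt;code&gt;uri&lt;/code&gt; and &lt;code&gt;format&lt;/code&gt; parameters are just the ones visible in the URL the form generates), the URL-escaping and parameter assembly amount to a few lines:&lt;/p&gt;

```python
from urllib.parse import urlencode

# The W3C distiller endpoint from the post; "uri" and "format" are the
# parameters visible in the URL that its form generates.
EXTRACTOR = "http://www.w3.org/2007/08/pyRdfa/extract"

def extract_url(page_uri, fmt="pretty-xml"):
    """Build the RESTful URL that asks pyRdfa to distill a page's RDFa."""
    # urlencode percent-escapes the page URI and joins the parameters.
    return EXTRACTOR + "?" + urlencode({"uri": page_uri, "format": fmt})

print(extract_url("http://www.topquadrant.com/products/TB_Suite.html"))
```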
&lt;h2 id=&#34;id103411&#34;&gt;Querying the RDF&lt;/h2&gt;
&lt;p&gt;The sparql.org &lt;a href=&#34;http://www.sparql.org/sparql.html&#34;&gt;SPARQLer&lt;/a&gt; web form lets you enter a SPARQL query, specify a set of RDF to query and the return format, and then retrieve the result. For example, when I specify my FOAF file at &lt;a href=&#34;http://www.snee.com/bob/foaf.rdf&#34;&gt;http://www.snee.com/bob/foaf.rdf&lt;/a&gt; as the data to query and the following as the query, the SPARQLer lists my name and airport code, because I&amp;rsquo;m the only person in my FOAF file with both pieces of information:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;
PREFIX air: &amp;lt;http://www.megginson.com/exp/ns/airports#&amp;gt;
SELECT ?personName ?airportCode WHERE {
  ?person foaf:name ?personName ; 
          foaf:nearestAirport ?airport . 
  ?airport air:iata ?airportCode . 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you pick a non-default output format at the bottom of that form, then instead of the results being displayed in your browser, they may get saved to your disk. When doing this as a RESTful call (for example, when using &lt;a href=&#34;http://www.gnu.org/software/wget/&#34;&gt;wget&lt;/a&gt; or &lt;a href=&#34;http://curl.haxx.se/&#34;&gt;curl&lt;/a&gt;) note the &lt;code&gt;&amp;amp;output=&lt;/code&gt; parameter in the URL and experiment with other settings besides the default of XML.&lt;/p&gt;
&lt;p&gt;My foaf.rdf file is a static text file sitting on disk, but here&amp;rsquo;s the cool part: I can enter any URI as the resource to query, as long as it identifies parsable RDF—for example, the URL above that gets RDF/XML out of the TopQuadrant products page.&lt;/p&gt;
&lt;h2 id=&#34;id103490&#34;&gt;Putting it together&lt;/h2&gt;
&lt;p&gt;The TopQuadrant products page uses mostly the &lt;a href=&#34;http://www.heppnetz.de/ontologies/goodrelations/v1&#34;&gt;GoodRelations&lt;/a&gt; vocabulary and the Yahoo! Searchmonkey &lt;a href=&#34;http://developer.yahoo.com/searchmonkey/smguide/profile_vocab.html&#34;&gt;Product&lt;/a&gt; vocabularies. (The RDFa on other pages of the website uses other mixes of vocabularies as appropriate; let&amp;rsquo;s not take for granted how easy RDF makes it to do this.) If I want to use SPARQL to get a list of product names and descriptions from that page, I can take the URL above that extracts triples from the RDFa in the TopBraid products page, enter it as the &amp;ldquo;Target graph URI&amp;rdquo; value on the SPARQLer form, and put the following query into that form&amp;rsquo;s &amp;ldquo;General SPARQL query&amp;rdquo; box:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX gr: &amp;lt;http://purl.org/goodrelations/v1#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX sm: &amp;lt;http://search.yahoo.com/searchmonkey/product/&amp;gt;


SELECT ?name ?description WHERE {
  ?product a sm:Product ;
           rdfs:label ?name ; 
           gr:description ?description . 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As with the RDFa Distiller and Parser, in addition to seeing the results of my query on the SPARQLer form, I&amp;rsquo;ll see the URL in the navigation bar that I could have used to execute the same query against the same data with a single URL instead of using the form. This is the grander cool part: I can say &amp;ldquo;extract the RDF triples from the RDFa on that web page and then run this SPARQL query against it&amp;rdquo; &lt;em&gt;all with a single URL&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Of course it&amp;rsquo;s a long, messy-looking URL because of the &lt;a href=&#34;http://www.xs4all.nl/~jlpoutre/BoT/Javascript/Utils/endecode.html&#34;&gt;URL-escaping&lt;/a&gt; of things like the spaces and punctuation in the SPARQL query. Any modern programming or scripting language provides a function that does this for you, and I&amp;rsquo;ve already written a perl script that does something pretty valuable with all this. More on that in a week or two.&lt;/p&gt;
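&lt;p&gt;Here is a minimal Python sketch of that combined call. The parameter names (&lt;code&gt;query&lt;/code&gt;, &lt;code&gt;default-graph-uri&lt;/code&gt;, &lt;code&gt;output&lt;/code&gt;) follow the SPARQL protocol and what the sparql.org form puts in the navigation bar, but treat them as assumptions and check the URL that the form itself generates:&lt;/p&gt;

```python
from urllib.parse import urlencode

EXTRACTOR = "http://www.w3.org/2007/08/pyRdfa/extract"
SPARQLER = "http://www.sparql.org/sparql"  # assumed endpoint behind the form

def rdfa_query_url(page_uri, sparql_query, output="xml"):
    """One URL that extracts a page's RDFa triples and queries them."""
    # Inner URL: pyRdfa pulls the triples out of the page.
    graph_uri = EXTRACTOR + "?" + urlencode({"uri": page_uri, "format": "xml"})
    # Outer URL: SPARQLer runs the query against that graph; the inner
    # URL gets percent-escaped again when used as a parameter value.
    params = {"query": sparql_query, "default-graph-uri": graph_uri,
              "output": output}
    return SPARQLER + "?" + urlencode(params)
```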
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.subbu.org&#34; title=&#34;http://www.subbu.org&#34;&gt;Subbu Allamaraju&lt;/a&gt; on &lt;a href=&#34;#comment-2528&#34;&gt;June 3, 2010 10:42 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I was intrigued by the title of this post, only to find that the &amp;ldquo;RESTful&amp;rdquo;-ness here is encoding some query into a URI that clients can GET. Maybe it should be titled &amp;ldquo;How to Encode SPARQL into URIs&amp;rdquo;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2530&#34;&gt;June 3, 2010 12:05 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Subbu,&lt;/p&gt;
&lt;p&gt;As a matter of fact, I didn&amp;rsquo;t say how to encode SPARQL into URIs, and mentioned that most programming languages have a function that will do that for you. I described some services to call with those URIs once you have them, and how the use of two of these services could be combined in one call.&lt;/p&gt;
&lt;p&gt;Maybe I have an oversimplified idea of what qualifies as RESTful, but if a process can instruct processes on other servers to provide specific machine-readable information using HTTP GETs, I thought that qualified. It certainly can play a role in a useful distributed application.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By Damian on &lt;a href=&#34;#comment-2531&#34;&gt;June 3, 2010 12:13 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hauntingly familiar:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.semanticoverflow.com/questions/587/is-there-a-web-service-that-allow-me-to-run-sparql-against-a-xhtmlrdfa-website/588#588&#34;&gt;http://www.semanticoverflow.com/questions/587/is-there-a-web-service-that-allow-me-to-run-sparql-against-a-xhtmlrdfa-website/588#588&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There is something very pleasing about this sort of composition.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://element.rubyforge.org/&#34; title=&#34;http://element.rubyforge.org/&#34;&gt;carmen&lt;/a&gt; on &lt;a href=&#34;#comment-2550&#34;&gt;June 13, 2010 8:18 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;SPARQListas do some things over and over&lt;/p&gt;
&lt;p&gt;select some set of resources&lt;br /&gt;
as you did in both queries:&lt;br /&gt;
(_, personName, _)&lt;br /&gt;
(_, type, someClass)&lt;/p&gt;
&lt;p&gt;i&amp;rsquo;d call your approach of encoding an arbitrary query-language into a querystring argument RPC-ish rather than REST-ful&lt;/p&gt;
&lt;p&gt;GET already has one URI per request. in the first example, there&amp;rsquo;s exactly one URI in the triple pattern, so a single querystring key can apply &amp;ldquo;function that builds a (_ URI _) triplepattern&amp;rdquo;. the second example can specify the second URI in the querystring&lt;/p&gt;
&lt;p&gt;once we have the set of resources, pulling out the names, locations etc can be done with existing tools like CSS selectors or XPath. or much more concise RDF path-expression microsyntaxes that aren&amp;rsquo;t ugly smashed into a URL&lt;/p&gt;
&lt;p&gt;obviously there are larger ad-hoc cases where you really want the power of full SPARQL, but the jump to the complexity of requiring a SPARQL engine is not necessary for some large swath of typical web needs. just like the world realized they didn&amp;rsquo;t need SQL when a basic key/val hashtable store (with interesting sharding and distribution possibilities) would do&lt;/p&gt;
&lt;p&gt;i like where you&amp;rsquo;re going, i just think it can be taken a lot further, and since i personally was annoyed by the notion of having to flush REST down the drain in favor of SPARQL i decided to scratch the itch&lt;/p&gt;
&lt;p&gt;even basic things in HTTP are unspecified, for example how do you in-band into the URI Accept: arguments like the content-type. i&amp;rsquo;ve seen ?output=, ?format=, appending the extension of the format to the URI before the querystring, and countless other variations.&lt;/p&gt;
&lt;p&gt;it would be nice if there were some standards there. maybe full-fledged URI keys like myapi:format which could rdf:sameAs some standard definition of what to do with that querystring arg&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2551&#34;&gt;June 13, 2010 10:07 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Carmen,&lt;/p&gt;
&lt;p&gt;In general, that all makes sense to me. I certainly didn&amp;rsquo;t mean to flush REST down the drain. I wanted to encode a request for a resource into a URL that turned out to have a lot of extra stuff, so (as I said before) perhaps my idea of REST is too broad.&lt;/p&gt;
&lt;p&gt;XPath (and for that matter, CSS) won&amp;rsquo;t work, though, unless the data conforms to a very specific structure that the person writing the query can take for granted. Then, of course, the query can be a lot simpler. I&amp;rsquo;ve worked with XML long enough to know that getting a wide variety of people to follow a specific DTD/schema in a wide variety of cases is a lot easier said than done, which is why I like the flexibility that RDF offers. This flexibility does shift the processing load elsewhere&amp;ndash;in the case of my example, to the query engine&amp;ndash;but the query engine software to do the work is out there, and I see it making a contribution to some real data processing problems.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Writing applications for 2G phones</title>
      <link>https://www.bobdc.com/blog/writing-applications-for-2g-ph/</link>
      <pubDate>Wed, 26 May 2010 12:54:56 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/writing-applications-for-2g-ph/</guid>
      
      
      <description><div>Not fancy apps, but they&#39;ll work with billions of phones.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.ibm.com/developerworks/xml/library/x-2gserver/&#34;&gt;&lt;img id=&#34;id103318&#34; src=&#34;http://www.ibm.com/developerworks/i/dwwordmark.gif&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;developerWorks&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Two new habits of mine gave me a great idea for a simple application:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;When I want to jot something down and have my phone but no pen or paper, I send an SMS text to my regular email address. (I ain&amp;rsquo;t got one of them fancy 3G phones yet. We&amp;rsquo;re still waiting for 3G coverage in our neck of the woods.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When I need to send a quick message to one of my daughters at school to read when she gets a chance, I&amp;rsquo;ll send an SMS text to her phone from my email account, because I can type a lot faster on a full-sized keyboard than I can on my phone&amp;rsquo;s.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I realized that if SMS text messages can be sent or received as regular emails, and a little scripting can &lt;a href=&#34;http://www.xml.com/pub/a/2005/11/23/hacking-ebay-turning-email-alerts-into-atom.html&#34;&gt;automate the handling&lt;/a&gt; of email, then a server-side script that responds to queries delivered to it as SMS text messages would not be difficult to write.&lt;/p&gt;
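&lt;p&gt;The core of such a responder can be sketched in a few lines of Python. (This is not the demo&amp;rsquo;s actual code; the lookup table and function name here are made up for illustration, and actually sending the reply is left to sendmail or smtplib.)&lt;/p&gt;

```python
import email
import re

# Tiny stand-in for the real lookup data; the demo described below
# answers for any US area code.
AREA_CODES = {
    "434": "central Virginia, including Charlottesville",
    "212": "Manhattan, New York",
}

def answer_sms_query(raw_message):
    """Turn an incoming SMS-as-email into a (recipient, reply body) pair.

    procmail would pipe the raw message to a script like this; sending
    the reply back out is left to sendmail or smtplib.
    """
    msg = email.message_from_string(raw_message)
    sender = msg["From"]
    match = re.search(r"\b(\d{3})\b", msg.get_payload())
    if match is None:
        return sender, "Please text a three-digit US area code."
    code = match.group(1)
    return sender, AREA_CODES.get(code, "No entry for area code " + code)
```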
&lt;p&gt;So I wrote a demo, and described it in the IBM developerWorks article &lt;a href=&#34;http://www.ibm.com/developerworks/xml/library/x-2gserver/&#34;&gt;Simple server-side 2G phone apps&lt;/a&gt;. Here&amp;rsquo;s the summary from the beginning of the article:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Mobile phones are transforming economies and societies all over the world, but often with phones that might be considered out-of-date by gadget geeks in more developed nations. The good news is that applications that work with these phones can be very simple to write, and they give your application a huge potential user base. In this article, learn how to write programs that respond to specialized requests for information from 2G phones.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When you send a text message of a US telephone area code to the email address I set up with this little application, it texts back a short description of the geographic coverage of that area code.&lt;/p&gt;
&lt;p&gt;When one of my daughter&amp;rsquo;s friends missed a call on her phone and wondered where in the country it came from, my daughter suggested that she text the area code to the email address that I had had her test so many times. It worked, so it was nice to see the app take this small step beyond demo status.&lt;/p&gt;
&lt;p&gt;I love the irony of how seemingly modern new applications can often be built with old-fashioned UNIX tools like procmail. Check out the article to learn more.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>What&#39;s wrong with undeclared classes and properties?</title>
      <link>https://www.bobdc.com/blog/whats-wrong-with-undeclared-cl/</link>
      <pubDate>Fri, 30 Apr 2010 09:57:03 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/whats-wrong-with-undeclared-cl/</guid>
      
      
      <description><div>It&#39;s not like the RDF spec requires them.</div><div>&lt;p&gt;OK, it&amp;rsquo;s a rhetorical question. I know the answer: we can attach metadata to class and property declarations, so when we know that a given instance is a member of a particular class and has certain properties, if those are declared, we know more about the instance and can do more with it, not least of all aggregate it more easily with other data that uses the same or related classes and properties.&lt;/p&gt;
&lt;p&gt;I learned from tweets by Paula Gearon and Tom Heath that section 2.3.2 of the &amp;ldquo;Weaving the Pedantic Web&amp;rdquo; paper (&lt;a href=&#34;http://events.linkeddata.org/ldow2010/papers/ldow2010_paper04.pdf&#34;&gt;pdf&lt;/a&gt;) presented at the Linked Data on the Web conference in Raleigh bemoans the existence of undeclared classes and properties. I agree that this is not a good thing, but we should be careful about attacking it.&lt;/p&gt;
&lt;p&gt;The Pedantic Web paper does point out that &amp;ldquo;such practice is not prohibited&amp;rdquo;, which many people seem to forget. This reminds me of the decision to qualify merely well-formed XML as legal, parsable markup, which was one of the big breaks that XML made from SGML, or Tim Berners-Lee&amp;rsquo;s decision to accept the possibility of broken links in his hypertext system, unlike those of his predecessors. Serious XML-based applications still use DTDs or schemas and well-maintained web sites use some kind of link management, but the simpler, grass roots efforts don&amp;rsquo;t necessarily, and that turned out to be a great thing. It let these technologies grow to a point where millions of people can see their benefits.&lt;/p&gt;
&lt;p&gt;If I have a triple that says&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;http://www.snee.com/d/r/s3/l9d&amp;gt; &amp;lt;http://www.snee.com/8r/xa/32e&amp;gt;  &amp;quot;true&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and my subject and predicate aren&amp;rsquo;t declared anywhere, it doesn&amp;rsquo;t tell you much. If I have one that says this with an undeclared subject and predicate,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;http://www.snee.com/d/r/invoice#l9d&amp;gt; &amp;lt;http://www.snee.com/8r/xa/paid&amp;gt;  &amp;quot;true&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote id=&#34;id103368&#34; class=&#34;pullquote&#34;&gt;I worry that I fall into the standardista class because I think that using the word &#34;semantic&#34; in your marketing literature isn&#39;t enough to qualify your work as part of the semantic web. &lt;/blockquote&gt;
&lt;p&gt;you can get a general idea of what&amp;rsquo;s going on even with no declarations, as you often can from element and attribute names in XML documents that have no corresponding schemas. Unlike the XML example, though, we can see a domain name associated with &amp;ldquo;invoice#129d&amp;rdquo; and &amp;ldquo;paid&amp;rdquo; here, which gives some context and therefore a bit of semantics about them.&lt;/p&gt;
&lt;p&gt;One great thing about RDF is that you can add on metadata after the fact, as Jim Hendler&amp;rsquo;s group at RPI is doing with a lot of the US government data. Third parties certainly can&amp;rsquo;t fix broken web links, and while James Clark&amp;rsquo;s wonderful &lt;a href=&#34;http://www.thaiopensource.com/relaxng/trang.html&#34;&gt;trang&lt;/a&gt; can generate schemas from documents, that&amp;rsquo;s more useful as a &lt;a href=&#34;http://www.snee.com/xml/xml2008/&#34;&gt;content analysis tool&lt;/a&gt; than as something that you&amp;rsquo;d use to create production schemas. Adding metadata such as declarations to triples after the fact is a perfectly normal thing to do, and it helps connect those triples to each other to form a, you know, web.&lt;/p&gt;
&lt;p&gt;I certainly don&amp;rsquo;t want to imply that the Pedantic Web effort is doing anything wrong; their efforts to educate people about the value of doing these things with more rigor are very valuable. In the name-calling that most discussions of new technology seem to devolve into these days (pedant! fanboy! standardista!), I worry that I fall into the standardista class because I think that using the word &amp;ldquo;semantic&amp;rdquo; in your marketing literature isn&amp;rsquo;t enough to qualify your work as part of the semantic web. I want to see support for relevant W3C standards involved, a position that apparently can get me lumped into the class of unreasonably demanding geeks who don&amp;rsquo;t appreciate the big picture, so I wanted to point out that the (spec-compliant) optional nature of class and property declarations can be a huge contributor to the growth of the semantic web.&lt;/p&gt;
&lt;p&gt;XML and Tim Berners-Lee&amp;rsquo;s hypertext system scaled up to the point that they did because of both carefully engineered efforts and the fast growth of unrigorous ones. Careful engineering of a system using semantic web technology can get a lot of value from class and property declarations, but we should remember that the other great thing about RDF, besides the ease of adding metadata to existing data, is that triples are simple and easy to aggregate and therefore share. Let&amp;rsquo;s not discourage people from doing so if they don&amp;rsquo;t happen to be doing it the way that we would.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-2490&#34;&gt;April 30, 2010 11:57 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve always said dereferencing is a privilege not a right; there will certainly be RDF/OWL vocabs that aren&amp;rsquo;t public, even while bits of data using those vocabs might leak out. This is fine and inevitable. The reason to describe your properties and classes, and make them dereferenceable, is just that it makes folk more likely (and more able) to use them. And by documenting the &amp;lsquo;real&amp;rsquo; vocab it makes error detection easier, since a typo in the name results in different behaviour. There are other ways around that one of course (eg. stats from aggregators).&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s nothing fancier than - &amp;lsquo;If you want lots of people to use your stuff, document it carefully&amp;rsquo;. I don&amp;rsquo;t see any huge difference here between RDF, XML or general software documentation issues.&lt;/p&gt;
&lt;p&gt;The classic undocumented properties in RDF are rdf:_12345 etc &amp;hellip; maybe someone should update that schema, building on the fantastic Linked Open Numbers work? :) &lt;a href=&#34;http://km.aifb.kit.edu/projects/numbers/&#34;&gt;http://km.aifb.kit.edu/projects/numbers/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://mud.cz/&#34; title=&#34;http://mud.cz/&#34;&gt;Jiri Prochazka&lt;/a&gt; on &lt;a href=&#34;#comment-2491&#34;&gt;April 30, 2010 2:34 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You are right, but emphasizing this doesn&amp;rsquo;t bring any advantages as far as usefulness is concerned. Let&amp;rsquo;s not forget RDF is meant to be consumed by machines, not humans. Machines cannot see inside URIs, nor literals&amp;hellip; So I wouldn&amp;rsquo;t call this helping the growth of the Semantic Web but rather helping the growth of Linked Data. I expect knowledge using RDFS/OWL to be called Semantic Web, but this data I would be reluctant to call knowledge since it isn&amp;rsquo;t really machine understandable at all (anyway a lot of markup-oriented people are confused by this too).&lt;/p&gt;
&lt;p&gt;Still it&amp;rsquo;s better than the mess we are in now&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-2493&#34;&gt;April 30, 2010 6:24 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;@Jiri, &amp;hellip; even if people aren&amp;rsquo;t reading the RDF directly, they&amp;rsquo;re still often writing software that matches its patterns, or composing queries, or running analytics. And in practice this is often done in an example-driven manner. When developers encounter a new dataset, they&amp;rsquo;re far more likely to seek out example instance data, than to go meta and read the schema. The schema is there for reference and checking, but commonly skipped over until things go wrong. Examples are much more important to real usage&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>The meaning of &#34;semantics&#34;</title>
      <link>https://www.bobdc.com/blog/the-meaning-of-semantics/</link>
      <pubDate>Tue, 09 Mar 2010 18:48:11 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-meaning-of-semantics/</guid>
      
      
      <description><div>No pun intended.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.flickr.com/photos/julianbleecker/269633724/&#34;&gt;&lt;img id=&#34;id103303&#34; src=&#34;http://farm1.static.flickr.com/79/269633724_459cd88a11.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Configured Scenario Semantics&#34; width=&#34;280&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Dave McComb&amp;rsquo;s book &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1558609172/bobducharmeA/%20&#34;&gt;Semantics in Business Systems&lt;/a&gt; recommended John Saeed&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1405156392/bobducharmeA/%20&#34;&gt;Semantics&lt;/a&gt; as an &amp;ldquo;excellent introductory book on semantics in everyday life&amp;rdquo;, so I found a cheap used copy and have been working my way through it. I&amp;rsquo;m sure that it&amp;rsquo;s been used for both graduate and undergraduate courses, and it&amp;rsquo;s not too difficult to follow so far. I especially like this part, which Saeed said he adapted from the work of Charles Morris:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;syntax: the formal relation of signs to each other;&lt;/p&gt;
&lt;p&gt;semantics: the relations of signs to the objects to which the signs are applicable;&lt;/p&gt;
&lt;p&gt;pragmatics: the relation of signs to interpreters.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;He goes on to say that &amp;ldquo;the whole science of language, consisting of the three parts mentioned, is called semiotic&amp;rdquo;, but I was more interested in the way he put semantics in the larger context.&lt;/p&gt;
&lt;p&gt;Printed and &lt;a href=&#34;http://dictionary.reference.com/browse/semantics&#34;&gt;dictionary.com&lt;/a&gt; definitions of &amp;ldquo;semantics&amp;rdquo; typically come in pairs, with the first usually saying &amp;ldquo;the study of meaning&amp;rdquo; and the second more in line with Saeed&amp;rsquo;s definition. The latter is sometimes identified as being specific to the fields of linguistics or semiotics.&lt;/p&gt;
&lt;p&gt;I think that the linguistics/semiotics definition serves the semantic web better, because describing semantics as the relations of signs to the things they signify (and moving some of the &amp;ldquo;meaning&amp;rdquo; parts that take place in people&amp;rsquo;s heads to the &amp;ldquo;pragmatics&amp;rdquo; category) helps us to focus on what the semantic web is best at: providing an infrastructure to identify which signs (IDs in the form of URIs) refer to which objects (resources) so that people can use this infrastructure to create applications that work across the web.&lt;/p&gt;
&lt;p&gt;Interpretation of the &amp;ldquo;meaning&amp;rdquo; of the signified resources is not necessarily a goal of these applications. While &lt;a href=&#34;https://www.bobdc.com/blog/adding-semantics-to-make-data&#34;&gt;OWL&lt;/a&gt; can encode properties of concepts to let us do more reasoning with those concepts, attacking the feasibility of getting computers to Understand Meaning is a straw man argument that I&amp;rsquo;m tired of hearing from people who insist that the semantic web is an impractical idea. Standards and best practices that let applications track the relationship of identifiers to resources on a World Wide Web scale—who can argue with that?&lt;/p&gt;
&lt;p&gt;(photo: &lt;a href=&#34;http://www.flickr.com/photos/julianbleecker/&#34;&gt;http://www.flickr.com/photos/julianbleecker/&lt;/a&gt; / &lt;a href=&#34;http://creativecommons.org/licenses/by-nc-nd/2.0/&#34;&gt;CC BY-NC-ND 2.0)&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-2457&#34;&gt;March 9, 2010 8:09 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Two experts*, to explicate meaning,&lt;br /&gt;
Wrote a book called &lt;em&gt;The Meaning of Meaning&lt;/em&gt;.&lt;br /&gt;
     The world still perplexed,&lt;br /&gt;
     Three experts wrote next&lt;br /&gt;
&lt;em&gt;The Meaning of &amp;ldquo;Meaning of Meaning&amp;rdquo;&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;*Ogden and Richards&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://prateek-jain.com&#34; title=&#34;http://prateek-jain.com&#34;&gt;Prateek&lt;/a&gt; on &lt;a href=&#34;#comment-2458&#34;&gt;March 9, 2010 9:57 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Another excellent book in my humble opinion&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.semantic-web-book.org/page/Foundations_of_Semantic_Web_Technologies&#34;&gt;http://www.semantic-web-book.org/page/Foundations_of_Semantic_Web_Technologies&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-2459&#34;&gt;March 10, 2010 7:59 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Nice post, and very well stated.&lt;/p&gt;
&lt;p&gt;I think, Microsoft&amp;rsquo;s use of deep zoom images as symbols (rather than exposing http identifiers) for the human interaction aspect of Linked Data Browsing / Exploration UIs may finally drive home the mercurial essence of what Linked Data is fundamentally about.&lt;/p&gt;
&lt;p&gt;If you haven&amp;rsquo;t done so already see:&lt;/p&gt;
&lt;p&gt;1. &lt;a href=&#34;http://www.youtube.com/watch?v=G29DBIEcIuQ&#34;&gt;http://www.youtube.com/watch?v=G29DBIEcIuQ&lt;/a&gt; &amp;ndash; Microsoft Pivot in front of Virtuoso&amp;rsquo;s DBMS hosted Faceted Linked Data Navigation Engine&lt;/p&gt;
&lt;p&gt;Kingsley&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://markwatson.com&#34; title=&#34;http://markwatson.com&#34;&gt;Mark Watson&lt;/a&gt; on &lt;a href=&#34;#comment-2461&#34;&gt;March 10, 2010 12:04 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, a good overview, thanks. Another good read (Ben Goertzel recommended this to me when we worked together): &amp;ldquo;Semantics, Primes and Universals&amp;rdquo; by Anna Wierzbicka.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Is SPIN the Schematron of RDF?</title>
      <link>https://www.bobdc.com/blog/is-spin-the-schematron-of-rdf/</link>
      <pubDate>Mon, 01 Mar 2010 18:56:45 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/is-spin-the-schematron-of-rdf/</guid>
      
      
      <description><div>Represent business rules using an implemented standard, then flag violations in a machine-readable way.</div><div>&lt;blockquote id=&#34;id103300&#34; class=&#34;pullquote&#34;&gt;Many complain about the potentially low quality of public semantic web data, but Fürber and Hepp are doing something about it.&lt;/blockquote&gt;
&lt;p&gt;Christian Fürber and Martin Hepp (the latter being the source of the increasingly popular &lt;a href=&#34;http://www.heppnetz.de/projects/goodrelations/&#34;&gt;GoodRelations&lt;/a&gt; ontology) have published a paper titled &amp;ldquo;Using SPARQL and SPIN for Data Quality Management on the Semantic Web&amp;rdquo; (&lt;a href=&#34;http://www.heppnetz.de/files/fuerber-hepp-sparql-spin-dqm.pdf&#34;&gt;pdf&lt;/a&gt;) for the 2010 &lt;a href=&#34;http://bis.kie.ae.poznan.pl/13th_bis/&#34;&gt;Business Information Systems&lt;/a&gt; conference in Berlin. TopQuadrant&amp;rsquo;s Holger Knublauch designed SPIN, or the &lt;a href=&#34;http://www.spinrdf.org/&#34;&gt;SPARQL Inferencing Notation&lt;/a&gt;, as a SPARQL-based way to express constraints and inferencing rules on sets of triples, and Fürber and Hepp have taken a careful, structured look at how to apply it to business data.&lt;/p&gt;
&lt;p&gt;I knew that &amp;ldquo;data quality&amp;rdquo; was a specific discipline within IT, but I hadn&amp;rsquo;t looked at it very closely. Their paper gives a nice overview of this area before moving on to describing their work. It also describes the value that a systematic approach to data quality can bring to semantic web applications, but I don&amp;rsquo;t think anyone needs any convincing there; it&amp;rsquo;s often the first issue people bring up when they hear about the very idea of Linked Data on the web.&lt;/p&gt;
&lt;p&gt;Or, to put it more bluntly, many complain about the potentially low quality of public semantic web data, but Fürber and Hepp are doing something about it. SPIN may have the potential to do for RDF data what &lt;a href=&#34;http://xml.ascc.net/resource/schematron/schematron.html&#34;&gt;Schematron&lt;/a&gt; has done for XML for years now: providing a technique, based entirely on an existing, well-implemented W3C standard, for describing business rules about data and then validating data against those rules. (I see that William Vambenepe &lt;a href=&#34;http://stage.vambenepe.com/archives/496&#34;&gt;had some thoughts&lt;/a&gt; on the comparison early last year.)&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m looking forward to Fürber and Hepp&amp;rsquo;s future work described in their paper and to seeing how others apply it in their applications.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Using the ARQ SPARQL processor from the command line</title>
      <link>https://www.bobdc.com/blog/using-the-arq-sparql-processor/</link>
      <pubDate>Thu, 21 Jan 2010 10:38:55 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/using-the-arq-sparql-processor/</guid>
      
      
      <description><div>With the Jena extensions.</div><div>&lt;p&gt;I recently described how to execute &lt;a href=&#34;https://www.bobdc.com/blog/federated-sparql-queries&#34;&gt;Federated SPARQL queries&lt;/a&gt; that use Jena extensions that we&amp;rsquo;ll hopefully see added to the SPARQL 1.1 standard. I showed a sample query and suggested that you try it at the &lt;a href=&#34;http://www.sparql.org/query.html&#34;&gt;sparql.org RDF Query Demo page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For local, command-line use of SPARQL, I&amp;rsquo;ve used the Jena &lt;a href=&#34;http://jena.sourceforge.net/ARQ/&#34;&gt;ARQ&lt;/a&gt; query engine for years, but my sample federated query didn&amp;rsquo;t work with it, and now I know why: the sparql.bat file that comes with the distribution invokes the processor in a strictly standards-compliant mode without the extensions enabled. I thought I&amp;rsquo;d have to write and compile some Java code to use the extensions, but my co-worker Jeremy Carroll pointed out that the sparql.bat file in ARQ&amp;rsquo;s bat subdirectory calls the arq.sparql library, like this,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;java -cp %CP% arq.sparql %*
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and that calling the arq.arq library instead enables the extensions. Then, I noticed the arq.bat file in the same directory as sparql.bat, and this is exactly what it does. There are more batch files in there, and a web search on their names led me to an &lt;a href=&#34;http://jena.sourceforge.net/ARQ/cmds.html&#34;&gt;ARQ - Command Line Applications&lt;/a&gt; documentation page, which will be handy.&lt;/p&gt;
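&lt;p&gt;In other words, an extension-enabled equivalent of sparql.bat differs by a single word; it invokes the arq.arq class instead:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;java -cp %CP% arq.arq %*
&lt;/code&gt;&lt;/pre&gt;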
&lt;p&gt;Using arq.bat instead of sparql.bat, the sample federated query works as written (tested with ARQ 2.8.2), and so do LET assignments and &lt;a href=&#34;http://jena.sourceforge.net/ARQ/library-function.html&#34;&gt;extension functions&lt;/a&gt;, making it possible to use ARQ in real semantic web application development with no need to do Java coding around the Jena API.&lt;/p&gt;
&lt;p&gt;(Thanks again, Jeremy!)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Live stock ticker data in RDF</title>
      <link>https://www.bobdc.com/blog/live-stock-ticker-data-in-rdf/</link>
      <pubDate>Tue, 12 Jan 2010 11:19:22 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/live-stock-ticker-data-in-rdf/</guid>
      
      
      <description><div>Well, on a 20-minute delay.</div><div>&lt;p&gt;I&amp;rsquo;ve played with finance.yahoo.com&amp;rsquo;s feed of CSV stock ticker data &lt;a href=&#34;https://www.bobdc.com/blog/using-the-twitter-api-to-alert&#34;&gt;before&lt;/a&gt; and recently had an idea that was so simple that I&amp;rsquo;m surprised that no one&amp;rsquo;s done it before: why not write a script that passes along a request for this data but converts the result to RDF before returning it? So I did.&lt;/p&gt;
&lt;blockquote id=&#34;id103312&#34; class=&#34;pullquote&#34;&gt;I suppose it might count as a semantic web service.&lt;/blockquote&gt;
&lt;p&gt;A URL like &lt;a href=&#34;http://www.rdfdata.org/cgi/stockquotes.cgi?symbols=BUD,IBM,SNE&#34;&gt;http://www.rdfdata.org/cgi/stockquotes.cgi?symbols=BUD,IBM,SNE&lt;/a&gt; asks for recent ticker information about the stock symbols listed in the comma-separated value list. The stockquotes.cgi script adds the parameters to the appropriate stub to create a URL like &lt;a href=&#34;http://download.finance.yahoo.com/d/quotes.csv?f=sl1d1t1ohgv&amp;amp;e=.csv&amp;amp;s=BUD,IBM,SNE&#34;&gt;http://download.finance.yahoo.com/d/quotes.csv?f=sl1d1t1ohgv&amp;amp;e=.csv&amp;amp;s=BUD,IBM,SNE&lt;/a&gt;, uses this URL to retrieve the CSV results, converts them to RDF/XML, and sends that back to the original requester with a MIME type of application/rdf+xml. The whole script, with white space and comments, wasn&amp;rsquo;t even 100 lines. You can click the first link in this paragraph to see an example of it in action.&lt;/p&gt;
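&lt;p&gt;To give an idea of the shape of the result, here is roughly what the returned data looks like for one symbol, shown in Turtle terms (the namespace, property names, and figures below are made up for illustration; they&amp;rsquo;re not necessarily the ones my little ontology uses):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix sq: &amp;lt;http://www.rdfdata.org/ns/stockquote#&amp;gt; .

&amp;lt;http://www.rdfdata.org/quotes/IBM&amp;gt;
    sq:symbol    &amp;quot;IBM&amp;quot; ;
    sq:lastTrade &amp;quot;130.25&amp;quot; ;
    sq:tradeDate &amp;quot;1/12/2010&amp;quot; ;
    sq:open      &amp;quot;129.80&amp;quot; ;
    sq:volume    &amp;quot;5123400&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;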
&lt;p&gt;I haven&amp;rsquo;t done anything with the rdfdata.org domain name in a while, so I thought that would be a nice place for this. I&amp;rsquo;ve already used this little web service in a work-related demo that combines and cross-references RDF data from multiple sources, because after all, that&amp;rsquo;s one of the things that RDF is so good at.&lt;/p&gt;
&lt;p&gt;Is this a &amp;ldquo;semantic web service&amp;rdquo;? All it does is convert the data returned by a Yahoo feed into a different syntax and pass it along. I did throw together a little ontology to name the properties, but it doesn&amp;rsquo;t add a lot of semantics. On the other hand, my script&amp;rsquo;s output syntax is based on a semantic web standard, and it makes the data easier to use in semantic web applications, so I suppose it might count as a semantic web service.&lt;/p&gt;
&lt;p&gt;I hope this is useful to others, and I hope that more people look for opportunities to convert live feeds of useful data in simple formats into live feeds of RDF.&lt;/p&gt;
&lt;h2 id=&#34;7-comments&#34;&gt;7 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://clockwerx.blogspot.com/&#34; title=&#34;http://clockwerx.blogspot.com/&#34;&gt;Daniel O&amp;rsquo;Connor&lt;/a&gt; on &lt;a href=&#34;#comment-2406&#34;&gt;January 12, 2010 9:51 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.freebase.com/view/user/doconnor/default_domain/views/nyse_companies&#34;&gt;http://www.freebase.com/view/user/doconnor/default_domain/views/nyse_companies&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Down the bottom are some export CSV links.&lt;/p&gt;
&lt;p&gt;Alternatively, you could view the &amp;ldquo;MQL&amp;rdquo; (like sparql if it were made of javascript/json), and an MQLread webservice to search for specific ids / matches&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.jazzengineers.com/&#34; title=&#34;http://www.jazzengineers.com/&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2407&#34;&gt;January 13, 2010 12:43 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Daniel! I followed through a little there and found &lt;a href=&#34;http://rdf.freebase.com/rdf/en.the_hershey_company&#34;&gt;http://rdf.freebase.com/rdf/en.the_hershey_company&lt;/a&gt; , which is the first good example I&amp;rsquo;ve found of RDF from Freebase. I&amp;rsquo;m guessing that there&amp;rsquo;s a lot more&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://melvincarvalho.com&#34; title=&#34;http://melvincarvalho.com&#34;&gt;Melvin Carvalho&lt;/a&gt; on &lt;a href=&#34;#comment-2428&#34;&gt;January 15, 2010 8:52 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nice service, I challenged myself to write a wrapper on this in 15 minutes, and here&amp;rsquo;s my attempt:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://marketdata.me/&#34;&gt;http://marketdata.me/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.hellowallet.com&#34; title=&#34;http://www.hellowallet.com&#34;&gt;Erwin&lt;/a&gt; on &lt;a href=&#34;#comment-2537&#34;&gt;June 7, 2010 11:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, any idea why your link makes me download a file named stockquotes.cgi? I wanted to see another example besides Melvin&amp;rsquo;s so I can write something for my own purposes. Thanks.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2539&#34;&gt;June 7, 2010 1:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Erwin,&lt;/p&gt;
&lt;p&gt;For both mine and Melvin&amp;rsquo;s, Firefox just displays it, while Chrome and IE want to store it as you described. They store the RDF/XML file that Firefox displays.&lt;/p&gt;
&lt;p&gt;I played with the HTTP header returned with the data, but couldn&amp;rsquo;t affect the behavior. The important thing to me is that it works with wget and curl, so that I know that it works as a RESTful web service. It isn&amp;rsquo;t really aimed at browser use. (If it was, I would have had it return an HTML file!)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.stock-trading-newsletter.com&#34; title=&#34;http://www.stock-trading-newsletter.com&#34;&gt;stock trading newsletter&lt;/a&gt; on &lt;a href=&#34;#comment-2587&#34;&gt;July 14, 2010 9:36 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This looks good. Can you insert more criteria?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2588&#34;&gt;July 18, 2010 9:19 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That was the best I could do with what I had available.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Federated SPARQL queries</title>
      <link>https://www.bobdc.com/blog/federated-sparql-queries/</link>
      <pubDate>Mon, 04 Jan 2010 13:07:44 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/federated-sparql-queries/</guid>
      
      
      <description><div>Using a Jena extension.</div><div>&lt;p&gt;Much of the promise of RDF and Linked Data is the ease of pulling data from multiple sources and combining it. I recently discovered the SERVICE extension that Jena adds to SPARQL, letting you send subqueries off to multiple SPARQL endpoints and then combine the results. Because a given SPARQL endpoint may be an interface to a triplestore or a relational data store or something else, the ability to query several endpoints with one query is very nice.&lt;/p&gt;
&lt;blockquote id=&#34;id103299&#34; class=&#34;pullquote&#34;&gt;The ability to query several endpoints with one query is very nice.&lt;/blockquote&gt;
&lt;p&gt;The Jena project&amp;rsquo;s &lt;a href=&#34;http://jena.sourceforge.net/ARQ/service.html&#34;&gt;ARQ - Basic Federated SPARQL Query&lt;/a&gt; describes the use of this keyword. Before I start quoting from that page, I wanted to jump right in with an example that worked for me to pull birthday and spouse information about Arnold Schwarzenegger from &lt;a href=&#34;http://dbpedia.org&#34;&gt;DBpedia&lt;/a&gt; and a list of his movies and their release dates from &lt;a href=&#34;http://www.linkedmdb.org/&#34;&gt;Linked Movie Database&lt;/a&gt; in one query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX imdb: &amp;lt;http://data.linkedmdb.org/resource/movie/&amp;gt;
PREFIX dcterms: &amp;lt;http://purl.org/dc/terms/&amp;gt;
PREFIX dbpo: &amp;lt;http://dbpedia.org/ontology/&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;


SELECT ?birthDate ?spouseName ?movieTitle ?movieDate {
  { SERVICE &amp;lt;http://dbpedia.org/sparql&amp;gt;
    { SELECT ?birthDate ?spouseName WHERE {
        ?actor rdfs:label &amp;quot;Arnold Schwarzenegger&amp;quot;@en ;
               dbpo:birthDate ?birthDate ;
               dbpo:spouse ?spouseURI .
        ?spouseURI rdfs:label ?spouseName .
        FILTER ( lang(?spouseName) = &amp;quot;en&amp;quot; )
      }
    }
  }
  { SERVICE &amp;lt;http://data.linkedmdb.org/sparql&amp;gt;
    { SELECT ?actor ?movieTitle ?movieDate WHERE {
      ?actor imdb:actor_name &amp;quot;Arnold Schwarzenegger&amp;quot;.
      ?movie imdb:actor ?actor ;
             dcterms:title ?movieTitle ;
             dcterms:date ?movieDate .
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can run this query yourself at the &lt;a href=&#34;http://www.sparql.org/query.html&#34;&gt;sparql.org RDF Query Demo page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Before you start modeling your own queries on this, it&amp;rsquo;s worth reading the Jena documentation page mentioned above, especially the &amp;ldquo;Performance Considerations&amp;rdquo; part:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This feature is a basic building block to allow remote access in the middle of a query, not a general solution to the issues in distributed query evaluation. The algebra operation is executed without regard to how selective the pattern is. So the order of the query will affect the speed of execution. Because it involves HTTP operations, asking the query in the right order matters a lot. Don&amp;rsquo;t ask for the whole of a bookstore just to find book whose title comes from a local RDF file - ask the bookshop a query with the title already bound from earlier in the query.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As an example, both subqueries above specifically ask for information about Schwarzenegger instead of trying to scan the complete databases looking for matches.&lt;/p&gt;
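&lt;p&gt;To sketch what the Jena documentation&amp;rsquo;s bookstore advice means in query terms (the bookshop endpoint and vocabulary here are invented), binding the selective value before the SERVICE clause keeps the remote request small:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX bk: &amp;lt;http://bookshop.example/ns#&amp;gt;

SELECT ?title ?price WHERE {
  # bind ?title from local data first...
  ?localBook bk:title ?title .
  # ...so the remote pattern only matches that one title
  SERVICE &amp;lt;http://bookshop.example/sparql&amp;gt; {
    ?book bk:title ?title ;
          bk:price ?price .
  }
}
&lt;/code&gt;&lt;/pre&gt;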
&lt;p&gt;Two parts of this trick are non-standard SPARQL, but may become part of SPARQL 1.1: &lt;a href=&#34;http://www.slideshare.net/LeeFeigenbaum/sparql2-status/8&#34;&gt;subqueries&lt;/a&gt; and the &lt;a href=&#34;http://www.slideshare.net/LeeFeigenbaum/sparql2-status/15&#34;&gt;SERVICE keyword&lt;/a&gt;. As the latter Lee Feigenbaum slide points out, the SPARQL Working Group is using ARQ&amp;rsquo;s SERVICE keyword as a starting point in thinking about how a query can target multiple endpoints.&lt;/p&gt;
&lt;p&gt;My query above of the two different SPARQL endpoints also works from within TopQuadrant&amp;rsquo;s TopBraid Suite of products, so I&amp;rsquo;m sure I&amp;rsquo;ll be using this on work-related projects more and more.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://thewebsemantic.com&#34; title=&#34;http://thewebsemantic.com&#34;&gt;Taylor&lt;/a&gt; on &lt;a href=&#34;#comment-2397&#34;&gt;January 4, 2010 9:07 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I knew we&amp;rsquo;d get you using Jena sooner or later. It&amp;rsquo;s got the best sparql IMHO.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://karl.glatz.biz&#34; title=&#34;http://karl.glatz.biz&#34;&gt;Karl Glatz&lt;/a&gt; on &lt;a href=&#34;#comment-2535&#34;&gt;June 7, 2010 9:14 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nice blog post!&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not able to test your query on the sparql.org Webpage, got some &amp;ldquo;Error 500: No dataset description for query&amp;rdquo;? Any suggestions?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2536&#34;&gt;June 7, 2010 10:15 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Maybe one of the endpoints was down when you tried it. I just pasted the query above at the demo page, and it ran fine, i.e. it didn&amp;rsquo;t get an error. There were headers with no results under them, because DBpedia has changed the URL for birthDate to &lt;a href=&#34;http://dbpedia.org/property/birthDate&#34;&gt;http://dbpedia.org/property/birthDate&lt;/a&gt; and no longer has a spouse value for Schwarzenegger, so the ?birthDate and ?spouseURI variables didn&amp;rsquo;t get bound.&lt;/p&gt;
&lt;p&gt;The cleanup of DBpedia&amp;rsquo;s ontologies is obviously a good thing overall, but can break some queries. I have no idea why someone would remove his spouse value. Maria Shriver does have a spouse value of Schwarzenegger.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2010">2010</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>RDFS: The primary document</title>
      <link>https://www.bobdc.com/blog/rdfs-the-primary-document/</link>
      <pubDate>Sun, 29 Nov 2009 12:10:41 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdfs-the-primary-document/</guid>
      
      
      <description><div>Shorter and more interesting than I remember.</div><div>&lt;p&gt;About two years ago I &lt;a href=&#34;https://www.bobdc.com/blog/rdfs-without-rdfowl&#34;&gt;wondered&lt;/a&gt; if RDF Schema had become merely a layer of OWL or if anyone used RDFS by itself without OWL. My theory was that because tools such as TopBraid Composer, Protégé, and SWOOP that let you design RDFS vocabularies also let you assign OWL properties to your classes, people used those because they were there, and we ended up with few pure RDFS vocabularies.&lt;/p&gt;
&lt;blockquote id=&#34;id103315&#34; class=&#34;pullquote&#34;&gt;I heartily recommend that you read the first 11 or 18 pages of the RDFS spec and skim the rest.&lt;/blockquote&gt;
&lt;p&gt;Lately, though, it seems that a lot of people who had been using the terms vocabulary/taxonomy/ontology interchangeably have started to understand better when OWL is too much. As they review the issues surrounding the choice between OWL 1 Lite, DL, and Full, between OWL 2 EL, QL, and RL, and the implications of open- vs. closed-world assumptions, more attitudes can be summarized as &amp;ldquo;sounds interesting, but pretty complicated; maybe later.&amp;rdquo; This makes good sense for people whose main interest is defining a standardized vocabulary.&lt;/p&gt;
&lt;p&gt;SKOS looks pretty good to more and more of them, but here I want to focus on RDFS. As I thought more about it recently, I realized that I had never read the &lt;a href=&#34;http://www.w3.org/TR/2004/REC-rdf-schema-20040210/&#34;&gt;RDF Schema Recommendation&lt;/a&gt;, so about five years late I sat down to do so. It&amp;rsquo;s nice to remember, when you&amp;rsquo;re wondering about the true meaning of some term or the relationship between some concepts, that a spec is available where you can just read the official explanation of what&amp;rsquo;s what. (Of course, &lt;a href=&#34;http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/&#34;&gt;some specs&lt;/a&gt; are less enlightening than others when you&amp;rsquo;re confused about what they describe.)&lt;/p&gt;
&lt;p&gt;I found the RDFS Recommendation to be an interesting mix of simple things that are commonly used and complex things that are rarely used. When I printed it out, it was 27 pages, but the summaries and references start on page 18, and the appropriately titled &lt;a href=&#34;http://www.w3.org/TR/2004/REC-rdf-schema-20040210/#ch_othervocab&#34;&gt;Other Vocabulary&lt;/a&gt; section on pages 12 through 17 describes the rarely used features. Let&amp;rsquo;s look at some interesting parts that lead up to that. From the Abstract:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This specification describes how to use RDF to describe RDF vocabularies.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Maybe that&amp;rsquo;s obvious to some, but it&amp;rsquo;s reassuring when confusion over vocabularies, taxonomies, and ontologies comes up. From the introduction:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Resource Description Framework (RDF) is a general-purpose language for representing information in the Web.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As opposed to being a data model. (It&amp;rsquo;s certainly not a syntax!)&lt;/p&gt;
&lt;p&gt;Why do we need this schema language?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;RDF properties may be thought of as attributes of resources and in this sense correspond to traditional attribute-value pairs. RDF properties also represent relationships between resources.&lt;/p&gt;
&lt;p&gt;RDF however, provides no mechanisms for describing these properties, nor does it provide any mechanisms for describing the relationships between these properties and other resources. That is the role of the RDF vocabulary description language, RDF Schema. RDF Schema defines classes and properties that may be used to describe classes, properties and other resources.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The following is interesting for two reasons: first, because it describes a member of a class as an &amp;ldquo;instance,&amp;rdquo; reminding me that &amp;ldquo;individual&amp;rdquo; is definitely an OWL term that has no particular role in RDFS. (A little later the document tells us that &amp;ldquo;the members of a class are known as &lt;em&gt;instances&lt;/em&gt; [their emphasis] of the class&amp;rdquo;.) It&amp;rsquo;s also interesting as a nice summary of an issue that often confuses people with an object-oriented background.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The RDF vocabulary description language class and property system is similar to the type systems of object-oriented programming languages such as Java. RDF differs from many such systems in that instead of defining a class in terms of the properties its instances may have, the RDF vocabulary description language describes properties in terms of the classes of resource to which they apply. This is the role of the domain and range mechanisms described in this specification. For example, we could define the &lt;code&gt;eg:author&lt;/code&gt; property to have a domain of &lt;code&gt;eg:Document&lt;/code&gt; and a range of &lt;code&gt;eg:Person&lt;/code&gt;, whereas a classical object oriented system might typically define a class &lt;code&gt;eg:Book&lt;/code&gt; with an attribute called &lt;code&gt;eg:author&lt;/code&gt; of type &lt;code&gt;eg:Person&lt;/code&gt;. Using the RDF approach, it is easy for others to subsequently define additional properties with a domain of eg:&lt;code&gt;Document&lt;/code&gt; or a range of &lt;code&gt;eg:Person&lt;/code&gt;. This can be done without the need to re-define the original description of these classes. One benefit of the RDF property-centric approach is that it allows anyone to extend the description of existing resources, one of the architectural principles of the Web.&lt;/p&gt;
&lt;/blockquote&gt;
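&lt;p&gt;The spec&amp;rsquo;s example boils down to just three triples. In Turtle (with eg: standing in for whatever example namespace you like):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt; .
@prefix rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt; .
@prefix eg:   &amp;lt;http://example.org/ns#&amp;gt; .

eg:author rdf:type    rdf:Property ;
          rdfs:domain eg:Document ;
          rdfs:range  eg:Person .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Anyone else can then define new properties that point at eg:Document or eg:Person without touching the original class descriptions.&lt;/p&gt;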
&lt;p&gt;The role and relationship of the &lt;code&gt;rdfs:domain&lt;/code&gt; and &lt;code&gt;rdfs:range&lt;/code&gt; properties have confused me and &lt;a href=&#34;http://twitter.com/JeniT/status/5272938272&#34;&gt;many others&lt;/a&gt;. The spec&amp;rsquo;s description of their use is rather technical (nothing wrong with that; it&amp;rsquo;s a spec) but there&amp;rsquo;s this nice passage after that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;hellip;an RDF vocabulary might describe limitations on the types of values that are appropriate for some property, or on the classes to which it makes sense to ascribe such properties.&lt;/p&gt;
&lt;p&gt;The RDF Vocabulary Description language provides a mechanism for describing this information, but does not say whether or how an application should use it&amp;hellip;&lt;/p&gt;
&lt;p&gt;For example, data checking tools might use this to help discover errors in some data set, an interactive editor might suggest appropriate values, and a reasoning application might use it to infer additional information from instance data.&lt;/p&gt;
&lt;p&gt;RDF vocabularies can describe relationships between vocabulary items from multiple independently developed vocabularies. Since URI-References are used to identify classes and properties in the Web, it is possible to create new properties that have a &lt;code&gt;domain&lt;/code&gt; or &lt;code&gt;range&lt;/code&gt; whose value is a class defined in another namespace.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I think that makes some basic issues clearer.&lt;/p&gt;
&lt;p&gt;I have mixed feelings about the &amp;ldquo;Other vocabulary&amp;rdquo; section on features that, from what I&amp;rsquo;ve seen, never got much traction: container classes and properties, RDF collections, and reification. On the one hand, usage of these can appear so complex that I think it scared a lot of people away from RDF in the early days, obscuring the simplicity of the triple as the fundamental concept of RDF. On the other hand, as I read about these options now, they looked like they could be fun to play with, in a geeky sort of way. (I also realize that the whole concept of reification—the ability to refer to triples as resources themselves so that properties can be assigned to them—is an important bit of RDF foundational architecture for other good RDF-related ideas to build on.)&lt;/p&gt;
&lt;p&gt;So, whether you&amp;rsquo;re new to the whole idea of a standardized definition of a vocabulary or you&amp;rsquo;ve been using OWL and RDFS together for years, I heartily recommend that you read the first 11 or 18 pages of the RDFS spec and skim the rest, which includes some handy reference material.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-2377&#34;&gt;November 30, 2009 5:49 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Finally, someone read it! Thanks ;) I remember bits of those paragraphs coming together from contributors to the original RDFS WG, 98/9, and other things from the RDF Core makeover. There&amp;rsquo;s lots I&amp;rsquo;d do differently now but that&amp;rsquo;s life! There are some other bits that got mostly dropped from the doc at some point, eg. about the Warwick Framework,&lt;br /&gt;
&amp;ldquo;&amp;ldquo;&amp;ldquo;RDF and the RDF Schema language were also based on metadata research in the Digital Library community. In particular, RDF adopts a modular approach to metadata that can be considered an implementation of the Warwick Framework [WF]. RDF represents an evolution of the Warwick Framework model in that the Warwick Framework allowed each metadata vocabulary to be represented in a different syntax. In RDF, all vocabularies are expressed within a single well defined model. This allows for a finer grained mixing of machine-processable vocabularies, and addresses the need [EXTWEB] to create metadata in which statements can draw upon multiple vocabularies that are managed in a decentralized fashion by independent communities of expertise. &amp;quot;&amp;rdquo;&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.w3.org/TR/2000/CR-rdf-schema-20000327/&#34;&gt;http://www.w3.org/TR/2000/CR-rdf-schema-20000327/&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Converting Word documents to DITA</title>
      <link>https://www.bobdc.com/blog/converting-word-documents-to-d/</link>
      <pubDate>Fri, 20 Nov 2009 09:37:31 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/converting-word-documents-to-d/</guid>
      
      
      <description><div>Via OpenOffice and DocBook.</div><div>&lt;p&gt;I recently had to convert a few Microsoft Word documents to DITA XML and thought it would be worth sharing my notes on the steps I took. To summarize, I opened each Word document with OpenOffice 3.1, saved it as a DocBook XML document, and then converted that to DITA with the XSLT stylesheet from a DITA plugin that I found. Images were a little more trouble, but at least I was able to eventually automate that part as well, dispelling my worries that I&amp;rsquo;d have to add all the image references to the DITA files by hand.&lt;/p&gt;
&lt;h2 id=&#34;id103333&#34;&gt;Word to DocBook&lt;/h2&gt;
&lt;img id=&#34;id103300&#34; src=&#34;https://www.bobdc.com/img/main/word2dita.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Word and DITA logos&#34;/&gt;
&lt;p&gt;When you open a Word file with OpenOffice and do a Save As DocBook, it assumes that the document uses default Word styles, because that&amp;rsquo;s how OpenOffice knows what&amp;rsquo;s what in the document&amp;rsquo;s structure. The conversion does an impressive job of adding wrappers in the appropriate places considering that it&amp;rsquo;s using an XSLT 1.0 stylesheet. This kind of stylesheet would be &lt;a href=&#34;http://www.xml.com/pub/a/2003/11/05/tr.html&#34;&gt;much easier to write&lt;/a&gt; with XSLT 2, but that reduces the choice of XSLT processors that you can use. It doesn&amp;rsquo;t matter much from the user&amp;rsquo;s perspective, because it&amp;rsquo;s all under the covers anyway. The key thing is the convenience of creating the DocBook version from OpenOffice with a simple Save As.&lt;/p&gt;
&lt;p&gt;On the down side, some nested bulleted lists in the original content did not show up in the DocBook version. I found this after converting the eventual DITA version of one of these documents to a PDF file with the DITA Open Toolkit and skimming through the original Word file and the new PDF to do a block-by-block comparison. (I strongly recommend this QA step if you&amp;rsquo;re doing this conversion with important content.) Many bulleted lists got converted to numbered lists as well, although I&amp;rsquo;m not sure if this was the fault of the Word to DocBook conversion or of a later stage described below. Another small issue is that when the original had more than one space character in a row, all but one got converted to hard spaces to maintain the spacing in XML. I just deleted all the hard spaces from the DITA version with a global replace, but you may want to keep them, depending on how the documents use them.&lt;/p&gt;
&lt;p&gt;Typical Word users add space between paragraphs by inserting an extra carriage return instead of adjusting the styles included with the document, so your output from this conversion step might have a lot of empty &lt;code&gt;para&lt;/code&gt; elements. You can delete these with a simple XSLT stylesheet or even a global replace in a text editor.&lt;/p&gt;
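&lt;p&gt;For the XSLT route, an identity transformation with one extra empty template is enough. (This assumes the unwanted &lt;code&gt;para&lt;/code&gt; elements contain nothing but optional whitespace.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&amp;quot;1.0&amp;quot;
  xmlns:xsl=&amp;quot;http://www.w3.org/1999/XSL/Transform&amp;quot;&amp;gt;

  &amp;lt;!-- copy everything through unchanged --&amp;gt;
  &amp;lt;xsl:template match=&amp;quot;@*|node()&amp;quot;&amp;gt;
    &amp;lt;xsl:copy&amp;gt;&amp;lt;xsl:apply-templates select=&amp;quot;@*|node()&amp;quot;/&amp;gt;&amp;lt;/xsl:copy&amp;gt;
  &amp;lt;/xsl:template&amp;gt;

  &amp;lt;!-- drop para elements with no child elements and no text --&amp;gt;
  &amp;lt;xsl:template match=&amp;quot;para[not(*) and not(normalize-space())]&amp;quot;/&amp;gt;

&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;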
&lt;h2 id=&#34;id103394&#34;&gt;Adding the images&lt;/h2&gt;
&lt;p&gt;One annoying detail was that the DocBook files created by OpenOffice lack references to the images. When you save a Word file as an OpenOffice native odt (that is, zip) file, you can see that the content.xml file in there has simple, straightforward references to image files that are also in the zip file. The references look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;draw:frame draw:style-name=&amp;quot;fr1&amp;quot; draw:name=&amp;quot;graphics63&amp;quot; 
  text:anchor-type=&amp;quot;as-char&amp;quot; svg:width=&amp;quot;6.8972in&amp;quot; svg:height=&amp;quot;2.6264in&amp;quot; 
  draw:z-index=&amp;quot;49&amp;quot;&amp;gt;&amp;lt;draw:image 
  xlink:href=&amp;quot;Pictures/10000000000003430000013EC16739CA.png&amp;quot;
  xlink:type=&amp;quot;simple&amp;quot; xlink:show=&amp;quot;embed&amp;quot; 
  xlink:actuate=&amp;quot;onLoad&amp;quot;/&amp;gt;&amp;lt;/draw:frame&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(I had created the original images in the Word file by pasting them from somewhere else, so the conversion of each to a standalone png file was a nice bonus.) OpenOffice&amp;rsquo;s Save as DocBook feature doesn&amp;rsquo;t save these image references; the DocBook 4.1.2 version of the above that it creates looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;inlinegraphic fileref=&amp;quot;embedded:graphics63&amp;quot; 
    width=&amp;quot;6.8972inch&amp;quot; depth=&amp;quot;2.6264inch&amp;quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Note that DocBook 5 &lt;a href=&#34;http://www.docbook.org/tdg/en/html/inlinegraphic.html&#34;&gt;deprecates&lt;/a&gt; the &lt;code&gt;inlinegraphic&lt;/code&gt; element.) After having no luck tinkering with the sofftodocbookheadings.xsl stylesheet that OpenOffice uses to create the DocBook file, I replaced its contents with an identity transformation to see what it was using as input. It turned out that it wasn&amp;rsquo;t using the original content.xml file mentioned above but some intermediary file that had replaced the &lt;code&gt;xlink:href&lt;/code&gt; value above with a child element that stored the actual content of the image, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;draw:image draw:style-name=&amp;quot;fr1&amp;quot; draw:name=&amp;quot;graphics63&amp;quot;
            text:anchor-type=&amp;quot;as-char&amp;quot; svg:width=&amp;quot;6.8972inch&amp;quot;
            svg:height=&amp;quot;2.6264inch&amp;quot; draw:z-index=&amp;quot;49&amp;quot;&amp;gt;
  &amp;lt;office:binary-data&amp;gt;iVBORw0KGgoAAAANSUhEUgAAA0MAAAE+CAIAAADAgVy 
   &amp;lt;!-- lots more data here--&amp;gt;&amp;lt;/office:binary-data&amp;gt;
&amp;lt;/draw:image&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
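&lt;p&gt;As a side note, the &lt;code&gt;office:binary-data&lt;/code&gt; content is just the image file itself, base64-encoded, so another way to recover a standalone image (an illustration, not the approach I describe next) is to decode that text directly:&lt;/p&gt;

```python
import base64

# The office:binary-data payload is the PNG file itself, base64-encoded;
# decoding it yields the same bytes as the file under Pictures/ in the
# odt zip. (Sample data truncated here for illustration.)
encoded = "iVBORw0KGgo="
image_bytes = base64.b64decode(encoded)
print(image_bytes[:4])  # b'\x89PNG' -- the PNG file signature
```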
&lt;p&gt;At least the &lt;code&gt;draw:name&lt;/code&gt; value of the &lt;code&gt;draw:image&lt;/code&gt; element&amp;rsquo;s parent &lt;code&gt;draw:frame&lt;/code&gt; element gets preserved in the DocBook output as the value of the &lt;code&gt;fileref&lt;/code&gt; attribute, so instead of digging into OpenOffice&amp;rsquo;s architecture to see what was preparing the input for sofftodocbookheadings.xsl and trying to fix that, I wrote a &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/getImageNameData.xsl&#34;&gt;getImageNameData.xsl&lt;/a&gt; stylesheet to pull the {&lt;code&gt;draw:name&lt;/code&gt;, &lt;code&gt;xlink:href&lt;/code&gt;} pairings from the original content.xml file. Then, I wrote an &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/addImageRefs.xsl&#34;&gt;addImageRefs.xsl&lt;/a&gt; stylesheet to look up the image filenames in the getImageNameData.xsl output and insert them into a new copy of the DocBook file.&lt;/p&gt;
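&lt;p&gt;If you would rather prototype that pairing outside of XSLT, here is a rough Python equivalent of what the two stylesheets do together. The element and attribute names come from the examples above; the regular expressions assume the simple one-image-per-frame markup shown there, so treat this as a sketch rather than a general ODF parser:&lt;/p&gt;

```python
import re

# A rough, regex-based equivalent of the two-stylesheet approach:
# collect {draw:name: xlink:href} pairs from content.xml, then rewrite
# fileref="embedded:NAME" values in the DocBook output.
def image_map(content_xml):
    pairs = {}
    pattern = r'draw:name="([^"]+)"[^>]*>.*?xlink:href="([^"]+)"'
    for m in re.finditer(pattern, content_xml, re.S):
        pairs[m.group(1)] = m.group(2)
    return pairs

def fix_filerefs(docbook_xml, pairs):
    def repl(m):
        name = m.group(1)
        if name in pairs:
            return 'fileref="%s"' % pairs[name]
        return m.group(0)  # leave unknown references untouched
    return re.sub(r'fileref="embedded:([^"]+)"', repl, docbook_xml)

content = ('<draw:frame draw:name="graphics63">'
           '<draw:image xlink:href="Pictures/10000000.png"/></draw:frame>')
docbook = '<inlinegraphic fileref="embedded:graphics63"/>'
print(fix_filerefs(docbook, image_map(content)))
# <inlinegraphic fileref="Pictures/10000000.png"/>
```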
&lt;h2 id=&#34;id103535&#34;&gt;DocBook to DITA&lt;/h2&gt;
&lt;p&gt;Eric Hennum describes a docbook2dita plugin for the DITA Open Toolkit in &lt;a href=&#34;http://markmail.org/message/gd4r4elqmmmqcb2w&#34;&gt;this posting&lt;/a&gt; on a DocBook list. My first attempt to use it from within the DITA Open Toolkit resulted in the errors discussed in a DITA group thread that ends with &lt;a href=&#34;http://tech.groups.yahoo.com/group/dita-users/message/14620&#34;&gt;this posting&lt;/a&gt; from Mark Peters, who came up with a very simple solution: instead of running the conversion as a plugin, just call the XSLT stylesheet included with the plugin directly and tell it where your input is and where the output should go. The basic form of the command line that he shows worked for me.&lt;/p&gt;
&lt;h2 id=&#34;id103570&#34;&gt;Testing it&lt;/h2&gt;
&lt;p&gt;The first test to pass was whether the result was valid to a DITA DTD, and that went fine. The second test was the big one: whether the HTML and PDF created from the document by the DITA Open Toolkit looked right. In general it did, except for the issues described above, which showed that a block-by-block comparison of each PDF with the original Word file is worth the trouble. If I had to do a large number of these conversions I&amp;rsquo;d dig deeper into the nested bulleted list and bulleted/numbered list issues in the hopes of reducing the need for this final manual step.&lt;/p&gt;
&lt;p&gt;So far, though, the automation steps that I found or put together are definitely saving me tons of potential manual work. I only had to do this to a few documents, so I didn&amp;rsquo;t mind executing each step one at a time, but if you want to use OpenOffice to convert a large number of documents, I wrote something in XML.com called &lt;a href=&#34;http://www.xml.com/pub/a/2006/01/11/from-microsoft-to-openoffice.html&#34;&gt;Moving to OpenOffice: Batch Converting Legacy Documents&lt;/a&gt; a few years ago that should help.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By David Kelly on &lt;a href=&#34;#comment-2371&#34;&gt;November 20, 2009 12:47 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;While we haven&amp;rsquo;t documented it, Scriptorium did a web presentation not long ago on a similar conversion process using the same tools you use. I have put together an Ant script that controls the processing chain from beginning to end, and we also added a bursting script that takes the large DITA file output and creates individual topic files with a ditamap to hold them all together.&lt;/p&gt;
&lt;p&gt;Some processing we do in Ant includes fixing Unicode by wrapping &amp;amp;#x and ; around the 4-digit code from the \unnnn instances in Word. Also, I wrote an XSL script that fixes autonumbering in the OO XML document before it gets converted to DocBook. It uses an identity transform with an exception that looks for this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;text:list[contains(@text:style-name, &#39;Outline&#39;)]&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;In the transform, it keeps the descendant text:h tags and discards the text:list tags. Autonumbered sections in Word cause the Docbook-to-DITA script not to pick up the headings, so no topics are output.&lt;/p&gt;
&lt;p&gt;I have used this process for 500-page Word documents, and it appears to be reliable, for the most part. Large tables slow it down considerably. Occasionally we run into Word styles that cause problems in the OO-to-DocBook conversion, so you are right, the results must be checked carefully. But as a conversion method, it sure beats cut and paste.&lt;/p&gt;
&lt;p&gt;Glad to see that great minds think alike!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2372&#34;&gt;November 20, 2009 12:52 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks David! I wrote out my notes to help others who may try something like this in the future, and your comments will definitely be a further help to them.&lt;/p&gt;
&lt;p&gt;By Jeroen Baten on &lt;a href=&#34;#comment-2446&#34;&gt;February 12, 2010 3:52 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;David, would you be kind enough to post the ant script itself? It would help me greatly in starting the toolchain!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/dita">DITA</category>
      
    </item>
    
    <item>
      <title>Simple semi-structured data entry</title>
      <link>https://www.bobdc.com/blog/simple-semi-structure-data-ent/</link>
      <pubDate>Wed, 11 Nov 2009 20:48:55 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/simple-semi-structure-data-ent/</guid>
      
      
<description><div>With RDF.</div><div>&lt;p&gt;When most people want to take notes on a collection of things, and they know that the notes will have some structure but they&amp;rsquo;re not sure about the nature of that structure just yet, they use a spreadsheet. For each thing that they take notes on, they add a new row; for each attribute of the things under review, they add a column. From an investment banker comparing potential investments to a scout leader planning a camping trip, the grid makes it easy for you to compare similar attributes of different things without forcing you to specify all of your attributes before starting your data entry like a more serious database application would.&lt;/p&gt;
&lt;p&gt;In theory, RDF is ideal for this, because you can assign any attribute name/value pair to any resource that you can identify with no requirement to plan it all in advance, but in practice, it&amp;rsquo;s rarely as easy as pouring names and numbers into a spreadsheet. I&amp;rsquo;ve often thought that it would be fun to build a freeform database program that lets people do data entry and make up new fields as they go along, all with RDF underneath. I even wrote some Python code for this a few years ago, but never followed through. Since joining TopQuadrant, I&amp;rsquo;ve wondered about assembling something like this with the company&amp;rsquo;s application development tools, but then I realized that the &lt;a href=&#34;http://www.topquadrant.com/products/TB_Composer.html#free&#34;&gt;Free Edition&lt;/a&gt; of TopBraid Composer pretty much already does this.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a use case that&amp;rsquo;s happened to most people in the modern workforce: you&amp;rsquo;re told that you&amp;rsquo;ll be joining a particular project, and to get you started someone emails a zip file of relevant files for you to review. For my notes on these files, I might create a text file or a spreadsheet, but I&amp;rsquo;d probably assemble an XML file where I made up element names as I went along. These elements would track the filename, document title, author, age, comments, and probably some project-specific fields. When the big picture started coming into focus, I&amp;rsquo;d write a little XSLT to convert this XML to presentable HTML to show to others if necessary.&lt;/p&gt;
&lt;p&gt;A key reason that this would be easy for me is that the Emacs &lt;a href=&#34;http://www.thaiopensource.com/nxml-mode/&#34;&gt;nxml mode&lt;/a&gt; automates much of the work of entering tags and keeping everything well-formed. How would doing it in RDF be better? I could do the same steps as above using &lt;a href=&#34;http://www.xml.com/pub/a/2002/10/30/rdf-friendly.html&#34;&gt;RDF-Friendly XML&lt;/a&gt; and nxml&amp;rsquo;s excellent handling of RDF/XML, but I&amp;rsquo;d rather use a form-based interface instead of Emacs. This is where the free edition of TopBraid Composer comes in.&lt;/p&gt;
&lt;p&gt;The first step is creating an RDF data file with all the easily available file metadata: the name, size, and last modification date for each file. I wrote a simple perl script called &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/dir2rdf-pl.txt&#34;&gt;dir2rdf.pl&lt;/a&gt; to do this; it&amp;rsquo;s simple because it declares a File class and all the properties for that class in the namespace declared for the file. (I also created a slightly more complex perl script called &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/dir2nfordf-pl.txt&#34;&gt;dir2nfordf.pl&lt;/a&gt; which does the same thing but uses existing classes and properties from the &lt;a href=&#34;http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/&#34;&gt;NEPOMUK File Ontology&lt;/a&gt;. It&amp;rsquo;s more complex because this ontology has properties based on properties from other vocabularies such as Dublin Core, so editing data with this ontology means pulling in a few layers of other ones.)&lt;/p&gt;
&lt;p&gt;When you pipe the result of the Windows &lt;code&gt;dir&lt;/code&gt; command into the simpler perl script, it outputs the property and class definitions for the files and an entry like this for each file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  &amp;lt;File rdf:ID=&#39;file11&#39; sd:lastModified=&#39;2009-10-30T17:05:00&#39;
        sd:fileName=&#39;teams.csv&#39; sd:fileSize=&#39;164&#39; rdfs:comment=&#39;&#39;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
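&lt;p&gt;For illustration, a Python equivalent of one of these entries might be generated like this. The &lt;code&gt;rdf:ID&lt;/code&gt; and &lt;code&gt;sd:&lt;/code&gt; property names are copied from the output above; this is a sketch, not the actual perl script:&lt;/p&gt;

```python
import os
import time

# An illustrative equivalent of one dir2rdf.pl entry: a File element per
# file, using the rdf:ID and sd: property names shown above.
def file_entry(path, ident):
    st = os.stat(path)
    modified = time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(st.st_mtime))
    return ("  <File rdf:ID='%s' sd:lastModified='%s'\n"
            "        sd:fileName='%s' sd:fileSize='%d' rdfs:comment=''/>"
            % (ident, modified, os.path.basename(path), st.st_size))

# Demo: create a small file and describe it.
with open("teams.csv", "w") as f:
    f.write("name,city\n")
print(file_entry("teams.csv", "file11"))
os.remove("teams.csv")
```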
&lt;p&gt;Loaded into the free edition of TopBraid Composer, the editing of that &amp;ldquo;record&amp;rdquo; looks like this (I&amp;rsquo;ve rearranged the combination of screen sections a bit from the default TopBraid &amp;ldquo;perspective&amp;rdquo;, to use the Eclipse parlance):&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/img/main/rdfdataentry1.jpg&#34;&gt;&lt;img id=&#34;id103434&#34; src=&#34;https://www.bobdc.com/img/main/rdfdataentry1.jpg&#34; alt=&#34;TopBraid Composer screen shot&#34; width=&#34;500&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I can edit the values on this form, although there&amp;rsquo;s no reason to edit the file name, size, or last modified values. What I&amp;rsquo;m really going to do is add notes to the rdfs:comment property, as I&amp;rsquo;ve already done above, and perhaps add more comment properties for this resource. The really nice part is that I can define new properties in the Properties view on the right—for example, some project-specific subproperties of rdfs:comment—drag them onto the form for any of my File resources, and then add values to them, giving me the functional equivalent of adding new columns to a spreadsheet.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s actually better than that, because if I wanted to add three contactWithQuestions names to one of these File resources on a spreadsheet grid, I&amp;rsquo;d have to either add three columns or string together three values in one spreadsheet cell as if they were one. With RDF, though, I can define a contactWithQuestions property and then add three separate values for this property to the same resource. Moving beyond the use of simple string data for the values here, I could create object properties (properties where the value is another resource—in this case, to define relationships between File objects such as mentionedIn or basedOn) by defining them in the Properties view on the right with a range of File. When I want to assign one of these properties to a particular File object, I would drag it from the property list on the right onto the Resource form for that File and then pick out the appropriate file it refers to from a drop-down list. For example, after creating a mentionedIn property, if teams.csv was mentioned in index.html and I wanted to record this in my notes on teams.csv, I&amp;rsquo;d drag the mentionedIn property onto the Resource Form for teams.csv and select index.html as the value for that property.&lt;/p&gt;
&lt;p&gt;Because this is a GUI editing interface, I can also add and delete new File resources (the equivalent of inserting and deleting rows on a spreadsheet) by clicking icons on the Instances view at the bottom. (Another nice bonus with TopBraid Composer is the SPARQL tab next to that, where you can enter and run SPARQL queries about the data.)&lt;/p&gt;
&lt;p&gt;So, I&amp;rsquo;ve got my form-driven interface that I can use with any RDF data. I&amp;rsquo;ve kept my address book in RDF for a long time; maybe I should try maintaining it like this instead of with Emacs.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Up and running with Mercurial</title>
      <link>https://www.bobdc.com/blog/up-and-running-with-mercurial/</link>
      <pubDate>Mon, 26 Oct 2009 09:50:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/up-and-running-with-mercurial/</guid>
      
      
      <description><div>Quick and easy.</div><div>&lt;p&gt;&lt;a href=&#34;http://mercurial.selenic.com/wiki/&#34;&gt;&lt;img id=&#34;id103283&#34; src=&#34;http://www.selenic.com/hg-logo/logo-droplets-200.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;100px&#34; alt=&#34;mercurial logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve used the cvs and svn version control systems for both work-related and personal projects. For personal work, I used svn in particular more as a backup program, with the added benefit of the version control. Keeping my repository on a thumb drive made it easy to perform the backups when traveling, but perhaps because of sloppiness in removing the thumb drive without clicking the right icons first, my repository got corrupted too often, so I gave up.&lt;/p&gt;
&lt;p&gt;I decided to try again with &lt;a href=&#34;http://mercurial.selenic.com/wiki/&#34;&gt;Mercurial&lt;/a&gt; and was shocked at how quickly I was able to learn it and get it to do everything I wanted—about an hour. &lt;a href=&#34;http://importantshock.wordpress.com/2008/08/07/git-vs-mercurial/&#34;&gt;This blog posting&lt;/a&gt; convinced me to try it before &lt;a href=&#34;http://git-scm.com/&#34;&gt;git&lt;/a&gt;, which sounds fascinating but a bit more complicated. By keeping my repository on the local drive and using the clone feature to keep backup copies of the repository elsewhere, I can redo a backup if a thumb drive version gets messed up.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://mercurial.selenic.com/wiki/QuickStart&#34;&gt;Mercurial Quick Start&lt;/a&gt; lives up to its name, and I kept some notes as I went along to provide my own Mercurial quick reference:&lt;/p&gt;
&lt;hr /&gt;
&lt;pre&gt;&lt;code&gt;hg init                               Turn current directory into a project.
hg add                                Add files in current directory to repository.
hg ci -m &amp;quot;comment about this commit&amp;quot;  Commit recent changes to repository.
hg clone . e:\otherCopy               Create a clone of the current directory&#39;s repository somewhere else.
hg push e:\otherCopy                  Send recent changes in this directory&#39;s repository to a clone repository (that is, back up the changes here to there).
hg update                             (entered from within the e:\otherCopy directory) Make clone directory&#39;s contents reflect recent changes to clone repository.
hg log test1.txt                      List comments (see -m above) for each of test1.txt file&#39;s changes.
hg revert -r 1 test1.txt              Revert file test1.txt to revision 1. (You can then &amp;quot;revert&amp;quot; it to later versions.)
hg cat -r 2 test1.txt                 Look at version 2 of test1.txt.
hg locate foo                         List files in repository with &amp;quot;foo&amp;quot; in their names.
&lt;/code&gt;&lt;/pre&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;del&gt;One other note: an .hgignore file tells hg which files to ignore, and putting separate .htignore files in subdirectories of your main project directory works fine.&lt;/del&gt;&lt;/p&gt;
&lt;p&gt;I once had &lt;a href=&#34;https://www.bobdc.com/blog/dam-subversion-rdf-owl&#34;&gt;grand ideas&lt;/a&gt; about hooking up a version control system that can assign arbitrary metadata with an RDF triplestore to form the basis of some sort of CMS demo. Mercurial &lt;a href=&#34;http://markmail.org/thread/h66comox2nf4koay#query:mercurial%20metadata+page:1+mid:h66comox2nf4koay+state:results&#34;&gt;isn&amp;rsquo;t much help here&lt;/a&gt;, but when I prioritize the tasks &amp;ldquo;back up my stuff&amp;rdquo; and &amp;ldquo;build a demo CMS around a version control system&amp;rdquo; the former is clearly much more important. Maybe someday&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://norman.walsh.name/&#34; title=&#34;http://norman.walsh.name/&#34;&gt;Norman Walsh&lt;/a&gt; on &lt;a href=&#34;#comment-2355&#34;&gt;October 26, 2009 7:00 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Submitted without comment: &lt;a href=&#34;http://www.whygitisbetterthanx.com/&#34;&gt;http://www.whygitisbetterthanx.com/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2356&#34;&gt;October 26, 2009 7:46 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In my entry above I wrote that I &amp;ldquo;was shocked at how quickly I was able to learn it and get it to do everything I wanted—about an hour.&amp;rdquo; I exaggerated a bit; it was closer to an hour and a half. On September 24th at about 11:30 AM GMT I told Norm &amp;ldquo;I&amp;rsquo;m going to try Mercurial now&amp;rdquo;. At about 1 PM I told him &amp;ldquo;I tried it and really liked it&amp;rdquo;. And now he tells me about whygitisbetterthanx?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dirkjan.ochtman.nl/&#34; title=&#34;http://dirkjan.ochtman.nl/&#34;&gt;Dirkjan Ochtman&lt;/a&gt; on &lt;a href=&#34;#comment-2359&#34;&gt;October 27, 2009 4:40 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Don&amp;rsquo;t believe the git hype&amp;hellip;&lt;/p&gt;
&lt;p&gt;BTW, you name &amp;lsquo;.htignore&amp;rsquo; here, where I believe you mean &amp;lsquo;.hgignore&amp;rsquo;, and actually putting those in subdirectories of a repo doesn&amp;rsquo;t work (not sure why you have the impression that it does).&lt;/p&gt;
&lt;p&gt;Actually Mercurial&amp;rsquo;s filelog structure has a neat metadata mechanism built-in where you could store versioned metadata per file, and the changelog (where the changeset history graph is stored) can also hold arbitrary bits of metadata, so you might be able to leverage those capabilities to do some interesting things.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2360&#34;&gt;October 27, 2009 9:53 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Dirkjan. I fixed the .hgignore filename and then struck out that sentence. I guess my opinion was based on a quick test where I wasn&amp;rsquo;t paying enough attention.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2366&#34;&gt;November 11, 2009 8:11 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is a test&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Blogging on TopQuadrant&#39;s Blog</title>
      <link>https://www.bobdc.com/blog/blogging-on-topquadrants-blog/</link>
      <pubDate>Wed, 14 Oct 2009 19:59:02 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/blogging-on-topquadrants-blog/</guid>
      
      
      <description><div>In addition to here.</div><div>&lt;p&gt;I just added my first entry to TopQuadrant&amp;rsquo;s blog, &lt;a href=&#34;http://topquadrantblog.blogspot.com/&#34;&gt;Voyages of the Semantic Enterprise&lt;/a&gt;. It&amp;rsquo;s called &lt;a href=&#34;http://topquadrantblog.blogspot.com/2009/10/spin-tutorial-available.html&#34;&gt;SPIN Tutorial Available&lt;/a&gt;, and describes the tutorial I recently finished writing on using the SPARQL Inferencing Notation with TopBraid Composer.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll be adding more to that blog in the future and certainly continuing with this one here, keeping the entries that focus on TopQuadrant technology over there. I&amp;rsquo;ll put the others—including more general interest entries on the semantic web, SPARQL, and RDF—right here.&lt;/p&gt;
&lt;p&gt;Thanks for reading either!&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://piershollott.blogspot.com&#34; title=&#34;http://piershollott.blogspot.com&#34;&gt;Piers Hollott&lt;/a&gt; on &lt;a href=&#34;#comment-2351&#34;&gt;October 15, 2009 4:41 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;But will your entries on TopQuadrant&amp;rsquo;s blog be marked up with Dublin Core RDFa? (ah, blogspot)&lt;/p&gt;
&lt;p&gt;Looking forward to more on either venue. It&amp;rsquo;s great that TopQuadrant is encouraging their consultants to blog. Kudos!&lt;/p&gt;
&lt;p&gt;Cheers,&lt;br /&gt;
Piers&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/blogging-about-blogging">blogging about blogging</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>A rules language for RDF</title>
      <link>https://www.bobdc.com/blog/a-rules-language-for-rdf/</link>
      <pubDate>Thu, 01 Oct 2009 12:45:05 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-rules-language-for-rdf/</guid>
      
      
      <description><div>Right under our noses.</div><div>&lt;p&gt;Last May, in &lt;a href=&#34;https://www.bobdc.com/blog/adding-semantics-to-make-data&#34;&gt;Adding semantics to make data more valuable: the secret revealed&lt;/a&gt;, I showed how storing a little bit of semantics about the word &amp;ldquo;spouse&amp;rdquo;—the fact that it&amp;rsquo;s a symmetric property (that is, that if A is the spouse of B, then B is the spouse of A)—let me look up someone&amp;rsquo;s home phone number in my address book even if my entry for him there lacks his home phone number. I like this story because unlike biotech and some of the other popular domains for Semantic Web technology, everyone has an address book and understands the basic properties of an entry: first name, last name, email address, and so forth. (Because so many people have lived through the annoyances of moving their contact information from one email client or phone to another, address books also provide nice use cases for data integration issues.)&lt;/p&gt;
&lt;p&gt;Back then, I wrote:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;With software that understands an OWL expression stating that &lt;code&gt;spouse&lt;/code&gt; is a symmetric property and a rule I define to say that spouses have the same home phone number, I can retrieve Leroy&amp;rsquo;s home phone number&amp;hellip;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OWL is great for defining the symmetry, but I glossed over the part about defining the fact that spouses have the same phone number. How do you define such a rule? n3 has a &lt;a href=&#34;http://www.w3.org/2000/10/swap/doc/Rules&#34;&gt;rules language&lt;/a&gt;, but I haven&amp;rsquo;t seen it used much as the n3 subset known as &lt;a href=&#34;http://www.w3.org/TeamSubmission/turtle/&#34;&gt;Turtle&lt;/a&gt; (which leaves out such things) becomes more popular. Instead of defining a Semantic Web rules language, the W3C has decided to have the &lt;a href=&#34;http://www.w3.org/2005/rules/wiki/RIF_Working_Group&#34;&gt;Rules Interchange Format Working Group&lt;/a&gt; standardize an interchange format between the &lt;a href=&#34;http://www.w3.org/2005/rules/wg/wiki/List_of_Rule_Systems&#34;&gt;many rules languages&lt;/a&gt; out there. (The &lt;a href=&#34;http://ontolog.cim3.net/file/resource/presentation/ChrisWelty_20080612/W3C-Rules-Interchange-Format--ChrisWelty_20080612.ppt&#34;&gt;W3C Rules Interchange Format Basic Logic Dialect&lt;/a&gt; PowerPoint presentation by WG co-chair Chris Welty provides good historical background.)&lt;/p&gt;
&lt;blockquote id=&#34;id103392&#34; class=&#34;pullquote&#34;&gt;I can write a query that generates the triples I want to infer and call this query a &#34;rule&#34;, but what do I do with it? &lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;ve used a proprietary RDF rules language before, and was wondering if a standard one would come along. Some colleagues at TopQuadrant have shown me that we all have a straightforward, standardized RDF rules language right under our noses: SPARQL. I&amp;rsquo;ve been appreciating SPARQL&amp;rsquo;s CONSTRUCT form &lt;a href=&#34;https://www.bobdc.com/blog/appreciating-sparql-construct&#34;&gt;more lately&lt;/a&gt;, and CONSTRUCT is the key here: like a SELECT statement, a CONSTRUCT statement defines conditions about which pieces of which triples to retrieve, but unlike SELECT, a CONSTRUCT statement assembles these into new triples. If we view a CONSTRUCT statement as the definition of a rule and the resulting new triples as the result of the execution of the rule, then we have a rules language and plenty of implementations of it available.&lt;/p&gt;
&lt;p&gt;For example, the following SPARQL &amp;ldquo;rule&amp;rdquo; says that if &lt;code&gt;?person1&lt;/code&gt; has the spouse &lt;code&gt;?person2&lt;/code&gt; and the home telephone number &lt;code&gt;?phoneNum&lt;/code&gt;, then &lt;code&gt;?person2&lt;/code&gt; also has the home telephone number &lt;code&gt;?phoneNum&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX  : &amp;lt;http://www.snee.com/ns/demo#&amp;gt;
PREFIX v: &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt;


CONSTRUCT { ?person2 v:homeTel ?phoneNum . }
WHERE {
  ?person1 :spouse   ?person2 ;
           v:homeTel ?phoneNum .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When run with the following data (for the purposes of this demo, assume that the {:leroy :spouse :loretta} triple was generated by an OWL reasoner that saw {:loretta :spouse :leroy} and knew that :spouse was symmetrical),&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix  : &amp;lt;http://www.snee.com/ns/demo#&amp;gt; .
@prefix v: &amp;lt;http://www.w3.org/2006/vcard/ns#&amp;gt; .
:loretta :spouse   :leroy ;
         v:homeTel &amp;quot;434-923-9321&amp;quot; .
:leroy   v:workTel &amp;quot;434-932-5329&amp;quot; ;
         :spouse   :loretta .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;it generates the triple {:leroy v:homeTel &amp;ldquo;434-923-9321&amp;rdquo;}.&lt;/p&gt;
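&lt;p&gt;The rule idea does not depend on any particular SPARQL engine. As an illustration (a sketch with triples as plain Python tuples rather than real RDF terms), the same inference is just a pattern match that emits new triples:&lt;/p&gt;

```python
# Triples as plain tuples; the "rule" plays the role of the CONSTRUCT
# query: match {?p1 spouse ?p2 . ?p1 homeTel ?n}, emit {?p2 homeTel ?n}.
data = {
    ("loretta", "spouse", "leroy"),
    ("loretta", "homeTel", "434-923-9321"),
    ("leroy", "workTel", "434-932-5329"),
    ("leroy", "spouse", "loretta"),
}

def spouse_hometel_rule(triples):
    inferred = set()
    for s, p, o in triples:
        if p == "spouse":
            for s2, p2, o2 in triples:
                if s2 == s and p2 == "homeTel":
                    inferred.add((o, "homeTel", o2))
    return inferred - triples  # only the genuinely new triples

print(spouse_hometel_rule(data))  # {('leroy', 'homeTel', '434-923-9321')}
```

&lt;p&gt;Running such rules repeatedly until no new triples appear is the forward chaining that rule engines automate.&lt;/p&gt;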
&lt;p&gt;OK, so I can write a query that generates the triples I want to infer and call this query a &amp;ldquo;rule&amp;rdquo;, but what do I do with it? What makes it a rule about a particular set of data?&lt;/p&gt;
&lt;p&gt;Holger Knublauch, a co-worker of mine who designed and developed the OWL plugin for &lt;a href=&#34;http://protege.stanford.edu/&#34;&gt;Protégé&lt;/a&gt; before coming to TopQuadrant, recently wrote an RDF vocabulary called SPIN (&amp;ldquo;SPARQL Inferencing Notation&amp;rdquo;), which—among other things—can express associations between these rules and classes. So, for example, if the blank node rdf:_1 pointed to the query above, the following triple would associate this query rule to the v:Address class:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  v:Address spin:rule rdf:_1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To make the storage of the SPARQL rule in a triplestore even cleaner, Holger has implemented a way to &lt;a href=&#34;http://spinrdf.org/sp.html&#34;&gt;store SPARQL queries as triples&lt;/a&gt;, and he&amp;rsquo;s written the code to roundtrip between this and the standard text version. (See the &lt;a href=&#34;http://sparqlpedia.org/spinrdfconverter.html&#34;&gt;SPARQL Text to SPIN RDF Syntax Converter&lt;/a&gt; for an online converter, and see &lt;a href=&#34;http://www.spinrdf.org/&#34;&gt;spinrdf.org&lt;/a&gt; for more about what else the SPIN vocabulary can do, especially his blog entries as he developed it. I&amp;rsquo;m now finishing up a tutorial for the use of SPIN features in TopQuadrant products, and except for one optional step of the tutorial, it all works with the free version of TopBraid Composer.)&lt;/p&gt;
&lt;p&gt;When you take it a little further, symmetrical properties and many other parts of OWL can also be implemented with SPARQL queries, and there&amp;rsquo;s a lot going on among those who are doing this to find a sweet spot between RDFS and OWL Full that meets typical business needs without using a lot of processing power or dollars.&lt;/p&gt;
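&lt;p&gt;To make the &amp;ldquo;query as rule&amp;rdquo; idea concrete outside of any particular toolkit, here is a minimal plain-Python sketch. This is not SPIN or SPARQL itself; the plain strings standing in for the :spouse and v:homeTel properties and the forward_chain helper are illustrative assumptions. A rule is just a function that matches a pattern over a set of triples and yields new ones, and forward chaining applies the rules until nothing new appears.&lt;/p&gt;

```python
# A "rule" here is a query-like function over a set of (s, p, o) tuples.
# The strings "spouse" and "homeTel" are stand-ins for the :spouse and
# v:homeTel properties in the Turtle example above.

def spouse_tel_rule(triples):
    """If A's spouse is B and B has a home phone, infer it for A too."""
    inferred = set()
    for s, p, o in triples:
        if p == "spouse":
            for s2, p2, o2 in triples:
                if s2 == o and p2 == "homeTel":
                    inferred.add((s, "homeTel", o2))
    return inferred

def forward_chain(triples, rules):
    """Apply every rule repeatedly until no new triples appear (a fixpoint)."""
    triples = set(triples)
    while True:
        new = set()
        for rule in rules:
            new |= rule(triples) - triples
        if not new:
            return triples
        triples |= new

data = {
    ("loretta", "spouse", "leroy"),
    ("loretta", "homeTel", "434-923-9321"),
    ("leroy", "spouse", "loretta"),
    ("leroy", "workTel", "434-932-5329"),
}
result = forward_chain(data, [spouse_tel_rule])
print(("leroy", "homeTel", "434-923-9321") in result)  # prints True
```

&lt;p&gt;Running it prints True: the rule inferred leroy&amp;rsquo;s home phone number from loretta&amp;rsquo;s, just as the CONSTRUCT-based rule does.&lt;/p&gt;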
&lt;h2 id=&#34;6-comments&#34;&gt;6 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-2341&#34;&gt;October 1, 2009 3:57 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Yes, SPARQL CONSTRUCT is a rule language in its own right for controlled &amp;ldquo;forward chaining&amp;rdquo;. The optical illusion that many have missed is this: Other Rules Languages have Head and Body on a Horizontal Plane, while SPARQL CONSTRUCT&amp;rsquo;s plane is vertical :-)&lt;/p&gt;
&lt;p&gt;SPIN is a neat formalization of the basic concept via a controlled vocabulary; certainly something we will use, as yet another mechanism for showcasing this aspect of SPARQL, esp. in the Virtuoso Sponger Middleware, which is already a constrained forward-chaining mechanism within the general non-RDF to RDF processing pipeline.&lt;/p&gt;
&lt;p&gt;Kingsley&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.furia.com&#34; title=&#34;http://www.furia.com&#34;&gt;glenn mcdonald&lt;/a&gt; on &lt;a href=&#34;#comment-2342&#34;&gt;October 1, 2009 5:20 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For a coincidental post about this same concept, query language as rules language, in another context, see &lt;a href=&#34;http://www.furia.com/page.cgi?type=log&amp;amp;id=330&#34;&gt;http://www.furia.com/page.cgi?type=log&amp;amp;id=330&lt;/a&gt; .&lt;/p&gt;
&lt;p&gt;Kingsley, what does your horizontal/vertical comment mean? What other Rules Languages, and what do &amp;ldquo;horizontal&amp;rdquo; and &amp;ldquo;vertical&amp;rdquo; mean here?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.furia.com&#34; title=&#34;http://www.furia.com&#34;&gt;glenn mcdonald&lt;/a&gt; on &lt;a href=&#34;#comment-2343&#34;&gt;October 1, 2009 5:27 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Oh, and what makes this idea inherently a forward-chaining solution? Seems to me that structurally you can evaluate your query for all data ahead of time, or for individual nodes when asked. There are various reasons to want one or the other in particular uses, but I don&amp;rsquo;t immediately see how the expression of the rule in a query-language is better or worse suited for one direction.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.wikier.org/&#34; title=&#34;http://www.wikier.org/&#34;&gt;Sergio Fernández&lt;/a&gt; on &lt;a href=&#34;#comment-2345&#34;&gt;October 2, 2009 2:51 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Check:&lt;/p&gt;
&lt;p&gt;Axel Polleres. From SPARQL to rules (and back). In Proceedings of the 16th World Wide Web Conference (WWW2007), pages 787-796, Banff, Canada, May 2007. ACM Press. Extended technical report version available at &lt;a href=&#34;http://www.polleres.net/TRs/GIA-TR-2006-11-28.pdf&#34;&gt;http://www.polleres.net/TRs/GIA-TR-2006-11-28.pdf&lt;/a&gt;, slides available at &lt;a href=&#34;http://www.polleres.net/publications/poll-2007www-slides.pdf&#34;&gt;http://www.polleres.net/publications/poll-2007www-slides.pdf&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By Diana Roberts on &lt;a href=&#34;#comment-2347&#34;&gt;October 12, 2009 11:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;But isn&amp;rsquo;t it a problem that SPARQL rules, once immortalized, define a reality that can easily become outdated? The example you used here is a case in point.&lt;/p&gt;
&lt;p&gt;Several of my younger married friends in fact do not share the same telephone number because they don&amp;rsquo;t have a land line at home and instead use their own mobile numbers. What mechanisms are there or could there be to make sure that the semantic web keeps up with the changing times?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2348&#34;&gt;October 12, 2009 12:45 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re concerned about potential issues around the mapping of reality to a rule set&amp;ndash;a perfectly reasonable concern&amp;ndash;then you want to avoid rule set specifications that rely on binary formats (e.g. compiled code) or on proprietary rule languages that vendors made up themselves. SPARQL, being a W3C standard, scores well on both counts.&lt;/p&gt;
&lt;p&gt;Because the SPIN approach treats the SPARQL queries as more data to manage, adding, removing, and modifying the rules is as straightforward as doing the same with the data they query. There&amp;rsquo;s no need to immortalize anything.&lt;/p&gt;
&lt;p&gt;In fact, the semantic web model is often more adaptable than others because of the greater ease of schema modification than you&amp;rsquo;ll find with relational databases or XML.&lt;/p&gt;
&lt;p&gt;So, a SPARQL-based system can do a fine job of keeping up with the changing times.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Converting wpl playlists to m3u playlists</title>
      <link>https://www.bobdc.com/blog/converting-wpl-playlists-to-m3/</link>
      <pubDate>Fri, 18 Sep 2009 22:25:15 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/converting-wpl-playlists-to-m3/</guid>
      
      
      <description><div>Simple XML in, simple text out, but no good search results for wpl2m3u? Write a little XSLT.</div><div>&lt;blockquote id=&#34;id103280&#34; class=&#34;pullquote&#34;&gt;After taking a closer look at the WPL format I realized that an XSLT stylesheet to convert it to M3U would be very simple.&lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;ve switched around between music-playing programs over the last few years. I suppose I should call them &amp;ldquo;media players&amp;rdquo;, but I only use them to play music, which is part of the reason I ended up using &lt;a href=&#34;http://getsongbird.com/&#34;&gt;Songbird&lt;/a&gt;, an open source Windows/Linux/Mac music front end that doesn&amp;rsquo;t pretend to be anything else. It looks a bit like iTunes, without all the ads in your face; how great is that?&lt;/p&gt;
&lt;p&gt;Before that I used &lt;a href=&#34;http://www.mediamonkey.com/&#34;&gt;MediaMonkey&lt;/a&gt;, and before that, the Windows Media Player. Guess which of these uses the most standardized, XML-based format for playlists? Surprise: the Microsoft one.&lt;/p&gt;
&lt;p&gt;Windows Media Player can create WPL files, which seem to conform to the W3C &lt;a href=&#34;http://www.w3.org/TR/REC-smil/&#34;&gt;SMIL&lt;/a&gt; standard, and it can export M3U files, which MediaMonkey uses. To convert WPL files to m3u for Songbird, reading them individually into Windows Media Player and exporting them one at a time was annoying. I did some web searches for wpl2m3u and only found one script that I couldn&amp;rsquo;t quite follow, and after taking a closer look at the WPL format I realized that an XSLT stylesheet to convert it to M3U would be very simple. So here it is:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&amp;quot;1.0&amp;quot; xmlns:xsl=&amp;quot;http://www.w3.org/1999/XSL/Transform&amp;quot;&amp;gt;


  &amp;lt;xsl:strip-space elements=&amp;quot;*&amp;quot;/&amp;gt;
  &amp;lt;xsl:output method=&amp;quot;text&amp;quot;/&amp;gt;


  &amp;lt;xsl:template name=&amp;quot;textAfterLastSlash&amp;quot;&amp;gt;&amp;lt;!-- but actually backslash --&amp;gt;
    &amp;lt;xsl:param name=&amp;quot;string&amp;quot;&amp;gt;dummy string&amp;lt;/xsl:param&amp;gt;
    &amp;lt;xsl:choose&amp;gt;
      &amp;lt;xsl:when test=&amp;quot;not(contains($string,&#39;\&#39;))&amp;quot;&amp;gt;
        &amp;lt;xsl:value-of select=&amp;quot;$string&amp;quot;/&amp;gt;
      &amp;lt;/xsl:when&amp;gt;
      &amp;lt;xsl:otherwise&amp;gt;
        &amp;lt;xsl:call-template name=&amp;quot;textAfterLastSlash&amp;quot;&amp;gt;
          &amp;lt;xsl:with-param name=&amp;quot;string&amp;quot; select=&amp;quot;substring-after($string,&#39;\&#39;)&amp;quot;/&amp;gt;
        &amp;lt;/xsl:call-template&amp;gt;
      &amp;lt;/xsl:otherwise&amp;gt;
    &amp;lt;/xsl:choose&amp;gt;
  &amp;lt;/xsl:template&amp;gt;


  &amp;lt;xsl:template match=&amp;quot;smil&amp;quot;&amp;gt;
    &amp;lt;xsl:text&amp;gt;#EXTM3U&amp;amp;#10;&amp;lt;/xsl:text&amp;gt;
    &amp;lt;xsl:apply-templates/&amp;gt;
  &amp;lt;/xsl:template&amp;gt;


  &amp;lt;xsl:template match=&amp;quot;media&amp;quot;&amp;gt;
    &amp;lt;xsl:text&amp;gt;#EXTINF:0,&amp;lt;/xsl:text&amp;gt;
    &amp;lt;xsl:call-template name=&amp;quot;textAfterLastSlash&amp;quot;&amp;gt;
      &amp;lt;xsl:with-param name=&amp;quot;string&amp;quot; select=&amp;quot;@src&amp;quot;/&amp;gt;
    &amp;lt;/xsl:call-template&amp;gt;
    &amp;lt;xsl:text&amp;gt;&amp;amp;#10;&amp;lt;/xsl:text&amp;gt;
    &amp;lt;xsl:value-of select=&amp;quot;@src&amp;quot;/&amp;gt;
    &amp;lt;xsl:text&amp;gt;&amp;amp;#10;&amp;amp;#10;&amp;lt;/xsl:text&amp;gt;
  &amp;lt;/xsl:template&amp;gt;


  &amp;lt;xsl:template match=&amp;quot;title&amp;quot;/&amp;gt;


&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&amp;rsquo;s not very long, but if you want fancy XSLT, I have a recursive named template, which I wrote for something else but modified here to look for the text after the last backslash. The &amp;amp;#10; is a trick I&amp;rsquo;ve used more lately to get XSLT to output a newline, because if I put an actual newline inside an xsl:text element like I always did before, telling Emacs to re-indent the whole thing tends to screw that up.&lt;/p&gt;
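&lt;p&gt;If you would rather do the same conversion without an XSLT processor, here is a rough Python equivalent using only the standard library. It is a sketch, not a drop-in replacement: the two-track playlist built in the script is made up for illustration, and real WPL files may carry processing instructions or metadata that this ignores.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

def wpl_to_m3u(wpl_text):
    """Convert a WPL (SMIL-style) playlist document to Extended M3U text."""
    root = ET.fromstring(wpl_text)
    lines = ["#EXTM3U"]
    for media in root.iter("media"):
        src = media.get("src", "")
        title = src.rsplit("\\", 1)[-1]  # text after the last backslash
        lines.append("#EXTINF:0," + title)
        lines.append(src)
    return "\n".join(lines) + "\n"

# Build a hypothetical two-track playlist in code to keep the sketch short:
smil = ET.Element("smil")
seq = ET.SubElement(ET.SubElement(smil, "body"), "seq")
ET.SubElement(seq, "media", src=r"C:\Music\lata\song1.mp3")
ET.SubElement(seq, "media", src=r"C:\Music\lata\song2.mp3")
print(wpl_to_m3u(ET.tostring(smil, encoding="unicode")))
```

&lt;p&gt;The output has the same #EXTM3U/#EXTINF shape as the stylesheet produces: one title line and one path line per track.&lt;/p&gt;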
&lt;p&gt;With a long plane ride tomorrow night to go to Oxford for the &lt;a href=&#34;http://www.xmlsummerschool.com/&#34;&gt;XML Summer School&lt;/a&gt;, I want to load up the MP3 player with something conducive to sleeping, so I just converted my playlist of &lt;a href=&#34;http://en.wikipedia.org/wiki/Lata_Mangeshkar&#34;&gt;Lata Mangeshkar&lt;/a&gt; ballads so that I can put that on. (If you like classic Bollywood soundtracks, check out &lt;a href=&#34;http://thirdfloormusic.blogspot.com/&#34;&gt;Music from the Third Floor&lt;/a&gt;; if you&amp;rsquo;re new to it and interested, start with the &lt;a href=&#34;http://thirdfloormusic.blogspot.com/search/label/Compilations&#34;&gt;compilations&lt;/a&gt; there.)&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By V1AN1 on &lt;a href=&#34;#comment-2869&#34;&gt;May 13, 2011 7:08 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This was very helpful!&lt;/p&gt;
&lt;p&gt;Here are some instructions to use this sucker:&lt;/p&gt;
&lt;p&gt;Download the Saxon HE XSLT converter:&lt;br /&gt;
&lt;a href=&#34;http://sourceforge.net/projects/saxon/files/Saxon-HE/9.3/saxonhe9-3-0-5j.zip/download&#34;&gt;http://sourceforge.net/projects/saxon/files/Saxon-HE/9.3/saxonhe9-3-0-5j.zip/download&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Now create a folder called wpltom3u (or of your choosing) and go into that folder.&lt;br /&gt;
Create two additional folders, one titled wpl and another titled m3u.&lt;br /&gt;
Now, put the .jar of Saxon XSLT into the default (wpltom3u) folder. MAKE SURE YOU HAVE JAVA INSTALLED!&lt;br /&gt;
Now create a text file in the default (wpltom3u) directory and rename it to style.xsl, then edit it with notepad and paste in the XSLT code posted above.&lt;br /&gt;
Create another file and rename it to convert.bat.&lt;br /&gt;
Now edit convert.bat and put in the following code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for %%a in (wpl/*.wpl) do java -jar saxon9he.jar &amp;quot;wpl/%%a&amp;quot; &amp;quot;style.xsl&amp;quot; &amp;gt; &amp;quot;m3u/%%~na.m3u&amp;quot;
pause
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Save it and exit. Now put all your .wpl playlist files into the wpl folder, and hit Convert! PRESTO! You now have all your wpl playlists converted to m3u. :D&lt;/p&gt;
&lt;p&gt;It may take a while to convert and rename everything, but you won&amp;rsquo;t have to do anything but double-click and wait. :)&lt;/p&gt;
&lt;p&gt;Hope this helps people.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
Ah, and a side note:&lt;br /&gt;
some playlists fail to convert because of their names. If any do, don&amp;rsquo;t reconvert all your playlists; this code isn&amp;rsquo;t that smart. Just single out the ones that didn&amp;rsquo;t convert, rename them to have no symbols in them, put them back into the wpl folder by themselves, and convert again!&lt;/p&gt;
&lt;p&gt;Cheers!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/xslt">XSLT</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Appreciating SPARQL CONSTRUCT more</title>
      <link>https://www.bobdc.com/blog/appreciating-sparql-construct/</link>
      <pubDate>Wed, 09 Sep 2009 19:33:39 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/appreciating-sparql-construct/</guid>
      
      
      <description><div>Another way to get more out of your data.</div><div>&lt;p&gt;As with SQL, SPARQL&amp;rsquo;s most popular verb is SELECT. It lets you request the data you want from a collection, whether you&amp;rsquo;re asking for a single phone number or you want a list of first and last names and phone numbers of all employees hired after January 1st, sorted by last name.&lt;/p&gt;
&lt;blockquote id=&#34;id103299&#34; class=&#34;pullquote&#34;&gt;CONSTRUCT provides a nice example of how SPARQL is more than a query language; along with extracting data using queries, you can create useful new data as well.&lt;/blockquote&gt;
&lt;p&gt;In SPARQL, SELECT is actually known as a &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/#QueryForms&#34;&gt;query form&lt;/a&gt;, and another is CONSTRUCT. According to the &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/&#34;&gt;SPARQL Query Language for RDF&lt;/a&gt; W3C Recommendation, CONSTRUCT returns a graph—a set of triples. I had thought of CONSTRUCT as a way of pulling a set of triples out of a triplestore, especially a remote triplestore, but while reviewing some TopQuadrant training material I realized how handy CONSTRUCT can be to create useful new triples.&lt;/p&gt;
&lt;p&gt;For example, let&amp;rsquo;s say you have the following triples written in Turtle syntax to identify the gender and parent/child relationships of a few people:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix : &amp;lt;http://www.snee.com/ns/demo#&amp;gt; .


:jane :hasParent :gene .
:gene :hasParent :pat ;
      :gender    :female .
:joan :hasParent :pat ;
      :gender    :female . 
:pat  :gender    :male .
:mike :hasParent :joan .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following CONSTRUCT statement creates new triples based on the ones above to specify who is whose grandfather:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX : &amp;lt;http://www.snee.com/ns/demo#&amp;gt; 


CONSTRUCT { ?p :hasGrandfather ?g . }


WHERE {?p      :hasParent ?parent .
       ?parent :hasParent ?g .
       ?g      :gender    :male .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;When I ran this query with the data above, &lt;a href=&#34;http://jena.sourceforge.net/ARQ/&#34;&gt;ARQ&lt;/a&gt; returned the newly constructed triples in Turtle format:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix :        &amp;lt;http://www.snee.com/ns/demo#&amp;gt; .


:jane
      :hasGrandfather  :pat .


:mike
      :hasGrandfather  :pat .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;From the same little data file, we can generate triples about who is whose aunt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX : &amp;lt;http://www.snee.com/ns/demo#&amp;gt; 


CONSTRUCT { ?p :hasAunt ?aunt . }


WHERE {?p      :hasParent ?parent .
       ?parent :hasParent ?g .
       ?aunt   :hasParent ?g ;
               :gender    :female .


FILTER (?parent != ?aunt)  
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this query, ARQ constructs these triples:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix :        &amp;lt;http://www.snee.com/ns/demo#&amp;gt; .


:jane
      :hasAunt      :joan .


:mike
      :hasAunt      :gene .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This isn&amp;rsquo;t really creating new information, but the ability to make implicit information explicit can certainly add value to a system, especially when the rules necessary to assemble the pieces are more complicated than the ones shown above for identifying grandfathers and aunts.&lt;/p&gt;
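&lt;p&gt;For anyone who wants to see that join spelled out step by step, here is the grandfather inference as a plain-Python sketch: ordinary tuples and set comprehensions standing in for the triples and the CONSTRUCT query, with the names taken from the Turtle example above.&lt;/p&gt;

```python
# Join the hasParent pairs with themselves, filter on gender, and emit
# new hasGrandfather triples, mirroring the CONSTRUCT query's WHERE clause.

triples = {
    ("jane", "hasParent", "gene"),
    ("gene", "hasParent", "pat"),
    ("gene", "gender", "female"),
    ("joan", "hasParent", "pat"),
    ("joan", "gender", "female"),
    ("pat", "gender", "male"),
    ("mike", "hasParent", "joan"),
}

parent = {(s, o) for s, p, o in triples if p == "hasParent"}
male = {s for s, p, o in triples if p == "gender" and o == "male"}

constructed = {
    (child, "hasGrandfather", g)
    for child, par in parent
    for par2, g in parent
    if par == par2 and g in male
}
print(sorted(constructed))
```

&lt;p&gt;It prints the two inferred triples, for jane and for mike, matching the ARQ output above.&lt;/p&gt;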
&lt;p&gt;How you use your newly constructed triples depends on how your SPARQL engine gives them to you. As we saw above, ARQ writes them out in Turtle syntax. TopQuadrant&amp;rsquo;s TopBraid Composer displays them in the window used for SPARQL query output, and after you select one or more of them, the &amp;ldquo;Assert selected constructed triples&amp;rdquo; menu choice adds them to the graph of triples that you&amp;rsquo;re currently working with. (This works in the &lt;a href=&#34;http://www.topquadrant.com/products/TB_Composer.html#free&#34;&gt;free edition&lt;/a&gt; as well.)&lt;/p&gt;
&lt;p&gt;CONSTRUCT provides a nice example of how SPARQL is more than a query language; along with extracting data using queries, you can create useful new data as well.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By Keith Fahlgren on &lt;a href=&#34;#comment-2330&#34;&gt;September 10, 2009 1:04 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I also started turning to CONSTRUCT recently as a performance optimization. Rather than having to ask the server to build the &amp;ldquo;normal&amp;rdquo; huge serialization that the libraries expect, I just plucked out a tiny subset that I needed (and didn&amp;rsquo;t cross too many internal graph storage boundaries) and asked for a CONSTRUCT of that. The speedup wasn&amp;rsquo;t as huge as I&amp;rsquo;d hoped, but it was still a fruitful exercise.&lt;/p&gt;
&lt;p&gt;By Simon Reinhardt on &lt;a href=&#34;#comment-2331&#34;&gt;September 10, 2009 4:12 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You can see some more examples for the usefulness of CONSTRUCT for things like rules and views at &lt;a href=&#34;http://spinrdf.org/spin.html&#34;&gt;http://spinrdf.org/spin.html&lt;/a&gt; and at &lt;a href=&#34;http://www.uni-koblenz-landau.de/koblenz/fb4/institute/IFI/AGStaab/Research/systeme/NetworkedGraphs&#34;&gt;http://www.uni-koblenz-landau.de/koblenz/fb4/institute/IFI/AGStaab/Research/systeme/NetworkedGraphs&lt;/a&gt; .&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2332&#34;&gt;September 10, 2009 9:28 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Believe me, I&amp;rsquo;ve been studying spinrdf.org for a few weeks now&amp;ndash;it&amp;rsquo;s part of my job!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://topquadrant.com&#34; title=&#34;http://topquadrant.com&#34;&gt;Daniel Mekonnen&lt;/a&gt; on &lt;a href=&#34;#comment-2333&#34;&gt;September 13, 2009 5:27 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Congratulations, Bob! I think you are now seeing &amp;ldquo;the stars in the obelisk&amp;rdquo;, to use a 2001 analogy (in the Arthur C. Clarke sense, not the actual year :).&lt;/p&gt;
&lt;p&gt;In my own experience with SPARQL I actually use CONSTRUCT, INSERT and DELETE more than SELECT, which is very much a part of the process of semanticizing unlinked data sets from raw sources like Excel files, XML, XSD, CSV, RDBMS sources and text dumps from PDF files. TopBraid Composer can import most anything that you can point a URL at and bring it into a semantic representation.&lt;/p&gt;
&lt;p&gt;But that&amp;rsquo;s just the starting point. The semantic representations that you get from the many import features are not necessarily going to be in the vocabulary that you are required to work with on a given project. This is where CONSTRUCT and friends come to the rescue, to transform one form of triple patterns into another. SPARQLMotion brings the process to another level, allowing you to pipeline a series of transformations together, and even merge multiple sources together, into the representation that you need.&lt;/p&gt;
&lt;p&gt;Very powerful stuff. I view the process as &amp;ldquo;data shaping&amp;rdquo; and the equivalent of &amp;ldquo;s/pattern A/pattern B/&amp;rdquo; from the regex world. I think people who enjoy writing regular expressions will find SPARQL very enjoyable, kind of like going from 1D to 2D pattern matching.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Growth of the linked data cloud</title>
      <link>https://www.bobdc.com/blog/growth-of-the-linked-data-clou/</link>
      <pubDate>Thu, 03 Sep 2009 09:19:24 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/growth-of-the-linked-data-clou/</guid>
      
      
      <description><div>Or at least, the growth of Richard Cyganiak&#39;s famous diagram.</div><div>&lt;p&gt;While preparing slides for the Semantic Web Overview talk I&amp;rsquo;ll be giving at the beginning of the &lt;a href=&#34;http://xmlsummerschool.com/curriculum2009/semantic-technologies/&#34;&gt;Semantic Technologies course&lt;/a&gt; of the &lt;a href=&#34;http://xmlsummerschool.com/&#34;&gt;Oxford XML Summer School&lt;/a&gt;, I was adding a few slides on Linked Data. (Leigh Dodds is presenting a more detailed class on Linked Data later in the day.) Of course I had to include a slide of Richard Cyganiak&amp;rsquo;s interactive diagram of the Linked Data cloud, and as with many of my slides, I was tempted to re-use a slide from a presentation I&amp;rsquo;d given before. I found the following image in a talk I gave in February of last year:&lt;/p&gt;
&lt;img id=&#34;id103301&#34; src=&#34;https://www.bobdc.com/img/main/ldcFeb08.jpg&#34; alt=&#34;[linked data cloud, February 2008]&#34; width=&#34;440px&#34;/&gt;
&lt;p&gt;I decided to be conscientious and update the image, so I went to &lt;a href=&#34;http://richard.cyganiak.de/2007/10/lod/&#34;&gt;Richard&amp;rsquo;s page&lt;/a&gt; to get an updated version, and found this:&lt;/p&gt;
&lt;img id=&#34;id103325&#34; src=&#34;https://www.bobdc.com/img/main/ldcJul09.jpg&#34; alt=&#34;[linked data cloud, July 2009]&#34; width=&#34;440px&#34;/&gt;
&lt;p&gt;It looks like the world of linked data is growing at quite a rate! And, if you look closely, you&amp;rsquo;ll see that his latest image says &amp;ldquo;As of July 2009&amp;rdquo;, so I imagine that there are even more nodes to add to this image by now.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Getting started with the TopQuadrant product line</title>
      <link>https://www.bobdc.com/blog/getting-started-with-the-topqu/</link>
      <pubDate>Thu, 27 Aug 2009 20:38:34 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/getting-started-with-the-topqu/</guid>
      
      
      <description><div>A lot of great technology to learn about.</div><div>&lt;p&gt;Last week was my first week working at &lt;a href=&#34;http://www.topquadrant.com/&#34;&gt;TopQuadrant&lt;/a&gt;, and I spent three days in a class given by one of my new co-workers, Scott Henninger. I only had a skeletal idea of what the components of &lt;a href=&#34;http://www.topquadrant.com/products/TB_Suite.html&#34;&gt;TopBraid Suite&lt;/a&gt; did before, and now that I have a better idea, I&amp;rsquo;m very impressed. (I may be wrong on one or two details below, but I&amp;rsquo;m still the new guy.)&lt;/p&gt;
&lt;p&gt;I had the impression that the core original product, TopBraid Composer, was mostly for designing and editing RDFS/OWL schemas and ontologies. It&amp;rsquo;s very good at doing those, but it also makes a very good interface for dealing directly with the data described by these models. Being built on Eclipse, the various panes (or, in Eclipse parlance, &amp;ldquo;views&amp;rdquo;) of the main window let you see an ontology or a file of data from several angles at once and refine the model by pointing, clicking, dragging, and editing dialog boxes.&lt;/p&gt;
&lt;p&gt;TopBraid Composer also includes a SPARQL engine and uses standard SPARQL as the starting point for several new technologies that let you build applications around triplestores. A great new one is SPIN, for &amp;ldquo;SPARQL Inferencing Notation&amp;rdquo;. As described on &lt;a href=&#34;http://www.spinrdf.org/&#34;&gt;spinrdf.org&lt;/a&gt;,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;SPIN is a collection of RDF vocabularies enabling the use of SPARQL to define constraints and inference rules on Semantic Web models. SPIN also provides meta-modeling capabilities that allow users to define their own SPARQL functions and query templates. Finally, SPIN includes a ready to use library of common functions.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As TopQuadrant VP of Product Development Holger Knublauch wrote in a comment in a recent &lt;a href=&#34;http://stage.vambenepe.com/archives/496&#34;&gt;William Vambenepe blog entry&lt;/a&gt;,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Another aspect of RDF that SPIN rides on is the vision of a distributed self-describing data structure. In the Semantic Web, both classes and instances live in the same space and can be queried using the same mechanisms. SPIN takes this idea to extremes: you can not only define classes and properties, but even define executable semantics of those and use this mechanism to build your own modeling languages.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Holger&amp;rsquo;s own blog entry &lt;a href=&#34;http://composing-the-semantic-web.blogspot.com/2009/01/object-oriented-semantic-web-with-spin.html&#34;&gt;The Object-Oriented Semantic Web with SPIN&lt;/a&gt; is a good introduction to what SPIN (and TopQuadrant&amp;rsquo;s implementation of those executable semantics, TopSPIN) are all about. With support for SPIN built into the &lt;a href=&#34;http://www.topquadrant.com/products/TB_Composer.html#free&#34;&gt;free edition&lt;/a&gt; of TopBraid Composer, a lot of people can now try this out, and I look forward to helping the company beef up the documentation for it.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.topquadrant.com/products/SPARQLMotion.html&#34;&gt;&lt;img id=&#34;id103394&#34; src=&#34;http://www.topquadrant.com/images/sparql_examples/SPARQLMotion-Example.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;10px&#34; vspace=&#34;10px&#34; alt=&#34;[SPARQLMotion screen shot]&#34; width=&#34;320px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.topquadrant.com/products/SPARQLMotion.html&#34;&gt;SPARQLMotion&lt;/a&gt; is another impressive RDF application development productivity tool. SPARQLMotion lets you build applications by dragging icons into a screen where you connect them into pipelines that can branch in different directions depending on various conditions. You configure each icon by filling out a dialog box to point to data sources, data destinations, and processing modules. Input modules represented by different icons can pull data from news feeds, spreadsheets, email, XML, all the obvious RDF sources, and more. Processing can apply rules via Pellet, Jena rules, TopSPIN, Calais, XSLT&amp;hellip; that&amp;rsquo;s about a quarter of the list. You can then output the results of your processing to most of the input formats and additional ones such as calendars, maps, and HTTP POST requests. (PDF support is on the way, so that you could have the XSLT processing module convert XML versions of data pulled by some SPARQL queries into XSL-FO of a nicely rendered page and then output a PDF file from there.) Holger has done a nice five-minute video called &amp;ldquo;Creating a SPARQLMotion Script&amp;rdquo; on &lt;a href=&#34;http://www.topquadrant.com/resources/videos.html&#34;&gt;TopQuadrant&amp;rsquo;s video page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;After TopBraid Composer, the other two components of TopBraid Suite are Ensemble and Live. TopBraid Ensemble lets you create applications by selecting user interface components and essentially writing event handlers for them. Components for displaying data include trees, grids, and forms, so a dashboard app would be pretty straightforward to build. Because it&amp;rsquo;s built on Adobe Flex, you can create any Flex component you want, such as a movie player, and then use the Ensemble API to grab triples and use them in processing. (I never realized that a running copy of Eclipse has an &lt;a href=&#34;http://www.eclipse.org/jetty/&#34;&gt;HTTP server&lt;/a&gt; that you can use as the basis for applications.) Because the UI that you design can trigger manipulation of the data in a triplestore using SPIN and SPARQLMotion, you can build complete applications around triplestores for use by people who may not even know what RDF is but who need to work with that data using a form-driven interface.&lt;/p&gt;
&lt;p&gt;Once you build an application with Ensemble, TopBraid Live lets you deploy it on a server for others to use. I saw Scott help a customer deploy an app, and the process pretty much looked like zipping up some files and then unzipping them to the right place on a server that the app&amp;rsquo;s users would have access to.&lt;/p&gt;
&lt;p&gt;With SPARQLMotion as a development tool and TopBraid Live as a deployment tool, it&amp;rsquo;s easy to picture an information publisher having staff members who do nothing but full-time SPARQLMotion development, creating apps that mix and match data from all the different data sources available to that publisher in order to build information products and applications around those data sources. (The data might be available as native RDF, but would more likely be in a host of other formats available to the SPARQLMotion scripts using its automatic converters.) Using TopBraid Live, the publisher would use these apps to deliver content in any format necessary to their customers. The publisher would have an agile platform for creating new information products whose components may have started off in separate silos and would have taken a lot more work to integrate without TopBraid Ensemble. Of course, there&amp;rsquo;s more to it than the easier integration provided by the RDF data model; the possibilities that RDFS, OWL, and now SPIN provide for adding metadata to the content should be very attractive to publishers as well.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>DevX article on using RDFa with DocBook and DITA</title>
      <link>https://www.bobdc.com/blog/devx-article-on-using-rdfa-wit/</link>
      <pubDate>Fri, 21 Aug 2009 07:18:04 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/devx-article-on-using-rdfa-wit/</guid>
      
      
      <description><div>Relatively easy.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.devx.com/semantic/Article/42543&#34;&gt;&lt;img id=&#34;id97814&#34; src=&#34;http://assets.devx.com/articleicons/20357.gif&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;RDFa specs and discussions usually tell us that you&amp;rsquo;re not limited to using it with HTML, and then they only talk about using it with HTML. I wanted to see how difficult it would be to incorporate the examples from the W3C&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/TR/2008/NOTE-xhtml-rdfa-primer-20081014/&#34;&gt;RDFa Primer&lt;/a&gt; into DocBook and DITA documents, and it wasn&amp;rsquo;t difficult at all. DevX has just published an article that I wrote on how I went about it: &lt;a href=&#34;https://web.archive.org/web/20150317231317/http://www.devx.com/semantic/Article/42543&#34;&gt;Using RDFa with DITA and DocBook&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I used to think that DTDs such as DocBook and DITA didn&amp;rsquo;t need RDFa because they&amp;rsquo;re so customizable, but when you define new metadata elements or attributes for either, you have to write new code in XSLT (or whatever your language is for processing the XML) to go get that new metadata. Once you&amp;rsquo;ve added RDFa support modules to these DTDs, though, existing RDFa extractors (that conform to the spec) will pull out any new kinds of metadata that you store in these RDFa attributes, which also reduces your need for future customization of those DTDs.&lt;/p&gt;
&lt;p&gt;I hope that people in the RDFa, DocBook, and DITA communities find the article useful.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://dret.net/netdret/&#34; title=&#34;http://dret.net/netdret/&#34;&gt;dret&lt;/a&gt; on &lt;a href=&#34;#comment-2321&#34;&gt;August 21, 2009 1:30 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;definitely interested, but &lt;a href=&#34;http://www.devx.com/semantic/Article/42543&#34;&gt;http://www.devx.com/semantic/Article/42543&lt;/a&gt; currently serves a standard frame with blank contents, so maybe that&amp;rsquo;s something DevX should look into?&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/dita">DITA</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Joining TopQuadrant</title>
      <link>https://www.bobdc.com/blog/joining-topquadrant/</link>
      <pubDate>Fri, 14 Aug 2009 12:40:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/joining-topquadrant/</guid>
      
      
      <description><div>Doing semantic web work full-time at an industry leader in the field.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.topquadrant.com/&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/tqlogo.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;[TopQuadrant logo]&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m very pleased to announce that on Monday I&amp;rsquo;ll be starting a full-time position with &lt;a href=&#34;http://www.topquadrant.com/&#34;&gt;TopQuadrant&lt;/a&gt;, a well-known name in the semantic web world. TopQuadrant makes &lt;a href=&#34;http://www.topquadrant.com/products/TB_Suite.html&#34;&gt;TopBraid Suite&lt;/a&gt;, the W3C standards-based desktop tool for modeling data and for developing and deploying applications that take advantage of semantic web and linked data technology.&lt;/p&gt;
&lt;p&gt;Of all the activities that their staff is engaged in—development, modeling, training, documentation, speaking, marketing, sales—I could be taking part in all of them, so while we haven&amp;rsquo;t thought of a job title yet, it will probably be one of those classically nebulous ones that tech company employees favor. (I have told them that if Dean Allemang is &lt;a href=&#34;http://www.topquadrant.com/company/mgmt.html#dean&#34;&gt;Chief Scientist&lt;/a&gt;, I&amp;rsquo;d love to have something with the word &amp;ldquo;Scientist&amp;rdquo; in it.)&lt;/p&gt;
&lt;p&gt;Three things in particular attracted me to TopQuadrant: they have a great track record of applying semantic web technology to customer business goals, they make a fully functional version of one of their core products available for free, and they&amp;rsquo;re firmly committed to the support of open standards—there&amp;rsquo;s no patented secret sauce holding up their business model. Of course, these three points overlap a great deal; for example, as the term &amp;ldquo;semantic&amp;rdquo; gains in marketing buzzword status, more companies claim that their products use semantic technology, but they never mention RDF, OWL, or SPARQL, while TopQuadrant is actively engaged in helping customers to build applications that use these standards and that hide the syntax from the customers&amp;rsquo; end users behind modern GUI interfaces when necessary.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m looking forward very much to learning, using, teaching, and building on these tools. (And if you&amp;rsquo;re interested in applying XML technologies to publishing systems, &lt;a href=&#34;http://www.innodata-isogen.com/&#34;&gt;Innodata Isogen&lt;/a&gt; just might be interested in hiring you.)&lt;/p&gt;
&lt;h2 id=&#34;8-comments&#34;&gt;8 Comments&lt;/h2&gt;
&lt;p&gt;By Michael Friedman on &lt;a href=&#34;#comment-2307&#34;&gt;August 14, 2009 1:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Congratulations on the new position! It sounds like a great fit for you. Please do keep us informed!&lt;/p&gt;
&lt;p&gt;By Betty Harvey on &lt;a href=&#34;#comment-2308&#34;&gt;August 14, 2009 1:14 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Congratulations and good luck! I know you&amp;rsquo;ll enjoy this new position and bring a lot of good work to the semantic web.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-2309&#34;&gt;August 14, 2009 2:26 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Suggestions:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Other Scientist&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Grunt Scientist&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Indian Scientist&amp;rdquo; (as in &amp;ldquo;chiefs and Indians&amp;rdquo;)&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Grad&amp;rdquo; (from Larry Niven&amp;rsquo;s Smoke Ring books)&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Chief Natural Philosopher&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This all reminds me of how in the Anglican Church the Archbishop of York is the Primate of England, whereas the Archbishop of Canterbury is the Primate of All England. Of course, they are both primates, so that&amp;rsquo;s only reasonable.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://thewebsemantic.com&#34; title=&#34;http://thewebsemantic.com&#34;&gt;Taylor&lt;/a&gt; on &lt;a href=&#34;#comment-2310&#34;&gt;August 14, 2009 5:56 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is good for you and TQ&amp;hellip;really pleased to hear this, now you&amp;rsquo;re going to have to dig into Jena a bit more, maybe some Jython bindings on the way?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://blog.triplescape.com&#34; title=&#34;http://blog.triplescape.com&#34;&gt;Brian Manley&lt;/a&gt; on &lt;a href=&#34;#comment-2311&#34;&gt;August 14, 2009 9:27 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Congrats Bob! I interviewed with them a few weeks ago, and found them to be a really interesting and smart group. Hope it works out well for you!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://sw-app.org/about.html&#34; title=&#34;http://sw-app.org/about.html&#34;&gt;Michael Hausenblas&lt;/a&gt; on &lt;a href=&#34;#comment-2312&#34;&gt;August 15, 2009 6:56 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Congrats! Sounds very cool; hope you still find time to continue your great posts, here.&lt;/p&gt;
&lt;p&gt;Cheers,&lt;br /&gt;
Michael&lt;/p&gt;
&lt;p&gt;By Pat Hayes on &lt;a href=&#34;#comment-2313&#34;&gt;August 16, 2009 12:10 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Congrats. On titles, I would go for &amp;ldquo;natural scientist&amp;rdquo;, which has a venerable history but also carries a subtle implication about all the other scientists :-)&lt;/p&gt;
&lt;p&gt;By Peter Ring on &lt;a href=&#34;#comment-2314&#34;&gt;August 16, 2009 6:55 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Congratulations! and keep up posting!&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d love to have a business card that read &amp;lsquo;Evil Mad Scientist&amp;rsquo; or just &amp;lsquo;Evil Genius&amp;rsquo;. For now, I have to settle with &amp;lsquo;Information Architect&amp;rsquo;. Maybe I should start wearing a white coat &amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Advanced XSLT (and XQuery, and XSL-FO)</title>
      <link>https://www.bobdc.com/blog/advanced-xslt-and-xquery-and-x/</link>
      <pubDate>Thu, 30 Jul 2009 11:28:49 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/advanced-xslt-and-xquery-and-x/</guid>
      
      
      <description><div>A good place to learn.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0470192747/bobducharmeA/&#34;&gt;&lt;img id=&#34;id202739&#34; src=&#34;http://ecx.images-amazon.com/images/I/31bJBBvsybL._SL500_AA180_.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;20px&#34; alt=&#34;[&#39;XSLT 2.0 and XPath 2.0&#39; cover]&#34; width=&#34;140px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;While there are many places for beginners to learn the basics of XSLT (I have &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1930110111/bobducharmeA/&#34;&gt;my favorite&lt;/a&gt;, but I&amp;rsquo;m biased), learning more advanced techniques often means spending a good deal of money to bring in a specific expert for specialized training.&lt;/p&gt;
&lt;p&gt;For the eight or so years that I&amp;rsquo;ve chaired the XSLT/XSL-FO/XQuery track at the &lt;a href=&#34;http://xmlsummerschool.com/&#34;&gt;XML Summer School&lt;/a&gt; in Oxford, England, I&amp;rsquo;ve begun it by giving a beginner-level XSLT introduction class, but with people like Michael Kay, Jeni Tennison, and Priscilla Walmsley on hand (and that&amp;rsquo;s just the XSLT track—other tracks include XSLT experts such as Debbie Lapeyre and Norm Walsh), I&amp;rsquo;ve always wanted to take better advantage of the opportunity to address more advanced, cutting-edge issues. So this year, when the Summer School is held from the 20th to the 25th of September, I won&amp;rsquo;t give an introductory class as part of this track; Debbie will cover the introductory level material in the &lt;a href=&#34;http://xmlsummerschool.com/curriculum2009/hands-on-intro/&#34;&gt;Hands-on Introduction to XML&lt;/a&gt; course, leaving the &lt;a href=&#34;http://xmlsummerschool.com/curriculum2009/xslt-xsl-fo-and-xquery/&#34;&gt;XSLT, XSL-FO, and XQuery&lt;/a&gt; track more room to cover material that helps current XSLT practitioners build better applications more quickly. (I will be doing the introductory class in the &lt;a href=&#34;http://xmlsummerschool.com/curriculum2009/semantic-technologies/&#34;&gt;Semantic Technologies&lt;/a&gt; course, which I&amp;rsquo;m greatly looking forward to.)&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1590593243/bobducharmeA/&#34;&gt;&lt;img id=&#34;id202842&#34; src=&#34;http://ecx.images-amazon.com/images/I/41NCOSM2okL.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;20px&#34; alt=&#34;[&#39;Beginning XSLT 2.0&#39; cover]&#34; width=&#34;120px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For example, one new class aimed squarely at current XSLT practitioners is Jeni&amp;rsquo;s &amp;ldquo;Test-driven XSLT development&amp;rdquo;. Different programming and scripting languages present different advantages and challenges for the use of unit tests in application development; Jeni has worked out a framework to let this approach benefit XSLT development, and will show us how it has helped her work.&lt;/p&gt;
&lt;p&gt;Another new class will be Michael Kay&amp;rsquo;s discussion of application architecture. Standards-based XML development often offers several ways to do the same task within a system (for example, XSLT vs. XQuery or native XML databases vs. relational ones) and Michael&amp;rsquo;s perspective on how to go about making the right choices will be very interesting, given his extensive development experience with both the inner workings of his Saxon XSLT processor and with system development for clients.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0596006349/bobducharmeA/&#34;&gt;&lt;img id=&#34;id202890&#34; src=&#34;http://ecx.images-amazon.com/images/I/51E94VHpO9L.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;20px&#34; alt=&#34;[&#39;XQuery&#39; cover]&#34; width=&#34;120px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A larger-scale class that I really look forward to is the XSLT Efficiency Workshop, a session that will last an entire afternoon. This will be led by Michael, Jeni, Priscilla, and me, and will cover both development efficiency and execution efficiency. Instead of simply taking turns presenting slides, we have an interactive session planned, in which we work with subsets of the workshop&amp;rsquo;s attendees to identify the most pressing issues in their own development, and then the all-star panel and I will discuss approaches to these issues.&lt;/p&gt;
&lt;p&gt;Classes from previous years that will be offered again are Jeni&amp;rsquo;s &amp;ldquo;Getting the Most Out of XSLT 2.0&amp;rdquo; and Priscilla&amp;rsquo;s introductions to XQuery and XSL-FO. I&amp;rsquo;ve seen these presentations several times and learn something new each time.&lt;/p&gt;
&lt;p&gt;As Jeni wrote in her &lt;a href=&#34;http://www.jenitennison.com/blog/node/107&#34;&gt;weblog&lt;/a&gt; recently,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I know a lot of beginners go to the XML Summer School for the introduction course, but to me the real value is for people who are actually using XML on a day to day basis and want to keep on top of the latest tools and technologies that will actually help them do their jobs. I learn something new every year.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1930110111/bobducharmeA/&#34;&gt;&lt;img id=&#34;id202971&#34; src=&#34;http://www.snee.com/bob/img/XQcoverSmall.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;20px&#34; alt=&#34;[&#39;XSLT Quickly&#39; cover]&#34; width=&#34;120px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We hope that the Efficiency Workshop in particular brings the advanced expertise together with stories of day-to-day use to help the practitioners learn new techniques and the experts learn more about how people are using these technologies in a range of real-world applications.&lt;/p&gt;
&lt;p&gt;And of course there&amp;rsquo;s Oxford itself, and the beer, and the hanging out with friends, which can be even better than the classes, but it&amp;rsquo;s difficult to distinguish between the classes and the hanging out in the college bar when these old and new friends know so much about XML technology and enjoy discussing their work.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.software-development-blog.com&#34; title=&#34;http://www.software-development-blog.com&#34;&gt;jope&lt;/a&gt; on &lt;a href=&#34;#comment-2724&#34;&gt;November 30, 2010 3:19 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hey,&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve learned XSLT and FOP by tutorials googled on the web and about 2 books. A video guide would be also a nice learning training, because of the replay function.&lt;/p&gt;
&lt;p&gt;Greets&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
      <category domain="https://www.bobdc.com//categories/xslt">XSLT</category>
      
    </item>
    
    <item>
      <title>Court decision metadata and DBpedia</title>
      <link>https://www.bobdc.com/blog/court-decision-metadata-and-db/</link>
      <pubDate>Mon, 27 Jul 2009 09:05:41 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/court-decision-metadata-and-db/</guid>
      
      
      <description><div>An unplanned sequel.</div><div>&lt;img id=&#34;id202742&#34; src=&#34;http://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Seal_of_the_United_States_Supreme_Court.svg/100px-Seal_of_the_United_States_Supreme_Court.svg.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;seal of the US Supreme Court&#34;/&gt;
&lt;p&gt;When I wrote my last two blog entries (not counting the announcement about my new developerWorks article), &lt;a href=&#34;https://www.bobdc.com/blog/modeling-your-data-with-dbpedi&#34;&gt;Modeling your data with DBpedia vocabularies&lt;/a&gt; and &lt;a href=&#34;http://www.snee.com/bobdc.blog/2009/06/big-legal-publishers-and-seman.html&#34;&gt;Big legal publishers and semantic web technology&lt;/a&gt;, I had no idea that I would soon stumble across a nice collection of US Supreme Court case metadata in DBpedia. After writing about modeling with DBpedia vocabularies, it occurred to me that if Wikipedia has pages with infoboxes for individual professional wrestlers and Battlestar Galactica episodes, they probably have them for important Supreme Court cases as well. I checked for Roe v. Wade (popular in legal publishing because along with being a famous case, its title is short and easy to spell) and there it was at &lt;a href=&#34;http://en.wikipedia.org/wiki/Roe_v._Wade&#34;&gt;http://en.wikipedia.org/wiki/Roe_v._Wade&lt;/a&gt;. Even better for the semweb geek, its DBpedia page at &lt;a href=&#34;http://dbpedia.org/page/Roe_v._Wade&#34;&gt;http://dbpedia.org/page/Roe_v._Wade&lt;/a&gt; showed properties for most of the key bits of information you want for a court decision: the date, the reporter volume and page, names of concurring judges, names of dissenting judges, laws applied, and more.&lt;/p&gt;
&lt;p&gt;Wikipedia and DBpedia even include my favorite case, &lt;a href=&#34;http://en.wikipedia.org/wiki/Campbell_v._Acuff-Rose_Music,_Inc.&#34;&gt;Campbell v. Acuff-Rose Music, Inc.&lt;/a&gt;, in which Appendix B of the &lt;a href=&#34;http://www.law.cornell.edu/supct/html/92-1292.ZO.html&#34;&gt;Supreme Court decision&lt;/a&gt; includes the following lyrics from the 2 Live Crew song that Roy Orbison&amp;rsquo;s publisher sued &amp;ldquo;Luther Campbell aka Luke Skywalker&amp;rdquo; (as he&amp;rsquo;s known in the case&amp;rsquo;s dbprop:fullname) over: &amp;ldquo;Big hairy woman all that hair it ain&amp;rsquo;t legit/&amp;lsquo;Cause you look like &amp;lsquo;Cousin It&amp;rsquo;&amp;rdquo;. (I like my landmark Supreme Court IP law decisions to include &lt;a href=&#34;http://is.gd/1JiHe&#34;&gt;Addams Family&lt;/a&gt; references.)&lt;/p&gt;
&lt;p&gt;Wikipedia currently has pages for 198 Supreme Court decisions, according to their &lt;a href=&#34;http://en.wikipedia.org/wiki/Category:United_States_Supreme_Court_cases&#34;&gt;Category:United States Supreme Court cases&lt;/a&gt; page. After going to the &lt;a href=&#34;http://dbpedia.org/page/Category:United_States_Supreme_Court_cases&#34;&gt;DBpedia equivalent of that page&lt;/a&gt;, I realized that I could retrieve a list of them all with a simple SPARQL query on &lt;a href=&#34;http://dbpedia.org/sparql&#34;&gt;DBpedia&amp;rsquo;s query form&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?s WHERE {
  ?s 
  &amp;lt;http://www.w3.org/2004/02/skos/core#subject&amp;gt;
  &amp;lt;http://dbpedia.org/resource/Category:United_States_Supreme_Court_cases&amp;gt;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Even better, I noticed at the bottom of the Wikipedia page for Campbell v Acuff-Rose that it belonged to the Wikipedia category &lt;a href=&#34;http://en.wikipedia.org/wiki/Category:United_States_copyright_case_law&#34;&gt;US copyright case law&lt;/a&gt;, a pretty important bit of categorization metadata. Sure, you can look at that page to see the list, but you can also retrieve the list with a slight modification to the SPARQL query above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?s WHERE {
  ?s 
  &amp;lt;http://www.w3.org/2004/02/skos/core#subject&amp;gt;
  &amp;lt;http://dbpedia.org/resource/Category:United_States_copyright_case_law&amp;gt;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The most interesting part of the metadata included with the cases is the connections between them. For example, the DBpedia page for &lt;a href=&#34;http://dbpedia.org/page/Brown_v._Board_of_Education&#34;&gt;Brown v. Board of Education&lt;/a&gt; shows that it &amp;ldquo;is dbpprop:overruled of&amp;rdquo; Plessy v. Ferguson. The DBpedia page for &lt;a href=&#34;http://dbpedia.org/page/Plessy_v._Ferguson&#34;&gt;Plessy v. Ferguson&lt;/a&gt; shows that it&amp;rsquo;s dbpprop:overruled by Brown v. Board of Education.&lt;/p&gt;
&lt;p&gt;There are not enough of these links to threaten a commercial cite-checking service such as LexisNexis&amp;rsquo;s &lt;a href=&#34;http://www.oreillynet.com/xml/blog/2003/05/a_nineteenthcentury_linking_ap.html&#34;&gt;Shepard&amp;rsquo;s&lt;/a&gt; product—a lawyer checking whether a potentially citable case has been overruled is a classic example of when &lt;a href=&#34;http://en.wikipedia.org/wiki/Precision_and_recall&#34;&gt;search recall&lt;/a&gt; trumps precision, because missing just one search result can be disastrous for the lawyer. Still, the current amount of SPARQL-addressable fielded metadata about US caselaw on Wikipedia (and hence on DBpedia) is a big step beyond the amount of &lt;a href=&#34;https://www.bobdc.com/blog/law-metadata-on-the-web&#34;&gt;law metadata on the web&lt;/a&gt; that was available when I wrote about this in early 2006. It will be great to see this collection grow and to see more applications take advantage of it.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/dbpedia">DBpedia</category>
      
      <category domain="https://www.bobdc.com//categories/legal-publishing">legal publishing</category>
      
    </item>
    
    <item>
      <title>New developerWorks article: &#34;Build Wikipedia query forms with semantic technology&#34;</title>
      <link>https://www.bobdc.com/blog/new-developerworks-article-bui/</link>
      <pubDate>Wed, 22 Jul 2009 10:07:10 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/new-developerworks-article-bui/</guid>
      
      
      <description><div>Build form-driven apps that let any user query DBpedia.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.ibm.com/developerworks/xml/library/x-wikiquery/&#34;&gt;&lt;img id=&#34;id202763&#34; src=&#34;http://www.ibm.com/developerworks/i/dwwordmark.gif&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;[developerWorks logo]&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I often find discussions about whether SPARQL is difficult to be a bit silly. Not that SPARQL is incredibly easy—although I do find it easy enough as query or scripting languages go—but because any talk of its suitability for your Mom is just a red herring. In a January blog posting titled &lt;a href=&#34;https://www.bobdc.com/blog/hey-cnn-sparql-isnt-so-difficu&#34;&gt;Hey CNN, SPARQL isn&amp;rsquo;t so difficult&lt;/a&gt;, I wrote that as with SQL and other query languages, no one expects end users to type out SPARQL queries, but that someone who already knows a scripting language or two can pick up SPARQL and use it to build new kinds of applications.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve written an article titled &lt;a href=&#34;http://www.ibm.com/developerworks/xml/library/x-wikiquery/&#34;&gt;Build Wikipedia query forms with semantic technology&lt;/a&gt; to demonstrate how, and it&amp;rsquo;s now live on IBM&amp;rsquo;s developerWorks website. The article walks the reader through the components of two simple form-driven applications: one that queries the Internet Movie Database for the names of actors who have appeared in movies by the two directors whose names you enter on the input form (for example, only Kathleen Turner has appeared in both a Francis Ford Coppola film and a Sofia Coppola film), and another that retrieves nicely-formatted information about recording artist albums from DBpedia. I tried to make it clear in the article that the cool part of all this is not this relatively new query language, but the existence of these collections of data that can be accessed by a standard query language and the ease with which one can build a query around search terms entered on a form, then send the query off to the server, and then format and display the results in a web page—just as people have done with SQL queries for years now.&lt;/p&gt;
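&lt;p&gt;The plumbing is ordinary web programming. As a minimal Python sketch of that request-building step (the &amp;ldquo;format&amp;rdquo; parameter asking the endpoint for JSON results is an assumption here, not something from the article), assembling the GET URL that such a form handler would send off to the server might look like this:&lt;/p&gt;

```python
from urllib.parse import urlencode

def dbpedia_query_url(sparql_query):
    """Build the GET URL that a form-driven app would request after
    assembling a SPARQL query from the user's search terms."""
    endpoint = "http://dbpedia.org/sparql"
    # The "format" parameter requesting JSON results is an assumption
    # about this endpoint; adjust it for the server you are targeting.
    params = urlencode({"query": sparql_query,
                        "format": "application/sparql-results+json"})
    return endpoint + "?" + params
```

&lt;p&gt;From there, formatting and displaying the returned results in a web page works just as it would with any other HTTP API.&lt;/p&gt;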
&lt;p&gt;I hope I got those points across, and that the article helps more people understand the contributions that SPARQL and linked data can make to useful applications that couldn&amp;rsquo;t exist otherwise.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Modeling your data with DBpedia vocabularies</title>
      <link>https://www.bobdc.com/blog/modeling-your-data-with-dbpedi/</link>
      <pubDate>Sat, 18 Jul 2009 09:58:37 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/modeling-your-data-with-dbpedi/</guid>
      
      
<description><div>Broad, useful vocabularies with plenty of sample data.</div><div>&lt;p&gt;I&amp;rsquo;ve known for a while about ways to dig into the vocabularies used in DBpedia&amp;rsquo;s massive collection of triples, and I&amp;rsquo;ve used terms from these vocabularies to query for information such as &lt;a href=&#34;https://www.bobdc.com/blog/learning-more-about-sparql&#34;&gt;Bart Simpson blackboard messages&lt;/a&gt; and &lt;a href=&#34;http://www.snee.com/bobdc.blog/2008/09/querying-wikidbpedia-for-presi.html&#34;&gt;US presidents&amp;rsquo; ages at inauguration&lt;/a&gt;. I saw these terms as &amp;ldquo;field&amp;rdquo; names to use when querying this body of data.&lt;/p&gt;
&lt;p&gt;Reading the W3C &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/&#34;&gt;RDFa spec&lt;/a&gt; recently, though, I was struck by &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/#sec_5.3.&#34;&gt;one example&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;div about=&amp;quot;http://dbpedia.org/resource/Albert_Einstein&amp;quot;&amp;gt;
  &amp;lt;span property=&amp;quot;foaf:name&amp;quot;&amp;gt;Albert Einstein&amp;lt;/span&amp;gt;
  &amp;lt;span property=&amp;quot;dbp:dateOfBirth&amp;quot; datatype=&amp;quot;xsd:date&amp;quot;&amp;gt;1879-03-14&amp;lt;/span&amp;gt;
  &amp;lt;div rel=&amp;quot;dbp:birthPlace&amp;quot; resource=&amp;quot;http://dbpedia.org/resource/Germany&amp;quot;&amp;gt;
    &amp;lt;span property=&amp;quot;dbp:conventionalLongName&amp;quot;&amp;gt;Federal Republic of Germany
   &amp;lt;/span&amp;gt;
  &amp;lt;/div&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This particular example demonstrates how to chain statements together with shared resource references, but what caught my eye was the use of the &lt;a href=&#34;http://dbpedia.org/resource/&#34;&gt;http://dbpedia.org/resource/&lt;/a&gt; namespace to reference Albert Einstein and Germany and the &lt;a href=&#34;http://dbpedia.org/property/&#34;&gt;http://dbpedia.org/property/&lt;/a&gt; namespace (here represented as &amp;ldquo;dbp:&amp;rdquo;) for the factual property &amp;ldquo;birthPlace&amp;rdquo;. In other words, here were two DBpedia vocabularies being used not to query DBpedia, but to model data completely outside of the context of DBpedia, because they offered straightforward, dereferenceable URIs for these things.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://dbpedia.org/page/Comic_Book_Guy&#34;&gt;&lt;img id=&#34;id202783&#34; src=&#34;https://www.bobdc.com/img/main/comicbookguy.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Comic Book Guy with LC URI&#34; width=&#34;200px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not saying that these are the first vocabularies to check when you need URIs for people, places, concepts, or properties, but they could be the best second or third places to go to if your domain offers no clear choice for a vocabulary that meets your needs. For example, I&amp;rsquo;d prefer the &lt;a href=&#34;http://www.linkedmdb.org/&#34;&gt;Linked Movie Database&lt;/a&gt; URI of &lt;a href=&#34;http://data.linkedmdb.org/page/film/2674&#34;&gt;http://data.linkedmdb.org/page/film/2674&lt;/a&gt; for Truffaut&amp;rsquo;s film &amp;ldquo;Shoot the Piano Player&amp;rdquo; over DBpedia&amp;rsquo;s &lt;a href=&#34;http://dbpedia.org/resource/Shoot_the_Piano_Player&#34;&gt;http://dbpedia.org/resource/Shoot_the_Piano_Player&lt;/a&gt;, despite the latter&amp;rsquo;s greater readability, because for one thing, the linkedmdb.org page for &lt;a href=&#34;http://data.linkedmdb.org/page/film/2674&#34;&gt;Shoot the Piano Player&lt;/a&gt; includes data about this resource being owl:sameAs the resource &lt;a href=&#34;http://dbpedia.org/resource/Shoot_the_Piano_Player&#34;&gt;http://dbpedia.org/resource/Shoot_the_Piano_Player&lt;/a&gt;, making it easy for queries about this movie to tie the Linked Movie Database and DBpedia metadata together. The more important reason, though, is that as far as I can tell, the Linked Movie Database project team has worked out a specific property vocabulary as part of their project, while the DBpedia one has grown more organically, leading to many more strange edge cases among the well-chosen terms.&lt;/p&gt;
&lt;p&gt;While the &lt;a href=&#34;http://id.loc.gov/authorities/search/&#34;&gt;Library of Congress Subject Headings&lt;/a&gt; provide a solid, professional taxonomy and a set of URIs for a wide variety of subjects and concepts, they don&amp;rsquo;t have them for places or people. (They might have one for &lt;a href=&#34;http://id.loc.gov/authorities/sh85078205&#34;&gt;London (England)&amp;ndash;History&lt;/a&gt;, but they don&amp;rsquo;t have one for &amp;ldquo;London (England)&amp;rdquo;.) So, while they have a URI for the concept of &lt;a href=&#34;http://id.loc.gov/authorities/sh98003588&#34;&gt;sightings of Elvis Presley since his death&lt;/a&gt;, they have no URI for Elvis himself. Nor do they have one for Einstein, and I don&amp;rsquo;t know what well-known vocabulary does, so the RDFa spec&amp;rsquo;s authors went with the DBpedia URI for the famous physicist. (Interestingly, the Library of Congress Subject Headings do cover fictitious characters such as &lt;a href=&#34;http://id.loc.gov/authorities/sh90001074&#34;&gt;Holden Caulfield&lt;/a&gt; and even the Simpsons&amp;rsquo; &lt;a href=&#34;http://id.loc.gov/authorities/sh2005008628&#34;&gt;Comic Book Guy&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;To describe facts about Einstein, the &lt;a href=&#34;http://xmlns.com/foaf/spec/&#34;&gt;FOAF&lt;/a&gt; vocabulary includes many good properties for describing a person, but none to identify the day a person was born, so the RDFa spec&amp;rsquo;s authors used the DBpedia &lt;a href=&#34;http://dbpedia.org/property/dateOfBirth&#34;&gt;http://dbpedia.org/property/dateOfBirth&lt;/a&gt; property. It&amp;rsquo;s easy enough to check whether DBpedia has a URI for a person, place, or thing by going to the appropriate Wikipedia page (watch out for redirects) and replacing the &lt;a href=&#34;http://en.wikipedia.org/wiki/&#34;&gt;http://en.wikipedia.org/wiki/&lt;/a&gt; part of its URI with &lt;a href=&#34;http://dbpedia.org/page/&#34;&gt;http://dbpedia.org/page/&lt;/a&gt;. I have a bookmarklet called &lt;a href=&#34;javascript:location.href=(location.href.replace(/https?:%5C/%5C/en.wikipedia.org%5C/wiki/,&#39;http:%5C/%5C/dbpedia.org%5C/page&#39;))&#34;&gt;wp -&amp;gt; dbpedia&lt;/a&gt; that makes this replacement and takes me from a Wikipedia page to the corresponding DBpedia page with one click. If you drag that link to your bookmarks toolbar, it should work for you.&lt;/p&gt;
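&lt;p&gt;The substitution that the bookmarklet makes is easy to script in other settings as well. Here&amp;rsquo;s a minimal Python sketch of the same replacement; note that it doesn&amp;rsquo;t follow Wikipedia redirects or confirm that the corresponding DBpedia page actually exists.&lt;/p&gt;

```python
import re

# Swap the English Wikipedia article prefix for the DBpedia page
# prefix, as the bookmarklet above does. A sketch only: it does not
# follow Wikipedia redirects or check that the DBpedia page exists.
def wikipedia_to_dbpedia(url):
    return re.sub(r'^https?://en\.wikipedia\.org/wiki/',
                  'http://dbpedia.org/page/', url)

print(wikipedia_to_dbpedia(
    'http://en.wikipedia.org/wiki/Shoot_the_Piano_Player'))
# http://dbpedia.org/page/Shoot_the_Piano_Player
```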
&lt;p&gt;To look for a property name you might need, you can check a DBpedia page for a resource that may have had that property assigned to it. You can also download an N-Triples or CSV file in your choice of 14 languages from &lt;a href=&#34;http://wiki.dbpedia.org/Downloads33&#34;&gt;DBpedia&amp;rsquo;s Download Page&lt;/a&gt;. The compressed version of infoboxproperties_en.nt, the N-Triples version of the English-language properties, was 606K, which decompression expanded to over 13 megs. With two triples per property, as shown in their brief &lt;a href=&#34;http://downloads.dbpedia.org/preview.php?file=3.3_sl_en_sl_infoboxproperties_en.nt.bz2&#34;&gt;sample&lt;/a&gt; of the file, it&amp;rsquo;s pretty verbose, so I wrote a Perl script to trim it down to just one property name per line, without the full URLs, bringing the size of the list down to 49,122 lines and about 879K.&lt;/p&gt;
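&lt;p&gt;The trimming step doesn&amp;rsquo;t need Perl specifically. A Python sketch of the same reduction might look like this; it assumes that each N-Triples line&amp;rsquo;s predicate is the second angle-bracketed URI and lives under http://dbpedia.org/property/.&lt;/p&gt;

```python
import re

# Reduce DBpedia infobox N-Triples lines to one property local name
# per line, in the spirit of the trimming script described above.
# A sketch: assumes each line's predicate is the second
# angle-bracketed URI and sits under http://dbpedia.org/property/.
def property_names(ntriples_lines):
    pattern = re.compile(
        r'^\s*<[^>]+>\s+<http://dbpedia\.org/property/([^>]+)>')
    names = set()
    for line in ntriples_lines:
        m = pattern.match(line)
        if m:
            names.add(m.group(1))
    return sorted(names)

sample = [
    '<http://dbpedia.org/resource/Albert_Einstein> '
    '<http://dbpedia.org/property/dateOfBirth> "1879-03-14" .',
]
print(property_names(sample))  # ['dateOfBirth']
```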
&lt;p&gt;The list is fun to skim through. There are a lot of goofy properties in there; &lt;a href=&#34;http://dbpedia.org/property/worldSnookerChampionshipRoundsProperty99&#34;&gt;worldSnookerChampionshipRoundsProperty99&lt;/a&gt; has 98 more to go with it. So how do you know which ones are worth using? I like metadata that&amp;rsquo;s really about existing data, and it&amp;rsquo;s easy to use &lt;a href=&#34;http://dbpedia.org/sparql&#34;&gt;DBpedia&amp;rsquo;s SPARQL query form&lt;/a&gt; to ask about resources that have a particular property assigned. Entering the following query there showed me that over 50 people have had worldSnookerChampionshipRoundsProperty99 values assigned to them:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?s ?o WHERE {
  ?s 
  &amp;lt;http://dbpedia.org/property/worldSnookerChampionshipRoundsProperty99&amp;gt; 
  ?o
}
&lt;/code&gt;&lt;/pre&gt;
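&lt;p&gt;You can also send a query like this to the endpoint from a program instead of through the web form. Here&amp;rsquo;s a Python sketch that builds the request URL; it assumes that the endpoint accepts GET requests with a &amp;ldquo;format&amp;rdquo; parameter, as Virtuoso-based endpoints typically do.&lt;/p&gt;

```python
import urllib.parse

# Build a GET URL for a SPARQL query against DBpedia's public
# endpoint. A sketch: the 'format' parameter is a Virtuoso
# convention, and long queries are better sent via POST.
def dbpedia_query_url(query):
    params = urllib.parse.urlencode({
        'query': query,
        'format': 'application/sparql-results+json',
    })
    return 'http://dbpedia.org/sparql?' + params

url = dbpedia_query_url(
    'SELECT DISTINCT ?s ?o WHERE { ?s '
    '<http://dbpedia.org/property/worldSnookerChampionshipRoundsProperty99> '
    '?o }')
print(url)
```

&lt;p&gt;Fetching that URL (for example, with urllib.request.urlopen) returns the bindings as JSON.&lt;/p&gt;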
&lt;p&gt;Seeing examples of how a property was used also gives you great background on whether it&amp;rsquo;s appropriate to your needs.&lt;/p&gt;
&lt;p&gt;The first place I&amp;rsquo;d check, though, for appropriate DBpedia property names would be the DBpedia Ontology available from the same download page. It&amp;rsquo;s not huge, defining metadata for about 1200 properties at this point, but it really brings the property vocabulary into ontology territory by defining domains, ranges, subclasses, and other relationships between terms that help you to get more out of them. Outside of that ontology, &lt;a href=&#34;http://blog.georgikobilarov.com/2008/10/dbpedia-rethinking-wikipedia-infobox-extraction/&#34;&gt;plenty of other hard work&lt;/a&gt; continues to make the DBpedia predicate vocabulary more valuable to all of us, so it&amp;rsquo;s worth keeping an eye on the work going on around this vocabulary.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://aeshin.org/&#34; title=&#34;http://aeshin.org/&#34;&gt;Ryan Shaw&lt;/a&gt; on &lt;a href=&#34;#comment-2291&#34;&gt;July 18, 2009 1:23 PM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;While the Library of Congress Subject Headings provide a solid, professional taxonomy and a set of URIs for a wide variety of subjects and concepts, they don&amp;rsquo;t have them for places or people.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While this is true, the Library of Congress does have authority files for those things, and I understand they plan on adding them to id.loc.gov as Linked Data soon.&lt;/p&gt;
&lt;p&gt;Einstein: &lt;a href=&#34;http://errol.oclc.org/laf/n79-22889.html&#34;&gt;http://errol.oclc.org/laf/n79-22889.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Great post!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://sw-app.org/about.html&#34; title=&#34;http://sw-app.org/about.html&#34;&gt;Michael Hausenblas&lt;/a&gt; on &lt;a href=&#34;#comment-2292&#34;&gt;July 19, 2009 2:47 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Great article (as usual ;) and might be worth it for us to cover this aspect in [1].&lt;br /&gt;
Thanks!&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
[1] &lt;a href=&#34;http://ld2sd.deri.org/lod-ng-tutorial/#checklist&#34;&gt;http://ld2sd.deri.org/lod-ng-tutorial/#checklist&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/dbpedia">DBpedia</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Big legal publishers and semantic web technology</title>
      <link>https://www.bobdc.com/blog/big-legal-publishers-and-seman/</link>
      <pubDate>Mon, 15 Jun 2009 15:10:03 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/big-legal-publishers-and-seman/</guid>
      
      
      <description><div>Which one will see the good fit first?</div><div>&lt;p&gt;A recent @TopQuadrant &lt;a href=&#34;http://www.twitter.com/TopQuadrant/status/2106840525&#34;&gt;tweet&lt;/a&gt; about legal knowledge and RDF/XML led me to Dr. Adam Wyner&amp;rsquo;s piece &lt;a href=&#34;http://www.law.com/jsp/legaltechnology/pubArticleLT.jsp?id=1202431256007&#34;&gt;Legal Ontologies Spin a Semantic Web&lt;/a&gt; on law.com. After reading it, I wanted to leave a comment, but this required registering on law.com and telling them lots of details about the law firm I work for. I don&amp;rsquo;t work for a law firm, so I&amp;rsquo;m just putting my comments here and expanding on them a bit.&lt;/p&gt;
&lt;blockquote id=&#34;id202768&#34; class=&#34;pullquote&#34;&gt;It&#39;s a logical next step for the big legal publishers to build ontologies that define new kinds of relationships among the data that they store.&lt;/blockquote&gt;
&lt;p&gt;Before discussing the value that ontologies can bring to the practice of law, Dr. Wyner writes:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Reading a case such as &lt;a href=&#34;http://www.courtinfo.ca.gov/opinions/documents/B211070.PDF&#34;&gt;&lt;em&gt;Manhattan Loft v. Mercury Liquors&lt;/em&gt;&lt;/a&gt;, there are elementary questions that can be answered by any legal professional, but not by a computer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Where was the case decided?&lt;/li&gt;
&lt;li&gt;Who were the participants and what roles did they play?&lt;/li&gt;
&lt;li&gt;Was it a case of first instance or on appeal?&lt;/li&gt;
&lt;li&gt;What was the basis of the appeal?&lt;/li&gt;
&lt;li&gt;What were the legal issues at stake?&lt;/li&gt;
&lt;li&gt;What were the facts?&lt;/li&gt;
&lt;li&gt;What factors were relevant in making the decision?&lt;/li&gt;
&lt;li&gt;What was the decision?&lt;/li&gt;
&lt;li&gt;What legislation or case law was cited?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Legal information service providers such as &lt;a href=&#34;http://www.lexisnexis.com&#34;&gt;LexisNexis&lt;/a&gt; index some of the information&amp;hellip;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Actually, they identify and index most of the information in this list, as do Westlaw and the Wolters-Kluwer legal publishers, because they store the majority of their content in XML. (As early adopters of this technology, these companies sometimes store it using XML&amp;rsquo;s predecessor, SGML.) A case&amp;rsquo;s venue, its participants and their roles, the facts of the case, and the judge&amp;rsquo;s decision are typical pieces of information that a legal publisher identifies with XML markup and stores in a system that can use this information for specialized queries.&lt;/p&gt;
&lt;p&gt;Ontologies can add a lot to this, and the schemas for this XML will be a great head start to any semantic web-oriented system for getting more out of this data. This won&amp;rsquo;t happen outside of the publishers&amp;rsquo; firewalls soon, though, because the schemas for their legal content play such an important role in the extra value that they add and charge for that no legal publisher would share them. (They don&amp;rsquo;t worry about open source efforts to reproduce their work nearly as much as they worry about competitive advantages over each other.)&lt;/p&gt;
&lt;p&gt;Two other resources that these publishers can build on are their existing taxonomies and their databases of citation relationships. Taxonomies such as &lt;a href=&#34;http://en.wikipedia.org/wiki/West_American_Digest_System&#34;&gt;West&amp;rsquo;s Key Number system&lt;/a&gt; are divided by practice areas (for example, asbestos construction issues vs. child custody) and not document roles or purposes, and therefore make a nice complement to the XML schemas. Legal publishers have sold databases of citation relationships (for example, which case overruled another one) &lt;a href=&#34;http://www.oreillynet.com/xml/blog/2003/05/a_nineteenthcentury_linking_ap.html&#34;&gt;since the nineteenth century&lt;/a&gt;, and this data is all in clean, well-organized databases.&lt;/p&gt;
&lt;p&gt;Kingsley Idehen likes to discuss how relational databases added a level of abstraction over previous models, XML provided an additional layer of flexibility by enabling people to store and use structured data whose structure wasn&amp;rsquo;t necessarily tables, and the RDF data model and associated technology add another layer of abstraction and therefore more possibilities. Behind their firewalls, it&amp;rsquo;s a logical next step for the big legal publishers to build ontologies that define new kinds of relationships among the XML content, the relational citation information, and the taxonomy data that they currently store so that they can get more value out of this data.&lt;/p&gt;
&lt;p&gt;While there are cool things to do with this technology using content such as &lt;a href=&#34;https://www.bobdc.com/blog/semantic-web-technology-and-hu&#34;&gt;ancient literature&lt;/a&gt;, it&amp;rsquo;s much easier to see a business model in a domain such as legal publishing where customers have a bigger budget to spend on information that can help them do their jobs. Making a case for the return on semantic web technology investment for legal publishing will be an interesting challenge, but not too difficult, because these technologies can build incrementally on so many existing information resources such as relational databases and the XML content infrastructure that Dr. Wyner forgot to mention. It will be interesting to see which of the big legal publishers moves ahead with this first, although they may choose not to publicize it.&lt;/p&gt;
&lt;p&gt;For work outside of the big legal publishers, in a 2006 posting titled &lt;a href=&#34;https://www.bobdc.com/blog/law-metadata-on-the-web&#34;&gt;Law metadata on the web&lt;/a&gt; I wrote about how legal-rdf.org looked like a good start, but apparently there&amp;rsquo;s been little enough activity there that they let their domain name ownership lapse, and now it&amp;rsquo;s just parked by a speculator. (That posting also mentions the &lt;a href=&#34;http://www.oasis-open.org/committees/legalxml-courtfiling/&#34;&gt;OASIS LegalXML&lt;/a&gt; work, which hasn&amp;rsquo;t gotten to defining schemas for court decisions and kind of petered out in defining schemas for legislation, the other main document type for legal publishing.)&lt;/p&gt;
&lt;p&gt;Can anyone tell me of other public standards for legal metadata in development that could provide input to semantic web projects?&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.topquadrant.com&#34; title=&#34;http://www.topquadrant.com&#34;&gt;Irene Polikoff&lt;/a&gt; on &lt;a href=&#34;#comment-2289&#34;&gt;June 21, 2009 12:14 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;I was smiling as I read your post. The future is actually even closer than you may think.&lt;/p&gt;
&lt;p&gt;Can&amp;rsquo;t name any names as it would not be appropriate, but a case study on this page &lt;a href=&#34;http://www.topquadrant.com/solutions/ent_vocab_mgmt.html&#34;&gt;http://www.topquadrant.com/solutions/ent_vocab_mgmt.html&lt;/a&gt; is based on our work with one of the large legal information publishers mentioned in your blog. Representatives from the other large publisher you name spent quite a bit of time last week at our booth at the Semantic Technologies conference.&lt;/p&gt;
&lt;p&gt;Not rich ontologies so far, just taxonomies, but it is happening as we speak (or write, for that matter). Publishing is going RDF.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/legal-publishing">legal publishing</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>SearchMonkey and RDFa</title>
      <link>https://www.bobdc.com/blog/searchmonkey-and-rdfa/</link>
      <pubDate>Tue, 02 Jun 2009 20:15:24 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/searchmonkey-and-rdfa/</guid>
      
      
      <description><div>What am I missing?</div><div>&lt;img id=&#34;id202737&#34; src=&#34;http://i.i.com.com/cnwk.1d/i/bto/20080514/searchmonkey_logo_5.14.2008.PNG&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;[searchmonkey logo]&#34; width=&#34;120px&#34;/&gt;
&lt;p&gt;Yahoo! &lt;a href=&#34;http://developer.yahoo.com/searchmonkey/&#34;&gt;SearchMonkey&lt;/a&gt; is one of those interesting, RDF-related technologies that I&amp;rsquo;d been meaning to check out for a while, and when I saw how much of the reaction to &lt;a href=&#34;https://www.bobdc.com/blog/google-and-rdfa-what-and-why&#34;&gt;Google&amp;rsquo;s Rich Snippets&lt;/a&gt; was people like &lt;a href=&#34;http://hackingsearch.com/2009/05/semantic-markup-how-to-feed-google-rich-snippets-yahoo-searchmonkey-with-rdfa-and-microformats/&#34;&gt;Ryan Smith&lt;/a&gt; or Peter Mika in the May &lt;a href=&#34;http://semanticgang.talis.com/2009/05/22/may-2009-the-semantic-web-gang-discuss-wolfram-alpha-and-googles-rdfa/&#34;&gt;Semantic Web Gang podcast&lt;/a&gt; saying that Google was just doing what SearchMonkey had already done, I knew that it was time to look more closely at SearchMonkey.&lt;/p&gt;
&lt;p&gt;I wanted to see support for RDFa embedded in HTML, and to be honest, I only see it in SearchMonkey if I squint while I&amp;rsquo;m looking and tilt my head slightly sideways. Perhaps I&amp;rsquo;m missing something, and I hope someone points it out to me.&lt;/p&gt;
&lt;p&gt;According to the &lt;a href=&#34;http://developer.yahoo.com/searchmonkey/siteowner.html&#34;&gt;Site Owner Overview&lt;/a&gt;, there are two ways to take advantage of SearchMonkey: Standard Enhanced Results or Custom SearchMonkey Applications.&lt;/p&gt;
&lt;h2 id=&#34;id202825&#34;&gt;Standard Enhanced Results&lt;/h2&gt;
&lt;p&gt;The Site Owner Overview page says this is &amp;ldquo;Currently available for certain content types such as Video, Games, and Documents&amp;rdquo;. Sounds good to me; I&amp;rsquo;m very interested in adding metadata to documents. According to the &lt;a href=&#34;http://developer.search.yahoo.com/help/objects/documents&#34;&gt;Documents page&lt;/a&gt;, though, &amp;ldquo;the Yahoo! Search document reader currently supports Flash documents only&amp;rdquo;. If you want to use RDFa to identify specialized metadata for Yahoo to use when they return your document in a search result list, your document must be stored in a Flash document, and then you embed your metadata in the attributes of an &lt;code&gt;object&lt;/code&gt; element that points at that document.&lt;/p&gt;
&lt;p&gt;I think it&amp;rsquo;s great that this lets us use RDFa to assign metadata to &lt;a href=&#34;http://www.slideshare.net/&#34;&gt;slideshare&lt;/a&gt; and &lt;a href=&#34;http://www.scribd.com/&#34;&gt;Scribd&lt;/a&gt; documents, but if this has such a strong dependency on a binary format controlled by a single software company, I&amp;rsquo;m not that interested.&lt;/p&gt;
&lt;h2 id=&#34;id202876&#34;&gt;Custom SearchMonkey Applications&lt;/h2&gt;
&lt;p&gt;OK, so I don&amp;rsquo;t want to see a shared web publishing infrastructure have such dependencies on this proprietary binary format. The SearchMonkey &lt;a href=&#34;http://developer.search.yahoo.com/start&#34;&gt;Getting Started&lt;/a&gt; page tells us: &amp;ldquo;Don&amp;rsquo;t have Flash objects? Or want to build an app to display custom enhanced results? Head on over to the SearchMonkey Developer Tool to build an app where you can display a custom image, extract structured data from your site, [or] link to pages within your site&amp;rdquo;. This sounded a bit better.&lt;/p&gt;
&lt;p&gt;According to the &lt;a href=&#34;http://developer.search.yahoo.com/wizard/index&#34;&gt;SearchMonkey Application Dashboard&lt;/a&gt; page, &amp;ldquo;Presentation Applications are small PHP apps that display enhanced search results using data services. You can use an existing data service or create a custom service below&amp;rdquo;. When I went through the steps of building a Custom Data Service based on an existing one, it asked me for a URL pattern to specify pages where it should look for data and URLs that fit that pattern to use for testing. Then, it showed the XSLT that it would use to extract data, displayed in an edit box where I could customize it.&lt;/p&gt;
&lt;p&gt;You use this stylesheet to &amp;ldquo;specify XSLT code for extracting information from the page and representing that information as &lt;a href=&#34;http://developer.yahoo.com/searchmonkey/smguide/datarss.html&#34;&gt;DataRSS&lt;/a&gt;&amp;rdquo;. Despite the admonition to &amp;ldquo;avoid using namespaces in your XPATH expressions, as SearchMonkey strips these out&amp;rdquo;, this looked like something I could work with once I get to know the DataRSS format. (There&amp;rsquo;s a schema on that page to use for testing your stylesheet output.)&lt;/p&gt;
&lt;p&gt;So if I point Yahoo at some documents and write a stylesheet that goes through those documents and returns DataRSS, SearchMonkey can use this. I could put RDFa in those documents and have my stylesheet get DataRSS data out of that&amp;hellip; but I could also make up my own BobFooBar format to embed in the HTML and have my stylesheet get DataRSS out of that as well, so I don&amp;rsquo;t really see how this counts as RDFa support.&lt;/p&gt;
&lt;p&gt;The Semantic Web community is still trying to piece together the nature of Google&amp;rsquo;s support of RDFa in HTML documents, and there are things to complain about, but we know that their crawlers will look for some sort of RDFa in HTML documents. This looks like a real step forward for support of standards-based metadata on the web by a major search engine. Perhaps my review of the SearchMonkey options is missing something, but so far I haven&amp;rsquo;t seen anything to show me that what they offer is something for people interested in open web standards to get excited about.&lt;/p&gt;
&lt;p&gt;Again, if I&amp;rsquo;m wrong about any of this, I&amp;rsquo;d be happy to be corrected.&lt;/p&gt;
&lt;h2 id=&#34;9-comments&#34;&gt;9 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://thewebsemantic.com&#34; title=&#34;http://thewebsemantic.com&#34;&gt;Taylor&lt;/a&gt; on &lt;a href=&#34;#comment-2280&#34;&gt;June 3, 2009 10:04 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;SearchMonkey is similar to the tripblox concept where other sites provide the RDFa&amp;hellip;search monkey sees it, and therefore can list items in a more meaningful way.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t think there are any RDFa tie ins but microsoft bing has this flavor too. You can type &amp;ldquo;hotels in&amp;rdquo; and you&amp;rsquo;re shopping for hotels on a map, but in a vendor/supplier neutral way.&lt;/p&gt;
&lt;p&gt;So sites like Expedia, Orbitz, Travelocity write software to list travel search results. We know the content is travel related (hotel/air/car) and have custom views for that&amp;hellip;so the search is vertical, a specific domain. Now the horizontal search tools are finding ways to semantically recognize content and list it in horizontal specialized ways. viewzi is another example.&lt;/p&gt;
&lt;p&gt;So the very wide, general implication I see is that search tools are getting better, and allowing users to search supplier agnostic, price compare, and then they arrive at the vertical site ready to make a purchase. RDFa makes it possible for the small fries to be seen by &amp;ldquo;big vertical search&amp;rdquo; and have their results listed in a very meaningful way, for example, a hotel could be listed just as elegantly on search monkey as it&amp;rsquo;s listed on expedia&amp;hellip;and since search monkey gives you expedia/travelocity/orbitz results + the small fry suppliers with RDFa on their site, where do you start searching?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2281&#34;&gt;June 3, 2009 10:22 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Taylor,&lt;/p&gt;
&lt;p&gt;What you&amp;rsquo;re saying in general makes sense to me, but&amp;hellip;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;sites provide the RDFa&amp;hellip;search monkey sees it&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I couldn&amp;rsquo;t find evidence that SearchMonkey sees any RDFa besides that which is embedded as attributes in object elements that point to Flash files. Other RDFa use by SearchMonkey depends on XSLT translation of that RDFa to DataRSS, which is what SearchMonkey is really using&amp;hellip; right?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://developer.yahoo.com/searchmonkey&#34; title=&#34;http://developer.yahoo.com/searchmonkey&#34;&gt;Evan Goer&lt;/a&gt; on &lt;a href=&#34;#comment-2282&#34;&gt;June 3, 2009 11:07 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hello Bob,&lt;/p&gt;
&lt;p&gt;Rest assured, SearchMonkey does see the RDFa you add to a page. When the Yahoo! crawler hits your page, we extract any valid RDFa we find. For each URL, we store that data as a chunk of DataRSS XML. DataRSS is our way of normalizing between all the different types of structured data we might have for a page: RDFa, eRDF, various microformats, feeds, Delicious data, anything else.&lt;/p&gt;
&lt;p&gt;If the DataRSS on a URL matches a pattern that we&amp;rsquo;re expecting, then we automatically display that URL as an enhanced result &amp;ndash; that&amp;rsquo;s our Flash video/documents/games functionality. Google Rich Snippets is the same thing, but for different use cases (like reviews, etc.) Rest assured, both teams are working to add more. :)&lt;/p&gt;
&lt;p&gt;For arbitrary RDFa where we don&amp;rsquo;t have an automatic presentation, you can use SearchMonkey to create a custom presentation. The SearchMonkey developer tool allows you to build a little PHP app that digs into the DataRSS XML using XPATH and tells Yahoo! Search how to display that data.&lt;/p&gt;
&lt;p&gt;Note that you do not have to write any XSLT to use RDFa. You&amp;rsquo;re right that if you create a BobFooBar format in your HTML, then we don&amp;rsquo;t understand that format at all. Which means if you want to get at it using SearchMonkey, yes, you would have to build what we call an &amp;ldquo;XSLT Custom Data Service.&amp;rdquo; But if you use RDFa, a format we do understand &amp;ndash; then we are essentially running that XSLT for you, at index time.&lt;/p&gt;
&lt;p&gt;Finally, you can also call our BOSS Search APIs and get all our RDFa + other structured data back as DataRSS XML or RDF/XML (your choice). Basically, Yahoo! crawls the web harvesting structured data, and you can use BOSS to reflect that data back at you.&lt;/p&gt;
&lt;p&gt;Best,&lt;/p&gt;
&lt;p&gt;Evan Goer&lt;br /&gt;
Yahoo! SearchMonkey Team&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2283&#34;&gt;June 3, 2009 11:22 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Evan, this sounds more promising.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For arbitrary RDFa where we don&amp;rsquo;t have an automatic presentation&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I assume that the RDFa where you do have an automatic presentation is a set of names from specific namespaces, e.g. dc:creator. Is this set documented somewhere? I get the impression from what you write that I can embed RDFa using these names as predicates into an HTML document, and that this metadata may show up as part of a search result.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;you can also call our BOSS Search APIs and get all our RDFa + other&lt;br /&gt;
structured data back as DataRSS XML&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If dc:creator is part of the set documented above, would this let me query the documents for which you have DataRSS metadata stored for dc:creator=&amp;lsquo;Tim Berners-Lee&amp;rsquo; and have the documents returned if they&amp;rsquo;re there? Including HTML documents as described above?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://developer.yahoo.com/searchmonkey&#34; title=&#34;http://developer.yahoo.com/searchmonkey&#34;&gt;Evan Goer&lt;/a&gt; on &lt;a href=&#34;#comment-2284&#34;&gt;June 3, 2009 11:49 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s right, the &lt;a href=&#34;http://developer.search.yahoo.com/start&#34;&gt;automatic SearchMonkey presentations&lt;/a&gt; are triggered off of certain namespaces. For example, you can trigger a Video result using media:video and media:thumbnail. You can also change the title, abstract, etc. by including a dc:title or dc:description.&lt;/p&gt;
&lt;p&gt;Viewing the metadata in search results: well, beyond fancy presentations, what we&amp;rsquo;ve got right now are &lt;a href=&#34;http://developer.yahoo.net/blog/archives/2008/12/monkey_finds_microformats_and_rdf.html&#34;&gt;some very crude filters&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;With BOSS, you could create something slightly more powerful. You could say, &amp;ldquo;give me the top 100 results that have RDFa &lt;em&gt;and&lt;/em&gt; have the term &amp;lsquo;Tim Berners-Lee&amp;rsquo;&amp;rdquo;. Then your BOSS app could sift through these results and return the URLs that have a dc:creator=&amp;lsquo;Tim Berners-Lee&amp;rsquo;. But we don&amp;rsquo;t yet support arbitrary SPARQL queries into the Yahoo! Search index. That&amp;rsquo;s more like the &amp;ldquo;&lt;a href=&#34;http://searchengineland.com/yahoo-were-moving-from-web-of-pages-to-web-of-objects-19524&#34;&gt;Web Of Objects&lt;/a&gt;&amp;rdquo; that our execs were talking about last month.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2285&#34;&gt;June 3, 2009 12:59 PM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;the automatic SearchMonkey presentations are triggered off of certain&lt;br /&gt;
namespaces&amp;hellip;.media:video&amp;hellip; media:thumbnail&amp;hellip; dc:title&amp;hellip; dc:description.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Is there a comprehensive list of these namespaces and properties somewhere?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;But we don&amp;rsquo;t yet support arbitrary SPARQL queries into the&lt;br /&gt;
Yahoo! Search index.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That would be cool, but I think it would be much simpler to simply allow queries that return documents that have the RDFa equivalent of (&amp;gt;, p:foo, &amp;ldquo;bar&amp;rdquo;) in them. You tell us what p:foo predicates we can use, we specify &amp;ldquo;bar&amp;rdquo;, and you return each document that has p:foo=&amp;ldquo;bar&amp;rdquo; in it.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://developer.yahoo.com/searchmonkey&#34; title=&#34;http://developer.yahoo.com/searchmonkey&#34;&gt;Evan Goer&lt;/a&gt; on &lt;a href=&#34;#comment-2286&#34;&gt;June 3, 2009 1:26 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For the automatic SearchMonkey presentations, all the namespaces and properties are scattered across the different documentation pages under &lt;a href=&#34;http://developer.yahoo.search.com/start.&#34;&gt;http://developer.yahoo.search.com/start.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As for supporting a simpler query syntax: I&amp;rsquo;ll bring it up to our architect!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2287&#34;&gt;June 3, 2009 1:50 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m guessing that you meant &lt;a href=&#34;http://developer.search.yahoo.com/start&#34;&gt;http://developer.search.yahoo.com/start&lt;/a&gt; and not &lt;a href=&#34;http://developer.yahoo.search.com/start.&#34;&gt;http://developer.yahoo.search.com/start.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Compiling those namespaces and properties into a single document would be a big boost to usage of SearchMonkey by the semantic web community considering how little work it would be.&lt;/p&gt;
&lt;p&gt;Thanks again!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.yourholylandstore.com&#34; title=&#34;http://www.yourholylandstore.com&#34;&gt;Yarmulka&lt;/a&gt; on &lt;a href=&#34;#comment-2361&#34;&gt;October 28, 2009 8:03 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Just added searchmonkey product objects to our pages. Yahoo tells that products are found but we don&amp;rsquo;t see it in search results. Very strange.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>&#34;Semantic Web for the Working Ontologist&#34;</title>
      <link>https://www.bobdc.com/blog/semantic-web-for-the-working-o/</link>
      <pubDate>Wed, 27 May 2009 09:36:52 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-for-the-working-o/</guid>
      
      
<description><div>And for anyone interested in working with ontologies.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0123735564/bobducharmeA/&#34;&gt;&lt;img id=&#34;id202742&#34; src=&#34;http://ecx.images-amazon.com/images/I/51VH71S80XL.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;[&amp;ldquo;Semantic Web for the Working Ontologist&amp;rdquo; cover]&#34; width=&#34;160px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I recently finished Dean Allemang and Jim Hendler&amp;rsquo;s book &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0123735564/bobducharmeA/&#34;&gt;Semantic Web for the Working Ontologist&lt;/a&gt;, and I strongly recommend it to anyone interested in OWL, RDF, or the Semantic Web. I&amp;rsquo;m surprised that their publishers even agreed to the title; there may be some people who look at the book&amp;rsquo;s title and say &amp;ldquo;Hey, I&amp;rsquo;m a working ontologist, so I need that book!&amp;rdquo;, but I think that it would benefit a much wider audience: not just people who consider themselves working ontologists, but anyone who needs to work with standards-based ontologies or with people who do.&lt;/p&gt;
&lt;p&gt;The book describes many modeling issues and then shows how to work through them using concrete examples that are explained well enough to generalize them to other domains. Anyone who reads this book and then works with ontologies will come back to it saying to themselves &amp;ldquo;I know I saw something in here about how to handle this particular information relationship&amp;hellip;&amp;rdquo; Examples are not presented as working code per se, but there are many examples showing a set of triples, a few RDFS and/or OWL statements, and the resulting new triples implied by the combination. Many of these examples made me want to type them into a text editor, run them through &lt;a href=&#34;http://clarkparsia.com/pellet&#34;&gt;Pellet&lt;/a&gt;, and then start modifying the examples to see what happened, because to me, those implied triples are the &lt;a href=&#34;https://www.bobdc.com/blog/adding-metadata-value-with-pel&#34;&gt;coolest part&lt;/a&gt; of OWL: the new facts that you get out of an existing set of facts by adding metadata.&lt;/p&gt;
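&lt;p&gt;The pattern in those examples can be sketched in a few lines of plain Python. This is my own illustration, not code from the book, and the vocabulary names are made up: given some instance triples and a couple of rdfs:subClassOf statements, applying the type-propagation rule to a fixed point yields the implied triples.&lt;/p&gt;

```python
# A minimal sketch of RDFS-style type propagation; the names
# (ex:Spot, ex:Dog, ...) are invented for illustration.

facts = {("ex:Spot", "rdf:type", "ex:Dog")}
schema = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
}

def entail(facts, schema):
    """Apply the rdf:type / rdfs:subClassOf rule until a fixed point."""
    triples = set(facts) | set(schema)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in triples:
            if p == "rdf:type":
                for c, p2, sup in triples:
                    if p2 == "rdfs:subClassOf" and c == o:
                        new.add((s, "rdf:type", sup))
        if not new.issubset(triples):
            triples = triples | new
            changed = True
    return triples

implied = entail(facts, schema) - facts - schema
# implied holds ("ex:Spot", "rdf:type", "ex:Mammal")
# and ("ex:Spot", "rdf:type", "ex:Animal")
```

&lt;p&gt;Pellet and other reasoners do far more than this, of course, but the shape of the computation is the same: existing facts plus schema metadata yield new facts.&lt;/p&gt;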
&lt;p&gt;I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/rdfs-without-rdfowl&#34;&gt;wondered before&lt;/a&gt; about what good &lt;a href=&#34;http://www.w3.org/TR/rdf-schema/&#34;&gt;RDFS&lt;/a&gt; was without OWL. I started to get a better appreciation for the possibilities when I &lt;a href=&#34;http://www.snee.com/bobdc.blog/2009/02/getting-started-with-sesame.html&#34;&gt;played a bit with Sesame&lt;/a&gt;, and Dean and Jim&amp;rsquo;s book gave me a much better idea of what you can do with RDFS when you don&amp;rsquo;t have OWL support, so there&amp;rsquo;s a reason for Sesame developers to get the book.&lt;/p&gt;
&lt;p&gt;In addition to showing people who are dabbling with Semantic Web technologies how to get deeper into the technology, the book does an especially good job of showing experienced software developers which aspects of Semantic Web development are different from what they&amp;rsquo;re used to and why these differences open up new possibilities instead of limiting them. For example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The ability in OWL to infer class relationships is a severe departure from Object Oriented modeling. In OO modeling the class structure forms the backbone of the model&amp;rsquo;s organization. All instances are created as members of some class, and their behavior is specified by the class structure. Changes to the class structure have far-reaching impact on the behavior of the system. In OWL, it is possible for the class structure to change as more information is learned about classes or individuals.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And this is a Good Thing! Got that, OO folks? If not, there&amp;rsquo;s plenty more in the book to demonstrate this to you. For example, an early chapter in the book asks &amp;ldquo;How can we accommodate variation of sources if we can&amp;rsquo;t structure the entities they are describing into a class model? The Semantic Web provides an elegant solution to this problem&amp;hellip; any model can be built up from contributions from multiple sources&amp;rdquo;. Or this: &amp;ldquo;it is never accurate in the Semantic Web to say that a property is &amp;lsquo;defined for a class.&amp;rsquo; A property is defined independently of any class, and the RDFS relations specify which inferences can be correctly made about it in particular contexts.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Some great advice for all software developers:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;hellip;you might think that modeling for reuse is best done by anticipating &lt;em&gt;everything&lt;/em&gt; that someone might want to use your model for, and thus the more you include the better. This is a mistake because the more you put in, the more you restrict someone else&amp;rsquo;s ability to extend your model instead of just use it as is. Reuse is best done, as in other systems, by designing to maximize future combination with other things, not to restrict it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Closing the book with chapters such as &amp;ldquo;Using OWL in the Wild&amp;rdquo;, &amp;ldquo;Good and Bad Modeling Practices&amp;rdquo;, and a &amp;ldquo;Frequently Asked Questions&amp;rdquo; appendix helps even more to connect theory to practice, and the final chapter&amp;rsquo;s &amp;ldquo;Beyond OWL 1.0&amp;rdquo; section shows what deficiencies the experts currently see in OWL and what kind of new features a future release might offer us. All in all, whether you are strongly interested in OWL and the Semantic Web or even just a little curious, this book will give you a solid grounding in both the theory and practice of what the technology can bring to new applications that you might be working with.&lt;/p&gt;
&lt;h2 id=&#34;6-comments&#34;&gt;6 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.workingontologist.org/&#34; title=&#34;http://www.workingontologist.org/&#34;&gt;Dean Allemang&lt;/a&gt; on &lt;a href=&#34;#comment-2272&#34;&gt;May 27, 2009 10:47 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks for the review, Bob!&lt;/p&gt;
&lt;p&gt;A resource you might not be aware of includes source code for most of the examples (soon to come: an ontology browser that will let you examine them and play with inferencing).&lt;/p&gt;
&lt;p&gt;Check it out at &lt;a href=&#34;http://www.workingontologist.org/&#34;&gt;WorkingOntologist.org&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you find errata (are you using the second printing or first printing?), please record them there, as well.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.proxml.be/&#34; title=&#34;http://www.proxml.be/&#34;&gt;Paul Hermans&lt;/a&gt; on &lt;a href=&#34;#comment-2273&#34;&gt;May 28, 2009 3:01 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I agree completely.&lt;br /&gt;
It is only a shame that the book was published with so many errors in the code and figures.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://thewebsemantic.com&#34; title=&#34;http://thewebsemantic.com&#34;&gt;Taylor&lt;/a&gt; on &lt;a href=&#34;#comment-2274&#34;&gt;May 28, 2009 10:43 AM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Got that, OO folks?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Bob, I would say that even those of us who say we&amp;rsquo;ve got it don&amp;rsquo;t; it&amp;rsquo;s profoundly different. The way I keep things straight is to classify OWL as POP, or Property Oriented Programming. Classes don&amp;rsquo;t have properties&amp;hellip;properties have classes by way of range/domain. Even then I still find myself confused and making invalid assumptions based on my OOP background&amp;hellip;in other words, for some it may be simple, but I caution anybody against making quick comparisons between OWL classification and OOP subclassing. It&amp;rsquo;s required me to think hard and ask questions and get feedback when I get lost.&lt;/p&gt;
&lt;p&gt;By Erik Hennum on &lt;a href=&#34;#comment-2275&#34;&gt;May 28, 2009 4:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To shake up the OO mindset (for what it&amp;rsquo;s worth), the papers at &lt;a href=&#34;http://www.w3.org/TR/sw-oosd-primer/&#34;&gt;http://www.w3.org/TR/sw-oosd-primer/&lt;/a&gt; and &lt;a href=&#34;http://www.hpl.hp.com/techreports/2005/HPL-2005-189.pdf&#34;&gt;http://www.hpl.hp.com/techreports/2005/HPL-2005-189.pdf&lt;/a&gt; have been helpful to me. It sounds like this book goes much deeper; thanks for the alert.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2276&#34;&gt;May 28, 2009 4:31 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Erik, I had no idea that that W3C paper was even there. It looks very useful.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://sites.google.com/site/rickcreamer&#34; title=&#34;http://sites.google.com/site/rickcreamer&#34;&gt;Rick&lt;/a&gt; on &lt;a href=&#34;#comment-2278&#34;&gt;May 31, 2009 10:55 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, thank you for posting this review! I am trying to budget the time to buy and study this book. I still have questions in the SW vs. OO area! I will certainly read the W3C SW vs. OO paper. I also have questions as to how to properly model statement metadata in such a way as not to make my triple store incompatible with other tools such as inference engines. Finally, I would like to know how to properly model higher-order predicates. I guess you could call these SW &amp;ldquo;recipes&amp;rdquo;, or &amp;ldquo;best practices&amp;rdquo;, or SW patterns + anti-patterns. I hope this book covers some of these topics.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
      <category domain="https://www.bobdc.com//categories/book-reviews">book reviews</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Writing about the Semantic Web</title>
      <link>https://www.bobdc.com/blog/writing-about-the-semantic-web/</link>
      <pubDate>Fri, 22 May 2009 09:51:10 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/writing-about-the-semantic-web/</guid>
      
      
      <description><div>And Linked Data, and RDF, and RDFa, and SPARQL, and OWL, and...</div><div>&lt;p&gt;After writing a few paid articles and doing a lot of blogging about various issues, features, and trends surrounding the Semantic Web, Linked Data, RDF, RDFa, SPARQL, OWL, and related tools and implementations, I thought it would be nice if I could tie them together into something resembling a cohesive whole. So, I wrote a short essay titled &lt;a href=&#34;http://www.snee.com/rdf/semweboverview.html&#34;&gt;RDF, The Semantic Web, and Linked Data&lt;/a&gt; with over 70 footnote links to these various pieces. It will be a handy reference for me in the future, and I hope it may be for others as well.&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/words/&#34; title=&#34;http://danbri.org/words/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-2266&#34;&gt;May 23, 2009 6:08 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nice overview! Would you consider mentioning SKOS, FOAF and Dublin Core in any revisions? RDF isn&amp;rsquo;t so interesting without vocabularies and public data using them&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.contextin.com&#34; title=&#34;http://www.contextin.com&#34;&gt;Ben Stein&lt;/a&gt; on &lt;a href=&#34;#comment-2267&#34;&gt;May 23, 2009 8:57 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Good essay Bob, was really interesting to read.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re interested in semantic web technologies, I&amp;rsquo;d like to refer you to &lt;a href=&#34;http://www.urlclassifier.com&#34;&gt;http://www.urlclassifier.com&lt;/a&gt;, a web service that uses NLP and statistical methods to extract the main topics discussed on web pages,&lt;br /&gt;
using ContextIn Semantic Web algorithms.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2268&#34;&gt;May 23, 2009 6:07 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Dan,&lt;/p&gt;
&lt;p&gt;Good idea, I will work those in.&lt;/p&gt;
&lt;p&gt;Ben,&lt;/p&gt;
&lt;p&gt;Note the part of the essay that says &amp;ldquo;I find it useful to think of the Semantic Web as being the Linked Data web with the addition of standards-based semantics encoded to help you get more out of that data. As the idea of &amp;lsquo;semantics&amp;rsquo; becomes a buzzword for selling web-based technology, the &amp;lsquo;standards-based&amp;rsquo; part of this becomes more important&amp;rdquo;. Can you tell us more about urlclassifier.com&amp;rsquo;s relationship to W3C semantic web technology standards such as RDF, SPARQL, and OWL?&lt;/p&gt;
&lt;p&gt;thanks,&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://twitter.com/sarahebourne&#34; title=&#34;http://twitter.com/sarahebourne&#34;&gt;Sarah Bourne&lt;/a&gt; on &lt;a href=&#34;#comment-2270&#34;&gt;May 26, 2009 1:33 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;One of the hardest parts of convincing people of the value of the Semantic Web is explaining it in plain English. This essay is a solid contribution to that goal. Thank you for sharing it!&lt;/p&gt;
&lt;p&gt;By Dean Allemang on &lt;a href=&#34;#comment-2271&#34;&gt;May 26, 2009 4:39 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you ever have a tough day, and need a break, check out &lt;a href=&#34;http://www.dailypuppy.com/&#34;&gt;The Daily Puppy&lt;/a&gt; :)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>Google and RDFa: what and why</title>
      <link>https://www.bobdc.com/blog/google-and-rdfa-what-and-why/</link>
      <pubDate>Fri, 15 May 2009 19:57:29 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/google-and-rdfa-what-and-why/</guid>
      
      
      <description><div>Surprise—to make more money!</div><div>&lt;p&gt;After the &lt;a href=&#34;http://search.twitter.com/search?q=%23google+%23rdfa&#34;&gt;initial burst&lt;/a&gt; of discussion about Google &lt;a href=&#34;http://google.com/support/webmasters/bin/topic.py?topic=21997&#34;&gt;putting their toe into the standardized metadata water&lt;/a&gt;, I started wondering about the corner of the pool they had chosen. They&amp;rsquo;re not ready to start parsing any old RDFa; they&amp;rsquo;ll be looking for RDFa that uses the &lt;a href=&#34;http://rdf.data-vocabulary.org/rdf.xml&#34;&gt;vocabulary&lt;/a&gt; they somewhat hastily defined for the purpose. Why does the vocabulary define the properties that it defines?&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://google.com/support/webmasters/bin/answer.py?answer=146646&#34;&gt;People&lt;/a&gt; properties sound basic enough, although as all the semweb geeks have already tweeted, Google should have leveraged the extensive existing work done on the &lt;a href=&#34;http://www.foaf-project.org/&#34;&gt;FOAF&lt;/a&gt; vocabulary for that. The other three categories of properties they define are &lt;a href=&#34;http://google.com/support/webmasters/bin/answer.py?answer=146645&#34;&gt;Reviews&lt;/a&gt;, &lt;a href=&#34;http://google.com/support/webmasters/bin/answer.py?answer=146750&#34;&gt;Products&lt;/a&gt;, and &lt;a href=&#34;http://google.com/support/webmasters/bin/answer.py?answer=146861&#34;&gt;Businesses and organizations&lt;/a&gt;. Of all the knowledge domains to represent, why these?&lt;/p&gt;
&lt;blockquote id=&#34;id202818&#34; class=&#34;pullquote&#34;&gt;In the words of Drupal project lead Dries Buytaert, &#34;Structured data is the new search engine optimization&#34;.&lt;/blockquote&gt;
&lt;p&gt;Comparing a given Google project to the big picture of all their projects can be overwhelming, but there&amp;rsquo;s no need to when you remember what their core business is: putting ads next to search results and charging for the ads when they get clicked. The more relevant the ads are to the content next to them, the more likely they are to get clicked, and the more money Google makes.&lt;/p&gt;
&lt;p&gt;In a blog post titled &lt;a href=&#34;https://www.bobdc.com/blog/the-future-of-rdfa&#34;&gt;The future of RDFa&lt;/a&gt; in February of last year, I wrote that &amp;ldquo;Pricing is&amp;hellip; a huge area where people would be happy to give away data in the form of extra embedded metadata in their web pages, because it can drive new paying customers to the source of that data&amp;rdquo;. Google wants that data to help people sell more stuff and make more money themselves. The kind of metadata that would be embedded in reviews and information about products and companies—especially the category, brand, and price properties, and the detailed metadata that can be included in reviews—can make it much easier for Google to find users who are using their search engine to research things they&amp;rsquo;re interested in buying.&lt;/p&gt;
&lt;p&gt;It will be interesting to see how the big hustling SEO world adapts to this. In the words of Drupal project lead Dries Buytaert, &lt;a href=&#34;http://buytaert.net/structured-data-is-the-new-search-engine-optimization&#34;&gt;Structured data is the new search engine optimization&lt;/a&gt;. When he writes &amp;ldquo;Every webmaster wanting to improve click-through rates, reduce bounce rates, and improve conversion rates, can no longer ignore RDFa or Microformats&amp;rdquo;, it reminds me that when the SEO world eventually gravitates more in the RDFa direction or the microformats direction, these very quantitative, results-driven people will have some real data to explain why. I&amp;rsquo;ll have to start searching their voluminous discussions out there to see what people are saying.&lt;/p&gt;
&lt;p&gt;Some other miscellaneous notes on Google and RDFa:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;For now, Google isn&amp;rsquo;t going to look for this markup in all the data they crawl. As far as I can tell, they want you to &lt;a href=&#34;http://www.google.com/support/webmasters/bin/request.py?contact_type=rich_snippets_feedback&#34;&gt;nominate your own site&lt;/a&gt; to be crawled and parsed for the extra metadata.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It&amp;rsquo;s nice that Google encourages people to add a proper namespace declaration of xmlns:v=&amp;quot;&lt;a href=&#34;http://rdf.data-vocabulary.org/&#34;&gt;http://rdf.data-vocabulary.org/&lt;/a&gt;&amp;quot; to a web page before adding properties such as v:reviewer and v:description. They even make this their number one &amp;ldquo;&lt;a href=&#34;http://google.com/support/webmasters/bin/answer.py?answer=146898&#34;&gt;important property&lt;/a&gt;&amp;rdquo;. But when they parse a document that may contain this metadata, will they check for xmlns:v=&amp;quot;http://rdf.data-vocabulary.org/&amp;quot; and then only look for v:reviewer and the other properties if they find it? Or, if they see xmlns:foo=&amp;quot;http://rdf.data-vocabulary.org/&amp;quot;, will they look for foo:reviewer and the other properties from their namespace even though the document doesn&amp;rsquo;t use the prefix from Google&amp;rsquo;s demo?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;They &lt;a href=&#34;http://google.com/support/webmasters/bin/answer.py?hl=en&amp;amp;answer=146898&#34;&gt;point to&lt;/a&gt; the &amp;ldquo;official&amp;rdquo; W3C &lt;a href=&#34;http://www.w3.org/TR/xhtml-rdfa-primer/&#34;&gt;RDFa Primer&lt;/a&gt;. (It was a pleasant surprise to be reminded that the Primer&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/TR/xhtml-rdfa-primer/#id85528&#34;&gt;acknowledgments&lt;/a&gt; mention me for &amp;ldquo;reviewing the work and providing useful commentary&amp;rdquo;.) Even if Google&amp;rsquo;s implementation of this will only deal with a limited vocabulary, from what I can see they&amp;rsquo;re not subsetting the standard itself, like Adobe did with their XMP &amp;ldquo;profile&amp;rdquo; of RDF.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Google does see the semantic web world beyond what&amp;rsquo;s defined in their ontology. According to the &lt;a href=&#34;http://google.com/support/webmasters/bin/answer.py?answer=146645&#34;&gt;Reviews&lt;/a&gt; page, &amp;ldquo;You can use the additional expressiveness of RDFa to provide more information about the subject of your review. Google does not currently use the &lt;code&gt;about&lt;/code&gt; property in search results, but it may be used in the future&amp;rdquo;. Building on this, they reassure the reader about an issue that often confuses those who are new to the use of URIs as identifiers instead of just being URLs: &amp;ldquo;If the object you&amp;rsquo;re referring to does not have an obvious URL to include, you could use the URL of pages on Wikipedia or similar web sources&amp;rdquo;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It was nice to see how quickly a community effort led by Kingsley Idehen put together &lt;a href=&#34;http://purl.org/NET/googlevocab#&#34;&gt;an ontology&lt;/a&gt; (explore it &lt;a href=&#34;http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fpurl.org%2FNET%2Fgooglevocab%23&amp;amp;sid=60266&#34;&gt;here&lt;/a&gt;) defining relationships between Google&amp;rsquo;s properties and more well-established ones, complete with owl:equivalentProperty properties defined to help clean up the potential mess of the vaguely defined delimiters between the &lt;a href=&#34;http://rdf.data-vocabulary.org&#34;&gt;http://rdf.data-vocabulary.org&lt;/a&gt; URI and each property name. (See &lt;a href=&#34;http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fpurl.org%2FNET%2Fgooglevocab%23nickname&amp;amp;sid=60266&#34;&gt;here&lt;/a&gt;, near the bottom for an example.) This could become a canonical example of the value of ontologies.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
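&lt;p&gt;The prefix question in the second bullet above comes down to how CURIEs get expanded. Here is a sketch, using my own hypothetical helper rather than anything Google has published, of how a prefix-independent parser would treat v:reviewer and foo:reviewer as the same property, because both resolve against the same namespace URI:&lt;/p&gt;

```python
# Sketch of prefix-independent CURIE expansion; expand_curie is a
# hypothetical helper, not part of any real RDFa library.

DATA_VOCAB = "http://rdf.data-vocabulary.org/"

def expand_curie(curie, namespaces):
    """Resolve a prefix:localname pair to a full property URI."""
    prefix, localname = curie.split(":", 1)
    return namespaces[prefix] + localname

# Two documents declare the same namespace under different prefixes:
doc_a = {"v": DATA_VOCAB}
doc_b = {"foo": DATA_VOCAB}

same = expand_curie("v:reviewer", doc_a) == expand_curie("foo:reviewer", doc_b)
# same is True: both resolve to http://rdf.data-vocabulary.org/reviewer
```

&lt;p&gt;A parser that works this way matches on the namespace URI, not on the literal prefix string, which is how RDFa is meant to work; whether Google&amp;rsquo;s crawler does is exactly the open question.&lt;/p&gt;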
&lt;p&gt;It will be a lot of fun to build apps that use RDFa found by Google&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;6-comments&#34;&gt;6 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://clockwerx.blogspot.com/&#34; title=&#34;http://clockwerx.blogspot.com/&#34;&gt;Daniel O&amp;rsquo;Connor&lt;/a&gt; on &lt;a href=&#34;#comment-2260&#34;&gt;May 15, 2009 10:48 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I only wish that I could make blogger output xhtml strict - but I can&amp;rsquo;t, because of how they throw in some iframes and what have you.&lt;/p&gt;
&lt;p&gt;This means I can&amp;rsquo;t swap my doctype over to xhtml+rdfa and weave in their new information properly.&lt;/p&gt;
&lt;p&gt;Annoying.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://webBackplane.com/mark-birbeck&#34; title=&#34;http://webBackplane.com/mark-birbeck&#34;&gt;Mark Birbeck&lt;/a&gt; on &lt;a href=&#34;#comment-2261&#34;&gt;May 16, 2009 2:45 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Daniel,&lt;/p&gt;
&lt;p&gt;The doctype is optional.&lt;/p&gt;
&lt;p&gt;Mark&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://sw-app.org/&#34; title=&#34;http://sw-app.org/&#34;&gt;Michael Hausenblas&lt;/a&gt; on &lt;a href=&#34;#comment-2262&#34;&gt;May 16, 2009 3:20 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Good post, I by and large agree (esp. re semantic SEO) - see also my 2c at [1].&lt;/p&gt;
&lt;p&gt;Cheers,&lt;br /&gt;
Michael&lt;/p&gt;
&lt;p&gt;[1] &lt;a href=&#34;http://lists.w3.org/Archives/Public/public-lod/2009May/0095.html&#34;&gt;http://lists.w3.org/Archives/Public/public-lod/2009May/0095.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.nature.com/&#34; title=&#34;http://www.nature.com/&#34;&gt;Tony Hammond&lt;/a&gt; on &lt;a href=&#34;#comment-2263&#34;&gt;May 16, 2009 8:33 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nice post, Bob.&lt;/p&gt;
&lt;p&gt;Re your 2nd bullet, this is really encouraging news. A shame that Google Scholar persists in not making a namespace available for its vocabulary. For an example, see &lt;a href=&#34;http://blogs.nature.com/wp/nascent/2008/05/naturecom_adds_metadata.html&#34;&gt;this post&lt;/a&gt; on Nascent about Nature&amp;rsquo;s inclusion of META tags, and compare the DC and PRISM vocabularies, which have declared schemas, with the Google Scholar tags, which have no declared schema. In fact, I couldn&amp;rsquo;t find any web page for this vocabulary other than &amp;ldquo;contact us&amp;rdquo; type links.&lt;/p&gt;
&lt;p&gt;This new approach to including namespaces is refreshing.&lt;/p&gt;
&lt;p&gt;Tony&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://go-to-hellman.blogspot.com/&#34; title=&#34;http://go-to-hellman.blogspot.com/&#34;&gt;Eric Hellman&lt;/a&gt; on &lt;a href=&#34;#comment-2264&#34;&gt;May 20, 2009 9:57 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m also disturbed by all the careless mistakes that google has left in their help documentation at &lt;a href=&#34;http://google.com/support/webmasters/bin/answer.py?hl=en&amp;amp;answer=146898&#34;&gt;http://google.com/support/webmasters/bin/answer.py?hl=en&amp;amp;answer=146898&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I have also commented at &lt;a href=&#34;http://www.google.com/support/forum/p/Webmasters/thread?tid=165a6bebc77f2217&amp;amp;hl=en&#34;&gt;http://www.google.com/support/forum/p/Webmasters/thread?tid=165a6bebc77f2217&amp;amp;hl=en&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Who knows what they&amp;rsquo;ve actually implemented.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2265&#34;&gt;May 20, 2009 10:50 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Eric: gluejar?&lt;/p&gt;
&lt;p&gt;Maybe they&amp;rsquo;re going with a &amp;ldquo;release early, release often&amp;rdquo; strategy and crowdsourcing the QA of the design to those who show an interest, like us&amp;hellip;&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Semantic web technology and humanities research</title>
      <link>https://www.bobdc.com/blog/semantic-web-technology-and-hu/</link>
      <pubDate>Wed, 29 Apr 2009 18:13:55 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-technology-and-hu/</guid>
      
      
      <description><div>A Canadian historian uses semantic web technology to do interesting research and to lay the groundwork for others to do so.</div><div>&lt;p&gt;I&amp;rsquo;ve attended and given a few &lt;a href=&#34;http://www2.lib.virginia.edu/scholarslab/&#34;&gt;Scholar&amp;rsquo;s Lab&lt;/a&gt; talks at the nearby University of Virginia, and I&amp;rsquo;m kicking myself for missing a recent talk by Mount Allison University&amp;rsquo;s &lt;a href=&#34;http://www.mta.ca/faculty/humanities/classics/Robertson/&#34;&gt;Bruce Robertson&lt;/a&gt;, whose field at Mount Allison is ancient Greek and Roman history. (A podcast of his Scholars Lab talk is available &lt;a href=&#34;http://deimos.apple.com/WebObjects/Core.woa/FeedEnclosure/virginia-public.2014484138.02014484145.2053632731/enclosure.mp3&#34;&gt;here&lt;/a&gt;.) He&amp;rsquo;s the main guy behind the &lt;a href=&#34;http://heml.mta.ca&#34;&gt;Historical Event Markup Linking Project&lt;/a&gt; (HEML) and apparently even the people who brought him to UVa to give his recent talk were surprised at how far he&amp;rsquo;d refocused his XML orientation toward semantic web technologies.&lt;/p&gt;
&lt;p&gt;A few quotes from his presentation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The semantic web stack&amp;hellip; allows a schema to be always growable in a federated way. You can add to my schema and I can&amp;rsquo;t do anything about it, and that&amp;rsquo;s a wonderful, wonderful thing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote id=&#34;id203661&#34; class=&#34;pullquote&#34;&gt;&#34;You can add to my schema and I can&#39;t do anything about it, and that&#39;s a wonderful, wonderful thing&#34;.&lt;/blockquote&gt;
&lt;p&gt;I agree. While extensibility of a given XML DTD or schema must be &lt;a href=&#34;http://snee.com/xml/xml2005/industryschemas.html&#34;&gt;designed into it&lt;/a&gt; from the start, RDFS and OWL schemas allow a lot more flexibility and therefore more possibilities to build on the work of others. On a related note, here&amp;rsquo;s my favorite quote, which was a bit of a lightbulb moment for me:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If in the XML world the schema next door is just a stylesheet away, in the RDF world, the schema next door can be reasoned into, so you can include reasoning rules so that the same server is providing data in very many different flavors. I think this is an underexplored and exciting aspect of RDF, that if we have multiple schemas, as we do in the humanities, and we&amp;rsquo;re not going to agree on one, we can just do all of them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When I give an XSLT class I like to provide some introductory historical background before I show the first stylesheet. I always say that the main growth driver for XSLT&amp;rsquo;s popularity was that people got tired of waiting for the shareable DTDs that they heard about when XML was first released—they just decided to send and accept whatever XML had the information they needed and then write stylesheets to rename and rearrange that XML to fit into their systems. I never thought of RDF-oriented schemas the same way, but now I realize that they&amp;rsquo;re all that and more, because it&amp;rsquo;s much easier to combine multiple RDFS/OWL schemas for a single application than it is to combine multiple XML schemas/DTDs. (As a side note, I&amp;rsquo;m currently reading Dean Allemang and Jim Hendler&amp;rsquo;s book &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0123735564/bobducharmeA/&#34;&gt;Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL&lt;/a&gt; and I&amp;rsquo;m learning a great deal. I&amp;rsquo;m familiar with most of the components of RDFS and OWL that they explain, but their advice on how to put those pieces together has taught me a lot and given me many ideas.)&lt;/p&gt;
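&lt;p&gt;That combinability is easy to see in miniature. In this sketch (the vocabulary names are invented for illustration), a single bridging rdfs:subClassOf triple, which anyone can publish without asking the schema&amp;rsquo;s owner, makes data typed against one schema answerable through a query phrased in the other:&lt;/p&gt;

```python
# Invented example: bridging two independently developed schemas
# with one rdfs:subClassOf triple.

graph = {
    # Data published against the first schema:
    ("ex:Marathon", "rdf:type", "heml:Battle"),
    # The bridge triple, added by a third party:
    ("heml:Battle", "rdfs:subClassOf", "crm:Event"),
}

def instances_of(graph, cls):
    """Everything typed as cls, directly or via one subClassOf hop."""
    subclasses = {s for s, p, o in graph
                  if p == "rdfs:subClassOf" and o == cls}
    subclasses.add(cls)
    return {s for s, p, o in graph
            if p == "rdf:type" and o in subclasses}

events = instances_of(graph, "crm:Event")
# events == {"ex:Marathon"}
```

&lt;p&gt;No stylesheet, no renaming of elements: the second schema&amp;rsquo;s users see the first schema&amp;rsquo;s data as soon as the bridge triple is in the graph.&lt;/p&gt;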
&lt;p&gt;The Semantic Web community is sometimes accused, even from within, of being an echo chamber of tools vendors and open source developers telling each other about their latest features. A corollary issue is that these people must hear more from users about their needs, and Bruce&amp;rsquo;s talk is just the kind of thing they need to hear. His talk that I link to above covers issues such as what went well for him as he built his application, what didn&amp;rsquo;t, the mining of Wikipedia/DBPedia for historical research, issues he found with the representation of time and languages of content&amp;hellip; it&amp;rsquo;s great stuff. Too bad it&amp;rsquo;s too late for him to get on the bill of the &lt;a href=&#34;http://www.semantic-conference.com/&#34;&gt;Semantic Technology&lt;/a&gt; conference; in a &lt;a href=&#34;http://semanticgang.talis.com/2009/04/16/april-2009-the-semantic-web-gang-discuss-vocabularies-and-ontologies/&#34;&gt;recent Semantic Web Gang discussion&lt;/a&gt;, Reuters Clearforest&amp;rsquo;s Tom Tague discussed his hopes that more non-industry people would help make this conference less echoey than it had been in the past. To be honest, he actually said he was hoping to see more &amp;ldquo;business users&amp;rdquo;; perhaps, to get more non-semweb geek perspectives, we should think about how much non-computer science academic people can contribute to the discussion as well. Bruce Robertson is a great example.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>An epub comic book</title>
      <link>https://www.bobdc.com/blog/an-epub-comic-book/</link>
      <pubDate>Tue, 21 Apr 2009 09:59:29 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/an-epub-comic-book/</guid>
      
      
      <description><div>From the golden days of goofy comics.</div><div>&lt;p&gt;&lt;a href=&#34;http://members.fortunecity.com/srca1943/SpotAnn2-2-1.html&#34;&gt;&lt;img id=&#34;id203603&#34; src=&#34;https://www.bobdc.com/img/main/bluebeetle.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;image from &#39;Blue Beetle \#2&#39;&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;My brother-in-law works for a company that handles licensing and republishing for a lot of comic strip publishers. Lately we&amp;rsquo;ve been discussing the issues involved with republishing such image-based content as electronic books. (After these discussions, my wife asks &amp;ldquo;How&amp;rsquo;s your sister?&amp;rdquo; and I say &amp;ldquo;Uhh, OK I guess.&amp;rdquo;) I wanted to give it a shot, and found that a Google search on &lt;a href=&#34;http://www.google.com/search?q=public%20domain%20comics&#34;&gt;public domain comics&lt;/a&gt; got plenty of hits.&lt;/p&gt;
&lt;p&gt;I managed to find a pretty cool one called the Blue Beetle, and &lt;a href=&#34;http://members.fortunecity.com/srca1943/SpotAnn2-2-1.html&#34;&gt;issue 2&lt;/a&gt; from 1955 has panels that are all the same size, which helped me to sidestep one of the tougher comics-as-ebooks issues, so I created an &lt;a href=&#34;http://www.snee.com/ebooks/bluebeetle2.epub&#34;&gt;epub Blue Beetle&lt;/a&gt; comic. In the epub version the images are all rotated horizontally, so it really is aimed at smaller devices that you can easily turn 90 degrees such as the Sony Reader and the iPhone. Tests by friends show that they look pretty good on each, although the images are based on scans of 50-year-old cheaply printed paper which probably had a high acid content, so it&amp;rsquo;s not as crisp as it might be. Text on anything but a white background is difficult to read on the iPhone.&lt;/p&gt;
&lt;p&gt;Still, overall, it&amp;rsquo;s pretty cool. Check it out yourself and let me know what you think.&lt;/p&gt;
&lt;h2 id=&#34;6-comments&#34;&gt;6 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.gilbane.com/xml&#34; title=&#34;http://www.gilbane.com/xml&#34;&gt;Bill Trippe&lt;/a&gt; on &lt;a href=&#34;#comment-2258&#34;&gt;April 21, 2009 11:09 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You are the man!&lt;/p&gt;
&lt;p&gt;By Erio on &lt;a href=&#34;#comment-2364&#34;&gt;November 11, 2009 4:31 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;ok, i just downloaded it&amp;hellip; As soon as i get my PRS-600 i&amp;rsquo;ll try it and let you know my opinion&amp;hellip; thank you for the work tho =)&lt;/p&gt;
&lt;p&gt;By Erio on &lt;a href=&#34;#comment-2365&#34;&gt;November 11, 2009 4:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;ok, i just downloaded it&amp;hellip; As soon as i get my PRS-600 i&amp;rsquo;ll try it and let you know my opinion&amp;hellip; thank you for the work tho =)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.marchansenstuff.com&#34; title=&#34;http://www.marchansenstuff.com&#34;&gt;Marc Hansen&lt;/a&gt; on &lt;a href=&#34;#comment-2447&#34;&gt;February 12, 2010 1:09 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s an epub comic book test &lt;a href=&#34;http://ralphsnart.blogspot.com/2010/02/epub-comic-book-template.html&#34;&gt;here&lt;/a&gt; if anyone wants to try it.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://comictoepub.sourceforge.net/&#34; title=&#34;http://comictoepub.sourceforge.net/&#34;&gt;Will&lt;/a&gt; on &lt;a href=&#34;#comment-2467&#34;&gt;March 16, 2010 11:44 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I wrote a program to convert comic to EPUB. It&amp;rsquo;s free and open source for anyone who wants it. I was getting tired of manually converting them. :)&lt;/p&gt;
&lt;p&gt;The program automatically converts CBR and CBZ files. It also cleans up the scanned images so that they look nicer on an eBook reader. It&amp;rsquo;s all pretty much automatic. Windows only though (sorry), but open source if anyone wants to port it.&lt;/p&gt;
&lt;p&gt;To download it go here:&lt;br /&gt;
&lt;a href=&#34;http://comictoepub.sourceforge.net/&#34;&gt;http://comictoepub.sourceforge.net/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2468&#34;&gt;March 17, 2010 9:16 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Will: looks cool! Can you post links to some sample converted comics?&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>Expand those shortened URLs before archiving twitter messages</title>
      <link>https://www.bobdc.com/blog/expand-those-shortened-urls-be/</link>
      <pubDate>Tue, 14 Apr 2009 09:51:30 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/expand-those-shortened-urls-be/</guid>
      
      
      <description><div>What if a shortening service goes down?</div><div>&lt;p&gt;People love to talk about the implications of twitter.com going down, but what if a URL-shortening service goes down? When I had trouble getting to &lt;a href=&#34;http://is.gd/&#34;&gt;is.gd&lt;/a&gt; recently, I realized that when they&amp;rsquo;re down tweets referencing is.gd URLs are worthless—and that it wouldn&amp;rsquo;t be too difficult to do something about it before this happens. (I have wondered, though: why doesn&amp;rsquo;t twitter grab some short domain name and offer their own shortening service?) After all, if you&amp;rsquo;re saving any tweets, why save them with a dependency on some potentially fly-by-night point of failure?&lt;/p&gt;
&lt;p&gt;My wrapShortenedURLs.py python script, available at &lt;a href=&#34;http://www.snee.com/xml/twclient/wrapShortenedURLs.py.txt&#34;&gt;http://www.snee.com/xml/twclient/wrapShortenedURLs.py.txt&lt;/a&gt;, looks for URLs from five shortening services (defined in a list at the top of the script, in case you want to add others) and wraps those URLs in an HTML &lt;code&gt;a&lt;/code&gt; element with an &lt;code&gt;href&lt;/code&gt; attribute storing the URL that the shortened URL redirects to. For example, it will turn &amp;lsquo;See &lt;a href=&#34;http://is.gd/p3zb&#34;&gt;http://is.gd/p3zb&lt;/a&gt; for Joseph Beuys fronting a bad German New Wave band&amp;rsquo; into &amp;lsquo;See &lt;a href=&#34;http://www.youtube.com/watch?v=DQ1_ALxGbGk&#34;&gt;&lt;a href=&#34;http://is.gd/p3zb&#34;&gt;http://is.gd/p3zb&lt;/a&gt;&lt;/a&gt; for Joseph Beuys fronting a bad German New Wave band&amp;rsquo;. (When writing the script, tweets with multiple shortened URLs were the difficult part, requiring an upgrade to my skill with Python regular expression functions.)&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve tested this with some XML pulled down using the &lt;a href=&#34;http://www.devx.com/webdev/Article/40359/1954&#34;&gt;twitter API&lt;/a&gt; and with a CSV file from &lt;a href=&#34;http://tweetake.com&#34;&gt;tweetake.com&lt;/a&gt;, a service that lets you back up information you&amp;rsquo;ve stored on twitter, and it seems to work fine. I&amp;rsquo;ll be using it with all my archived tweets from tweetake.com from now on, and if I ever write my own twitter archiving routine using the API, this will certainly be a part of it.&lt;/p&gt;
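&lt;p&gt;The core of the approach can be sketched in a few lines of Python. This is just an illustration, not the actual script: the &lt;code&gt;resolve&lt;/code&gt; callable stands in for the HTTP request that follows the redirect, and the list of shortening services shown here may not match the script&amp;rsquo;s.&lt;/p&gt;

```python
import re

# A few shortening services to look for; the real script keeps its own list.
SHORTENER_PATTERN = re.compile(
    r"http://(?:is\.gd|bit\.ly|tinyurl\.com|ow\.ly|tr\.im)/\w+")

def expand_short_urls(text, resolve):
    """Map each shortened URL found in text to its expanded form.

    resolve is a callable that follows the redirect; in real use it
    would issue an HTTP request and read the Location header.
    """
    return dict((url, resolve(url)) for url in SHORTENER_PATTERN.findall(text))
```

Tweets with several shortened URLs fall out naturally, since findall returns every match in the text.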
&lt;h2 id=&#34;6-comments&#34;&gt;6 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://norman.walsh.name/&#34; title=&#34;http://norman.walsh.name/&#34;&gt;Norman Walsh&lt;/a&gt; on &lt;a href=&#34;#comment-2250&#34;&gt;April 14, 2009 10:02 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sure seems like a good idea to me!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.webcomposite.com&#34; title=&#34;http://www.webcomposite.com&#34;&gt;Jim Fuller&lt;/a&gt; on &lt;a href=&#34;#comment-2251&#34;&gt;April 14, 2009 10:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;url shortening is probably turning out to be a bad idea and even though I have use/d it, completely agree&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.martin-probst.com&#34; title=&#34;http://www.martin-probst.com&#34;&gt;Martin Probst&lt;/a&gt; on &lt;a href=&#34;#comment-2252&#34;&gt;April 14, 2009 11:05 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Now you&amp;rsquo;ve made me look at that truly horrible video&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://planb.nicecupoftea.org&#34; title=&#34;http://planb.nicecupoftea.org&#34;&gt;Libby&lt;/a&gt; on &lt;a href=&#34;#comment-2253&#34;&gt;April 14, 2009 7:03 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;fwiw, I wrote a small (and probably not very good) ruby script to expand tinyurls: &lt;a href=&#34;http://planb.nicecupoftea.org/2009/02/02/expand-tinyurls-using-ruby/&#34;&gt;http://planb.nicecupoftea.org/2009/02/02/expand-tinyurls-using-ruby/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.leobard.net&#34; title=&#34;http://www.leobard.net&#34;&gt;leo sauermann&lt;/a&gt; on &lt;a href=&#34;#comment-2256&#34;&gt;April 19, 2009 5:45 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;for what reason would you archive twitter messages? the content is intended to be outdated after a day.&lt;/p&gt;
&lt;p&gt;archive blogs!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2257&#34;&gt;April 19, 2009 5:53 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Leo,&lt;/p&gt;
&lt;p&gt;Of course I archive blogs. Some consider twitter to be a &amp;ldquo;microblog&amp;rdquo;, and since I sometimes use it to mention interesting websites I&amp;rsquo;ve found, I like to archive those as well.&lt;/p&gt;
&lt;p&gt;Many twitter messages are very ephemeral, and many aren&amp;rsquo;t. I usually don&amp;rsquo;t follow people who tweet things like &amp;ldquo;just finished breakfast&amp;rdquo;, because I prefer the ones that say a little more. The people posting those may well consider them worth archiving.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Getting started with AllegroGraph</title>
      <link>https://www.bobdc.com/blog/getting-started-with-allegrogr/</link>
      <pubDate>Wed, 08 Apr 2009 16:50:33 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/getting-started-with-allegrogr/</guid>
      
      
      <description><div>Via Python and via HTTP.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.franz.com/agraph/allegrograph/&#34;&gt;&lt;img id=&#34;id203601&#34; src=&#34;http://www.franz.com/images/lg_franz.gif&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;some description&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The home page of Franz Inc.&amp;rsquo;s &lt;a href=&#34;http://www.franz.com/agraph/allegrograph/&#34;&gt;AllegroGraph RDFStore&lt;/a&gt; calls it &amp;ldquo;a modern, high-performance, persistent RDF graph database&amp;rdquo; that &amp;ldquo;scale[s] to billions of triples while maintaining superior performance&amp;rdquo;. Franz offers a free version that lets you store up to 50 million triples, so I installed and played with release 3.2 of the Windows version. When I tried it, the documentation and examples were not well coordinated with the configuration of the latest release, but Franz&amp;rsquo;s email support was very responsive and helpful, even to a non-paying customer like me. I&amp;rsquo;ve also seen some evidence that they&amp;rsquo;re bringing this documentation up to date.&lt;/p&gt;
&lt;p&gt;For each &lt;a href=&#34;http://www.snee.com/bobdc.blog/metadata/rdf/triplestores/&#34;&gt;triplestore I&amp;rsquo;ve played with&lt;/a&gt;, I tried to avoid coding and compiling. I didn&amp;rsquo;t see any web interface or command line tool for loading RDF triples into AllegroGraph and then querying the data using SPARQL, so I started with its Python interface and then tried the HTTP interface. I first learned Python several years ago because of all of the RDF-related libraries out there, so I&amp;rsquo;m happy to write some scripts with it. It would be interesting to try AllegroGraph&amp;rsquo;s LISP interface, but my &lt;a href=&#34;http://www.snee.com/bob/worksch.html#i1&#34;&gt;last experience coding in LISP&lt;/a&gt; was some time ago, so there&amp;rsquo;d be some catch-up time.&lt;/p&gt;
&lt;h2 id=&#34;id203672&#34;&gt;The AllegroGraph server&lt;/h2&gt;
&lt;p&gt;AllegroGraph&amp;rsquo;s setup routine configured it to automatically run as a service under Windows. After some early frustration with the Python client, I discovered that this copy of the server was not being started up according to assumptions made by the sample code in AllegroGraph&amp;rsquo;s &lt;a href=&#34;http://www.franz.com/agraph/support/documentation/current/python-tutorial.html&#34;&gt;Python API for AllegroGraph&lt;/a&gt; tutorial. For one thing, a line in the tutorial&amp;rsquo;s first Python script tells the server to open up the &amp;ldquo;ag&amp;rdquo; catalog—according to the tutorial, a repository is another term for an RDF triplestore, and a catalog is a container for a set of repositories—but the server didn&amp;rsquo;t know about this catalog. I shut down the AllegroGraph service (in Windows, from Control Panel/Administrative Tools/Services, right-click &amp;ldquo;AllegroGraph Server&amp;rdquo; and pick &amp;ldquo;Stop&amp;rdquo;) and then started it up from the &lt;code&gt;Program Files\AllegroGraphFJE32&lt;/code&gt; directory with this command, which specifies a directory included with the AllegroGraph distribution as the catalog location:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;AllegroGraphServer --new-http-port 8080 --new-http-catalog doc/agraph-javadoc/com/franz/ag
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This also tells the server to use port 8080, which is where the Python tutorial&amp;rsquo;s sample scripts send their requests.&lt;/p&gt;
&lt;h2 id=&#34;id203727&#34;&gt;A little Python client&lt;/h2&gt;
&lt;p&gt;The AllegroGraphFJE32/doc/server-installation.html file included with the distribution recommends that Windows users use ActiveState&amp;rsquo;s version of Python, which may explain some of my other early problems with the Python interface. I also found mistakes in the Python tutorial&amp;rsquo;s sample code; instead of listing these problems, I&amp;rsquo;ve posted my script, which includes corrected versions of the first few examples, at &lt;a href=&#34;http://www.snee.com/rdf/agdemo.py.txt&#34;&gt;http://www.snee.com/rdf/agdemo.py.txt&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The script creates a repository in the ag catalog, loads the same RDF files that I loaded into other triplestores I&amp;rsquo;ve tried, and sends the server the &amp;ldquo;SELECT DISTINCT ?p WHERE {?s ?p ?o}&amp;rdquo; query I usually use to start any SPARQL session. I commented this Python script where I could, so I won&amp;rsquo;t describe it here. For now, AllegroGraph&amp;rsquo;s documentation of their Python interface is skimpy, but better documentation is on the way. You can learn more about AllegroGraph&amp;rsquo;s Python interface from &lt;a href=&#34;http://kill.devc.at/node/233&#34;&gt;this blog posting&lt;/a&gt; by someone in Austria named &amp;ldquo;Rho&amp;rdquo;. Keep in mind that Rho&amp;rsquo;s examples use release 3.1.1, and apparently improvements to the Python client were an important part of AllegroGraph&amp;rsquo;s upgrade to release 3.2.&lt;/p&gt;
&lt;h2 id=&#34;id203776&#34;&gt;Trying the HTTP interface&lt;/h2&gt;
&lt;p&gt;AllegroGraph&amp;rsquo;s currently available documentation of their HTTP interface provides no examples of complete URLs to send to the server, so it took me some time to work out the correct format, but once I did, it was pretty straightforward to use. (As with the Python interface documentation, I heard that better HTTP interface documentation is on the way.) One other caveat: when I tried this with a recent distribution version of release 3.2, some of these commands didn&amp;rsquo;t work until after I&amp;rsquo;d picked &amp;ldquo;Download AllegroGraph 3.2 Free Java Edition Updates&amp;rdquo; from the AllegroGraph program group on the Windows Start menu.&lt;/p&gt;
&lt;p&gt;AllegroGraph&amp;rsquo;s &lt;a href=&#34;http://agraph.franz.com/support/documentation/current/http-protocol.html&#34;&gt;HTTP interface documentation&lt;/a&gt; says that if you start the server with the -new-http-port option, as I did, then you should use the separate documentation for their &lt;a href=&#34;http://agraph.franz.com/support/documentation/current/new-http-server.html&#34;&gt;new HTTP server&lt;/a&gt;. I used &lt;a href=&#34;http://curl.haxx.se/&#34;&gt;cURL&lt;/a&gt; to send URIs to the server&amp;rsquo;s HTTP interface.&lt;/p&gt;
&lt;p&gt;To list existing repositories, the following query retrieved a &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-XMLres-20080115/&#34;&gt;SPARQL query results XML format&lt;/a&gt; listing with fields for the uri, id, title, readable, and writable status of each repository:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl http://localhost:8080/catalogs/ag/repositories
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is an important command, because many others require you to supply a repository id.&lt;/p&gt;
&lt;p&gt;This next command successfully created a new repository with an id of test1 (all curl commands were actually issued as one line; I added line breaks here for readability):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X PUT -H &amp;quot;content-type: application/x-www-form-urlencoded; accept: */*&amp;quot; 
  http://localhost:8080/catalogs/ag/repositories/test1
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first time I tried it I saw no response, but the second time I was told &amp;ldquo;there is already a store named &amp;rsquo;test1&amp;rsquo;&amp;rdquo;, which was good news.&lt;/p&gt;
&lt;p&gt;The following command added triples from the indicated disk file to the test1 repository:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST -T \bob\dev\xml\rdf\fakeaddrbookpt1.rdf -H &amp;quot;content-type: application/rdf+xml&amp;quot;
  http://localhost:8080/catalogs/ag/repositories/test1/statements
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;(April 9th correction: when I posted this entry yesterday, the preceding command and the remainder of this paragraph had the POST and PUT references backwards, so I just fixed them.)&lt;/em&gt; I found that without that &amp;ldquo;-X POST&amp;rdquo; in the command line, either the server or curl assumed that I was PUTting data. An HTTP PUT replaces any existing data in the repository, so if you want to add several files to the same repository, make sure to explicitly POST them there.&lt;/p&gt;
&lt;p&gt;The next command sent an &lt;a href=&#34;http://www.xs4all.nl/~jlpoutre/BoT/Javascript/Utils/endecode.html&#34;&gt;escaped&lt;/a&gt; SPARQL query to the server, which sent back a SPARQL query result format list of the predicates used in the data that I had loaded:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -H &amp;quot;Accept: application/sparql-results+xml&amp;quot; 
  http://localhost:8080/catalogs/ag/repositories/test1?query=SELECT%20DISTINCT%20%3Fp%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D
&lt;/code&gt;&lt;/pre&gt;
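&lt;p&gt;Rather than escaping queries by hand with a web form, you can produce the same percent-encoding in a couple of lines of Python (shown here with Python 3&amp;rsquo;s standard library; any URL-encoding utility gives the same result):&lt;/p&gt;

```python
from urllib.parse import quote

# Percent-encode the usual starter query for use in the ?query= parameter.
sparql = "SELECT DISTINCT ?p WHERE {?s ?p ?o}"
encoded = quote(sparql, safe="")
print(encoded)  # SELECT%20DISTINCT%20%3Fp%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D
```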
&lt;h2 id=&#34;id203919&#34;&gt;Querying sets of named graphs&lt;/h2&gt;
&lt;p&gt;Using the HTTP interface, I also managed to reproduce my experiment with named graphs described at &lt;a href=&#34;https://www.bobdc.com/blog/querying-a-set-of-named-rdf-gr&#34;&gt;Querying a set of named RDF graphs without naming the graphs&lt;/a&gt;. (See that posting for background on what I was trying to accomplish, the sample data files, and the queries I used. And, if you&amp;rsquo;re interested in named graphs, don&amp;rsquo;t miss the discussion between Paula Gearon, Lee Feigenbaum, and Andy Seaborne in the &lt;a href=&#34;http://www.snee.com/bobdc.blog/2009/03/querying-a-set-of-named-rdf-gr.html#comments&#34;&gt;comments&lt;/a&gt; section of that post.) Following the steps described there, I first loaded the mybluegraph.rdf file into the graph named &lt;a href=&#34;http://www.snee.com/ng/mybluegraph.rdf&#34;&gt;http://www.snee.com/ng/mybluegraph.rdf&lt;/a&gt; (or, in AllegroGraph terms, into the context named &lt;a href=&#34;http://www.snee.com/ng/mybluegraph.rdf&#34;&gt;http://www.snee.com/ng/mybluegraph.rdf&lt;/a&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST -T \bob\dev\xml\rdf\sparql\namedgraphs\mybluegraph.rdf 
  -H &amp;quot;Content-Type: application/rdf+xml&amp;quot; 
  http://localhost:8080/catalogs/ag/repositories/test1/statements?context=%3Chttp%3A%2F%2Fwww.snee.com%2Fng%2Fmybluegraph.rdf%3E
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then I loaded myredgraph.rdf into the &lt;a href=&#34;http://www.snee.com/ng/myredgraph.rdf&#34;&gt;http://www.snee.com/ng/myredgraph.rdf&lt;/a&gt; graph with a similar command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST -T \bob\dev\xml\rdf\sparql\namedgraphs\myredgraph.rdf 
  -H &amp;quot;Content-Type: application/rdf+xml&amp;quot; 
  http://localhost:8080/catalogs/ag/repositories/test1/statements?context=%3Chttp%3A%2F%2Fwww.snee.com%2Fng%2Fmyredgraph.rdf%3E
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I loaded mygreengraph.rdf without specifying a graph in which to load it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -X POST -T \bob\dev\xml\rdf\sparql\namedgraphs\mygreengraph.rdf 
  -H &amp;quot;Content-Type: application/rdf+xml&amp;quot; 
  http://localhost:8080/catalogs/ag/repositories/test1/statements
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A query for all dc:title values retrieved them from all three files,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -H &amp;quot;Accept: application/sparql-results+xml&amp;quot; http://localhost:8080/catalogs/ag/repositories/test1?query=PREFIX%20dc%3A%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%20select%20%3Ftitle%20WHERE%20%7B%3Fs%20dc%3Atitle%20%3Ftitle%7D%0A
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;but a query for dc:title values from graphs that were subgraphs of &lt;a href=&#34;http://www.snee.com/ng/mygraph.rdf&#34;&gt;http://www.snee.com/ng/mygraph.rdf&lt;/a&gt; only retrieved the redgraph and bluegraph ones, just as I&amp;rsquo;d hoped:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -H &amp;quot;Accept: application/sparql-results+xml&amp;quot; 
  http://localhost:8080/catalogs/ag/repositories/test1?query=PREFIX%20dc%3A%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%20PREFIX%20rdfg%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F03%2Ftrix%2Frdfg-1%2F%3E%0Aselect%20%3Ftitle%20WHERE%20%7B%20%3Fg%20rdfg%3AsubGraphOf%20%3Chttp%3A%2F%2Fwww.snee.com%2Fng%2Fmygraph.rdf%3E%20GRAPH%20%3Fg%20%7B%3Fs%20dc%3Atitle%20%3Ftitle%7D%20%7D%0A
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As one of the commercial triplestores, AllegroGraph looks very scalable, and as I mentioned, their support is very good. Franz has been holding some &lt;a href=&#34;http://www.franz.com/agraph/&#34;&gt;webinars&lt;/a&gt; about large-scale applications of their server lately, and an upcoming one on &lt;a href=&#34;http://www.franz.com/agraph/services/conferences_seminars/semantic_technologies_v17.lhtml&#34;&gt;Solving Scale and Reasoning in Large RDF Datasets&lt;/a&gt; looks interesting; Franz distributes the &lt;a href=&#34;http://www.franz.com/agraph/racer/&#34;&gt;Racer&lt;/a&gt; Description Logics reasoner in much of the world, so I assume that it will play a role in this reasoning application.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-2247&#34;&gt;April 9, 2009 4:42 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks for the writeup! I never got any further than the impression I needed to write java code to talk to the db.&lt;/p&gt;
&lt;p&gt;Have you figured out any of the social network analysis stuff? &lt;a href=&#34;http://danbri.org/words/2008/06/02/327&#34;&gt;http://danbri.org/words/2008/06/02/327&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.franz.com/agraph/support/documentation/3.0/reference-guide.html#header3-65&#34;&gt;http://www.franz.com/agraph/support/documentation/3.0/reference-guide.html#header3-65&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;eg. can i fill it full of mail headers and foaf and do clustering to find out which groups and lists are interconnected?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.sxnee.com/bobdc.blog&#34; title=&#34;http://www.sxnee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2248&#34;&gt;April 9, 2009 8:26 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Dan! I was going for breadth more than depth with these, trying to follow through on the same set of baseline tasks with each triplestore. Particularly with the commercial ones like AllegroGraph and OpenLink, there are weeks&amp;rsquo; worth of features to play with.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://kill.devc.at/&#34; title=&#34;http://kill.devc.at/&#34;&gt;Robert (rho) Barta&lt;/a&gt; on &lt;a href=&#34;#comment-2249&#34;&gt;April 10, 2009 2:51 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob.&lt;/p&gt;
&lt;p&gt;If you want to keep your coding at a minimum, then maybe watch Perl RDF::AllegroGraph::Easy evolve on CPAN. I&amp;rsquo;ll progress it as spare time allows.&lt;/p&gt;
&lt;p&gt;Worth noting is also that AllegroGraph seems to be more than &amp;ldquo;just an RDF store&amp;rdquo; as it can host tuples (and not just triples). But I admit that I have not yet fathomed out this thing yet.&lt;/p&gt;
&lt;p&gt;I can recommend these webinars, especially Jans Aasman talking. But they only last about an hour and cannot get very deep. Experimenting with the code remains a must. Good for me, as a consultant ;-)&lt;/p&gt;
&lt;p&gt;BTW, it&amp;rsquo;s rho, not Rho. And, yes, I&amp;rsquo;m sailing under no flags to avoid angry ladies emailing me about my schroedinger&amp;rsquo;sch cat experiments&amp;hellip;&lt;/p&gt;
&lt;p&gt;By Bill on &lt;a href=&#34;#comment-2288&#34;&gt;June 6, 2009 4:44 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve also started a Google Group for AllegroGraph users, called, cleverly enough, &amp;ldquo;AllegroGraph-users&amp;rdquo;. You can sign up at&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://groups.google.com/group/allegrograph-users&#34;&gt;http://groups.google.com/group/allegrograph-users&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>Setting up your microcomputer facility</title>
      <link>https://www.bobdc.com/blog/setting-up-your-microcomputer/</link>
      <pubDate>Thu, 26 Mar 2009 08:56:23 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/setting-up-your-microcomputer/</guid>
      
      
      <description><div>A 1985 filmstrip. Not a slideshow, but a filmstrip. And dig that funky music.</div><div>&lt;p&gt;(Apparently the video isn&amp;rsquo;t on vimeo anymore)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/technology-past">technology, past</category>
      
    </item>
    
    <item>
      <title>My own little Twitter client</title>
      <link>https://www.bobdc.com/blog/my-own-little-twitter-client/</link>
      <pubDate>Wed, 25 Mar 2009 18:49:40 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/my-own-little-twitter-client/</guid>
      
      
      <description><div>No AJAX, Flash, or AIR; just HTML, but arranged the way I want it.</div><div>&lt;p&gt;I&amp;rsquo;ve tried various Twitter clients, but usually just went back to the &lt;a href=&#34;http://twitter.com/home&#34;&gt;twitter.com&lt;/a&gt; web-based interface that people hate so much. My main complaint with it—and I saw no other clients that did any better—was that it showed tweets in reverse chronological order. Conversations and the multi-tweet mini-essays that some people write are difficult to read that way, so I decided to write my own little client.&lt;/p&gt;
&lt;blockquote id=&#34;id203615&#34; class=&#34;pullquote&#34;&gt;It&#39;s easy to think of new features to add, but in its current state it scratches the itch that I had, so I&#39;ll leave it alone.&lt;/blockquote&gt;
&lt;p&gt;It&amp;rsquo;s a simple python script that uses the &lt;a href=&#34;http://code.google.com/p/python-twitter/&#34;&gt;python-twitter&lt;/a&gt; interface. (The zipped distribution version there is a bit out of date; my script has a comment at the top about what to do.) My twitter client python script checks for a disk file that identifies the last tweet that I read, pulls all tweets since then, and then creates a web page showing those in chronological order. (Sample page &lt;a href=&#34;http://www.snee.com/xml/twclient/tweets.html&#34;&gt;here&lt;/a&gt;.) The little ← arrow after each entry lets you reply to that tweet if you&amp;rsquo;re logged in on the web client, and an additional, slightly different ↵ arrow lets you link back to a message being replied to. Mouseover text makes the meaning of the cryptic little arrows clearer.&lt;/p&gt;
&lt;p&gt;I was going to also have the script check for new direct messages, but twitter sends me email the rare times that I actually get one of those, so I won&amp;rsquo;t miss any. I also considered adding entries showing the results of a twitter vanity search, but python-twitter doesn&amp;rsquo;t support the search interface and I have an RSS feed to alert me to that anyway. It&amp;rsquo;s easy to think of new features to add—when I tweeted that I was working on this, more suggestions started coming—but in its current state it scratches the itch that I had, so I&amp;rsquo;ll leave it alone.&lt;/p&gt;
&lt;p&gt;Coding around twitter&amp;rsquo;s API is easy if you don&amp;rsquo;t want to implement a fancy UI. Some of the comments I received about features to add suggested easier following of threaded conversations, and the API gives you what you need to do that, once you decide on a UI. That&amp;rsquo;s how I added the link for the second arrow mentioned above.&lt;/p&gt;
&lt;p&gt;For now, when I want to read recent tweets from my friends, I run a batch file that runs the python script and displays the resulting HTML file. I&amp;rsquo;ll probably make something to trigger it with a CGI so that I can check for updates by just clicking a button.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve put the python script at &lt;a href=&#34;http://www.snee.com/xml/twclient/getNewTweets.py.txt&#34;&gt;http://www.snee.com/xml/twclient/getNewTweets.py.txt&lt;/a&gt; and the &lt;a href=&#34;http://www.snee.com/xml/twclient/tweets.css&#34;&gt;tweets.css&lt;/a&gt; stylesheet that the output references in the same directory. If you&amp;rsquo;ve never played with Twitter&amp;rsquo;s API, see &lt;a href=&#34;http://www.devx.com/webdev/Article/40359&#34;&gt;part 1&lt;/a&gt; and &lt;a href=&#34;http://www.devx.com/webdev/Article/40511/0&#34;&gt;part 2&lt;/a&gt; of my DevX article on it. python-twitter makes it pretty easy, but twitter&amp;rsquo;s &lt;a href=&#34;http://apiwiki.twitter.com/REST+API+Documentation&#34;&gt;RESTful native interface&lt;/a&gt; makes it easy to write a client in any language—even XSLT, if you use cURL to retrieve the data from the server, because while XSLT engines can do HTTP GETs, I know of none that can do the authenticated GETs required for most calls to the twitter API.&lt;/p&gt;
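&lt;p&gt;For the curious: the authentication that the twitter API required at the time was plain HTTP basic authentication, so the Authorization header is just a base64-encoded user:password pair. Here is a minimal Python sketch of the header construction only; the header format is standard, but treat the details of any particular twitter endpoint as historical.&lt;/p&gt;

```python
import base64

def basic_auth_header(username, password):
    """Build the HTTP Basic Authorization header value that the old
    twitter REST API expected on authenticated requests."""
    credentials = username + ":" + password
    token = base64.b64encode(credentials.encode("ascii")).decode("ascii")
    return "Basic " + token
```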
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.thodla.com&#34; title=&#34;http://www.thodla.com&#34;&gt;Dorai Thodla&lt;/a&gt; on &lt;a href=&#34;#comment-2240&#34;&gt;March 25, 2009 10:16 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nice. I always wanted one which I can customize. For example, when some one follows you, I would like to click on the person&amp;rsquo;s account and get a tag cloud to see whether I have any interest in following them.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t care much for AIR even though I use Twhirl a lot since it is easy to retweet.&lt;/p&gt;
&lt;p&gt;Thanks for sharing.&lt;/p&gt;
&lt;p&gt;regards,&lt;br /&gt;
Dorai&lt;/p&gt;
&lt;p&gt;By Marty on &lt;a href=&#34;#comment-2254&#34;&gt;April 16, 2009 5:46 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;What do I do with the python script to run it on windows?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2255&#34;&gt;April 16, 2009 7:36 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;First download it, rename it as getNewTweets.py, and change the lines that set the username and password to use your own.&lt;/p&gt;
&lt;p&gt;I run it with a batch file that looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python getNewTweets.py &amp;gt; temp.html
temp.html
&lt;/code&gt;&lt;/pre&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Getting started with Open Anzo</title>
      <link>https://www.bobdc.com/blog/getting-started-with-open-anzo/</link>
      <pubDate>Thu, 19 Mar 2009 19:55:09 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/getting-started-with-open-anzo/</guid>
      
      
      <description><div>Don&#39;t miss the exciting command line video demo!</div><div>&lt;p&gt;&lt;a href=&#34;http://www.openanzo.org/&#34;&gt;Open Anzo&lt;/a&gt; is the third &lt;a href=&#34;http://www.snee.com/bobdc.blog/metadata/rdf/triplestores/&#34;&gt;disk-based triplestore&lt;/a&gt; that I managed to set up, load with a few files of RDF data, and query with SPARQL. Its home page describes it as &amp;ldquo;an open source enterprise-featured RDF store and service oriented middleware platform that provides support for multiple users, distributed clients, offline work, real-time notification, named-graph modularization, versioning, access controls, and transactions with preconditions&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Before I describe my experience setting it up, loading sample data, and querying that data, take a look at Lee Feigenbaum&amp;rsquo;s short &lt;a href=&#34;http://www.youtube.com/watch?v=pBeDYCA8oDk&#34;&gt;video&lt;/a&gt; demonstrating the use of Open Anzo&amp;rsquo;s command line interface:&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/pBeDYCA8oDk&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;He&amp;rsquo;s using Linux in the video, but I managed to perform similar queries using Open Anzo under Windows XP. I got the impression from &lt;a href=&#34;http://www.openanzo.org/projects/openanzo/wiki&#34;&gt;one documentation web page&lt;/a&gt; that the product requires DB2, Oracle, PostgreSQL, HSQLDB, or Apache DB on the back end in order to run it, but you don&amp;rsquo;t need any external database manager to try it out. It is nice to know that these database managers are options as your storage needs scale up; a readme file mentions that it also supports MySQL, and documentation for configuring Open Anzo to hook up to each of these database managers is easy to find on the openanzo.org web site.&lt;/p&gt;
&lt;p&gt;After I &lt;a href=&#34;http://www.openanzo.org/downloads.html&#34;&gt;downloaded&lt;/a&gt; release 3.1 of the Open Anzo full distribution and unzipped it, I set the ANZO_HOME environment variable to the name of the directory where I had unzipped it and then ran the startAnzo.bat script that started the server. (Once the server is started, sending a browser to http://localhost:8080/status shows whether you&amp;rsquo;ve got it up and running properly.) The server gives you an &amp;ldquo;osgi&amp;gt;&amp;rdquo; prompt in the command window where you started it up. Entering &amp;ldquo;help&amp;rdquo; at this server prompt shows you various things you can do there, but I didn&amp;rsquo;t play with that much.&lt;/p&gt;
&lt;p&gt;Once the server is running, you can interact with it using a command line client, as Lee demonstrated in his video. From a Windows operating system prompt, you do this by supplying parameters to the anzo.bat script. In addition to the ANZO_HOME variable, the window where you issue these commands also needs the ANZO_CLI_HOME variable set; I pointed it to the same directory.&lt;/p&gt;
&lt;p&gt;Entering &amp;ldquo;anzo help&amp;rdquo; lists the various anzo commands, and entering a command name after &amp;ldquo;help&amp;rdquo; like this tells you about that command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;anzo help query
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Before you issue your first successful command, you also need to make sure that the server recognizes you as a legitimate user. I used peter as a username and 123 as a password, because I found these in the configuration\anzo.ldif file. Open Anzo offers options to point the client to a username and password pair stored in a configuration file, which is why Lee didn&amp;rsquo;t need to include them on the command line in his video, but I just added them to each anzo command with the -w and -u switches. The following two commands each loaded a file of RDF data into the named graph identified by the -g option:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;anzo import -w 123 -u peter -g http://whatever.com/g1 \bob\dev\xml\rdf\fakeAddrBookPt1.rdf
anzo import -w 123 -u peter -g http://whatever.com/g2 \bob\dev\xml\rdf\fakeAddrBookPt2.rdf
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following query then asked for a list of all the predicates used by triples in the &lt;a href=&#34;http://whatever.com/g1&#34;&gt;http://whatever.com/g1&lt;/a&gt; graph:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;anzo query -u peter -w 123  &amp;quot;SELECT DISTINCT ?p FROM &amp;lt;http://whatever.com/g1&amp;gt; WHERE {?s ?p ?o}&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The next query doesn&amp;rsquo;t mention a specific graph, but it does include the -a switch, which tells Open Anzo to query against a merge of all the named graphs in the repository:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;anzo query -u sysadmin -w 123 -a &amp;quot;SELECT DISTINCT ?p WHERE {?s ?p ?o}&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Both queries worked just fine. As I mentioned in an &lt;a href=&#34;https://www.bobdc.com/blog/querying-a-set-of-named-rdf-gr#id203812&#34;&gt;update to last week&amp;rsquo;s post&lt;/a&gt;, I also managed to query a set of graphs at once in Open Anzo based on metadata associated with the graph.&lt;/p&gt;
&lt;p&gt;As I understand it, Open Anzo once included a RESTful SPARQL endpoint to provide an HTTP interface, and although some more recent builds didn&amp;rsquo;t include this, it&amp;rsquo;s being put back in. I couldn&amp;rsquo;t get it to work in a few tests with &lt;a href=&#34;http://curl.haxx.se/&#34;&gt;curl&lt;/a&gt;, but I&amp;rsquo;m going to keep trying with future builds.&lt;/p&gt;
&lt;p&gt;As with Virtuoso, Open Anzo has an impressive list of features beyond the simple ability to load and query triples that I&amp;rsquo;ve demonstrated here. I love the command line interface, and Lee&amp;rsquo;s video quickly demonstrates a lot of cool things you can do with it. I look forward to playing more with Open Anzo.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By Bernhard Schandl on &lt;a href=&#34;#comment-2306&#34;&gt;August 13, 2009 4:07 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;are you sure Open Anzo works with a file-based triple store? As far as the documentation reads, without an underlying RDBMS an in-memory store is used, which means your triples will be gone once the server is shut down.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>Some use cases to implement using SPARQL graphs</title>
      <link>https://www.bobdc.com/blog/some-use-cases-to-implement-us/</link>
      <pubDate>Sun, 15 Mar 2009 17:34:34 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/some-use-cases-to-implement-us/</guid>
      
      
      <description><div>Or not; I&#39;m open to suggestions.</div><div>&lt;p&gt;As I wrote in my &lt;a href=&#34;https://www.bobdc.com/blog/querying-a-set-of-named-rdf-gr&#34;&gt;last entry&lt;/a&gt;, I&amp;rsquo;ve recently figured out how to assign metadata to RDF graphs and to perform SPARQL queries on sets of those graphs. I&amp;rsquo;m working a bit backwards here, because I&amp;rsquo;m now moving on to the use cases that got me thinking about this in the first place. It&amp;rsquo;s easier to think about them now that I know that I can implement them using standard syntax and multiple open source implementations of that standard. I wanted to outline my ideas about how to implement these use cases to see if they sound particularly good or bad to others. They&amp;rsquo;re general enough that they&amp;rsquo;ll apply to other situations.&lt;/p&gt;
&lt;h2 id=&#34;id203629&#34;&gt;Simple aggregation of distributed data&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s say I have a collection of RDF data that mirrors several sets of data on the Internet. I want to query the aggregate set without retrieving every set from its original source with every query. It&amp;rsquo;s not very time-sensitive data, so updating the central collection once every 24 hours is fine. &amp;ldquo;Updating&amp;rdquo; is the key operation here; if someone deletes a triple from one of the satellite collections, I want to be confident that it won&amp;rsquo;t be in my aggregate collection the next day, so here&amp;rsquo;s what I would do.&lt;/p&gt;
&lt;p&gt;I name each graph in my internal collection after the source of its triples. To update the data from source &lt;a href=&#34;http://www.greatdata.org/latest.rdf&#34;&gt;http://www.greatdata.org/latest.rdf&lt;/a&gt;, a cron job does the following at 3:14 AM each morning:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Delete the triples in the &lt;a href=&#34;http://www.greatdata.org/latest.rdf&#34;&gt;http://www.greatdata.org/latest.rdf&lt;/a&gt; graph in my collection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Load the latest data from &lt;a href=&#34;http://www.greatdata.org/latest.rdf&#34;&gt;http://www.greatdata.org/latest.rdf&lt;/a&gt; into the graph with that name in my collection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add some triples like the following to a graph dedicated to tracking such downloads:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;http://www.greatdata.org/latest.rdf&amp;gt;
&amp;lt;http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#fileLastAccessed&amp;gt;
&amp;quot;2009-03-15T03:14:52-0500&amp;quot;.
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Wiping out a set of data and completely replacing it will only scale up to a certain point, and a &lt;a href=&#34;http://jena.hpl.hp.com/~afs/SPARQL-Update.html&#34;&gt;SPARQL UPDATE&lt;/a&gt; ability will be a better way to implement certain variations on this, but if the total aggregate size is just a few dozen megabytes, the general approach above makes sense to me. Does it look horribly wrong to anyone else?&lt;/p&gt;
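&lt;p&gt;Sketched with SPARQL Update syntax (the draft linked above may differ in its details from what shipping implementations accept), the nightly replacement of one source&amp;rsquo;s graph boils down to two statements:&lt;/p&gt;

```sparql
# Drop yesterday's copy of this source's data...
CLEAR GRAPH <http://www.greatdata.org/latest.rdf> ;
# ...then pull the current file into the graph named after it.
LOAD <http://www.greatdata.org/latest.rdf>
  INTO GRAPH <http://www.greatdata.org/latest.rdf>
```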
&lt;h2 id=&#34;id203706&#34;&gt;Identifying a triple&amp;rsquo;s provenance&lt;/h2&gt;
&lt;p&gt;This time, instead of replacing each graph with a more updated version, I want to aggregate all the downloaded data as it accumulates. I assign each downloaded batch its own graph URL and assign metadata to this new graph such as the source, date, and time of the retrieval. I could also assign it rdfg:subGraphOf values, depending on which sets of graphs I was defining for querying, updating, and access control purposes.&lt;/p&gt;
&lt;p&gt;To move on to a usage scenario, let&amp;rsquo;s say that Kendall Clark queries a service on snee.com and finds this triple:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;http://clarkparsia.com/weblog/2008/10/31/we-won/&amp;gt; dc:creator &amp;quot;Bijan Parsia&amp;quot;.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;He contacts me and says &amp;ldquo;Bijan didn&amp;rsquo;t write that! I did! Where are you getting this data?&amp;rdquo; I check and see that this triple is part of the named graph &lt;a href=&#34;http://www.snee.com/ns/graphids&#34;&gt;http://www.snee.com/ns/graphids&lt;/a&gt;#i23F2A9, so I query the metadata associated with this named graph and find this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;        &amp;lt;http://www.snee.com/ns/graphids#i23F2A9&amp;gt; 
        dc:date &amp;quot;2008-11-01T17:37:00&amp;quot;;
        dc:source &amp;lt;http://planetrdf.com/index.rdf&amp;gt;.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I tell Kendall that I got that triple from the Planet RDF RSS feed at 5:37 PM GMT.&lt;/p&gt;
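&lt;p&gt;The metadata lookup itself is a one-pattern SPARQL query along these lines (a sketch, assuming the tracking triples are visible in the queried dataset):&lt;/p&gt;

```sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# When and from where did we load the graph holding the disputed triple?
SELECT ?date ?source
WHERE {
  <http://www.snee.com/ns/graphids#i23F2A9> dc:date ?date ;
                                            dc:source ?source .
}
```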
&lt;p&gt;Again, does the general outline of what I describe here make sense, or would there be a better way to approach it?&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By Simon Reinhardt on &lt;a href=&#34;#comment-2237&#34;&gt;March 15, 2009 9:19 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;Jeni wrote an &lt;a href=&#34;http://www.jenitennison.com/blog/node/101&#34;&gt;interesting piece&lt;/a&gt; on this the other day. Lots of relevant comments, too.&lt;br /&gt;
Not speaking of my comment, of course! ;-) However the reason I&amp;rsquo;m referring to this is that I put &lt;a href=&#34;http://www.jenitennison.com/blog/node/101#comment-4872&#34;&gt;my ideas on using the HTTP vocabulary in there&lt;/a&gt; which I think is relevant to your second use case. It&amp;rsquo;s restricted to cases where you dereference HTTP URIs but in those cases it gives you very detailed control.&lt;br /&gt;
Other relevant vocabularies: &lt;a href=&#34;http://tw.rpi.edu/2008/sw/archive.owl#&#34;&gt;http://tw.rpi.edu/2008/sw/archive.owl#&lt;/a&gt; &lt;a href=&#34;http://web.resource.org/rss/1.0/modules/syndication/&#34;&gt;http://web.resource.org/rss/1.0/modules/syndication/&lt;/a&gt; &lt;a href=&#34;http://wiki.foaf-project.org/ScutterVocab&#34;&gt;http://wiki.foaf-project.org/ScutterVocab&lt;/a&gt;&lt;br /&gt;
Hmm, lots of links. Usually that&amp;rsquo;s the point where my comment ends up in the spam box. ;-)&lt;/p&gt;
&lt;p&gt;Simon&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2238&#34;&gt;March 15, 2009 10:54 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Simon! I&amp;rsquo;ve certainly been following Jeni&amp;rsquo;s work there. The HTTP vocabulary looks very useful, although a namespace prefix of http is bound to be confusing&amp;ndash;an end-tag with /http:httpVersion between the &amp;lt;&amp;gt; could be pretty confusing to people who aren&amp;rsquo;t hardcore markup geeks.&lt;/p&gt;
&lt;p&gt;The other vocabularies also look useful, but I have to wonder if some spokes of the Library of Congress MARC-based metadata wheels (e.g. METS, EAD) got reinvented in there. If so, they&amp;rsquo;ll make great demos for OWL equivalency predicates&amp;hellip;&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://thewebsemantic.com&#34; title=&#34;http://thewebsemantic.com&#34;&gt;Taylor&lt;/a&gt; on &lt;a href=&#34;#comment-2239&#34;&gt;March 25, 2009 3:20 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Your description here on the provenance/named graphs is good&amp;hellip;but I don&amp;rsquo;t think I like named graphs as the solution to provenance, because any given triple could flow through many graphs&amp;hellip;ie, tracking down the genesis of the triple is more or less archeology&amp;hellip;or even worse, trying to track the ownership of a penny.&lt;/p&gt;
&lt;p&gt;What if each triple were really a quadruple, the 4th item being a URI to the authority that gave birth to the statement. That URI might also point to a thing that is an instanceof &amp;ldquo;provenance node&amp;rdquo; with more stuff like the time, and person who asserted the fact, or if it&amp;rsquo;s inferred.&lt;/p&gt;
&lt;p&gt;By Crystal on &lt;a href=&#34;#comment-2246&#34;&gt;April 5, 2009 7:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;great post.. thanks&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Querying a set of named RDF graphs without naming the graphs</title>
      <link>https://www.bobdc.com/blog/querying-a-set-of-named-rdf-gr/</link>
      <pubDate>Tue, 10 Mar 2009 20:01:38 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/querying-a-set-of-named-rdf-gr/</guid>
      
      
      <description><div>A big step toward using named graphs to track provenance.</div><div>&lt;p&gt;I&amp;rsquo;d like to thank everyone who added comments to my last post, &lt;a href=&#34;https://www.bobdc.com/blog/some-questions-about-rdf-named&#34;&gt;Some questions about RDF named graphs&lt;/a&gt;. Lee Feigenbaum wrote an entire &lt;a href=&#34;http://www.thefigtrees.net/lee/blog/2009/03/named_graphs_in_open_anzo.html&#34;&gt;blog post&lt;/a&gt; addressing the issues I raised, and it looks like his Open Anzo triplestore (which I&amp;rsquo;ll write up in its own post soon) has some nice support for versioning, access control, and replication.&lt;/p&gt;
&lt;blockquote id=&#34;id203618&#34; class=&#34;pullquote&#34;&gt;It all worked fine in Sesame and Virtuoso.&lt;/blockquote&gt;
&lt;p&gt;Jeni Tennison&amp;rsquo;s &lt;a href=&#34;https://www.bobdc.com/blog/some-questions-about-rdf-named#comment-2231&#34;&gt;comment&lt;/a&gt; was a bit embarrassing, because it showed that the answer to my key question was right &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/#namedAndDefaultGraph&#34;&gt;in the SPARQL specification&lt;/a&gt;. I had read the entire spec, but didn&amp;rsquo;t understand the point of named graphs at the time, so that part didn&amp;rsquo;t sink in the way it should have.&lt;/p&gt;
&lt;p&gt;To review my third question, which built on the first two:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If we&amp;rsquo;re going to use named graphs to track provenance, then it would make sense to assign each batch of data added to my triplestore to its own graph. Let&amp;rsquo;s say that after a while I have thousands of graphs, and I want to write a SPARQL query whose scope is 432 of those graphs. Do I need 432 &amp;ldquo;FROM NAMED&amp;rdquo; clauses in my query? (Let&amp;rsquo;s assume that I plan to query those same 432 multiple times.)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I want to put each batch in its own graph so that I can store metadata for each batch. I also want to write a query that retrieves triples from a set of graphs, and when new graphs are added to the set, I don&amp;rsquo;t want to have to rewrite the query. Based on the example that Jeni pointed me to, I now know how to do this, and I assembled a working demo. It&amp;rsquo;s a pretty low-level demo; in my next posting I&amp;rsquo;ll describe one or two more real-world scenarios of applying these ideas, because I&amp;rsquo;d like some opinions on whether the architecture I have in mind makes sense.&lt;/p&gt;
&lt;p&gt;The SPARQL spec explains why I don&amp;rsquo;t need multiple FROM NAMED clauses to issue a single query against multiple graphs: &amp;ldquo;the GRAPH keyword is used to match patterns against named graphs. GRAPH can provide an IRI to select one graph or use a variable which will range over the IRI of all the named graphs in the query&amp;rsquo;s RDF dataset&amp;rdquo;. So, if I use a variable that ranges over 432 named graphs, I just need a pattern to identify those 432 graphs—ideally, a pattern that still works the following week if I need it to range over 433 graphs, then 434 graphs, and so forth.&lt;/p&gt;
&lt;p&gt;For my demo, I created three named graphs and a query that retrieves data from two by using a GRAPH pattern instead of explicitly naming them. Each graph assigns a Dublin Core title value to a book whose identifier is based on its ISBN, and the first two graphs identify themselves as subgraphs of &lt;a href=&#34;http://www.snee.com/ng/mygraph.rdf&#34;&gt;http://www.snee.com/ng/mygraph.rdf&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My first graph:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;rdf:RDF xmlns:dc=&amp;quot;http://purl.org/dc/elements/1.1/&amp;quot;
    xmlns:rdfg=&amp;quot;http://www.w3.org/2004/03/trix/rdfg-1/&amp;quot;
    xmlns:rdf=&amp;quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;quot;&amp;gt;


  &amp;lt;rdf:Description rdf:about=&amp;quot;http://www.snee.com/ng/mybluegraph.rdf&amp;quot;&amp;gt;
    &amp;lt;rdfg:subGraphOf rdf:resource=&amp;quot;http://www.snee.com/ng/mygraph.rdf&amp;quot;/&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;


  &amp;lt;rdf:Description rdf:about=&amp;quot;urn:isbn:1-93-022011-1&amp;quot;&amp;gt;
    &amp;lt;dc:title&amp;gt;XSLT Quickly&amp;lt;/dc:title&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;


&amp;lt;/rdf:RDF&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Thanks also to Jeni for pointing me to the &lt;a href=&#34;http://www.w3.org/2004/03/trix/rdfg-1/&#34;&gt;http://www.w3.org/2004/03/trix/rdfg-1/&lt;/a&gt; vocabulary for describing graph relationships.) Second graph:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;rdf:RDF xmlns:dc=&amp;quot;http://purl.org/dc/elements/1.1/&amp;quot;
    xmlns:rdfg=&amp;quot;http://www.w3.org/2004/03/trix/rdfg-1/&amp;quot;
    xmlns:rdf=&amp;quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;quot;&amp;gt;


  &amp;lt;rdf:Description rdf:about=&amp;quot;http://www.snee.com/ng/myredgraph.rdf&amp;quot;&amp;gt;
    &amp;lt;rdfg:subGraphOf rdf:resource=&amp;quot;http://www.snee.com/ng/mygraph.rdf&amp;quot;/&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;


  &amp;lt;rdf:Description rdf:about=&amp;quot;urn:isbn:0-13-082676-6&amp;quot;&amp;gt;
    &amp;lt;dc:title&amp;gt;XML: The Annotated Specification&amp;lt;/dc:title&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;


&amp;lt;/rdf:RDF&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The third graph, whose data shouldn&amp;rsquo;t show up in the query results, because it&amp;rsquo;s not a subgraph of &lt;a href=&#34;http://www.snee.com/ng/mygraph.rdf&#34;&gt;http://www.snee.com/ng/mygraph.rdf&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;rdf:RDF xmlns:dc=&amp;quot;http://purl.org/dc/elements/1.1/&amp;quot;
    xmlns:rdfg=&amp;quot;http://www.w3.org/2004/03/trix/rdfg-1/&amp;quot;
    xmlns:rdf=&amp;quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;quot;&amp;gt;


  &amp;lt;rdf:Description rdf:about=&amp;quot;urn:isbn:0-13-475740-8&amp;quot;&amp;gt;
    &amp;lt;dc:title&amp;gt;SGML CD&amp;lt;/dc:title&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;


&amp;lt;/rdf:RDF&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following test query retrieved the titles from all three graphs, because it has no qualifications about which graphs to retrieve from:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dc:&amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;


select ?title WHERE {?s dc:title ?title}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This next query, however, only wants dc:title values from graphs that are subgraphs of &lt;a href=&#34;http://www.snee.com/ng/mygraph.rdf&#34;&gt;http://www.snee.com/ng/mygraph.rdf&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dc:&amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;
PREFIX rdfg:&amp;lt;http://www.w3.org/2004/03/trix/rdfg-1/&amp;gt;


select ?title 
WHERE { ?g rdfg:subGraphOf &amp;lt;http://www.snee.com/ng/mygraph.rdf&amp;gt;
        GRAPH ?g {?s dc:title ?title}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It all worked fine in Sesame and Virtuoso. (One note: when you load a graph into Sesame using its workbench interface, you can specify a URL for the graph&amp;rsquo;s name, so I chose names of the form &lt;a href=&#34;http://www.snee.com/ng/myredgraph.rdf&#34;&gt;http://www.snee.com/ng/myredgraph.rdf&lt;/a&gt; shown in the sample data files above. When loading the graphs into Virtuoso, its default behavior is to assign graph name URLs based on the URL of the WebDav folder used to load it—see my earlier posting on &lt;a href=&#34;https://www.bobdc.com/blog/getting-started-using-virtuoso&#34;&gt;Getting Started with Virtuoso&lt;/a&gt; for more on this—so for the data I loaded into Virtuoso, I used URLs that followed the form &lt;a href=&#34;http://local.virt/DAV/home/joeuser/rdf_sink/myredgraph.rdf&#34;&gt;http://local.virt/DAV/home/joeuser/rdf_sink/myredgraph.rdf&lt;/a&gt; for the rdfg:subGraphOf triples.) &lt;em&gt;Update: it works in OpenAnzo as well. When I first tried that, I didn&amp;rsquo;t know about the -A command line option when querying; see also Lee&amp;rsquo;s comment below. More on OpenAnzo in an upcoming post.&lt;/em&gt;&lt;/p&gt;
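&lt;p&gt;One handy variation on the second query above: returning the graph variable along with the title shows which subgraph each result came from, with no other changes to the setup:&lt;/p&gt;

```sparql
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX rdfg: <http://www.w3.org/2004/03/trix/rdfg-1/>

# Same pattern as before, but report the source graph for each title.
SELECT ?g ?title
WHERE { ?g rdfg:subGraphOf <http://www.snee.com/ng/mygraph.rdf> .
        GRAPH ?g {?s dc:title ?title}
}
```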
&lt;p&gt;Now I know that the SPARQL standard and multiple open source implementations support the querying of a set of named graphs without requiring me to list them all, so the use of a large amount of graphs doesn&amp;rsquo;t sound so unwieldy. This is an important application building block, and next I&amp;rsquo;ll describe some things that sound sensible to build.&lt;/p&gt;
&lt;h2 id=&#34;7-comments&#34;&gt;7 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://thefigtrees.net/lee/blog/&#34; title=&#34;http://thefigtrees.net/lee/blog/&#34;&gt;Lee Feigenbaum&lt;/a&gt; on &lt;a href=&#34;#comment-2235&#34;&gt;March 11, 2009 1:22 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, you said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The SPARQL spec explains why I don&amp;rsquo;t need multiple FROM NAMED clauses to issue a single query against multiple graphs: &amp;ldquo;the GRAPH keyword is used to match patterns against named graphs. GRAPH can provide an IRI to select one graph or use a variable which will range over the IRI of all the named graphs in the query&amp;rsquo;s RDF dataset&amp;rdquo;. So, if I use a variable that ranges over 432 named graphs, I just need a pattern to identify those 432 graphs—ideally, a pattern that still works the following week if I need it to range over 433 graphs, then 434 graphs, and so forth.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;m not sure this is quite correct. From the SPARQL specification&amp;rsquo;s point of view, you (or your SPARQL engine) do indeed need to specify the graphs that comprise the RDF dataset against which you are querying.&lt;/p&gt;
&lt;p&gt;What makes this tractable is that some stores will, by default, make the default graph the RDF-merge (union, basically) of all of the graphs in the store and also add all graphs in the store as named graphs in the dataset.&lt;/p&gt;
&lt;p&gt;Other stores (e.g. Open Anzo) provide a &amp;ldquo;magic&amp;rdquo; URI to stand for &amp;ldquo;all graphs&amp;rdquo;, or introduce the concept of named datasets, computed datasets, etc. to address this challenge.&lt;/p&gt;
&lt;p&gt;Lee&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www-sop.inria.fr/edelweiss/people/Fabien.Gandon/&#34; title=&#34;http://www-sop.inria.fr/edelweiss/people/Fabien.Gandon/&#34;&gt;Fabien Gandon&lt;/a&gt; on &lt;a href=&#34;#comment-2236&#34;&gt;March 11, 2009 4:27 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I thought you might also be interested in the work done in CORESE for graph and paths handling :&lt;br /&gt;
&lt;a href=&#34;http://www-sop.inria.fr/edelweiss/software/corese/v2_4_1/manual/next.php&#34;&gt;http://www-sop.inria.fr/edelweiss/software/corese/v2_4_1/manual/next.php&lt;/a&gt;&lt;br /&gt;
in particular the nested graph and recursive querying mechanism.&lt;/p&gt;
&lt;p&gt;on the RDF side these extensions build on a previous member submission:&lt;br /&gt;
&lt;a href=&#34;http://www.w3.org/Submission/rdfsource/&#34;&gt;http://www.w3.org/Submission/rdfsource/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Cheers,&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://gearon.blogspot.com/&#34; title=&#34;http://gearon.blogspot.com/&#34;&gt;Paula Gearon&lt;/a&gt; on &lt;a href=&#34;#comment-2241&#34;&gt;March 27, 2009 5:46 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In response to Lee&amp;rsquo;s comment:&lt;br /&gt;
&amp;ldquo;I&amp;rsquo;m not sure this is quite correct. From the SPARQL specification&amp;rsquo;s point of view, you (or your SPARQL engine) do indeed need to specify the graphs that comprise the RDF dataset against which you are querying.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;No, that&amp;rsquo;s not actually true. I agree that this *seems* to be what the spec is implying, but it&amp;rsquo;s not true.&lt;/p&gt;
&lt;p&gt;If you use a variable in a GRAPH statement, then the variable will range over the dataset names - all graph names in scope. FROM NAMED does create a scope, but if it is not used in the query then it&amp;rsquo;s all the graph names.&lt;/p&gt;
&lt;p&gt;I made this mistake as well, and was trying to limit my SPARQL implementation to not allow unbound variables in the GRAPH position, but Andy Seaborne set me straight. (Andy is the editor of the SPARQL spec document, and the implementor of SPARQL for Jena).&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://thefigtrees.net/lee/blog/&#34; title=&#34;http://thefigtrees.net/lee/blog/&#34;&gt;Lee Feigenbaum&lt;/a&gt; on &lt;a href=&#34;#comment-2242&#34;&gt;March 30, 2009 9:57 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Paula, I beg to differ.&lt;/p&gt;
&lt;p&gt;The SPARQL specification has no concept of &amp;ldquo;all the graph names&amp;rdquo;. In the absence of any explicitly defined dataset, the query is run against a dataset that is chosen by your implementation (your SPARQL engine).&lt;/p&gt;
&lt;p&gt;For *some* implementations, this means that the query is run against all the graphs that the engine knows about. For other implementations, this means that the query is run against an empty dataset. For still others, an engine may be hardwired to query specific graphs in the absence of an explicitly given dataset.&lt;/p&gt;
&lt;p&gt;This is a very common misunderstanding about the SPARQL specification. I&amp;rsquo;m guessing that Andy was either speaking specifically about what Joseki/ARQ do or that there was a miscommunication there.&lt;/p&gt;
&lt;p&gt;best,&lt;br /&gt;
Lee&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://gearon.blogspot.com/&#34; title=&#34;http://gearon.blogspot.com/&#34;&gt;Paula Gearon&lt;/a&gt; on &lt;a href=&#34;#comment-2243&#34;&gt;March 30, 2009 3:13 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Lee,&lt;/p&gt;
&lt;p&gt;My perspective on this comes from an email conversation with Andy, partly on a mailing list, and partly in private. The context was when I was implementing SPARQL on Mulgara (Andy is on the Mulgara mailing list).&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m sure Andy won&amp;rsquo;t mind if I repeat part of one of our private emails here. I had been referring to an unbound variable named &amp;ldquo;x&amp;rdquo; which referred to the graph:&lt;/p&gt;
&lt;p&gt;  &lt;em&gt;?x ranges over the dataset names - all graph names in scope.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;  &lt;em&gt;FROM NAMED may have created a scope with certain names, but FROM NAMED is not necessary in a query anyway.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;  &lt;em&gt;Just query with&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;  &lt;em&gt;SELECT ?g { GRAPH ?g { ?s ?p ?o } }&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;  &lt;em&gt;All names of graphs.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In a later (public) email, he replies to a comment from me:&lt;/p&gt;
&lt;p&gt;   &lt;em&gt;&amp;gt; It&amp;rsquo;s interesting you would say this. I wondered about this in the&lt;br /&gt;
   &amp;gt; past, and wasn&amp;rsquo;t satisfied that it was allowed. Also, when I wrote my&lt;br /&gt;
   &amp;gt; email this morning, then I looked it up again (so I didn&amp;rsquo;t look like&lt;br /&gt;
   &amp;gt; an idiot - as I am wont to do) and again, I wasn&amp;rsquo;t satisfied that it&lt;br /&gt;
   &amp;gt; was allowed.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;   &lt;em&gt;Paula,&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;  &lt;em&gt;Examples can&amp;rsquo;t be exhaustive! We tried to put in as much as we could but not every single case can be an example.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;  &lt;em&gt;There are tests in the test suite: graph/graph-02 to -09 or thereabouts&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;  &lt;em&gt;Section 12 gives the definition of the evaluation of a graph pattern.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;He also gave a final followup:&lt;/p&gt;
&lt;p&gt;   &lt;em&gt;&amp;gt; I take it that 12.5 is the area of most relevance here? Specifically,&lt;br /&gt;
   &amp;gt; the definition of:&lt;br /&gt;
   &amp;gt; eval(D(G), Graph(var,P)) = &amp;hellip;&lt;br /&gt;
   &amp;gt; ???&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;   &lt;em&gt;Yes - that&amp;rsquo;s it: the example in section 8.3.4 is:&lt;/em&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dc:   &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;
PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;

SELECT ?name ?mbox ?date
WHERE
  { ?g dc:publisher ?name ;
       dc:date ?date .
    GRAPH ?g
      { ?person foaf:name ?name ;
                foaf:mbox ?mbox .
      }
  }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;  &lt;em&gt;which is the algebra expression:&lt;/em&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(base
  (prefix ((dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;)
           (foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;))
    (project (?name ?mbox ?date)
      (join
        (BGP
          (triple ?g dc:publisher ?name)
          (triple ?g dc:date ?date)
        )
        (graph ?g
          (BGP
            (triple ?person foaf:name ?name)
            (triple ?person foaf:mbox ?mbox)
          ))
      ))))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;   &lt;em&gt;And, this being an applicative-order evaluation, (graph ?g &amp;hellip;) is evaluated as in 12.5 &amp;quot;Evaluation of a Graph Pattern&amp;quot; and then participates in the join. That ?g is unconstrained at the point of evaluating the (graph &amp;hellip;).&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;   &lt;em&gt;D, the dataset, can come from the protocol, the query or the execution environment (in that order of priority).&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s entirely possible that I&amp;rsquo;m misinterpreting what Andy is saying here, but my reading of it is that an unbound variable used in a GRAPH expression will evaluate to all known graphs. Also, I&amp;rsquo;m not saying that Andy is the definitive source for this information, but he is certainly clearer on it than I am. :-)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://seaborne.blogspot.com/&#34; title=&#34;http://seaborne.blogspot.com/&#34;&gt;Andy Seaborne&lt;/a&gt; on &lt;a href=&#34;#comment-2244&#34;&gt;March 31, 2009 7:18 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In SPARQL, a query executes over a dataset. GRAPH accesses the names in the dataset. If it&amp;rsquo;s &amp;ldquo;GRAPH ?g&amp;rdquo;, then ?g ranges over all names in the dataset. This is in &amp;ldquo;&lt;a href=&#34;http://www.w3.org/TR/rdf-sparql-query/#defn_evalGraph&#34;&gt;Definition: Evaluation of a Graph Pattern&lt;/a&gt;&amp;rdquo; (third case). ?g may be constrained in other parts of the query.&lt;/p&gt;
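Andy's definition can be modeled in a few lines: a dataset is a default graph plus a map from names to named graphs, and GRAPH with an unbound variable iterates over exactly the names in that map. This is a purely illustrative Python sketch (the dict layout and sample data are made up, not any real engine's structures):

```python
# A dataset: one default graph plus zero or more named graphs.
# Graphs are just sets of (subject, predicate, object) triples.
dataset = {
    "default": {("g1", "dc:publisher", "Bob")},
    "named": {
        "g1": {("alice", "foaf:name", "Alice")},
        "g2": {("bob", "foaf:name", "Bob")},
    },
}

def eval_graph_var(dataset, pattern):
    """Evaluate GRAPH ?g { pattern }: ?g ranges over the dataset's
    named-graph names only, never over graphs outside the dataset."""
    results = []
    for name, graph in dataset["named"].items():
        for triple in graph:
            # None acts as an unbound variable in the pattern.
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                results.append((name, triple))
    return results

# SELECT ?g { GRAPH ?g { ?s ?p ?o } } -- all names of graphs with data:
matches = eval_graph_var(dataset, (None, None, None))
print(sorted({name for name, _ in matches}))  # ['g1', 'g2']
```

The point the sketch makes is Andy's third case: the candidate bindings for ?g come from the dataset and nowhere else.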
&lt;p&gt;Where the dataset comes from is either from a description or the dataset is decided by the query service.&lt;/p&gt;
&lt;p&gt;There are two ways to describe the dataset - in the query with FROM/FROM NAMED, or in the protocol with default-graph-uri/named-graph-uri. If the dataset is described, then the dataset must be what is described and not more (or worse, different).&lt;/p&gt;
&lt;p&gt;If the dataset is not described, the service provides whatever it chooses, and it can be set up in a variety of ways - the case of having the default graph be the manifest and other metadata for the named graphs is quite an interesting setup for the provenance situation.&lt;/p&gt;
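The priority Andy gives in the quoted email (protocol description first, then the query's FROM/FROM NAMED, then whatever the execution environment supplies) amounts to a simple selection rule. A hypothetical sketch, with invented parameter names:

```python
def choose_dataset(protocol_ds=None, query_ds=None, service_default=None):
    """Pick the dataset a query runs against, in priority order:
    protocol parameters first, then FROM/FROM NAMED in the query,
    then whatever the execution environment provides."""
    for candidate in (protocol_ds, query_ds, service_default):
        if candidate is not None:
            return candidate
    raise ValueError("no dataset available at query time")

# The protocol description wins over the query's FROM clauses:
print(choose_dataset(protocol_ds="proto", query_ds="from-clause"))  # proto
# With no protocol or query description, the service decides:
print(choose_dataset(service_default="service"))                    # service
```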
&lt;p&gt;Bob&amp;rsquo;s &lt;a href=&#34;#id203803&#34;&gt;query&lt;/a&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dc:   &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;
PREFIX rdfg: &amp;lt;http://www.w3.org/2004/03/trix/rdfg-1/&amp;gt;

select ?title
WHERE { ?g rdfg:subGraphOf
        GRAPH ?g {?s dc:title ?title}
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;will work on any SPARQL system that can be set up with the dataset having all the named graphs in it and the default graph being the manifest (and other details) of named graphs.&lt;/p&gt;
&lt;p&gt;A query service can reject any query it chooses not to answer. Maybe it only processes queries with a description, maybe it only processes queries without a description. The latter case is important because it is the case of providing access to a published dataset over the web. Here, just providing query over that one dataset is what the service does. It might even execute a query if the description matches but there is no obligation for it to do that. After all, the dataset may not be describable if it&amp;rsquo;s some large relational database fronted by a SPARQL query service.&lt;/p&gt;
&lt;p&gt;A graph can appear more than once in the dataset under different names or once as the default graph and also with a name. The default graph or any named graph may be some calculated form of other graphs such as the RDF merge. What matters is that at the time the query is executed, there is a dataset, and that the dataset has a default graph and zero or more named graphs.&lt;/p&gt;
&lt;p&gt;Aside: the tests assume the default graph, if not mentioned, is the empty graph. That&amp;rsquo;s just convenience for the tests, and compatible with the fact that if there is a description of the dataset and the default graph is not otherwise mentioned (FROM, default-graph-uri) then it is empty. The tests provide another way to describe the dataset for the purposes of the tests.&lt;/p&gt;
&lt;p&gt;I was quoted as saying:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&amp;quot;?x ranges over the dataset names - all graph names in scope.&amp;quot;&lt;/em&gt; and the &amp;ldquo;all&amp;rdquo; here is all names in the dataset because a query is executed over a dataset. See &amp;ldquo;&lt;a href=&#34;http://www.w3.org/TR/rdf-sparql-query/#defn_evalGraph&#34;&gt;Definition: Evaluation of a Graph Pattern&lt;/a&gt;&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;When Paula says: &lt;em&gt;&amp;ldquo;my reading of it is that an unbound variable used in a GRAPH expression will evaluate to all known graphs.&amp;rdquo;&lt;/em&gt; this is true where &amp;ldquo;all known&amp;rdquo; means all known in the dataset. Not all graphs on the web.&lt;/p&gt;
&lt;p&gt;LeeF said: &lt;em&gt;&amp;ldquo;From the SPARQL specification&amp;rsquo;s point of view, you (or your SPARQL engine) do indeed need to specify the graphs that comprise the RDF dataset against which you are querying.&amp;rdquo;&lt;/em&gt; This is also true, but the tricky part is &amp;ldquo;or your SPARQL engine&amp;rdquo;. It seems to me that the issue is about what the dataset is. While one could have a dataset which is &amp;ldquo;all graphs on the web, by name&amp;rdquo; or some such, no implementation could realise that, even though we would still have a dataset specified. The requirement is to have, when the query executes, a dataset with a set of names that can be iterated over for the evaluation of GRAPH with an unconstrained variable - and that requirement could not be met.&lt;/p&gt;
&lt;p&gt;Some systems have a notion of the graphs that they have in their storage. The dataset description is interpreted as meaning &amp;ldquo;pick graphs out of the collection of stored graphs&amp;rdquo;. Nothing wrong with that model, but it&amp;rsquo;s not required by the specs.&lt;/p&gt;
&lt;p&gt;I do think it would be wrong to be overly prescriptive, especially about the default graph. Some systems force that to be the RDF merge of the named graphs, and that would preclude Bob&amp;rsquo;s use case of the default graph holding the manifest information.&lt;/p&gt;
&lt;p&gt;Finally, we have a new working group running. All that matters is the text in the documents - not the intent of the text, what implementations do, nor what the authors thought they meant when writing the text. The &lt;a href=&#34;http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/&#34;&gt;working group comments list&lt;/a&gt; is the place to send suggestions for improving the text. Hint, hint.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2245&#34;&gt;March 31, 2009 7:56 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Andy! My use of the default graph for manifest information was just off the top of my head when coming up with the example. It sounds like a specific named graph for these triples would be a good idea.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Some questions about RDF named graphs</title>
      <link>https://www.bobdc.com/blog/some-questions-about-rdf-named/</link>
      <pubDate>Sun, 01 Mar 2009 11:17:17 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/some-questions-about-rdf-named/</guid>
      
      
      <description><div>Trying to connect the data structure to real-world use.</div><div>&lt;p&gt;Most triplestores support named graphs, and from a high level I can see how they&amp;rsquo;d be useful, but as I think about using named graphs to address specific application needs, some questions come to mind, so I thought I&amp;rsquo;d throw them out there.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;If graph membership is implemented by using the fourth part of a quad to name the graph that the triple belongs to, then a triple can only belong directly to one graph, right?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I say &amp;ldquo;belong directly&amp;rdquo; because I&amp;rsquo;m thinking that a graph can belong to another graph. If so, how would this be indicated? Is there some specific predicate to indicate that graph x belongs to graph y?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If we&amp;rsquo;re going to use named graphs to track provenance, then it would make sense to assign each batch of data added to my triplestore to its own graph. Let&amp;rsquo;s say that after a while I have thousands of graphs, and I want to write a SPARQL query whose scope is 432 of those graphs. Do I need 432 &amp;ldquo;FROM NAMED&amp;rdquo; clauses in my query? (Let&amp;rsquo;s assume that I plan to query those same 432 multiple times.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I can think of more questions, but I want to wait and see what I can learn about the issues above, and then I can ask better follow-up questions.&lt;/p&gt;
&lt;h2 id=&#34;6-comments&#34;&gt;6 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.linkedin.com/in/erics&#34; title=&#34;http://www.linkedin.com/in/erics&#34;&gt;Eric Schoonover&lt;/a&gt; on &lt;a href=&#34;#comment-2229&#34;&gt;March 1, 2009 1:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In the repository I am helping to build we have the concept of a graph alias that helps with the overload of named or default graph references in your SPARQL query. It is especially useful if you are going to be executing multiple queries against the same set of graphs. You can assign a single URI that acts as an alias to the 432 graphs you really intend to query and then you can have a single FROM or FROM NAMED clause that points to the graph alias and the SPARQL endpoint will automatically expand the query based on the contents of the graph alias.&lt;/p&gt;
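Eric's alias mechanism is specific to his repository, but the expansion step itself is easy to picture. A hypothetical sketch (the alias URI scheme here is invented):

```python
# A graph alias: one URI standing in for many graph URIs. Before the
# query runs, the endpoint expands the alias into the real graph list.
aliases = {
    "urn:alias:my432": ["urn:g:%d" % i for i in range(432)],
}

def expand_dataset(from_named, aliases):
    """Replace any alias URI in a FROM NAMED list with its member graphs;
    non-alias URIs pass through unchanged."""
    expanded = []
    for uri in from_named:
        expanded.extend(aliases.get(uri, [uri]))
    return expanded

# One alias plus one ordinary graph expands to 433 graph URIs:
ds = expand_dataset(["urn:alias:my432", "urn:g:extra"], aliases)
print(len(ds))  # 433
```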
&lt;p&gt;By &lt;a href=&#34;http://www.furia.com&#34; title=&#34;http://www.furia.com&#34;&gt;glenn mcdonald&lt;/a&gt; on &lt;a href=&#34;#comment-2230&#34;&gt;March 1, 2009 1:58 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think this idea of named-graphs being a &amp;ldquo;physical&amp;rdquo; (i.e., exclusive, containing) partitioning of the triple-space not only doesn&amp;rsquo;t make sense, but its failure to make sense is in hilariously exact hierarchical contradiction to the very graph-structured premise of RDF. The relationship between a triple and anything else demands all the same structure and flexibility as anything other kind of relationship. The fourth column in a quad-store should not be graph-name, it should be triple-id. Once a triple has an ID, you can then express anything you want *about* that triple, whether it&amp;rsquo;s confidence or provenance or batch or saltiness or whatever.&lt;/p&gt;
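glenn's fourth-column-as-triple-id idea can be sketched abstractly; the identifiers and `ex:` predicates below are invented placeholders, not a real store's schema:

```python
# Give each triple an ID, then make statements about the ID itself:
# provenance, confidence, batch, saltiness, whatever.
triples = {
    "t1": ("alice", "foaf:knows", "bob"),
}
# Metadata about triples, itself stored as ordinary triples
# whose subject is a triple ID:
meta = {
    ("t1", "ex:source", "batch-42"),
    ("t1", "ex:confidence", "0.9"),
}

def about(triple_id, meta):
    """Everything asserted about a given triple, as (predicate, object) pairs."""
    return sorted((p, o) for (s, p, o) in meta if s == triple_id)

print(about("t1", meta))
# [('ex:confidence', '0.9'), ('ex:source', 'batch-42')]
```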
&lt;p&gt;By &lt;a href=&#34;http://www.jenitennison.com/blog&#34; title=&#34;http://www.jenitennison.com/blog&#34;&gt;Jeni Tennison&lt;/a&gt; on &lt;a href=&#34;#comment-2231&#34;&gt;March 1, 2009 2:48 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m no expert, but I agree with Glenn Mcdonald, that the fourth column should really be triple-id (as in a unique URI for each triple). Then again, I think named graphs are flexible enough to be used in this way anyway: they just &lt;em&gt;can&lt;/em&gt; encapsulate more than one triple if that&amp;rsquo;s useful.&lt;/p&gt;
&lt;p&gt;As far as the questions go: my understanding is that a given triple (as in a unique subject/property/object combination) can belong to multiple graphs. Each graph it belongs to provides a separate &amp;lsquo;row&amp;rsquo; in the quad store.&lt;/p&gt;
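Jeni's "separate row" point can be made concrete with a toy quad store; the names below are invented:

```python
# A quad store as a set of rows; the fourth column names the graph.
# The same triple appearing in two graphs is simply two rows.
quads = {
    ("s1", "p1", "o1", "graphA"),
    ("s1", "p1", "o1", "graphB"),  # same triple, second graph
    ("s2", "p2", "o2", "graphA"),
}

def graphs_containing(quads, triple):
    """All graphs a given (s, p, o) triple belongs to."""
    return sorted(g for (s, p, o, g) in quads if (s, p, o) == triple)

print(graphs_containing(quads, ("s1", "p1", "o1")))  # ['graphA', 'graphB']
```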
&lt;p&gt;&lt;a href=&#34;http://www.w3.org/2004/03/trix/&#34;&gt;Named Graphs / Semantic Web Activity&lt;/a&gt; points to a vocabulary for describing the relationships between graphs (subgraphs, equivalent graphs and so on) at &lt;a href=&#34;http://www.w3.org/2004/03/trix/rdfg-1/&#34;&gt;http://www.w3.org/2004/03/trix/rdfg-1/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I agree with Eric about making your 432 graphs subgraphs of a larger graph which you then query. I guess how you do that depends on the triplestore you&amp;rsquo;re using. The SPARQL specification has an example of &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/#namedAndDefaultGraph&#34;&gt;named and default graphs&lt;/a&gt; which might be useful as a starting point.&lt;/p&gt;
&lt;p&gt;By Chris Booth on &lt;a href=&#34;#comment-2232&#34;&gt;March 1, 2009 3:08 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m no expert, especially about your first two questions, but for your third question it seems to me that you could use a variable for the named graph and then FILTER the results. That might not reduce your 432 individual requests to one, but it might help quite considerably.&lt;/p&gt;
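Chris's FILTER suggestion, reduced to its essence: bind the graph name to a variable with GRAPH ?g, then keep only the bindings whose graph is in the wanted set - one query instead of 432 FROM NAMED clauses. A toy sketch of the post-match filtering step (not a real SPARQL engine):

```python
# The graphs we care about (432 of them in Bob's scenario):
wanted = {"urn:g:7", "urn:g:12"}

# (?g, ?title) rows as they might come back from GRAPH ?g matching:
bindings = [
    ("urn:g:7", "Title A"),
    ("urn:g:99", "Title B"),
    ("urn:g:12", "Title C"),
]

# The FILTER step: drop rows whose graph isn't in the wanted set.
filtered = [(g, t) for (g, t) in bindings if g in wanted]
print(filtered)  # [('urn:g:7', 'Title A'), ('urn:g:12', 'Title C')]
```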
&lt;p&gt;By Damian on &lt;a href=&#34;#comment-2233&#34;&gt;March 1, 2009 3:24 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Oh boy, good questions. Let&amp;rsquo;s try these ropey definitions first:&lt;/p&gt;
&lt;p&gt;Graph: a set of triples.&lt;br /&gt;
Named graph: a name, graph pair.&lt;br /&gt;
Dataset: a default graph, and zero or more named graphs.&lt;/p&gt;
&lt;p&gt;1) No, a triple can be in more than one graph. However some stores let you ignore the graphs in certain situations, which requires caution to maintain the set-ness of the resulting pseudo-graph. I believe some stores use this as the default graph in SPARQL, which is neither precluded nor suggested by the spec.&lt;/p&gt;
&lt;p&gt;2) I don&amp;rsquo;t understand how a graph can belong to another graph. It might be mentioned (e.g. one graph contains a statement that :Bob eg:made some other graph). You may have functional dependencies between graphs (one graph made from another via CONSTRUCT), but that&amp;rsquo;s up to your application to track. Named graphs are just graphs with names, nothing more.&lt;/p&gt;
&lt;p&gt;3) An exciting part of SPARQL :-) In SPARQL you query a dataset, but what determines the dataset? It might be the protocol parameters, it might be the query (your FROM and FROM NAMED), and it might simply be the endpoint that determines it. So don&amp;rsquo;t expect the endpoint to even pay attention to your FROM NAMED clauses.&lt;/p&gt;
&lt;p&gt;The best I can suggest is talk to your store vendor, although you may find FILTERing graphs in or out will do the trick.&lt;/p&gt;
&lt;p&gt;Hope this comment helps a little.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://thefigtrees.net/lee/blog/&#34; title=&#34;http://thefigtrees.net/lee/blog/&#34;&gt;Lee Feigenbaum&lt;/a&gt; on &lt;a href=&#34;#comment-2234&#34;&gt;March 2, 2009 12:21 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;For the most part, I agree with everything Damian says. That said, since Open Anzo is based on a named graph model, I wanted to give some specific answers based on our experience.&lt;/p&gt;
&lt;p&gt;Since my comments were a bit lengthy, I stuck them on my blog:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.thefigtrees.net/lee/blog/2009/03/named_graphs_in_open_anzo.html&#34;&gt;http://www.thefigtrees.net/lee/blog/2009/03/named_graphs_in_open_anzo.html&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Restoring context to shortened URLs in Twitter</title>
      <link>https://www.bobdc.com/blog/restoring-context-to-shortened/</link>
      <pubDate>Thu, 26 Feb 2009 13:44:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/restoring-context-to-shortened/</guid>
      
      
<description><div>Giving me a better idea what tweets are pointing at.</div><div>&lt;p&gt;When you have to fit Twitter messages into 140 characters, URL shortening services such as &lt;a href=&#34;http://tinyurl.com/&#34;&gt;TinyURL&lt;/a&gt; and &lt;a href=&#34;http://is.gd/&#34;&gt;is.gd&lt;/a&gt; are handy, but I hate seeing tweets like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;This is hilarious: http://is.gd/kSyL
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Typical URLs do include information that provides context, starting with the domain name. If someone points to a &amp;ldquo;great article on [whatever]&amp;rdquo;, the fact that it&amp;rsquo;s on nytimes.com versus someguy.wordpress.com gives me a clue about how much I want to read it, so if the description with the URL doesn&amp;rsquo;t give any meaningful context, I&amp;rsquo;m not going to follow the link.&lt;/p&gt;
&lt;p&gt;Firefox plug-in to the rescue: I recently learned from &lt;a href=&#34;http://twitter.com/kasthomas&#34;&gt;@kasthomas&lt;/a&gt; about the &lt;a href=&#34;https://addons.mozilla.org/en-US/firefox/addon/8636&#34;&gt;LongURL expander&lt;/a&gt;, which displays the real destination of a URL when you mouse over the shortened version.&lt;/p&gt;
&lt;p&gt;Thanks, &lt;a href=&#34;http://iamseanmurphy.com/&#34;&gt;Sean Murphy&lt;/a&gt;, for writing it!&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://IamSeanMurphy.com&#34; title=&#34;http://IamSeanMurphy.com&#34;&gt;Sean Murphy&lt;/a&gt; on &lt;a href=&#34;#comment-2226&#34;&gt;February 26, 2009 4:14 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hey, no problem! I&amp;rsquo;m glad people find it as useful as I do. Thanks for spreading the word.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-2227&#34;&gt;February 27, 2009 1:04 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Good point, though often enough a popular domain name alone isn&amp;rsquo;t quite enough to indicate the dangers lurking behind a shortened link&amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://bit.ly/4kb77v&#34;&gt;http://bit.ly/4kb77v&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Another plugin from bitly: &lt;a href=&#34;https://addons.mozilla.org/en-US/firefox/addon/10297&#34;&gt;https://addons.mozilla.org/en-US/firefox/addon/10297&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Family filter-related stuff (PICS; POWDER) fit into the landscape here somewhere too&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Sorry Facebook, not these blog postings</title>
      <link>https://www.bobdc.com/blog/sorry-facebook-not-these-blog/</link>
      <pubDate>Tue, 17 Feb 2009 17:21:30 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/sorry-facebook-not-these-blog/</guid>
      
      
      <description><div>This is the last one for which you get a &#34;perpetual, fully-paid right to sublicense, modify, edit, create derivate works and distribute&#34;.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.facebook.com/terms.php&#34;&gt;&lt;img id=&#34;id197045&#34; src=&#34;http://creative.ak.facebook.com/ads3/creative/pressroom/jpg/b_1234208947_facebook_logo.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Facebook logo from their &#39;press room&#39; page&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s been plenty of fuss over the changes to Facebook&amp;rsquo;s terms of service recently, even in yesterday&amp;rsquo;s &lt;a href=&#34;http://www.nytimes.com/2009/02/17/technology/internet/17facebook.htm&#34;&gt;New York Times&lt;/a&gt;. Trying to remember which of my friends recently tweeted &amp;ldquo;All your data are belong to us&amp;rdquo; on the topic, I &lt;a href=&#34;http://search.twitter.com/search?q=%22all+your+data+are+belong%22&#34;&gt;searched Twitter&lt;/a&gt; this morning and found that dozens of people have done so in the last 24 hours.&lt;/p&gt;
&lt;p&gt;Instead of going over the claims and counterclaims about Facebook&amp;rsquo;s intent, let&amp;rsquo;s go right to the primary document: the &lt;a href=&#34;http://www.facebook.com/terms.php&#34;&gt;Facebook Terms of Service&lt;/a&gt;. (After all, if you&amp;rsquo;re battling them in court, it won&amp;rsquo;t carry much weight to say &amp;ldquo;but your honor, their corporate communications guy told the New York Times that what they really meant was&amp;hellip;&amp;rdquo;) The &lt;a href=&#34;http://consumerist.com/5150175/facebooks-new-terms-of-service-we-can-do-anything-we-want-with-your-content-forever&#34;&gt;big issue this week&lt;/a&gt; is a sentence that was removed after the following paragraph, but I found unchanged text in the paragraph itself that scared me:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You hereby grant Facebook an irrevocable, perpetual, non-exclusive, transferable, fully paid, worldwide license (with the right to sublicense) to (a) use, copy, publish, stream, store, retain, publicly perform or display, transmit, scan, reformat, modify, edit, frame, translate, excerpt, adapt, create derivative works and distribute (through multiple tiers), any User Content you (i) Post on or in connection with the Facebook Service or the promotion thereof subject only to your privacy settings&amp;hellip;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I found their &amp;ldquo;import a blog&amp;rdquo; feature handy, so that what I put on &lt;a href=&#34;http://www.snee.com/bobdc.blog/&#34;&gt;bobdc.blog&lt;/a&gt; automatically gets published as a Facebook Note as well, thereby reaching more people. But do I want to grant Facebook a perpetual right to create derivative works from any content I post? A transferable, fully paid right, so that they can sell my content to others? I don&amp;rsquo;t think so. &amp;ldquo;Subject to [my] privacy settings&amp;rdquo; isn&amp;rsquo;t very reassuring; I&amp;rsquo;m not writing about any particularly private issues, so I don&amp;rsquo;t want distribution limited to my Facebook friends. As with the posting of my Twitter messages into my Facebook status, I found this automated importing of weblog postings to be a nice convenience, but it looks like the potential cost is too high. I&amp;rsquo;m disabling the blog import after this shows up as a Facebook note.&lt;/p&gt;
&lt;p&gt;Why do I even bother with Facebook? Sometimes it&amp;rsquo;s a handy way to get in touch with someone whose email address changed because their DSL provider got bought out by another one, and the new one&amp;rsquo;s rebranding effort extended to changing the domain name in all the customers&amp;rsquo; email addresses. I&amp;rsquo;ve never actually &amp;ldquo;friended&amp;rdquo; anyone in Facebook, but I do accept if someone I know friends me.&lt;/p&gt;
&lt;p&gt;Henry Story of Sun is working on some technology to allow the &lt;a href=&#34;http://blogs.sun.com/bblfish/entry/building_secure_and_distributed_social&#34;&gt;Building [of] Secure, Open and Distributed Social Network Applications&lt;/a&gt;. I hope that work like this gets us to a point where social networking connections and features, like the web itself, are distributed among data and services that different people choose on their own terms instead of being owned by a single, &lt;a href=&#34;http://www.time.com/time/business/article/0,8599,1644040,00.html&#34;&gt;privately owned&lt;/a&gt; corporation that reserves the right to do whatever they want with our content.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;February 18 update: It looks like Facebook has not only restored the sentence that everyone worried about losing from the Terms of Service but removed the language above as well.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;It&amp;rsquo;s interesting to compare &lt;a href=&#34;http://twitter.com/terms&#34;&gt;Twitter&amp;rsquo;s Terms of Service&lt;/a&gt;, which should provide a model for all such services.&lt;/em&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Getting started using Virtuoso as a triplestore</title>
      <link>https://www.bobdc.com/blog/getting-started-using-virtuoso/</link>
      <pubDate>Mon, 16 Feb 2009 10:02:33 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/getting-started-using-virtuoso/</guid>
      
      
      <description><div>The open source edition.</div><div>&lt;p&gt;Just about all the RDF triplestores I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/playing-with-some-rdf-stores&#34;&gt;been trying&lt;/a&gt; were designed from the ground up to store RDF triples. &lt;a href=&#34;http://www.openlinksw.com/&#34;&gt;OpenLink Software&amp;rsquo;s&lt;/a&gt; Virtuoso is a database server that can also store (and, as part of its original specialty, serve as an efficient interface to databases of) relational data and XML, so some of my setup and usage steps required learning a few other aspects of it first. For example, the actual loading of RDF is done using Virtuoso&amp;rsquo;s &lt;a href=&#34;http://www.webdav.org/&#34;&gt;WebDAV&lt;/a&gt; support, so I had to learn a bit about that. At first this seemed like another obstacle along the way to my goal of loading RDF and then issuing SPARQL queries against it, but I reminded myself that in a fast, free database server that supports a variety of data models, WebDAV support is most certainly a feature, not a bug.&lt;/p&gt;
&lt;blockquote id=&#34;id197076&#34; class=&#34;pullquote&#34;&gt;The possibility of a single server that can store both XML content and RDF triples of metadata about that content could be very interesting for publishers.&lt;/blockquote&gt;
&lt;p&gt;After downloading and unzipping the &lt;a href=&#34;http://virtuoso.openlinksw.com/wiki/main/Main/&#34;&gt;open source edition&lt;/a&gt; of Virtuoso for Windows, I found the virtuoso-t.exe server program in the virtuoso-opensource\bin directory. Running this with &lt;code&gt;--help&lt;/code&gt; as a parameter showed me the various options for starting it up, including the commands to create a Virtuoso Windows service and to then start up that service.&lt;/p&gt;
&lt;p&gt;Once I had this service running, sending a browser to http://localhost:8890/ displayed the product&amp;rsquo;s Welcome page, and the first choice on the menu on the left side of this screen took me to the Virtuoso Conductor. The Conductor requires you to log in before getting anything done, and the &lt;a href=&#34;http://docs.openlinksw.com/virtuoso/newadminui.html#defpasschange&#34;&gt;Default Passwords&lt;/a&gt; section of the &lt;a href=&#34;http://docs.openlinksw.com/virtuoso/quicktours.html&#34;&gt;Quick Start &amp;amp; Tours&lt;/a&gt; documentation included &amp;ldquo;dba&amp;rdquo; in its list of default IDs, so I logged in as the dba.&lt;/p&gt;
&lt;p&gt;An HTTP request to load data must specify the ID of the user loading the data, so as the dba I created a new user by picking the System Admin tab, User Accounts, and then &amp;ldquo;Create New Account&amp;rdquo; to create a joeuser account. I had some initial trouble configuring this user&amp;rsquo;s account to let it do all the things it needed to do, but with some &lt;a href=&#34;http://sourceforge.net/mailarchive/forum.php?thread_name=4990C45F.2000003%40snee.com&amp;amp;forum_name=virtuoso-users&#34;&gt;help on the virtuoso-users mailing list&lt;/a&gt; I learned that I had to check &amp;ldquo;User Enabled&amp;rdquo; and &amp;ldquo;Allow SQL/ODBC Logins&amp;rdquo;, add the roles SPARQL_SELECT and SPARQL_UPDATE for the user, check &amp;ldquo;Allow DAV logins&amp;rdquo;, set a DAV home path of /DAV/home/joeuser/ for this user, and check the DAV folder name&amp;rsquo;s Create box before clicking the Save button to actually create this user account and WebDAV folder.&lt;/p&gt;
&lt;p&gt;For a given user, you can create any WebDAV folder you want, upload RDF data to it, and then load that data to the triplestore (a quad store, actually, to track each triple&amp;rsquo;s graph) from that folder, but Virtuoso includes a special folder with each WebDAV-enabled account called &lt;a href=&#34;http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtuosoRDFSinkFolder&#34;&gt;rdf_sink&lt;/a&gt; to automate this process so that once you load an RDF file there its triples get sent right to the quad store.&lt;/p&gt;
&lt;p&gt;Once I had created the joeuser account with a password of jupw, the following &lt;a href=&#34;http://curl.haxx.se/&#34;&gt;cURL&lt;/a&gt; command loaded the fakeAddrBookPt1.rdf file into the graph named http://localhost:8890/DAV/home/joeuser/rdf_sink (all curl command lines shown here include extra carriage returns for readability):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -i -T fakeAddrBookPt1.rdf 
  http://localhost:8890/DAV/home/joeuser/rdf_sink/fakeAddrBookPt1.rdf
  -u joeuser:jupw
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Later, substituting &amp;ldquo;fakeAddrBookPt2.rdf&amp;rdquo; for &amp;ldquo;fakeAddrBookPt1.rdf&amp;rdquo; in that command loaded this other file into the same graph.) After loading fakeAddrBookPt1.rdf, I went to the SPARQL query form at http://localhost:8890/sparql (the RDF tab of the Virtuoso conductor displays a similar one), entered http://localhost:8890/DAV/home/joeuser/rdf_sink/ as the Default Graph URI to query, and entered my favorite first SPARQL query of &amp;ldquo;SELECT DISTINCT ?p WHERE {?s ?p ?o}&amp;rdquo; in the Query text field. Clicking the Run Query button then retrieved a list of predicates from the data that I had loaded into that graph, just as I&amp;rsquo;d asked for.&lt;/p&gt;
&lt;p&gt;Because issuing a SPARQL query with curl reassures me that I really understand a server&amp;rsquo;s HTTP interface, I also entered the following command to perform the same query and got the same result:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -F &amp;quot;query=SELECT DISTINCT ?p FROM 
  &amp;lt;http://localhost:8890/DAV/home/joeuser/rdf_sink/&amp;gt; 
  WHERE {?s ?p ?o}&amp;quot; http://localhost:8890/sparql
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Throughout my process of setting this up and trying Virtuoso, I must admit that I did a lot of hunting in the documentation, although as I mentioned I got very good help on the mailing list. There is a &lt;a href=&#34;http://demo.openlinksw.com/doc/pdf/virtdocs.pdf&#34;&gt;documentation PDF file&lt;/a&gt; that looks pretty complete—at 15 megs and 2202 pages, it better be!&lt;/p&gt;
&lt;p&gt;For my next step with Virtuoso, the &lt;a href=&#34;http://docs.openlinksw.com/virtuoso/rdfsparqlrule.html&#34;&gt;RDF Inference in Virtuoso&lt;/a&gt; page describes some RDFS and OWL support, but shows that the RDFS and OWL properties must be loaded using special functions instead of just including them as more triples with the data. I&amp;rsquo;ll probably try it, but I&amp;rsquo;m also very curious about Virtuoso&amp;rsquo;s XQuery support—the possibility of a single server that can store both XML content and RDF triples of metadata about that content could be very interesting &lt;a href=&#34;https://www.bobdc.com/blog/publishing-and-semantic-web-te&#34;&gt;for publishers&lt;/a&gt;.&lt;/p&gt;
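&lt;p&gt;Judging from that page, the setup comes down to associating a rule name with the graph that holds the schema, then naming that rule in a query pragma. Here is a rough sketch that I haven&amp;rsquo;t tried yet; the rule name and graph URI are made up:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;-- in Virtuoso's iSQL: bind a rule name to the graph holding the ontology
rdfs_rule_set ('myrules', 'http://localhost:8890/myontology');

-- then name that rule in a pragma when querying
SPARQL DEFINE input:inference 'myrules'
  SELECT DISTINCT ?p WHERE {?s ?p ?o};
&lt;/code&gt;&lt;/pre&gt;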
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-2223&#34;&gt;February 16, 2009 4:37 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Here are some links to posts I made in the past re. Inference rules.&lt;/p&gt;
&lt;p&gt;1. &lt;a href=&#34;http://www.mail-archive.com/public-lod@w3.org/msg00870.html&#34;&gt;http://www.mail-archive.com/public-lod@w3.org/msg00870.html&lt;/a&gt; - UMBEL &amp;amp; DBpedia&lt;br /&gt;
2. &lt;a href=&#34;http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg00263.html&#34;&gt;http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg00263.html&lt;/a&gt; - YAGO &amp;amp; DBpedia&lt;/p&gt;
&lt;p&gt;Steps:&lt;/p&gt;
&lt;p&gt;1. You load the class hierarchies in question (typically an OWL ontology)&lt;br /&gt;
2. You associate a named rule with the named graph hosting the ontology in step 1&lt;br /&gt;
3. You execute SPARQL with the inference rule pragma which allows you to select which rules to use for reasoning.&lt;/p&gt;
&lt;p&gt;By Mario Kofler on &lt;a href=&#34;#comment-2582&#34;&gt;July 1, 2010 11:18 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hello,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I created a new user by picking the System Admin tab, User&lt;br /&gt;
Accounts, and then &amp;ldquo;Create New Account&amp;rdquo; to create a joeuser&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;can you please tell me where the button or the link &amp;ldquo;Create New Account&amp;rdquo; is located?&lt;/p&gt;
&lt;p&gt;i went with the user &amp;ldquo;dba&amp;rdquo; to system-admin-&amp;gt;user-accounts but i can just watch the users that are already in the system, but do not find a way to create a new account.&lt;/p&gt;
&lt;p&gt;i am using Virtuoso 6.1.1&lt;/p&gt;
&lt;p&gt;thank you for your help,&lt;/p&gt;
&lt;p&gt;greetings,&lt;/p&gt;
&lt;p&gt;Mario Kofler&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2583&#34;&gt;July 1, 2010 11:52 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It may have changed since February of last year. I would ask on a Virtuoso mailing list.&lt;/p&gt;
&lt;p&gt;By Jamshaid Ashraf on &lt;a href=&#34;#comment-2624&#34;&gt;September 17, 2010 7:29 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Mario,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I created a new user by picking the System Admin tab, User&lt;br /&gt;
Accounts, and then &amp;ldquo;Create New Account&amp;rdquo; to create a joeuser&lt;/p&gt;
&lt;p&gt;can you please tell me where the button or the link &amp;ldquo;Create New Account&amp;rdquo; is located?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You can find &amp;ldquo;Create New Account&amp;rdquo; as a link in the heading of the last column of the user table. Though it looks like a sorting link, it is in fact the link to create a new user.&lt;/p&gt;
&lt;p&gt;reg&lt;br /&gt;
Jamshaid&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>MOTO connects Android to an e-ink display</title>
      <link>https://www.bobdc.com/blog/moto-connects-android-to-an-ei/</link>
      <pubDate>Sat, 14 Feb 2009 10:05:44 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/moto-connects-android-to-an-ei/</guid>
      
      
      <description><div>If I were Jeff Bezos, I&#39;d be nervous.</div><div>&lt;img id=&#34;id197042&#34; src=&#34;http://labs.moto.com/wp-content/uploads/2009/01/0209_gadget_labeled.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;some description&#34; width=&#34;200px&#34;/&gt;
&lt;p&gt;In &lt;a href=&#34;https://www.bobdc.com/blog/the-cheap-commodity-ebook-read&#34;&gt;The cheap commodity eBook reader of the future&lt;/a&gt;, I wrote about how I look forward to a mass-market ebook reader created from an e-ink display and an inexpensive commodity processor. The folks at &lt;a href=&#34;http://www.moto.com/&#34;&gt;MOTO&lt;/a&gt; have taken a very cool step in this direction by hooking up a processor running Google&amp;rsquo;s Linux-based Android (the mobile phone operating system that underlies &lt;a href=&#34;http://gizmodo.com/5039741/t+mobile-android-htc-dream-launch-details-oct-13-199-w-2+year-contract-only&#34;&gt;T-Mobile&amp;rsquo;s G1 phone&lt;/a&gt;) to an e-ink display. MOTO&amp;rsquo;s &lt;a href=&#34;http://labs.moto.com/android-meets-e-ink/&#34;&gt;announcement&lt;/a&gt; about it includes a short video demo.&lt;/p&gt;
&lt;p&gt;As I suggested in a comment on their announcement, if MOTO got an ebook-reading program that understood the &lt;a href=&#34;http://www.idpf.org/&#34;&gt;EPUB&lt;/a&gt; format running on that Android processor—which I&amp;rsquo;m sure is much simpler than the work they&amp;rsquo;ve already done—it would make $400 for a Kindle look even dumber than it already looks.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By penn on &lt;a href=&#34;#comment-2277&#34;&gt;May 29, 2009 9:23 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been trying to decide which e-book reader to get for my mom. I figured I would have to get her both an ebook reader and a laptop, so she could download things (now she downloads ebooks from &lt;a href=&#34;http://www.ebook-search-queen.com/&#34;&gt;http://www.ebook-search-queen.com/&lt;/a&gt; ). Knowing that the Kindle doesn&amp;rsquo;t require her to go to her computer, or even have a wireless network setup, makes my decision easy. For her needs, it is the ideal item. Thank you for this!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>Getting started with Sesame</title>
      <link>https://www.bobdc.com/blog/getting-started-with-sesame/</link>
      <pubDate>Thu, 12 Feb 2009 09:15:09 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/getting-started-with-sesame/</guid>
      
      
      <description><div>Surprisingly easy.</div><div>&lt;p&gt;My &lt;a href=&#34;https://www.bobdc.com/blog/playing-with-some-rdf-stores&#34;&gt;efforts&lt;/a&gt; to set up and try RDF triplestores have been a bit frustrating. I won&amp;rsquo;t go into reasons here, because several of the efforts are on hold for now, but my attempts to set up and use &lt;a href=&#34;http://www.openrdf.org/&#34;&gt;Sesame&lt;/a&gt; went so quickly and easily that I wanted to write it up right away.&lt;/p&gt;
&lt;p&gt;My main goal with any of the triplestores is to load some RDF that will be stored persistently and then run some SPARQL queries against it. I can do some Java coding if I must, but I wanted to see how far I could get with each triplestore without doing any coding (and especially, no compiling). For bonus points, I wanted to see how much inferencing and OWL usage was possible, but in general I wanted to avoid special features in my initial research because I wanted to establish a baseline. (In an ideal world, OWL support would be part of the baseline!)&lt;/p&gt;
&lt;h2 id=&#34;id197086&#34;&gt;Installing and running Sesame&lt;/h2&gt;
&lt;p&gt;According to the &lt;a href=&#34;http://www.openrdf.org/doc/sesame2/users/ch06.html&#34;&gt;installation instructions&lt;/a&gt;, the Sesame server software requires Java 5 or later and &amp;ldquo;a Java Servlet Container that supports Java Servlet API 2.4 and Java Server Pages (JSP) 2.0, or newer&amp;rdquo;. They &amp;ldquo;recommend using a recent, stable version of Apache Tomcat&amp;rdquo;.&lt;/p&gt;
&lt;blockquote id=&#34;id197107&#34; class=&#34;pullquote&#34;&gt;It was very easy—the kind of &#34;it just works&#34; experience that&#39;s a particular pleasure to find in open source software.&lt;/blockquote&gt;
&lt;p&gt;I began by downloading and unzipping &lt;a href=&#34;http://sourceforge.net/project/showfiles.php?group_id=46509&amp;amp;package_id=168413&#34;&gt;the zip file of the 2.2.4 Sesame SDK&lt;/a&gt; and &lt;a href=&#34;http://tomcat.apache.org/download-60.cgi&#34;&gt;Apache Tomcat 6.0.18&lt;/a&gt;. The instructions in apache-tomcat-6.0.18\RUNNING.txt to get the Tomcat server up and running were simple and straightforward. To &lt;a href=&#34;http://www.openrdf.org/doc/sesame2/users/ch06.html&#34;&gt;install a Sesame server&lt;/a&gt; on top of Tomcat, I copied the two war files from openrdf-sesame-2.2.4\war to apache-tomcat-6.0.18\webapps. After I shut down and restarted Tomcat, sending my browser to http://localhost:8080/openrdf-workbench and http://localhost:8080/openrdf-sesame showed welcome screens about how these apps were running with no problem.&lt;/p&gt;
&lt;h2 id=&#34;id197166&#34;&gt;Using Sesame&lt;/h2&gt;
&lt;p&gt;The Workbench is the form-driven interface to Sesame. It let me create repositories, load remote or local data files into them, and then query them all by picking menu choices and filling out forms. It was very easy—the kind of &amp;ldquo;it just works&amp;rdquo; experience that&amp;rsquo;s a particular pleasure to find in open source software.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://www.openrdf.org/doc/sesame2/system/ch08.html&#34;&gt;REST HTTP protocol&lt;/a&gt; for doing all these things, which I tested using &lt;a href=&#34;http://curl.haxx.se/&#34;&gt;cURL&lt;/a&gt;, was well-documented and easy to figure out. After I&amp;rsquo;d created a &amp;ldquo;test1&amp;rdquo; repository using the Workbench, the following cURL command line listed repositories and showed that test1 was one of them (all curl command lines shown here include extra carriage returns for readability):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -H &amp;quot;Accept: application/sparql-results+xml, */*;q=0.5&amp;quot; 
  http://localhost:8080/openrdf-sesame/repositories
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following loaded some RDF into the test1 repository:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -T rdftest2.rdf -H &amp;quot;Content-Type: application/rdf+xml;charset=UTF-8&amp;quot;
  http://localhost:8080/openrdf-sesame/repositories/test1/statements
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once the file&amp;rsquo;s triples were loaded, the following request sent a &lt;a href=&#34;http://www.xs4all.nl/~jlpoutre/BoT/Javascript/Utils/endecode.html&#34;&gt;URL-encoded&lt;/a&gt; version of the SPARQL query &amp;ldquo;SELECT DISTINCT ?p WHERE {?s ?p ?o}&amp;rdquo; to the test1 repository:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -H &amp;quot;Accept:  application/sparql-results+xml, */*;q=0.5&amp;quot; 
  http://localhost:8080/openrdf-sesame/repositories/test1?query=
  SELECT%20DISTINCT%20%3Fp%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The server sent back a &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-XMLres-20080115/&#34;&gt;SPARQL query result format&lt;/a&gt; version of the response.&lt;/p&gt;
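&lt;p&gt;An abbreviated version of such a response looks something like this, with one result element per row and one binding per variable (the predicate value shown here is just illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&amp;lt;sparql xmlns=&amp;quot;http://www.w3.org/2005/sparql-results#&amp;quot;&amp;gt;
  &amp;lt;head&amp;gt;
    &amp;lt;variable name=&amp;quot;p&amp;quot;/&amp;gt;
  &amp;lt;/head&amp;gt;
  &amp;lt;results&amp;gt;
    &amp;lt;result&amp;gt;
      &amp;lt;binding name=&amp;quot;p&amp;quot;&amp;gt;
        &amp;lt;uri&amp;gt;http://xmlns.com/foaf/0.1/firstName&amp;lt;/uri&amp;gt;
      &amp;lt;/binding&amp;gt;
    &amp;lt;/result&amp;gt;
  &amp;lt;/results&amp;gt;
&amp;lt;/sparql&amp;gt;
&lt;/code&gt;&lt;/pre&gt;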
&lt;h2 id=&#34;id197266&#34;&gt;Inferencing&lt;/h2&gt;
&lt;p&gt;The steps up to this point were all so easy that I decided to push my luck and try some inferencing. I wanted to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Store someone&amp;rsquo;s home phone number and mobile phone number&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Declare that both of these properties were subproperties of phone&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Issue a SPARQL query saying &amp;ldquo;give me any phone numbers for this person&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When you create a Sesame repository, there are &lt;a href=&#34;http://www.openrdf.org/doc/sesame2/users/ch07.html#section-console-repository-creation&#34;&gt;nine choices&lt;/a&gt; for the type of store ranging from &amp;ldquo;In Memory Store&amp;rdquo; to &amp;ldquo;PostgreSQL RDF Store&amp;rdquo; and &amp;ldquo;Remote RDF Store&amp;rdquo;. For my inferencing tests, I picked &amp;ldquo;Native Java Store RDF Schema&amp;rdquo;, called the repository rdftest1, and uploaded the following file into it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;rdf:RDF xmlns:o=&amp;quot;urn:schemas-microsoft-com:office:outlook#&amp;quot;
  xmlns:f=&amp;quot;http://xmlns.com/foaf/0.1/&amp;quot;
  xmlns:rdfs=&amp;quot;http://www.w3.org/2000/01/rdf-schema#&amp;quot;
  xmlns:rdf=&amp;quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;quot;&amp;gt;


  &amp;lt;rdf:Description rdf:about=&amp;quot;http://localhost:2020/addrbook/RichardMutt&amp;quot;&amp;gt;
    &amp;lt;f:firstName&amp;gt;Richard&amp;lt;/f:firstName&amp;gt;
    &amp;lt;f:surname&amp;gt;Mutt&amp;lt;/f:surname&amp;gt;
    &amp;lt;o:homePhone&amp;gt;463-477-1322&amp;lt;/o:homePhone&amp;gt;
    &amp;lt;o:mobilePhone&amp;gt;463-215-8470&amp;lt;/o:mobilePhone&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;


  &amp;lt;rdf:Property rdf:about=&amp;quot;urn:schemas-microsoft-com:office:outlook#homePhone&amp;quot;&amp;gt;
    &amp;lt;rdfs:subPropertyOf rdf:resource=&amp;quot;http://xmlns.com/foaf/0.1/phone&amp;quot; /&amp;gt;
  &amp;lt;/rdf:Property&amp;gt;


  &amp;lt;rdf:Property
    rdf:about=&amp;quot;urn:schemas-microsoft-com:office:outlook#mobilePhone&amp;quot;&amp;gt;
    &amp;lt;rdfs:subPropertyOf rdf:resource=&amp;quot;http://xmlns.com/foaf/0.1/phone&amp;quot; /&amp;gt;
  &amp;lt;/rdf:Property&amp;gt;


&amp;lt;/rdf:RDF&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&amp;rsquo;s say I want to call Richard but can&amp;rsquo;t remember which phone numbers I have for him. The following query asks for the type and number of any phone numbers I have for him, and the little table that follows the query shows what Sesame returned:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX f:&amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;
PREFIX o:&amp;lt;urn:schemas-microsoft-com:office:outlook#&amp;gt;


SELECT DISTINCT ?phoneType ?phoneNum WHERE { 
  ?s f:phone ?phoneNum; 
  ?phoneType ?phoneNum; 
  f:firstName &amp;quot;Richard&amp;quot;;
  f:surname &amp;quot;Mutt&amp;quot;.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;phoneType&lt;/th&gt;&lt;th&gt;phoneNum&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;a href=&#34;explore?resource=o%3AhomePhone&#34;&gt;o:homePhone&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a href=&#34;explore?resource=%22463-477-1322%22&#34;&gt;&amp;ldquo;463-477-1322&amp;rdquo;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a href=&#34;explore?resource=f%3Aphone&#34;&gt;f:phone&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a href=&#34;explore?resource=%22463-477-1322%22&#34;&gt;&amp;ldquo;463-477-1322&amp;rdquo;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a href=&#34;explore?resource=o%3AmobilePhone&#34;&gt;o:mobilePhone&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a href=&#34;explore?resource=%22463-215-8470%22&#34;&gt;&amp;ldquo;463-215-8470&amp;rdquo;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a href=&#34;explore?resource=f%3Aphone&#34;&gt;f:phone&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a href=&#34;explore?resource=%22463-215-8470%22&#34;&gt;&amp;ldquo;463-215-8470&amp;rdquo;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;(The returned data looks much better with the CSS included with openrdf-workbench.war. A &amp;ldquo;View Source&amp;rdquo; of the returned data reveals that it&amp;rsquo;s provided in the SPARQL query result format with a processing instruction at the top pointing to an XSLT stylesheet that works with the CSS to display the results nicely in the browser.) The links in the copy of the returned data above lead nowhere, but when you&amp;rsquo;re using it with a running copy of Workbench they let you explore around the data in the repository.&lt;/p&gt;
&lt;h2 id=&#34;id197750&#34;&gt;Next&lt;/h2&gt;
&lt;p&gt;It was particularly cool how many of these steps worked the first or second time I tried them, with no configuration required. I can think of more things I&amp;rsquo;d like to try with Sesame, but I&amp;rsquo;m going to keep trying out more triplestores and reporting on the ones that I have much luck with.&lt;/p&gt;
&lt;h2 id=&#34;14-comments&#34;&gt;14 Comments&lt;/h2&gt;
&lt;p&gt;By Bruce D&amp;rsquo;Arcus on &lt;a href=&#34;#comment-2219&#34;&gt;February 12, 2009 11:17 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;On the &amp;ldquo;really easy to setup a SPARQL endpoint&amp;rdquo; front, have you taken a look at &lt;a href=&#34;http://arc.semsol.org/&#34;&gt;ARC&lt;/a&gt;?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2220&#34;&gt;February 12, 2009 11:39 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Bruce, I will add that to my list, but judging from &lt;a href=&#34;http://arc.semsol.org/docs/v2/getting_started&#34;&gt;http://arc.semsol.org/docs/v2/getting_started&lt;/a&gt;, it looks like I would need some familiarity with PHP first, which I don&amp;rsquo;t have, so I may not get to it for a while.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-2221&#34;&gt;February 12, 2009 3:41 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Many thanks for undertaking this survey, the results are really interesting.&lt;/p&gt;
&lt;p&gt;I tried a different task with Sesame the other day, but didn&amp;rsquo;t (yet) win. OK I was tired and maybe missed it in the documentation &amp;hellip; but all I wanted to do was load a SKOS rdf/xml document from disk, and explore it via API or (preferably) SPARQL. Couldn&amp;rsquo;t figure out how.&lt;/p&gt;
&lt;p&gt;I wonder whether FOAF+DOAP descriptions of the various SW toolkits could be made available from their providers. Perhaps even with extensions to point out where in their various documents, wikis etc., answers to these common questions can be found.&lt;/p&gt;
&lt;p&gt;Also, +1 on reviewing ARC. Assuming you have access to a server with basic PHP facilities, and the username/password for a MySQL db, it should be pretty smooth. ARC is a very impressive package in my experience&amp;hellip;&lt;/p&gt;
&lt;p&gt;By Bruce D&amp;rsquo;Arcus on &lt;a href=&#34;#comment-2222&#34;&gt;February 12, 2009 8:19 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Yeah, I have a problem with PHP as well. But less so than Java ;-)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.troven.com.au&#34; title=&#34;http://www.troven.com.au&#34;&gt;Troven&lt;/a&gt; on &lt;a href=&#34;#comment-2440&#34;&gt;February 2, 2010 6:20 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sesame is a great tool - and so is ARC. ARC has more power when extracting and presenting information in keeping with its PHP heritage. Sesame is a very powerful data modelling and storage engine - SAIL being the crown jewels.&lt;/p&gt;
&lt;p&gt;By Mihir Shivkumar Wagle on &lt;a href=&#34;#comment-2453&#34;&gt;March 2, 2010 10:53 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It worked for the native store. But, it didn&amp;rsquo;t return any results when I tried it with MySQL or PostGres :(&lt;/p&gt;
&lt;p&gt;By Stratos on &lt;a href=&#34;#comment-2613&#34;&gt;August 29, 2010 9:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I tried your installation steps and they were very helpful indeed. I reach this problem though:&lt;/p&gt;
&lt;p&gt;I have a working apache tomcat.&lt;/p&gt;
&lt;p&gt;I downloaded the .jar i got the openrdf-sesame.war and the openrdf-workbench.war from the /war directory and i copied them to my apache tomcat directory /webapps/&lt;/p&gt;
&lt;p&gt;After a while i saw 2 new folders created openrdf-workbench/ and openrdf-sesame/&lt;/p&gt;
&lt;p&gt;I hit my server&amp;rsquo;s address &lt;a href=&#34;http://localhost:8080/openrdf-sesame/&#34;&gt;http://localhost:8080/openrdf-sesame/&lt;/a&gt; and I get:&lt;/p&gt;
&lt;p&gt;HTTP Status 404 -&lt;/p&gt;
&lt;p&gt;type Status report&lt;/p&gt;
&lt;p&gt;message&lt;/p&gt;
&lt;p&gt;description The requested resource () is not available.&lt;br /&gt;
Apache Tomcat/7.0.0&lt;/p&gt;
&lt;p&gt;Same with workbench.&lt;/p&gt;
&lt;p&gt;Any help would be really appreciated!&lt;/p&gt;
&lt;p&gt;By Stratos on &lt;a href=&#34;#comment-2614&#34;&gt;August 29, 2010 9:35 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I forgot to write it (i think). I restarted my tomcat before i checked the address&lt;/p&gt;
&lt;p&gt;Thanks again.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-2615&#34;&gt;August 29, 2010 11:06 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Stratos,&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s been a while since I played with it. openrdf.org seems to be down right now, but Google shows that it has a link to a mailing list which would probably be the best place to ask this.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://home.elka.pw.edu.pl/~mroj&#34; title=&#34;http://home.elka.pw.edu.pl/~mroj&#34;&gt;michał&lt;/a&gt; on &lt;a href=&#34;#comment-2616&#34;&gt;September 2, 2010 10:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Stratos,&lt;br /&gt;
I had the same problem (Tomcat 7.0.2 + Sesame 2.3.2). I tried a lot of things but nothing helped. So I decided to install a version as similar as possible to Bob&amp;rsquo;s settings. On 2nd Sept 2010 that was Tomcat 6.0.29 + Sesame 2.2.4&amp;hellip; and everything went fine - exactly as in this blog.&lt;/p&gt;
&lt;p&gt;michał&lt;/p&gt;
&lt;p&gt;By Jim Smart on &lt;a href=&#34;#comment-2620&#34;&gt;September 4, 2010 5:32 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I tried installing Sesame 2.3.2 into Tomcat 7.0.2-beta earlier today, and it didn&amp;rsquo;t work - I downgraded Tomcat 7 to 6.0.29 and then it worked for me :)&lt;/p&gt;
&lt;p&gt;Sesame 2.3.2 just doesn&amp;rsquo;t seem to work out-of-the-box on the current betas of Tomcat 7&lt;/p&gt;
&lt;p&gt;By Stratos on &lt;a href=&#34;#comment-2621&#34;&gt;September 11, 2010 1:34 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I used Tomcat 6 with the latest version of sesame2 and it worked like a charm. Thank you for the help everyone! :)&lt;/p&gt;
&lt;p&gt;By Jamshaid Ashraf on &lt;a href=&#34;#comment-2643&#34;&gt;October 4, 2010 7:00 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Yes, there is some problem if you deploy Sesame 2.3.2 on Tomcat 7.&lt;/p&gt;
&lt;p&gt;The best way to save time is to deploy Sesame 2.3.2 with Tomcat 6; then everything works as described in this blog.&lt;/p&gt;
&lt;p&gt;jamshaid&lt;/p&gt;
&lt;p&gt;By Pafka on &lt;a href=&#34;#comment-2777&#34;&gt;January 26, 2011 9:58 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi guys,&lt;br /&gt;
Did you try with Tomcat 7 the following URL:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://localhost:8080/openrdf-workbench/&#34;&gt;http://localhost:8080/openrdf-workbench/&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>What can publishing and semantic web technology offer to each other? </title>
      <link>https://www.bobdc.com/blog/publishing-and-semantic-web-te/</link>
      <pubDate>Fri, 06 Feb 2009 10:35:24 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/publishing-and-semantic-web-te/</guid>
      
      
<description><div>That&#39;s &#34;semantic web technology&#34;, not &#34;the Semantic Web&#34;.</div><div>&lt;p&gt;Many have wondered about what the semantic web and publishing can offer each other. (By &amp;ldquo;publishing&amp;rdquo; here, I mean &amp;ldquo;making content available in one medium or another, ideally to make money&amp;rdquo;.) After following a lot of writing and discussions in these two worlds—and they are surprisingly separate worlds—I have a few ideas and wanted to write them up where people could comment on them.&lt;/p&gt;
&lt;h2 id=&#34;id197060&#34;&gt;What can the publishing world offer to the semantic web?&lt;/h2&gt;
&lt;p&gt;The less obvious, but to me, the clearest win is what the publishing world can offer to the semantic web: the lessons learned from long practical experience with developing and applying taxonomies, such as identifying useful concepts, naming them, identifying the useful relationships between them, and mapping units of content to those concepts. Many of the if-you-build-it-they-will-come ontologies out there seem to be thrown together in the hope that someone will use them, with no examination of use cases beyond the needs of the individual developers who created them—and sometimes, not even a close look at those needs. Semantic web technology gives us the standards and tools to assign descriptive terms to resources so that people (and software agents) who need those resources can identify them more easily; taxonomy professionals know about best practices for picking good terms to assign that will help the larger project meet specific goals. (For an example of this thinking, see &lt;a href=&#34;http://web.fumsi.com/go/article/manage/3126&#34;&gt;part 1&lt;/a&gt; and &lt;a href=&#34;http://web.fumsi.com/go/article/manage/3198&#34;&gt;part 2&lt;/a&gt; of the article &amp;ldquo;Creating User-Centred Taxonomies&amp;rdquo; from the FUMSI group, which is just one of the resources I&amp;rsquo;ve learned about since I began following the &lt;a href=&#34;http://finance.groups.yahoo.com/group/TaxoCoP/&#34;&gt;Taxonomy Community of Practice&lt;/a&gt;.)&lt;/p&gt;
&lt;h2 id=&#34;id197122&#34;&gt;What can the semantic web offer to the world of publishing?&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve heard discussions in which publishers picture machine-readable encoded semantics of content driving customers to that content, but this sounds a little pie-in-the-sky for now. (I&amp;rsquo;d be happy if someone could point me to indications that working examples of this using semantic technology are imminent.) Publishers who want more people to find their content on the web would be better off putting greater effort into basic search engine optimization, and will find solid practical advice in Jamie Lowe&amp;rsquo;s &lt;a href=&#34;http://www.youtube.com/watch?v=gAkEilpmdSE&#34;&gt;SEO for Publishers&lt;/a&gt; presentation.&lt;/p&gt;
&lt;p&gt;Semantic web technologies, as opposed to the grander idea of the Semantic Web itself, offer tools that can help publishers assemble and distribute their content more efficiently, and I think that this low-hanging fruit is a better place to start, if only to get a better idea of the technology&amp;rsquo;s strengths and weaknesses.&lt;/p&gt;
&lt;blockquote id=&#34;id197157&#34; class=&#34;pullquote&#34;&gt;What can an aggregator/publisher do to take advantage of content metadata when the metadata fields for one source&#39;s articles don&#39;t line up with the fields in another source&#39;s articles? &lt;/blockquote&gt;
&lt;p&gt;More and more publishing these days is about aggregation. When so much content is available from so many places for free, we&amp;rsquo;re more likely to pay money for (or put up with ads next to) content selected by people whose judgment we trust. There are many models for aggregation, ranging from print publications such as &lt;a href=&#34;http://www.utne.com/daily.aspx&#34;&gt;Utne Reader&lt;/a&gt; to grand old online services such as &lt;a href=&#34;http://w3.nexis.com/new/&#34;&gt;Nexis&lt;/a&gt; and &lt;a href=&#34;http://factiva.com/&#34;&gt;Factiva&lt;/a&gt; to more Web 2.0-oriented approaches such as &lt;a href=&#34;http://digg.com/&#34;&gt;Digg&lt;/a&gt; and &lt;a href=&#34;http://www.reddit.com/&#34;&gt;Reddit&lt;/a&gt;. Now more than ever, publishers know that metadata makes it easier for both publishing staff and readers to track and connect relevant content, but a problem for aggregators is that while they&amp;rsquo;re happy to get metadata with the content that they collect, different content sources will send different sets of metadata.&lt;/p&gt;
&lt;p&gt;There may be certain fields of metadata that most content chunks have in common, such as Dublin Core fields, but what can an aggregator/publisher do to take advantage of content metadata when the metadata fields for one source&amp;rsquo;s articles don&amp;rsquo;t line up with the fields in another source&amp;rsquo;s articles? Or when the same thing happens with images?&lt;/p&gt;
&lt;p&gt;According to traditional practice, the aggregator should put this data into a database that may be built into a CMS or set up as a standalone relational system such as Oracle, MySQL, or SQL Server. In either case, a crucial step in the setup part is deciding what fields you want to track.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s say you define 10 fields of metadata to track. If an article arrives with 12 fields of metadata, but only 8 match fields that you&amp;rsquo;ve defined, you store those 8, throw out the other 4, and have 2 blanks left over. If, over time, you find yourself throwing out a particular field that more and more content providers have been including with their articles and images, you can modify your database schema or revise the customized fields in your CMS to start collecting that field from that point on, but this is rarely a quick and simple procedure, and all the values delivered to you for that field in content you&amp;rsquo;ve received up to then are still lost.&lt;/p&gt;
&lt;p&gt;The kind of technology developed to support semantic web projects offers an alternative. The RDF triples at the base of semantic technology let you store the fact that a particular resource (for example, a JPEG file) has a field with a particular name (for example, &amp;ldquo;resolution&amp;rdquo;) and a particular value for that field (for example, &amp;ldquo;72dpi&amp;rdquo;.) Actual resource and field names must be URLs to avoid confusion (I discussed this a bit &lt;a href=&#34;https://www.bobdc.com/blog/publishers-and-semantic-web-te&#34;&gt;last week&lt;/a&gt;); if you can do this, you can store any metadata about anything. The {resource, field name, field value} combination (more technically known as a subject/predicate/object) is called a triple, and the database managers that store them are called triplestores. Unlike relational database managers and production XML systems, the technology for working with these triples doesn&amp;rsquo;t need to know about field names in advance. The flexibility that this offers lets developers fit applications around their data instead of shoehorning their data into the current application&amp;rsquo;s requirements, which can put a lot of constraints on future possibilities for both the applications and the data.&lt;/p&gt;
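&lt;p&gt;To make the idea concrete, here&amp;rsquo;s a minimal Python sketch of a schema-less triple store: every fact is a {resource, field name, field value} tuple, and a query is just a pattern with blanks. (The example.com URLs and metadata values are made up for illustration, and a real triplestore adds indexing, SPARQL support, and persistence on top of this core idea.)&lt;/p&gt;

```python
# A minimal, illustrative triplestore; the example.com URLs are hypothetical.
# Each fact is a (subject, predicate, object) triple. No schema is declared
# up front, so an article that arrives with 12 metadata fields and one that
# arrives with 8 can live side by side with nothing thrown out.

triples = set()

def add(subject, predicate, obj):
    triples.add((subject, predicate, obj))

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the non-None parts of the pattern."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# Metadata from two sources with different field sets:
add("http://example.com/img/a.jpg", "http://example.com/ns/resolution", "72dpi")
add("http://example.com/img/a.jpg", "http://purl.org/dc/elements/1.1/title", "Train at dawn")
add("http://example.com/img/b.jpg", "http://example.com/ns/photographer", "J. Smith")

# All metadata recorded about a.jpg, regardless of which fields it came with:
print(query(subject="http://example.com/img/a.jpg"))
```

&lt;p&gt;The point of the sketch is that the store never had to be told about a &amp;ldquo;photographer&amp;rdquo; field in advance; the field name arrived with the data.&lt;/p&gt;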
&lt;p&gt;This flexibility does offer the possibility that two publishers might use different field names for the same concept, as Dale Waldt described in the posting I responded to last week, but the OWL part of the semantic web technology stack can help to account for that. For example, what if two publishers use different URLs to indicate the title of an article? If one uses a term from the Adobe &lt;a href=&#34;http://www.adobe.com/products/xmp/&#34;&gt;XMP&lt;/a&gt; namespace to assign an article a &lt;a href=&#34;http://ns.adobe.com/xap/1.0/Title&#34;&gt;http://ns.adobe.com/xap/1.0/Title&lt;/a&gt; value of &amp;ldquo;The Trans-Siberian Railroad&amp;rdquo;, and the other publisher assigns another article an &lt;a href=&#34;http://purl.org/dc/elements/1.1/title&#34;&gt;http://purl.org/dc/elements/1.1/title&lt;/a&gt; value of &amp;ldquo;Across Canada by Train&amp;rdquo;, a bit of OWL (as demonstrated in my response to Dale) can show that these terms mean the same thing so that a single query for titles retrieves both articles.&lt;/p&gt;
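&lt;p&gt;As a sketch of what that equivalence buys you (the article URLs are made up, and the equivalence is applied by hand here rather than by an OWL reasoner):&lt;/p&gt;

```python
# Two publishers use different predicates for "title of an article".
# The pub1.example/pub2.example article URLs are invented for illustration.

XMP_TITLE = "http://ns.adobe.com/xap/1.0/Title"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

triples = [
    ("http://pub1.example/articles/17", XMP_TITLE, "The Trans-Siberian Railroad"),
    ("http://pub2.example/articles/309", DC_TITLE, "Across Canada by Train"),
]

# An OWL reasoner would derive this from an owl:equivalentProperty
# assertion; here we record the equivalence as a simple lookup table.
equivalent = {DC_TITLE: {DC_TITLE, XMP_TITLE}}

def titles(predicate):
    """Query for a predicate, expanded to its declared equivalents."""
    wanted = equivalent.get(predicate, {predicate})
    return [(s, o) for s, p, o in triples if p in wanted]

# A single query for Dublin Core titles finds both articles:
print(titles(DC_TITLE))
```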
&lt;p&gt;If you as an aggregator feel that it would be easier for your suppliers to use a more normalized set of vocabulary terms, get them together and talk about it. This is what standards groups such as &lt;a href=&#34;http://www.oasis-open.org/&#34;&gt;OASIS&lt;/a&gt; and &lt;a href=&#34;http://www.idealliance.org/&#34;&gt;IDEAlliance&lt;/a&gt; are for. (IDEAlliance&amp;rsquo;s &lt;a href=&#34;http://www.prismstandard.org/&#34;&gt;PRISM&lt;/a&gt; standard, whose motto is &amp;ldquo;Developing a standard XML metadata vocabulary for the publishing industry&amp;rdquo;, is just such a group, and they include an RDF profile as part of their standard.)&lt;/p&gt;
&lt;h2 id=&#34;id197612&#34;&gt;Getting More Semantic&lt;/h2&gt;
&lt;p&gt;If I&amp;rsquo;m recommending semantic web tools to help you keep track of things such as the resolution of your digital images, you might ask &amp;ldquo;what&amp;rsquo;s so semantic about that?&amp;rdquo; It&amp;rsquo;s not particularly semantic, but it uses semantic web technology to track metadata that helps your staff and customers more easily find the content that they need, so it does help toward the greater goal. If you want to push this technology a little further to incorporate metadata about the semantics of the content—without spending money on software—look into &lt;a href=&#34;http://www.opencalais.com/&#34;&gt;OpenCalais&lt;/a&gt;, which analyzes content and returns a copy with RDF representations of key terms it found and information about what classes those key terms fall into (for example, that &amp;ldquo;Slumdog Millionaire&amp;rdquo; is a Movie or that &amp;ldquo;Golden Globe&amp;rdquo; is an EntertainmentAwardEvent). I played with the first release of OpenCalais to create the &lt;a href=&#34;http://www.snee.com/blogbigpicture/&#34;&gt;BlogBigPicture&lt;/a&gt; website, which uses this metadata to ease navigation of news about Hollywood gossip, investing, the British Premier League, world business, and U.S. politics. You can take the metadata that OpenCalais returns and store it in the triplestore of metadata about your content as easily as you can store information about the resolution of your digital images.&lt;/p&gt;
&lt;p&gt;Don&amp;rsquo;t let the grander ideas about semantics distract you too much just yet, though. Prototypes aimed at lower-hanging fruit will give you a better focus on which of the grand ideas can help your business. There&amp;rsquo;s plenty of free software available to create these prototypes, and even Oracle provides support for triplestores nowadays. So, if you&amp;rsquo;re interested in what semantic web technology can do for your publishing business, start thinking about some inexpensive short-term projects that will give you a better idea of the long-term possibilities.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ivan-herman.net/&#34; title=&#34;http://www.ivan-herman.net/&#34;&gt;Ivan Herman&lt;/a&gt; on &lt;a href=&#34;#comment-2216&#34;&gt;February 7, 2009 3:43 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;On the issue of what the Library world can give to the Semantic Web: another thing is a set of stable URI-s at least in their domain of discourse. A question I have asked before: what is the URI that I can use on the SW to make statements on Bach&amp;rsquo;s Hohe Messe or Thomas Mann&amp;rsquo;s novel &amp;lsquo;Joseph and his brothers&amp;rsquo;? Sure, some of these entities are on wikipedia, hence one can use their DBpedia URI-s. But for many items in the literary or musical world, just to take these two examples, this would not work. References set up to those &amp;lsquo;works&amp;rsquo; by major libraries (with suitable sameAs statements if there are different libraries giving URIs to the same work) would be great, and they are in a unique position to do that&amp;hellip;&lt;/p&gt;
&lt;p&gt;Cheers&lt;/p&gt;
&lt;p&gt;Ivan&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2217&#34;&gt;February 7, 2009 9:48 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Ivan,&lt;/p&gt;
&lt;p&gt;Identifying specific editions with a URI is easy: &lt;a href=&#34;http://www.rfc-archive.org/getrfc.php?rfc=3187&#34;&gt;http://www.rfc-archive.org/getrfc.php?rfc=3187&lt;/a&gt;, e.g. urn:isbn:1400040019.&lt;/p&gt;
&lt;p&gt;For a single URI to represent a work, it&amp;rsquo;s a lot tougher&amp;ndash;are you looking for a single URI to represent &amp;ldquo;Joseph and his Brothers&amp;rdquo;, &amp;ldquo;Joseph und seine Brüder&amp;rdquo;, &amp;ldquo;Joseph and His Brothers: The Stories of Jacob, Young Joseph, Joseph in Egypt, Joseph the Provider&amp;rdquo;, and &amp;ldquo;Joseph the Provider (Joseph and his Brothers, Young Joseph, Joseph in Egypt)&amp;rdquo;, (the latter two being titles on Amazon)?&lt;/p&gt;
&lt;p&gt;I could ask questions like &amp;ldquo;If an author revises and/or retitles a work in his or her own lifetime, should the new version get a new URI?&amp;rdquo; but some committee should be able to work out standards for something like that. The bigger, tougher question is the proper jurisdiction of URI assignment for a particular work:&lt;br /&gt;
if a publisher assigns an ISBN number to an edition of a work that they&amp;rsquo;re publishing, who would assign a specific URI that covers multiple editions from multiple publishers to a work like this Mann novel?&lt;/p&gt;
&lt;p&gt;thanks,&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://thewebsemantic.com&#34; title=&#34;http://thewebsemantic.com&#34;&gt;Taylor&lt;/a&gt; on &lt;a href=&#34;#comment-2218&#34;&gt;February 8, 2009 9:37 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Don&amp;rsquo;t let the grander ideas about semantics distract you too much just yet, though. Prototypes aimed at lower-hanging fruit&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Good advice Bob. I&amp;rsquo;ve set aside my own goal of being the URL provider Ivan mentions (but for the travel space) in favor of creating an internal semantic tool for tracking software, machines, and processes. With tangible, visible benefits to show, I&amp;rsquo;ve been able to engage more support and promote the idea.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://ibrg.zoo.ox.ac.uk&#34; title=&#34;http://ibrg.zoo.ox.ac.uk&#34;&gt;Dr David Shotton&lt;/a&gt; on &lt;a href=&#34;#comment-2259&#34;&gt;April 30, 2009 8:43 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Dear Bob,&lt;/p&gt;
&lt;p&gt;Thanks for your useful comments. I would like to bring to your attention three articles that I have published in April 2009 on the subject of semantic publishing, in the hope of contributing to this debate, detailed at &lt;a href=&#34;http://imageweb.zoo.ox.ac.uk/pub/2008/publications/Shotton_Articles_on_Semantic_Publishing.pdf&#34;&gt;http://imageweb.zoo.ox.ac.uk/pub/2008/publications/Shotton_Articles_on_Semantic_Publishing.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;An excellent critique of what we describe in the first of these articles is given by Rod Page at &lt;a href=&#34;http://iphylo.blogspot.com/2009/04/semantic-publishing-towards-real.html&#34;&gt;http://iphylo.blogspot.com/2009/04/semantic-publishing-towards-real.html&lt;/a&gt;, in which, in essence, he correctly says we did not go far enough in terms of making machine-readable data and metadata available, thereby failing to contribute to an ecosystem of linked data (&lt;a href=&#34;http://linkeddata.org/&#34;&gt;http://linkeddata.org/&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The third article, on our Citation Typing Ontology, also comments indirectly on the problem discussed in your blog by Ivan and yourself about URIs for particular works. I believe the issues surrounding this problem of URIs are best clarified by adopting the FRBR classification (&lt;a href=&#34;http://www.ifla.org/publications/functional-requirements-for-bibliographic-records&#34;&gt;http://www.ifla.org/publications/functional-requirements-for-bibliographic-records&lt;/a&gt;; &lt;a href=&#34;http://www.frbr.org/&#34;&gt;http://www.frbr.org/&lt;/a&gt;; &lt;a href=&#34;http://en.wikipedia.org/wiki/FRBR&#34;&gt;http://en.wikipedia.org/wiki/FRBR&lt;/a&gt;), developed by librarians to distinguish works, expressions and manifestations. URIs are most conveniently used to refer to expressions of works - the same items to which DOIs refer (&lt;a href=&#34;http://www.doi.org/&#34;&gt;http://www.doi.org/&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;I hope you find these papers interesting and helpful.&lt;/p&gt;
&lt;p&gt;Kind regards,&lt;/p&gt;
&lt;p&gt;David&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Publishers and semantic web technology</title>
      <link>https://www.bobdc.com/blog/publishers-and-semantic-web-te/</link>
      <pubDate>Thu, 29 Jan 2009 13:47:14 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/publishers-and-semantic-web-te/</guid>
      
      
      <description><div>A response to Dale Waldt&#39;s Gilbane XML posting on semantics and the web.</div><div>&lt;p&gt;My old friend Dale Waldt (I remember, immediately after the announcement of the existence of XML at SGML 1996, going up to my then-coworker Dale and asking &amp;ldquo;So what do we think?&amp;rdquo;) recently posted an entry on the Gilbane XML blog titled &lt;a href=&#34;http://gilbane.com/xml/2009/01/why-adding-semantics-to-web-da.html&#34;&gt;Why Adding Semantics to Web Data is Difficult&lt;/a&gt;. A few days ago I posted a comment saying that the things that he saw as missing from semantic technologies are actually already there and working well, but my reply hasn&amp;rsquo;t shown up yet, so after a bit of revision, I&amp;rsquo;m putting it here. For my blog entry categories, I&amp;rsquo;ve put this under &amp;ldquo;Publishing&amp;rdquo; because most of what I&amp;rsquo;ve written below is already familiar to people in the semantic web world, but not as widely known in the publishing world.&lt;/p&gt;
&lt;p&gt;Dale wrote:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Consider though, that the schema in use can tell us the names of semantically defined elements, but not necessarily their meaning. I can tell you something about a piece of data by using the &amp;lt;income&amp;gt; tag, but how, in a schema can I tell you it is a net &amp;lt;income&amp;gt; calculated using the guidelines of US Internal Revenue Service, and therefore suitable for eFiling my tax return? For that matter, one system might use the element type name &amp;lt;net_income&amp;gt; while another might use &amp;lt;inc&amp;gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is why the semantic web is built around URLs, not just element names. If someone refers to a &amp;ldquo;title&amp;rdquo; and you don&amp;rsquo;t know whether that person is an HR administrator who means &amp;ldquo;job title&amp;rdquo; or a realtor referring to the deed to a piece of property, you don&amp;rsquo;t know what they mean. However, if I refer to a &lt;a href=&#34;http://purl.org/dc/elements/1.1/title&#34;&gt;http://purl.org/dc/elements/1.1/title&lt;/a&gt;, you know that I mean the title of a work or resource, because the URL makes it clear that I&amp;rsquo;m referring to the Dublin Core sense of the term.&lt;/p&gt;
&lt;blockquote id=&#34;id197068&#34; class=&#34;pullquote&#34;&gt;The things that Dale saw as missing from semantic technologies are actually already there and working well.&lt;/blockquote&gt;
&lt;p&gt;As I understand it, XBRL&amp;rsquo;s goal was not to standardize the vocabularies of element type names as much as to standardize ways of identifying them. For example, in GE&amp;rsquo;s XBRL financial statement, they chose to identify net income with the URL &lt;a href=&#34;http://www.xbrl.org/us/fr/common/pte/2005-02-28#usfr-pte:NetIncome&#34;&gt;http://www.xbrl.org/us/fr/common/pte/2005-02-28#usfr-pte:NetIncome&lt;/a&gt; and have this &lt;a href=&#34;http://www.secinfo.com/%24/SEC/Filing.asp?D=17Je.vx.5&#34;&gt;declared in a filed document&lt;/a&gt;. Instead of encouraging everyone to create their own new vocabularies, though, the XBRL effort did create a &lt;a href=&#34;http://xbrl.us/pages/us-gaap.aspx&#34;&gt;set of US GAAP taxonomies&lt;/a&gt;, and these are forming a core set of documented, commonly understood terminology for U.S. accounting.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How will we know that elements labeled with &amp;lt;net_income&amp;gt; and &amp;lt;inc&amp;gt; are the same and should be handled as such?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Let&amp;rsquo;s assume that company X uses the term &amp;ldquo;net_income&amp;rdquo; and company Y uses the term &amp;ldquo;inc&amp;rdquo;. When they publicly define what they mean by these terms using OWL ontologies or XBRL taxonomies, they avoid the confusion you describe by defining them with URLs, just as the OCLC did for Dublin Core terms, so let&amp;rsquo;s say the terms&amp;rsquo; full names are &lt;a href=&#34;http://www.x.com/ns/xbrl/net_income&#34;&gt;http://www.x.com/ns/xbrl/net_income&lt;/a&gt; and &lt;a href=&#34;http://www.y.com/some/path/inc&#34;&gt;http://www.y.com/some/path/inc&lt;/a&gt;. (Of course, if an XML document includes the namespace declarations xmlns:x=&amp;quot;http://www.x.com/ns/xbrl/&amp;quot; and xmlns:y=&amp;quot;http://www.y.com/some/path/&amp;quot;, the element names can use the abbreviations x:net_income and y:inc.)&lt;/p&gt;
&lt;p&gt;The following bit of OWL asserts that they&amp;rsquo;re both the same as GE&amp;rsquo;s term for net income, and a SPARQL query that uses the GE URL to say &amp;ldquo;get me net income figures&amp;rdquo; will get the others as well:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;owl:ObjectProperty 
  rdf:about=&amp;quot;http://www.xbrl.org/us/fr/common/pte/2005-02-28#usfr-pte:NetIncome&amp;quot;&amp;gt;


  &amp;lt;owl:equivalentProperty&amp;gt;
    &amp;lt;owl:DatatypeProperty rdf:about=&amp;quot;http://www.x.com/ns/xbrl/net_income&amp;quot;/&amp;gt;
  &amp;lt;/owl:equivalentProperty&amp;gt;


  &amp;lt;owl:equivalentProperty&amp;gt;
    &amp;lt;owl:DatatypeProperty rdf:about=&amp;quot;http://www.y.com/some/path/inc&amp;quot;/&amp;gt;
  &amp;lt;/owl:equivalentProperty&amp;gt;


&amp;lt;/owl:ObjectProperty&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This nicely demonstrates the potential of OWL as metadata that adds value to existing bodies of data.&lt;/p&gt;
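&lt;p&gt;Here&amp;rsquo;s a Python sketch of the inference behind that claim. The company filing URLs and the dollar figures are invented, and a real OWL-aware engine derives the extra triples itself from the equivalentProperty assertions rather than via this hand-rolled loop:&lt;/p&gt;

```python
# Sketch of what an OWL-aware query engine infers from the
# equivalentProperty assertions above. The filing URLs and the
# income figures below are made up for illustration.

GE_NET_INCOME = ("http://www.xbrl.org/us/fr/common/pte/2005-02-28"
                 "#usfr-pte:NetIncome")
X_NET_INCOME = "http://www.x.com/ns/xbrl/net_income"
Y_INC = "http://www.y.com/some/path/inc"

equivalences = [(GE_NET_INCOME, X_NET_INCOME), (GE_NET_INCOME, Y_INC)]

data = [
    ("http://www.x.com/filings/2008", X_NET_INCOME, "17500000"),
    ("http://www.y.com/filings/2008", Y_INC, "2300000"),
]

# owl:equivalentProperty is symmetric: anything stated with one
# property also holds for the other, so derive the extra triples.
inferred = list(data)
for p1, p2 in equivalences:
    for s, p, o in data:
        if p == p1:
            inferred.append((s, p2, o))
        elif p == p2:
            inferred.append((s, p1, o))

# A query that only mentions the GE term finds both companies' figures.
results = [(s, o) for s, p, o in inferred if p == GE_NET_INCOME]
print(results)
```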
&lt;p&gt;OWL has been a standard for four years, and there are several implementations available that let you do this. (Speaking of semantics, in addition to defining such equivalences, OWL can also &lt;a href=&#34;https://www.bobdc.com/blog/adding-semantics-to-make-data&#34;&gt;encode semantics&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;The great thing about OWL&amp;rsquo;s relationship to XBRL is that much of XBRL is about defining taxonomies and semantics, and OWL is about building on such definitions to get more value out of data.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Obviously a industry standard like XBRL (eXtensible Business Reporting Language) can help standardize vocabularies for element type names, but this cannot be the whole solution or XBRL use would be more widespread.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;XBRL helps to standardize naming within the world of business reporting, but the need for vocabulary definition standards and tools goes well beyond that world. (The full set of XBRL specs is also a complex solution to a complex problem, which keeps adoption from spreading quickly.) The goal of RDFS was to help people define such vocabularies, but &lt;a href=&#34;https://www.bobdc.com/blog/rdfs-without-rdfowl&#34;&gt;OWL provides a superset of RDFS&lt;/a&gt; and offers slicker tools, so people sometimes build OWL ontologies when they only need an RDFS vocabulary.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I think the Semantic Web will require more than schemas and XML-aware search tools to reach its full potential in intelligent data and applications that process them. What is probably needed is a concerted effort to build semantic data and tools that can process these included browsing, data storage, search, and classification tools.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For data storage and search, commercial and open source &lt;a href=&#34;http://esw.w3.org/topic/LargeTripleStores&#34;&gt;triplestore tools&lt;/a&gt; are available. (I &lt;a href=&#34;https://www.bobdc.com/blog/playing-with-some-rdf-stores&#34;&gt;recently mentioned&lt;/a&gt; that I&amp;rsquo;ve been blogging less because I&amp;rsquo;ve been looking into them.) For browsing, new &lt;a href=&#34;http://www.google.com/search?q=%22semantic%20web%22%20firefox&#34;&gt;semantic web Firefox plugins&lt;/a&gt; crop up all the time. I&amp;rsquo;ll discuss classification next week, but as a hint, it turns around the question of what semantic web technology can bring to the publishing world—it&amp;rsquo;s more about what they can learn from the publishing world.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://blogs.law.harvard.edu/pkeane&#34; title=&#34;http://blogs.law.harvard.edu/pkeane&#34;&gt;Peter Keane&lt;/a&gt; on &lt;a href=&#34;#comment-2210&#34;&gt;January 29, 2009 4:12 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The Semantic Web ideals, while quite exciting, have always struck me as too much of an all-or-none proposition: either my data is part of this universal graph of knowledge or it isn&amp;rsquo;t, based on whether I have encoded my data in triples (RDF, RDFa, etc). But it is not always the consumer that needs help &amp;ldquo;understanding&amp;rdquo; my data&amp;rsquo;s place in that graph &amp;ndash; I as the producer do as well. And semantic assertions (i.e., this tag equals dc:title) take time and understanding, which many/most do not have. What they DO have is domain knowledge. E.g., &amp;ldquo;Here&amp;rsquo;s what the figures in this column of this spreadsheet I am publishing on the web as an HTML table mean.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d love to see tools that allow publishers to make their data &amp;ldquo;smarter&amp;rdquo; over time &amp;ndash; not as an all-or-none proposition. Yahoo&amp;rsquo;s Search Monkey and perhaps GRDDL (?) are perhaps steps in the right direction. As another example, tagging seems to be a fairly easy-to-grok and easy-to-implement feature. How about more focus on something simple like tagging and tools that allow the publisher to then create equivalencies between a tag on their site and some domain-specific ontology if such exists (and probably best we don&amp;rsquo;t use the word &amp;ldquo;ontology&amp;rdquo; ;-)). My take (influenced by working w/ folks in higher ed) is that folks are willing to do a bit of work to &amp;ldquo;rationalize&amp;rdquo; their data, esp. if they gain some benefit. But not a lot of work, and especially not if they need to understand a whole new world of knowledge representation.&lt;/p&gt;
&lt;p&gt;Our approach has been to create a system (theoretically) as easy to use as FilemakerPro, Microsoft Access, Excel, etc. Users can create arbitrary sets of &amp;ldquo;attributes&amp;rdquo; for their collections of digital things (audio, images, video, documents, web pages) and then assign values as they wish. They may start with just a title and date, but when possible they may add much more detailed metadata. And commercial sets of, say, images+metadata are easy to incorporate as well.&lt;/p&gt;
&lt;p&gt;Everything is stored in a backend with Atom/AtomPub interfaces in and out. The key-value pairs are simply held in atom:category elements &amp;ndash; one atom entry for every item in the system. Many of these collections do, in fact, map to existing metadata schemes, VRA Core4 for images, for example. But indeed, everything has a scheme, if only local to that one user&amp;rsquo;s collection. Much is gained here in terms of interoperability (Google spreadsheets is becoming a favorite data creation tool, since it is so easy to &amp;ldquo;import&amp;rdquo; into our system), preservation, data portability, etc. And if/when a set of data needs to enter the cloud of linked data, asserting the equivalencies and serializing to RDF is quite easy.&lt;/p&gt;
&lt;p&gt;I guess my point is that there is some low-hanging fruit on the way to the Semantic Web that does not require publishers to join up here and now. Simply thinking in terms of regularizing metadata schemes, data portability, simple xml-based formats (Atom +1) get us a very significant way along a useful path. Not, certainly, the whole vision of the Semantic Web but quite useful nonetheless.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://gilbane.com&#34; title=&#34;http://gilbane.com&#34;&gt;Frank Gilbane&lt;/a&gt; on &lt;a href=&#34;#comment-2211&#34;&gt;January 29, 2009 6:44 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob:&lt;/p&gt;
&lt;p&gt;Sorry your comment didn&amp;rsquo;t show up. I just found it in the comment spam folder, published it, and sent Dale an email.&lt;/p&gt;
&lt;p&gt;Frank&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Playing with some RDF stores</title>
      <link>https://www.bobdc.com/blog/playing-with-some-rdf-stores/</link>
      <pubDate>Mon, 26 Jan 2009 22:33:48 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/playing-with-some-rdf-stores/</guid>
      
      
      <description><div>Instead of blogging.</div><div>&lt;p&gt;I recently realized that most of my experience with RDF has been with tools that load triples into memory and then work with them there, so I&amp;rsquo;ve decided to get to know the disk-based triplestores out there better: Jena, Joseki, Sesame, AllegroGraph, OpenLink, Mulgara&amp;hellip; let me know if I&amp;rsquo;m missing anything here.&lt;/p&gt;
&lt;p&gt;This is consuming just about all of my free time at the computer (of which I have little lately because of some very long hours for the employer), so I&amp;rsquo;ve had a lot less time to write for the weblog. When I&amp;rsquo;ve gotten further with this research, though, I&amp;rsquo;ll have a lot to write about.&lt;/p&gt;
&lt;h2 id=&#34;7-comments&#34;&gt;7 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.linkedin.com/in/erics&#34; title=&#34;http://www.linkedin.com/in/erics&#34;&gt;Eric Schoonover&lt;/a&gt; on &lt;a href=&#34;#comment-2205&#34;&gt;January 26, 2009 11:16 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d really encourage you to take a look at Intellidimensions Semantic Server product. It runs on top of SQL server (any edition including Express) and there is a 60 day trial version. They also have a free academic license.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.intellidimension.com/&#34;&gt;http://www.intellidimension.com/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Note: make sure if you are using SQL Server Express that you pull down the advanced version that includes full text indexing.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ldodds.com/blog&#34; title=&#34;http://www.ldodds.com/blog&#34;&gt;Leigh Dodds&lt;/a&gt; on &lt;a href=&#34;#comment-2206&#34;&gt;January 27, 2009 7:23 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;If you want to add the Talis Platform to your list of services to explore, then just drop me a mail and I&amp;rsquo;ll get you set up with a developer account.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2207&#34;&gt;January 27, 2009 8:14 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Leigh!&lt;/p&gt;
&lt;p&gt;Eric, I may try that, since I do have a copy of SQL Server running.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://thefigtrees.net/lee/blog/&#34; title=&#34;http://thefigtrees.net/lee/blog/&#34;&gt;Lee Feigenbaum&lt;/a&gt; on &lt;a href=&#34;#comment-2208&#34;&gt;January 29, 2009 12:27 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;re also welcome to try out Open Anzo - &lt;a href=&#34;http://openanzo.org&#34;&gt;http://openanzo.org&lt;/a&gt; - but really, most of these stores aren&amp;rsquo;t (or shouldn&amp;rsquo;t be) directly comparable. Most of them have their sweet spot(s), whether it be raw speed, clustering/scalability, federation, enterprise features, collaboration, full-on inference, lightweight (e.g. RDFS) inferencing, etc.&lt;/p&gt;
&lt;p&gt;Horses for courses, and all that.&lt;/p&gt;
&lt;p&gt;Anyway, drop me a line if you&amp;rsquo;re interested to hear more about my personal take on various stores&amp;rsquo; sweet spots, or, even better, I&amp;rsquo;d love to hear what you think after you&amp;rsquo;ve played around some. :-)&lt;/p&gt;
&lt;p&gt;Lee&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://thewebsemantic.com&#34; title=&#34;http://thewebsemantic.com&#34;&gt;Taylor&lt;/a&gt; on &lt;a href=&#34;#comment-2209&#34;&gt;January 29, 2009 11:58 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m totally biased, I like Jena&amp;hellip;most of all because I don&amp;rsquo;t have enough time to learn more than one semweb tool ;-), but also because they have some SPARQL heavy hitters and it supports the latest greatest sparql goodness (updates). So to tempt you along I&amp;rsquo;ve edited this example for you&amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://tinyurl.com/bzuqrk&#34;&gt;http://tinyurl.com/bzuqrk&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been playing with different ways to code to the jena api, this example is jenabean&amp;rsquo;s &amp;ldquo;Thing&amp;rdquo; which uses simple interfaces to simplify asserting new triples. It makes it easy to polymorph into various vocabs, the library comes with just a few, but it&amp;rsquo;s very easy to create more.&lt;/p&gt;
&lt;p&gt;By Martin Brousseau on &lt;a href=&#34;#comment-2213&#34;&gt;February 4, 2009 5:37 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;br /&gt;
Don&amp;rsquo;t forget to add BigOWLIM and Virtuoso to your shopping list.&lt;br /&gt;
They&amp;rsquo;re known to be among the most scalable triplestores. BigOWLIM uses the Sesame API.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2215&#34;&gt;February 5, 2009 12:34 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Martin. Virtuoso was on my list from the beginning. By adding an OWL layer to Sesame, BigOWLIM looks very cool, so I will definitely be playing with it.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/triplestores">triplestores</category>
      
    </item>
    
    <item>
      <title>Our long national nightmare is over</title>
      <link>https://www.bobdc.com/blog/our-long-national-nightmare-is/</link>
      <pubDate>Mon, 19 Jan 2009 23:08:25 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/our-long-national-nightmare-is/</guid>
      
      
      <description><div>According to a highly specialized hardware device.</div><div>&lt;img id=&#34;id197063&#34; src=&#34;https://www.bobdc.com/img/main/backwardsbush.jpg&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto &#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;&#39;Backwards Bush&#39; device&#34; width=&#34;480px&#34;/&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By Prateek on &lt;a href=&#34;#comment-2203&#34;&gt;January 20, 2009 1:59 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;ROFL.. =)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.TimothyHorrigan.com&#34; title=&#34;http://www.TimothyHorrigan.com&#34;&gt;Timothy Horrigan&lt;/a&gt; on &lt;a href=&#34;#comment-2204&#34;&gt;January 20, 2009 10:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Actually Bush II&amp;rsquo;s term doesn&amp;rsquo;t run out till exactly noon DC time.&lt;/p&gt;
&lt;p&gt;In 1989, as they usually do, the incoming Vice President (Dan Quayle) took his oath of office about 5 minutes before noon, and then they had about 10 minutes of music and pomp and circumstance before the incoming President (Bush I) actually took his oath. During that interim, President Reagan fell asleep&amp;mdash; not for good, it was just a little mini-nap. I was wondering what would happen if Reagan died before Bush took the oath&amp;hellip; might Quayle become President? At that moment, the anchorperson (probably Tom Brokaw, but maybe Peter Jennings or his CBS counterpart) took the mike to assure us that Bush automatically became President at the stroke of noon, regardless of whether or not he had been sworn in yet.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Displaying a message box from the Windows command line</title>
      <link>https://www.bobdc.com/blog/displaying-a-message-box-from/</link>
      <pubDate>Sat, 17 Jan 2009 13:19:26 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/displaying-a-message-box-from/</guid>
      
      
      <description><div>With no special software or compiling; just a little scripting.</div><div>&lt;img id=&#34;id197045&#34; src=&#34;https://www.bobdc.com/img/main/msgbox.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;message box created by script&#34;/&gt;
&lt;p&gt;When I run a time-consuming batch file that executes perl scripts or XSLT stylesheets on hundreds of files, I usually end the batch file with an &lt;code&gt;echo&lt;/code&gt; command with only a Control-G as its output, so that a beep lets me know that the job is done. Processing some client files while watching &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#at2&#34;&gt;Mark Birbeck speak at XML 2008&lt;/a&gt;, I knew it would be rude to have my computer emit such an obnoxious beep, so I found a nice alternative: a command line way to display a message box about my task being finished using only native Windows features.&lt;/p&gt;
&lt;p&gt;First, I needed a short Windows JavaScript script like this, which I called msgbox.js:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;if (WScript.Arguments.length &amp;lt; 1) {
    msg = &amp;quot;No message supplied&amp;quot;
}
else {
  msg = &amp;quot;&amp;quot;;
  for (i = 0; i &amp;lt; WScript.Arguments.length; i++) {
      msg = msg + WScript.Arguments.Item(i) + &amp;quot; &amp;quot;;
  }
}
WScript.Echo(msg);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If it&amp;rsquo;s invoked with any arguments, it displays them as the text of a message box.&lt;/p&gt;
&lt;p&gt;Then, I wrote this one-line batch file, which I called msgbox.bat, to call the JavaScript script:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;wscript \util\msgbox.js %*
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;WScript is the more Windows-oriented sibling of CScript, the Windows JavaScript engine that I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/windows-command-line-text-proc&#34;&gt;written about before&lt;/a&gt;. They&amp;rsquo;re both included with Windows.&lt;/p&gt;
&lt;p&gt;Now, if I end a batch file with this line,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;msgbox Yo! The fixfiles.bat batch file is all done.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;the message box shown above displays.&lt;/p&gt;
&lt;p&gt;Of course there are dozens of other ways to display a message box, but it&amp;rsquo;s always nice to find a way to do something useful with minimal code and no downloaded or newly purchased software.&lt;/p&gt;
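The argument handling in msgbox.js translates almost line for line to other scripting languages. Here is a rough Python sketch of the same join-or-default logic (the name format_message is my own; the actual dialog still comes from WScript.Echo on Windows, which this sketch only stands in for with print):

```python
# Sketch of msgbox.js's argument handling in Python. The helper name
# format_message is hypothetical; on Windows, WScript.Echo pops up the
# real dialog, so this sketch just prints the assembled text instead.
import sys

def format_message(args):
    """Mirror msgbox.js: default text if no arguments, else join them."""
    if not args:
        return "No message supplied"
    # msgbox.js appends a space after every argument, so this does too.
    return "".join(arg + " " for arg in args)

if __name__ == "__main__":
    print(format_message(sys.argv[1:]))
```

Called as `python msgbox.py Yo! The job is done.`, it assembles the arguments into one message string, just as the batch file's `%*` hands them all to msgbox.js.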
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By gonzalo rodriguez on &lt;a href=&#34;#comment-2374&#34;&gt;November 26, 2009 6:51 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Greetings:&lt;br /&gt;
Thank you very much for that information, did not know WScript.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.web-panda.ru&#34; title=&#34;http://www.web-panda.ru&#34;&gt;Alexander&lt;/a&gt; on &lt;a href=&#34;#comment-2436&#34;&gt;January 25, 2010 5:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi, thanks for the post. I have looked for a fast and simple way to display a message box under Windows. And I have found it ! =)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Hey CNN, SPARQL isn&#39;t so difficult.</title>
      <link>https://www.bobdc.com/blog/hey-cnn-sparql-isnt-so-difficu/</link>
      <pubDate>Thu, 08 Jan 2009 09:19:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/hey-cnn-sparql-isnt-so-difficu/</guid>
      
      
      <description><div>And like any programming language, it doesn&#39;t have to be convoluted.</div><div>&lt;img id=&#34;id197052&#34; src=&#34;http://upload.wikimedia.org/wikipedia/en/thumb/5/59/Missy-sup-dupa-fly.jpg/200px-Missy-sup-dupa-fly.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Supa Dupa Fly cover&#34; width=&#34;150px&#34;/&gt;
&lt;p&gt;In the December 17th cnn.com/technology article &lt;a href=&#34;http://www.cnn.com/2008/TECH/12/17/db.semanticweb/&#34;&gt;Making sense of the &amp;lsquo;semantic Web&amp;rsquo;&lt;/a&gt;, Steve Mollman wrote:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Consider, for instance, SPARQL, a query language. To find, say, music artists associated with the producer Timbaland, you&amp;rsquo;d have to type a long piece of convoluted code that most of us wouldn&amp;rsquo;t bother to do.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Steve: try pasting the following into DBPedia&amp;rsquo;s &lt;a href=&#34;http://dbpedia.org/sparql&#34;&gt;SPARQL Query Form&lt;/a&gt; and then clicking the &amp;ldquo;Run Query&amp;rdquo; button:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX d: &amp;lt;http://dbpedia.org/property/&amp;gt;


SELECT DISTINCT ?artist WHERE {
  ?album d:artist ?artist.
  ?album d:producer &amp;lt;http://dbpedia.org/resource/Timbaland&amp;gt;.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or just click &lt;a href=&#34;http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&amp;amp;should-sponge=&amp;amp;query=PREFIX+d%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2F%3E%0D%0A%0D%0ASELECT+DISTINCT+%3Fartist+WHERE+%7B%0D%0A++%3Falbum+d%3Aartist+%3Fartist.%0D%0A++%3Falbum+d%3Aproducer++%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FTimbaland%3E.%0D%0A%7D%0D%0A&amp;amp;format=text%2Fhtml&amp;amp;debug=on&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That wasn&amp;rsquo;t so bad, was it? I could make it shorter, but if you&amp;rsquo;re not familiar with basic SPARQL queries, you might consider the shorter version more convoluted. Of course, it doesn&amp;rsquo;t have the elegant clarity of this bit of JavaScript included as part of your article&amp;rsquo;s web page:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;if(cnnWinExtraRegExp.test(cnnWinExtra)){var cnnOmniExtra = 
  cnnWinExtraRegExp.split(cnnWinExtra);cnnWinLoc = cnnWinLoc + cnnOmniExtra[0];}
else {cnnWinLoc = cnnWinLoc + cnnWinExtra;}}
if (typeof(cnnPageName) != &amp;quot;undefined&amp;quot;) {s.pageName = 
  cnnPageName;s.eVar1 = cnnPageName;} else {s.pageName = cnnWinLoc;s.eVar1 = cnnWinLoc;}
if (typeof(cnnSectionName) != &amp;quot;undefined&amp;quot;) {s.channel=cnnSectionName;s.eVar2=cnnSectionName;} 
else {s.channel=&amp;quot;Nonlabeled&amp;quot;;s.eVar2=&amp;quot;Nonlabeled&amp;quot;;}
if (typeof(cnnSubSectionName) != &amp;quot;undefined&amp;quot;) 
{s.server=cnnSubSectionName;s.eVar3=cnnSubSectionName;} else {s.server=&amp;quot;&amp;quot;;s.eVar3=&amp;quot;&amp;quot;;}
if (typeof(cnnSectionFront) != &amp;quot;undefined&amp;quot;) {s.prop1=cnnSectionFront;} 
if (typeof(cnnContentType) != &amp;quot;undefined&amp;quot;) {s.prop4=cnnContentType;s.prop6=s.pageName;}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As with SQL and other query languages, no one expects end users to type out SPARQL queries like the one above, but someone who already knows a scripting language or two can pick up SPARQL and use it to build new kinds of applications. Like the JavaScript included in your article&amp;rsquo;s web page, SPARQL will play an increasingly valuable role in bringing information to people.&lt;/p&gt;
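For anyone wanting to issue that query from a script instead of the web form, here is a hedged Python sketch that packages it the way the long query-form URL above does; the query and format parameter names come from that URL, and the rest is ordinary URL encoding:

```python
# Build a GET URL for DBpedia's SPARQL endpoint carrying the Timbaland
# query. A sketch only: the parameter names follow the query-form URL
# quoted in the post, not any official client library.
from urllib.parse import urlencode

query = """PREFIX d: <http://dbpedia.org/property/>
SELECT DISTINCT ?artist WHERE {
  ?album d:artist ?artist.
  ?album d:producer <http://dbpedia.org/resource/Timbaland>.
}"""

# urlencode percent-escapes the query text so it can travel in a URL.
params = urlencode({"query": query, "format": "text/html"})
url = "http://dbpedia.org/sparql?" + params
print(url)
```

Fetching that URL (with urllib or any HTTP client) returns the same artist list the query form shows.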
&lt;p&gt;And since today is Elvis&amp;rsquo;s birthday, I hope that the next time someone does an updated remix of an Elvis tune to follow &amp;ldquo;A Little Less Conversation&amp;rdquo; and &amp;ldquo;Rubberneckin&amp;rsquo;,&amp;rdquo; it&amp;rsquo;s Timbaland.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By carmen on &lt;a href=&#34;#comment-2201&#34;&gt;January 8, 2009 10:15 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s something to be said about not inventing another syntax.&lt;/p&gt;
&lt;p&gt;Metaweb has shown how SPARQL-like queries can be formatted in JSON. This makes it much easier for end users of browser-based tools.&lt;/p&gt;
&lt;p&gt;agree with the original point of the post. why do these cnet/nyt/cnn type places like to make sweeping remarks?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2202&#34;&gt;January 8, 2009 11:36 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Do you have a URL for that?&lt;/p&gt;
&lt;p&gt;I must admit, I&amp;rsquo;ve been hearing &amp;ldquo;SPARQL-like&amp;rdquo; from so many different directions that I really prefer to go with the actual standard. The syntax isn&amp;rsquo;t so bad, and there are a lot of implementations out there.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2009">2009</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Turtles all the way down</title>
      <link>https://www.bobdc.com/blog/turtles-all-the-way-down/</link>
      <pubDate>Tue, 30 Dec 2008 14:13:34 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/turtles-all-the-way-down/</guid>
      
      
      <description><div>A nice early version, without the turtles.</div><div>&lt;p&gt;&lt;a href=&#34;http://en.wikipedia.org/wiki/Turtles_all_the_way_down&#34;&gt;&lt;img id=&#34;id197030&#34; src=&#34;http://upload.wikimedia.org/wikipedia/en/thumb/5/5a/Yertle_the_Turtle_and_Other_Stories_cover.png/200px-Yertle_the_Turtle_and_Other_Stories_cover.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;&#39;Yertle the Turtle&#39; cover&#34; width=&#34;160px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been reading &lt;a href=&#34;http://www.gutenberg.org/dirs/etext00/eduha10h.htm&#34;&gt;The Education of Henry Adams&lt;/a&gt; because I heard that this descendant of two US presidents had some interesting perspectives on the effects of technological progress on peoples&amp;rsquo; lives—in his case, in the latter half of the 19th century, when things changed more than they have in the second half of the 20th. Near the end, he quotes the French mathematician Henri Poincaré:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Doubtless if our means of investigation should become more and more penetrating, we should discover the simple under the complex; then the complex under the simple; then anew the simple under the complex; and so on without ever being able to foresee the last term.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It reminds me of the &lt;a href=&#34;http://en.wikipedia.org/wiki/Turtles_all_the_way_down&#34;&gt;turtles all the way down&lt;/a&gt; story, whose earliest mentions come several decades after Poincaré and Adams. Wikipedia has a nice overview of the various location/lecturer/audience-member attributions included in popular versions of this story of the earth&amp;rsquo;s cosmology.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
    </item>
    
    <item>
      <title>A belated Christmas wish: a SPARQL endpoint for Digg RDF</title>
      <link>https://www.bobdc.com/blog/a-christmas-wish-a-sparql-endp/</link>
      <pubDate>Fri, 26 Dec 2008 14:47:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-christmas-wish-a-sparql-endp/</guid>
      
      
      <description><div>Or consider it a lazy semweb wish.</div><div>&lt;p&gt;I&amp;rsquo;ve been looking for a SPARQL endpoint that provides new data fairly regularly—not just new triples to query, but data that is new to the world, such as from a stock ticker feed. If the &lt;a href=&#34;http://www.betanews.com/article/Digg_makes_official_its_adoption_of_a_semantic_Web_standard/1209743762&#34;&gt;RDFa on digg.com pages&lt;/a&gt; was accumulated in a database that could be queried as a SPARQL endpoint, that would certainly qualify, and it would be fun to play with.&lt;/p&gt;
&lt;p&gt;&lt;img id=&#34;id197053&#34; src=&#34;http://ebiquity.umbc.edu/blogger/wp-content/uploads//2006/05/sparql.png&#34; border=&#34;0&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;sparql logo&#34; width=&#34;140pt&#34;/&gt; &lt;img id=&#34;id197070&#34; src=&#34;http://www.nelsonguirado.com/media/blogs/choiceplease/digg.gif&#34; border=&#34;0&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;digg logo&#34; width=&#34;140pt&#34;/&gt;&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-2197&#34;&gt;December 29, 2008 11:25 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Please try:&lt;br /&gt;
&lt;a href=&#34;http://demo.openlinksw.com/sparql&#34;&gt;http://demo.openlinksw.com/sparql&lt;/a&gt; or /isparql&lt;/p&gt;
&lt;p&gt;This instance of Virtuoso includes our in-built Sponger Middleware.&lt;/p&gt;
&lt;p&gt;The Sponger Middleware converts a plethora of non-RDF resources into RDF-based Linked Data &amp;ldquo;on the fly&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;The Sponger is also integrated into the Virtuoso SPARQL processor so you can put any resource URL in the &amp;ldquo;FROM&amp;rdquo; clause of a SPARQL query. The effect is that you can SPARQL against any Web resource URI via a Virtuoso sparql endpoint.&lt;/p&gt;
&lt;p&gt;1. If a local graph IRI matching the resource URL doesn&amp;rsquo;t exist, the Sponger will crawl the resource&lt;br /&gt;
2. The localized resource is then RDFized (we have RDFizers aka. Cartridges for about 30 different data source types which includes Digg)&lt;br /&gt;
3. The Graph IRI for the sponged resource is always the same as the original resource URL.&lt;/p&gt;
&lt;p&gt;Basically, the Sponger is like a Driver Manager, but instead of dealing with relational data (ala. ODBC, JDBC etc..) it offers dynamic binding to RDF Drivers / Providers / Cartridges which take on the duty of transforming negotiated resource representations into RDF based Linked Data.&lt;/p&gt;
&lt;p&gt;Sample links:&lt;/p&gt;
&lt;p&gt;1. &lt;a href=&#34;http://tinyurl.com/6wu8nt&#34;&gt;http://tinyurl.com/6wu8nt&lt;/a&gt; - this is an ODE page which is the output of SPARQL passed through an HTML template for browsing&lt;br /&gt;
2. &lt;a href=&#34;http://tinyurl.com/7qagne&#34;&gt;http://tinyurl.com/7qagne&lt;/a&gt; &amp;ndash; raw SPARQL endpoint variant&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
Additional information:&lt;/p&gt;
&lt;p&gt;1. &lt;a href=&#34;http://virtuoso.openlinksw.com/presentations/Virtuoso_Sponger_1/Virtuoso_Sponger_1.html&#34;&gt;http://virtuoso.openlinksw.com/presentations/Virtuoso_Sponger_1/Virtuoso_Sponger_1.html&lt;/a&gt;&lt;br /&gt;
2. &lt;a href=&#34;http://virtuoso.openlinksw.com/Whitepapers/pdf/sponger_whitepaper_10102007.pdf&#34;&gt;http://virtuoso.openlinksw.com/Whitepapers/pdf/sponger_whitepaper_10102007.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://bigasterisk.com/&#34; title=&#34;http://bigasterisk.com/&#34;&gt;drewp&lt;/a&gt; on &lt;a href=&#34;#comment-2200&#34;&gt;January 2, 2009 11:43 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I put up an endpoint for one of my projects. It&amp;rsquo;s not exciting data, but it is new each day.&lt;/p&gt;
&lt;p&gt;announcement&lt;br /&gt;
&lt;a href=&#34;http://drewp.quickwitretort.com/2009/01/02/0&#34;&gt;http://drewp.quickwitretort.com/2009/01/02/0&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;endpoint&lt;br /&gt;
&lt;a href=&#34;http://whatsplayingnext.com/sparql&#34;&gt;http://whatsplayingnext.com/sparql&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Adding metadata value with Pellet</title>
      <link>https://www.bobdc.com/blog/adding-metadata-value-with-pel/</link>
      <pubDate>Mon, 22 Dec 2008 09:37:09 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/adding-metadata-value-with-pel/</guid>
      
      
      <description><div>A nice new feature of Pellet 2.0.</div><div>&lt;p&gt;The open-source program &lt;a href=&#34;http://clarkparsia.com/pellet&#34;&gt;Pellet&lt;/a&gt; is described as an OWL reasoner, but I&amp;rsquo;ve used it mostly as a SPARQL engine that happens to understand OWL. So, for example, if I have RDF that says &amp;ldquo;Loretta&amp;rsquo;s spouse is Leroy and spouse is a symmetric property,&amp;rdquo; but the data makes no mention of Leroy&amp;rsquo;s spouse, and I ask Pellet &amp;ldquo;who is Leroy&amp;rsquo;s spouse,&amp;rdquo; it can give me the answer.&lt;/p&gt;
&lt;p&gt;Most SPARQL engines can&amp;rsquo;t do this kind of OWL inferencing, and I thought it would be cool if Pellet could read a batch of RDF with some facts and some OWL properties, infer what it can, and then write out a copy of the RDF with all the implicit facts made explicit. This way, the less intelligent SPARQL engines could take advantage of the inferred data. It&amp;rsquo;s one of those holy grails in publishing technology: a process that reads data and adds value to it (in this case, by adding new facts that weren&amp;rsquo;t there before) and then writes out the data in a standard format so that other programs can use it. Pellet 2.0&amp;rsquo;s new &lt;code&gt;extract&lt;/code&gt; subcommand now makes this possible.&lt;/p&gt;
&lt;p&gt;First, let&amp;rsquo;s review how Pellet would run a SPARQL query against some sample data and infer a new fact to answer a question that a non-reasoning SPARQL engine could not answer. The following RDF/XML sample has a few facts about Leroy and Loretta and specifies that the spouse property is symmetric (that is, that if X is the spouse of Y, then Y is the spouse of X):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!-- spousedemo.rdf --&amp;gt;
&amp;lt;rdf:RDF xmlns:rdf=&amp;quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;quot;
         xmlns:owl=&amp;quot;http://www.w3.org/2002/07/owl#&amp;quot;
         xmlns=&amp;quot;http://www.snee.com/ns/abook#&amp;quot;&amp;gt;


  &amp;lt;rdf:Description rdf:about=&amp;quot;L1&amp;quot;&amp;gt;
    &amp;lt;first&amp;gt;Leroy&amp;lt;/first&amp;gt;
    &amp;lt;last&amp;gt;Lockhorn&amp;lt;/last&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;


  &amp;lt;rdf:Description rdf:about=&amp;quot;L2&amp;quot;&amp;gt;
    &amp;lt;first&amp;gt;Loretta&amp;lt;/first&amp;gt;
    &amp;lt;last&amp;gt;Lockhorn&amp;lt;/last&amp;gt;
    &amp;lt;spouse rdf:resource=&amp;quot;L1&amp;quot;/&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;


  &amp;lt;owl:ObjectProperty rdf:about=&amp;quot;http://www.snee.com/ns/abook#spouse&amp;quot;&amp;gt;
    &amp;lt;rdf:type rdf:resource=&amp;quot;http://www.w3.org/2002/07/owl#SymmetricProperty&amp;quot;/&amp;gt;
  &amp;lt;/owl:ObjectProperty&amp;gt;


&amp;lt;/rdf:RDF&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Leroy has no &lt;code&gt;spouse&lt;/code&gt; property, and if I tell a SPARQL engine such as &lt;a href=&#34;http://jena.sourceforge.net/ARQ/&#34;&gt;ARQ&lt;/a&gt; to run the following query against the RDF above to ask who Leroy&amp;rsquo;s spouse is, it won&amp;rsquo;t have anything to tell us. Old or new versions of Pellet, though, will read this query and tell us that Leroy&amp;rsquo;s spouse is Loretta Lockhorn because that information is available to it after it uses the extra OWL metadata to infer what it can.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX a: &amp;lt;http://www.snee.com/ns/abook#&amp;gt;
SELECT ?spouseFirst ?spouseLast WHERE {


       ?s a:first  &amp;quot;Leroy&amp;quot;;
          a:last   &amp;quot;Lockhorn&amp;quot;;
          a:spouse ?spouse.


       ?spouse a:first ?spouseFirst;
               a:last  ?spouseLast.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Pellet 2.0&amp;rsquo;s &lt;code&gt;extract&lt;/code&gt; subcommand reads RDF, does any inferencing it can from included OWL metadata, and then writes out RDF that includes the inferenced data. The following command line shows how I used it. (Additional command line parameters let you control just how much inferenced data Pellet adds when doing this.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pellet extract --input-format RDF/XML spousedemo.rdf  &amp;gt; temp.rdf
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This copies all the triples from spousedemo.rdf to temp.rdf and includes new data, such as the &lt;code&gt;j.0:spouse&lt;/code&gt; and &lt;code&gt;rdf:type&lt;/code&gt; lines in the following (the &amp;ldquo;j.0&amp;rdquo; prefix is assigned to the URL that was the default namespace in spousedemo.rdf):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  &amp;lt;rdf:Description rdf:about=&amp;quot;http://www.snee.com/ns/ID#L1&amp;quot;&amp;gt;
    &amp;lt;j.0:last&amp;gt;Lockhorn&amp;lt;/j.0:last&amp;gt;
    &amp;lt;j.0:first&amp;gt;Leroy&amp;lt;/j.0:first&amp;gt;
    &amp;lt;j.0:spouse rdf:resource=&amp;quot;http://www.snee.com/ns/ID#L2&amp;quot;/&amp;gt;
    &amp;lt;rdf:type rdf:resource=&amp;quot;http://www.w3.org/2002/07/owl#Thing&amp;quot;/&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If I ask ARQ to run the query shown earlier on temp.rdf, it can tell me the name of Leroy&amp;rsquo;s spouse, because Pellet&amp;rsquo;s &lt;code&gt;extract&lt;/code&gt; subcommand has made temp.rdf a richer data file than spousedemo.rdf.&lt;/p&gt;
&lt;p&gt;Declaring the spouse property to be symmetric is just a small bit of metadata added to the data shown in the file. OWL can add all kinds of metadata, and Pellet now makes it even easier to take advantage of that metadata.&lt;/p&gt;
&lt;p&gt;For me, this small bit of metadata also proves something important about the value of semantic technology: while it would be silly to try to encode all the semantics of the word &amp;ldquo;spouse&amp;rdquo; in a machine-readable form, encoding just this small bit of the word&amp;rsquo;s semantics—that it&amp;rsquo;s a symmetric property—can add value to data and let you answer questions that you couldn&amp;rsquo;t answer before.&lt;/p&gt;
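The particular inference at work here is simple enough to sketch in a few lines. The following toy Python version (my own illustration of the idea, not how Pellet actually works) adds the missing inverse triple for any property declared symmetric, which is exactly what lets a query find Leroy's spouse:

```python
# Toy symmetric-property inference: for every (s, p, o) triple whose
# predicate is declared symmetric, make sure (o, p, s) is present too.
# Illustration only; Pellet's reasoner covers far more of OWL than this.
def symmetric_closure(triples, symmetric_props):
    inferred = set(triples)
    for s, p, o in triples:
        if p in symmetric_props:
            inferred.add((o, p, s))
    return inferred

data = {
    ("L1", "first", "Leroy"), ("L1", "last", "Lockhorn"),
    ("L2", "first", "Loretta"), ("L2", "last", "Lockhorn"),
    ("L2", "spouse", "L1"),
}
enriched = symmetric_closure(data, {"spouse"})
# ("L1", "spouse", "L2") is now explicit, so even a non-reasoning
# query engine can answer "who is Leroy's spouse?"
```

Writing `enriched` back out as RDF is the same value-adding move as Pellet's extract subcommand: the implicit facts become explicit data that any SPARQL engine can use.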
&lt;p&gt;&lt;a href=&#34;http://www.cafepress.com/lockhorn&#34;&gt;&lt;img id=&#34;id197211&#34; class=&#34;centerImage&#34; src=&#34;https://www.bobdc.com/img/main/lockhorns20051030.gif&#34; border=&#34;0&#34; style=&#34;display: block;margin-left: auto;margin-right: auto&#34; alt=&#34;Lockhorns semantic cartoon&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://clarkparsia.com/&#34; title=&#34;http://clarkparsia.com/&#34;&gt;Kendall Clark&lt;/a&gt; on &lt;a href=&#34;#comment-2198&#34;&gt;December 30, 2008 10:36 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nice post, Bob! This sort of use of OWL and RDF is just the kind of insanely boring but incredibly useful thing that too often gets overlooked. ;&amp;gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Picking XML schemas and tools?</title>
      <link>https://www.bobdc.com/blog/picking-xml-schemas-and-tools/</link>
      <pubDate>Tue, 16 Dec 2008 03:46:40 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/picking-xml-schemas-and-tools/</guid>
      
      
      <description><div>Then first think about your content and users.</div><div>&lt;p&gt;At last week&amp;rsquo;s &lt;a href=&#34;http://www.idealliance.org/xml2008/&#34;&gt;XML in Practice 2008&lt;/a&gt; conference, I joined &lt;a href=&#34;http://dubinko.info/blog/2008/12/08/xml-2008-non-liveblog-content-authoring-schemas/&#34;&gt;Micah Dubinko&lt;/a&gt;, Evan Lenz, and Frank Miller for the panel on working with authoring tools and schemas. (Lisa Bos of Really Strategies did a fine job hosting the panel; she should consider doing one of those &lt;a href=&#34;https://www.bobdc.com/blog/compress-those-podcasts&#34;&gt;interview podcast&lt;/a&gt; shows.) The panel&amp;rsquo;s full title mentioned both DITA and DocBook, and while Mark Shellenberger &lt;a href=&#34;http://twitter.com/mshellenberger/statuses/1045813198&#34;&gt;predicted&lt;/a&gt; a &amp;ldquo;cage match,&amp;rdquo; several people later seemed disappointed that there weren&amp;rsquo;t more DITA/DocBook partisan sparks flying. I prefer not to take identity politics to the point of identifying myself with only one technical content schema, and I think that Micah, Evan, and Frank felt the same way. (I&amp;rsquo;d love to be the moderator of a Norm Walsh/Eliot Kimber discussion on DocBook/DITA issues, though.)&lt;/p&gt;
&lt;blockquote id=&#34;id197069&#34; class=&#34;pullquote&#34;&gt;A schema is metadata whose job is to add value to data.&lt;/blockquote&gt;
&lt;p&gt;When it was my turn to introduce myself and my background, I wanted to draw a connection from the panel topic to the services of my employer, &lt;a href=&#34;http://www.innodata-isogen.com/&#34;&gt;Innodata Isogen&lt;/a&gt;, so I mentioned that we had a lot of experience helping publishers find a good fit between their content, schemas (sometimes DocBook, and sometimes DITA!), tools, and users. This was off the top of my head, but I thought about it more as Evan introduced himself and jotted in my notebook: &amp;ldquo;content/schemas/tools/users&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;People often want to know what the best schema is, or the best editing tool. I got to thinking about how the best way to determine those two parts of the content/schemas/tools/users lineup is to take a good hard look at the other two.&lt;/p&gt;
&lt;p&gt;Content analysis is underrated. People discuss the virtues of one schema or another as if the schema by itself will do something for them, but a schema is metadata whose job is to add value to data. If you&amp;rsquo;re wondering how well each of three schemas fits your content, then type and paste some of your content into documents that conform to those schemas and see for yourself. Once, while helping the &lt;a href=&#34;http://www.prismstandard.org/&#34;&gt;PRISM&lt;/a&gt; standard group think through a content DTD to go with their metadata spec, I typed up &lt;a href=&#34;http://www.ew.com&#34;&gt;Entertainment Weekly&lt;/a&gt; interviews with Will Smith and Tommy Lee Jones the week that &amp;ldquo;Men in Black II&amp;rdquo; came out (Entertainment Weekly is a Time Inc. publication; so is Mad Magazine, as I found out during a PRISM meeting in their building) in DocBook and one or two other DTDs that I can&amp;rsquo;t remember right now. Doing this makes it much clearer whether the schema has the data and metadata elements and attributes you need and if its required structures fit your structure.&lt;/p&gt;
&lt;p&gt;To frame any thoughts about users of the authoring tools and schemas, consider the two extremes: on the one hand, especially if you&amp;rsquo;re in aerospace or some other heavy industry, you might have a staff of users who use powerful, higher-priced editing tools because that&amp;rsquo;s the job specialty you need from them. If it&amp;rsquo;s not their job specialty, and you need it to be, you arrange training. At the other extreme, you might be a legal publisher whose authors include your country&amp;rsquo;s leading expert on bankruptcy, and you&amp;rsquo;re happy enough to publish this author&amp;rsquo;s treatise on bankruptcy law that if he or she turns it in on floppy disks with WordPerfect 4.2 files, you&amp;rsquo;ll do whatever it takes to convert that content into the XML that you use to drive your publishing system. If you gave this author an $800 XML authoring tool and a week of training, you&amp;rsquo;d probably annoy this valued author more than anything else.&lt;/p&gt;
&lt;p&gt;Most content creators in XML publishing scenarios fall between these two extremes. There are a lot of them who are comfortable with Word but who can be convinced to use something similarly WYSIWYGgy that imposes the structure you need, but you might not have $800 plus training costs to spend on them. Don&amp;rsquo;t lose heart; there are alternatives.&lt;/p&gt;
&lt;p&gt;Once you have a better idea of what your content needs and your users and budget can handle, it&amp;rsquo;s easier to think about the best schema and tools for your system. You can remove the tools question from consideration if you contract with a business partner to create the XML for you; you specify the schema you want (or work with them to determine the best one) and the quality levels you want, and then they do it for you. One of Innodata Isogen&amp;rsquo;s newer services that we&amp;rsquo;ve had increasing success with is in content origination and authoring. For people wondering about how to work with another firm to have them take on these tasks, innodata-isogen.com&amp;rsquo;s &amp;ldquo;Knowledge Center&amp;rdquo; has a new section titled &lt;a href=&#34;http://www.innodata-isogen.com/knowledge_center/content_origination&#34;&gt;Outsourcing Content Origination and Authoring: Closing the Publishing Loop&lt;/a&gt;, which includes a white paper covering the issues, upcoming webinars to listen to people with long experience with this, and more.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.manorfieldconsulting.com&#34; title=&#34;http://www.manorfieldconsulting.com&#34;&gt;Mark Shellenberger&lt;/a&gt; on &lt;a href=&#34;#comment-2196&#34;&gt;December 16, 2008 11:43 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You must admit the panel had the potential for sparks.&lt;/p&gt;
&lt;p&gt;It is a testament to the professionalism and skill of the panelists that it didn&amp;rsquo;t devolve.&lt;/p&gt;
&lt;p&gt;I particularly agree with your statement &amp;ldquo;content analysis is underrated&amp;rdquo;. Too often people say &amp;ldquo;I want to use X schema/DTD&amp;rdquo; without having looked at their data to see if that makes any sense. And it is very difficult to convince them otherwise, even after doing some of that analysis.&lt;/p&gt;
&lt;p&gt;Thanks for fleshing out some of the things you said during the session.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Looking forward to XML 2008</title>
      <link>https://www.bobdc.com/blog/looking-forward-to-xml-2008/</link>
      <pubDate>Fri, 05 Dec 2008 12:44:25 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/looking-forward-to-xml-2008/</guid>
      
      
      <description><div>And seeing some friends and learning about new developments.</div><div>&lt;p&gt;The first time I went to the annual conference that will be called &lt;a href=&#34;http://www.idealliance.org/xml2008/&#34;&gt;XML-in-Practice 2008&lt;/a&gt; this year (but which I think of as &amp;ldquo;XML 2008&amp;rdquo;), it was called SGML &amp;lsquo;95. It grew from there and morphed into an XML conference, and when the dot com boom supported several XML conferences a year, this was the best and biggest. It&amp;rsquo;s slimmed down over the years, and I hate to admit that I might not go if it was going to be a &lt;a href=&#34;https://www.bobdc.com/blog/metadata-and-metadata&#34;&gt;conference full of strangers&lt;/a&gt;, but I know I&amp;rsquo;ll see some old friends, and the chance to bounce ideas off other XML geeks in person is still very appealing, especially when it&amp;rsquo;s a two-hour drive away from home.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule.asp&#34;&gt;presentation grid&lt;/a&gt; has interesting looking things from both friends and strangers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I&amp;rsquo;ve been more interested in taxonomies lately, and a presentation like Guthrie Collins&amp;rsquo;s &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#media1&#34;&gt;Using XML at The Associated Press for Taxonomies and Revenue Generation&lt;/a&gt; will cover content relevant to everyone—newspaper articles—from a leader in the field, and hearing how they combine taxonomies with repurposing to generate new revenue should be very interesting.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I haven&amp;rsquo;t had a chance to check out the &lt;a href=&#34;http://code.google.com/p/ubiquity-xforms/&#34;&gt;ubiquity-xforms&lt;/a&gt; project hosted at Google Code, so I look forward to seeing Mark Birbeck demo and describe it in his &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#at2&#34;&gt;Declarative Ajax programming with Ubiquity Xforms&lt;/a&gt; presentation. An AJAX-friendly open source implementation could be just what Xforms needs to give it greater traction.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The description for Lisa Bos and Chandi Perera&amp;rsquo;s &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#media4&#34;&gt;Driving XML workflows through Creative Suite&lt;/a&gt; makes it look like they&amp;rsquo;ve found one of those holy grails that people often seek in publishing: round-tripping between Word, XML, and a CMS with serious XML support.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;At 3 on Monday, I&amp;rsquo;ll probably stray away from the publishing track to hear about &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#gov4&#34;&gt;Authoring and Publishing Legislative Documents in XML&lt;/a&gt; at the US House of Representatives and the Senate. (In the same time slot: &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#at4&#34;&gt;Accelerating DITA with OmniMark&lt;/a&gt;. I didn&amp;rsquo;t know that OmniMark was still around.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I know the basics of Schematron pretty well, but it will be fun to hear &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#ft10&#34;&gt;Wendell Piez&lt;/a&gt; explain it.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other topics that look interesting: Tony Coates&amp;rsquo; &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#gov6&#34;&gt;UBL panel&lt;/a&gt;, Priscilla Walmsley on &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#ft6&#34;&gt;new features of XSLT 2.0&lt;/a&gt;, &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#at6&#34;&gt;Microsoft&amp;rsquo;s new schema editor&lt;/a&gt;, the use of the open source XQuery database eXist &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#ft7&#34;&gt;in US intelligence agencies&lt;/a&gt;, and the &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#gov8&#34;&gt;semantic web panel&lt;/a&gt; with Mark Birbeck, Ron Reck, and Ken Sall. The grid schedule has a confusing description of two (combined?) panels on &amp;ldquo;Working with authoring schemas&amp;rdquo; that are probably really one big panel; with Norm Walsh listed as a panelist, I&amp;rsquo;ll have to check that out.&lt;/p&gt;
&lt;p&gt;Ken Holman is the chair of the track where I&amp;rsquo;ll speak on &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#ft9&#34;&gt;Automating Content Analysis with Trang and Simple XSLT Scripts&lt;/a&gt; Tuesday afternoon, so I&amp;rsquo;ll have to be careful what I say about XSLT. (It will be interesting to see how he can provide an &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#ft1&#34;&gt;Intro to XML, XSLT, and XSLFO&lt;/a&gt; in 60 minutes&amp;hellip;)&lt;/p&gt;
&lt;p&gt;I can&amp;rsquo;t say that I&amp;rsquo;m that pumped up to see the former CEO of Muzak give the &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#keynote&#34;&gt;main keynote&lt;/a&gt;, and who&amp;rsquo;s going to get up early enough on Tuesday morning to see a 7:30 AM &amp;ldquo;Premier Sponsor Presentation #1&amp;rdquo; that hasn&amp;rsquo;t even been booked yet? I won&amp;rsquo;t, but I look forward to learning a lot Monday and Tuesday and maybe even Wednesday.&lt;/p&gt;
&lt;h2 id=&#34;6-comments&#34;&gt;6 Comments&lt;/h2&gt;
&lt;p&gt;By Eamonn on &lt;a href=&#34;#comment-2188&#34;&gt;December 5, 2008 1:20 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;My first SGML conference was in 1995 (SGML Europe in the beautiful Austrian town of Gmunden) where I met Charles Goldfarb - what a great way to start. Gmunden is one of those places that you put on the &amp;lsquo;must revisit sometime to see if it&amp;rsquo;s still as wonderful as I remember it&amp;rsquo; list. Sadly, the European XML conference has morphed into something more general. But &lt;a href=&#34;http://www.xmlprague.cz/&#34;&gt;XML Prague&lt;/a&gt; is looking promising if you fancy a trip in March 2009.&lt;/p&gt;
&lt;p&gt;Have fun next week!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://norman.walsh.name/&#34; title=&#34;http://norman.walsh.name/&#34;&gt;Norman Walsh&lt;/a&gt; on &lt;a href=&#34;#comment-2189&#34;&gt;December 5, 2008 1:30 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Alas, Bob, I&amp;rsquo;ve had to give my regrets for XML 2008 (for personal reasons, no worries). I gave them back in early November, and reminded them about the incorrect schedule again a week or so ago. Apparently it takes longer than that to update a web page. Who knew?&lt;/p&gt;
&lt;p&gt;The panel was supposed to be a &amp;ldquo;DocBook vs. DITA&amp;rdquo; sort of a thing and I was looking forward to it. I&amp;rsquo;ve been, perhaps, way too polite about the subject for perhaps way too long :-)&lt;/p&gt;
&lt;p&gt;See you at Balisage?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-2190&#34;&gt;December 5, 2008 2:35 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The biggest, I grant; better than EML/Balisage, I deny. But that&amp;rsquo;s why we have horse races.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://xmlportfolio.com&#34; title=&#34;http://xmlportfolio.com&#34;&gt;Evan Lenz&lt;/a&gt; on &lt;a href=&#34;#comment-2191&#34;&gt;December 5, 2008 5:31 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, that&amp;rsquo;s great to hear you&amp;rsquo;re coming. This will be my first XML conference in five years! XML 2003 in Philadelphia was the last one. Were you there? I at least remember chatting with you at Disney World in 2001. :-)&lt;/p&gt;
&lt;p&gt;Norm, I was sorry to hear you couldn&amp;rsquo;t come. I was asked to join the panel based on my experience with WordML, but it will definitely feel like there&amp;rsquo;s a void without you on the panel. I&amp;rsquo;d love to make it to Balisage next year. I feel kind of ashamed of admitting I&amp;rsquo;ve never made it to Extreme before either.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.linkedin.com/in/sarahbourne&#34; title=&#34;http://www.linkedin.com/in/sarahbourne&#34;&gt;Sarah Bourne&lt;/a&gt; on &lt;a href=&#34;#comment-2194&#34;&gt;December 11, 2008 4:24 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Just read your post today - too late for anything other than regrets for not attending myself. Hope you get a chance to post highlights (hint! hint!)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2195&#34;&gt;December 11, 2008 4:44 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m too swamped with work to work up a posting for a few days, but managed to achieve a lazy, Web 2.0-oriented equivalent: when I suggested (&lt;a href=&#34;http://twitter.com/bobdc/status/1042218934&#34;&gt;http://twitter.com/bobdc/status/1042218934&lt;/a&gt;) on Twitter that people use the hashtag #xml2008, enough people picked up on it to provide a nice narrative: &lt;a href=&#34;http://search.twitter.com/search?q=%23xml2008&#34;&gt;http://search.twitter.com/search?q=%23xml2008&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>SPARQL and live relational data</title>
      <link>https://www.bobdc.com/blog/sparql-and-live-relational-dat/</link>
      <pubDate>Mon, 01 Dec 2008 17:19:54 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparql-and-live-relational-dat/</guid>
      
      
      <description><div>A little demo.</div><div>&lt;img id=&#34;id197020&#34; src=&#34;https://www.bobdc.com/img/main/chiracsarkozy.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Chirac and Sarkozy&#34;/&gt;
&lt;p&gt;In the &lt;a href=&#34;http://www.snee.com/xml/xml2006/owlrdbms.html&#34;&gt;first project&lt;/a&gt; I did with SPARQL, D2RQ, and MySQL I used D2RQ to pull all the relational data into a disk file and then queried that after adding some OWL-based metadata. D2RQ does let you execute SPARQL queries against a live relational database, instead of dumping data to a file and querying that, so I wanted to see the effects for myself. This would work better as a live demo, but you could think of it as a script for one.&lt;/p&gt;
&lt;p&gt;First, because MySQL is a multi-user database, imagine that several users are simultaneously using the same copy of the &amp;ldquo;world&amp;rdquo; database that I described in an &lt;a href=&#34;https://www.bobdc.com/blog/sparql-and-relational-database&#34;&gt;earlier entry&lt;/a&gt;. This will make my fake demo look more dramatic. (For additional drama, imagine bullets whizzing by my head as I type the various queries and commands.) I&amp;rsquo;ll start with a SPARQL query asking about the head of state for France:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?headOfState WHERE { 
?s vocab:country_Name &amp;quot;France&amp;quot;;
   vocab:country_HeadOfState ?headOfState.
}
&lt;/code&gt;&lt;/pre&gt;
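&lt;p&gt;(One note if you want to try this yourself: to be a complete, runnable query, it needs a PREFIX declaration for the &lt;code&gt;vocab:&lt;/code&gt; namespace that D2RQ generates from the table and column names. The URI below is just a placeholder; use whatever your D2RQ mapping actually declares:)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX vocab: &amp;lt;http://localhost:2020/vocab/resource/&amp;gt;

SELECT ?headOfState WHERE { 
?s vocab:country_Name &amp;quot;France&amp;quot;;
   vocab:country_HeadOfState ?headOfState.
}
&lt;/code&gt;&lt;/pre&gt;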
&lt;p&gt;With the version of the world database currently available from MySQL, that query returns &amp;ldquo;Jacques Chirac&amp;rdquo;. In fact, the database lists him as the head of state for several countries; this query&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?name WHERE { 
?s vocab:country_Name ?name;
   vocab:country_HeadOfState &amp;quot;Jacques Chirac&amp;quot;.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;returns this list:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;quot;Guadeloupe&amp;quot;
&amp;quot;Martinique&amp;quot;
&amp;quot;Mayotte&amp;quot;
&amp;quot;France&amp;quot;
&amp;quot;French Guiana&amp;quot;
&amp;quot;French Polynesia&amp;quot;
&amp;quot;Réunion&amp;quot;
&amp;quot;Saint Pierre and Miquelon&amp;quot;
&amp;quot;New Caledonia&amp;quot;
&amp;quot;Wallis and Futuna&amp;quot;
&amp;quot;French Southern territories&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now imagine that someone else using the same database updates it with the following statement at the MySQL command line:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mysql&amp;gt; UPDATE country
    -&amp;gt; SET HeadOfState=&amp;quot;Nicolas Sarkozy&amp;quot;
    -&amp;gt; WHERE HeadOfState=&amp;quot;Jacques Chirac&amp;quot;;
Query OK, 11 rows affected (0.08 sec)
Rows matched: 11  Changed: 11  Warnings: 0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(I look forward to making a similar update for the United States entry in January.) When I rerun my original SPARQL query about the vocab:country_HeadOfState value for the subject that has a country name of &amp;ldquo;France&amp;rdquo;, I get the updated answer: &amp;ldquo;Nicolas Sarkozy&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;When an interface such as D2RQ provides access to a relational database, SPARQL makes an excellent tool for looking at the data. Of course, if you can access that database using an SQL command line, you have even more options, but how many publicly accessible relational databases let you issue SQL commands against them? More and more offer SPARQL access, so SPARQL will be an increasingly valuable tool for getting at increasing amounts of data. (Not that SPARQL&amp;rsquo;s future is limited to read-only access—an &lt;a href=&#34;http://jena.hpl.hp.com/~afs/SPARQL-Update.html&#34;&gt;UPDATE&lt;/a&gt; language for SPARQL is in the works.)&lt;/p&gt;
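&lt;p&gt;To give a rough idea (this is only a sketch; the syntax of the final update language may well differ), the Chirac-to-Sarkozy change might look something like this expressed as a SPARQL update:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;DELETE { ?s vocab:country_HeadOfState &amp;quot;Jacques Chirac&amp;quot; }
INSERT { ?s vocab:country_HeadOfState &amp;quot;Nicolas Sarkozy&amp;quot; }
WHERE  { ?s vocab:country_HeadOfState &amp;quot;Jacques Chirac&amp;quot; }
&lt;/code&gt;&lt;/pre&gt;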
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By Prateek on &lt;a href=&#34;#comment-2185&#34;&gt;December 3, 2008 4:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Have a question about the statement &amp;ldquo;but how many publicly accessible relational databases let you issue SQL commands against them? More and more offer SPARQL access, so SPARQL will be an increasingly valuable tool for getting at increasing amounts of data&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;When I search for information on a Website (not search engines), let&amp;rsquo;s say Geonames, and I look for &amp;ldquo;New York&amp;rdquo;, isn&amp;rsquo;t the search a query against a database? Plenty of websites, I think, provide querying against publicly accessible relational databases. The complexity of learning and writing SQL is hidden from the end user.&lt;/p&gt;
&lt;p&gt;In the case of Geonames, it&amp;rsquo;s a MySQL-based store. It makes it easy for a naive user to search for information in Geonames because there is no necessity to learn the query language or SQL.&lt;/p&gt;
&lt;p&gt;My questions:&lt;/p&gt;
&lt;p&gt;(1) Isn&amp;rsquo;t the pain of learning and writing SPARQL one of the biggest hindrances to its, as you have it, &amp;ldquo;becoming an increasingly valuable tool for getting at increasing amounts of data&amp;rdquo;?&lt;/p&gt;
&lt;p&gt;(2) Or, because of this, will it continue to remain a tool in the hands of the SW community?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2186&#34;&gt;December 3, 2008 4:50 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As you quoted, I did say &amp;ldquo;let you issue SQL commands against them,&amp;rdquo; not &amp;ldquo;query the databases,&amp;rdquo; so I wouldn&amp;rsquo;t count form-driven queries against MySQL backends as relational queries of public data. You don&amp;rsquo;t have the flexibility to make up your own queries. As a matter of fact, I&amp;rsquo;m sure we&amp;rsquo;ll see more forms triggering SPARQL queries on the back end over time, so comparing SPARQL queries to form-driven queries of relational databases is not an apples-to-apples comparison.&lt;/p&gt;
&lt;p&gt;We could call SPARQL a tool in the hands of the SW community, but we could also call SQL a tool in the hands of the relational community. The biggest difference to me is that if you know SQL well, your options for writing an app that combines data from multiple public SQL databases are very limited. You query your personal data or your employer&amp;rsquo;s. The increasing amount of SPARQL-accessible data is what&amp;rsquo;s opening up the possibilities.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>A video from a still camera</title>
      <link>https://www.bobdc.com/blog/a-video-from-a-still-camera/</link>
      <pubDate>Mon, 24 Nov 2008 21:03:50 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-video-from-a-still-camera/</guid>
      
      
      <description><div>With various strange noises and images.</div><div>&lt;p&gt;I&amp;rsquo;m on my second Canon Powershot right now, having gotten my first one in early 2003. These can take brief movies, but the Powershot is not really a movie camera, so I only took three- or four-second movies of things that I could loop. In November of 2004, once I had a collection of these clips and access to a Mac with iMovie installed, I took a few loopable shots of my daughter Alice playing the drums so I could tie the whole thing together with a regular beat and edited it into a &lt;a href=&#34;http://www.youtube.com/watch?v=1HM3mOmOv4c&#34;&gt;trippy little movie&lt;/a&gt;.&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/1HM3mOmOv4c&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;The first shot has some kids spinning light things in the dark and my daughter Madeline jokingly complaining about Alice&amp;rsquo;s spinning thing hitting her, which gave the movie its title. In the four years since then, Alice has gotten a Ludwig kit with some entry-level Zildjian cymbals (the one in the movie sounds awful) and she&amp;rsquo;s also grown a few inches and improved on the drums quite a bit.&lt;/p&gt;
&lt;p&gt;The strange capsule bouncing way up in the air at 1:11 has Dan Brickley in it, if I remember correctly. This was at a carnival on the Dam Square in Amsterdam during the XML Europe 2004 conference. (If it&amp;rsquo;s not him, it has at least one of the members of the &lt;a href=&#34;http://www.snee.com/panoramic/20040420.html&#34;&gt;RDF All Stars&lt;/a&gt; panoramic picture that I took earlier that day while demonstrating the camera&amp;rsquo;s panoramic capability to Uche Ogbuji, who had the exact same camera but didn&amp;rsquo;t know about this feature. Dave Beckett, with his back to Leigh Dodds, lost out in the stitching of the images, which gives him the appearance of a disappearing ghost.) In addition to that carnival, a local Virginia county fair and a Jersey shore boardwalk arcade provided the more American-looking carnival images. Oxford residents may recognize the playground on Abingdon Road.&lt;/p&gt;
&lt;p&gt;I recently got a &lt;a href=&#34;http://www.theflip.com&#34;&gt;Flip Video&lt;/a&gt; camera (whose logo owes much to the one from the &lt;a href=&#34;http://www.youtube.com/results?search_query=%22flip+wilson+show%22&amp;amp;search_type=&amp;amp;aq=-1&amp;amp;oq=%22flip+wilson+show%22&#34;&gt;Flip Wilson show&lt;/a&gt; in the early seventies). I&amp;rsquo;m having a lot of fun with it, although what I&amp;rsquo;ve done so far are outright home movies, which I won&amp;rsquo;t be putting here. I may bring it to &lt;a href=&#34;http://www.idealliance.org/xml2008/&#34;&gt;XML 2008&lt;/a&gt;, though&amp;hellip;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;a href=&#34;http://www.youtube.com/results?search_query=%22flip+wilson+show%22&amp;amp;search_type=&amp;amp;aq=-1&amp;amp;oq=%22flip+wilson+show%22&#34;&gt;&lt;img id=&#34;id203750&#34; src=&#34;http://www.irememberjfk.com/mt/graphics/flip.jpg&#34; border=&#34;0&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;logo of flip wilson show&#34;/&gt;&lt;/a&gt;&lt;br /&gt;
&lt;img id=&#34;id203692&#34; src=&#34;http://sharing.theflip.com/images/flip_logo_email.gif&#34; border=&#34;0&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Flip Video logo&#34;/&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://fracthis.blogspot.com/2008/12/7-years-in-c.html&#34; title=&#34;http://fracthis.blogspot.com/2008/12/7-years-in-c.html&#34;&gt;zarina&lt;/a&gt; on &lt;a href=&#34;#comment-2192&#34;&gt;December 8, 2008 8:57 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is in regard to your blog entry about semantic data entry.&lt;br /&gt;
You were looking for ways in which people record varied data on a daily basis.&lt;br /&gt;
Please do check out Tinderbox at &lt;a href=&#34;http://www.eastgate.com/Tinderbox/&#34;&gt;http://www.eastgate.com/Tinderbox/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2193&#34;&gt;December 8, 2008 11:35 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Wow, I didn&amp;rsquo;t realize that Eastgate was still around.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Linking information to &#34;missing&#34; information in SPARQL</title>
      <link>https://www.bobdc.com/blog/linking-information-to-missing/</link>
      <pubDate>Tue, 18 Nov 2008 21:07:31 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/linking-information-to-missing/</guid>
      
      
      <description><div>Or, as the SQL people call it, doing outer joins.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.flickr.com/photos/sidereal/33547506/&#34;&gt;&lt;img id=&#34;id203579&#34; src=&#34;http://farm1.static.flickr.com/22/33547506_1474e0e528.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;220px&#34; alt=&#34;flickr picture titled &#39;outer join&#39;&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Last month, I described the SPARQL approach to some basic relational database queries in &lt;a href=&#34;https://www.bobdc.com/blog/sparql-and-relational-database&#34;&gt;SPARQL and relational databases: getting started&lt;/a&gt;. Today I want to talk about SPARQL&amp;rsquo;s equivalent of the SQL outer join, a bit of syntax that lets you add the phrase &amp;ldquo;and these corresponding fields if they&amp;rsquo;re there&amp;rdquo; to a query. I&amp;rsquo;ll use the same &amp;ldquo;world&amp;rdquo; database that I used in that posting&amp;rsquo;s examples.&lt;/p&gt;
&lt;p&gt;First, to review a simple join, the following SQL query asks for the names of the countries as well as the names of the cities that are the capitals of those countries:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;      SELECT country.name, city.name 
      FROM country, city
      WHERE country.capital = city.id;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This query returns 232 rows, beginning with these five:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; Afghanistan                            Kabul
 Netherlands                            Amsterdam
 Netherlands Antilles                   Willemstad
 Albania                                Tirana
 Algeria                                Alger
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we ask the world database to just list country names, like this,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT name FROM country;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;we get 239 rows, because the result includes &amp;ldquo;countries&amp;rdquo; that have no capital, such as Antarctica and Bouvet Island. So how would you tell an SQL system &amp;ldquo;Show me all the countries, and if they have them, their capitals&amp;rdquo;? Using a technique called an &lt;a href=&#34;http://en.wikipedia.org/wiki/Outer_join#Outer_joins&#34;&gt;outer join&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;      SELECT country.name, city.name
      FROM country LEFT OUTER JOIN city ON
      country.capital = city.id;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A left outer join asks for all the rows of the table on the &amp;ldquo;left&amp;rdquo; (that is, the first one mentioned in the query) and any information from the other table that matches. (A right outer join asks for all the rows of the second table and any matching information from the first one, and a full outer join asks for all the rows of both tables.) The result of our sample left outer join query includes all the rows from our first query above plus seven extra rows for the &amp;ldquo;countries&amp;rdquo; that have no capital. These have NULL listed where the capital name would go, as this excerpt shows:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; Virgin Islands, U.S.                          Charlotte Amalie
 Zimbabwe                                      Harare
 Palestine                                     Gaza
 Antarctica                                    NULL
 Bouvet Island                                 NULL
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;How can we do this in SPARQL? I found it to be easier than the SQL version, which took me several tries to run with no syntax mistakes.&lt;/p&gt;
&lt;p&gt;First, let&amp;rsquo;s review the SPARQL version of the first SQL query above, which asks for the countries in the database that have capitals, listed with those capitals:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;      SELECT ?countryName ?cityName
      WHERE {
        ?s1 vocab:country_Name ?countryName;
            vocab:country_Capital ?capital.
        ?s2 vocab:city_Name ?cityName;
            vocab:city_ID ?capital.
      }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Like the SQL version, it pulls out 232 rows. The SPARQL version of &amp;ldquo;list all the countries and, if they&amp;rsquo;re there, the capitals&amp;rdquo; looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;      SELECT ?countryName ?cityName
      WHERE {
        ?s1 vocab:country_Name ?countryName.


      OPTIONAL {
        ?s1 vocab:country_Capital ?capital.
        ?s2 vocab:city_Name ?cityName;
            vocab:city_ID ?capital.
        }
      }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The corresponding excerpt looks like this, with hyphens showing where it found no values:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;quot;Virgin Islands, U.S.&amp;quot;  &amp;quot;Charlotte Amalie&amp;quot;
&amp;quot;Zimbabwe&amp;quot;              &amp;quot;Harare&amp;quot;
&amp;quot;Palestine&amp;quot;             &amp;quot;Gaza&amp;quot;
&amp;quot;Antarctica&amp;quot;            -
&amp;quot;Bouvet Island&amp;quot;         -
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I think that the word OPTIONAL followed by a pair of curly braces is a much more intuitive way to say &amp;ldquo;and the following, if they&amp;rsquo;re there&amp;rdquo; than &amp;ldquo;LEFT OUTER JOIN ON&amp;rdquo;, but I&amp;rsquo;m increasingly biased.&lt;/p&gt;
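&lt;p&gt;One last trick while we&amp;rsquo;re at it: to list only the &amp;ldquo;countries&amp;rdquo; that have no capital (the rows that showed NULL in the SQL version), you can keep the same OPTIONAL block and then filter on a variable that it failed to bind:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;      SELECT ?countryName
      WHERE {
        ?s1 vocab:country_Name ?countryName.
        OPTIONAL {
          ?s1 vocab:country_Capital ?capital.
          ?s2 vocab:city_Name ?cityName;
              vocab:city_ID ?capital.
        }
        FILTER (!bound(?cityName))
      }
&lt;/code&gt;&lt;/pre&gt;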
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.furia.com&#34; title=&#34;http://www.furia.com&#34;&gt;glenn mcdonald&lt;/a&gt; on &lt;a href=&#34;#comment-2178&#34;&gt;November 18, 2008 10:25 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I agree that &amp;ldquo;OPTIONAL&amp;rdquo; is better than &amp;ldquo;LEFT OUTER JOIN ON&amp;rdquo;, but this is a pretty low standard, and I don&amp;rsquo;t think you can reasonably claim that your SPARQL query as a whole is any more &amp;ldquo;intuitive&amp;rdquo; than the SQL version. It&amp;rsquo;s strictly more complicated both syntactically and semantically, and I bet a reasonably adept SPARQL programmer (imagining that more than 10 of these exist) would take more trial and error to get it right than an equivalent SQL programmer would need to get the SQL version working.&lt;/p&gt;
&lt;p&gt;Also, one of the victories SPARQL frequently claims over SQL is the elimination of joins. This is a misleading claim to begin with, since the reason you don&amp;rsquo;t need most ordinary joins in SPARQL is that they&amp;rsquo;ve been moved into the data model. But even so, the fact that you had to change your SPARQL query to get the effect of a different SQL join shows that SPARQL hasn&amp;rsquo;t actually eliminated joins.&lt;/p&gt;
&lt;p&gt;Whereas (I say in the spirit of language-comparison), in Thread this query would be:&lt;/p&gt;
&lt;p&gt;Country|Capital&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s all. &amp;ldquo;Country&amp;rdquo; means &amp;ldquo;get all nodes of type Country&amp;rdquo;, &amp;ldquo;|Capital&amp;rdquo; means &amp;ldquo;and for each of those nodes, calculate and return the results of following its Capital arc&amp;rdquo;. It makes no difference whether a country has 1 capital, no capital, 25 capitals, etc. (Your SPARQL version will produce N *different* pairs for countries with N capitals, right? Ugly.)&lt;/p&gt;
&lt;p&gt;And if you want to see it get even uglier, do the SPARQL query for finding those 7 countries without capitals.&lt;/p&gt;
&lt;p&gt;Thread&amp;rsquo;s version:&lt;/p&gt;
&lt;p&gt;Country:!(.Capital)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2179&#34;&gt;November 19, 2008 8:47 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;With the word &amp;ldquo;Thread&amp;rdquo; referring to a connected series of postings on virtually any topic so often, it&amp;rsquo;s very difficult to find out more about this query language with a web search, so it&amp;rsquo;s an unfortunate choice of name. Can you point me to implementations and data to query with those implementations so that I can try it?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.furia.com&#34; title=&#34;http://www.furia.com&#34;&gt;glenn mcdonald&lt;/a&gt; on &lt;a href=&#34;#comment-2180&#34;&gt;November 19, 2008 9:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The bigger problem with searching for info on Thread, in this case, is that it&amp;rsquo;s part of a project still under development! So you can&amp;rsquo;t try it out yet, and the spec hasn&amp;rsquo;t been published. There&amp;rsquo;s a blog post about it at&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.furia.com/page.cgi?type=log#id311&#34;&gt;http://www.furia.com/page.cgi?type=log#id311&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;based on a talk I did at the Web 3.0 conference last month, and I gave some other examples in comments on your movie-query SPARQL post. It should be available for first public experimentation in Q1 some time. Possibly this makes it too annoying for me to be talking about it in advance at all, but I&amp;rsquo;m impatient, and I think most query languages get discussed before they&amp;rsquo;re built, so hopefully that moves it to within the bounds of acceptability.&lt;/p&gt;
&lt;p&gt;But if you&amp;rsquo;d rather not have me distracting your SQL/SPARQL contrast with a phantom alternative, just say so and I&amp;rsquo;ll leave you alone!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2181&#34;&gt;November 19, 2008 10:08 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To be honest, while language elegance is certainly a great thing, I&amp;rsquo;m not going to defend SPARQL on elegance points. Its main appeal to me is the increasing amount of data and implementations available, so while I&amp;rsquo;ve obviously been discussing its syntax, that&amp;rsquo;s really just a means to an end: helping people take advantage of all that data.&lt;/p&gt;
&lt;p&gt;I will keep an eye on &lt;a href=&#34;http://www.furia.com&#34;&gt;http://www.furia.com&lt;/a&gt;. (In fact, to gather data to query against with your Thread implementation, you may very well end up using SPARQL!)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.timothyhorrigan.com&#34; title=&#34;http://www.timothyhorrigan.com&#34;&gt;Timothy Horrigan&lt;/a&gt; on &lt;a href=&#34;#comment-2182&#34;&gt;November 19, 2008 6:37 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This issue gets complicated if someone fills the capital field with something when the country or region doesn&amp;rsquo;t really have a capital. There are no good universal standards for how to indicate a null and/or missing and/or unknown value.&lt;/p&gt;
&lt;p&gt;Also some countries have multiple capitals. You see a couple of examples just in the sample data above. Palestine&amp;rsquo;s capital is Gaza if you use the location of the Palestine government&amp;rsquo;s headquarters as the capital. But the Palestinians think that Jerusalem is their capital, and the offices in Gaza are just a temporary expedient. A less exotic example is the Netherlands: Amsterdam is its capital for most purposes, but some parts of the government are headquartered in The Hague.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Compress those podcasts!</title>
      <link>https://www.bobdc.com/blog/compress-those-podcasts/</link>
      <pubDate>Wed, 12 Nov 2008 08:53:46 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/compress-those-podcasts/</guid>
      
      
      <description><div>A simple step that podcasters can take to make their work easier to listen to.</div><div>&lt;p&gt;There are a lot of fascinating podcasts of interviews out there. They&amp;rsquo;re usually done by phone, and because of the vagaries of the hardware and software that link up each caller&amp;rsquo;s voice to the host&amp;rsquo;s, different voices get added to a podcast MP3 at different volumes, especially in conference call panel discussions such as those for &lt;a href=&#34;http://semanticgang.talis.com/&#34;&gt;The Semantic Web Gang&lt;/a&gt; and &lt;a href=&#34;http://readwritetalk.com/&#34;&gt;ReadWriteTalk&lt;/a&gt;. These differences can lead to an annoying combination of blasting your ears (especially when using ear buds while jogging) and fiddling with the volume control. For example, after turning up my car stereo loud enough to hear Daniela Barbosa&amp;rsquo;s voice in the &lt;a href=&#34;http://readwritetalk.com/2008/09/05/daniela-barbosa-dow-jones/&#34;&gt;ReadWriteTalk&lt;/a&gt; interview with her, the closing music nearly blew out my car&amp;rsquo;s speakers. Professionals have tools to reduce this difference, and fine free tools are available for amateurs as well. I love &lt;a href=&#34;http://audacity.sourceforge.net/&#34;&gt;Audacity&lt;/a&gt;, an open-source SourceForge project audio editor with binaries available for Windows, Linux, and Mac OS X.&lt;/p&gt;
&lt;p&gt;Audacity&amp;rsquo;s default view of an audio file displays volume over time. The following shows how it displays part of the &lt;a href=&#34;http://blogs.talis.com/nodalities/podpress_trac/web/1130/0/twt20081007-DavidProvost.mp3&#34;&gt;MP3 file&lt;/a&gt; for &lt;a href=&#34;http://blogs.talis.com/nodalities/2008/10/david-provost-talks-with-talis-about-his-report-of-a-semantic-web-industry-on-the-cusp.php&#34;&gt;Paul Miller&amp;rsquo;s Nodalities interview with David Provost&lt;/a&gt;. I don&amp;rsquo;t mean to pick on the Talis broadcasts, but this particular interview makes for a good visual example of how different two speakers&amp;rsquo; voice volumes may come across:&lt;/p&gt;
&lt;img id=&#34;id203651&#34; src=&#34;https://www.bobdc.com/img/main/compresspodcasts1.jpg&#34; border=&#34;0&#34; align=&#34;middle&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Audacity screen shot of Paul Miller podcast 1&#34;/&gt;
&lt;p&gt;When viewing this in Audacity, if you click on one of the narrow parts of the blue bar (for example, at 10:20) and click the Play button, you&amp;rsquo;ll hear Paul&amp;rsquo;s voice. If you click on one of the wide parts, you&amp;rsquo;ll hear David&amp;rsquo;s voice. Paul may be soft-spoken, and he often interviews people who are excited about the technology they&amp;rsquo;re evangelizing, but as I said earlier, the luck of the connection has more to do with the volume differences than personal style.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.kqzyfj.com/click-1973330-10381297?url=http%3A%2F%2Fguitars.musiciansfriend.com%2Fproduct%2FMXR-M102-Dyna-Comp-Compressor-Pedal%3Fsku%3D151101&amp;amp;cjsku=151101&#34;&gt;&lt;img id=&#34;id203691&#34; src=&#34;http://guitargeek.com/gear/img/mxr_dynacomp.gif&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;MXR Dyna-Comp&#34;/&gt;&lt;/a&gt;&lt;img id=&#34;id203707&#34; src=&#34;http://www.tqlkg.com/image-1973330-10381297&#34; width=&#34;1&#34; height=&#34;1&#34; border=&#34;0&#34;/&gt;&lt;/p&gt;
&lt;p&gt;This is easy enough to remediate with an Audacity feature called compression, which makes the loud sounds quieter and the quiet sounds louder. (As the Wikipedia page for &lt;a href=&#34;http://en.wikipedia.org/wiki/Dynamic_range_compression&#34;&gt;dynamic range compression&lt;/a&gt; points out, this is unrelated to file size compression.) As a bit of specialized rock and roll hardware, compression boxes are popular with guitar players for providing more sustain by boosting the signal as the note dies away without adding to the distortion. (The downside is that boosting quiet sounds can turn a slight buzz or hum into a loud one, and reducing the loud sounds can take the crunch out of your attack with chords.) I still have the &lt;a href=&#34;http://guitargeek.com/gearview/183/&#34;&gt;MXR Dyna-Comp Compressor&lt;/a&gt; that I bought in 1979, which is apparently a collector&amp;rsquo;s item now; an eBay search for &lt;a href=&#34;http://shop.ebay.com/items/_W0QQ_dmptZGuitarQ5fAccessories?_nkw=vintage+mxr+dyna-comp&amp;amp;_sacat=0&amp;amp;_fromfsb=&amp;amp;_trksid=m270.l1313&amp;amp;_odkw=mxr+dyna-comp&amp;amp;_osacat=0&#34;&gt;&amp;ldquo;vintage mxr dyna-comp&amp;rdquo;&lt;/a&gt; gets multiple hits. At the end of the song &amp;ldquo;Cooper Square&amp;rdquo; on &lt;a href=&#34;http://www.snee.com/music/ha/&#34;&gt;this page&lt;/a&gt; you can hear my unsuccessful and then successful attempts to step on the Dyna Comp switch to turn off the hum, and if you turn it way up, you may hear the other guys laughing at me.&lt;/p&gt;
&lt;p&gt;To compress all or part of an audio file with Audacity, select the part to compress and then pick &lt;strong&gt;Compressor&lt;/strong&gt; from the &lt;strong&gt;Effect&lt;/strong&gt; menu to display the &lt;strong&gt;Dynamic Range Compressor&lt;/strong&gt; dialog box. After trying several combinations of settings on this dialog box with the Talis interview, I had the best luck with Threshold on -35, Ratio on 3:1, Attack Time on .1, and &amp;ldquo;Normalize to 0db after compressing&amp;rdquo; unchecked. The result looked like this:&lt;/p&gt;
&lt;img id=&#34;id203798&#34; src=&#34;https://www.bobdc.com/img/main/compresspodcasts2.jpg&#34; border=&#34;0&#34; align=&#34;middle&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Audacity screen shot of Paul Miller podcast 2&#34;/&gt;
&lt;p&gt;It looks spiky, but it&amp;rsquo;s much easier to listen to without touching your player&amp;rsquo;s volume control, and I&amp;rsquo;d recommend that anyone who isn&amp;rsquo;t compressing their podcasts before posting them use this open-source, cross-platform tool to follow the same procedure.&lt;/p&gt;
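Those Threshold and Ratio settings have a simple interpretation. The following is only a rough numeric sketch of downward compression above a threshold, with made-up function names; Audacity's actual compressor also involves the attack time and optional normalization:

```python
def compress_db(level_db, threshold_db=-35.0, ratio=3.0):
    """Simple downward compression: each dB above the threshold is
    reduced to 1/ratio dB above it; quieter sounds pass through."""
    if level_db <= threshold_db:
        return level_db  # below the threshold: unchanged
    return threshold_db + (level_db - threshold_db) / ratio

# A loud voice at -5 dB and a quiet one at -40 dB start 35 dB apart;
# after compression they're only 15 dB apart.
print(compress_db(-5.0))   # -25.0
print(compress_db(-40.0))  # -40.0
```

With a 3:1 ratio, every 3 dB that the input rises above the threshold only adds 1 dB to the output, which is why the loud sections in the second screen shot no longer tower over the quiet ones.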
&lt;p&gt;Audacity has many more features. To make an audio file out of any sound coming out of my computer, I can connect the headphone out jack to the microphone in jack and record with it. You can cut, copy, and paste when you want to trim something, add various other effects, and even do multi-track recording, although I haven&amp;rsquo;t tried that much. And you can&amp;rsquo;t beat it for the price.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.timothyhorrigan.com/tammi_itunes.html&#34; title=&#34;http://www.timothyhorrigan.com/tammi_itunes.html&#34;&gt;Timothy Horrigan&lt;/a&gt; on &lt;a href=&#34;#comment-2177&#34;&gt;November 14, 2008 1:49 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Excellent tips, Bob: I have Audacity myself (with a capital A I mean though I have the small-a audacity as well.) But I haven&amp;rsquo;t used it as effectively as you have.&lt;/p&gt;
&lt;p&gt;One thing I can add about the hum issue is that you can use the noise filter to get rid of it. Just find some unadulterated hum, use it to train the noise filter and then apply it to the whole file.&lt;/p&gt;
&lt;p&gt;BTW, if you are feeling really ambitious, you can try using Second Life to host the podcast. Basically you would have one avatar record the chat on a machinanime (i.e., &amp;ldquo;machine anime&amp;rdquo; or a video recording) while the others talk and then you could extract the audio from the movie. You can even have a live audience listen in and/or do a live broadcast via streaming video.&lt;/p&gt;
&lt;p&gt;SL&amp;rsquo;s voice module does a good job of equalizing the volumes of the speakers&amp;rsquo; voices. You can also adjust the volume manually, although you can get a nasty overdriven-amp effect if someone comes in too hot.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>SPARQL at the movies</title>
      <link>https://www.bobdc.com/blog/sparql-at-the-movies/</link>
      <pubDate>Fri, 07 Nov 2008 09:12:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparql-at-the-movies/</guid>
      
      
      <description><div>Using SPARQL to answer a few questions that IMDB won&#39;t help much with.</div><div>&lt;p&gt;Last week I &lt;a href=&#34;https://www.bobdc.com/blog/download-sparql-results-direct#id203726&#34;&gt;mentioned&lt;/a&gt; that the &lt;a href=&#34;http://www.linkedmdb.org/&#34;&gt;Linked Movie Database&lt;/a&gt; SPARQL endpoint would be fun to play with, and it has been. I used my &lt;a href=&#34;http://www.snee.com/sparql/spreadsheetSPARQL.html&#34;&gt;spreadsheetSPARQL&lt;/a&gt; interface to send the following queries to their &lt;a href=&#34;http://sparql.linkedmdb.org:2020/linkedmdb&#34;&gt;http://sparql.linkedmdb.org:2020/linkedmdb&lt;/a&gt; SPARQL endpoint.&lt;/p&gt;
&lt;h2 id=&#34;id203604&#34;&gt;One degree of Kevin Bacon&lt;/h2&gt;
&lt;p&gt;The following lists all the actors who have appeared in a movie with Kevin Bacon. Or, in more SPARQLy terms, it says &amp;ldquo;show me ?actorName (with no repeats) where ?kb is the ID for Kevin Bacon, and a given movie has ?kb and ?actor in it, and ?actor has the name ?actorName, but don&amp;rsquo;t show me ?actorName if the actor is ?kb&amp;rdquo; (that is, don&amp;rsquo;t list Kevin himself):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?actorName WHERE {


  ?kb    &amp;lt;http://data.linkedmdb.org/resource/movie/actor_name&amp;gt; &amp;quot;Kevin Bacon&amp;quot;.


  ?movie &amp;lt;http://data.linkedmdb.org/resource/movie/actor&amp;gt; ?kb;
         &amp;lt;http://data.linkedmdb.org/resource/movie/actor&amp;gt; ?actor.


  ?actor &amp;lt;http://data.linkedmdb.org/resource/movie/actor_name&amp;gt; ?actorName.


  FILTER (?kb != ?actor).
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I won&amp;rsquo;t show you all of the 240 names that get returned, but here are the first few:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Eve
Vincent D&#39;Onofrio
Daniel Stern
John C. Reilly
J. T. Walsh
Michael Gross
William Windom
Michael Tucker
Stephen Lang
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;id203671&#34;&gt;Versatile actor(s)&lt;/h2&gt;
&lt;p&gt;Which actors have appeared in both a John Waters movie and a Steven Spielberg movie? (Assign the URI for each director to a variable, find the URI for any actors who worked with both directors, and get the actors&amp;rsquo; names.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?actorName WHERE {


  ?dir1      &amp;lt;http://data.linkedmdb.org/resource/movie/director_name&amp;gt; &amp;quot;John Waters&amp;quot;.


  ?dir2      &amp;lt;http://data.linkedmdb.org/resource/movie/director_name&amp;gt; &amp;quot;Steven Spielberg&amp;quot;.


  ?dir1movie &amp;lt;http://data.linkedmdb.org/resource/movie/director&amp;gt; ?dir1;
             &amp;lt;http://data.linkedmdb.org/resource/movie/actor&amp;gt; ?actor.


  ?dir2movie &amp;lt;http://data.linkedmdb.org/resource/movie/actor&amp;gt; ?actor;
             &amp;lt;http://data.linkedmdb.org/resource/movie/director&amp;gt; ?dir2.


  ?actor     &amp;lt;http://data.linkedmdb.org/resource/movie/actor_name&amp;gt; ?actorName.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This query returns only one result: Darren E. Burrows, who was in both &lt;a href=&#34;http://data.linkedmdb.org/resource/film/870&#34;&gt;Amistad&lt;/a&gt; and &lt;a href=&#34;http://data.linkedmdb.org/resource/film/27532&#34;&gt;Cry-Baby&lt;/a&gt;. (Fans of the old television show &amp;ldquo;Northern Exposure&amp;rdquo; might remember him as Ed Chigliak.)&lt;/p&gt;
&lt;h2 id=&#34;id203723&#34;&gt;Woody Allen&amp;rsquo;s favorite actors&lt;/h2&gt;
&lt;p&gt;To list everyone who had ever been in a Woody Allen movie, I might use the DISTINCT keyword so that each was only listed once, but I wanted the repetition so that I could see who had been in how many of these movies:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?actorName WHERE {


  ?woody  &amp;lt;http://data.linkedmdb.org/resource/movie/director_name&amp;gt; &amp;quot;Woody Allen&amp;quot;.


  ?movie  &amp;lt;http://data.linkedmdb.org/resource/movie/director&amp;gt; ?woody;
          &amp;lt;http://data.linkedmdb.org/resource/movie/actor&amp;gt; ?actor.


  ?actor &amp;lt;http://data.linkedmdb.org/resource/movie/actor_name&amp;gt; ?actorName.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Adding count(?actorName) after ?actorName caused an error, so either this SPARQL extension isn&amp;rsquo;t supported by the Linked Movie Database&amp;rsquo;s SPARQL implementation or I was doing something wrong. Either way, I got what I wanted by copying the output of the query above to a file called waactors.txt and piping that through the following at a Windows command prompt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;type waactors.txt | sort | uniq -c | sort /r
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This sorts the names, reduces the list to show each name once with a count of how many times it occurred, and then does a reverse sort to put the names that occurred the most at the top of the list. (The Linux equivalent would be &amp;ldquo;cat waactors.txt | sort | uniq -c | sort -r&amp;rdquo;.) The resulting list began with these names:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;     26 Woody Allen
     12 Mia Farrow
      7 Diane Keaton
      6 Julie Kavner
      5 Dianne Wiest
      4 Louise Lasser
      3 Tony Roberts
      3 Scarlett Johansson
      3 Judy Davis
      3 Alan Alda
&lt;/code&gt;&lt;/pre&gt;
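&lt;p&gt;For what it&amp;rsquo;s worth, the same count-and-rank step can be done without shell tools at all. Here&amp;rsquo;s a minimal Python sketch, assuming the query output has been saved with one name per line:&lt;/p&gt;

```python
from collections import Counter

def rank_names(lines):
    """Count each name and list them most frequent first --
    the equivalent of sort | uniq -c | sort -r."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return counts.most_common()  # list of (name, count) pairs

names = ["Woody Allen", "Mia Farrow", "Woody Allen",
         "Diane Keaton", "Mia Farrow", "Woody Allen"]
for name, n in rank_names(names):
    print(n, name)
# 3 Woody Allen
# 2 Mia Farrow
# 1 Diane Keaton
```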
&lt;p&gt;Now we have a clear picture of who his favorite actor is. When I saw that &lt;a href=&#34;http://www.imdb.com/name/nm0001413/&#34;&gt;the voice of Marge Simpson&lt;/a&gt; beat everyone but Allen himself and his two most famous leading ladies, I wondered what six movies she appeared in. The answer was easy to find out:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT  ?movieName WHERE {


  ?woody  &amp;lt;http://data.linkedmdb.org/resource/movie/director_name&amp;gt; &amp;quot;Woody Allen&amp;quot;.


  ?actor &amp;lt;http://data.linkedmdb.org/resource/movie/actor_name&amp;gt; &amp;quot;Julie Kavner&amp;quot;.


  ?movie  &amp;lt;http://data.linkedmdb.org/resource/movie/director&amp;gt; ?woody;
          &amp;lt;http://data.linkedmdb.org/resource/movie/actor&amp;gt; ?actor;
          &amp;lt;http://purl.org/dc/terms/title&amp;gt; ?movieName.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The answer:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Don&#39;t Drink the Water
Deconstructing Harry
Hannah and Her Sisters
New York Stories
Radio Days
Shadows and Fog
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Moving further down the list of Allen&amp;rsquo;s favorite actors, I&amp;rsquo;m going to go out on a limb and predict that Scarlett Johansson moves ahead of Tony Roberts before Alan Alda does.&lt;/p&gt;
&lt;h2 id=&#34;6-comments&#34;&gt;6 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-2173&#34;&gt;November 7, 2008 9:47 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Actually, you have to use &amp;ldquo;sort -n -r&amp;rdquo; on Posix systems to force the first field to be interpreted numerically. (I do a lot of this sort of thing.)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2174&#34;&gt;November 7, 2008 9:57 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi John,&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t need the -n, I assume because the uniq -c output right-justifies the numbers with leading spaces that make the numeric sort come out the way I want it.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.furia.com&#34; title=&#34;http://www.furia.com&#34;&gt;glenn mcdonald&lt;/a&gt; on &lt;a href=&#34;#comment-2175&#34;&gt;November 7, 2008 3:04 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Very cool that you actually managed to get computed answers to these questions.&lt;/p&gt;
&lt;p&gt;Not so cool, it seems to me, that the questions are so cumbersome to frame.&lt;/p&gt;
&lt;p&gt;I will now tantalize (appall? annoy?) you with what these queries would look like in Thread, the path-based query-language I&amp;rsquo;m working on at ITA:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;actors who have appeared in a movie with Kevin Bacon&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;Actor:=Kevin Bacon.Movie.Actor&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
&amp;ldquo;actors who have appeared in both a John Waters movie and a Steven Spielberg movie&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;Actor:(.Movie.Director:=John Waters:=Steven Spielberg)&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
&amp;ldquo;actors who have appeared in Woody Allen movies&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;Director:=Woody Allen.Movie.Actor&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
&amp;ldquo;actors who have appeared in Woody Allen movies, with counts&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Actor|Appearances=(.Movie:(.Director:=Woody Allen)._Count):Appearances&amp;gt;0#Appearances&lt;/p&gt;
&lt;p&gt;or, more cleverly:&lt;/p&gt;
&lt;p&gt;Director:=Woody Allen.Movie/Actor|Movies=(.Nodes._Count)#Movies&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
You can probably get the gist of most of these without a very detailed explanation of syntax. Except the last one, which is the most fun:&lt;/p&gt;
&lt;p&gt;Director - get all nodes of type Director&lt;br /&gt;
:=Woody Allen - narrow this list down to the one named Woody Allen&lt;br /&gt;
.Movie - get all this director&amp;rsquo;s movies&lt;br /&gt;
/Actor - group these movies by actor (each movie will appear in multiple groups&amp;hellip;)&lt;br /&gt;
|Movies=(.Nodes._Count) - calculate the number of grouped movies in each actor&amp;rsquo;s group, and call this &amp;ldquo;Movies&amp;rdquo;&lt;br /&gt;
#Movies - and sort the set of actor/movie-groups by these counts&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
No real points until it&amp;rsquo;s publicly available, obviously. But intriguing, maybe?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.furia.com&#34; title=&#34;http://www.furia.com&#34;&gt;glenn mcdonald&lt;/a&gt; on &lt;a href=&#34;#comment-2176&#34;&gt;November 7, 2008 3:16 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Oh, sorry, the Waters/Spielberg example would be:&lt;/p&gt;
&lt;p&gt;Actor:(.Movie.Director:=John Waters,=Steven Spielberg._Count:=2)&lt;/p&gt;
&lt;p&gt;or&lt;/p&gt;
&lt;p&gt;Actor:(.Movie.Director:=John Waters):(.Movie.Director:=Steven Spielberg)&lt;/p&gt;
&lt;p&gt;or, for that matter&lt;/p&gt;
&lt;p&gt;Director:=John Waters,=Steven Spielberg/(.Movie.Actor):(.nodes._Count:=2)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://vannevarvision.wordpress.com&#34; title=&#34;http://vannevarvision.wordpress.com&#34;&gt;Shahan Khatchadourian&lt;/a&gt; on &lt;a href=&#34;#comment-2183&#34;&gt;November 23, 2008 11:48 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks for the interesting post. The SPARQL endpoint has been shifted back to the D2RQ server:&lt;br /&gt;
&lt;a href=&#34;http://data.linkedmdb.org/sparql&#34;&gt;http://data.linkedmdb.org/sparql&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.india-forums.tv/bollywood-forum.html&#34; title=&#34;http://www.india-forums.tv/bollywood-forum.html&#34;&gt;Fozia&lt;/a&gt; on &lt;a href=&#34;#comment-2187&#34;&gt;December 4, 2008 1:07 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;well… i visit your website first time and found this site very useful and interesting!&lt;/p&gt;
&lt;p&gt;well… you guys doing nice work and i just want to say that keep rocking and keep it up !!!!&lt;/p&gt;
&lt;p&gt;Regards&lt;br /&gt;
Fozia&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Converting SGML DTDs to XML</title>
      <link>https://www.bobdc.com/blog/converting-sgml-dtds-to-xml/</link>
      <pubDate>Tue, 04 Nov 2008 18:35:22 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/converting-sgml-dtds-to-xml/</guid>
      
      
      <description><div>Not quite to XML DTDs, but close enough to be useful.</div><div>&lt;p&gt;I recently had to analyze a large batch of SGML DTDs for a client who planned to convert their publishing system to XML. I was mostly looking for redundant declarations in multiple DTDs that could be pulled into shared modules, but I also wanted some lists of elements and attributes that I could compare against statistics compiled about sample data so that I could see which elements and attributes were actually being used, because there&amp;rsquo;s not much point converting SGML declarations for elements that aren&amp;rsquo;t even used into XML element declarations.&lt;/p&gt;
&lt;p&gt;When I want to analyze a collection of information that doesn&amp;rsquo;t neatly fit into one or more tables, I want it in XML so that I can write little XSLT stylesheets to churn through it and count and compare things, and I found a surprisingly easy way to make an XML version of all of this SGML DTD information. I didn&amp;rsquo;t quite turn it into an XML DTD or schema—there was enough refactoring planned for the DTD conversion that we didn&amp;rsquo;t bother—but a few more steps and a minimal amount of manual work would have made that pretty straightforward.&lt;/p&gt;
&lt;p&gt;The key was Earl Hood&amp;rsquo;s &lt;a href=&#34;http://savannah.nongnu.org/projects/perlsgml/&#34;&gt;perlSGML&lt;/a&gt; DTD analysis tools. I wrote about these in a 1998 book I did called &lt;a href=&#34;http://www.snee.com/bob/sgmlfree/&#34;&gt;SGML CD&lt;/a&gt;. (The book was originally going to be called &amp;ldquo;SGML for Free&amp;rdquo;, because it documented all the best free SGML tools, but Prentice Hall decided that including a CD of the software itself would make the book more appealing, and they changed the book&amp;rsquo;s title to make it clearer that a CD came with it.) Earl&amp;rsquo;s tools are a collection of perl scripts that read an SGML DTD and give you various ways to explore it.&lt;/p&gt;
&lt;p&gt;One script, called dtd2html, creates a directory full of HTML reports about various aspects of the DTD such as which elements have which subelements and attributes, which parent elements they can have, and which elements have which attribute types and which of those are required. My original idea was to run these HTML files through &lt;a href=&#34;http://home.ccil.org/~cowan/XML/tagsoup/&#34;&gt;tagsoup&lt;/a&gt; so that I could use XSLT stylesheets to pull out the information that I wanted, but it wasn&amp;rsquo;t as easy for the stylesheets to find what I needed in the dtd2html output as I had hoped. This was easy enough to fix: I added a few lines to the dtd2html perl script to wrap some &lt;code&gt;div&lt;/code&gt; elements around the parts that I was interested in. These &lt;code&gt;div&lt;/code&gt; elements included &lt;code&gt;class&lt;/code&gt; attributes with names that served as hooks to make it easier for the XSLT stylesheet to find them, so that once I ran the modified version of dtd2html and tagsoup again, the stylesheet was pretty simple to write.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not going to post my revised version of dtd2html because I wrote it for client work, and getting the right permissions would be more work than adding the lines that added the &lt;code&gt;div&lt;/code&gt; tags to the perl script. If you need something like this, though, you can make your own customized additions to dtd2html with very little trouble.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll be discussing this and related techniques in my XML 2008 talk on &lt;a href=&#34;http://www.idealliance.org/xml2008/schedule-details.asp#ft9&#34;&gt;Automating Content Analysis with Trang and Simple XSLT Scripts&lt;/a&gt; on December 9th.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2171&#34;&gt;November 4, 2008 6:42 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Implementing captcha on my weblog forced me to convert to using some Movable Type 4 templates where I had been using MT 3 ones, and this screwed some things up, so I apologize if there are any problems adding comments. Kudos to the support people at pair.net, who patiently helped me straighten out the initial captcha problems.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Download SPARQL results directly into a spreadsheet</title>
      <link>https://www.bobdc.com/blog/download-sparql-results-direct/</link>
      <pubDate>Wed, 29 Oct 2008 09:02:14 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/download-sparql-results-direct/</guid>
      
      
      <description><div>And then sort it, graph it, create new calculations... do all that stuff people do with spreadsheets.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.snee.com/sparql/spreadsheetSPARQL.html&#34;&gt;&lt;img id=&#34;id203578&#34; src=&#34;https://www.bobdc.com/img/main/spreadsheetSPARQL.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;graph generated in spreadsheet from SPARQL output&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;One could make a case that spreadsheets are the oldest form of writing, if some of the earliest examples of writing are columns of cuneiform symbols pressed into clay with reeds to keep track of how many cattle or pots got shipped up and down the river. Back then, stories were oral, but written records made tracking your business&amp;rsquo;s assets and activities a lot easier. Spreadsheets were certainly the killer app of the original personal computers, as people bought 8-bit PCs to run &lt;a href=&#34;http://www.bricklin.com/&#34;&gt;Dan Bricklin&amp;rsquo;s&lt;/a&gt; VisiCalc (&lt;a href=&#34;https://www.bobdc.com/blog/scraping-and-linked-data#c001540&#34;&gt;not Dan Brickley&amp;rsquo;s&lt;/a&gt;) and then the 16-bit IBM PC to run Lotus 1-2-3. Whether creating a phone list for a kid&amp;rsquo;s soccer team or modeling complex financial derivatives, people tracking data that fits into a table of rows and columns like to put it into spreadsheets.&lt;/p&gt;
&lt;p&gt;A SPARQL query returns a table of results in &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-XMLres-20080115/&#34;&gt;XML&lt;/a&gt; or &lt;a href=&#34;http://www.w3.org/TR/rdf-sparql-json-res/&#34;&gt;JSON&lt;/a&gt;, formats that you can easily convert into other formats. One of my most popular blog postings describes how implementing a &lt;a href=&#34;https://www.bobdc.com/blog/download-as-spreadsheet&#34;&gt;download as spreadsheet&lt;/a&gt; link is as simple as creating an HTML table and then telling the downloading browser &amp;ldquo;here comes a spreadsheet&amp;rdquo; before delivering that HTML. The browser will usually open up the designated spreadsheet application and load the table there, including simple formatting included with the data.&lt;/p&gt;
&lt;p&gt;To do this for SPARQL queries, I created a form at &lt;a href=&#34;http://www.snee.com/sparql/spreadsheetSPARQL.html&#34;&gt;http://www.snee.com/sparql/spreadsheetSPARQL.html&lt;/a&gt; with an appearance based on SNORQL forms such as &lt;a href=&#34;http://dbpedia.org/snorql/&#34;&gt;DBpedia&amp;rsquo;s&lt;/a&gt;. The form has two fields: one for the URL of a SPARQL endpoint and one for the SPARQL query to send to that endpoint. Clicking the &amp;ldquo;Go&amp;rdquo; button tells a python CGI script to do the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Send the query to the endpoint, asking for a JSON version of the results.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Send an HTTP Content-type value of &amp;ldquo;application/vnd.ms-excel&amp;rdquo; to the application that requested the SPARQL data (most likely, the browser displaying the query form).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Send an HTML version of the JSON data to the requesting application.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
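&lt;p&gt;Step 3 is the only part that takes any real code. I won&amp;rsquo;t reproduce my script here, but a minimal sketch of converting the standard SPARQL JSON results format into an HTML table looks something like this (the function name is made up for illustration):&lt;/p&gt;

```python
import json

def sparql_json_to_html_table(results_json):
    """Render the W3C SPARQL JSON results format as an HTML table.
    A CGI script would print "Content-type: application/vnd.ms-excel"
    and a blank line before this markup to trigger the spreadsheet."""
    data = json.loads(results_json)
    cols = data["head"]["vars"]
    html = ["<table>",
            "<tr>" + "".join("<th>%s</th>" % c for c in cols) + "</tr>"]
    for row in data["results"]["bindings"]:
        # a variable may be unbound in a given row, hence the defaults
        cells = [row.get(c, {}).get("value", "") for c in cols]
        html.append("<tr>" + "".join("<td>%s</td>" % v for v in cells)
                    + "</tr>")
    html.append("</table>")
    return "\n".join(html)
```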
&lt;p&gt;A computer that doesn&amp;rsquo;t have Excel installed will open whatever application is assigned to open Excel spreadsheets; my Ubuntu laptop displayed the table in OpenOffice Calc. It all worked fine from Firefox on Ubuntu and Windows and from Internet Explorer (but not from Google Chrome, which just downloaded the result to a file).&lt;/p&gt;
&lt;p&gt;There are &lt;a href=&#34;http://esw.w3.org/topic/SparqlEndpoints&#34;&gt;many SPARQL endpoints&lt;/a&gt; that have apparently worked at one time or another, but only four worked when I tested my spreadsheetSPARQL form and script. Still, the fact that four worked fine was great, and the &lt;a href=&#34;http://www.linkedmdb.org/&#34;&gt;Linked Movie Database&lt;/a&gt; in particular will be a lot of fun to play with. The spreadsheetSPARQL form lists the endpoints that worked for me. It also links to a recent posting I did on &lt;a href=&#34;https://www.bobdc.com/blog/how-you-can-explore-a-new-set&#34;&gt;good queries to start with&lt;/a&gt; when exploring an unfamiliar set of RDF data.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s one example of using the form. I sent the following query to DBpedia&amp;rsquo;s SPARQL endpoint of &lt;a href=&#34;http://dbpedia.org/sparql&#34;&gt;http://dbpedia.org/sparql&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?co,  ?revenue, ?netIncome
WHERE {
  ?co dbpedia2:revenue ?revenue;
      dbpedia2:netIncome ?netIncome.
  FILTER (?revenue &amp;gt; 80000000000)
}
ORDER BY ?revenue
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I downloaded the result into a spreadsheet and then created the bar graph shown above. (Gazprom&amp;rsquo;s revenue for last year pretty much dwarfs all other figures, but the current drop in oil prices should even out next year&amp;rsquo;s version of the graph.)&lt;/p&gt;
&lt;p&gt;There is tons more data on DBpedia alone that will be interesting in a spreadsheet, and the number of additional SPARQL endpoints to pull data from is growing. Please let me know if you create other interesting spreadsheet uses of SPARQL data, with graphs or otherwise. I look forward to hearing about them.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By drewp on &lt;a href=&#34;#comment-2170&#34;&gt;November 2, 2008 1:14 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;So far you&amp;rsquo;ve motivated seeing the results of a query in a table and making a graph from them. I&amp;rsquo;d like to have both of those capabilities in a webapp. E.g. I should be able to embed a live graph in my own page like this:&lt;/p&gt;
&lt;img src=&#34;http://sparqlgrapher.com/svg/example.com/query=SELECT+?date+?price+{...}&#34;&gt;
&lt;p&gt;Visiting my hypothetical sparqlgrapher.com directly would give you a UI to lay out and customize the graph. When you&amp;rsquo;re done, you&amp;rsquo;d take that URL and embed it elsewhere (or just take a copy of the image, if you want a one-off).&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>Using the Twitter API to alert myself to swings in the Dow</title>
      <link>https://www.bobdc.com/blog/using-the-twitter-api-to-alert/</link>
      <pubDate>Mon, 27 Oct 2008 10:35:26 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/using-the-twitter-api-to-alert/</guid>
      
      
      <description><div>Or, using a free REST-based service to distribute important financial information on a hot new platform.</div><div>&lt;p&gt;&lt;a href=&#34;http://twitter.com/djia50&#34;&gt;&lt;img id=&#34;id203576&#34; src=&#34;http://ichart.finance.yahoo.com/instrument/1.0/%5EDJI/chart;range=1d/image;size=239x110&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;chart of today&#39;s DJIA activity&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;On a normal day two months ago, I would never check the Dow Jones Industrial Average, but with the financial turmoil of recent weeks I&amp;rsquo;ve wondered more often whether we were getting in deeper or whether things had bounced back at all, so I was checking too often. I usually went to &lt;a href=&#34;http://www.cnbc.com/&#34;&gt;CNBC&amp;rsquo;s home page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Since I &lt;a href=&#34;https://www.bobdc.com/blog/tweet-tweet&#34;&gt;became a regular Twitter user&lt;/a&gt;, I&amp;rsquo;ve also checked that more frequently, and I&amp;rsquo;d been wondering about a good excuse to play with the &lt;a href=&#34;http://apiwiki.twitter.com/&#34;&gt;Twitter API&lt;/a&gt;. I decided that a Twitter account that alerted me to big swings in the Dow would inform me about important news there when I checked Twitter, letting me skip the visits to cnbc.com. I used the &lt;a href=&#34;http://code.google.com/p/python-twitter/&#34;&gt;python-twitter&lt;/a&gt; API to write a script that checks the Dow on &lt;a href=&#34;http://finance.yahoo.com/&#34;&gt;Yahoo Finance&lt;/a&gt; and tweets if the figure has moved more than 50 points since the last tweet or since the day&amp;rsquo;s opening, whichever is more recent. I also scheduled a cron job to run this once an hour. (My host provider won&amp;rsquo;t let me schedule it any more often.) These days, 50 points doesn&amp;rsquo;t seem like a huge swing, but three or four tweets in one day tell me about big gains, big losses, or big volatility, and no tweets tell me that things are relatively calm.&lt;/p&gt;
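The core decision the script makes can be sketched in a few lines of Python. This is only an illustration of the threshold logic described above; the function name and figures are my own invention, not taken from the actual djia50 script:

```python
# Hypothetical sketch of the djia50 script's threshold test; the
# name should_tweet and the sample figures are invented, not the real code.
def should_tweet(current, reference, threshold=50):
    """Return True if the Dow has moved more than `threshold` points
    since `reference` (the last tweeted figure or the day's opening,
    whichever is more recent)."""
    return abs(current - reference) > threshold

print(should_tweet(9055.2, 9001.1))  # a 54.1-point move: True
print(should_tweet(9030.0, 9001.1))  # a 28.9-point move: False
```

The same test covers drops as well as gains, since it compares the absolute value of the move against the threshold.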
&lt;p&gt;It&amp;rsquo;s always nice when minimal coding around a REST-based API from a free service results in something genuinely useful. The account is called &lt;a href=&#34;http://twitter.com/djia50&#34;&gt;DJIA50&lt;/a&gt;, and I hope it&amp;rsquo;s useful to others as well.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>New XMP spec</title>
      <link>https://www.bobdc.com/blog/new-xmp-spec/</link>
      <pubDate>Thu, 23 Oct 2008 19:43:03 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/new-xmp-spec/</guid>
      
      
<description><div>And the W3C thought they had a problem dating XML Recommendation releases.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.adobe.com/devnet/xmp/&#34;&gt;&lt;img id=&#34;id203577&#34; src=&#34;http://www.adobe.com/devnet/images/xmp_tagline.gif&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;XMP logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve found several things to like and several not to like about Adobe&amp;rsquo;s XMP specification (&lt;a href=&#34;http://www.xml.com/pub/a/2004/09/22/xmp.html&#34;&gt;[1]&lt;/a&gt;, &lt;a href=&#34;https://www.bobdc.com/blog/using-or-not-using-adobes-xmp&#34;&gt;[2]&lt;/a&gt;, &lt;a href=&#34;http://www.snee.com/bobdc.blog/2008/03/batch_processing_of_image_file.html&#34;&gt;[3]&lt;/a&gt;), a subset of RDF that lets you embed standard and custom metadata into the kinds of file formats that Adobe products typically read and write. I recently learned from a &lt;a href=&#34;http://www.crossref.org/CrossTech/2008/10/xmp_marches_on.html&#34;&gt;Tony Hammond posting&lt;/a&gt; on the crosstech blog that Adobe just released a new version of this seven-year-old spec. There&amp;rsquo;s a new SDK with it as well, and while I only remember the SDK supporting C++ before, this &amp;ldquo;version&amp;rdquo; includes Java support.&lt;/p&gt;
&lt;p&gt;I quote the word &amp;ldquo;version&amp;rdquo; because, as Tony pointed out, this new one has &amp;ldquo;no version number and no date&amp;rdquo;. Hopefully they&amp;rsquo;ll learn something from the &lt;a href=&#34;http://lists.w3.org/Archives/Public/xml-editor/2008OctDec/0016.html&#34;&gt;current problems&lt;/a&gt; being discussed about dating releases of the XML Recommendations and the value of making it easy for people to cite specific releases of a spec.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By Tony Hammond on &lt;a href=&#34;#comment-2144&#34;&gt;October 24, 2008 10:54 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob:&lt;/p&gt;
&lt;p&gt;Just to clarify the SDK issues. As I &lt;a href=&#34;http://www.crossref.org/CrossTech/2007/03/xmp_capabilities_extended.html&#34;&gt;posted&lt;/a&gt; to CrossTech back in March &amp;lsquo;07, there was a new release of the SDK (4.1.1) which included two libraries: XMPCore and XMPFiles. Both are implemented in C++ with a Java implementation provided for XMPCore only. (XMPCore allows XMP packets to be constructed, whereas XMPFiles allows XMP packets to be written to and read from files, i.e. the useful bit.)&lt;/p&gt;
&lt;p&gt;The new SDK release (4.4.2 - there were no public 4.2 or 4.3 offerings afaik) does not change things in this regard. There is still no Java support for reading or writing files. To quote from the &lt;a href=&#34;http://www.adobe.com/devnet/xmp/pdfs/XMP-Toolkit-SDK-Overview.pdf&#34;&gt;Toolkit Overview (PDF)&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&amp;ldquo;A Java implementation of XMPCore is also provided, to be used with J2SE Version 1.4.2 or higher. Project files for Eclipse 3.2 and an Ant build file are included.&amp;rdquo;&lt;br /&gt;
&lt;br /&gt;
&amp;ldquo;XMPFiles is provided as a C++ implementation &amp;hellip;&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The XMP Toolkit SDK is at least versioned. The XMP Spec, however, is not currently versioned - although Gunar Penikis has &lt;a href=&#34;http://blogs.adobe.com/gunar/2008/10/new_xmp_sdks_released.html#comments&#34;&gt;agreed&lt;/a&gt; that the docs should be dated. Another related concern is that there seems to be no public archive of XMP Specs being maintained by Adobe (or other).&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>SPARQL and relational databases: getting started</title>
      <link>https://www.bobdc.com/blog/sparql-and-relational-database/</link>
      <pubDate>Mon, 20 Oct 2008 17:24:13 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/sparql-and-relational-database/</guid>
      
      
<description><div>Asking about tables and columns and doing a simple join.</div><div>&lt;p&gt;In an &lt;a href=&#34;http://www.snee.com/xml/xml2006/owlrdbms.html&#34;&gt;earlier project&lt;/a&gt; in which I queried relational data with SPARQL, I wanted to demonstrate how adding OWL metadata made it possible to answer useful queries that couldn&amp;rsquo;t have been answered with just the original data. I did this using two simple databases of one table each, but recently, while helping someone who was also using the &lt;a href=&#34;http://d2rq.org/&#34;&gt;D2RQ&lt;/a&gt; interface to provide access to some relational data, I decided to get more comfortable with using SPARQL to query data from multi-table relational databases.&lt;/p&gt;
&lt;p&gt;I wanted to find ways to:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.flickr.com/photos/toasty/1540997910/&#34;&gt;&lt;img id=&#34;id203612&#34; src=&#34;http://farm3.static.flickr.com/2259/1540997910_dd04a385ae.jpg?v=0&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;picture of globe&#34; width=&#34;320px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;list a database&amp;rsquo;s tables&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;list the columns in those tables&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;run a query that joined data from at least two of the tables&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It was all easier than I thought it would be.&lt;/p&gt;
&lt;p&gt;For a database to query, I went to the sample database section of &lt;a href=&#34;http://dev.mysql.com/doc/#sampledb&#34;&gt;MySQL&amp;rsquo;s documentation page&lt;/a&gt; and got the &amp;ldquo;world&amp;rdquo; database. This database has a &lt;code&gt;country&lt;/code&gt; table with columns for country names, country codes unique within the table, and other information about the country such as its population and head of state. Another table, named &lt;code&gt;countrylanguage&lt;/code&gt;, has a column for a country code to link to the &lt;code&gt;country&lt;/code&gt; table, a column for a language spoken in that country, another for a figure showing the percentage of the country&amp;rsquo;s residents who speak that language, and a fourth column for a boolean &lt;code&gt;IsOfficial&lt;/code&gt; value.&lt;/p&gt;
&lt;p&gt;The following SQL query against that database lists the name of each country, a language spoken there, and what percentage of the population speak that language, with the rows sorted from high percentages to low:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT Name, Language, Percentage FROM country, countrylanguage 
WHERE country.Code = countrylanguage.CountryCode
ORDER BY Percentage DESC;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here&amp;rsquo;s how I developed a SPARQL query doing the same thing.&lt;/p&gt;
&lt;p&gt;As I described in the writeup of my earlier project, the first step of using D2RQ with a particular database is pointing its generate-mapping utility at a database installed in a running copy of MySQL (or one of the other relational database managers that D2RQ supports) to generate an SQL-to-SPARQL mapping file for that database. Next, you start up the d2r-server program with that mapping file as a parameter to run a server that provides a SPARQL endpoint for that database.&lt;/p&gt;
&lt;p&gt;You could then send SPARQL queries to that server&amp;rsquo;s SPARQL endpoint &lt;a href=&#34;https://www.bobdc.com/blog/querying-wikidbpedia-for-presi#id203795&#34;&gt;using Curl&lt;/a&gt;. For exploring the data, though, I prefer the SNORQL interface, because I can enter a query on a web form, click the Go button, and browse the result. (The SNORQL interface in the latest official release of D2RQ &lt;a href=&#34;http://sourceforge.net/mailarchive/forum.php?thread_name=48F7C172.3040003%40snee.com&amp;amp;forum_name=d2rq-map-devel&#34;&gt;doesn&amp;rsquo;t get along with Firefox 3.0&lt;/a&gt;—apparently there&amp;rsquo;s already a fixed version in D2RQ&amp;rsquo;s cvs tree—so I used Chrome for this.)&lt;/p&gt;
&lt;p&gt;When exploring a relational database, I first want to know what its tables are and what columns are in those tables. To do this with SPARQL queries, the D2RQ mapping file that I generated earlier provided some good clues. (Keep in mind that there may be other SPARQL interfaces to relational databases in the future, and they may not all map the relational structures to RDF the same way.)&lt;/p&gt;
&lt;p&gt;First: what are the tables in the database? D2RQ treats each table row as a resource of type tablename. For example, it treats the &lt;code&gt;country&lt;/code&gt; table&amp;rsquo;s row for France as a resource identified as http://localhost:2020/resource/country/&lt;strong&gt;FRA&lt;/strong&gt;, which has a &lt;a href=&#34;http://www.w3.org/1999/02/22-rdf-syntax-ns&#34;&gt;http://www.w3.org/1999/02/22-rdf-syntax-ns&lt;/a&gt;#&lt;strong&gt;type&lt;/strong&gt; of http://localhost:2020/resource/vocab/&lt;strong&gt;country&lt;/strong&gt;. So, by asking for a list of all the types with the following query (with DISTINCT added so that each only shows up once),&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?o WHERE {
  ?s  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#type&amp;gt; ?o
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;SNORQL&amp;rsquo;s browse mode gives me this list (SNORQL defines &amp;ldquo;vocab&amp;rdquo; as the prefix for the http://localhost:2020/resource/vocab/ URI), which corresponds to the tables declared in the world database:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;vocab:country 
vocab:countrylanguage 
vocab:city 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;What are the columns of these tables? For an example of one column&amp;rsquo;s name, the world database&amp;rsquo;s &lt;code&gt;country&lt;/code&gt; table has a row for France with a figure of 59225700 in the &lt;code&gt;Population&lt;/code&gt; column. D2RQ represents the column name as a property named tablename_columnname, so the triple describing France&amp;rsquo;s population tells us that http://localhost:2020/resource/country/&lt;strong&gt;FRA&lt;/strong&gt; has a http://localhost:2020/resource/vocab/&lt;strong&gt;country_Population&lt;/strong&gt; of 59225700. How do we list all these predicate names? As I mentioned in &lt;a href=&#34;https://www.bobdc.com/blog/how-you-can-explore-a-new-set&#34;&gt;How you can explore a new set of linked data&lt;/a&gt;, the following query lists all the predicates in an RDF-based data collection:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?p WHERE {?s ?p ?o}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Displaying &lt;em&gt;all&lt;/em&gt; predicates in the world database, though, includes a few extras along with the ones representing the table columns. The table columns are all in the http://localhost:2020/resource/vocab namespace, so the best way I could think of to query for predicates that were in that namespace was like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?p 
WHERE { ?s ?p ?o.
        FILTER(regex(str(?p),&amp;quot;http://localhost:2020/resource/vocab/&amp;quot;)).
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&amp;rsquo;s pretty kludgy, so I&amp;rsquo;d love to hear of a better alternative. It selects all the predicates in the data set and then only passes along the ones whose URLs, when converted to a string, have &amp;ldquo;http://localhost:2020/resource/vocab/&amp;rdquo; in them. If another namespace used http://localhost:2020/resource/vocab/foo as its URL, this FILTER would select predicates in that namespace as well, because it has that &amp;ldquo;http://localhost:2020/resource/vocab/&amp;rdquo; substring in there, but I couldn&amp;rsquo;t find a better way to do this. (Although SPARQL implements a subset of XPath, the &lt;a href=&#34;http://www.w3.org/TR/1999/REC-xpath-19991116#function-namespace-uri&#34;&gt;namespace-uri&lt;/a&gt; function is not part of this subset.)&lt;/p&gt;
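To make the over-matching concrete outside of SPARQL: a stricter test would anchor the match at the start of the URI and then reject any local name that contains a further slash. Here is a rough Python sketch of that idea (the function name and the sample URIs are invented for illustration, not part of D2RQ):

```python
# Hypothetical stricter namespace test; in_vocab_namespace and the
# sample URIs are invented for illustration, not part of D2RQ.
VOCAB = 'http://localhost:2020/resource/vocab/'

def in_vocab_namespace(uri, ns=VOCAB):
    """Unlike the substring test in the regex FILTER, require that the
    URI begin with the namespace and that the remainder (the local
    name) contain no further slash."""
    if not uri.startswith(ns):
        return False
    local = uri[len(ns):]
    return bool(local) and '/' not in local

# The substring test accepts both of these; only the first really
# belongs to the vocab namespace (the second would belong to a
# hypothetical vocab/foo namespace).
print(in_vocab_namespace(VOCAB + 'country_Population'))  # True
print(in_vocab_namespace(VOCAB + 'foo/somethingElse'))   # False
```

SPARQL's regex FILTER has no direct equivalent of this two-part check, which is why the substring version in the query above was the practical compromise.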
&lt;p&gt;I won&amp;rsquo;t show you all 24 predicate names that this query returns, but for the SPARQL version of the SQL query above that listed country names, languages, and percentages, I picked out the tablename_columnname predicates I needed from the list of 24 and used them to create this query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?name ?language ?percentage 
WHERE { ?s1 vocab:country_Name ?name;
            vocab:country_Code ?ccode.
        ?s2 vocab:countrylanguage_CountryCode ?ccode;
            vocab:countrylanguage_Language ?language;
            vocab:countrylanguage_Percentage ?percentage.
}
ORDER BY DESC(?percentage)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&amp;rsquo;s not a very fancy query. The most interesting part is the use of the &lt;code&gt;ccode&lt;/code&gt; variable to connect information in triples from the world database&amp;rsquo;s &lt;code&gt;country&lt;/code&gt; table with information in triples from the database&amp;rsquo;s &lt;code&gt;countrylanguage&lt;/code&gt; table. When I first started this I had no idea how I was going to do this SPARQL equivalent of an SQL join, but once I sat down and tried, it was intuitive enough. I won&amp;rsquo;t show you all 984 result rows, but here&amp;rsquo;s a selection from the middle to give you the flavor of what&amp;rsquo;s there:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;quot;Réunion&amp;quot;        &amp;quot;Creole French&amp;quot;  91.5
&amp;quot;Germany&amp;quot;        &amp;quot;German&amp;quot;         91.3
&amp;quot;Seychelles&amp;quot;     &amp;quot;Seselwa&amp;quot;        91.3
&amp;quot;Romania&amp;quot;        &amp;quot;Romanian&amp;quot;       90.7
&amp;quot;American Samoa&amp;quot; &amp;quot;Samoan&amp;quot;         90.6
&amp;quot;Syria&amp;quot;          &amp;quot;Arabic&amp;quot;         90.0
&amp;quot;Swaziland&amp;quot;      &amp;quot;Swazi&amp;quot;          89.9
&amp;quot;Bahamas&amp;quot;        &amp;quot;Creole English&amp;quot; 89.7
&amp;quot;Chile&amp;quot;          &amp;quot;Spanish&amp;quot;        89.7
&amp;quot;Sweden&amp;quot;         &amp;quot;Swedish&amp;quot;        89.5
&lt;/code&gt;&lt;/pre&gt;
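The join that the shared &lt;code&gt;ccode&lt;/code&gt; variable performs works like any join on agreeing values: keep every combination of rows whose bindings for the shared variable match. A toy Python version over two invented miniature tables shows the idea (the data here is a made-up fragment, not the real world database):

```python
# Invented miniature stand-ins for a few rows of the country and
# countrylanguage tables; not the real world-database data.
countries = [('SWE', 'Sweden'), ('CHL', 'Chile'), ('ROM', 'Romania')]
languages = [('SWE', 'Swedish', 89.5),
             ('CHL', 'Spanish', 89.7),
             ('ROM', 'Romanian', 90.7)]

# A SPARQL join on a shared variable keeps every combination of
# solutions whose values for that variable agree -- here, the codes
# play the role of the ?ccode binding.
results = sorted(
    ((name, lang, pct)
     for code, name in countries
     for lcode, lang, pct in languages
     if code == lcode),              # the shared binding
    key=lambda row: row[2], reverse=True)

for row in results:
    print(row)
```

The `reverse=True` sort mirrors the query's `ORDER BY DESC(?percentage)` clause.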
&lt;p&gt;Now I feel more confident about forging ahead to explore sets of relational data that I can access with SPARQL queries using the D2RQ interface.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By Irene Polikoff on &lt;a href=&#34;#comment-2135&#34;&gt;October 20, 2008 9:18 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Queries like SELECT DISTINCT ?p WHERE {?s ?p ?o} are OK for RDF stores because of their specialized indexing strategies, but I don&amp;rsquo;t think this would work well with D2RQ against a relational database.&lt;/p&gt;
&lt;p&gt;It would be OK if the RDBMS had a very small amount of data. But on a database of any size, such queries are not practical. I suspect D2RQ will just try to get all the data in the entire database and then try to figure out predicates. The same goes for the query you are suggesting for getting the table names.&lt;/p&gt;
&lt;p&gt;D2RQ translates database schema in the following way - all tables get represented as owl:Class, all columns are exposed as properties connected to the classes using rdfs:domain statements.&lt;/p&gt;
&lt;p&gt;So, a much less expensive way to get all tables is simply by using SELECT ?table WHERE {?table rdf:type owl:Class}&lt;/p&gt;
&lt;p&gt;Then to get columns you can do ?column rdfs:domain ?table. In fact, this single query will get you all tables with the corresponding columns&lt;/p&gt;
&lt;p&gt;SELECT ?table ?column WHERE{?column rdfs:domain ?table}&lt;/p&gt;
&lt;p&gt;Regards,&lt;/p&gt;
&lt;p&gt;Irene&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2136&#34;&gt;October 20, 2008 10:55 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Irene! That all makes sense to me, but&lt;br /&gt;
I didn&amp;rsquo;t see any use of the &lt;a href=&#34;http://www.w3.org/2002/07/owl#&#34;&gt;http://www.w3.org/2002/07/owl#&lt;/a&gt; namespace (or even the string &amp;ldquo;owl&amp;rdquo;) in the mapping file generated by D2RQ, and&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  SELECT ?table WHERE {?table rdf:type owl:Class}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;got me no results. This also got me no results:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?table ?column WHERE{?column rdfs:domain ?table}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By &lt;a href=&#34;http://dowhatimean.net/&#34; title=&#34;http://dowhatimean.net/&#34;&gt;Richard Cyganiak&lt;/a&gt; on &lt;a href=&#34;#comment-2138&#34;&gt;October 21, 2008 3:29 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, I enjoy your posts on practical RDF production and SPARQL usage a lot. Keep it up!&lt;/p&gt;
&lt;p&gt;Irene is right that the queries for listing the classes and properties are not very efficient. However, her proposed replacements unfortunately don&amp;rsquo;t work – D2RQ only translates the instance data (records) from the DB to RDF, but does not provide an RDF view on the schema level. Information about classes and properties is on the schema level.&lt;/p&gt;
&lt;p&gt;It would be nice if these statements were queryable through SPARQL, but this remains to be done for future versions of D2RQ.&lt;/p&gt;
&lt;p&gt;So, at the moment, the queries you suggested are the best that can be done. They will work well enough for databases up to a few 100k records if the number of tables/columns isn&amp;rsquo;t very large. Note that Snorql uses variations of your queries to list classes and properties.&lt;/p&gt;
&lt;p&gt;By Irene Polikoff on &lt;a href=&#34;#comment-2141&#34;&gt;October 23, 2008 12:04 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hmm&amp;hellip; interesting. It works exactly as described for me. Of course, I use a version of D2RQ bundled with TopBraid Composer. And I know that Holger has made some changes. This could be one of them.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>White paper on metadata standards</title>
      <link>https://www.bobdc.com/blog/white-paper-on-metadata-standa/</link>
      <pubDate>Thu, 16 Oct 2008 09:01:26 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/white-paper-on-metadata-standa/</guid>
      
      
      <description><div>Not as confusing a choice as many think.</div><div>&lt;p&gt;I recently wrote a white paper for Innodata Isogen titled &lt;a href=&#34;http://www.innodata-isogen.com/knowledge_center/white_papers/content_metadata_standards_wp&#34;&gt;Content Metadata Standards: Libraries, Publishers, and More&lt;/a&gt; that is available for free if you don&amp;rsquo;t mind registering first. (If you do register, you&amp;rsquo;ll find a nice choice of other white papers available on topics such as DITA, content re-use, and ebooks.)&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve heard several people say &amp;ldquo;there are dozens and dozens of metadata standards out there! It&amp;rsquo;s all so confusing!&amp;rdquo; It&amp;rsquo;s really not that bad, and this paper addresses several key issues and tours through the more well-known standards. To summarize three of the main points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Dublin Core is pretty central. Some people complain that it&amp;rsquo;s too vague (for example, it has a &amp;ldquo;date&amp;rdquo; field, but date of what?) but being very generalized is what makes it so broadly useful. Most other metadata standards build on it for more specific uses.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It seems like half the metadata standards out there are administered by the US Library of Congress. Many of these standards build on the LoC&amp;rsquo;s original &lt;a href=&#34;http://lcweb.loc.gov/marc/marcdocz.html&#34;&gt;MARC&lt;/a&gt; standard for bibliographic information, and others began at one library or university or another and moved to the LoC&amp;rsquo;s stewardship as they grew. (More good news from the LoC: I just found out from a &lt;a href=&#34;http://broadcast.oreilly.com/2008/10/us-library-of-congress-makes-a.html&#34;&gt;Rick Jelliffe posting&lt;/a&gt; that they&amp;rsquo;re putting together a set of &lt;a href=&#34;http://www.w3.org/Provider/Style/URI&#34;&gt;cool URIs&lt;/a&gt; for US federal legislation.) Many of these are focused on increasingly modern needs such as digital scholarship.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The more industry-specific standards, by their very nature, make it relatively easy to identify whether they&amp;rsquo;re relevant to what you as a publisher need. For example, if you&amp;rsquo;re involved in magazine publishing, &lt;a href=&#34;http://www.prismstandard.org&#34;&gt;PRISM&lt;/a&gt; will be valuable; for book publishing, there&amp;rsquo;s &lt;a href=&#34;http://www.editeur.org/onix.html&#34;&gt;ONIX&lt;/a&gt;. (&amp;ldquo;Involved in&amp;rdquo; here could mean being such a publisher yourself, but it could also mean having such publishers as business partners selling you content or buying it from you.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other issues covered by the paper are the OCLC&amp;rsquo;s five classes of metadata, which provide a nice framework when evaluating your own needs; content standards such as DocBook and DITA with built-in metadata slots; specialized vs. generalized metadata, and controlled, taxonomy-based keyword metadata vs. folksonomies.&lt;/p&gt;
&lt;p&gt;If there are any important issues about metadata that a general publishing audience would want to know about but which aren&amp;rsquo;t covered by the paper, please let me know.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://scottysengineeringlog.net&#34; title=&#34;http://scottysengineeringlog.net&#34;&gt;Scott Hudson&lt;/a&gt; on &lt;a href=&#34;#comment-2124&#34;&gt;October 16, 2008 11:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We are considering adding explicit support for Dublin Core metadata as part of the Publishers schema we are creating in the DocBook Subcommittee for Publishers. Thoughts on that approach?&lt;/p&gt;
&lt;p&gt;I think ONIX might be a bit heavy-weight to add to our schema, though&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://scottysengineeringlog.net&#34; title=&#34;http://scottysengineeringlog.net&#34;&gt;Scott Hudson&lt;/a&gt; on &lt;a href=&#34;#comment-2125&#34;&gt;October 16, 2008 11:43 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Forgot to add: Our reasoning behind adopting Dublin Core in the DocBook Publishers schema is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;interoperability&lt;/li&gt;
&lt;li&gt;widely recognized standard. DocBook already has support for external standards, such as SVG and MathML, so why not for metadata?&lt;/li&gt;
&lt;li&gt;tool support/integration. It should be easier for tool vendors to add support for a recognized industry standard.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&amp;ndash;Scott&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2126&#34;&gt;October 16, 2008 1:28 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Scott,&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s already a good chunk of Dublin Core in DocBook now, right[1]?&lt;/p&gt;
&lt;p&gt;ONIX support would mean narrowing your definition of publisher to mean &amp;ldquo;(hard copy?) book publisher,&amp;rdquo; and I&amp;rsquo;m sure you want to keep it broader than that.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;[1] &lt;a href=&#34;http://www.docbook.org/specs/cs-docbook-docbook-4.2.html#d0e652&#34;&gt;http://www.docbook.org/specs/cs-docbook-docbook-4.2.html#d0e652&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>Learning more about SPARQL</title>
      <link>https://www.bobdc.com/blog/learning-more-about-sparql/</link>
      <pubDate>Fri, 10 Oct 2008 07:38:49 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/learning-more-about-sparql/</guid>
      
      
      <description><div>Improving the Bart blackboard query.</div><div>&lt;p&gt;Since I &lt;a href=&#34;https://www.bobdc.com/blog/querying-dbpedia&#34;&gt;first wrote on&lt;/a&gt; sending DBpedia SPARQL queries about Bart&amp;rsquo;s blackboard messages at the start of Simpsons episodes, I&amp;rsquo;ve learned a lot more about SPARQL (reading &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/&#34;&gt;the spec&lt;/a&gt; helped) and I wanted to walk through some of the things I&amp;rsquo;ve learned by expanding on and refining my original query.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.addletters.com/bart-simpson-generator.htm&#34;&gt;&lt;img id=&#34;id203611&#34; src=&#34;https://www.bobdc.com/img/main/bartsparql.gif&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;360px&#34; alt=&#34;Bart and SPARQL query&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I had finished that entry by wondering how to list Bart&amp;rsquo;s blackboard entries for all episodes instead of for just one season. Vaclav Synacek &lt;a href=&#34;https://www.bobdc.com/blog/querying-dbpedia#c001393&#34;&gt;showed me one way&lt;/a&gt;, and I recently realized that there&amp;rsquo;s a much simpler way—maybe too simple (all queries shown assume the namespace declarations shown on the &lt;a href=&#34;http://dbpedia.org/snorql/&#34;&gt;SNORQL interface form&lt;/a&gt; for sending SPARQL queries to DBpedia):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?blackboard WHERE {
  ?s dbpedia2:blackboard ?blackboard.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(See it executed &lt;a href=&#34;http://dbpedia.org/snorql/?query=SELECT+%3Fblackboard+WHERE+%7B%0D%0A++%3Fs+dbpedia2%3Ablackboard+%3Fblackboard.%0D%0A%7D%0D%0A&#34;&gt;here&lt;/a&gt;.) What makes this too simple is that it asks for the dbpedia2:blackboard value for &lt;em&gt;anything&lt;/em&gt; in DBpedia, whether it&amp;rsquo;s a Simpsons episode or not. I wanted to only ask about Simpsons episodes—not that it comes up for anything else, but I thought this would be a good exercise—so I looked on &lt;a href=&#34;http://dbpedia.org/page/Tennis_the_Menace&#34;&gt;the DBpedia page&lt;/a&gt; for one episode and found a property called dbpedia2:portalProperty. For Simpsons episodes, it has a value of &amp;ldquo;The Simpsons&amp;rdquo;@en, with the final @en indicating that this string is in English, so I entered this query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?episode,?blackboard WHERE {
  ?episode dbpedia2:blackboard ?blackboard;
           dbpedia2:portalProperty &amp;quot;The Simpsons&amp;quot;@en.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(See it executed &lt;a href=&#34;http://dbpedia.org/snorql/?query=SELECT+%3Fepisode%2C%3Fblackboard+WHERE+%7B%0D%0A++%3Fepisode+dbpedia2%3Ablackboard+%3Fblackboard%3B%0D%0A+++++++++++dbpedia2%3AportalProperty+%22The+Simpsons%22%40en.%0D%0A%7D%0D%0A&#34;&gt;here&lt;/a&gt;.) This query and its answer set brought up two more questions for me:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Some answers are URLs, and some are actual strings of what Bart wrote. How can I tell DBpedia to only give me the latter?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What&amp;rsquo;s a portalProperty, and what other values might show up there?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I learned from the spec how to filter the answer set so that only literal strings get returned, with no URLs (more technically, with no &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/#QSynIRI&#34;&gt;IRIs&lt;/a&gt;): with the &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/#func-isLiteral&#34;&gt;isLiteral operator&lt;/a&gt; in a &lt;a href=&#34;http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/#tests&#34;&gt;filter&lt;/a&gt;, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?episode,?blackboard WHERE {
  ?episode dbpedia2:blackboard ?blackboard;
           dbpedia2:portalProperty &amp;quot;The Simpsons&amp;quot;@en.
  FILTER isLiteral(?blackboard)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(See it executed &lt;a href=&#34;http://dbpedia.org/snorql/?query=SELECT+%3Fepisode%2C%3Fblackboard+WHERE+%7B%0D%0A++%3Fepisode+dbpedia2%3Ablackboard+%3Fblackboard%3B%0D%0A+++++++++++dbpedia2%3AportalProperty+%22The+Simpsons%22%40en.%0D%0A++FILTER+isLiteral(%3Fblackboard)%0D%0A%7D%0D%0A&#34;&gt;here&lt;/a&gt;.) Now to the portalProperty. As I described in &lt;a href=&#34;https://www.bobdc.com/blog/how-you-can-explore-a-new-set&#34;&gt;How you can explore a new set of linked data&lt;/a&gt;, a query like the following lists all the values that came up for a particular property, although if there are too many, DBpedia may not return them all:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?pprop WHERE {
  ?s dbpedia2:portalProperty ?pprop.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(See it executed &lt;a href=&#34;http://dbpedia.org/snorql/?query=SELECT+DISTINCT+%3Fpprop+WHERE+%7B%0D%0A++%3Fs+dbpedia2%3AportalProperty+%3Fpprop.%0D%0A%7D%0D%0A&#34;&gt;here&lt;/a&gt;.) We want that DISTINCT keyword because otherwise we&amp;rsquo;re asking about &lt;em&gt;all&lt;/em&gt; the triples that have dbpedia2:portalProperty predicates, and we know that for the Simpsons alone that&amp;rsquo;s over a hundred repetitions.&lt;/p&gt;
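Each "See it executed" link in this post is just the query itself percent-encoded onto the snorql endpoint's query parameter. A minimal Python sketch of building such a link (quote_plus is from the standard library; snorql's own links encode line breaks as %0D%0A, so the exact encoding may differ slightly):

```python
from urllib.parse import quote_plus

# the DISTINCT query from above, percent-encoded into a shareable snorql URL
query = """SELECT DISTINCT ?pprop WHERE {
  ?s dbpedia2:portalProperty ?pprop.
}"""
url = "http://dbpedia.org/snorql/?query=" + quote_plus(query)
print(url)
```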
&lt;p&gt;The list of potential portalProperty values is interesting, but not all Simpsons episodes have this property assigned, so asking for dbpedia2:blackboard values for any subjects that have a dbpedia2:portalProperty value of &amp;ldquo;The Simpsons&amp;rdquo;@en won&amp;rsquo;t give us a complete list of blackboard gags. Most episodes seem to have a dbpedia2:reference property pointing to the page on thesimpsons.com for that episode, so I considered querying for dbpedia2:blackboard values for any subjects that have a dbpedia2:reference value with &amp;ldquo;thesimpsons.com&amp;rdquo; in it, but then I realized that this wouldn&amp;rsquo;t be much different from Vaclav&amp;rsquo;s solution.&lt;/p&gt;
&lt;p&gt;The real point is that as I learn more about SPARQL (and DBpedia), I&amp;rsquo;m finding more ways to explore this huge set of interesting data and more ways to control the data that&amp;rsquo;s returned to me. Checking what Bart wrote on the blackboard is fun, but I have some more interesting ideas in the works.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://feelitlive.com&#34; title=&#34;http://feelitlive.com&#34;&gt;Simon Gibbs&lt;/a&gt; on &lt;a href=&#34;#comment-2121&#34;&gt;October 11, 2008 2:35 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Inspiring post! I had a shot at it and discovered you need to use a couple of techniques to get it close. I wonder if inference rules embedded in the DB might also be required to deal with categories properly.&lt;/p&gt;
&lt;p&gt;SELECT ?chalkboard_gag WHERE {&lt;br /&gt;
{&lt;br /&gt;
{?episode skos:subject _:category . _:category skos:broader }&lt;br /&gt;
UNION&lt;br /&gt;
{?episode skos:subject }&lt;br /&gt;
}&lt;br /&gt;
?episode dbpedia2:blackboard ?chalkboard_gag&lt;br /&gt;
FILTER isLiteral(?chalkboard_gag)&lt;br /&gt;
FILTER (?chalkboard_gag != &amp;quot;None&amp;quot;@en)&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;There are a couple of interesting modelling issues in here as well. Many values are not atomic, most are pre-formatted with quotes, and Lisa has a quote in one episode. Perhaps most interesting, there is a child-safe and non-child-safe version of one message - and rightly so, you might not want to encourage your audience to visit that domain.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://semantic.umwblogs.org&#34; title=&#34;http://semantic.umwblogs.org&#34;&gt;Patrick Murray-John&lt;/a&gt; on &lt;a href=&#34;#comment-2123&#34;&gt;October 13, 2008 10:00 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Many thanks! I&amp;rsquo;m realizing more and more how useful it is to share SPARQL queries focused on particular topics, just to get a broader sense of the twists and turns we all encounter.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a tiny one I discovered this week. Using ARC2, I plugged in a query to DBpedia that worked just fine in the query form at &lt;a href=&#34;http://dbpedia.org/sparql/.&#34;&gt;http://dbpedia.org/sparql/.&lt;/a&gt; But it failed in the PHP script. Turned out that, when defining prefixes, I had included a space between the prefix and the URI. That worked in the form, but failed the script. Just a wee gotcha in case anyone else encounters it.&lt;/p&gt;
&lt;p&gt;Patrick&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.furia.com&#34; title=&#34;http://www.furia.com&#34;&gt;glenn mcdonald&lt;/a&gt; on &lt;a href=&#34;#comment-2529&#34;&gt;June 3, 2010 10:48 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This topic arose again in this StackOverflow question:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://stackoverflow.com/questions/2956449/linked-data-and-endpoint/&#34;&gt;http://stackoverflow.com/questions/2956449/linked-data-and-endpoint/&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>(semantic web) - semantics = linked data?</title>
      <link>https://www.bobdc.com/blog/semantic-web-semantics-linked/</link>
      <pubDate>Tue, 07 Oct 2008 09:53:23 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-semantics-linked/</guid>
      
      
      <description><div>So much of the best &#34;semantic web&#34; technology has little to do with semantics.</div><div>&lt;p&gt;When people talk about semantic technology, they&amp;rsquo;re often talking about technology that has nothing to do with semantics. They&amp;rsquo;re talking about the new possibilities that the RDF data model and the SPARQL query language add to distributed database applications, and there&amp;rsquo;s a lot to talk about. As Jim Hendler &lt;a href=&#34;http://swig.xmlhack.com/2007/11/30/2007-11-30.html&#34;&gt;once wrote&lt;/a&gt;,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;My document can point at your document on the Web, but my database can&amp;rsquo;t point at something in your database without writing special purpose code. The Semantic Web aims at fixing that.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Why do we describe technology for easier integration of machine-readable data on the web as &amp;ldquo;semantic&amp;rdquo;? I don&amp;rsquo;t mean to pick on Jim—I had the quote handy because it&amp;rsquo;s in my file of favorite quotes, and few understand the semantic add-ons to Linked Data that will make for a proper Semantic Web better than he does—but I don&amp;rsquo;t see semantics &lt;em&gt;necessarily&lt;/em&gt; playing much role in the technology evolving to let web databases easily point at each other. There are some semantics built into the middle third of all RDF triples, because the requirement that a predicate use a full URL means that I can&amp;rsquo;t just say &amp;ldquo;title&amp;rdquo; there, leaving you to wonder whether I&amp;rsquo;m talking about a job title, the deed to a piece of property, or the title of a work; I have to say something like &lt;a href=&#34;http://purl.org/dc/elements/1.1/title&#34;&gt;http://purl.org/dc/elements/1.1/title&lt;/a&gt; to make it clear that I mean the title of a work. In other words, I must make the semantics of the triple&amp;rsquo;s predicate clear.&lt;/p&gt;
&lt;blockquote id=&#34;id203639&#34; class=&#34;pullquote&#34;&gt;There is plenty of payoff when applications can combine data from different sources to do things with no need for a central schema tying them together, and this is possible without any program logic addressing the semantics of that data.&lt;/blockquote&gt;
&lt;p&gt;Other than that, I don&amp;rsquo;t see what&amp;rsquo;s semantic about exposing data as triples and using SPARQL to get at it as described by &lt;a href=&#34;http://www.w3.org/DesignIssues/LinkedData.html&#34;&gt;Tim Berners-Lee&amp;rsquo;s original essay on Linked Data principles&lt;/a&gt;, except that the general ideas are an outgrowth of the older idea of the Semantic Web. We&amp;rsquo;re seeing now that as more data gets exposed and linked this way, more and more possibilities open up. Once enough data is linked using this technology, then there will be enough to work with to start making general-purpose semantic applications, but until then, the use of OWL and related technologies that really address semantics will be limited to niches. Companies such as &lt;a href=&#34;http://www.topquadrant.com/&#34;&gt;TopQuadrant&lt;/a&gt; and &lt;a href=&#34;http://clarkparsia.com/&#34;&gt;Clark &amp;amp; Parsia&lt;/a&gt; are doing very interesting work in those niches, and they&amp;rsquo;re blazing the trails for when the broader information technology and publishing worlds are ready to take advantage of the semantics of this linked data. (In a recent Semantic Web gang podcast, someone said that new technology traditionally moves from NASA to the military to corporations to independent end users, and that we&amp;rsquo;re seeing the reverse with Semantic Web adoption. I guess he didn&amp;rsquo;t know that NASA is a client of both TopQuadrant and Clark &amp;amp; Parsia.)&lt;/p&gt;
&lt;p&gt;While &lt;a href=&#34;http://www.zepheira.com/&#34;&gt;Zepheira&amp;rsquo;s&lt;/a&gt; web site certainly uses the word &amp;ldquo;semantic&amp;rdquo; a lot, they seem more focused on linked data technologies as they focus on helping their clients &amp;ldquo;integrate, navigate and manage information across personal, group and enterprise boundaries.&amp;rdquo; I think that this is a better place for most developers to focus on, at least for now, because there&amp;rsquo;s a better chance of a medium- and even short-term payoff. That&amp;rsquo;s the data infrastructure that actual semantic technologies can build on, so for now let&amp;rsquo;s focus on the value of the infrastructure: data exposed (either publicly or behind the firewall across internal enterprise boundaries, which I believe is where Zepheira&amp;rsquo;s been helping a lot of clients) in a standard way so that the growing number of tools built around those standards can take advantage of that data. This is just what the organizations in the &lt;a href=&#34;http://richard.cyganiak.de/2007/10/lod/&#34;&gt;Linking Open Data dataset cloud&lt;/a&gt; have been doing. There is plenty of payoff when applications can combine data from different sources to do things with no need for a central schema tying them together, and this is possible without any program logic addressing the semantics of that data.&lt;/p&gt;
&lt;p&gt;Of course the real semantic technologies such as OWL and inferencing engines build on that, so this will bring even cooler applications. Nevertheless, to evangelize the data infrastructure that this will build on and to allay the fears of enterprise IT people who remember pie-in-the-sky AI promises when they hear the word &amp;ldquo;semantic&amp;rdquo;, telling them about Semantic Web technology without the semantic parts (a.k.a. Linked Data) looks like an easier sell to me.&lt;/p&gt;
&lt;p&gt;Comments? Corrections? Is the full URL in predicates enough to say that any use of RDF triples qualifies as semantic technology? (If anyone tells me that I&amp;rsquo;m misunderstanding the term &amp;ldquo;semantics&amp;rdquo;, I&amp;rsquo;ll be tempted to say &amp;ldquo;well, that&amp;rsquo;s just &lt;a href=&#34;https://www.bobdc.com/img/main/lockhorns20051030.gif&#34;&gt;semantics&lt;/a&gt;&amp;rdquo;, so be forewarned.)&lt;/p&gt;
&lt;h2 id=&#34;9-comments&#34;&gt;9 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.siatec.net/&#34; title=&#34;http://www.siatec.net/&#34;&gt;Simone Onofri&lt;/a&gt; on &lt;a href=&#34;#comment-2110&#34;&gt;October 7, 2008 10:57 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Talking with more people about the (semantic) web, the word &amp;ldquo;semantic&amp;rdquo; is always confused with an AI meaning&amp;hellip; Linked Data works well for this&amp;hellip; and there is an interesting trick in your title: semantic web - semantic = web (the real one, not 2.0, 3.0 and so on)&lt;/p&gt;
&lt;p&gt;Cheers,&lt;/p&gt;
&lt;p&gt;Simone&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://clarkparsia.com/&#34; title=&#34;http://clarkparsia.com/&#34;&gt;Kendall Clark&lt;/a&gt; on &lt;a href=&#34;#comment-2111&#34;&gt;October 7, 2008 4:41 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Why do people say &amp;ldquo;semantic&amp;rdquo; when there isn&amp;rsquo;t much semantics in what they&amp;rsquo;re doing? Well, marketing mojo is a perfectly reasonable answer. I don&amp;rsquo;t mean to be cynical, since this seems an unobjectionable strategy.&lt;/p&gt;
&lt;p&gt;The same goes for people&amp;rsquo;s worries that &amp;ldquo;semantic&amp;rdquo; equals AI &amp;ndash; marketing is, by definition, a pragmatic endeavor, so if in some cases it makes sense to soft-sell that connection, one should soft-sell it. If in other cases &amp;ndash; which the anti-OWL crowd never seems to consider &amp;ndash; the connection between semantics and AI is a plus for a customer, then you can emphasize it.&lt;/p&gt;
&lt;p&gt;For a more technical answer, re: Jim&amp;rsquo;s thing about pointing at other people&amp;rsquo;s databases&amp;hellip; This is tricky. Even within organizations, or within parts of organizations, integrating directly with someone else&amp;rsquo;s database is tricky, often introducing a tight coupling that you don&amp;rsquo;t really want.&lt;/p&gt;
&lt;p&gt;Using some &amp;ldquo;semantics&amp;rdquo; in this context really means integrating data models (or service interfaces) rather than integrating data sources directly, such that consumers and producers are sufficiently decoupled to be able to ignore some (though not all) changes in the underlying data.&lt;/p&gt;
&lt;p&gt;The standard way to do this (and the way which is in line with historical trends in IT) is to have some declarative abstract representation of the data source, or database, and integrate with *that* thing, since it will tend to be more change resistant than the underlying thing it is an abstraction of. Hence the use of ontologies for integration, etc. In this usage pattern, a reasoner is an aid to (1) developing the ontology in the first place, and (2) a supplement to the code you write to integrate with the ontology instead of the thing it represents.&lt;/p&gt;
&lt;p&gt;(So you get the reasoner to check that the model is logically consistent, to do subclass and subproperty inference, or most specific type realization, or inference explanation in order to shorten the total amount of code you have to write, etc.)&lt;/p&gt;
&lt;p&gt;RDFS gives you some abstraction constructs over the underlying messy reality, but if you&amp;rsquo;re doing RDFS, you&amp;rsquo;re not exactly semantics-less. OWL gives you more, obviously. ISO Common Logic gives you even more &amp;ndash; at least, in principle &amp;ndash; at the cost of some tradeoffs, etc.&lt;/p&gt;
&lt;p&gt;But it&amp;rsquo;s this problem of direct coupling of data sources that makes me think that the Linked Data thing, at least as I presently understand it, is not a useful approach for the sorts of things we&amp;rsquo;re trying to do. Oh, and there&amp;rsquo;s my skepticism about the claims of network effect &amp;ndash; that once you get enough &amp;ldquo;linked data&amp;rdquo; some cool semantics effects emerge. I think there&amp;rsquo;s no reason whatever to believe that will happen. Or to put my skepticism in a weaker, falsifiable form: no one has explained, with sufficient detail, a plausible scenario whereby having a lot of &amp;ldquo;linked data&amp;rdquo; means you don&amp;rsquo;t need to build models or ontologies or etc.&lt;/p&gt;
&lt;p&gt;Oh, and PS: The rhetorical strategies around the notion of &amp;ldquo;niche&amp;rdquo; &amp;ndash; OWL is the &amp;ldquo;niche&amp;rdquo;, Linked Data is the &amp;ldquo;mainstream&amp;rdquo; &amp;ndash; rely on a shared set of empirical data (or shared set of empirical *hunches and intuitions*) about what&amp;rsquo;s getting used more often, when, where, etc. Apparently we don&amp;rsquo;t share the same data or intuition with you, Bob, such that OWL is the &amp;ldquo;niche&amp;rdquo; and Linked Data is the mainstream.&lt;/p&gt;
&lt;p&gt;It may seem that way in the semweb blogger echo chamber, but it doesn&amp;rsquo;t seem that way anywhere else, at least not to me. FYI. :&amp;gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2112&#34;&gt;October 7, 2008 9:54 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Kendall,&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s a good point about data integration. Saying that field W in database X is the same as field Y in database Z is not something to do lightly if you&amp;rsquo;re doing updates based on those values, so understanding and documenting the semantics of those fields makes such an association much more robust.&lt;/p&gt;
&lt;p&gt;I certainly don&amp;rsquo;t believe that once you get enough linked data some cool semantics effects will emerge spontaneously; my point is that as more and more interesting public data sets become available and point at each other, there will be more opportunities to create ontologies that add value to that data and write apps that take advantage of that added value, which is where I think the real semantic goodness lies, at least in terms of apps with the potential for wide deployment.&lt;/p&gt;
&lt;p&gt;Perhaps I should have described in more detail why I wrote this posting. I see organizations in industries like publishing (which I will address more specifically in the near future) asking what semantic technologies can do for them, but the concept of &amp;ldquo;semantic technologies&amp;rdquo; is such a vague blob to them that both for them and those helping them to address the question it becomes more difficult to line up potential actions with potential benefits. I think that breaking down the categories of semantic technologies into related units will make this easier, and my &amp;ldquo;(semantic web) - semantics = linked data&amp;rdquo; cut, while obviously broad and generalized, is an attempt at this.&lt;/p&gt;
&lt;p&gt;I certainly don&amp;rsquo;t feel that Linked Data is mainstream. The semantic web marketing that you mentioned has a much bigger head start. I used &amp;ldquo;niches&amp;rdquo; to describe those who can really take advantage of semantics at this point; I&amp;rsquo;m sure it will move beyond niches over time. (I look forward to it!) I see the potential value of exposing data in SPARQL endpoints, without necessarily defining an ontology for the data sets, as having more potential for publishers and many others for now. In other words, I&amp;rsquo;m not talking about what&amp;rsquo;s getting used more often, but what I feel has more short-term potential. Just my opinion.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-2113&#34;&gt;October 8, 2008 6:44 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;(I won&amp;rsquo;t repeat what Kendall has already said re data integration strategies)&lt;/p&gt;
&lt;p&gt;While I&amp;rsquo;m not a huge fan of the *word* &amp;ldquo;semantics&amp;rdquo; (many find it confusing or obscure), there are plenty of semantics intimately involved in all RDF-based linked data activities. At the heart of the SW effort is a project to make mechanically clearer what Web documents are telling us. A big part of this is to do with reference - knowing what real world entities are being described. Colloquially, &amp;ldquo;what they are about&amp;rdquo;.&lt;br /&gt;
Linked data efforts care about that at least as much as the rest of the SW world: URIs for things, well known URIs for things, URIs for things that can be readily used to find good and machine-readable descriptions of those things,&amp;hellip;. And at least to the extent they use FOAF constructs and habits, there&amp;rsquo;s some modest but significant use of OWL too: the use of the &amp;lsquo;inverse functional property&amp;rsquo; construct (eg. isPrimaryTopicOf) to help point out identifying properties in a description, even if the property itself is not one known to an aggregator.&lt;/p&gt;
&lt;p&gt;In general I&amp;rsquo;m pretty wary of encouraging SW enthusiasts to fracture into competing sub-tribes. There is too much &amp;ldquo;we don&amp;rsquo;t need that fancy academic OWL&amp;rdquo; rhetoric floating around, which is to my mind as senseless as having Java users berate Javadoc and IDEs.&lt;/p&gt;
&lt;p&gt;Even a tiny little vocabulary the size of FOAF is complex enough that internal contradictions and other mistakes are a real risk (&amp;lsquo;can documents be agents? are onlineaccounts documents? or agents? can they be both? can two different documents have the same foaf:sha1 value? why not&amp;rsquo;, etc.). OWL is a tool that can help us achieve clarity, and detect inclarities, in this area, regardless of whether there are &amp;ldquo;intelligent agents&amp;rdquo; running around at click-time drawing inferences and doing while-u-wait inferences.&lt;/p&gt;
&lt;p&gt;And don&amp;rsquo;t get me started on &amp;lsquo;owl:sameAs&amp;rsquo;, &amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://clarkparsia.com/&#34; title=&#34;http://clarkparsia.com/&#34;&gt;Kendall Clark&lt;/a&gt; on &lt;a href=&#34;#comment-2114&#34;&gt;October 8, 2008 2:35 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;In general I&amp;rsquo;m pretty wary of encouraging SW enthusiasts to fracture into competing sub-tribes. There is too much &amp;ldquo;we don&amp;rsquo;t need that fancy academic OWL&amp;rdquo; rhetoric floating around, which is to my mind as senseless as having Java users berate Javadoc and IDEs.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Dan, I agree:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://clarkparsia.com/weblog/2006/11/13/cooperation-competition-and-growing-markets-or-why-expressivity-wars-are-stupid/&#34;&gt;http://clarkparsia.com/weblog/2006/11/13/cooperation-competition-and-growing-markets-or-why-expressivity-wars-are-stupid/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.topquadrant.com/&#34; title=&#34;http://www.topquadrant.com/&#34;&gt;Dean Allemang&lt;/a&gt; on &lt;a href=&#34;#comment-2115&#34;&gt;October 8, 2008 8:04 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I used to apologize for the word &amp;ldquo;Semantic&amp;rdquo; in &amp;ldquo;Semantic Web&amp;rdquo;, until a student in one of my classes who happened to be a professional linguist told me to stop apologizing. Why? Because, he told me, there are many meanings of the word &amp;ldquo;Semantics&amp;rdquo; in Linguistics, including speech acts, formal semantics, etc. But, he pointed out, all of them refer to one very simple notion of semantics - that a symbol can refer to something in the world. He went so far as to say that this was the fundamental notion of &amp;ldquo;Semantics&amp;rdquo; in linguistics. Other linguists might challenge that statement for linguistics in general, but it holds up in the Semantic Web. The basic idea of linked data is that you can refer to something in the world with a symbol (where a symbol is a URI).&lt;/p&gt;
&lt;p&gt;This is the basis for the non-niche work that makes up the bulk of TopQuadrant&amp;rsquo;s custom; in fact, as far as we are concerned, the jury is still out on the usefulness of &amp;ldquo;OWL and related technologies&amp;rdquo; in real enterprise applications. Our customers are getting on pretty well with the more basic notion of &amp;ldquo;Semantics&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a reason why Jim and I called our book &amp;ldquo;Working Ontologist&amp;rdquo; - we only refer to OWL inasmuch as it can be used to specify how datasources relate to one another.&lt;/p&gt;
&lt;p&gt;By Irene Polikoff on &lt;a href=&#34;#comment-2116&#34;&gt;October 8, 2008 8:40 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;TopQuadrant is certainly doing a lot of complex ontology-based work for NASA. Having said this, our business is about helping organizations harness (read - integrate, share, analyze) information distributed across systems and parties. Much of the work at NASA is about data integration.&lt;/p&gt;
&lt;p&gt;The majority of our customers use pretty light ontologies/schemas. There is no way of getting away from some kind of a schema or structure – XML has it, spreadsheets have it, databases have it, etc. And this is what our customers are bringing together. TopBraid Suite generates an RDFS/OWL representation of the schemas used to interpret the data so that the data and its structure can be exposed in RDF for SPARQL queries - either by converting the data or by translating SPARQL queries into SQL. We see ourselves as a SPARQL company. Take a look, for example, at: &lt;a href=&#34;http://topquadrant.com/sparqlmotion/&#34;&gt;http://topquadrant.com/sparqlmotion/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Most of the data sources our customers want to integrate tend to be internal, but some are external – from the technology perspective we do not really see any difference. One business difference is whether a customer does it to expose the data outside their organization on the World Wide Web. A few are considering doing this, but the majority want a more flexible way of integrating data and creating and exposing data services to other parties within their own and partnering organizations. Many also want to take advantage of the flexible schemas and databases they get from using RDFS/OWL as opposed to the more rigid world of relational databases. We see the latter benefit as being of considerable interest to companies involved in managing and publishing content and wanting to have flexible taxonomies and metadata.&lt;/p&gt;
&lt;p&gt;Since the areas we see most developers focusing on are in line with what you have described in the post (managing, navigating and integrating information), I guess we agree on where the value is.&lt;/p&gt;
&lt;p&gt;What I am not so sure about is the contrast you are drawing between this and the word “semantic”. If “semantic” is interpreted as a focus on complex description logic ontologies, then we see some of it here and there, but not much. We do see people wanting to express their business rules as part of the data integration and application development. For example, in the vocabulary management application there could be a rule that indicates that a “level” number of a topic needs to be changed in a certain way if it is moved within a hierarchy. TopBraid Suite makes it easy to automate this.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://uche.posterous.com/&#34; title=&#34;http://uche.posterous.com/&#34;&gt;Uche Ogbuji&lt;/a&gt; on &lt;a href=&#34;#comment-2119&#34;&gt;October 9, 2008 11:19 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m speaking with my Uche hat on, because I think in many ways, Zepheira is an integration of differing perspectives (I&amp;rsquo;m, for example, more of an XML type than most). As for the word &amp;ldquo;semantics&amp;rdquo;, it&amp;rsquo;s interested me even before I got heavily into RDF (just before, to be fair). The spark was Robin Cover&amp;rsquo;s 1998 article, &lt;a href=&#34;http://xml.coverpages.org/xmlAndSemantics.html&#34;&gt;&amp;ldquo;XML and Semantic Transparency&amp;rdquo;&lt;/a&gt;. I still use the term &amp;ldquo;semantic transparency&amp;rdquo; a lot to describe the gap left by the base layer of XML technology.&lt;/p&gt;
&lt;p&gt;In the side bar to my 2000 article &lt;a href=&#34;http://www.ibm.com/developerworks/library/w-rdf/&#34;&gt;&amp;ldquo;An introduction to RDF&amp;rdquo;&lt;/a&gt; I was already speaking of the curse of the &amp;ldquo;S&amp;rdquo; word. In the end, as Kendall pointed out, it all comes down to marketing, and that&amp;rsquo;s fine. Marketing is all about communication, and if &amp;ldquo;semantic&amp;rdquo; allows those visiting our site to understand what we want them to understand, we&amp;rsquo;ll use the term.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t think there is dichotomy between linked data and &amp;ldquo;semantics&amp;rdquo;. As I&amp;rsquo;ve argued a lot in my &amp;ldquo;Thinking XML&amp;rdquo;, I think that simple links, e.g. within schema definitions, to some source of semantic agreement, whether expressed in RDF or otherwise, is sufficient for most needs, and sufficient to make a huge change in the value of bodies of data. See, for example:&lt;/p&gt;
&lt;p&gt;* &lt;a href=&#34;http://www.ibm.com/developerworks/xml/library/x-tipdict.html&#34;&gt;http://www.ibm.com/developerworks/xml/library/x-tipdict.html&lt;/a&gt;&lt;br /&gt;
* &lt;a href=&#34;http://www.ibm.com/developerworks/xml/library/x-think32.html&#34;&gt;http://www.ibm.com/developerworks/xml/library/x-think32.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think this is a very close correspondence to linked data. I know that e.g. Tim BL doesn&amp;rsquo;t like my advocacy of Linked Data without insisting on every other spec built on top of RDF, but I don&amp;rsquo;t see my lack of enthusiasm for SPARQL and OWL as schism-making. If the RDF community thinks itself healthy and viable, it&amp;rsquo;s going to have to accommodate deep differences of technical opinion. We can&amp;rsquo;t all just be a happy-happy-joy-joy coxless eight.&lt;/p&gt;
&lt;p&gt;As for Zepheira, we cleave to the practical. All our architects can agree on the outlines of Linked Data, and I offer pretty sharp tools for using this to bring semantic transparency to XML, and re-animating XML and other large, dead bodies of data is generally a large concern for our clients, so both shoes fit rather nicely, and we&amp;rsquo;re happy to wear them.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.furia.com&#34; title=&#34;http://www.furia.com&#34;&gt;glenn mcdonald&lt;/a&gt; on &lt;a href=&#34;#comment-2224&#34;&gt;February 17, 2009 10:17 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I continue to think that even Linked Data, by which we mean my pieces of data can link to your pieces of data across the web, is secondary. The big change is in database structure: from relational tables to graphs. It&amp;rsquo;s in the way that my pieces of data link to &lt;em&gt;my other&lt;/em&gt; pieces of data. This is the core thing RDF does (but not well enough), and what SPARQL exists to take advantage of (but not well enough). And yes, it&amp;rsquo;s also the foundation for linking separate datasets across the web, but as long as we keep talking about it as primarily (exclusively?) an integration strategy, it will seem peripheral to most of the people who currently have the data&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Querying wiki/dbpedia for presidents&#39; ages at inauguration</title>
      <link>https://www.bobdc.com/blog/querying-wikidbpedia-for-presi/</link>
      <pubDate>Tue, 30 Sep 2008 09:10:25 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/querying-wikidbpedia-for-presi/</guid>
      
      
<description><div>Easier than Jon Udell had thought.</div><div>&lt;p&gt;In an &lt;a href=&#34;http://itc.conversationsnetwork.org/shows/detail3793.html#&#34;&gt;August 19th&lt;/a&gt; interview with Jon Udell, David Huynh of &lt;a href=&#34;http://www.freebase.com/&#34;&gt;Freebase&lt;/a&gt; (and formerly of MIT&amp;rsquo;s &lt;a href=&#34;http://simile.mit.edu/&#34;&gt;Project Simile&lt;/a&gt;) introduced his Freebase demo by describing a hypothetical query to a database asking for presidents&amp;rsquo; ages when they were inaugurated and whether there&amp;rsquo;s a trend toward younger presidents. Jon replies:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If it were possible to issue a database query over Wikipedia, then you could ask a question like that; you could say give me—well first of all, it would presume that you could identify US presidents, and it would further presume that you could find a field within those documents that would say the ages of those people, and that&amp;rsquo;s not really part of the structure of Wikipedia. This information can be explicitly made available in Freebase. It hasn&amp;rsquo;t in all cases, and that&amp;rsquo;s part of the social process. So it ultimately relies on people to refine this raw information that came from Wikipedia and elsewhere so that it is more fielded and structured.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote id=&#34;id203634&#34; class=&#34;pullquote&#34;&gt;DBPedia + SPARQL is my new favorite toy.&lt;/blockquote&gt;
&lt;p&gt;They then go on to do such a query with Freebase&amp;hellip; but they could have done it with Wikipedia, with a little help from SPARQL and &lt;a href=&#34;http://en.wikipedia.org/wiki/Dbpedia&#34;&gt;DBpedia&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Wikipedia has plenty of fielded information in &lt;a href=&#34;http://en.wikipedia.org/wiki/Help:Infobox&#34;&gt;infoboxes&lt;/a&gt;. DBpedia lets you access this collection of data &lt;a href=&#34;http://wiki.dbpedia.org/OnlineAccess?v=11r9&#34;&gt;via a SPARQL endpoint&lt;/a&gt;. While Wikipedia (and hence DBpedia) has no field for a president&amp;rsquo;s age at inauguration, it does store their birthdate and the year they began their first term, so calculating their ages when they each became president is pretty easy.&lt;/p&gt;
&lt;p&gt;You could see a list of US Presidents by going to Wikipedia&amp;rsquo;s &lt;a href=&#34;http://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States&#34;&gt;List of Presidents of the United States&lt;/a&gt; page, but let&amp;rsquo;s do it programmatically with this SPARQL query so that we can build from there to get their ages at inauguration. We ask for the things in the database that have a subject of &amp;ldquo;Presidents of the United States&amp;rdquo;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
SELECT ?presName WHERE {
  ?presName skos:subject &amp;lt;http://dbpedia.org/resource/Category:Presidents_of_the_United_States&amp;gt;.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To see the query in action, click &lt;a href=&#34;http://dbpedia.org/snorql/?query=SELECT+%3FpresName+WHERE+%7B%0D%0A++%3FpresName+skos%3Asubject+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3APresidents_of_the_United_States%3E.%0D%0A%7D%0D%0A%0D%0A&#34;&gt;this executable URL version&lt;/a&gt;.&lt;/p&gt;
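&lt;p&gt;That executable URL is nothing magic: it&amp;rsquo;s just the query percent-encoded into snorql&amp;rsquo;s query parameter. Here&amp;rsquo;s a quick Python sketch of building such a URL yourself (the endpoint address comes from the link above; the encoding details differ cosmetically from the link, e.g. %20 versus + for spaces):&lt;/p&gt;

```python
from urllib.parse import quote

SNORQL = "http://dbpedia.org/snorql/?query="

query = """SELECT ?presName WHERE {
  ?presName skos:subject <http://dbpedia.org/resource/Category:Presidents_of_the_United_States>.
}"""

def executable_url(sparql):
    # Percent-encode the whole query so it survives as one URL parameter.
    return SNORQL + quote(sparql, safe="")

url = executable_url(query)
print(url)
```

&lt;p&gt;Paste the printed URL into a browser and snorql runs the query.&lt;/p&gt;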
&lt;p&gt;A slightly more complex query lists the name, birth date, and beginning of the first term of each one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;
PREFIX dbpedia2: &amp;lt;http://dbpedia.org/property/&amp;gt;
SELECT ?presName, ?birthday, ?startDate WHERE {
  ?presName skos:subject &amp;lt;http://dbpedia.org/resource/Category:Presidents_of_the_United_States&amp;gt;;
            dbpedia2:birth ?birthday;
            dbpedia2:presidentStart ?startDate.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Click &lt;a href=&#34;http://dbpedia.org/snorql/?query=SELECT+%3FpresName%2C%3Fbirthday%2C+%3FstartDate+WHERE+%7B%0D%0A++%3FpresName+skos%3Asubject+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3APresidents_of_the_United_States%3E%3B%0D%0A++++++++++++dbpedia2%3Abirth+%3Fbirthday%3B%0D%0A++++++++++++dbpedia2%3ApresidentStart+%3FstartDate.%0D%0A%7D%0D%0A%0D%0A&#34;&gt;here&lt;/a&gt; to see it in action. Because the fielded information for the various presidents is not consistent, only 19 of them have dbpedia2:birth and dbpedia2:presidentStart fields, so you&amp;rsquo;ll only see those presidents returned for this query. Wikipedia pages for all US presidents do have this information, but it&amp;rsquo;s not always named the same way—compare the dbpedia pages for &lt;a href=&#34;http://dbpedia.org/page/Zachary_Taylor&#34;&gt;Zachary Taylor&lt;/a&gt; and &lt;a href=&#34;http://dbpedia.org/page/Lyndon_B._Johnson&#34;&gt;Lyndon Johnson&lt;/a&gt;, who doesn&amp;rsquo;t show up on the list returned by that last query, for some examples. As Jon said, filling out that data is part of the social process.&lt;/p&gt;
&lt;p&gt;The real promise of Linked Data is the ability to write a program or script that grabs the data and does something with it, so I wrote a two-line batch file that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;uses &lt;a href=&#34;http://curl.haxx.se/&#34;&gt;curl&lt;/a&gt; to send that URL to DBpedia and store the results in an XML file&lt;/li&gt;
&lt;li&gt;runs a short XSLT script to calculate the presidents&amp;rsquo; ages at inauguration&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It doesn&amp;rsquo;t look much like a two-line batch file here, so before running it, replace the first six carriage returns with spaces to turn the first seven lines into one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -o presidentAges.xml -F &amp;quot;query=PREFIX dbpedia2: 
  &amp;lt;http://dbpedia.org/property/&amp;gt; PREFIX skos: 
  &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; SELECT ?presName,?birthday, 
  ?startDate WHERE { ?presName skos:subject 
  &amp;lt;http://dbpedia.org/resource/Category:Presidents_of_the_United_States&amp;gt;; 
  dbpedia2:birth ?birthday; dbpedia2:presidentStart ?startDate.}&amp;quot; 
  http://dbpedia.org/sparql 
xsltproc presidentAges.xsl presidentAges.xml
&lt;/code&gt;&lt;/pre&gt;
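&lt;p&gt;If you&amp;rsquo;d rather skip the XSLT step, the same post-processing can be sketched with Python&amp;rsquo;s standard library, because the endpoint returns the standard SPARQL Query Results XML Format. The sample document below is a hand-trimmed illustration of that format, not actual endpoint output:&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

NS = {"s": "http://www.w3.org/2005/sparql-results#"}

# A trimmed illustration of the SPARQL Query Results XML Format.
sample = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <results>
    <result>
      <binding name="presName">
        <uri>http://dbpedia.org/resource/Bill_Clinton</uri>
      </binding>
      <binding name="birthday"><literal>1946-08-19</literal></binding>
      <binding name="startDate"><literal>1993</literal></binding>
    </result>
  </results>
</sparql>"""

def ages(results_xml):
    """Mirror the stylesheet's logic: strip the resource prefix from the
    name and subtract the birth year (plus one, assuming an unreached
    birthday) from the start year."""
    rows = []
    root = ET.fromstring(results_xml)
    for result in root.iter("{http://www.w3.org/2005/sparql-results#}result"):
        b = {el.get("name"): el[0].text
             for el in result.findall("s:binding", NS)}
        name = b["presName"].rsplit("/", 1)[-1].replace("_", " ")
        age = int(b["startDate"]) - int(b["birthday"][:4]) - 1
        rows.append((name, age))
    return rows

print(ages(sample))  # [('Bill Clinton', 46)]
```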
&lt;p&gt;Here is the XSLT stylesheet, presidentAges.xsl:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl=&amp;quot;http://www.w3.org/1999/XSL/Transform&amp;quot;
                xmlns:s=&amp;quot;http://www.w3.org/2005/sparql-results#&amp;quot;
                version=&amp;quot;1.0&amp;quot;&amp;gt;


  &amp;lt;xsl:strip-space elements=&amp;quot;*&amp;quot;/&amp;gt;
  &amp;lt;xsl:output method=&amp;quot;text&amp;quot;/&amp;gt;




  &amp;lt;xsl:template match=&amp;quot;s:result&amp;quot;&amp;gt;


    &amp;lt;xsl:variable name=&amp;quot;birthYear&amp;quot;
                  select=&amp;quot;substring(
                          s:binding[@name=&#39;birthday&#39;]/s:literal,1,4)&amp;quot;/&amp;gt;
    &amp;lt;xsl:variable name=&amp;quot;presidentName&amp;quot;
                  select=&amp;quot;substring(s:binding[@name=&#39;presName&#39;]/s:uri,29)&amp;quot;/&amp;gt;


    &amp;lt;xsl:value-of select=&amp;quot;translate($presidentName,&#39;_&#39;,&#39; &#39;)&amp;quot;/&amp;gt;
    &amp;lt;xsl:text&amp;gt; &amp;lt;/xsl:text&amp;gt;
    &amp;lt;xsl:value-of select=&amp;quot;s:binding[@name=&#39;startDate&#39;]/
                          s:literal - $birthYear - 1&amp;quot;/&amp;gt;
&amp;lt;xsl:text&amp;gt;
&amp;lt;/xsl:text&amp;gt;
  &amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I subtracted the birth year and then another 1 from the startDate because with inaugurations being in January (at least in modern times) I assumed that each president hadn&amp;rsquo;t reached his birthday yet. Here is the result:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Abraham Lincoln 51
Andrew Johnson 56
Bill Clinton 46
Chester A. Arthur 50
Franklin Pierce 48
George H. W. Bush 64
George Washington 56
Harry S. Truman 60
James K. Polk 49
James Monroe 58
John Adams 61
John Quincy Adams 57
Martin Van Buren 54
Millard Fillmore 49
Richard Nixon 55
Rutherford B. Hayes 54
Thomas Jefferson 57
Ulysses S. Grant 46
Zachary Taylor 64
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I never realized that Grant was the same age as Clinton when he started—a year younger than Obama is now—but having led the army that won the US Civil War, I guess he had reasons to look a bit older at the start of his term.&lt;/p&gt;
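&lt;p&gt;With full dates instead of bare years, no birthday guesswork is needed. A small Python check, using Clinton&amp;rsquo;s dates (typed in here, not pulled from the query):&lt;/p&gt;

```python
from datetime import date

def age_on(birth, day):
    # Year difference, minus one if the birthday hasn't arrived yet that year.
    return day.year - birth.year - ((day.month, day.day) < (birth.month, birth.day))

# Bill Clinton: born August 19, 1946; first inaugurated January 20, 1993.
print(age_on(date(1946, 8, 19), date(1993, 1, 20)))  # 46
```

&lt;p&gt;For a January 20 inauguration, the subtract-one shortcut only goes wrong for a president born on or before January 20.&lt;/p&gt;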
&lt;p&gt;DBPedia + SPARQL is my new favorite toy, and I&amp;rsquo;m getting more and more ideas lately about useful (or at least fun) things to do with the combination.&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-2098&#34;&gt;September 30, 2008 12:01 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Nice demo :-)&lt;/p&gt;
&lt;p&gt;Your juxtaposition of the Udell comments re. parallax provides much needed additional insight re. utility of DBpedia and SPARQL.&lt;/p&gt;
&lt;p&gt;Kingsley&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://svg.startpagina.nl&#34; title=&#34;http://svg.startpagina.nl&#34;&gt;stelt&lt;/a&gt; on &lt;a href=&#34;#comment-2099&#34;&gt;September 30, 2008 2:54 PM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;and whether there&amp;rsquo;s a trend that we&amp;rsquo;re getting younger presidents.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I think for that part we still need a time against age graph with a poly-fit, preferably in SVG.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2101&#34;&gt;October 1, 2008 10:48 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Kingsley has sent me a version of this query that uses an OpenLink extension to do the age calculation as part of the SPARQL query, without the need for the XSLT part. Click &lt;a href=&#34;http://tinyurl.com/48l6c6&#34;&gt;http://tinyurl.com/48l6c6&lt;/a&gt; to see the query and execute it. Very cool.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://blog.georgikobilarov.com&#34; title=&#34;http://blog.georgikobilarov.com&#34;&gt;Georgi Kobilarov&lt;/a&gt; on &lt;a href=&#34;#comment-2104&#34;&gt;October 5, 2008 7:22 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;fantastic stuff! The issue you were having with different rdf properties of the same relation will be solved shortly. I&amp;rsquo;ll release a new version of the infobox dataset (based on a new extraction approach) in the next days.&lt;/p&gt;
&lt;p&gt;Cheers, Georgi&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-2105&#34;&gt;October 5, 2008 10:36 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Due to underlying engine change and DBpedia instance update, here is the revised query:&lt;/p&gt;
&lt;p&gt;SELECT ?presName, ?birthday, ?startDate, (bif:datediff(&amp;quot;year&amp;quot;, ?birthday, xsd:date(bif:sprintf(&amp;quot;%d-01-20&amp;quot;, ?startDate)))) as ?age_at_innaguration&lt;br /&gt;
WHERE {?presName skos:subject &amp;lt;http://dbpedia.org/resource/Category:Presidents_of_the_United_States&amp;gt;;&lt;br /&gt;
dbpedia2:birth ?birthday;&lt;br /&gt;
dbpedia2:presidentStart ?startDate.&lt;br /&gt;
filter (datatype(?startDate) = xsd:integer)&lt;br /&gt;
}&lt;/p&gt;
&lt;p&gt;Live Link: &lt;a href=&#34;http://tinyurl.com/4edjzl&#34;&gt;http://tinyurl.com/4edjzl&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/dbpedia">DBpedia</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
    <item>
      <title>tweet tweet</title>
      <link>https://www.bobdc.com/blog/tweet-tweet/</link>
      <pubDate>Fri, 26 Sep 2008 09:02:41 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/tweet-tweet/</guid>
      
      
      <description><div>Joining the twittering classes.</div><div>&lt;p&gt;After joining twitter last April, I entered the following as my second entry a month later:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I don&amp;rsquo;t twitter. I barely have time to follow my friends&amp;rsquo; blogs. I just signed up to grab the name bobdc in case I ever do want to use it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&#34;http://twitter.com/bobdc&#34;&gt;&lt;img id=&#34;id203592&#34; src=&#34;http://assets3.twitter.com/images/twitter_logo_s.png?1222127610&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;twitter logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Daniela Barbosa&amp;rsquo;s posting on &lt;a href=&#34;http://danielabarbosa.blogspot.com/2008/08/newspapers-and-their-use-of-twitter.html&#34;&gt;Newspapers and Their Use of Twitter&lt;/a&gt; prompted me to get to know it better. If publishers ranging from &lt;a href=&#34;http://twitter.com/bbcbreaking&#34;&gt;the BBC&lt;/a&gt; to &lt;a href=&#34;http://twitter.com/TheOnion&#34;&gt;The Onion&lt;/a&gt; use it, and it&amp;rsquo;s carving out a new role in how people communicate, and it&amp;rsquo;s free, it&amp;rsquo;s kind of silly not to investigate it further.&lt;/p&gt;
&lt;p&gt;How do you decide who to follow on twitter? For a start, their server can search an online address book and then tell you which of those people have twitter accounts. I have no online address book, but I do have a gmail account, so I wrote a little XSLT stylesheet to convert my address book to a comma separated value file, uploaded that to gmail, and then pointed twitter to it and picked out some names.&lt;/p&gt;
&lt;p&gt;As with other social networks, part of the fun is checking who your friends&amp;rsquo; friends are, and after finding a few friends to follow and seeing who they follow, I started finding more friends and publishers to add. This leads to a certain hanging-out-with-friends aspect to using twitter, and you know that making a little joke as a follow-up to a friend&amp;rsquo;s tweet only makes sense if you have a few mutual followers to see the exchange.&lt;/p&gt;
&lt;p&gt;A few random things that I like about twitter: I&amp;rsquo;m a sucker for anything with a &lt;a href=&#34;http://help.twitter.com/index.php?pg=kb.page&amp;amp;id=75&#34;&gt;command line interface&lt;/a&gt;, even if &lt;code&gt;whois&lt;/code&gt; is the only one of these commands that I use. It&amp;rsquo;s good to know it&amp;rsquo;s there. &lt;a href=&#34;http://twitter.com/hashtags&#34;&gt;hashtags&lt;/a&gt; are a great example of twitter carving out a new role in information distribution; for one example, to quote a Tim Bray tweet about the recent hurricane, &amp;ldquo;Following Ike on twitter (via #ike at &lt;a href=&#34;http://search.twitter.com/&#34;&gt;search.twitter.com&lt;/a&gt; and @ike) is pretty compelling&amp;rdquo;. It&amp;rsquo;s nice to know that even when people are limited to 140 characters of data at a time, they still find a place for semi-structured metadata.&lt;/p&gt;
&lt;p&gt;While the twitter prompt for a tweet is &amp;ldquo;What are you doing?&amp;rdquo;, I&amp;rsquo;m less inclined to tell the two dozen people following me &amp;ldquo;I&amp;rsquo;m going to empty the dishwasher&amp;rdquo; than I am to point out something funny I just saw on the web. One wants to be a little entertaining. The Onion is the most entertaining of all, and after hardly reading it for the last few years, twitter has me reading it again. In addition to &amp;ldquo;here&amp;rsquo;s something funny on the web&amp;rdquo;, another popular topic seems to be &amp;ldquo;here&amp;rsquo;s something that annoys me&amp;rdquo;. If &lt;a href=&#34;http://www.lewisblack.com/&#34;&gt;Lewis Black&lt;/a&gt; is the &lt;a href=&#34;http://www.cbsnews.com/sections/60minutes/rooney/main3419.shtml&#34;&gt;Andy Rooney&lt;/a&gt; of the boomer generation, twitter gives the rest of us a modern platform to complain about the annoyances of modern life.&lt;/p&gt;
&lt;p&gt;To really get to know twitter better, I&amp;rsquo;ve got another account that is only followed by my &lt;a href=&#34;http://twitter.com/bobdc&#34;&gt;bobdc account&lt;/a&gt; to experiment with some of the features, like the use of a mobile phone to send and receive tweets. It just occurred to me that anyone can find out the name of that account by doing some mouseovers on the &amp;ldquo;following&amp;rdquo; section of my twitter page, so if you&amp;rsquo;re really interested in tweets like &amp;ldquo;Test from my phone&amp;rdquo;, you know where to look.&lt;/p&gt;
&lt;p&gt;The twitter &lt;a href=&#34;http://apiwiki.twitter.com/REST+API+Documentation&#34;&gt;API&lt;/a&gt; is simple enough to start playing with right away. The following command line (substitute your own username and password and write it all as one line) retrieves an XML version of your followed friends&amp;rsquo; recent tweets:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;wget --http-user=USERNAME --http-passwd=PASSWORD 
  http://twitter.com/statuses/friends_timeline.xml
&lt;/code&gt;&lt;/pre&gt;
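&lt;p&gt;The same request is easy to put together with Python&amp;rsquo;s standard library. This sketch only constructs the basic-auth request that wget sends; it doesn&amp;rsquo;t actually contact twitter (USERNAME and PASSWORD are placeholders, as above):&lt;/p&gt;

```python
import base64
import urllib.request

def timeline_request(username, password):
    # Build (but don't send) the same basic-auth GET that wget issues.
    req = urllib.request.Request(
        "http://twitter.com/statuses/friends_timeline.xml")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req

req = timeline_request("USERNAME", "PASSWORD")
print(req.full_url)
print(req.get_header("Authorization")[:6])  # "Basic "
```

&lt;p&gt;Sending it would just be &lt;code&gt;urllib.request.urlopen(req)&lt;/code&gt;.&lt;/p&gt;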
&lt;p&gt;I wrote a simple XSLT 1.0 stylesheet called &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/newtweets.xsl&#34;&gt;newtweets.xsl&lt;/a&gt; (see also the &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/newtweets.bat.txt&#34;&gt;newtweets.bat.txt&lt;/a&gt; batch file, renamed here for easier downloading) to turn the downloaded XML into more readable text, so now I have a simple command-line twitter client that took me one-third the time to write that this weblog entry did.&lt;/p&gt;
&lt;p&gt;Another neat trick is embedding your last few tweets into a web page. A &lt;a href=&#34;http://twitter.com/badges/html&#34;&gt;twitter help page&lt;/a&gt; shows you the markup necessary to do this dynamically. The wrapper &lt;code&gt;div&lt;/code&gt; element has an @id value of &amp;ldquo;twitter_div&amp;rdquo; and the list of tweets an @id value of &amp;ldquo;twitter_update_list&amp;rdquo;, so I used those hooks to create a bit of CSS that converts the tweets from a bulleted list to paragraphs separated by a little extra space:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#twitter_div #twitter_update_list, #twitter_div #twitter_update_list li {
  margin-left: 0; padding-left: 0;
}


#twitter_div #twitter_update_list li {
  margin-bottom: 6pt;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You&amp;rsquo;ll see it on the right hand side of my &lt;a href=&#34;http://www.snee.com/bobdc.blog/&#34;&gt;weblog&amp;rsquo;s main page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The experiments continue&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Querying aggregated XBRL reports with SPARQL</title>
      <link>https://www.bobdc.com/blog/querying-aggregated-xbrl-repor/</link>
      <pubDate>Tue, 23 Sep 2008 09:15:10 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/querying-aggregated-xbrl-repor/</guid>
      
      
<description><div>Easier than I thought it would be.</div><div>&lt;p&gt;My main goal for doing a SPARQL query against XBRL data was to be able to pull out the same bit of information from multiple companies&amp;rsquo; reports at once, and it turned out to be much less work than I thought it would be. Here is the result of my query for interest expense figures across several companies:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;------
| companyName                    | periodStart  | periodEnd    | interestExp |
==============================================================================
| &amp;quot;GENERAL MILLS INC&amp;quot;            | &amp;quot;2005-05-30&amp;quot; | &amp;quot;2006-05-28&amp;quot; | &amp;quot;399600000&amp;quot; |
| &amp;quot;GENERAL MILLS INC&amp;quot;            | &amp;quot;2006-05-29&amp;quot; | &amp;quot;2007-05-27&amp;quot; | &amp;quot;426500000&amp;quot; |
| &amp;quot;GENERAL MILLS INC&amp;quot;            | &amp;quot;2007-05-28&amp;quot; | &amp;quot;2008-05-25&amp;quot; | &amp;quot;421700000&amp;quot; |
| &amp;quot;PAPA JOHNS INTERNATIONAL INC&amp;quot; | &amp;quot;2007-01-01&amp;quot; | &amp;quot;2007-07-01&amp;quot; | &amp;quot;3232000&amp;quot;   |
| &amp;quot;PAPA JOHNS INTERNATIONAL INC&amp;quot; | &amp;quot;2007-04-02&amp;quot; | &amp;quot;2007-07-01&amp;quot; | &amp;quot;1706000&amp;quot;   |
| &amp;quot;PAPA JOHNS INTERNATIONAL INC&amp;quot; | &amp;quot;2007-12-31&amp;quot; | &amp;quot;2008-06-29&amp;quot; | &amp;quot;3694000&amp;quot;   |
| &amp;quot;PAPA JOHNS INTERNATIONAL INC&amp;quot; | &amp;quot;2008-03-31&amp;quot; | &amp;quot;2008-06-29&amp;quot; | &amp;quot;1802000&amp;quot;   |
| &amp;quot;PEPSICO INC&amp;quot;                  | &amp;quot;2006-12-31&amp;quot; | &amp;quot;2007-06-16&amp;quot; | &amp;quot;96000000&amp;quot;  |
| &amp;quot;PEPSICO INC&amp;quot;                  | &amp;quot;2007-03-25&amp;quot; | &amp;quot;2007-06-16&amp;quot; | &amp;quot;54000000&amp;quot;  |
| &amp;quot;PEPSICO INC&amp;quot;                  | &amp;quot;2007-12-30&amp;quot; | &amp;quot;2008-06-14&amp;quot; | &amp;quot;132000000&amp;quot; |
| &amp;quot;PEPSICO INC&amp;quot;                  | &amp;quot;2008-03-23&amp;quot; | &amp;quot;2008-06-14&amp;quot; | &amp;quot;74000000&amp;quot;  |
------
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote id=&#34;id203630&#34; class=&#34;pullquote&#34;&gt;Being able to compare specific financial figures from different companies will be great for people doing financial research.&lt;/blockquote&gt;
&lt;p&gt;A given company&amp;rsquo;s XBRL SEC filing is typically an instance file full of facts plus additional files with taxonomies about the terms used and XLink linkbases about the relationships between the facts. The instance files, on their own, looked like the low hanging fruit to me.&lt;/p&gt;
&lt;p&gt;After kicking around some of my ideas for modeling XBRL in RDF with Dave Raggett (who&amp;rsquo;s doing some very interesting, more ambitious work modeling the whole deal in RDF—in related news, Kingsley Idehen &lt;a href=&#34;https://www.bobdc.com/blog/free-xbrl-software#comments&#34;&gt;said that&lt;/a&gt; OpenLink has an XBRL ontology almost ready, and TopQuadrant&amp;rsquo;s Ralph Hodgson &lt;a href=&#34;http://lists.w3.org/Archives/Public/semantic-web/2008Sep/att-0136/00-part&#34;&gt;pointed out&lt;/a&gt; the BRONTO project at &lt;a href=&#34;http://www.tifbrewery.com/tifBrewery/writing.htm&#34;&gt;TIFbrewery&lt;/a&gt;), I wrote an &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/instance2rdf.xsl&#34;&gt;instance2rdf.xsl&lt;/a&gt; XSLT stylesheet to convert an XBRL instance to RDF/XML. After running it on the instance documents for several companies that I downloaded &lt;a href=&#34;http://www.sec.gov/Archives/edgar/xbrl.html&#34;&gt;from the SEC website&lt;/a&gt; and manually creating a file that I called &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/colist.rdf&#34;&gt;colist.rdf&lt;/a&gt; to map company identifiers in the XBRL instances to company names, I ran the following query with &lt;a href=&#34;http://jena.sourceforge.net/ARQ/&#34;&gt;ARQ 1.4&lt;/a&gt; to ask about all Interest Expense figures in my collection of reports:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX xi:  &amp;lt;http://www.xbrl.org/2003/instance&amp;gt;


SELECT DISTINCT ?companyName ?periodStart ?periodEnd ?interestExp


FROM &amp;lt;RRDonnelley.rdf&amp;gt;
FROM &amp;lt;pepsico.rdf&amp;gt;
FROM &amp;lt;nobleenergy.rdf&amp;gt;
FROM &amp;lt;generalmills.rdf&amp;gt;
FROM &amp;lt;papajohns.rdf&amp;gt;
FROM &amp;lt;dow.rdf&amp;gt;
FROM &amp;lt;ge.rdf&amp;gt;
FROM &amp;lt;cocacola.rdf&amp;gt;
FROM &amp;lt;colist.rdf&amp;gt;


WHERE {
  ?s rdf:type &amp;lt;http://xbrl.us/us-gaap/2008-03-31#InterestExpense&amp;gt;;
     rdf:value     ?interestExp;
     xi:identifier ?identifier;
     xi:startDate  ?periodStart;
     xi:endDate    ?periodEnd.


  ?identifier rdfs:label ?companyName.


}
ORDER BY ?companyName ?periodStart
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Not all of the named companies have InterestExpense figures in that namespace; the query just asks for the figure from the companies that do.&lt;/p&gt;
&lt;p&gt;I originally planned to merge all the RDF files into one before running the query, but I decided to let SPARQL do it, which is why there are nine FROM clauses above. In a more realistic scenario, the RDF versions of the companies&amp;rsquo; XBRL data would be loaded into a single triplestore and you would run the query against that.&lt;/p&gt;
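&lt;p&gt;Conceptually, those FROM clauses just take the union of each file&amp;rsquo;s triples before the pattern matching runs. A toy Python sketch of that idea (hand-made triples with shortened property names, not my converter&amp;rsquo;s actual output; the one figure shown is PepsiCo&amp;rsquo;s from the table above):&lt;/p&gt;

```python
GAAP = "http://xbrl.us/us-gaap/2008-03-31#InterestExpense"

# Two tiny "files" of (subject, predicate, object) triples.
pepsico = {
    ("_:f1", "rdf:type", GAAP),
    ("_:f1", "rdf:value", "96000000"),
    ("_:f1", "xi:identifier", "PEP"),
}
colist = {("PEP", "rdfs:label", "PEPSICO INC")}

merged = pepsico | colist  # what the FROM clauses accomplish

def interest_expenses(triples):
    # Match the query's pattern: find InterestExpense facts, then join
    # each fact's identifier to a company name via rdfs:label.
    labels = {s: o for s, p, o in triples if p == "rdfs:label"}
    rows = []
    for fact in {s for s, p, o in triples if p == "rdf:type" and o == GAAP}:
        value = next(o for s, p, o in triples if s == fact and p == "rdf:value")
        ident = next(o for s, p, o in triples if s == fact and p == "xi:identifier")
        rows.append((labels[ident], value))
    return rows

print(interest_expenses(merged))  # [('PEPSICO INC', '96000000')]
```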
&lt;p&gt;As Dave &lt;a href=&#34;http://lists.w3.org/Archives/Public/semantic-web/2008Sep/0113.html&#34;&gt;suggested&lt;/a&gt;, I could add data typing to the RDF created from the XBRL instances. Before I add anything else to the RDF, though, I want to make sure that it enables a new kind of useful SPARQL query against the data that I couldn&amp;rsquo;t do before the addition. I&amp;rsquo;m open to suggestions!&lt;/p&gt;
&lt;p&gt;What does this prove? We know that RDF is great for aggregating data, especially resources that may have different data structures but certain data in common. XBRL gets more interesting when you start aggregating data from multiple companies, and I haven&amp;rsquo;t seen much of that, although my research was limited to &lt;a href=&#34;https://www.bobdc.com/blog/free-xbrl-software&#34;&gt;free software&lt;/a&gt;. Being able to compare specific financial figures from different companies will be great for people doing financial research, and this new combination of standards and free software makes it pretty easy.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-2093&#34;&gt;September 23, 2008 11:56 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;From &lt;a href=&#34;http://demo.openlinksw.com/sparql&#34;&gt;http://demo.openlinksw.com/sparql&lt;/a&gt;, just execute a SPARQL query (ie. select * from the XBRL instance URI ).&lt;/p&gt;
&lt;p&gt;Take any XBRL instance from: &lt;a href=&#34;http://www.sec.gov/Archives/edgar/xbrl.html&#34;&gt;http://www.sec.gov/Archives/edgar/xbrl.html&lt;/a&gt; .&lt;/p&gt;
&lt;p&gt;I would still encourage you to assist me in getting all the XBRL interested parties to work together via the XBRL Financial Report Ontology effort at:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://groups.google.com/group/xbrl-ontology-specification-group&#34;&gt;http://groups.google.com/group/xbrl-ontology-specification-group&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We should be able to collectively produce a Financial Reporting Ontology from XBRL.&lt;/p&gt;
&lt;p&gt;Kingsley&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2094&#34;&gt;September 23, 2008 12:20 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Kingsley,&lt;/p&gt;
&lt;p&gt;Does &lt;a href=&#34;http://demo.openlinksw.com/sparql&#34;&gt;http://demo.openlinksw.com/sparql&lt;/a&gt; offer a way to issue a SPARQL query against multiple XBRL reports at once? That&amp;rsquo;s really what I was interested in.&lt;/p&gt;
&lt;p&gt;I had another question for you, but decided that &lt;a href=&#34;http://groups.google.com/group/xbrl-ontology-specification-group&#34;&gt;http://groups.google.com/group/xbrl-ontology-specification-group&lt;/a&gt; would be a more effective place to put it.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-2095&#34;&gt;September 23, 2008 3:11 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;SPARQL FROM NAMED is how you refer to multiple RDF information resources via their URIs. You can even scope your SPARQL query patterns to specific graphs if you want via GRAPH {query-pattern} .&lt;/p&gt;
&lt;p&gt;Just try it :-)&lt;/p&gt;
&lt;p&gt;Note: use the drop down to tell the service to SPONGE (i.e. get remote Graphs).&lt;/p&gt;
&lt;p&gt;Kingsley&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/xbrl">XBRL</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Free XBRL software</title>
      <link>https://www.bobdc.com/blog/free-xbrl-software/</link>
      <pubDate>Mon, 15 Sep 2008 09:34:52 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/free-xbrl-software/</guid>
      
      
<description><div>A tour.</div><div>&lt;p&gt;In trying to learn more about XBRL, an important first step is to find software, and I don&amp;rsquo;t want to pay for it. Open Source is even better, in case I want to build some application around it. I&amp;rsquo;ve written up my research experience with each free package I heard about, roughly in order from most promising to least promising. For sample input, most of my testing used &lt;a href=&#34;http://www.sec.gov/Archives/edgar/xbrl.html&#34;&gt;XBRL filings to the SEC&lt;/a&gt; by large multinational corporations specializing in carbonated brown sugar water.&lt;/p&gt;
&lt;h2 id=&#34;id203605&#34;&gt;XBRLAPI&lt;/h2&gt;
&lt;p&gt;This makes the top of the list because it was the closest to what I was looking for—an open source library, with some built-in routines to let you try it right away without doing any coding yourself, that more or less worked with data that I chose to feed it. XBRLAPI has a brief &lt;a href=&#34;http://www.xbrlapi.org/gettingStarted/&#34;&gt;Getting Started&lt;/a&gt; page, and &lt;a href=&#34;http://www.xbrlapi.org/installationDocumentation/commandLine.html&#34;&gt;Using the XBRLAPI from the command line&lt;/a&gt; tells you more about actually getting started with it. I took their command line example demonstrating how to specify all the jar files on the command line and got a few error messages until I added xercesImpl.jar and xalan.jar to the list and increased the initial and maximum heap sizes. It requires a log4j.xml file, and the first one I tried didn&amp;rsquo;t work, so I eventually used the XBRLAPI &lt;a href=&#34;http://xbrlapi.svn.sourceforge.net/viewvc/xbrlapi/trunk/conf/log4j.xml?revision=207&amp;amp;pathrev=207&#34;&gt;distribution log4j.xml&lt;/a&gt;, which just sends log messages to the console.&lt;/p&gt;
&lt;p&gt;Of the built-in routines, the &amp;ldquo;compose operation (merging all of the discovered documents into a single XML composite document)&amp;rdquo; was the most attractive to me, because a company typically stores their XBRL information in a set of instance and taxonomy documents, and the compose operation combines them all into one. This makes it easier for an XSLT stylesheet or some other simple scripting technology to act on the information with no need to do all the cross-file lookups and dereferencing that is normally part of XBRL processing. The Coca Cola files submitted to the SEC added up to about 600K, and trying to process them appeared to hang my machine, so I tried the 134K of the Noble Energy filings. This took 4 hours and 15 minutes, so I tried Coke and Pepsi again with no luck. It&amp;rsquo;s encouraging that it worked with Noble Energy, and this looks like a configuration issue worth trying on my Linux machine instead of the Windows computer where I made these attempts.&lt;/p&gt;
&lt;h2 id=&#34;id203679&#34;&gt;DragonView&lt;/h2&gt;
&lt;p&gt;Rivet Software&amp;rsquo;s &lt;a href=&#34;http://www.rivetsoftware.com/content/index.cfm?fuseaction=showContent&amp;amp;contentID=90&amp;amp;navID=80&#34;&gt;Dragon View&lt;/a&gt; reads XBRL files and displays them with enough interactivity to let you navigate around the cleanly presented reports. Downloading requires registration first, but confirmation of my registration came in about an hour, even though it was a Sunday night.&lt;/p&gt;
&lt;p&gt;When I first installed it and loaded the instance document from Coca Cola&amp;rsquo;s EDGAR filing, I got an error message that Dragon View couldn&amp;rsquo;t find us-gaap-all-2008-03-31.xsd. This file and related ones are available at the SEC&amp;rsquo;s &lt;a href=&#34;http://www.sec.gov/spotlight/xbrl/xbrlusfrv1-core.htm&#34;&gt;US Financial Reporting Version 1 Taxonomies — Core&lt;/a&gt; web page, but don&amp;rsquo;t right-click and try to download the files directly from there, like I did; what appear to be links to the files actually link to a message that you&amp;rsquo;re leaving the SEC site.&lt;/p&gt;
&lt;p&gt;Once I had the right schema files from the standard, I managed to load Coca Cola&amp;rsquo;s cce-20080627.xml instance document into DragonView, where it displayed in a clear and straightforward tabular view showing one of several reports. A drop-down &amp;ldquo;Reports&amp;rdquo; field at the top offered a choice of other reports to view. When I tried to load Pepsico&amp;rsquo;s pep-20080614.xml document, DragonView displayed a &amp;ldquo;Missing Information Warning&amp;rdquo; message box with the explanation &amp;ldquo;One or more XBRL elements contained within the XBRL document is missing from the referenced taxonomy&amp;rdquo; and a clear and simple tabular display of the problematic elements. (Kudos to Rivet for displaying the warning details so nicely; a lot of products, commercial or otherwise, would have dumped a bunch of Courier text log messages into a scrolling field on the message box.) When you select a particular labeled row on the report such as &amp;ldquo;Net changes in assets and liabilities, net of acquisition amounts&amp;rdquo;, optional fields at the bottom of the main display show the authoritative references (source citation) and the definition for that piece of information—in this case, &amp;ldquo;The net change during the reporting period of all current assets and liabilities used in operating activities&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://209.234.225.154/viewer/filings/overview.asp?cik=0000804055&amp;amp;accessionNumber=0001193125-08-100599&#34;&gt;&lt;img id=&#34;id203765&#34; src=&#34;https://www.bobdc.com/img/main/cokexbrlimg.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Financial Explorer view of Coca Cola cash data&#34; width=&#34;400px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;id203786&#34;&gt;Financial Explorer&lt;/h2&gt;
&lt;p&gt;The SEC&amp;rsquo;s &lt;a href=&#34;http://209.234.225.154/viewer/home/&#34;&gt;Financial Explorer&lt;/a&gt; is an online application for browsing XBRL submitted to them. Its interactive diagrams are great, with color-coded circles of different sizes giving quick visual overviews of different amounts of related income or expenses. Being essentially an interactive website, it&amp;rsquo;s not the kind of software I was looking for, but it will be great for many people who want to explore the submitted data without buying software.&lt;/p&gt;
&lt;h2 id=&#34;id203809&#34;&gt;ABRA&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;http://www.xbrlopen.org/&#34;&gt;ABRA&lt;/a&gt; is an open source effort from a German company called ABZ Reporting. It works as a set of XSLT stylesheets with Java-based extensions, and I got its demo to work, but found the overall setup to be a little too hardcoded to its demo. I wrote out more details about what I tried and my suggestions for the program on its &lt;a href=&#34;https://sourceforge.net/forum/forum.php?thread_id=2190642&amp;amp;forum_id=452096&#34;&gt;SourceForge mailing list&lt;/a&gt; near the end of August and haven&amp;rsquo;t seen any reply since then.&lt;/p&gt;
&lt;h2 id=&#34;id203839&#34;&gt;XBRL View&lt;/h2&gt;
&lt;p&gt;This package doesn&amp;rsquo;t seem to have any real home, and the only mentions I could find are &lt;a href=&#34;http://www.download.com/XBRL-View/3000-2066_4-10535259.html&#34;&gt;on the free software sites where you can download it&lt;/a&gt;. It appears to come from China and hasn&amp;rsquo;t been updated since May of 2006. The help page says &amp;ldquo;copyright 2005 - 2006&amp;rdquo; and lists &lt;a href=&#34;https://www.clousoft.com&#34;&gt;www.clousoft.com&lt;/a&gt; as a web page, but this domain name &lt;a href=&#34;http://www.justdropped.com/drops/111406com.html&#34;&gt;expired&lt;/a&gt; in 2006 and there&amp;rsquo;s nothing there now. After starting the program up and using its graphical interface to try to load the Coca Cola and Pepsico instance documents, I got a java.util.NoSuchElementException in XBRL View&amp;rsquo;s console window for both sets of XBRL data.&lt;/p&gt;
&lt;h2 id=&#34;id203872&#34;&gt;Free if you spend some money: SavaNet&amp;rsquo;s XBRL Reader and Fujitsu&amp;rsquo;s XBRL Tools&lt;/h2&gt;
&lt;p&gt;&lt;a href=&#34;http://www.savanet.com/AboutReader.aspx&#34;&gt;SavaNet® XBRL® Reader™&lt;/a&gt; (look at all those IP superscripts!) is free &amp;ldquo;to the investors and clients of publishers using SavaNet products&amp;rdquo;, and Fujitsu&amp;rsquo;s &lt;a href=&#34;http://www.fujitsu.com/global/services/software/interstage/xbrltools/&#34;&gt;XBRL Tools&lt;/a&gt; is free &amp;ldquo;for XBRL Consortium members / academic users only&amp;rdquo;. (The latter used to be free for anyone who wanted to download it.) Joining XBRL means &lt;a href=&#34;http://www.xbrl.org/HowToJoin/&#34;&gt;joining your local jurisdiction&lt;/a&gt;, which, for xbrl.us, means paying thousands of dollars a year in dues.&lt;/p&gt;
&lt;h2 id=&#34;id203908&#34;&gt;Semansys&lt;/h2&gt;
&lt;p&gt;Semansys has a &lt;a href=&#34;http://www.semansys.com/downloads.html&#34;&gt;download area&lt;/a&gt; that says that &amp;ldquo;Semansys Technologies offers documentation, white papers and evaluation software, ready to use for everyone who&amp;rsquo;s interested&amp;rdquo;, but &amp;ldquo;currently [they] cannot provide you with automated download functionality&amp;rdquo;. Clicking on any of the page&amp;rsquo;s five &amp;ldquo;appropriate profile&amp;rdquo; links pops up a window telling you to email &lt;a href=&#34;mailto:sales@semansys.com&#34;&gt;sales@semansys.com&lt;/a&gt; to find out more. (Clicking the &amp;ldquo;I’m a CPA or consultant and I want to learn more about XBRL and available software in my personal interest&amp;rdquo; profile displays a window telling you &amp;ldquo;Contact us to purchase the application and enjoy a 30 day money-back guarantee&amp;rdquo;—I guess &amp;ldquo;evaluation software ready to use for everyone who&amp;rsquo;s interested&amp;rdquo; means &amp;ldquo;buy it and if you have problems we&amp;rsquo;ll give you your money back&amp;rdquo;.) The page does let you download some XBRL samples from &amp;ldquo;Virtual Company&amp;rdquo;.&lt;/p&gt;
&lt;h2 id=&#34;id203944&#34;&gt;XBreeze&lt;/h2&gt;
&lt;p&gt;When I began this research about two weeks ago, I &lt;a href=&#34;http://www.andhranews.net/intl/2007/January/9/em-UBmatrixnounces.asp&#34;&gt;heard about&lt;/a&gt; an open source program called XBreeze from &lt;a href=&#34;http://www.ubmatrix.com/&#34;&gt;UBMatrix&lt;/a&gt;. The site required registration before you could download it. After trying several days in a row and getting the error message &amp;ldquo;An Error has occurred in the application. The administrator has been alerted about the problem. Please try after some time&amp;rdquo;, I emailed them two weeks ago and haven&amp;rsquo;t heard back. Now the registration page and all pages mentioning XBreeze seem to be gone.&lt;/p&gt;
&lt;h2 id=&#34;id203975&#34;&gt;Next step&lt;/h2&gt;
&lt;p&gt;Converting some SEC XBRL into RDF and working out some reasonably simple SPARQL queries to run against it. Dave Raggett and I have just moved our private email discussion about modeling XBRL in RDF onto the &lt;a href=&#34;http://lists.w3.org/Archives/Public/semantic-web/2008Sep/0113.html&#34;&gt;SWIG&lt;/a&gt; mailing list if anyone wants to join in.&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-2084&#34;&gt;September 15, 2008 6:14 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Why can&amp;rsquo;t we coordinate the RDF and XBRL Linked Data effort via:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://groups.google.com/group/xbrl-ontology-specification-group&#34;&gt;http://groups.google.com/group/xbrl-ontology-specification-group&lt;/a&gt; ?&lt;/p&gt;
&lt;p&gt;At the very least, why not ping the members of this community?&lt;/p&gt;
&lt;p&gt;Note, OpenLink has already produced an initial ontology for XBRL which is what we use in our ODE product. The ontology will be released this week.&lt;/p&gt;
&lt;p&gt;Links:&lt;/p&gt;
&lt;p&gt;1. &lt;a href=&#34;http://ode.openlinksw.com&#34;&gt;http://ode.openlinksw.com&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2085&#34;&gt;September 15, 2008 7:26 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Kingsley,&lt;/p&gt;
&lt;p&gt;In the last 11 months, there have been five messages on that group: two from you last June and three porn spam messages since.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At the very least, why not ping the members of this community?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I did get in touch with Frederik last month, and he told me that they couldn&amp;rsquo;t find anyone in the XBRL community willing to work on it, and that Zitgist will be doing some work when the time is right.&lt;/p&gt;
&lt;p&gt;So far, the ontology continues to be at release 0.0, with nothing in particular to build on.&lt;/p&gt;
&lt;p&gt;Besides, I&amp;rsquo;m not looking for an ontology, I&amp;rsquo;m looking for RDF data to query. &lt;a href=&#34;http://demo.openlinksw.com/ode/&#34;&gt;http://demo.openlinksw.com/ode/&lt;/a&gt; is very interesting, but after following your instructions and poking around a bit (even clicking &amp;ldquo;Raw Triples&amp;rdquo; and then doing &amp;ldquo;View Source&amp;rdquo;) I still don&amp;rsquo;t see any RDF. I&amp;rsquo;ve been meaning to get to know OpenLink better, but I haven&amp;rsquo;t seen how to get RDF out of it yet. So, I&amp;rsquo;ve written a little XSLT to convert instance documents to RDF and I&amp;rsquo;ve been querying those. It&amp;rsquo;s all come together pretty quickly.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;OpenLink has already produced an initial ontology&lt;br /&gt;
for XBRL which is what we use in our ODE product.&lt;br /&gt;
The ontology will be released this week.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I look forward to seeing it!&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-2087&#34;&gt;September 16, 2008 7:48 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;To get RDF from an XBRL instance document, simply do the following:&lt;/p&gt;
&lt;p&gt;Using our SPARQL endpoints (e.g. &lt;a href=&#34;http://demo.openlinksw.com/sparql&#34;&gt;http://demo.openlinksw.com/sparql&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;1. basic SPARQL pattern with XBRL instance doc URL in the FROM NAMED clause&lt;/p&gt;
&lt;p&gt;2. over the SPARQL protocol with appropriate results serialization chosen&lt;/p&gt;
&lt;p&gt;When using ODE, please use the &amp;ldquo;Page Description&amp;rdquo; feature from an XBRL instance document, and then look at the footer where there are options for RDF/XML or N3 serialization.&lt;/p&gt;
&lt;p&gt;The proxy URIs that we produce via our Sponger Middleware service will enable you to then replicate the XBRL to RDF experience using other RDF based tools and platforms.&lt;/p&gt;
&lt;p&gt;As for the Google Discussion forum, I am yet to understand how you administer those forums re. SPAM. Also, when it comes to inactivity, we are back to my original frustration: nobody has stepped up to assist us with the enormous task of producing an ontology from XBRL, so we did it ourselves, and even after that, the fragmentation continues :-(&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.corefiling.com&#34; title=&#34;http://www.corefiling.com&#34;&gt;John Turner&lt;/a&gt; on &lt;a href=&#34;#comment-2088&#34;&gt;September 16, 2008 8:13 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;Feel free to download the free, no strings attached, version of SpiderMonkey to help create or extend taxonomies. I&amp;rsquo;ll be interested to hear how you are modelling things like the calculation linkbase as triples.&lt;/p&gt;
&lt;p&gt;Cheers&lt;/p&gt;
&lt;p&gt;John Turner&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2089&#34;&gt;September 16, 2008 9:05 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;John,&lt;/p&gt;
&lt;p&gt;I will check that out. For now, I&amp;rsquo;m just playing with the modeling of simple XBRL facts, but Dave Raggett is working on the modeling of the more complete XBRL picture, including taxonomies.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/xbrl">XBRL</category>
      
    </item>
    
    <item>
      <title>Using XSLT to deliver XML on browsers</title>
      <link>https://www.bobdc.com/blog/using-xslt-to-deliver-xml-on-b/</link>
      <pubDate>Tue, 09 Sep 2008 10:18:43 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/using-xslt-to-deliver-xml-on-b/</guid>
      
      
      <description><div>An update on Firefox (with some help from the world of model railroading) and Chrome.</div><div>&lt;p&gt;Delivery of XML on web browsers isn&amp;rsquo;t as popular as XML&amp;rsquo;s inventors originally hoped, but it&amp;rsquo;s still useful. It&amp;rsquo;s &lt;a href=&#34;http://www.xml.com/pub/a/2003/02/05/tr.html&#34;&gt;easy&lt;/a&gt; to add a standardized processing instruction to your XML that points at an XSLT stylesheet that converts your XML to HTML, and then when you open the XML file in your browser, you see the result. When you need a rendered version of some XML to review, this can make it happen pretty quickly. (The W3C Recommendation &lt;a href=&#34;http://www.w3.org/TR/xml-stylesheet/&#34;&gt;Associating Style Sheets with XML documents&lt;/a&gt; is possibly their shortest; its author, James Clark, was always good for a high signal-to-noise ratio.)&lt;/p&gt;
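&lt;p&gt;As a minimal sketch of the association described above (the filenames here are hypothetical), the processing instruction goes before the root element of the XML document:&lt;/p&gt;

```xml
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;!-- tell the browser to render this file with an XSLT stylesheet --&gt;
&lt;?xml-stylesheet type="text/xsl" href="report.xsl"?&gt;
&lt;report&gt;
  &lt;title&gt;Quarterly figures&lt;/title&gt;
&lt;/report&gt;
```

&lt;p&gt;A browser that honors the instruction applies report.xsl and displays the transformed result instead of the raw XML.&lt;/p&gt;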
&lt;p&gt;&lt;a href=&#34;http://www.flickr.com/photos/marksimpkins/57681468/&#34;&gt;&lt;img id=&#34;id202512&#34; src=&#34;http://farm1.static.flickr.com/27/57681468_e371eb7352.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;guys with model trains&#34; width=&#34;320px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve done this with Firefox very often, and I&amp;rsquo;ve set up a system for a client who needed to let users preview XML documents without starting up specialized software by doing the same trick with Internet Explorer. (Their choice of browser, not mine.) Recently, though, I found that this didn&amp;rsquo;t work with Firefox 3.0. The stylesheet was ignored, and the error console told me that there was a &amp;ldquo;Security Error: Content at file:///c:/my/path/filename.xml may not load data from file:///c:/some/path/stylesheet.xsl&amp;rdquo;. I understand the general idea of not letting a browser allow one file to load another, but when they&amp;rsquo;re both local, I see a baby going out with the bathwater.&lt;/p&gt;
&lt;p&gt;I had to look pretty hard to find the solution, and of all places, it turned out to be on a website dedicated to an &lt;a href=&#34;http://wiki.rocrail.net/doku.php&#34;&gt;open source model railroad control system&lt;/a&gt;. (It would have been great if a posting in the thread came from &lt;a href=&#34;http://www.gibson.com/en-us/Lifestyle/Features/Smokestack%20Lightnin__%20Neil%20You/&#34;&gt;Clyde Coil&lt;/a&gt;, but none did.) According to page 3 of a thread on &lt;a href=&#34;http://forum.rocrail.net/viewtopic.php?p=2376&#34;&gt;Who can help on XSL-Stylesheets&lt;/a&gt;, you need to reset Firefox&amp;rsquo;s security.fileuri.strict_origin_policy setting to &amp;ldquo;false&amp;rdquo;. This worked for me, and I hope I&amp;rsquo;m not opening some gaping security hole. I&amp;rsquo;d hate to have to choose between good security and Firefox rendering of styled XML.&lt;/p&gt;
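&lt;p&gt;For reference, the same change can be made persistent in a user.js file in the Firefox profile directory (a sketch; note that flipping this preference relaxes same-origin checks for all file: URIs, not just your stylesheets):&lt;/p&gt;

```
// user.js in the Firefox profile directory:
// let a local XML file load a local XSLT stylesheet again
user_pref("security.fileuri.strict_origin_policy", false);
```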
&lt;p&gt;After I downloaded &lt;a href=&#34;http://www.google.com/chrome&#34;&gt;Google Chrome&lt;/a&gt;, this XSLT trick was the first thing I tried, with no luck. When I used Chrome to open an XML file with a processing instruction that pointed to an XSLT stylesheet, as described by James&amp;rsquo; brief W3C Recommendation, it showed nothing—pure blank space, not even the PCDATA from the file. Chrome&amp;rsquo;s clean interface made it difficult to find out how to View Source, but luckily Ctrl+U did it like it does in Firefox. When I tried this, it displayed a color-coded version of the XML file, including comments&amp;hellip; but with all processing instructions removed, so the standard way to associate stylesheets with arbitrary XML files won&amp;rsquo;t work.&lt;/p&gt;
&lt;p&gt;If Chrome is more about being an application development platform than another grab at desktop dominance, then I hope that XSLT is eventually part of that application development picture. Maybe someday.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-2078&#34;&gt;September 9, 2008 11:16 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The Google Chrome team has been made aware of this issue. (That&amp;rsquo;s as much as I&amp;rsquo;m authorized to say.)&lt;/p&gt;
&lt;p&gt;By Erik Hetzner on &lt;a href=&#34;#comment-2079&#34;&gt;September 9, 2008 11:17 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been using FF3 to load XSLT for XML on the same origin server (with relative &amp;amp; absolute URIs) without problems. There is more info about this policy here: &lt;a href=&#34;http://kb.mozillazine.org/Security.fileuri.strict_origin_policy&#34;&gt;http://kb.mozillazine.org/Security.fileuri.strict_origin_policy&lt;/a&gt; It might work for you to use an HTTP URI for the stylesheet or to put the stylesheet in the same dir or a subdir that the XML is in.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2080&#34;&gt;September 9, 2008 11:42 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;John: thanks, that gives me hope.&lt;/p&gt;
&lt;p&gt;Erik: thanks, I will try that.&lt;/p&gt;
&lt;p&gt;By Norm on &lt;a href=&#34;#comment-2081&#34;&gt;September 10, 2008 6:47 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We do the same thing with XML and XSL for displaying the output of our build system. Sadly I ran into the same problem as you with Chrome (we had figured out the FF3 &amp;ldquo;work-around&amp;rdquo; a while ago). I really hope Google fixes this. Given the speed of Chrome I&amp;rsquo;m hoping that this transfers to their XSLT processing (we have very large XML files and some complicated XSL). Putting the stylesheet in the same dir didn&amp;rsquo;t work for me. I&amp;rsquo;m going to try serving up the stylesheet from HTTP now.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/xslt">XSLT</category>
      
    </item>
    
    <item>
      <title>Learning about XBRL</title>
      <link>https://www.bobdc.com/blog/learning-about-xbrl/</link>
      <pubDate>Wed, 03 Sep 2008 19:41:50 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/learning-about-xbrl/</guid>
      
      
      <description><div>For the geek, not for the accountant.</div><div>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Facts are simple and facts are straight&lt;br /&gt;
Facts are lazy and facts are late&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Talking Heads, &amp;ldquo;Crosseyed and Painless&amp;rdquo;, from &lt;a href=&#34;http://en.wikipedia.org/wiki/Remain_in_Light&#34;&gt;Remain in Light&lt;/a&gt;, 1980&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Facts can be simple, in which case their values must be expressed as simple content (except in the case of simple facts whose values are expressed as a ratio), and facts can be compound, in which case their value is made up from other simple and/or compound facts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;http://www.xbrl.org/Specification/XBRL-RECOMMENDATION-2003-12-31+Corrected-Errata-2005-11-07.htm#fact&#34;&gt;XBRL Recommendation 2.1, section 1.4, &amp;ldquo;Terminology&amp;rdquo;&lt;/a&gt;, 2005&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(Sorry, couldn&amp;rsquo;t resist that.) Most introductions to XBRL out there are aimed at financial people. They briefly touch on the what and why of the XML parts, but leave out the how, treating the technology part as a black box. Simple web searches turn up plenty of these introductions, so for those interested in a more technical, markup geek perspective, I wanted to give an overview of the better resources that I found. If you&amp;rsquo;re coming at XBRL from an implementer&amp;rsquo;s angle, it&amp;rsquo;s certainly important to read the overviews aimed at the CFO and accountant crowd to learn what users do with this data and expect from the technology, but if you want to create or modify the technology you have to dig a little more for background.&lt;/p&gt;
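&lt;p&gt;To make the spec&amp;rsquo;s terminology concrete, a simple fact in an XBRL instance document looks roughly like this (a sketch: the my: element, context, and unit names are made up, but contextRef, unitRef, and decimals are the instance attributes the spec defines):&lt;/p&gt;

```xml
&lt;xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:my="http://example.com/taxonomy"&gt;
  &lt;!-- a simple fact: its value is simple content,
       tied to a reporting context and a unit --&gt;
  &lt;my:CashAndCashEquivalents contextRef="FY2008Q2" unitRef="USD"
      decimals="-3"&gt;1234000&lt;/my:CashAndCashEquivalents&gt;
&lt;/xbrli:xbrl&gt;
```

&lt;p&gt;The context and unit elements that FY2008Q2 and USD point to live elsewhere in the instance document, which is why so much XBRL processing involves cross-referencing.&lt;/p&gt;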
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;For an overview of XBRL&amp;rsquo;s role in the standards world, Dale Waldt&amp;rsquo;s &lt;a href=&#34;http://www.xml.com/pub/a/2004/03/10/xbrl.html&#34;&gt;XBRL: The Language of Finance and Accounting&lt;/a&gt; article in XML.com (part of a series titled &amp;ldquo;Standards Lowdown&amp;rdquo;) answers key questions such as &amp;ldquo;What Is it?&amp;rdquo; and &amp;ldquo;Where does it come from?&amp;rdquo;, so it&amp;rsquo;s a good place to start.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Although Wikipedia is aimed at a general audience, its &lt;a href=&#34;http://en.wikipedia.org/wiki/XBRL&#34;&gt;XBRL entry&lt;/a&gt; packs a lot of technical detail and context into a fairly small space, so I strongly recommend that as one of the first things to look at.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You wouldn&amp;rsquo;t want to read all 165 pages of the actual &lt;a href=&#34;http://www.xbrl.org/Specification/XBRL-RECOMMENDATION-2003-12-31+Corrected-Errata-2005-11-07.htm&#34;&gt;XBRL Recommendation&lt;/a&gt; straight through, but as with any spec for a standard that you&amp;rsquo;re interested in, it&amp;rsquo;s worth reading the introductory part and skimming the rest to get an idea of what&amp;rsquo;s there so that you know how it&amp;rsquo;s organized when you need to look up something specific. I read about the first 20 pages, then my eyes glazed over when it started getting into detail about the more advanced XLink possibilities, so I skipped to the introductions of the &lt;a href=&#34;http://www.xbrl.org/Specification/XBRL-RECOMMENDATION-2003-12-31+Corrected-Errata-2005-11-07.htm#_4&#34;&gt;XBRL Instances&lt;/a&gt; and &lt;a href=&#34;http://www.xbrl.org/Specification/XBRL-RECOMMENDATION-2003-12-31+Corrected-Errata-2005-11-07.htm#_5&#34;&gt;XBRL Taxonomies&lt;/a&gt; sections.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Lastly, there&amp;rsquo;s a somewhat interactive &lt;a href=&#34;http://www.us.kpmg.com/microsite/xbrl/train/86/86.htm&#34;&gt;tutorial&lt;/a&gt; at KPMG&amp;rsquo;s website. (Don&amp;rsquo;t even follow the link unless you&amp;rsquo;re using Internet Explorer, which is obviously the first strike against the tutorial.) This tutorial has plenty of good information, but it&amp;rsquo;s still a glorified PowerPoint presentation masquerading as a series of Interactive Course Module Rich User Experience Learning Management System Objects, or whatever the hell they&amp;rsquo;re calling it. (I also question the expertise of any XML &amp;ldquo;experts&amp;rdquo; who don&amp;rsquo;t understand the &lt;a href=&#34;http://xml.silmaril.ie/authors/makeup/&#34;&gt;difference between elements and tags&lt;/a&gt;. And, don&amp;rsquo;t be put off by the awful music with the title slides; it&amp;rsquo;s only on the title slides.) Despite these annoyances, the KPMG tutorial lays out the elements and attributes that make up XBRL instances and taxonomies pretty well, and I took several pages of notes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Do you know of any good introductions to XBRL aimed more at implementers than at accountants?&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By Bill Donoghoe on &lt;a href=&#34;#comment-2069&#34;&gt;September 4, 2008 5:56 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Using XBRL Extensibility by Charles Hoffman&lt;br /&gt;
(&lt;a href=&#34;http://www.lulu.com/browse/book_view.php?fCID=592782&amp;amp;fBuyItem=5&#34;&gt;http://www.lulu.com/browse/book_view.php?fCID=592782&amp;amp;fBuyItem=5&lt;/a&gt;) may be useful.&lt;/p&gt;
&lt;p&gt;You could also check out my XBRL bookmarks at&lt;br /&gt;
&lt;a href=&#34;http://delicious.com/bdonoghoe/xbrl&#34;&gt;http://delicious.com/bdonoghoe/xbrl&lt;/a&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/xbrl">XBRL</category>
      
    </item>
    
    <item>
      <title>Werewolves of Kid Rock</title>
      <link>https://www.bobdc.com/blog/werewolves-of-kid-rock/</link>
      <pubDate>Mon, 01 Sep 2008 10:58:21 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/werewolves-of-kid-rock/</guid>
      
      
      <description><div>Skynyrd, sure, but don&#39;t forget Warren Zevon.</div><div>&lt;p&gt;Kid Rock&amp;rsquo;s &amp;ldquo;All Summer Long&amp;rdquo; looks like the monster summer hit of 2008. In today&amp;rsquo;s compartmentalized market for pop music, this song is big in a lot of compartments. The lyrics reminisce about a summer when he was young, and the girlfriend he had then, and how they would all sing &amp;ldquo;Sweet Home Alabama&amp;rdquo; a lot. Like &amp;ldquo;Sweet Home Alabama,&amp;rdquo; the song&amp;rsquo;s chords are half a bar of D, half a bar of C, and a bar of G, repeated with no variation throughout the song&amp;rsquo;s verses, choruses, and solo, with no bridge to break it up the pattern. Mr. Rock works in a lot of backing vocals and lead guitar lines from the Lynyrd Skynyrd hit, and because &amp;ldquo;Sweet Home Alabama&amp;rdquo; still gets plenty of airplay on country and classic rock stations, even young kids get the reference. There&amp;rsquo;s a key reference they don&amp;rsquo;t get, though. The first time I heard the opening of &amp;ldquo;All Summer Long&amp;rdquo; on the radio, I said to my daughter &amp;ldquo;Cool! Someone&amp;rsquo;s covered &amp;lsquo;Werewolves of London&amp;rsquo;!&amp;rdquo; I was wrong, but not far off.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re not familiar with the late &lt;a href=&#34;http://en.wikipedia.org/wiki/Warren_Zevon&#34;&gt;Warren Zevon&lt;/a&gt;, I could compare him to Randy Newman or Nilsson, two more LA songwriters with a cynical sense of humor whose greatest commercial successes as songwriters were better-known artists&amp;rsquo; versions of their songs—in Zevon&amp;rsquo;s case, Linda Ronstadt&amp;rsquo;s version of &amp;ldquo;Poor Poor Pitiful Me.&amp;rdquo; (Be thankful that Newman is still with us.) Not only does &amp;ldquo;Werewolves of London&amp;rdquo; do that same D-C-G thing throughout, but the piano part that kicks off &amp;ldquo;All Summer Long&amp;rdquo; is clearly a copy of Zevon&amp;rsquo;s, if not a sample. When I pulled out my vinyl copy of &lt;a href=&#34;http://en.wikipedia.org/wiki/Excitable_Boy&#34;&gt;Excitable Boy&lt;/a&gt; and played it for my daughters, they acted blasé about the resemblance, but teenagers are good at that. Make the comparison yourself, even if you only listen to the first 10 seconds of &lt;a href=&#34;http://www.youtube.com/watch?v=nhSc8qVMjKM&#34;&gt;Werewolves of London&lt;/a&gt; and &lt;a href=&#34;http://www.youtube.com/watch?v=uwIGZLjugKA&#34;&gt;All Summer Long&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;
&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/nhSc8qVMjKM&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;
&lt;br /&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/uwIGZLjugKA&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;
&lt;/p&gt;
&lt;p&gt;Kid Rock always seemed to straddle multiple cultures with a grin and a wink; I&amp;rsquo;ll take this juxtaposition as one more wink.&lt;/p&gt;
&lt;p&gt;Update: my brother pointed out that Rock did the right thing, &lt;a href=&#34;http://repertoire.bmi.com/title.asp?blnWriter=True&amp;amp;blnPublisher=True&amp;amp;blnArtist=True&amp;amp;keyid=9398351&amp;amp;ShowNbr=0&amp;amp;ShowSeqNbr=0&amp;amp;querytype=WorkID&#34;&gt;spreading the publishing around&lt;/a&gt; to all the authors of both songs—even the great LA session guitarist Waddy Wachtel.&lt;/p&gt;
&lt;h2 id=&#34;8-comments&#34;&gt;8 Comments&lt;/h2&gt;
&lt;p&gt;By James Lynch III on &lt;a href=&#34;#comment-2058&#34;&gt;September 1, 2008 12:44 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think he&amp;rsquo;s acknowledged sampling both tunes&amp;hellip; but I can&amp;rsquo;t help but think this is a totally crass way to get a hit in this f-ed up music marketplace&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-2059&#34;&gt;September 1, 2008 2:52 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I listened to Zevon all the way through (an old fave, hadn&amp;rsquo;t seen the vid). Watched 30 seconds of Kid Rock, stopped it and listened to Zevon again. Now I&amp;rsquo;m off to youtube to find the Skynyrd thing. Must be getting old&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-2061&#34;&gt;September 1, 2008 3:39 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;don&amp;rsquo;t forger Warren Zevon&amp;rdquo; - typo or intentional?&lt;/p&gt;
&lt;p&gt;By Peter on &lt;a href=&#34;#comment-2062&#34;&gt;September 1, 2008 3:59 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I just heard it for the first time and remarked &amp;ldquo;Wow, sampling werewolves. Pretty clever.&amp;rdquo; And then I was informed I was &amp;lsquo;out of the loop&amp;rsquo; and it was Skynyrd. Same freaking key. Look for lazy mashups coming soon.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2063&#34;&gt;September 1, 2008 4:20 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Danny: it was a typo, and I&amp;rsquo;ve corrected it. Copying one instrument&amp;rsquo;s licks from one use of a tried-and-true chord progression isn&amp;rsquo;t forgery. (Thanks for pointing it out.)&lt;/p&gt;
&lt;p&gt;Jim: sampling Zevon, or Skynyrd, isn&amp;rsquo;t a way to get a hit. You need more ingredients than that, and &amp;ldquo;All Summer Long&amp;rdquo; has them.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.timothyhorrigan.com&#34; title=&#34;http://www.timothyhorrigan.com&#34;&gt;Tim Horrigan&lt;/a&gt; on &lt;a href=&#34;#comment-2064&#34;&gt;September 1, 2008 5:52 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hmm, I wonder if Warren Zevon was consciously referencing the riff of the original &amp;ldquo;Sweet Home Alabama&amp;rdquo; in &amp;ldquo;Werewolves of London.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;BTW, even though those two songs are classics, I think Kid Rock&amp;rsquo;s mashup is an improvement on both of them. And I speak as someone who is not that big a fan of his :-)&lt;/p&gt;
&lt;p&gt;Kid Rock rips hooks from other 70s and 80s artists as well&amp;hellip; the drum sound is straight from the Ramones&amp;rsquo; cover of &amp;ldquo;Time Has Come Today,&amp;rdquo; the vocal harmonies from Madonna&amp;rsquo;s &amp;ldquo;Like a Prayer&amp;rdquo;, some of the guitar riffs are from &amp;ldquo;Blue Skies&amp;rdquo; by the Allman Brothers and others are from &amp;ldquo;Tumbling Dice&amp;rdquo; and other Rolling Stones classics.&lt;/p&gt;
&lt;p&gt;By Rob Koberg on &lt;a href=&#34;#comment-2065&#34;&gt;September 1, 2008 7:34 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The Dead covered &amp;lsquo;Werewolves of London&amp;rsquo; often (I saw them do it in London :) Best cover of a Warren Zevon song is Dwight Yoakam&amp;rsquo;s version of Carmelita.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2067&#34;&gt;September 2, 2008 12:15 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Tim: tough points to prove.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Changing my mind about XBRL again</title>
      <link>https://www.bobdc.com/blog/changing-my-mind-about-xbrl-ag/</link>
      <pubDate>Thu, 28 Aug 2008 09:20:34 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/changing-my-mind-about-xbrl-ag/</guid>
      
      
      <description><div>Call me a flip-flopper.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.xbrl.org&#34;&gt;&lt;img id=&#34;id202480&#34; src=&#34;http://www.xbrl.org/TMLogos/LOGO-XBRL_with_R.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;XBRL logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When I first heard about the &lt;a href=&#34;http://www.xbrl.org/Home/&#34;&gt;eXtensible Business Reporting Language&lt;/a&gt;, it sounded great: an XML standard for business reports and their contents. Who could argue with lots of data with lots of value to many people, available in an open standard? I knew some of the people who worked on it, and I dug in and played a bit, but eventually lost interest. A comment that I left on a Tim Bray ongoing posting titled &lt;a href=&#34;http://www.tbray.org/ongoing/When/200x/2007/12/12/XBRL-News&#34;&gt;XBRL News&lt;/a&gt; last December showed the high point (or perhaps low point) of my cynicism:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;XBRL (which has taken over 7 years to achieve the minor level of adoption that you describe) is second only to W3C Schemas in the number of people it&amp;rsquo;s inspired to say &amp;ldquo;sure it&amp;rsquo;s complex, but don&amp;rsquo;t worry&amp;ndash;there&amp;rsquo;ll be tools to take care of that!&amp;rdquo; A key problem is that it&amp;rsquo;s so customizable and flexible that it&amp;rsquo;s difficult to put together something that can perform similar processing on multiple arbitrary XBRL documents. Compare DITA, which allows two different documents to appear structurally very different, but has open-source software to abstract away the differences to let us treat the documents as having an equivalent structure. (Of course DITA has a simpler, more straightforward domain, so there&amp;rsquo;s no ocean to boil.)&lt;/p&gt;
&lt;p&gt;When XBRL has an open source equivalent of the DITA Open Toolkit, I&amp;rsquo;ll be ready to take another good look at it. Until then, as with so many standards, without free software that lets us do stuff with data that conforms to the standard, I don&amp;rsquo;t see much incentive for conformance to the standard, except of course for regulations.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Another source of my frustration was that in May of 2004, when I was last playing with it, Fujitsu&amp;rsquo;s set of &lt;a href=&#34;http://software.fujitsu.com/en/interstage-xwand/activity/xbrltools/&#34;&gt;XBRL Tools&lt;/a&gt;, the most popular free XBRL software at the time, was going through the upgrade from XBRL 2.0 to 2.1, and at that point their latest document validator couldn&amp;rsquo;t handle documents created by their latest document creator. Although the Fujitsu tools are now available only to XBRL Consortium members (and joining the consortium means jumping through a hoop or two), more free XBRL software is now available.&lt;/p&gt;
&lt;p&gt;More importantly, Tim pointed out last December (with a link to the SEC&amp;rsquo;s &lt;a href=&#34;http://www.sec.gov/Archives/edgar/xbrl.html&#34;&gt;XBRL Data Submitted in the XBRL Voluntary Program on EDGAR&lt;/a&gt; page) that fewer than 100 big public companies were reporting in XBRL, but this week I count 468 companies listed on that page. That&amp;rsquo;s a substantial increase, and that&amp;rsquo;s plenty of data to play with—enough to make the whole idea of XBRL more than just a theoretical nice-to-have.&lt;/p&gt;
&lt;p&gt;Something else that drew me back to XBRL is that as I studied various kinds of ontology and taxonomy work, I noticed that XBRL people were doing a lot of very careful taxonomy work to support specific business goals. An &lt;a href=&#34;http://www.iaconline.org&#34;&gt;American Council for Technology&lt;/a&gt; white paper titled &amp;ldquo;Transforming Financial Information – Use of XBRL in Federal Financial Management&amp;rdquo; (&lt;a href=&#34;http://www.actgov.org/actiac/documents/pdfs/XBRLWhitePaper.pdf&#34;&gt;PDF&lt;/a&gt;) quoted Charles Hoffman (both the father and author of XBRL, according to the paper) as saying&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;XBRL is in a field referred to collectively as Semantic Technology&amp;hellip; Semantic Technologies is a multi-faceted field with progressive layers of technology and complexity. The World Wide Web Consortium developed a set of semantic standards established at the turn of the century (most significant of which are the Resource Description Framework (RDF) and the Web Ontology Language (OWL)). This field is rich with possibilities and stands as the next logical step in the natural progression of information technology to seek a higher value proposition.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;ll bite, but it doesn&amp;rsquo;t seem that many others have, because I don&amp;rsquo;t see much work on the potential connection between XBRL and the W3C-oriented semantic web standards. (I&amp;rsquo;d be happy to have things I missed pointed out to me.) Many web pages out there mention both XBRL and RDF, but mostly as examples in lists, with no explicit discussion of possible relationships. The &lt;a href=&#34;http://xbrlontology.com/&#34;&gt;XBRL Ontology Specification&lt;/a&gt; hasn&amp;rsquo;t gotten any further than the 0.0 status it had in April of last year, with &lt;a href=&#34;http://groups.google.com/group/xbrl-ontology-specification-group/browse_thread/thread/38d8a481d5251155&#34;&gt;mailing list activity&lt;/a&gt; ending a month after it started. There was a &lt;a href=&#34;http://www.semantic-conference.com/session/1007/&#34;&gt;Financial Services XBRL Seminar&lt;/a&gt; at the 2008 Semantic Web Technology Conference, but I haven&amp;rsquo;t seen any evidence of cross-fertilization that came out of it.&lt;/p&gt;
&lt;p&gt;So I&amp;rsquo;m going to pursue the potential connections between XBRL and RDF-related technology myself. As I read about all those information relationships that XBRL can model, on the one hand I&amp;rsquo;m thinking &amp;ldquo;Cool! (How about that XLink, &lt;a href=&#34;http://www.xml.com/pub/a/2002/03/13/xlink.html&#34;&gt;after all&lt;/a&gt;!)&amp;rdquo; and on the other hand I&amp;rsquo;m thinking &amp;ldquo;This would be so difficult to model as triples!&amp;rdquo; I&amp;rsquo;m more interested in a bottom-up proof-of-concept than in a top-down ontology, though. For a start, instead of modeling all of XBRL&amp;rsquo;s many potential data structures as triples, I plan to model a subset that can be queried with reasonably non-contorted SPARQL queries and to put together a demo using some of that EDGAR data. Playing with the existing free software and writing some XSLT to convert EDGAR filings to RDF will be priorities. I&amp;rsquo;ll report on my progress (or lack thereof) as I move along.&lt;/p&gt;
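&lt;p&gt;To make the plan a little more concrete, here&amp;rsquo;s a rough sketch of the kind of thing I have in mind. The namespace, property names, and values below are all made up for illustration; they don&amp;rsquo;t come from any actual XBRL taxonomy. A single reported fact might become a handful of triples (shown here in N3):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix ex: &amp;lt;http://example.com/xbrl2rdf#&amp;gt; .

ex:fact1 ex:concept ex:NetIncome ;
         ex:entity  ex:ACMECorp ;
         ex:period  &amp;quot;2007&amp;quot; ;
         ex:value   &amp;quot;1500000&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A reasonably non-contorted SPARQL query could then pull such facts back out:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX ex: &amp;lt;http://example.com/xbrl2rdf#&amp;gt;

SELECT ?entity ?value WHERE {
  ?fact ex:concept ex:NetIncome ;
        ex:entity  ?entity ;
        ex:value   ?value .
}
&lt;/code&gt;&lt;/pre&gt;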
&lt;h2 id=&#34;9-comments&#34;&gt;9 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-2048&#34;&gt;August 28, 2008 10:48 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Please make time to do the following:&lt;/p&gt;
&lt;p&gt;1. Visit &lt;a href=&#34;http://ode.openlinksw.com&#34;&gt;http://ode.openlinksw.com&lt;/a&gt;&lt;br /&gt;
2. Follow the examples link&lt;br /&gt;
3. See XBRL instance data in Linked Data form&lt;br /&gt;
4. Download and install the OpenLink Data Explorer for Firefox&lt;br /&gt;
5. Visit any XBRL instance doc URL&lt;br /&gt;
6. Use the &amp;ldquo;View | Linked Data Sources&amp;rdquo; feature of ODE to flip from the XBRL view to Linked Data View&lt;/p&gt;
&lt;p&gt;Kingsley&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2049&#34;&gt;August 28, 2008 2:01 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Kingsley, that works as described and is very cool.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By Rick Jelliffe on &lt;a href=&#34;#comment-2051&#34;&gt;August 29, 2008 11:31 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think the XBRL instances are a really terrific design: linking rather than direct markup.&lt;/p&gt;
&lt;p&gt;But the modeling on top of XSD absolutely stinks as a system, big suckerooney: there are some nice tools to be sure, but the tools never stop someone from needing to know what is going on, and what is going on is XSD+++++. IIRC it uses equivalence classes a lot, which is one of those &amp;ldquo;we don&amp;rsquo;t implement that&amp;rdquo; features for some data-binding/DBMS kinds of tools.&lt;/p&gt;
&lt;p&gt;The difference is that the XBRL modeling is at least more straightforward. It is not the XSD sea-of-details approach.&lt;/p&gt;
&lt;p&gt;So a mixed bag on friendliness, but definitely a &amp;lsquo;hardcore&amp;rsquo; technology, not a casual one or for visibility to Mom-and-Pop.&lt;/p&gt;
&lt;p&gt;Cheers&lt;br /&gt;
Rick&lt;/p&gt;
&lt;p&gt;By Dave Raggett on &lt;a href=&#34;#comment-2052&#34;&gt;August 29, 2008 4:20 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;XBRL&amp;rsquo;s complexity reflects the richness of the domain it models. Its sophisticated use of XLink certainly makes XBRL hard to process with XSLT. I prefer to think of XBRL as a transfer format, as it turns out to be rather easy to convert to RDF. This way you can generate different kinds of XBRL reports using queries over a scalable RDF triple store, such as Sesame. This also opens the theoretical possibility for XBRL filings to be submitted in one of the RDF syntaxes, e.g. Turtle. The current XML syntax makes use of XML Schema to assist with validation of XBRL filings, and it will be interesting to look at validation using Semantic Web technologies as an alternative.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2053&#34;&gt;August 29, 2008 5:06 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Dave!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;it turns out to be rather easy to convert to RDF.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Just what I was looking for! Who has done this? Is there code available to use or see?&lt;/p&gt;
&lt;p&gt;thanks,&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://people.w3.org/~dsr/blog/&#34; title=&#34;http://people.w3.org/~dsr/blog/&#34;&gt;Dave Raggett&lt;/a&gt; on &lt;a href=&#34;#comment-2054&#34;&gt;August 30, 2008 12:47 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is something I am working on as a background activity. The code for converting XBRL to RDF turtle syntax is in C and linked against libxml2. I will ask my manager at JustSystems if it would be possible to release this as open source, but that will inevitably take some time as it requires sign-off at the top levels of the company. I will post some details on the relation between XBRL and RDF on my blog, but due to vacation and other higher priority work, this may take a while.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2055&#34;&gt;August 30, 2008 1:05 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Cool, thanks!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://rhizomik.net/~roberto&#34; title=&#34;http://rhizomik.net/~roberto&#34;&gt;Roberto García&lt;/a&gt; on &lt;a href=&#34;#comment-2056&#34;&gt;August 30, 2008 5:09 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;I have started to map XBRL XSDs and instance data from the EDGAR program to OWL and RDF. I use the generic mappings provided by ReDeFer XSD2OWL and XML2RDF tools (&lt;a href=&#34;http://rhizomik.net/redefer&#34;&gt;http://rhizomik.net/redefer&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The mappings are partial and quite preliminary. All is available from &lt;a href=&#34;http://rhizomik.net/ontologies/bizontos&#34;&gt;http://rhizomik.net/ontologies/bizontos&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Best,&lt;/p&gt;
&lt;p&gt;Roberto&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-2057&#34;&gt;August 31, 2008 10:22 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Roberto, this looks interesting.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/xbrl">XBRL</category>
      
    </item>
    
    <item>
      <title>Jonathan Zittrain&#39;s &#34;The Future of the Internet: and How to Stop It&#34;</title>
      <link>https://www.bobdc.com/blog/jonathan-zittrains-the-future/</link>
      <pubDate>Thu, 21 Aug 2008 09:09:05 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/jonathan-zittrains-the-future/</guid>
      
      
      <description><div>Highly recommended.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0300124872/bobducharmeA/&#34;&gt;&lt;img id=&#34;id202480&#34; src=&#34;http://ecx.images-amazon.com/images/I/51Eq-gmEYyL.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;The Future of the Internet: and How to Stop It&#34; width=&#34;160px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The title of Jonathan Zittrain&amp;rsquo;s book &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0300124872/bobducharmeA/&#34;&gt;The Future of the Internet&lt;/a&gt; makes it sound like one of those upbeat future technology books that you see people in suits reading on planes, but the subtitle &amp;ldquo;and How to Stop It&amp;rdquo; shows that it&amp;rsquo;s not so upbeat. Zittrain, the Professor of Internet Governance and Regulation at Oxford University and the co-founder of Harvard Law School&amp;rsquo;s &lt;a href=&#34;http://cyber.law.harvard.edu/&#34;&gt;Berkman Center for Internet &amp;amp; Society&lt;/a&gt;, describes how so much use of the Internet is headed in directions that contradict the principles that made the Internet great in the first place. The most important of these principles is what he calls generativity—flexibility in the creation of hardware, operating systems, applications, or websites that allow people to make new contributions, often resulting in unexpected contributions that others can build on further. While Linux, Apache web servers, Firefox, wikis, the IBM PC&amp;rsquo;s open architecture, and many other platforms have provided this so far, the increasing use of &amp;ldquo;tethered appliances&amp;rdquo; to perform Internet-related tasks threatens this pattern. Products such as the iPhone, TiVo, and the XBox are so tightly controlled by their makers that any innovations built on these platforms must come from within the companies that control them, much like any innovation in the U.S. telephone system had to come from the monopoly company that controlled it for so many decades. Sure, you can write a new application for the iPhone, but no one can load your app onto their iPhone until it goes to Apple, gets approved, and then gets distributed by them. 
If you want to add a new menu option to Firefox and recompile it, no such approval process is necessary for people to use it, and this kind of freedom is how the Internet grew to where it is today.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t want to rehash the whole book here, but I&amp;rsquo;ll admit that I expected it to be fairly dry and read it mostly out of a sense of responsibility to be up on these issues. It actually is a fairly quick read; I read most of my copy sitting on a beach. Once Zittrain lays out his case, which includes a history of the Internet that was fascinating to someone who&amp;rsquo;s read quite a few histories of the Internet, he reviews several of the things that have gone wrong (for example, spam and malware). This sets the stage for how tightly-controlled Internet walled gardens are becoming more appealing to people, and he describes some of the decentralized, grass-roots practices that have dealt with such issues surprisingly effectively—for example, robots.txt files and Wikipedia&amp;rsquo;s practices for resolving disputes.&lt;/p&gt;
&lt;p&gt;He does present a hopeful case for how the future can build on current work by technical people and legal scholars to prevent the looming corporate-controlled Internet. (One legal scholar he mentions is Pamela Samuelson, a member of the markup geek family if only &lt;a href=&#34;http://people.ischool.berkeley.edu/~glushko/&#34;&gt;by marriage&lt;/a&gt;.) I strongly recommend the book to geeks interested in relevant legal issues and to lawyers interested in Internet technology, because Zittrain lays out the explicit, implicit, and potential connections between these worlds so well.&lt;/p&gt;
&lt;p&gt;He&amp;rsquo;s made an online version of the book available under a Creative Commons license &lt;a href=&#34;http://www.jz.org&#34;&gt;at his web site&lt;/a&gt;. All his talk of people building on each other&amp;rsquo;s work gave me one nice idea: create a version of his footnotes in which court case citations are live links to publicly available versions of those cases. For example, a link from the citation in &lt;a href=&#34;http://yupnet.org/zittrain/notes-chapter-2#note-3&#34;&gt;footnote 3 of chapter 2&lt;/a&gt; of the book to the judge&amp;rsquo;s decision on &lt;a href=&#34;http://cases.justia.com/us-court-of-appeals/F2/238/266/247746&#34;&gt;238 F.2d 266&lt;/a&gt; at justia.com.&lt;/p&gt;
&lt;p&gt;One more thing for the to-do pile.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/book-reviews">book reviews</category>
      
    </item>
    
    <item>
      <title>How you can explore a new set of linked data</title>
      <link>https://www.bobdc.com/blog/how-you-can-explore-a-new-set/</link>
      <pubDate>Fri, 15 Aug 2008 09:50:02 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/how-you-can-explore-a-new-set/</guid>
      
      
      <description><div>Some great tips from Dean Allemang.</div><div>&lt;p&gt;Although he doesn&amp;rsquo;t describe it in linked data terms, a &lt;a href=&#34;http://dallemang.typepad.com/my_weblog/2008/08/rdf-as-self-describing-data.html&#34;&gt;recent posting&lt;/a&gt; from Dean Allemang has some great suggestions for how to dive into a set of SPARQL-accessible data you know nothing about in order to find out what&amp;rsquo;s there. If there&amp;rsquo;s cool stuff in the data set, this is a lot of fun. (Also check out the recent &lt;a href=&#34;http://blogs.talis.com/nodalities/2008/07/dean-allemang-talks-about-topquadrant-and-semantic-web-for-the-working-ontologist.php&#34;&gt;Talking with Talis&lt;/a&gt; with Dean, where he describes many examples of semantic web technology helping large organizations solve very real problems.)&lt;/p&gt;
&lt;p&gt;If someone gives you access to an SQL database, commands like &lt;code&gt;show databases&lt;/code&gt;, &lt;code&gt;use [database name]&lt;/code&gt;, &lt;code&gt;show tables&lt;/code&gt;, and &lt;code&gt;describe [table name]&lt;/code&gt; let you explore the data, even if you have no idea of its schema at first, but that&amp;rsquo;s a big &amp;ldquo;if&amp;rdquo;—there aren&amp;rsquo;t many large relational databases with useful data available over the public Internet waiting for you to issue SQL queries. There is a growing amount of linked data with SPARQL front ends, and Dean describes a few general-purpose SPARQL queries and a few more that build on the results to explore a set of data that you might know nothing about. He uses &lt;a href=&#34;https://www.bobdc.com/blog/querying-dbpedia&#34;&gt;dbpedia&lt;/a&gt; in his examples, so we know that his demonstration will work with a huge data set.&lt;/p&gt;
&lt;p&gt;Before recommending that everyone else go and try this, I thought I should try it myself on another data set whose structure I knew nothing about, so I went to Richard Cyganiak&amp;rsquo;s &lt;a href=&#34;http://richard.cyganiak.de/2007/10/lod/&#34;&gt;The Linking Open Data dataset cloud&lt;/a&gt; page (at the Linked Data Planet conference, pretty much everyone had a slide of this interactive diagram) to find another data set on which to try this out. Some servers were down, and some had RDF files to download that I could have queried against, but I ended up with the &lt;a href=&#34;http://www4.wiwiss.fu-berlin.de/gutendata/&#34;&gt;D2R Server for the Gutenberg Project&lt;/a&gt;, where I entered SPARQL queries at its &lt;a href=&#34;http://www4.wiwiss.fu-berlin.de/gutendata/snorql/&#34;&gt;SNORQL web-based front end&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As Dean suggested, I listed all the predicates:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?p WHERE {?s ?p ?o}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I saw a lot of Dublin Core predicates, including dc:creator, dc:title, and dc:description. I used this query to list all the authors:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?o where { ?s &amp;lt;http://purl.org/dc/elements/1.1/creator&amp;gt; ?o }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One of the values there was &amp;ldquo;db:people/Goethe_Johann_Wolfgang_von_1749-1832&amp;rdquo;, so I did the following to list his works in Project Gutenberg:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?title where {
  ?s &amp;lt;http://purl.org/dc/elements/1.1/creator&amp;gt; 
     &amp;lt;http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/Goethe_Johann_Wolfgang_von_1749-1832&amp;gt;;
     &amp;lt;http://purl.org/dc/elements/1.1/title&amp;gt; ?title.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I wondered about Project Gutenberg&amp;rsquo;s description of one title, &amp;ldquo;The Sorrows of Young Werther&amp;rdquo;, so I entered this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT ?desc where {
  ?s &amp;lt;http://purl.org/dc/elements/1.1/title&amp;gt; &amp;quot;The Sorrows of Young Werther&amp;quot;;
     &amp;lt;http://purl.org/dc/elements/1.1/description&amp;gt; ?desc.
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The answer is: &amp;ldquo;Translation of: Die Leiden des jungen Werther.&amp;rdquo; (The German version is also available—most of the Project Gutenberg Goethe texts are in German.)&lt;/p&gt;
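&lt;p&gt;Another generic query that pairs nicely with Dean&amp;rsquo;s predicate listing (this one is my own addition, not one of his) lists the classes that the data uses, which gives you a quick feel for what kinds of resources are in there:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT DISTINCT ?type WHERE { ?s a ?type }
&lt;/code&gt;&lt;/pre&gt;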
&lt;p&gt;I could go on, and I certainly will try this with more sites that offer a SNORQL front end to a SPARQL interface. Like I said, it&amp;rsquo;s a lot of fun; check out Dean&amp;rsquo;s suggested queries, Richard&amp;rsquo;s suggested data sets, and try it yourself!&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://sourceforge.net/projects/meo/&#34; title=&#34;http://sourceforge.net/projects/meo/&#34;&gt;Colm Sean Murdoch O Cinneide.&lt;/a&gt; on &lt;a href=&#34;#comment-1977&#34;&gt;August 15, 2008 2:18 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I do not know what an information resource is. I have come to think of RDF as algebra &amp;ldquo;over&amp;rdquo; information resources. RDF writers should be barred from coining new URIs. I&amp;rsquo;ll stop pontificating and read the rest of this interesting material now ;-)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1978&#34;&gt;August 15, 2008 3:04 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;URIs (and sets of them packaged as ontologies) are a lot like source code: everyone agrees that re-use of existing ones is good, but instead of looking for some to re-use, people create their own and tell the world to re-use those. This is easier than tracking down existing well-designed URIs (or code) to re-use. That being said, what you need isn&amp;rsquo;t always out there, so sometimes you have to make up new URIs (or code).&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-1979&#34;&gt;August 15, 2008 3:16 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s another way:&lt;/p&gt;
&lt;p&gt;1. Go to &lt;a href=&#34;http://dbpedia.org:8890/isparql&#34;&gt;http://dbpedia.org:8890/isparql&lt;/a&gt;&lt;br /&gt;
2. Go to &amp;ldquo;Advanced Tab&amp;rdquo; (just so you can paste in the query that follows)&lt;/p&gt;
&lt;p&gt;Query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;

SELECT DISTINCT *
FROM &amp;lt;http://dbpedia.org&amp;gt;
WHERE {
  ?s ?p ?o. ?o bif:contains &amp;quot;'Goethe_Johann_Wolfgang'&amp;quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;3. The results grid contains URIs; click on the URI for Wolfgang, and select the &amp;ldquo;Describe&amp;rdquo; option.&lt;/p&gt;
&lt;p&gt;You can also do it the other way round starting with this query:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;gt;

SELECT DISTINCT *
FROM &amp;lt;http://dbpedia.org&amp;gt;
WHERE {
  ?s ?p ?o. ?o bif:contains &amp;quot;'Goethe_Johann_Wolfgang'&amp;quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To see the visualization of the SPARQL Query click on the triple icon in the &amp;ldquo;Advanced&amp;rdquo; UI.&lt;/p&gt;
&lt;p&gt;As you explore the resulting graph, this visual query tool will construct SPARQL on the fly, and at each turn you can visualize the queries, etc.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>SKOS and SWOOP: how</title>
      <link>https://www.bobdc.com/blog/skos-and-swoop-how/</link>
      <pubDate>Tue, 12 Aug 2008 09:54:05 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/skos-and-swoop-how/</guid>
      
      
      <description><div>A step-by-step example.</div><div>&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/blog/using-the-ontology-editing-too&#34;&gt;Last week&lt;/a&gt; I discussed the possibility of using the &lt;a href=&#34;http://code.google.com/p/swoop/&#34;&gt;SWOOP&lt;/a&gt; ontology editor and the W3C&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/2004/02/skos/&#34;&gt;SKOS&lt;/a&gt; standard to create taxonomies or thesaurii, and I promised to go into a little more detail about how to do so.&lt;/p&gt;
&lt;p&gt;(Again, I encourage those more familiar than I am with SKOS and these tools to correct me.) The file &lt;a href=&#34;http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf&#34;&gt;2006-04-18.rdf&lt;/a&gt; defines the SKOS Core Vocabulary. It defines some of the more sophisticated relationships that I described last week, such as broaderPartitive and narrowerInstantive, as deprecated properties with the owl:versionInfo message &amp;ldquo;This term has been moved to the &amp;lsquo;SKOS Extensions&amp;rsquo; vocabulary. See &lt;a href=&#34;http://www.w3.org/2004/02/skos/extensions/&#34;&gt;http://www.w3.org/2004/02/skos/extensions/&lt;/a&gt;&amp;rdquo;. I downloaded the &lt;a href=&#34;http://www.w3.org/2004/02/skos/extensions.rdf&#34;&gt;extensions ontology file&lt;/a&gt; and wrote &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/combineSkosOntology.xsl&#34;&gt;a little XSLT stylesheet&lt;/a&gt; to combine the core file and the extensions file and to remove the deprecated properties. Otherwise, when viewing the combined ontology in SWOOP, you would see properties like broaderPartitive listed twice: the deprecated version and the new version.&lt;/p&gt;
&lt;p&gt;To use this ontology with SWOOP to define a thesaurus, start up SWOOP and load the &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/skoscombo.rdf&#34;&gt;combined ontology&lt;/a&gt; created from 2006-04-18.rdf and extensions.rdf. Add a term (or, in OWL terms, add an Individual) such as &amp;ldquo;museum&amp;rdquo; by clicking SWOOP&amp;rsquo;s &amp;ldquo;Add I&amp;rdquo; button, which has its &amp;ldquo;I&amp;rdquo; inside of a little pink diamond. In the New Entity dialog box that appears, the default value for &amp;ldquo;Instance-of&amp;rdquo; is owl:Thing, but you&amp;rsquo;re working with SKOS, so pick the Concept class instead. Enter Museum as an ID, and then click &amp;ldquo;Add and Close&amp;rdquo;. Do the same to add the term &amp;ldquo;TheLouvre&amp;rdquo;. Remember not to include any space in this term&amp;rsquo;s ID name; you can add &amp;ldquo;The Louvre&amp;rdquo; as a Label for the term in the same dialog box if you like.&lt;/p&gt;
&lt;p&gt;After clicking &amp;ldquo;Add and Close&amp;rdquo; for TheLouvre, you&amp;rsquo;ll see the &amp;ldquo;Concise Format&amp;rdquo; tab for TheLouvre, where you can add some metadata about it: the fact that it has the relationship BroaderInstantive to Museum.&lt;/p&gt;
&lt;p&gt;To do this, first click &amp;ldquo;Add&amp;rdquo; next to &amp;ldquo;Object Assertions&amp;rdquo;. In the &amp;ldquo;Select Property&amp;rdquo; dialog box that appears, look at that long list of properties to choose from. This is the main reason to use SWOOP and SKOS together: the combination lets you create rich standardized metadata by simply picking names from lists like this. Click on BroaderInstantive and the &amp;ldquo;Select Prop[erty] &amp;amp; Proceed&amp;rdquo; button, then pick Museum from the list that appears. After you click the &amp;ldquo;Add and Close&amp;rdquo; button, you&amp;rsquo;ll see it reflected on the Concise Format information about TheLouvre:&lt;/p&gt;
&lt;img id=&#34;id202593&#34; src=&#34;https://www.bobdc.com/img/main/skosswoop1.jpg&#34; width=&#34;560px&#34; alt=&#34;SKOS screenshot&#34;/&gt;
&lt;p&gt;As I described last week, the Concise Format screen for Museum will have no mention of the term&amp;rsquo;s relationship to TheLouvre, but an automated way to add that is apparently not far off.&lt;/p&gt;
&lt;p&gt;To get some more ideas about the things that SWOOP can do with a SKOS file, download the &lt;a href=&#34;http://cain.ice.ucdavis.edu/thesauri/ismt.rdf&#34;&gt;Invasive Species Management Thesaurus&lt;/a&gt; from the &lt;a href=&#34;http://cain.ice.ucdavis.edu/&#34;&gt;California Information Node&lt;/a&gt;, load it into SWOOP, and look at the Concise Format tab for a few terms. They have a lot of metadata. The recent DevX article &lt;a href=&#34;http://www.devx.com/semantic/Article/38629&#34;&gt;Applying SKOS Concept Schemes&lt;/a&gt; also showed me that there are plenty of other aspects of SKOS for me to explore.&lt;/p&gt;
&lt;p&gt;Looking this over made me even more sure of something I wrote last week: once Pellet supports SPARQL CONSTRUCT queries, the combination of the SKOS ontology, SWOOP, and Pellet is going to be very useful for people working with taxonomies and thesauri.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
<title>Using the ontology editing tool SWOOP to edit taxonomies and thesauri</title>
      <link>https://www.bobdc.com/blog/using-the-ontology-editing-too/</link>
      <pubDate>Wed, 06 Aug 2008 10:23:50 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/using-the-ontology-editing-too/</guid>
      
      
<description><div>Hopefully, as a more powerful open source alternative to existing taxonomy packages.</div><div>&lt;p&gt;In the online course in taxonomy development that I took recently, we reviewed several popular taxonomy development tools. I found them to be expensive or to have clunky, dated interfaces, and was disappointed that the format most of these programs supported for storing saved work was either a proprietary binary format or something they just called &amp;ldquo;XML&amp;rdquo;. (I&amp;rsquo;m open to correction on any of these points.) &amp;ldquo;OK,&amp;rdquo; I wondered, &amp;ldquo;What XML?&amp;rdquo; Reviewing some samples of their exported XML, it was pretty easy to understand the structure by looking at the element names and container patterns, but I never saw any mention of a DTD, and I thought it would be ideal if there was a standard format that they could share.&lt;/p&gt;
&lt;p&gt;There is a standard format that they can share: &lt;a href=&#34;http://www.w3.org/2004/02/skos/&#34;&gt;SKOS&lt;/a&gt;, which provides an ontology (available as an OWL file &lt;a href=&#34;http://www.w3.org/2004/02/skos/core/history/2006-04-18.rdf&#34;&gt;here&lt;/a&gt;) that defines the kinds of relationships that taxonomists want to see in taxonomy or thesaurus development. This includes basic ones such as &amp;ldquo;narrower&amp;rdquo; and &amp;ldquo;broader&amp;rdquo; and more sophisticated variations on these such as &amp;ldquo;broaderPartitive&amp;rdquo; and &amp;ldquo;narrowerInstantive&amp;rdquo;. (A little background on these variations, featuring examples from the ANSI Z39 standard for controlled vocabularies that I &lt;a href=&#34;https://www.bobdc.com/blog/what-is-a-taxonomy&#34;&gt;wrote about&lt;/a&gt; recently: in a hierarchy of terms, we can qualify the relationship between a term in a tree and its parent by saying that the child node is narrowerInstantive, as the Louvre is an instance of a museum, or narrowerPartitive, as a brain stem is a part of a brain, or narrowerGeneric, as the class of parrots is a subclass of the class of birds. In addition to defining the taxonomy term relationship properties &amp;ldquo;broader&amp;rdquo; and &amp;ldquo;narrower&amp;rdquo;, SKOS defines instantive, partitive, and generic subproperties of &amp;ldquo;broader&amp;rdquo; and &amp;ldquo;narrower&amp;rdquo;.)&lt;/p&gt;
&lt;p&gt;If the SKOS standard lays out the potential relationships and provides a definition of these relationships in a standard syntax (OWL), and an open source GUI tool like &lt;a href=&#34;http://code.google.com/p/swoop/&#34;&gt;SWOOP&lt;/a&gt; can read that and let you define the terms and relationships in a new thesaurus by pointing and clicking, then the most difficult part of providing a new alternative to the well-known taxonomy tools is already done, right? Well, not quite. There are two key things missing, but we&amp;rsquo;ll see them both available for SWOOP use in time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Thesaurus editors usually offer a series of canned reports about the terms and relationships within a given thesaurus—the kinds of reports that taxonomists want to see as they perform their work. A little Python code to read a SKOS-based thesaurus and then sort and summarize its contents would be simple to write.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;As far as I could tell, intelligence about inverse relationships is not built into SWOOP. For example, let&amp;rsquo;s say I read the SKOS ontology into SWOOP and create Individuals (or, in object-oriented terms, instances) of the terms &amp;ldquo;The Louvre&amp;rdquo; and &amp;ldquo;museum&amp;rdquo;. Then, I use the appropriate SWOOP features to indicate that The Louvre has the relationship broaderInstantive to museum, because The Louvre is an instance of the class museum. I&amp;rsquo;d like to then go to SWOOP&amp;rsquo;s panel for museum and see &amp;ldquo;The Louvre&amp;rdquo; listed as having a narrowerInstantive relationship to this term I&amp;rsquo;m reading about, but I won&amp;rsquo;t.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
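&lt;p&gt;As a rough illustration of the first missing piece, here is what such a canned report might look like in Python. The triples are hard-coded for the example; a real script would first pull them out of the saved SKOS file:&lt;/p&gt;

```python
# Sketch of a canned thesaurus report: given (subject, property, object)
# triples from a SKOS-based thesaurus, count the terms and list each
# term's broader-term relationships. The triples here are illustrative.
from collections import defaultdict

triples = [
    ("TheLouvre", "broaderInstantive", "Museum"),
    ("BrainStem", "broaderPartitive", "Brain"),
    ("Parrot", "broaderGeneric", "Bird"),
]

terms = sorted({term for s, _, o in triples for term in (s, o)})
broader = defaultdict(list)
for s, p, o in triples:
    broader[s].append((p, o))

print(f"{len(terms)} terms")  # 6 terms
for term in terms:
    for prop, parent in sorted(broader.get(term, [])):
        print(f"  {term} --{prop}--> {parent}")
```

Sorting and summarizing along these lines would reproduce the kinds of term listings that commercial thesaurus editors generate.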
&lt;p&gt;My first idea was to tell SWOOP to save the ontology file with these relationships and instances, then use &lt;a href=&#34;http://pellet.owldl.com/&#34;&gt;Pellet&lt;/a&gt; to turn the implicit relationships in the file (for example, that museum has a narrowerInstantive relationship to The Louvre) into explicit ones written right out in the same file, and then read that file with all the spelled-out relationships back into SWOOP, but apparently Pellet isn&amp;rsquo;t quite there yet. A SPARQL query delivered via Pellet can pull out explicit and implicit triples, but not in a syntax that can be used for an RDF/OWL file. I saw on the Pellet &lt;a href=&#34;http://lists.owldl.com/pipermail/pellet-users/2008-July/002911.html&#34;&gt;mailing list&lt;/a&gt; that the next version would support SPARQL &lt;a href=&#34;http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/#construct&#34;&gt;CONSTRUCT&lt;/a&gt; queries that let you create a new set of RDF around the returned triples, so that will help.&lt;/p&gt;
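&lt;p&gt;The inference I want from the reasoner is simple to state: each narrower* property is the inverse of the corresponding broader* property, so a reasoner can write out the missing half of each pair explicitly. A toy Python version of that materialization step, with the property names taken from the SKOS extensions discussed above and the sample triple invented for illustration:&lt;/p&gt;

```python
# Sketch of the inference discussed above: each narrower* property is the
# inverse of the corresponding broader* property, so for every asserted
# broader* triple we can materialize the explicit inverse triple.
INVERSES = {
    "broaderInstantive": "narrowerInstantive",
    "broaderPartitive": "narrowerPartitive",
    "broaderGeneric": "narrowerGeneric",
}

def materialize_inverses(triples):
    """Return the input triples plus the explicit inverse of each one."""
    out = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            out.add((o, INVERSES[p], s))
    return out

asserted = {("TheLouvre", "broaderInstantive", "Museum")}
print(sorted(materialize_inverses(asserted)))
# [('Museum', 'narrowerInstantive', 'TheLouvre'),
#  ('TheLouvre', 'broaderInstantive', 'Museum')]
```

Reading the materialized file back into SWOOP would then show "TheLouvre" on Museum's panel, which is exactly the round trip described above.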
&lt;p&gt;Describing all this here, I can casually refer to the use of SWOOP to read an ontology file and then define individuals and their taxonomic relationships, but I&amp;rsquo;d like to spell out in more detail how I used SWOOP to do this. My family is about to head out for a summer beach vacation, so instead of postponing the completion of a great big posting on all this, I&amp;rsquo;m making this overview part 1, and I will describe the hands-on part in part 2 sometime next week.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.cs.rpi.edi/~hendler&#34; title=&#34;http://www.cs.rpi.edi/~hendler&#34;&gt;Jim Hendler&lt;/a&gt; on &lt;a href=&#34;#comment-1964&#34;&gt;August 6, 2008 1:56 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, Pellet will do this, but not with the default SWOOP plugin - the online version of Pellet (I think in the code fork at Google) will export an ontology with the additions - problem is SWOOP is no longer under real development (but please feel free to contribute open source and keep it running - there&amp;rsquo;s a large user community who would love to see this happen) - the pellet at owldl.org has a lot of stuff in it the original one didn&amp;rsquo;t (and the incremental Pellet developed by Chris Halaschek-Weiner is an incredible improvement) so there&amp;rsquo;s been a lot of code splitting and such since it left Maryland &amp;ndash; sorry about that&amp;hellip;&lt;br /&gt;
-Jim H.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://clarkparsia.com/weblog/&#34; title=&#34;http://clarkparsia.com/weblog/&#34;&gt;Kendall Clark&lt;/a&gt; on &lt;a href=&#34;#comment-1965&#34;&gt;August 6, 2008 6:36 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Actually, Bob, Pellet can perfectly well do what you want, only not in yr preferred way. As Evren said on the pellet-users list in response to yr question, you can get what you want by writing some Java, but the command-line interface doesn&amp;rsquo;t support SPARQL CONSTRUCT queries presently. It will in the next release, due soon now.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1968&#34;&gt;August 8, 2008 10:01 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Jim&amp;ndash;I was running the most recent version of Pellet (downloaded from the website I linked to) from the command line. I will play some more.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1969&#34;&gt;August 8, 2008 10:06 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Kendall&amp;ndash;I&amp;rsquo;m too lazy to write the Java code. I&amp;rsquo;ll just wait.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/skos">SKOS</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>DevX article &#34;Relational Database Integration with RDF/OWL&#34;</title>
      <link>https://www.bobdc.com/blog/devx-article-relational-databa/</link>
      <pubDate>Wed, 30 Jul 2008 10:26:33 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/devx-article-relational-databa/</guid>
      
      
      <description><div>Summarizing and demonstrating the use of relational databases and OWL metadata together to get more out of the databases.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.devx.com/semantic/Article/38700&#34;&gt;&lt;img id=&#34;id202483&#34; src=&#34;http://assets.devx.com/articleicons/19303.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;green OWL image&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.devx.com&#34;&gt;DevX.com&lt;/a&gt; has just published my article &lt;a href=&#34;http://www.devx.com/semantic/Article/38700&#34;&gt;Relational Database Integration with RDF/OWL&lt;/a&gt;, which summarizes and updates some things I&amp;rsquo;ve written about here and in my paper for the XML 2006 conference. The article describes how to take a Eudora-based address book database and an Outlook-based one, load them into MySQL, add some OWL metadata, and then issue SPARQL queries to answer reasonably real-world questions that couldn&amp;rsquo;t have been answered without the additional metadata.&lt;/p&gt;
&lt;p&gt;Except for the use of Outlook and Eudora to create the data models, I did the whole thing with free software that runs on both Windows and Linux. Besides MySQL, prominently featured tools included the &lt;a href=&#34;http://d2rq.org/&#34;&gt;D2RQ&lt;/a&gt; RDF interface to relational database managers, the &lt;a href=&#34;http://code.google.com/p/swoop/&#34;&gt;SWOOP&lt;/a&gt; open-source ontology editor, and the &lt;a href=&#34;http://pellet.owldl.com/&#34;&gt;Pellet&lt;/a&gt; OWL reasoner to perform SPARQL queries that take the OWL metadata into account when pulling answer sets.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;databases&amp;rdquo; being integrated are pretty simple, each being a single table, but unlike some of the more obscure domains that have seen some cool ontology work, address book data is something that everyone can relate to. I hope the article can help more people see how OWL-based metadata can help apps get more out of what might otherwise seem like typical, everyday databases.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>What is a taxonomy?</title>
      <link>https://www.bobdc.com/blog/what-is-a-taxonomy/</link>
      <pubDate>Fri, 11 Jul 2008 17:07:05 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/what-is-a-taxonomy/</guid>
      
      
      <description><div>A standard definition.</div><div>&lt;p&gt;There are many terms that people can&amp;rsquo;t agree on. The great thing about standards is that even when everyone doesn&amp;rsquo;t agree about definitions included in those standards, these definitions provide a common baseline for everyone to work from.&lt;/p&gt;
&lt;p&gt;After hearing many definitions of the word &amp;ldquo;taxonomy&amp;rdquo;, I was pleased to discover the ANSI/NISO Z39.19 standard, &lt;a href=&#34;http://www.niso.org/kst/reports/standards?step=2&amp;amp;gid=None&amp;amp;project_key=7cc9b583cb5a62e8c15d3099e0bb46bbae9cf38a&#34;&gt;Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies&lt;/a&gt;, a specification that among other things defines the terminology for several classes of controlled vocabularies, including &amp;ldquo;taxonomy&amp;rdquo;. (It even defines the term &amp;ldquo;term&amp;rdquo;!) It does a great job of putting the term &amp;ldquo;taxonomy&amp;rdquo; in the right context of related terms such as &amp;ldquo;controlled vocabulary&amp;rdquo; and &amp;ldquo;thesaurus&amp;rdquo;, but not, unfortunately, the term &amp;ldquo;ontology&amp;rdquo;. More on this below; first, I&amp;rsquo;ll paste a few handy quotations.&lt;/p&gt;
&lt;blockquote id=&#34;id202592&#34; class=&#34;pullquote&#34;&gt;&#34;A taxonomy is a controlled vocabulary consisting of preferred terms, all of which are connected in a hierarchy or polyhierarchy&#34;.&lt;/blockquote&gt;
&lt;p&gt;From section 2.5, &amp;ldquo;Maintenance&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A controlled vocabulary can be as simple as a short list of terms or as complex as a thesaurus containing tens of thousands of terms with a complex hierarchical structure and many different types of relationships among the terms.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From section 4.1, &amp;ldquo;Definitions&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;controlled vocabulary&lt;/strong&gt; A list of terms that have been enumerated explicitly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;taxonomy&lt;/strong&gt; A collection of controlled vocabulary terms organized into a hierarchical structure. Each term in a taxonomy is in one or more parent/child (broader/narrower) relationships to other terms in the taxonomy.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;thesaurus&lt;/strong&gt; A controlled vocabulary arranged in a known order and structured so that the various relationships among terms are displayed clearly and identified by standardized relationship indicators. Relationship indicators should be employed reciprocally.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From section 5.4, &amp;ldquo;Structure&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;There are four different types of controlled vocabularies, determined by their increasingly complex structure. These are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;List&lt;/li&gt;
&lt;li&gt;Synonym ring&lt;/li&gt;
&lt;li&gt;Taxonomy&lt;/li&gt;
&lt;li&gt;Thesaurus&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;From section 5.4.1 &amp;ldquo;List&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A list (also sometimes called a pick list) is a limited set of terms arranged as a simple alphabetical list or in some other logically evident way. Lists are used to describe aspects of content objects or entities that have a limited number of possibilities.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From section 5.4.2 &amp;ldquo;Synonym Ring&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;While a synonym ring is considered to be a type of controlled vocabulary, it plays a somewhat different role than the other types covered by this Standard. Synonym rings cannot be used during the indexing process. Rather, they are used only during retrieval. Use of synonym rings ensures that a concept that can be described by multiple synonymous or equivalent terms will be retrieved if any one of the terms is used in a search.&lt;/p&gt;
&lt;/blockquote&gt;
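&lt;p&gt;The retrieval-time behavior that the standard describes here can be sketched in a few lines of Python; the ring and the tiny document index below are invented for illustration:&lt;/p&gt;

```python
# Sketch of retrieval-time synonym-ring expansion as described above:
# any term in a ring retrieves content indexed under any of its synonyms.
# The ring and the tiny "index" are invented for illustration.
RINGS = [{"car", "auto", "automobile"}]

index = {
    "doc1": {"automobile", "safety"},
    "doc2": {"train", "safety"},
}

def expand(term):
    """Return the term plus every synonym in its ring, if it has one."""
    for ring in RINGS:
        if term in ring:
            return ring
    return {term}

def search(term):
    """Return the documents matching the term or any of its synonyms."""
    query = expand(term)
    return sorted(doc for doc, words in index.items() if query & words)

print(search("car"))  # ['doc1']
```

Note that the expansion happens only at query time, which is why the standard says synonym rings are used during retrieval rather than indexing.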
&lt;p&gt;From section 5.4.3, &amp;ldquo;Taxonomy&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A taxonomy is a controlled vocabulary consisting of preferred terms, all of which are connected in a hierarchy or polyhierarchy.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From section 5.4.4, &amp;ldquo;Thesaurus&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A thesaurus is a controlled vocabulary arranged in a known order and structured so that the various relationships among terms are displayed clearly and identified by standardized relationship indicators.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From section 8.3, &amp;ldquo;Hierarchical Relationships&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The use of hierarchical relationships is the primary feature that distinguishes a taxonomy or thesaurus from other, simple forms of controlled vocabularies such as lists and synonym rings.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From section 2.1, &amp;ldquo;Applying the Standard&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This Standard does not cover numerical classification schemes (except as they correlate to topics such as Dewey, for example), ontologies or semantic networks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(It actually does include a section on semantic networks.) The &amp;ldquo;standardized relationship indicators&amp;rdquo; mentioned in section 5.4.4 are typically things like &amp;ldquo;Broader Term&amp;rdquo; to show the relationship between, for example, the terms &amp;ldquo;collies&amp;rdquo; and &amp;ldquo;dogs&amp;rdquo;. This broader/narrower relationship is about the only relationship that a taxonomy tree represents; a thesaurus can show other relationships that one term can have to another—for example, it can be a related term, a preferred term, or a non-preferred term.&lt;/p&gt;
&lt;p&gt;Some research I&amp;rsquo;ve been doing lately, including an &lt;a href=&#34;http://www.hedden-information.com/course-simmons-taxonomies-online.htm&#34;&gt;online course&lt;/a&gt; in &amp;ldquo;Taxonomies and Controlled Vocabularies&amp;rdquo;, gives me the impression that most of what taxonomists do is develop thesauri, not taxonomies. I guess calling themselves &amp;ldquo;thesaurists&amp;rdquo; would sound a bit odd, and the term &amp;ldquo;thesaurus&amp;rdquo; conjures up images of the &lt;a href=&#34;http://www.amazon.com/Rogets-International-Thesaurus-Barbara-Kipfer/dp/0060935448&#34;&gt;Roget book&lt;/a&gt; that our teachers told us about as teenagers if we overused any words in the papers we handed in.&lt;/p&gt;
&lt;p&gt;We saw above that a thesaurus uses &amp;ldquo;standardized relationship indicators&amp;rdquo;. I&amp;rsquo;ve described ontologies to people as being like taxonomies, except that you (or more likely, people in your field) get to make up new, specialized relationships beyond those standardized for thesauri. For example, in legal publishing, a higher court ruling could have the relationship property &amp;ldquo;cite&amp;rdquo; to a lower court ruling, with potential values such as &amp;ldquo;overturns&amp;rdquo; or &amp;ldquo;affirms&amp;rdquo;. According to the &lt;a href=&#34;http://www.w3.org/TR/2004/REC-webont-req-20040210/&#34;&gt;OWL Use Cases and Requirements&lt;/a&gt;, which I &lt;a href=&#34;https://www.bobdc.com/blog/some-great-w3c-explanations-of&#34;&gt;wrote about&lt;/a&gt; last August, &amp;ldquo;The word ontology has been used to describe artifacts with different degrees of structure. These range from simple taxonomies (such as the Yahoo hierarchy), to metadata schemes (such as the Dublin Core), to logical theories&amp;rdquo;. This describes a taxonomy as a simpler version of an ontology, so it makes sense to me to add &amp;ldquo;ontology&amp;rdquo; as a fifth level to the four types of controlled vocabulary shown above.&lt;/p&gt;
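&lt;p&gt;A toy Python version of that distinction, with every vocabulary name invented for illustration: a thesaurus restricts you to standardized relationship indicators, while an ontology lets a field add its own, such as the legal citation relationships just mentioned:&lt;/p&gt;

```python
# Sketch of the distinction above: a thesaurus allows only standardized
# relationship indicators, while an ontology can add domain-specific ones,
# e.g. a higher court ruling that "overturns" or "affirms" a lower one.
# All vocabulary names here are invented for illustration.
THESAURUS_RELS = {"broader", "narrower", "related"}
LEGAL_ONTOLOGY_RELS = THESAURUS_RELS | {"overturns", "affirms"}

def relate(subject, rel, obj, allowed):
    """Record a relationship triple, rejecting indicators not in `allowed`."""
    if rel not in allowed:
        raise ValueError(f"{rel!r} is not a recognized relationship indicator")
    return (subject, rel, obj)

# Allowed under the legal ontology, but not under the plain thesaurus:
triple = relate("HigherCourtRuling", "overturns", "LowerCourtRuling",
                LEGAL_ONTOLOGY_RELS)
print(triple)  # ('HigherCourtRuling', 'overturns', 'LowerCourtRuling')
```

The same call with THESAURUS_RELS as the allowed set would raise a ValueError, which is the whole point: the ontology's richer relationship vocabulary is a superset of the thesaurus's.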
&lt;p&gt;If ontologies are a potentially more complex class of taxonomy, then knowledge of taxonomy development can help ontology development, and vice versa. And, I&amp;rsquo;ve got some ideas about the use of ontology development tools to develop taxonomies and thesauri that I&amp;rsquo;ll be writing about here shortly.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By Erik Hennum on &lt;a href=&#34;#comment-1943&#34;&gt;July 11, 2008 8:30 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Perhaps along the lines of the ontology spectrum from&lt;br /&gt;
&lt;a href=&#34;http://www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-mit-press-(with-citation).htm&#34;&gt;Ontologies come of age&lt;/a&gt;?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.webcomposite.com&#34; title=&#34;http://www.webcomposite.com&#34;&gt;Jim Fuller&lt;/a&gt; on &lt;a href=&#34;#comment-1945&#34;&gt;July 12, 2008 9:54 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;nice bit of synchronicity &amp;hellip; I was just yesterday looking for a reasonable definition of taxonomy and look what dropped into my reader &amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.thodla.com&#34; title=&#34;http://www.thodla.com&#34;&gt;Dorai Thodla&lt;/a&gt; on &lt;a href=&#34;#comment-1947&#34;&gt;July 14, 2008 10:45 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nice description. I will go and read both the standards documents. A nice discussion would be about various classification and tagging (folksonomy) and faceted classification and discuss some applications.&lt;/p&gt;
&lt;p&gt;There are bits of Semantic Web Technologies (and standards efforts) and related area that may incrementally improve how we gather, classify/view and consume information as well.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>XForms &#43; REST &#43; XQuery (&#43; Jenni Tennison)</title>
      <link>https://www.bobdc.com/blog/xforms-rest-xquery-jenni-tenni/</link>
      <pubDate>Mon, 07 Jul 2008 07:17:02 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/xforms-rest-xquery-jenni-tenni/</guid>
      
      
      <description><div>New, standards-based ways to build cool applications.</div><div>&lt;p&gt;As a new application development architecture stack complete with its own cryptic acronym, &lt;a href=&#34;http://datadictionary.blogspot.com/2007/12/introducing-xrx-architecture.html&#34;&gt;XRX&lt;/a&gt; (XForms/REST/XQuery) is a good example of &amp;ldquo;sounds promising, but I don&amp;rsquo;t know when I&amp;rsquo;ll have a chance to dig deeper&amp;rdquo;. So, I was very happy to hear that Jeni Tennison is digging deeper and &lt;a href=&#34;http://news.oreilly.com/2008/07/xrx-xqueries-in-exist.html&#34;&gt;reporting on her findings&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;She&amp;rsquo;s using the &lt;a href=&#34;http://exist.sourceforge.net/&#34;&gt;eXist&lt;/a&gt; XQuery engine (which I once &lt;a href=&#34;http://www.xml.com/pub/a/2006/06/21/scaling-up-with-xquery-part-2.html?page=1&#34;&gt;wrote about&lt;/a&gt; in XML.com) and the &lt;a href=&#34;http://www.orbeon.com/&#34;&gt;Orbeon&lt;/a&gt; XForms engine, which apparently bundles eXist. eXist may never catch up with MarkLogic in features and performance, but hey, it&amp;rsquo;s open source, and seems to be progressing nicely from release to release. eXist and MarkLogic provide a great example of the value of standards in general and XQuery in particular, because the combination lets you develop a standards-compliant proof-of-concept application with completely free software and then scale up with a commercial platform once you&amp;rsquo;ve proved your concept.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve &lt;a href=&#34;http://www.xml.com/pub/a/2003/12/30/xforms.html&#34;&gt;dabbled&lt;/a&gt; with XForms implementations on and off over the years, and I&amp;rsquo;ve been a little disappointed, but only a little, because I saw progress. Orbeon looks like even more progress, so I look forward to hearing more from Jeni about her experiments. Considering how active new MarkLogic employee Micah Dubinko has always been in XForms work, perhaps we&amp;rsquo;ll see some interesting XRX work from him as well in the future. (And, if you&amp;rsquo;re interested in XQuery, don&amp;rsquo;t miss his co-worker Norm Walsh&amp;rsquo;s &lt;a href=&#34;http://norman.walsh.name/2008/07/02/xquery&#34;&gt;reports&lt;/a&gt; on getting to know XQuery.)&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://piershollott.blogspot.com&#34; title=&#34;http://piershollott.blogspot.com&#34;&gt;piers&lt;/a&gt; on &lt;a href=&#34;#comment-1935&#34;&gt;July 7, 2008 1:08 PM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;the combination lets you develop a standards-compliant proof-of-concept application with completely free software and then scale up with a commercial platform once you&amp;rsquo;ve proved your concept.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;p&gt;Yes, that&amp;rsquo;s exactly right. Exist and MarkLogic Server complement each other in a way that is definitely more than the sum of the parts, and both are very good reasons to get behind XQuery. The great thing about eXist is that you can use it for a small project, a great way to play around with the technology without committing a lot of time or expense. Not sure how Orbeon fits into that, so I was also happy to read Jeni&amp;rsquo;s article(s).&lt;/p&gt;
&lt;p&gt;pretty sure it&amp;rsquo;s Jeni with one &amp;rsquo;n&amp;rsquo; in the title, but I love the &amp;ldquo;jenni_tenni&amp;rdquo; in the url, which sounds like a swedish 70&amp;rsquo;s pop band.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>The (SGML) geekiest shirt ever</title>
      <link>https://www.bobdc.com/blog/the-sgml-geekiest-shirt-ever/</link>
      <pubDate>Tue, 01 Jul 2008 08:21:10 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-sgml-geekiest-shirt-ever/</guid>
      
      
      <description><div>&#34;We&#39;re all special characters&#34;.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.thinkgeek.com/tshirts/generic/a878/?ref=c&#34;&gt;&lt;img id=&#34;id202507&#34; src=&#34;http://www.thinkgeek.com/images/products/front/i_heart_iso_8879.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;some description&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A few years ago at the Oxford XML Summer School, at an outdoor dinner at the &lt;a href=&#34;http://www.botanic-garden.ox.ac.uk/Garden/oxfordbotanicgar.html&#34;&gt;Oxford Botanic Garden&lt;/a&gt;, I saw that &lt;a href=&#34;http://www.xmlgrrl.com/blog/&#34;&gt;Eve Maler&amp;rsquo;s&lt;/a&gt; T-shirt said &amp;ldquo;&amp;lt;geek&amp;gt;&amp;rdquo; on it. I couldn&amp;rsquo;t resist pointing out to her that, with its lone start-tag, her shirt was not well-formed. She took off her jacket to show me the back, which said &amp;ldquo;&amp;lt;/geek&amp;gt;&amp;rdquo;. I stood corrected.&lt;/p&gt;
&lt;p&gt;The ThinkGeek web site has just come out with a T-shirt that&amp;rsquo;s even worse: it&amp;rsquo;s so markup geeky that most XML geeks won&amp;rsquo;t get it. It says &amp;ldquo;I &amp;amp;#9829; ISO 8879&amp;rdquo;, referring to SGML, the ISO standard of which XML is a simplified version. (I first met Eve, and many other well-known markup geeks, at an SGML conference before XML was invented.) The &amp;ldquo;&amp;amp;#9829;&amp;rdquo; part is the numeric character reference for a heart symbol. Get it?&lt;/p&gt;
&lt;p&gt;To make it even more obscure, the &lt;a href=&#34;http://www.thinkgeek.com/tshirts/generic/a878/?ref=c&#34;&gt;ThinkGeek webpage for the shirt&lt;/a&gt; &lt;em&gt;doesn&amp;rsquo;t even mention SGML&lt;/em&gt;. Some may see a clue in its reference to this ISO standard &amp;ldquo;setting the groundwork for XML&amp;rdquo;, but I think that very few people are going to buy this T-shirt.&lt;/p&gt;
&lt;p&gt;And they&amp;rsquo;ll all be at &lt;a href=&#34;http://www.balisage.net/index.htm&#34;&gt;Balisage&lt;/a&gt; in August.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://plasmasturm.org/&#34; title=&#34;http://plasmasturm.org/&#34;&gt;Aristotle Pagaltzis&lt;/a&gt; on &lt;a href=&#34;#comment-1927&#34;&gt;July 1, 2008 12:16 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Of all the ultra-geek shirts I have seen, this one is my perpetual favourite:&lt;br /&gt;
&lt;a href=&#34;http://www.flickr.com/photos/jedwards/89064330/&#34;&gt;http://www.flickr.com/photos/jedwards/89064330/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.adjb.net/&#34; title=&#34;http://www.adjb.net/&#34;&gt;Alex Brown&lt;/a&gt; on &lt;a href=&#34;#comment-1928&#34;&gt;July 1, 2008 2:29 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;But numeric character referencing wasn&amp;rsquo;t part of 8879 (was it? - I forget).&lt;/p&gt;
&lt;p&gt;Though I suppose an XML person could love SGML. Especially now that it&amp;rsquo;s not encountered much ;-)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Alex.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By &lt;a href=&#34;http://dl.ziza.ru/other/052008/12/pics/015_pics.jpg&#34; title=&#34;http://dl.ziza.ru/other/052008/12/pics/015_pics.jpg&#34;&gt;Forget It&lt;/a&gt; on &lt;a href=&#34;#comment-1929&#34;&gt;July 1, 2008 4:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Check this birthday cake:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://dl.ziza.ru/other/052008/12/pics/015_pics.jpg&#34;&gt;http://dl.ziza.ru/other/052008/12/pics/015_pics.jpg&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.bortzmeyer.org/&#34; title=&#34;http://www.bortzmeyer.org/&#34;&gt;Stéphane Bortzmeyer&lt;/a&gt; on &lt;a href=&#34;#comment-1931&#34;&gt;July 4, 2008 10:56 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Why does the cake use the attribute &amp;ldquo;code&amp;rdquo; instead of the more standard &amp;ldquo;xml:lang&amp;rdquo;?&lt;/p&gt;
&lt;p&gt;(Answer: because the language identifiers used as values of this attribute do not have the proper syntax?)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>A successful Linked Data Planet conference</title>
      <link>https://www.bobdc.com/blog/a-successful-linked-data-plane/</link>
      <pubDate>Tue, 24 Jun 2008 08:24:57 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-successful-linked-data-plane/</guid>
      
      
<description><div>Plenty of people, plenty of great talks and interaction.</div><div>&lt;p&gt;The first ever Linked Data conference, sponsored by Jupiter Media at New York City&amp;rsquo;s Roosevelt hotel last Tuesday and Wednesday, was great. I didn&amp;rsquo;t give any talks, but as co-chair I put together the program with Ken North, and with his throat bothering him a bit, I did most of the speaker introductions. I also moderated the &lt;a href=&#34;http://www.linkeddataplanet.com/conference/sessionsbyday.php#T8&#34;&gt;Linked Data Workshop&lt;/a&gt; panel, a session that provided the audience with a good range of perspectives on some difficult Linked Data application development issues such as data access control and the use of distributed versus aggregated data.&lt;/p&gt;
&lt;blockquote id=&#34;id202510&#34; class=&#34;pullquote&#34;&gt;The Linked Data movement is just the latest step in this long process that began almost fifty years ago of separating data further and further from the programs that create and maintain it so that other programs can use it, opening up new possibilities along the way.&lt;/blockquote&gt;
&lt;p&gt;In addition to seeing some old friends, I got to finally meet many people I had only known by email or reputation, such as Kingsley Idehen (who first had the idea for the conference), Seth Earley, Christine Connors, Taylor Cowan, Andy Seaborne, Barak Pridor, Ashok Malhotra, Dean Allemang, Jim Hendler, Paul Miller (the voice of the &lt;a href=&#34;http://talk.talis.com/&#34;&gt;Talking with Talis&lt;/a&gt; podcasts, whom I&amp;rsquo;ve heard interview most of the people named in this sentence), and especially Tim Berners-Lee, whose keynote Tuesday evening was quite an inspirational pep talk for the Linked Data movement. Unlike some other big name keynoters I&amp;rsquo;ve seen, he didn&amp;rsquo;t just fly in, give his talk, and fly out; he attended and closely followed a great many talks. (In fact, when a group of conference organizers and keynote speakers met in the hotel bar before heading off to a prearranged dinner, we couldn&amp;rsquo;t find the main keynote speaker, and it turned out that he was attending an evening panel on the business possibilities of the semantic web. Luckily, he managed to catch up with us.)&lt;/p&gt;
&lt;p&gt;An interesting point in his &lt;a href=&#34;http://www.internetnews.com/dev-news/article.php/3753646/Sir+Tim+Talks+Up+Linked+Open+Data+Movement.htm&#34;&gt;keynote&lt;/a&gt; was how the original web made computers less important, and documents more important—for example, if I want to see a given spec or read a particular story, I don&amp;rsquo;t care what computer it&amp;rsquo;s on—and how &lt;a href=&#34;http://www.w3.org/DesignIssues/LinkedData&#34;&gt;Linked Data principles&lt;/a&gt; will help make documents less important and data more important. I didn&amp;rsquo;t understand this at first, but then realized that if I want to know a convenient start time for a movie or some good flights to get me from one city to another, I don&amp;rsquo;t care what document these facts are stored on; I just want the data. In fact, I want to easily find or write a program that grabs that data without relying on a proprietary data format or scraped HTML. The Linked Data movement is getting us there.&lt;/p&gt;
&lt;p&gt;Uche Ogbuji&amp;rsquo;s talk on &lt;a href=&#34;http://www.linkeddataplanet.com/conference/sessionsbyday.php#T4&#34;&gt;Linked Data: The Real Web 2.0&lt;/a&gt; described something interesting that he didn&amp;rsquo;t mention in his &lt;a href=&#34;https://www.bobdc.com/blog/an-interview-with-uche-ogbuji&#34;&gt;interview here&lt;/a&gt; shortly before the conference: his company &lt;a href=&#34;http://www.zepheira.com/&#34;&gt;Zepheira&lt;/a&gt;&amp;rsquo;s concept of &amp;ldquo;Linked Enterprise Data&amp;rdquo;, or LED. With so much Linked Open Data talk out there about the value of freely sharing data across the web, this uses the same principles for different means: easier sharing of data across the silos behind a firewall through the use of Linked Data principles. As one of Uche&amp;rsquo;s slides put it, &amp;ldquo;Rather than the ERP-type play to replace legacy apps with a centralized super-model, LED focuses on wrapping and exposing data in those apps&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;I like this for two reasons. First, it&amp;rsquo;s a nice application of the database tuning world&amp;rsquo;s appropriation of the classic political advice &amp;ldquo;think globally, act locally&amp;rdquo;. Second, it works toward using Linked Data principles to solve current business needs instead of just treating it as some variation on the semantic web that could lead to cool applications.&lt;/p&gt;
&lt;p&gt;Ken&amp;rsquo;s opening welcome address helped put Linked Data into the perspective of the long-term history of computing (Ken &lt;a href=&#34;http://ourworld.compuserve.com/homepages/Ken_North/db_hall.htm&#34;&gt;knows a lot&lt;/a&gt; about this stuff), and this gave me a new insight: the original &amp;ldquo;data base&amp;rdquo; managers (or, even, in those days, &amp;ldquo;data bank&amp;rdquo; programs) were a step forward in the history of computing because they separated the data from the application that created and used that data so that other programs could use it. The Linked Data movement is just the latest step in this process that began almost fifty years ago of separating data further and further from the programs that create and maintain it so that other programs can use it, opening up new possibilities along the way.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1921&#34;&gt;June 24, 2008 10:30 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sounds like it went extremely well; I&amp;rsquo;m sorry I couldn&amp;rsquo;t participate.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>An interview with Seth Earley about Linked Data</title>
      <link>https://www.bobdc.com/blog/an-interview-with-seth-earley/</link>
      <pubDate>Fri, 13 Jun 2008 15:45:57 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/an-interview-with-seth-earley/</guid>
      
      
      <description><div>The role that taxonomies can play in Linked Data applications.</div><div>&lt;p&gt;&lt;em&gt;&lt;a href=&#34;http://www.earley.com/&#34;&gt;Earley &amp;amp; Associates&lt;/a&gt; is one of the biggest names in taxonomy development, and founder Seth Earley will be giving a talk on &lt;a href=&#34;http://www.linkeddataplanet.com/conference/sessionsbyday.php#T7&#34;&gt;Building a Practical Semantic Framework: The role of taxonomies and controlled vocabularies in data integration&lt;/a&gt; at the Linked Data Planet conference next week. My recent reading makes the world of taxonomy development look a lot more mature than the ontology development that plays such a significant role in the semantic web, especially in terms of identifying concepts and relationships in a way that helps businesses achieve specific goals. I interviewed Seth via email to learn more about his company and their relationship to the burgeoning world of Linked Data techniques and practices. (As a side note about taxonomies and Linked Data, I recently learned from &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen/?id=1384&#34;&gt;Kingsley Idehen&amp;rsquo;s blog&lt;/a&gt; about a very interesting Linked Data application of one of the most important taxonomies in the US: the Library of Congress Subject Headings. If you follow the links in his bulleted list, remember to do a View Source on them.)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;1. Tell me a little about your company.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Earley &amp;amp; Associates delivers consulting and applications development services that help companies leverage internal expertise and knowledge creating capabilities. We specialize in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enterprise taxonomy development&lt;/li&gt;
&lt;li&gt;Content management &amp;amp; Knowledge management&lt;/li&gt;
&lt;li&gt;Technology advisory&lt;/li&gt;
&lt;li&gt;Search strategy &amp;amp; integration&lt;/li&gt;
&lt;li&gt;Change management &amp;amp; governance&lt;/li&gt;
&lt;li&gt;Training &amp;amp; workshops&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We are a small company of around 15 full time consultants but we work with all sizes and types of organizations. Some of our recent clients include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Motorola&lt;/li&gt;
&lt;li&gt;The Hartford&lt;/li&gt;
&lt;li&gt;The Ford Foundation&lt;/li&gt;
&lt;li&gt;Hasbro Inc.&lt;/li&gt;
&lt;li&gt;The Coca Cola Company&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We are recognized within the industry as thought leaders and many of our consultants speak regularly at conferences and workshops including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enterprise Search Summit&lt;/li&gt;
&lt;li&gt;Enterprise3 Portals, Collaboration &amp;amp; Content&lt;/li&gt;
&lt;li&gt;Taxonomy Bootcamp&lt;/li&gt;
&lt;li&gt;KM World &amp;amp; Intranets&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We also maintain a regular CoP call series covering a diverse range of topics from search, taxonomy &amp;amp; metadata to usability testing and web analytics.&lt;/p&gt;
&lt;blockquote id=&#34;id202629&#34; class=&#34;pullquote&#34;&gt;&#34;The most important aspect of the question is deciding what the real application of either taxonomy or ontology will be, and making sure you have the metrics in place to be able to justify the effort it takes to develop either one.&#34;&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;2. What does the idea of Linked Data mean to you?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I think Linked Data is really an extension of concepts and questions that we have been dealing with in the information management field for years. Which is to say, how can we make meaningful connections between the information that we use to do our work? How can we understand it within a context?&lt;/p&gt;
&lt;p&gt;In the case of Linked Data, we are attempting to expand this notion of connections or linking from strictly web pages and documents to structured data and other types of resources that can be represented through RDF, and making those connections explicit.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;3. What can Linked Data practices and technologies bring to the challenges that Earley &amp;amp; Associates clients are facing?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;For the most part our clients have come to recognize the incredible challenge of creating a shared semantic framework within their organization. In this case, we understand the term semantic, not in reference to the semantic web, but in relation to a controlled vocabulary that has a particular meaning to an organization and the content it manages.&lt;/p&gt;
&lt;p&gt;In our experience, most organizations are not at the level of IM maturity at which linked data practices are really relevant to their current needs.&lt;/p&gt;
&lt;p&gt;That being said, there is incredible potential for linked data technology to create a richer information environment both on the semantic web and in the organization. The explicit nature of the links made using RDF certainly presents a new level of granularity in defining the relationships of one item of content to another.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;4. How would you distinguish &amp;ldquo;Linked Data&amp;rdquo; projects from &amp;ldquo;Semantic Web&amp;rdquo; projects? Or would you?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I suppose it’s possible to invest in linked data projects that are enterprise focused, in that the information lives outside the semantic web behind a firewall. However, the main driver around the creation of linked data is to build the semantic web and create links between disparate data sources. I think the business case is really still in its early stages.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;5. Semantic Web discussions often bring up the role of ontologies. Is it possible to differentiate between the potential roles of taxonomies and ontologies in Linked Data and/or Semantic Web efforts?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The line between what is possible to represent with taxonomy and what is possible to represent in an ontology is a fuzzy area. Taxonomies, in a traditional sense, are solely hierarchical in nature, representing a general to specific relationship, whereas an ontology is capable of representing a much larger range of relationships.&lt;/p&gt;
&lt;p&gt;However, in our work with clients developing taxonomies, the inclusion of polyhierarchical relationships, as well as reciprocal &amp;ldquo;see also&amp;rdquo; relationships, has become commonplace. Now these types of relationships certainly fall outside of the most traditional taxonomy definitions but also fall short of the complexity that can be modelled with RDF and OWL.&lt;/p&gt;
&lt;p&gt;I think the most important aspect of the question is deciding what the real application of either taxonomy or ontology will be, and making sure you have the metrics in place to be able to justify the effort it takes to develop either one.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;6. Some enterprises have already invested in taxonomies. How can they leverage this in Linked Data projects?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This really comes down to the nature of the taxonomy itself. Proponents of the semantic web recommend the use of standard vocabularies (e.g. FOAF, SIOC, DOAP, etc.) for representing content.&lt;/p&gt;
&lt;p&gt;If the taxonomy that an organization has already invested in represents a very specific and organization-centric domain of information, there may be a lot of work required to align it with standardized vocabularies recommended for the semantic web.&lt;/p&gt;
&lt;p&gt;Again, I think it comes down to planning and alignment of effort with an overall information strategy. Anytime you decide to describe a piece of information so that it can be shared, you enter a highly charged and political world. Building a taxonomy is as much about understanding people as it is content. If that understanding can be shared through a linked data project, then great. However I would suggest that a key priority of most organizations is still understanding what the value and meaning of their own content is to them.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://clarkparsia.com/&#34; title=&#34;http://clarkparsia.com/&#34;&gt;Kendall Clark&lt;/a&gt; on &lt;a href=&#34;#comment-1915&#34;&gt;June 18, 2008 2:57 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As someone who does with ontologies roughly what Seth&amp;rsquo;s company does with taxonomies, I commend him for a very clear, fair, and honest assessment (by my lights) of the differences between taxonomies and ontologies. I couldn&amp;rsquo;t agree more that neither is better than the other per se, and which you need depends on use cases, resources, and other engineering tradeoffs.&lt;/p&gt;
&lt;p&gt;That said, I don&amp;rsquo;t agree at all with Bob&amp;rsquo;s claim about relative maturity of development tools or available taxonomies versus ontologies; but, then, I wouldn&amp;rsquo;t tend to. ;&amp;gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1916&#34;&gt;June 18, 2008 9:57 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Kendall,&lt;/p&gt;
&lt;p&gt;I never said anything about tools&amp;ndash;in fact, the more I study the taxonomy tools out there, the more I think that the free combination of SWOOP and SKOS has a lot more value than several taxonomy development products that cost hundreds of dollars. Someone just needs to write out a bit of code to spit out some of the standard reports that those tools typically offer. (Hello, lazy semweb&amp;hellip;)&lt;/p&gt;
&lt;p&gt;There probably are more available ontologies than taxonomies out there, but that&amp;rsquo;s part of the problem. While companies like Clark Parsia are basing client ontologies on serious analysis of the client&amp;rsquo;s business goals and needs, the &amp;ldquo;if you build it maybe they&amp;rsquo;ll come&amp;rdquo; thrown-together ontologies have really multiplied like rabbits out there over the last few years. I used the term &amp;ldquo;maturity&amp;rdquo; because carefully codified taxonomies designed to aid the management of information have been around a lot longer than their ontological equivalents.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://clarkparsia.com/&#34; title=&#34;http://clarkparsia.com/&#34;&gt;Kendall Clark&lt;/a&gt; on &lt;a href=&#34;#comment-1923&#34;&gt;June 24, 2008 3:58 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, okay, I take yr point to make more sense now that I understand what you were saying. But on the marketing level, it&amp;rsquo;s a bit of suckage that *most* of the &amp;ldquo;ontologies&amp;rdquo; you are talking about are RDF Schemas, and really just, technically, taxonomies, rather than full-on ontologies. And yet, marketing-wise, you&amp;rsquo;re calling them &amp;ldquo;ontologies&amp;rdquo; which implies suckage of the wrong technology! :&amp;gt;&lt;/p&gt;
&lt;p&gt;More seriously, there&amp;rsquo;s a lot of crap out there, OWL, RDF, XML, SQL DDL, etc. I don&amp;rsquo;t think any of those technologies is any more prone to crap than any other, not in the aggregate. OWL is probably the hardest, but then it *seems* hard, which tends to warn off people who don&amp;rsquo;t really know what they&amp;rsquo;re doing.&lt;/p&gt;
&lt;p&gt;That make any sense?&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Navigating Hollywood gossip with semantic technology</title>
      <link>https://www.bobdc.com/blog/navigating-hollywood-gossip-wi/</link>
      <pubDate>Wed, 11 Jun 2008 09:19:20 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/navigating-hollywood-gossip-wi/</guid>
      
      
      <description><div>And news in the worlds of investment, U.S. politics, and more.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.snee.com/blogbigpicture&#34;&gt;&lt;img id=&#34;id202480&#34; src=&#34;https://www.bobdc.com/img/main/bbb2.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;BlogBigPicture screenshot&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I &lt;a href=&#34;https://www.bobdc.com/blog/having-fun-with-reuters-calais&#34;&gt;recently mentioned&lt;/a&gt; that while I had used &lt;a href=&#34;http://www.opencalais.com/&#34;&gt;Reuters Calais&lt;/a&gt; to look for entities in Giorgio Vasari&amp;rsquo;s &amp;ldquo;Lives of the Painters&amp;rdquo;, I had something more interesting in the works, and here it is: &lt;a href=&#34;http://www.snee.com/blogbigpicture/&#34;&gt;BlogBigPicture&lt;/a&gt;. It lets you navigate a set of related blog entries based on the names, places, companies, movies, and other entities mentioned in those entries.&lt;/p&gt;
&lt;p&gt;The default tab shows Hollywood gossip, but others have blogs and news about investing, English Premier League football, world business news, and U.S. politics. To get started on the Hollywood tab, click &amp;ldquo;Person&amp;rdquo; in the gray box on the right and then mouse over the names that appear. You&amp;rsquo;ll see the titles of the entries mentioning that person highlighted in the main panel, where you can click those titles to read the entries. (Oh, that Amy Winehouse&amp;hellip;) When I was doing the main work on this, Ashlee Simpson had just married what&amp;rsquo;s-his-name, and the many entry titles that appeared when mousing over her name showed what a hot story they were that week in the world of Hollywood gossip. Using the same technique to evaluate hot news in the business world isn&amp;rsquo;t quite as much fun, but ultimately much more valuable.&lt;/p&gt;
&lt;p&gt;The news categories that I chose are just samples. I picked investment blogs and world business news because Calais is &lt;a href=&#34;http://www.opencalais.com/calaisAPI#extractedsemanticdata&#34;&gt;tuned for&lt;/a&gt; that subject matter, Hollywood gossip because it was fun, U.S. politics because there&amp;rsquo;s a lot going on now, and Premier League Football because I wanted a sports category with international appeal that wasn&amp;rsquo;t U.S.-centric.&lt;/p&gt;
&lt;p&gt;BlogBigPicture is still pretty rough, and I have many ideas to improve it, but I decided that now that it works well enough for people to play with it, it was time to let them do so. Enjoy it, and let me know what you think!&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://thefigtrees.net/lee/blog/&#34; title=&#34;http://thefigtrees.net/lee/blog/&#34;&gt;Lee&lt;/a&gt; on &lt;a href=&#34;#comment-1903&#34;&gt;June 11, 2008 10:43 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;This is fantastic. I&amp;rsquo;m curious to know more about the architecture/implementation underlying it, if you have the chance. Are you regularly updating the feeds and running them through Calais and generating static Web content? Is there any dynamic discovery going on?&lt;/p&gt;
&lt;p&gt;Lee&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1904&#34;&gt;June 11, 2008 11:36 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sure, if by &amp;ldquo;regularly&amp;rdquo; you mean once or twice a day! It would be nice to have it happen more often (and of course, to let users choose their own RSS feeds and groups), but that&amp;rsquo;s for the future.&lt;/p&gt;
&lt;p&gt;The basic architecture is that the feeds get pulled down with feedparser, and after storing basic metadata about each feed and each entry in an RDFlib triplestore, each entry gets sent to Calais. The returned RDF gets stored in the triplestore, and then the interface is built from that.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://sourceforge.net/projects/lpkb/&#34; title=&#34;http://sourceforge.net/projects/lpkb/&#34;&gt;Colm Kennedy&lt;/a&gt; on &lt;a href=&#34;#comment-1905&#34;&gt;June 11, 2008 1:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;greeting(s) sentient entities operating in the semantic web conceptual space ;-)&lt;/p&gt;
&lt;p&gt;as part of the ongoing construction of ./lpkb, we have arrived at a similar requirement albeit thru&amp;rsquo; very different assumptions and development paths.&lt;/p&gt;
&lt;p&gt;given Natural Language Text ./lpkb can parse it i.e. generate a readable facsimile (not everything&amp;hellip;every word&amp;hellip; but &amp;rsquo;enuf to make a readable copy)&lt;/p&gt;
&lt;p&gt;i have no special training in natural language extraction or similar high end approaches but my requirements are as follows:&lt;/p&gt;
&lt;p&gt;in: text&lt;br /&gt;
do: classify the text, assign each word a category&lt;br /&gt;
out: a structured archive of the parsing&lt;/p&gt;
&lt;p&gt;(I would like this final stage to be in RDF-A)&lt;br /&gt;
(at one stage the parser did a round trip through jena for x-links)&lt;br /&gt;
(sparql queries below)&lt;br /&gt;
now lounging about all day &amp;ldquo;reading&amp;rdquo; is not what ./lpkb is for, but I like the simplicity of just assigning one of four types to each word and being able to follow along. those four categories are [a] [e] (actor/event) which are usually verbs and nouns. [o] any other symbol not a noun/verb and [x] for x-link (these are in an &amp;ldquo;un-debugged&amp;rdquo; semantic network which was mined from the open-mind corpus).&lt;/p&gt;
&lt;p&gt;the reason I mention all of this is that it is:&lt;br /&gt;
&amp;ndash; well defined task&lt;br /&gt;
&amp;ndash; exists (un-web-ified) code&lt;br /&gt;
&amp;ndash; concerns natural language&lt;/p&gt;
&lt;p&gt;if i am off base here or off topic *please* say so.&lt;/p&gt;
&lt;p&gt;for your reference:&lt;br /&gt;
&lt;a href=&#34;http://csksoft.com/RDF/tump.rdf&#34;&gt;http://csksoft.com/RDF/tump.rdf&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;http://sparql.org/sparql?query=PREFIX+lpkb%3A+++%3Chttp%3A%2F%2Fhomepage.eircom.net%2F~cornagill%2Flpkb%23%3E%0D%0A%0D%0ASELECT+%3Fbigword+%0D%0AFROM++++++%3Chttp%3A%2F%2Fcsksoft.com%2FRDF%2Ftump.rdf%3E%0D%0AWHERE+%0D%0A++%7B+%3Fbigword+lpkb%3AconceptuallyRelated+lpkb%3Aread+%7D&amp;amp;default-graph-uri=&amp;amp;stylesheet=%2Fxml-to-html.xsl&#34;&gt;http://sparql.org/sparql?query=PREFIX+lpkb%3A+++%3Chttp%3A%2F%2Fhomepage.eircom.net%2F~cornagill%2Flpkb%23%3E%0D%0A%0D%0ASELECT+%3Fbigword+%0D%0AFROM++++++%3Chttp%3A%2F%2Fcsksoft.com%2FRDF%2Ftump.rdf%3E%0D%0AWHERE+%0D%0A++%7B+%3Fbigword+lpkb%3AconceptuallyRelated+lpkb%3Aread+%7D&amp;amp;default-graph-uri=&amp;amp;stylesheet=%2Fxml-to-html.xsl&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.patrickgmj.net/blog&#34; title=&#34;http://www.patrickgmj.net/blog&#34;&gt;Patrick Murray-John&lt;/a&gt; on &lt;a href=&#34;#comment-1906&#34;&gt;June 11, 2008 9:31 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Positively inspiring! You&amp;rsquo;ve demonstrated how easy it can be to slice and dice a lot of info, and more importantly, guide to more info!&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m working on slicing/dicing/guiding around a WordPress MultiUser installation&amp;hellip;this is a fantastic model for me to follow. Thanks much!&lt;/p&gt;
&lt;p&gt;Patrick&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>An interview with Uche Ogbuji about Linked Data</title>
      <link>https://www.bobdc.com/blog/an-interview-with-uche-ogbuji/</link>
      <pubDate>Mon, 09 Jun 2008 13:05:44 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/an-interview-with-uche-ogbuji/</guid>
      
      
<description><div>Helping his clients with Linked Data technology and principles.</div><div>&lt;p&gt;&lt;em&gt;Anyone who follows the XML or semantic web world knows of &lt;a href=&#34;http://zepheira.com/team/uche/&#34;&gt;Uche Ogbuji&amp;rsquo;s&lt;/a&gt; work. His presentation &lt;a href=&#34;http://www.linkeddataplanet.com/conference/sessionsbyday.php#T4&#34;&gt;Linked Data: The Real Web 2.0&lt;/a&gt; will be one of the first talks on the first day of the Linked Data Planet conference next week; as we prepare for it, I asked him a few questions about his work with Linked Data and the benefits it&amp;rsquo;s brought to clients of his company, &lt;a href=&#34;http://www.zepheira.com/&#34;&gt;Zepheira&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Tell us a little about Zepheira.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Zepheira provides solutions for data integration, focusing on Semantic technology. But let me back away a bit from the straight corporate line. Think of how we traditionally deal with data in informatics. We look to fit data into neat partitions, shepherd it along neat lines and fit it into a grand unified theory. All good, hard science. The problem is that data isn&amp;rsquo;t so easily quantized. It&amp;rsquo;s a living, temperamental entity that absorbs bits of personality from everyone who touches it. Dealing effectively with data requires art, and at Zepheira we really look to the art of data rather than to the science of code. We think adopting the right conventions for data that accommodate its unpredictable qualities is the key to so many of the problems that have dogged IT, and we believe that the web is the most successful set of conventions in this regard. In general we look to apply web architecture to enterprise problems. This brings us right in line with the Linked Data concept, which is really just a way to distill the essential keys to web architecture in a way any developer could tick off his fingers. At Zepheira we start with such principles as the body of art, and we bring together folks who&amp;rsquo;ve proven themselves as journeymen and masters in this art, and we think this positions us to offer particularly effective solutions to our customers.&lt;/p&gt;
&lt;blockquote id=&#34;id202541&#34; class=&#34;pullquote&#34;&gt;Web developers can ease into Linked Data ideas, whereas the original message for the semantic web was focused on a major shift to new technologies that seemed too alien and complex to the average Web developer.&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;What does the idea of Linked Data mean to you?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;To me Linked Data means building on the basic framework of the Web, originally designed for documents. Using a set of &lt;a href=&#34;http://www.w3.org/DesignIssues/LinkedData.html&#34;&gt;four basic principles&lt;/a&gt; articulated by Tim Berners-Lee, we extend it to provide a similarly rich information space for granular data. We do so by using semantically rich hooks or translations of the essential data in Web pages, and by creating new Web information sources primarily in semantically rich formats. RDF is the format of choice for Linked Data, but more importantly it is the data model for merging information (for query, &amp;ldquo;mash-up&amp;rdquo; and much more)—the physical format can be anything from which RDF-like semantics are readily extracted.&lt;/p&gt;
&lt;p&gt;What makes Linked Data exciting is that it is a vehicle for the future (semantic web) without straying too far from what has worked so well in past and present. Whether they come in through the door of Microformats, Web feeds or JSON APIs, Web developers can ease into Linked Data ideas, whereas the original message for the semantic web was focused on a major shift to new technologies that seemed too alien and complex to the average Web developer.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;What differences do you see from the idea of the &amp;ldquo;semantic web&amp;rdquo;?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I think the term &amp;lsquo;semantic web&amp;rsquo; actually covers two separate, but related ideas. On one hand it&amp;rsquo;s a goal—a web where information context is curated as carefully as the presented data. On the other it&amp;rsquo;s a methodology—a specific set of techniques advocated for achieving that goal. Linked Data is just another methodology towards the same goal. Linked Data is simpler because rather than requiring sophisticated and exhaustive declaration of the data such as OWL, it merely requires that you use links effectively, and do what you can to express the basic relationship semantics of those links. It&amp;rsquo;s a much lower barrier to entry, and though the resulting context might not be rigorous enough for a logician, it&amp;rsquo;s a big enough leap that I believe the result merits the term &amp;ldquo;semantic web&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Are you seeing current or near-term benefits from linked data technology with Zepheira client projects?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Definitely. In rapid prototypes for clients we&amp;rsquo;re usually able to give them new analysis and decision-making capabilities. We often find that after we&amp;rsquo;ve produced a few deliverables, clients get much more ambitious because they see new possibilities. I think this is in large part down to web architecture, and thus Linked Data. It&amp;rsquo;s not really black magic; the trick is usually to convert existing data sources using Linked Data techniques, which allows us to very quickly integrate across departments, viewpoints and specific application capabilities. It&amp;rsquo;s the sort of integration that unfortunately IT is too used to associating with heavily-staffed, multi-year projects. We&amp;rsquo;ve found Linked Data to be a prodigious accelerator.&lt;/p&gt;
&lt;p&gt;It may not be black magic, but again it is all about the art. We&amp;rsquo;ve put together a pretty reliable sequence of solutions beginning with START, which is a seminar/workshop combination to analyze the benefits and ideal targets of technology such as Linked Data at a specific client. Once the client has a target project in mind we have 3D, which is a carefully crafted package to accelerate the use of Linked Data in the project. 3D is like taking the general ideas of a homeowner sketching out their dream home, placing them into a particular architectural school, and producing blueprints, detailed materials manifests, and subcontractor plans for framing, wiring, plumbing and more. 3D itself does not include implementation (building the home) because many of our clients would prefer that we prepare their internal development teams for that. When we are called for implementation we often use Remix, a web-based application built on SIMILE and other Linked Data open source products, to which we&amp;rsquo;ve made many commercial enhancements. That gives us a ready platform for the sort of rapid and rich integration I mentioned above. In effect we&amp;rsquo;ve built an entire solutions stack on Linked Data.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;At the Linked Data Planet conference, you&amp;rsquo;ll be talking about the Linking Open Data initiative. Where does this fit into the larger picture of Linked Data technology?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I think it&amp;rsquo;s a pretty fuzzy line, but the way I try to organize it in my head, Linked Data is a broader concept encompassing four main principles. The LOD initiative is a project associated with a more specific range of techniques and a particular kernel of sites (most notably DBPedia), providing a practical basis for expanding the field of information available as Linked Data. At Zepheira we tend towards techniques popularized in LOD, but clearly for most of our clients the data can&amp;rsquo;t be thrown into a cloud of public data, so we&amp;rsquo;ve added our own refinements, leading to what we&amp;rsquo;ve started to call Linking Enterprise Data (LED), which extends the Linked Data principles to organizational data integration and decision-support needs.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Ask a good linked data development question, go to Linked Data Planet for free</title>
      <link>https://www.bobdc.com/blog/ask-a-good-linked-data-develop/</link>
      <pubDate>Sun, 01 Jun 2008 16:07:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/ask-a-good-linked-data-develop/</guid>
      
      
      <description><div>And hear a panel of experts discuss the answer.</div><div>&lt;img id=&#34;id202477&#34; src=&#34;http://www.linkeddataplanet.com/images/hdr_logo.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;220px&#34; alt=&#34;LinkedData Planet logo&#34;/&gt;
&lt;p&gt;As part of the &lt;a href=&#34;http://www.linkeddataplanet.com/&#34;&gt;Linked Data Planet&lt;/a&gt; conference, on June 18th I&amp;rsquo;m hosting a panel described &lt;a href=&#34;http://www.linkeddataplanet.com/conference/sessionsbyday.php#T8&#34;&gt;on the conference program&lt;/a&gt; like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;T8: Linked Data Workshop&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Based on questions submitted by the audience, a panel of Linked Data experts discuss architecture and system development issues surrounding Linked Data application development.&lt;/p&gt;
&lt;p&gt;MODERATOR: Bob DuCharme, Solutions Architect and Author / Conference Chair, Innodata Isogen&lt;/p&gt;
&lt;p&gt;PANELIST: Dr. Melliyal Annamalai, Principal Product Manager, Oracle&lt;br /&gt;
PANELIST: Michael Bergman, CEO, Zitgist LLC&lt;br /&gt;
PANELIST: Stefanos Damianakis, President &amp;amp; CEO, Netrics&lt;br /&gt;
PANELIST: Uche Ogbuji, Partner, Zepheira&lt;br /&gt;
PANELIST: Nikita Ogievetsky, Vice President, Morgan Stanley&lt;br /&gt;
PANELIST: Walter Perry, Managing Director, Fiduciary Automation&lt;br /&gt;
PANELIST: Dr. Andy Seaborne, Research Scientist, Hewlett-Packard Research Laboratories&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What questions do you have? What system or architecture issues would you like to see our distinguished panel discuss? For example, if you have an idea for an application that links publicly available data using SPARQL end points or related technology, and you&amp;rsquo;ve worked out part of your app but are unsure about the rest, tell us what difficult parts remain. Perhaps you have a prototype that works fine, but you&amp;rsquo;re wondering about the best way to scale it up. Perhaps you have an idea that&amp;rsquo;s part crazy and part brilliant, and you&amp;rsquo;re wondering how best to nudge it toward &amp;ldquo;useful&amp;rdquo;. With this panel&amp;rsquo;s well-known representatives from the tool side, the app user side, and the integrator side, it&amp;rsquo;s guaranteed to have some good advice for you.&lt;/p&gt;
&lt;p&gt;The best questions will get a full free pass to the conference, and I&amp;rsquo;ll announce the winners&amp;rsquo; names here and when I read their questions at the panel. Send me your ideas for questions at &lt;a href=&#34;mailto:bob@snee.com&#34;&gt;bob@snee.com&lt;/a&gt;. (I have a few of my own ideas for questions, but I already get in for free.) There is no limit to the number of questions one person can send, but you get extra points for concision—I don&amp;rsquo;t want to spend a lot of the session time reading your question out loud.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://blog.triplescape.com&#34; title=&#34;http://blog.triplescape.com&#34;&gt;Brian Manley&lt;/a&gt; on &lt;a href=&#34;#comment-1896&#34;&gt;June 1, 2008 11:22 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is a fairly broad question, but: When it comes to exposing enterprise data from proprietary and COTS systems in a linkable way, two obvious choices come to mind: providing something like a SPARQL front-end in front of each application, or aggregating all of that into a queryable &amp;ldquo;semantic data warehouse&amp;rdquo;. What are the benefits and pitfalls of those approaches, and are there approaches that lie somewhere between that have proven to be successful?&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Adding semantics to make data more valuable</title>
      <link>https://www.bobdc.com/blog/adding-semantics-to-make-data/</link>
      <pubDate>Thu, 29 May 2008 08:51:38 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/adding-semantics-to-make-data/</guid>
      
      
      <description><div>The secret revealed.</div><div>&lt;p&gt;Storing information about the meaning of terms—their &amp;ldquo;semantics&amp;rdquo;—can make data more valuable. Critics of semantic web technology consider such talk to be pie-in-the-sky AI talk; how can you encode the real meaning of words? More importantly, how can you do it in a way that programs can read and use to solve real data problems?&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.cafepress.com/lockhorn&#34;&gt;&lt;img id=&#34;id202458&#34; src=&#34;https://www.bobdc.com/img/main/lockhorns20051030.gif&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;undoctored Lockhorns strip&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The answer is very simple: &lt;em&gt;you don&amp;rsquo;t have to encode all of a term&amp;rsquo;s semantics to get value from the standards and software used to do so.&lt;/em&gt; Let&amp;rsquo;s look at an example.&lt;/p&gt;
&lt;p&gt;What are the semantics of the word &amp;ldquo;spouse&amp;rdquo;? What does it mean to a recently engaged nineteen-year-old girl? What does it mean to a fifty-year-old man who&amp;rsquo;s been divorced three times? What does it mean in a court of law in California, Mississippi, Austria, or Thailand?&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s a lot of meaning to store, but we don&amp;rsquo;t need to store much to make a simple, mundane database such as an address book more valuable. Let&amp;rsquo;s say my address book includes the following facts, and I want Leroy&amp;rsquo;s home phone number:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Leroy has a work phone number of 212-334-4323.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Leroy has an email address of &lt;a href=&#34;mailto:leroy@ngcorp.com&#34;&gt;leroy@ngcorp.com&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Loretta has an email address of &lt;a href=&#34;mailto:loretta031@yahoo.com&#34;&gt;loretta031@yahoo.com&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Loretta has a home phone number of 718-928-6621.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Loretta&amp;rsquo;s spouse is Leroy.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The only information I have about Leroy is his work number and his email address. I don&amp;rsquo;t have his home number or any information about his spouse.&lt;/p&gt;
&lt;p&gt;The W3C OWL web ontology language lets us declare that a property is symmetric, or as the &lt;a href=&#34;http://www.w3.org/TR/2004/REC-owl-features-20040210/&#34;&gt;OWL overview&lt;/a&gt; puts it, &amp;ldquo;if the pair (x,y) is an instance of the symmetric property P, then the pair (y,x) is also an instance of P.&amp;rdquo; With software that understands an OWL expression stating that spouse is a symmetric property and a rule I define to say that spouses have the same home phone number, I can retrieve Leroy&amp;rsquo;s home phone number from the little &amp;ldquo;database&amp;rdquo; above. (More likely, I would define a &amp;ldquo;roommate&amp;rdquo; property as symmetric and a rule saying that roommates have the same home phone number, and then declare spouse to be a subproperty of roommate, but you get the idea.) By doing this, I&amp;rsquo;d be using the OWL rules to let me pull more information out of the data collection than I put into it, making the data collection more valuable.&lt;/p&gt;
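&lt;p&gt;As a minimal sketch of the inference involved (plain Python standing in for an OWL reasoner; the triples and the &amp;ldquo;spouses share a home phone number&amp;rdquo; rule come from the example above, but the property names and code are invented for illustration):&lt;/p&gt;

```python
# The address-book facts from the post, as (subject, property, value) triples.
facts = {
    ("Leroy", "workPhone", "212-334-4323"),
    ("Leroy", "email", "leroy@ngcorp.com"),
    ("Loretta", "email", "loretta031@yahoo.com"),
    ("Loretta", "homePhone", "718-928-6621"),
    ("Loretta", "spouse", "Leroy"),
}

SYMMETRIC = {"spouse"}  # stands in for an owl:SymmetricProperty declaration

def closure(triples):
    """Repeatedly apply two rules until nothing new is inferred:
    1. symmetry: (x, p, y) implies (y, p, x) when p is symmetric;
    2. the custom rule: spouses share a home phone number.
    """
    out = set(triples)
    while True:
        new = set()
        for s, p, o in out:
            if p in SYMMETRIC:
                new.add((o, p, s))
        for s, p, o in out:
            if p == "spouse":
                for s2, p2, v in out:
                    if s2 == o and p2 == "homePhone":
                        new.add((s, "homePhone", v))
        if new.issubset(out):
            return out
        out |= new

inferred = closure(facts)
print(("Leroy", "homePhone", "718-928-6621") in inferred)  # True
```

&lt;p&gt;The point is the same as with a real reasoner: the answer to the query comes out of triples that were never explicitly entered into the database.&lt;/p&gt;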
&lt;p&gt;Plenty of software claims to make this kind of thing possible, but what interests me in OWL and related standards is the fact that they&amp;rsquo;re standards, so that if I use OWL syntax to say &amp;ldquo;spouse is a symmetric property,&amp;rdquo; a range of commercial and free software can understand and use that little bit of semantics that I&amp;rsquo;ve stored to help me get more work done.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s easiest to demonstrate this with data stored using an RDF syntax, because the RDF data model has the closest fit to the subject/attribute-name/attribute-value statements in my little database above. If you prefer, though, more and more tools can keep the RDF part under the covers; my XML 2006 paper &lt;a href=&#34;http://www.snee.com/xml/xml2006/owlrdbms.html&#34;&gt;Relational database integration with RDF/OWL&lt;/a&gt; describes a related demo using address book data stored in MySQL tables. It shows a few more use cases of realistic questions to the database that get better answers because of semantics added using OWL.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a lot we can do with this technology&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;9-comments&#34;&gt;9 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://w3future.com/weblog/&#34; title=&#34;http://w3future.com/weblog/&#34;&gt;Sjoerd Visscher&lt;/a&gt; on &lt;a href=&#34;#comment-1882&#34;&gt;May 29, 2008 9:52 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Indeed. You don&amp;rsquo;t have to know the semantics of the term &amp;ldquo;spouse&amp;rdquo;, only of the concept of spouse. There are a lot of subtly different meanings for the term &amp;ldquo;spouse&amp;rdquo;, and a lot of different terms for the concept of spouse, but the concept of spouse is unique in the universe.&lt;/p&gt;
&lt;p&gt;Perhaps that&amp;rsquo;s why AI in the 80&amp;rsquo;s didn&amp;rsquo;t work, with their emphasis on natural language.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://aeshin.org/&#34; title=&#34;http://aeshin.org/&#34;&gt;Ryan Shaw&lt;/a&gt; on &lt;a href=&#34;#comment-1883&#34;&gt;May 29, 2008 9:59 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;re telling a one-sided story here. There&amp;rsquo;s potential value from adding semantics, but also potential harm. Consider my friends Mindy and Mandy, recently married in Massachusetts, but who are currently working in separate cities and visiting on weekends. I have a new phone, but it has a weird bug: whenever I try to call Mindy it rings Mandy&amp;rsquo;s home phone. What the hell?&lt;/p&gt;
&lt;p&gt;Turns out my phone has a semantically-enriched &amp;ldquo;smart&amp;rdquo; addressbook that deduces the &amp;ldquo;fact&amp;rdquo; that Mindy and Mandy are roommates (that share a home phone) from the fact that they are spouses. Or, worse, it won&amp;rsquo;t let me link Mindy and Mandy as spouses because of an ontology that requires spouses to be of the opposite sex.&lt;/p&gt;
&lt;p&gt;Of course, I don&amp;rsquo;t know anything about this, because the software engineers have hidden all this semantic sausage behind a slick interface, so all I know is that my phone doesn&amp;rsquo;t work right, and go out to look for a new one. All because the engineers decided to impose their worldview on me via an addressbook.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1884&#34;&gt;May 29, 2008 10:56 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sjoerd Visscher:&lt;/p&gt;
&lt;p&gt;I profoundly disagree. The only way you have to even articulate &amp;ldquo;the concept of &amp;lsquo;spouse&amp;rsquo;&amp;rdquo; to me, who wishes to know what you mean by the term, is to use words, one or many, words which are themselves polysemous. Adding a layer of concepts to pin down what is meant by words leaves you with something far harder to understand than words themselves are.&lt;/p&gt;
&lt;p&gt;Eric Raymond tells the story of having received a letter about one of his books, the Hacker&amp;rsquo;s Dictionary. He got it because he edited the dictionary: that is, he assembled it out of the evidence provided by the many participants. However, the letter was really intended for &lt;em&gt;his&lt;/em&gt; editor, the publisher&amp;rsquo;s employee who was responsible for preparing the book for publication. He, and the letter&amp;rsquo;s author, were tripped up by the polysemy of &amp;ldquo;editor&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Could that have been foreseen in advance and accounted for? I don&amp;rsquo;t think so. It&amp;rsquo;s something that arises in the special case of dictionaries (and perhaps anthologies).&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.TimothyHorrigan.com&#34; title=&#34;http://www.TimothyHorrigan.com&#34;&gt;Timothy Horrigan&lt;/a&gt; on &lt;a href=&#34;#comment-1886&#34;&gt;May 30, 2008 2:33 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I apologize if this is too simple minded a perspective&amp;hellip; but something which bugged the hell out of me during many years doing data conversions was the fact that none of the commonly used data-storage tools (e.g., Excel spreadsheets) have any provision for storing the units of measure. (Which I suppose is the most basic form of metadata.)&lt;/p&gt;
&lt;p&gt;I picked on Excel spreadsheets, because those are extraordinarily messy when used wrong, as they are 99.999% of the time (even by me.) Though actually Excel does let you put the name of the unit in the column label (though not on a row below the column label) or in a note field or you can even use named variables (not that anyone in the business world ever does any of those things.) But database products, even fancy ones, are just as bad or worse.&lt;/p&gt;
&lt;p&gt;The school system, BTW, does a rotten job of teaching kids how to use units (and other metadata, I suppose.) I used to grade open ended questions on math tests, and the biggest source of trouble was when kids combined the variables wrong in ways which would have been obvious if you paid attention to the units of measure. Students usually don&amp;rsquo;t get any training in this sort of thing until they begin taking quantitative courses in college, and by then it is too late, especially if the student only takes the minimum Physics for Poets and Business Math to meet their degree requirements.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
**rant mode off**&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.timothyhorrigan.com/tammi_itunes.html&#34; title=&#34;http://www.timothyhorrigan.com/tammi_itunes.html&#34;&gt;Timothy Horrigan&lt;/a&gt; on &lt;a href=&#34;#comment-1890&#34;&gt;May 30, 2008 10:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Here is some meta-metadata about this thread (i.e., data about the data about the data.) The post is illustrated with a Lockhorns cartoon. Our multitalented host, Bob DuCharme, used to play guitar with a band called the Lockhorns.&lt;/p&gt;
&lt;p&gt;Some of their tunes, including Bob&amp;rsquo;s song Hiwataha can be found at:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.philipshelley.com/words/?cat=10&#34;&gt;http://www.philipshelley.com/words/?cat=10&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1891&#34;&gt;May 30, 2008 12:40 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Tim. I&amp;rsquo;ve blogged here about the Hunting Accident, with a pointer to the &amp;ldquo;canonical&amp;rdquo; version of Hiawatha, at &lt;a href=&#34;http://www.snee.com/bobdc.blog/2006/05/me_as_80s_new_york_lead_guitar.html&#34;&gt;http://www.snee.com/bobdc.blog/2006/05/me_as_80s_new_york_lead_guitar.html&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I wasn&amp;rsquo;t really a full time member of that band, but as the founding member&amp;rsquo;s roommate, I filled in for various members as necessary&amp;ndash;on guitar in the recording you point to and percussion in the session that Philip describes at &lt;a href=&#34;http://www.philipshelley.com/words/?p=60&#34;&gt;http://www.philipshelley.com/words/?p=60&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By fauigerzigerk on &lt;a href=&#34;#comment-1892&#34;&gt;May 30, 2008 2:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;@Sjoerd: What you do is exactly the reason why this debate keeps going in circles. You ignore what problem is supposed to be solved. If you want to know what Bob means by spouse, you have to ask him, and of course there is nothing better than words for him to explain it to you, because both of you are human beings. But that wasn&amp;rsquo;t the problem Bob attempted to solve.&lt;/p&gt;
&lt;p&gt;What he wanted to do is to run a particular query on address book data: &amp;ldquo;What is the home phone of Leroy?&amp;rdquo;. And what he shows very well, I think, is that collecting a small amount of metadata sometimes allows us to ask questions about our data that we wouldn&amp;rsquo;t be able to ask otherwise. We can ask more questions without much additional effort if we enable the dumb machine to do a few inferencing steps for us. That&amp;rsquo;s all.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s completely beyond me why the simple affair of deterministic inferencing presses some kind of &amp;ldquo;philosophy button&amp;rdquo; in some people and suddenly it&amp;rsquo;s all about defending the power of natural language against some evil formalism in a race to define the world. There&amp;rsquo;s no need for such a defense as there is no such race. It&amp;rsquo;s like defending an artistic painting showing a bridge against an architect&amp;rsquo;s CAD software that helps him design one. I&amp;rsquo;ve never heard painters and architects have such debates. Amazingly, we have them all the time in IT.&lt;/p&gt;
&lt;p&gt;By fauigerzigerk on &lt;a href=&#34;#comment-1893&#34;&gt;May 30, 2008 2:07 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sorry, my reply was directed at John Cowan not at Sjoerd Visscher.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1894&#34;&gt;May 30, 2008 3:41 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And just to hammer in my original point, &amp;ldquo;what Bob means by spouse&amp;rdquo; was that it&amp;rsquo;s a symmetric property, and that&amp;rsquo;s all I meant, and that alone is useful.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Integrating relational data into the semantic web</title>
      <link>https://www.bobdc.com/blog/integrating-relational-data-in/</link>
      <pubDate>Fri, 23 May 2008 17:15:30 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/integrating-relational-data-in/</guid>
      
      
<description><div>By two guys who know what they&#39;re talking about.</div><div>&lt;p&gt;I was sorry to miss the Semantic Web Technologies conference, but I had a very interesting time yesterday giving a talk at the &lt;a href=&#34;http://www.lib.virginia.edu/newhorizons&#34;&gt;New Horizons in Teaching and Research&lt;/a&gt; conference at the University of Virginia (so nice to go to a conference 25 minutes from my house) on &lt;a href=&#34;http://www.lib.virginia.edu/newhorizons/thursday.html&#34;&gt;Semantic Web technologies, RDF and OWL (and Linked Data)&lt;/a&gt;. It was an audience of professors and researchers from a wide variety of disciplines who were all very open to hearing about how these standards could let them store metadata that helped them get more out of their data.&lt;/p&gt;
&lt;p&gt;If you follow the second link in the preceding paragraph, you&amp;rsquo;ll see that &amp;ldquo;(and Linked Data)&amp;rdquo; wasn&amp;rsquo;t part of the original title. I added it because I felt that it was important for them to hear about the possibilities of linking publicly available data sources with each other and with local resources in order to find new connections and patterns, whether they were interested in storing the semantics of the various terms or not. They were surprisingly receptive to the basic ideas of RDF—some audiences keep coming back to &amp;ldquo;why use a URI for something that isn&amp;rsquo;t a web page where you can send your browser?&amp;rdquo;—but I still emphasized the increasing availability of non-RDF data to SPARQL queries and what this means to the &lt;a href=&#34;http://richard.cyganiak.de/2007/10/lod/&#34;&gt;growing collection of linked open data&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As an example, I showed my &lt;a href=&#34;http://www.snee.com/xml/xml2006/owlrdbms.html&#34;&gt;demo for using D2RQ to integrate two different relational databases&lt;/a&gt;, but that was more of a toy example. At the &lt;a href=&#34;http://www.linkeddataplanet.com/&#34;&gt;LinkedData Planet&lt;/a&gt; conference in a few weeks, we&amp;rsquo;re going to hear Jim Melton and Ashok Malhotra&amp;rsquo;s talk on &lt;a href=&#34;http://www.linkeddataplanet.com/conference/sessionsbyday.php#W4&#34;&gt;Integrating Relational Data into the Semantic Web&lt;/a&gt;. They&amp;rsquo;re both employed by Oracle, but their reputation for developing and implementing important standards such as SQL and XQuery goes way beyond their Oracle work. Here is their abstract:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Invaluable data is stored in relational databases, but a growing fraction is being created in non-traditional forms such as spreadsheets and PDF files. Integrating relational and XML data is well-understood with widely-accepted solutions. Relational data must be similarly integrated with data in other forms. A promising approach is to translate the underlying data into a common format (such as RDF) and to create a &amp;ldquo;semantic cover&amp;rdquo; atop the data in the form of an ontology (perhaps using OWL). Classes and subclasses of the ontology are mapped into queries on the underlying data. The ontology can then be queried using SPARQL and the SPARQL queries translated to queries on the underlying data.&lt;/p&gt;
&lt;/blockquote&gt;
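&lt;p&gt;A rough sketch of that &amp;ldquo;semantic cover&amp;rdquo; idea (not their actual implementation; the class, table, and column names below are invented for illustration): each ontology class is backed by a relational query, so a SPARQL-style request for a class&amp;rsquo;s instances can be answered from the existing tables.&lt;/p&gt;

```python
# Each ontology class maps to a SQL query over the underlying tables;
# answering "give me the instances of class C" means running the mapped
# query. All names here are hypothetical.
class_to_sql = {
    "ex:Employee": "SELECT id, name FROM employees",
    "ex:Manager": "SELECT id, name FROM employees WHERE is_manager = 1",
}

def rewrite_class_query(class_uri):
    """Return the SQL that materializes the instances of an ontology class."""
    return class_to_sql[class_uri]

print(rewrite_class_query("ex:Manager"))
```

&lt;p&gt;A real translator would also handle subclass relationships and property mappings, but the core move is the same: the ontology terms become views over relational data.&lt;/p&gt;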
&lt;p&gt;The two of them together make for a lot of firepower on the topic, and with so much of the world&amp;rsquo;s data stored in relational databases (particularly in their employer&amp;rsquo;s products), their insights on how to open up such data to Linked Data tools such as SPARQL will help more people get access to more data to mix, match, and use for interesting new applications.&lt;/p&gt;
&lt;p&gt;This is only one of many great talks we&amp;rsquo;ll have at the Linked Data Planet conference; come join us!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Having fun with Reuters Calais</title>
      <link>https://www.bobdc.com/blog/having-fun-with-reuters-calais/</link>
      <pubDate>Wed, 21 May 2008 11:09:28 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/having-fun-with-reuters-calais/</guid>
      
      
      <description><div>Feeding it Renaissance art history.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.opencalais.com/&#34;&gt;Calais&lt;/a&gt; is the Reuters Clearforest product that, according to their homepage, &amp;ldquo;automatically annotates your content with rich semantic metadata&amp;rdquo;. Give it text, and it returns the text marked up with RDF that identifies entities and various semantic information about those entities.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m looking forward to Reuters Clearforest CEO Barak Pridor&amp;rsquo;s talk &lt;a href=&#34;http://www.linkeddataplanet.com/conference/sessionsbyday.php#T12&#34;&gt;Enabling Semantic Applications Through Calais&lt;/a&gt; at the &lt;a href=&#34;http://www.linkeddataplanet.com/&#34;&gt;Linked Data Planet&lt;/a&gt; conference (his &lt;a href=&#34;http://talk.talis.com/archives/2008/03/barak_pridor_ta.html&#34;&gt;Talking with Talis&lt;/a&gt; podcast interview is definitely worth listening to), and I thought I&amp;rsquo;d get more out of his talk if I played with the software a bit first. It was easy and straightforward, but before I describe my first experiment, I wanted to mention two important points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;For work-related projects, I&amp;rsquo;ve been researching machine-aided indexing tools and similar software on and off, and they usually look complex and expensive. &lt;a href=&#34;http://gate.ac.uk/&#34;&gt;GATE&lt;/a&gt; and &lt;a href=&#34;http://domino.research.ibm.com/comm/research_projects.nsf/pages/uima.index.html&#34;&gt;UIMA&lt;/a&gt; look tantalizing, but these are not tools; they&amp;rsquo;re frameworks into which you can plug such tools. Their included sample tools are either simple enough to do little more than a Perl script with a few regular expressions would do, or else they&amp;rsquo;re complex enough to appear difficult to set up and get running—the term &amp;ldquo;training corpus&amp;rdquo; comes up a lot. Still, an admirable goal of both frameworks is that tool vendors doing more complex text processing should make their products compatible with these frameworks, letting their customers mix and match tools as necessary instead of forcing them to choose between expensive packages of tools that often do more than they need. (I look forward to discussing these with Jeni Tennison, who&amp;rsquo;s also been &lt;a href=&#34;http://www.jenitennison.com/blog/node/76&#34;&gt;researching this&lt;/a&gt;, the next time we&amp;rsquo;re in the same city. Unfortunately, there won&amp;rsquo;t be an Oxford XML Summer School this year, but we&amp;rsquo;re all very hopeful for 2009.) After a few years of sporadic research into automated entity recognition tools, I was very happy to see a free, RESTful web service come along.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Kingsley Idehen &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen/?id=1343&#34;&gt;recently wrote&lt;/a&gt; the following about the Linked Data On the Web workshop in Beijing: &amp;ldquo;As the sessions progressed, it became clear during a number of accompanying Q&amp;amp;A sessions that a new Linked Data exploitation frontier is emerging. The frontier in question takes the form of a Linked Data substrate capable of addressing the taxonomic needs of solutions aimed at automated Named Entity Extraction, Disambiguation, Subject matter Concept alignment, transparently integrated with existing Web Content.&amp;rdquo; As you&amp;rsquo;ll see, I pushed Calais a little further than it was supposed to go for named entity extraction, but it still did an admirable job and helped me to focus more on the kinds of applications where it can shine.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote id=&#34;id202593&#34; class=&#34;pullquote&#34;&gt;After a few years of sporadic research into automated entity recognition tools, I was very happy to see a free, RESTful web service come along.&lt;/blockquote&gt;
&lt;p&gt;The quickest way to start playing with Calais is to use &lt;a href=&#34;http://sws.clearforest.com/calaisviewer&#34;&gt;Calais Viewer&lt;/a&gt;, where you just paste some text and click the &amp;ldquo;submit&amp;rdquo; button to see what kinds of entities Calais can find in that text. To write your own applications that use it, you need to &lt;a href=&#34;http://www.opencalais.com/user/register&#34;&gt;become a Calais developer&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To write my own application that passes content to Calais and then uses the results for something, I like &lt;a href=&#34;http://code.google.com/p/python-calais/&#34;&gt;python-calais&lt;/a&gt;. It handles the web service interactions when you want to send text or a URL identifying content to Calais, and it loads the returned RDF triples into a triplestore where you can play with them. (It uses &lt;a href=&#34;http://rdflib.net/&#34;&gt;RDFlib&lt;/a&gt;, a Python RDF library I hadn&amp;rsquo;t been able to get working for a while, but &lt;a href=&#34;http://rdflib.net/issues/2007/01/03/can&#39;t_complete_install_because_%22the_.net_framework_sdk_needs_to_be_installed_before_building_extensions_for_python.%22/issue&#34;&gt;an RDFLib comment page&lt;/a&gt; explained what I had to do to get a new version working on a Windows machine.)&lt;/p&gt;
&lt;p&gt;Because of a talk I&amp;rsquo;ll be giving tomorrow at the University of Virginia on &lt;a href=&#34;http://www.lib.virginia.edu/newhorizons/thursday.html&#34;&gt;Semantic web technologies, RDF, and OWL&lt;/a&gt;, and because the audience will have a large representation of people using technology for humanities research, I chose Giorgio Vasari&amp;rsquo;s &amp;ldquo;Lives of the Painters&amp;rdquo; for Calais to analyze. Vasari was a late-Renaissance Italian painter and architect whose writings about other painters are his main claim to modern fame. His biographies are considered a founding work of art history, and while &lt;a href=&#34;http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&amp;amp;field-keywords=vasari&#34;&gt;several editions&lt;/a&gt; still sell well, Project Gutenberg has a public domain version of &lt;a href=&#34;http://www.gutenberg.org/etext/21212&#34;&gt;Volume 1&lt;/a&gt; available for free.&lt;/p&gt;
&lt;p&gt;Being more than 100,000 characters long, this work was too big to send off to Calais, so I broke it up into pieces, sent those off, and then aggregated the returned metadata. (Being in RDF makes the metadata very easy to aggregate.) The following is an excerpt, with some chunks removed and some carriage returns added for readability, of what Calais returned for one section of the book:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!--Use of the Calais Web Service is governed by the Terms of
Service located at http://www.opencalais.com. By using this
service or the results of the service you agree to these terms of
service.--&amp;gt;
&amp;lt;!--Relations: PersonPolitical, Quotation


Facility: castle of S. Angelo, portico of S. Peter, Doge&#39;s palace
Organization: Holy Church
NaturalFeature: Celian Hill
Continent: Africa
Country: France, Greece, Italy
Person: Maria Maggiore, Paul, Giovanni Evangelista, Valentinian,
Giovanni Battista, Ser Brunnellesco, St Gregory,
Giustiniano, John , St Hilarion, Luit, Hugh, Giovanni Morosini
City: Florence, Milan, Alexandria, Rome, Venice, Pistoia--&amp;gt;
  &amp;lt;rdf:Description c:allowDistribution=&amp;quot;false&amp;quot; c:allowSearch=&amp;quot;false&amp;quot; 
      c:externalID=&amp;quot;bb256bde4fbcf05c1942c1e67e256e07b73e5618&amp;quot; 
      c:id=&amp;quot;http://id.opencalais.com/ebZQM2eY3x3zCclB79WRXw&amp;quot;
      rdf:about=&amp;quot;http://d.opencalais.com/dochash-1/67406f3f-36bb-3adb-9364-8b15cc0b741d&amp;quot;&amp;gt;
    &amp;lt;rdf:type rdf:resource=&amp;quot;http://s.opencalais.com/1/type/sys/DocInfo&amp;quot;/&amp;gt;
    &amp;lt;c:document&amp;gt;&amp;lt;![CDATA[&amp;lt;Document&amp;gt;
    &amp;lt;Title&amp;gt;1208342711120-85FCAB2B-568168&amp;lt;/Title&amp;gt;
    &amp;lt;Date&amp;gt;2008-04-15&amp;lt;/Date&amp;gt;&amp;lt;Body&amp;gt;figures and
    some marble candelabra exquisitely carved with leaves,
and some children in bas-relief of extraordinary beauty? In short, by
these and many other signs, it is clear that sculpture was in
decadence in the time of Constantine, and with it the other superior
arts. If anything was required to complete their ruin it was supplied
by the departure of Constantine from Rome when he transferred the
seat of government to Byzantium, as he took with him to Greece not
only all the best sculptors and other artists of the age, such as
they were, but also a quantity of statues and other beautiful works
of sculpture.


&amp;lt;!-- lots of plain text content deleted --&amp;gt;


I must not forget to mention either, how in the course of time the
&amp;lt;/Body&amp;gt;&amp;lt;/Document&amp;gt;]]&amp;gt;&amp;lt;/c:document&amp;gt;
    &amp;lt;c:externalMetadata/&amp;gt;
    &amp;lt;c:submitter&amp;gt;MyProjectID&amp;lt;/c:submitter&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;


&amp;lt;!-- Various header information removed --&amp;gt;


  &amp;lt;rdf:Description rdf:about=&amp;quot;http://d.opencalais.com/pershash-1/f87fd977-7cd7-348d-972b-8f77716da77d&amp;quot;&amp;gt;
    &amp;lt;rdf:type rdf:resource=&amp;quot;http://s.opencalais.com/1/type/em/e/Person&amp;quot;/&amp;gt;
    &amp;lt;c:name&amp;gt;Ser Brunnellesco&amp;lt;/c:name&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;
  &amp;lt;rdf:Description rdf:about=&amp;quot;http://d.opencalais.com/dochash-1/77406f3f-26bb-3adb-9364-8b15cc0f756d/Instance/1&amp;quot;&amp;gt;
    &amp;lt;rdf:type rdf:resource=&amp;quot;http://s.opencalais.com/1/type/sys/InstanceInfo&amp;quot;/&amp;gt;
    &amp;lt;c:docId rdf:resource=&amp;quot;http://d.opencalais.com/dochash-1/77406f3f-26bb-3adb-9364-8b15cc0f756d&amp;quot;/&amp;gt;
    &amp;lt;c:subject rdf:resource=&amp;quot;http://d.opencalais.com/pershash-1/f87fd977-7cd7-348d-972b-8f77716da77d&amp;quot;/&amp;gt;
&amp;lt;!--Person: Ser Brunnellesco--&amp;gt;
    &amp;lt;c:detection&amp;gt;[of this church is such that Pippo di
]Ser Brunnellesco[ did not disdain to make use of it as his model]&amp;lt;/c:detection&amp;gt;
    &amp;lt;c:offset&amp;gt;14713&amp;lt;/c:offset&amp;gt;
    &amp;lt;c:length&amp;gt;16&amp;lt;/c:length&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;


  &amp;lt;rdf:Description rdf:about=&amp;quot;http://d.opencalais.com/genericHasher-1/4fd6dc07-5b9d-3356-8f83-0e735dfa9910&amp;quot;&amp;gt;
    &amp;lt;rdf:type rdf:resource=&amp;quot;http://s.opencalais.com/1/type/em/e/Country&amp;quot;/&amp;gt;
    &amp;lt;c:name&amp;gt;Greece&amp;lt;/c:name&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;
  &amp;lt;rdf:Description rdf:about=&amp;quot;http://d.opencalais.com/dochash-1/77406f3f-26bb-3adb-9364-8b15cc0f756d/Instance/16&amp;quot;&amp;gt;
    &amp;lt;rdf:type rdf:resource=&amp;quot;http://s.opencalais.com/1/type/sys/InstanceInfo&amp;quot;/&amp;gt;
    &amp;lt;c:docId rdf:resource=&amp;quot;http://d.opencalais.com/dochash-1/77406f3f-26bb-3adb-9364-8b15cc0f756d&amp;quot;/&amp;gt;
    &amp;lt;c:subject rdf:resource=&amp;quot;http://d.opencalais.com/genericHasher-1/4fd6dc07-5b9d-3356-8f83-0e735dfa9910&amp;quot;/&amp;gt;
&amp;lt;!--Country: Greece--&amp;gt;
    &amp;lt;c:detection&amp;gt;[government to Byzantium, as he took with him to ]Greece[ not
only all the best sculptors and other]&amp;lt;/c:detection&amp;gt;
    &amp;lt;c:offset&amp;gt;543&amp;lt;/c:offset&amp;gt;
    &amp;lt;c:length&amp;gt;6&amp;lt;/c:length&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After a comment with some legalese, the document starts with an XML comment listing the identified entities by classes such as Facility, Organization, and Person. Next, in a &lt;code&gt;c:document&lt;/code&gt; element, is the text passed to Calais to analyze, followed by RDF/XML about the entities that Calais found with length and offset figures to show where in the original text it found these entities. As you can see, it assigns a URL identifier to each one and lists metadata about it; for example, the thing with an identifier of &lt;a href=&#34;http://d.opencalais.com/pershash-1/f87fd977-7cd7-348d-972b-8f77716da77d&#34;&gt;http://d.opencalais.com/pershash-1/f87fd977-7cd7-348d-972b-8f77716da77d&lt;/a&gt; is a Person, has a name of &amp;ldquo;Ser Brunnellesco&amp;rdquo;, and is 16 characters long starting at position 14713 of the CDATA in the &lt;code&gt;c:document&lt;/code&gt; element within the XML returned by Calais.&lt;/p&gt;
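&lt;p&gt;To make the &lt;code&gt;c:offset&lt;/code&gt; and &lt;code&gt;c:length&lt;/code&gt; arithmetic concrete, here is a minimal Python illustration; the sample string comes from the excerpt above, and the numbers are computed here rather than being the actual values Calais reported for the full document:&lt;/p&gt;

```python
# c:offset locates the first character of a detected entity within the
# analyzed text, and c:length is the entity's length in characters.
body = ("he transferred the seat of government to Byzantium, "
        "as he took with him to Greece not only all the best sculptors")
offset = body.index("Greece")   # what Calais would report as c:offset
length = len("Greece")          # what Calais would report as c:length
assert body[offset:offset + length] == "Greece"
```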
&lt;p&gt;I tried writing something that used this metadata to wrap start- and end-tags around the entities at the identified points (for example, &lt;code&gt;&amp;lt;c:country&amp;gt;Greece&amp;lt;/c:country&amp;gt;&lt;/code&gt;) but quickly found out why so much text analysis software adds metadata out-of-line: because identified entities can overlap, so my added tags would usually make the result ill-formed XML. (I guess I should have read the fourth paragraph of Jeni&amp;rsquo;s post referenced above more closely.) I did write something to insert empty elements marking the beginning of an identified entity and its length (for example, &lt;code&gt;&amp;lt;c:Country length=&amp;quot;6&amp;quot;/&amp;gt;Greece&lt;/code&gt;) but I haven&amp;rsquo;t used it for anything yet.&lt;/p&gt;
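&lt;p&gt;That marker-insertion step can be sketched in a few lines of Python (a reconstruction of the idea, not the actual script): insert the markers from the highest offset down, so that each insertion leaves the not-yet-processed offsets untouched:&lt;/p&gt;

```python
def insert_markers(text, entities):
    """entities: (offset, length, type) triples pulled from the Calais RDF."""
    # Work from the highest offset down so earlier insertions do not
    # shift the positions of entities we have not reached yet.
    for offset, length, etype in sorted(entities, reverse=True):
        marker = '<c:%s length="%d"/>' % (etype, length)
        text = text[:offset] + marker + text[offset:]
    return text
```

&lt;p&gt;Because only empty elements are inserted, overlapping entities just yield adjacent markers instead of crossed start- and end-tags, so the output stays well-formed.&lt;/p&gt;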
&lt;p&gt;Once this new metadata is loaded into a triplestore, you can do some interesting things with it. For example, a little SPARQL let me create a report on the types of entities found. The following shows the beginning of the report after sorting by number of occurrences:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    323  Person           &amp;quot;Jesus Christ&amp;quot;
    191  City             &amp;quot;Florence&amp;quot;
    136  Person           &amp;quot;After Giotto&amp;quot;
    116  Person           &amp;quot;Jesus Christ&amp;quot;
    108  Person           &amp;quot;Franco Sacchetti&amp;quot;
     87  Person           &amp;quot;Agnolo Gaddi&amp;quot;
     79  Person           &amp;quot;Lorenzo di Bicci&amp;quot;
     72  Person           &amp;quot;Diocletian&amp;quot;
     69  Person           &amp;quot;Giovanni Cimabue&amp;quot;
     67  Person           &amp;quot;Tuscany Niccola&amp;quot;
     65  Person           &amp;quot;St Paul&amp;quot;
     64  Person           &amp;quot;After Andrea&amp;quot;
     58  Person           &amp;quot;Giovanni Evangelista&amp;quot;
     58  City             &amp;quot;Rome&amp;quot;
     57  Person           &amp;quot;Antonio Vite&amp;quot;
     56  Person           &amp;quot;St Francis&amp;quot;
     55  Person           &amp;quot;Andrea Tafi&amp;quot;
     54  Person           &amp;quot;Francesco Petrarch&amp;quot;
     54  Person           &amp;quot;Francesco di Giorgio&amp;quot;
     53  Person           &amp;quot;Jacopo di Casentino&amp;quot;
     48  Person           &amp;quot;Bernardo Orcagna&amp;quot;
     47  Person           &amp;quot;Guglielmo da Forli&amp;quot;
     47  Person           &amp;quot;Giovanni Boccaccio&amp;quot;
     46  Country          &amp;quot;Italy&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
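&lt;p&gt;A SPARQL query along these lines would pull out each detected instance with its entity type and name. Treat it as a sketch: the class URI comes from the RDF excerpt above, but the &lt;code&gt;c:&lt;/code&gt; prefix URI is a guess because the excerpt omits its namespace declarations, and the occurrence counting and sorting were presumably done outside the query, since SPARQL 1.0 had no aggregate functions.&lt;/p&gt;

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX c:   <http://s.opencalais.com/1/pred/>

SELECT ?type ?name
WHERE {
  ?instance rdf:type <http://s.opencalais.com/1/type/sys/InstanceInfo> ;
            c:subject ?entity .
  ?entity rdf:type ?type ;
          c:name ?name .
}
```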
&lt;p&gt;In the world of Italian Renaissance art, Jesus Christ is a big name. Calais didn&amp;rsquo;t really find the phrase &amp;ldquo;After Giotto&amp;rdquo; 136 times; it&amp;rsquo;s in there twice—once at the beginning of a sentence, with &amp;ldquo;After&amp;rdquo; having that capital &amp;ldquo;A&amp;rdquo;—and I assume that this led Calais to believe that &amp;ldquo;After&amp;rdquo; was Giotto&amp;rsquo;s first name, and that all other references to Giotto referred to the same guy.&lt;/p&gt;
&lt;p&gt;Public domain English translations of 16th-century biographies of 15th-century Italian painters are not the class of content that Calais was optimized for. Along with entities such as Person, Country, and City, a look at the &lt;a href=&#34;http://www.opencalais.com/calaisAPI#extractedsemanticdata&#34;&gt;entities, events, and facts&lt;/a&gt; that it searches for shows that business news—the second most popular (that is, second most easily funded) domain in computational linguistics research after terrorism—is the most important domain for the Calais folk. Potentially identified &amp;ldquo;Events/Facts&amp;rdquo; include Acquisition, AnalystEarningsEstimate, and StockSplit. This makes it a little clearer why Reuters would &lt;a href=&#34;http://www.reuters.com/article/technology-media-telco-SP/idUSNAAD300120070430&#34;&gt;buy Clearforest&lt;/a&gt; and the technology behind Calais.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m working on another project using Calais that comes much closer to the kind of things that it is optimized for, and it&amp;rsquo;s looking really cool. It might even be a worthwhile enough application to park a domain name to host it. I hope to have some demos to show within a few weeks.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.opencalais.com&#34; title=&#34;http://www.opencalais.com&#34;&gt;Tom Tague&lt;/a&gt; on &lt;a href=&#34;#comment-1877&#34;&gt;May 25, 2008 10:12 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Tom Tague from Calais here.&lt;/p&gt;
&lt;p&gt;I love this stuff. It never fails to amaze me what uses &amp;amp; experiments people try with Calais. Who knows - maybe feeding a whole collection of similar works from various era&amp;rsquo;s might provide a useful research tool for looking at artistic trends by time or geography?&lt;/p&gt;
&lt;p&gt;One of the interesting experiments a number of Calais users have been playing with is document level co-occurrence. For example, what people occur most frequently mentioned together in news articles, etc.&lt;/p&gt;
&lt;p&gt;Might be a little tough @ the book level - but perhaps a book could be decomposed into chapters by (perhaps representing time periods) to simulate the same thing.&lt;/p&gt;
&lt;p&gt;Thanks for the experiment!&lt;/p&gt;
&lt;p&gt;Regards,&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Upgrading to Movable Type 4</title>
      <link>https://www.bobdc.com/blog/upgrading-to-movable-type-4/</link>
      <pubDate>Sun, 18 May 2008 13:06:54 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/upgrading-to-movable-type-4/</guid>
      
      
<description><div>Mostly OK.</div><div>&lt;p&gt;I had my host provider upgrade my weblog to Movable Type 4. At first, it looked like connections to the CSS stylesheets were broken, but it turns out that the public_html/mt-static/themes directory that had the CSS files was replaced with a new one for MT4. Luckily, I had zipped this directory up before the upgrade and copied the zip file somewhere else, so when I restored the directory of the CSS that I use, everything went back to normal.&lt;/p&gt;
&lt;p&gt;Well, everything except the ability to search through the entries. Attempting to search gives me the message &amp;ldquo;Publishing results failed: Can&amp;rsquo;t find included template module &amp;lsquo;Header&amp;rsquo;&amp;rdquo;. &lt;a href=&#34;http://forums.sixapart.com/index.php?act=Print&amp;amp;client=printer&amp;amp;f=7&amp;amp;t=63379&#34;&gt;This discussion of a similar problem&lt;/a&gt; tells me to republish the templates, and &lt;a href=&#34;http://www.movabletype.org/documentation/professional/universal-template-set.html&#34;&gt;this Movable Type help page&lt;/a&gt; explains how to do this. I worry that this sweeping command will mess up my existing templates in addition to generating the missing ones, and I&amp;rsquo;ve already spent too much time on this today, so for now I&amp;rsquo;ve commented out the search feature of this blog until I can really make sure that I have everything backed up that needs to be before taking this step.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog/blogging_about_blogging/&#34;&gt;Blogging about blogging&lt;/a&gt; is the one category of my entries here that I actively try to avoid contributing to—it&amp;rsquo;s such a tiresome topic on so many weblogs—but this entry also serves as a bit of post-upgrade regression testing. I&amp;rsquo;d hate to find out that the upgrade screwed up my feeds after posting something more substantive.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1864&#34;&gt;May 18, 2008 3:45 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Here is a test comment.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/blogging-about-blogging">blogging about blogging</category>
      
    </item>
    
    <item>
      <title>Reading epub files with the Sony PRS-505 ebook reader</title>
      <link>https://www.bobdc.com/blog/reading-epub-files-with-the-so/</link>
      <pubDate>Wed, 14 May 2008 18:21:24 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/reading-epub-files-with-the-so/</guid>
      
      
      <description><div>For now, only on the Sony business development guys&#39; 505s.</div><div>&lt;img id=&#34;id202481&#34; src=&#34;https://www.bobdc.com/img/main/prs505.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;PRS-505&#34;/&gt;
&lt;p&gt;Look out, &lt;a href=&#34;http://www.engadget.com/&#34;&gt;engadget&lt;/a&gt;, it&amp;rsquo;s my first consumer electronics scoop: at &lt;a href=&#34;http://www.idpf.org/digitalbook08/default.htm&#34;&gt;Digital Book 2008&lt;/a&gt; today, Sony reader business development managers Bob Nell and Daniel Albohn, who were not listed on the &lt;a href=&#34;http://www.idpf.org/digitalbook08/agenda.htm&#34;&gt;online&lt;/a&gt; or printed programs, made a surprise presentation: they showed that Sony has worked out how to display ebooks in the standard &lt;a href=&#34;https://www.bobdc.com/blog/creating-epub-files&#34;&gt;epub&lt;/a&gt; format on the PRS-505. Nell started up the Adobe Digital Editions reader and dragged a DRM-free epub book and a PDF version of Suze Orman&amp;rsquo;s book &amp;ldquo;Woman and Money&amp;rdquo; to the PRS-505 icon, and then used a digital overhead projector to show us the books on his 505, complete with reflowing and resizing of text.&lt;/p&gt;
&lt;p&gt;Why is this cool? Because it lets us move epub and PDF files directly from our PCs to a high-contrast e-ink device. There are a lot of PDFs and other documents that I can easily convert to epub and want to read without printing out a pile of paper, so I&amp;rsquo;m psyched.&lt;/p&gt;
&lt;p&gt;For now, it&amp;rsquo;s not something you can do on just any 505, but because it apparently doesn&amp;rsquo;t involve any specialized hardware, I&amp;rsquo;m sure that existing 505s will be able to do this after an eventual firmware upgrade. Nell said that Sony&amp;rsquo;s corporate communications department forbade them from discussing such plans.&lt;/p&gt;
&lt;p&gt;A few other notes from the one-day conference:&lt;/p&gt;
&lt;p&gt;Big publishers are apparently buying Sony ebook readers in bulk and giving them to their staff to use for manuscript review so that they can reduce the printing and shipping costs that they spend on moving all those piles of paper around. Apparently it&amp;rsquo;s &lt;a href=&#34;http://www.publishersweekly.com/article/CA6546015.html&#34;&gt;pretty successful&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Harlequin&amp;rsquo;s Malle Vallik did not just rehash her colleague Brent Lewis&amp;rsquo;s talk from the O&amp;rsquo;Reilly Tools of Change conference, which I described in &lt;a href=&#34;https://www.bobdc.com/blog/finding-an-ebook-audience&#34;&gt;Finding an eBook audience: Housewives reading bodice-rippers?&lt;/a&gt; as very inspirational for anyone considering the ebook market; she described several interesting initiatives they&amp;rsquo;ve taken in the few months since then. Did you know that there&amp;rsquo;s a literary genre called &amp;ldquo;Paranormal Romance&amp;rdquo;? They&amp;rsquo;re &lt;a href=&#34;http://www.eharlequin.com/store.html?cid=486&#34;&gt;all over it&lt;/a&gt; (note the &amp;ldquo;category&amp;rdquo; of these books), with a &lt;a href=&#34;http://paranormalromanceblog.wordpress.com/&#34;&gt;blog&lt;/a&gt; and, of course, &lt;a href=&#34;http://ebooks.eharlequin.com/05CE0D51-6D7A-469B-BDCF-4E6CF4CD04BB/10/126/en/SearchResultsNP.htm?SearchID=9799692&#34;&gt;ebook offerings&lt;/a&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Charlie Rose interviews Charlie Rose</title>
      <link>https://www.bobdc.com/blog/charlie-rose-interviews-charli/</link>
      <pubDate>Fri, 09 May 2008 10:06:30 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/charlie-rose-interviews-charli/</guid>
      
      
      <description><div>&#34;About the future of technology and the internet and mobile devices and all that&#34;. Very funny.</div><div>&lt;p&gt;YouTube video: &lt;a href=&#34;http://www.youtube.com/watch?v=LFE2CCfAP1o&#34;&gt;&amp;ldquo;Charlie Rose&amp;rdquo; by Samuel Beckett&lt;/a&gt;. Microsoft and Yahoo. Google&amp;hellip;&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/LFE2CCfAP1o&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://csksoft.comk/rails.rdf&#34; title=&#34;http://csksoft.comk/rails.rdf&#34;&gt;Colm Sean Murdoch O Cinneide&lt;/a&gt; on &lt;a href=&#34;#comment-1858&#34;&gt;May 9, 2008 11:42 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d like to address the economic model now steve(tm) ;-)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/technology-future">technology, future</category>
      
    </item>
    
    <item>
      <title>My favorite bookmarklets</title>
      <link>https://www.bobdc.com/blog/my-favorite-bookmarklets/</link>
      <pubDate>Thu, 01 May 2008 08:09:25 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/my-favorite-bookmarklets/</guid>
      
      
      <description><div>Bookmarklets to search a website, navigate it, and see what links to it.</div><div>&lt;p&gt;The recent lifehacker article &lt;a href=&#34;http://lifehacker.com/software/feature/special-geek-to-live-129141.php&#34;&gt;Ten Must-Have Bookmarklets&lt;/a&gt; reminded me that I&amp;rsquo;ve developed a few handy ones myself. A bookmarklet (&amp;ldquo;bookmark&amp;rdquo; + &amp;ldquo;applet&amp;rdquo;) is a little bit of Javascript embedded in a link. They usually take some information about the page you&amp;rsquo;re looking at and do something useful with it. For example, if you highlight some text on this page and click &lt;a href=&#34;javascript:alert(document.getSelection())&#34;&gt;this demo&lt;/a&gt; it displays the highlighted text in a message box. This particular example is not very useful, but it demonstrates how a bookmarklet can grab information from the displayed web page and do something with it. The following shows what&amp;rsquo;s really in the link:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;a href=&amp;quot;javascript:alert(document.getSelection())&amp;quot;&amp;gt;this demo&amp;lt;/a&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Bookmarklets more complex than this one may define and call functions, but they&amp;rsquo;re still all packed into a &lt;code&gt;a&lt;/code&gt; element&amp;rsquo;s &lt;code&gt;href&lt;/code&gt; attribute. The lifehacker article has one that builds on this use of the &lt;code&gt;getSelection()&lt;/code&gt; method by looking up the selected text in an acronym dictionary.&lt;/p&gt;
&lt;p&gt;Running bookmarklets on the page that contains them is rarely interesting, but when you keep these bookmarklets in your bookmarks menu (or, more likely, on your bookmarks toolbar), you can run them against anything, which is when they get valuable. You could drag that &amp;ldquo;this demo&amp;rdquo; link to your bookmarks toolbar and then highlight text on any web page, click that link, and see the highlighted text displayed in a message box, but you&amp;rsquo;d be better off dragging the more useful ones below to your toolbar:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;javascript:function%20homepage(currentURL)%20%7BnewURL%20=%20currentURL.replace(/(%5C.%5BA-Za-z%5D%7B2,4%7D)%5B%5C/%5C?%5C#%5D.*/,&#39;%241&#39;);window.location.href%20=%20newURL;%7D;%20homepage(location.href);&#34;&gt;site&amp;rsquo;s homepage&lt;/a&gt; goes right to a particular site&amp;rsquo;s homepage—for example, to &lt;a href=&#34;http://www.snee.com&#34;&gt;http://www.snee.com&lt;/a&gt; from this page.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;javascript:window.location.href=%22http://www.snee.com/bookmarklets/searchform.cgi?url=%22+location.href&#34;&gt;search site&lt;/a&gt; lets you search the website of the displayed page with a minimum of keystrokes. Click it to display a search form with one field (for example, &lt;a href=&#34;http://www.snee.com/bookmarklets/searchform.cgi?url=http://lifehacker.com/software/feature/special-geek-to-live-129141.php&#34;&gt;this form&lt;/a&gt; if you were looking at the lifehacker article), fill out that field with a string to search for, click &amp;ldquo;Go,&amp;rdquo; and Google searches that site for you.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;javascript:function%20parentdir(currentURL)%20%7B%20newURL%20=%20currentURL.replace(/(.*%5C/).*%5C/.*/,&#39;%241&#39;);window.location.href%20=%20newURL;%20%7D;%20parentdir(location.href)&#34;&gt;cd ..&lt;/a&gt; goes to the parent directory of the displayed page&amp;rsquo;s directory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;javascript:location.href=&#39;http://www.google.com/search?as_lq=&#39;+document.location.href;&#34;&gt;backlink&lt;/a&gt; tells Google to list pages that link to the displayed page.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other bookmarklets I use include &lt;a href=&#34;http://del.icio.us/help/buttons&#34;&gt;post to del.icio.us&lt;/a&gt; and some of the RDFa bookmarklets that I &lt;a href=&#34;https://www.bobdc.com/blog/digging-rdfa&#34;&gt;mentioned recently&lt;/a&gt;. I also learned from the lifehacker article about &lt;a href=&#34;https://www.squarefree.com/bookmarklets/&#34;&gt;Jesse&amp;rsquo;s Bookmarklets Site&lt;/a&gt;, which has many great ones. I was pleased to see (after an admittedly quick scan) that only my backlink bookmarklet above had an equivalent there—it made me feel like my others were somewhat original. I&amp;rsquo;m especially proud of the &amp;ldquo;search site&amp;rdquo; one, which is like the &amp;ldquo;site&amp;rsquo;s homepage&amp;rdquo; one with a few extra steps thrown in. I use these two every day; for example, a web search sends me to some page, I wonder &amp;ldquo;who are these guys?&amp;rdquo; and I click &lt;a href=&#34;javascript:function%20homepage(currentURL)%20%7BnewURL%20=%20currentURL.replace(/(%5C.%5BA-Za-z%5D%7B2,4%7D)%5B%5C/%5C?%5C#%5D.*/,&#39;%241&#39;);window.location.href%20=%20newURL;%7D;%20homepage(location.href);&#34;&gt;site&amp;rsquo;s homepage&lt;/a&gt; on my toolbar. Or, I get frustrated trying to find something on a site that doesn&amp;rsquo;t offer a Search feature, and I just use my own: &lt;a href=&#34;javascript:window.location.href=%22http://www.snee.com/bookmarklets/searchform.cgi?url=%22+location.href&#34;&gt;search site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;2009-09-22 update&lt;/em&gt; When looking at a Wikipedia page, this one should take you to the corresponding DBpedia page: &lt;a href=&#34;javascript:location.href=location.href.replace(/en.wikipedia.org%5C/wiki/,%22dbpedia.org%5C/page%22)&#34;&gt;wp -&amp;gt; dbpedia&lt;/a&gt;. (If you use it on a page that you reached via redirection, you&amp;rsquo;ll get the DBpedia page about the redirection data. For example, compare the results of trying it on the &lt;a href=&#34;http://en.wikipedia.org/wiki/Barack_Obama&#34;&gt;http://en.wikipedia.org/wiki/Barack_Obama&lt;/a&gt; and the &lt;a href=&#34;http://en.wikipedia.org/wiki/Obama&#34;&gt;http://en.wikipedia.org/wiki/Obama&lt;/a&gt; Wikipedia pages.)&lt;/p&gt;
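&lt;p&gt;The rewrite that this bookmarklet performs is just a substring replacement; here is the same Wikipedia-to-DBpedia mapping in Python for clarity:&lt;/p&gt;

```python
# The same mapping that the bookmarklet's JavaScript replace() call performs:
url = "http://en.wikipedia.org/wiki/Barack_Obama"
dbpedia = url.replace("en.wikipedia.org/wiki", "dbpedia.org/page")
assert dbpedia == "http://dbpedia.org/page/Barack_Obama"
```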
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>In single-source publishing, what do you call the source?</title>
      <link>https://www.bobdc.com/blog/in-singlesource-publishing-wha/</link>
      <pubDate>Fri, 25 Apr 2008 08:48:34 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/in-singlesource-publishing-wha/</guid>
      
      
      <description><div>The editorial XML?</div><div>&lt;img id=&#34;id202440&#34; src=&#34;https://www.bobdc.com/img/main/editorial.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;single-source publishing diagram&#34;/&gt;
&lt;p&gt;The idea of single source publishing is at least as old as SGML. You store one version of your content with all the information necessary to create the other versions (typically, a print version plus the electronic formats du jour), and then you develop automated routines to create those other versions from the central, &amp;ldquo;single&amp;rdquo; source. The central content gets updated as necessary, and you create new publications by running the appropriate routines to generate the other formats. By making changes in one place and generating the other versions with automated routines, you avoid the mistakes that result from trying to make the same change in multiple places. It&amp;rsquo;s a lot like creating different versions of a software product from a base set of source code, and many of the same tools are often used.&lt;/p&gt;
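&lt;p&gt;As a toy illustration of the pattern (the element names and output routines here are invented for the example, not taken from any real editorial DTD), one small source document can feed routines that each generate a different output:&lt;/p&gt;

```python
import xml.etree.ElementTree as ET

# One tiny "editorial" source document (element names invented for this sketch).
EDITORIAL = """<article>
  <title>Sample</title>
  <para>One source, many outputs.</para>
</article>"""

def to_html(source):
    """Generate an HTML rendition from the editorial source."""
    root = ET.fromstring(source)
    paras = "".join("<p>%s</p>" % p.text for p in root.iter("para"))
    return "<html><h1>%s</h1>%s</html>" % (root.findtext("title"), paras)

def to_text(source):
    """Generate a plain-text rendition from the same source."""
    root = ET.fromstring(source)
    return "\n\n".join([root.findtext("title")] +
                       [p.text for p in root.iter("para")])
```

&lt;p&gt;Real systems do the same thing at much larger scale, usually with XSLT or similar transformation pipelines driven from the one editorial source.&lt;/p&gt;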
&lt;p&gt;Output formats can include various kinds of XML, such as XHTML, XSL-FO, or homegrown XML formats. To describe the central format I&amp;rsquo;ve heard the term &amp;ldquo;editorial XML&amp;rdquo; used, because it&amp;rsquo;s XML, but it plays a different role from the output XML formats: it&amp;rsquo;s the version that the editors maintain. Its structure is governed by the editorial DTDs or schemas.&lt;/p&gt;
&lt;p&gt;A homegrown XML output format is sometimes called the &amp;ldquo;delivery&amp;rdquo; XML. In his XML 2006 presentation &lt;a href=&#34;http://2006.xmlconference.org/proceedings/89/presentation.html&#34;&gt;Case Study: Managing XML for a Global Content Delivery Platform&lt;/a&gt;, my former LexisNexis colleague Marc Basch described how different business units within the company had their own editorial XML formats and converted these to a particular delivery format that they developed for the central platform that would aggregate them.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve heard this editorial/delivery distinction between sets of XML content elsewhere, but a Google search on the term doesn&amp;rsquo;t find much besides Marc&amp;rsquo;s paper. Has anyone else heard of these terms being used in a production system?&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.dpawson.co.uk&#34; title=&#34;http://www.dpawson.co.uk&#34;&gt;Dave Pawson&lt;/a&gt; on &lt;a href=&#34;#comment-1834&#34;&gt;April 25, 2008 11:32 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;No Bob, the editorial version is the M$ Word version!&lt;/p&gt;
&lt;p&gt;I use master source for the XML master, however generated.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://ionrock.org/blog/&#34; title=&#34;http://ionrock.org/blog/&#34;&gt;Eric Larson&lt;/a&gt; on &lt;a href=&#34;#comment-1835&#34;&gt;April 25, 2008 12:34 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve noticed the master version is traditionally the original FrameMaker/Word version as well. It is not ideal, but with tools like WebWorks (which uses an XML intermediary format for processing), creating an XML based workflow while keeping the author tools seems possible.&lt;/p&gt;
&lt;p&gt;I think the immediate future will be focused on an author focused master with an XML based master that acts almost as an index. Again, it would be nice to just have one resource as a master, but I think the current technical documentation landscape doesn&amp;rsquo;t quite support it.&lt;/p&gt;
&lt;p&gt;I really do think that using tools like WebWorks can make single sourcing practical from a technical standpoint. Essentially it allows a gateway for tools to add to a structured data store. I know it has been considered primarily a help tool in the past, but for those folks who deal with XML, it really is much closer to a document processing XProc-ish tool built around XSLT.&lt;/p&gt;
&lt;p&gt;I should also mention that I used to work for WebWorks, so I&amp;rsquo;m a hair biased. From a technical standpoint, it is a very powerful tool that I believe many XML developers should consider taking a look at for document processing projects.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://seanmcgrath.blogspot.com&#34; title=&#34;http://seanmcgrath.blogspot.com&#34;&gt;Sean McGrath&lt;/a&gt; on &lt;a href=&#34;#comment-1836&#34;&gt;April 26, 2008 2:06 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;The editorial XML version is the color-coded ODT XML version :-)&lt;/p&gt;
&lt;p&gt;I use the term &amp;ldquo;normative copy&amp;rdquo; rather than master/editorial myself. I.e. it is the stream of bytes that must be obeyed and, in the event of a dispute about correctness of content or presentation, acceded to.&lt;/p&gt;
&lt;p&gt;Sean&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://blog.reallysi.com&#34; title=&#34;http://blog.reallysi.com&#34;&gt;Ed Stevenson&lt;/a&gt; on &lt;a href=&#34;#comment-1845&#34;&gt;May 5, 2008 11:53 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In the publishing industry, I have definitely heard the term “delivery XML” used commonly to refer to various XML outputs (mostly for web or third party aggregators). However, I have not really heard a common term for the source XML. And not sure if I&amp;rsquo;ve ever heard people in that industry refer to it as &amp;ldquo;editorial XML&amp;rdquo; (and that may be because many in that industry are still getting XML after or as a part of print production processes).&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Windows command line text processing with Javascript</title>
      <link>https://www.bobdc.com/blog/windows-command-line-text-proc/</link>
      <pubDate>Tue, 22 Apr 2008 09:11:18 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/windows-command-line-text-proc/</guid>
      
      
      <description><div>Or, technically, with JScript.</div><div>&lt;p&gt;&lt;em&gt;Update: don&amp;rsquo;t bother with CScript. See &lt;a href=&#34;https://www.bobdc.com/blog/javascript-from-the-command-li&#34;&gt;Javascript from the command line&lt;/a&gt; to learn about doing this with Rhino; you can also do it with node.js.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I recently had to write a script that would make global replacements in a text file on a client&amp;rsquo;s machine. Much as I love python (I look forward to telling you about the fun I&amp;rsquo;m having with &lt;a href=&#34;http://code.google.com/p/python-calais/&#34;&gt;python-calais&lt;/a&gt;), it was one of those tasks that just cried out &amp;ldquo;perl&amp;rdquo;. Unfortunately, I couldn&amp;rsquo;t take perl&amp;rsquo;s existence for granted on the client&amp;rsquo;s computers, and having them install it was too much trouble.&lt;/p&gt;
&lt;blockquote id=&#34;id202455&#34; class=&#34;pullquote&#34;&gt;Between the string manipulation functions and the regular expression, standard input, and standard output support, the combination of cscript and JScript gives all Windows machines a powerful text processing tool right out of the box.&lt;/blockquote&gt;
&lt;p&gt;I could, however, take everything in a typical Windows installation for granted, and it turned out that Microsoft&amp;rsquo;s &lt;a href=&#34;http://en.wikipedia.org/wiki/Jscript&#34;&gt;JScript&lt;/a&gt; implementation of Javascript could do everything I needed. (&lt;a href=&#34;http://en.wikipedia.org/wiki/VBScript&#34;&gt;VBScript&lt;/a&gt; equivalents of what I describe here shouldn&amp;rsquo;t be much different.) Given Javascript&amp;rsquo;s roots as a web scripting language, I knew that, for safety reasons, it offers little in the way of file input and especially output, but I learned that basic I/O is not only fairly easy, it can be done with &lt;a href=&#34;http://en.wikipedia.org/wiki/Standard_streams&#34;&gt;standard input&lt;/a&gt; and output, so that a command in a batch file can pipe content through JScript scripts just as it can with perl or python scripts.&lt;/p&gt;
&lt;p&gt;The Windows utility that lets you run scripts from the command line is called &lt;a href=&#34;http://msdn.microsoft.com/archive/default.asp?url=https://www.bobdc.com/archive/en-us/wsh/htm/wsRunCscript.asp&#34;&gt;cscript.exe&lt;/a&gt;. It assumes that a script file with an extension of &amp;ldquo;js&amp;rdquo; is a JScript program, although it can run VBScript programs as well.&lt;/p&gt;
&lt;p&gt;The following script performs a global replacement using the target and replacement strings passed as command line parameters. It demonstrates a few nice things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;WScript.Echo&lt;/code&gt; writes to standard output. CScript is essentially a less GUI-oriented version of the &lt;a href=&#34;http://msdn.microsoft.com/archive/default.asp?url=https://www.bobdc.com/archive/en-us/wsh/htm/wsRunWscript.asp&#34;&gt;wscript.exe&lt;/a&gt; Windows scripting engine, so many basic library calls include the latter&amp;rsquo;s name.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;a href=&#34;http://msdn2.microsoft.com/en-us/library/at5ydy31(VS.85).aspx&#34;&gt;WScript object&lt;/a&gt; has a &lt;code&gt;StdIn&lt;/code&gt; property for reading from standard input.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;WScript.Arguments&lt;/code&gt; stores the command line parameters used when invoking the script.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You can use regular expressions with UNIXy syntax. To use a variable such as &lt;code&gt;target&lt;/code&gt; in a regular expression, you need the &lt;code&gt;RegExp&lt;/code&gt; object, as the sample script demonstrates, but the comment preceding that line shows how a hardcoded regular expression would not need this object.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;!-- --&gt;
&lt;pre&gt;&lt;code&gt;// replace.js: globally replace one string with another.
// See directions() for syntax.

function directions() {
    WScript.Echo(&amp;quot;Enter\n&amp;quot;);
    WScript.Echo(&amp;quot;cscript //Nologo replace.js targetstring &amp;quot; +
                 &amp;quot;replstring &amp;lt; infile.txt &amp;gt; outfile.txt&amp;quot;);
    WScript.Echo(&amp;quot;\nEnclose strings that have spaces in quotation marks.&amp;quot;);
}

function processTextStream() {
    var target = WScript.Arguments.Item(0);
    var newString = WScript.Arguments.Item(1);
    // ReadAll() pulls everything from standard input at once.
    var text = WScript.StdIn.ReadAll();
    // If I wasn&#39;t passing a variable as the first argument
    // of replace, I could use normal regex syntax like
    // text.replace(/Robert/g,&amp;quot;Bob&amp;quot;)
    text = text.replace(new RegExp(target, &amp;quot;g&amp;quot;), newString);
    WScript.Echo(text);
}

// --------------------------------------------------

if (WScript.Arguments.length &amp;lt; 2) {
    directions();
} else {
    processTextStream();
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As the directions show, you could run this at a Windows command line like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   cscript //Nologo replace.js Robert Bob &amp;lt; oldaddrbook.txt &amp;gt; newaddrbook.txt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you&amp;rsquo;re redirecting the output to a file and don&amp;rsquo;t want the Microsoft cscript banner in that file, don&amp;rsquo;t forget the &lt;a href=&#34;http://www.amazon.com/No-Logo-Space-Choice-Jobs/dp/0312421435&#34;&gt;//Nologo&lt;/a&gt; parameter. &lt;a href=&#34;http://msdn2.microsoft.com/en-us/library/0fs17b0s(VS.85).aspx&#34;&gt;Several other parameters&lt;/a&gt; are available.&lt;/p&gt;
&lt;p&gt;Between the string manipulation functions and the regular expression, standard input, and standard output support, this combination of the cscript engine and the J(ava)Script programming language gives all Windows machines a powerful text processing tool right out of the box.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-1830&#34;&gt;April 22, 2008 12:19 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nifty!&lt;/p&gt;
&lt;p&gt;I guess you could run RDF stuff in a single-file too, using &lt;a href=&#34;http://www.jibbering.com/rdf-parser/&#34;&gt;http://www.jibbering.com/rdf-parser/&lt;/a&gt; :)&lt;/p&gt;
&lt;p&gt;Haven&amp;rsquo;t played with this at all yet; am mostly MacOSX-based lately, where addressbook and pubsub APIs are getting my attention. Is it possible to access the equivalent in Windows from .js?&lt;/p&gt;
&lt;p&gt;By orlando on &lt;a href=&#34;#comment-1831&#34;&gt;April 23, 2008 11:37 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I would suggest you use awk95.exe:&lt;/p&gt;
&lt;p&gt;[&amp;hellip;]awk was chosen since it is a very small download (compared with Perl or WSH/VB) and accomplishes the task of modifying configuration files upon installation. Brian Kernighan&amp;rsquo;s &lt;a href=&#34;http://cm.bell-labs.com/cm/cs/who/bwk/&#34;&gt;http://cm.bell-labs.com/cm/cs/who/bwk/&lt;/a&gt; site has a compiled native Win32 binary, &lt;a href=&#34;http://cm.bell-labs.com/cm/cs/who/bwk/awk95.exe&#34;&gt;http://cm.bell-labs.com/cm/cs/who/bwk/awk95.exe&lt;/a&gt; which you must save with the name awk.exe rather than awk95.exe.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://httpd.apache.org/docs/2.2/platform/win_compiling.html&#34;&gt;http://httpd.apache.org/docs/2.2/platform/win_compiling.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1832&#34;&gt;April 23, 2008 12:57 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I love awk, and actually pulled out my little gray book just recently&amp;ndash;it&amp;rsquo;s one of the few books that I ever owned two copies of so that I could keep one at work and one at home. (When I first discovered SGML, I was writing awk scripts to convert XyWrite files to online help source for Windows, OS/2, mainframes, and Unix flavors.) It&amp;rsquo;s certainly easier to install than perl, but for the situation I described above, I was better off not telling this client to download and install anything.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>ebook sales</title>
      <link>https://www.bobdc.com/blog/ebook-sales/</link>
      <pubDate>Thu, 17 Apr 2008 08:44:59 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/ebook-sales/</guid>
      
      
      <description><div>Keeping track.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.idpf.org/doc_library/industrystats.htm&#34;&gt;&lt;img id=&#34;id202440&#34; src=&#34;http://www.idpf.org/doc_library/statistics/images/Trade%20Stats.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;ebook sales charts&#34; width=&#34;300px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The big question about ebooks is typically this: there&amp;rsquo;s hardware, there&amp;rsquo;s software, and there are titles, but how well are books &lt;em&gt;selling&lt;/em&gt;? A good way to keep track is &lt;a href=&#34;http://www.idpf.org/doc_library/industrystats.htm&#34;&gt;the IDPF&amp;rsquo;s industry stats page&lt;/a&gt;, which was updated a few weeks ago to include January 2008 ebook sales numbers. (Note the caveats below the chart such as &amp;ldquo;Retail numbers may be as much as double the above figures due to industry wholesale discounts.&amp;rdquo;)&lt;/p&gt;
&lt;p&gt;An &lt;a href=&#34;http://toc.oreilly.com/2008/04/lbf-what-ex-smokers-and-ebook-early-adopters-have-in-common.html&#34;&gt;O&amp;rsquo;Reilly Tools of Change blog posting&lt;/a&gt; on Penguin Group Digital Director Genevieve Shore&amp;rsquo;s presentation at the London Book Fair has additional interesting information on ebook sales. Andrew Savikas offers the following highlights from her talk:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Though Penguin USA has been selling ebooks for 10 years, 2007 was the first time they saw &amp;ldquo;interesting revenue&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the first two months of 2008, Penguin USA has sold more ebooks than in all of 2007&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Readers now expect new frontlist titles to be available as ebooks at the same time they show up in bookstores&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;a href=&#34;http://toc.oreilly.com/&#34;&gt;Tools of Change&lt;/a&gt; page is definitely worth reading if you&amp;rsquo;re interested in publishing technology. They offer feeds for the blog, for the news, and a combined feed, which is the one I subscribe to.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>Digging RDFa</title>
      <link>https://www.bobdc.com/blog/digging-rdfa/</link>
      <pubDate>Mon, 14 Apr 2008 11:24:29 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/digging-rdfa/</guid>
      
      
      <description><div>More data to play with, more tools to play with it.</div><div>&lt;img id=&#34;id202439&#34; src=&#34;https://www.bobdc.com/img/main/diggrdfa.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;digg.com with RDFa highlighted&#34;/&gt;
&lt;p&gt;RDFa seems to be picking up more momentum in the last few weeks. The &lt;a href=&#34;https://www.bobdc.com/blog/the-future-of-rdfa#comments&#34;&gt;formerly skeptical&lt;/a&gt; Taylor Cowan is &lt;a href=&#34;http://www.oreillynet.com/xml/blog/2008/04/they_knew_the_train_would_come.html&#34;&gt;liking it more&lt;/a&gt;, and I learned from the &lt;a href=&#34;http://rdfa.info/2008/04/04/digg-starts-using-rdfa/&#34;&gt;RDFa blog&lt;/a&gt; that &lt;a href=&#34;http://digg.com/&#34;&gt;Digg&lt;/a&gt; has lots of RDFa—five triples of information for some stories, so there are some simple but cool applications waiting to be written around those.&lt;/p&gt;
&lt;p&gt;I was a little behind in taking advantage of the &lt;a href=&#34;http://www.w3.org/2006/07/SWD/RDFa/impl/js/&#34;&gt;RDFa bookmarklets&lt;/a&gt;, but Ben Adida pointed me to them when I asked about an up-to-date RDFa extractor. I set the GetN3 one to send the N3 triples right to Emacs, and it&amp;rsquo;s great when looking at a web page like Digg&amp;rsquo;s home page to click a button and see all the triples appear in Emacs.&lt;/p&gt;
&lt;p&gt;The first few times I tried the RDFa Highlight bookmarklet, which puts red rectangles around all the parts of a web page that have RDFa metadata assigned, I didn&amp;rsquo;t think it was very useful; I thought, OK, red rectangles, what can I do with them? My experience with Digg changed my mind. A single button click gives a very quick and intuitive display of how much RDFa a page offers to work with.&lt;/p&gt;
&lt;p&gt;Ironically, the RDFa bookmarklets page has no RDFa in it as I write this. My blog&amp;rsquo;s home page has very little, and the bookmarklets can&amp;rsquo;t seem to find it, but they work fine on my Permalink pages (like &lt;a href=&#34;https://www.bobdc.com/blog/digging-rdfa&#34;&gt;this one&lt;/a&gt;), which have a bit more anyway.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Two-side printing on a one-sided printer</title>
      <link>https://www.bobdc.com/blog/twoside-printing-on-a-onesided/</link>
      <pubDate>Wed, 09 Apr 2008 07:57:32 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/twoside-printing-on-a-onesided/</guid>
      
      
      <description><div>Use less paper, carry around less paper.</div><div>&lt;p&gt;I recently wanted to print a 184-page spec to read, but I didn&amp;rsquo;t want to carry around a pile of paper that big. I have no access to a two-sided printer, but I figured out how to create two-sided pages with a one-side printer:&lt;/p&gt;
&lt;img id=&#34;id202410&#34; src=&#34;https://www.bobdc.com/img/main/printer.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;printer picture&#34;/&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Make a mark at one end of the first blank page waiting to be printed so that you know which side gets printed and which end will be the top.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Print the odd pages of the range of pages you want, in reverse order.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Put the printed pages back in the printer, oriented so that page 2 will be printed on the other side of page 1, page 4 on the other side of page 3, and so forth. The mark you put on the first page in step 1 should give you a clue about how to do this, but if you&amp;rsquo;re not sure, print just the first few pages of your document this way to make sure you have it right. With my Brother MFC-7220 printer, I put the pages back in with the printed side up and the top of the page at the front of the printer. (Update: If the last page you want to print is an odd-numbered page, don&amp;rsquo;t put it back in for the second run. For example, if you&amp;rsquo;re printing a total of 21 pages, don&amp;rsquo;t reload the piece of paper with page 21 on it for this step. You want page 20 to be on the other side of page 19.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Print even pages of the same range in reverse order.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The double reverse puts the final product in the correct order.&lt;/p&gt;
&lt;p&gt;I first worked this out printing a PDF file from &lt;a href=&#34;https://www.bobdc.com/blog/a-nice-windows-alternative-to&#34;&gt;Foxit Reader&lt;/a&gt;, which lets you set all the necessary details from the &lt;strong&gt;Print&lt;/strong&gt; dialog box. Microsoft Word&amp;rsquo;s &lt;strong&gt;Print&lt;/strong&gt; dialog box lets you pick &amp;ldquo;Odd Pages&amp;rdquo; or &amp;ldquo;Even Pages&amp;rdquo; instead of &amp;ldquo;All pages in range&amp;rdquo;, but you have to click the &lt;strong&gt;Options&lt;/strong&gt; button to find the &lt;strong&gt;Reverse print order&lt;/strong&gt; checkbox. In Open Office Writer, click the &lt;strong&gt;Options&lt;/strong&gt; button on the &lt;strong&gt;Print&lt;/strong&gt; dialog box to find the &lt;strong&gt;Left pages&lt;/strong&gt;, &lt;strong&gt;Right pages&lt;/strong&gt;, and &lt;strong&gt;Reversed&lt;/strong&gt; check boxes.&lt;/p&gt;
&lt;p&gt;One more hint: if the pages come out of the printer warm and slightly curled after the first pass, it can reduce the chance of paper jams if you let them cool off before inserting them for the second pass.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Linked Data Planet program in place</title>
      <link>https://www.bobdc.com/blog/linked-data-planet-program-in/</link>
      <pubDate>Sun, 06 Apr 2008 12:31:19 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/linked-data-planet-program-in/</guid>
      
      
      <description><div>Big names and big ideas.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.linkeddataplanet.com/conference/conferencegrid.php&#34;&gt;&lt;img id=&#34;id202406&#34; src=&#34;http://www.linkeddataplanet.com/images/hdr_logo.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Linked Data Planet logo&#34; width=&#34;240px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://www.linkeddataplanet.com/conference/conferencegrid.php&#34;&gt;program&lt;/a&gt; for the Linked Data Planet conference that I&amp;rsquo;m co-chairing with Ken North in New York City in June is just about all set. The list of speakers has a great mix of well-known names in the Linked Data world (which overlaps quite a bit with the semantic web world) such as Tim Berners-Lee, Kingsley Idehen, Jim Hendler, and Jim Melton. We have hands-on people, as opposed to marketing people, from large and small software companies talking about tools that you can use to build applications with the increasing amount of linked data out there.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m especially looking forward to the &lt;a href=&#34;http://www.linkeddataplanet.com/conference/sessionsbyday.php#T8&#34;&gt;Linked Data Workshop panel&lt;/a&gt; that I&amp;rsquo;m moderating, in which seven experts in the field discuss approaches to the key issues involved in building linked data applications. My only regret is that this is scheduled opposite Seth Earley&amp;rsquo;s presentation on &lt;a href=&#34;http://www.linkeddataplanet.com/conference/sessionsbyday.php#T7&#34;&gt;the role of taxonomies and controlled vocabularies in data integration&lt;/a&gt;, because I&amp;rsquo;ve been reading up on taxonomies lately and wanted to hear what he has to say.&lt;/p&gt;
&lt;p&gt;At least I&amp;rsquo;ll get to meet him, and many other people I&amp;rsquo;ve always wanted to meet, in addition to seeing some old friends such as Uche Ogbuji and Walter Perry, two of my panel&amp;rsquo;s participants. Come join us, and then as the world of linked data technologies grows you&amp;rsquo;ll someday be able to say &amp;ldquo;I remember the first Linked Data conference, back in 2008&amp;hellip;&amp;rdquo;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>Vote for my brother&#39;s Radiohead remix</title>
      <link>https://www.bobdc.com/blog/vote-for-my-brothers-radiohead/</link>
      <pubDate>Thu, 03 Apr 2008 21:08:14 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/vote-for-my-brothers-radiohead/</guid>
      
      
      <description><div>But only if you get the joke.</div><div>&lt;p&gt;Radiohead has made the separate tracks for their new single &amp;ldquo;Nude&amp;rdquo; available and &lt;a href=&#34;http://www.radioheadremix.com/&#34;&gt;invited people to submit remixes&lt;/a&gt;. My brother Peter, who &lt;a href=&#34;http://www.musicforpicture.com&#34;&gt;scores TV commercials&lt;/a&gt; for a living, has submitted a &lt;a href=&#34;http://www.mcylinder.com&#34;&gt;remix&lt;/a&gt; that&amp;rsquo;s hilarious if you know two things in advance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Radiohead lead guitarist Jonny Greenwood won the 2006 British Composer Awards prize for a classical piece titled &lt;a href=&#34;http://blog.wired.com/music/2006/10/radioheads_john.html&#34;&gt;Popcorn Superhet Receiver&lt;/a&gt;. This wasn&amp;rsquo;t some &lt;a href=&#34;http://en.wikipedia.org/wiki/Paul_McCartney&#39;s_Liverpool_Oratorio&#34;&gt;Liverpool Oratorio&lt;/a&gt; pop star attempt to move into &amp;ldquo;serious&amp;rdquo; territory by writing for a classical ensemble; he studied viola in his youth, served as the BBC&amp;rsquo;s composer in residence in 2004, and is considered to be somewhat of an expert on French composer Olivier Messiaen.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A 1972 top ten hit in the US and UK described by &lt;a href=&#34;http://en.wikipedia.org/wiki/Popcorn_(instrumental)&#34;&gt;Wikipedia&lt;/a&gt; as &amp;ldquo;the first primarily electronic-based piece of music to reach the American popular music charts&amp;rdquo; was titled &amp;ldquo;Popcorn&amp;rdquo;. You can hear it on &lt;a href=&#34;http://www.youtube.com/watch?v=9N4ckFN96-k&#34;&gt;YouTube&lt;/a&gt; while watching some forgettable visuals. The &amp;ldquo;Hot Butter&amp;rdquo; version, although not the original, has some proto-disco strings and is a classic among gimmicky instrumentals.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you&amp;rsquo;ve heard &amp;ldquo;Nude&amp;rdquo; and remember &amp;ldquo;Popcorn&amp;rdquo;, you should enjoy Peter&amp;rsquo;s &lt;a href=&#34;http://www.radioheadremix.com/remix/?id=349&#34;&gt;More Popcorn, Less Superhet Receiver&lt;/a&gt;. If so, please vote for it. Jonny Greenwood seems to smile even less than Thom Yorke, but this should help.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.radioheadremix.com/remix/?id=349&#34;&gt;&lt;img id=&#34;id202409&#34; src=&#34;http://upload.wikimedia.org/wikipedia/commons/thumb/0/09/Popcorn02.jpg/250px-Popcorn02.jpg&#34; border=&#34;0&#34; align=&#34;center&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;popcorn picture&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://timothyhorrigan.com&#34; title=&#34;http://timothyhorrigan.com&#34;&gt;Timothy Horrigan&lt;/a&gt; on &lt;a href=&#34;#comment-1771&#34;&gt;April 5, 2008 12:50 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I voted for Peter&amp;rsquo;s remix. His remix was a vast improvement over the typical Radiohead fare.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s kind of a chintzy contest&amp;hellip; you have to pay $4.95 to download the tracks and there seems to be no prize for winning other than the honor of winning. But I think I will take a stab at it myself anyway.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://radioheadremix.com/remix/?id=938&#34; title=&#34;http://radioheadremix.com/remix/?id=938&#34;&gt;The Wombat&lt;/a&gt; on &lt;a href=&#34;#comment-1808&#34;&gt;April 10, 2008 1:01 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I like it (and voted). This is also one of my favs:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://radioheadremix.com/remix/?id=938&#34;&gt;http://radioheadremix.com/remix/?id=938&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The Wombat&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.radioheadremix.com/remix/?id=874&#34; title=&#34;http://www.radioheadremix.com/remix/?id=874&#34;&gt;Scott&lt;/a&gt; on &lt;a href=&#34;#comment-1818&#34;&gt;April 11, 2008 1:52 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I voted as well.&lt;br /&gt;
This is my current fave too.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>RDF and social networks</title>
      <link>https://www.bobdc.com/blog/rdf-and-social-networks/</link>
      <pubDate>Tue, 01 Apr 2008 09:13:20 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdf-and-social-networks/</guid>
      
      
      <description><div>Better than XML!</div><div>&lt;p&gt;Looking at Michael Pick&amp;rsquo;s video &lt;a href=&#34;http://www.vimeo.com/610179&#34;&gt;DataPortability - Connect, Control, Share, Remix&lt;/a&gt; and on the &lt;a href=&#34;http://www.dataportability.com/&#34;&gt;dataportability.com home page&lt;/a&gt;, I saw that RDF was included in a brief list of standards involved, and something occurred to me about the value of RDF in attempts to share data across applications such as social networking sites—in particular, why it&amp;rsquo;s better than XML for this.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.dataportability.com/&#34;&gt;&lt;img id=&#34;id202418&#34; src=&#34;https://www.bobdc.com/img/main/dpstandards.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;data portability standards logos&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;XML was invented for online publishing, but its popularity grew so quickly because of its value for sharing data between organizations that have different information infrastructures. When I give a class in XSLT, I begin by describing the visions people originally had of DTDs describing common information structures (I almost wrote &amp;ldquo;common vocabularies&amp;rdquo; there, but the difference is precisely the point of this posting) that would let different business partners in the same field share data more easily.&lt;/p&gt;
&lt;p&gt;Many of these visions worked out very well, but many people are still waiting for the DTD or schema of their field&amp;rsquo;s information. (I bring this up in XSLT classes because many people got tired of waiting for relevant DTD standards and decided to just accept others&amp;rsquo; XML in whatever format and convert it as needed upon arrival. This practice has been a big driver of XSLT&amp;rsquo;s success.) DTD development is complex because in addition to identifying common vocabularies, you must spell out the relationships between these pieces of information. In other words, you must do some data modeling. There is a lot more heuristic gut reaction involved in this than in designing relational databases, where the straightforward procedure of &lt;a href=&#34;http://en.wikipedia.org/wiki/Database_normalization&#34;&gt;normalization&lt;/a&gt; can guide many of your decisions about data relationships. DTDs also require you to decide which information should be wrapped in containers, which should be stored in attributes, which need unique IDs&amp;hellip; there&amp;rsquo;s a great payoff if you do this work, but it&amp;rsquo;s a lot of work, and it&amp;rsquo;s especially difficult for committees of people from different organizations to work together on this.&lt;/p&gt;
&lt;p&gt;Defining a vocabulary instead of a DTD is the low-hanging fruit. (I&amp;rsquo;m deliberately using the term &amp;ldquo;vocabulary&amp;rdquo; instead of taxonomy or ontology to keep things simple, but the tools and techniques of those fields have much to contribute.) It reduces the work not by simplifying it, but by narrowing the scope: you forget about the data structures. If you want to just define a list of terms and exchange collections of field name/value pairs whose names correspond to that list of terms—or even to several lists, as long as each is clearly identified—RDF makes it very simple, and it&amp;rsquo;s even more portable than XML. With the right vocabularies, I can deliver my contact information, my Facebook and LinkedIn IDs, and any other data that deserves to be portable without worrying about hierarchies in this set of information, or which order they should be in, or which should be in attributes and which should be in elements.&lt;/p&gt;
&lt;p&gt;There is a bit of irony here, in that what turned many people off from RDF was the ugliness of the XML used to represent data structures such as &lt;a href=&#34;http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#containers&#34;&gt;containers&lt;/a&gt;. RDF can help people get a lot of useful work done if they ignore these data structures (and &lt;a href=&#34;http://www.w3.org/2001/10/stripes/&#34;&gt;striping&lt;/a&gt;). This can leave them with some simple, intuitive RDF/XML, and they also have the option of ignoring RDF/XML and using something like &lt;a href=&#34;http://www.w3.org/DesignIssues/Notation3.html&#34;&gt;n3&lt;/a&gt; instead.&lt;/p&gt;
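&lt;p&gt;As a minimal sketch of what that looks like in n3 (the person and URIs here are made up for illustration; the property names come from the real FOAF vocabulary), portable contact data is just a flat set of name/value pairs attached to one identifier:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; .

# One subject, three name/value pairs: no nesting, ordering,
# or element-vs.-attribute decisions to make.
&amp;lt;http://example.com/people/bob&amp;gt;
    foaf:name     &amp;quot;Bob Example&amp;quot; ;
    foaf:mbox     &amp;lt;mailto:bob@example.com&amp;gt; ;
    foaf:homepage &amp;lt;http://example.com/&amp;gt; .
&lt;/code&gt;&lt;/pre&gt;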
&lt;p&gt;It&amp;rsquo;s nice to see that the data portability folks didn&amp;rsquo;t get scared off.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://norman.walsh.name/&#34; title=&#34;http://norman.walsh.name/&#34;&gt;Norman Walsh&lt;/a&gt; on &lt;a href=&#34;#comment-1764&#34;&gt;April 2, 2008 6:13 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Yep, all the way back &lt;a href=&#34;http://norman.walsh.name/2003/12/11/practicalrdf&#34;&gt;to 2003&lt;/a&gt;, I&amp;rsquo;ve been pitching RDF&amp;rsquo;s two strengths as aggregation and inference.&lt;/p&gt;
&lt;p&gt;For data portability, there&amp;rsquo;s a huge win to cheap aggregation. XML is my hammer of choice, but even I can&amp;rsquo;t claim that mixing random XML vocabularies is cheap.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>What do you do with your ebook prototypes?</title>
      <link>https://www.bobdc.com/blog/what-do-you-do-with-your-ebook/</link>
      <pubDate>Fri, 28 Mar 2008 18:26:39 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/what-do-you-do-with-your-ebook/</guid>
      
      
      <description><div>Or any other new electronic product that you&#39;re not ready to charge money for?</div><div>&lt;p&gt;I&amp;rsquo;ve given several talks on preparing strategies for the ebook market, and one key point is the value of early prototyping. &lt;a href=&#34;http://www.idpf.org&#34;&gt;epub&lt;/a&gt; format ebooks are good for this because they&amp;rsquo;re &lt;a href=&#34;https://www.bobdc.com/blog/creating-epub-files&#34;&gt;easy to make&lt;/a&gt; and fit well into the increasing use of &lt;a href=&#34;http://en.wikipedia.org/wiki/Agile_software_development&#34;&gt;agile software development&lt;/a&gt; practices in the publishing world. (Other formats aren&amp;rsquo;t very difficult to create either, and as we&amp;rsquo;ll see, may be worth including in your prototyping efforts.) You can put something together, show it to your management and customers, get their feedback, make tweaks to reflect the feedback, and continue this cycle as you refine the look and feel of your ebook.&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Showing it to your customers&amp;rdquo;, however, is easier said than done. How do you go about this? The important thing to remember (and to be honest, this all applies to nearly any new electronic delivery medium, not just ebooks) is that while ebooks may not be something you can charge customers money for, they still have value that can help you make money. For example, you can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Give away an ebook version of a book with the hardcopy version. As with the registration of a product, this can add a customer to a mailing list, and it acclimates them to ebook use, which a lot of publishers are interested in doing as they compare the costs of producing and distributing ebooks with the costs of producing and distributing hardcopy bound books.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Give away ebooks with only the first few chapters of a title to motivate customers to buy the complete hardcopy book. You can combine the beginning of several books into a new product sampler, as record companies have done for years.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Give away ebooks as an incentive for subscribing to a product for a longer period of time—for example, you could offer subscribers a free ebook with a two-year subscription to a product, but not with a one-year subscription. If you&amp;rsquo;re a magazine publisher, the book could compile the work of a particular author, or pieces from a particular section of the magazine such as recipes, travel writing, contemporary reviews of classic movies of the 1970s&amp;hellip;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Give away ebooks as an incentive to fill out surveys. Detailed customer surveys take time to fill out, and you don&amp;rsquo;t just want to know about the buying habits of customers with a lot of time on their hands. The inclusion of a free ebook at the end of a survey could help to accumulate more valuable survey data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Give away ebooks with public domain material related to your product line. (It looks like &lt;a href=&#34;http://radar.oreilly.com/archives/2008/03/penguins-missed-ebook-opportun.html&#34;&gt;Penguin is already doing this&lt;/a&gt;.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Even if you don&amp;rsquo;t use ebooks as incentives for filling out customer surveys, most of these ideas will help you gather data that helps you develop an ebook strategy. Making ebooks available in multiple formats will eventually give you ideas about which formats are more popular with your potential customers. You can also see which content classes (recipes? reference material? humor? technical content? serial fiction?) are more popular with your potential ebook readers.&lt;/p&gt;
&lt;p&gt;Along with whetting your audience&amp;rsquo;s appetite for new titles in more traditional media, free ebooks can also help you explore their appetite for older titles. Publishers who&amp;rsquo;ve been around for a while often have a large backlist of content that isn&amp;rsquo;t valuable enough to print, bind, store, and ship, but still has enough accumulated long tail value that it can drive revenue if the production costs are low enough. With ebooks, those costs are pretty low.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By andrew on &lt;a href=&#34;#comment-1756&#34;&gt;April 1, 2008 11:48 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Penguin isn&amp;rsquo;t giving ebooks with public domain add-ins away, they&amp;rsquo;re (improbably) going to (try to) charge $8 for them.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>Customizing nxml to find your schemas automatically</title>
      <link>https://www.bobdc.com/blog/customizing-nxml-to-find-your/</link>
      <pubDate>Tue, 25 Mar 2008 10:46:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/customizing-nxml-to-find-your/</guid>
      
      
      <description><div>By namespace or document element.</div><div>&lt;p&gt;The first time I loaded an RDF/XML document into Emacs with &lt;a href=&#34;http://www.thaiopensource.com/nxml-mode/&#34;&gt;nxml mode&lt;/a&gt;, it automatically loaded the appropriate RELAX NG compact schema for me. I was especially impressed because RDF/XML has such a potentially tricky structure. (Perhaps too tricky, but that&amp;rsquo;s another topic.) In its default configuration, nxml automatically loads the appropriate schemas for RDF/XML, XHTML 1, RELAX NG, DocBook, and XSLT. This last one has been my only real XSLT development tool other than actual XSLT processors for years.&lt;/p&gt;
&lt;p&gt;For other document types, I&amp;rsquo;d go to the &lt;strong&gt;XML&lt;/strong&gt; menu that nxml adds to Emacs, pick &lt;strong&gt;Set Schema&lt;/strong&gt;, then &lt;strong&gt;File&amp;hellip;&lt;/strong&gt;, and then browse to the appropriate RELAX NG compact schema file. Because I edit a lot of XML files, this was adding up to a lot of time, and I just learned how to set up nxml to find most of the schemas I need automatically based on a document&amp;rsquo;s namespace or document element.&lt;/p&gt;
&lt;p&gt;All you need to do is to add new namespace/schema pairs to the right configuration file. The same configuration file lets me add more choices to the /&lt;strong&gt;XML&lt;/strong&gt;/&lt;strong&gt;Set Schema&lt;/strong&gt;/&lt;strong&gt;For Document Type&lt;/strong&gt; cascade menu so that if I create an empty new document with an extension of &amp;ldquo;xml&amp;rdquo; and nxml has no other clue about what schema to use, I can pick the document type off of this menu instead of browsing around my hard disk looking for the schema.&lt;/p&gt;
&lt;p&gt;When you assign a schema to a document with /&lt;strong&gt;Set Schema&lt;/strong&gt;/&lt;strong&gt;File&amp;hellip;&lt;/strong&gt;, nxml asks if you want to &amp;ldquo;Save Schema Location to [directory of file being edited]&amp;rdquo;. If so, it adds an entry to that directory&amp;rsquo;s schemas.xml file that points at the document and at the schema so that the next time you load that document nxml will know what schema to use. After using nxml for years, I only just learned that the directory with the elisp files of nxml code has the central schemas.xml file where you can add elements and attributes to do everything I described above.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s all described on Dave Pawson&amp;rsquo;s &lt;a href=&#34;http://www.dpawson.co.uk/relaxng/nxml/schemaloc.html&#34;&gt;nxml-mode Schema location&lt;/a&gt; page, but I&amp;rsquo;ll summarize it here. The following element in this schemas.xml file adds an XHTML2 entry to the &lt;strong&gt;For Document Type&lt;/strong&gt; cascade menu so that picking it loads an XHTML 2 schema:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;typeId id=&amp;quot;XHTML2&amp;quot; uri=&amp;quot;xhtml2/xhtml2.rnc&amp;quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following tells nxml to load the same schema for a document whose document element is in the XHTML 2 namespace:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;namespace ns=&amp;quot;http://www.w3.org/2002/06/xhtml2/&amp;quot; uri=&amp;quot;xhtml2/xhtml2.rnc&amp;quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Another option is to automatically load a schema based on the document element, as opposed to the namespace. I wouldn&amp;rsquo;t want to do this for an &lt;code&gt;html&lt;/code&gt; element, because it might be an XHTML 1 or an XHTML 2 document. It&amp;rsquo;s handy for DITA documents, though, which don&amp;rsquo;t have specific namespaces. The following loads the appropriate schema for a DITA reference topic:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;documentElement localName=&amp;quot;reference&amp;quot; uri=&amp;quot;/usr/local/DITA-OT1.4.1/rnc/reference.rnc&amp;quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(To create RELAX NG compact schemas for DITA, use &lt;a href=&#34;http://www.thaiopensource.com/relaxng/trang.html&#34;&gt;trang&lt;/a&gt; and then set the &lt;code&gt;start&lt;/code&gt; pattern to equal reference.element in reference.rnc, task.element in task.rnc, and so forth.)&lt;/p&gt;
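&lt;p&gt;Putting the pieces together, a minimal central schemas.xml is just rules like the ones above wrapped in a &lt;code&gt;locatingRules&lt;/code&gt; document element in nxml&amp;rsquo;s locating rules namespace:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;locatingRules xmlns=&amp;quot;http://thaiopensource.com/ns/locating-rules/1.0&amp;quot;&amp;gt;
  &amp;lt;typeId id=&amp;quot;XHTML2&amp;quot; uri=&amp;quot;xhtml2/xhtml2.rnc&amp;quot;/&amp;gt;
  &amp;lt;namespace ns=&amp;quot;http://www.w3.org/2002/06/xhtml2/&amp;quot; uri=&amp;quot;xhtml2/xhtml2.rnc&amp;quot;/&amp;gt;
  &amp;lt;documentElement localName=&amp;quot;reference&amp;quot; uri=&amp;quot;/usr/local/DITA-OT1.4.1/rnc/reference.rnc&amp;quot;/&amp;gt;
&amp;lt;/locatingRules&amp;gt;
&lt;/code&gt;&lt;/pre&gt;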
&lt;p&gt;Sometimes I wonder if I&amp;rsquo;ll ever do large-scale editing with anything but Emacs. Then, I find yet another way to make Emacs even more convenient to use, and I know that making such a switch would be an even bigger, more difficult jump.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://norman.walsh.name/&#34; title=&#34;http://norman.walsh.name/&#34;&gt;Norman Walsh&lt;/a&gt; on &lt;a href=&#34;#comment-1747&#34;&gt;March 27, 2008 10:30 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I cooked up a Perl script (that I suppose I&amp;rsquo;m now committing to publish) that reads the same locating rules document so that I can say &amp;ldquo;xjparse foo.xml&amp;rdquo; and have it find the right schema automatically.&lt;/p&gt;
&lt;p&gt;(The name xjparse is an historical accident, but it&amp;rsquo;s what I&amp;rsquo;m used to typing).&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1748&#34;&gt;March 28, 2008 3:13 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Why perl?&lt;/p&gt;
&lt;p&gt;Just kidding. Looking forward to seeing it&amp;hellip;&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>How much is a frequent flyer mile worth?</title>
      <link>https://www.bobdc.com/blog/how-much-is-a-frequent-flyer-m/</link>
      <pubDate>Tue, 18 Mar 2008 09:05:57 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/how-much-is-a-frequent-flyer-m/</guid>
      
      
      <description><div>To US Airways, eight tenths of a cent.</div><div>&lt;p&gt;You fly on some airline, you register with their frequent flyer program (although each airline comes up with their own goofy name for the program, such as &amp;ldquo;Dividend Reward Value Sky Miles Plus&amp;rdquo;), you earn miles, and when you earn enough, you cash them in on a free flight. So they&amp;rsquo;re worth something to you, but what? If you cash in 25,000 miles on a flight that would have set you back $500, that doesn&amp;rsquo;t mean that each mile was worth 2 cents, although that is a &lt;a href=&#34;http://answers.yahoo.com/question/index?qid=20070527030928AASOvpO&#34;&gt;popular figure&lt;/a&gt; for estimates of a frequent flyer mile&amp;rsquo;s average value. Getting a seat on that plane is much more difficult using miles than just forking over the cash, so they&amp;rsquo;re really not equivalent.&lt;/p&gt;
&lt;p&gt;My wife and I recently had an opportunity to learn that it&amp;rsquo;s worth less than half that to US Airways. While waiting for an overbooked flight home, she took them up on their offer of a voucher for a round-trip ticket in the continental US if she waited a few hours for another flight. When I tried to use this voucher for a family vacation, I found that it had the same restrictions that using frequent flyer miles does: each flight has a limited number of seats that the airline gives to people who aren&amp;rsquo;t paying cash, and those seats go very quickly—especially when your local airport is not very big, and neither are its planes—so the voucher is as difficult to use as the miles.&lt;/p&gt;
&lt;blockquote id=&#34;id202401&#34; class=&#34;pullquote&#34;&gt;It becomes an eighth grade math problem.&lt;/blockquote&gt;
&lt;p&gt;The US Airways woman on the phone (another annoying part about using US Airways miles is that you have to book over the phone) offered me another option: the voucher could be used for a $200 credit on any flight that wasn&amp;rsquo;t completely full.&lt;/p&gt;
&lt;p&gt;Now it becomes an eighth-grade math problem. If using this voucher for a round-trip flight gets you the same choice of seats and flights as 25,000 frequent flyer miles, but US Airways will also give you $200 credit toward any flight for the same voucher, how much is a frequent flyer mile really worth? Eight tenths of a cent.&lt;/p&gt;
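&lt;p&gt;Worked out explicitly (the $200 and 25,000-mile figures are the ones quoted above; the snippet is just the arithmetic):&lt;/p&gt;

```python
# The same voucher buys either a 25,000-mile round trip
# or a $200 credit, so each mile prices out at:
voucher_value_dollars = 200
miles_per_round_trip = 25_000

cents_per_mile = voucher_value_dollars * 100 / miles_per_round_trip
print(cents_per_mile)  # prints 0.8
```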
&lt;p&gt;I gave up trying to book a family trip to the Grand Canyon with the voucher and my collection of frequent flyer miles, which on paper were enough to do the trick. It was just too difficult. And, shortly after, when I was shopping for a new Mastercard, I skipped all the ones that give you frequent flyer miles with purchases, now that I knew how little they were worth.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1723&#34;&gt;March 18, 2008 11:43 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think the airlines oughta ditch the miles and just go back to handing out S &amp;amp; H Green Stamps (yes, they still exist, in digital form even). At least you could redeem those for some durable object you could actually use. As it is, miles are the equivalent of &amp;ldquo;the first hit is free&amp;rdquo;: people fly airplanes so they can &amp;hellip; fly airplanes.&lt;/p&gt;
&lt;p&gt;What&amp;rsquo;s more, the existence of miles creates knock-on effects even for people who don&amp;rsquo;t care about them. Employees want to capture the miles gained on business trips for themselves, though on what possible ethical basis, I can&amp;rsquo;t see &amp;ndash; employees aren&amp;rsquo;t normally allowed to accept gifts of this sort. So billion-dollar companies no longer pay for business trips, forcing their employees to loan them money, typically borrowed from a bank, for 4-6 weeks while they go through their (usually paper-based) reimbursement process. I need that money more than my employer does. The whole thing is a cynical exercise in &amp;ldquo;sod the public&amp;rdquo;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/tourism">tourism</category>
      
    </item>
    
    <item>
      <title>Creating epub files</title>
      <link>https://www.bobdc.com/blog/creating-epub-files/</link>
      <pubDate>Thu, 13 Mar 2008 20:44:57 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/creating-epub-files/</guid>
      
      
      <description><div>With nothing but free tools.</div><div>&lt;p&gt;I&amp;rsquo;ve discussed the &lt;a href=&#34;http://www.idpf.org/&#34;&gt;epub&lt;/a&gt; eBook format here before when describing how I &lt;a href=&#34;https://www.bobdc.com/blog/free-epub-childrens-picture-bo&#34;&gt;created some epub children&amp;rsquo;s books&lt;/a&gt; from Project Gutenberg files for the OLPC XO. In another discussion of the format, I once saw someone complain that Adobe&amp;rsquo;s strong support of it was based on the fact that their tools are the only ones that can create epub files, but this is only true if we add a few qualifications: their tools are the only &lt;em&gt;commercial&lt;/em&gt; ones that can create epub files &lt;em&gt;for now&lt;/em&gt;.&lt;/p&gt;
&lt;blockquote class=&#34;pullquote&#34; style=&#39;width: 190px; font: bold 1.333em/1.125em &#34;Helvetica Neue&#34;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; &#39;&gt;It&#39;s easy to create epub eBooks with free software if you don&#39;t need commercial tools to create some XML (mostly XHTML) files and zip them up together. I certainly don&#39;t.&lt;/blockquote&gt;
&lt;p&gt;For all I know, other commercial tools can create them by now, but more importantly, you can easily create epub eBooks with free software if you don&amp;rsquo;t need commercial tools to create some XML files (mostly XHTML) and zip them up together. I certainly don&amp;rsquo;t. The &lt;a href=&#34;http://www.jedisaber.com/eBooks/tutorial.asp&#34;&gt;epub eBooks Tutorial&lt;/a&gt; is a good place to start, and don&amp;rsquo;t miss the &lt;a href=&#34;http://www.hxa7241.org/articles/content/epub-guide_hxa7241_2007.html&#34;&gt;Epub Format Construction Guide&lt;/a&gt;, especially on the tricky zipping issues described below. The latter also points to the &lt;a href=&#34;http://www.info-zip.org/Zip.html&#34;&gt;Info-zip&lt;/a&gt; free Windows zip utility and wisely skips the Tutorial&amp;rsquo;s recommendation to store certain files in an OEBPS directory in the zip file, a practice that is just a convention developed for a related format that predates epub.&lt;/p&gt;
&lt;p&gt;The trickiest part of creating an epub file is the one-line mimetype file, which looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;application/epub+zip
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This mimetype file must be first in the zip file, uncompressed, with no space or any other characters after that final &amp;ldquo;p&amp;rdquo;. The Epub Format Construction Guide shows the following command as an example of creating an epub file called EpubGuide-hxa7241.epub with mimetype first,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;zip -Xr9D EpubGuide-hxa7241.epub mimetype *
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;but I had better luck creating such a file in two stages, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;zip -q0X  EpubGuide-hxa7241.epub mimetype
zip -qXr9D  EpubGuide-hxa7241.epub *
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Either way, remember that this is far and away the most difficult part of creating an epub file, and it&amp;rsquo;s not very difficult, especially considering that once you have a mimetype file that works (which you can pull from another epub file) you can use it in all of your epub files with no changes.&lt;/p&gt;
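&lt;p&gt;If you&amp;rsquo;d rather script the packaging than wrestle with zip&amp;rsquo;s flags, the same two-stage approach can be sketched with Python&amp;rsquo;s standard zipfile module (just an illustration; the container.xml content below is a placeholder, not a real container file):&lt;/p&gt;

```python
import zipfile

# Stage 1: the mimetype entry must be the archive's first entry,
# stored with no compression. Stage 2: everything else, compressed.
with zipfile.ZipFile("demo.epub", "w") as z:
    z.writestr("mimetype", "application/epub+zip",
               compress_type=zipfile.ZIP_STORED)
    z.writestr("META-INF/container.xml", "placeholder contents",
               compress_type=zipfile.ZIP_DEFLATED)

# Confirm the two properties that epub readers check for.
with zipfile.ZipFile("demo.epub") as z:
    first = z.infolist()[0]
    print(first.filename, first.compress_type == zipfile.ZIP_STORED)
    # prints: mimetype True
```

&lt;p&gt;Either route produces the same layout; what matters is only the ordering and the lack of compression of the mimetype entry.&lt;/p&gt;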
&lt;p&gt;To automate a little quality checking of your epub file, an open source utility called &lt;a href=&#34;http://code.google.com/p/epubcheck/&#34;&gt;epubcheck&lt;/a&gt; is now available. It checks the XML files inside the epub file for consistency of internal references, for conformance to the relevant RELAX NG schemas, and for problems with the mimetype file described above. I only recently learned that java jar files can be pulled apart like zip files, so the following two commands will list the epubcheck jar file&amp;rsquo;s contents and pull out one of the listed files (the RELAX NG schema for the Open Packaging Format):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;jar -tf epubcheck-0.9.2.jar
jar -xf epubcheck-0.9.2.jar com/adobe/epubcheck/rng/opf.rng
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;XML files are easier to create if you use schemas to guide their creation, and RELAX NG is the best schema language, so it&amp;rsquo;s worth pulling all of the RNG Files out of the epubcheck jar file to use when creating the files you&amp;rsquo;ll put in your epub file.&lt;/p&gt;
&lt;p&gt;Most of those files are straightforward XHTML of the content you&amp;rsquo;ll put in your eBook. I&amp;rsquo;ve created epub files from XHTML that was sitting on my hard disk and from &lt;a href=&#34;http://www.gutenberg.org/wiki/Main_Page&#34;&gt;Project Gutenberg&lt;/a&gt; files, although a little &lt;a href=&#34;http://home.ccil.org/~cowan/XML/tagsoup/&#34;&gt;tagsoup&lt;/a&gt; cleanup of these files is worth it to automate the handling of some otherwise annoying quirks you might come across—Project Gutenberg (X)HTML isn&amp;rsquo;t always very consistent.&lt;/p&gt;
&lt;p&gt;The other files in an epub eBook are a table of contents file (an .ncx file), a list of the files in use, including image files (an .opf package file), and a pointer to the file with the list of files (META-INF/container.xml). The tutorials above go into more detail about these, but if you pull these XML files out of any epub file and look them over, their workings are pretty self-evident.&lt;/p&gt;
&lt;p&gt;So if you&amp;rsquo;re interested in eBooks, get an epub file or two (plenty of classics are available at &lt;a href=&#34;http://www.feedbooks.com/&#34;&gt;feedbooks&lt;/a&gt;), unzip them, review the pieces as you look through the &lt;a href=&#34;http://www.jedisaber.com/eBooks/tutorial.asp&#34;&gt;epub eBooks Tutorial&lt;/a&gt; and &lt;a href=&#34;http://www.hxa7241.org/articles/content/epub-guide_hxa7241_2007.html&#34;&gt;Epub Format Construction Guide&lt;/a&gt;, and then make a few of your own and see how they look on some of the free eBook readers out there such as &lt;a href=&#34;http://www.adobe.com/products/digitaleditions/&#34;&gt;Adobe Digital Editions&lt;/a&gt; and &lt;a href=&#34;http://www.fbreader.org/&#34;&gt;FBReader&lt;/a&gt;. If you&amp;rsquo;re a publisher wondering about how to approach the eBook market, start making epub prototypes of some of your titles. In a future posting I&amp;rsquo;ll write about ways to make use of these prototypes as you lay the groundwork for actually selling them.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1717&#34;&gt;March 14, 2008 10:42 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;How widely accepted is epub, and how much does it deviate from OEBEPS? I&amp;rsquo;m not asking out of randomness; there are Reasons.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1718&#34;&gt;March 14, 2008 1:03 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi John,&lt;/p&gt;
&lt;p&gt;I never looked too closely at OEBPS, but as I understand it, the main difference is that OEBPS books were not zipped up into a single file. There&amp;rsquo;s more at &lt;a href=&#34;https://www.idpf.org/forums/viewtopic.php?t=22&#34;&gt;https://www.idpf.org/forums/viewtopic.php?t=22&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;epub acceptance: most people who follow the eBook market closely seem pretty confident that the major eBook readers (except maybe for Kindle, where they do things completely their own way) will be supporting epub Real Soon Now. In &lt;a href=&#34;http://en.oreilly.com/toc2008/public/schedule/detail/2140&#34;&gt;the talk before mine at the O&amp;rsquo;Reilly TOC conference&lt;/a&gt; Adobe&amp;rsquo;s Bill McCoy made it clear that Adobe considers PDF and epub to be the main electronic delivery platforms of the future. Any platform that can run Adobe Digital Editions can display epub files, and Adobe people at the conference were also saying something to the effect of &amp;ldquo;we can&amp;rsquo;t talk about new platforms that we&amp;rsquo;re porting Adobe Digital Editions to just yet, but isn&amp;rsquo;t it great that Apple has an iPhone SDK out now?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Even while other formats are still around, epub&amp;rsquo;s status as an open, well-documented format, and especially &lt;a href=&#34;http://dearauthor.com/wordpress/2007/11/19/no-kindle-exclusivity-for-harlequin-readers/&#34;&gt;Hachette&amp;rsquo;s attitude&lt;/a&gt; (&amp;ldquo;Every one of our partners (Sony, Amazon, eBooks.com, etc.) will only be receiving the .epub format from us. We will not be doing any special proprietary conversions for anyone, which includes the Kindle. It will be up to each partner to convert to whatever proprietary format can handle the .epub format&amp;hellip;&amp;rdquo;) mean that it&amp;rsquo;s gaining traction as a lingua franca for B2B exchange of ebooks.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>Accessibility problems with microformats</title>
      <link>https://www.bobdc.com/blog/accessibility-problems-with-mi/</link>
      <pubDate>Fri, 07 Mar 2008 08:19:37 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/accessibility-problems-with-mi/</guid>
      
      
      <description><div>By guest blogger Sarah Bourne, Chief Technology Strategist for the Commonwealth of Massachusetts.</div><div>&lt;p&gt;&lt;em&gt;In a recent posting here on &lt;a href=&#34;https://www.bobdc.com/blog/the-future-of-rdfa&#34;&gt;The future of RDFa&lt;/a&gt;, I described some of the advantages of RDFa compared with some of the disadvantages of microformats. When &lt;del&gt;Massachusetts Commonwealth&lt;/del&gt; Mass.gov Chief Technology Strategist Sarah Bourne posted &lt;a href=&#34;http://www.snee.com/bobdc.blog/2008/02/the_future_of_rdfa.html#c001572&#34;&gt;a comment&lt;/a&gt; about problems that microformats present for website accessibility, I asked her to elaborate, and she was kind enough to put this together for me.&lt;/em&gt;&lt;/p&gt;
&lt;blockquote class=&#34;pullquote&#34; style=&#39;width: 190px; font: bold 1.333em/1.125em &#34;Helvetica Neue&#34;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; &#39;&gt;If Massachusetts pursues enriching our content, RDFa seems a more likely candidate.&lt;/blockquote&gt;
&lt;p&gt;Part of my job is investigating new technologies that we might want or need to support on the Mass.Gov portal. A colleague brought microformats to my attention a year or so ago. Although I found it alluring—re-using standard markup to provide richer content—there were troubling accessibility issues.&lt;/p&gt;
&lt;p&gt;We are bound by a variety of state and federal laws to ensure content on Mass.Gov is fully usable when people are using assistive technologies (AT) to compensate for a wide range of disabilities. By far the most common AT is JAWS—screen reader software that converts the graphical Windows interface to sequential text, which is then rendered as spoken language, allowing the blind to use the same software as other Windows users.&lt;/p&gt;
&lt;p&gt;I suspect that the problems with microformats lie in the fact that they are being developed by a voluntary group instead of an established standards body. The community structure certainly leads to quicker decisions, but they are not as well vetted with a broader audience. Conflicts may not appear until their decisions have been put into practice.&lt;/p&gt;
&lt;p&gt;There are two areas where microformats run afoul of accessibility.&lt;/p&gt;
&lt;h2 id=&#34;yDZg3NWzTn6pEdiIA5hf_A&#34;&gt;&lt;code&gt;abbr&lt;/code&gt; design pattern&lt;/h2&gt;
&lt;p&gt;The first is the &amp;ldquo;abbr design pattern&amp;rdquo;. In this case, the &lt;code&gt;abbr&lt;/code&gt; tag is used to provide machine-readable date and location information in a standardized format. There has been disagreement about the semantics of abbreviation. Is &amp;ldquo;March 12, 2007 at 5 PM, Central Standard Time&amp;rdquo; really an abbreviation of &amp;ldquo;20070312T1700-06&amp;rdquo;? Is &amp;ldquo;Austin, TX&amp;rdquo; really an abbreviation of &amp;ldquo;30.300474;-97.747247&amp;rdquo;? It really comes down to whether you are talking about people (for whom the answer in practical terms is &amp;ldquo;no&amp;rdquo;) or machines (that would answer affirmatively).&lt;/p&gt;
&lt;p&gt;The problem is that JAWS reads the title of &lt;code&gt;abbr&lt;/code&gt; tags, not the text that is enclosed. This is a very friendly behavior in normal use of &lt;code&gt;abbr&lt;/code&gt;: it vocalizes &amp;ldquo;Central Standard Time&amp;rdquo; instead of &amp;ldquo;CST&amp;rdquo;. But the title of a microformat &lt;code&gt;abbr&lt;/code&gt; is decidedly unfriendly: very few people would recognize &amp;ldquo;two zero zero seven zero three one two capital-tee one seven zero zero hyphen zero six&amp;rdquo; as a particular date and time.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;abbr&lt;/code&gt; design pattern is also used by some to provide translations. Unless someone wants to declare that there is a One True Language, this is not only problematic for people using AT, it is not semantically defensible.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;abbr&lt;/code&gt; design pattern for date:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;abbr class=&amp;quot;dtstart&amp;quot; title=&amp;quot;20070312T1700-06&amp;quot;&amp;gt;
March 12, 2007 at 5 PM, Central Standard Time
&amp;lt;/abbr&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;abbr&lt;/code&gt; design pattern for location:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;abbr class=&amp;quot;geo&amp;quot; title=&amp;quot;30.300474;-97.747247&amp;quot;&amp;gt;
Austin, TX
&amp;lt;/abbr&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For additional information, see &lt;a href=&#34;http://www.webstandards.org/2007/04/27/haccessibility/&#34;&gt;hAccessibility&lt;/a&gt;, by Bruce Lawson and James Craig.&lt;/p&gt;
&lt;h2 id=&#34;V172oi7HTlqedZ8LPAUzrw&#34;&gt;include pattern&lt;/h2&gt;
&lt;p&gt;The second problem area is the include pattern&amp;rsquo;s use of links with empty link text. These links are invisible to most people, but not to screen readers, which are often configured to read the link title, especially if there is no text enclosed in the &lt;code&gt;a&lt;/code&gt; tag. Again, this means AT users are presented with mysterious content that was intended only for machines, while users of graphical browsers are shielded.&lt;/p&gt;
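&lt;p&gt;The include pattern in question is roughly the following: an &lt;code&gt;a&lt;/code&gt; element with a fragment reference but no link text (the ID value here is just an illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;a class=&amp;quot;include&amp;quot; href=&amp;quot;#hcard-name&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
&lt;/code&gt;&lt;/pre&gt;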
&lt;p&gt;For additional information, see &lt;a href=&#34;http://www.isolani.co.uk/blog/access/ConfiguringLinksInScreenReaders&#34;&gt;Configuring links in Screen readers&lt;/a&gt;, by &lt;del&gt;Joe Clark&lt;/del&gt; Mike Davies.&lt;/p&gt;
&lt;h2 id=&#34;L-X6R5qUQrSjHovjcYVwgw&#34;&gt;Therefore&amp;hellip;&lt;/h2&gt;
&lt;p&gt;There has been resistance from the microformats community to addressing these conflicts. This is dismaying, since one of their basic tenets is to give precedence to use &amp;ldquo;in the wild&amp;rdquo;, and this is how AT products actually behave. There was a big hullabaloo about this in May 2007, but there has been no change since then. This leads me to believe that the microformats folks just do not care about accessibility to the extent that I need to.&lt;/p&gt;
&lt;p&gt;If Massachusetts pursues enriching our content, RDFa seems a more likely candidate. We prefer to adopt things that have been created and promulgated by standards bodies: they are more stable, the deliberative process surfaces and resolves problems beforehand, and they are the only reliable basis for interoperability.&lt;/p&gt;
&lt;h2 id=&#34;7-comments&#34;&gt;7 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://alanhogan.com&#34; title=&#34;http://alanhogan.com&#34;&gt;Alan Hogan&lt;/a&gt; on &lt;a href=&#34;#comment-1698&#34;&gt;March 8, 2008 8:42 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I had no idea about JAWS&amp;rsquo; &lt;abbr&gt; policy. Thanks for the heads-up.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.isolani.co.uk/blog/&#34; title=&#34;http://www.isolani.co.uk/blog/&#34;&gt;Isofarro&lt;/a&gt; on &lt;a href=&#34;#comment-1701&#34;&gt;March 9, 2008 8:31 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The blog post, &lt;a href=&#34;http://www.isolani.co.uk/blog/access/ConfiguringLinksInScreenReaders&#34;&gt;Configuring links in screen readers&lt;/a&gt;, wasn&amp;rsquo;t written by Joe Clark; it was written by me on my blog. See the &lt;a href=&#34;http://www.isolani.co.uk/&#34;&gt;homepage of the site&lt;/a&gt;, for example. More importantly, the argument against the anchor-based include pattern is the screenreader testing I did, the results of which I published on Yahoo!&amp;rsquo;s YUI blog as &lt;a href=&#34;http://yuiblog.com/blog/2008/01/23/empty-links/&#34;&gt;Empty links and screenreaders&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.isolani.co.uk/blog/&#34; title=&#34;http://www.isolani.co.uk/blog/&#34;&gt;Isofarro&lt;/a&gt; on &lt;a href=&#34;#comment-1704&#34;&gt;March 9, 2008 2:13 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;This leads me to believe that the microformats folks just do not care about accessibility to the extent that I need to.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The work I did testing how screen readers deal with empty links was as a result of microformat folks approaching me and asking for assistance.&lt;/p&gt;
&lt;p&gt;My published findings made their way into the include pattern a month ago. The criticisms in this blog post about the include pattern are thus out of date.&lt;/p&gt;
&lt;p&gt;From that alone, it&amp;rsquo;s clear that the microformats community are aware of, and do care about, the accessibility issues.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://allinthehead.com/&#34; title=&#34;http://allinthehead.com/&#34;&gt;Drew McLellan&lt;/a&gt; on &lt;a href=&#34;#comment-1705&#34;&gt;March 9, 2008 2:29 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The issues highlighted here aren&amp;rsquo;t issues with microformats generally. If you have reservations about using the abbr or include patterns, nothing is compelling you to use them. The use of microformats can be happily embraced without the need to use either pattern.&lt;/p&gt;
&lt;p&gt;You say that there have been no changes since May 2007, yet you link to some recent research into the matter by members of the community. Work continues in this area, but as there are no simple answers (as with many issues surrounding accessibility and the truly awful user agents in that space), there are no quick fixes.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://ben-ward.co.uk&#34; title=&#34;http://ben-ward.co.uk&#34;&gt;Ben Ward&lt;/a&gt; on &lt;a href=&#34;#comment-1706&#34;&gt;March 9, 2008 3:25 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A few things:&lt;/p&gt;
&lt;p&gt;Your points on the include-pattern are incorrect, or at best out of date. At no time has the include pattern advocated an empty href attribute; it is always a local document reference. Furthermore, since February 12th 2008 the include-pattern has explicitly &lt;em&gt;forbidden&lt;/em&gt; the use of hyperlinks without inner-text. This followed the excellent research into the behaviour of empty links in assistive technology that Mike Davies led (not Joe Clark).&lt;/p&gt;
&lt;p&gt;Further, whilst there are some truly outrageous misuses of the ABBR-pattern in the wild, you&amp;rsquo;ve drawn no distinction between techniques which are actually advocated by the microformats specifications, and techniques which are either brainstorms, or separate from microformats.org documentation. The use of ABBR in GEO that you cite, for example, is not part of the specification. It&amp;rsquo;s located on the &lt;a href=&#34;http://microformats.org/wiki/geo-brainstorming&#34;&gt;geo-brainstorming&lt;/a&gt; page, not as a recommended part of the spec.&lt;/p&gt;
&lt;p&gt;I hope I provide some indication that in fact the microformats community does care about fixing accessibility issues. The fact is that it&amp;rsquo;s a community of volunteers, and getting research done to support changes like this isn&amp;rsquo;t trivial. One of my priorities this year is to resolve the open accessibility issues, but it takes time.&lt;/p&gt;
&lt;p&gt;To see a piece like this come out with such criticisms is not in itself a problem (we have open issues, after all). However, to see it draw conclusions and state ‘fact’ based on out-of-date or outright incorrect information, and fail to link the issues raised back to the microformats documentation is extremely frustrating.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1707&#34;&gt;March 9, 2008 3:46 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Ben,&lt;/p&gt;
&lt;p&gt;I must take some blame for anything in Sarah&amp;rsquo;s piece being out of date; she sent it to me on February 25, so the revision to the include pattern documentation was less than two weeks old when she wrote that.&lt;/p&gt;
&lt;p&gt;Also, wouldn&amp;rsquo;t the following use of abbr have the same problem when a JAWS reader reads the title value as the example she describes above?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;abbr class=&amp;quot;dtstart&amp;quot; title=&amp;quot;2007-10-05&amp;quot;&amp;gt;October 5&amp;lt;/abbr&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is from the hcalendar specification at &lt;a href=&#34;http://microformats.org/wiki/hcalendar&#34;&gt;http://microformats.org/wiki/hcalendar&lt;/a&gt;, not a brainstorming or in-the-wild page.&lt;/p&gt;
&lt;p&gt;By Sarah Bourne on &lt;a href=&#34;#comment-1714&#34;&gt;March 11, 2008 12:52 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I am delighted to hear that the microformats community is working on accessibility issues. I have been trying to keep up on developments in this area, but had found nothing to assure me that any progress was being made. In particular, I&amp;rsquo;ve been watching the microformats.org &lt;a href=&#34;http://microformats.org/wiki/accessibility&#34;&gt;Accessibility&lt;/a&gt; and &lt;a href=&#34;http://microformats.org/wiki/accessibility-issues&#34;&gt;Accessibility Issues&lt;/a&gt; pages for updates. Alas, those pages do not yet show the updated information on the include pattern. (Not complaining, just making an observation!) I love the idea of developing new things with a group of bright and motivated people, like the microformats folks, but it&amp;rsquo;s hard for those of us on the outside to know who&amp;rsquo;s who, and where we should be looking to get authoritative information. And although some of us are pretty smart people, there are people who may not grasp the nuances they should.&lt;/p&gt;
&lt;p&gt;Besides the legal imperative I mentioned, government sites have significant (ahem) &amp;ldquo;public relations&amp;rdquo; vulnerabilities. For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A local paper published an article claiming a site was not accessible; the basis was a claim from one person that they felt the ALT text wasn&amp;rsquo;t good enough on two images &amp;hellip; on an 8,000 page site.&lt;/li&gt;
&lt;li&gt;We stopped using any client-side scripting on Mass.Gov, because we were spending too much time explaining how you really do accessibility testing to people telling us we had &amp;ldquo;failed Bobby&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;Search the Internet for &amp;ldquo;ODF&amp;rdquo;, &amp;ldquo;accessibility&amp;rdquo; and &amp;ldquo;Massachusetts&amp;rdquo; in 2005&amp;hellip;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Like Caesar&amp;rsquo;s wife, any technology we use must be above suspicion. This is the context in which &amp;ldquo;care about accessibility to the extent that I need to&amp;rdquo; should be taken. I need to care a lot.&lt;/p&gt;
&lt;p&gt;@Isofarro: Please accept my apologies for the incorrect attribution. I rely heavily on folks like you who actually test things and share them with the rest of us; I am embarrassed that I screwed up giving you the credit you deserve. I&amp;rsquo;ve asked Bob to correct that for me.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>An Apple eBook reader?</title>
      <link>https://www.bobdc.com/blog/an-apple-ebook-reader/</link>
      <pubDate>Wed, 05 Mar 2008 08:00:55 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/an-apple-ebook-reader/</guid>
      
      
      <description><div>John Markoff of the New York Times analyzes the clues.</div><div>&lt;p&gt;John Markoff has been one of the most respected tech journalists for a long, long time, so his &lt;a href=&#34;http://bits.blogs.nytimes.com/2008/03/03/reading-steve-jobs/index.html&#34;&gt;Reading Steve Jobs&lt;/a&gt; piece this week on potential clues that Apple is working on an eBook reader is worth a read for anyone interested in the eBook market. I don&amp;rsquo;t have anything to add to what he says, especially when he tells stories like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At Macworld, when I asked Mr. Jobs about the idea of an iPod Touch in a larger &amp;ldquo;Safari Pad&amp;rdquo; format, he snapped at me, &amp;ldquo;I can&amp;rsquo;t talk about unannounced products.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>DITA Topic Specialization</title>
      <link>https://www.bobdc.com/blog/dita-topic-specialization/</link>
      <pubDate>Tue, 04 Mar 2008 09:03:10 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/dita-topic-specialization/</guid>
      
      
      <description><div>Creating specialized topics for your content: an IBM developerWorks article.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.ibm.com/developerworks/edu/x-dw-x-ditaspecial.html&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/dw-home2.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;developerWorks logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Many great resources are available to explain the mechanics and syntax of specializing the standard DITA DTDs for your content—for example, Michael Priestley&amp;rsquo;s &lt;a href=&#34;http://www.ibm.com/developerworks/xml/library/x-dita2/&#34;&gt;Specializing topic types in DITA&lt;/a&gt; and Eliot Kimber&amp;rsquo;s &lt;a href=&#34;http://www.xiruss.org/tutorials/dita-specialization/&#34;&gt;DITA specialization&lt;/a&gt;. However, I didn&amp;rsquo;t see any that walk readers through the process of reviewing their existing content, evaluating its fit with the various DITA topic types, and then designing and building a DITA specialization around the needs and structure of their content, so I wrote the IBM developerWorks tutorial &lt;a href=&#34;https://web.archive.org/web/20111023155148/http://www.ibm.com/developerworks/xml/tutorials/x-ditaspecial/&#34;&gt;DITA topic specialization: Analyze your content and build a specialized DTD&lt;/a&gt;.&lt;/p&gt;
&lt;!-- that wayback machine URL doesn&#39;t show the whole thing, but the whole thing seems to be gone as of 2025-01-18. --&gt;
&lt;p&gt;That&amp;rsquo;s not exactly my original subtitle, but they just love that &lt;a href=&#34;http://en.wikipedia.org/wiki/Second-person_narrative&#34;&gt;second person voice&lt;/a&gt; at developerWorks, especially in the &lt;a href=&#34;http://en.wikipedia.org/wiki/Imperative_mood&#34;&gt;imperative&lt;/a&gt;. The article now has a lot more of that than I originally put there; maybe I should reread &lt;a href=&#34;http://www.amazon.com/Bright-Lights-Big-City-Mcinerney/dp/0394726413&#34;&gt;Bright Lights Big City&lt;/a&gt; before I write another tutorial for developerWorks.&lt;/p&gt;
&lt;p&gt;Nah, I don&amp;rsquo;t think so.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By Pas B on &lt;a href=&#34;#comment-1708&#34;&gt;March 9, 2008 4:36 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This tutorial raises the whole problem of IBM &amp;ldquo;tutorials&amp;rdquo;, for me. Since you obviously have some relationship with the people in control of this model of article presentation, I&amp;rsquo;ll briefly mention my concerns.&lt;/p&gt;
&lt;p&gt;The tutorials require an IBM account to access. That&amp;rsquo;s IBM&amp;rsquo;s choice, fine, but in as much as many of the tutorials seem to vary insignificantly from other IBM article content that is not account-restricted, what&amp;rsquo;s the point? For the reader, it&amp;rsquo;s just extra hassle (including an extra set of credentials to remember) to sign on.&lt;/p&gt;
&lt;p&gt;Worse, the authentication process is frustratingly involved. Accessing your article above, I first had to authenticate via a page form. Thereupon, I&amp;rsquo;m asked for the 16 bazillionth time to verify my personal information &amp;ndash; an extra scroll down and click-through, not to mention that it&amp;rsquo;s an unprompted echoing of my personal information over the wire. After this authentication, I&amp;rsquo;m prompted for ID and password TWICE MORE, each time my request redirects from one IBM server to another.&lt;/p&gt;
&lt;p&gt;I seem to recall having dumped a comment on this process, at least once before, into whatever designated IBM feedback mechanism, but to no effect. If you happen to read this and to agree that the above is cumbersome, could you please pass on this request to the relevant IBM parties, to &amp;ldquo;get a clue&amp;rdquo; about these &amp;ldquo;tutorials&amp;rdquo; and their annoying form of presentation (namely the annoying and cumbersome authentication)? Thanks very much.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1709&#34;&gt;March 9, 2008 5:59 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks, I&amp;rsquo;ve already passed it along to my editor at developerWorks.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/dita">DITA</category>
      
    </item>
    
    <item>
      <title>Batch processing of image files</title>
      <link>https://www.bobdc.com/blog/batch-processing-of-image-file/</link>
      <pubDate>Mon, 03 Mar 2008 08:49:10 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/batch-processing-of-image-file/</guid>
      
      
      <description><div>For free, with ImageMagick.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.imagemagick.org&#34;&gt;&lt;img src=&#34;http://www.imagemagick.org/image/logo.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;some description&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I recently had 178 JPEG files that weren&amp;rsquo;t behaving: Firefox and IE couldn&amp;rsquo;t display them. &lt;a href=&#34;http://www.gimp.org/&#34;&gt;GIMP&lt;/a&gt; could read them and display them, and a GIMP Save As JPEG (after making no changes) created files that displayed properly in Firefox and IE, but I didn&amp;rsquo;t want to do that 178 times.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.imagemagick.org&#34;&gt;ImageMagick&lt;/a&gt; to the rescue. This free program, available for Windows, Mac, and Linux, comes with &lt;a href=&#34;http://www.imagemagick.org/script/command-line-tools.php&#34;&gt;several command line utilities&lt;/a&gt;, but the &lt;code&gt;convert&lt;/code&gt; one is all I&amp;rsquo;ve ever needed. It has dozens and dozens of &lt;a href=&#34;http://www.imagemagick.org/script/command-line-options.php&#34;&gt;command-line options&lt;/a&gt;, but I didn&amp;rsquo;t even need any to fix these JPEGs. I just converted each to a BMP and then back, like this,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;convert myfile.jpg myfile.bmp
convert myfile.bmp myfile.jpg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and then I had a JPEG file that displayed fine in the web browsers. To do this 178 times in Windows, I created a batch file that looked like this&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;convert %1.jpg %1.bmp
convert %1.bmp %1.jpg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and then used the output of &lt;code&gt;dir /b *.jpg&lt;/code&gt; to create a batch file that called the two-line batch file for each image file.&lt;/p&gt;
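&lt;p&gt;If you have a Unix-style shell handy (Cygwin on Windows, for example), the same round trip can be sketched as a single loop instead of a generated batch file; this sketch assumes ImageMagick&amp;rsquo;s &lt;code&gt;convert&lt;/code&gt; is on your path:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# round-trip each JPEG through BMP and back, then remove the BMP
for f in *.jpg; do
  base=&amp;quot;${f%.jpg}&amp;quot;          # file name without the .jpg extension
  convert &amp;quot;$f&amp;quot; &amp;quot;$base.bmp&amp;quot;   # JPEG to BMP
  convert &amp;quot;$base.bmp&amp;quot; &amp;quot;$f&amp;quot;   # BMP back to a clean JPEG
  rm &amp;quot;$base.bmp&amp;quot;             # discard the intermediate file
done
&lt;/code&gt;&lt;/pre&gt;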
&lt;p&gt;If you need to process batches of images, it&amp;rsquo;s really worth looking through the command line options it offers, and of course the other ImageMagick utilities as well.&lt;/p&gt;
&lt;p&gt;Since writing all that, I&amp;rsquo;ve found a possibly great new feature of ImageMagick: the ability to &lt;a href=&#34;http://www.imagemagick.org/pipermail/magick-announce/2007-July/000036.html&#34;&gt;add XMP metadata&lt;/a&gt; to image files. (XMP is a profile of RDF from Adobe that has a lot of potential; I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/using-or-not-using-adobes-xmp&#34;&gt;complained here before&lt;/a&gt; about the lack of free command-line tools for adding and extracting XMP metadata from binary files.) Based on my tests and &lt;a href=&#34;http://www.wizards-toolkit.org/discourse-server/viewtopic.php?f=1&amp;amp;t=8712&#34;&gt;an online problem description&lt;/a&gt;, I don&amp;rsquo;t think this feature works properly yet. If anyone knows how to get ImageMagick or any other free command-line tool to add arbitrary metadata to image and PDF files (and of course, to extract it) please let me know.&lt;/p&gt;
&lt;h2 id=&#34;8-comments&#34;&gt;8 Comments&lt;/h2&gt;
&lt;p&gt;By perusio on &lt;a href=&#34;#comment-1687&#34;&gt;March 3, 2008 9:51 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hmm, this seems to fit the bill (PDF+image files)&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.sno.phy.queensu.ca/~phil/exiftool/&#34;&gt;http://www.sno.phy.queensu.ca/~phil/exiftool/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s in Perl.&lt;/p&gt;
&lt;p&gt;HTH&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1688&#34;&gt;March 3, 2008 10:10 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks, Perusio! This looks great! At first &lt;a href=&#34;http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html&#34;&gt;http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/XMP.html&lt;/a&gt; made me think that it could only handle predefined XMP fields, but &lt;a href=&#34;http://www.sno.phy.queensu.ca/~phil/exiftool/config.html&#34;&gt;http://www.sno.phy.queensu.ca/~phil/exiftool/config.html&lt;/a&gt; shows how to define your own metadata fields and store them as XMP. (I&amp;rsquo;m a big believer in the ability to store arbitrary metadata.) I will certainly be playing with this and reporting back on it.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1689&#34;&gt;March 3, 2008 10:40 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Dude, that would &lt;em&gt;really really&lt;/em&gt; be a case for getting Cygwin, if only so you can do a shell for-loop. (I, of course, download Cygwin first thing onto any Windows computer I have to deal with.)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1690&#34;&gt;March 3, 2008 11:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi John,&lt;/p&gt;
&lt;p&gt;When I set up a Windows machine, I add Firefox, Emacs, and Cygwin before I even change the desktop wallpaper, but I never looked too closely at Cygwin&amp;rsquo;s scripting. Looks like I should.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.dpawson.co.uk&#34; title=&#34;http://www.dpawson.co.uk&#34;&gt;Dave Pawson&lt;/a&gt; on &lt;a href=&#34;#comment-1691&#34;&gt;March 3, 2008 11:47 AM&lt;/a&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; for f in *.CR2
      do
       echo Processing $f
       exiftool -exif:OwnerName=&amp;quot;Dave Pawson&amp;quot; -exif:Copyright=&amp;quot;Dave Pawson 2008&amp;quot; $f
       echo Backup  $f&#39;_&#39;original removed
       rm $f&#39;_&#39;original
      done
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;exiftool would repay some investigation.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.albinblaschka.info&#34; title=&#34;http://www.albinblaschka.info&#34;&gt;Albin&lt;/a&gt; on &lt;a href=&#34;#comment-1692&#34;&gt;March 3, 2008 12:49 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hello,&lt;/p&gt;
&lt;p&gt;in the ImageMagick package there is the command mogrify, using the same syntax as convert, but if you use wildcards in the filename, it will go through all files in the dir&amp;hellip; so, for your case&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mogrify -format bmp *.jpg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and then&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mogrify -format jpg *.bmp
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;would have done the same as you showed&amp;hellip;&lt;/p&gt;
&lt;p&gt;Albin&lt;/p&gt;
&lt;p&gt;By Tom Passin on &lt;a href=&#34;#comment-1694&#34;&gt;March 3, 2008 11:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Oh, c&amp;rsquo;mon, you can do it with a Windows cmd file, too (this may wrap in the textarea) -&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for %%v in (*.CR2) do echo Processing %%v &amp;amp; exiftool -exif:OwnerName=&amp;quot;Dave Pawson&amp;quot; -exif:Copyright=&amp;quot;Dave Pawson 2008&amp;quot; %%v &amp;amp; del %%v_original
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Not that I&amp;rsquo;m agin *ix. I&amp;rsquo;ve been running a lot of them in virtual machines lately.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1695&#34;&gt;March 4, 2008 10:25 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Dave and Tom! (All note that Tom&amp;rsquo;s command should be written out as one line.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>An eBook with free updates, or a bound version from a major publisher?</title>
      <link>https://www.bobdc.com/blog/an-ebook-with-free-updates-or/</link>
      <pubDate>Thu, 28 Feb 2008 09:02:05 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/an-ebook-with-free-updates-or/</guid>
      
      
      <description><div>Ken Holman discusses his successful eight-year experiment with eBooks.</div><div>&lt;p&gt;&lt;em&gt;I was pleasantly surprised to learn recently that almost eight years after buying a PDF eBook version of XML pioneer &lt;a href=&#34;http://www.cranesoftwrights.com/bio/gkholman.htm&#34;&gt;Ken Holman&amp;rsquo;s&lt;/a&gt; book &lt;a href=&#34;http://www.CraneSoftwrights.com/links/books-bdc.htm&#34;&gt;Practical Transformation Using XSLT and XPath&lt;/a&gt;, I am now entitled to a free upgrade to the thirteenth edition, which covers XSLT 2.0 and XPath 2.0. While it&amp;rsquo;s not &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1930110111/bobducharmeA/&#34;&gt;the first book I&amp;rsquo;d recommend&lt;/a&gt; to an XSLT beginner (keep in mind that I&amp;rsquo;m biased), it&amp;rsquo;s an excellent reference work.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Ken also published this and his &lt;a href=&#34;http://www.amazon.com/Definitive-XSL-FO-Charles-Goldfarb-XML/dp/0131403745&#34;&gt;Definitive XSL-FO&lt;/a&gt; as bound, hardcopy books in the same Prentice Hall Series as my &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0130826766/bobducharmeA/&#34;&gt;XML: The Annotated Specification&lt;/a&gt;. I asked him how well his decision to sell PDFs of his XSLT and XSL-FO books online directly to his customers had worked out, and how it compared with his experience with the Prentice Hall version. The following is his response.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.CraneSoftwrights.com/links/books-bdc.htm&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/Crane-Logo-Only.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Crane Softwrights logo&#34; width=&#34;180px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The decision to go to paper was secondary. The primary decision was to go electronic so that we could offer the perpetual free updates to all future editions. Specifications don&amp;rsquo;t move that quickly, but how people use specifications does evolve and mature. The first 12 editions of the XSLT book only covered XSLT 1.0, but the editions reflected changing practices. Customers of the first edition in March 1999 have just received their 12th free update, to the 13th edition, which now includes all of XSLT 2.0. We just published the 7th edition of the XSL-FO book, which now includes all of XSL-FO 1.1.&lt;/p&gt;
&lt;p&gt;As a user of specifications I figured any paper book I would buy would be out of date before long. I didn&amp;rsquo;t want my book to have that experience with our customers. The perpetual purchase is a feature of the electronic book that cannot be offered with the paper publication. Some peers have criticized our choice to make the updates free as a lost opportunity for revenue. Again, putting myself in my customer&amp;rsquo;s shoes, I didn&amp;rsquo;t want to be charged for updates since there isn&amp;rsquo;t any overhead in sending out an email notification of an updated edition. Our customers have appreciated the &amp;ldquo;live&amp;rdquo; feeling of always having an up-to-date publication for the cost of the original purchase. The more editions they get for free, the more cost effective their original purchase becomes compared to a paper book purchase.&lt;/p&gt;
&lt;p&gt;The decision to go to paper was a favour to the series editor Charles Goldfarb who wanted to include the XSLT book in his &amp;ldquo;Definitive&amp;rdquo; series. Prentice Hall only has the print rights. And since the same document model was used for the XSL-FO content, it was a quick project to bring the XSL-FO book to paper in the same series. All other electronic rights and other uses were retained by Crane, and we are in the planning stages for new product offerings based on the same content, probably being announced late Q2&#39;2008.&lt;/p&gt;
&lt;p&gt;Another big benefit of the electronic PDF sale is the flexibility in licensing. We have worldwide staff licensees. These customers pay a one-time fee for the original purchase of the book and they mount the PDF files on a private intranet server for staff use only. We do not need to be informed of the number of copies that get used internally. These customers also get the perpetual updates to all future editions; they just retrieve and mount the revisions up on their servers. For a big example, all US Government employees of all departments of all offices around the world have perpetual access to the one staff purchase of each of the two XSL titles we have. We don&amp;rsquo;t know how many hundreds or thousands of copies may be in use.&lt;/p&gt;
&lt;p&gt;One customer drawback to the electronic format is forgetting to inform us of email address changes. Announcing this last free edition revealed hundreds of dead email addresses, so there are customers out there who are due their most recent copy of the book but we can&amp;rsquo;t tell them about it. Hopefully they&amp;rsquo;ll come back to us in time and request their copy &amp;hellip; we&amp;rsquo;d be glad to keep them up to date.&lt;/p&gt;
&lt;p&gt;From a revenue aspect, we have received many, many times more revenue from PDF book sales than from paper book royalties. And the PDF book sales are continuing while the paper book royalties have tailed off. There isn&amp;rsquo;t much market now for version 1.0 of these technologies, whereas the PDF books now include all of XSLT 2.0 and XSL-FO 1.1.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>Simple flowcharts in Excel</title>
      <link>https://www.bobdc.com/blog/simple-flowcharts-in-excel/</link>
      <pubDate>Mon, 25 Feb 2008 09:02:22 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/simple-flowcharts-in-excel/</guid>
      
      
      <description><div>And OpenOffice Calc.</div><div>&lt;p&gt;A co-worker recently told me that she needed to create a flowchart but didn&amp;rsquo;t have Visio. She knew that I had it, but I played dumb. I told her how I&amp;rsquo;d recently learned to make simple flowcharts in PowerPoint, and recommended that she try that, but she needed more of a &lt;a href=&#34;http://en.wikipedia.org/wiki/Swimlane&#34;&gt;swimlane&lt;/a&gt; diagram, which would have been difficult in PowerPoint.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/excelflowcharts.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Excel and Calc icons&#34;/&gt;
&lt;p&gt;I then remembered that she and I had both recently received an Excel spreadsheet with some fairly complex workflow diagrams, one of which had a swimlane-like arrangement. Looking at it, I saw that I could select and drag the boxes and arrows, but I didn&amp;rsquo;t see how to add new ones.&lt;/p&gt;
&lt;p&gt;The answer turned out to be so obvious that OpenOffice&amp;rsquo;s Calc spreadsheet program does it the same way: Select &lt;strong&gt;Toolbars&lt;/strong&gt; from the &lt;strong&gt;View&lt;/strong&gt; menu, then &lt;strong&gt;Drawing&lt;/strong&gt; from the cascade menu. This adds a toolbar at the bottom with far more choices than you need. You can use this toolbar to add labeled boxes and arrows connecting these boxes to a blank spreadsheet or to a spreadsheet that already has numbers and text on it. The first time I tried this, the guy sitting next to me on the airplane—who hadn&amp;rsquo;t said a word to me the whole trip—said &amp;ldquo;I&amp;rsquo;ve never seen flow charts in Excel before!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;On the Excel toolbar, the little square with the letter &amp;ldquo;A&amp;rdquo; and some horizontal lines lets you create resizable labeled boxes on the spreadsheet, and the icon of an arrow pointing to the lower-right lets you create arrows on the diagram. Once you&amp;rsquo;ve created either, selecting and then right-clicking one of these displays a menu that lets you customize its appearance.&lt;/p&gt;
&lt;p&gt;On Calc&amp;rsquo;s toolbar, the plain blue rectangle lets you add a resizable box to your spreadsheet, and double-clicking your new box lets you add a label to it. The Calc drawing toolbar has no icon for arrows, but if you use the one for lines you can then right-click a line and add arrow heads and adjust other properties.&lt;/p&gt;
&lt;p&gt;I thought that Visio was a big, slow, bloated program even before Microsoft bought the company. I guess we need it even less than many people think.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Managing digital rights in the publishing world</title>
      <link>https://www.bobdc.com/blog/managing-digital-rights-in-the/</link>
      <pubDate>Thu, 21 Feb 2008 07:59:21 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/managing-digital-rights-in-the/</guid>
      
      
      <description><div>As opposed to enforcing them.</div><div>&lt;p&gt;When people talk about Digital Rights Management, or DRM, the real subject of their discussion is usually Digital Rights Enforcement. The basic use case seems to be how to prevent a teenage boy from duping &amp;ldquo;Rush Hour II&amp;rdquo; for his friends, or some variation thereof—what you add to the DVD storing the movie, what you add to the players to check on the DRE components of the DVD, and so forth.&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;management&lt;/em&gt; of digital rights, as opposed to their enforcement, is a real problem in the publishing industry, but discussion is usually drowned out by the shouting matches about digital rights enforcement. Here&amp;rsquo;s a typical use case: an editor wants to take an article with six pictures from her magazine&amp;rsquo;s print edition and put it online. Two of the pictures come from a cookbook being reviewed by the article, two were shot for the article by a freelancer, and two come from a stock photo house. Which images can the editor use in the online version?&lt;/p&gt;
&lt;blockquote class=&#34;pullquote&#34; style=&#39;width: 190px; font: bold 1.333em/1.125em &#34;Helvetica Neue&#34;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; &#39;&gt;Does anyone know of a straightforward standard for a content provider to specify re-use rights for a work to a publishing industry business partner? &lt;/blockquote&gt;
&lt;p&gt;Unlike the mythical teenage boy, a publisher has an ongoing relationship to maintain with the content suppliers and doesn&amp;rsquo;t want to jeopardize any of these relationships by avoiding extra payments. The difficulty is simply digging up the terms of re-use so that the editor knows which images are available to put with the article on the website. When looking this up takes too much time, it affects the publication schedule.&lt;/p&gt;
&lt;p&gt;Whether you build or buy a system to track this, there are three basic approaches, but first, a note on software: there are vendors who will tell you &amp;ldquo;our fabulous product takes care of all that! Simply check in the pictures or other content and enter the re-use terms, and then you can look it up any time!&amp;rdquo; I&amp;rsquo;m not interested in this unless the software can read and write the re-use terms in a standard format whose specs are independent of the software. (As we&amp;rsquo;ll see below, this is easier said than done.)&lt;/p&gt;
&lt;p&gt;The devil is in the details of how you enter the re-use terms. Of the three approaches, the &lt;a href=&#34;http://www.prismstandard.org/&#34;&gt;PRISM&lt;/a&gt; standard for magazine metadata offers variations on each, so I&amp;rsquo;ll refer to that when I need examples, although only the second option below demonstrates a technique that is specific to PRISM.&lt;/p&gt;
&lt;h2 id=&#34;fOfsDN8RT3m37Lj8j440Ug&#34;&gt;Option 1: a slot for a natural language description&lt;/h2&gt;
&lt;p&gt;In this scenario, you write out one or more sentences describing what you can do with the work. This could be stored in a relational database, a rights tracking package that you bought from a vendor, or in some XML. You may have the option of storing this inside the work itself, especially if it&amp;rsquo;s in XML. The PRISM standard uses the Dublin Core &lt;a href=&#34;http://dublincore.org/documents/dces/#rights&#34;&gt;rights&lt;/a&gt; field as the name of this element, which is a fine idea.&lt;/p&gt;
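&lt;p&gt;As a sketch of what that might look like in RDF/XML (the image URI and the rights wording here are invented for illustration, not taken from any real agreement or standard):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  &amp;lt;rdf:Description rdf:about=&amp;quot;http://example.com/img/Corfu.jpg&amp;quot;&amp;gt; 
    &amp;lt;dc:rights&amp;gt;One-time use in the print edition only; online 
    re-use requires separate permission from the photographer.&amp;lt;/dc:rights&amp;gt; 
  &amp;lt;/rdf:Description&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;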
&lt;p&gt;The problem here is that you leave it up to the person writing out the sentences to either accurately copy the agreement terms or to paraphrase them properly, and that leaves room for error.&lt;/p&gt;
&lt;h2 id=&#34;sJzrbt8vTb-fcYs8YObwzw&#34;&gt;Option 2: a set of fields to store specific re-use parameters&lt;/h2&gt;
&lt;p&gt;The good news: it&amp;rsquo;s easier to automate the processing of this information when, for example, a system puts an image on a website. The bad news: what fields do you use to store the information you want to track? There are a lot of parameters in these agreements. What standards are out there? How well do they fit with your needs? When a stock photo house supplies images to a magazine, there will be one set of information to track; when that magazine supplies an illustrated article to an aggregator, that relationship will be governed by some of the same pieces of information, but also by some different ones.&lt;/p&gt;
&lt;p&gt;PRISM has defined a few fields such as embargoDate and expirationDate for this. These can be stored inside of a dc:rights element, or anywhere else for that matter. If they are stored in a relational database, it would be useful to indicate somewhere that the &amp;ldquo;embargoDate&amp;rdquo; field means what the PRISM standard says it means and not someone else&amp;rsquo;s slightly different concept of the same term. (I know some people hate namespaces, but they&amp;rsquo;re awfully useful sometimes&amp;hellip;) The PRISM standard will not have everything you need, and I know that arriving at the few re-use rights fields that it does have was the result of a great deal of work.&lt;/p&gt;
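&lt;p&gt;A sketch of those two PRISM fields in RDF/XML might look like the following (the image URI and the dates are invented, and I&amp;rsquo;m assuming the prism: prefix is bound to whichever version of the PRISM basic namespace you use):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  &amp;lt;rdf:Description rdf:about=&amp;quot;http://example.com/img/Corfu.jpg&amp;quot;&amp;gt; 
    &amp;lt;!-- no publication before this date --&amp;gt; 
    &amp;lt;prism:embargoDate&amp;gt;2008-03-01&amp;lt;/prism:embargoDate&amp;gt; 
    &amp;lt;!-- rights to use the image end on this date --&amp;gt; 
    &amp;lt;prism:expirationDate&amp;gt;2008-09-01&amp;lt;/prism:expirationDate&amp;gt; 
  &amp;lt;/rdf:Description&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A script could then compare these values with today&amp;rsquo;s date before letting the image go to the website.&lt;/p&gt;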
&lt;h2 id=&#34;LpA2vG5QRK6I5qEteIcAhA&#34;&gt;Option 3: point to an image of the official document&lt;/h2&gt;
&lt;p&gt;This is my favorite in terms of bang for buck, because the lack of data entry involved means less work and less room for error, and there&amp;rsquo;s no issue about which collection of information fields to track. If you already have your images in a Digital Asset Management system, then tracking images of the agreements that govern those images won&amp;rsquo;t put much extra strain on your system. Scan the agreement and give it an identifier. The PRISM Introduction document includes the following example to identify the file with the contract governing the use of the image Corfu.jpg:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  &amp;lt;rdf:Description rdf:about=&amp;quot;http://wanderlust.com/2000/08/Corfu.jpg&amp;quot;&amp;gt; 
    &amp;lt;dc:rights rdf:resource=&amp;quot;http://PhillyPhantasyPhotos.com/terms/Contract39283.doc&amp;quot;/&amp;gt; 
  &amp;lt;/rdf:Description&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I consider a Word doc file a bit too mutable for an official record of a legal document, but the ID could just as easily point to a TIFF or JPG file of a scanned contract:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  &amp;lt;rdf:Description rdf:about=&amp;quot;http://wanderlust.com/2000/08/Corfu.jpg&amp;quot;&amp;gt; 
    &amp;lt;dc:rights rdf:resource=&amp;quot;http://somepath/in/our/intranet/Contract39283.jpg&amp;quot;/&amp;gt; 
  &amp;lt;/rdf:Description&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The information need not be stored in RDF/XML either—you could put it in a relational database, a rights tracking package, XMP embedded in the image, or whatever you like—but you have to admit, RDF/XML isn&amp;rsquo;t always an ugly mess, and what we see above is pretty straightforward.&lt;/p&gt;
&lt;h2 id=&#34;QDUdISYqRcKhAv3ajYbInw&#34;&gt;Standards&lt;/h2&gt;
&lt;p&gt;As you&amp;rsquo;ve seen, PRISM is one to consider. I think that the OASIS &lt;a href=&#34;http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xacml&#34;&gt;XACML&lt;/a&gt; standard also looks good (pcmag.com has a &lt;a href=&#34;http://www.pcmag.com/encyclopedia_term/0,2542,t=XACML&amp;amp;i=54987,00.asp&#34;&gt;nice brief summary&lt;/a&gt; of what XACML is about; wouldn&amp;rsquo;t it be great if the home page for each OASIS standard had such a summary?), and an &lt;a href=&#34;http://code.google.com/p/enterprise-java-xacml/&#34;&gt;open source XACML engine&lt;/a&gt; has just appeared on Google Code, but XACML&amp;rsquo;s flexibility may have turned it into something that&amp;rsquo;s a bit too abstract for people in the publishing industry. Although it&amp;rsquo;s been around for at least five years, the existence of free working code could now lay the groundwork for someone to build something specific to the publishing industry&amp;rsquo;s needs.&lt;/p&gt;
&lt;p&gt;At the &lt;a href=&#34;http://www.online-information.co.uk/index.html&#34;&gt;Online Information 2007&lt;/a&gt; conference and trade show in London in December, I first heard about &lt;a href=&#34;http://www.the-acap.org/&#34;&gt;ACAP&lt;/a&gt;. This nascent standard seems more concerned with standardizing content access policies for web crawlers and search engines than with publishing industry B2B relationships, but as with many standards that are related to your interests, an ACAP advocate that I met said &amp;ldquo;you could use it for that too!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Does anyone know of a straightforward standard for a content provider to specify re-use rights for a work (an image, text content, or a combination like my magazine article example) to a publishing industry business partner?&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.griffinbrown.co.uk/&#34; title=&#34;http://www.griffinbrown.co.uk/&#34;&gt;Alex Brown&lt;/a&gt; on &lt;a href=&#34;#comment-1643&#34;&gt;February 21, 2008 8:51 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Take a look at &lt;a href=&#34;http://www.editeur.org/onix_licensing.html&#34;&gt;http://www.editeur.org/onix_licensing.html&lt;/a&gt;. This is enjoying some traction.&lt;/p&gt;
&lt;p&gt;I wouldn&amp;rsquo;t call it straightforward though!&lt;/p&gt;
&lt;p&gt;&amp;ndash; Alex.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1644&#34;&gt;February 21, 2008 10:05 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Alex,&lt;/p&gt;
&lt;p&gt;Thanks, it does look promising, and apparently I&amp;rsquo;ve even &lt;a href=&#34;https://www.bobdc.com/blog/navigating-the-library-metadat&#34;&gt;mentioned it&lt;/a&gt; before.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://rhizomik.net/~roberto&#34; title=&#34;http://rhizomik.net/~roberto&#34;&gt;Roberto García&lt;/a&gt; on &lt;a href=&#34;#comment-1653&#34;&gt;February 23, 2008 5:20 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There is the &lt;a href=&#34;http://rhizomik.net/ontologies/copyrightonto&#34;&gt;Copyright Ontology&lt;/a&gt; that is intended for Digital Rights Management (no enforcement).&lt;br /&gt;
And a PhD thesis devoted to it: &lt;a href=&#34;http://rhizomik.net/~roberto/thesis&#34;&gt;A Semantic Web Approach to Digital Rights Management&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1654&#34;&gt;February 23, 2008 7:56 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Roberto! Is this ontology being used by any publishers?&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Finding an eBook audience</title>
      <link>https://www.bobdc.com/blog/finding-an-ebook-audience/</link>
      <pubDate>Mon, 18 Feb 2008 11:03:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/finding-an-ebook-audience/</guid>
      
      
      <description><div>Housewives reading bodice-rippers?</div><div>&lt;p&gt;&lt;a href=&#34;http://www.eharlequin.com/&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/pinksony.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Pink Harlequin Sony Reader&#34; width=&#34;200px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When discussing eBooks, some people think they&amp;rsquo;re making a clever point by announcing that they&amp;rsquo;d never want to curl up by the fire with an eBook reader. This is just useless, because the idea of eBooks replacing all bound paper books was never up for consideration. I&amp;rsquo;m sure that if I want to lie on the beach in 2018 and read Don Quixote, I&amp;rsquo;ll be reading a paperback. However, a new electronic medium can replace certain uses of bound books, and the trick to finding opportunities is identifying those uses.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve worked with electronic delivery of content since I was at &lt;a href=&#34;http://ria.thomson.com/&#34;&gt;RIA&lt;/a&gt; in New York thirteen years ago, converting SGML to CD-ROMs of primary and secondary tax law for accountants and tax lawyers. Since &lt;a href=&#34;http://en.wikipedia.org/wiki/LexisNexis&#34;&gt;Lexis&lt;/a&gt; went online in 1973, finding an audience that will benefit from a new delivery format has usually meant finding people in a specific job title with certain kinds of work to do. Whether you&amp;rsquo;re a lawyer looking up legal decisions or a maintenance engineer crawling around the engine of a jumbo jet, you often need access to a page or two from a large amount of content, so some sort of electronic delivery is often more convenient than getting to (and for the publisher, updating) a big set of bound books for such information.&lt;/p&gt;
&lt;p&gt;So, people with specific technical needs have been popular target markets for new electronic media. But what could be less technical than housewives reading romance novels? At the recent &lt;a href=&#34;http://en.oreilly.com/toc2008/public/content/home&#34;&gt;O&amp;rsquo;Reilly Tools of Change&lt;/a&gt; conference in New York City, the talk by &lt;a href=&#34;http://en.oreilly.com/toc2008/public/schedule/detail/1075&#34;&gt;Brent Lewis&lt;/a&gt; of Harlequin Enterprises really challenged everyone&amp;rsquo;s assumptions about what could be done in the eBook market.&lt;/p&gt;
&lt;p&gt;I had to leave his talk before it ended to call in to a meeting, but I managed to catch Brent on the exhibit floor the next day. Harlequin started off with nine eBook titles per month in October of 2005, and by September of last year they were publishing their entire frontlist in eBook format as well as paper format. That&amp;rsquo;s a lot of books: over 120 per month. They&amp;rsquo;ve done a lot of slice-and-dice eBooks as well, compiling backlist greatest hits (&lt;a href=&#34;http://ebooks.eharlequin.com/2F96971A-E0AE-46FB-8703-5F64316EAB00/10/126/en/ContentDetails.htm?ID=62AF08E1-340D-4C21-B3F6-94E7AA004FA2&#34;&gt;Out-of-Print Gems&lt;/a&gt;), seasonal compilations (&lt;a href=&#34;http://ebooks.eharlequin.com/74617B8D-DA9C-49A4-8B62-F3DC973010B7/10/126/en/ContentDetails.htm?ID=47C52309-BA17-4C05-B433-8473B0B0558B&#34;&gt;Stocking Stuffers&lt;/a&gt;) and themed compilations such as the &lt;a href=&#34;http://ebooks.eharlequin.com/74617B8D-DA9C-49A4-8B62-F3DC973010B7/10/126/en/ContentDetails.htm?ID=FB924224-D207-40C4-B093-4CA2CA9B69F7&#34;&gt;Bundle of Brides&lt;/a&gt; title.&lt;/p&gt;
&lt;p&gt;Lewis said that their books are available in &amp;ldquo;all six formats,&amp;rdquo; which I assume means Adobe Digital Editions, Microsoft Reader, and Mobipocket (according to their &lt;a href=&#34;http://ebooks.eharlequin.com/74617B8D-DA9C-49A4-8B62-F3DC973010B7/10/126/en/Help-FAQ-General.htm&#34;&gt;eBook Boutique help&lt;/a&gt;—and don&amp;rsquo;t miss this page, which can teach everyone plenty about how to acclimate a suspicious audience to eBook usage), &lt;a href=&#34;http://www.kindlereport.com/ebooks/free-harlequin-ebooks-through-to-jan-1-2008.html&#34;&gt;Kindle&lt;/a&gt;, &lt;a href=&#34;http://www.powells.com/cgi-bin/biblio?inkey=93-9781426813368-0&#34;&gt;Palm format&lt;/a&gt;, and the Sony Reader. For Valentine&amp;rsquo;s day, they even worked with Sony on a &lt;a href=&#34;http://www.trashionista.com/2008/02/harlequins-vale.html&#34;&gt;special pink version of the Sony Reader&lt;/a&gt; bundled with 14 romance novels on it.&lt;/p&gt;
&lt;p&gt;Almost two years ago they brought the Japanese trend of serializing novels to mobile phones to the United States with their &lt;a href=&#34;http://www.eharlequin.com/store.html?cid=425&#34;&gt;Harlequin On the Go&lt;/a&gt;™ series, sending 500 words a day to Sprint and Verizon customers who subscribe to this service. They also distribute through &lt;a href=&#34;http://www.dailylit.com/tags/Harlequin&#34;&gt;dailylit.com&lt;/a&gt;, which serializes books via RSS and email.&lt;/p&gt;
&lt;p&gt;In my own talk about &lt;a href=&#34;http://en.oreilly.com/toc2008/public/schedule/detail/2599&#34;&gt;Strategies for entering the eBook market&lt;/a&gt; the day after Brent&amp;rsquo;s talk, I pointed out how Harlequin should be an inspiration for anyone thinking about a move into the eBook market. What they did, who they did it with, and how they did it is worth a close look to anyone who wants to learn about the potential for eBooks.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1638&#34;&gt;February 18, 2008 12:07 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;All formats, that is, except honest PDF.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1639&#34;&gt;February 18, 2008 12:31 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;John,&lt;/p&gt;
&lt;p&gt;This brings up an interesting point: their help page refers to &amp;ldquo;Adobe® Reader®/Adobe® Digital Editions format&amp;rdquo;, but in the &lt;a href=&#34;http://en.oreilly.com/toc2008/public/schedule/detail/2140&#34;&gt;Digital Publishing Beyond eBooks&lt;/a&gt; talk by Adobe&amp;rsquo;s Bill McCoy just before mine, he described how ADE can read both PDF and epub files, and how Adobe is basing a lot of strategy around these two formats. The eHarlequin page doesn&amp;rsquo;t say which they&amp;rsquo;re using, but it does describe their use of PDFs with DRM (as opposed to what I assume you mean by &amp;ldquo;honest PDF&amp;rdquo;.) On the other hand, the new president of the &lt;a href=&#34;http://www.idpf.org/&#34;&gt;organization responsible for epub&lt;/a&gt; came there from Harlequin, so they must find it attractive for something.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>Last day to submit for LinkedData Planet</title>
      <link>https://www.bobdc.com/blog/last-day-to-submit-for-linkedd/</link>
      <pubDate>Fri, 15 Feb 2008 09:01:01 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/last-day-to-submit-for-linkedd/</guid>
      
      
      <description><div>Come join an illustrious group.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.linkeddataplanet.com/speak.php&#34;&gt;&lt;img src=&#34;http://www.linkeddataplanet.com/images/hdr_logo.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; width=&#34;240px&#34; vspace=&#34;30px&#34; alt=&#34;LinkedData Planet conference&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I &lt;a href=&#34;https://www.bobdc.com/blog/the-linkeddata-planet-conferen&#34;&gt;wrote earlier&lt;/a&gt; about the &lt;a href=&#34;http://www.linkeddataplanet.com&#34;&gt;LinkedData Planet&lt;/a&gt; Conference and Expo that Ken North and I are co-chairing in New York City on June 17th and 18th. It&amp;rsquo;s all coming together very quickly, and we have a lot of great speakers lined up. In addition to their talks, I&amp;rsquo;m looking forward to the panels we&amp;rsquo;ll hold in which groups of them discuss the building of Linked Data applications.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not too late to &lt;a href=&#34;http://www.linkeddataplanet.com/speak.php&#34;&gt;submit a proposal&lt;/a&gt;, but today is the deadline, so if you&amp;rsquo;ve been jotting down ideas, this is your last chance to send them in.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>If content isn&#39;t king, what is?</title>
      <link>https://www.bobdc.com/blog/if-content-isnt-king-what-is/</link>
      <pubDate>Wed, 13 Feb 2008 15:04:57 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/if-content-isnt-king-what-is/</guid>
      
      
      <description><div>And how can you turn it into a catchy slogan?</div><div>&lt;p&gt;I&amp;rsquo;m writing this from the third day of the &lt;a href=&#34;http://en.oreilly.com/toc2008/public/content/home&#34;&gt;O&amp;rsquo;Reilly Tools of Change&lt;/a&gt; publishing conference, and I&amp;rsquo;ll have a lot to say in the coming weeks about ideas I&amp;rsquo;ve had here. I wanted to start with a theme from the opening keynote speeches: whether content is king, and if not, what is.&lt;/p&gt;
&lt;p&gt;Monday&amp;rsquo;s first keynote speaker was SirsiDynix VP of Innovation &lt;a href=&#34;http://en.oreilly.com/toc2008/public/schedule/speaker/1317&#34;&gt;Stephen Abram&lt;/a&gt;, who said that &amp;ldquo;content isn&amp;rsquo;t king—context is.&amp;rdquo; (He also said that &amp;ldquo;XML senses what device it&amp;rsquo;s on&amp;rdquo;, so he was clearly more interested in catchiness than in technical accuracy.)&lt;/p&gt;
&lt;p&gt;The mantra &amp;ldquo;content is king&amp;rdquo; has been around for years. As a LexisNexis employee, I heard it mentioned near the beginning of speeches with the regularity of grace at the beginning of a meal at a religious retreat. If I were still employed there, I could look up the first use of the term in the media. An &lt;a href=&#34;http://www.altavista.com/web/results?itag=ody&amp;amp;pg=aq&amp;amp;aqmode=s&amp;amp;aqa=&amp;amp;aqp=content+is+king&amp;amp;aqo=&amp;amp;aqn=&amp;amp;kgs=1&amp;amp;kls=0&amp;amp;d2=0&amp;amp;dt=dtrange&amp;amp;dfr%5Bd%5D=1&amp;amp;dfr%5Bm%5D=1&amp;amp;dfr%5By%5D=1980&amp;amp;dto%5Bd%5D=13&amp;amp;dto%5Bm%5D=2&amp;amp;dto%5By%5D=1994&amp;amp;filetype=&amp;amp;rc=dmn&amp;amp;swd=&amp;amp;lh=&amp;amp;nbq=10&#34;&gt;Alta Vista advanced search&lt;/a&gt; with a date range included shows a &lt;a href=&#34;http://web.mit.edu/AFS/sipb/project/eichin/cruft/text2/britannica-online&#34;&gt;1994 reference&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A later keynote speaker, author &lt;a href=&#34;http://www.rushkoff.com/&#34;&gt;Doug Rushkoff&lt;/a&gt;, reminded me of a high-tech Martin Short. He quoted Abram and said that neither content nor context is king, but that contact is. (If you have any suggestions for what is king, make sure it begins with the letters &amp;ldquo;cont&amp;rdquo;.) He didn&amp;rsquo;t just mean that maintaining two-way contact with your customers is valuable—an idea first made popular in 1999 by the &lt;a href=&#34;http://cluetrain.com/&#34;&gt;Cluetrain Manifesto&lt;/a&gt; authors—but that the real payoff comes from letting your customers maintain contact with each other. As he put it, &amp;ldquo;the Internet is interpersonal, not interactive&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;I could tie this to Web 2.0 talk of the value of user-generated content, but if one reader of a particular book posted some opinion about one of the characters on that book&amp;rsquo;s discussion list nine weeks ago, her three paragraphs won&amp;rsquo;t sell more copies of that book by themselves. What matters more is that sharing her opinion gives her a sense of participation in a community around the book, along with the readers who preceded and followed her in the on-line conversation. If the number one driver of book sales is recommendations, it&amp;rsquo;s very valuable for publishers to build this sense of participation around a book.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.moconews.net/entry/text-of-p-diddys-speech-give-them-king-kong-content/&#34;&gt;&lt;img src=&#34;http://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/Sean_Combs.jpg/220px-Sean_Combs.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Mr. Diddy&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;These variations on the theme of &amp;ldquo;content is king&amp;rdquo; reminded me of something I recently stumbled across while cleaning up some bookmarks: a keynote speech titled &lt;a href=&#34;http://www.moconews.net/entry/text-of-p-diddys-speech-give-them-king-kong-content/&#34;&gt;I am an MVNO&lt;/a&gt; given by Sean Combs (P/Puff/Diddy/Daddy) to a wireless phone business gathering in 2005. I never was a huge fan of his music; a year after he gave this speech, a New York Times review of his album &amp;ldquo;Press Play&amp;rdquo; and Jay-Z&amp;rsquo;s &amp;ldquo;Kingdom Come&amp;rdquo; made a clever point: one album showed a businessman acting like a hustler, and the other showed a hustler acting like a businessman. (I own neither album, but Mr. Z&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=B000XXTCH6/bobducharmeA/&#34;&gt;American Gangster&lt;/a&gt; is the only album that I&amp;rsquo;ve bought as a set of MP3s off of Amazon so far.) Combs is obviously a talented businessman, and I was impressed with what he had to say to a roomful of people thinking hard about making money from content.&lt;/p&gt;
&lt;p&gt;His words still make sense three years later. And, he has his own variation on the theme of content as king: &amp;ldquo;People always say that content is king, but there&amp;rsquo;s a lot of content out there and it can&amp;rsquo;t all be king&amp;hellip; You want king-kong content&amp;rdquo;. He wasn&amp;rsquo;t just saying that you want really good content. With references to Marshall McLuhan, MTV, and BET, he describes the use of technology to build communities as marketing channels. It&amp;rsquo;s a fresh perspective, certainly not in terms of its age, but in terms of who it comes from, considering his distance from the Stephen Abrams and Doug Rushkoffs of this conference.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1582&#34;&gt;February 13, 2008 11:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Or at the least, make sure it begins with &amp;lsquo;c&amp;rsquo;, has &amp;rsquo;n&amp;rsquo; in the third position, and &amp;rsquo;t&amp;rsquo; immediately following. That should cover all cases.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-1585&#34;&gt;February 14, 2008 3:21 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Fie!&lt;/em&gt; &lt;a href=&#34;http://en.wikipedia.org/wiki/Canute_the_Great&#34;&gt;He&lt;/a&gt; hasn&amp;rsquo;t been King since the 11th century&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Pavarotti duets</title>
      <link>https://www.bobdc.com/blog/pavarotti-duets/</link>
      <pubDate>Fri, 08 Feb 2008 08:11:06 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/pavarotti-duets/</guid>
      
      
      <description><div>Lou Reed? James Brown? The Spice Girls?</div><div>&lt;p&gt;My small appreciation for opera focuses more on specific composers like Alban Berg and Puccini than on singers, but WFMU&amp;rsquo;s &lt;a href=&#34;http://blog.wfmu.org/freeform/2008/01/lou-reed-vs-pav.html&#34;&gt;Beware of the Blog&lt;/a&gt; led me to an interesting Pavarotti duet, and YouTube showed me that it was one of a truly strange collection. For example, there is the goofy and gimmicky: &lt;a href=&#34;http://youtube.com/watch?v=kL0WFcygdWY&#34;&gt;with Barry White&lt;/a&gt;. The less goofy, but still gimmicky (and I&amp;rsquo;m a bigger fan of Lou&amp;rsquo;s than of anyone else mentioned on this page): &lt;a href=&#34;http://youtube.com/watch?v=kXgbN81zNG8&#34;&gt;with Lou Reed&lt;/a&gt;. Strangely powerful and moving: &lt;a href=&#34;http://youtube.com/watch?v=VCIyzNISw1Q&#34;&gt;with James Brown&lt;/a&gt;.&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/VCIyzNISw1Q&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Obviously, &amp;ldquo;This is a Man&amp;rsquo;s World&amp;rdquo; was a better choice than &amp;ldquo;Hot Pants&amp;rdquo; or &amp;ldquo;Cold Sweat&amp;rdquo; would have been, although &amp;ldquo;Please Please Please&amp;rdquo; might have been interesting. The song&amp;rsquo;s big belting emotions mesh well with a full orchestra, making it pretty operatic before the famous tenor joins in.&lt;/p&gt;
&lt;p&gt;YouTube also offers videos of Pavarotti duets with Bono, Sting (what, no Peter Gabriel?), Liza Minnelli, Ricky Martin, Meat Loaf, Bon Jovi, Bryan Adams (or was it Ryan Adams?), and even the Spice Girls, but I really wasn&amp;rsquo;t interested in these. None could beat the Godfather of Soul.&lt;/p&gt;
&lt;p&gt;Only once in my life have I ever told my kids &amp;ldquo;someday you&amp;rsquo;ll thank me for this&amp;rdquo;: when we brought them to a local James Brown concert less than a year before he died. He was obviously not in his prime, but it was the full band with the full show, and he was very impressive for a man of his age. (Google video has an obstructed view &lt;a href=&#34;http://video.google.com/videoplay?docid=-7379547043136309313&#34;&gt;video clip of &amp;ldquo;I Feel Good&amp;rdquo; from that show&lt;/a&gt;.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Unsung Super Bowl hero</title>
      <link>https://www.bobdc.com/blog/unsung-super-bowl-hero/</link>
      <pubDate>Tue, 05 Feb 2008 09:11:02 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/unsung-super-bowl-hero/</guid>
      
      
      <description><div>A Snee.</div><div>&lt;img src=&#34;http://www.giantsspectacular.com/catalog/images/snee%20auto%20a.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Chris Snee picture&#34; width=&#34;200px&#34;/&gt;
&lt;p&gt;At least that&amp;rsquo;s what his &lt;a href=&#34;http://www.thetimes-tribune.com/site/news.cfm?newsid=19260281&#34;&gt;hometown paper&lt;/a&gt; says. He also got &lt;a href=&#34;http://www.nytimes.com/2008/02/04/sports/football/04araton.html&#34;&gt;quoted by the New York Times&lt;/a&gt;, whose Harvey Araton felt the need to explicate what our man Snee meant.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t know a lot about football—as a New York City resident, post-season Giants games on TV were about the extent of my football viewing—but I know that protecting Eli Manning while he makes those passes must be &amp;ldquo;challenging&amp;rdquo;, to use the popular business-speak euphemism. I wonder if this very large, powerful man felt any ill will toward the guy who got the snee.com domain name before he did, forcing him to put his personal website at &lt;a href=&#34;http://www.snee76.com/&#34;&gt;snee76.com&lt;/a&gt;? (&lt;a href=&#34;http://www.snee.com/about.html&#34;&gt;Further background&lt;/a&gt; on the choice of this domain name.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>The future of RDFa</title>
      <link>https://www.bobdc.com/blog/the-future-of-rdfa/</link>
      <pubDate>Mon, 04 Feb 2008 08:47:21 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-future-of-rdfa/</guid>
      
      
      <description><div>Think big.</div><div>&lt;p&gt;Since the beginning of RDFa&amp;rsquo;s history, many of its advocates have stressed its value in adding machine-readable semantics to personal web pages. This example from the &lt;a href=&#34;http://www.w3.org/TR/xhtml-rdfa-primer/&#34;&gt;RDFa Primer&lt;/a&gt; is typical:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; &amp;lt;p class=&amp;quot;contactinfo&amp;quot;  about=&amp;quot;http://example.org/staff/jo&amp;quot;&amp;gt;
    &amp;lt;span property=&amp;quot;contact:fn&amp;quot;&amp;gt;Jo Smith&amp;lt;/span&amp;gt;.
    &amp;lt;span property=&amp;quot;contact:title&amp;quot;&amp;gt;Web hacker&amp;lt;/span&amp;gt; at
    &amp;lt;a rel=&amp;quot;contact:org&amp;quot; href=&amp;quot;http://example.org&amp;quot;&amp;gt;Example.org&amp;lt;/a&amp;gt;.
    You can contact me
    &amp;lt;a rel=&amp;quot;contact:email&amp;quot; href=&amp;quot;mailto:jo@example.org&amp;quot;&amp;gt;via email&amp;lt;/a&amp;gt;.
  &amp;lt;/p&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An important principle has been the ability to make a web page&amp;rsquo;s data readable by both eyeballs and automated processes. This is great, but there are two related issues that I feel need a higher profile: first, RDFa has great potential for storing non-eyeball information in web pages. Second, examples like the one above go after microformats on their own turf, where they&amp;rsquo;re dug in pretty well. Being a more generalized, scalable solution, RDFa can do a lot more than microformats, and because many of those other applications have more commercial potential, I see them as the best growth area for the format.&lt;/p&gt;
&lt;p&gt;First, the non-eyeballs part. When I speak about RDFa to people with a publishing background, they like its ability to store metadata such as workflow information. Some had heard of RDF in its RDF/XML incarnation, and it was just too complex for them. RDFa isn&amp;rsquo;t. I submitted &lt;a href=&#34;http://www.w3.org/TR/xhtml-rdfa-scenarios/#use-case-3&#34;&gt;an example&lt;/a&gt; of this kind of workflow metadata usage to the RDFa Use Cases document, where it can provide a placeholder for future work. People often say that it&amp;rsquo;s difficult to measure RDF adoption rates because so much of it is behind firewalls. Electronic publishing workflow metadata is a pretty classic case of this: publishers want to track various bits of information about documents as they work on them but don&amp;rsquo;t want to include that information in the publicly available versions. So again, I think it&amp;rsquo;s a great potential growth area for RDFa.&lt;/p&gt;
&lt;blockquote class=&#34;pullquote&#34; style=&#39;width: 190px; font: bold 1.333em/1.125em &#34;Helvetica Neue&#34;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; &#39;&gt;&lt;strong&gt;Being a more generalized, scalable solution, RDFa can do a lot more than microformats, and with many of those other applications having more commercial potential, I see them as the best growth area for the format.&lt;/strong&gt; &lt;/blockquote&gt;
&lt;p&gt;I &lt;a href=&#34;https://www.bobdc.com/blog/scraping-and-linked-data&#34;&gt;wrote recently&lt;/a&gt; about how microformats, the semantic web, and the linked data movement are making more data available as HTTP-accessible resources. The linked data strategy is often to build a front end to a data source that lets you issue SPARQL queries against it—a &amp;ldquo;SPARQL endpoint&amp;rdquo;—and/or to maintain an updated copy of valuable information to query against, as with &lt;a href=&#34;http://www.snee.com/bobdc.blog/2007/11/querying_dbpedia.html&#34;&gt;DBPedia&lt;/a&gt;. Microformats and the semantic web efforts (or at least the RDFa aspect of this) compete more directly with each other, each offering ways to embed semantics and machine-readable data into web pages, so it&amp;rsquo;s worth examining what each does well and what clues this offers about their future.&lt;/p&gt;
&lt;p&gt;The microformats effort has settled on formats to &lt;a href=&#34;http://microformats.org/wiki/hcard&#34;&gt;represent vCard contact information&lt;/a&gt; and &lt;a href=&#34;http://microformats.org/wiki/XOXO&#34;&gt;outlines&lt;/a&gt; in HTML, and there are &lt;a href=&#34;http://wiki.caminobrowser.org/Development:Planning:Microformats#Microformats_List&#34;&gt;various efforts&lt;/a&gt; to re-use existing bits of HTML markup for other domains, but there&amp;rsquo;s a &lt;a href=&#34;http://microformats.org/wiki/exploratory-discussions#Moribund&#34;&gt;much longer list&lt;/a&gt; of failed (or rather, &amp;ldquo;moribund&amp;rdquo;) microformats efforts. Microformats&amp;rsquo; hCard conventions for contact information look like a success, and the XOXO outline effort addresses a problem that RDF was never very good at anyway: imposing structure on the relationships among collections of data.&lt;/p&gt;
&lt;p&gt;The list of moribund microformats efforts shows that the approach is moving slowly, if at all, to many new domains, and my theory is that it&amp;rsquo;s so slow because each new domain requires working out a new set of things: how to identify each piece of information and where to put it in the available HTML slots. They have a few &lt;a href=&#34;http://microformats.org/wiki/design-patterns&#34;&gt;design patterns&lt;/a&gt; to guide this process, but I know of no generalized microformats way to say that a given resource has a given field name/value pairing in a way that would work for all resources and fields. RDFa&amp;rsquo;s use of &lt;a href=&#34;http://www.w3.org/TR/2007/WD-rdfa-syntax-20071018/&#34;&gt;actual specifications&lt;/a&gt; (as opposed to warm and fuzzy exhortations like &amp;ldquo;pave the cow paths&amp;rdquo; and &amp;ldquo;a way of thinking about data&amp;rdquo;) makes the RDFa representation of any straightforward facts pretty simple, as long as a vocabulary exists to describe the resources and attributes. If one doesn&amp;rsquo;t, you can make one up, building on existing naming schemes such as &lt;a href=&#34;http://en.wikipedia.org/wiki/Stock_Keeping_Unit&#34;&gt;SKU&lt;/a&gt; or &lt;a href=&#34;http://www.isbn.org/standards/home/index.asp&#34;&gt;ISBN&lt;/a&gt; numbers.&lt;/p&gt;
&lt;p&gt;These two naming schemes in particular can cover a vast amount of machine-readable data that&amp;rsquo;s worth embedding into web pages. For example, if the book with ISBN 1930220111 is for sale for $19.77, then it&amp;rsquo;s pretty clear what&amp;rsquo;s going on here:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;span about=&amp;quot;http://site:www.isbn.org/1930220111&amp;quot; property=&amp;quot;cbc:PriceAmount&amp;quot;&amp;gt;19.77&amp;lt;/span&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(I&amp;rsquo;m assuming for now that an application reading such data would only be interested in its developer&amp;rsquo;s local currency, which leaves plenty of useful applications to write.) If you and I each have a million triples of pricing information, but you used something other than the UBL urn:oasis:names:tc:ubl:CommonBasicComponents:1:0 namespace to indicate your PriceAmount predicates, a simple OWL rule can tell a program reading these prices that you and I meant the same thing by the two different predicates we used.&lt;/p&gt;
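&lt;p&gt;As a sketch of what such a rule might look like (with &lt;code&gt;your:priceAmount&lt;/code&gt; and its namespace made up for the example, and exact prefix expansions aside), a single &lt;code&gt;owl:equivalentProperty&lt;/code&gt; triple in Turtle is enough:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix owl:  &amp;lt;http://www.w3.org/2002/07/owl#&amp;gt; .
@prefix cbc:  &amp;lt;urn:oasis:names:tc:ubl:CommonBasicComponents:1:0#&amp;gt; .
@prefix your: &amp;lt;http://example.com/vocab#&amp;gt; .

# any triple using your:priceAmount now also implies one using cbc:PriceAmount
your:priceAmount owl:equivalentProperty cbc:PriceAmount .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An OWL-aware processor that reads this along with both sets of pricing triples can then treat the two predicates as interchangeable when answering queries.&lt;/p&gt;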
&lt;p&gt;Pricing is a good example. It&amp;rsquo;s a huge area where people would be happy to give away data in the form of extra embedded metadata in their web pages, because it can drive new paying customers to the source of that data (for example, to sell more copies of the book with the ISBN 1930220111). Scheduling is another example of how giving away data such as flight times or movie times can drive paying customers to an organization with something for sale. Microformats have made &lt;a href=&#34;http://microformats.org/wiki/hcalendar-examples-in-wild&#34;&gt;some progress&lt;/a&gt; (the &lt;a href=&#34;http://www.depechemode.de/parties/&#34;&gt;German Depeche Mode party list&lt;/a&gt;?), but I think that RDFa can make a lot more progress here.&lt;/p&gt;
&lt;p&gt;Let microformats do what they do best: shoehorning bits of personal data into leftover HTML attributes that no one was using (such as the &lt;code&gt;abbr&lt;/code&gt; attribute for dates) and adding &lt;code&gt;&amp;lt;div class=&amp;quot;foo&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;span class=&amp;quot;foo&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;/code&gt; elements in places where they wish HTML offered a &lt;code&gt;foo&lt;/code&gt; element. That&amp;rsquo;s not going to scale to more enterprise-oriented data, because there are no clear answers to questions about the relationships between the various bits of markup. For example, what does &lt;code&gt;&amp;lt;div class=&amp;quot;title&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;/code&gt; mean? The &lt;a href=&#34;http://microformats.org/wiki/audio-info-brainstorming#title&#34;&gt;title of an audio track&lt;/a&gt; or a &lt;a href=&#34;http://microformats.org/wiki/meeting-minutes-brainstorming&#34;&gt;job title&lt;/a&gt;? I suppose it depends whether the &lt;code&gt;div&lt;/code&gt; element in question has a &lt;code&gt;&amp;lt;div class=&amp;quot;haudio&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;/code&gt; ancestor or a &lt;code&gt;&amp;lt;div class=&amp;quot;vcard&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;/code&gt; ancestor. So what role does a &lt;code&gt;div&lt;/code&gt; element play in setting the context of its descendants? Hell if I know; a &lt;a href=&#34;http://microformats.org/wiki/Special:Search?search=div&amp;amp;go=Go&#34;&gt;search for &amp;ldquo;div&amp;rdquo; at microformats.org&lt;/a&gt; just brought up &amp;ldquo;No page title matches&amp;rdquo; and &amp;ldquo;No page text matches&amp;rdquo;. 
The documentation for the &lt;a href=&#34;http://microformats.org/wiki/class-design-pattern&#34;&gt;class design pattern&lt;/a&gt; tells us that &amp;ldquo;if an appropriate semantic element is not available, use &lt;code&gt;span&lt;/code&gt; or &lt;code&gt;div&lt;/code&gt;&amp;rdquo;, with no clue about what might be special about &lt;code&gt;div&lt;/code&gt;. The documentation for the &lt;a href=&#34;http://microformats.org/wiki/elemental-microformat&#34;&gt;elemental&lt;/a&gt; and &lt;a href=&#34;http://microformats.org/wiki/compound-microformat&#34;&gt;compound&lt;/a&gt; design patterns doesn&amp;rsquo;t offer any more help.&lt;/p&gt;
&lt;p&gt;This is not a markup infrastructure that someone can take and run with to develop (or even augment) applications for arbitrary data domains. RDFa is way ahead of microformats in its ability to do this, so its best opportunities for traction are in domains with a lot of structured data that doesn&amp;rsquo;t fit well into hCard format or the two or three other microformat success stories.&lt;/p&gt;
&lt;p&gt;There are plenty of these. Those who would benefit most from giving away embedded machine-readable data are companies and other large organizations who are now generating tables of HTML describing their products and services using PHP, Perl, Ruby on Rails, or other scripting languages, and a few tweaks (&lt;a href=&#34;https://www.bobdc.com/blog/automated-rdfa-output-from-dit&#34;&gt;[1]&lt;/a&gt;, &lt;a href=&#34;http://www.snee.com/bobdc.blog/2007/02/generating_rdfa_from_movable_t_1.html&#34;&gt;[2]&lt;/a&gt;) to those scripts can make a wide range of that data available as machine-readable RDFa in addition to human-readable HTML. Let&amp;rsquo;s find the people who can get those tweaks made and convince them of the value of doing so.&lt;/p&gt;
&lt;h2 id=&#34;7-comments&#34;&gt;7 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://thewebsemantic.com&#34; title=&#34;http://thewebsemantic.com&#34;&gt;Taylor&lt;/a&gt; on &lt;a href=&#34;#comment-1566&#34;&gt;February 5, 2008 4:42 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The problem I&amp;rsquo;m having with RDFa is that the semweb community avoids commitment to any common formats whereas the microformats community shares the tools and stylesheets for parsing, which in the end enforce a loose standard. If your format shows up in operator it&amp;rsquo;s ok, if not you need to work on it. What I ended up doing was hAtom on the page, then style to Atom, then RDF.&lt;/p&gt;
&lt;p&gt;I know you&amp;rsquo;re thinking wait, we&amp;rsquo;ve got lots of standards in RDF. When there is a format the XSD types are vague, sometimes even omitted. The cardinalities are often omitted. We need a place to declare &amp;ldquo;here is how we format ____ in RDFa and here is a tool that will tell you if your stuff is good or bad&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;So let&amp;rsquo;s just pretend we&amp;rsquo;ve found somebody who emits lists of products and services onto webpages&amp;hellip;what format do we recommend they use?&lt;/p&gt;
&lt;p&gt;Taylor&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1567&#34;&gt;February 5, 2008 5:21 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think that rdfa.info would be a good candidate for a place to store documentation about such best practices.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not sure what you mean by &amp;ldquo;shows up in operator&amp;rdquo;&amp;ndash;I find that if I coded some RDFa correctly, any of the tools mentioned in &amp;ldquo;Getting Those Triples&amp;rdquo; at &lt;a href=&#34;http://www.xml.com/pub/a/2007/02/14/introducing-rdfa.html?page=2&#34;&gt;http://www.xml.com/pub/a/2007/02/14/introducing-rdfa.html?page=2&lt;/a&gt; pull out the triples I expected, so I consider those to be shared, available, consistent tools. Several formats for coding the triples are possible, but I wouldn&amp;rsquo;t recommend one over the other, as long as the triples extracted with the tools showed that the chosen format was used correctly.&lt;/p&gt;
&lt;p&gt;I haven&amp;rsquo;t played with typing of values in these triples much, so maybe those are less consistent; is that what you meant about XSD types?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.boogdesign.com/b2evo/&#34; title=&#34;http://www.boogdesign.com/b2evo/&#34;&gt;Rob Crowther&lt;/a&gt; on &lt;a href=&#34;#comment-1568&#34;&gt;February 5, 2008 5:46 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks for an interesting read, just some comments where there seems to be a bit of confusion:&lt;/p&gt;
&lt;p&gt;The phrase &amp;ldquo;shows up in operator&amp;rdquo; refers to the Operator extension for Firefox, which is the basis for the built in Microformats support in Firefox 3.0:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://addons.mozilla.org/en-US/firefox/addon/4106&#34;&gt;https://addons.mozilla.org/en-US/firefox/addon/4106&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;with no clue about what might be special about div&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Div is a block level element, use it if you need a block level element. Span is an inline element, use it if you need an inline element. I&amp;rsquo;m not sure what it is you think should be special about it? The class names can be applied to arbitrary elements as appropriate to your markup, the element is not usually significant.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1569&#34;&gt;February 5, 2008 6:21 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Rob,&lt;/p&gt;
&lt;p&gt;Bit of confusion is right! I was trying to learn more about div because I was wondering if there was a way to tell whether class=&amp;ldquo;title&amp;rdquo; refers to the title of an audio track or to a job title, because I found both among microformats examples. (For example, does &lt;span class=&#39;title&#39;&gt;Bell Boy&lt;/span&gt; refer to someone who helps you with your luggage or the Who song from Quadrophenia?) I guessed that the criterion was whether the closest enclosing div element had a class value of haudio or vcard, but could find no confirmation of this, or anything else describing how the value of one div&amp;rsquo;s class value could affect the interpretation of another one.&lt;/p&gt;
&lt;p&gt;This is why I was trying to find some documentation about whether there&amp;rsquo;s anything special about the div element. I suppose I shouldn&amp;rsquo;t have assumed that there was something special about the relationship between the value in question and the enclosing div element that identified which microformat (haudio or vcard) was in use; perhaps a span element can identify which microformat is in use and indicate how to interpret what &amp;ldquo;title&amp;rdquo; means in a given context. I&amp;rsquo;m guessing, I&amp;rsquo;m making assumptions that may be incorrect, and I wish I had a place to look up the answer, but I couldn&amp;rsquo;t find it. I believe that the need to make such guesses and assumptions prevents microformats from being a good format for many domains where RDFa would work well, because RDFa has a spec and it&amp;rsquo;s built on a simple, documented data model.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://thewebsemantic.com&#34; title=&#34;http://thewebsemantic.com&#34;&gt;Taylor&lt;/a&gt; on &lt;a href=&#34;#comment-1570&#34;&gt;February 5, 2008 10:11 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;I haven&amp;rsquo;t played with typing of values in these triples much, so maybe those are less consistent; is that what you meant about XSD types?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Yes, that&amp;rsquo;s what I meant. Looking back at your triples introduction article you used the Dublin Core element set, so I&amp;rsquo;ll pick on that since we&amp;rsquo;re both familiar with it. The schema for that is here: &lt;a href=&#34;http://purl.org/dc/elements/1.1/&#34;&gt;http://purl.org/dc/elements/1.1/&lt;/a&gt;. I think dc:subject is similar to a tag set, but the schema doesn’t indicate if it’s a comma separated string, or space, or a bag, or a set, or another datatype node. It doesn’t say anything at all, except that the property exists. dc:date is worse. Remember, we&amp;rsquo;re talking machine readable. When a machine encounters a dc:date of 07/08/09 what does it think it means?&lt;/p&gt;
&lt;p&gt;Technically RDFa has tremendous advantages over microformats, and we can be very specific with our vocabularies using RDFS and/or OWL. But the RDFa camp is lacking where it counts most&amp;hellip;community building, and collaborative vocabulary definition.&lt;/p&gt;
&lt;p&gt;From rdfs.info &amp;ldquo;you choose which attributes to use, which to reuse from other sites, and how to evolve, over time, the meaning of these attributes.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;That bothers me. Just sounds too loosey goosey. For meaning to have an impact it needs to be shared and agreed upon. Publishers are thinking &amp;ldquo;how do I get reach&amp;rdquo;&amp;hellip;aggregators are thinking &amp;ldquo;how can I find semantically meaningful content&amp;rdquo;. I&amp;rsquo;d like to participate in building these for my domain but I don&amp;rsquo;t know where to engage, while it&amp;rsquo;s pretty clear on the microformat side of things.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1571&#34;&gt;February 5, 2008 11:07 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Taylor,&lt;/p&gt;
&lt;p&gt;dc:subject is a property, and that&amp;rsquo;s all. In my opinion, building data structures out of triples is what got RDF/XML into trouble, and I get along fine by just not doing that. If I want to say that the resource at &lt;a href=&#34;http://www.snee.com/bobdc.blog/2007/08/automated_rdfa_output_from_dit.html&#34;&gt;http://www.snee.com/bobdc.blog/2007/08/automated_rdfa_output_from_dit.html&lt;/a&gt; has both RDF and DITA as subjects, I&amp;rsquo;ll just do it as two triples,&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog/2007/08/automated_rdfa_output_from_dit.html&#34;&gt;http://www.snee.com/bobdc.blog/2007/08/automated_rdfa_output_from_dit.html&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;http://purl.org/dc/elements/1.1/subject&#34;&gt;http://purl.org/dc/elements/1.1/subject&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;http://www.snee.com/bobdc.blog/metadata/rdf/&#34;&gt;http://www.snee.com/bobdc.blog/metadata/rdf/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bobdc.blog/2007/08/automated_rdfa_output_from_dit.html&#34;&gt;http://www.snee.com/bobdc.blog/2007/08/automated_rdfa_output_from_dit.html&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;http://purl.org/dc/elements/1.1/subject&#34;&gt;http://purl.org/dc/elements/1.1/subject&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;http://www.snee.com/bobdc.blog/metadata/dita/&#34;&gt;http://www.snee.com/bobdc.blog/metadata/dita/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;or, in RDFa,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;span about=&amp;quot;http://www.snee.com/bobdc.blog/2007/08/automated_rdfa_output_from_dit.html&amp;quot;
      rel=&amp;quot;dc:subject&amp;quot; href=&amp;quot;http://www.snee.com/bobdc.blog/metadata/rdf/&amp;quot;/&amp;gt;
&amp;lt;span about=&amp;quot;http://www.snee.com/bobdc.blog/2007/08/automated_rdfa_output_from_dit.html&amp;quot;
      rel=&amp;quot;dc:subject&amp;quot; href=&amp;quot;http://www.snee.com/bobdc.blog/metadata/dita/&amp;quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and I won&amp;rsquo;t even worry about trying to treat dc:subject as multi-valued thing. (There are other ways to do it in RDFa, especially from within the document serving as the subject, but RDFa tools will pull the same triples out of those as they will from the two span elements above.)&lt;/p&gt;
&lt;p&gt;For machine-readable dates, ISO 8601 has no real competition; in fact, it&amp;rsquo;s one of the canonical RDFa examples of using the content attribute, to do something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;span property=&amp;quot;dc:date&amp;quot; content=&amp;quot;2007-03-15T15:32:00&amp;quot;&amp;gt;March 15, 2007, at 3:32 PM&amp;lt;/span&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;RDF(a) aside, it says right at &lt;a href=&#34;http://dublincore.org/documents/dces/#date&#34;&gt;http://dublincore.org/documents/dces/#date&lt;/a&gt; that ISO 8601 format is the best practice for dc:date.&lt;/p&gt;
&lt;p&gt;I think that microformats&amp;rsquo; lack of a spec makes it far more loosey-goosey than RDFa. There is clarity in microformats if you limit your domain to contact information or outlining, but there are many more kinds of data out there. See also my reply to Rob above.&lt;/p&gt;
&lt;p&gt;By Sarah Bourne on &lt;a href=&#34;#comment-1572&#34;&gt;February 6, 2008 10:10 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The problems the abbr design patterns (in particular) create for users of assistive technologies make adoption of microformats pretty much impossible for government sites with accessibility requirements. So not only can RDFa do everything microformats can, and then some (as you point out); it does so without adversely impacting usability for people with disabilities.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>The LinkedData Planet conference</title>
      <link>https://www.bobdc.com/blog/the-linkeddata-planet-conferen/</link>
      <pubDate>Fri, 01 Feb 2008 13:23:34 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-linkeddata-planet-conferen/</guid>
      
      
      <description><div>And Expo!</div><div>&lt;p&gt;&lt;a href=&#34;http://www.linkeddataplanet.com/speak.php&#34;&gt;&lt;img src=&#34;http://www.linkeddataplanet.com/images/hdr_logo.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; width=&#34;240px&#34; vspace=&#34;30px&#34; alt=&#34;LinkedData Planet conference&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m proud to announce that Ken North and I are co-chairing the &lt;a href=&#34;http://www.linkeddataplanet.com&#34;&gt;LinkedData Planet&lt;/a&gt; Conference and Expo being held in New York City on June 17th and 18th. &amp;ldquo;Linked Data&amp;rdquo; refers to the increasing amount of machine-accessible useful data on the public web and the growing collection of practices that let us use different sets of public and private data together to get more out of those sets of data. By combining tools and techniques from the semantic web world, the enterprise database management world, and the XML world, developers are doing fascinating new things every day.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/blog/querying-dbpedia&#34;&gt;Listing Bart Simpson blackboard gags&lt;/a&gt; is a silly example, but it shows how a standard query language can access a large, public database to find out some very interesting things, and DBpedia has a lot of data that will be valuable to more useful applications. And, since I wrote that three months ago, that query language &lt;a href=&#34;http://www.w3.org/News/2008#item6&#34;&gt;became a standard&lt;/a&gt;—things are moving very quickly in the world of Linked Data.&lt;/p&gt;
&lt;p&gt;I first met Ken when the Internet bubble economy supported several XML conferences per year, and since then I&amp;rsquo;ve grown to value his big-picture historical perspective on the evolution of database technology. (For example, check out his &lt;a href=&#34;http://ourworld.compuserve.com/homepages/Ken_North/db_hall.htm&#34;&gt;Excellence in Database Technology&lt;/a&gt; page.) He&amp;rsquo;s not looking at this technology from inside of the XML/publishing/semantic web geek world like I tend to, and he makes an excellent advocate for Linked Data technology in the enterprise database management world. The conference has also benefited a great deal from the inspiration of OpenLink Software&amp;rsquo;s &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen/&#34;&gt;Kingsley Idehen&lt;/a&gt;, who I first knew as the blogger interested in semantic web issues who left the most interesting comments on my own weblog. His experience with OpenLink&amp;rsquo;s technology and clients has given him a vision for the future of Linked Data that is quite inspiring when you hear him discuss it.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve got one speaker lined up already: Tim Berners-Lee, who&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/DesignIssues/LinkedData.html&#34;&gt;been thinking about the possibilities of linked data&lt;/a&gt; for a while. If you&amp;rsquo;re interested in Linked Data, please submit something on the &lt;a href=&#34;http://www.linkeddataplanet.com/speak.php&#34;&gt;call for speakers page&lt;/a&gt; to tell us what you&amp;rsquo;ve done or what you&amp;rsquo;re working on. We&amp;rsquo;d love to hear about it.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>Validating XML documents with PUBLIC identifiers and catalogs</title>
      <link>https://www.bobdc.com/blog/validating-xml-documents-with/</link>
      <pubDate>Sun, 27 Jan 2008 11:12:23 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/validating-xml-documents-with/</guid>
      
      
      <description><div>And indenting them, and changing their encoding...</div><div>&lt;p&gt;&lt;a href=&#34;http://xmlsoft.org/xmllint.html&#34;&gt;&lt;img src=&#34;http://xmlsoft.org/Libxml2-Logo-180x168.gif&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;powered by libxml2&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To check the validity of XML files, I&amp;rsquo;ve used the &lt;a href=&#34;http://xerces.apache.org/xerces-c/stdinparse.html&#34;&gt;stdinparse&lt;/a&gt; utility that comes with Xerces C for years, but no more. While creating some DITA files, I wanted to validate them using the document&amp;rsquo;s PUBLIC identifier and not its SYSTEM identifier. (I didn&amp;rsquo;t use PUBLIC identifiers much in the days between SGML and DITA. They&amp;rsquo;re useful for DITA work because the DITA Open Toolkit automates the assembly of multiple pieces, and sharing pieces in multiple places is easier with PUBLIC declarations, especially if you&amp;rsquo;re assembling a system that will run on a machine other than your own.)&lt;/p&gt;
&lt;p&gt;I did some searches, and it turned out that I&amp;rsquo;d put the perfect utility on my Windows machine&amp;rsquo;s hard disk years ago. It looks like it&amp;rsquo;s included in some Linux distributions as well, or is only an apt-get away: &lt;a href=&#34;http://xmlsoft.org/xmllint.html&#34;&gt;xmllint&lt;/a&gt;, which is part of &lt;a href=&#34;http://xmlsoft.org/&#34;&gt;libxml2&lt;/a&gt;. It&amp;rsquo;s written in C, so it&amp;rsquo;s fast, and binaries are easy to find for Windows and Linux.&lt;/p&gt;
&lt;p&gt;Once you set the SGML_CATALOG_FILES environment variable to point to your catalog, the &lt;code&gt;-catalogs&lt;/code&gt; switch tells it to use the catalog. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;set SGML_CATALOG_FILES=c:/usr/local/DITA-OT1.4.1/catalog-dita.xml
xmllint -noout -valid -catalogs myditafile.xml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;-noout&lt;/code&gt; switch tells xmllint to not output the document itself, &lt;code&gt;-valid&lt;/code&gt; tells it to validate the document, and &lt;code&gt;-catalogs&lt;/code&gt; tells it to use the catalog defined in SGML_CATALOG_FILES.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;xmllint&lt;/code&gt; has a lot of other nice switches. If you omit the &lt;code&gt;-noout&lt;/code&gt; switch, there are some handy transformations you can easily perform on the document. You can indent it with &lt;code&gt;-format&lt;/code&gt;, and &lt;code&gt;-encode&lt;/code&gt; lets you specify a new encoding for the output, as Dave Holden &lt;a href=&#34;https://www.bobdc.com/blog/converting-an-xml-documents-en#c001278&#34;&gt;pointed out&lt;/a&gt; when I described some simple XSLT stylesheets I once used to convert the encoding of XML documents. The &lt;code&gt;-noblanks&lt;/code&gt; switch drops ignorable white space, &lt;code&gt;-relaxng&lt;/code&gt; validates the document against a RELAX NG schema, &lt;code&gt;-schema&lt;/code&gt; validates it against a W3C schema, and there are dozens more switches.&lt;/p&gt;
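&lt;p&gt;A quick self-contained sketch of the well-formedness check and the &lt;code&gt;-format&lt;/code&gt; switch (the file and its contents are made up for illustration; this assumes xmllint is on your path):&lt;/p&gt;

```shell
# Create a small unindented XML file to experiment with.
cat > tiny.xml <<'EOF'
<doc><title>Hi</title><p>Some text.</p></doc>
EOF
# Parse without echoing the document; silence means it's well-formed.
xmllint --noout tiny.xml && echo "well-formed"
# Without --noout, xmllint echoes the document; --format indents it.
xmllint --format tiny.xml
```

(Newer xmllint documentation spells the switches with two dashes, as above, but the single-dash forms shown in this post are accepted as well.)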
&lt;p&gt;I can&amp;rsquo;t believe this was sitting on my hard disk for so long without my noticing how useful it can be.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://causticdave.com&#34; title=&#34;http://causticdave.com&#34;&gt;Caustic Dave&lt;/a&gt; on &lt;a href=&#34;#comment-1560&#34;&gt;January 27, 2008 11:18 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Oh yeah. xmllint is one of my favorite utilities. It has saved me from doom many times.&lt;/p&gt;
&lt;p&gt;I wish there was something like it for Javascript.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/dita">DITA</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Digitization and its discontents</title>
      <link>https://www.bobdc.com/blog/digitization-and-its-disconten/</link>
      <pubDate>Thu, 24 Jan 2008 10:29:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/digitization-and-its-disconten/</guid>
      
      
      <description><div>How sloppy is OK for Google scans?</div><div>&lt;p&gt;&lt;a href=&#34;http://radar.oreilly.com/archives/2008/01/hand_of_google.html&#34;&gt;&lt;img src=&#34;http://radar.oreilly.com//ishot-2.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;bad Google scan&#34; width=&#34;240px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Every now and then someone finds a page image from &lt;a href=&#34;http://books.google.com&#34;&gt;Google Book Search&lt;/a&gt; that shows the scanner operator&amp;rsquo;s hand over the page or something else that should be equally embarrassing. If finding such images was a game, &lt;a href=&#34;http://www.billtrippe.com/cgi-bin/mt/mt-search.cgi?IncludeBlogs=1&amp;amp;search=%22google%20books%20stupid%20page%20of%20the%20day%22%20&#34;&gt;Bill Trippe&lt;/a&gt; would have the high score. Dale Dougherty &lt;a href=&#34;http://radar.oreilly.com/archives/2008/01/hand_of_google.html&#34;&gt;recently pointed out&lt;/a&gt; the hilarious example shown here. (It took me a while to realize that the little flashes of color are the rubber fingertips that editorial workers often use to help them turn pages faster.)&lt;/p&gt;
&lt;p&gt;One commenter wrote that &amp;ldquo;after this, their QA department must be in trouble&amp;rdquo;. To paraphrase my reply, unless you&amp;rsquo;re a piecemeal nonprofit—which Google certainly isn&amp;rsquo;t—part of the project planning for any large-scale digitization effort is deciding what level of quality you&amp;rsquo;re going to achieve and then putting the QA infrastructure in place to attain it. 100% accuracy isn&amp;rsquo;t possible, but 99.95% and 99.995% are. Of course, 99.995% means a more expensive QA infrastructure, and your budget and schedule are two important inputs when deciding on an accuracy figure to aim for. At my &lt;a href=&#34;http://www.innodata-isogen.com&#34;&gt;place of employment&lt;/a&gt; I have co-workers who have this measurement down to a science; contact me if you&amp;rsquo;re interested in hearing more.&lt;/p&gt;
&lt;p&gt;Google&amp;rsquo;s QA department is only in trouble if they&amp;rsquo;re not meeting their accuracy goal. I&amp;rsquo;d love to see what this figure is, but I&amp;rsquo;m not holding my breath waiting for them to reveal it. I&amp;rsquo;m guessing that it&amp;rsquo;s lower than 99.95%.&lt;/p&gt;
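&lt;p&gt;To get a rough feel for what those accuracy targets mean in practice (the character count below is an invented round number, not a figure from anyone&amp;rsquo;s project):&lt;/p&gt;

```shell
# Expected character errors in a million scanned characters at each target.
chars=1000000
echo "99.95%:  $(( chars * 5 / 10000 )) errors"    # 0.05% error rate -> 500
echo "99.995%: $(( chars * 5 / 100000 )) errors"   # 0.005% error rate -> 50
```

A tenfold drop in errors for that last half-step is why the fancier QA infrastructure costs what it does.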
&lt;p&gt;The same day that Dale posted this I learned from Robin Cover&amp;rsquo;s &lt;a href=&#34;http://xml.coverpages.org/newsletter/news2008-01-18.html&#34;&gt;XML Daily Newslink&lt;/a&gt; that the National Library of the Netherlands had published a report titled &lt;a href=&#34;http://www.dlib.org/dlib/january08/klijn/01klijn.html&#34;&gt;The Current State-of-art in Newspaper Digitization: A Market Perspective&lt;/a&gt; in &lt;a href=&#34;http://www.dlib.org&#34;&gt;D-Lib magazine&lt;/a&gt;, an online magazine focused on digital library research. Newspaper digitization presents all the difficulties of book digitization and a few more:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The pages are bigger, so you need larger (and more expensive) scanners to make page images.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A single page can have many articles, and many of these—especially those that begin on the front page—often end on a different page. The D-Lib article calls these &amp;ldquo;so-called &amp;lsquo;continuation&amp;rsquo; articles&amp;rdquo;, and newspaper people call the continuations &amp;ldquo;jumps&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;In addition to the extra work of assembling the pieces so that the digital versions of these articles are coherent wholes, this presents some new metadata problems: if a keyword search finds a phrase on page 31, but it&amp;rsquo;s in the jump for an article beginning on page 12, will you retain this information when you assemble the pieces, and if so how will you store it and present it to the user?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Knowing that most people will throw out their newspapers within days, newspaper publishers save money by printing them on cheap paper. The high acid content of newsprint means that it gets brown, crumbly, and more difficult to OCR much faster than paper used in other kinds of publishing.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The D-Lib article covers these and more generalized digitization issues very well. I recommend it to anyone interested in what goes into such a project, whether it&amp;rsquo;s the Google Book Search project or one you&amp;rsquo;re considering yourself.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1559&#34;&gt;January 24, 2008 11:27 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t have inside knowledge about this, but on the evidence of one case, Google Books will yank a scanned book if there are fingers visible even if they don&amp;rsquo;t interfere with the comprehension of the text. However, they do seem to depend mostly on complaints by others (a reasonable attitude IMHO &amp;ndash; Google *is* behaving like a nonprofit in its book-scanning activities).&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Free epub children&#39;s picture books</title>
      <link>https://www.bobdc.com/blog/free-epub-childrens-picture-bo/</link>
      <pubDate>Sun, 20 Jan 2008 13:22:51 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/free-epub-childrens-picture-bo/</guid>
      
      
      <description><div>Something for kids to read on the OLPC XO.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.snee.com/epubkidsbooks/&#34;&gt;&lt;img src=&#34;http://www.snee.com/epubkidsbooks/images/morerussian.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;some description&#34; width=&#34;160px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I &lt;a href=&#34;https://www.bobdc.com/blog/lazyweb-grants-a-christmas-wis&#34;&gt;wrote recently&lt;/a&gt; about how several people have gotten the FBReader eBook reading program to run on the OLPC XO laptop. I thought it would be nice to have some books appropriate to the XO&amp;rsquo;s audience available, so I converted sixteen picture books from &lt;a href=&#34;http://www.gutenberg.org/wiki/Main_Page&#34;&gt;Project Gutenberg&lt;/a&gt; to the epub format, which is one of the formats that FBReader can read. You can find them at the &lt;a href=&#34;http://www.snee.com/epubkidsbooks/&#34;&gt;Free epub children&amp;rsquo;s picture books&lt;/a&gt; page that I created.&lt;/p&gt;
&lt;p&gt;Notes on the page discuss why I chose the books that I converted, why I chose the epub format, and some issues with FBReader&amp;rsquo;s current support of the format. I won&amp;rsquo;t repeat all those notes here, but I will repeat one: because I don&amp;rsquo;t own an XO, I&amp;rsquo;d appreciate it if someone who does could write a few paragraphs for me to add to that page on how to download one of the epub files to an XO and then open it up in FBReader.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1555&#34;&gt;January 20, 2008 2:33 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I grabbed FBReader and Little Bo-Peep and tried running them together. The experience was confusing as hell.&lt;/p&gt;
&lt;p&gt;The book came up with a nearly blank screen, with the big arrow icons (&amp;ldquo;Move Forward&amp;rdquo; and &amp;ldquo;Move Back&amp;rdquo;) grayed out, and with a mysterious &amp;ldquo;1 out of 3&amp;rdquo; on the progress bar at the bottom. Only when I hit the Page Down key (mouse scrolling also worked) did I see the cover illustration, and there was zero rhyme or reason that I could discern for the progress bar shifting to 2/3 and eventually to 3/3. The big arrow icons remained grayed out the whole time.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m moaning like this because I&amp;rsquo;m supposing that there&amp;rsquo;s something your conversion could be doing differently to change the situation. If it&amp;rsquo;s just FBReader bugs, then so be it.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1556&#34;&gt;January 20, 2008 3:40 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi John,&lt;/p&gt;
&lt;p&gt;The 1/3 etc. thing is just FBReader&amp;rsquo;s interface. The blank screen until you press page (or cursor) down happened to me with my subnotebook Lifebook running Ubuntu, because the screen is small, but not on a Windows machine with a bigger screen. More importantly, a friend with an XO said that the first image showed there upon first loading. How big was the screen where you tried this?&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>Scraping and linked data</title>
      <link>https://www.bobdc.com/blog/scraping-and-linked-data/</link>
      <pubDate>Fri, 11 Jan 2008 08:59:40 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/scraping-and-linked-data/</guid>
      
      
      <description><div>Wired Magazine gives scraping the buzzword treatment but remains clueless about the semantic web and linked data.</div><div>&lt;p&gt;The latest issue of Wired has an article with the provocative title of &lt;a href=&#34;http://www.wired.com/techbiz/media/magazine/16-01/ff_scraping&#34;&gt;The Data Wars&lt;/a&gt; about web sites built around data retrieved by &amp;ldquo;bots&amp;rdquo; doing &amp;ldquo;scraping&amp;rdquo;. I quote these because the article twists the terms a bit to make them and their subjects seem more dramatic, more cutting edge, and—you guessed it—more &amp;ldquo;Web 2.0&amp;rdquo;.&lt;/p&gt;
&lt;blockquote class=&#34;pullquote&#34; style=&#39;width: 190px; font: bold 1.333em/1.125em &#34;Helvetica Neue&#34;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; &#39;&gt;I see three historical phases for this kind of data retrieval, and Wired still doesn&#39;t know about the third.&lt;/blockquote&gt;
&lt;p&gt;The dramatic tension of the article is the conflict between, on the one hand, craigslist and other large sites with lots of valuable public data, and on the other hand, the sites who pull some of this data, &amp;ldquo;remix&amp;rdquo; it to be more useful to their own audience, and then put ads around their &amp;ldquo;mashups&amp;rdquo;. (Somehow, code monkeys surrounded by earth-toned cubicle fabric think that it makes them resemble DJs surrounded by crates of vinyl if they use musical buzzwords to refer to the act of combining multiple things into a new one. If I wrote a Turbo Pascal program twenty years ago that included a few existing libraries that I collected from different sources, was that a mashup, or was it a remix?)&lt;/p&gt;
&lt;p&gt;According to the article, scraping&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;refers to the act of automatically harvesting information from another site and using the results for sometimes nefarious activities. (Some scrapers, for instance, collect email addresses from public web sites and sell them to spammers)&amp;hellip; Scrapers write software robots using script languages like Perl, PHP, or Java. They direct the bots to go out (either from a Web server or a computer of their own) to the target site and, if necessary, log in. Then the bots copy and bring back the requested payload, be it images, lists of contact information, or a price catalog.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So using &lt;a href=&#34;http://www.gnu.org/software/wget/&#34;&gt;wget&lt;/a&gt; or &lt;a href=&#34;http://curl.haxx.se/&#34;&gt;curl&lt;/a&gt; to pull down a text file and then feeding that file to a Perl script that looks for and extracts strings that match certain patterns is now a command to a robot (excuse me, &amp;ldquo;bot&amp;rdquo;) army to go forth and retrieve payloads. I suppose that if we do this after the sun goes down, we can refer to our scripts as an &amp;ldquo;unholy army of the night&amp;rdquo; for added excitement.&lt;/p&gt;
&lt;p&gt;I see three historical phases for this kind of data retrieval, and Wired still doesn&amp;rsquo;t know about the third. The first, which people have been doing since late in the last century, involves retrieving files and then running scripts to find and pull out useful information as described above. (By the way, John Cowan has just put out a &lt;a href=&#34;http://recycledknowledge.blogspot.com/2008/01/tagsoup-12-released-at-long-last.html&#34;&gt;new release&lt;/a&gt; of TagSoup, a parser that converts HTML retrieved from &amp;ldquo;in the wild&amp;rdquo; to well-formed XML. My own base tool set for scraping is wget, TagSoup, and XSLT; lately I&amp;rsquo;ve been using these to get Project Gutenberg metadata about public domain children&amp;rsquo;s books such as &lt;a href=&#34;http://www.gutenberg.org/etext/23598&#34;&gt;Little Bo Peep&lt;/a&gt;.)&lt;/p&gt;
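&lt;p&gt;The pattern-matching half of that first phase needs nothing exotic. A hypothetical page standing in for a wget download, plus standard grep and sed (rather than any particular scraping toolkit), gives the flavor:&lt;/p&gt;

```shell
# Stand-in for a downloaded file: a local HTML page with two links.
cat > page.html <<'EOF'
<html><body>
<a href="http://example.com/a">A</a>
<a href="http://example.com/b">B</a>
</body></html>
EOF
# Extract the href values -- "scraping" at its most basic.
grep -o 'href="[^"]*"' page.html | sed 's/^href="//;s/"$//'
# prints:
#   http://example.com/a
#   http://example.com/b
```

Real pages are messier, which is exactly why a cleanup pass like TagSoup followed by XSLT beats regular expressions once the markup gets ugly.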
&lt;p&gt;The second phase, implemented by sites that are willing to share some data but want to control that sharing, are APIs like those provided by &lt;a href=&#34;http://www.amazon.com/gp/browse.html?node=3435361&#34;&gt;Amazon&lt;/a&gt; and &lt;a href=&#34;http://code.google.com/&#34;&gt;Google&lt;/a&gt;, which the article covers. Since I first drafted this posting, the difference between scraping and API use became a big story in the data geek world after Facebook &lt;a href=&#34;http://scobleizer.com/2008/01/03/ive-been-kicked-off-of-facebook/&#34;&gt;disabled&lt;/a&gt; Robert Scoble&amp;rsquo;s account because he was beta testing &lt;a href=&#34;http://www.plaxo.com/&#34;&gt;Plaxo&lt;/a&gt;. This online address service &lt;a href=&#34;http://www.news.com/8301-13577_3-9839474-36.html&#34;&gt;scrapes Facebook instead of using its API&lt;/a&gt; because Facebook&amp;rsquo;s API doesn&amp;rsquo;t provide address book information, and Scoble has enough Facebook friends that trying to scrape all that data violated Facebook&amp;rsquo;s terms of service, or something. (If you think that this is a really big deal, Dan Brickley brings some &lt;a href=&#34;http://danbri.org/words/2008/01/07/249&#34;&gt;much-needed perspective&lt;/a&gt; to it.)&lt;/p&gt;
&lt;p&gt;The third phase of web-based data retrieval is the pulling down of data that was intentionally put into web pages for retrieval by automated processes. Unlike the data retrieved in the first phase of web data retrieval, this data goes into the web pages in a format that conforms to simple rules so that it&amp;rsquo;s immediately usable, with no requirement for pattern matching and rearranging. Unlike the APIs of the second phase, the new data is retrieved with a simple HTTP request (perhaps wget or curl) with no need to provide a login developer token or to make calls to specific processes that will then hand you the data if you make the calls correctly.&lt;/p&gt;
&lt;p&gt;There are multiple efforts working in this area. The &lt;a href=&#34;http://linkeddata.org/&#34;&gt;Linked Data&lt;/a&gt;, &lt;a href=&#34;http://www.w3.org/2001/sw/&#34;&gt;Semantic Web&lt;/a&gt;, and &lt;a href=&#34;http://microformats.org/&#34;&gt;microformats&lt;/a&gt; movements all overlap to some extent, but I don&amp;rsquo;t know of any single term that encompasses them all, unless an especially passionate advocate of one insists that the others are subsets of their work. The key difference between this work and the scraping described in the Wired article is that this third phase is about people putting up data that they &lt;em&gt;want&lt;/em&gt; others to retrieve and use. I don&amp;rsquo;t want you pulling my data and running it next to Google AdSense ads unless it helps me in some way. If the data consists of schedules for events that I charge money for, such as plane flights or movie showings, then I&amp;rsquo;m happy to let you drive more business to me. If I&amp;rsquo;m craigslist or Facebook, I just see you building a business model around my data with no benefit to me, and I don&amp;rsquo;t like it.&lt;/p&gt;
&lt;p&gt;Of course, it&amp;rsquo;s not purely about sharing data to make more money; the academic world also has plenty of research efforts with good reasons to share their data. I&amp;rsquo;ll be writing soon here about the value of seeking out organizations with strong motivations to share their data.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-1536&#34;&gt;January 11, 2008 11:19 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;In the Linked Data community we see Linked Data as ground zero (foundation layer) within the Semantic Web stack. It&amp;rsquo;s the part of the Semantic Web that deals with injecting structured data into the Web with Meshing/Mixing/Joining in mind from the onset (courtesy of HTTP based Data Object Identifiers or URIs, RDF Data Model, and HTTP Content Negotiation).&lt;/p&gt;
&lt;p&gt;Hopefully, we might be able to use the term: Meshing as a constructive distinguishing mechanism between Web 2.0 and Web.vNext (3.0, Semantic Web, Semantic Data web, Linked Data Web, Giant Global Graph or whatever label eventually sticks).&lt;/p&gt;
&lt;p&gt;Happy New Year!&lt;/p&gt;
&lt;p&gt;Kingsley&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-1540&#34;&gt;January 11, 2008 3:46 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As a big fan of Dan Bricklin&amp;rsquo;s various work, I was dissapointed to find that your intriguing link above points to my blog instead. Shome mishtake shurely?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1541&#34;&gt;January 11, 2008 3:49 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Oops. Sorry. Corrected.&lt;/p&gt;
&lt;p&gt;You danbris confuse me.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Command line processing with the DITA Open Toolkit</title>
      <link>https://www.bobdc.com/blog/command-line-processing-with-t/</link>
      <pubDate>Wed, 09 Jan 2008 18:06:18 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/command-line-processing-with-t/</guid>
      
      
      <description><div>A new IBM developerWorks article.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.ibm.com/developerworks/library/x-tipditajavacmd.html&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/dw-home2.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;developerWorks logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The first time I installed the open source &lt;a href=&#34;http://dita-ot.sourceforge.net/&#34;&gt;DITA Open Toolkit&lt;/a&gt; I ran the demo transformation and thought &amp;ldquo;great, now what?&amp;rdquo; It was all very &lt;a href=&#34;http://ant.apache.org/&#34;&gt;Ant&lt;/a&gt;-driven, and while I know the basics of Ant I didn&amp;rsquo;t know where to start with the fairly complex demo script.&lt;/p&gt;
&lt;p&gt;I later found that the toolkit&amp;rsquo;s &lt;code&gt;ant&lt;/code&gt; subdirectory has some small, manageable templates for Ant build files, but meanwhile I discovered a jar file at the core of the toolkit that could be invoked from the command line with no need to write Ant files. This makes it easier to jump right in with using the toolkit with your own DITA documents, so I wrote up what I learned and the article &lt;a href=&#34;https://web.archive.org/web/20080628064854/http://www.ibm.com/developerworks/library/x-tipditajavacmd.html&#34;&gt;Easy command line processing with the DITA Open Toolkit&lt;/a&gt; just went live on IBM developerWorks.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/dita">DITA</category>
      
    </item>
    
    <item>
      <title>Information wants to be expensive</title>
      <link>https://www.bobdc.com/blog/information-wants-to-be-expens/</link>
      <pubDate>Sun, 06 Jan 2008 11:40:23 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/information-wants-to-be-expens/</guid>
      
      
      <description><div>And, out of context, it can mislead.</div><div>&lt;p&gt;&lt;a href=&#34;http://en.wikipedia.org/wiki/Stewart_Brand&#34;&gt;&lt;img src=&#34;http://media.ted.com/images/ted/8497_254x191.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;160px&#34; alt=&#34;Stewart Brand&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;How many people know that one of the IT world&amp;rsquo;s tritest catch phrases is only half of a much more sensible description of two opposing forces? My title above may appear to contradict the well-known truism that &amp;ldquo;information wants to be free&amp;rdquo;, but it&amp;rsquo;s actually the other half of Stewart Brand&amp;rsquo;s &lt;a href=&#34;http://www.anu.edu.au/people/Roger.Clarke/II/IWtbF.html&#34;&gt;original quote&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;On the one hand information wants to be expensive, because it&amp;rsquo;s so valuable. The right information in the right place just changes your life. On the other hand, information wants to be free, because the cost of getting it out is getting lower and lower all the time. So you have these two fighting against each other.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Note &lt;em&gt;why&lt;/em&gt; information wants to be free: &amp;ldquo;because the cost of getting it out is getting lower and lower all the time.&amp;rdquo; He&amp;rsquo;s describing a trend in publishing by anthropomorphizing information to make his image more vivid, not ascribing volition to an abstract concept. (Being an abstraction, information can&amp;rsquo;t really &lt;em&gt;want&lt;/em&gt; anything.) Brand was saying that, while information can be very valuable, advances in publishing technology create a force that continually lowers the cost of distributing it, and &amp;ldquo;you have these two [forces] fighting against each other&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;The expression &amp;ldquo;but information wants to be free&amp;rdquo; gets over 900 hits on Google, showing that many people think that it&amp;rsquo;s a valid point in an argument. The next time you hear someone use it, ask them &amp;ldquo;why does it want to be free?&amp;rdquo; or even &amp;ldquo;says who?&amp;rdquo; Call me cynical, but I don&amp;rsquo;t think you&amp;rsquo;re going to get good answers.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By Rick Jelliffe on &lt;a href=&#34;#comment-1531&#34;&gt;January 7, 2008 8:33 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And if they do know it was Stewart Brand, you could still stump them by asking them to sing the Karl Bartos song &amp;ldquo;Information wants to be free&amp;rdquo; :-)&lt;/p&gt;
&lt;p&gt;Actually, I was wondering about this phrase just last week&amp;hellip;Bob the Psychic!&lt;/p&gt;
&lt;p&gt;By Ontic on &lt;a href=&#34;#comment-1532&#34;&gt;January 8, 2008 10:35 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;From my understanding of Heidegger, I believe he would state that information is free (or has a low marginal cost) it is knowledge (skillful doing) that is expensive and that has high marginal cost to obtain.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-1533&#34;&gt;January 8, 2008 10:50 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This depends on Heidegger&amp;rsquo;s original wording and how we choose to translate it. I don&amp;rsquo;t want to debate ideas about information and its cost as much as I want to restore the overused five-word Brand quote to its original context.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2008">2008</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Lazyweb grants a Christmas wish</title>
      <link>https://www.bobdc.com/blog/lazyweb-grants-a-christmas-wis/</link>
      <pubDate>Mon, 31 Dec 2007 17:26:15 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/lazyweb-grants-a-christmas-wis/</guid>
      
      
      <description><div>An epub-reading ebook reader for the OLPC laptop.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.teleread.org/blog/2007/12/28/eureka-fbreader-already-running-on-olpc-laptop-epub-books-in-time-for-one-laptop-per-child-kids/&#34;&gt;&lt;img src=&#34;http://upload.wikimedia.org/wikipedia/en/8/8f/OLPC_logo.png&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;OLPC logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I recently &lt;a href=&#34;http://groups.google.com/group/fbreader/browse_thread/thread/f464cc63bb10c669/d3c223ba1ecaf2d0?#d3c223ba1ecaf2d0&#34;&gt;asked&lt;/a&gt; on an &lt;a href=&#34;http://www.fbreader.org/&#34;&gt;FBReader&lt;/a&gt; mailing list about the possibility of porting this ebook reading program to the &lt;a href=&#34;http://www.laptop.org/&#34;&gt;One Laptop Per Child&lt;/a&gt; XO computer. A good ebook-reading program capable of reading &lt;a href=&#34;http://www.idpf.org/&#34;&gt;epub&lt;/a&gt; files was an obvious application for this machine, and once I got FBReader running on my Ubuntu laptop it seemed like the best candidate for an XO ebook reader.&lt;/p&gt;
&lt;p&gt;The ensuing thread shows that David Rothman of the &lt;a href=&#34;http://www.teleread.org/blog/&#34;&gt;Teleread&lt;/a&gt; weblog (&amp;ldquo;News &amp;amp; views on e-books, libraries, publishing and related topics&amp;rdquo;) joined in, and he &lt;a href=&#34;http://www.teleread.org/blog/2007/12/21/needed-asap-on-the-100-laptop-fbreader-and-easy-opera-installation/&#34;&gt;blogged&lt;/a&gt; the issue. Friday, David &lt;a href=&#34;http://www.teleread.org/blog/2007/12/28/eureka-fbreader-already-running-on-olpc-laptop-epub-books-in-time-for-one-laptop-per-child-kids/&#34;&gt;announced&lt;/a&gt; that Bennett Todd had worked out how to install the FBReader on the XO and posted the instructions. It still requires a comfort level with the command line, so installation won&amp;rsquo;t be easy for most XO owners at this point, but the fact that it&amp;rsquo;s running at all shows that the biggest hurdle has been cleared.&lt;/p&gt;
&lt;p&gt;I take back &lt;a href=&#34;https://www.bobdc.com/blog/the-cheap-commodity-ebook-read&#34;&gt;what I said&lt;/a&gt; about comparisons of Amazon&amp;rsquo;s Kindle to the XO being a red herring, because &lt;a href=&#34;http://dubinko.info/blog/2007/12/24/olpc-is-here/&#34;&gt;according to Micah Dubinko&lt;/a&gt;, the screen resolution in the XO&amp;rsquo;s monochrome mode is appreciably better than the standard mode, so the machine looks like a good candidate for an ebook reading device after all.&lt;/p&gt;
&lt;p&gt;In fact, considering the target audience of the OLPC project, it&amp;rsquo;s a great candidate.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>Apple marketing slogans</title>
      <link>https://www.bobdc.com/blog/apple-marketing-slogans/</link>
      <pubDate>Fri, 28 Dec 2007 08:36:59 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/apple-marketing-slogans/</guid>
      
      
      <description><div>Warm! Fuzzy!</div><div>&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/blog/nice-parodies-of-mac-hipper-th&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/macguycryinggirl.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Mac guy and crying girl&#34; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://youtube.com/watch?v=ZtPPFZERXyg&#34;&gt;The computer for the rest of us.&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://en.wikipedia.org/wiki/Think_Different&#34;&gt;Think different.&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://www.theapplecollection.com/Collection/AppleMovies/mov/concert_144a.html&#34;&gt;Rip. Mix. Burn.&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://hubpages.com/hub/Apple-Sends-3rd-Grader-Cease-And-Desist-Letter&#34;&gt;Hey little nine-year-old girl&lt;/a&gt;, cease and desist sending product ideas about the iPod Nano to Steve Jobs like you did in that letter you sent three months ago. If you want to know why, see our legal policy on apple.com.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Apparently the story is two years old, but it had enough of a surge in recent popularity to make &lt;a href=&#34;http://www.npr.org/programs/waitwait/&#34;&gt;Wait Wait Don&amp;rsquo;t Tell Me&lt;/a&gt; last Saturday. It sure gave me a happy holiday smirk.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1524&#34;&gt;December 28, 2007 10:19 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The charming part of the story you linked to is the &amp;ldquo;I&amp;rsquo;m sure Steve Jobs didn&amp;rsquo;t know&amp;rdquo;, which reminds me irresistibly of &amp;ldquo;If only the Tsar knew of our plight!&amp;rdquo; from &lt;em&gt;Fiddler on the Roof&lt;/em&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>ebooks, on ebook readers or not</title>
      <link>https://www.bobdc.com/blog/ebooks-on-ebook-readers-or-not/</link>
      <pubDate>Fri, 21 Dec 2007 08:50:02 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/ebooks-on-ebook-readers-or-not/</guid>
      
      
      <description><div>New gadgets can be fun, but there&#39;s a lot you can do with the computer you already own.</div><div>&lt;p&gt;While doing some Christmas shopping at Best Buy last Saturday I asked one of their &lt;a href=&#34;http://www.geeksquad.com/&#34;&gt;Geek Squad&lt;/a&gt; guys at the information desk if they had the Sony ebook reader, because I wanted to see what one looked like up close. This low-end Napoleon Dynamite snorted &amp;ldquo;They tried ebooks a few years ago, and it was a complete failure&amp;rdquo;. I was tempted to press him on what he meant by &amp;ldquo;they&amp;rdquo;, but I just smiled, told him &amp;ldquo;it&amp;rsquo;s a little more complicated than that&amp;rdquo;, and headed for the cash registers.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.feedbooks.com/&#34;&gt;&lt;img src=&#34;http://www.teleread.org/blog/wp-content/uploads/2007/12/feedbooks2.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;feedbooks logo&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Various &lt;a href=&#34;https://www.bobdc.com/blog/the-cheap-commodity-ebook-read&#34;&gt;specialized ebook reading devices&lt;/a&gt; have come out over the years, usually with less success than their makers had hoped, and now Sony and Amazon&amp;rsquo;s Kindle are trying to find out if 2007 is the right time for such an appliance. Meanwhile, many people have a broader idea of what &amp;ldquo;ebooks&amp;rdquo; are, and it&amp;rsquo;s worked out quite well for them. Along with (or instead of) electronic versions of books to read on a specialized reading device, more and more publishers are making them available for reading on regular, general-purpose PCs.&lt;/p&gt;
&lt;p&gt;These might be offered in specialized ebook formats such as &lt;a href=&#34;http://www.idpf.org/&#34;&gt;epub&lt;/a&gt;, or they could be PDF files, or they could be offered in several formats at once. The &lt;a href=&#34;http://www.feedbooks.com/&#34;&gt;feedbooks&lt;/a&gt; site offers many literary classics in epub, Mobipocket/Kindle, Sony Reader, iLiad, and two PDF formats. While I have no intention of reading &lt;a href=&#34;http://www.feedbooks.com/discover/view_book/83&#34;&gt;War and Peace&lt;/a&gt; off of a regular computer screen (although it was nice to see that the Ubuntu version of &lt;a href=&#34;http://www.fbreader.org/&#34;&gt;FBReader&lt;/a&gt; displayed the epub version of feedbooks&amp;rsquo; &amp;ldquo;War and Peace&amp;rdquo; just fine on my Lifebook laptop), commercial publishers are finding people willing to pay for electronic versions of their books to use as reference material. For example, an editor at Manning told me that &lt;a href=&#34;http://www.manning.com/about/ebooks.html&#34;&gt;their ebooks&lt;/a&gt; do well among tech consultants who want to bring multiple books along on consulting engagements for reference without straining their backs.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://innodata-isogen.com&#34;&gt;My employer&lt;/a&gt; has created PDF versions of backlist books for clients who are now well-positioned to distribute these titles as electronic books like Manning does and to also use them for print-on-demand delivery. (I&amp;rsquo;m supposed to be careful about throwing clients&amp;rsquo; names around, so contact me privately if you want to hear more. To summarize, doing this is pretty straightforward if your content is already in XML, and if not, we have plenty of experience converting odd formats to XML.)&lt;/p&gt;
&lt;p&gt;For online delivery, though, dedicated ebook formats have one serious advantage over PDF files: PDFs are ultimately designed for printed pages, so that viewing them on a screen is like moving a window around a page that&amp;rsquo;s larger than the window. If you change the size of your window, the page doesn&amp;rsquo;t care. Reading a page with multiple columns means scrolling down and up and down and up and down all to read a single page. Change the size of a dedicated ebook reader program and it reflows the content to accommodate the window instead of making the window work around the page layout. (If you want to play with this, &lt;a href=&#34;http://www.adobe.com/products/digitaleditions/&#34;&gt;Adobe Digital Editions reader&lt;/a&gt;, &lt;a href=&#34;http://www.fbreader.org/&#34;&gt;FBreader&lt;/a&gt;, &lt;a href=&#34;http://www.mobipocket.com/en/HomePage/default.asp?Language=EN&#34;&gt;Mobipocket&lt;/a&gt;, and &lt;a href=&#34;http://www.dotreader.com/site/index.php&#34;&gt;dotReader&lt;/a&gt; are all free ebook reading programs.)&lt;/p&gt;
&lt;p&gt;So while I do like the Geek Squad ties—although I&amp;rsquo;d wear one with a long sleeve, more cotton-based shirt—I find their consumer electronics perspective on what new technology is going where to be a bit narrow. People are doing more cool things with ebooks all the time, both for free and for money, and I look forward to seeing where it goes in the next few years.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>Stopping phone spam</title>
      <link>https://www.bobdc.com/blog/stopping-phone-spam/</link>
      <pubDate>Sun, 16 Dec 2007 11:52:23 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/stopping-phone-spam/</guid>
      
      
      <description><div>In the U.S., at least.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.fcc.gov/cgb/donotcall/&#34;&gt;&lt;img src=&#34;http://www.fcc.gov/cgb/donotcall/images/donotcall-banner.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;some description&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A few weeks ago an autodialer at a &lt;a href=&#34;http://www.michaels.com&#34;&gt;crafts supply chain&lt;/a&gt; called us up and played a recording of some perky woman telling us about their fabulous new deals. I called the local store, asked for a manager, and told them that this was a bad thing to do. She was unaware of this bit of marketing. Two days ago I picked up the ringing phone and heard a recorded message from the car dealership where we once bought a car telling us about the &amp;ldquo;Holiday Express&amp;rdquo; shuttle service that would take people back and forth to the mall a half mile away while they waited for maintenance work on their cars.&lt;/p&gt;
&lt;p&gt;I called them to express my annoyance as well, and got a little worried: was this a trend? It could be far worse than email spam, because you can look through 100 spam candidates in a special folder of your email program in just a few seconds at your convenience, but when the phone rings, you may not know if it&amp;rsquo;s important or marketing nonsense, so you pick it up.&lt;/p&gt;
&lt;p&gt;As it turns out, it&amp;rsquo;s much easier to deal with phone spam than email spam because it&amp;rsquo;s much easier to regulate. The U.S. Federal Communications Commission has some &lt;a href=&#34;http://www.fcc.gov/cgb/consumerfacts/tcpa.html&#34;&gt;excellent background&lt;/a&gt; on the topic, explaining that these calls must include a phone number that you can call to tell them to remove you from their list. I learned that because we&amp;rsquo;re on the national &lt;a href=&#34;http://en.wikipedia.org/wiki/Do_not_call_list&#34;&gt;Do Not Call list&lt;/a&gt;, the crafts chain and the car dealership violated FCC regulations. The same web page links to a complaint form where you can tell the FCC about a violation, and I took plenty of satisfaction in doing so.&lt;/p&gt;
&lt;p&gt;(You&amp;rsquo;ve gotta love the picture included with the FCC&amp;rsquo;s information page, reproduced above—a few clues in it give me the impression that it&amp;rsquo;s older than the four-year-old Do Not Call Registry.)&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://war-on-error.blogspot.com&#34; title=&#34;http://war-on-error.blogspot.com&#34;&gt;Michael Friedman&lt;/a&gt; on &lt;a href=&#34;#comment-1519&#34;&gt;December 18, 2007 10:20 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Are there distinctions between cell phones and dedicated land lines for this, or is it just related to the phone number? When my wife and I moved to Dallas, we decided to drop our land line for good. Our &amp;ldquo;home&amp;rdquo; phone is our cell phone, and we&amp;rsquo;ve gotten along just fine in the last two years without a dedicated line. We rarely get calls to our cell phones, but occasionally the computer voice butts in and it&amp;rsquo;s particularly irritating because they are using up our (plentiful) minutes.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-1520&#34;&gt;December 18, 2007 10:59 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Yes, the third mention of the word &amp;lsquo;wireless&amp;rsquo; on the &lt;a href=&#34;http://www.fcc.gov/cgb/consumerfacts/tcpa.html&#34;&gt;http://www.fcc.gov/cgb/consumerfacts/tcpa.html&lt;/a&gt; page that I pointed to says that it&amp;rsquo;s against FCC regulations to send these calls to wireless phones, even if you&amp;rsquo;re not on the Do Not Call registry, or &amp;ldquo;any other service for which the person being called would be charged for the call&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.TimothyHorrigan.com/election2008.html&#34; title=&#34;http://www.TimothyHorrigan.com/election2008.html&#34;&gt;Timothy Horrigan&lt;/a&gt; on &lt;a href=&#34;#comment-1522&#34;&gt;December 20, 2007 10:32 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I have been volunteering to do phone calls for a Presidential candidate in New Hampshire. In spite of the incredible hype which has been going on for a year now (actually more than a year&amp;hellip; 2004/2008 repeat candidates Kucinich and Edwards never really stopped running) about 40% of the registered voters will not vote. Another large faction takes pride in not deciding until election day. And my guy (OK, it&amp;rsquo;s Edwards, even if I did go to college with Obama) will get maybe 35% of the vote even if everything goes right. Anyway, we have to call about a million people to find the 100,000 or so who want to talk to us.&lt;/p&gt;
&lt;p&gt;A lot of those million gripe that they are on the no-call list. However, the No Call List rule actually doesn&amp;rsquo;t apply to political campaigns (although we sure don&amp;rsquo;t want to waste time calling people who don&amp;rsquo;t want to listen to us.) It also does not apply to charities (who have some of the most annoying telemarketers of all) or to companies you have an ongoing business relationship with (e.g., that annoying national crafts store chain.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Roy Head&#39;s &#34;Treat Her Right&#34;</title>
      <link>https://www.bobdc.com/blog/roy-heads-treat-her-right/</link>
      <pubDate>Mon, 10 Dec 2007 08:42:42 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/roy-heads-treat-her-right/</guid>
      
      
      <description><div>Somersaulting off the stage—in a buttoned-up double-breasted suit.</div><div>&lt;p&gt;Texan &lt;a href=&#34;http://en.wikipedia.org/wiki/Roy_Head&#34;&gt;Roy Head&lt;/a&gt; had several small and regional hits and one really big one: &amp;ldquo;Treat Her Right&amp;rdquo;, a single that was second only to the Beatles&amp;rsquo; &amp;ldquo;Yesterday&amp;rdquo; at its peak on the U.S. charts. The chorus doesn&amp;rsquo;t come until the very end, but what a chorus, and it&amp;rsquo;s well worth the wait through the long slow burn leading up to it. The song remains an R&amp;amp;B standard for many bar bands, at least among the hipper ones.&lt;/p&gt;
&lt;p&gt;Head, who surprised more than one interviewer and concert venue by being white when he showed up, was and still is known for a wild stage show, as the &lt;a href=&#34;http://www.youtube.com/watch?v=2BLQgQNuN-M&#34;&gt;following video&lt;/a&gt; amply demonstrates.&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/2BLQgQNuN-M&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;As he described in a &lt;a href=&#34;http://www.wfmu.org/playlists/shows/24339&#34;&gt;recent interview&lt;/a&gt;, many of the moves that may look like they came from James Brown actually come from Jackie Wilson, a big influence on Brown, Head, Michael Jackson, and many others. YouTube has &lt;a href=&#34;http://www.youtube.com/watch?v=AXj8JLSVWvc&#34;&gt;another version&lt;/a&gt; of Head doing &amp;ldquo;Treat Her Right&amp;rdquo; whose slower burn and slightly different arrangement lead me to believe that it&amp;rsquo;s a live performance and not lip synching.&lt;/p&gt;
&lt;p&gt;In the same interview, he mentioned a son named Sundance who had gotten pretty far on American Idol and is still successful. I asked my wife and daughters about him, and my wife said &amp;ldquo;his parents are hippies or something&amp;rdquo;. The video above demonstrates otherwise, but naming your kid &amp;ldquo;Sundance Head&amp;rdquo; could give that impression. (It turns out that this is just a nickname, and his real first name is the more prosaic &amp;ldquo;Jason&amp;rdquo;.) My younger daughter insisted that I had seen Sundance on the show, but I don&amp;rsquo;t pay very close attention to Simon, Paula, et al. so I don&amp;rsquo;t remember.&lt;/p&gt;
&lt;p&gt;The same daughter likes very little of the music that I do and is suspicious of anything in black and white that doesn&amp;rsquo;t star Lucille Ball and Desi Arnaz, but when I showed her the video above she said &amp;ldquo;Wow, the father is &lt;em&gt;&lt;strong&gt;so&lt;/strong&gt;&lt;/em&gt; much better.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;To quote &lt;a href=&#34;http://www.hbo.com/entourage/cast/character/drama.html&#34;&gt;Johnny Drama&lt;/a&gt; from Entourage: &amp;ldquo;Victory!&amp;rdquo;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>XHTML 2 for authoring?</title>
      <link>https://www.bobdc.com/blog/xhtml-2-for-authoring/</link>
      <pubDate>Wed, 05 Dec 2007 17:50:33 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/xhtml-2-for-authoring/</guid>
      
      
      <description><div>Suddenly it all made sense.</div><div>&lt;p&gt;On Monday I gave a talk at &lt;a href=&#34;http://2007.xmlconference.org/&#34;&gt;XML 2007&lt;/a&gt; titled &lt;a href=&#34;http://2007.xmlconference.org/public/schedule/detail/302&#34;&gt;XHTML 2 for Publishers: New opportunities for storing interoperable content and metadata&lt;/a&gt;. It used a lot of material from the article &lt;a href=&#34;http://www.ibm.com/developerworks/library/x-xhtml2now.html&#34;&gt;Put XHTML 2 to work now&lt;/a&gt; that I did for developerWorks, but with a greater focus on the potential value to publishers.&lt;/p&gt;
&lt;p&gt;The session that preceded mine in the Publishing Track room was &lt;a href=&#34;http://2007.xmlconference.org/public/schedule/detail/360&#34;&gt;Where are XML authoring tools today, where are they going, and what do we want?&lt;/a&gt;, in which Marc Jacobson of Really Strategies moderated a panel discussion on XML authoring by representatives of Just Systems XMetal, Xopus, and Adobe. A major theme in that discussion was how Microsoft Word has set people&amp;rsquo;s expectations for a writing environment, and a minor theme was how people learning about XML-based authoring can be intimidated by a huge number of elements to learn about possibly using for their document.&lt;/p&gt;
&lt;p&gt;I had an idea, and discussed it with several people during the question session after my talk: how about having people author editorial content in XHTML 2? While previous versions of HTML were useful for little more than shipping pages to browsers for display, the additional structure, semantics, and metadata that you can add to an XHTML 2 document make it a more reasonable option for authoring content that may end up being used in a variety of media. There are plenty of people who have created web pages and don&amp;rsquo;t expect to use Word to do so, so they don&amp;rsquo;t have the expectation of a &lt;strong&gt;B&lt;/strong&gt; button to bold text, an &lt;em&gt;I&lt;/em&gt; button to italicize, and change bars to identify revisions. They&amp;rsquo;re already familiar with the basic elements of HTML, and the few new XHTML 2 ones they&amp;rsquo;d have to learn (for example, &lt;code&gt;h&lt;/code&gt; for headers and the &lt;code&gt;section&lt;/code&gt; element) are intuitive enough for them to pick up without much trouble.&lt;/p&gt;
&lt;p&gt;If I were going to set up such an authoring environment, I&amp;rsquo;d customize the XHTML 2 schema to impose a few more constraints, none of which should appear illogical to people who&amp;rsquo;ve created web pages before and none of which would make the documents invalid XHTML 2. For example, instead of letting authors put &lt;code&gt;h&lt;/code&gt;, &lt;code&gt;h1&lt;/code&gt;, &lt;code&gt;h2&lt;/code&gt;, and other header elements anywhere they wanted, I&amp;rsquo;d remove the &lt;code&gt;h1&lt;/code&gt; through &lt;code&gt;h6&lt;/code&gt; elements from the schema and only allow &lt;code&gt;h&lt;/code&gt; elements as the first element of &lt;code&gt;body&lt;/code&gt; and &lt;code&gt;section&lt;/code&gt; elements. (Perhaps I&amp;rsquo;d even require &lt;code&gt;h&lt;/code&gt; as the first element of a &lt;code&gt;section&lt;/code&gt; element, depending on the content being authored.)&lt;/p&gt;
&lt;p&gt;To add additional semantics, I might make XHTML 2&amp;rsquo;s new &lt;code&gt;role&lt;/code&gt; attribute required for certain elements and specify a list of allowable values that could be entered there—again, depending on the content being authored. If an application that used this content needed XML that was not XHTML 2, these values could be used as hooks to transform the content to conform to another schema.&lt;/p&gt;
&lt;p&gt;Metadata would also depend on the needs of the shop setting up the authoring environment, and because mixed content isn&amp;rsquo;t an issue for this, XForms or InfoPath forms would be a sensible way to gather this information and then insert it as RDFa in the appropriate places in the document.&lt;/p&gt;
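&lt;p&gt;As a rough sketch of what a document written under such constraints might look like (the &lt;code&gt;role&lt;/code&gt; value and the &lt;code&gt;ex:&lt;/code&gt; vocabulary below are made-up examples, not part of any standard; only the Dublin Core namespace is real):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;html xmlns="http://www.w3.org/2002/06/xhtml2/"
      xmlns:ex="http://example.org/roles#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"&amp;gt;
  &amp;lt;head&amp;gt;&amp;lt;title&amp;gt;Sample article&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;
  &amp;lt;body&amp;gt;
    &amp;lt;h&amp;gt;Sample article&amp;lt;/h&amp;gt;
    &amp;lt;section role="ex:sidebar"&amp;gt;
      &amp;lt;h&amp;gt;A required section heading&amp;lt;/h&amp;gt;
      &amp;lt;p property="dc:description"&amp;gt;Metadata gathered from a form could
      be inserted here as RDFa.&amp;lt;/p&amp;gt;
    &amp;lt;/section&amp;gt;
  &amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;role&lt;/code&gt; values would give a downstream transformation its hooks for mapping sections to whatever schema the delivery application needs.&lt;/p&gt;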
&lt;p&gt;There are cases where this wouldn&amp;rsquo;t be a good idea, but there are cases where it could be a good idea. It fits in well with the main thesis of my talk: unlike all versions of HTML being developed before (or concurrently with) XHTML 2, XHTML 2 can be useful for more than just shipping pages to browsers for display.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.peterkrantz.com&#34; title=&#34;http://www.peterkrantz.com&#34;&gt;Peter Krantz&lt;/a&gt; on &lt;a href=&#34;#comment-1460&#34;&gt;December 6, 2007 3:17 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We have found XHTML2 to be perfect for legal documents. The defined elements in XHTML2 cover all structural requirements and RDFa allows the creator to add stuff from the legal domain.&lt;/p&gt;
&lt;p&gt;On a national level we define a basic vocabulary for law makers. Each government authority can then add their own domain specific information in the same document. In the end, the XHTML2 document carries a lot of information and can be parsed from different perspectives and be converted to HTML/PDF/Whatever.&lt;/p&gt;
&lt;p&gt;I foresee that XHTML2 may be great for this type of work but I doubt that it will ever become the preferred way to create web pages.&lt;/p&gt;
&lt;p&gt;On the downside (currently) is the lack of tool support. We implemented our own in-browser editor to create legal documents (e.g. it has buttons for &amp;ldquo;legal paragraph&amp;rdquo; and other things from the domain). We have also extended MS Word.&lt;/p&gt;
&lt;p&gt;What do you use when you want to provide an editor for people who shouldn&amp;rsquo;t see the XML (i.e. WYSIWYM)?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-1461&#34;&gt;December 6, 2007 3:48 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I haven&amp;rsquo;t assembled such a system for others, and for my own work I use Emacs+nxml (see &lt;a href=&#34;http://www.snee.com/bobdc.blog/2007/04/using_xhtml_2_schemas.html&#34;&gt;http://www.snee.com/bobdc.blog/2007/04/using_xhtml_2_schemas.html&lt;/a&gt; ).&lt;/p&gt;
&lt;p&gt;After nearly five years at LexisNexis, I found it very interesting to hear that you use XHTML 2 for legal documents. What governments are using this (or working toward doing so), and is the documentation available on the public web? Has it been a problem that XHTML 2 is still in Working Draft status?&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>The cheap commodity eBook reader of the future</title>
      <link>https://www.bobdc.com/blog/the-cheap-commodity-ebook-read/</link>
      <pubDate>Wed, 28 Nov 2007 09:43:43 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-cheap-commodity-ebook-read/</guid>
      
      
      <description><div>If Kindle is the iPod of eBooks, I&#39;m waiting for the Sansa.</div><div>&lt;p&gt;My MP3 player is a &lt;a href=&#34;http://www.amazon.com/dp/B000IM9542/?tag=bobducharmeA&#34;&gt;Sandisk Sansa&lt;/a&gt;. It does everything I would want an iPod to do with a slightly clunkier interface for much less money. I can move music, podcasts, and playlists onto it from free software on both Windows and Linux machines, and it works with all standard MP3 accessories such as my noise-cancelling headphones and my in-car player. It can even record live room sound.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t think I&amp;rsquo;d even heard of Sandisk before I bought the device. I always assumed that they were based along the western edge of the Pacific Rim, but I &lt;a href=&#34;http://en.wikipedia.org/wiki/Sandisk&#34;&gt;just found out&lt;/a&gt; that they&amp;rsquo;re actually in California. No matter; the last time I needed an MP3 player, they had the best price/performance ratio in the local Best Buy, and they didn&amp;rsquo;t try to lock me into some distribution deal for the content I play on their device. (It may have included a CD and a deal with some music downloading service, but I ignored it and got along just fine.) Someday when I replace it, I might get something else, because I have no brand loyalty to it—I&amp;rsquo;m looking for good performance at a low price, not a fashion accessory.&lt;/p&gt;
&lt;blockquote class=&#34;pullquote&#34; style=&#39;width: 190px; font: bold 1.333em/1.125em &#34;Helvetica Neue&#34;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; &#39;&gt;Comparisons to the OLPC look like a red herring to me.&lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;m &lt;a href=&#34;https://www.bobdc.com/blog/ebook-hardware-readers-suddenl&#34;&gt;looking forward&lt;/a&gt; to getting an eBook reader, and the details of Amazon&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ASIN/B000FI73MA/ref=pd_sl_aw_manual-1_kindle1_40650458_1&#34;&gt;Kindle&lt;/a&gt; helped me realize what I really want in an eBook reader. It ain&amp;rsquo;t Kindle. While the term &amp;ldquo;delivery platform&amp;rdquo; has become an over-used buzz phrase, Amazon sure has created one: you pay them $400 for a device that only they can deliver content to.&lt;/p&gt;
&lt;p&gt;Some people (I&amp;rsquo;m sure to Amazon&amp;rsquo;s delight) have called it &amp;ldquo;the iPod of eBook readers&amp;rdquo;, but Apple never charged people to move MP3s from CDs that they already owned—or MP3s that they had recorded themselves of their own kids, bands or whatever—to the iPod. Once you&amp;rsquo;ve bought the Kindle, you can&amp;rsquo;t move your own content from your computer over a USB connection to your new eBook device to read it there; you have to mail it (if it&amp;rsquo;s on the approved list of formats such as Microsoft Word and JPEG) to your device via Amazon&amp;rsquo;s servers. No thanks.&lt;/p&gt;
&lt;p&gt;The key thing that Kindle has in common with its most well-known predecessor, &lt;a href=&#34;http://www.sonystyle.com/webapp/wcs/stores/servlet/ProductDisplay?catalogId=10551&amp;amp;storeId=10151&amp;amp;langId=-1&amp;amp;productId=11038811&#34;&gt;Sony&amp;rsquo;s eBook reader&lt;/a&gt;, is the use of the &lt;a href=&#34;http://www.eink.com&#34;&gt;e-ink&lt;/a&gt; high-contrast, low-power display. What I&amp;rsquo;d really like to see is some commodity electronics manufacturer crank out an inexpensive machine around an e-ink display that lets us view &lt;a href=&#34;http://www.idpf.org/&#34;&gt;.epub&lt;/a&gt; eBooks and PDF files—the Sansa of eBook readers. Along with the low overall weight of the device, this high-contrast, low-power display is the most important feature of an eBook reader, because otherwise, why not just use the computers that we already own to read electronic documents? This is why comparisons to the &lt;a href=&#34;http://en.wikipedia.org/wiki/OLPC_XO-1&#34;&gt;OLPC&lt;/a&gt; look like a red herring to me—it doesn&amp;rsquo;t use much power, and I&amp;rsquo;m all for its general goals, but if the screen isn&amp;rsquo;t any better than the Dell my employer supplied me with or the Lifebook where I keep my own stuff, I don&amp;rsquo;t see the point of buying another computer just for reading.&lt;/p&gt;
&lt;p&gt;Publishers interested in the eBook market are &lt;a href=&#34;http://dearauthor.com/wordpress/2007/11/19/no-kindle-exclusivity-for-harlequin-readers/&#34;&gt;not giving Amazon exclusive distribution rights&lt;/a&gt;, which is encouraging to those of us who want to pick a reading device and then separately pick distribution channels (or even—just imagine!—make our own content for the device without relying on a big established distributor). I also want the ability to replace the device with another one from a different manufacturer and still use all the same content that I had paid for.&lt;/p&gt;
&lt;p&gt;It sure would be interesting to see who&amp;rsquo;s been buying the Linux-based $3,000 and $4,000 &lt;a href=&#34;http://www.eink.com/kits/index.html&#34;&gt;prototype kits&lt;/a&gt; from E-ink. Their website &lt;a href=&#34;http://www.eink.com/company/index.html&#34;&gt;says that&lt;/a&gt; Seiko and Casio have been working on E-ink timepieces. I&amp;rsquo;ve never known Seiko to make anything but watches, but Casio has a &lt;a href=&#34;http://www.casio.com/products/&#34;&gt;wide range of products&lt;/a&gt;. (I still have my old &lt;a href=&#34;http://hem.passagen.se/tkolb/art/synth/cz101_e.htm&#34;&gt;CZ-101&lt;/a&gt; keyboard.) I&amp;rsquo;d love to see Casio make a $150 E-ink reader, and then the Sandisks of the world could make their $99 versions.&lt;/p&gt;
&lt;p&gt;And if any of these devices can play MP3s through a headphone out jack, I promise not to complain that it&amp;rsquo;s trying to do too many things. Reading and listening to music often go well together.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.wowio.com&#34; title=&#34;http://www.wowio.com&#34;&gt;Paula Wellings&lt;/a&gt; on &lt;a href=&#34;#comment-1452&#34;&gt;December 3, 2007 10:25 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t understand the comment &amp;ldquo;Comparisons to the OLPC look like a red herring to me.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;My understanding was that the XO Laptop had a dual display&amp;ndash;regular rez colour and high rez black and white.&lt;/p&gt;
&lt;p&gt;Please explain. Many thanks.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-1453&#34;&gt;December 3, 2007 12:45 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I hadn&amp;rsquo;t heard about the high-res black and white. I look forward to checking it out.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>Customized cookbooks</title>
      <link>https://www.bobdc.com/blog/customized-cookbooks/</link>
      <pubDate>Tue, 20 Nov 2007 18:00:18 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/customized-cookbooks/</guid>
      
      
<description><div>Pay for professional recipes, or do it the XML geek way.</div><div>&lt;p&gt;The New York Times article &lt;a href=&#34;http://www.nytimes.com/2007/11/12/technology/12tastebook.html&#34;&gt;A Cookbook of One&amp;rsquo;s Own From the Internet&lt;/a&gt; (registration required) describes how Kamran Mohsenin, the founder of a photography web site, took an interesting step beyond personalized calendars: personalized cookbooks using recipes from &lt;a href=&#34;http://www.epicurious.com/&#34;&gt;epicurious.com&lt;/a&gt;, a web site with 25,000 recipes from Gourmet and Bon Appetit magazines. (I grew up with both of these magazines around the house, because my parents were big fans.) This reminds me of a quote I just read near the end of Stephen Colbert&amp;rsquo;s hilarious book &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0446580503/bobducharmeA/&#34;&gt;I am America, and So Can You&lt;/a&gt;: &amp;ldquo;There&amp;rsquo;s a lot of repurposing of content yet to be done, believe me!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://en.wikipedia.org/wiki/Thanksgiving_dinner&#34;&gt;&lt;img src=&#34;http://upload.wikimedia.org/wikipedia/en/thumb/8/8f/New_England_Thanksgiving_Dinner.jpg/300px-New_England_Thanksgiving_Dinner.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Thanksgiving menus&#34; width=&#34;200px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Mohsenin&amp;rsquo;s &lt;a href=&#34;http://www.tastebook.com/home&#34;&gt;TasteBook.com&lt;/a&gt; web site lets you pick 100 recipes, some cover images, and your own title, and then they&amp;rsquo;ll put it in a ring binder with a color hard cover for $34.95. It looks great, but someone who enjoys playing with free XML tools could do something similar without spending a dime. There&amp;rsquo;d be a few hours of work involved, but you&amp;rsquo;d have the bonus of something to add to your résumé.&lt;/p&gt;
&lt;p&gt;In two two-part articles that I wrote for XML.com (&amp;ldquo;Getting Started with XQuery&amp;rdquo; &lt;a href=&#34;http://www.xml.com/pub/a/2005/03/02/xquery.html&#34;&gt;part 1&lt;/a&gt;, &lt;a href=&#34;http://www.xml.com/pub/a/2005/03/23/xquery-2.html&#34;&gt;part 2&lt;/a&gt;; &amp;ldquo;Scaling up with XQuery&amp;rdquo; &lt;a href=&#34;http://www.xml.com/pub/a/2006/06/14/scaling-up-with-xquery-part-1.html&#34;&gt;part 1&lt;/a&gt;, &lt;a href=&#34;http://www.xml.com/pub/a/2006/06/21/scaling-up-with-xquery-part-2.html&#34;&gt;part 2&lt;/a&gt;), for sample data I used recipes from the &lt;a href=&#34;http://dsquirrel.tripod.com/recipeml/indexrecipes2.html&#34;&gt;Squirrel RecipeML archive&lt;/a&gt;, which has 10,000 public domain recipes marked up in XML. XQuery makes it easy to retrieve all the recipes meeting certain conditions, such as those that mention a certain ingredient, so dynamic generation of customized cookbooks would be simple. Acting as an HTTP server, XQuery applications typically retrieve XML, convert it to HTML, and deliver it to a browser, but they don&amp;rsquo;t have to. You could retrieve the XML and convert it to XSL-FO (maybe with a little XSLT along the way) and create a PDF book. From there, you could use &lt;a href=&#34;https://www.bobdc.com/blog/selfpublishing-bound-hardcopy&#34;&gt;lulu&lt;/a&gt; to create the bound version, but you&amp;rsquo;d have to spend a few dollars there. An &lt;a href=&#34;http://www.idpf.org/&#34;&gt;.epub&lt;/a&gt; eBook is another option; I &lt;a href=&#34;http://www.jedisaber.com/eBooks/tutorial.asp&#34;&gt;recently learned&lt;/a&gt; how easy these are to make.&lt;/p&gt;
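&lt;p&gt;To sketch the idea (the collection name here is just a placeholder, and the element names are my recollection of RecipeML&amp;rsquo;s structure, so check them against the actual DTD), an XQuery that lists the titles of every recipe mentioning a particular ingredient could look something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for $recipe in collection(&amp;quot;recipes&amp;quot;)//recipe
where some $item in $recipe/ingredients//item
      satisfies contains(lower-case($item), &amp;quot;cranberry&amp;quot;)
return $recipe/head/title
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Feed the recipes that such a query selects to a stylesheet that generates XSL-FO and you have the makings of the customized PDF cookbook.&lt;/p&gt;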
&lt;p&gt;If you really want to turn this into something for your résumé, a &lt;a href=&#34;https://www.bobdc.com/blog/praising-dita&#34;&gt;DITA&lt;/a&gt; angle would be nice. When people believe too much of the hype around DITA and think that it will automate all of their XML-related publishing, I tell them that DITA is designed around topic-oriented content, and that it may or may not be a good fit for their content. My favorite example of content that is a good fit is cookbooks: of the three basic topic types in the DITA architecture, &amp;ldquo;task&amp;rdquo; has a structure that fits around the {title, description, ingredient list, assembly step list, conclusion} structure of a typical recipe with just a few renames. Once you automate the conversion of RecipeML files into DITA-compliant recipes, the DITA Open Toolkit can turn them into HTML, PDF, RTF, troff, and several other formats.&lt;/p&gt;
&lt;p&gt;To be honest, as I noted in one of the articles mentioned above, some RecipeML recipes will require a bit of manual cleanup before you can feed them to your application, because volunteer XML data entry isn&amp;rsquo;t always as well-formed as you&amp;rsquo;d like. I did it for 291 of these recipes for &lt;a href=&#34;http://www.xml.com/pub/a/2006/06/14/scaling-up-with-xquery-part-1.html&#34;&gt;part 1 of the &amp;ldquo;Scaling Up&amp;rdquo;&lt;/a&gt; article, so I know that it&amp;rsquo;s not too much work. Perhaps splitting up the cleanup and the coding would be a nice project for an XML-related class. On the last day of class, everyone could cook a recipe from the collection. (Can you tell that I&amp;rsquo;m an American bearing down on the fourth Thursday of November?)&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.whatsyourmedstyle.com/medstyle/demo.aspx?cat=b&#34; title=&#34;http://www.whatsyourmedstyle.com/medstyle/demo.aspx?cat=b&#34;&gt;Jayne&lt;/a&gt; on &lt;a href=&#34;#comment-1414&#34;&gt;November 20, 2007 8:03 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I for one am a big cookbook fan! This sounds really good, I&amp;rsquo;ll have to look into it for sure&amp;hellip; My new favorite is Rocco DiSpirito&amp;rsquo;s new book Real Life Recipes. It is full of usefull everyday quick and easy foods that are actually a joy to make.. Along with his usual helpfull tips and suggestions&amp;hellip; Rocco is on a roll right now with easy cooking ideas, he&amp;rsquo;s also doing these great video blogs for Bertolli&amp;rsquo;s Mediterranean style frozen dinners. You can check them out at &lt;a href=&#34;http://www.whatsyourmedstyle.com/medstyle/demo.aspx?cat=b&#34;&gt;http://www.whatsyourmedstyle.com/medstyle/demo.aspx?cat=b&lt;/a&gt;&lt;br /&gt;
They are really light, and quick in prepare time.. 10 mins and dinner is on the table&amp;hellip; I work with them so I got the inside info, but I gotta pass it along, these are too good to not talk about!!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/dita">DITA</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>The Nazz: &#34;Open My Eyes&#34;</title>
      <link>https://www.bobdc.com/blog/the-nazz-open-my-eyes/</link>
      <pubDate>Thu, 15 Nov 2007 09:16:46 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-nazz-open-my-eyes/</guid>
      
      
      <description><div>Vintage rockin&#39; psychedelia.</div><div>&lt;p&gt;When I was a teenager, Todd Rundgren had his one big hit, &amp;ldquo;Hello It&amp;rsquo;s Me&amp;rdquo;. Combining pop, jazz, and so-called &amp;ldquo;progressive rock&amp;rdquo; usually results in a self-indulgent mess instead of good pop music, but he hit the balance right for this one.&lt;/p&gt;
&lt;p&gt;I have the 45 RPM single of his first version of the song, a recording he did with his Philadelphia psychedelic sixties band &lt;a href=&#34;https://en.wikipedia.org/wiki/Nazz&#34;&gt;The Nazz&lt;/a&gt; in 1968. The B side, also available on &lt;a href=&#34;https://en.wikipedia.org/wiki/Nuggets:_Original_Artyfacts_from_the_First_Psychedelic_Era,_1965%E2%80%931968&#34;&gt;Nuggets&lt;/a&gt;, is &amp;ldquo;Open My Eyes&amp;rdquo;. After starting with the electric piano equivalent of the Kinks &amp;ldquo;You Really Got Me&amp;rdquo;, The Move&amp;rsquo;s &amp;ldquo;Do Ya&amp;rdquo;, and especially The Who&amp;rsquo;s &amp;ldquo;Can&amp;rsquo;t Explain&amp;rdquo;, the guitars kick in, and along with its own artsy touches, this thing really rocks. The lyrics seem to be about falling in love while tripping, a popular theme of the time (hell, even the squeaky-clean &lt;a href=&#34;http://www.youtube.com/watch?v=dYUQ131RYEU&#34;&gt;Cowsills did it&lt;/a&gt;). And there&amp;rsquo;s a &lt;a href=&#34;http://www.youtube.com/watch?v=llJG6pHhuBY&#34;&gt;YouTube video&lt;/a&gt; of the Nazz lip syncing it!&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/llJG6pHhuBY&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;You want flower power? How about a drummer using flowers instead of sticks?&lt;/p&gt;
&lt;p&gt;(Now that I&amp;rsquo;ve shown you this and the &lt;a href=&#34;https://www.bobdc.com/blog/the-13th-floor-elevators&#34;&gt;13th Floor Elevators&lt;/a&gt;, I promise not to search YouTube for every Nuggets band and then blog the results. This is left as an exercise for the reader, and there would be far worse ways to spend your time.) I&amp;rsquo;d be more curious to see Rundgren&amp;rsquo;s &lt;a href=&#34;http://www.rollingstone.com/news/story/9480060/the_cars_reform_with_rundgren&#34;&gt;current project&lt;/a&gt; of fronting the partially reunited &lt;a href=&#34;https://en.wikipedia.org/wiki/The_Cars&#34;&gt;Cars&lt;/a&gt; if Rundgren and his former bandmates didn&amp;rsquo;t outnumber the actual former Cars in the group.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Querying DBpedia</title>
      <link>https://www.bobdc.com/blog/querying-dbpedia/</link>
      <pubDate>Fri, 09 Nov 2007 09:53:38 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/querying-dbpedia/</guid>
      
      
      <description><div>And looking forward to more.</div><div>&lt;p&gt;DBpedia, as its &lt;a href=&#34;http://dbpedia.org/&#34;&gt;home page&lt;/a&gt; tells us, &amp;ldquo;is a community effort to extract structured information from Wikipedia and to make this information available on the Web.&amp;rdquo; That&amp;rsquo;s &amp;ldquo;available&amp;rdquo; in the sense of available as data to programs that read and process it, because the data was already available to eyeballs on Wikipedia. This availability is a big deal to the semantic web community because it&amp;rsquo;s a huge amount of valuable (and often, fun) information that the public can now query with &lt;a href=&#34;http://www.w3.org/TR/rdf-sparql-query/&#34;&gt;SPARQL&lt;/a&gt;, the W3C standard query language that is one of the pillars of the semantic web.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://dbpedia.org/page/Chalkboard_gag&#34;&gt;&lt;img src=&#34;http://www.simpsoncrazy.com/product_pics/simpsons-hd-blackboard.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Bart at blackboard&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Although I&amp;rsquo;d &lt;a href=&#34;http://www.snee.com/xml/xml2006/owlrdbms.html&#34;&gt;dabbled in SPARQL&lt;/a&gt; and seen several &lt;a href=&#34;http://wiki.dbpedia.org/Datasets?v=lbf#h18-7&#34;&gt;sample SPARQL queries against DBpedia&lt;/a&gt; in action, I had a little trouble working out how to create my own SPARQL queries against DBpedia data. I finally managed to do it, so I thought I&amp;rsquo;d describe here how I successfully implemented my first use case. Instead of a &amp;ldquo;Hello World&amp;rdquo; example, I went with more of an &amp;ldquo;I will not publish the principal&amp;rsquo;s credit report&amp;rdquo; example: a list of things written by Bart on the school blackboard at the beginning of a collection of Simpsons episodes.&lt;/p&gt;
&lt;p&gt;For an example of the structured information available in Wikipedia, see the &lt;a href=&#34;http://wiki.dbpedia.org/Datasets?v=lbf#h18-8&#34;&gt;Infobox data&lt;/a&gt; on the right of the Wikipedia page for the 2001 Simpsons episode &lt;a href=&#34;http://en.wikipedia.org/wiki/Tennis_the_Menace&#34;&gt;Tennis the Menace&lt;/a&gt; and the Categories links at the bottom of the same page. The &lt;a href=&#34;http://dbpedia.org/page/Tennis_the_Menace&#34;&gt;DBpedia page&lt;/a&gt; for that episode shows the Infobox information with the property names that you would use in SPARQL queries; semantic web fans will recognize some of the property and namespace prefixes. I&amp;rsquo;m going to repeat this because it took a while for it to sink into my own head, and once it did it made everything much easier: most Wikipedia pages with fielded information have corresponding DBpedia pages, and those corresponding pages are where you find the names of the &amp;ldquo;fields&amp;rdquo; that you&amp;rsquo;ll use in your queries.&lt;/p&gt;
&lt;p&gt;Once I knew the following three things, I could create the SPARQL query:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The Simpson episode Wikipedia pages are the &lt;a href=&#34;http://wiki.dbpedia.org/Datasets?v=lbf#h18-4&#34;&gt;identified &amp;ldquo;things&amp;rdquo;&lt;/a&gt; that we would consider as the subjects of our RDF triples (or, put another way, as the objects in the {object, attribute name, attribute value} triplets that contain our data).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The bottom of the Wikipedia page for the &amp;ldquo;Tennis the Menace&amp;rdquo; episode tells us that it is a member of the Wikipedia category &amp;ldquo;The Simpsons (season 12) episodes&amp;rdquo;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The episode&amp;rsquo;s DBpedia page tells us that &lt;code&gt;http://dbpedia.org/property/blackboard&lt;/code&gt; is the property URI for the Wikipedia infobox &amp;ldquo;Chalkboard&amp;rdquo; field.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Knowing this, I created the following SPARQL query to list everything that Bart wrote on the blackboard in season 14 (revised 4/21/11 to account for changes in DBpedia vocabularies):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SELECT * WHERE {
  ?episode &amp;lt;http://purl.org/dc/terms/subject&amp;gt;
   &amp;lt;http://dbpedia.org/resource/Category:The_Simpsons_%28season_14%29_episodes&amp;gt; .
  ?episode dbpedia2:blackboard ?chalkboard_gag .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can paste this into the form in &lt;a href=&#34;http://dbpedia.org/snorql/&#34;&gt;DBpedia&amp;rsquo;s SNORQL interface&lt;/a&gt; (which adds the namespace declarations that I&amp;rsquo;ve omitted above) to run it, or you can just click &lt;a href=&#34;http://dbpedia.org/snorql/?query=SELECT+*+WHERE+%7B++%3Fepisode+%3Chttp://purl.org/dc/terms/subject%3E+++%3Chttp://dbpedia.org/resource/Category:The_Simpsons_%2528season_14%2529_episodes%3E+.++%3Fepisode+dbpedia2:blackboard+%3Fchalkboard_gag+.%7D&#34;&gt;here&lt;/a&gt; to execute the URL version of the query as created by the SNORQL interface.&lt;/p&gt;
&lt;p&gt;Of course this is just scratching the surface. More extensive use of SPARQL features (see Leigh Dodds&amp;rsquo; excellent &lt;a href=&#34;http://www.xml.com/pub/a/2005/11/16/introducing-sparql-querying-semantic-web-tutorial.html&#34;&gt;tutorial on XML.com)&lt;/a&gt; and more study of the available classes of &lt;a href=&#34;http://wiki.dbpedia.org/Datasets?v=lbf&#34;&gt;DBpedia data&lt;/a&gt; will turn up huge new possibilities. It&amp;rsquo;s like having a new toy to play with, and I know I&amp;rsquo;m going to have fun.&lt;/p&gt;
&lt;p&gt;For my next step, I was hoping to list what Bart wrote in all the episodes, not just a single season. The bottom of the &lt;a href=&#34;http://en.wikipedia.org/wiki/Category:The_Simpsons_episodes%2C_season_12&#34;&gt;Wikipedia page for season 12&lt;/a&gt; tells us that this category is part of the category &lt;a href=&#34;http://en.wikipedia.org/wiki/Category:The_Simpsons_episodes&#34;&gt;The Simpsons episodes&lt;/a&gt;, but I haven&amp;rsquo;t found a variation on the query above that makes the connection. I know that getting a handle on this category/subcategory distinction is going to open up what I can do with SPARQL and DBpedia in a lot more ways than just listing everything that Bart ever wrote on that blackboard.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://lespetitescases.net&#34; title=&#34;http://lespetitescases.net&#34;&gt;Got&lt;/a&gt; on &lt;a href=&#34;#comment-1392&#34;&gt;November 11, 2007 12:41 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nice funny example of using Dbpedia.&lt;/p&gt;
&lt;p&gt;I developed some examples of using DBpedia. It&amp;rsquo;s in French, but these examples may interest you: &lt;a href=&#34;http://www.lespetitescases.net/dbpedia/&#34;&gt;http://www.lespetitescases.net/dbpedia/&lt;/a&gt;, e.g.:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a mashup with Google Maps (&lt;a href=&#34;http://www.lespetitescases.net/dbpedia/dbpedia-googlemaps.php?category=Category:Capitals_in_Europe&#34;&gt;http://www.lespetitescases.net/dbpedia/dbpedia-googlemaps.php?category=Category:Capitals_in_Europe&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;a list of people by birth city, using Exhibit (&lt;a href=&#34;http://www.lespetitescases.net/dbpedia/dbpedia-exhibit.php?place=http://dbpedia.org/resource/Amsterdam&#34;&gt;http://www.lespetitescases.net/dbpedia/dbpedia-exhibit.php?place=http://dbpedia.org/resource/Amsterdam&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;rsquo;m working on an English/French version.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://vaclav.synacek.com/&#34; title=&#34;http://vaclav.synacek.com/&#34;&gt;Vaclav Synacek&lt;/a&gt; on &lt;a href=&#34;#comment-1393&#34;&gt;November 11, 2007 3:31 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This works for all the seasons:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  SELECT ?season, ?episode,?chalkboard_gag WHERE {
  ?episode skos:subject ?season .
  ?season rdfs:label ?season_title .
  ?episode dbpedia2:blackboard ?chalkboard_gag .
  FILTER (regex(?season_title, &amp;quot;The Simpsons episodes, season&amp;quot;)) .
  }
  ORDER BY ?season
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It is not a perfect query: there are some extra lines in the result, it&amp;rsquo;s not perfectly ordered, and relying on regex is not very semantically clean. But it works.&lt;/p&gt;
&lt;p&gt;Thanks for showing me a new toy.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc&#34; title=&#34;http://www.snee.com/bobdc&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-1394&#34;&gt;November 11, 2007 11:21 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Vaclav! It&amp;rsquo;s the funniest list of anything that I&amp;rsquo;ve read in a while. I was going to list some of my favorites here, but there are just too many. And, I&amp;rsquo;ve learned a little more about SPARQL and DBpedia.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/dbpedia">DBpedia</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Computing with our phones</title>
      <link>https://www.bobdc.com/blog/computing-with-our-phones/</link>
      <pubDate>Wed, 07 Nov 2007 19:52:43 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/computing-with-our-phones/</guid>
      
      
      <description><div>I want to believe...</div><div>&lt;p&gt;Soon we&amp;rsquo;ll do most of our computing on our phones (see AP article &lt;a href=&#34;http://biz.yahoo.com/ap/071105/japan_bye_bye_pcs.html&#34;&gt;PCs Losing Their Relevance in Japan&lt;/a&gt;, as noted on &lt;a href=&#34;http://radar.oreilly.com/archives/2007/11/japan_leading_t.html&#34;&gt;O&amp;rsquo;Reilly Radar&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;But not too soon, as noted in a recent &lt;a href=&#34;http://publishing2.com/2007/09/26/five-reasons-why-the-mobile-web-sucks/&#34;&gt;Publishing 2.0&lt;/a&gt; posting. It&amp;rsquo;s nice to see this dose of reality among all the hype.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1386&#34;&gt;November 7, 2007 9:16 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I suspect that moving computing to phones won&amp;rsquo;t last. As the mobile generation ages they&amp;rsquo;ll need bigger screens and bigger keyboards and pointing devices that don&amp;rsquo;t require such precision. And then even though your computer is your phone, it&amp;rsquo;ll still spend 9 to 5 docked to something very much like a laptop dock today, only smaller.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.dur.ac.uk/j.r.c.geldart/&#34; title=&#34;http://www.dur.ac.uk/j.r.c.geldart/&#34;&gt;Joe Geldart&lt;/a&gt; on &lt;a href=&#34;#comment-1387&#34;&gt;November 8, 2007 1:22 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;But hopefully the idea of device-dependent &lt;em&gt;access&lt;/em&gt; will survive that.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.dur.ac.uk/j.r.c.geldart/&#34; title=&#34;http://www.dur.ac.uk/j.r.c.geldart/&#34;&gt;Joe Geldart&lt;/a&gt; on &lt;a href=&#34;#comment-1388&#34;&gt;November 8, 2007 1:28 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Uh, I meant to say &amp;lsquo;independent&amp;rsquo; of course (self-confessed ubicomp supporter).&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/technology-future">technology, future</category>
      
    </item>
    
    <item>
      <title>A nice free XML editor</title>
      <link>https://www.bobdc.com/blog/a-nice-free-xml-editor/</link>
      <pubDate>Sun, 04 Nov 2007 09:56:08 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-nice-free-xml-editor/</guid>
      
      
      <description><div>(Of course, I&#39;ll continue to use Emacs in nxml mode.)</div><div>&lt;p&gt;I always thought that free XML editors (and some commercial ones) were limited to two display modes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A tag view, which essentially displays your document as-is, but with color-coding of tags and other kinds of markup. When XML first became popular, the more powerful programmer&amp;rsquo;s editors added an XML mode that did this to their list of modes for various programming languages. A proper XML editor also enforces dynamic validation, so that when you enter the command to insert an element the editor only offers you a choice of valid elements for the cursor position.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A tree view, which is easy to create with typical GUI widgets, and can be handy for transactional XML data (a.k.a. &lt;a href=&#34;http://www.snee.com/xml/xml2004paper.html&#34;&gt;data-oriented&lt;/a&gt; XML) but is lame for narrative (a.k.a. &amp;ldquo;document-oriented&amp;rdquo;) XML—who wants to see their paragraph shown as a node of a tree with text node children that have an &lt;code&gt;emphasis&lt;/code&gt; element as one of their siblings?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/xmlmind.jpg&#34; width=&#34;300px&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;[XMLMind screen shot of this blog posting]&#34;/&gt;
&lt;p&gt;A word processor-like view, which distinguishes between block and inline elements and displays each according to some hopefully straightforward specification, seemed like the province of more expensive editors such as Arbortext and XMetaL, but I recently discovered Pixware&amp;rsquo;s &lt;a href=&#34;http://www.xmlmind.com/&#34;&gt;XMLmind&lt;/a&gt;. Their free &amp;ldquo;&lt;a href=&#34;http://www.xmlmind.com/xmleditor/persoedition.html&#34;&gt;Personal Edition&lt;/a&gt;&amp;rdquo; is &lt;a href=&#34;http://www.xmlmind.com/xmleditor/download.shtml&#34;&gt;available&lt;/a&gt; for both Windows machines and Macs. It uses CSS stylesheets to style the documents that you create as you edit them. Pixware offers a &lt;a href=&#34;http://www.xmlmind.com/xmleditor/proedition.html&#34;&gt;Professional Edition&lt;/a&gt; for $250 that has more features such as on-the-fly spell checking, but the free one still has a spell checker (with a wide choice of dictionaries, because it&amp;rsquo;s from a French company), and comes with CSS stylesheets for several popular document types.&lt;/p&gt;
&lt;p&gt;It worked just fine with my own HTML CSS stylesheet that I use for writing things like this. I might even suggest that someone uninterested in XML but who wants to learn about web page creation use it with the XHTML DTD and CSS. You can&amp;rsquo;t beat the Personal Edition&amp;rsquo;s price.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll &lt;a href=&#34;https://www.bobdc.com/blog/emacs-good-and-how-to-create-a&#34;&gt;continue to use Emacs&lt;/a&gt;, because I do want to see the markup, and I have too much affection for &lt;a href=&#34;http://www.thaiopensource.com/nxml-mode/&#34;&gt;nxml mode&lt;/a&gt;, all the other off-the-shelf Emacs features, and the ones that I&amp;rsquo;ve developed myself. To proofread, I add an &lt;a href=&#34;http://www.w3.org/1999/06/REC-xml-stylesheet-19990629/&#34;&gt;xml-stylesheet&lt;/a&gt; processing instruction and then view the document in Firefox. For people who are interested (or have friends or clients interested) in interactive editing of styled XML documents without spending any money, though, XML Mind is worth a good look.&lt;/p&gt;
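&lt;p&gt;For anyone who hasn&amp;rsquo;t tried that proofreading trick: it takes just one processing instruction near the top of the document, before the root element (the stylesheet filename here is made up):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;?xml-stylesheet type=&amp;quot;text/css&amp;quot; href=&amp;quot;docstyles.css&amp;quot;?&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Open the document in Firefox and it renders with the CSS applied.&lt;/p&gt;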
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By David Holden on &lt;a href=&#34;#comment-1381&#34;&gt;November 4, 2007 6:31 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hello Bob,&lt;/p&gt;
&lt;p&gt;you may also be interested in&lt;/p&gt;
&lt;p&gt;epcedit &lt;a href=&#34;http://www.epcedit.com/&#34;&gt;http://www.epcedit.com/&lt;/a&gt;,&lt;/p&gt;
&lt;p&gt;a now free SGML/XML editor. It can also be scripted using tcl.&lt;/p&gt;
&lt;p&gt;Dave.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1382&#34;&gt;November 4, 2007 10:23 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m impressed that it can edit SGML, but the website doesn&amp;rsquo;t show many screen shots, and the main one only shows a tree view and a tag view. Is there a formatted view?&lt;/p&gt;
&lt;p&gt;By David Holden on &lt;a href=&#34;#comment-1384&#34;&gt;November 5, 2007 6:45 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think it uses the SP libraries for its SGML capability; either way, I&amp;rsquo;ve had this editor correctly treat SGML documents where the Arbortext editor failed.&lt;/p&gt;
&lt;p&gt;There is a formatted view; you can style using its own XPath/CSS-like language, which can be saved as templates, and view with or without tags, though I would not say it allows the most sophisticated styling. It does have a CALS table editor.&lt;/p&gt;
&lt;p&gt;On a Linux platform it seems to lack antialiased fonts, although I think this is due to its Tk underpinnings.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Praising DITA</title>
      <link>https://www.bobdc.com/blog/praising-dita/</link>
      <pubDate>Thu, 25 Oct 2007 21:00:26 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/praising-dita/</guid>
      
      
      <description><div>For the wrong and right reasons.</div><div>&lt;p&gt;There are many reasons to like the &lt;a href=&#34;http://dita.xml.org/&#34;&gt;Darwin Information Typing Architecture&lt;/a&gt;, but much of the praise for it lately seems a bit misguided. For a lot of XML products and services companies, DITA is the new bottle in which to put their old wine. They talk about how DITA is great because it lets you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;write content once and then automate its use in multiple media (streamlining the publishing process, etc. etc.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;mix and match blocks of content to create new products on the fly&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;reduce dependency on proprietary tools&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;select subsets of content based on metadata in attributes&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are all great things, but XML technology had them before DITA came along. Take a look at the boldface bullet points in the SOA World article &lt;a href=&#34;http://xml.sys-con.com/read/434465.htm&#34;&gt;Improving Customer&amp;rsquo;s SOA Experience with DITA&lt;/a&gt; (a product of the ever-opportunistic SYS-CON folk—let&amp;rsquo;s give them extra points for working two trendy acronyms into the same article title, but take one off for bad punctuation): every one applies to pre-DITA XML, and even to SGML. (Perhaps &amp;ldquo;Translation efficiency and acceleration&amp;rdquo; wouldn&amp;rsquo;t be as easy in SGML—people forget how much easier XML&amp;rsquo;s Unicode base made a lot of things.)&lt;/p&gt;
&lt;h2 id=&#34;rkFUet1SQ42vasxOCLxg5g&#34;&gt;Pre-DITA schema customization&lt;/h2&gt;
&lt;p&gt;While the idea of customizing DTDs isn&amp;rsquo;t new with DITA, to me DITA&amp;rsquo;s greatest contribution is the new possibilities it offers for DTD/schema customization.&lt;/p&gt;
&lt;p&gt;Most good XML and SGML schemas were customizable before. At the XML 2005 conference I did a presentation titled &lt;a href=&#34;http://www.idealliance.org/proceedings/xml05/abstracts/paper30.HTML&#34;&gt;Your schema and the industry-standard schema&lt;/a&gt; on how to evaluate the customizability of a standard schema as you consider adopting it. That paper goes into more detail on the syntax for the hooks typically used to allow customization, but to summarize, there are two basic techniques that can be mixed and matched. First, instead of defining an element&amp;rsquo;s content like this,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!ELEMENT article (title,(paragraph|picture)+)&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;you can define it using a parameter entity and reference the parameter entity:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!ENTITY % article-content &amp;quot;title,(paragraph|picture)+&amp;quot;&amp;gt;
&amp;lt;!ELEMENT article (%article-content;)&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote class=&#34;pullquote&#34; style=&#39;width: 190px; font: bold 1.333em/1.125em &#34;Helvetica Neue&#34;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; &#39;&gt;DITA&#39;s greatest contribution is the new possibilities it offers for DTD/schema customization.&lt;/blockquote&gt;
&lt;p&gt;Your customized version can reference the DTD with these declarations and then redeclare the &lt;code&gt;article-content&lt;/code&gt; parameter entity to have any content model you like.&lt;/p&gt;
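&lt;p&gt;For example, a customizing DTD might look like this (the base DTD&amp;rsquo;s filename is made up). Because an XML parser uses the first declaration that it sees for any given entity, the customization&amp;rsquo;s version of &lt;code&gt;article-content&lt;/code&gt; overrides the one in the base DTD, as long as the customization also declares any new elements that it adds:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!ENTITY % article-content &amp;quot;title,abstract?,(paragraph|picture)+&amp;quot;&amp;gt;
&amp;lt;!ELEMENT abstract (#PCDATA)&amp;gt;
&amp;lt;!ENTITY % base-dtd SYSTEM &amp;quot;article.dtd&amp;quot;&amp;gt;
%base-dtd;
&lt;/code&gt;&lt;/pre&gt;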
&lt;p&gt;If a DTD doesn&amp;rsquo;t want to allow a complete replacement of its &lt;code&gt;article&lt;/code&gt; content model and wants to instead just provide a hook to add new things, it can use the second technique for customization, which looks more like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!ENTITY % article.content.cust &amp;quot;&amp;quot;&amp;gt;
&amp;lt;!ELEMENT article (title,(paragraph|picture %article.content.cust;)+)&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This still requires that an &lt;code&gt;article&lt;/code&gt; begin with a &lt;code&gt;title&lt;/code&gt; and allows &lt;code&gt;paragraph&lt;/code&gt; and &lt;code&gt;picture&lt;/code&gt; elements in the mix of what comes after that, but it lets you redefine the customization parameter entity &lt;code&gt;article.content.cust&lt;/code&gt; from its default value as an empty string to something like &amp;ldquo;&lt;code&gt;| warning&lt;/code&gt;&amp;rdquo;. Then, as long as your customization declares a &lt;code&gt;warning&lt;/code&gt; element somewhere, that element can be part of the mix of what follows the &lt;code&gt;article&lt;/code&gt; element&amp;rsquo;s &lt;code&gt;title&lt;/code&gt;.&lt;/p&gt;
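&lt;p&gt;Spelled out, such a customization might look like this (again, the base DTD&amp;rsquo;s filename is made up):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!ENTITY % article.content.cust &amp;quot;| warning&amp;quot;&amp;gt;
&amp;lt;!ELEMENT warning (#PCDATA)&amp;gt;
&amp;lt;!ENTITY % base-dtd SYSTEM &amp;quot;article.dtd&amp;quot;&amp;gt;
%base-dtd;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The effective content model for &lt;code&gt;article&lt;/code&gt; then becomes &lt;code&gt;(title,(paragraph|picture | warning)+)&lt;/code&gt;.&lt;/p&gt;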
&lt;p&gt;As I described in the paper accompanying the XML 2005 presentation, well-behaved DTDs such as DocBook and &lt;a href=&#34;http://www.tei-c.org/P4X/MD.html&#34;&gt;TEI&lt;/a&gt; provide many such opportunities for customization; badly-behaved DTDs like NITF don&amp;rsquo;t.&lt;/p&gt;
&lt;h2 id=&#34;jXXTM7wcTu-rxoVAiRftwA&#34;&gt;DITA&amp;rsquo;s new approach to DTD customization&lt;/h2&gt;
&lt;p&gt;DITA defines a &lt;code&gt;topic&lt;/code&gt; element, with a structure suitable for technical documentation, and a &lt;code&gt;map&lt;/code&gt; element for assembling topics into a sequence, hierarchy, or whatever you need for your output media. DITA also declares three specialized versions of &lt;code&gt;topic&lt;/code&gt; called &lt;code&gt;concept&lt;/code&gt;, &lt;code&gt;task&lt;/code&gt;, and &lt;code&gt;reference&lt;/code&gt;, and a mechanism for creating your own specializations of &lt;code&gt;topic&lt;/code&gt;, &lt;code&gt;task&lt;/code&gt;, &lt;code&gt;reference&lt;/code&gt;, &lt;code&gt;concept&lt;/code&gt;, &lt;code&gt;map&lt;/code&gt;, and other elements.&lt;/p&gt;
&lt;p&gt;This specialization mechanism is DITA&amp;rsquo;s real contribution to the XML world, because it offers new levels of customizability—not new levels above and beyond the old levels, but new levels in between &amp;ldquo;just use the existing content models&amp;rdquo; and &amp;ldquo;rewrite the content models&amp;rdquo; that were the choices before. This is great news for those who found the other two choices a little too far apart. (Before I go further, Norm Walsh &lt;a href=&#34;http://norman.walsh.name/2005/10/21/dita#specialization&#34;&gt;has shown&lt;/a&gt; that all of these ideas can be implemented in DocBook, but by limiting DITA&amp;rsquo;s domain to be a little narrower than that of DocBook, its developers seem to have created something that appears less complex and therefore easier to use for people in that domain: topic-oriented technical documentation.)&lt;/p&gt;
&lt;p&gt;I won&amp;rsquo;t go into the mechanics of DITA specialization here, but their key advantage is that a processor for a base version of an element can process a specialization that it didn&amp;rsquo;t know existed. For example, if you have a &lt;code&gt;recipe&lt;/code&gt; element based on DITA&amp;rsquo;s &lt;code&gt;task&lt;/code&gt; element, a DITA-compliant XSLT stylesheet designed to create HTML versions of &lt;code&gt;task&lt;/code&gt; elements can do the same with &lt;code&gt;recipe&lt;/code&gt; elements, even if the stylesheet was written before anyone had the idea to create this &lt;code&gt;recipe&lt;/code&gt; specialization. It&amp;rsquo;s similar to the object-oriented technique of treating objects of derived classes as objects of the base classes, although the DITA analogy with OO development can get &lt;a href=&#34;http://www-128.ibm.com/developerworks/xml/library/x-dita1/#h4&#34;&gt;pushed too far&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This feature of DITA has its roots in a noble SGML concept known as &lt;a href=&#34;http://xml.coverpages.org/archForms.html&#34;&gt;architectural forms&lt;/a&gt; that never made much progress. The &amp;ldquo;conversion to HTML&amp;rdquo; part of my example above demonstrates why the DITA version of this concept got so much more traction than the original architectural forms idea: there&amp;rsquo;s a working implementation that can convert your DITA documents, regardless of their level of specialization, into HTML, PDF, Java help, troff, RTF, and more—the &lt;a href=&#34;http://dita-ot.sourceforge.net/&#34;&gt;DITA Open Toolkit&lt;/a&gt;. People are using it to get useful work done now.&lt;/p&gt;
&lt;p&gt;Watch out for products whose DITA support consists of bundling the DITA DTDs and the open-source DITA Open Toolkit with their product and then documenting this &amp;ldquo;support&amp;rdquo; by rehashing &lt;a href=&#34;http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita&#34;&gt;OASIS&lt;/a&gt; documents telling you what DITA is. Proper support of DITA means proper support of DITA specialization. For example, if you take your editor from a document using a DITA DTD to a document using a specialization of that DTD (for example, a &lt;code&gt;recipe&lt;/code&gt; document), you want to see all the DITA features still available in the editor to edit that second document. Eliot Kimber recently wrote about some related issues in &lt;a href=&#34;http://drmacros-xml-rants.blogspot.com/2007/10/automatic-handling-of-dita-docs-in-xml.html&#34;&gt;Automatic Handling of DITA Docs In XML Editors&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As much as I like this new approach to DTD customization (apparently, it works for W3C Schemas as well, but most support is a little behind there), it seems that a surprisingly small number of DITA users are actually customizing the base DTD. I guess DITA&amp;rsquo;s appeal for them is the straightforward recipe (no pun intended) for a topic-oriented organization of technical material and the free toolkit for converting this content to all the other formats, which, being open source, is easy to integrate into larger applications. (Don&amp;rsquo;t get too excited by the existence of these transformations—what I&amp;rsquo;ve seen of the output isn&amp;rsquo;t very slick looking, and would need some work before a professional publisher would want their content delivered to their customers looking like that.)&lt;/p&gt;
&lt;p&gt;Regardless of the reasons for its appeal, DITA is hot. Above I mentioned that many XML products and services companies are trumpeting their comfort level with DITA, and I do work for a company that offers XML services and has a lot of &lt;a href=&#34;http://www.innodata-isogen.com/knowledge_center/dita?dref=website&#34;&gt;DITA-related promotional material&lt;/a&gt; on their website. I had no hand in writing this material, but I do like its realistic tone of &amp;ldquo;DITA can be a big help, but it&amp;rsquo;s not a magic bullet, here are some important things to think about&amp;rdquo;, and I&amp;rsquo;ve recommended to new co-workers who want to learn about DITA that they read those web pages. The free and commercial software and acceptance by the tech writing community of this relatively new XML standard have given it good traction and a bright future.&lt;/p&gt;
&lt;h2 id=&#34;11-comments&#34;&gt;11 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://dita.xml.org/blog/25&#34; title=&#34;http://dita.xml.org/blog/25&#34;&gt;Michael Priestley&lt;/a&gt; on &lt;a href=&#34;#comment-1350&#34;&gt;October 26, 2007 3:54 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Excellent summary. I&amp;rsquo;ll add that I do think DITA&amp;rsquo;s relatively strict definition of topic, which in turn enables the use of DITA maps across a diverse range of content, is one of the reasons that DITA has been able to deliver scalable single-sourcing - something that has been traditionally promised by XML, but hard to achieve without a lot of planning and discipline (which DITA doesn&amp;rsquo;t replace, but does provide a headstart with).&lt;/p&gt;
&lt;p&gt;On the specialization front, anecdotally I&amp;rsquo;ve seen about half of DITA&amp;rsquo;s adopters using specialization, which sounds like more than you&amp;rsquo;ve seen. A typical adoption curve is to use DITA out-of-the-box for a year or two, then add some small-scale specializations, then begin building out additional specializations as you hit particular needs. In other words, you start simple, then evolve the architecture along with your understanding, rather than trying to do everything at once in the first adoption phase.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1351&#34;&gt;October 26, 2007 4:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Michael! I like the idea of the headstart that doesn&amp;rsquo;t replace planning and discipline. This could be a recurring theme when discussing a variety of aspects of DITA, ranging from the structure of maps to the transformations available in the DITA Open Toolkit.&lt;/p&gt;
&lt;p&gt;By Ramon M. Felciano on &lt;a href=&#34;#comment-1359&#34;&gt;October 27, 2007 6:04 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Can you expand a bit on this comment about DITA relative to Docbook: &amp;ldquo;by limiting DITA&amp;rsquo;s domain to be a little narrower than that of DocBook, its developers seem to have created something that appears less complex and therefore easier to use for people in that domain: topic-oriented technical documentation.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;It was my impression that DocBook was actually more limited in the sense that is designed for a single technical narrative. It has a particular use case in mind, and is designed around that use case. DITA seems more like a toolkit approach for writing building blocks of documentation.&lt;/p&gt;
&lt;p&gt;Is that right? If so, how do you see DITA&amp;rsquo;s domain actually being narrower than DocBook?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1365&#34;&gt;October 27, 2007 7:16 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s what I meant by &amp;ldquo;topic-oriented technical documentation&amp;rdquo;. There are a lot of different elements that can serve as the root element of a Docbook document (book, article, etc.), so in my opinion it can be used in a broader range of cases than DITA. For example, a Docbook document could be organized by topic (or some rough equivalent) with less shoe-horning than would be necessary if using DITA for a narrative document.&lt;/p&gt;
&lt;p&gt;This reduced amount of flexibility in DITA makes it easier for people to get a handle on exactly what good it can do them&amp;ndash;sometimes extra flexibility is more work for people as they figure out what profiles are available and which fits their needs best.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dita.xml.org/blog/25&#34; title=&#34;http://dita.xml.org/blog/25&#34;&gt;Michael Priestley&lt;/a&gt; on &lt;a href=&#34;#comment-1369&#34;&gt;October 28, 2007 5:08 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think the distinction is appropriate (topic-oriented vs. narrative-oriented), but I&amp;rsquo;m not sure about which is broader these days, or which is easier to adapt towards the other.&lt;/p&gt;
&lt;p&gt;Topic-oriented info is pretty much the norm for any professional tech pubs group, and with the rise of wikis and component-oriented CMSs it&amp;rsquo;s rapidly becoming the norm for most new content on the Web or intranet. This doesn&amp;rsquo;t mean that narrative content goes away, but I do think the broader use case probably is topics these days.&lt;/p&gt;
&lt;p&gt;By Tony DaSilva on &lt;a href=&#34;#comment-1371&#34;&gt;October 29, 2007 12:27 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;While it&amp;rsquo;s hard to disagree with a man as likable as Mike Priestley, I believe the &amp;ldquo;DITA is for topics/DocBook is for narratives&amp;rdquo; meme is a false dichotomy. Nothing prevents you from writing effective, topic-oriented content with DocBook. As a technical communicator who uses DocBook every day, it&amp;rsquo;s the only way I work. DocBook gives you everything you need to write granular, topic-oriented content. You can detail a task, reference information or a concept, recurse the heck out of it within your document (online help, web page, pdf, or whatever) reuse it in other documents, or stick that bit of content in a CMS and do all those things you&amp;rsquo;d expect that system to do.&lt;/p&gt;
&lt;p&gt;Mr. DuCharme&amp;rsquo;s old wine, new bottle argument is spot on. I&amp;rsquo;ve long asserted that DITA&amp;rsquo;s &amp;ldquo;hotness&amp;rdquo; has more to do with the marketing dollars spent by IBM and the rest of the vendor community, than with any intrinsic feature or function DITA provides. While there may be specific reasons for one person to prefer DITA over DocBook, my experience tells me that it is largely about preference. One might prefer DITA because it seems simpler than DocBook (&amp;ldquo;Martha, just look at all those tags!&amp;rdquo;); seems more suited to online work (&amp;ldquo;DocBook? We don&amp;rsquo; nee no estinkin&amp;rsquo; books!&amp;rdquo;); or, because it seems more appropriate for writing topics (&amp;ldquo;Our help is organized by topics, except when we publish it as a PDF; then, it&amp;rsquo;s a narrative.&amp;rdquo;). All these are fine and appropriate reasons, but none of them have anything to do with one standard having more juice than the other.&lt;/p&gt;
&lt;p&gt;Yes, yes. I know. Except for specialization:&lt;/p&gt;
&lt;p&gt;DITAdroid: &amp;ldquo;That&amp;rsquo;s right. You can specialize with DITA. That way, when your entire enterprise writes everything in DITA, you&amp;rsquo;ll be able to use the Development team&amp;rsquo;s use cases in the Marketing team&amp;rsquo;s white papers.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;You: &amp;ldquo;The whole enterprise writes in DITA? Even marketing?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;DITAdroid: &amp;ldquo;Yeah. Isn&amp;rsquo;t it great!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;You: &amp;ldquo;Yeah, except, if you choose DITA because it is &amp;lsquo;easier,&amp;rsquo; then why would you wade in the deep waters of specialization?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;DITAdroid: &amp;ldquo;Because, silly, once you get experience with DITA, it&amp;rsquo;s SOOO much easier to specialize.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;You: &amp;ldquo;Right, just like with DocBook. Once you gain experience with it, the full range of its possibilities for writing reusable, topic-oriented, online content, and even - gasp - narrative-oriented content destined for the landfill (we called them books) becomes easier to understand and implement.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;DITAdroid: &amp;ldquo;No.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;You: &amp;ldquo;Why?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;DITAdroid: &amp;ldquo;Because&amp;hellip;Because the DITA guy told me that DocBook is for books.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;You: &amp;ldquo;Right.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;DITA is cool. DITA rocks, rolls, and still loves you in the morning. Like any new love, she&amp;rsquo;s hotter than the old one. But, is she really all that different? Not so much, I think.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dita.xml.org/blog/25&#34; title=&#34;http://dita.xml.org/blog/25&#34;&gt;Michael Priestley&lt;/a&gt; on &lt;a href=&#34;#comment-1372&#34;&gt;October 29, 2007 2:53 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;IBM had topic-oriented authoring guidelines long before we had DITA; yet the move to DITA still caused shakeups and rewriting, because it forced teams to confront the issue of topics and chunking to fit their content in the architecture. One of the roles of XML is to enforce your content model - if your content model includes topic-orientation, it makes sense to use an XML architecture that includes that constraint, rather than punting to editorial guidelines on a key information model issue.&lt;/p&gt;
&lt;p&gt;You don&amp;rsquo;t mention DITA maps, which is interesting - I see them as a key component to the architecture, as important as topics or specialization. And maps don&amp;rsquo;t work if you don&amp;rsquo;t have predictable, addressable topics.&lt;/p&gt;
&lt;p&gt;On the specialization front, generally people use specialization when they have business rules or information model constraints beyond just topic orientation that they want to enforce. That&amp;rsquo;s why concept, task, and reference exist as specializations, along with all the user-created specializations out there, and why DITA subcommittees are actively developing industry-specific specializations. I recommend you read a few DITA specialization case studies, or ask on the dita-users list, if you want more detailed examples.&lt;/p&gt;
&lt;p&gt;I like DocBook for what it does, but it doesn&amp;rsquo;t enforce topics, it doesn&amp;rsquo;t do maps, and it doesn&amp;rsquo;t do specialization. If these three things don&amp;rsquo;t matter to you, then you probably shouldn&amp;rsquo;t consider DITA. But clearly they do matter to a lot of DITA users who are making active use of all three capabilities.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1373&#34;&gt;October 29, 2007 4:55 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Just to add to what Michael said, Norm has shown that you could implement map- and topic-like constructs in Docbook, and it&amp;rsquo;s very customizable, but these particular constructs are easier in DITA because of the head start it gives you if your information fits well into these structures. (That&amp;rsquo;s why I liked his &amp;ldquo;head start&amp;rdquo; image.)&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m sure that IBM has a budgeted strategy regarding its use of and participation in DITA work, but the term &amp;ldquo;marketing dollars&amp;rdquo; is a bit much&amp;ndash;it&amp;rsquo;s not an IBM product for sale like WebSphere or DB2, but a standard that they support and want to see grow. I&amp;rsquo;ve written entire books in DocBook, and may write more, but in my role as an employee of a consulting firm that specializes in standards-based automated publishing, I see plenty of situations where DITA is more appropriate.&lt;/p&gt;
&lt;p&gt;By Tony DaSilva on &lt;a href=&#34;#comment-1374&#34;&gt;October 29, 2007 10:06 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Mike, I appreciate your passion for DITA. I respect your deep knowledge of the standard and how it can support the work of technical communicators. I&amp;rsquo;ve been the beneficiary of your instruction and am happy to say that nearly everything (everything that&amp;rsquo;s correct, anyway) I know about DITA, I first learned from you. Just so we&amp;rsquo;re clear, I think DITA rocks. I don&amp;rsquo;t believe I&amp;rsquo;ve ever once suggested that it fails to deliver the goods for online content (and, this latest iteration is well on its way to giving us more of what we need for print). If you say folks are falling over themselves to specialize, well more power to them - specialize away. And, while I can do much the same with DocBook, DITA maps are pretty neat, too.&lt;/p&gt;
&lt;p&gt;My primary beef is that people who know better seem to consistently mischaracterize the capabilities of a mature, time-tested standard that continues to offer benefits to tens of thousands of users across the globe. These mischaracterizations end up misleading the uninitiated into believing that DocBook can&amp;rsquo;t do what it can and isn&amp;rsquo;t suited to do what it does. It may lead them to abandon an approach that gives them 99% of what they want and gives it to them as fast and as cheap as can be. I have a problem with that.&lt;/p&gt;
&lt;p&gt;You know we can write topic oriented content in DocBook, and we can write narrative content in DITA. We can be as strict or as loosey goosey as our hearts desire. Whatever topic enforcement is purpose built into DITA (or, is not in DocBook) is only as strong as the willingness of the writer to obey those restrictions. There&amp;rsquo;s no invisible hand guiding authors working in DITA compelling them to write tidy, compact and standalone topics. That&amp;rsquo;s a skill to be learned, and - as you say - an approach that requires planning and discipline. DITA&amp;rsquo;s &amp;ldquo;head start&amp;rdquo; might make it a bit easier, but it alone is not sufficient.&lt;/p&gt;
&lt;p&gt;All I am saying is that it is misleading to assert that DocBook is unsuited for topic oriented content. That&amp;rsquo;s demonstrably false. There may be plenty of reasons to choose DITA over DocBook (maps, specialization, hanging with the cool kids, whatever), but topic orientation is not an item where the two standards offer much of a difference. I&amp;rsquo;ve written books in DocBook, will write more, and as an owner of a consulting firm see situations where DocBook is more appropriate and situations where it is less appropriate. I don&amp;rsquo;t have a horse in the race, and I owe my clients my objective opinion derived from a study of their requirements and the technical landscape. Other folks do have a vested interest in the growth and adoption of one standard, and when these folks consistently misrepresent and minimize the capabilities of a - like it or not - competing standard, that gets my attention.&lt;/p&gt;
&lt;p&gt;As for IBM and marketing DITA, puh-leeze. Who employs Day, Priestley, Schell, Anderson, Hennum, Hunt, et al.? Who pays their travel and per diem at the dozen or so conferences they attend each year? Who sponsors these conferences, pays for developing the Toolkit and the Task Modeler, provides these folks time for writing articles in DeveloperWorks, etcetera? IBM. Every dollar they spend, compounded by the dollars spent by the dozens of smaller companies pushing their particular DITA &amp;ldquo;solution&amp;rdquo; is a marketing dollar for DITA. Why is it that so many folks of common sense and uncommon intelligence seem to squirm at the mere mention of this?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dita.xml.org/blog/25&#34; title=&#34;http://dita.xml.org/blog/25&#34;&gt;Michael Priestley&lt;/a&gt; on &lt;a href=&#34;#comment-1375&#34;&gt;October 30, 2007 12:21 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Tony, I gave you a concrete example of the difference between system-based topic orientation versus guideline-based. I am not slamming DocBook, I am reporting personal experience. That experience has been validated by dozens of other early DITA adopters, who have had to shake up content to fit it into DITA, and have benefitted from the result.&lt;/p&gt;
&lt;p&gt;In terms of marketing dollars, my main job is not promoting DITA, nor is that the main job of the others you mentioned. We&amp;rsquo;re fortunate to have supportive management, and in some cases conferences that are willing to help with the cost of travel to get us where we&amp;rsquo;re needed. Does Norm Walsh&amp;rsquo;s work on DocBook represent &amp;ldquo;marketing dollars&amp;rdquo; from Sun?&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;re going to have to spread the credit for DITA&amp;rsquo;s success much more widely. I&amp;rsquo;d start with the list of companies and individuals on the DITA TC page (which includes Sun, by the way).&lt;/p&gt;
&lt;p&gt;By Tony DaSilva on &lt;a href=&#34;#comment-1376&#34;&gt;October 30, 2007 9:48 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;re so very right, Mike. Thanks for DITA. Thanks to &lt;a href=&#34;http://www.oasis-open.org/committees/membership.php?wg_abbrev=dita&#34;&gt;the DITA TC membership&lt;/a&gt; for all your work. Thanks also to&lt;/p&gt;
&lt;p&gt;Adobe Systems&lt;br /&gt;
BMC Software&lt;br /&gt;
Citrix Systems, Inc.&lt;br /&gt;
Comet Communication&lt;br /&gt;
Comtech Services, Inc.&lt;br /&gt;
IBM&lt;br /&gt;
Intel Corporation&lt;br /&gt;
Justsystems Corporation&lt;br /&gt;
Nokia Corporation&lt;br /&gt;
Oracle Corporation&lt;br /&gt;
PTC&lt;br /&gt;
Sun Microsystems&lt;br /&gt;
The Boeing Company&lt;br /&gt;
US Department of Defense&lt;/p&gt;
&lt;p&gt;for their continuing investment in the work of these good people.&lt;/p&gt;
&lt;p&gt;Thanks for DocBook, too. Thanks to &lt;a href=&#34;http://nwalsh.com/&#34;&gt;Norm&lt;/a&gt; and &lt;a href=&#34;http://www.sagehill.net/&#34;&gt;Bob&lt;/a&gt; and &lt;a href=&#34;http://www.oasis-open.org/committees/membership.php?wg_abbrev=docbook&#34;&gt;the DocBook TC membership&lt;/a&gt; for all your work. Thanks also to&lt;/p&gt;
&lt;p&gt;PTC&lt;br /&gt;
Reed Elsevier&lt;br /&gt;
Sun Microsystems&lt;/p&gt;
&lt;p&gt;for their continuing investment in the work of these good people.&lt;/p&gt;
&lt;p&gt;And, to all those people and organizations whose work benefits both communities, thanks and thanks again.&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;ve helped me feed my brain, feed my kids, and feed my wife&amp;rsquo;s appetite for &lt;a href=&#34;http://www.neimanmarcus.com/store/catalog/templates/EntrySC.jhtml?itemId=cat000209&amp;amp;parentId=cat000199&amp;amp;masterId=cat000149&#34;&gt;designer shoes&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;I owe you, man. Big time.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/dita">DITA</category>
      
    </item>
    
    <item>
      <title>What Shelley said</title>
      <link>https://www.bobdc.com/blog/what-shelley-said/</link>
      <pubDate>Sat, 20 Oct 2007 10:50:16 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/what-shelley-said/</guid>
      
      
      <description><div>What counts as a semantic web tool?</div><div>&lt;p&gt;&lt;a href=&#34;http://twine.com/&#34;&gt;Twine&lt;/a&gt; looks like fun to me, the standards support looks great, and I&amp;rsquo;ve applied to be a beta tester. Still, one point that &lt;a href=&#34;http://burningbird.net/technology/semantic-to-go/&#34;&gt;Shelley Powers made about it&lt;/a&gt; bears repeating and rereading:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To me, the semantic web means the web in the wild, not centralized in a specific tool or environment. If this becomes a &amp;ldquo;Facebook and Wikipedia mashup&amp;rdquo;, it might be successful, and it might be semantic, but it isn&amp;rsquo;t the web. The whole point of the semantic web technologies is for each of us to annotate our data, wherever we are, regardless of tool, and begin to really drive out the tiny threads of true meaning on a global scale. If we have to leave our places where we&amp;rsquo;re at and go elsewhere, this seems to create a disconnect, right from the start. I have this same quibble with the other &amp;lsquo;mainstream applications using semantic web technologies&amp;rsquo;, so the concern isn&amp;rsquo;t targeted specifically at Twine.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://blog.scheir.net&#34; title=&#34;http://blog.scheir.net&#34;&gt;peter&lt;/a&gt; on &lt;a href=&#34;#comment-1330&#34;&gt;October 20, 2007 12:42 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I’ve just applied for a Twine account, so what I am writing here is basically just guessing.&lt;br /&gt;
Twine is going to provide a SPARQL API to the data stored in it. So the data is on the web. Twine is a place to store data like a web server is a place to store web pages.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Command line from Windows Explorer and vice versa</title>
      <link>https://www.bobdc.com/blog/command-line-from-windows-expl/</link>
      <pubDate>Fri, 19 Oct 2007 08:28:56 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/command-line-from-windows-expl/</guid>
      
      
      <description><div>And an Xubuntu equivalent.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/cmdlineexplorer.jpg&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;I use the Windows command line &lt;a href=&#34;https://www.bobdc.com/blog/w2k-batch-file-programming&#34;&gt;a lot&lt;/a&gt;, but some things are easier with the graphical interface, such as deleting or moving a group of files that don&amp;rsquo;t have some obvious part of their name in common. I&amp;rsquo;ve known for years that you can start up Windows Explorer from the command line, and you can pass it an argument telling it which directory/folder to open in that window. The following command is a simple way to open up an explorer window for the current directory from the command line:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;explorer . 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I recently learned of a Microsoft &lt;a href=&#34;http://www.microsoft.com/windowsxp/downloads/powertoys/xppowertoys.mspx&#34;&gt;XP &amp;ldquo;Power Toy&amp;rdquo;&lt;/a&gt; that performs the opposite trick: when it&amp;rsquo;s loaded, a folder icon&amp;rsquo;s right-click menu in explorer includes an &amp;ldquo;Open Command Window Here&amp;rdquo; choice that brings you to a prompt with that folder as the current directory.&lt;/p&gt;
&lt;p&gt;I was going to write &amp;ldquo;Does anyone know of an equivalent that runs on Xubuntu?&amp;rdquo; but a bit of web searching turned up &lt;a href=&#34;http://roland65.free.fr/xfe/&#34;&gt;xfe&lt;/a&gt;. When you start it from the command line with no parameters, its default behavior is to open a window with icons of the files and subdirectories in your command line&amp;rsquo;s current directory. Its tool bar includes a little terminal icon that starts up a command line window in the directory for the folder you&amp;rsquo;re viewing, so you can have it both ways with this one utility.&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.triplescape.com&#34; title=&#34;http://www.triplescape.com&#34;&gt;Brian Manley&lt;/a&gt; on &lt;a href=&#34;#comment-1325&#34;&gt;October 19, 2007 11:37 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a shorter equivalent:&lt;/p&gt;
&lt;p&gt;start .&lt;/p&gt;
&lt;p&gt;The nice thing about &amp;lsquo;start&amp;rsquo; is that, when given a file or URL, it will start the system default application to process the parameter:&lt;/p&gt;
&lt;p&gt;start mailto:foobar@snee.com&lt;br /&gt;
start &lt;a href=&#34;http://www.snee.com&#34;&gt;http://www.snee.com&lt;/a&gt;&lt;br /&gt;
start MyDocument.doc&lt;br /&gt;
etc..&lt;/p&gt;
&lt;p&gt;Cheers,&lt;br /&gt;
Brian&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-1326&#34;&gt;October 19, 2007 11:46 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been back on a Win32 machine a lot recently, and in the past have found Cygwin a must-have. This time even more so - someone suggested the rxvt package, seems an excellent lightweight terminal.&lt;/p&gt;
&lt;p&gt;I put xfe on a&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.megginson.com/blogs/quoderat/&#34; title=&#34;http://www.megginson.com/blogs/quoderat/&#34;&gt;David Megginson&lt;/a&gt; on &lt;a href=&#34;#comment-1327&#34;&gt;October 19, 2007 11:52 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you wanted to move to regular Ubuntu (with Gnome), then you could type&lt;/p&gt;
&lt;p&gt;nautilus .&lt;/p&gt;
&lt;p&gt;from the shell to get a window in the current directory. To go the other way, install the (tiny) nautilus-open-terminal package, which adds a menu entry to open a shell in the current directory from a Nautilus window.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1328&#34;&gt;October 19, 2007 12:50 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Danny and David: Thanks!&lt;/p&gt;
&lt;p&gt;Brian: that looks great, I&amp;rsquo;ll be using that instead of explorer.exe from now on.&lt;/p&gt;
&lt;p&gt;For your MyDocument.doc example, though, the &amp;ldquo;start&amp;rdquo; isn&amp;rsquo;t necessary&amp;ndash;if a document has an extension that has an app associated with it, merely typing the name of the document will start up the app. This proved very useful about an hour ago when I couldn&amp;rsquo;t figure out how to tell the Adobe Digital Editions ebook client how to open up a file I had sitting on a disk (they don&amp;rsquo;t need no stinkin&amp;rsquo; &amp;ldquo;File&amp;rdquo; menu with an &amp;ldquo;Open&amp;rdquo; choice) so I entered the name of the .epub ebook file at the command line and it opened up in the Adobe reader.&lt;/p&gt;
&lt;p&gt;By Ed Davies on &lt;a href=&#34;#comment-1329&#34;&gt;October 19, 2007 2:09 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;On Ubuntu I have a symlink:&lt;/p&gt;
&lt;p&gt;lrwxrwxrwx 1 root root 19 2007-08-11 13:11 /usr/local/bin/go -&amp;gt; /usr/bin/gnome-open&lt;/p&gt;
&lt;p&gt;which allows &amp;ldquo;go somedoc.pdf&amp;rdquo;, &amp;ldquo;go &lt;a href=&#34;http://www.snee.com&#34;&gt;http://www.snee.com&lt;/a&gt;&amp;rdquo; or whatever. Naturally, &amp;ldquo;go .&amp;rdquo; opens Nautilus on the current directory, not that I use it much. The only snag is that it will only do one thing at a time; &amp;ldquo;go *.xml&amp;rdquo; only opens the first XML file in the directory.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Playing with pull quotes</title>
      <link>https://www.bobdc.com/blog/playing-with-pull-quotes/</link>
      <pubDate>Wed, 17 Oct 2007 21:36:49 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/playing-with-pull-quotes/</guid>
      
      
<description><div>Now maybe they display properly in the full Atom feed.</div><div>&lt;blockquote class=&#34;pullquote&#34; style=&#39;width: 190px; font: bold 1.333em/1.125em &#34;Helvetica Neue&#34;, Helvetica, Arial, sans-serif; margin: 1.5em 0 1.5em 1.5em !important; padding: 0.6em 5px !important; background: none !important; border: 3px double #ddd; border-width: 3px 0; text-align: center; float: right; &#39;&gt;&lt;b&gt;Or at least look a little better.&lt;/b&gt;&lt;/blockquote&gt;
&lt;p&gt;I like to put an image or a &lt;a href=&#34;http://en.wikipedia.org/wiki/Pull_quote&#34;&gt;pull quote&lt;/a&gt; in my weblog postings to break up the gray of pure text. I also prefer pulling summary Atom feeds, rather than full ones, into my reader, but I offer both on my weblog&amp;rsquo;s &lt;a href=&#34;http://www.snee.com/bobdc.blog/&#34;&gt;home page&lt;/a&gt;. I recently saw in my logs that the Atom feed with my complete postings is more popular, and I knew that the pull quotes looked wrong there, so I&amp;rsquo;ve tried to fix that.&lt;/p&gt;
&lt;p&gt;Because the full Atom feed version of the postings doesn&amp;rsquo;t know about the HTML version&amp;rsquo;s style sheet, which has a specific style for pull quotes, the pull quotes look like an extra body text paragraph that happens to repeat something from one of the other paragraphs. To make them look more like pull quotes in those feeds, I&amp;rsquo;m now embedding the necessary CSS right in a &lt;code&gt;style&lt;/code&gt; attribute of the &lt;code&gt;blockquote&lt;/code&gt; element that holds the pull quote, so it should show up properly in the display of the full Atom feed version of each posting—or at least look a little better. Tunneling CSS in the content does feel like cheating, in a purist markup geek kind of way, but those &lt;code&gt;blockquote&lt;/code&gt; elements have their own &lt;code&gt;class&lt;/code&gt; attribute with a value of &amp;ldquo;pullquote&amp;rdquo; in case some automated process wants to just find them and get rid of them, so I guess it&amp;rsquo;s not too bad.&lt;/p&gt;
&lt;p&gt;It looks fine in Bloglines in recent releases of Firefox and IE. Let me know if it&amp;rsquo;s a mess elsewhere.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://kontrawize.blogs.com/kontrawize/&#34; title=&#34;http://kontrawize.blogs.com/kontrawize/&#34;&gt;Anthony B. Coates&lt;/a&gt; on &lt;a href=&#34;#comment-1317&#34;&gt;October 18, 2007 3:56 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In Opera (when using it as a feed reader), the pull text shows up as an indented paragraph (i.e. blockquote), but not with text wrapping around it as it appears on the Web page.&lt;br /&gt;
Cheers, Tony.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1321&#34;&gt;October 18, 2007 8:08 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Tony,&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s how it looked in Bloglines before I made this change.&lt;/p&gt;
&lt;p&gt;I just added &lt;b&gt;&lt;/b&gt; tags to make it a little clearer that it&amp;rsquo;s not a regular blockquote.&lt;/p&gt;
&lt;p&gt;See you in Boston!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/blogging-about-blogging">blogging about blogging</category>
      
    </item>
    
    <item>
      <title>Buying the new Radiohead album</title>
      <link>https://www.bobdc.com/blog/buying-the-new-radiohead-album/</link>
      <pubDate>Sun, 14 Oct 2007 16:25:48 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/buying-the-new-radiohead-album/</guid>
      
      
      <description><div>The price? &#34;IT&#39;S UP TO YOU&#34;</div><div>&lt;p&gt;If you follow the news of either hi-tech media or music, you already know that Radiohead is selling their new album &amp;ldquo;In Rainbows&amp;rdquo; to customers online for whatever they want to pay. A deluxe boxed set is also available for a fixed price of 40 pounds.&lt;/p&gt;
&lt;p&gt;Radiohead is on my short list of artists whose new work I buy as soon as it&amp;rsquo;s released without waiting for any reviews. To be honest, my first few Radiohead CDs were dupes of a friend&amp;rsquo;s copies, but I liked them enough to buy shrinkwrapped new copies of the next few that came out. The early stuff is a little too Junior U2 for me, and their later work seems like the kind of pretentious drama that should really irk me, but as with so many bands, if we take the lead singer&amp;rsquo;s histrionics with a grain of salt the package is really well done. I especially love Jonny Greenwood&amp;rsquo;s awareness that &amp;ldquo;electronic music&amp;rdquo; is not something that was invented for dance floors in the 1980s, but goes all the way back to the work of people such as &lt;a href=&#34;http://en.wikipedia.org/wiki/Pierre_Schaeffer&#34;&gt;Pierre Schaeffer&lt;/a&gt;, who was recording machinery and then cutting up and gluing bits of recording tape together in the 1940s, generations before any talk of sampling or mashups. Their balance of strange noises with crunchy power chords continues a legacy that goes back to Roxy Music&amp;rsquo;s first album, only with much better guitar playing; for a great example of a rocking Radiohead kick-in, see what happens at 2:49 of &lt;a href=&#34;http://www.youtube.com/watch?v=m_mMzOQpe0I&#34;&gt;this YouTube video&lt;/a&gt; of &amp;ldquo;Paranoid Android&amp;rdquo;. (The song is from &amp;ldquo;OK Computer,&amp;rdquo; an excellent Radiohead album to start with.)&lt;/p&gt;
&lt;p&gt;Regular trips to Oxford for the &lt;a href=&#34;http://www.xmlsummerschool.com&#34;&gt;XML Summer School&lt;/a&gt; also make me a fan, because they&amp;rsquo;re an Oxford band—the city, not the university, contrary to what &lt;a href=&#34;http://www.allmusic.com/cg/amg.dll?p=amg&amp;amp;sql=11:fxfoxql5ld6e~T1&#34;&gt;allmusic.com&lt;/a&gt; says—and apparently they still live there. &lt;a href=&#34;http://www.xmlsummerschool.com/speakers/johnchelsom.html&#34;&gt;John Chelsom&lt;/a&gt; told me that while taking his son for a walk in the park, he passed Thom Yorke doing the same, and they gave each other the little &amp;ldquo;here we are, dads pushing our prams among all these moms and nannies&amp;rdquo; nod. I would trade both my stories of passing local boy Dave Matthews in downtown Charlottesville for that any day. (I guess he&amp;rsquo;s also a big fan—he &lt;a href=&#34;http://www.rollingstone.com/news/story/7248604/73_radiohead&#34;&gt;wrote in Rolling Stone magazine&lt;/a&gt; that &amp;ldquo;Listening to Radiohead makes me feel like I&amp;rsquo;m a Salieri to their Mozart&amp;rdquo;.)&lt;/p&gt;
&lt;p&gt;Go to &lt;a href=&#34;http://www.radiohead.com&#34;&gt;radiohead.com&lt;/a&gt; or &lt;a href=&#34;http://www.radiohead.co.uk&#34;&gt;radiohead.co.uk&lt;/a&gt; and you&amp;rsquo;ll be redirected to &lt;a href=&#34;http://www.inrainbows.com/&#34;&gt;inrainbows.com&lt;/a&gt;. After clicking through some screens to indicate that you want to download a copy, you get this:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/buyInRainbows1.jpg&#34; alt=&#34;[In Rainbows transaction screen 1]&#34; border=&#34;0&#34; vspace=&#34;30px&#34; /&gt;
&lt;p&gt;Clicking the question mark displays this:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/buyInRainbows2.jpg&#34; alt=&#34;[In Rainbows transaction screen 2]&#34; border=&#34;0&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;Another question mark, another reassurance:&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/buyInRainbows3.jpg&#34; alt=&#34;[In Rainbows transaction screen 3]&#34; border=&#34;0&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;I decided that my magic number for how much I wanted the album was $10, a nice round figure a few bucks less than what a hard copy CD and booklet would cost. Radiohead is so British that regardless of where you hit their website from, you have to pick a figure in pounds sterling. Unfortunately for us Yanks, the math is only too easy these days: divide the dollar figure by 2, so I entered 5.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m listening to the album now, and it was worth every p.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.rickyboy.squarespace.com&#34; title=&#34;http://www.rickyboy.squarespace.com&#34;&gt;Rick Schochler&lt;/a&gt; on &lt;a href=&#34;#comment-1324&#34;&gt;October 18, 2007 3:56 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi, Bob&amp;hellip;you&amp;rsquo;ve convinced me to jump onto the historical bandwagon&amp;hellip;I&amp;rsquo;ll download tonight. BTW, did you see the article (I found it on slashdot) that most downloads thus far have been for nil? It&amp;rsquo;ll be interesting to see how this plays out.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>ebook hardware readers: suddenly looking good</title>
      <link>https://www.bobdc.com/blog/ebook-hardware-readers-suddenl/</link>
      <pubDate>Thu, 11 Oct 2007 08:56:45 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/ebook-hardware-readers-suddenl/</guid>
      
      
      <description><div>So many PDFs to read...</div><div>&lt;p&gt;&lt;a href=&#34;http://en.wikipedia.org/wiki/Ebook&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/180px-Laptop-ebook.jpg&#34; alt=&#34;[wikipedia ebook picture]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I always said that a specialized device for reading electronic books wasn&amp;rsquo;t worth owning. For example, if you brought one to the beach, sand could get into the machine&amp;rsquo;s inner workings, the screen would be difficult to read in the glare of the sun, and someone could easily steal the device when you went in for a swim. I like to read a lot when I go to the beach, and hardcopy books are obviously superior for that.&lt;/p&gt;
&lt;p&gt;Having been familiar for many years with arguments for why electronic media should replace some but not all books, I should have realized that just because ebook reading devices aren&amp;rsquo;t appropriate for beach reading doesn&amp;rsquo;t mean that they&amp;rsquo;re useless. Several times a week I seem to find yet another PDF document that I want to read, but I don&amp;rsquo;t want to read it while sitting at my desk, I don&amp;rsquo;t want to balance a laptop on my lap and either plug it in or think about its battery life, and I don&amp;rsquo;t want to use up a lot of paper and toner to print something that I&amp;rsquo;m going to read once and throw away. &lt;a href=&#34;http://www.preterhuman.net/texts/science_and_technology/The%20Description%20Logic%20Handbook%20-%20Theory,%20Implementation%20and%20Applications%20(2003).pdf&#34;&gt;The book on Description Logics&lt;/a&gt; whose first chapter I &lt;a href=&#34;https://www.bobdc.com/blog/the-dl-in-owl-dl&#34;&gt;summarized last week&lt;/a&gt; is a good example.&lt;/p&gt;
&lt;p&gt;Many useful books and papers—especially technical ones—are sitting on public servers as PDF files, freely available for download, and I&amp;rsquo;m thinking that a black-and-white ebook reader with no keyboard or hard disk would use less power and be lighter than a typical laptop and save me from using up lots of paper and toner. Of course, I don&amp;rsquo;t want one badly enough to shell out $225, which seems to be a &lt;a href=&#34;http://search.ebay.com/search/search.dll?from=R40&amp;amp;_trksid=m37&amp;amp;satitle=ebook+reader&amp;amp;category0=&#34;&gt;typical ebay price&lt;/a&gt; for them, but my attitude toward them has changed, and I&amp;rsquo;m going to look at these devices differently from now on.&lt;/p&gt;
&lt;p&gt;There are alternative formats to read on these machines, especially &lt;a href=&#34;http://www.idpf.org/oebps/oebps1.2/download/oeb12-xhtml.htm&#34;&gt;OEBPS&lt;/a&gt;, which looks like a nicely done standard, but the existence of all those PDF files of documents that I want to read is driving my interest in ebook reading devices for now. Has anyone else had much experience reading PDFs off of one of these devices on a large scale? Does my lack of experience with them mean that I&amp;rsquo;m missing something in my assumptions about why it would be handy to have one?&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1304&#34;&gt;October 11, 2007 2:33 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s all about screen resolution and contrast: hard to beat paper yet for either one. I print everything and put the paper in the recycle bin: nothing much I can do about toner, though I do have my cartridges refilled rather than replacing them.&lt;/p&gt;
&lt;p&gt;By Ed Davies on &lt;a href=&#34;#comment-1305&#34;&gt;October 11, 2007 3:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;One of the intriguing things about the OLPC machines is that they are designed to be pretty good e-book readers, without being &lt;em&gt;just&lt;/em&gt; e-book readers.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/ebooks">ebooks</category>
      
    </item>
    
    <item>
      <title>How college students really use Facebook</title>
      <link>https://www.bobdc.com/blog/how-college-students-really-us/</link>
      <pubDate>Sat, 06 Oct 2007 10:11:43 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/how-college-students-really-us/</guid>
      
      
      <description><div>&#34;Online community theater&#34;.</div><div>&lt;blockquote class=&#34;pullquote&#34;&gt;It turns out that they don&#39;t need software for social networking.&lt;/blockquote&gt;
&lt;p&gt;An op-ed piece by a recent Dartmouth graduate in today&amp;rsquo;s New York Times titled &lt;a href=&#34;http://www.nytimes.com/2007/10/06/opinion/06mathias.html&#34;&gt;The Fakebook Generation&lt;/a&gt; (registration required) tells us how college students really use Fakebook. It turns out that they don&amp;rsquo;t need software for social networking, because they have dorms and classes and libraries and sports teams and clubs and even parties for that. According to its author,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We log into the Web site because it&amp;rsquo;s entertaining to watch a constantly evolving narrative starring the other people in the library. I&amp;rsquo;ve always thought of Facebook as online community theater. In costumes we customize in a backstage makeup room — the Edit Profile page, where we can add a few Favorite Books or touch up our About Me section — we deliver our lines on the very public stage of friends&amp;rsquo; walls or photo albums. And because every time we join a network, post a link or make another friend it&amp;rsquo;s immediately made visible to others via the News Feed, every Facebook act is a soliloquy to our anonymous audience.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s all comedy: making one another laugh matters more than providing useful updates about ourselves, which is why entirely phony profiles were all the rage before the grown-ups signed in.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(The title of her piece was a bit confusing for someone who attempts to play jazz as a hobby—the term &lt;a href=&#34;http://en.wikipedia.org/wiki/Fake_book&#34;&gt;fakebook&lt;/a&gt; has meant something else for about sixty years, and I thought &amp;ldquo;now there&amp;rsquo;s a generation for it?&amp;rdquo;)&lt;/p&gt;
&lt;p&gt;My Facebook experiment is to do as little as possible with &lt;a href=&#34;http://www.facebook.com/profile.php?id=623756409&#34;&gt;my page&lt;/a&gt; and see what happens. So far I have about twenty friends and have received two zombie invitations. It&amp;rsquo;s always funny to have Facebook tell me that I and someone I&amp;rsquo;ve known for years &amp;ldquo;are now friends&amp;rdquo;; it makes the relationship feel so validated. After my older daughter and I each found out over dinner that the other had a Facebook page, I tried to find hers and couldn&amp;rsquo;t, but I suppose it&amp;rsquo;s a Good Thing that a 47-year-old man can&amp;rsquo;t easily locate a particular teenage girl on Facebook.&lt;/p&gt;
&lt;p&gt;Facebook has also proved useful for sending a message to a friend whose spam filters kept eating my regular email. It&amp;rsquo;s actually a bit troubling to extrapolate this scenario out further, to a time when members-only walled gardens owned by specific corporations become a more reliable way to send email than the public internet.&lt;/p&gt;
&lt;p&gt;We IT-oriented adults each use Facebook for different things, experimental or otherwise. The Times op-ed piece is worth reading to see how Facebook&amp;rsquo;s original audience really uses it.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.TimothyHorrigan.com/2ndlife.html&#34; title=&#34;http://www.TimothyHorrigan.com/2ndlife.html&#34;&gt;Timothy Horrigan&lt;/a&gt; on &lt;a href=&#34;#comment-1299&#34;&gt;October 8, 2007 3:42 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For online community theatre, nothin&amp;rsquo; beats Second Life&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>The &#34;DL&#34; in &#34;OWL DL&#34;</title>
      <link>https://www.bobdc.com/blog/the-dl-in-owl-dl/</link>
      <pubDate>Thu, 04 Oct 2007 09:11:24 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-dl-in-owl-dl/</guid>
      
      
      <description><div>An interesting legacy that contributes many cool things to OWL.</div><div>&lt;p&gt;I drive a Honda Accord EX. To even write that, I had to look at the back of my car to remember the &amp;ldquo;EX&amp;rdquo; part, because it never meant anything to me. I try to remember to mention it when I call the dealership to ask about the availability of some part, because it might matter to them, but it doesn&amp;rsquo;t to me.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;DL&amp;rdquo; in &amp;ldquo;OWL DL&amp;rdquo; used to mean about as much to me as the &amp;ldquo;EX&amp;rdquo; in &amp;ldquo;Honda Accord EX&amp;rdquo;. I always thought of the &lt;a href=&#34;http://www.w3.org/TR/2004/REC-owl-features-20040210/#s1.3&#34;&gt;three versions of the OWL language&lt;/a&gt; as basically being small, medium, and large, and with most OWL tools going with the Goldilocks approach of selecting the one in the middle, I always worked with that one, even if I had no idea where the plural form &amp;ldquo;Description Logics&amp;rdquo; was used. After reading chapter 1 of the Nardi and Brachman 2003 book &amp;ldquo;An Introduction to Description Logics&amp;rdquo; (&lt;a href=&#34;http://www.inf.unibz.it/~franconi/dl/course/dlhb/dlhb-01.pdf&#34;&gt;PDF&lt;/a&gt;) I understand much more what Description Logics bring to OWL. The chapter never actually mentions OWL, but now I understand better what OWL DL has that OWL Lite doesn&amp;rsquo;t, where those extra features came from, and what they buy you. I&amp;rsquo;ll try to summarize the key points here, with ample quotations, and I heartily encourage you to read the whole thing. (I also encourage anyone to correct any misunderstandings in my attempts to summarize.)&lt;/p&gt;
&lt;blockquote class=&#34;pullquote&#34;&gt;The &#34;DL&#34; in &#34;OWL DL&#34; used to mean about as much to me as the &#34;EX&#34; in &#34;Honda Accord EX&#34;.&lt;/blockquote&gt;
&lt;p&gt;According to the paper&amp;rsquo;s introduction,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Research in the field of knowledge representation and reasoning is usually focused on methods for providing high-level descriptions of the world that can be effectively used to build intelligent applications. In this context, &amp;ldquo;intelligent&amp;rdquo; refers to the ability of a system to find implicit consequences of its explicitly represented knowledge.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A good goal, and for me, one of the keys to the value of OWL: finding implicit consequences of explicitly represented knowledge.&lt;/p&gt;
&lt;p&gt;If I understand the paper correctly, Description Logics build on first-order logic, an alternative to the network- and frame-based approaches to knowledge representation that became popular in the 1970s. The second paragraph of &lt;a href=&#34;http://en.wikipedia.org/wiki/First_order_logic&#34;&gt;Wikipedia&amp;rsquo;s entry on First-order logic&lt;/a&gt; gives a good idea of what it does:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;While &lt;a href=&#34;http://en.wikipedia.org/wiki/Propositional_logic&#34; title=&#34;Propositional logic&#34;&gt;propositional logic&lt;/a&gt; deals with simple declarative propositions, first-order logic additionally covers &lt;a href=&#34;http://en.wikipedia.org/wiki/Predicate_%28mathematics%29&#34; title=&#34;Predicate (mathematics)&#34;&gt;predicates&lt;/a&gt; and quantification. Take for example the following sentences: &amp;ldquo;Socrates is a man&amp;rdquo;, &amp;ldquo;Plato is a man&amp;rdquo;. In &lt;a href=&#34;http://en.wikipedia.org/wiki/Propositional_logic&#34; title=&#34;Propositional logic&#34;&gt;propositional logic&lt;/a&gt; these will be two unrelated propositions, denoted for example by &lt;em&gt;p&lt;/em&gt; and &lt;em&gt;q&lt;/em&gt;. In first-order logic however, both sentences would be connected by the same property: Man(x), where Man(x) means that x is a man. When x=Socrates we get the first proposition - &lt;em&gt;p&lt;/em&gt;, and when x=Plato we get the second proposition - &lt;em&gt;q&lt;/em&gt;. Such a construction allows for a much more powerful logic when quantifiers are introduced, such as &amp;ldquo;for every x&amp;hellip;&amp;rdquo; - for example, &amp;ldquo;for every x, if Man(x), then&amp;hellip;&amp;rdquo;. Without quantifiers, every valid argument in FOL is valid in propositional logic, and vice versa.&lt;/p&gt;
&lt;/blockquote&gt;
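The quoted Socrates/Plato example translates into a few lines of code; as a rough sketch only, with the domain and the mortal predicate invented here for illustration:

```python
# First-order-logic flavored sketch of the quoted example: one shared
# predicate, Man(x), applied to several individuals, plus a universally
# quantified claim over a small (invented) domain.
domain = {"Socrates", "Plato", "Fido"}
men = {"Socrates", "Plato"}

def man(x):
    return x in men

def mortal(x):
    # in this toy domain, everything happens to be mortal
    return x in domain

# "Socrates is a man" and "Plato is a man" share the predicate man(x):
print(man("Socrates"), man("Plato"))  # True True

# "For every x, if Man(x) then Mortal(x)":
print(all(mortal(x) for x in domain if man(x)))  # True
```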
&lt;p&gt;(The idea of &amp;ldquo;predicates&amp;rdquo; and two sentences being connected by the same property should be familiar to RDF users.) Nardi and Brachman tell us that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Owing to their more human-centered origins, the network-based systems were often considered more appealing and more effective from a practical viewpoint than the logical systems. Unfortunately they were not fully satisfactory because of their usual lack of precise semantic characterization. The end result of this was that every system behaved differently from the others, in many cases despite virtually identical looking components and even identical relationship names.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The article describes DL&amp;rsquo;s &amp;ldquo;humble origins in the late 1970&amp;rsquo;s as a remedy for logical and semantic problems in frame and semantic network representations&amp;rdquo;, although it wasn&amp;rsquo;t always known as Description Logics.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;hellip;research in the area of Description Logics began under the label &lt;em&gt;terminological systems&lt;/em&gt;, to emphasize that the representation language was used to establish the basic terminology adopted in the modeled domain.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Coming up with a terminology to model a domain! That should be familiar to anyone who has worked on OWL ontologies.&lt;/p&gt;
&lt;p&gt;A particular terminological system is known as a &amp;ldquo;Description Logic&amp;rdquo; (for example, DAML+OIL, as described by &lt;a href=&#34;http://web.comlab.ox.ac.uk/oucl/work/ian.horrocks/Publications/download/2002/ieeede2002.pdf&#34;&gt;this pdf file&lt;/a&gt;) and &amp;ldquo;DL&amp;rdquo; refers to the family of Description Logics. Knowledge Representation Systems based on Description Logics are collectively known as DL-KRS. (Nardi and Brachman also tell us that &amp;ldquo;The ancestor of DL systems is KL-ONE [Brachman and Schmolze, 1985], which signaled the transition from semantic networks to more well-founded terminological (description) logics&amp;rdquo;—can we look forward to a Description Logic named &lt;a href=&#34;https://en.wikipedia.org/wiki/KRS-One&#34;&gt;KRS-One&lt;/a&gt;?)&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s their perspective on something else that makes OWL DL more powerful than OWL Lite—the ability to say that a string quartet has exactly four musicians or that a basketball team has at least five players:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Number restrictions are sometimes viewed as a distinguishing feature of Description Logics, although one can find some similar constructs in some database modeling languages (notably Entity-Relationship models).&lt;/p&gt;
&lt;/blockquote&gt;
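A number restriction boils down to a cardinality check on a role's fillers. Here is a toy sketch of the string quartet and basketball team examples; the function names and ensemble data are invented for illustration, not part of any OWL API:

```python
# Toy sketch of DL number restrictions as cardinality checks on the
# fillers of a role (e.g. hasMember). Data below is invented.
def at_least(n, fillers):
    """Minimum cardinality: a basketball team has at least 5 players."""
    return len(fillers) >= n

def exactly(n, fillers):
    """Exact cardinality: a string quartet has exactly 4 musicians."""
    return len(fillers) == n

quartet = ["violin 1", "violin 2", "viola", "cello"]
team = ["pg", "sg", "sf", "pf", "c"]
print(exactly(4, quartet))  # True
print(at_least(5, team))    # True
```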
&lt;p&gt;The section titled &amp;ldquo;Reasoning&amp;rdquo; begins with this introduction to another important concept that DL brings to OWL:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The basic inference on concept expressions in Description Logics is subsumption, typically written as C &lt;img src=&#34;https://www.bobdc.com/img/main/subsumes.jpg&#34; alt=&#34;subsumes&#34;/&gt; D. Determining subsumption is the problem of checking whether the concept denoted by D (the &lt;em&gt;subsumer&lt;/em&gt;) is considered more general than the one denoted by C (the &lt;em&gt;subsumee&lt;/em&gt;). In other words, subsumption checks whether the first concept always denotes a subset of the set denoted by the second one.&lt;/p&gt;
&lt;p&gt;For example, one might be interested in knowing whether Woman &lt;img src=&#34;https://www.bobdc.com/img/main/subsumes.jpg&#34; alt=&#34;subsumes&#34;/&gt; Mother. In order to verify this kind of relationship one has in general to take into account the relationships defined in the terminology&amp;hellip; Another typical inference on concept expressions is concept &lt;em&gt;satisfiability&lt;/em&gt;, which is the problem of checking whether a concept expression does not necessarily denote the empty concept.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That is, that there is at least one thing that it describes.&lt;/p&gt;
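Over a single concrete dataset, subsumption and satisfiability can be sketched as set operations. This is a big simplification (a real DL reasoner checks what holds in every possible model, not just one set of data), and the individuals below are invented:

```python
# Sketch: subsumption as a subset test over concept extensions, and
# satisfiability as non-emptiness, checked against one invented dataset.
woman = {"Marge", "Lois", "Lisa"}
mother = {"Marge", "Lois"}
unicorn_mother = set()   # a concept with no instances here

# Woman subsumes Mother: every Mother is a Woman.
print(mother.issubset(woman))     # True

# Satisfiability: does the concept denote at least one thing?
print(len(mother) > 0)            # True
print(len(unicorn_mother) > 0)    # False
```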
&lt;p&gt;The concepts of ABox and TBox used to confuse me, but I have a clearer idea now:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A DL knowledge base is analogously typically comprised by two components—&amp;ldquo;TBox&amp;rdquo; and an &amp;ldquo;ABox&amp;rdquo;. The TBox contains intensional knowledge in the form of a terminology (hence the term &amp;ldquo;TBox&amp;rdquo;, but &amp;ldquo;taxonomy&amp;rdquo; could be used as well) and is built through declarations that describe general properties of concepts&amp;hellip; The ABox contains extensional knowledge—also called assertional knowledge (hence the term &amp;ldquo;ABox&amp;rdquo;)—knowledge that is specific to the individuals of the domain of discourse.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;ABox assertions are about Individuals in the data (what object-oriented developers call &amp;ldquo;instances&amp;rdquo;—the actual data, as opposed to metadata) and TBox assertions build the terminology used to describe the classes. The assertion/taxonomy part makes for a nice mnemonic to keep ABoxes and TBoxes straight, similar to the &lt;a href=&#34;http://www.bartelby.org/68/97/5697.html&#34;&gt;ceiling/ground&lt;/a&gt; trick for remembering the difference between stalactites and stalagmites.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The basic reasoning task in an ABox is &lt;em&gt;instance checking&lt;/em&gt;, which verifies whether a given individual is an instance of (belongs to) a specified concept. Although other reasoning services are usually considered and employed, they can be defined in terms of instance checking. Among them we find &lt;em&gt;knowledge base consistency&lt;/em&gt;, which amounts to verifying whether every concept in the knowledge base admits at least one individual; &lt;em&gt;realization&lt;/em&gt;, which finds the most specific concept an individual object is an instance of; and &lt;em&gt;retrieval&lt;/em&gt;, which finds the individuals in the knowledge base that are instances of a given concept. These can all be accomplished by means of instance checking.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&amp;ldquo;Retrieval&amp;rdquo; sounds simple enough when you think in terms of a traditional database application, but when you use implicit as well as explicit knowledge to find individuals that are instances of a given concept, it&amp;rsquo;s not so simple anymore. In fact,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The presence of individuals in a knowledge base makes reasoning more complex from a computational viewpoint [Donini et al., 1994b], and may require significant extensions of some TBox reasoning techniques.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;No wonder ontologies with no associated &amp;ldquo;&lt;a href=&#34;https://www.bobdc.com/blog/my-new-favorite-typo&#34;&gt;meatdata&lt;/a&gt;&amp;rdquo; are so popular!&lt;/p&gt;
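The TBox/ABox split and the retrieval task described above can be sketched concretely. The classes and individuals below are invented, and real reasoners do far more than this simple walk up a subclass hierarchy, but it shows how an individual can be retrieved as an instance of a concept it was never explicitly asserted to belong to:

```python
# Toy TBox (terminology: subclass-of edges) and ABox (assertions about
# individuals). All names are invented example data.
tbox = {"Mother": "Woman", "Woman": "Person"}
abox = {"Marge": "Mother", "Bart": "Person"}

def classes_of(individual):
    """All classes an individual belongs to, explicit and implied."""
    c = abox[individual]
    result = {c}
    while c in tbox:      # walk up the subclass hierarchy
        c = tbox[c]
        result.add(c)
    return result

def retrieve(concept):
    """Retrieval: every individual that is an instance of the concept."""
    return sorted(i for i in abox if concept in classes_of(i))

print(retrieve("Person"))  # ['Bart', 'Marge'] -- Marge only implicitly
print(retrieve("Woman"))   # ['Marge']
```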
&lt;blockquote&gt;
&lt;p&gt;Later, an approach that viewed the DL more as a component became evident; in this view the DL system acts as a component of a larger environment, typically leaving out functions, such those [sic] for data management, that are more effectively implemented by other technologies. The architecture where the component view is taken requires the definition of a clear interface between the components, possibly adopting different modeling languages, but focusing on Description Logics for the implementation of the reasoning services that can add powerful capabilities to the application.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is where the real promise of this technology lies: plugging in these &amp;ldquo;reasoning services&amp;rdquo; to the many existing systems that do data management, whether the latest RDF triplestore or a straightforward MySQL or Oracle RDBMS. The paper elaborates on the relationship to database managers again later:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;hellip;Description Logics as modeling languages overlap to a large extent with other modeling languages developed in fields such as Programming languages and Database Management. While we shall focus on this relationship later, we recall here that, when compared to modeling languages developed in other fields the characteristic feature of Description Logics is in the reasoning capabilities that are associated with it. In other words, we believe that, while modeling has general significance, the capability of exploiting the description of the model to draw conclusions about the problem at hand is a particular advantage of modeling using Description Logics.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;DL has another important connection to traditional database development:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In addition, Description Logics provide a formal framework that has been shown to be rather close to the languages used in semantic data modeling, such as the Entity-Relationship Model [Calvanese et al., 1998g]. Description Logics are equipped with reasoning tools that can bring to the conceptual modeling phase significant advantages, as compared with traditional languages, whose role is limited to modeling. For instance, by using concept consistency one can verify at design time whether an entity can have at least one instance, thus clearly saving all the difficulties arising from discovering such a situation when the database is being populated [Borgida, 1995].&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The paper later describes a particular Description Logic and &amp;ldquo;a precise correspondence between the chosen DL and the Entity-Relationship model&amp;rdquo;. There&amp;rsquo;s an even more obvious correspondence to another popular concept in system development; the section on &amp;ldquo;Relationship to other fields of Computer Science&amp;rdquo; tells us that&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;hellip;the underlying ideas of concept/class and hierarchical structure based upon the generality and specificity of a set of classes have appeared in many other field [sic] of Computer Science, such as Database Management and Programming Languages.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The article puts other related work into historical perspective, such as DAML+OIL, a language that derives from&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;DAML-ONT [McGuinness et al., 2002], an ontology language for the Web inspired by object-oriented and frame-based languages, and OIL [Fensel et al., 2001], with a similar goal of expressing ontologies, but with a closer connection to Description Logics.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The appeal of using DL with more W3C-oriented technologies such as RDF and XML becomes more apparent here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A more recent use of Description Logics is concerned with so-called &amp;ldquo;semi-structured&amp;rdquo; data models [Calvanese et al., 1998c], which are being proposed in order to overcome the difficulties in treating data that are not structured in a relational form, such as data on the Web, data in spreadsheets, etc. In this area Description Logics are sufficiently expressive to represent models and languages that are being used in practice, and they can offer significant advantages over other approaches because of the reasoning services they provide.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The use with RDBMS and &amp;ldquo;semi-structured&amp;rdquo; data leads to what could be the biggest payoff of all for DL:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Another problem that has recently increased the applicability of Description Logics is information integration. As already remarked, data are nowadays available in large quantities and from a variety of sources. Information integration is the task of providing a unique coherent view of the data stored in the sources available. In order to create such a view, a proper relationship needs to be established between the data in the sources and the unified view of the data. Description Logics not only have the expressiveness needed in order to model the data in the sources, but their reasoning services can help in the selection of the sources that are relevant for a query of interest, as well as to specify the extraction process [Calvanese et al., 2001c].&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The paper often returns to the theme that more than most knowledge representation systems, Description Logics were designed to be something that could be implemented—and not only implemented, but designed in such a way that specific features could have their effect on implementation efficiency measured and evaluated:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The way Description Logics were able to separate out the structure of concepts and roles into simple term-forming operators opened the door to extensive analysis of a broad family of languages. One could add and subtract these operators from the language and explore both the computational ramifications and the relationship of the resulting language to other formal languages in Computer Science, such as modal logics and data models for database systems.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Or, as they say in their conclusion,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Perhaps, the most important aspect of work on Description Logics has been the very tight coupling between theory and practice. The exemplary give-and-take between the formal, analytical side of the field and the pragmatic, implemented side—notable throughout the entire history of Description Logics—has been a role model for other areas of AI.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The great part about it for the rest of us is that existing implementations of this knowledge representation system are not badly-documented demos sitting on a computer at the former university of whoever wrote the paper several years ago (which is often the case with knowledge representation implementations), but W3C-standardized open source software with a wide community of practice. So get out there and play with OWL DL, and perhaps Nardi and Brachman&amp;rsquo;s paper can give you the background to get even more out of these tools.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://mhausenblas.blogr.com&#34; title=&#34;http://mhausenblas.blogr.com&#34;&gt;Michael Hausenblas&lt;/a&gt; on &lt;a href=&#34;#comment-1293&#34;&gt;October 4, 2007 3:44 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Indeed this is an excellent book. However, I must admit that I actually grokked the whole KR/DL/etc. stuff not precisely right after this book, but rather after reading an excellent article by Levesque and Brachman [1].&lt;/p&gt;
&lt;p&gt;Please tell me what you think in case you read it ;)&lt;/p&gt;
&lt;p&gt;Cheers,&lt;br /&gt;
Michael&lt;/p&gt;
&lt;p&gt;[1] &lt;a href=&#34;http://www4.wiwiss.fu-berlin.de/dblp/page/record/journals/ci/LevesqueB87&#34;&gt;http://www4.wiwiss.fu-berlin.de/dblp/page/record/journals/ci/LevesqueB87&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1294&#34;&gt;October 4, 2007 7:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks, I&amp;rsquo;ll read it if I can find it&amp;ndash;all I can find is metadata about it. Is the whole paper available online anywhere?&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Converting an XML document&#39;s encoding</title>
      <link>https://www.bobdc.com/blog/converting-an-xml-documents-en/</link>
      <pubDate>Fri, 28 Sep 2007 08:16:07 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/converting-an-xml-documents-en/</guid>
      
      
      <description><div>With a very brief XSLT stylesheet.</div><div>&lt;p&gt;A colleague recently asked about converting a collection of XML documents to the US-ASCII encoding (that is, to documents where everything is either a US ASCII character or a numeric character reference such as &amp;amp;#233; for the é character). I have several utility stylesheets for converting the encoding of XML documents, and a slight change to one of them gave me a new version that would create a US-ASCII version of any XML document:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl=&amp;quot;http://www.w3.org/1999/XSL/Transform&amp;quot; version=&amp;quot;1.0&amp;quot;&amp;gt;


&amp;lt;xsl:output encoding=&amp;quot;us-ascii&amp;quot;/&amp;gt;


&amp;lt;xsl:template match=&amp;quot;@*|node()&amp;quot;&amp;gt;
  &amp;lt;xsl:copy&amp;gt;
    &amp;lt;xsl:apply-templates select=&amp;quot;@*|node()&amp;quot;/&amp;gt;
  &amp;lt;/xsl:copy&amp;gt;
&amp;lt;/xsl:template&amp;gt;


&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The stylesheet&amp;rsquo;s single template rule copies every node in the source document tree to the result document tree unchanged. The key to the stylesheet is the value of the &lt;code&gt;xsl:output&lt;/code&gt; element&amp;rsquo;s &lt;code&gt;encoding&lt;/code&gt; attribute, which specifies the encoding to use when writing the result tree to a file. My similar stylesheets, which have names like latin1out.xsl and utf8out.xsl, are identical except for this &lt;code&gt;encoding&lt;/code&gt; attribute&amp;rsquo;s value.&lt;/p&gt;
&lt;p&gt;Your choices for what to put in this attribute are limited to what your XSLT processor can handle. While &lt;a href=&#34;http://xml.apache.org/xalan-j/&#34;&gt;Xalan&lt;/a&gt; is usually my third favorite XSLT processor, with Xerces as the XML parser underneath it can read and write &lt;a href=&#34;http://xerces.apache.org/xerces-j/faq-general.html#faq-8&#34;&gt;quite a few encodings&lt;/a&gt;. (I know it&amp;rsquo;s possible to tell &lt;a href=&#34;http://saxon.sourceforge.net/&#34;&gt;Saxon&lt;/a&gt; to use Xerces instead of the &lt;a href=&#34;http://saxon.sourceforge.net/aelfred.html&#34;&gt;Aelfred&lt;/a&gt; parser that it usually uses, but I&amp;rsquo;m too lazy to figure out how.)&lt;/p&gt;
&lt;p&gt;So if you need a simple tool to convert the encoding of one or more XML documents, find the encoding name in &lt;a href=&#34;http://xerces.apache.org/xerces-j/faq-general.html#faq-8&#34;&gt;this list&lt;/a&gt; (if the one you pick has a name in parentheses on that list, use that), make it the value of the &lt;code&gt;encoding&lt;/code&gt; attribute in the &lt;code&gt;xsl:output&lt;/code&gt; element above, and you&amp;rsquo;ll have a stylesheet that converts any well-formed XML document to that encoding.&lt;/p&gt;
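&lt;p&gt;If you don&amp;rsquo;t have an XSLT processor handy, the same basic conversion can be sketched with nothing but Python&amp;rsquo;s standard library (a rough equivalent I put together to illustrate the idea, not a replacement for the stylesheet): when ElementTree serializes to an encoding that can&amp;rsquo;t represent a character, it writes a numeric character reference instead.&lt;/p&gt;

```python
# Minimal sketch: serialize a document as US-ASCII with Python's
# standard library. The element is built in code here just to keep the
# example self-contained; normally you would parse an existing file.
import io
import xml.etree.ElementTree as ET

root = ET.Element("doc")
root.text = "caf\u00e9"   # contains e-acute, a non-ASCII character

buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="us-ascii", xml_declaration=True)
print(buf.getvalue().decode("ascii"))
```

&lt;p&gt;The output declares the us-ascii encoding and writes the é as the numeric character reference &amp;amp;#233;, just as the stylesheet above does.&lt;/p&gt;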
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By Dave Holden on &lt;a href=&#34;#comment-1278&#34;&gt;September 28, 2007 10:07 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;xmllint &lt;a href=&#34;http://xmlsoft.org/xmllint.html&#34;&gt;http://xmlsoft.org/xmllint.html&lt;/a&gt; has an &amp;ldquo;encode&amp;rdquo; option which is also useful for this.&lt;/p&gt;
&lt;p&gt;By David Carlisle on &lt;a href=&#34;#comment-1279&#34;&gt;September 28, 2007 11:00 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s possibly marginally quicker to use the single template&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:template match=&amp;quot;/&amp;quot;&amp;gt;
 &amp;lt;xsl:copy-of select=&amp;quot;.&amp;quot;/&amp;gt;
&amp;lt;/xsl:template&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;to save the system doing a template match at every level&lt;/p&gt;
&lt;p&gt;Also, if you use xslt2 you can add omit-xml-declaration=&amp;ldquo;yes&amp;rdquo;&lt;br /&gt;
which is useful, especially with US-ASCII encoding, as you get the benefit of using ascii without the potential drawback of the document being rejected for an unknown encoding. (If I recall correctly, early msxml systems wanted &amp;ldquo;ASCII&amp;rdquo; rather than &amp;ldquo;US-ASCII&amp;rdquo;, and just putting nothing, letting the receiving system default to utf-8, works fine for ASCII documents.)&lt;/p&gt;
&lt;p&gt;David&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1280&#34;&gt;September 28, 2007 11:18 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;David,&lt;/p&gt;
&lt;p&gt;Nice idea, thanks. The template rule I did have is part of my starting point when I&amp;rsquo;m creating a new stylesheet, but yours is terser. (I try to avoid matching on &amp;ldquo;/&amp;rdquo; because there have been too many times when it&amp;rsquo;s come back to bite me after I added the document() function somewhere in the stylesheet.)&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xslt">XSLT</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>The 13th Floor Elevators</title>
      <link>https://www.bobdc.com/blog/the-13th-floor-elevators/</link>
      <pubDate>Fri, 21 Sep 2007 08:46:16 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-13th-floor-elevators/</guid>
      
      
      <description><div>Lip-syncing at a pool party!</div><div>&lt;p&gt;When people started retroactively applying the term &amp;ldquo;psychedelic punk&amp;rdquo; in the late seventies, the late sixties Texas band the 13th Floor Elevators were among the first to earn the title. Tom Verlaine&amp;rsquo;s band Television covered &amp;ldquo;Fire Engine&amp;rdquo; live, but the Texas band&amp;rsquo;s most well-known song (mostly because of the &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=B000E6ET1G/bobducharmeA/&#34;&gt;Nuggets&lt;/a&gt; compilation that everyone had) was &amp;ldquo;You&amp;rsquo;re Gonna Miss Me&amp;rdquo;. I&amp;rsquo;m not sure whether it&amp;rsquo;s annoying or just trippier that Dell used it in an ad recently, but it was a thrill for me to see &lt;a href=&#34;http://www.youtube.com/watch?v=lWCmx1QQgIQ&#34;&gt;them lip synching it at a pool party&lt;/a&gt; on YouTube.&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/lWCmx1QQgIQ&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;They look like sixties frat boys, but Roky Erickson looks like a pretty intense frat boy, and he apparently got weirder and weirder from that point on, eventually ending up in a hospital for the criminally insane. According to &lt;a href=&#34;http://en.wikipedia.org/wiki/Roky_Erickson&#34;&gt;Wikipedia&lt;/a&gt;, Texas musicians ranging from Butthole Surfer King Coffey to ZZ Top&amp;rsquo;s Billy Gibbons have worked to get Erickson good treatment and a renewed musical career recently.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.rokyerickson.net/&#34;&gt;&lt;img src=&#34;http://www.tenpinmgt.com/picture_library/roky.jpg&#34; alt=&#34;[Roky Erickson picture]&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://timothyhorrigan.com&#34; title=&#34;http://timothyhorrigan.com&#34;&gt;Tim Horrigan&lt;/a&gt; on &lt;a href=&#34;#comment-1261&#34;&gt;September 22, 2007 11:21 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think I spotted three celebrity cameos: Jerry Lewis and Condi Rice are bopping and grooving poolside, and Dee Dee Ramone is a-rockin&amp;rsquo; in the rocking chair off to one side.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Metadata data entry</title>
      <link>https://www.bobdc.com/blog/metadata-data-entry/</link>
      <pubDate>Tue, 18 Sep 2007 23:07:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/metadata-data-entry/</guid>
      
      
      <description><div>Who (or what), and why.</div><div>&lt;p&gt;How do we assign metadata to data? Ontologies often say &amp;ldquo;here is some information about the metadata we&amp;rsquo;d like to have for our data&amp;rdquo;, but the actual assignment of metadata that conforms to an ontology is usually more work than developing the ontology. Who assigns this metadata, and why do they do it? You have three choices: people who do it because they&amp;rsquo;re paid to, people who do it because they want to, and automated processes. I&amp;rsquo;m reading up on doing it with automated processes and will hopefully be reporting on this soon.&lt;/p&gt;
&lt;p&gt;For now, the second choice is the most interesting because it&amp;rsquo;s the newest and people are still trying to get a handle on it. (Don&amp;rsquo;t miss the &lt;a href=&#34;http://blogs.talis.com/nodalities/2007/08/thomas_vander_wal_talks_with_t.php&#34;&gt;Talis interview with Thomas Vander Wal&lt;/a&gt;, who coined the term &amp;ldquo;folksonomies&amp;rdquo; and has thought very hard about their potential relationship to taxonomies.) In the early days of folksonomies, some didn&amp;rsquo;t like the idea that the assigned metadata might not conform to a specific taxonomy or ontology. Folksonomies trade query precision (if you can&amp;rsquo;t know all the terms that may have been assigned to a resource, you can&amp;rsquo;t be sure of finding it) for something that&amp;rsquo;s often more valuable: a lower barrier to getting volunteers to do free data entry. It reminds me a bit of Tim Berners-Lee&amp;rsquo;s acceptance of broken links when he designed a large-scale hypertext system: by questioning one of the original requirements for system integrity, you can end up with a large, inexpensive system that still helps people retrieve much of the information that they want.&lt;/p&gt;
&lt;blockquote class=&#34;pullquote&#34;&gt;Folksonomies remind me of Tim Berners-Lee&#39;s acceptance of broken links.&lt;/blockquote&gt;
&lt;p&gt;The success of folksonomies and the web doesn&amp;rsquo;t prove the original &amp;ldquo;requirements&amp;rdquo; to be wrong; it just proves that similar systems that don&amp;rsquo;t meet those particular requirements can still be useful. Systems that do meet those requirements can be even more useful, but they&amp;rsquo;re more expensive to create and hence to use. Attorneys searching the legal cases stored in LexisNexis or Westlaw have every right to expect that all the links work and that the keywords assigned to each case belong to a carefully maintained taxonomy. This &lt;em&gt;is&lt;/em&gt; in their requirements, because if you plan to tell a judge &amp;ldquo;you should rule in favor of my client because another judge ruled in favor of a plaintiff in a nearly identical situation, and no one&amp;rsquo;s ever overturned that ruling&amp;rdquo;, you want to be really, really sure that no one&amp;rsquo;s ever overturned it. That&amp;rsquo;s why these products are expensive, and that&amp;rsquo;s why people cheering for comprehensive free online versions of the law (which I&amp;rsquo;m all for) don&amp;rsquo;t realize which features they&amp;rsquo;ll be giving up if they switch to the free ones.&lt;/p&gt;
&lt;p&gt;How do you get metadata assigned to a large volume of resources? Letting people assign arbitrary keywords might be almost enough, but you need to provide them with a little more incentive, such as making their own resources easier to find through the use of their own tags—for example, I can search for my own &lt;a href=&#34;http://www.flickr.com/photos/bobdc&#34;&gt;pictures on flickr&lt;/a&gt; or &lt;a href=&#34;http://del.icio.us/bobdc&#34;&gt;bookmarks on del.icio.us&lt;/a&gt; by searching the tags that I created for them. (Another incentive, which doesn&amp;rsquo;t play too well into the data integrity angle, is letting people assign silly funny metadata—check out the &lt;a href=&#34;http://www.amazon.com/Playing-Fire-Kevin-Federline/dp/tags-on-product/B000IU3YLY/ref=tag_dpp_cust_edpp_sa/105-7782200-2746866?ie=UTF8&amp;amp;qid=1185935519&amp;amp;sr=8-1&#34;&gt;tags assigned&lt;/a&gt; to Kevin &amp;ldquo;Mr. Britney Spears&amp;rdquo; Federline&amp;rsquo;s CD on Amazon).&lt;/p&gt;
&lt;p&gt;Paying other people to tag resources with keywords has the advantage of ensuring that the metadata conforms to a worked-out structure. This makes the metadata (and hence the data) more valuable and isn&amp;rsquo;t always as expensive as you might think, even when you need specific subject expertise in your metadata assigners. (Please excuse the brief plug for my employer, &lt;a href=&#34;http://www.innodata-isogen.com/services/data_capture_and_data_entry&#34;&gt;Innodata Isogen&lt;/a&gt;, and contact me if you&amp;rsquo;d like to know more.)&lt;/p&gt;
&lt;p&gt;While some metadata is free, such as the size of a file and the last time it was edited, creation of new metadata is never completely free. If you&amp;rsquo;re not paying people outright, you must come up with and then implement some system that makes people &lt;em&gt;want&lt;/em&gt; to do your metadata data entry without being paid. What if someone created a site that let users make up tags, and no one did so? There are plenty of examples of such sites, which jumped on the Web 2.0 bandwagon as if letting users tag contents was some sort of silver bullet. Despite what David Weinberger writes in &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0805080430/bobducharmeA/&#34;&gt;Everything is Miscellaneous&lt;/a&gt;, simply letting people add metadata isn&amp;rsquo;t enough. They need an incentive.&lt;/p&gt;
&lt;p&gt;If you don&amp;rsquo;t want to pay someone to paint your fence, you can do what Tom Sawyer did and convince others that it&amp;rsquo;s a fun privilege for them to do it for you. Weinberger and &lt;a href=&#34;https://www.bobdc.com/blog/semantic-web-project-ideas-num&#34;&gt;Don Tapscott&lt;/a&gt; help lead cheers that metadata data entry is fun, but it seems to be the most fun for people making fun of Kevin Federline, and if it doesn&amp;rsquo;t make your life more fun, it better do something to make your life easier. Coming up with that incentive is the real silver bullet, if you want to avoid writing a check for human labor or automated systems to do this work for you.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.jenitennison.com&#34; title=&#34;http://www.jenitennison.com&#34;&gt;Jeni Tennison&lt;/a&gt; on &lt;a href=&#34;#comment-1251&#34;&gt;September 19, 2007 2:48 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Another way is to make capturing metadata fun or competitive. See &lt;a href=&#34;http://www.espgame.org/&#34;&gt;The ESP Game&lt;/a&gt; for example, or &lt;a href=&#34;http://www.peekaboom.org/&#34;&gt;Peekaboom&lt;/a&gt;. These both come from &lt;a href=&#34;http://www.cs.cmu.edu/~biglou/&#34;&gt;Louis von Ahn&lt;/a&gt; and his team.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1252&#34;&gt;September 19, 2007 5:08 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Yes, there was an &lt;a href=&#34;http://www.wired.com/techbiz/it/magazine/15-07/ff_humancomp&#34;&gt;interesting article about von Ahn&lt;/a&gt; in Wired recently.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>Using Word for command line conversion of DOC files to XML</title>
      <link>https://www.bobdc.com/blog/using-word-for-command-line-co/</link>
      <pubDate>Fri, 14 Sep 2007 09:11:03 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/using-word-for-command-line-co/</guid>
      
      
      <description><div>Or to RTF, or to whatever.</div><div>&lt;p&gt;I&amp;rsquo;ve &lt;a href=&#34;http://www.xml.com/pub/a/2006/01/11/from-microsoft-to-openoffice.html&#34;&gt;written before&lt;/a&gt; about using OpenOffice to convert Microsoft Office files to OpenOffice files (and hence XML) with a shell prompt command that starts up OpenOffice with the MS Office file, does a Save As, and then quits OpenOffice. Because it can be done from the command line, this makes conversion of multiple files with a batch file or shell script much easier.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/word2xml.jpg&#34; alt=&#34;[Word and Word XML icons]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;I recently had to do the same thing with Word to convert Word files to MS XML, and it turned out to be similar: you write a macro that does the SaveAs and then quits, and you start up Word from the command line naming the file to convert and the macro to do the conversion.&lt;/p&gt;
&lt;p&gt;The macro I wrote yesterday could use some refinement, but it works:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Sub SaveAsXML()
NewFilename = (Replace(ActiveDocument.FullName, &amp;quot;.doc&amp;quot;, &amp;quot;.xml&amp;quot;))
ActiveDocument.SaveAs FileName:=NewFilename, FileFormat:=wdFormatXML
Application.Quit
End Sub
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(It seems like I have to write a bit of VB code about every three years, so with any luck that&amp;rsquo;s it until 2010. I was sorry to hear that in my nephew&amp;rsquo;s first year at the University of Kansas, the &amp;ldquo;Intro to Programming&amp;rdquo; course uses VB. As I said to my sister, &amp;ldquo;But you&amp;rsquo;re not living in a Seattle suburb anymore!&amp;rdquo;) If you want this to save as something other than XML, see &lt;a href=&#34;http://infotools.ru/products/AXAPI/BRIZLGWORD/BRIZLGWORD/_DocSaveAs.htm&#34;&gt;the other options for the FileFormat parameter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;My word2xml.bat batch file to tell Word to start up with a given file and run the macro looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;quot;C:\Program Files\Microsoft Office\OFFICE11\winword&amp;quot; %1 /mSaveAsXML 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are &lt;a href=&#34;http://support.microsoft.com/kb/210565&#34;&gt;other command line options&lt;/a&gt; for winword.exe besides /m, but none looked very interesting to me.&lt;/p&gt;
&lt;p&gt;As with my command line trick for converting MS Office files to OpenOffice files, this technique can get filed with quick and dirty perl scripts: if you have a batch of files that need a one-time conversion some afternoon, it&amp;rsquo;s great, but it&amp;rsquo;s not really fast. If you&amp;rsquo;re building a production system that needs to perform this conversion every day, there are other options that will be more complex to set up but will run more quickly, because they won&amp;rsquo;t require starting up and shutting down the word processor for every document.&lt;/p&gt;
&lt;p&gt;As far as what to do with the Word XML files once I have them, well, &lt;a href=&#34;https://www.bobdc.com/blog/more-on-words-mediocre-xml&#34;&gt;don&amp;rsquo;t get me started&amp;hellip;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;8-comments&#34;&gt;8 Comments&lt;/h2&gt;
&lt;p&gt;By marcelo on &lt;a href=&#34;#comment-1236&#34;&gt;September 14, 2007 11:27 AM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The macro I wrote yesterday could use some refinement,&lt;br /&gt;
but it works:&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;thank you for the code&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As far as what to do with the Word XML files once I&lt;br /&gt;
have them, well, don&amp;rsquo;t get me started&amp;hellip;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;when will you give us another Word XML review? i found the old ones very insightful&lt;/p&gt;
&lt;p&gt;greetings&lt;/p&gt;
&lt;p&gt;marcelo&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1237&#34;&gt;September 14, 2007 11:45 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Marco!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;when will you give us another Word XML review?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Let me put it this way: I look hard at certain technology for fun, and at other technology only because it&amp;rsquo;s related to something I&amp;rsquo;m being paid to do.&lt;/p&gt;
&lt;p&gt;Word XML does not fall in the &amp;ldquo;fun&amp;rdquo; category.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://evanlenz.net/blog&#34; title=&#34;http://evanlenz.net/blog&#34;&gt;Evan Lenz&lt;/a&gt; on &lt;a href=&#34;#comment-1238&#34;&gt;September 14, 2007 10:26 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;Maybe it&amp;rsquo;s because I already invested about a year of my life into WordML because I was paid to do it (writing for O&amp;rsquo;Reilly), but I think processing WordML can be fun too. It certainly gives me a lot of tough, real-world problems to try out XSLT 2.0&amp;rsquo;s more advanced facilities on. WordML&amp;rsquo;s format in itself isn&amp;rsquo;t terribly nice in general, and I touched on some of its idiosyncrasies in the Office 2003 XML book, but it does have a certain consistency to it. Also, I&amp;rsquo;ve found XML-config-file-driven invocations of xsl:for-each-group to be a very powerful, generic way of reconstituting the hierarchy that&amp;rsquo;s implicit in the relationship of flat lists of paragraph styles.&lt;/p&gt;
&lt;p&gt;Evan&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.aticorp.org&#34; title=&#34;http://www.aticorp.org&#34;&gt;Lynwood Hines&lt;/a&gt; on &lt;a href=&#34;#comment-1295&#34;&gt;October 5, 2007 9:17 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;&amp;hellip;there are some other options that will be more complex to set up but will run more quickly because they won&amp;rsquo;t require starting up and shutting down the word processor for every document&amp;rdquo;&lt;/p&gt;
&lt;p&gt;TEASE!!! What are these other options? I need to do this on hundreds of files on a regular basis. If you can&amp;rsquo;t write an article on how to accomplish this efficiently, could you drop some breadcrumbs to help me research it further?&lt;/p&gt;
&lt;p&gt;Also, many thanks for writing this article; it gets me one BIG step closer to a solution.&lt;/p&gt;
&lt;p&gt;Lynwood Hines&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1296&#34;&gt;October 5, 2007 10:27 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Lynwood,&lt;/p&gt;
&lt;p&gt;These other methods would involve telling an existing running process to open a file, save it as XML, close it, and then move on to the next file. If you were going to have OpenOffice do this, &lt;a href=&#34;http://api.openoffice.org&#34;&gt;http://api.openoffice.org&lt;/a&gt; would be a place to start, but I would look through and ask questions on the appropriate OpenOffice mailing list before I started serious coding.&lt;/p&gt;
&lt;p&gt;To do this with Word, I&amp;rsquo;m sure there&amp;rsquo;s some VB or C# way to tell a running Word instance to do this, either via COM from outside the instance, or with some macro from inside of Word. For example, the macro might look in a certain directory and convert everything it finds there. As with OpenOffice, I&amp;rsquo;m sure there&amp;rsquo;s a mailing list out there where you can find people to give you some more specific tips. I haven&amp;rsquo;t done it myself, but this is what would guide my research.&lt;/p&gt;
&lt;p&gt;By Lynwood Hines on &lt;a href=&#34;#comment-1298&#34;&gt;October 8, 2007 9:57 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Thank you for the quick response. I&amp;rsquo;ll look into the OA api and COM communication approaches first. If I come up with a useful recipe I&amp;rsquo;ll post it back here.&lt;/p&gt;
&lt;p&gt;LH&lt;/p&gt;
&lt;p&gt;By Lynwood Hines on &lt;a href=&#34;#comment-1312&#34;&gt;October 13, 2007 3:19 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I figured out how to use OLE to tell Word to convert documents to text format. This approach would only require a minor tweak to generate XML, RTF, or any other supported format.&lt;/p&gt;
&lt;p&gt;The example below is written in Perl and uses the Win32::OLE Perl package:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;use Win32::OLE qw(in with);
use Win32::OLE::Const;
use Win32::OLE::Const 'Microsoft Word';

# Instantiate our very own MS Word process:
$wordApp = Win32::OLE-&amp;gt;new('Word.Application', 'Quit');
$wordApp-&amp;gt;{Visible} = 1;  # Set to 0 to hide

# Load &amp;quot;foo.doc&amp;quot; into our instance of MS Word. Terminate
# with an error message if something goes awry:
$wordApp-&amp;gt;Documents-&amp;gt;Open(&amp;quot;foo.doc&amp;quot;)
  or die(&amp;quot;Unable to open Word document: &amp;quot;, Win32::OLE-&amp;gt;LastError());

# Save the file as a text file. Delete the destination text file first
# so we don't have to contend with an overwrite warning window in Word:
unlink &amp;quot;foo.txt&amp;quot; if (-e &amp;quot;foo.txt&amp;quot;);
$wordApp-&amp;gt;ActiveDocument-&amp;gt;SaveAs
  ({
    FileName   =&amp;gt; &amp;quot;foo.txt&amp;quot;,
    FileFormat =&amp;gt; wdFormatDOSTextLineBreaks
  });

# Close the document, but leave Word running to speed future conversions:
$wordApp-&amp;gt;ActiveDocument-&amp;gt;Close();

# When you are finished doing conversions, kill the Word instance:
$wordApp-&amp;gt;Quit;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1313&#34;&gt;October 13, 2007 3:36 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Looks great, thanks!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Command prompt as an IM session with my computer?</title>
      <link>https://www.bobdc.com/blog/command-prompt-as-an-im-sessio/</link>
      <pubDate>Wed, 12 Sep 2007 08:40:41 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/command-prompt-as-an-im-sessio/</guid>
      
      
      <description><div>Cryptic abbreviations and scrolling text: not so old-fashioned after all.</div><div>&lt;img src=&#34;https://www.bobdc.com/img/main/cprompt.jpg&#34; alt=&#34;[C prompt]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;My daughters have always thought that the &amp;ldquo;black box&amp;rdquo; that I use so much (the command prompt shell) was hilarious. I type my cryptic little abbreviations, press Enter, and then more text goes scrolling up the window and off the top. I tell them how in the early days of PCs, and even pre-PC computers, the whole computer screen was just one big version of that window, and how before that people did the same thing with a keyboard and a printer, and the girls roll their eyes and patronizingly say &amp;ldquo;great, Dad!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Once, after IMing with a colleague about some work-related issue, I wanted to make my thumb drive the default drive in my command window, and I accidentally sent the colleague the instant message &amp;ldquo;f:&amp;rdquo;. Then I realized: the command prompt window is like IMing with your operating system. I pointed out to my daughters that one of their favorite applications had plenty in common with my command prompt window: they type their cryptic abbreviations (ROTFL!) and then wait for the response, which is text scrolling up and off the top of the window. I think their response was something along the lines of &amp;ldquo;great, Dad!&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://blog.triplescape.com&#34; title=&#34;http://blog.triplescape.com&#34;&gt;Brian&lt;/a&gt; on &lt;a href=&#34;#comment-1231&#34;&gt;September 12, 2007 9:50 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;..:: \/\/31c0/\/\3 2 l33t ohS ::..&lt;/p&gt;
&lt;p&gt;$ d3l t.txt&lt;br /&gt;
OMG! r34lly d3l3t3 t.txt? y3s&lt;br /&gt;
w00t! i p0wn&amp;rsquo;d t.txt!&lt;/p&gt;
&lt;p&gt;$ ct4 stdout.log&lt;br /&gt;
n00b. wahts ct4&lt;/p&gt;
&lt;p&gt;$ c4t l0lkatz.l0g&lt;br /&gt;
i c4n haz ch33zb3rgr? i c4n haz ch33zb3rgr? i c4n haz ch33zb3rgr? i c4n haz ch33zb3rgr? i c4n haz ch33zb3rgr? i c4n haz ch33zb3rgr? i c4n haz ch33zb3rgr? i c4n haz ch33zb3rgr?&lt;/p&gt;
&lt;p&gt;$&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://xmlhacker.com/&#34; title=&#34;http://xmlhacker.com/&#34;&gt;M. David Peterson&lt;/a&gt; on &lt;a href=&#34;#comment-1232&#34;&gt;September 12, 2007 11:01 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Isn&amp;rsquo;t it funny how the more things change, the more they stay the same? What amazes me is how much eye-candy we have at our disposal these days, and yet by our very nature the primary thing we have interest in is the tools that enable us to communicate with one another using the most basic of all interfaces&amp;hellip;&lt;/p&gt;
&lt;p&gt;A command prompt. :)&lt;/p&gt;
&lt;p&gt;Great story! :D&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Tracking the Semantic Web Strategies conference</title>
      <link>https://www.bobdc.com/blog/tracking-the-semantic-web-stra/</link>
      <pubDate>Sat, 08 Sep 2007 10:13:32 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/tracking-the-semantic-web-stra/</guid>
      
      
      <description><div>A taxonomy with two categories.</div><div>&lt;p&gt;The Semantic Web Strategies conference is just a few weeks away, although there are still a few days before September 12th to get the early registration rate.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.semanticwebstrategies.com/index.php&#34;&gt;&lt;img src=&#34;http://www.semanticwebstrategies.com/images/logo_SWS_hdr.gif&#34; alt=&#34;[Semantic Web Strategies logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When we first thought about what to call the tracks, it was difficult to come up with two good category names that would each account for half the presentations, so we picked the names &amp;ldquo;The Past and Present of the Semantic Web&amp;rdquo; and &amp;ldquo;The Present and Future of the Semantic Web&amp;rdquo; as placeholders. The idea was that the former would present stories from the trenches about people&amp;rsquo;s experience planning and rolling out implementations, while presentations in the latter category would describe applications that people are assembling now and their hopes for where this work would take their business.&lt;/p&gt;
&lt;p&gt;Once we saw the submissions and picked the best ones, though, two new track names seemed obvious: &amp;ldquo;The Semantic Web and Your Applications&amp;rdquo; and &amp;ldquo;The Semantic Web and Your Users&amp;rdquo;. Of course all applications have users at some level, but with half the talks focusing on how to get semantic web tools and standards to best serve the sponsoring organization and the other half focusing on &lt;a href=&#34;http://www.semanticwebstrategies.com/conference/sessionsbyday.php#B1&#34;&gt;user interfaces&lt;/a&gt;, &lt;a href=&#34;http://www.semanticwebstrategies.com/conference/sessionsbyday.php#B2&#34;&gt;training&lt;/a&gt;, and especially user-related data, the new division of tracks was clearly a better idea.&lt;/p&gt;
&lt;p&gt;User-related data is an especially hot area. The people who added the tagging features to &lt;a href=&#34;http://www.flickr.com/photos/bobdc&#34;&gt;flickr&lt;/a&gt; and &lt;a href=&#34;http://del.icio.us/bobdc&#34;&gt;del.icio.us&lt;/a&gt; weren&amp;rsquo;t thinking &amp;ldquo;semantic web&amp;rdquo; when they did it, but they weren&amp;rsquo;t thinking &amp;ldquo;Web 2.0&amp;rdquo; either. They were thinking of something that would help both their customers and their businesses, and it&amp;rsquo;s great to see people thinking about what semantic web technologies can add to that. For example, Taylor Cowan will talk in the &amp;ldquo;Semantic Web and Your Users&amp;rdquo; track about &lt;a href=&#34;http://www.semanticwebstrategies.com/conference/sessionsbyday.php#B3&#34;&gt;Ontology-driven Travel Recommendations&lt;/a&gt; and how &lt;a href=&#34;http://www.bambora.com/&#34;&gt;bambora.com&lt;/a&gt; uses an ontology to get more value out of user-entered data, and Tony Hammond will give a talk titled &lt;a href=&#34;http://www.semanticwebstrategies.com/conference/sessionsbyday.php#B5&#34;&gt;Publishing Science on the Open Web: Enter the User&lt;/a&gt; about the effect of user data on one of the world&amp;rsquo;s most well-known science publishers.&lt;/p&gt;
&lt;p&gt;For making the applications serve an enterprise better, we have talks such as Melliyal Annamalai&amp;rsquo;s on &lt;a href=&#34;http://www.semanticwebstrategies.com/conference/sessionsbyday.php#A7&#34;&gt;Using Oracle RDF for Managing Customer Subscription Data&lt;/a&gt;. I knew that Oracle had done a lot of work in this area, and I had assumed that we could get them to send a marketing person to discuss their RDF work if we had asked, so I was very pleased that we could get a &amp;ldquo;Principal Member of Technical Staff&amp;rdquo; (her job title) with a computer science Ph.D. instead. While some Ph.D.&amp;rsquo;s might talk in abstract terms about prototype projects, Melliyal will describe how the Oracle semantic web products addressed a specific customer&amp;rsquo;s business needs.&lt;/p&gt;
&lt;p&gt;Businesses, applications and users have been around for a long time. I&amp;rsquo;m really looking forward to learning more in San Jose about what semantic web tools and standards can add to the relationships between these three things.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By Ramon on &lt;a href=&#34;#comment-1239&#34;&gt;September 15, 2007 1:11 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;All your links to the conference seem to be broken, ending at a Jupitermedia Sitemap page for the past 24 hours. Is their site down?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1240&#34;&gt;September 15, 2007 1:58 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Unfortunately, the conference has been postponed until April. I was waiting for the Jupiter people to make a more official announcement before I said anything about it.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://cantorva.com&#34; title=&#34;http://cantorva.com&#34;&gt;Simon Gibbs&lt;/a&gt; on &lt;a href=&#34;#comment-1322&#34;&gt;October 18, 2007 11:19 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Are we still getting the same speakers in April, or what is happening?&lt;/p&gt;
&lt;p&gt;The program page is currently &amp;ldquo;not yet configured&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Simon&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1323&#34;&gt;October 18, 2007 11:29 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I imagine that once Jupiter Media has some other things in place, contacting those speakers will be the next step, but I haven&amp;rsquo;t heard from them on this yet.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>David Bowie on Dinah Shore</title>
      <link>https://www.bobdc.com/blog/david-bowie-on-dinah-shore/</link>
      <pubDate>Wed, 05 Sep 2007 08:52:26 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/david-bowie-on-dinah-shore/</guid>
      
      
      <description><div>The Thin White Duke in his prime with a great band.</div><div>&lt;p&gt;I&amp;rsquo;ve sworn that I would never write about my new favorite album in this weblog—if I think that the White Stripes&amp;rsquo; &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=B000OYC3J8/bobducharmeA/&#34;&gt;latest album&lt;/a&gt; is even better than &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=B00097A5H2/bobducharmeA/&#34;&gt;Get Behind Me Satan&lt;/a&gt;, what do you care?—but at the recent XML Summer School in Oxford, Norm Walsh convinced me that more coverage of non-technical topics is a good thing. A few recent Tim Bray postings convinced me that it&amp;rsquo;s worthwhile to point at great musical performances on YouTube that people don&amp;rsquo;t know about. While it&amp;rsquo;s very easy to find people discussing the most recent White Stripes album, you probably didn&amp;rsquo;t even know about David Bowie&amp;rsquo;s 1975 stint on the Dinah Shore show.&lt;/p&gt;
&lt;p&gt;I stumbled across it while looking through YouTube for Louis Prima clips one night. (&lt;a href=&#34;http://www.youtube.com/watch?v=8K1InOOLEsQ&#34;&gt;Here&amp;rsquo;s a nice one&lt;/a&gt;, and &lt;a href=&#34;http://www.youtube.com/watch?v=QTO5jc71fbE&#34;&gt;here&amp;rsquo;s a very early one&lt;/a&gt; if you&amp;rsquo;re interested.) New Orleans&amp;rsquo; second most famous scat singing trumpet player named Louis—the Italian one—had been on a Dinah Shore TV show in 1958, and YouTube showed me popular related clips on the right side of the screen—David Bowie? In his Thin White Duke prime?&lt;/p&gt;
&lt;p&gt;You could watch him &lt;a href=&#34;http://www.youtube.com/watch?v=T1bRIwqv_tM&#34;&gt;describe his admiration for Fonzie&lt;/a&gt; (and Dinah Shore calling Fonzie &amp;ldquo;the David Bowie of &amp;lsquo;Happy Days&amp;rsquo;&amp;rdquo;), or &lt;a href=&#34;http://www.youtube.com/watch?v=jfTdyfwSB04&#34;&gt;take a Karate lesson&lt;/a&gt;, but if you want to skip the camp, check out his &lt;a href=&#34;http://www.youtube.com/watch?v=Gx8RNvhKTMc&#34;&gt;kickass version of &amp;ldquo;Stay&amp;rdquo;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/Gx8RNvhKTMc&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;
 &lt;a href=&#34;http://www.amazon.com/Funk-Drumming-Jim-Payne/dp/0871665115&#34;&gt;&lt;img src=&#34;http://g-ec2.images-amazon.com/images/I/51MDZKNSZ7L.jpg&#34; alt=&#34;[Funk Drumming cover]&#34; border=&#34;0&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;160px&#34; align=&#34;right&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;My daughter&amp;rsquo;s drum teacher has her using this &lt;a href=&#34;http://www.amazon.com/Funk-Drumming-Jim-Payne/dp/0871665115&#34;&gt;Mel Bay&amp;rsquo;s Funk Drumming&lt;/a&gt; book, and I&amp;rsquo;ve often joked that its author doesn&amp;rsquo;t look very funky. To show her a best case, I showed her Bowie on Dinah Shore. &lt;a href=&#34;http://www.drumsoloartist.com/Site/Drummers3/Dennis_Davis.html&#34;&gt;Dennis Davis&lt;/a&gt; was a Vietnam Vet who had studied with Max Roach and Elvin Jones, and boy, Bowie spent his money well. The whole band is amazing. Even with Earl Slick&amp;rsquo;s awful 70s hair and awful 70s white suit, he made a fine contribution to the song. Keep in mind as you watch this that it&amp;rsquo;s on a midday talk show whose closest modern equivalent would be &lt;a href=&#34;http://abc.go.com/daytime/theview/index&#34;&gt;The View&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;(If you&amp;rsquo;re a big Bowie fan, don&amp;rsquo;t miss his &lt;a href=&#34;http://www.youtube.com/watch?v=Bed-pnf6oGY&#34;&gt;1974 performance of &amp;ldquo;Young Americans&amp;rdquo;&lt;/a&gt; on Dick Cavett, either.)&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://synklynk.com&#34; title=&#34;http://synklynk.com&#34;&gt;James Lynch III&lt;/a&gt; on &lt;a href=&#34;#comment-1208&#34;&gt;September 5, 2007 11:19 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Perhaps there should be a best of &amp;ldquo;Dinah,&amp;rdquo; as they did for the Dick Cavett show&amp;hellip; focusing on the music&amp;hellip; her &lt;a href=&#34;http://youtube.com/watch?v=Sr0EkGiwfS4&#34;&gt;Iggy interview&lt;/a&gt; is classic, and I&amp;rsquo;m sure there are some other gems amid the dreck&amp;hellip;&lt;/p&gt;
&lt;p&gt;I think I was home sick from high school when the bowie/iggy Dinah show first aired.&lt;/p&gt;
&lt;p&gt;BTW&amp;hellip; your TypeKey implementation isn&amp;rsquo;t working for me.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1209&#34;&gt;September 5, 2007 12:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Jim. I didn&amp;rsquo;t notice the TypeKey comment there, and took it out. I&amp;rsquo;ve been trying to figure out how to automatically have Movable Type approve comments from people I&amp;rsquo;ve approved before without forcing them to use a TypeKey ID, and I must have checked the wrong dialog box somewhere. I found a plugin at Movalog.com, but the link to the plugin to download is broken and they don&amp;rsquo;t answer their email.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.TimothyHorrigan.com&#34; title=&#34;http://www.TimothyHorrigan.com&#34;&gt;Tim Horrigan&lt;/a&gt; on &lt;a href=&#34;#comment-1218&#34;&gt;September 6, 2007 6:29 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob! You forgot to add your affiliate tag on the link to the Mel Bay book. Oh well, I don&amp;rsquo;t really need to buy anything from Amazon right now anyway. I can always click on HipsterGifts.com in the event I need anything, anyway. Not only that, if your audience is like my site&amp;rsquo;s audience, no one ever clicks on any ****ing links anyway, especially not the paying ones :-(&lt;/p&gt;
&lt;p&gt;I will remember to hit a few Google ads before I go!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/music">music</category>
      
    </item>
    
    <item>
      <title>Automated RDFa Output from DITA Open Toolkit</title>
      <link>https://www.bobdc.com/blog/automated-rdfa-output-from-dit/</link>
      <pubDate>Fri, 31 Aug 2007 07:56:52 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/automated-rdfa-output-from-dit/</guid>
      
      
      <description><div>A replacement module to make it easy.</div><div>&lt;p&gt;I &lt;a href=&#34;https://www.bobdc.com/blog/who-uses-metadata-from-html-he&#34;&gt;recently asked&lt;/a&gt; if anyone knew of applications that pull &lt;code&gt;meta[@name and @content]&lt;/code&gt; metadata out of HTML &lt;code&gt;head&lt;/code&gt; elements, and I got a few interesting answers. To extract such data, writing a short XSLT stylesheet that reads the output of John Cowan&amp;rsquo;s &lt;a href=&#34;http://ccil.org/~cowan/XML/tagsoup/&#34;&gt;TagSoup&lt;/a&gt; would be easy, but lately I&amp;rsquo;ve been thinking: with a slight change to those &lt;code&gt;meta&lt;/code&gt; elements, they&amp;rsquo;d be RDFa, which can store more versatile metadata that is easier to get out (see &lt;a href=&#34;http://www.xml.com/pub/a/2007/02/14/introducing-rdfa.html?page=2&#34;&gt;Getting Those Triples&lt;/a&gt;).&lt;/p&gt;
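&lt;p&gt;Such a stylesheet really is short. Here is a minimal sketch that prints each name/value pair as a line of text; it assumes that the TagSoup output has been put in no namespace, because matching elements in the default XHTML namespace takes a little more ceremony in XSLT 1.0:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet xmlns:xsl=&amp;quot;http://www.w3.org/1999/XSL/Transform&amp;quot;
                version=&amp;quot;1.0&amp;quot;&amp;gt;
  &amp;lt;xsl:output method=&amp;quot;text&amp;quot;/&amp;gt;
  &amp;lt;!-- print each head/meta name/value pair on its own line --&amp;gt;
  &amp;lt;xsl:template match=&amp;quot;/&amp;quot;&amp;gt;
    &amp;lt;xsl:for-each select=&amp;quot;html/head/meta[@name and @content]&amp;quot;&amp;gt;
      &amp;lt;xsl:value-of select=&amp;quot;@name&amp;quot;/&amp;gt;
      &amp;lt;xsl:text&amp;gt;: &amp;lt;/xsl:text&amp;gt;
      &amp;lt;xsl:value-of select=&amp;quot;@content&amp;quot;/&amp;gt;
      &amp;lt;xsl:text&amp;gt;&amp;amp;#10;&amp;lt;/xsl:text&amp;gt;
    &amp;lt;/xsl:for-each&amp;gt;
  &amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;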
&lt;blockquote class=&#34;pullquote&#34;&gt;For those interested in seeing more RDFa `meta` elements in their HTML `head` elements, the difficult work has already been done.&lt;/blockquote&gt;
&lt;p&gt;For those interested in seeing more RDFa &lt;code&gt;meta&lt;/code&gt; elements in their HTML &lt;code&gt;head&lt;/code&gt; elements, the difficult work has already been done. Many HTML generation routines out there have code to find the value that goes with a certain name (typically, a Dublin Core property name) and then insert the &lt;code&gt;meta&lt;/code&gt; element with the name/value pair. Minimal changes to this code can change it to output RDFa instead. For example, the &lt;a href=&#34;http://dita-ot.sourceforge.net/&#34;&gt;DITA Open Toolkit&lt;/a&gt; is an open-source package that converts base or specialized &lt;a href=&#34;http://dita.xml.org/&#34;&gt;DITA&lt;/a&gt; content to HTML, XHTML, Java help, RTF, troff, PDF, and more formats. The HTML generation part includes a get-meta.xsl stylesheet that inserts the &lt;code&gt;meta&lt;/code&gt; elements, and I&amp;rsquo;ve created a revised version called &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/get-meta-rdfa.xsl&#34;&gt;get-meta-rdfa.xsl&lt;/a&gt; that inserts RDFa &lt;code&gt;meta&lt;/code&gt; elements instead. If you point the &lt;a href=&#34;http://dita-ot.sourceforge.net/doc/ot-userguide13/xhtml/release_current/commandline_help.html&#34;&gt;DITA Open Toolkit jar file&lt;/a&gt; at a stylesheet that just has an &lt;code&gt;xsl:import&lt;/code&gt; instruction pointing at get-meta-rdfa.xsl, you&amp;rsquo;ll get all of the Toolkit&amp;rsquo;s default HTML generation with the RDFa &lt;code&gt;meta&lt;/code&gt; elements substituted for the default ones. For example, instead of this,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;meta name=&amp;quot;DC.Title&amp;quot; content=&amp;quot;My Topic&amp;quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;you get this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;meta property=&amp;quot;dc:title&amp;quot; content=&amp;quot;My Topic&amp;quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It also adds namespace declarations for Dublin Core, Dublin Core basic terms, and PRISM, because those were the most appropriate vocabularies for the terms being added. I didn&amp;rsquo;t see any opportunities to add triples that would have URLs as the objects, which would look more like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;link rel=&amp;quot;dc:identifier&amp;quot; href=&amp;quot;http://www.snee.com/bobdc.blog/2007/08/who_uses_metadata_from_html_he.html&amp;quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can see an example of the HTML created by the Toolkit with this module &lt;a href=&#34;http://www.snee.com/bobdc.blog/files/currywurst.html&#34;&gt;here&lt;/a&gt; (the look is pretty minimal—while you&amp;rsquo;re customizing the HTML generation code, you might want to point it at a CSS stylesheet as well) and the RDF triples extracted from that by triplr.org &lt;a href=&#34;http://triplr.org/rdfa-n3/http://www.snee.com/bobdc.blog/files/currywurst.html&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re a fan of RDFa, find some HTML generation code out there and write or revise a module to have it add some RDFa metadata. Like I said, the code has probably already been written to do the difficult part—actually identifying the name/value pairs—so you just need to revise that code to output the slightly different syntax and add a &lt;code&gt;meta&lt;/code&gt; element wrapper for the revised &lt;code&gt;meta&lt;/code&gt; elements to store the subject of the triples and the namespace declarations. (I did this for this weblog&amp;rsquo;s Movable Type templates &lt;a href=&#34;https://www.bobdc.com/blog/generating-rdfa-from-movable-t-1&#34;&gt;several months ago&lt;/a&gt;.)&lt;/p&gt;
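&lt;p&gt;For reference, a driver stylesheet of the kind described above needs nothing more than the import itself. Think of this as a sketch; use whatever file name and path fit your Toolkit setup:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&amp;lt;xsl:stylesheet xmlns:xsl=&amp;quot;http://www.w3.org/1999/XSL/Transform&amp;quot;
                version=&amp;quot;1.0&amp;quot;&amp;gt;
  &amp;lt;!-- substitute the RDFa meta elements for the default ones;
       everything else comes from the default HTML generation --&amp;gt;
  &amp;lt;xsl:import href=&amp;quot;get-meta-rdfa.xsl&amp;quot;/&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;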
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://scottysengineeringlog.net&#34; title=&#34;http://scottysengineeringlog.net&#34;&gt;Scott Hudson&lt;/a&gt; on &lt;a href=&#34;#comment-1194&#34;&gt;August 31, 2007 12:34 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is very cool stuff, Bob! I also took from your example the RDFa I would need to insert in my Blogger template. I was hoping to co-exist the RDFa metadata and the Dublin Core recommendation for expressing metadata, but the Triplr.org app didn&amp;rsquo;t seem to like it:&lt;/p&gt;
&lt;p&gt;Parsing &lt;a href=&#34;http://shudson310.blogspot.com/index.html&#34;&gt;http://shudson310.blogspot.com/index.html&lt;/a&gt; content with &amp;lsquo;rdfa&amp;rsquo; parser failed with errors:&lt;br /&gt;
line 1: XML parser error: EntityRef: expecting &amp;lsquo;;&amp;rsquo;&lt;br /&gt;
line 1: XML parser error: EntityRef: expecting &amp;lsquo;;&amp;rsquo;&lt;/p&gt;
&lt;p&gt;Any ideas what the issue is here?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1195&#34;&gt;August 31, 2007 10:33 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Scott,&lt;/p&gt;
&lt;p&gt;Your index.html file isn&amp;rsquo;t well-formed. The permalink versions of my postings have a DOCTYPE declaration and fail validation because of the RDFa meta elements, but they&amp;rsquo;re still well-formed, so triplr is OK with them.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/dita">DITA</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Who uses metadata from HTML head/meta @name and @content attributes?</title>
      <link>https://www.bobdc.com/blog/who-uses-metadata-from-html-he/</link>
      <pubDate>Sun, 26 Aug 2007 22:53:19 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/who-uses-metadata-from-html-he/</guid>
      
      
      <description><div>For example, name=&#34;DC.Title&#34; content=&#34;My Title&#34;?</div><div>&lt;p&gt;A view source on a lot of web pages out there shows something like this, which is from a web page created by the &lt;a href=&#34;http://dita-ot.sourceforge.net/&#34;&gt;DITA Open Toolkit&lt;/a&gt; from a DITA XML file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;html lang=&amp;quot;en-us&amp;quot; xml:lang=&amp;quot;en-us&amp;quot;&amp;gt;
  &amp;lt;head&amp;gt;
    &amp;lt;meta content=&amp;quot;text/html; charset=utf-8&amp;quot; http-equiv=&amp;quot;Content-Type&amp;quot; /&amp;gt;
    &amp;lt;meta name=&amp;quot;copyright&amp;quot; content=&amp;quot;(C) Copyright 2005&amp;quot; /&amp;gt;
    &amp;lt;meta name=&amp;quot;DC.rights.owner&amp;quot; content=&amp;quot;(C) Copyright a2005&amp;quot; /&amp;gt;
    &amp;lt;meta content=&amp;quot;recipe&amp;quot; name=&amp;quot;DC.Type&amp;quot; /&amp;gt;
    &amp;lt;meta name=&amp;quot;DC.Title&amp;quot; content=&amp;quot;My Topic&amp;quot; /&amp;gt;
    &amp;lt;meta name=&amp;quot;abstract&amp;quot; content=&amp;quot;Sample description of the topic.&amp;quot; /&amp;gt;
    &amp;lt;meta name=&amp;quot;description&amp;quot; content=&amp;quot;Sample description of the topic.&amp;quot; /&amp;gt;
    &amp;lt;meta content=&amp;quot;XHTML&amp;quot; name=&amp;quot;DC.Format&amp;quot; /&amp;gt;
    &amp;lt;meta content=&amp;quot;r1&amp;quot; name=&amp;quot;DC.Identifier&amp;quot; /&amp;gt;
    &amp;lt;link href=&amp;quot;commonltr.css&amp;quot; type=&amp;quot;text/css&amp;quot; rel=&amp;quot;stylesheet&amp;quot; /&amp;gt;
    &amp;lt;title&amp;gt;My Topic&amp;lt;/title&amp;gt;
&amp;lt;/head&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The HTML &lt;code&gt;head&lt;/code&gt; elements of many web pages have metadata in collections of name/value pairs stored in the &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;content&lt;/code&gt; attributes of &lt;code&gt;meta&lt;/code&gt; elements like this. We&amp;rsquo;ve all seen that many HTML generation routines add these &lt;code&gt;head/meta&lt;/code&gt; elements, but what kind of applications actually pull these name/value pairs out and do something with them? Web-focused content management systems are the only candidate I can think of; can anyone confirm that one of those uses this data, or name some other kind of application that does?&lt;/p&gt;
&lt;p&gt;A funny side note: a &lt;a href=&#34;http://www.google.com/search?hl=en&amp;amp;q=meta+name+*+content&amp;amp;btnG=Search&#34;&gt;web search&lt;/a&gt; to find some numbers on this usage of &lt;code&gt;meta&lt;/code&gt; tags also displays a single paid ad with the title &amp;ldquo;Meta Tags Are Dead&amp;rdquo; that links to an ad for a $79 book called &amp;ldquo;Google Secrets: How to Get a Top 10 Ranking&amp;rdquo; at the clever domain name google-secrets.com. The ad title reminded me of a certain &lt;a href=&#34;http://www.amazon.com/Soul-Dead/dp/B000000HHR&#34;&gt;De La Soul album&lt;/a&gt; name.&lt;/p&gt;
&lt;h2 id=&#34;7-comments&#34;&gt;7 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://kontrawize.blogs.com/kontrawize/&#34; title=&#34;http://kontrawize.blogs.com/kontrawize/&#34;&gt;Anthony B. Coates&lt;/a&gt; on &lt;a href=&#34;#comment-1184&#34;&gt;August 27, 2007 1:12 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;One limited usage, but very useful for me, is that when you bookmark a page in the Opera browser, it stores the description metadata as well as the page title and URL (if the page doesn&amp;rsquo;t have any description, you can add your own, and I often have to, by copying/pasting content from the page). The Opera UI is based on fast search for navigation, so if you are looking at your bookmarks, you just type in a relevant word or two, and it searches all of the bookmark info, including the description, to filter the bookmark tab to show only the relevant bookmarks. This works really well.&lt;br /&gt;
Cheers, Tony.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1185&#34;&gt;August 27, 2007 1:18 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Tony! This is just the kind of thing I was curious about. I see that they document it a bit &lt;a href=&#34;http://www.opera.com/docs/specs/opera7/html/index.dml&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://simonster.com/&#34; title=&#34;http://simonster.com/&#34;&gt;Simon Kornblith&lt;/a&gt; on &lt;a href=&#34;#comment-1186&#34;&gt;August 27, 2007 3:39 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://zotero.org/&#34;&gt;Zotero&lt;/a&gt; uses Dublin Core meta tags to import bibliographic metadata.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1187&#34;&gt;August 27, 2007 4:16 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Just to make sure I understand what you mean by &amp;ldquo;Dublin Core meta tags&amp;rdquo;: when you tell Zotero to save information about a web page, it looks for meta elements where the @name value begins with &amp;ldquo;DC.&amp;rdquo; and, for each that it finds, it saves the @name and @content values, right?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://code.google.com/p/zotero-for-lawyers/&#34; title=&#34;http://code.google.com/p/zotero-for-lawyers/&#34;&gt;bill mckinney&lt;/a&gt; on &lt;a href=&#34;#comment-1191&#34;&gt;August 30, 2007 3:17 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I scrape meta tags in many of the law-related Zotero translators I&amp;rsquo;ve contributed (Bob, you would be familiar with Cornell&amp;rsquo;s LII I think).&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve also been recently shot down for suggesting that publishers use simple meta tags in order to be &amp;ldquo;Zotero friendly&amp;rdquo; - see: &lt;a href=&#34;http://groups.google.com/group/zotero-dev/browse_thread/thread/4a6f0190afc3e4a&#34;&gt;http://groups.google.com/group/zotero-dev/browse_thread/thread/4a6f0190afc3e4a&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;RDF and ontologies are way sexier!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1192&#34;&gt;August 30, 2007 4:48 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Definitely familiar with the Cornell LII. They&amp;rsquo;re doing great work, and not enough people realize that findlaw.com is owned by Thomson, so I think that the work at Cornell is the great hope for free access to online U.S. law.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;RDF and ontologies are way sexier!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;An important advantage of RDF is the ability to make the value of the name/value pair (the object) a URI, so that it can serve as the subject of other triples, so that you can start linking up triples to gain new information. I&amp;rsquo;ll be posting something that mentions doing this in HTML head elements shortly.&lt;/p&gt;
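&lt;p&gt;(A quick made-up illustration in Turtle, with hypothetical URIs: because the object of the first triple is a URI and not a literal, the second triple can use it as a subject, and the two link up:)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix ex: &amp;lt;http://example.com/ns#&amp;gt; .

ex:doc1    ex:author ex:author3 .
ex:author3 ex:name   &amp;quot;Jane Smith&amp;quot; .
&lt;/code&gt;&lt;/pre&gt;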
&lt;p&gt;Ontologies are great, but you can get a lot done without them.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ldodds.com/blog&#34; title=&#34;http://www.ldodds.com/blog&#34;&gt;Leigh Dodds&lt;/a&gt; on &lt;a href=&#34;#comment-1193&#34;&gt;August 31, 2007 4:59 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We publish meta tags from the article pages on IngentaConnect. Google Scholar uses them as a way to get better bibliographic metadata than they can automatically harvest from the full-text.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>using owl:imports</title>
      <link>https://www.bobdc.com/blog/using-owlincludes/</link>
      <pubDate>Thu, 23 Aug 2007 08:42:43 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/using-owlincludes/</guid>
      
      
      <description><div>Like XInclude, or #include, or xsl:include and xsl:import, but trickier.</div><div>&lt;p&gt;I&amp;rsquo;ve had problems getting OWL&amp;rsquo;s import mechanism to work before, and once I got a simple demo of it to work I wanted to make it available. owl:imports is great because it helps make your ontologies more modular, even letting you separate your ontology from the data it describes, sort of like—&lt;a href=&#34;http://www.jibboo.com/beatles/help/helpline.htm&#34;&gt;dare I say it&lt;/a&gt;—a schema.&lt;/p&gt;
&lt;p&gt;To avoid a simple typo that I made, note that the OWL property uses the third person singular form of the verb &amp;ldquo;imports&amp;rdquo; instead of the second person &amp;ldquo;import&amp;rdquo; command used by XSLT (or the &amp;ldquo;include&amp;rdquo; used by XSLT and other programming languages). We&amp;rsquo;re not saying &amp;ldquo;Hey compiler or interpreter! Import this other file!&amp;rdquo; Instead we&amp;rsquo;re saying, in subject-predicate-object RDF fashion, &amp;ldquo;this ontology imports this other one&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;What makes owl:imports tricky is that we&amp;rsquo;re not just importing a file, but importing an ontology, and because the importing file, the imported file, the terms being defined, and, well, pretty much everything all have URLs to represent their full names, the use of relative names, prefixes, and xml:base can make things easier or add to the confusion. As a starting point, the example below does work.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s the data file to import, which I called addressbook.rdf:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;rdf:RDF
    xmlns:rdf=&amp;quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;quot;
    xmlns:id=&amp;quot;http://www.snee.com/ns/id#&amp;quot;
    xmlns=&amp;quot;http://www.snee.com/ns/addressbook#&amp;quot;&amp;gt;


  &amp;lt;rdf:Description rdf:about=&amp;quot;id:NormaS&amp;quot;&amp;gt;
    &amp;lt;firstName&amp;gt;Norma&amp;lt;/firstName&amp;gt;
    &amp;lt;lastName&amp;gt;Smith&amp;lt;/lastName&amp;gt;
    &amp;lt;homePhone&amp;gt;(445) 138-6676&amp;lt;/homePhone&amp;gt;
    &amp;lt;workPhone&amp;gt;(326) 852-7714&amp;lt;/workPhone&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;


  &amp;lt;rdf:Description rdf:about=&amp;quot;id:andy-g&amp;quot;&amp;gt;
    &amp;lt;firstName&amp;gt;Andy&amp;lt;/firstName&amp;gt;
    &amp;lt;lastName&amp;gt;Gibson&amp;lt;/lastName&amp;gt;
    &amp;lt;homePhone&amp;gt;(652) 348-2796&amp;lt;/homePhone&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;


&amp;lt;/rdf:RDF&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here&amp;rsquo;s a short ontology that I named addressbook.owl. It imports addressbook.rdf and adds some metadata asserting that the mobile, workPhone, and homePhone properties are subproperties of the datatype property &amp;ldquo;phone&amp;rdquo;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;rdf:RDF
    xmlns:rdf=&amp;quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&amp;quot;
    xmlns:owl=&amp;quot;http://www.w3.org/2002/07/owl#&amp;quot;
    xmlns:rdfs=&amp;quot;http://www.w3.org/2000/01/rdf-schema#&amp;quot;&amp;gt;


  &amp;lt;owl:Ontology&amp;gt;
    &amp;lt;owl:imports&amp;gt;
      &amp;lt;owl:Ontology rdf:about=&amp;quot;addressbook.rdf&amp;quot;/&amp;gt;
    &amp;lt;/owl:imports&amp;gt;
  &amp;lt;/owl:Ontology&amp;gt;


  &amp;lt;owl:DatatypeProperty rdf:about=&amp;quot;http://www.snee.com/ns/addressbook#phone&amp;quot;/&amp;gt;


  &amp;lt;owl:DatatypeProperty rdf:about=&amp;quot;http://www.snee.com/ns/addressbook#mobile&amp;quot;&amp;gt;
    &amp;lt;rdfs:subPropertyOf&amp;gt;
      &amp;lt;owl:DatatypeProperty rdf:about=&amp;quot;http://www.snee.com/ns/addressbook#phone&amp;quot;/&amp;gt;
    &amp;lt;/rdfs:subPropertyOf&amp;gt;
  &amp;lt;/owl:DatatypeProperty&amp;gt;


  &amp;lt;owl:DatatypeProperty rdf:about=&amp;quot;http://www.snee.com/ns/addressbook#workPhone&amp;quot;&amp;gt;
    &amp;lt;rdfs:subPropertyOf&amp;gt;
      &amp;lt;owl:DatatypeProperty rdf:about=&amp;quot;http://www.snee.com/ns/addressbook#phone&amp;quot;/&amp;gt;
    &amp;lt;/rdfs:subPropertyOf&amp;gt;
  &amp;lt;/owl:DatatypeProperty&amp;gt;


  &amp;lt;owl:DatatypeProperty rdf:about=&amp;quot;http://www.snee.com/ns/addressbook#homePhone&amp;quot;&amp;gt;
    &amp;lt;rdfs:subPropertyOf&amp;gt;
      &amp;lt;owl:DatatypeProperty rdf:about=&amp;quot;http://www.snee.com/ns/addressbook#phone&amp;quot;/&amp;gt;
    &amp;lt;/rdfs:subPropertyOf&amp;gt;
  &amp;lt;/owl:DatatypeProperty&amp;gt;
&amp;lt;/rdf:RDF&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following &lt;a href=&#34;http://www.w3.org/TR/2007/CR-rdf-sparql-query-20070614/&#34;&gt;SPARQL&lt;/a&gt; query, stored in the file normaPhone.spq, asks for any phone numbers that NormaS has, even though the person issuing the query may not know which of her phone numbers are in the database:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX t: &amp;lt;http://www.snee.com/ns/addressbook#&amp;gt;


SELECT ?phoneType ?value
WHERE {
    &amp;lt;id:NormaS&amp;gt; t:phone ?value.
    &amp;lt;id:NormaS&amp;gt; ?phoneType ?value
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The following command line tells &lt;a href=&#34;http://pellet.owldl.com/&#34;&gt;pellet&lt;/a&gt; to run the normaPhone.spq query against addressbook.owl:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pellet -if addressbook.owl -qf normaPhone.spq
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Pellet gives me the following answer:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Query Results (2 answers):
phoneType  | value
=============================
:workPhone | &amp;quot;(326) 852-7714&amp;quot;
:homePhone | &amp;quot;(445) 138-6676&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(It also gives me a report on the non-&lt;a href=&#34;http://www.w3.org/TR/2004/REC-owl-guide-20040210/#OwlVarieties&#34;&gt;DL&lt;/a&gt; aspects of my ontology and what I can add to make it more DL compliant, but I&amp;rsquo;m not including that here.) The query is an important part of my owl:imports demo because it shows how the separate OWL ontology actually adds to the usefulness of the simple address book data that has no ontology information: it lets me get Norma&amp;rsquo;s phone numbers without knowing exactly which kind are stored. (As a bonus, this also demonstrates the value of the rdfs:subPropertyOf property.) Of course, the query also forces me to run the whole setup with some software that will complain if I didn&amp;rsquo;t do it properly.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m open to suggestions on ways to improve all of this.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.jenitennison.com/blog&#34; title=&#34;http://www.jenitennison.com/blog&#34;&gt;Jeni Tennison&lt;/a&gt; on &lt;a href=&#34;#comment-1162&#34;&gt;August 23, 2007 11:15 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Why do you include the data in the ontology rather than the other way around? Doesn&amp;rsquo;t this mean that if someone wants to reuse your ontology, they get all your data along for the ride?&lt;/p&gt;
&lt;p&gt;(Analogously, you point to a schema from an XML document rather than the other way around.)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1163&#34;&gt;August 23, 2007 11:28 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Jeni,&lt;/p&gt;
&lt;p&gt;I did it that way because I really like the OWL use case of creating an ontology around existing data: &amp;ldquo;for the data in such-and-such a file, here is some metadata to go with it.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;You don&amp;rsquo;t always point to a schema from an XML document; James Clark has some arguments against that that make sense to me. What makes the most sense to me, which would be easy enough using owl:includes, would be to have a separate skeleton document that has one pointer to the data document and another pointer to the ontology to say that, for a given processing need, these two are to be used together.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dowhatimean.net/&#34; title=&#34;http://dowhatimean.net/&#34;&gt;Richard Cyganiak&lt;/a&gt; on &lt;a href=&#34;#comment-1164&#34;&gt;August 23, 2007 12:07 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m a bit confused, shouldn&amp;rsquo;t you say owl:imports rather than owl:includes throughout your post?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1165&#34;&gt;August 23, 2007 12:24 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Richard,&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;re absolutely right. I&amp;rsquo;m suitably embarrassed and just corrected it. (I&amp;rsquo;ll blame my usage of XSLT, with its slightly different xsl:import and xsl:include commands, for the confusion.)&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Semantic Web Strategies program ready</title>
      <link>https://www.bobdc.com/blog/semantic-web-strategies-progra/</link>
      <pubDate>Mon, 20 Aug 2007 10:42:38 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-strategies-progra/</guid>
      
      
      <description><div>Lots of great speakers and talks.</div><div>&lt;p&gt;I&amp;rsquo;m very happy to announce that the &lt;a href=&#34;http://www.semanticwebstrategies.com/conference/conferencegrid.php&#34;&gt;program&lt;/a&gt; for the Semantic Web Strategies conference in San Jose September 30 - October 2nd is finished and available. For keynote speakers, we&amp;rsquo;ve got some well-known names who all bring a combination of experience and creativity to their semantic web work: Eric Miller, Nova Spivack, and Kingsley Idehen. We also have presentations on many interesting projects from large and small organizations, and well-known semantic web companies such as TopQuadrant, Zepheira, and Access Innovations (of DataHarmony fame) are among the sponsors.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.semanticwebstrategies.com/index.php&#34;&gt;&lt;img src=&#34;http://www.semanticwebstrategies.com/images/logo_SWS_hdr.gif&#34; alt=&#34;[Semantic Web Strategies logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The Jupiter Events people have been very good about helping me keep pure product pitches out of the presentations. I&amp;rsquo;ve suggested to several vendors that a tag team presentation with a customer talking about a project that uses the vendor&amp;rsquo;s product would be OK, and we have a few of those. These actually fit well with the overall theme of the conference, which is to get people talking about what they had to do to connect the technology to their specific business needs.&lt;/p&gt;
&lt;p&gt;We received several submissions for talks that had something to do with semantics and something to do with the web, but less to do with the &amp;ldquo;semantic web&amp;rdquo; in the W3C sense of the term. While many of these looked interesting, the majority of the presentations that made the program are related to applications using metadata, ontologies and taxonomies. The use of standards is what makes it possible to hook these things up together to build larger applications, and W3C standards such as OWL and RDF give applications in this area a common language to work together and form these larger applications.&lt;/p&gt;
&lt;p&gt;As the popularity of the Semantic Web grows, many use it as an umbrella term similar to &amp;ldquo;Web n&amp;rdquo; (where n &amp;gt; 1) for this season&amp;rsquo;s Hot New Technologies, and technology companies want their products to be seen as hot and new. Oddly enough, the bulk of the pure marketing-driven submissions, in which PR reps were pushing clients whose products happened to mention &amp;ldquo;semantics&amp;rdquo; and &amp;ldquo;web&amp;rdquo; in the same press release, came after the deadline for submission. We may offer a third vendor track next time for these people—many are doing some cool things—but I&amp;rsquo;ll have a difficult enough time with the upcoming conference trying to hear two talks at once considering how many good ones we have competing with each other.&lt;/p&gt;
&lt;p&gt;So, if you want to learn more about making the connections between the latest semantic web work and your business needs, &lt;a href=&#34;http://www.semanticwebstrategies.com/register.php&#34;&gt;register&lt;/a&gt; for the Semantic Web Strategies conference and come join us in San Jose. The conference proper is October 1st and 2nd, and there are some pre-conference tutorials for people just getting started.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>How to tell if a forwarded email is a hoax</title>
      <link>https://www.bobdc.com/blog/how-to-tell-if-a-forwarded-ema/</link>
      <pubDate>Thu, 16 Aug 2007 08:53:22 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/how-to-tell-if-a-forwarded-ema/</guid>
      
      
      <description><div>Important warning! Please forward to everyone you know!</div><div>&lt;p&gt;I assume that people reading my weblog are pretty tech-savvy. Otherwise, they&amp;rsquo;d find most of what I write pretty boring. (That&amp;rsquo;s why no one in my family reads it.) The following advice will look like common sense to most of you, but after getting an email with a subject header of &amp;ldquo;Fw: FW: Fw: I M P O R T A N T W A R N I NG ! ! ! ! ! !&amp;rdquo; from a family member today, I thought I&amp;rsquo;d write this out in case it&amp;rsquo;s useful to anyone. You can send the URL to anyone who sends you such an email to save yourself some typing.&lt;/p&gt;
&lt;p&gt;Email that encourages you to forward it to lots of people shouldn&amp;rsquo;t be forwarded, for the same reason that spam is bad. These emails are often Dire Warnings that describe simple actions that can prevent or cause bad health problems&amp;mdash;for example, that you should clean off the top of your soda cans before opening them to avoid death from dried rat urine, or that you should be careful about microwaving food with certain brands of plastic wrap because some guy on the Today Show said that it would cause cancer. (The latter was actually forwarded by a family friend who is a doctor, and a cynical one at that.)&lt;/p&gt;
&lt;p&gt;When I receive any email that encourages its recipients to forward it to everyone they know, I pick a few phrases that distinguish it from other emails (this morning, it was &amp;ldquo;life is beautiful&amp;rdquo; and &amp;ldquo;power point&amp;rdquo;) and do a Google search. &lt;a href=&#34;http://www.google.com/search?q=%22power%20point%22%20%22life%20is%20beatiful%22&#34;&gt;Today&amp;rsquo;s search&lt;/a&gt; quickly revealed that this morning&amp;rsquo;s email was a hoax that had been forwarded around for over five years.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snopes.com/&#34;&gt;&lt;img src=&#34;http://www.snopes.com/graphics/header/snopes_02.gif&#34; alt=&#34;[snopes.com logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Your Google search may get a hit on &lt;a href=&#34;http://www.snopes.com/&#34;&gt;snopes.com&lt;/a&gt;, an urban legends clearing house that tracks such things. This morning&amp;rsquo;s email finished with &amp;ldquo;PASS IT ON IMMEDIATELY! THIS HAS BEEN CON FIRMED [sic] BY SNOPES&amp;rdquo;, which I thought was an interesting touch. (Over-use of upper-case letters adds more points to the &amp;ldquo;possible hoax&amp;rdquo; score.) Snopes actually identifies the &lt;a href=&#34;http://www.snopes.com/computer/virus/life.asp&#34;&gt;&amp;ldquo;life is beautiful&amp;rdquo; email as a hoax&lt;/a&gt;, so this addition to the forwarded email reveals it as the work of a deliberate hoaxer.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s not help these people by forwarding their emails. Remember: whenever you get an email encouraging you to forward it to lots of people, do a web search on a few phrases from it that won&amp;rsquo;t come up in other emails before adding more clutter to your friends&amp;rsquo; and family&amp;rsquo;s email.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Getting started with Subversion</title>
      <link>https://www.bobdc.com/blog/getting-started-with-subversio/</link>
      <pubDate>Tue, 14 Aug 2007 09:06:04 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/getting-started-with-subversio/</guid>
      
      
      <description><div>The basics of the popular version control system.</div><div>&lt;p&gt;Because the open source &lt;a href=&#34;http://subversion.tigris.org/&#34;&gt;Subversion&lt;/a&gt; version control system lets you assign fairly arbitrary keywords to resources, I had &lt;a href=&#34;https://www.bobdc.com/blog/dam-subversion-rdf-owl&#34;&gt;some ideas&lt;/a&gt; a few months ago about combining Subversion with an RDF triple store to track resource metadata. I never learned Subversion properly, though, and recently decided to keep my to-do lists, address book, and notes files in Subversion to get better accustomed to its important commands. Many introductions to Subversion are available, but none were quite what I wanted, so I decided to write up the basics myself once I worked them out. I do recommend Garrett Rooney&amp;rsquo;s &amp;ldquo;A Crash Course in Subversion&amp;rdquo; (&lt;a href=&#34;http://www.developer.com/tech/article.php/3499816&#34;&gt;part one&lt;/a&gt;, &lt;a href=&#34;http://www.developer.com/tech/article.php/10923_3503151_1&#34;&gt;part two&lt;/a&gt;), because it&amp;rsquo;s more detailed than mine, but it has too many details for a quick introduction, with digressions about how competing programs implement certain features and other things I wasn&amp;rsquo;t interested in.&lt;/p&gt;
&lt;blockquote class=&#34;pullquote&#34;&gt;One problem I had with Subversion was my tendency at first to think of it as a version control system for files.&lt;/blockquote&gt;
&lt;p&gt;One problem I had with Subversion was my tendency at first to think of it as a version control system for files. Once I started thinking of it as a version control system for directories full of files, it was easier to understand its logic about certain things. For example, the &lt;code&gt;commit&lt;/code&gt; command commits the changes to any files in the current working directory to the repository, and the &lt;code&gt;update&lt;/code&gt; command updates a directory&amp;rsquo;s collection of files with any more recent versions from the repository. (These are simplifications of the default behavior—of course these commands can do more.) Maybe this approach is more intuitive to others, but it took me a while to learn to think that way.&lt;/p&gt;
&lt;p&gt;The most important tasks you do with Subversion are to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#svn01&#34;&gt;Create a repository&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#svn02&#34;&gt;Prepare the repository&lt;/a&gt; for your files&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#svn03&#34;&gt;Add a directory full of files&lt;/a&gt; to the repository&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#svn04&#34;&gt;List the files&lt;/a&gt; in a directory in the repository&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#svn05a&#34;&gt;Create a working copy of a directory&lt;/a&gt; from the repository&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#svn05&#34;&gt;Add revised versions&lt;/a&gt; of your working directory&amp;rsquo;s files to the repository after editing them&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#svn11&#34;&gt;Find out what Subversion thinks&lt;/a&gt; of the files currently in your working directory (that is, what work it might need to perform on them)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#svn06&#34;&gt;Extract updated versions&lt;/a&gt; of your files after someone else updated them in the repository (or in my case, after I updated the repository version from another machine)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#svn07&#34;&gt;Let the repository know&lt;/a&gt; that you&amp;rsquo;ve renamed or deleted files&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#svn09&#34;&gt;Resolve conflicts&lt;/a&gt; if different edits were made to different copies of the same file&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#svn10&#34;&gt;Revert your working copy&lt;/a&gt; back to an earlier version from the repository&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;#svn08&#34;&gt;Look at old versions&lt;/a&gt; of files without reverting to them&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;svn00&#34;&gt;Basic background&lt;/h2&gt;
&lt;p&gt;As far as I can tell, Subversion works identically on Linux and Windows. I&amp;rsquo;ve been keeping a repository on a thumb drive and using it to keep certain directories of a (Xubuntu) Linux machine and a Windows machine in sync. So far, this has worked well.&lt;/p&gt;
&lt;p&gt;There are two command line programs you use for typical Subversion use: &lt;code&gt;svn&lt;/code&gt; and &lt;code&gt;svnadmin&lt;/code&gt;. When someone refers to the svn command &lt;code&gt;foobar&lt;/code&gt;, they generally mean something that you enter like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn foobar arg1 arg2 etc.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(I&amp;rsquo;m using the dollar sign to represent the command prompt—in later examples, lines with no dollar sign that follow these commands will show you the results of the command at the dollar sign.) Entering &amp;ldquo;help&amp;rdquo; after &amp;ldquo;svn&amp;rdquo; or &amp;ldquo;svnadmin&amp;rdquo; lists the available commands and the potential abbreviations of those commands. Entering a command name after &amp;ldquo;help&amp;rdquo; tells you more about that command. If svn included a &lt;code&gt;foobar&lt;/code&gt; command, this would tell you more about it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn help foobar
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Subversion commands use URLs to refer to repositories and resources within repositories. In the simplest form of this arrangement, in which your repository sits on locally accessible disk, this means adding &amp;ldquo;file:///&amp;rdquo; before the path name. Fancier Subversion server arrangements let you reference files using &amp;ldquo;http://&amp;rdquo; and other URL prefixes.&lt;/p&gt;
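&lt;p&gt;For example, the same &lt;code&gt;list&lt;/code&gt; command might address a repository either way, depending on how the repository is served (the http:// path here is made up):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn list file:///media/disk/testrepos/trunk
$ svn list http://svn.example.com/testrepos/trunk
&lt;/code&gt;&lt;/pre&gt;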
&lt;h2 id=&#34;svn01&#34;&gt;Creating a repository&lt;/h2&gt;
&lt;p&gt;The following creates an empty repository as a subdirectory of /media/disk:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svnadmin create /media/disk/testrepos
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now that it&amp;rsquo;s created, commands will refer to the repository as file:///media/disk/testrepos. Because it&amp;rsquo;s on a thumb drive, when I later move the thumb drive to a Windows machine that decides to call the thumb drive f:, my svn commands will refer to the repository as file:///f:/testrepos.&lt;/p&gt;
&lt;h2 id=&#34;svn02&#34;&gt;Preparing the repository for your files&lt;/h2&gt;
&lt;p&gt;The next step is to create a child of the repository root called &amp;ldquo;trunk&amp;rdquo;. Eventually, you might create siblings of &lt;code&gt;trunk&lt;/code&gt; called &lt;code&gt;tags&lt;/code&gt; and &lt;code&gt;branches&lt;/code&gt; to help organize alternate development branches and groups of files used together for a particular milestone such as a software release, but these are not issues for a quickstart guide. If you keep all of your directories full of files in descendants of &lt;code&gt;trunk&lt;/code&gt;, you&amp;rsquo;ll follow existing conventions and set yourself up to eventually move on to more sophisticated use of Subversion.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn mkdir -m &amp;quot;creating the trunk dir&amp;quot; file:///media/disk/testrepos/trunk 
&lt;/code&gt;&lt;/pre&gt;
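&lt;p&gt;The conventional layout that this sets you up for looks something like the sketch below; tags and branches are shown only as placeholders for later:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;testrepos/
  trunk/       &amp;lt;-- your directories of files go under here
  branches/    &amp;lt;-- alternate lines of development (later)
  tags/        &amp;lt;-- snapshots such as releases (later)
&lt;/code&gt;&lt;/pre&gt;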
&lt;p&gt;For certain commands, Subversion wants you to supply a comment, so if you don&amp;rsquo;t include &lt;code&gt;-m&lt;/code&gt; followed by a quoted string, it will try to start up an editor in which you enter the comment. I think it&amp;rsquo;s easier to add the comment to the command line with the &lt;code&gt;-m&lt;/code&gt; switch, and I don&amp;rsquo;t always include a comment&amp;mdash;an empty string between the quotation marks works just fine.&lt;/p&gt;
&lt;h2 id=&#34;svn03&#34;&gt;Adding a directory of files to the repository&lt;/h2&gt;
&lt;p&gt;For demonstration purposes, I have a directory called /home/bob/samplewd with two files in it: myfile1.txt and myfile2.txt. The following puts this directory and its contents into version control by importing it into Subversion:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn import -m &amp;quot;Adding first directory&amp;quot; /home/bob/samplewd file:///media/disk/testrepos/trunk/samplewd
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first argument after the &lt;code&gt;-m&lt;/code&gt; comment is the directory to import and the second is the URL for the place in Subversion where I want to import it.&lt;/p&gt;
&lt;h2 id=&#34;svn04&#34;&gt;Listing files in the repository&lt;/h2&gt;
&lt;p&gt;The command to list the files in a directory in the repository is simple enough:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn list file:///media/disk/testrepos/trunk/samplewd
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Again, when you first learn Subversion, it&amp;rsquo;s worth entering the help command for each new command that you try in order to learn more about it, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn help list
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;svn05a&#34;&gt;Creating a working copy of a directory from the repository&lt;/h2&gt;
&lt;p&gt;It would be nice if, after checking a directory of files into Subversion, you could then edit files in that directory and issue Subversion commands to pull the revised versions of those files into the repository, but that&amp;rsquo;s not the way Subversion works. It tracks files in a &amp;ldquo;working directory&amp;rdquo;, and it doesn&amp;rsquo;t know that the directory that it just copied into the repository is a working directory. You must tell Subversion to create a working directory, and you can have it do this anywhere you like. Some people like to delete their original directory and then check out the Subversion copy in its place to be the working directory. If you&amp;rsquo;re new to Subversion and doing this with files that matter to you, you&amp;rsquo;re better off renaming the existing directory as a backup and then checking out a new copy. I did this from my /home/bob directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; mv samplewd samplewd.bkp    # or in Windows, rename instead of mv
 svn checkout file:///media/disk/testrepos/trunk/samplewd samplewd
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second command here tells Subversion to create a child of the current directory called &lt;code&gt;samplewd&lt;/code&gt; as a working directory copy of the named directory in the Subversion repository.&lt;/p&gt;
&lt;h2 id=&#34;svn05&#34;&gt;Adding revised versions of files to the repository&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;commit&lt;/code&gt; command tells Subversion to pull any revised files from the working directory into the repository. After I checked the samplewd directory and its contents out of the repository as shown above, I edited myfile1.txt and entered this command from within the samplewd directory to commit the changes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn commit -m &amp;quot;made first edits&amp;quot; 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This commits all the changes in the current working directory to the repository.&lt;/p&gt;
&lt;h2 id=&#34;svn11&#34;&gt;Finding out the status of your working directory files&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;status&lt;/code&gt; command tells Subversion to list the status of files in the current directory. The default behavior is to list the status of files that have modifications that haven&amp;rsquo;t been committed to the repository. After making a few edits to myfile1.txt and creating a new file called myfile3.txt, the &lt;code&gt;status&lt;/code&gt; command gives me this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn status
?      myfile3.txt
M      myfile1.txt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This tells me that myfile3.txt is not under version control and that myfile1.txt has been Modified. (&amp;ldquo;svn help status&amp;rdquo; tells you about other codes that may appear at the beginning of each line.) myfile2.txt isn&amp;rsquo;t listed because Subversion has nothing to say about it: it&amp;rsquo;s in version control and hasn&amp;rsquo;t been changed since the last time it was put there.&lt;/p&gt;
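&lt;p&gt;(Not covered above, but worth knowing at this point: to see the actual edits behind that &amp;ldquo;M&amp;rdquo;, the &lt;code&gt;diff&lt;/code&gt; command shows the changes in your working copy that haven&amp;rsquo;t been committed yet:)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn diff myfile1.txt
&lt;/code&gt;&lt;/pre&gt;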
&lt;p&gt;To put myfile3.txt under version control, I use the &lt;code&gt;add&lt;/code&gt; command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn add myfile3.txt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(For filename arguments, svn usually accepts wildcards such as my*3.txt, which can save you some keystrokes. If you&amp;rsquo;re really interested in saving keystrokes, &amp;ldquo;svn help&amp;rdquo; shows you the abbreviations you can use for many commands, such as &lt;code&gt;ci&lt;/code&gt; instead of &lt;code&gt;commit&lt;/code&gt;.) Instead of actually putting myfile3.txt in the repository, this command only marks it for addition to the repository the next time you commit. The same &lt;code&gt;commit&lt;/code&gt; command that puts the edited version of myfile1.txt in will put myfile3.txt in as well, because when used with no arguments the &lt;code&gt;commit&lt;/code&gt; command applies to everything in the current directory under source control:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn commit -m &amp;quot;created new myfile3.txt file and edited myfile1.txt&amp;quot; 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, entering the following with no special parameters shows no output, which is always good to see when you&amp;rsquo;re finishing an editing session, because it means that your working directory is in sync with the repository:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn status
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;svn06&#34;&gt;Extracting updated versions of files from the repository&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;update&lt;/code&gt; command tells Subversion to update the current directory based on whatever&amp;rsquo;s in the repository. Let&amp;rsquo;s say I&amp;rsquo;ve revised several files on my Linux laptop and then committed the changes to the Subversion repository that I keep on a thumb drive. After I move the thumb drive to a Windows machine and make the corresponding directory my current working directory, I enter the following to update that directory from the repository copy:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;c:\some\path\samplewd\&amp;gt;svn update
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After I do that, the directory should have the same contents that the Linux samplewd directory did the last time I committed its contents to the repository. If I edit or add to the files in the working directory on the Windows machine, I&amp;rsquo;ll commit those changes to the repository on the thumb drive and then update the working directory in the Linux laptop&amp;rsquo;s directory from the thumb drive the next time I use it.&lt;/p&gt;
&lt;h2 id=&#34;svn07&#34;&gt;The repository and renamed or deleted files&lt;/h2&gt;
&lt;p&gt;If I rename a file on my laptop, I want Subversion to know that I did, because the next time I update the corresponding directory on my Windows machine, I don&amp;rsquo;t want to find copies of this file with both the old and new names. If I use Subversion to do the rename, I&amp;rsquo;ll only have the renamed version in all updated versions of the directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn rename myfile3.txt myfile3a.txt
A         myfile3a.txt
D         myfile3.txt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As the output shows, Subversion treats it as the addition of one file and the deletion of another, but this achieves what you want. Again, remember that the Subversion repository doesn&amp;rsquo;t really know that you did this until you commit your most recent changes. Once you do, Subversion knows that your myfile3a.txt file used to be your myfile3.txt file, and at which revision of your working directory the rename took place, so you can still go back and get access to the old pre-rename version.&lt;/p&gt;
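&lt;p&gt;For example, once you commit the rename, asking for the new file&amp;rsquo;s history will list the revisions from before the rename as well, because &lt;code&gt;log&lt;/code&gt; follows the copy history back. (The commit message here is just a made-up example; I&amp;rsquo;ve left the log output out because its revision numbers and dates would depend entirely on your own repository.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn commit -m &amp;quot;renamed myfile3.txt to myfile3a.txt&amp;quot;
$ svn log myfile3a.txt
&lt;/code&gt;&lt;/pre&gt;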
&lt;p&gt;The same logic applies to deletion of files: have Subversion do it for you, and it will remove it from all corresponding working directories while keeping the older version in the repository if you need it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn delete myfile2.txt
D         myfile2.txt
&lt;/code&gt;&lt;/pre&gt;
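&lt;p&gt;As with a rename, the deletion doesn&amp;rsquo;t actually reach the repository until you commit it. A hypothetical commit might look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn commit -m &amp;quot;deleted myfile2.txt&amp;quot;
&lt;/code&gt;&lt;/pre&gt;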
&lt;h2 id=&#34;svn09&#34;&gt;Resolving conflicting versions of a file&lt;/h2&gt;
&lt;p&gt;What if Jane and Jack make different edits to the same file in two different working directories that are supposed to reflect the same directory in the repository? Jane can commit her edited version with no problem, because Subversion doesn&amp;rsquo;t know that there&amp;rsquo;s a problem yet. When Jack tries to commit his changes, Subversion knows that Jack was editing something from before Jane&amp;rsquo;s round of committed edits, and it won&amp;rsquo;t put Jack&amp;rsquo;s version into the repository. Instead, it lets Jack know that there&amp;rsquo;s a problem and gives him information to straighten out the problem.&lt;/p&gt;
&lt;p&gt;To demonstrate what happens, I made one edit to a copy of myfile1.txt in one working directory and another to a copy in a second working directory. I committed the first one with no problem. An attempt to commit the second looked like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn commit -m &amp;quot;testing conflicts part 2&amp;quot;
Sending        myfile1.txt
svn: Commit failed (details follow):
svn: Out of date: &#39;/trunk/samplewd/myfile1.txt&#39; in transaction &#39;8-1&#39;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It looks like there&amp;rsquo;s a problem. The next step is to run Subversion&amp;rsquo;s &lt;code&gt;update&lt;/code&gt; command, which will help you sort the problem out.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn update
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This merges the two conflicting versions into a combined version in the current directory—if possible. In one of my tests, I added one new line at the beginning of one copy of the file and another new line at the end of another copy in a different working directory. After committing one, attempting to commit the other, and getting an error message similar to the one shown above, the &lt;code&gt;update&lt;/code&gt; operation in the second directory created a version of the file with the two new lines in their appropriate places. (This still needs to be committed to the repository.) As it does so, the &lt;code&gt;update&lt;/code&gt; command outputs the filename with a &amp;ldquo;G&amp;rdquo; status code for &amp;ldquo;merGed&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;What if the edits can&amp;rsquo;t be combined so easily and Subversion can&amp;rsquo;t merge the two versions? It outputs the filename with a status code of &amp;ldquo;C&amp;rdquo; for &amp;ldquo;Conflict&amp;rdquo; and provides you with plenty of information to help you fix the problem. For example, let&amp;rsquo;s say you checked out the file myfile1.txt from release 7 and then someone else committed a new version to the repository as release 8. When you unsuccessfully tried to commit your own revised version, Subversion would revise your myfile1.txt file to include some &lt;a href=&#34;http://en.wikipedia.org/wiki/Diff&#34;&gt;diff&lt;/a&gt; output (diff is the utility that shows the differences between versions) and it would create a few new files to help you fix the problem:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;myfile1.txt.r7&lt;/strong&gt; The release 7 version of the file, which was checked out and edited to create the two different conflicting versions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;myfile1.txt.r8&lt;/strong&gt; The release 8 version of the file, which was checked in from the other working directory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;myfile1.txt.mine&lt;/strong&gt; The version of the file in the current directory that conflicted with the repository.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
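&lt;p&gt;The diff output that Subversion inserts into myfile1.txt itself marks each conflicting region with separator lines. In a hypothetical conflict between my working copy and release 8, the relevant part of the file might look something like this (the &amp;ldquo;.r8&amp;rdquo; label will match whatever revision you conflicted with):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt;&amp;lt; .mine
the line as I edited it
=======
the line as committed from the other working directory
&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt; .r8
&lt;/code&gt;&lt;/pre&gt;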
&lt;p&gt;Use these to edit myfile1.txt until it looks the way you really want it, and then enter the following command to tell Subversion &amp;ldquo;I&amp;rsquo;ve resolved the problem and the current version of myfile1.txt is the one that I want as the official copy in the repository&amp;rdquo;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn resolved myfile1.txt
Resolved conflicted state of &#39;myfile1.txt&#39;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Of course, it didn&amp;rsquo;t &lt;em&gt;really&lt;/em&gt; resolve the conflict; you must execute another &lt;code&gt;commit&lt;/code&gt; command before that&amp;rsquo;s complete. The &lt;code&gt;resolved&lt;/code&gt; command also removes the extra files that the &lt;code&gt;update&lt;/code&gt; command created (in this case, the .r7, .r8, and .mine files) instead of leaving them there to clutter up your directory.&lt;/p&gt;
&lt;h2 id=&#34;svn10&#34;&gt;Reverting your working copy back to an earlier version from the repository&lt;/h2&gt;
&lt;p&gt;The simplest case for reverting a file to an earlier version is to tell Subversion to throw out your recent edits and replace a working copy with the most recent committed version from the repository. Let&amp;rsquo;s say that one afternoon&amp;rsquo;s round of edits led you down a blind alley and you want to go back to where you were at lunch time. You issue a &lt;code&gt;status&lt;/code&gt; command, which shows you that myfile1.txt has been modified, and you tell Subversion to revert to the last committed version. You then enter another &lt;code&gt;status&lt;/code&gt; command, which shows you that no files in your directory are out of sync with your repository versions:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn status
M      myfile1.txt
$ svn revert myfile1.txt
Reverted &#39;myfile1.txt&#39;
$ svn status
$ 
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you want to go back to an earlier version of a file in the repository, you&amp;rsquo;ll use the same command that you used to get files out of the repository in the first place: &lt;code&gt;update&lt;/code&gt;. Before you use it, remember to either commit your current version to the repository or revert as shown above; Subversion doesn&amp;rsquo;t want you to lose any of your work, so if you pull an older version of a file out of the repository and Subversion sees uncommitted modifications in your file, it will create &lt;a href=&#34;#i2&#34;&gt;all the extra files&lt;/a&gt; that it uses to help you resolve any conflict between your existing working copy and the &amp;ldquo;official&amp;rdquo; one.&lt;/p&gt;
&lt;p&gt;The following tells Subversion to replace the working directory&amp;rsquo;s version of myfile1.txt with the one from revision 6:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn update -r 6 myfile1.txt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(When referencing a particular version, the space after the &amp;ldquo;-r&amp;rdquo; is optional.) The online help for the &lt;code&gt;update&lt;/code&gt; command lists some handy keywords that you can use instead of a number to specify a particular revision, such as &amp;ldquo;PREV&amp;rdquo; to get the version previous to the most recent committed version.&lt;/p&gt;
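&lt;p&gt;For example, assuming that myfile1.txt has no uncommitted modifications, this hypothetical command replaces the working copy with the version before the most recently committed one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn update -r PREV myfile1.txt
&lt;/code&gt;&lt;/pre&gt;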
&lt;h2 id=&#34;svn08&#34;&gt;Looking at old versions of files without reverting to them&lt;/h2&gt;
&lt;p&gt;If you only want to look at an older version of a file without making it your working copy version, a simpler command is available. The following command tells Subversion to display the contents of myfile1.txt from release 3 of the committed working directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn cat -r 3 myfile1.txt
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;cat&lt;/code&gt; command can use the same keywords as &lt;code&gt;update&lt;/code&gt; in place of specific version numbers.&lt;/p&gt;
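&lt;p&gt;Because &lt;code&gt;cat&lt;/code&gt; writes to standard output, you can also redirect it to save an old version under a scratch name without disturbing your working copy. (The output filename here is just one I made up for illustration.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ svn cat -r 3 myfile1.txt &amp;gt; myfile1-r3.txt
&lt;/code&gt;&lt;/pre&gt;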
&lt;h2 id=&#34;xwrezklKTMitZcDBpEPC7w&#34;&gt;Summary&lt;/h2&gt;
&lt;p&gt;Once you&amp;rsquo;ve put a directory of files (and potentially subdirectories) into a repository, then checked them out into a working directory, your typical procedure for a session of working with those files is to issue an &lt;code&gt;update&lt;/code&gt; command to ensure that the directory has the most recent versions, then make your edits, then issue a &lt;code&gt;commit&lt;/code&gt; command to put your revised versions into the repository. (I almost wrote &amp;ldquo;to put your &lt;em&gt;updated&lt;/em&gt; versions into the repository&amp;rdquo;, and it wouldn&amp;rsquo;t be the first time that re-use of Subversion vocabulary with slightly different semantics led me to a bit of confusion.) To reiterate one point I&amp;rsquo;ve already made several times, remember that many important Subversion commands have no effect until the next time you issue a &lt;code&gt;commit&lt;/code&gt; command.&lt;/p&gt;
&lt;p&gt;Subversion offers many more commands than we&amp;rsquo;ve seen here. You can specify files in your working directory for it to ignore, you can list the user names of who made which changes when, you can store repositories on remote systems, you can change the URL used to reference a particular repository if it got moved to a different server, you can fork off a development effort into multiple distinct branches&amp;hellip; you can do all kinds of cool things. Garrett Rooney&amp;rsquo;s two-part article is a great place to learn more about these features. Also, remember that I&amp;rsquo;ve only skimmed the surface of the commands that I did cover, so don&amp;rsquo;t forget to enter &amp;ldquo;svn help command&amp;rdquo; for each of the above commands after you try them to learn about what else they can do.&lt;/p&gt;
&lt;p&gt;For some more interesting thoughts on Subversion, see &lt;a href=&#34;http://norman.walsh.name/2007/07/19/mercurial&#34;&gt;this recent blog posting by Norm Walsh&lt;/a&gt; and Joey Hess&amp;rsquo;s &lt;a href=&#34;http://www.onlamp.com/pub/a/onlamp/2005/01/06/svn_homedir.html&#34;&gt;Keeping Your Life in Subversion&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.peterkrantz.com&#34; title=&#34;http://www.peterkrantz.com&#34;&gt;Peter Krantz&lt;/a&gt; on &lt;a href=&#34;#comment-1130&#34;&gt;August 14, 2007 11:16 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A great way to get started with subversion is to install Mercurial :-) I had a look at Mercurial after a friend suggested it and after you get used to the distributedness of it, it is very pleasant. Subversion users will find commands familiar.&lt;/p&gt;
&lt;p&gt;By Simon Rozet on &lt;a href=&#34;#comment-1131&#34;&gt;August 14, 2007 11:30 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Peter Krantz wrote:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A great way to get started with subversion is to install Mercurial :-)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Oh yes, definitely! hg is IMHO a lot more simpler than svn. You can create a new &amp;ldquo;repo&amp;rdquo; in a second, create a new branch for free and it have real branching and merging capabilities.&lt;/p&gt;
&lt;p&gt;You should really give it a try :-)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1132&#34;&gt;August 14, 2007 12:25 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s why I linked to Norm&amp;rsquo;s recent posting. (Although, according to a more &lt;a href=&#34;http://norman.walsh.name/2007/08/09/mercurial&#34;&gt;recent posting&lt;/a&gt;, he&amp;rsquo;s been having some problems with Mercurial.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Some great W3C explanations of basic ontology concepts</title>
      <link>https://www.bobdc.com/blog/some-great-w3c-explanations-of/</link>
      <pubDate>Mon, 06 Aug 2007 10:23:57 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/some-great-w3c-explanations-of/</guid>
      
      
      <description><div>Highlights from the OWL Use Cases and Requirements</div><div>&lt;p&gt;While reading the W3C Recommendation &lt;a href=&#34;http://www.w3.org/TR/2004/REC-webont-req-20040210/&#34;&gt;OWL Use Cases and Requirements&lt;/a&gt;, I was surprised at how many nice, succinct explanations of basic OWL and ontology-related concepts it had, so I thought I&amp;rsquo;d reproduce some highlights here. For example, take its definition of an ontology:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;An ontology formally defines a common set of terms that are used to describe and represent a domain&amp;hellip;An ontology defines the terms used to describe and represent an area of knowledge.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote class=&#34;pullquote&#34;&gt;What the hell is a conceptualization? &lt;/blockquote&gt;
&lt;p&gt;The most popular definition of ontology is &amp;ldquo;a formalization of a conceptualization&amp;rdquo;, as if the readers are going to sit back and say &amp;ldquo;OK! Formalization! Conceptualization! Got it! All set here!&amp;rdquo;, except that they&amp;rsquo;re really thinking &amp;ldquo;What the hell is a conceptualization?&amp;rdquo; I much prefer the OWL Use Cases explanation, which while not being as &amp;ldquo;formal&amp;rdquo;, at least explains what&amp;rsquo;s going on.&lt;/p&gt;
&lt;p&gt;Here are some more nice quotes from the Use Cases document, arranged as a fake FAQ:&lt;/p&gt;
&lt;p&gt;Why would anyone use an ontology?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Ontologies are used by people, databases, and applications that need to share domain information (a domain is just a specific subject area or area of knowledge, like medicine, tool manufacturing, real estate, automobile repair, financial management, etc.). Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them&amp;hellip;Ontologies can prove very useful for a community as a way of structuring and defining the meaning of the metadata terms that are currently being collected and standardized.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I keep seeing the word &amp;ldquo;ontology&amp;rdquo; used to describe different things. What&amp;rsquo;s up with that?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;(note that here and throughout this document, [the] definition [of ontology] is not used in the technical sense understood by logicians)&amp;hellip; The word ontology has been used to describe artifacts with different degrees of structure. These range from simple taxonomies (such as the Yahoo hierarchy), to metadata schemes (such as the Dublin Core), to logical theories.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What is the Semantic Web, and where do ontologies fit in?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Semantic Web is a vision for the future of the Web in which information is given explicit meaning, making it easier for machines to automatically process and integrate information available on the Web. The Semantic Web will build on XML&amp;rsquo;s ability to define customized tagging schemes and RDF&amp;rsquo;s flexible approach to representing data. The next element required for the Semantic Web is a web ontology language which can formally describe the semantics of classes and properties used in web documents. In order for machines to perform useful reasoning tasks on these documents, the language must go beyond the basic semantics of RDF Schema.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;How can OWL do this?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The &lt;a href=&#34;http://www.w3.org/2001/sw/WebOnt/charter&#34;&gt;Web Ontology Working Group charter&lt;/a&gt; tasks the group to produce this more expressive semantics and to specify mechanisms by which the language can provide &amp;ldquo;more complex relationships between entities including: means to limit the properties of classes with respect to number and type, means to infer that items with various properties are members of a particular class, a well-defined model of property inheritance, and similar semantic extensions to the base languages.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The Semantic Web needs ontologies with a significant degree of structure. These need to specify descriptions for the following kinds of concepts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Classes (general things) in the many domains of interest&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The relationships that can exist among things&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The properties (or attributes) those things may have&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I don&amp;rsquo;t know much about &lt;a href=&#34;http://www.w3.org/TR/2004/REC-rdf-schema-20040210/&#34;&gt;RDF Schema&lt;/a&gt;. What does it let you do, and what does OWL add?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;With RDF Schema, one can define classes that may have multiple subclasses and super classes, and can define properties, which may have sub properties, domains, and ranges. In this sense, RDF Schema is a simple ontology language. However, in order to achieve interoperation between numerous, autonomously developed and managed schemas, richer semantics are needed. For example, RDF Schema cannot specify that the Person and Car classes are disjoint, or that a string quartet has exactly four musicians as members.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The whole Use Cases document is worth checking out.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.michaelgaio.com&#34; title=&#34;http://www.michaelgaio.com&#34;&gt;Michael Gaio&lt;/a&gt; on &lt;a href=&#34;#comment-1112&#34;&gt;August 7, 2007 5:13 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t know much (yet) about OWL or RDF, but I would imagine&amp;ndash;in addition to the semantic web needing defined Classes, Relationships (Connections), and Properties (Attributes)&amp;ndash;that it will also benefit from having Methods (or functions). Essentially, when the structural description of knowledge domains approach a more dynamic programic / computation element&amp;ndash;we will be establishing a &amp;ldquo;Web 3.0&amp;rdquo; framework for bringing rich semantics into the simulation worlds (virtual world arenas). Imagine an ontological knowledge domain (or taxonomy) for arborology (the study of trees). In a virtual environment, actual landscapes can be generated and informed (as an intersection of multiple ontologies). We may even find that other normally unrelated information ontologies with similar structures (or methodical structures) can borrow or interface with a &amp;ldquo;tree&amp;rdquo; ontology as a good fit for information visualization. In this way, virtual simulation becomes a most nuanced and dynamic way of communication.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1113&#34;&gt;August 8, 2007 11:47 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s really more about open data than a large, distributed OO system, which people have tried to do for years without success. By making lots of documented data available, people can then define whatever methods they want around that data to build their applications, and other people can define other methods instead of having everyone work around the same methods defined with the data.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Another great XML Summer School in Oxford</title>
      <link>https://www.bobdc.com/blog/another-great-xml-summer-schoo/</link>
      <pubDate>Mon, 30 Jul 2007 09:13:39 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/another-great-xml-summer-schoo/</guid>
      
      
      <description><div>Despite some flooding here and there.</div><div>&lt;p&gt;The eighth meeting of the &lt;a href=&#34;http://www.xmlsummerschool.com&#34;&gt;XML Summer School&lt;/a&gt; sponsored by &lt;a href=&#34;http://www.csw.co.uk/&#34;&gt;The CSW Group&lt;/a&gt; at Oxford University was another great one, with Norm Walsh and Dan Connolly being excellent additions to the list of XML luminaries presenting. Norm started a &lt;a href=&#34;http://www.flickr.com/groups/xmlsummerschool2007/&#34;&gt;flickr group&lt;/a&gt; for the summer school, and I just added a few pictures.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.flickr.com/photos/bobdc/946074390/&#34;&gt;&lt;img src=&#34;http://farm2.static.flickr.com/1386/946074390_69079b8355_m.jpg&#34; alt=&#34;[Cherwell punts at high water]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Judging by an email I received from my mother while I was in Oxford, the flooding in Oxfordshire and Gloucestershire was big enough news to be reported in the US, but it didn&amp;rsquo;t affect the Wadham College campus. We did hear tales of lost power, lost water, flooded homes, and bad transportation delays from several locals, though, and we saw ducks and a swan swimming around the giant pond that had been the Queens College cricket fields.&lt;/p&gt;
&lt;p&gt;The flooded train tracks between Oxford and Didcot prevented my family and me from visiting Bath, which was one of our tourism plans, and it also prevented Leigh Dodds from joining the &lt;a href=&#34;https://www.bobdc.com/blog/come-join-the-oxfordshire-sema&#34;&gt;Oxfordshire Semantic Web Interest Group&lt;/a&gt; meeting held Wednesday evening. There were still several interesting presentations at the SWIG, and I look forward to querying the structured data in Wikipedia (for example, the structured data about each Simpsons&amp;rsquo; episode shown in a &lt;a href=&#34;http://en.wikipedia.org/wiki/Old_Yeller_Belly&#34;&gt;box down the right of each page&lt;/a&gt;) using SPARQL after hearing about the &lt;a href=&#34;http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData&#34;&gt;Linking Open Data&lt;/a&gt; project from Dan Connolly&amp;rsquo;s presentation. The meeting reminded me how much face-to-face club meetings have been replaced by technology as a way for people with similar interests to share them; doing it the old-fashioned way has some nice advantages.&lt;/p&gt;
&lt;p&gt;This year also saw more offspring of XML people than ever before. In addition to my wife Jennifer and me bringing along our two daughters, Jeni Tennison, Lauren Wood, and Paul Prescod each brought a daughter, and John Chelsom&amp;rsquo;s wife Angela arranged a tea for them all at the famous old &lt;a href=&#34;http://www.randolph-hotel.com/&#34;&gt;Randolph Hotel&lt;/a&gt; that all the kids enjoyed. Angela was also kind enough to arrange for my daughters to go horseback riding at a local stable, but the torrential rains prevented the actual riding. The girls still managed to get some English mud on their riding boots, and they still have some funny stories about the trip to tell their friends at the barn where they ride at home in Virginia.&lt;/p&gt;
&lt;p&gt;Despite the floods and the transitional state of the CSW staff running the event, they all did a fine job, especially new members of the staff Rose Barnard and second-generation CSW employee Jo Nurse. For the Wednesday evening social events, we were all very happy to see the newly-married Kerry Poulter join us; she ran the summer schools before Rose joined this year and has since moved on to an energy company where they all seem surprised by the depth of her technical knowledge. We&amp;rsquo;ll all miss Omar Tamer, the tech guy and heavy lifter for the ladies, who waited for the end of the summer school before moving on to his new position at &lt;a href=&#34;http://www.o2.com/&#34;&gt;O&lt;sub&gt;2&lt;/sub&gt;&lt;/a&gt;. I hope that his replacement is as good at configuring wireless routers from both Windows and Linux machines. And as always, Sara Price serenely watched over the proceedings and provided calm assistance to all attendees and CSW employees who needed it.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.flickr.com/photos/bobdc/944107881/&#34;&gt;&lt;img src=&#34;http://farm2.static.flickr.com/1378/944107881_b6ceeb4c66_m.jpg&#34; alt=&#34;[reading Harry Potter on the way to Oxford]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Related reading material: I like to match my travel reading to my destination, and I heartily recommend Joe Queenan&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=031242521X/bobducharmeA/&#34;&gt;Queenan Country&lt;/a&gt; as reading material when traveling in England. His survey of trashy American culture &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0786884088/bobducharmeA&#34;&gt;Red Lobster, White Trash, and Blue Lagoon&lt;/a&gt; was one of the funniest books I&amp;rsquo;ve read in the last few years, and his account of his first big solo tour of the UK without his English wife is also hilarious. (I even learned that some of Elizabeth Taylor and Richard Burton&amp;rsquo;s earlier liaisons took place in the Oxford hotel where my family and I stayed.) Despite what the Amazon reviews say, it&amp;rsquo;s funnier and better written than Bill Bryson&amp;rsquo;s &amp;ldquo;Notes from a Small Island&amp;rdquo;, which I never even finished.&lt;/p&gt;
&lt;p&gt;We were in Germany the day that &amp;ldquo;Harry Potter and the Deathly Hallows&amp;rdquo; came out, and luckily the British edition was easy to find in Heidelberg. My daughters, who had just re-read the first six books in anticipation of the seventh, each bought a copy and stayed pretty focused until they finished it. The picture on the right shows them on the bus from Heathrow to Oxford.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-1095&#34;&gt;July 30, 2007 2:14 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bill Bryson&amp;rsquo;s books shouldn&amp;rsquo;t even be donated; they should be used as landfill. He makes at least one error per page.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.rafterjumpon.com&#34; title=&#34;http://www.rafterjumpon.com&#34;&gt;Emily&lt;/a&gt; on &lt;a href=&#34;#comment-1096&#34;&gt;July 30, 2007 2:54 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob-&lt;br /&gt;
I was very impressed with your up-to-date and knowledgeable blog. We are starting a new website and are looking for creative writers, journalists, and photojournalists to act as correspondants. Please go to &lt;a href=&#34;http://www.rafterjumpon.com&#34;&gt;http://www.rafterjumpon.com&lt;/a&gt; and submit an application today!&lt;br /&gt;
-Emily&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Come join the Oxfordshire Semantic Web Interest Group</title>
      <link>https://www.bobdc.com/blog/come-join-the-oxfordshire-sema/</link>
      <pubDate>Tue, 10 Jul 2007 19:47:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/come-join-the-oxfordshire-sema/</guid>
      
      
      <description><div>Rescheduling their meeting to accommodate the XML Summer School.</div><div>&lt;p&gt;Eamonn Neylon has kindly moved the monthly meeting of the &lt;a href=&#34;http://swig.networkedplanet.com/&#34;&gt;Oxfordshire Semantic Web Interest Group&lt;/a&gt; to later in the month than usual so that people who are in Oxford for the &lt;a href=&#34;http://www.xmlsummerschool.com/&#34;&gt;XML Summer School&lt;/a&gt; can attend. To make it even easier for them, the meeting will be at &lt;a href=&#34;http://www.wadham.ox.ac.uk/&#34;&gt;Wadham College&lt;/a&gt;, where the rest of the summer school takes place.&lt;/p&gt;
&lt;p&gt;Instead of one presenter, there will be a series of lightning talks, and he&amp;rsquo;s encouraging people to contact him at [his first name]@xmlopen.org if they&amp;rsquo;d like to do one of the talks. I&amp;rsquo;m not sure what I&amp;rsquo;ll speak about, but I&amp;rsquo;ll think of something.&lt;/p&gt;
&lt;p&gt;Even if you&amp;rsquo;ve never been to an Oxon SWIG meeting and you&amp;rsquo;re not taking part in the summer school, if you&amp;rsquo;re in the Oxford area and interested in semweb work, please come by. I&amp;rsquo;m looking forward to the chance to meet people who&amp;rsquo;ve been just names (and perhaps weblogs) to me so far. So come on by and speak or listen or hang out!&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ldodds.com/blog&#34; title=&#34;http://www.ldodds.com/blog&#34;&gt;Leigh Dodds&lt;/a&gt; on &lt;a href=&#34;#comment-1043&#34;&gt;July 11, 2007 4:56 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks for the reminder. The conjunction of so many geeks is just too good to miss!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Semantic Web project ideas number 6</title>
      <link>https://www.bobdc.com/blog/semantic-web-project-ideas-num-5/</link>
      <pubDate>Fri, 06 Jul 2007 09:27:09 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-project-ideas-num-5/</guid>
      
      
      <description><div>A form-driven front end to a SPARQL engine (as opposed to a SPARQL front end).</div><div>&lt;p&gt;I&amp;rsquo;ve only written up about half of my &lt;a href=&#34;http://www.snee.com/bobdc.blog/metadata/semantic_web/project_ideas/&#34;&gt;list of semantic web project ideas&lt;/a&gt; first described &lt;a href=&#34;https://www.bobdc.com/blog/semantic-web-project-ideas-num#JbCDa2KDRty2dcZb7YDo0Q&#34;&gt;here&lt;/a&gt;, and the &lt;a href=&#34;http://www.mindswap.org/blog/2007/07/01/2007-semantic-web-challenge/&#34;&gt;2007 Semantic Web Challenge&lt;/a&gt; has just been announced, so I wanted to get another one of the ideas out there.&lt;/p&gt;
&lt;blockquote class=&#34;pullquote&#34;&gt;An RDF report generator could be the RDF killer app.&lt;/blockquote&gt;
&lt;p&gt;A GUI front end to SPARQL that generates SPARQL queries without requiring someone to know SPARQL—in short, an RDF report generator—could be the RDF killer app, because providing useful access to data that made more sense in a triplestore than in a relational database would provide a vivid demonstration of the value of the RDF data model to people who wanted to avoid hands-on experience with RDF or SPARQL syntax. The project isn&amp;rsquo;t ultimately about SPARQL, although obviously implementing it would require a high comfort level with the query language. It&amp;rsquo;s about proving the value of the RDF data model for data that typical end users need.&lt;/p&gt;
&lt;p&gt;Submitting queries to an RDF front end to relational data wouldn&amp;rsquo;t prove anything. For example, imagine that you put an RDF interface in front of a movie relational database application and then created a form-based interface to send SPARQL queries to that RDF interface. Jane Moviefan could fill out those forms to look up more movies by the director of the film she loved so much last night, but so what? Forms to generate the SQL queries to send to the relational database are easy enough to create, and additional moving parts inside the black box won&amp;rsquo;t make any difference to Jane.&lt;/p&gt;
&lt;p&gt;What would make a difference to the end users would be the ability to query against data that didn&amp;rsquo;t fit well into relational databases. These users may not care about relational versus non-relational storage, but they should have a shot at an &amp;ldquo;aha&amp;rdquo; moment when using this application with data that they care about and can&amp;rsquo;t conveniently gain access to elsewhere. What data fits in RDF triplestores better than it fits into normalized relational tables? Who wants to query this data apart from RDF geeks who are comfortable with SPARQL syntax? These questions will help lay out the use cases for this application.&lt;/p&gt;
&lt;p&gt;Reducing the domain scope would be important, because a general-purpose query form that worked for all RDF data but doesn&amp;rsquo;t require query language syntax would be difficult. In other words, a general-purpose front end to generate any SPARQL query for any domain would be too ambitious. Finding useful applications within a given domain for the following queries would be a good start—I wouldn&amp;rsquo;t be surprised if such a tool already exists in the world of biology and pharmaceuticals—because these queries would be easy enough to generate from a forms-based front end:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Show all data about the subject with a dc:title value of &amp;ldquo;Lost in Translation&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Show a subset of data about the subject with a particular dc:title based on entered criteria&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;List all subjects with an imdb:director value of &lt;a href=&#34;http://www.imdb.com/name/nm0001068/&#34;&gt;imdbn:nm0001068&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Show all data about subjects with an imdb:director value of imdbn:nm0001068&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Show selected predicate/object (name-value) pairs for subject imdbn:nm0001068&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
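&lt;p&gt;As a rough sketch of what would happen behind such a form (assuming the usual Dublin Core namespace for &lt;code&gt;dc:&lt;/code&gt;, with the title filled in from a text field), the generated SPARQL for the first query above might look something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;

# show all predicate/object pairs for the subject with this title
SELECT ?property ?value
WHERE {
  ?movie dc:title &amp;quot;Lost in Translation&amp;quot; .
  ?movie ?property ?value .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each of the other queries in the list is a similarly small variation on this pattern, which is why a forms-based front end could generate all of them without the user ever seeing the syntax.&lt;/p&gt;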
&lt;p&gt;I realize that after saying that a movie database would not be a good example because such data fits just fine into a relational database, I&amp;rsquo;m using movie data for use case queries. A good example database would take advantage of the added flexibility of RDF, so that drop-down selections of &amp;ldquo;fields&amp;rdquo; to query when retrieving &amp;ldquo;records&amp;rdquo; (while not technically correct terms, &amp;ldquo;field&amp;rdquo; and &amp;ldquo;record&amp;rdquo; could still be helpful metaphors for the end-user to understand how to use the query tool) would be based on the available data, not on some schema that predates the data—which I&amp;rsquo;m sure is the case with &lt;a href=&#34;http://www.imdb.com/&#34;&gt;IMDB&lt;/a&gt;. For flexible data that lots of people care about, I like to use &lt;a href=&#34;https://www.bobdc.com/blog/all-the-personal-data-you-want&#34;&gt;address book data&lt;/a&gt; in my own examples.&lt;/p&gt;
&lt;p&gt;Hooking up an XForms front end to an RDF triplestore back end would be an interesting place to start. Also check out Leigh Dodds&amp;rsquo; &lt;a href=&#34;http://www.ldodds.com/projects/twinkle/&#34;&gt;Twinkle&lt;/a&gt;; it&amp;rsquo;s an aid to constructing SPARQL queries to submit to a SPARQL engine rather than a forms-based interface that hides the actual queries from the user, but plenty of good work has gone into it. I haven&amp;rsquo;t had a chance to take a close look at Oracle&amp;rsquo;s RDF offerings, but if they&amp;rsquo;re as far along with it as they say, maybe they have a way to create forms that generate queries against the RDF for use by people who know nothing about RDF. I look forward to finding out.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://fgiasson.com/blog/&#34; title=&#34;http://fgiasson.com/blog/&#34;&gt;Fred&lt;/a&gt; on &lt;a href=&#34;#comment-1015&#34;&gt;July 6, 2007 3:03 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;I think you could be interested in the &lt;a href=&#34;http://fgiasson.com/blog/index.php/2007/03/26/zitgist-search-query-interface-a-new-search-engine-paradigm/&#34;&gt;Zitgist Query Builder&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It should be released some time this summer.&lt;/p&gt;
&lt;p&gt;Take care,&lt;/p&gt;
&lt;p&gt;Fred&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1016&#34;&gt;July 6, 2007 3:08 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Fred,&lt;/p&gt;
&lt;p&gt;It looks great, and I look forward to playing with it.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By Kendall Clark on &lt;a href=&#34;#comment-1017&#34;&gt;July 6, 2007 3:50 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is what our RDF client, JSpace, does: builds RDF queries (SPARQL, SeRQL, whatever &amp;ndash; this isn&amp;rsquo;t really the interesting part, as you note) in response to user input in a GUI, displays the results, rinse-and-repeat. :&amp;gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not totally mature yet, but getting there. We&amp;rsquo;ve got a lot more stuff to do it, including different column types (geo/maps, datatype literal filters, etc) and more expressivity, but the basic ideas are pretty much laid in.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve found that getting it in use by real people very early on has helped a lot.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://clarkparsia.com/projects/code/jspace/&#34;&gt;http://clarkparsia.com/projects/code/jspace/&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
      <category domain="https://www.bobdc.com//categories/project-ideas">project ideas</category>
      
    </item>
    
    <item>
      <title>XML Summer School in Oxford</title>
      <link>https://www.bobdc.com/blog/xml-summer-school-in-oxford-1/</link>
      <pubDate>Sun, 01 Jul 2007 09:18:55 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/xml-summer-school-in-oxford-1/</guid>
      
      
      <description><div>Teaching and learning XML with old and new friends.</div><div>&lt;p&gt;I wrote &lt;a href=&#34;https://www.bobdc.com/blog/xml-summer-and-oxford&#34;&gt;last year&lt;/a&gt; about how much I was looking forward to going to the &lt;a href=&#34;http://www.xmlsummerschool.com&#34;&gt;XML Summer School&lt;/a&gt; at Oxford University, and I&amp;rsquo;m looking forward to it even more this year, because my wife and daughters will come with me. (Not to the classes, but certainly to several of the social events, and there are &lt;a href=&#34;http://www.xmlsummerschool.com/social.html&#34;&gt;plenty of those&lt;/a&gt;.) It will be held at Wadham College again; the picture shows Wadham&amp;rsquo;s beautiful chapel, which adjoins the room where they usually hold the opening reception.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.xmlsummerschool.com&#34;&gt;&lt;img src=&#34;http://www.liedertafel.org/img/wadham1.jpg&#34; alt=&#34;[Wadham chapel]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Most of what I wrote last year still applies, and &lt;a href=&#34;http://www.xmlgrrl.com/blog/archives/2007/06/13/hot-xml-fun-in-the-summertime/&#34;&gt;Eve Maler&lt;/a&gt;, &lt;a href=&#34;http://kontrawize.blogs.com/kontrawize/2007/06/xml-summer-scho.html&#34;&gt;Tony Coates&lt;/a&gt;, and &lt;a href=&#34;http://www.laurenwood.org/anyway/archives/2007/06/27/summer-in-oxford/&#34;&gt;Lauren Wood&lt;/a&gt; already wrote about this year&amp;rsquo;s upcoming event. One great addition to the lineup this year is &lt;a href=&#34;http://www.xmlsummerschool.com/speakers/danconnolly.html&#34;&gt;Dan Connolly&lt;/a&gt;, who will join the &lt;a href=&#34;http://www.xmlsummerschool.com/curriculum/trendsandtransients.html&#34;&gt;Trends and Transients&lt;/a&gt; track.&lt;/p&gt;
&lt;p&gt;The distinguished personnel and fascinating topics of the XSLT-and-related track that I&amp;rsquo;m chairing will remain the same. One change to this track&amp;rsquo;s authors&amp;rsquo; accomplishments in their fields is that Priscilla Walmsley, who will teach the XSL-FO and XQuery classes again, just saw O&amp;rsquo;Reilly publish her &lt;a href=&#34;http://www.oreilly.com/catalog/9780596006341/&#34;&gt;XQuery book&lt;/a&gt;. Of the others teaching in this track, XSLT experts don&amp;rsquo;t get any more expert than Michael Kay and Jeni Tennison, and Paul Prescod&amp;rsquo;s explanation of XSLT use in popular AJAX applications taught me that there was a lot more XSLT under the hood of many applications I use often than I had realized, in addition to giving me a firmer grounding in what really goes into an AJAX application.&lt;/p&gt;
&lt;p&gt;The Trends and Transients track includes a section where each track chair gets to &amp;ldquo;rant&amp;rdquo; about something. I put &amp;ldquo;rant&amp;rdquo; in quotes because Ian Forrester put my five-minute talk from last year &lt;a href=&#34;http://www.youtube.com/watch?v=mTPA6YVwnCg&#34;&gt;on YouTube&lt;/a&gt;, and while I don&amp;rsquo;t think I was ranting that much, it&amp;rsquo;s still titled &amp;ldquo;Bob DuCharme rants&amp;rdquo; because that section of the program was billed as track chairs ranting. It went well, and people laughed at my jokes, but it was hot, and in addition to the fan that you see next to me there are fans near Ian, and they gave me a little too much competition for the audio attention of Ian&amp;rsquo;s video recorder. (I expanded on the topic of the rant &lt;a href=&#34;https://www.bobdc.com/blog/what-data-is-your-metadata-abo&#34;&gt;in this weblog&lt;/a&gt; about a month after the summer school.)&lt;/p&gt;
&lt;p&gt;The social events that are an integral part of the schedule are a lot of fun and always in very interesting settings. I know my kids will want to catch the newest event this year, a tour of the 400-year-old &lt;a href=&#34;http://en.wikipedia.org/wiki/Bodleian_Library&#34;&gt;Bodleian Library&lt;/a&gt;, although the library&amp;rsquo;s role as the Hogwarts infirmary in the first two Harry Potter movies will appeal to them more than the library&amp;rsquo;s age.&lt;/p&gt;
&lt;p&gt;There always seems to be plenty of wine with dinner and beer wherever we go afterwards, so it takes some discipline the night before presenting some sessions to go easy on the alcohol. For more on how much learning and fun the week provides, see also my &lt;a href=&#34;https://www.bobdc.com/blog/xml-summer-school-in-oxford&#34;&gt;wrapup&lt;/a&gt; from after the event last year. And it&amp;rsquo;s not too late to sign up and join us!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Women in computing: what about the cultural variable?</title>
      <link>https://www.bobdc.com/blog/women-in-computing-what-about/</link>
      <pubDate>Fri, 22 Jun 2007 08:57:37 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/women-in-computing-what-about/</guid>
      
      
      <description><div>Why do I see more women programmers among Eastern Europeans?</div><div>&lt;p&gt;A recent &lt;a href=&#34;http://www.devchix.com/2007/06/09/let%e2%80%99s-all-evolve-past-this-the-barriers-women-face-in-tech-communities/&#34;&gt;devchix&lt;/a&gt; blog post has inspired a lot of discussion about the low percentage of women software engineers out there. There&amp;rsquo;s been plenty of discussion in the XML community, as &lt;a href=&#34;http://www.tbray.org/ongoing/When/200x/2007/06/20/Women&#34;&gt;Tim Bray&lt;/a&gt;, &lt;a href=&#34;http://www.laurenwood.org/anyway/archives/2007/06/14/tech-women/&#34;&gt;Lauren Wood&lt;/a&gt;, &lt;a href=&#34;http://burningbird.net/technology/women-evidentally-dont-program/&#34;&gt;Shelley Powers&lt;/a&gt;, &lt;a href=&#34;http://www.jenitennison.com/blog/node/33&#34;&gt;Jeni Tennison&lt;/a&gt;, &lt;a href=&#34;http://times.usefulinc.com/2007/06/21-jeni&#34;&gt;Edd Dumbill&lt;/a&gt; and &lt;a href=&#34;http://www.megginson.com/blogs/quoderat/2007/06/21/maybe-the-women-are-right/&#34;&gt;David Megginson&lt;/a&gt; have contributed thoughtful comments. Everyone says that there are a lot fewer women than men writing code, especially in the US, the UK, and western Europe. OK, to be honest, I haven&amp;rsquo;t seen anyone include this qualifier, which is very interesting. (Note that Jeni, who is British, shows &lt;a href=&#34;http://www.jenitennison.com/blog/node/30&#34;&gt;graphs&lt;/a&gt; of male vs. female computer science degrees based on purely American data, or so I assume from the reference to &amp;ldquo;national data&amp;rdquo; in the &lt;a href=&#34;http://www.nsf.gov/statistics/nsf07307/tables/tab34.xls&#34;&gt;nsf.gov source&lt;/a&gt; of the data points.)&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;western Europe&amp;rdquo; part is particularly important, because I&amp;rsquo;d like to avoid the minefield of generalizations based on race or ethnicity. While it would be almost trite to say that Japanese attitudes about a culture of consensus are more in line with what devchix&amp;rsquo;s gloriajw is looking for, I&amp;rsquo;ve never personally known any programmers of either sex from that country. Professionally and personally, though, I&amp;rsquo;d say that at least half of the programmers I&amp;rsquo;ve known from former Soviet bloc countries are women, and I don&amp;rsquo;t expect to hear about a culture of consensus trumping pissing-contest swagger in those countries.&lt;/p&gt;
&lt;p&gt;When I was getting a computer science degree, it was always interesting in the first meeting of each large lecture course to do the mental pie graph of how many men and how many women were in each group. That&amp;rsquo;s when I first noticed that among those speaking Eastern European languages (not that I was very good at identifying exactly which languages were being spoken), there were plenty of women, perhaps even fifty percent. In workplaces since then, while I haven&amp;rsquo;t known dozens of Eastern European émigrés, women were well-represented among the ones I&amp;rsquo;ve known.&lt;/p&gt;
&lt;p&gt;Why is this? Does it stem from attitudes about the value of engineering as a profession in those countries? Is any of this a legacy of the Soviet system? Are there things that they&amp;rsquo;re doing right that we should emulate—beyond the obvious one of encouraging a good math education among both sexes—and how would such recommendations relate to the issues that the gloriajw and Jeni bring up? Am I making a mistake by basing my generalization on those who&amp;rsquo;ve emigrated to the US?&lt;/p&gt;
&lt;p&gt;My older daughter was in an after-school math club this year because her friend Diana begged her to join. Diana&amp;rsquo;s Romanian parents (her dad teaches chip design at the University of Virginia and her mom has been on maternity leave from a Java coding job) forced her to join, because they have strong ideas about the importance of a good math education. I&amp;rsquo;m going to have to ask Diana&amp;rsquo;s mom about the system that led her to becoming a software engineer the next time I see her.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By Anamaria Stoica on &lt;a href=&#34;#comment-999&#34;&gt;June 25, 2007 3:41 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi,&lt;br /&gt;
I&amp;rsquo;m a Romanian Computer Science student and also work as a programmer, and I&amp;rsquo;m a girl :)&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d like to say something about the Romanian educational system: Math is considered one of the most important subjects there are in school. And something different that I&amp;rsquo;ve noticed, people that know Math are considered cool, at least that was like in my High School. Girls too.&lt;/p&gt;
&lt;p&gt;But, even though in my last year of High School we were about 50% boys and 50% girls(both very good at math &amp;amp; programming), not that many girls went to study Computer Science, and preferred instead Economical studies (Maybe it&amp;rsquo;s a trend right now here : girls go to economical universities, boys to engineering universities).&lt;/p&gt;
&lt;p&gt;At my faculty, we are about 25% girls right now. And in general the guys are very cool with the girls. They don&amp;rsquo;t make those kind of judgments based on gender. Although, there are ones that do(not many). BUT in almost every case I&amp;rsquo;ve encountered, those guys sucked as programmers, and in my opinion sucked as human beings too :) (Therefore their opinion is not so important)&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d like to add something else about the guidance my parents gave me as growing up regarded to my studies. As both of them were engineers, they felt like a good math education for me and my brother was very important and always encouraged both(my brother and I) of us towards this direction equally.&lt;/p&gt;
&lt;p&gt;Thanks for this post and everyone else that posted about this subject. I think it&amp;rsquo;s very encouraging for young programmer girls that there is a concern about this. :) Not long ago I was looking for women in IT as models, and didn&amp;rsquo;t find that many, and started to worry about what chances did I have in this just because I was a girl.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snelson.org.uk&#34; title=&#34;http://www.snelson.org.uk&#34;&gt;John Snelson&lt;/a&gt; on &lt;a href=&#34;#comment-1000&#34;&gt;June 25, 2007 9:37 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Interesting point. I helped take a highly technical class at a company in Madrid a few years back, and was amazed that the attendees were about 50% female.&lt;/p&gt;
&lt;p&gt;Compare that to the UK, where I&amp;rsquo;ve only ever worked with one female software engineer&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.squidoo.com/best-airsoft-gun&#34; title=&#34;http://www.squidoo.com/best-airsoft-gun&#34;&gt;Jimmy&lt;/a&gt; on &lt;a href=&#34;#comment-1002&#34;&gt;June 26, 2007 1:37 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In my IT class there was only one girl:(( And not pretty at all :(((&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1004&#34;&gt;June 26, 2007 8:12 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You should read some of the posts I link to&amp;ndash;a public declaration of whether she&amp;rsquo;s pretty is part of the problem.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Time running out for Semantic Web Strategies talk proposals</title>
      <link>https://www.bobdc.com/blog/time-running-out-for-semantic/</link>
      <pubDate>Thu, 21 Jun 2007 16:56:20 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/time-running-out-for-semantic/</guid>
      
      
      <description><div>Nine days left!</div><div>&lt;p&gt;&lt;a href=&#34;http://www.semanticwebstrategies.com/index.php&#34;&gt;&lt;img src=&#34;http://www.semanticwebstrategies.com/images/logo_SWS_hdr.gif&#34; alt=&#34;[Semantic Web Strategies logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When we first set the proposal deadline of June 30th for speaker submissions, it seemed pretty far off, but it&amp;rsquo;s bearing down on us. Several people have told me that they have some ideas and plan to submit something, so it&amp;rsquo;s time to &lt;a href=&#34;http://www.semanticwebstrategies.com/speak.php&#34;&gt;fill out the form&lt;/a&gt;. It&amp;rsquo;s pretty simple and won&amp;rsquo;t take much time. If you have several ideas, don&amp;rsquo;t be afraid to try more than one.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Emacs: good (and how to create a foreign characters menu)</title>
      <link>https://www.bobdc.com/blog/emacs-good-and-how-to-create-a/</link>
      <pubDate>Mon, 18 Jun 2007 18:32:15 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/emacs-good-and-how-to-create-a/</guid>
      
      
      <description><div>Foreign to me, at least.</div><div>&lt;p&gt;I never tried to proselytize Emacs. Many times, when I saw someone using Windows Notepad, I told them &amp;ldquo;there are so many better, free alternatives out there, like &lt;a href=&#34;http://notepad-plus.sourceforge.net/uk/site.htm&#34;&gt;Notepad++&lt;/a&gt;&amp;rdquo;. These people inevitably responded by asking me if I used Notepad++, to which I replied &amp;ldquo;Well, you don&amp;rsquo;t want to know what I use. It&amp;rsquo;s a geek editor.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I thought that Emacs&amp;rsquo; learning curve would be too steep for most people, because although it&amp;rsquo;s highly configurable and adaptable, it won&amp;rsquo;t adapt too well to the typical Windows user&amp;rsquo;s expectation that Ctrl+X means &amp;ldquo;cut&amp;rdquo; and Ctrl+C means &amp;ldquo;copy&amp;rdquo;. Ctrl+X and Ctrl+C each begin enough important keystroke combinations in Emacs that you&amp;rsquo;re better off not redefining them, so you can never make Emacs too comfortable for Windows users.&lt;/p&gt;
&lt;p&gt;A December blog posting by Derek Slager titled &lt;a href=&#34;http://derekslager.com/blog/posts/2006/12/the-case-for-emacs.ashx&#34;&gt;The Case for Emacs&lt;/a&gt; makes a good case for non-Emacs users to switch, at least if you&amp;rsquo;re a programmer. (I really don&amp;rsquo;t do that much coding, by the way, and spend most of my time using Emacs in &lt;a href=&#34;http://www.thaiopensource.com/nxml-mode/&#34;&gt;nxml&lt;/a&gt; mode to write XML and XHTML. If I am coding, it&amp;rsquo;s usually XSLT, so of course I&amp;rsquo;m using nxml for that as well.) One of Slager&amp;rsquo;s key points is that when Emacs users learn a new programming language, they don&amp;rsquo;t need to learn a new IDE, because their existing development environment probably already has a full-featured mode for that language with keystrokes consistent with what they know from editing code in other languages. This makes them productive in their new language faster. As he put it,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If you&amp;rsquo;re an Emacs user, you can be writing C# on Windows Monday, Ruby on Mac OS Tuesday, and Python on Linux on Wednesday. In each case, there are language-specific tools to use, but the place you spend the most time—your editor—is consistent across tools and platforms. Virtually all the time you invested in learning (and customizing) the editor comes along for the ride each time. Emacs lowers the bar.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/foreignchars.jpg&#34; alt=&#34;[emacs menu screenshot]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;Emacs just had its &lt;a href=&#34;http://lists.gnu.org/archive/html/info-gnu-emacs/2007-06/msg00000.html&#34;&gt;first significant upgrade&lt;/a&gt; in over four years. I haven&amp;rsquo;t had the time to look through the new features, but there&amp;rsquo;s one improvement that was immediately apparent to me: the little menus I had added to insert commonly needed foreign characters had worked correctly before, but displayed a little oddly, and now they both work and display properly.&lt;/p&gt;
&lt;p&gt;As the picture shows, my &amp;ldquo;non-ASCII&amp;rdquo; menu shows the vowels plus a few more common symbols. Picking a vowel displays a cascade menu of that vowel with different accents, and picking one of those inserts that character. I actually have mine inserting the numeric character references (for example, &lt;code&gt;&amp;amp;#228;&lt;/code&gt; for &lt;code&gt;ä&lt;/code&gt;) but it would be easy enough to modify the code to insert the actual characters.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(require &#39;easymenu)


(easy-menu-define 
 non-ASCII-menu global-map &amp;quot;non-ASCII&amp;quot;
 &#39;(&amp;quot;non-ASCII&amp;quot;
    (&amp;quot;a&amp;quot;
    [&amp;quot;à&amp;quot; insert-agrave]
    [&amp;quot;á&amp;quot; insert-aacute]
    [&amp;quot;â&amp;quot; insert-acirc]
    [&amp;quot;ã&amp;quot; insert-atilde]
    [&amp;quot;ä&amp;quot; insert-auml]
    )
    (&amp;quot;e&amp;quot;
    [&amp;quot;è&amp;quot; insert-egrave]
    [&amp;quot;é&amp;quot; insert-eacute]
    [&amp;quot;ê&amp;quot; insert-ecirc]
    [&amp;quot;ë&amp;quot; insert-euml]
    )
    (&amp;quot;i&amp;quot;
    [&amp;quot;ì&amp;quot; insert-igrave]
    [&amp;quot;í&amp;quot; insert-iacute]
    [&amp;quot;î&amp;quot; insert-icirc]
    [&amp;quot;ï&amp;quot; insert-iuml]
    )
    (&amp;quot;o&amp;quot;
    [&amp;quot;ò&amp;quot; insert-ograve]
    [&amp;quot;ó&amp;quot; insert-oacute]
    [&amp;quot;ô&amp;quot; insert-ocirc]
    [&amp;quot;õ&amp;quot; insert-otilde]
    [&amp;quot;ö&amp;quot; insert-ouml]
    )
    (&amp;quot;u&amp;quot;
    [&amp;quot;ù&amp;quot; insert-ugrave]
    [&amp;quot;ú&amp;quot; insert-uacute]
    [&amp;quot;û&amp;quot; insert-ucirc]
    [&amp;quot;ü&amp;quot; insert-uuml]
    )
    [&amp;quot;ntilde&amp;quot; insert-ntilde]
    [&amp;quot;euro&amp;quot; insert-euro]
   )
)


(easy-menu-add non-ASCII-menu)


(defun insert-ntilde () (interactive) (insert &amp;quot;&amp;amp;#241;&amp;quot;) )
(defun insert-euro () (interactive) (insert &amp;quot;&amp;amp;#x20AC;&amp;quot;) )


(defun insert-agrave () (interactive) (insert &amp;quot;&amp;amp;#224;&amp;quot;) )
(defun insert-aacute () (interactive) (insert &amp;quot;&amp;amp;#225;&amp;quot;) )
(defun insert-acirc () (interactive) (insert &amp;quot;&amp;amp;#226;&amp;quot;) )
(defun insert-atilde () (interactive) (insert &amp;quot;&amp;amp;#227;&amp;quot;) )
(defun insert-auml () (interactive) (insert &amp;quot;&amp;amp;#228;&amp;quot;) )


(defun insert-egrave () (interactive) (insert &amp;quot;&amp;amp;#232;&amp;quot;) )
(defun insert-eacute () (interactive) (insert &amp;quot;&amp;amp;#233;&amp;quot;) )
(defun insert-ecirc () (interactive) (insert &amp;quot;&amp;amp;#234;&amp;quot;) )
(defun insert-euml () (interactive) (insert &amp;quot;&amp;amp;#235;&amp;quot;) )


(defun insert-igrave () (interactive) (insert &amp;quot;&amp;amp;#236;&amp;quot;) )
(defun insert-iacute () (interactive) (insert &amp;quot;&amp;amp;#237;&amp;quot;) )
(defun insert-icirc () (interactive) (insert &amp;quot;&amp;amp;#238;&amp;quot;) )
(defun insert-iuml () (interactive) (insert &amp;quot;&amp;amp;#239;&amp;quot;) )


(defun insert-ograve () (interactive) (insert &amp;quot;&amp;amp;#242;&amp;quot;) )
(defun insert-oacute () (interactive) (insert &amp;quot;&amp;amp;#243;&amp;quot;) )
(defun insert-ocirc () (interactive) (insert &amp;quot;&amp;amp;#244;&amp;quot;) )
(defun insert-otilde () (interactive) (insert &amp;quot;&amp;amp;#245;&amp;quot;) )
(defun insert-ouml () (interactive) (insert &amp;quot;&amp;amp;#246;&amp;quot;) )


(defun insert-ugrave () (interactive) (insert &amp;quot;&amp;amp;#249;&amp;quot;) )
(defun insert-uacute () (interactive) (insert &amp;quot;&amp;amp;#250;&amp;quot;) )
(defun insert-ucirc () (interactive) (insert &amp;quot;&amp;amp;#251;&amp;quot;) )
(defun insert-uuml () (interactive) (insert &amp;quot;&amp;amp;#252;&amp;quot;) )
&lt;/code&gt;&lt;/pre&gt;
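&lt;p&gt;To have the menu insert the actual characters instead of the numeric character references, redefine the insert functions along these lines (shown for one character; the others follow the same pattern):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;;; insert the literal character rather than its numeric reference
(defun insert-auml () (interactive) (insert &amp;quot;ä&amp;quot;) )
&lt;/code&gt;&lt;/pre&gt;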
&lt;p&gt;I look forward to finding out what else is new in Emacs 22.1.1.&lt;/p&gt;
&lt;p&gt;Something else nice for Emacs users: the weblog &lt;a href=&#34;http://emacs.wordpress.com/&#34;&gt;minor Emacs wizardry&lt;/a&gt; has lots of great tips. That&amp;rsquo;s where I learned that putting your cursor after any parenthesized expression and pressing &lt;code&gt;C-x C-e&lt;/code&gt; evaluates that expression. Just now I was wondering how much 18 tons of gravel would be for our driveway at $15 a ton, so I typed &lt;code&gt;(* 18 15)&lt;/code&gt; into the window I happened to be in, entered that keystroke combination, and had my answer.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>developerWorks article on XHTML 2</title>
      <link>https://www.bobdc.com/blog/developerworks-article-on-xhtm/</link>
      <pubDate>Wed, 13 Jun 2007 17:24:24 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/developerworks-article-on-xhtm/</guid>
      
      
      <description><div>Why I like XHTML 2.</div><div>&lt;p&gt;IBM developerWorks has just published an article I wrote called &lt;a href=&#34;http://www.ibm.com/developerworks/library/x-xhtml2now.html&#34;&gt;Put XHTML 2 to work now&lt;/a&gt;. I originally called it &amp;ldquo;XHTML 2: Useful Now&amp;rdquo;, the idea being that it&amp;rsquo;s worth doing some work with it now instead of waiting for it to become a Recommendation. They thought that this title might give the impression of &amp;ldquo;it&amp;rsquo;s finally become useful&amp;rdquo;, so I let them change it.&lt;/p&gt;
&lt;p&gt;This quote sums up the main idea of the article:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Many publishers that store content in XML have always known that using an existing, standard schema (by which I mean a W3C Schema, a RELAX NG schema, or a DTD) was better than creating their own from scratch. They looked at DocBook and found it too complex; they looked at HTML or XHTML 1 and found it too simple. For many of them, XHTML 2 will hit a sweet spot between the richness of DocBook and the simplicity of XHTML 1 that makes it a perfectly good format for storing content, whether that content has to be converted to other formats for delivery in various media or not.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://kfahlgren.com/blog&#34; title=&#34;http://kfahlgren.com/blog&#34;&gt;Keith Fahlgren&lt;/a&gt; on &lt;a href=&#34;#comment-970&#34;&gt;June 13, 2007 6:21 PM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;XHTML 2 will hit a sweet spot between the richness of DocBook and the simplicity of XHTML 1&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I wonder if the work that the &lt;a href=&#34;http://wiki.docbook.org/topic/Publishers&#34;&gt;DocBook SubCommittee for Publishers&lt;/a&gt; is currently undertaking will be able to reduce (or, if you wanted to be kind, reduce in the culinary sense) DocBook&amp;rsquo;s richness into a useful, approachable subset?&lt;/p&gt;
&lt;p&gt;[Disclosure: I&amp;rsquo;m a subcommittee member]&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-971&#34;&gt;June 13, 2007 7:31 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Useful to whom&amp;rdquo; is always the difficult question when subsetting something that has lots of features for a wide variety of people, because it can be customized for some at the expense of others. Given the committee name, I assume you&amp;rsquo;ve already determined the audience for your subset, which is an important first step. I was a big fan of &lt;a href=&#34;http://www.docbook.org/schemas/simplified&#34;&gt;Simplified DocBook&lt;/a&gt; (docbook.org down as I write this), which has been around for a few years.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-972&#34;&gt;June 13, 2007 10:23 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Plug: &lt;a href=&#34;http://ccil.org/~cowan/xhtml2.odp&#34;&gt;Moving Toward XHTML 2.0&lt;/a&gt;, my ODF slide deck on how XHTML 2 works. Also available in &lt;a href=&#34;http://ccil.org/~cowan/xhtml2.ppt&#34;&gt;Powerpoint&lt;/a&gt; and &lt;a href=&#34;http://ccil.org/~cowan/xhtml2.pdf&#34;&gt;PDF&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.grauw.nl/&#34; title=&#34;http://www.grauw.nl/&#34;&gt;Laurens Holst&lt;/a&gt; on &lt;a href=&#34;#comment-973&#34;&gt;June 14, 2007 4:04 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I’ve put XHTML2 to use in a commercial environment (read: company website and product documentation) for exactly these reasons. Had pretty good experiences with that, although it would have been nice if there was a WYSIWYG editor (or maybe rather WYSIWYM or whatever is the hip word nowadays).&lt;/p&gt;
&lt;p&gt;It was output to four different formats, with XSLT to HTML, XHTML+AJAX and Microsoft CHM files, and PDF using PrinceXML.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Semantic Web Strategies speaker submission form back up</title>
      <link>https://www.bobdc.com/blog/semantic-web-strategies-speake/</link>
      <pubDate>Mon, 11 Jun 2007 23:19:54 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-strategies-speake/</guid>
      
      
      <description><div>Ready for your ideas.</div><div>&lt;p&gt;The &lt;a href=&#34;http://www.semanticwebstrategies.com/speakerproposal.php&#34;&gt;proposal submission form&lt;/a&gt; for the Semantic Web Strategies conference was timing out over the weekend, but it&amp;rsquo;s back up and working properly now.&lt;/p&gt;
&lt;p&gt;I noticed that it won&amp;rsquo;t let you submit it unless you pick a &amp;ldquo;suggested track&amp;rdquo;; the two fairly arbitrary categories are &amp;ldquo;The Past and Present of the Semantic Web&amp;rdquo; and &amp;ldquo;The Present and Future of the Semantic Web&amp;rdquo;. The former is oriented more toward case studies of projects that have been completed, while the &amp;ldquo;Present and Future&amp;rdquo; one is for discussions of upcoming projects and developments. Don&amp;rsquo;t worry too much about which you pick.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve seen some interesting submissions, and we look forward to seeing more!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>More on Word&#39;s mediocre XML</title>
      <link>https://www.bobdc.com/blog/more-on-words-mediocre-xml/</link>
      <pubDate>Sun, 10 Jun 2007 09:55:11 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/more-on-words-mediocre-xml/</guid>
      
      
      <description><div>It&#39;s not just the index tag markup, but most of the &#34;Insert Field&#34; parts.</div><div>&lt;p&gt;After I &lt;a href=&#34;https://www.bobdc.com/blog/word-2003s-awful-xml-for-index&#34;&gt;wrote recently&lt;/a&gt; about the awful markup used to identify index entries when you save a Word 2003 file as XML, Jon Udell wrote to me to relay MS Office Program Manager &lt;a href=&#34;http://blogs.msdn.com/brian_jones/&#34;&gt;Brian Jones&lt;/a&gt;&amp;rsquo; query about whether I felt similarly about other markup in the XML version of a Word document. I haven&amp;rsquo;t had the time to do a comprehensive review of the XML, and I&amp;rsquo;ve &lt;a href=&#34;http://www.snee.com/bobdc.blog/2006/11/word_2003_xml.html&#34;&gt;written before&lt;/a&gt; about a pleasant surprise I found in it (and I was annoyed at the fuss over Microsoft &lt;a href=&#34;http://www.oreillynet.com/xml/blog/2007/01/an_interesting_offer.html&#34;&gt;paying Rick Jelliffe&lt;/a&gt; to add some perspective to the ODF/OOXML Wikipedia entries—it&amp;rsquo;s &lt;em&gt;Rick Jelliffe&lt;/em&gt;, for chrissake) but a bit more investigation let me generalize from my earlier negative comments, and after writing it out to Jon I thought I&amp;rsquo;d expand on it a bit and post it.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://msdn2.microsoft.com/en-us/library/aa212812(office.11).aspx&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/wordml.jpg&#34; alt=&#34;[Word ML logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The project I&amp;rsquo;m writing doesn&amp;rsquo;t need the hyperlinks or table of contents markers in the Word XML, but from what I&amp;rsquo;ve seen of them, it looks like the XML representation of most of the Insert Field features seems to be an XML-ized version of the RTF: &lt;code&gt;&amp;lt;w:fldChar w:fldCharType=&amp;quot;begin&amp;quot;/&amp;gt;&lt;/code&gt;, then a &lt;code&gt;w:instrText&lt;/code&gt; element with some cryptic string such as &amp;rsquo; TOC \o &amp;ldquo;1-2&amp;rdquo; \n \p &amp;quot; &amp;quot; \h \z &amp;rsquo; for a table of contents marker, &amp;lsquo;HYPERLINK \l &amp;ldquo;_Toc135558539&amp;rdquo;&amp;rsquo; for a hyperlink, &amp;rsquo; XE &amp;ldquo;&amp;rsquo; for an index entry, and &lt;code&gt;&amp;lt;w:fldChar w:fldCharType=&amp;quot;end&amp;quot;/&amp;gt;&lt;/code&gt; to finish it.&lt;/p&gt;
&lt;p&gt;To test this theory, I created a sample document with about a dozen things added with different Insert Field selections and exported the result as an XML document. The XML versions of most of the field constructs begin and end with &lt;code&gt;w:r&lt;/code&gt; elements containing &lt;code&gt;w:fldChar&lt;/code&gt; elements with &lt;code&gt;w:fldCharType&lt;/code&gt; attribute values of &amp;ldquo;begin&amp;rdquo; and &amp;ldquo;end&amp;rdquo;. Some store their information in a &lt;code&gt;w:r&lt;/code&gt; child of a &lt;code&gt;w:fldSimple&lt;/code&gt; element instead. The &lt;code&gt;w:fldSimple&lt;/code&gt; element&amp;rsquo;s &lt;code&gt;w:instr&lt;/code&gt; attribute seems to be the equivalent of the &lt;code&gt;w:instrText&lt;/code&gt; cousins of the &lt;code&gt;w:fldChar&lt;/code&gt; &amp;ldquo;begin&amp;rdquo; and &amp;ldquo;end&amp;rdquo; elements, with cryptic strings of uppercase keywords, punctuation, and quotation marks like the TOC one shown above to say something about their purpose. (To be fair, the &amp;ldquo;Hyperlink&amp;rdquo; field had an actual &lt;code&gt;w:hlink&lt;/code&gt; element to represent it.)&lt;/p&gt;
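&lt;p&gt;To make the begin/end pattern concrete, here is a hand-written sketch of what an index-entry field looks like (reconstructed from the pieces described above, not copied from an actual Word export; the entry text is made up):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;w:r&amp;gt;&amp;lt;w:fldChar w:fldCharType=&amp;quot;begin&amp;quot;/&amp;gt;&amp;lt;/w:r&amp;gt;
&amp;lt;w:r&amp;gt;&amp;lt;w:instrText&amp;gt; XE &amp;quot;mediocre markup&amp;quot; &amp;lt;/w:instrText&amp;gt;&amp;lt;/w:r&amp;gt;
&amp;lt;w:r&amp;gt;&amp;lt;w:fldChar w:fldCharType=&amp;quot;end&amp;quot;/&amp;gt;&amp;lt;/w:r&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Nothing in the markup itself pairs the &amp;ldquo;begin&amp;rdquo; with its &amp;ldquo;end&amp;rdquo;; a program reading it has to track that state itself.&lt;/p&gt;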
&lt;p&gt;Indicating where the constructs begin and end with two separate, generic empty elements that have a &lt;code&gt;fldCharType&lt;/code&gt; attribute value of &amp;ldquo;begin&amp;rdquo; and &amp;ldquo;end&amp;rdquo; is much more difficult to work with than a matched pair of start- and end-tags. XML isn&amp;rsquo;t simply the representation of data with tags enclosed in angle brackets in such a way that Xerces doesn&amp;rsquo;t complain about it; much of the point of XML is to clearly indicate where things (and sub-things) begin and end using a matching pair of start- and end-tags. I suppose that an XML representation of a Word file must address the possibility of overlap—what if the document has &lt;strong&gt;bold text, &lt;em&gt;then bold italic,&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;then just italic?&lt;/em&gt;—but if the OpenOffice coders can parse the original Word file and turn it into good markup, we know it can be done.&lt;/p&gt;
&lt;p&gt;A new annoyance revealed by my further research is the fact that those &lt;code&gt;w:instrText&lt;/code&gt; elements store their cryptic strings of information such as &amp;rsquo; TOC \o &amp;ldquo;1-2&amp;rdquo; \n \p &amp;quot; &amp;quot; \h \z &amp;rsquo; as PCDATA. Using XSLT, it&amp;rsquo;s usually easy to check whether an element has no text content (regardless of the number of descendant elements it has) by checking whether &lt;code&gt;normalize-space(.)&lt;/code&gt; = &amp;ldquo;&amp;rdquo;, and when processing XML versions of Word there are often empty paragraphs and maybe even empty sections that you want to throw out, but these &lt;code&gt;w:instrText&lt;/code&gt; elements prevent this from working. I know that storing content in PCDATA and metadata in attributes is only a convention, but it&amp;rsquo;s a convention of document-oriented XML going back to SGML days, and an XML version of a Word file is certainly document-oriented XML. (More on this in the &lt;a href=&#34;https://www.bobdc.com/blog/word-2003s-awful-xml-for-index#comments&#34;&gt;comments&lt;/a&gt; to my earlier entry on the topic.)&lt;/p&gt;
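&lt;p&gt;As a sketch of the kind of check I mean (a hypothetical XSLT 1.0 template, not from a real stylesheet), tossing out empty paragraphs would normally be this simple:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!-- throw out paragraphs with no text content; the PCDATA
     in w:instrText defeats this test for field paragraphs --&amp;gt;
&amp;lt;xsl:template match=&amp;quot;w:p[normalize-space(.) = '']&amp;quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;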
&lt;p&gt;The kinds of things that a Word user picks &amp;ldquo;Insert Field&amp;rdquo; to add are often very important to what makes a Word or XML document richer than plain ASCII text with no markup, and it&amp;rsquo;s a shame that whoever designed the MS XML to represent these didn&amp;rsquo;t do a little more modeling of the data necessary to represent each field type and instead just mapped the RTF (or whatever internal structures that I&amp;rsquo;m sure the RTF reflects) to pointy brackets and strings full of internal codes. I&amp;rsquo;m sure it made their design work go more quickly, but the result is something that offers few good arguments for advocacy as a standard.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://blogs.msdn.com/brian_jones/&#34;&gt;Jones&amp;rsquo; blog&lt;/a&gt; has been talking up an open source API for processing the Office XML, and while it&amp;rsquo;s good that such a tool exists and is open source, it doesn&amp;rsquo;t address the issues I describe above. The &amp;ldquo;don&amp;rsquo;t worry about the data complexity, we have a tool that takes care of it&amp;rdquo; argument often presented in such cases leads to a software dependency, and the reason we use open data standards is to avoid dependency on specific tools. (A dirty little secret of the SGML world was that while we all preached the gospel of an open ISO data standard as a way to avoid dependency on specific software tools, most serious production work relied on &lt;a href=&#34;http://www.stilo.com/products/omnimark_overview.html&#34;&gt;Omnimark&lt;/a&gt;, a company that at the time was run by a man who would rather tell developers what they needed than listen to what they needed. One former employer of mine converted their SGML system to use XML purely to eliminate their dependency on Omnimark.) A dependency of a data format on a specific tool takes away from arguments toward making that data format a standard.&lt;/p&gt;
&lt;p&gt;The things that a Word doc file or an XML version of that doc file must represent can be complex, and I&amp;rsquo;m sure that further investigation of the XML, if I had the time, would reveal further pleasant surprises and further annoyances. So far the score, on balance, is pretty low.&lt;/p&gt;
&lt;h2 id=&#34;10-comments&#34;&gt;10 Comments&lt;/h2&gt;
&lt;p&gt;By Bruce D&amp;rsquo;Arcus on &lt;a href=&#34;#comment-960&#34;&gt;June 10, 2007 12:25 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I &lt;a href=&#34;http://netapps.muohio.edu/blogs/darcusb/darcusb/archives/2006/06/08/citations-in-open-xml&#34;&gt;wrote about&lt;/a&gt; the field problem a while back WRT citations.&lt;/p&gt;
&lt;p&gt;On a related note, you might be interested to know that the OpenDocument Metadata SC has just wrapped up its proposal for enhanced metadata support. Based on RDF, it will include a generic field, whose logic is encoded using &amp;hellip; RDF.&lt;/p&gt;
&lt;p&gt;By bryan on &lt;a href=&#34;#comment-961&#34;&gt;June 10, 2007 1:14 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;I suppose that an XML representation of a Word file must address the possibility of overlap—what if the document has bold text, then bold italic, then just italic?—but if the OpenOffice coders can parse the original Word file and turn it into good markup, we know it can be done.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;not to mention if CSS and HTML can do it.&lt;/p&gt;
&lt;p&gt;By Rick Jelliffe on &lt;a href=&#34;#comment-962&#34;&gt;June 11, 2007 4:39 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks for the nice words! Here are some comments on the structure of Open XML, which I was working on for an article draft, but which may provide some extra general info for interested people.&lt;/p&gt;
&lt;p&gt;Open XML&amp;rsquo;s syntax is indeed odd at first, easily enough material for a year&amp;rsquo;s worth of blogs :-), but I have found that there are usually reasons: Open XML has been made using completely different tradeoffs than, say, DOCBOOK has, and consequently looks different.&lt;/p&gt;
&lt;p&gt;First, on the superficial syntax. Remember a few years ago when Michael Sperberg-McQueen was saying that the trouble with attributes was that you couldn&amp;rsquo;t have structured attributes, and the SML people were saying that there was no difference between an attribute and an element and that we should reduce our use of attributes to a minimum? That seems to have influenced MS&amp;rsquo; design choice behind their properties. They have systematically adopted a &amp;ldquo;head-body&amp;rdquo; approach to have properties in elements (this is hardly a new thing: I wrote about it in my 1998 book): there is a consistent naming convention of a &amp;ldquo;Pr&amp;rdquo; suffix used throughout.&lt;/p&gt;
&lt;p&gt;However, they also have the HTML-inspired approach that element content should only have searchable content in mind, so that searching doesn&amp;rsquo;t need to be schema-aware. (With the slight complication that you raise, that deleted text sections and fields still use data content, and the use of numeric indexes to shared string tables in SpreadsheetML.) Then they have decided against using mixed content, again influenced by the SML propaganda but also because it resolves one issue for documents loading into relational DBMS.&lt;/p&gt;
&lt;p&gt;Now I never cared for the SML ideas much: but the combination of allowing structured attributes, schema-less searches, and easy loading to DBMS are entirely respectable choices it seems to me. Which is not to say that DOCBOOK or ODF should adopt the same goals.&lt;/p&gt;
&lt;p&gt;Second, still at a fairly superficial level, terseness has been a goal of Open XML. This is particularly true in the oldest of the languages, SpreadsheetML, which goes back to the beginning of the decade. Not only does Open XML use short element names, it also reveals the internal optimizations of Excel: sparse matrixes, shared strings, SQL_DATE-style numeric indexes to dates, and so on.&lt;/p&gt;
&lt;p&gt;When we think of an application like Office, used by hundreds of millions of people worldwide, load/save/recalculate times are not a secondary issue; for example, saving 1 minute a day for one hundred million people is not nothing! Indeed, it then becomes a challenge to ODF, to say &amp;ldquo;Why don&amp;rsquo;t you support more optimized forms?&amp;rdquo; (No criticism of ODF intended: it is growing up.)&lt;/p&gt;
&lt;p&gt;Third, on the organizational level, Open XML uses its Open Packaging Conventions to recreate SGML&amp;rsquo;s entities: when referring to another document (part, eg equiv of entity) whether internal or external, an id is used (eg equiv of entity name), and each part that has such references has a relationships file (e.g. equiv of internal subset containing entity declarations) which maps the ids to URIs (internal or external). &amp;hellip;Indirection! Where is Eliot Kimber?&lt;/p&gt;
&lt;p&gt;The reason for this is obviously that in jettisoning DTDs for XML Schemas, you also jettison the mechanism for making compound documents, and the wheel needs to be reinvented. I don&amp;rsquo;t think there would be any need to mention to Bob how useful entities are for production purposes, for mid-sized documents. (For large documents, extra levels of indirection and managed IDs become practical; for smaller HTML-sized documents, direct markup of URLs is easier than an indirection mechanism; but for middle-sized documents, moving constants to headers helps manageability: for example, if you have a catalog with the same logo displayed 10,000 times, it is preferable to change it once in the relationship file (equiv to entity declaration) rather than in each of the 10,000 references.) Actually, OPC is something that I think ODF could well adopt.&lt;/p&gt;
&lt;p&gt;But OPC and relationships does make the markup a little more difficult to read, if you don&amp;rsquo;t know that they are there, because suddenly there is information held in different files. I think people coming from HTML will especially have this problem.&lt;/p&gt;
&lt;p&gt;Fourth, there is a difference at the design level too: as far as I can see, what MS were trying to do is to take a *completely* linear format and allow arbitrary interleaving of custom XML as the mechanism for *all* structuring. Office 2007 doesn&amp;rsquo;t do any structural implication that I know of (though I am not an expert in it.)&lt;/p&gt;
&lt;p&gt;So saying Open XML is like RTF-in-XML is not unfair, though to say that Open XML is *only* RTF-in-XML would be unfair. Nor would a comparison with HTML (a linear format where structures can be made by the user with DIV and SPAN) be unfair.&lt;/p&gt;
&lt;p&gt;Open XML is an &amp;ldquo;open&amp;rdquo; format in the sense that the zipper on a flasher&amp;rsquo;s pants is open: you may not like what you see, it may be less or more than you were expecting, but the functionality is exposed unadorned for all the world&amp;rsquo;s education: whether you are repelled or see opportunities is your business :-) The aim of Open XML is to expose everything that goes on inside Office 2007 not to mediate it according to some abstract/ideological view of the perfect document.&lt;/p&gt;
&lt;p&gt;So, in Word, a document is a list of blocks, and a block is either a list of runs or a table. Consequently, in WordprocessingML, a document contains a sequence of &lt;code&gt;&amp;lt;p&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;tbl&amp;gt;&lt;/code&gt; elements, and a &lt;code&gt;&amp;lt;p&amp;gt;&lt;/code&gt; contains a sequence of &lt;code&gt;&amp;lt;r&amp;gt;&lt;/code&gt; run elements, which may contain a sequence of &lt;code&gt;&amp;lt;t&amp;gt;&lt;/code&gt; text runs and diagrams etc.&lt;/p&gt;
&lt;p&gt;The radical thing MS did was to take an interleaving approach to structure: you can open any schema, and use this with a context-sensitive editor (in Word) to wrap blocks, runs, rows and cells with &amp;ldquo;custom&amp;rdquo; elements from that schema. The schema is used to provide syntax direction, but not for subsequent validation; the created WordprocessingML document can still be validated against its usual schemas because the custom elements are marked up with one level of indirection, as values of customXml elements in the word-processing space. Now at the moment, this is not fully baked: you cannot key styles to customXml elements as far as I know: but the aim is to expose what Office 2007 does, not what it *may* or *should* do!&lt;/p&gt;
&lt;p&gt;In this way they are trying to turn the linear format from a flaw into a strength: if they had structures in place already (sections, lists, headings) they would have to figure out how not to clash with custom XML structures (which is a problem I expect ODF would have.)&lt;/p&gt;
&lt;p&gt;Fifth, I think Open XML is about the first consumer format I have seen which takes the separation of presentation from content in tables really seriously. This is something that Dave Peterson used to comment on, that tables are a presentation format which should link into tabular data held separately. So Open XML provides mapping controls to XPaths and also columns.&lt;/p&gt;
&lt;p&gt;Lou Burnard used to quote someone that all DTDs are theories about a document: Open XML is clearly a theory about office documents in which there is a hierarchy of 1) casual (linear) documents, then 2) linear documents containing links to highly structured XML data, then finally 3) structured documents. Each of these levels has a smaller user base than the preceding one, and the idea of Office XML is to expose what Word/Excel/Powerpoint does in attempting to add better support for the subsequent layers onto its linear roots: that the requirement for structured literary documents is less than the requirement for linking to structured data documents from unstructured literary documents.&lt;/p&gt;
&lt;p&gt;It is an interesting theory, and not one that can be sniffed at, I think. If we remember the Pinnacles DTDs, as used by chip makers in the early 90s, they had a database section in their header, for example for Vcc voltage levels, and the text used references to it. The value to the user was not the structuring of the information into sections (which can be done by stylesheets, i.e. by attributes/properties) but the ability to reference the database. Is that referencing capability (of XML) in fact more important for most users/uses than the explicitly hierarchical structuring capabilities?&lt;/p&gt;
&lt;p&gt;Now, all that being said, I hope one of the opportunities for the ISO process is to find out where there is some syntactic ugliness that causes real problems and to get the rationale explained and, if there is no good rationale and the issue is important, to get an improvement in the works. I don&amp;rsquo;t believe for a moment that Open XML is perfect, not that perfection is required for a standard of its type (i.e. one that exposes a particular deployed application), and the more that we can focus on real production issues of the kind that Bob is raising, rather than the parade of high-volume bogosity we have been treated to, the more chance that Open XML can be blocked as an ISO standard (if those real flaws reach a showstopping level) or improved (if those flaws can be explained or fixed) or accepted but with an understanding of its qualities and attributes.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-963&#34;&gt;June 11, 2007 8:31 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Rick,&lt;/p&gt;
&lt;p&gt;Thanks for all this! Two comments:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;the idea of Office XML is to expose what Word/Excel/Powerpoint&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Is there a version of Powerpoint that can save as XML?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The aim of Open XML is to expose everything that goes on inside Office&lt;br /&gt;
2007 not to mediate it according to some abstract/ideological view of&lt;br /&gt;
the perfect document&amp;hellip; the aim is to expose what Office 2007 does&lt;br /&gt;
not what it *may* or *should* do!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I can see the reasons for doing it this way, but how can MS advocate that an XML format designed to expose the internal workings of an aging binary format should be the standard adopted by governments and corporations around the world instead of one in which the abstractions were thought out first and the execution was modeled on those? Put another way, how can they suggest that the standard be for users to adapt their legislated norms to the quirks of one company&amp;rsquo;s tool instead of the other way around? (The answer, of course, is because it&amp;rsquo;s &lt;em&gt;their&lt;/em&gt; tool.)&lt;/p&gt;
&lt;p&gt;By Rick Jelliffe on &lt;a href=&#34;#comment-964&#34;&gt;June 11, 2007 9:31 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Is there a version of Powerpoint? Yes, Office 2007 and AFAIK Office 2000, 2003 and XP with the compatibility kits can save as XML-in-ZIP. (MS also has a save-everything-to-one-file-including-images kind of XML that was in Office 2003, but I hope they have not made that available.)&lt;/p&gt;
&lt;p&gt;But Bob, where on earth did you get the idea that anyone on the MS side has ever said that Open XML should be &amp;ldquo;&lt;strong&gt;the&lt;/strong&gt; standard adopted by governments and corporations around the world instead of one in which the abstractions were thought out first and the execution was modeled on those?&amp;rdquo; Boy, that is complete propaganda, and nothing like what I have ever advocated and nothing like what I have ever read or heard from any MS person: and I have been following the issue with more than casual interest. Indeed, MS voted &lt;strong&gt;for&lt;/strong&gt; ODF recently at ANSI/ICITS/V1 or whatever.&lt;/p&gt;
&lt;p&gt;Any sources for this? Or is it something that &amp;ldquo;everyone knows&amp;rdquo;? I think it comes from the idea that there cannot be overlapping standards, or that if there is a standard we are somehow forced to use it. I think it is words being put in MS&amp;rsquo; mouth. TCP is standard not because it is technically excellent or because it came as a result of great openness initially, but because the RFC fairly describes the pre-existing technology: people chose it (or didn&amp;rsquo;t) over the ISO standards because they were smart enough, not because they were confused by multiple standards or compelled to use it.&lt;/p&gt;
&lt;p&gt;ISO standards are voluntary, and the legislatures have by and large resisted the attempt to make them mandatory, especially when most of the mature implementations of ODF are proprietary (IBM word processor, Lotus&amp;rsquo; new Notes, Sun Star Office, MS Office) and the open source versions are notoriously ratty, in the 2006/2007 timeframe at least. You don&amp;rsquo;t fight a monopoly by creating a cartel. Norway has set a really good example recently for documents made public to the external population, by making ISO PDF mandatory for completed public documents, ISO ODF mandatory for incomplete public documents, HTML allowed for websites where appropriate, and any other ISO format allowed (i.e. future ISO Open XML) to provide parallel versions. That is great: a rich range of formats and the guidance about when to use each&amp;hellip;the Norwegian standards prudently leave out mandating what standards should be used internally in systems: that is really where you would expect Open XML to be positioned.&lt;/p&gt;
&lt;p&gt;Where MS is freaking out AFAICS is not where governments mandate ODF for public documents (few public documents are incomplete and so would be PDF or HTML anyway, and Office has a good ODF export/import story) but in the idea of disallowing the Office native format &lt;strong&gt;internally&lt;/strong&gt; inside production systems. They have put in a lot of XML-based features for which there is no equivalent: it makes Office much more competitive against, say, Crystal Reports or even Web forms and XMetal. It is that internal market for system developers and integrators and archive-openers to which Open XML is targetted, not the market of level-playing-field public document interchange between competing office suites. I think the competitors talk about public documents, but they are playing a bait-and-switch scam to try to block people from choosing Office for use in internal systems with its extra integrator-friendly features. In other words, MS wants to make Office a rich platform with features that go beyond ODF, MS&amp;rsquo; competitors want to prevent this and fence systems in to only use or exchange ODF (and any extensions that can be grafted on top of it.)&lt;/p&gt;
&lt;p&gt;The MS position, as I understand it from their public comments, is that ODF is fine for many simple document exchange uses but, taking a cold hard look despite our hopes and wishes otherwise, is not adequate (in its current form of ODF 1.0 and more so in its form of 2 years ago when the Office 2007 decisions were being made) for the most basic of requirements needed for the default format for Office: that you could save a document with all the information needed to reopen it unchanged. MS&amp;rsquo; choice was either to add what they needed to ODF 1.0 draft (not a good idea in view of embrace and extend concerns) or become mind readers and adopt two years ago features that ODF 1.3 or 1.4 will have in three years time (not a good idea due to the inaccuracy of clairvoyant technology.)&lt;/p&gt;
&lt;p&gt;However, I am not an MS spokesman (I haven&amp;rsquo;t even signed a non-disclosure agreement), so my view may be skewed. I just go on what their public comments and training material says.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-965&#34;&gt;June 11, 2007 10:31 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks, I hadn&amp;rsquo;t heard about the compatibility kits. I just saw &amp;ldquo;Save as XML&amp;rdquo; in Word and Excel but not PowerPoint.&lt;/p&gt;
&lt;p&gt;I guess I had an oversimplified view of the standardization issues. I thought it was a case of &amp;ldquo;let&amp;rsquo;s pick a format in which to store our content as we go forward,&amp;rdquo; with two sides saying &amp;ldquo;pick our format, not theirs&amp;rdquo;, which is often the case with disagreements over data standards. The idea of different levels, with a different format appropriate to each, makes sense.&lt;/p&gt;
&lt;p&gt;TCP was not a single product from a single company, and the standard gave a blueprint for implementers to work from to ensure interoperability. It sounds like Microsoft&amp;rsquo;s format functions more as an API to their product suite, and while I&amp;rsquo;m glad that it exists, what is the point of having it stamped as an ISO standard, besides the marketing advantages of being able to say &amp;ldquo;it&amp;rsquo;s an ISO standard&amp;rdquo;? In other words, what is the advantage to anyone outside of Microsoft of their XML format being an ISO standard? Wouldn&amp;rsquo;t implementers have to work around the interface decisions of this one company whether the documentation of this interface held &amp;ldquo;standard&amp;rdquo; status or not?&lt;/p&gt;
&lt;p&gt;By Rick Jelliffe on &lt;a href=&#34;#comment-966&#34;&gt;June 11, 2007 2:46 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The XML-in-ZIP is the default format for Office 2007 (Word, Excel, PowerPoint). If you do &amp;ldquo;Save as XML&amp;rdquo; you get the crazy all-in-one file format that adds a level of packaging elements instead of the ZIP package. To convert a .DOCX file to ZIP, just change the extension.&lt;/p&gt;
&lt;p&gt;They have given up binary formats as the default. You can still save in the old binary formats of course, or ODF and XHTML and ISO PDF/A if you want to, and Excel has a new fast binary format available as well. There are batch converters available to convert old repositories to Open XML too.&lt;/p&gt;
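&lt;p&gt;As a small illustration of that packaging point, here is a hypothetical sketch using Python&amp;rsquo;s zipfile module. The file content is a made-up stand-in; a real .docx has many more parts than this, but any ZIP tool can list them the same way once the extension is changed.&lt;/p&gt;

```python
# Hypothetical demo: an Office 2007 file is an ordinary ZIP package of
# XML parts, so ZIP tools can open it once the extension is changed.
# The part content here is a made-up stand-in, not a real document.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:        # stand-in for a real .docx
    z.writestr("word/document.xml", "stand-in for the real document part")
with zipfile.ZipFile(buf) as z:             # reopen it as a plain ZIP
    parts = z.namelist()                    # the package's internal files
print(parts)
```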
&lt;p&gt;TCP indeed pre-existed the RFC by about 7 years: see&lt;/p&gt;
&lt;p&gt;Cerf, V., and R. Kahn, &amp;ldquo;A Protocol for Packet Network&lt;br /&gt;
Intercommunication&amp;rdquo;, IEEE Transactions on Communications,&lt;br /&gt;
Vol. COM-22, No. 5, pp 637-648, May 1974.&lt;/p&gt;
&lt;p&gt;for the first description by researchers Vint Cerf and Robert Kahn. Then it went through about 8 incarnations at ARPA before it became an RFC. Almost all the fundamental internet technologies were developed as libraries/(=~applications/products) first then described later: it is not the blue-sky development method at all. It was the ISO OSI protocols that were developed based on blue sky thinking (a la Richard Gabriel&amp;rsquo;s The Right Way is the Wrong Way), to a large extent (or at least that is the mythology passed down.)&lt;/p&gt;
&lt;p&gt;Why should Open XML be an ISO standard? Well, my attitude is probably more &amp;ldquo;why shouldn&amp;rsquo;t it be?&amp;rdquo; ISO has to be fair, even to Microsoft. If a company can never win by playing the standards game, they never will; the trick is making sure that everyone wins. The basic reason something becomes a standard at ISO is that there is a market requirement for it: now if there isn&amp;rsquo;t a market requirement for MS to document and explain their formats it is hard to think of what else would pass the test&amp;hellip; Having Open XML will not stop the progress of ODF for public documents: the dynamics and sweet spots for both are too different.&lt;/p&gt;
&lt;p&gt;By marc on &lt;a href=&#34;#comment-974&#34;&gt;June 14, 2007 12:52 PM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If a company can never win by playing&lt;br /&gt;
the standards game, they never will&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;don&amp;rsquo;t be confused here:&lt;br /&gt;
&amp;ldquo;play the standards game&amp;rdquo; is not the same as&lt;br /&gt;
&amp;ldquo;game the standards system&amp;rdquo;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-975&#34;&gt;June 14, 2007 1:41 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;They&amp;rsquo;re close enough&amp;ndash;you&amp;rsquo;d be hard-pressed to make a list of companies that play the game with no interest in gaming the system and another list of companies that game the system without playing the game.&lt;/p&gt;
&lt;p&gt;By Bruce on &lt;a href=&#34;#comment-1005&#34;&gt;June 27, 2007 9:30 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Rick wrote:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;The basic reason something becomes a standard at ISO is that there is a market requirement for it: now if there isn&amp;rsquo;t a market requirement for MS to document and explain their formats it is hard to think of what else would pass the test&amp;hellip;&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This seems to be the crux of the matter: what is &amp;ldquo;it&amp;rdquo; when we&amp;rsquo;re talking about standards?&lt;/p&gt;
&lt;p&gt;As I read through the ISO website looking for some hints, the conclusion I come to is that &amp;ldquo;it&amp;rdquo; is in fact an XML standard for office documents. On that basis, we already have one, and OOXML would be a competing standard, undermining the purpose of ISO.&lt;/p&gt;
&lt;p&gt;The notion that one should have two ISO standards for office documents thus seems to require Rick&amp;rsquo;s more narrow reading of OOXML (that &amp;ldquo;it&amp;rdquo; documents a specific dominant product, and that this is the &amp;ldquo;market requirement&amp;rdquo;).&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Tutorial half-day added to Semantic Web Strategies conference</title>
      <link>https://www.bobdc.com/blog/tutorial-halfday-added-to-sema/</link>
      <pubDate>Sun, 03 Jun 2007 10:24:40 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/tutorial-halfday-added-to-sema/</guid>
      
      
      <description><div>A chance to learn about the semantic web from the ground up.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.semanticwebstrategies.com/index.php&#34;&gt;&lt;img src=&#34;http://www.semanticwebstrategies.com/images/logo_SWS_hdr.gif&#34; alt=&#34;[Semantic Web Strategies logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m happy to announce that we&amp;rsquo;re adding a half-day of tutorials the day before the &lt;a href=&#34;http://www.semanticwebstrategies.com/&#34;&gt;Semantic Web Strategies&lt;/a&gt; speaker sessions begin on October 1st. (I wrote more about this conference in a &lt;a href=&#34;https://www.bobdc.com/blog/chairing-a-new-semantic-web-co&#34;&gt;recent posting&lt;/a&gt;.) This will give people who are just starting to check out the world of the semantic web a chance to get some basic background in it before hearing about the various projects that will be discussed during the conference proper. It will also give people who are experienced at giving classes in this material a chance to make some money!&lt;/p&gt;
&lt;p&gt;Contact me if you&amp;rsquo;re interested in giving one of the classes on September 30th. And of course, please &lt;a href=&#34;http://www.semanticwebstrategies.com/speak.php&#34;&gt;submit a proposal&lt;/a&gt; for a talk to give during the conference about where semantic web technology fits into your organization&amp;rsquo;s past, present, or future work.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.semantic-web.at/&#34; title=&#34;http://www.semantic-web.at/&#34;&gt;Andreas Blumauer&lt;/a&gt; on &lt;a href=&#34;#comment-946&#34;&gt;June 4, 2007 2:40 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;my name is Andreas and I&amp;rsquo;m a co-founder of the Austria-based &amp;ldquo;Semantic Web School&amp;rdquo;. We&amp;rsquo;re an experienced team in giving tutorials on different aspects of the semantic web. We started our business in 2004 and have since introduced around 150 people from different industries, different countries, and organisations like Sun Microsystems, Siemens, or Sony to the semantic web.&lt;/p&gt;
&lt;p&gt;So we can give a perfect overview of goals, strategies, use-cases and of course also technologies in the semantic web. In our tutorial we would like to address business needs as well as technological aspects.&lt;/p&gt;
&lt;p&gt;We are an independent organisation in a network of experts from well known software providers as well as universities.&lt;/p&gt;
&lt;p&gt;We would like to offer you a half day tutorial with the title &amp;ldquo;Semantic Web for CEOs and CTOs in a nutshell&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Best regards!&lt;br /&gt;
Andreas&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Word 2003&#39;s awful XML for index elements</title>
      <link>https://www.bobdc.com/blog/word-2003s-awful-xml-for-index/</link>
      <pubDate>Thu, 31 May 2007 21:33:26 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/word-2003s-awful-xml-for-index/</guid>
      
      
      <description><div>My &#34;XML version of their RTF&#34; joke has become too real to be funny anymore.</div><div>&lt;p&gt;I&amp;rsquo;ve mostly watched the OpenOffice vs. Office Open XML debates as a spectator, but I have &lt;a href=&#34;http://www.xml.com/pub/a/2004/02/04/tr-xml.html&#34;&gt;dealt directly&lt;/a&gt; with OpenOffice XML with some nice results. I &lt;a href=&#34;https://www.bobdc.com/blog/word-2003-xml&#34;&gt;dabbled&lt;/a&gt; with Word&amp;rsquo;s XML a bit and found at least one nice surprise, but I hadn&amp;rsquo;t waded in too deeply until recently, and now that I have, I&amp;rsquo;m pretty disappointed. Basic paragraph markup is pretty messy, and the markup of index terms is awful.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://msdn2.microsoft.com/en-us/library/aa212812(office.11).aspx&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/wordml.jpg&#34; alt=&#34;[Word ML logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;w:p&lt;/code&gt; paragraph elements are split fairly arbitrarily into &lt;code&gt;w:r&lt;/code&gt; elements. A Microsoft &lt;a href=&#34;http://msdn2.microsoft.com/en-us/library/aa212812(office.11).aspx&#34;&gt;Overview of WordprocessingML&lt;/a&gt; tells us that &lt;code&gt;w:r&lt;/code&gt; stores &amp;ldquo;A contiguous set of WordprocessingML components with a consistent set of properties&amp;rdquo;. That&amp;rsquo;s all it tells us. Next to this definition is a link to a &lt;a href=&#34;http://msdn2.microsoft.com/en-us/library/aa223687(office.11).aspx&#34;&gt;special page for the r element&lt;/a&gt; that tells us nothing more about it but does tell us &amp;ldquo;For more information on this element, please refer to the VML Reference, located online in the Microsoft Developer Network (MSDN) Library&amp;rdquo;. It tells us this eleven times. Seriously. The &lt;code&gt;w:r&lt;/code&gt; elements are broken up, arbitrarily as far as I can tell, into &lt;code&gt;w:t&lt;/code&gt; elements, which &lt;a href=&#34;http://msdn2.microsoft.com/en-us/library/aa212812(office.11).aspx&#34;&gt;are defined as&lt;/a&gt; &amp;ldquo;a piece of text&amp;rdquo;. (Not like all those other XML elements!) I have to wonder what the famous six thousand pages of documentation for this format actually say.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;w:r&lt;/code&gt; and &lt;code&gt;w:t&lt;/code&gt; elements are annoying, but it&amp;rsquo;s not a lot of coding to just ignore them and concatenate their contents together. However, I don&amp;rsquo;t even want to try to write code that processes the XML version of index terms from a Word file. Here&amp;rsquo;s a sample, showing what happens when I inserted the index term &amp;ldquo;dogs&amp;rdquo; with a secondary term &amp;ldquo;beagles&amp;rdquo;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;w:r&amp;gt;&amp;lt;w:fldChar w:fldCharType=&amp;quot;begin&amp;quot;/&amp;gt;&amp;lt;/w:r&amp;gt;
&amp;lt;w:r&amp;gt;&amp;lt;w:instrText&amp;gt; XE &amp;quot;&amp;lt;/w:instrText&amp;gt;&amp;lt;/w:r&amp;gt;
&amp;lt;w:proofErr w:type=&amp;quot;spellStart&amp;quot;/&amp;gt;
&amp;lt;w:r wsp:rsidRPr=&amp;quot;00DD6A97&amp;quot;&amp;gt;
  &amp;lt;w:instrText&amp;gt;dogs:beagles&amp;lt;/w:instrText&amp;gt;
&amp;lt;/w:r&amp;gt;
&amp;lt;w:proofErr w:type=&amp;quot;spellEnd&amp;quot;/&amp;gt;
&amp;lt;w:r&amp;gt;&amp;lt;w:instrText&amp;gt;&amp;quot;&amp;lt;/w:instrText&amp;gt;&amp;lt;/w:r&amp;gt;
&amp;lt;w:r&amp;gt;&amp;lt;w:fldChar w:fldCharType=&amp;quot;end&amp;quot;/&amp;gt;&amp;lt;/w:r&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;All of those &lt;code&gt;w:r&lt;/code&gt; elements are siblings of all the other &lt;code&gt;w:r&lt;/code&gt; elements in the same paragraph, so the only indication that the markup above is supposed to function as a single unit is the fact that one &lt;code&gt;w:r&lt;/code&gt; element&amp;rsquo;s &lt;code&gt;w:fldChar&lt;/code&gt; child (which &lt;a href=&#34;http://msdn2.microsoft.com/en-us/library/aa213346(office.11).aspx&#34;&gt;the documentation&lt;/a&gt; says &amp;ldquo;Represents a field-delimiting character&amp;rdquo;) has a &lt;code&gt;w:fldCharType&lt;/code&gt; of &amp;ldquo;begin&amp;rdquo; and another has a value of &amp;ldquo;end&amp;rdquo;. Since a test in a separate Word document shows that Word recognizes the words &amp;ldquo;dogs&amp;rdquo; and &amp;ldquo;beagles&amp;rdquo; as spelled properly but doesn&amp;rsquo;t recognize the string &amp;ldquo;dogs:beagles&amp;rdquo;, I&amp;rsquo;m guessing that the two &lt;code&gt;w:proofErr&lt;/code&gt; elements are there because after Word put my primary and secondary index terms together with a colon delimiter, it didn&amp;rsquo;t recognize what it saw as a properly spelled word and marked the string as a misspelled one.&lt;/p&gt;
&lt;p&gt;Looking at the original Word file, I suppose that the field-delimiting characters in question are curly braces, and the value of the &lt;code&gt;w:instrText&lt;/code&gt; element (which &lt;a href=&#34;http://msdn2.microsoft.com/en-us/library/aa172854(office.11).aspx&#34;&gt;the doc&lt;/a&gt; says &amp;ldquo;Represents field instruction content&amp;rdquo;) of &amp;rsquo; XE &amp;ldquo;&amp;rsquo; tells us that it&amp;rsquo;s an indexing field. (Of course the double quote isn&amp;rsquo;t part of that—it goes with the one after the second &lt;code&gt;w:proofErr&lt;/code&gt; element!)&lt;/p&gt;
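&lt;p&gt;To make that begin/end grouping concrete, here is a rough sketch of how a script might stitch an XE field back together. This is a hypothetical illustration (not production code) using Python&amp;rsquo;s xml.etree.ElementTree against an already-parsed w:p element in the Word 2003 WordML namespace:&lt;/p&gt;

```python
# Hypothetical sketch: recover XE index entries from a parsed
# WordprocessingML w:p element. Assumes the Word 2003 namespace;
# w:proofErr siblings are skipped automatically because we only
# look at w:r runs.
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"

def index_entries(paragraph):
    """Return the text of each XE field found in a w:p element."""
    entries, in_field, buf = [], False, []
    for run in paragraph.findall(W + "r"):
        fld = run.find(W + "fldChar")
        if fld is not None:
            kind = fld.get(W + "fldCharType")
            if kind == "begin":
                in_field, buf = True, []
            elif kind == "end" and in_field:
                text = "".join(buf).strip()
                if text.startswith("XE"):
                    # drop the XE keyword and the surrounding quotes
                    entries.append(text[2:].strip().strip('"'))
                in_field = False
        elif in_field:
            instr = run.find(W + "instrText")
            if instr is not None and instr.text:
                buf.append(instr.text)
    return entries
```

&lt;p&gt;Run against the sample above, this would yield the string &amp;ldquo;dogs:beagles&amp;rdquo;; splitting on the colon would then separate the primary and secondary terms.&lt;/p&gt;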
&lt;p&gt;Has anyone written anything to parse through this mess? Some OpenOffice coders have written something to parse the original Word doc file, and they represent the same index tag with this single empty element:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;text:alphabetical-index-mark text:string-value=&amp;quot;collies&amp;quot; text:key1=&amp;quot;dogs&amp;quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The primary and secondary terms are stored as separately addressable values in less than one-fourth the text that Word used for its markup, we don&amp;rsquo;t need to guess where the markup showing the index terms starts and ends, and as an added bonus, the single element containing this has the word &amp;ldquo;index&amp;rdquo; in its name. (And of course, OpenOffice didn&amp;rsquo;t create a misspelled &amp;ldquo;word&amp;rdquo; and then identify it as misspelled.)&lt;/p&gt;
&lt;p&gt;In the Word version, the idea of curly brace field delimiters around the index markup brought up the specter of a ghost, and soon the ghost was hovering in front of me, moaning and rattling chains. To try to learn more about &amp;ldquo;XE&amp;rdquo; as a field instruction, I did a &lt;a href=&#34;http://www.google.com/search?q=word%20xml%20index%20xe%20%22field%20instruction%22&#34;&gt;Google search for &amp;lsquo;word xml index xe &amp;ldquo;field instruction&amp;rdquo;&amp;rsquo;&lt;/a&gt; (after several other fruitless searches) and the first of the only three hits was the file Word2007RTFSpec9.doc at &lt;a href=&#34;http://download.microsoft.com&#34;&gt;http://download.microsoft.com&lt;/a&gt;: the current spec for the original nemesis of Word interoperability, &amp;ldquo;Rich&amp;rdquo; Text &amp;ldquo;Format&amp;rdquo;. (This spec didn&amp;rsquo;t help me much.) I&amp;rsquo;d often joked that WordML was just an XML version of RTF; now I recognize that it really is, at least for indexing markup. I&amp;rsquo;ll try to look at the good side: at least if you forget a single delimiter with WordML, loading the bad document won&amp;rsquo;t cause a freeze-up that requires a hard reboot of your machine, as it often did if you forgot a single curly brace in RTF generated by a script you were working on.&lt;/p&gt;
&lt;p&gt;If that sequence of &lt;code&gt;w:r&lt;/code&gt; elements is Microsoft&amp;rsquo;s idea of a sensible standard for indexing markup, then they really don&amp;rsquo;t care about creating usable XML. Does anyone know of code out there that&amp;rsquo;s parsed Microsoft&amp;rsquo;s indexing XML to do something productive with it? My experiments all used Word 2003; has it been improved for the Word 2007 XML?&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By Bryan on &lt;a href=&#34;#comment-942&#34;&gt;June 1, 2007 4:34 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been arguing this stuff for the past year. The main thing is that OpenXML is constructed in such a way that, surprisingly enough, it does not work well with any of the common stack of XML processing technologies: DOM, XSLT, and SAX are all hampered in one way or another by the design decisions of the format.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.grauw.nl/&#34; title=&#34;http://www.grauw.nl/&#34;&gt;Laurens Holst&lt;/a&gt; on &lt;a href=&#34;#comment-944&#34;&gt;June 2, 2007 1:03 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Isn’t putting human-readable language in attributes trouble? Think subscripts, superscripts, etc. That’s why having separate elements for them seems better to me.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-945&#34;&gt;June 2, 2007 4:01 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The question of storing a given piece of information in an element or an attribute is as old as SGML; see &lt;a href=&#34;http://xml.silmaril.ie/developers/attributes/&#34;&gt;http://xml.silmaril.ie/developers/attributes/&lt;/a&gt; for a summary. The convention in document-oriented XML (and an XML representation of a Word file is pretty document oriented!) is to put document content as PCDATA in elements and processing metadata in attribute values.&lt;/p&gt;
&lt;p&gt;Regardless, if you spread information about a single construct across multiple elements, there should be a container element to show that they all go together, making it easier for processes to know when they&amp;rsquo;ve reached the beginning and end of such a construct. That&amp;rsquo;s much of the point of XML: start-tags and corresponding end-tags to show where things begin and end.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Chairing a new semantic web conference</title>
      <link>https://www.bobdc.com/blog/chairing-a-new-semantic-web-co/</link>
      <pubDate>Mon, 28 May 2007 11:40:24 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/chairing-a-new-semantic-web-co/</guid>
      
      
      <description><div>Come share your experiences!</div><div>&lt;p&gt;I&amp;rsquo;m very excited to announce a new semantic web conference, which I&amp;rsquo;ll be chairing: &lt;a href=&#34;http://www.semanticwebstrategies.com&#34;&gt;Semantic Web Strategies&lt;/a&gt;, which will be held in San Jose on October 1st and 2nd. &lt;a href=&#34;http://www.jupiterevents.com/&#34;&gt;Jupiterevents&lt;/a&gt;, a division of the venerable &lt;a href=&#34;http://www.jupitermedia.com/&#34;&gt;Jupitermedia&lt;/a&gt;, is doing all the infrastructure work of the conference, while I get to mostly stick to the fun parts.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.semanticwebstrategies.com/index.php&#34;&gt;&lt;img src=&#34;http://www.semanticwebstrategies.com/images/logo_SWS_hdr.gif&#34; alt=&#34;[Semantic Web Strategies logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I &lt;a href=&#34;https://www.bobdc.com/blog/more-ways-to-make-money-from-t&#34;&gt;mentioned them&lt;/a&gt; when they first contacted me because they were also looking for someone to report on developments in the semantic web world. As I said then, it&amp;rsquo;s a good sign when organizations like theirs start paying attention to the field of semantic web work, because it means that they see some real growth ahead.&lt;/p&gt;
&lt;p&gt;There are already several conferences that address semantic web topics, and the presence of another pushes the field up the hype curve a little. What makes this conference different? I think its title distinguishes it from the &lt;a href=&#34;http://www.semantic-conference.com&#34;&gt;Semantic Technology Conference&lt;/a&gt;, which is probably the most comparable one out there—instead of focusing on the technology, this one will focus on semantic web strategies. We want to provide a forum for people to share stories about their past experiences, long-term plans, and short-term plans for semantic web-related applications and the relationship of these applications to their business plans. To string together some clichés, we want to hear lessons learned, war stories, success stories, and reports from the trenches about past, current, and future work.&lt;/p&gt;
&lt;p&gt;These stories don&amp;rsquo;t have to be about implemented projects directly tied to your organization&amp;rsquo;s business plan; they can be about skunkworks projects done under the radar (if you&amp;rsquo;ll pardon a few more clichés) that never got finished. Few projects are complete successes or complete failures, and assuming that yours falls in between, we want to hear about what worked, what didn&amp;rsquo;t, and what would have worked better if certain things had been in place—for example, tools or standards support that you didn&amp;rsquo;t see then but do now, or more support from certain areas of your organization that didn&amp;rsquo;t understand what you were doing. What decisions did you have to make? Which aspects turned out to be more, or less important than you originally anticipated?&lt;/p&gt;
&lt;p&gt;The metadata world has several examples of different groups working on similar projects with few connections to other groups doing related work, and as this work leads them to investigate semantic web tools, I&amp;rsquo;d like to have some panels that get representatives of these different groups together. For example, who&amp;rsquo;s working on metadata about images? the W3C&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/2001/sw/BestPractices/MM/&#34;&gt;Multimedia Annotation in the Semantic Web Task Force&lt;/a&gt;, the &lt;a href=&#34;http://www.chin.gc.ca/English/Standards/metadata_multimedia.html&#34;&gt;museum industry&lt;/a&gt;, the &lt;a href=&#34;http://www.loc.gov/standards/mix/&#34;&gt;Library of Congress&lt;/a&gt;, Idealliance&amp;rsquo;s &lt;a href=&#34;http://www.prismstandard.org/&#34;&gt;PRISM&lt;/a&gt; activity with their DIM2 work for the publishing industry, the insurance industry, and more. They each have different business needs, but they need to track similar resources, and as each of these groups starts investigating semantic web work, they can find ways to take advantage of each other&amp;rsquo;s work to benefit all the groups. People ask me what kind of audience the conference is shooting for, and I think it will have the greatest benefit if we gather different people from different worlds: the financial industry, biology and pharmaceutical research, the government, the military, the publishing industry, and the corresponding academic fields for each of these.&lt;/p&gt;
&lt;p&gt;In your own semantic web work, what are you proudest of? What do you regret the most? What do you know now that you wish you&amp;rsquo;d known when you started this work? Come share what you&amp;rsquo;ve learned and learn from others by &lt;a href=&#34;http://www.semanticwebstrategies.com/speak.php&#34;&gt;submitting a proposal&lt;/a&gt;. (No paper required!) I&amp;rsquo;m really looking forward to this.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Finding useful RDF data on the web</title>
      <link>https://www.bobdc.com/blog/finding-useful-rdf-data-on-the/</link>
      <pubDate>Tue, 22 May 2007 08:04:37 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/finding-useful-rdf-data-on-the/</guid>
      
      
      <description><div>At rdfdata.org or elsewhere.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.rdfdata.org&#34;&gt;&lt;img src=&#34;http://www.rdfdata.org/img/logo32x32.png&#34; alt=&#34;[rdfdata.org logo]&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;80&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;My perennial rant that the world has too many ontologies and not enough useful data for those ontologies to describe goes back several years. At one point in 2004 I thought I&amp;rsquo;d look around the web for RDF data and compile a central list, and because the domain name &lt;a href=&#34;http://www.rdfdata.org&#34;&gt;rdfdata.org&lt;/a&gt; wasn&amp;rsquo;t taken, I grabbed it.&lt;/p&gt;
&lt;p&gt;I wasn&amp;rsquo;t interested in individual RSS 1.0 files, although I did create entries for &lt;a href=&#34;http://www.rdfdata.org/data.html#rss&#34;&gt;RSS collections&lt;/a&gt;. I also wasn&amp;rsquo;t interested in individual FOAF files, which were small and played a disproportionate role in discussions of the semantic web&amp;rsquo;s potential value at the time, but I did collect a list of &lt;a href=&#34;http://www.rdfdata.org/data.html#foaf&#34;&gt;larger FOAF resources&lt;/a&gt;. I created an RSS feed (1.0, of course) to announce new additions and added a &lt;a href=&#34;http://www.rdfdata.org/wiki/index.cgi?RDFDataContributions&#34;&gt;Wiki&lt;/a&gt; to the site for people to offer suggestions, but it usually got hijacked.&lt;/p&gt;
&lt;p&gt;After a few months, it became increasingly difficult to find new entries to post. A typical 40 minute search using automated Google API scripts—more on these below—might turn up nothing new besides a file that some student submitted with a school project, so I decided to give up. For my final entry on April Fool&amp;rsquo;s Day, 2005, I posted a link to an RDF file of information on available Elvis impersonators that I had created myself by scraping a &lt;a href=&#34;http://www.gigmasters.com/ElvisImpersonator/ElvisImpersonator.asp&#34;&gt;booking agency&amp;rsquo;s website&lt;/a&gt;. (When I submitted the &amp;ldquo;Database of Elvis Impersonators&amp;rdquo; to BoingBoing, Xeni Jardin actually &lt;a href=&#34;http://www.boingboing.net/2005/03/17/database_of_elvis_im.html&#34;&gt;wrote it up&lt;/a&gt; and credited me.) If you live in the U.S., just try to resist going to the booking agency&amp;rsquo;s website and doing a query for your city and state. (Canadians &lt;a href=&#34;http://www.gigmasters.com/ElvisImpersonator/ElvisImpersonator_Canada.asp&#34;&gt;have a page&lt;/a&gt; to query their city and province for Elvis impersonators, but all the pages seem to list the same five or six Americans who are apparently willing to travel pretty far to do their act.)&lt;/p&gt;
&lt;p&gt;I created a single &lt;a href=&#34;http://www.rdfdata.org/dat/rdfdata.rdf&#34;&gt;RDF file listing all the RDF sources&lt;/a&gt;, which may be useful to anyone looking for sample data. Of course, many of its references are now out of date.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re interested in the geekier details of how I tried to automate my searches for RDF files, read on.&lt;/p&gt;
&lt;h2 id=&#34;q0ZKA7AURxuh8fsbFtKk3w&#34;&gt;The scripts&lt;/h2&gt;
&lt;p&gt;I based the script that did the actual queries on googly.pl from O&amp;rsquo;Reilly&amp;rsquo;s &amp;ldquo;Google Hacks&amp;rdquo; book, which is &lt;a href=&#34;http://www.oreillynet.com/pub/h/2754&#34;&gt;available on the web&lt;/a&gt;. (Although mine is also a perl script, I renamed it &lt;a href=&#34;http://www.rdfdata.org/dat/findBigRDF.txt&#34;&gt;findBigRDF.txt&lt;/a&gt; for downloading purposes.) It&amp;rsquo;s pretty heavily commented, so it should be self-explanatory. Google allows up to 1000 queries a day with a given API key, so I set it to do less than that. After each query it loops through the results and ignores certain ones that I knew came up a lot. Its current state is the result of a lot of evolution as I had various ideas about finding RDF data files.&lt;/p&gt;
&lt;p&gt;Each day I would think of some query terms, write a batch file with lines like this, and run it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;perl findbigRDF.pl dc:creator   &amp;gt; findbigrdf.out
perl findbigRDF.pl dublin   &amp;gt;&amp;gt; findbigrdf.out
perl findbigRDF.pl subject   &amp;gt;&amp;gt; findbigrdf.out
perl findbigRDF.pl bioinformatics   &amp;gt;&amp;gt; findbigrdf.out
perl findbigRDF.pl gene   &amp;gt;&amp;gt; findbigrdf.out
perl findbigRDF.pl chromosome   &amp;gt;&amp;gt; findbigrdf.out
perl findbigRDF.pl commons   &amp;gt;&amp;gt; findbigrdf.out
perl findbigRDF.pl URL   &amp;gt;&amp;gt; findbigrdf.out
perl findbigRDF.pl topic   &amp;gt;&amp;gt; findbigrdf.out
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then I&amp;rsquo;d sort the results of findbigrdf.out and run a script (python this time; the above script is perl because I found a perl Google API script to use as a model faster than I could find a python equivalent) to compare the results against URLs that were in my existing collection or in a notGoodURLs.xml file that I had also accumulated. Hopefully something interesting popped out at the end, but like I said, the results became skimpier and skimpier over time.&lt;/p&gt;
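&lt;p&gt;The core of that comparison step can be sketched in a few lines. This is a made-up fragment, not the original script; the parts that read the collection and the notGoodURLs.xml file are left out:&lt;/p&gt;

```python
# Hypothetical fragment of the filtering step: drop any URL already in
# the existing collection or in the accumulated "not good" list, so
# only genuinely new candidates are left for a manual look.
def new_urls(found, known, rejected):
    seen = set(known) | set(rejected)
    return sorted(u for u in set(found) if u not in seen)
```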
&lt;p&gt;I have no intention of adding any new entries to rdfdata.org, but web server logs show that the site is still surprisingly popular, so I wanted to write this up to give people some leads on existing RDF out there and some tools for finding more. And to everyone who made suggestions about resources to list on the website, I just wanted to say thank you, thank you very much.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.gigmasters.com/ElvisImpersonator/ElvisImpersonator.asp&#34;&gt;&lt;img src=&#34;http://www.gigmasters.com/images/musicians/12430.jpg&#34; alt=&#34;[Elvis impersonator]&#34; border=&#34;0&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;200&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Keyboards for breakfast</title>
      <link>https://www.bobdc.com/blog/keyboards-for-breakfast/</link>
      <pubDate>Thu, 17 May 2007 07:57:57 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/keyboards-for-breakfast/</guid>
      
      
      <description><div>But would you want to pour syrup on it?</div><div>&lt;p&gt;My standard joke about not being able to program VCRs, my phone, etc. is that without a QWERTY keyboard I don&amp;rsquo;t know where to start. Finally, I can &lt;a href=&#34;http://www.treehugger.com/files/2007/05/my_type_of_appl.php&#34;&gt;program my breakfast&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.treehugger.com/files/2007/05/my_type_of_appl.php&#34;&gt;&lt;img src=&#34;http://www.treehugger.com/joeyroth/waffleiron3.jpg&#34; alt=&#34;[keyboard waffles]&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Years ago I took a darkroom course at this &lt;a href=&#34;http://www.schoolofvisualarts.edu/&#34;&gt;same institution&lt;/a&gt; and it was great.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Semantic Web project ideas number 5</title>
      <link>https://www.bobdc.com/blog/semantic-web-project-ideas-num-4/</link>
      <pubDate>Mon, 14 May 2007 09:34:37 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-project-ideas-num-4/</guid>
      
      
      <description><div>Use an existing ontology to make a web store easier to use.</div><div>&lt;p&gt;I was tempted to call this &amp;ldquo;Semantic Web project idea number 4a&amp;rdquo;, because it&amp;rsquo;s not a big leap from my &lt;a href=&#34;https://www.bobdc.com/blog/semantic-web-project-ideas-num-3&#34;&gt;last one&lt;/a&gt;. Perhaps if I generalize the idea more it will sound separate enough, but as you&amp;rsquo;ll see, my example builds on the last example.&lt;/p&gt;
&lt;p&gt;A big theme of semantic web evangelism is the value of combining multiple web resources into something greater than the sum of their parts, especially when one of those resources is in RDF. As I mentioned before, using this technology to help people find products they need is a great basis for an application, because in addition to benefiting the users it can earn a commission for the developer. Building an ontology from a taxonomy and then combining it with other resources has a lot of potential, but if we focus on the quest for tunes, there&amp;rsquo;s already at least one ontology that looks ready to use, and using an existing ontology for such an application would be a great demonstration of the value of the semantic web.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://metabrainz.org&#34;&gt;&lt;img src=&#34;http://metabrainz.org/images/mb-banner.png&#34; alt=&#34;[MusicBrainz logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For an online music store to incorporate into an application, &lt;a href=&#34;http://www.amazon.com/music-rock-classical-pop-jazz/b/ref=gw_br_mu/002-0807742-3960021?%5Fencoding=UTF8&amp;amp;node=5174&amp;amp;pf_rd_m=ATVPDKIKX0DER&amp;amp;pf_rd_s=left-nav-1&amp;amp;pf_rd_r=01JR4FXT9S2YB5NEENC6&amp;amp;pf_rd_t=101&amp;amp;pf_rd_p=285525001&amp;amp;pf_rd_i=507846&#34;&gt;Amazon&lt;/a&gt; is the most obvious choice, because they have an API and because they bought cdnow.com. Other obvious choices are &lt;a href=&#34;http://www.apple.com/itunes/&#34;&gt;iTunes&lt;/a&gt; and my new favorite, &lt;a href=&#34;http://www.jdoqocy.com/click-1973330-10364616&#34;&gt;emusic&lt;/a&gt;. (You won&amp;rsquo;t find the latest Justin Timberlake or Fergie hits there, but they have tons of great stuff from a wide variety of categories, and you get 30 DRM-free MP3s a month for $9.99 or 50 for $14.99 after the 25 free songs you get for signing up.)&lt;/p&gt;
&lt;p&gt;My first thought for a public resource to use for such an application was &lt;a href=&#34;http://musicbrainz.org/&#34;&gt;MusicBrainz&lt;/a&gt;, an open content database of album information, because I knew that they have an &lt;a href=&#34;http://wiki.musicbrainz.org/RDF&#34;&gt;RDF interface&lt;/a&gt;. It turns out that the RDF interface has been deprecated in favor of a &lt;a href=&#34;http://wiki.musicbrainz.org/WebService&#34;&gt;REST web service&lt;/a&gt;, which would still be invaluable to someone who wants to use a music database with a public API to help people find music.&lt;/p&gt;
&lt;p&gt;A better place to start, though, especially for a semweb geek, would be Frédérick Giasson and Yves Raimond&amp;rsquo;s &lt;a href=&#34;http://musicontology.com/&#34;&gt;Music Ontology Specification&lt;/a&gt;. They&amp;rsquo;ve even defined properties for &lt;a href=&#34;http://musicontology.com/#term_musicbrainz&#34;&gt;musicbrainz&lt;/a&gt;, &lt;a href=&#34;http://musicontology.com/#term_amazon_asin&#34;&gt;amazon&lt;/a&gt;, &lt;a href=&#34;http://musicontology.com/#term_myspace&#34;&gt;myspace&lt;/a&gt;, and other web-related music resources, which should make it easier to define relationships between these resources to connect information.&lt;/p&gt;
&lt;p&gt;Much of what I wrote in my last posting still applies here, especially the value of Amazon&amp;rsquo;s existing taxonomy as a resource. The advantages of working with an existing ontology instead of building one from a taxonomy should be obvious, but remember that many of the if-you-build-it-they-will-come OWL ontologies out there haven&amp;rsquo;t been applied to much real world data. By doing so yourself, you&amp;rsquo;ll be in a position to deliver feedback to the ontology designers that should be reflected as the ontology evolves. The Music Ontology has a &lt;a href=&#34;http://groups.google.com/group/music-ontology-specification-group&#34;&gt;Google discussion group&lt;/a&gt;, which will be handy for this.&lt;/p&gt;
&lt;p&gt;If you build an application around this ontology and some music, I think it would be easier to create some real value if you aim for depth more than breadth. Pick a category of music that you know and care about and create something for people who are interested but know less than you do about that music.&lt;/p&gt;
&lt;p&gt;Of course, your application doesn&amp;rsquo;t have to be about music. Find one or more related ontologies and one or more related e-commerce sites and build an application that shows that the former can add value to the latter. Perhaps, if you&amp;rsquo;re behind the development of one of these ontologies, you owe it to your ontology&amp;rsquo;s potential users to do such a project, if only on a small scale, to show that your work is more than an academic exercise.&lt;/p&gt;
&lt;p&gt;Using an existing ontology to make an e-commerce site easier to navigate will prove the potential value of the semantic web far more than any &amp;ldquo;Web n&amp;rdquo; essays for which n &amp;gt; 2.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-896&#34;&gt;May 18, 2007 9:21 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;On this topic, don&amp;rsquo;t miss Frederic&amp;rsquo;s &lt;a href=&#34;http://fgiasson.com/blog/index.php/2007/04/17/musicbrainz-relation-database-mapped-in-rdf-using-the-music-ontology/&#34;&gt;Musicbrainz Relation Database mapped in RDF using the Music Ontology&lt;/a&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/project-ideas">project ideas</category>
      
    </item>
    
    <item>
      <title>BoingBoing goofing on ontology designers</title>
      <link>https://www.bobdc.com/blog/boingboing-goofing-on-ontology/</link>
      <pubDate>Thu, 10 May 2007 17:32:26 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/boingboing-goofing-on-ontology/</guid>
      
      
      <description><div>Monotonicity constraints? Hilarious!</div><div>&lt;p&gt;I stopped reading BoingBoing some time ago, but my co-worker &lt;a href=&#34;http://drmacros-xml-rants.blogspot.com/&#34;&gt;Eliot Kimber&lt;/a&gt; just pointed me to a &lt;a href=&#34;http://www.boingboing.net/2007/05/10/pedantic_overanalysi.html&#34;&gt;BoingBoing item&lt;/a&gt; that makes fun of obsessive ontological design. It&amp;rsquo;s not every day that you see a &lt;a href=&#34;http://protege.stanford.edu/&#34;&gt;Protégé&lt;/a&gt; screenshot or references to monotonicity constraints in a BoingBoing humor piece. Ironically, the idea of classifying cute cats could get my younger daughter interested in using Protégé, but she&amp;rsquo;d probably be better off with &lt;a href=&#34;http://www.mindswap.org/2004/SWOOP/&#34;&gt;SWOOP&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And, &lt;a href=&#34;http://www.catsthatlooklikehitler.com/cgi-bin/seigmiaow.pl&#34;&gt;CatThatLooksLikeHitler&lt;/a&gt; is my new favorite class name.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Good XSLT advice</title>
      <link>https://www.bobdc.com/blog/good-xslt-advice/</link>
      <pubDate>Thu, 10 May 2007 10:23:48 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/good-xslt-advice/</guid>
      
      
<description><div>From two of the leading experts.</div><div>&lt;p&gt;Usually, when a tech friend starts a weblog, I try to say &amp;ldquo;hey, check it out&amp;rdquo; here, but I was a little behind on my news when I found out about Jeni Tennison&amp;rsquo;s blog. Plenty of other people had already pointed to it, and it was included in the Planet XML feed, so I figured that everyone who should know about it already did. But she just keeps delivering solid, useful XSLT advice, so a few weeks late, I&amp;rsquo;ll say it: if you write many XSLT stylesheets, you owe it to yourself to read &lt;a href=&#34;http://www.jenitennison.com/blog/&#34;&gt;Jeni&amp;rsquo;s musings&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Of course you&amp;rsquo;ll also want to read Michael Kay&amp;rsquo;s &lt;a href=&#34;http://saxonica.blogharbor.com/blog&#34;&gt;Saxon diaries&lt;/a&gt;. He usually writes more from his perspective as an XSLT implementer, so it&amp;rsquo;s valuable the same way that studying compilers is valuable even if you&amp;rsquo;ll never write a compiler, because you have a better idea of what the computer does with these instructions that you write. Jeni&amp;rsquo;s focus on common stylesheet design questions (for example, match templates or named templates?) will be helpful to an even broader range of stylesheet authors.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xslt">XSLT</category>
      
    </item>
    
    <item>
      <title>Semantic Web project ideas number 4</title>
      <link>https://www.bobdc.com/blog/semantic-web-project-ideas-num-3/</link>
      <pubDate>Mon, 30 Apr 2007 09:01:52 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-project-ideas-num-3/</guid>
      
      
      <description><div>Build an ontology and rules around a working taxonomy—and maybe make some money!</div><div>&lt;p&gt;Can a taxonomy help you buy a lightbulb? I didn&amp;rsquo;t think so, but when Ron Daniel of &lt;a href=&#34;http://www.taxonomystrategies.com&#34;&gt;Taxonomy Strategies&lt;/a&gt; told me how they helped a big box hardware store with the product taxonomy that drove their online store&amp;rsquo;s menus, I realized that taxonomies aren&amp;rsquo;t just for classifying content, as my publishing technology bias had led me to believe.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/s/ref=sr_nr_n_2/002-0807742-3960021?ie=UTF8&amp;amp;rh=n:4285,n:4959&#34;&gt;&lt;img src=&#34;http://ec1.images-amazon.com/images/I/51bxiMtPDiL._AA240_.jpg&#34; alt=&#34;[Cooking Apicius cover]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A taxonomy organizes concepts and puts them into a hierarchical relationship. An ontology lets you define whatever relationships you want between any concepts in your collection, and a taxonomy is a great head start to creating an ontology.&lt;/p&gt;
&lt;p&gt;If you only consider online stores, there are still a lot of taxonomies out there. Amazon is an obvious one, and the &lt;a href=&#34;http://www.browsenodes.com/&#34;&gt;BrowseNodes.com&lt;/a&gt; website (&lt;a href=&#34;http://www.browsenodes.com/node-283155.html&#34;&gt;Amazon book page&lt;/a&gt;) offers a machine-readable summary of Amazon&amp;rsquo;s taxonomy with lines like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Subjects|283155|1000
History|1000|9
Europe|9|4935
Italy|4935|4959
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the fourth line above, 4959 is the node number and 4935 is the parent node&amp;rsquo;s number, showing that 4959 is for Italian History. For a URL to represent a taxonomy node, add the number to the URL stub &lt;a href=&#34;http://www.amazon.com/exec/obidos/tg/browse/-/&#34;&gt;http://www.amazon.com/exec/obidos/tg/browse/-/&lt;/a&gt;. For example, &lt;a href=&#34;http://www.amazon.com/exec/obidos/tg/browse/-/4959&#34;&gt;http://www.amazon.com/exec/obidos/tg/browse/-/4959&lt;/a&gt; is the URL that represents Italian History, and following that URL takes you to a page full of books on this topic. Amazon also lets you do a boolean AND operation on taxonomy nodes in your URLs. With 4285 being the node for Italian Cooking, the URL &lt;a href=&#34;http://www.amazon.com/s/ref=sr_nr_n_2/002-0807742-3960021?ie=UTF8&amp;amp;rh=n:4285,n:4959&#34;&gt;http://www.amazon.com/s/ref=sr_nr_n_2/002-0807742-3960021?ie=UTF8&amp;amp;rh=n:4285,n:4959&lt;/a&gt; takes you to a list of books that fall in both the Italian Cooking and Italian History categories, and &lt;a href=&#34;http://www.amazon.com/s/ref=sr_nr_n_2/002-0807742-3960021?ie=UTF8&amp;amp;rh=n:3870,n:3957&#34;&gt;http://www.amazon.com/s/ref=sr_nr_n_2/002-0807742-3960021?ie=UTF8&amp;amp;rh=n:3870,n:3957&lt;/a&gt; shows books that fall in both the &amp;ldquo;C and C++&amp;rdquo; and the Algorithms categories.&lt;/p&gt;
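&lt;p&gt;To make the node/parent layout concrete, here&amp;rsquo;s a small sketch (my own illustration, not anything from BrowseNodes.com or an Amazon API) that parses lines in that format and rebuilds a node&amp;rsquo;s browse URL and category path:&lt;/p&gt;

```python
# Sketch: parse BrowseNodes-style lines ("Name|parentNode|node")
# and turn a node number into the browse URL described above.

STUB = "http://www.amazon.com/exec/obidos/tg/browse/-/"

def parse_taxonomy(lines):
    """Map each node number to a (name, parent node number) pair."""
    nodes = {}
    for line in lines:
        name, parent, node = line.strip().split("|")
        nodes[node] = (name, parent)
    return nodes

def browse_url(node):
    """Append the node number to the URL stub."""
    return STUB + node

def ancestry(nodes, node):
    """Walk parent links to list category names from root to leaf."""
    names = []
    while node in nodes:
        name, parent = nodes[node]
        names.append(name)
        node = parent
    return list(reversed(names))

lines = ["Subjects|283155|1000", "History|1000|9",
         "Europe|9|4935", "Italy|4935|4959"]
nodes = parse_taxonomy(lines)
print(browse_url("4959"))
print(" / ".join(ancestry(nodes, "4959")))
```

&lt;p&gt;Run on the four sample lines, this prints the Italian History URL and the path Subjects / History / Europe / Italy.&lt;/p&gt;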
&lt;p&gt;Because you can reference the nodes of this taxonomy with URLs, many pieces are in place to build an RDF/OWL ontology and even rules around the nodes. Why would you want to? In Amazon&amp;rsquo;s case, an ontology of relationships between product category nodes can help people find products that they&amp;rsquo;re interested in more easily. As a bonus, it might even make you some money.&lt;/p&gt;
&lt;p&gt;Of course, once you&amp;rsquo;ve located a product that you&amp;rsquo;re interested in on Amazon or another online store, they have algorithms and data in place to identify related products that you might like. The &amp;ldquo;Look for similar items by category&amp;rdquo; section at the bottom of an Amazon book page does this to some extent, sometimes even linking to multiple points in their subject taxonomy. For the &lt;a href=&#34;http://www.amazon.com/Lidias-Italy-Simple-Delicious-Recipes/dp/1400040361/ref=sr_1_1/002-0807742-3960021?ie=UTF8&amp;amp;s=books&amp;amp;qid=1177687356&amp;amp;sr=1-1&#34;&gt;Lidia&amp;rsquo;s Italy&lt;/a&gt; cookbook, this section has links to the sections on Italian Cooking, Italian History, and travel in Italy. Can semantic web technologies augment this attempt to find other products that the user might want? The data and tools to prove it are all freely available.&lt;/p&gt;
&lt;p&gt;Above you saw examples of URLs that take you right to Amazon pages for specific categories of books. If you build a tool that uses an ontology to help people find books and other products that they might like, a slight tweak to the URL that sends people from your tool to Amazon&amp;rsquo;s web site earns you a commission on anything they buy once they get there, whether it&amp;rsquo;s on the page you sent them to or not. I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/creating-an-affiliate-website&#34;&gt;written before&lt;/a&gt; about the Amazon Associates program, but I only recently learned that URLs with your associate ID will earn you a commission even when you send them to product category pages. For example, if you want to buy a book about Italian history, I&amp;rsquo;d rather that you followed &lt;a href=&#34;http://www.amazon.com/exec/obidos/redirect?link_code=ur2&amp;amp;camp=1789&amp;amp;tag=bobducharmeA&amp;amp;creative=9325&amp;amp;path=tg/browse/-/4959&#34;&gt;this link&lt;/a&gt; than the one shown earlier, because I&amp;rsquo;ll make a commission from it.&lt;/p&gt;
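&lt;p&gt;As a rough sketch of that URL tweak, here&amp;rsquo;s how the redirect link above breaks down into pieces. The parameter names are copied from that one example link; treat the exact layout as an assumption rather than documented Amazon behavior:&lt;/p&gt;

```python
# Sketch: build an Associates redirect link for a browse node so
# that referrals to a category page carry your associate tag.
# Parameter names mirror the example link in this post; they are
# not taken from any official documentation.
from urllib.parse import urlencode

def associate_link(node, tag):
    params = [
        ("link_code", "ur2"),
        ("camp", "1789"),
        ("tag", tag),
        ("creative", "9325"),
        ("path", "tg/browse/-/" + node),  # the category page to land on
    ]
    # safe="/-" keeps the slashes and hyphen in the path readable
    return ("http://www.amazon.com/exec/obidos/redirect?"
            + urlencode(params, safe="/-"))

print(associate_link("4959", "bobducharmeA"))
```

&lt;p&gt;The point is just that the node number rides along in the path parameter while the tag parameter identifies whose associate account gets the commission.&lt;/p&gt;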
&lt;p&gt;Plenty of other retail websites have both taxonomies and affiliate programs. Your ontology doesn&amp;rsquo;t have to limit itself to products on just one of these sites; you could link products from multiple sites.&lt;/p&gt;
&lt;p&gt;Semantic Web evangelism often describes the existence of a business potential for this technology, but rarely on a scale that can be implemented by one person with free tools. I think that there are a lot of possibilities here.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.cs.umd.edu/~hendler&#34; title=&#34;http://www.cs.umd.edu/~hendler&#34;&gt;Jim Hendler&lt;/a&gt; on &lt;a href=&#34;#comment-826&#34;&gt;April 30, 2007 2:56 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Great post &amp;ndash; there is a bias towards &amp;ldquo;big&amp;rdquo; ontologies, largely because small ones on the Web are relatively new. There&amp;rsquo;s a lot of potential in this area - and managing Web sites and the like is one of them &amp;ndash; take a look at &lt;a href=&#34;http://www.w3.org/TR/webont-req/&#34;&gt;http://www.w3.org/TR/webont-req/&lt;/a&gt; - the OWL Use Case and Requirements document at W3C for some other suggestions - a bit out of date, but still has lots of good ideas.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-827&#34;&gt;April 30, 2007 3:39 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Jim! Speaking of &amp;ldquo;small&amp;rdquo; ontologies, my idea might be even more practical for someone who limits their work to a subtree of the Amazon book taxonomy that is related to their area of interest&amp;ndash;e.g. Italy and related, programming languages, certain branches of law&amp;ndash;because this person would be more qualified to 1. identify useful new connections between nodes and 2. get word of their new tool&amp;rsquo;s existence to an audience that would be interested in using it.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/project-ideas">project ideas</category>
      
    </item>
    
    <item>
      <title>Semantic Web project ideas number 3</title>
      <link>https://www.bobdc.com/blog/semantic-web-project-ideas-num-2/</link>
      <pubDate>Fri, 20 Apr 2007 08:53:59 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-project-ideas-num-2/</guid>
      
      
<description><div>Planning those enterprise resources.</div><div>&lt;p&gt;For part 3 of my &lt;a href=&#34;http://www.snee.com/bobdc.blog/metadata/semantic_web/project_ideas/&#34;&gt;series on semantic web project ideas&lt;/a&gt;, I was tempted to take &lt;a href=&#34;https://www.bobdc.com/blog/semantic-web-project-ideas-num-1&#34;&gt;part 2&lt;/a&gt; and do a global replace, substituting &amp;ldquo;ERP&amp;rdquo; for &amp;ldquo;CRM&amp;rdquo;. I&amp;rsquo;ll briefly recap what a semantic web add-on to an open source Enterprise Resource Planning package would have in common with a similar add-on to an open source Customer Relationship Management package:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I didn&amp;rsquo;t know that open source packages for this existed until I read the Tapscott/Williams book.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It&amp;rsquo;s about integration across organization boundaries, so semweb technology should have plenty to offer.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When researching existing open source packages, look for discussion forums of users and developers to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Get ideas for what semweb technologies can add&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Find a user with real data that you can work with (maybe for money!)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Get some idea about how easy or difficult the code is to work with&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For the basics of what ERP is about, Wikipedia&amp;rsquo;s &lt;a href=&#34;http://en.wikipedia.org/wiki/Enterprise_resource_planning&#34;&gt;Enterprise resource planning&lt;/a&gt; page seems like a good overview. It describes ERP as an integration of manufacturing, supply chain, financial, CRM and warehouse management systems. Considering what the &amp;ldquo;R&amp;rdquo; in &amp;ldquo;ERP&amp;rdquo; stands for, semweb tools for tracking data about resources and finding new connections and relationships between them should have plenty to offer for such systems.&lt;/p&gt;
&lt;p&gt;A read-write interface might be too ambitious to start with, but wrapping SPARQL interfaces around a few ERP subsystems would be a great beginning. (Make sure to include a few queries that demonstrate productive tasks that couldn&amp;rsquo;t be done without the SPARQL interface; the interface alone isn&amp;rsquo;t enough. You want people who&amp;rsquo;ve never heard of SPARQL to say &amp;ldquo;I want that! I need that!&amp;rdquo;) There are &lt;a href=&#34;http://www.google.com/search?q=%22open%20source%22%20erp&#34;&gt;plenty of open source ERP&lt;/a&gt; packages out there—some even linked to CRM products from the same organizations—so there&amp;rsquo;s plenty to work with. One has the intriguing name of &lt;a href=&#34;http://tinyerp.com/&#34;&gt;Tiny ERP&lt;/a&gt;, and &lt;a href=&#34;http://www.tinyerp.com/download/stable/source/&#34;&gt;its source code&lt;/a&gt; is available in two gzipped files: one of client code and one of server code.&lt;/p&gt;
&lt;p&gt;Google searches for &lt;a href=&#34;http://www.google.com/search?q=erp%20ontology&#34;&gt;ERP ontology&lt;/a&gt; get a few interesting hits. There&amp;rsquo;s probably nothing ready to use as-is, but they look like a good head start. As with CRM semantic web development, I&amp;rsquo;d rather see someone combine iterative additions to an ontology with demos of working code than doing lots and lots of ontology work before building something that actually uses the ontology.&lt;/p&gt;
&lt;p&gt;In a comment to my posting on the possibilities of adding semantic web features to CRM systems, John Cowan pointed out that most open source CRM systems won&amp;rsquo;t completely let you take their code and run with it, but instead only let you re-use it if you distribute your new version with the original developer&amp;rsquo;s branding. This isn&amp;rsquo;t ideal, but it still offers opportunities—if you can get at the package&amp;rsquo;s source, you can add new features, such as the exposure of the package&amp;rsquo;s data through a SPARQL endpoint. I think it will be worth it, because a successful implementation could demonstrate the value of semweb technology to a lot of people who would really take notice.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-795&#34;&gt;April 20, 2007 5:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t know if you are aware of the &lt;a href=&#34;http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData&#34;&gt;Linking Open Data project&lt;/a&gt;? Anyway, we are using this community to drive through work in this area (note the &lt;a href=&#34;http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/THALIATestbed&#34;&gt;THALIA integration benchmark&lt;/a&gt; effort).&lt;/p&gt;
&lt;p&gt;We already have an eCRM system that is part of &lt;a href=&#34;http://virtuoso.openlinksw.com/wiki/main/Main/OdsIndex&#34;&gt;ODS&lt;/a&gt;. The product is currently unreleased because we haven&amp;rsquo;t found a shared ontology that models the CRM domain. Anyway, we are not only working on RDF and SPARQL access to our eCRM effort. We are also looking to implement this atop Open Source CRM systems like SugarCRM. The same goals apply to ERP systems (of the Open Source variety).&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-796&#34;&gt;April 20, 2007 5:51 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Kingsley,&lt;/p&gt;
&lt;p&gt;Great to hear! And I &lt;a href=&#34;https://www.bobdc.com/blog/semantic-web-project-ideas-num#uuMXH1_SRwWQ-RzX3t-SsA&#34;&gt;saw this coming&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/project-ideas">project ideas</category>
      
    </item>
    
    <item>
      <title>Tech</title>
      <link>https://www.bobdc.com/blog/tech/</link>
      <pubDate>Wed, 18 Apr 2007 10:51:35 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/tech/</guid>
      
      
      <description><div>UVA&#39;s country cousin.</div><div>&lt;p&gt;While in Dallas on business recently, I heard the sports news mention a &amp;ldquo;tech&amp;rdquo; basketball game, and I thought &amp;ldquo;What do they care about Virginia Tech?&amp;rdquo; Of course, they meant Texas Tech, but where I live, it means Virginia Tech.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.vt.edu/&#34;&gt;&lt;img src=&#34;http://upload.wikimedia.org/wikipedia/en/9/99/HokieHockeyBird.jpg&#34; alt=&#34;[hockey Hokie]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I live in Charlottesville, about two hours northeast of Blacksburg. Charlottesville is the home of the University of Virginia, where looking down on Tech as the goofy country cousin of &amp;ldquo;Mr. Jefferson&amp;rsquo;s University&amp;rdquo; has been a popular pastime for years. You can even buy &lt;a href=&#34;http://www.mincers.com/php-bin/ecomm4/products.php?category_id=&amp;amp;product_id=622&amp;amp;prev_id=&amp;amp;next_id=572&#34;&gt;T-shirts&lt;/a&gt; in UVA book and souvenir stores showing Tech students as gap-toothed hillbillies. Any football or basketball game between the two schools is a big deal; they even have a trophy for the most recent winner of the football game.&lt;/p&gt;
&lt;p&gt;Plenty of people in Charlottesville went to and root for Tech, and any kid here who wants to grow up to be a veterinarian wants to go there, because they&amp;rsquo;re famous for that. One of my daughters knows an undergraduate there studying &amp;ldquo;animal science,&amp;rdquo; as they call it, right now. The guy who cuts down any dangerously leaning trees in our yard quoted Tech forestry research to me when we were discussing whether it&amp;rsquo;s better to cut a branch off flush with the tree. I&amp;rsquo;ve never been to the campus, but my wife and daughters have when attending 4H-related horse events. Through one of those &amp;ldquo;whatever happened to&amp;rdquo; Google searches, I recently found that an old New York friend who was originally from Cleveland is now a sociology professor at Tech, and I&amp;rsquo;d been meaning to go with my family to one of the horse things and try to hook up with him.&lt;/p&gt;
&lt;p&gt;When my UVA law school graduate wife says that she&amp;rsquo;d rather see our daughters go to UVA than Tech—and she does this often—I usually add &amp;ldquo;unless they want to study computer science.&amp;rdquo; In the four years I&amp;rsquo;ve lived here, geeky news sites such as Slashdot or reddit have mentioned interesting projects at Tech several times, and I&amp;rsquo;ve never seen UVA mentioned there. Many of the victims Monday were computer science students.&lt;/p&gt;
&lt;p&gt;Two weeks ago, while listening in the car to a solo acoustic 1971 version of &amp;ldquo;Ohio&amp;rdquo; on the new Neil Young live in Massey Hall album, I tried to explain something about the Kent State shootings to my older daughter. I told her how on the one hand it was soldiers shooting students, but on the other hand it was twenty-ish Ohio kids shooting twenty-ish Ohio kids, and what an awful landmark it was in America&amp;rsquo;s relationship to its war in Vietnam. (I once heard guitar player Joe Walsh, a student there at the time, describe how he had many friends among both the students and the local National Guard.) I told her that while I&amp;rsquo;m sure it&amp;rsquo;s still a good school that many kids aspire to attend, for many Americans the simple name of the school will always conjure up the shooting and &lt;a href=&#34;http://en.wikipedia.org/wiki/Image:Kent_State_massacre.jpg&#34;&gt;one Life magazine picture&lt;/a&gt; in particular. I hope this doesn&amp;rsquo;t happen with Tech.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Using XHTML 2 schemas</title>
      <link>https://www.bobdc.com/blog/using-xhtml-2-schemas/</link>
      <pubDate>Fri, 13 Apr 2007 07:45:45 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/using-xhtml-2-schemas/</guid>
      
      
      <description><div>The RELAX NG kind, and maybe the XSD kind.</div><div>&lt;p&gt;I wanted to use Emacs+&lt;a href=&#34;http://www.thaiopensource.com/nxml-mode/&#34;&gt;nxml&lt;/a&gt; to create some XHTML 2 documents, so I went looking for an XHTML 2 schema. The &lt;a href=&#34;http://www.w3.org/TR/2006/WD-xhtml2-20060726/&#34;&gt;latest Working Draft&lt;/a&gt; says that it &amp;ldquo;includes an early implementation of XHTML 2.0 in &lt;a href=&#34;http://relaxng.org/&#34;&gt;RELAX NG&lt;/a&gt;, but does not include the implementations in DTD or XML Schema form. Those will be included in subsequent versions, once the content of this language stabilizes.&amp;rdquo; This schema&amp;rsquo;s location is not obvious, but a few web searches turned up &lt;a href=&#34;http://webheadstart.org/html/wwwlist/2006Aug/mail_3070.htm&#34;&gt;a pointer&lt;/a&gt; to the &lt;a href=&#34;http://www.w3.org/TR/2006/WD-xhtml2-20060726/xhtml2.zip&#34;&gt;ZIP archive version&lt;/a&gt; of the Working Draft mentioned in the spec&amp;rsquo;s header.&lt;/p&gt;
&lt;p&gt;When you unzip this file, you&amp;rsquo;ll find a collection of RELAX NG rng files in the xhtml2-20060726\RELAXNG subdirectory. The xhtml2.rng file looks like the &lt;a href=&#34;http://www.w3.org/TR/2006/WD-xhtml2-20060726/xhtml20_relax.html#a_rmodule_RELAX_NG_XHTML_2.0_Driver&#34;&gt;driver file&lt;/a&gt; mentioned in the Working Draft, so I tried parsing a simple XHTML 2 document against that with &lt;a href=&#34;http://www.thaiopensource.com/relaxng/jing.html&#34;&gt;jing&lt;/a&gt; and got some XForms-related error messages. I commented out the xhtml2.rng &lt;code&gt;div&lt;/code&gt; element that contained the XForms module and the sample document parsed just fine. (Make sure that your XHTML 2 document&amp;rsquo;s elements are in the &lt;a href=&#34;http://www.w3.org/2002/06/xhtml2/&#34;&gt;http://www.w3.org/2002/06/xhtml2/&lt;/a&gt; namespace.)&lt;/p&gt;
&lt;p&gt;I used &lt;a href=&#34;http://www.thaiopensource.com/relaxng/trang.html&#34;&gt;trang&lt;/a&gt; to convert the rng files to RELAX NG &lt;a href=&#34;http://relaxng.org/compact-tutorial-20030326.html&#34;&gt;Compact&lt;/a&gt; files so that I could use them with Emacs+nxml. I zipped these up and put them at &lt;a href=&#34;http://www.snee.com/xml/xhtml2rnc2005-07-27.zip&#34;&gt;http://www.snee.com/xml/xhtml2rnc2005-07-27.zip&lt;/a&gt; if anyone else is interested in using them. I also tried converting the RNG files to DTDs, but trang said that there were too many fancy RELAX NG constructs in there, which makes sense— the Working Group used RELAX NG instead of DTDs because it&amp;rsquo;s more expressive.&lt;/p&gt;
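&lt;p&gt;If you want to try the same thing, the invocations are simple; they look something like the following (with the jar names and paths adjusted for wherever you installed jing and trang, and your own document name in place of mydocument.xml):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;java -jar jing.jar xhtml2.rng mydocument.xml
java -jar trang.jar xhtml2.rng xhtml2.rnc&lt;/code&gt;&lt;/pre&gt;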
&lt;p&gt;The story with W3C Schemas was similar to the DTD one but not as bad. Trang converted the RNG files to XSDs with a few errors. I tried validating a sample document against xhtml2.xsd with &lt;a href=&#34;http://xml.apache.org/xerces-c/stdinparse.html&#34;&gt;stdinparse&lt;/a&gt; and had some luck, but I still got some error messages. I spent a few minutes trying to track down their cause and then quit. I&amp;rsquo;ve always felt that outside of the data typing, W3C Schemas are too much trouble, and this certainly didn&amp;rsquo;t change my mind.&lt;/p&gt;
&lt;p&gt;Despite the age of the RELAX NG schema, as indicated by the date in my zip filename, the rnc files worked well with Emacs+nxml. They didn&amp;rsquo;t even have problems with a sample document that included the new &lt;code&gt;about&lt;/code&gt;, &lt;code&gt;role&lt;/code&gt; and &lt;code&gt;property&lt;/code&gt; attributes described in my recent XML.com pieces about RDFa (&lt;a href=&#34;http://www.xml.com/pub/a/2007/02/14/introducing-rdfa.html&#34;&gt;Part 1&lt;/a&gt;, &lt;a href=&#34;http://www.xml.com/pub/a/2007/04/04/introducing-rdfa-part-two.html&#34;&gt;Part 2&lt;/a&gt;) except when an &lt;code&gt;about&lt;/code&gt; attribute value had square brackets to indicate that it was a CURIE. (I was going to link &amp;ldquo;CURIE&amp;rdquo; in that last sentence to my second article&amp;rsquo;s section on them, but somewhere in O&amp;rsquo;Reilly&amp;rsquo;s process for preparing these articles they took the &lt;code&gt;id&lt;/code&gt; attributes off of all of my block elements except for the &lt;code&gt;pre&lt;/code&gt; elements. I put these &lt;code&gt;id&lt;/code&gt; values in the block elements of what I write to make it easier to link to specific points—you know, web, linking, etc.—so it&amp;rsquo;s odd that they would take them out.) CURIEs are recent enough that I wouldn&amp;rsquo;t expect this version of the schema to support them.&lt;/p&gt;
&lt;p&gt;When the next Working Draft comes out, I know I&amp;rsquo;ll go straight to the schema in the zip file to try it out. Maybe it will have better XForms support; maybe there&amp;rsquo;ll be new features to play with. I look forward to it.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By Paul Everitt on &lt;a href=&#34;#comment-1044&#34;&gt;July 12, 2007 1:51 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I have struggled valiantly for over a year to get XHTML2 RNG schemas to work well (I&amp;rsquo;m using oXygen). Specifically, I would love to hear that anybody in the universe has actually used the XHTML2 RNGs to create a document with forms. Meaning, XForms.&lt;/p&gt;
&lt;p&gt;The problem seems to be a combination of how XForms got absorbed into XHTML2 as a host language, plus a set of schemas that had that part somewhat disabled.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-1046&#34;&gt;July 13, 2007 8:50 AM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;how XForms got absorbed into XHTML2 as a host language&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In a word, badly, in the last iteration of the schema. I commented the XForms parts out when I used the schemas.&lt;/p&gt;
&lt;p&gt;I would work on the XForms advocates (e.g. Micah Dubinko) on this score, because it&amp;rsquo;s up to them to integrate XForms better into XHTML 2 if they want people to use it.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Semantic Web project ideas number 2</title>
      <link>https://www.bobdc.com/blog/semantic-web-project-ideas-num-1/</link>
      <pubDate>Mon, 09 Apr 2007 08:29:46 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-project-ideas-num-1/</guid>
      
      
      <description><div>Managing relationships with customers.</div><div>&lt;p&gt;In &lt;a href=&#34;https://www.bobdc.com/blog/semantic-web-project-ideas-num&#34;&gt;part one&lt;/a&gt; of this series I described how the Tapscott and Williams &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1591841380/bobducharmeA/&#34;&gt;Wikinomics&lt;/a&gt; book mentioned a few things that gave me ideas for semantic web projects. One was the concept of open source Customer Relationship Management packages, which I hadn&amp;rsquo;t heard of before. The book didn&amp;rsquo;t mention any specific ones, but a Google search on &lt;a href=&#34;http://www.google.com/search?q=%22open%20source%20crm%22&#34;&gt;&amp;ldquo;Open Source&amp;rdquo; CRM&lt;/a&gt; gets plenty of hits.&lt;/p&gt;
&lt;p&gt;What does CRM software do? Although the Wikipedia &lt;a href=&#34;http://en.wikipedia.org/wiki/Customer_Relationship_Management&#34;&gt;CRM entry&lt;/a&gt; contains the warning &amp;ldquo;This article or section appears to contain a large number of buzzwords and may require cleanup&amp;rdquo; (it&amp;rsquo;s nice to see that the Wikipedians worry about this), the following passage from it makes sense to me:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;CRM software is essentially meant to address the needs of Marketing, Sales and Distribution, and Customer Service and Support divisions within an organization and allow the three to share data on prospects, customers, partners, competitors and employees.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Making it easier to share data across organizational boundaries is a big goal of the semantic web, and if businesses are going to get more value from semweb technology, this looks like a fertile place to plant something. The &amp;ldquo;R&amp;rdquo; in &amp;ldquo;CRM&amp;rdquo; can have some interesting implications in a semweb application; the ability to identify and track a company&amp;rsquo;s relationship(s) with a given customer (and related customers) in a way that improves those relationships would be quite a payoff for adding a cool new technology to an existing software infrastructure. If the software is open source, there should be hooks to add such new features.&lt;/p&gt;
&lt;p&gt;A few quick web searches don&amp;rsquo;t turn up an existing CRM ontology or taxonomy, so one may have to be built. Because these are database packages, a lot of the naming and relationships will already be taken care of, but you don&amp;rsquo;t want to extrapolate from the first package you look at to the whole CRM world. If building an ontology, don&amp;rsquo;t fall into the common semantic web developer trap of building a huge ontology and then telling the world to come and get it—you&amp;rsquo;re better off with the &lt;a href=&#34;http://agilemanifesto.org/principles.html&#34;&gt;agile&lt;/a&gt; approach of creating a small ontology, developing working code around it, demoing that, and then building from there.&lt;/p&gt;
&lt;p&gt;Before committing to a particular open source CRM package to work with, I&amp;rsquo;d look through the discussion forums for each and see what users are doing and trying to do with them, as well as what the developers working with the code have to say and what resources are available to answer their questions. To see positive results from the addition of new features, you&amp;rsquo;d want to work with real data, so the forums would provide candidates for partners to work with. If you come up with the right partner and proposal, maybe you could get paid for this work!&lt;/p&gt;
&lt;p&gt;A review of multiple CRM packages would provide good input for the development of a CRM ontology, and it would also broaden your perspective on what people want out of CRM systems and what different packages are doing to meet or exceed those needs. For that matter, you may as well look over the promises made by commercial packages. CRM is a big business with its own culture and &lt;a href=&#34;http://www.crm-daily.com/&#34;&gt;trade press&lt;/a&gt;, so there are plenty of places to do research.&lt;/p&gt;
&lt;p&gt;Keep asking yourself &amp;ldquo;if these people don&amp;rsquo;t know about semantic web technologies, what are they missing? What could it add here to make this software do more?&amp;rdquo; You can be the person who shows them.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-787&#34;&gt;April 9, 2007 2:13 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;Me again :-) We are embarking upon an effort (community based) to create Ontologies for XBRL Taxonomies. In the same vein, we will embark on a similar effort re. CRM. I already floated this idea to the &lt;a href=&#34;http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData&#34;&gt;Linking Open Data&lt;/a&gt; project&amp;rsquo;s mailing list last week as part of the &lt;a href=&#34;http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/THALIATestbed&#34;&gt;THALIA&lt;/a&gt; project enhancements re. incorporation of SPARQL, RDF, and OWL.&lt;/p&gt;
&lt;p&gt;I have been searching without success for an eCRM Ontology (can&amp;rsquo;t believe there isn&amp;rsquo;t one out there). Thus, we will build one should nothing show up from the public domain in the next week or so.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-788&#34;&gt;April 9, 2007 2:20 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Beware! Beware!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To a first approximation, there are no open-source CRM packages. There are many that claim to be so, but on investigation it turns out that they use the Mozilla Public License or some equivalent with the definitely non-Open-Source addition of rules saying that modified versions must keep their logos with specified content and minimum-size restrictions in place on every displayed screen.&lt;/p&gt;
&lt;p&gt;That is good business for them (it discourages competitors from grabbing the whole thing, hiking off the logo, and substituting their own), but it breaks the Open Source Definition requirement #10, which says that licensing can&amp;rsquo;t depend on the particular technology: if you try to reuse any of the code within a headless application, for example, you are screwed.&lt;/p&gt;
&lt;p&gt;The matter has been taken up on the &lt;a href=&#34;mailto:license-discuss@opensource.org&#34;&gt;license-discuss@opensource.org&lt;/a&gt; mailing list lately, and some CRM vendors are changing their tunes. But read carefully.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-789&#34;&gt;April 9, 2007 3:40 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;John: thanks.&lt;/p&gt;
&lt;p&gt;Kingsley: one of the other ideas I was going to write up was going to be about XBRL, because they&amp;rsquo;ve worked out so many taxonomies with direct business applications. Keep us posted on what you come up with&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/project-ideas">project ideas</category>
      
    </item>
    
    <item>
      <title>James Clark&#39;s weblog</title>
      <link>https://www.bobdc.com/blog/james-clarks-weblog/</link>
      <pubDate>Fri, 06 Apr 2007 22:49:11 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/james-clarks-weblog/</guid>
      
      
      <description><div>Read it, and pay close attention.</div><div>&lt;p&gt;James Clark has a &lt;a href=&#34;http://blog.jclark.com/&#34;&gt;weblog&lt;/a&gt;. I worry that, because his most recent large multi-year project was an organized effort to get the Thai government to make an official commitment to open source software, too many people who came to XML-related technology in the last few years won&amp;rsquo;t know who he is. Coming up with the acronym &amp;ldquo;XML&amp;rdquo; is only a footnote to his many achievements in the design and implementation of XML technology, and before that, SGML technology. I have to restrain myself from starting a list, because I won&amp;rsquo;t know where to stop, but a Google search on &lt;a href=&#34;http://www.google.com/search?q=%22james%20clark%22%20xml&#34;&gt;&amp;ldquo;James Clark&amp;rdquo; xml&lt;/a&gt; is pretty instructive. Let&amp;rsquo;s just say that when a weblog posting from him talks about what&amp;rsquo;s good and what&amp;rsquo;s bad, it&amp;rsquo;s worth taking very seriously.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Semantic Web project ideas number 1</title>
      <link>https://www.bobdc.com/blog/semantic-web-project-ideas-num/</link>
      <pubDate>Wed, 04 Apr 2007 20:11:13 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-project-ideas-num/</guid>
      
      
      <description><div>Hello, lazy semweb world.</div><div>&lt;p&gt;When I &lt;a href=&#34;https://www.bobdc.com/blog/metadata-and-metadata&#34;&gt;spoke at a conference recently&lt;/a&gt;, the speaker&amp;rsquo;s gift was a copy of the book that keynote speaker Don Tapscott write with Anthony D. Williams: &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1591841380/bobducharmeA/&#34;&gt;Wikinomics&lt;/a&gt;. The book is very biz-buzzwordy (from page 150: &amp;ldquo;consumer product companies can find ways to monetize customer-led ecosystems&amp;rdquo;—have these guys bookmarked the &lt;a href=&#34;http://www.snee.com/bobdc.blog/2007/03/instant_tech_marketing_copy.html&#34;&gt;Web Economy Bullshit Generator&lt;/a&gt;?), and they feel compelled to coin their own buzzwords, from the book&amp;rsquo;s title to terms like N-gen, B-web, ideagora, and prosumer. I&amp;rsquo;ll admit that I&amp;rsquo;m a little jealous, though; I wish I could come with some visionary lite tech book that people in suits would want to read on planes. As I write this, &amp;ldquo;Wikinomics&amp;rdquo; has an Amazon ranking of 164. I remember getting excited when &lt;a href=&#34;http://www.snee.com/bob/xsltquickly/index.html&#34;&gt;XSLT Quickly&lt;/a&gt; broke the 7,000 mark.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=1591841380/bobducharmeA/&#34;&gt;&lt;img src=&#34;http://ec2.images-amazon.com/images/P/1591841380.01._AA240_SCLZZZZZZZ_.jpg&#34; alt=&#34;[Wikinomics cover]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The book&amp;rsquo;s many case studies about the New Collaboration (actually, I don&amp;rsquo;t think the book used that phrase—maybe if I start using it a lot with a capital &amp;ldquo;N&amp;rdquo; and &amp;ldquo;C&amp;rdquo; I&amp;rsquo;ve got a title for my lite visionary book!) alerted me to some interesting projects such as &lt;a href=&#34;http://en.wikipedia.org/wiki/InnoCentive&#34; title=&#34;innocentive.com down on 2007-04-04&#34;&gt;Innocentive&lt;/a&gt;, &lt;a href=&#34;http://www.yet2.com/&#34;&gt;yet2.com&lt;/a&gt;, and &lt;a href=&#34;http://www.ninesigma.com/&#34;&gt;NineSigma&lt;/a&gt;. Many of the book&amp;rsquo;s topics sound like the kinds of things that people hope to see grow out of the semantic web, although the book doesn&amp;rsquo;t mention that at all. I started thinking about what semantic web technologies could add to the projects that the book does mention, and I got some ideas.&lt;/p&gt;
&lt;p&gt;I already have several ideas that I have no time to follow through on, so I thought I&amp;rsquo;d start throwing them out there for anyone who&amp;rsquo;s interested. None are quite PhD thesis material, but some might be master&amp;rsquo;s material. They could all be useful, and some could be popular if someone followed through on them. Here&amp;rsquo;s another new coinage: &amp;ldquo;Lazy Semweb&amp;rdquo;! That is, a &lt;a href=&#34;http://www.lazyweb.org/&#34;&gt;lazy web&lt;/a&gt; for semantic technologies. Danny Ayers recently &lt;a href=&#34;http://dannyayers.com/2007/03/28/using-those-profiles&#34;&gt;threw out an offering&lt;/a&gt;, although he didn&amp;rsquo;t use the term.&lt;/p&gt;
&lt;p&gt;For many of the things I suggest, I imagine that some people (for several of my ideas, probably &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen/&#34;&gt;Kingsley Idehen&lt;/a&gt;) will point out existing work already addressing my idea. That&amp;rsquo;s fine with me, because it provides further input for those seeking ideas. I&amp;rsquo;ll tag all the entries with a &lt;a href=&#34;http://www.snee.com/bobdc.blog/metadata/semantic_web/project_ideas&#34;&gt;metadata/semantic_web/project_ideas&lt;/a&gt; tag to make it easier to find the gathered collection of ideas in this weblog.&lt;/p&gt;
&lt;p&gt;The most important thing for each project is that it should include a demonstration of how to get more value out of the data in question than would be possible without the semweb part. I&amp;rsquo;m not going to accuse Tapscott and Williams of falling short by not mentioning semantic web technology (although I&amp;rsquo;m sure that if they&amp;rsquo;d seen &lt;a href=&#34;http://www.technologyreview.com/video/semantic&#34;&gt;this video&lt;/a&gt; they would have run with it); I&amp;rsquo;m going to challenge semantic web advocates to prove to the Tapscotts and Williamses of the world that semweb technology adds value.&lt;/p&gt;
&lt;h2 id=&#34;Mlcx-bJsTa6qHLrcbeEa-A&#34;&gt;Google Desktop API + Semantic Web technology = ?&lt;/h2&gt;
&lt;p&gt;After all this introductory rambling, I&amp;rsquo;ll start with something short and simple for the first idea. Instead of building on a Tapscott/Williams topic, I&amp;rsquo;ll build on something I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/semantic-data-entry&#34;&gt;mentioned recently&lt;/a&gt; here. I wondered about semantic web tools that built on the way people choose to work instead of making them use new tools. One tool that builds very nicely on the way I work is Google Desktop. It&amp;rsquo;s free, for work-related issues I use it more than Google itself, and I don&amp;rsquo;t know why I waited so long before trying it. Unfortunately, it&amp;rsquo;s limited to use on Windows machines, but there are &lt;a href=&#34;http://www.theregister.co.uk/2006/01/31/google_goes_desktop_linux/&#34;&gt;rumors&lt;/a&gt; of an Ubuntu version.&lt;/p&gt;
&lt;p&gt;And it&amp;rsquo;s got an &lt;a href=&#34;http://desktop.google.com/dev/searchapi.html&#34;&gt;API&lt;/a&gt;! If Google Desktop retrieves a handful of metadata about each file in which it found your search string, what more can we do with that metadata? I&amp;rsquo;d love to see a program that takes a user&amp;rsquo;s query, reads some ontology rules, and then passes in a more sophisticated query to identify additional related resources that don&amp;rsquo;t fall within the exact parameters of the user&amp;rsquo;s original query. Or, someone could build a SPARQL endpoint around the API. Or, they could use the API to pull a bunch of this metadata into a triplestore, combine that with an ontology and other data&amp;hellip; I think there are some real possibilities here.&lt;/p&gt;
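&lt;p&gt;To make the ontology-aware query idea a little more concrete, here&amp;rsquo;s a rough sketch of the kind of SPARQL query I have in mind, using a completely made-up ex: vocabulary: instead of only matching files explicitly tagged with the user&amp;rsquo;s topic (XML, say), it also matches files tagged with any topic that the ontology declares to be narrower than it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX ex:   &amp;lt;http://example.org/terms/&amp;gt;
PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt;

SELECT ?file
WHERE {
  ?file ex:topic ?topic .
  { ?file ex:topic ex:XML } UNION { ?topic skos:broader ex:XML }
}&lt;/code&gt;&lt;/pre&gt;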
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://blog.whats-your.name&#34; title=&#34;http://blog.whats-your.name&#34;&gt;carmen&lt;/a&gt; on &lt;a href=&#34;#comment-784&#34;&gt;April 4, 2007 10:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;KDE and Gnome desktop environments both have triple-stores and faceted metadata recall/browsing/autocreation in experimental versions - check Nepomuk, Beagle, etc.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-785&#34;&gt;April 5, 2007 12:12 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;How about starting here:&lt;br /&gt;
&lt;a href=&#34;http://demo.openlinksw.com/DAV/JS/rdfbrowser/index.html&#34;&gt;http://demo.openlinksw.com/DAV/JS/rdfbrowser/index.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Just add URLs to the Data Source URI field and hit &amp;ldquo;Query&amp;rdquo;. For your Blog Data Space the TimeLine Tab is a nice place to perform the data link traversal (URI dereferencing) which basically results in some interesting meshups :-)&lt;/p&gt;
&lt;p&gt;Note: You can bookmark via the Permalinks.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/project-ideas">project ideas</category>
      
    </item>
    
    <item>
      <title>The state of the semantic web</title>
      <link>https://www.bobdc.com/blog/the-state-of-the-semantic-web/</link>
      <pubDate>Mon, 02 Apr 2007 07:54:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-state-of-the-semantic-web/</guid>
      
      
      <description><div>Lookin&#39; good!</div><div>&lt;p&gt;The W3C&amp;rsquo;s Ivan Herman recently gave a talk on the State of the Semantic Web in Bangalore, and he&amp;rsquo;s made the &lt;a href=&#34;http://www.w3.org/2007/Talks/0223-Bangalore_IH/&#34; title=&#34;State of the Semantic Web slides&#34;&gt;slides&lt;/a&gt; available online. Anyone remotely interested in the semantic web or RDF should look through the presentation; it may seem esoteric in places, with its talk of Horn rules and F-logic, but in general it&amp;rsquo;s a clear, up-to-date summary of the important current issues.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://en.wikipedia.org/wiki/Graph&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/stateofthesemweb.jpg&#34; alt=&#34;[graphs]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;One of my favorite slides is titled &lt;a href=&#34;http://www.w3.org/2007/Talks/0223-Bangalore-IH/Slides.html#(47)&#34;&gt;A major problem: messaging&lt;/a&gt; (which at first I thought referred to message passing, making me think &amp;ldquo;huh?&amp;rdquo;). It&amp;rsquo;s a good summary of misconceptions about the semantic web, which are addressed on subsequent slides: AI reincarnated, merely ugly XML, top-down ontologies trying to dictate everything&amp;hellip;: none of the above! The slide ends with the message &amp;ldquo;Some simple messages should come to the fore!&amp;rdquo;, but I would quibble that one of these messages isn&amp;rsquo;t so simple: &amp;ldquo;&lt;em&gt;People should &amp;rsquo;think&amp;rsquo; in terms of graphs&lt;/em&gt; [his italics and underscore], the rest is syntactic sugar!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;When most people think of graphs, they think of pictorial representations of data. Because I have a computer science degree, I know that by &amp;ldquo;graphs&amp;rdquo; he&amp;rsquo;s referring to a particular data structure for which pictorial representations are a nice option but not the data structure&amp;rsquo;s reason for being, and I worry that the use of this technical term will confuse the wider audience to whom we&amp;rsquo;re trying to evangelize the semantic web. No punchy, correct, superior alternative springs to mind, though—collections of relationship descriptions and attribute/value pairs?&lt;/p&gt;
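&lt;p&gt;For example, the graph that an RDF-aware application sees is really just a set of three-part statements, each naming a relationship or an attribute value. Here&amp;rsquo;s a small invented example in Turtle syntax:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;@prefix dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt; .
@prefix ex: &amp;lt;http://example.org/&amp;gt; .

ex:doc1  dc:creator   ex:bob .
ex:doc1  dc:title     &#34;Using XHTML 2 schemas&#34; .
ex:bob   ex:worksFor  ex:snee .&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Draw circles around the subjects and objects and label the arrows between them with the predicates, and you have the picture; leave it as triples, and you have the data structure.&lt;/p&gt;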
&lt;p&gt;It was also nice to see the industries and famous brand names listed on the slides &lt;a href=&#34;http://www.w3.org/2007/Talks/0223-Bangalore-IH/Slides.html#(55)&#34;&gt;Some RDF deployment areas&lt;/a&gt;, &lt;a href=&#34;http://www.w3.org/2007/Talks/0223-Bangalore-IH/Slides.html#(56)&#34;&gt;The &amp;ldquo;corporate&amp;rdquo; landscape is moving&lt;/a&gt;, &lt;a href=&#34;http://www.w3.org/2007/Talks/0223-Bangalore-IH/Slides.html#(59)&#34;&gt;There has been lots of R&amp;amp;D&lt;/a&gt;, and &lt;a href=&#34;http://www.w3.org/2007/Talks/0223-Bangalore-IH/Slides.html#(60)&#34;&gt;Portals&lt;/a&gt;. This will help dispel one of the biggest misconceptions of all about the semantic web: that it&amp;rsquo;s limited to academic, ivory tower research projects. Now if we can just cut back on the use of technical computer science terms&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ldodds.com/blog&#34; title=&#34;http://www.ldodds.com/blog&#34;&gt;Leigh Dodds&lt;/a&gt; on &lt;a href=&#34;#comment-778&#34;&gt;April 2, 2007 10:32 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Why not &amp;ldquo;People should just &amp;rsquo;think&amp;rsquo; in terms of webs&amp;rdquo;?&lt;/p&gt;
&lt;p&gt;A web of resources is how I think of RDF graphs, the visual image is closer to the actual data structures, and it&amp;rsquo;s more positively associated with web/web2.0, simplicity, etc.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-779&#34;&gt;April 2, 2007 11:47 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;People should just think in terms of: Web Data or Web Data Sources or Data Sources :-)&lt;/p&gt;
&lt;p&gt;At the end of the day, URIs are pointers to Data :-) This is why dereferencing URIs is feasible and such a powerful concept once broadly understood :-)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.base4.net/blog.aspx?ID=356&#34; title=&#34;http://www.base4.net/blog.aspx?ID=356&#34;&gt;Alex James&lt;/a&gt; on &lt;a href=&#34;#comment-780&#34;&gt;April 3, 2007 12:47 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;All this brings me to a slide in Ivan Herman’s talk about “The state of Semantic Web” in Bangalore, highlighted by Bob DuCharme. The slide highlights, I think correctly&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.grauw.nl/&#34; title=&#34;http://www.grauw.nl/&#34;&gt;Laurens Holst&lt;/a&gt; on &lt;a href=&#34;#comment-781&#34;&gt;April 4, 2007 11:12 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In Dutch it’s easier: a ‘grafiek’ is a pictorial representation of data, and a ‘graaf’ is well, the mathematical type of graph :). Although it’s also the word for count in Dutch (as-in count Dracula, not count numbers :)), but I don’t think that’s likely to cause confusion.&lt;/p&gt;
&lt;p&gt;Unfortunately, graph probably still isn’t going to be understood by people without a slightly mathematical background. They might have had it mentioned in high school, but long forgotten.&lt;/p&gt;
&lt;p&gt;I think if you want to explain the ‘graphical’ (ehehe) nature of RDF, it is best done with an accompanying picture, and as little text and URIs as possible.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
~Grauw&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-782&#34;&gt;April 4, 2007 11:17 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;But what does &amp;ldquo;snee&amp;rdquo; mean? (&lt;a href=&#34;http://www.snee.com/about.html&#34;&gt;Just kidding&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>New Eric van der Vlist book on Schematron out</title>
      <link>https://www.bobdc.com/blog/new-eric-van-der-vlist-book-on/</link>
      <pubDate>Wed, 28 Mar 2007 21:56:36 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/new-eric-van-der-vlist-book-on/</guid>
      
      
      <description><div>As part of O&#39;Reilly&#39;s &#34;Short Cuts&#34; series.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.oreilly.com/catalog/9780596527716/&#34;&gt;&lt;img src=&#34;http://www.oreilly.com/catalog/covers/9780596527716_cat.gif&#34; alt=&#34;[cover of Schematron book]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;O&amp;rsquo;Reilly has just released a book on Schematron by Eric van der Vlist as part of their Short Cuts series of short, inexpensive PDF books. I&amp;rsquo;m sure this book will do well, because Eric knows his stuff and Schematron is remarkably useful.&lt;/p&gt;
&lt;p&gt;When I was at LexisNexis, they were using XML all over the place, and I was always encouraging people there to use the latest advances in XML technology. Schematron was always the easiest sell. (I just heard recently that their use of it has increased since I left.) An ISO standard rule-based XML quality-checking diagnostic system with natural language error messages that only needs an XSLT engine to implement—what&amp;rsquo;s not to like?&lt;/p&gt;
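&lt;p&gt;To give a flavor of it, here&amp;rsquo;s a made-up rule of my own (not from Eric&amp;rsquo;s book): each rule is little more than an XPath test on a context node plus the natural language message to deliver when a document flunks the test.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;pattern xmlns=&#34;http://purl.oclc.org/dsdl/schematron&#34;&amp;gt;
  &amp;lt;rule context=&#34;chapter&#34;&amp;gt;
    &amp;lt;assert test=&#34;title&#34;&amp;gt;A chapter must have a title.&amp;lt;/assert&amp;gt;
    &amp;lt;report test=&#34;count(para) = 0&#34;&amp;gt;This chapter has no paragraphs.&amp;lt;/report&amp;gt;
  &amp;lt;/rule&amp;gt;
&amp;lt;/pattern&amp;gt;&lt;/code&gt;&lt;/pre&gt;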
&lt;p&gt;Its name gives the impression that it&amp;rsquo;s an alternative schema language, and strictly speaking it might be, but I see it more as a complement to existing ones. I&amp;rsquo;ve even &lt;a href=&#34;http://www.xml.com/pub/a/2002/05/15/schematron.html?page=1&#34;&gt;made a case&lt;/a&gt; that Schematron can add enough to a DTD-based system to let the system&amp;rsquo;s users postpone a transition to RELAX NG or W3C Schemas as a replacement for DTDs. And, when implemented as an additional layer to an existing system, a Schematron rollout can mean very little disruption to that system.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re interested in maintaining and improving the quality of your XML, you owe it to yourself to check out Schematron. Congratulations to Eric and to O&amp;rsquo;Reilly for getting the book out.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Clever video about Web 2.0 and XML</title>
      <link>https://www.bobdc.com/blog/clever-video-about-web-20-and/</link>
      <pubDate>Tue, 27 Mar 2007 08:48:48 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/clever-video-about-web-20-and/</guid>
      
      
      <description><div>Text in motion.</div><div>&lt;p&gt;Kansas State University anthropologist &lt;a href=&#34;http://www.ksu.edu/sasw/anthro/wesch.htm&#34;&gt;Michael Wesch&lt;/a&gt; has created an interesting four and a half minute video titled &lt;a href=&#34;http://www.youtube.com/watch?v=6gmP4nk0EOE&#34;&gt;&amp;ldquo;Web 2.0&amp;hellip; The Machine is Us/ing Us&amp;rdquo;&lt;/a&gt; that is available on YouTube. I love how his video communicates to the viewer using text: not as captions, titles, or animation, but as text manipulated by a (usually) unseen hand. If you&amp;rsquo;re a fan of text-based art like the work of Jenny Holzer, you&amp;rsquo;ll enjoy it.&lt;/p&gt;
&lt;p&gt;I was going to take a screen shot to include here and link it to the YouTube page, but the constant, constant motion of Wesch&amp;rsquo;s video as it throws ideas at you is such a big part of its appeal that no screen shot could do it justice. So, I&amp;rsquo;m taking my first shot at embedding a video in a weblog entry, since YouTube makes it so easy. Web 2.0 and all that.&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/6gmP4nk0EOE&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;(&lt;a href=&#34;http://www.sfgate.com/cgi-bin/article.cgi?file=/comics/Zippy_the_Pinhead_Color.dtl&#34;&gt;Zippy&lt;/a&gt; says: am I mashed up yet?)&lt;/p&gt;
&lt;p&gt;The reason I&amp;rsquo;m writing about it here is that it&amp;rsquo;s the first time I&amp;rsquo;ve seen someone push the separation of form from content as a theme of Web 2.0. As an old SGML geek, I certainly won&amp;rsquo;t complain. Maybe developers just take it for granted now that something inside of &lt;code&gt;&amp;lt;title&amp;gt;&amp;lt;/title&amp;gt;&lt;/code&gt; tags has more possibilities for reuse than something inside of &lt;code&gt;&amp;lt;b&amp;gt;&amp;lt;/b&amp;gt;&lt;/code&gt; tags, and this kind of reuse is what drives mashups. Being taken for granted, there&amp;rsquo;s no need to push it as a theme in the general Web x (where x &amp;gt; 1) hype, and it took an anthropologist to notice and highlight the theme&amp;rsquo;s central role.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://plasmasturm.org/&#34; title=&#34;http://plasmasturm.org/&#34;&gt;Aristotle Pagaltzis&lt;/a&gt; on &lt;a href=&#34;#comment-765&#34;&gt;March 27, 2007 10:05 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;FWIW, that is only the first published draft; Wesch recently posted &lt;a href=&#34;http://youtube.com/watch?v=NLlGopyXT_g&#34;&gt;the final version&lt;/a&gt;. The blurb on that one also includes download links for high-quality versions of the clip.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Semantic data entry</title>
      <link>https://www.bobdc.com/blog/semantic-data-entry/</link>
      <pubDate>Fri, 23 Mar 2007 08:38:22 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-data-entry/</guid>
      
      
      <description><div>Instead of motivating users to use new tools, can we build on the tools that they&#39;re already motivated to use?</div><div>&lt;p&gt;I think that Tim O&amp;rsquo;Reilly is overly pessimistic about semantic web technology, but in a recent &lt;a href=&#34;http://radar.oreilly.com/archives/2007/03/different_appro_1.html&#34;&gt;O&amp;rsquo;Reilly Radar posting&lt;/a&gt; that was part of the freebase vs. semantic web technology debate bouncing around about two weeks ago, he brought up an important issue that&amp;rsquo;s often overlooked: what motivates a user to go to the extra trouble to indicate the semantics of a piece of data to a program that may read that data? For example, when you add &amp;ldquo;On April 2nd, breakfast will be served at 8&amp;rdquo; to a web page, any literate English speaker can understand it. What motivates you to attach the string &amp;ldquo;2007-04-02T08:00&amp;rdquo; and an indication of its type somewhere in there? The belief that you&amp;rsquo;re making a better world isn&amp;rsquo;t good enough—you have to believe that it will help the people interested in eating that breakfast.&lt;/p&gt;
&lt;p&gt;Tim&amp;rsquo;s posting and a recent conversation I had with Eric Miller about using semweb technology to accumulate knowledge in the workplace got me to wondering about why intranet wiki and SharePoint deployments that I&amp;rsquo;ve taken part in didn&amp;rsquo;t work too well. The first level answer is that the users felt no motivation to add the kind of information that those programs are good at accumulating. Instead of asking the obvious second level question (how do we motivate these users to use these programs) I have a different question: how can a knowledge-sharing system build on the users&amp;rsquo; current practices for storing knowledge?&lt;/p&gt;
&lt;p&gt;This brings up today&amp;rsquo;s big question: how do people store knowledge? For example, if Joe HR Guy brings his laptop to a meeting and types up meeting notes during the meeting, what program is he typing with? If Joe tells Jane Project Manager a URL for something that will help her with her current project, how does she remember this URL and what it&amp;rsquo;s for? (This assumes she wasn&amp;rsquo;t told by email, in which case we have one important answer to this question: she remembers that it&amp;rsquo;s in an email from Joe. In fact, if Joe was responsible for taking minutes at his meeting, he may have typed them into his email client so that he could send them to everyone invited to the meeting. I&amp;rsquo;m more interested in Joe&amp;rsquo;s personal notes about what&amp;rsquo;s worth remembering from the meeting.)&lt;/p&gt;
&lt;p&gt;Is anyone aware of research on this issue? If ten or fifteen people leave comments here about how they store such information, it won&amp;rsquo;t help much, because I want to know about a more representative sample of the population—while I&amp;rsquo;m sure that some of you use some combination of Emacs, nxml, and elisp macros to automate data entry on everyday topics like I do, I know that&amp;rsquo;s not representative. That&amp;rsquo;s why I asked about Joe HR Guy and Jane Project Manager. I want to know what accountants and assistant vice presidents in all kinds of industries use, not what other XML/metadata geeks use.&lt;/p&gt;
&lt;p&gt;For the most part, I&amp;rsquo;m sure they use Bill Gates&amp;rsquo; tool of choice: MS Word. I know one business process analyst who takes meeting notes using nested bulleted lists in Word, and it works out just fine for him and for everyone who has to read his notes. Many people, when going to a meeting where they have to work out A, B, C, and D for W, X, Y, and Z, record notes about the relationships by opening up a blank spreadsheet, writing &amp;ldquo;A B C D&amp;rdquo; across the top row and &amp;ldquo;W X Y Z&amp;rdquo; down the first column, and then filling in the spreadsheet as they talk about A&amp;rsquo;s relationship to W and C&amp;rsquo;s relationship to X. (Or, perhaps they create a tab in the worksheet for each of four categories to allow more three-dimensional accumulation of information.)&lt;/p&gt;
&lt;p&gt;Then there&amp;rsquo;s the third corner of the MS Office triumvirate: PowerPoint, which few people use to take notes on ongoing activities, but which many use to assemble knowledge for transmission to other people. We can all complain about a presentation that consists of bulleted lists, but ideas like &lt;a href=&#34;http://www.opml.org/&#34;&gt;OPML&lt;/a&gt; and its esteemed competition wouldn&amp;rsquo;t have gotten any traction if nested lists of items weren&amp;rsquo;t often a more straightforward, structured approach to storing and transmitting knowledge than paragraphs of prose.&lt;/p&gt;
&lt;p&gt;What do you see around you? What commonly available applications do less technical people use to store knowledge as they accumulate it, or can we just assume a default of email folders plus MS Office files scattered around their hard disks? Have you ever heard of any broad, systematic study of how people do this and patterns that may have shown up? I&amp;rsquo;m not sure what&amp;rsquo;s &amp;ldquo;semantic&amp;rdquo; about this particular data, except that it&amp;rsquo;s recorded knowledge that would benefit from aggregation with related knowledge to create a whole that&amp;rsquo;s greater than the sum of its parts, but that sounds pretty worthwhile.&lt;/p&gt;
&lt;h2 id=&#34;8-comments&#34;&gt;8 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ecs.soton.ac.uk/~das05r/&#34; title=&#34;http://www.ecs.soton.ac.uk/~das05r/&#34;&gt;Daniel Alexander Smith&lt;/a&gt; on &lt;a href=&#34;#comment-755&#34;&gt;March 23, 2007 9:50 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;You asked if there was any research in this area, and I&amp;rsquo;m happy to say there is.&lt;/p&gt;
&lt;p&gt;Firstly there is doingPad, which is a collaboration between MIT and the University of Southampton into ways to capture and recall &amp;ldquo;information scraps&amp;rdquo;, including determining the semantics of what the user has typed.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://mspace.fm/projects/doingPad/&#34;&gt;http://mspace.fm/projects/doingPad/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There is also Rich Tags, a University of Southampton project to utilise the &amp;ldquo;tagging&amp;rdquo; method of data entry as a way of semantically marking up resources, and hence building on the current tools that people use, as you say.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://mspace.fm/projects/richtags/&#34;&gt;http://mspace.fm/projects/richtags/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks,&lt;/p&gt;
&lt;p&gt;Daniel Alexander Smith&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://valentinzacharias.de/blog/&#34; title=&#34;http://valentinzacharias.de/blog/&#34;&gt;Valentin&lt;/a&gt; on &lt;a href=&#34;#comment-756&#34;&gt;March 23, 2007 11:21 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Talking about MS tools, you forget OneNote - the actual note-taking and sharing tool from MS (and popular with the non-cs people around here).&lt;/p&gt;
&lt;p&gt;And you might want to search for &amp;ldquo;Social Semantic Desktop&amp;rdquo; (Nepomuk project and others) - they address similar questions (although I don&amp;rsquo;t know how much empirical data they have/have published).&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://blog.triplescape.com&#34; title=&#34;http://blog.triplescape.com&#34;&gt;Brian Manley&lt;/a&gt; on &lt;a href=&#34;#comment-757&#34;&gt;March 23, 2007 12:00 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You might check out some of the papers at &lt;a href=&#34;http://pim.ischool.washington.edu/breakouts.htm&#34;&gt;http://pim.ischool.washington.edu/breakouts.htm&lt;/a&gt;. While they probably won&amp;rsquo;t answer your question directly, there are a number of interesting papers. The ideas and authors might make for a good starting point if you were serious about doing some research on your own. Hope that helps! - Brian&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.cs.umd.edu/~hendler&#34; title=&#34;http://www.cs.umd.edu/~hendler&#34;&gt;jim hendler&lt;/a&gt; on &lt;a href=&#34;#comment-758&#34;&gt;March 23, 2007 12:16 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I generally agree with your contention that we need to go where the people are - in my ISWC keynote some years ago I suggested that we are wrong in thinking that semantics make it harder for people to author - the secret is to harness them to make it easier - I think Freebase is a good example of this direction - when you say someone is a &amp;ldquo;person&amp;rdquo; it suggests the properties you should fill in for a person - I think RDFS and OWL open a lot of potential for this, I&amp;rsquo;m a tad frustrated because I could never talk any of my grad students into doing this work - maybe it&amp;rsquo;ll be time to revisit this in my new lab&amp;hellip; For example, imagine a &amp;ldquo;home page creator&amp;rdquo; which would provide hints of topics people like on a homepage (hobby, family, etc.) - you could then enter a term (Scuba diving) and it would search for an ontology in that area - letting you then use it to enter info (or to extend it yourself, another idea freebase got right). I wasn&amp;rsquo;t thinking so much of retrofitting existing tools (although when I was at DARPA I proposed that adding DAML to clipart objects could make a great improvement in searching for powerpoint - another one worth revisiting?) &amp;ndash; anyway, this is all to say I think that you are right about needing to think a lot more about the value proposition and to figure out what would motivate people to do the right thing, by making it easier for them to do what they do anyway (or to create social worth in the annotation, sort of like why people created web pages in the first place)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.openlinksw.com/blog/~kidehen&#34; title=&#34;http://www.openlinksw.com/blog/~kidehen&#34;&gt;Kingsley Idehen&lt;/a&gt; on &lt;a href=&#34;#comment-759&#34;&gt;March 23, 2007 1:01 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob,&lt;/p&gt;
&lt;p&gt;When it comes to what I call the &amp;ldquo;Web 2.0&amp;rdquo; application profile, we have Blogs, Wikis, Shared Bookmark Managers, Feed Aggregators, Blog Rolls, Photo Galleries etc..&lt;/p&gt;
&lt;p&gt;I have attempted to address the fusion of Web 2.0 application profiles (in fact Distributed Collaborative Apps &amp;amp; Services) via our &lt;a href=&#34;http://virtuoso.openlinksw.com/wiki/main/Main/OdsIndex&#34;&gt;OpenLink Data Spaces&lt;/a&gt; (ODS) platform.&lt;/p&gt;
&lt;p&gt;In a nutshell, ODS gives the Web 2.0 user or developer an accelerated leap into the Data Web (or Web 3.0) without any RDF Tax.&lt;/p&gt;
&lt;p&gt;I have also authored a number of demonstrations via my blog which has been a live demonstration of all of this for a very long time :-)&lt;/p&gt;
&lt;p&gt;BTW - there are &lt;a href=&#34;http://demo.openlinksw.com/ods&#34;&gt;Live (Demonstration)&lt;/a&gt; and &lt;a href=&#34;http://myopenlink.net:8990/ods&#34;&gt;Live (serious experimentation)&lt;/a&gt; instances of ODS for anyone to evaluate today.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://juxtaprose.com/jay&#34; title=&#34;http://juxtaprose.com/jay&#34;&gt;Jay Fienberg&lt;/a&gt; on &lt;a href=&#34;#comment-760&#34;&gt;March 23, 2007 8:30 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a bit of a trap to think about this in terms of what &amp;ldquo;people do&amp;rdquo; to record knowledge / data, and Tim O&amp;rsquo;Reilly&amp;rsquo;s comments fall into that trap. There are a lot of different kinds of contexts which shape why people record information / data, and what is or isn&amp;rsquo;t a &amp;ldquo;record&amp;rdquo; is also affected by context.&lt;/p&gt;
&lt;p&gt;For example, I might be willing to use a calendar program at work, but not use one for my personal calendar at home. The different contexts shape my motivations towards the efforts of data entry, and I might have different standards of &amp;ldquo;semantically correct&amp;rdquo; calendars between home and work.&lt;/p&gt;
&lt;p&gt;So, a fully structured vCalendar event might be required at work, whereas &amp;ldquo;this week: Colin&amp;rsquo;s party Sat&amp;rdquo; might be all I need at home.&lt;/p&gt;
&lt;p&gt;There are many layers that factor into what people do&amp;ndash;everything from the physical entry devices to the concepts of a computer (e.g., vs a network) to the concept of an application to the concepts of sites / pages / files, etc. And, then, there are our motivations towards obligation, communication, gratification, etc.&lt;/p&gt;
&lt;p&gt;There are many kinds of designs that can make recording data (or more semantically elaborate data) easier or better for people. But, they&amp;rsquo;re generally only better relative to specific contexts of people&amp;rsquo;s motivations and needs, given the constraints of the computers.&lt;/p&gt;
&lt;p&gt;(By that &amp;ldquo;constraints of the computers&amp;rdquo; bit, I mean to imply that one could alternately create interfaces uniquely suited to specific types of semantic data entry, e.g., a physical device that looks like a wall calendar with icons for people and project names that makes it easy to generate data sets of FOAF + vCard-RDF + DOAP + etc.)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-761&#34;&gt;March 24, 2007 9:14 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks everyone, there are a lot of great leads here.&lt;/p&gt;
&lt;p&gt;Jay - I agree that trying to generalize this too much would lead to a mess, which is why both of my use cases are about using information that people record in order to get work done at their jobs. I think that&amp;rsquo;s a fine place to start.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-767&#34;&gt;March 28, 2007 10:57 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Going through some old bookmarks, I just found the &lt;a href=&#34;http://kftf.ischool.washington.edu/surveys.asp&#34;&gt;Keeping Found Things Found&lt;/a&gt; project at the University of Washington, which looks valuable for this research.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Instant tech marketing copy</title>
      <link>https://www.bobdc.com/blog/instant-tech-marketing-copy/</link>
      <pubDate>Mon, 19 Mar 2007 20:21:21 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/instant-tech-marketing-copy/</guid>
      
      
      <description><div>Monetize scalable supply-chains! Integrate granular users! Reintermediate 24/7 interfaces!</div><div>&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0691122946/bobducharmeA/&#34;&gt;&lt;img src=&#34;http://ec2.images-amazon.com/images/P/0691122946.01._AA240_SCLZZZZZZZ_V45418175_.jpg&#34; alt=&#34;[&#39;On Bullshit&#39; cover]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been cleaning up my Firefox bookmarks and moving some to &lt;a href=&#34;http://del.icio.us&#34;&gt;del.icio.us&lt;/a&gt;. (While reading an article somewhere, my wife recently asked me if I&amp;rsquo;d heard of del.icio.us, and after a quick check of the date on my first bookmark there, I could say &amp;ldquo;member since 2004!&amp;rdquo;) Revisiting some of my Firefox bookmarks reminded me of several funny sites out there such as dack.com&amp;rsquo;s &lt;a href=&#34;http://www.dack.com/web/bullshit.html&#34;&gt;web economy bullshit generator&lt;/a&gt;. Each click of the &amp;ldquo;make bullshit&amp;rdquo; button creates a new three-word marketing phrase: Strategize collaborative infomediaries! Repurpose B2C systems!&lt;/p&gt;
&lt;p&gt;My bookmark file says that I bookmarked it in Firefox in 2000. I find that difficult to believe, because it&amp;rsquo;s so up-to-date, but words like &amp;ldquo;vortals&amp;rdquo; do reveal its age a bit. It simply picks a random word from each of three columns, so there&amp;rsquo;s no fancy coding going on, but the generated phrases are frighteningly realistic. Using it, you can matrix leading-edge relationships! Brand collaborative technologies!&lt;/p&gt;
&lt;p&gt;If it were provided as a web service, it would be useful for an application that generated an automated business plan: &amp;ldquo;Our product will let our customers transition back-end e-commerce in order to harness extensible content. As we synergize B2C initiatives, we&amp;rsquo;ll begin to syndicate scalable platforms&amp;rdquo;. (For one final smile before I posted this entry, a spellcheck of it only flagged Dack-generated buzzwords: infomediaries, Reintermediate, Repurpose, Strategize, synergize, and vortals.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Metadata and metadata</title>
      <link>https://www.bobdc.com/blog/metadata-and-metadata/</link>
      <pubDate>Tue, 13 Mar 2007 09:58:27 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/metadata-and-metadata/</guid>
      
      
      <description><div>A conference about metadata, full of strangers.</div><div>&lt;p&gt;I consider myself a metadata geek. I &lt;a href=&#34;http://www.snee.com/bobdc.blog/metadata/&#34;&gt;write about it&lt;/a&gt;, I work for a &lt;a href=&#34;http://www.innodata-isogen.com&#34;&gt;company&lt;/a&gt; that helps manage it, I track and play with the related standards, and I have a network of friends who fall into several of these categories.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.wilshireconferences.com/MD2007/&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/wilshire.jpg&#34; alt=&#34;[Wilshire conference logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Last week I went to a &lt;a href=&#34;http://www.wilshireconferences.com/MD2007&#34;&gt;conference devoted to metadata&lt;/a&gt; and I didn&amp;rsquo;t know a soul at this large, well-attended conference. I had heard of only one speaker, John Zachman, because a former co-worker was a big fan of his &lt;a href=&#34;http://www.zifa.com/&#34;&gt;framework&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It turns out that there&amp;rsquo;s a whole metadata industry separate from the metadata world I&amp;rsquo;d been following. The world that I know is focused on describing HTTP-addressable resources (the &amp;ldquo;RD&amp;rdquo; in &amp;ldquo;RDF&amp;rdquo;), whether on the public internet or private intranets. We think of schemas as metadata, of course, but for us the central issue is not managing huge numbers of schemas across an enterprise so much as associating metadata with resources in order to get more value out of those resources.&lt;/p&gt;
&lt;p&gt;At this conference, tracking of schemas across an enterprise was a big issue. &lt;a href=&#34;http://en.wikipedia.org/wiki/Data_Warehouse&#34;&gt;Data Warehousing&lt;/a&gt; and &lt;a href=&#34;http://en.wikipedia.org/wiki/Data_governance&#34;&gt;Data Governance&lt;/a&gt; were the context for it all; as one attendee told me at a reception, organizations like his financial institution want to be able to know where a report&amp;rsquo;s numbers came from.&lt;/p&gt;
&lt;p&gt;Presentation titles like &lt;a href=&#34;http://www.wilshireconferences.com/MD2007/Sessions/v2.html&#34;&gt;How to Perform Information Stewardship within Business Process Redesign&lt;/a&gt; and &lt;a href=&#34;http://www.wilshireconferences.com/MD2007/Sessions/j5.html&#34;&gt;Using Metadata to Support Compliance and Accountability Efforts&lt;/a&gt; give the flavor of what was on attendees&amp;rsquo; minds. I don&amp;rsquo;t think of my metadata interests as being limited to publishing applications, but if we take the word &amp;ldquo;publishing&amp;rdquo; in its broadest sense, maybe it is. The standards I use are typically about tracking the metadata associated with a resource that will be used to convey information to a user—or, to put it differently, a resource that will be published. (It&amp;rsquo;s not &lt;a href=&#34;https://www.bobdc.com/blog/navigating-the-library-metadat&#34;&gt;the first time&lt;/a&gt; that I&amp;rsquo;ve found another whole world of metadata separate from the kind I pay close attention to.)&lt;/p&gt;
&lt;p&gt;In the Venn diagram showing the concerns of the HTTP/publishing metadata world and the enterprise data governance metadata world, the overlap is OWL and semantic technologies, which both groups are getting more interested in. My &lt;a href=&#34;http://www.wilshireconferences.com/MD2007/Sessions/v6.html&#34;&gt;conference talk&lt;/a&gt; was on the basics of RDF, and I was definitely not preaching to the converted. I found myself explaining issues that I thought we&amp;rsquo;d all moved past, such as why the URIs that represent resources may look like web addresses but aren&amp;rsquo;t necessarily addresses at all.&lt;/p&gt;
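&lt;p&gt;A tiny made-up example of the kind of thing I covered (the URI and the vocabulary here are invented for illustration): the subject URI below identifies a person, not a web page, even though it looks like an address you could point a browser at.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;rdf:RDF xmlns:rdf=&#34;http://www.w3.org/1999/02/22-rdf-syntax-ns#&#34;
         xmlns:ex=&#34;http://example.com/vocab/&#34;&amp;gt;
  &amp;lt;!-- this URI names a person; nothing needs to live at it --&amp;gt;
  &amp;lt;rdf:Description rdf:about=&#34;http://example.com/staff/jsmith&#34;&amp;gt;
    &amp;lt;ex:jobTitle&amp;gt;Project Manager&amp;lt;/ex:jobTitle&amp;gt;
  &amp;lt;/rdf:Description&amp;gt;
&amp;lt;/rdf:RDF&amp;gt;&lt;/code&gt;&lt;/pre&gt;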
&lt;p&gt;It was particularly odd for me to see so much interest in OWL from people who weren&amp;rsquo;t that interested in RDF. I saw great coverage of both in a three-hour seminar by Dave McComb and Simon Robe on &lt;a href=&#34;http://www.wilshireconferences.com/MD2007/Sessions/x5.html&#34;&gt;Enterprise Ontology: Designing Your Core Model Using Semantic Web Technology&lt;/a&gt;. Unfortunately, I only saw the first hour because I had to leave for the airport. In a tag-team format, they covered many important issues very well considering how much of the audience was learning these concepts from scratch. (And now I finally understand the difference between TBoxes and ABoxes!) As with any such presentation, much of it was about how OWL models data, and I wish I could have stayed to see how people apply these models to their world of metadata. (Now that I think of it, some of &lt;a href=&#34;https://www.bobdc.com/blog/xml-2006-paper-done-and-availa&#34;&gt;my own experiments&lt;/a&gt; may relate more to the enterprise data world than the publishing world.) Any suggestions on good background resources for the world of enterprise metadata?&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>Bifocals</title>
      <link>https://www.bobdc.com/blog/bifocals/</link>
      <pubDate>Sat, 10 Mar 2007 08:58:36 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/bifocals/</guid>
      
      
      <description><div>Getting accustomed to a new perspective.</div><div>&lt;p&gt;A few years ago, as a tourist in a touristy part of Rome a bit east of the &lt;a href=&#34;http://www.snee.com/panoramic/panopantheon50.html&#34;&gt;Pantheon&lt;/a&gt;, I was losing faith in my wife&amp;rsquo;s directional sense and asked her for the map. As I walked along and tried to read it through my prescription sunglasses, I couldn&amp;rsquo;t make out the street names. The print was too tiny. Or—and I really didn&amp;rsquo;t like this scenario—I was finally old enough that I was having trouble seeing small print.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/bifocals.jpg&#34; alt=&#34;[Google map east of Roman Pantheon]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;It turns out that I had lasted a few extra years before getting what one optometrist called &amp;ldquo;forty-itis&amp;rdquo;. Still, as a classic bit of irreversible natural physical deterioration that shows that you&amp;rsquo;re getting older, it put a damper on the birthday I had a few days later. On my next visit to the eye doctor, I told him that I sometimes had to take my glasses off to read small print, and I asked him when I&amp;rsquo;d need bifocals. His answer: &amp;ldquo;When you get tired of taking off your glasses.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;On my last visit, I quoted what he&amp;rsquo;d said before and told him that I was tired of taking off my glasses. He put together the prescription for my progressives. (They hardly use the word &amp;ldquo;bifocals&amp;rdquo;, which conjures up grandparent images. I believe the new name describes how the near-sightedness part of each lens gradually blends into the far-sightedness part without showing a line. And of course, the new name just sounds, well, progressive.) The optician said that the new lenses would take a few weeks, and that you can&amp;rsquo;t just pick up your first bifocals and run or have someone else get them for you, because an optician must teach you how to wear them.&lt;/p&gt;
&lt;p&gt;Five weeks later I went to pick them up. Matthew, the young man who helped me, was friendly, polite, positive, and provided me with very little useful information. I put the new glasses on and thought &amp;ldquo;I can see up close! I can see far away! It&amp;rsquo;s that simple, and I&amp;rsquo;m set!&amp;rdquo; Matthew agreed that I was set, so we finished up the paperwork and I went out the front door and almost fell over on their brick stoop.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a key fact about getting accustomed to bifocals that Matthew neglected to tell me, and in retrospect it seems obvious: when the bottom part of your lenses is focused a foot away from your eyes, anything farther away than that, like the little step in front of your feet as you leave the optician&amp;rsquo;s office, becomes harder to see through that part of the lens.&lt;/p&gt;
&lt;p&gt;My next errand of the day was the supermarket, where it became very clear (or rather, very obvious) that distant text well below your usual sightlines is harder to read with bifocals. Supermarkets have plenty of this, with a bottom shelf of products on either side of every aisle.&lt;/p&gt;
&lt;p&gt;As I walked through Charlottesville&amp;rsquo;s sunny downtown mall shortly afterward, I got another surprise: the glasses were becoming darker. While I certainly hadn&amp;rsquo;t asked for the kind that automatically mutates into sunglasses in the sun, that&amp;rsquo;s what they gave me. The slight tint that you would see in them after going back indoors gave me the appearance of a bit player on Miami Vice, and not a good guy bit player. (Being of bifocals-wearing age, I&amp;rsquo;m referring to the TV show Miami Vice, not the movie.)&lt;/p&gt;
&lt;p&gt;When I returned to the optometrist and explained to Matthew what had happened, I think he only heard the part about the sun shining on the downtown mall, and he made a polite, positive, vague response. An older optician was more understanding and apologetic and promised me new lenses within ten business days. I get to keep the sometimes-tinted lenses until then, so along with the short scruffy beard that I&amp;rsquo;ve grown to camouflage the stitches under my chin from a recent fall on the ice, I look pretty shady. Maybe, until I can pick up my new lenses and shave again, I&amp;rsquo;ll pick up some gold chains and unbutton a few shirt buttons and go all-out for the bifocaled coke dealer look. That should be a big hit among the dealers in Chinese antiques and greener-than-thou political action tables of the downtown Charlottesville mall.&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-728&#34;&gt;March 10, 2007 11:35 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;After everything my wife went through learning to use bifocals, I firmly decided not to get them until I could no longer read without my glasses. So far (almost 49) that hasn&amp;rsquo;t happened. Indeed, I enjoy being able to go without my glasses most of the time: I wear them now to watch TV, movies, and theatre, and when outside, and that&amp;rsquo;s about it &amp;ndash; a big relief after wearing them nonstop since I was 7.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.wasab.dk/morten/blog/&#34; title=&#34;http://www.wasab.dk/morten/blog/&#34;&gt;Morten Frederiksen&lt;/a&gt; on &lt;a href=&#34;#comment-729&#34;&gt;March 10, 2007 1:18 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been using contacts for a few years now (approaching the mid-thirties), but it only just now dawned on me that I might need &amp;ldquo;progressives&amp;rdquo; one day.&lt;/p&gt;
&lt;p&gt;When that day comes, it likely won&amp;rsquo;t be possible with contacts!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-730&#34;&gt;March 10, 2007 1:24 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Progressive contacts do exist. A Google search on the phrase gets 1.5 million hits.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-735&#34;&gt;March 11, 2007 6:40 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As an early-40s I could do with some mildly progressive lenses. I have an old pair of specs for myopia which are perfect for use at the keyboard, a newer pair that are good for TV distance and further, but which give me eyestrain if used at the keyboard. Dunno, maybe more to do with lifestyle than age, lazy-eye-tis.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s rather a coincidence that the people now in line to use progressives are of an age to grow up with the influence of Genesis, Yes and Emerson, Lake &amp;amp; Palmer&amp;hellip;&lt;/p&gt;
&lt;p&gt;By Eliot Kimber on &lt;a href=&#34;#comment-736&#34;&gt;March 12, 2007 10:44 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I got bifocals last year (43) and really felt old at that point. (Un)fortunately, those glasses got ground into the bottom of a pool and I went back to my old lenses. I&amp;rsquo;m finding I don&amp;rsquo;t mind having to lift up my glasses to read the small print&amp;ndash;I think it&amp;rsquo;s better than having to tilt my head back to read my computer screen.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>ERH Tired of Acrobat PDFs. Me too.</title>
      <link>https://www.bobdc.com/blog/erh-tired-of-acrobat-pdfs-me-t/</link>
      <pubDate>Mon, 05 Mar 2007 08:59:56 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/erh-tired-of-acrobat-pdfs-me-t/</guid>
      
      
      <description><div>PDF is a great format, and it&#39;s used way too much.</div><div>&lt;p&gt;I keep a file of notes for ideas for potential postings in this weblog. Here are the notes for one idea:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;tired of PDF (impedance mismatch between screen and page metaphor; printed pg numbers vs. Acrobat ones; marketing &amp;ldquo;fact sheets&amp;rdquo; in PDF&amp;ndash;why not HTML?)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In a recent Mokka mit Schlag posting titled &lt;a href=&#34;http://www.elharo.com/blog/software-development/2007/02/25/pdf-killed-the-programing-language/&#34;&gt;PDF killed the Programming Language&lt;/a&gt;, Elliotte Rusty Harold has beaten me to it (managing to work in a nice &lt;a href=&#34;http://www.youtube.com/watch?v=LnWGWabxkKs&#34;&gt;Buggles&lt;/a&gt; pun as well), but I&amp;rsquo;ll add a few points.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m tired of people who create content on a computer screen and deliver it for viewers to see on a computer screen with that content optimized for the printout that only gets printed when it&amp;rsquo;s time for their boss or art director to review it. After all, real brochure-ware should look like a nice brochure when the customer wants to print it, right?&lt;/p&gt;
&lt;p&gt;Those customers don&amp;rsquo;t want to nearly as often as the art director may insist. Customers read product fact sheets because they want to see the facts about the product, not because they want a hi-res view of some designer&amp;rsquo;s design skills. Gratuitous use of PDF these days is in some ways worse than gratuitous use of Flash, because at least Flash is about creating things on a computer screen for viewing on a computer screen. PDFs are bad for viewing on a computer screen.&lt;/p&gt;
&lt;h2 id=&#34;jce7v7pFRDqE713nhr_uUw&#34;&gt;Good things about PDFs&lt;/h2&gt;
&lt;p&gt;Adobe&amp;rsquo;s &lt;a href=&#34;http://www.adobe.com/products/acrobat/adobepdf.html&#34;&gt;Why PDF?&lt;/a&gt; page lists some obvious advantages of PDF files. (And, bless them, this page is HTML, not a PDF.) They call it an &amp;ldquo;Open format&amp;rdquo;, but &amp;ldquo;open&amp;rdquo; here is software corporate-speak for &amp;ldquo;documented&amp;rdquo;, which is certainly a Good Thing, but not open in the sense of &amp;ldquo;to any outside influence&amp;rdquo;, which is what &amp;ldquo;open standard&amp;rdquo; means to most people, even (finally) to Sun.&lt;/p&gt;
&lt;p&gt;The same page also lists &amp;ldquo;Multiplatform&amp;rdquo;, &amp;ldquo;Accessible&amp;rdquo;, and &amp;ldquo;Searchable&amp;rdquo; as advantages. I can&amp;rsquo;t argue with the first. While I can&amp;rsquo;t argue with &amp;ldquo;Accessible&amp;rdquo; either, I&amp;rsquo;d be curious about the opinion of someone who studies these issues more closely. &amp;ldquo;Searchable&amp;rdquo; I&amp;rsquo;ll take with a grain of salt—I know I can&amp;rsquo;t write an application that could search through PDF files. (Advocates of so-called &amp;ldquo;binary XML&amp;rdquo; forget that a key advantage of XML as defined in &lt;a href=&#34;http://www.w3.org/TR/xml/#charsets&#34;&gt;its spec&lt;/a&gt; is that, as a text-based format, writing code to search and manipulate it is very, very easy.) Adobe also says that Acrobat files can &amp;ldquo;maintain information integrity&amp;rdquo;—I wouldn&amp;rsquo;t have worded it the same way, but it can be an advantage to know that page breaks will happen in the exact same place with any viewer on any platform, if you really care about page breaks. Many of my company&amp;rsquo;s clients really care about page breaks, because they&amp;rsquo;re publishers who have products that they send off to printing houses. PDF minimizes so many worries about page layout, fonts, and other printer-related issues that for this purpose it&amp;rsquo;s a wonderful thing. Still, while page fidelity is important in contracts and certain other documents, it shouldn&amp;rsquo;t matter to a tutorial or product fact sheet.&lt;/p&gt;
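&lt;p&gt;(To make that parenthetical point about text-based XML concrete, here is a minimal sketch; the &lt;code&gt;manual&lt;/code&gt;/&lt;code&gt;section&lt;/code&gt; document and element names are invented for illustration, but a few lines of standard-library Python really are enough to search and modify an XML document.)&lt;/p&gt;

```python
# A minimal sketch of the point above: because XML is plain text with a
# documented structure, searching and manipulating it takes very little code.
# The <manual>/<section> document here is invented for illustration.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<manual><section>Install the unit</section>"
    "<section>Check the page breaks</section></manual>")

# Search: find every section whose text mentions "page".
hits = [s.text for s in doc.iter("section") if "page" in s.text]

# Manipulate: append a new section, then serialize back to text.
ET.SubElement(doc, "section").text = "Revision history"
serialized = ET.tostring(doc, encoding="unicode")

print(hits)  # ['Check the page breaks']
```

&lt;p&gt;Doing the equivalent against a PDF or a binary format means first decoding an opaque byte stream, which is exactly the barrier the XML spec&amp;rsquo;s text-based design avoids.&lt;/p&gt;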
&lt;p&gt;In general, you can&amp;rsquo;t edit PDFs, although Adobe doesn&amp;rsquo;t list this as an advantage. That&amp;rsquo;s probably partly because it falls under &amp;ldquo;information integrity&amp;rdquo;, but more to the point, they sell software that lets you edit them. Few people own this software (compared with the number of people who own PDF readers), so when you send a PDF to multiple people, you know they won&amp;rsquo;t screw around with it. Unless you&amp;rsquo;re deliberately asking for revisions to a document, sending a PDF is almost always better than sending a Word file. If John sends a Word (or HTML) file to Jane and Jane forwards it to Jim, how does Jim know that Jane didn&amp;rsquo;t alter it en route?&lt;/p&gt;
&lt;p&gt;I suppose that outside of the publishing industry the key advantage to PDF files these days is that they&amp;rsquo;re not Microsoft Word files. In addition to not being able to edit them, people can&amp;rsquo;t add macros to them that will screw up your computer. When someone at my daughter&amp;rsquo;s school&amp;rsquo;s parent association sends a Word file of a flyer for some event, I wish that it was a PDF, but they don&amp;rsquo;t know any better. My wife was very happy when I showed her &lt;a href=&#34;http://sourceforge.net/projects/pdfcreator/&#34;&gt;PDFCreator&lt;/a&gt;, an open-source Windows program that appears in the printer menu of any program that can print and sends your output to a PDF file instead of a printer. For more advanced use, OpenOffice and free and commercial XSL-FO implementations offer more PDF creation choices. I think we can assume that the programmers that Elliotte mentions in his weblog posting are more technically sophisticated than the people running a typical primary school parents council, so they have less excuse for using PDF.&lt;/p&gt;
&lt;h2 id=&#34;RMs5B1GrTKG3JG4K_lEifA&#34;&gt;Bad things about PDFs&lt;/h2&gt;
&lt;p&gt;I want to focus on disadvantages of the PDF format itself, and not the Acrobat Viewer program. Acrobat Viewer is free, but it&amp;rsquo;s big and slow and bloated, and as Adobe&amp;rsquo;s foot in my door it spends too much time reminding me about &amp;ldquo;critical&amp;rdquo; upgrades and trying to sell me things I don&amp;rsquo;t need. However, you don&amp;rsquo;t have to use Acrobat Viewer to view your PDFs; last June I discovered a &lt;a href=&#34;https://www.bobdc.com/blog/a-nice-windows-alternative-to&#34;&gt;nice alternative&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When you present printed pages on a glowing screen, there&amp;rsquo;s a certain impedance mismatch, and the advantages of fidelity to printed pages (for example, knowing that when you say &amp;ldquo;bottom of page 12&amp;rdquo;, it will mean the same to all viewers) are often outweighed by the disadvantages. I&amp;rsquo;m especially tired of looking at on-screen representations of the gap between the bottom of one page and the top of another; what good does this do for us?&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/tiredofpdf1.jpg&#34; alt=&#34;[PDF page break]&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;How about when Acrobat tells you that you&amp;rsquo;re looking at page 13 of a document, but the bottom of that page says &amp;ldquo;15&amp;rdquo;? Perhaps I could hunt through the twenty categories under &amp;ldquo;Edit Preferences&amp;rdquo; for some check box that lets me change this, but why do I need to? Come to think of it, when you say &amp;ldquo;bottom of page 12&amp;rdquo;, maybe it doesn&amp;rsquo;t mean the same thing to all viewers.&lt;/p&gt;
&lt;p&gt;In a comment on Elliotte&amp;rsquo;s posting, John Cowan &lt;a href=&#34;http://www.elharo.com/blog/software-development/2007/02/25/pdf-killed-the-programing-language/#comment-48143&#34;&gt;points out&lt;/a&gt; that an article about a programming language, when written for (or aspiring to) a peer-reviewed journal article, is taken more seriously when it&amp;rsquo;s in PDF format. That&amp;rsquo;s fine, but as Elliotte wrote of the offending web site, &amp;ldquo;it seems all their tutorials, manuals, white papers, and almost everything else are in PDF&amp;rdquo;. There&amp;rsquo;s just too much PDF out there. The &lt;a href=&#34;http://www.google.com/search?hl=en&amp;amp;lr=&amp;amp;safe=off&amp;amp;as_qdr=all&amp;amp;q=allintitle%3A+faq+filetype%3Apdf&#34;&gt;number of FAQs&lt;/a&gt; alone that are in PDF instead of HTML is just shameful.&lt;/p&gt;
&lt;p&gt;I think that companies that are large-scale publishers as a by-product of their main business are getting ready to move past PDF. You&amp;rsquo;ve heard of the jet fighter planes whose total printed documentation weighs more than one of the planes? Step one for easing the access of repair engineers to the relevant documentation—the process of putting it &amp;ldquo;online&amp;rdquo;, as they still call it—has been to create PDF versions of those books. A CD or DVD full of PDFs is more convenient than a wallful of books to someone crawling around the inside of a plane&amp;rsquo;s engine compartment, but pictures of pages with all the negative issues described above are not the best way to present the relevant information to these repair engineers. Designing a better interface for this content delivery is work, but more of these companies are realizing that the work is worth it.&lt;/p&gt;
&lt;p&gt;Reliance on the page metaphor may be a symptom of the historical moment as we spend a few decades completing the transition from hard copy books to more sensible online delivery for the appropriate content. (Note my little qualifier at the end of that sentence—I&amp;rsquo;m sure that for novels read on the beach, bound hardcopy books will remain the best delivery medium for years to come.) You could compare it to the state of movies before D.W. Griffith, when each film was little more than a single static shot of a filmed silent play.&lt;/p&gt;
&lt;p&gt;Many publishers are now picking up hints that it&amp;rsquo;s time to move on from the page metaphor. I recently attended some meetings in which publishers were discussing ways to control costs, and the first step is discussing ways to measure costs. Cost per page is a classic measure in the publishing world, but everyone in the meeting agreed that it&amp;rsquo;s increasingly meaningless as more content is delivered online.&lt;/p&gt;
&lt;p&gt;My former employer LexisNexis recently renamed the division of the company that creates books to &lt;a href=&#34;http://www.lexisnexis.com/productsandservices/solutionguide.asp?&amp;amp;name=value&amp;amp;CMP=IL15046&#34;&gt;Offline Products&lt;/a&gt;. Considering that everyone used to consider online delivery to be an alternative to print delivery, it&amp;rsquo;s interesting that someone would now define print delivery as essentially being &amp;ldquo;not online&amp;rdquo; delivery. Outside of diagrams accompanying patents, I don&amp;rsquo;t know of any content that LexisNexis delivers as PDFs, so they have the right idea. A brief check around &lt;a href=&#34;http://law.lexisnexis.com&#34;&gt;lexisnexis.com&lt;/a&gt; (which unlike lexis.com or nexis.com is more of a marketing web site than a product delivery one) didn&amp;rsquo;t turn up any PDFs either, and those marketing types are usually the quickest to think that PDF is better than HTML. Although this is a company that was founded to deliver content online, they&amp;rsquo;re still capable of being behind the curve on various technical issues; it&amp;rsquo;s nice to see that they have their PDF/HTML priorities straight.&lt;/p&gt;
&lt;p&gt;To summarize: PDF doesn&amp;rsquo;t make your content look slicker or more professional. It &lt;em&gt;may&lt;/em&gt; make the printed output look better, but make damn sure that most of your content&amp;rsquo;s readers intend to print it before reading it. And, make an honest effort to create an HTML version that prints nicely. Smart people are &lt;a href=&#34;http://www.w3.org/TR/css-print/&#34;&gt;making progress&lt;/a&gt; on this front all the time.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-720&#34;&gt;March 5, 2007 12:17 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The PDF format actually can deal with matching the page numbers in the PDF to the page numbers in the content; I have seen PDFs that do so, with introductory pages 1-8 followed by body pages 1-107 or whatever, and googling suggests that even roman page numbers can be handled.&lt;/p&gt;
&lt;p&gt;Of course, a PDF creator has to understand the content in order to generate such page numbers. A printer driver, for example, can&amp;rsquo;t; it doesn&amp;rsquo;t know which feature of the page being printed represents a page number. OpenOffice could but (as of 2.1) does not. Unfortunately, I can&amp;rsquo;t file a bug on this because I can&amp;rsquo;t persuade OO.o to validate me as a bug-tracker user.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://hackwrench.tripod.com&#34; title=&#34;http://hackwrench.tripod.com&#34;&gt;Robert Claypool&lt;/a&gt; on &lt;a href=&#34;#comment-721&#34;&gt;March 5, 2007 7:27 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;All I know is that Adobe Reader frequently &amp;ldquo;forgets&amp;rdquo; that I want to view documents fit to width and continuous pages. It&amp;rsquo;s still set in preferences. What I think is happening is that the Reader lets the documents hijack the settings. Why it would be so important to let the document enforce these settings is beyond me.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://scottysengineeringlog.net&#34; title=&#34;http://scottysengineeringlog.net&#34;&gt;Scott Hudson&lt;/a&gt; on &lt;a href=&#34;#comment-722&#34;&gt;March 6, 2007 11:08 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The performance issue is huge. I&amp;rsquo;ve started using FoxIt Reader for this very reason. It loads PDFs blazing fast, and it&amp;rsquo;s also free.&lt;/p&gt;
&lt;p&gt;&amp;ndash;Scott&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>One namespace to rule them all</title>
      <link>https://www.bobdc.com/blog/one-namespace-to-rule-them-all/</link>
      <pubDate>Thu, 01 Mar 2007 01:59:18 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/one-namespace-to-rule-them-all/</guid>
      
      
      <description><div>Says who? A spam generation program.</div><div>&lt;p&gt;I try not to forward people spam that strikes me as funny because of its strange, autogenerated content (&amp;ldquo;Look! Andre Breton-surrealist-beatnik-acid-poetry!&amp;rdquo;) because there&amp;rsquo;s so much of it out there that there&amp;rsquo;s nothing special about any of it. (Please, no counter-examples.) When one such message got through my spam filters to appear in my inbox with a subject header of &amp;ldquo;One namespace to rule them all&amp;rdquo;, though, it certainly pushed a lot of buttons in the mind of a markup/metadata geek. My inner voice was already arguing with someone out there: &amp;ldquo;What? The whole point of namespaces is to have more than one so that you can distinguish the context of one use of a term from that of another use! Throwing them all together won&amp;rsquo;t scale! That&amp;rsquo;s why the microformats use of the class attribute&amp;hellip;&amp;rdquo; etc.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/chfr.jpg&#34; alt=&#34;[spam image excerpt]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;Then, under the bitmap picture plugging stock shares of the China Fruits Corporation, I saw a block of text assembled from random phrases out on the world wide internet. In an earlier draft of this, I quoted the whole thing, but I&amp;rsquo;m sure you see enough of this sort of thing.&lt;/p&gt;
&lt;p&gt;Before grammatically correct but semantically nonsensical auto-generated content was so prevalent, I used to be fascinated by it. I even got credit toward a master&amp;rsquo;s degree in computer science for &lt;a href=&#34;http://www.snee.com/bob/worksch.html#i1&#34;&gt;some LISP coding&lt;/a&gt; to do some. A few times in my undergraduate years, while taking notes during a lecture, I&amp;rsquo;d nod off and still manage to write a few more words. Upon realizing what I did, I always excitedly checked to see what I had written, knowing that this was the kind of thing that Breton actively &lt;a href=&#34;http://www.answers.com/topic/andr-breton&#34;&gt;sought to write&lt;/a&gt;, but I had never written anything interesting. Maybe it would have looked better in French. (Hell, even the word &amp;ldquo;metadata&amp;rdquo; looks better &lt;a href=&#34;http://fr.wikipedia.org/wiki/M%C3%A9tadonn%C3%A9e&#34;&gt;in French&lt;/a&gt;, although Google &lt;a href=&#34;http://www.google.com/search?q=metadonee&#34;&gt;asks&lt;/a&gt;: &amp;ldquo;Did you mean methadone?&amp;rdquo; More than one French poet would approve.)&lt;/p&gt;
&lt;p&gt;The spammers can use whatever algorithms they like to put text under their bitmap message about the money they want from you without triggering your spam detector, but the subject header of their message is more important. It must grab your attention so that you open the message to see what&amp;rsquo;s inside, as this one did for me. Was this subject header as random as the text in the message? Were the odds similar that my mother could have gotten spam with a subject about one namespace ruling them all? The one non-spam use of the phrase that I can find is a &lt;a href=&#34;http://adamv.com/dev/articles/hatevbs/vbscript&#34;&gt;weblog posting about VBScript&lt;/a&gt;, which is pretty far away from the topics I care about. The grand tone of how this namespace would &amp;ldquo;rule them all&amp;rdquo; was certainly part of the button-pushing effect; did it push any buttons for you when I used it?&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://h&#34; title=&#34;http://h&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-715&#34;&gt;March 1, 2007 9:45 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It certainly grabbed &lt;em&gt;me&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been peddling this to a few places lately:&lt;/p&gt;
&lt;p&gt;Data and procedures and the values they amass,&lt;br /&gt;
Higher-order functions to combine and mix and match,&lt;br /&gt;
Objects with their local state, the messages they pass,&lt;br /&gt;
A property, a package, the control point for a catch &amp;ndash;&lt;br /&gt;
In the Lambda Order they are all first-class.&lt;/p&gt;
&lt;p&gt;One Thing to name them all, One Thing to define them,&lt;br /&gt;
One Thing to place them in environments and bind them,&lt;br /&gt;
In the Lambda Order they are all first-class.&lt;/p&gt;
&lt;p&gt;(the abstract to the Revised Revised Report on Scheme)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-719&#34;&gt;March 2, 2007 5:12 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Me too&lt;/em&gt;, but the From: field helped (and I was expecting an 80&amp;rsquo;s guitarist yarn).&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Checking Out Yahoo Pipes</title>
      <link>https://www.bobdc.com/blog/checking-out-yahoo-pipes/</link>
      <pubDate>Sat, 24 Feb 2007 09:57:52 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/checking-out-yahoo-pipes/</guid>
      
      
      <description><div>Easy, quick, and useful.</div><div>&lt;p&gt;You&amp;rsquo;ve probably heard that Yahoo has this new, drag-and-drop tool to easily combine and manipulate RSS and Atom feeds. (Forgive me for omitting the exclamation point from their name—speaking of which, shouldn&amp;rsquo;t the logo for &lt;a href=&#34;http://es.yahoo.com/&#34;&gt;yahoo.es&lt;/a&gt; be &amp;ldquo;¡Yahoo!&amp;rdquo;?) Tim O&amp;rsquo;Reilly &lt;a href=&#34;http://radar.oreilly.com/archives/2007/02/pipes_and_filte.html&#34;&gt;called Yahoo pipes&lt;/a&gt; no less than &amp;ldquo;a milestone in the history of the internet.&amp;rdquo; Early reports mentioned load problems, and I was extra busy with work, so I waited a bit before trying it.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://pipes.yahoo.com/&#34;&gt;&lt;img src=&#34;http://l.yimg.com/us.yimg.com/i/us/pps/logo_1.gif&#34; alt=&#34;[yahoo pipes logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s definitely cool. My to-do list has several low-priority entries under &amp;ldquo;check out the &lt;a href=&#34;http://www.feedparser.org/&#34;&gt;Universal Feed Parser&lt;/a&gt; and try to combine feeds X, Y, and Z into one feed and then search/sort/remove redundant entries from the combination&amp;rdquo;. Yahoo pipes makes this simple: you drag modules from a menu on the left of the screen onto a workspace where you hook up inputs, specialized processing modules, and then an output.&lt;/p&gt;
&lt;p&gt;A Fetch module lets you specify one or more feeds to grab, using Atom or any RSS flavor. (It took me a while to catch on to the &amp;ldquo;or more&amp;rdquo; part—at first I assembled messy combinations of single-URL Fetch modules all combined through Union operator modules.) Of the modules that you can add between your Fetch module(s) and the Pipe Output one, modules like &amp;ldquo;Content Analysis&amp;rdquo; and &amp;ldquo;For Each: Annotate&amp;rdquo; look interesting, but I couldn&amp;rsquo;t make them work after a few minutes of playing. Instead of worrying about it, I thought I&amp;rsquo;d just wait until the Yahoo Pipes documentation is less skimpy and the system&amp;rsquo;s teething problems are over, in case my difficulties are not my own fault.&lt;/p&gt;
&lt;p&gt;And, there&amp;rsquo;s plenty you can do with the simpler modules. For my first non-trivial pipe, I wanted to create something I could show at a talk I&amp;rsquo;ll be giving to a group of law librarians, so I found feed URLs for fourteen Intellectual Property &lt;a href=&#34;http://www.google.com/search?q=blawg&#34;&gt;blawgs&lt;/a&gt; (&amp;ldquo;law blogs&amp;rdquo;—get it?) on blawg.com&amp;rsquo;s &lt;a href=&#34;http://www.blawg.com/Listing.aspx?CategoriesID=14&amp;amp;page=0&#34;&gt;Intellectual Property&lt;/a&gt; blawg listing, then piped the combination of these feeds through a Filter module that only passes along the ones with the phrase &amp;ldquo;fair use&amp;rdquo; in them. Then, the pipeline goes through another module that sorts the entries by published date and finally to the output module. Now we have something that would be valuable to an IP lawyer working on a case where the issue of &amp;ldquo;fair use&amp;rdquo; plays an important role: &lt;a href=&#34;http://pipes.yahoo.com/pipes/OHEdcnfC2xG_dRz8e_gC8A/&#34;&gt;IP blawg entries mentioning fair use&lt;/a&gt;.&lt;/p&gt;
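&lt;p&gt;(The same fetch/filter/sort pipeline can be sketched outside Yahoo Pipes with a few lines of standard-library Python; the toy feeds and titles below are invented, and a real version would first fetch each blawg feed over HTTP, but the union, filter, and sort steps are the ones just described.)&lt;/p&gt;

```python
# A sketch of the pipe described above, standard library only: union
# several RSS feeds, keep items whose title contains a phrase, and sort
# newest-first by pubDate. The feeds are toy inline strings standing in
# for fetched blawg feeds.
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

FEEDS = [
    """<rss><channel>
         <item><title>Fair use and mashups</title>
           <pubDate>Sat, 24 Feb 2007 09:00:00 -0500</pubDate></item>
         <item><title>Patent filing news</title>
           <pubDate>Fri, 23 Feb 2007 08:00:00 -0500</pubDate></item>
       </channel></rss>""",
    """<rss><channel>
         <item><title>A fair use ruling</title>
           <pubDate>Sun, 25 Feb 2007 10:00:00 -0500</pubDate></item>
       </channel></rss>""",
]

def pipe(feeds, phrase):
    # Union: collect every <item> from every feed.
    items = [item for feed in feeds
             for item in ET.fromstring(feed).iter("item")]
    # Filter: pass along only items whose title contains the phrase.
    items = [i for i in items
             if phrase.lower() in i.findtext("title", "").lower()]
    # Sort by published date, newest first.
    items.sort(key=lambda i: parsedate_to_datetime(i.findtext("pubDate")),
               reverse=True)
    return [i.findtext("title") for i in items]

print(pipe(FEEDS, "fair use"))  # ['A fair use ruling', 'Fair use and mashups']
```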
&lt;p&gt;One great thing about the modules that use information from specific elements of an RSS feed is that they give you a menu of the available elements in the input you&amp;rsquo;ve chosen. For example, the Filter module lets you permit or block items that match any or all of the rules that you specify on that module. The default rule says &amp;ldquo;title contains [text]&amp;rdquo;, where you enter text such as &amp;ldquo;fair use&amp;rdquo;. Along with &amp;ldquo;contains&amp;rdquo; they offer five other choices such as &amp;ldquo;does not contain&amp;rdquo; and &amp;ldquo;Matches regex&amp;rdquo;. The choices besides &amp;ldquo;title&amp;rdquo; depend on your input; when you drag a little blue hose from the output of a module such as Fetch to a Filter module, the Title part briefly says &amp;ldquo;Updating&amp;hellip;&amp;rdquo; and then becomes a drop-down menu showing available elements in the input. The following shows an example of the elements that show up in the drop-down list for a given set of input.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/ypipes1.jpg&#34; alt=&#34;screen shot showing dropdown list&#34;/&gt;
&lt;p&gt;Once I was comfortable with all this, I set myself a goal of creating a version of the &lt;a href=&#34;http://www.boingboing.net/&#34;&gt;BoingBoing&lt;/a&gt; feed that filters out all entries by Cory Doctorow and Xeni Jardin, and I timed myself to see how long this took. I didn&amp;rsquo;t time the creation process down to the second, but from the time I clicked &amp;ldquo;New&amp;rdquo; until I saw a working test run was literally two minutes. That&amp;rsquo;s nice. I started writing out the details of how I created the pipe here, but if you have a Yahoo ID you can easily &lt;a href=&#34;http://pipes.yahoo.com/pipes/4rA1TrfC2xGZpUQlr8cPhQ/edit&#34;&gt;see the visual representation for yourself&lt;/a&gt;, and it saves me some typing.&lt;/p&gt;
&lt;p&gt;Of course the results of your pipe creation work are available as a feed that you can add to a feed reader and drop into other Yahoo pipes feeds, but unfortunately there&amp;rsquo;s no Atom output. On the website&amp;rsquo;s suggestion list, you can &lt;a href=&#34;http://suggestions.yahoo.com/srp/?prop=Pipes&amp;amp;query=atom&amp;amp;crumb=PZo9HZP7mQa&amp;amp;sort=date&amp;amp;filter=&amp;amp;resolved=0&amp;amp;vote_mode=1&#34;&gt;vote&lt;/a&gt; to make that a higher priority. (JSON output is available.)&lt;/p&gt;
&lt;p&gt;I look forward to figuring out and trying the more complex modules. And maybe I&amp;rsquo;ll start following BoingBoing again, now that I&amp;rsquo;ve found an easy way to improve their signal-to-noise ratio.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>More ways to make money from the semantic web</title>
      <link>https://www.bobdc.com/blog/more-ways-to-make-money-from-t/</link>
      <pubDate>Thu, 15 Feb 2007 09:49:42 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/more-ways-to-make-money-from-t/</guid>
      
      
      <description><div>For example, by helping the people who are trying to follow the money.</div><div>&lt;p&gt;Some of us geeky types play with certain technologies just because they&amp;rsquo;re fun and useful to us and to our friends in the little clubs that form around each technology. We debate about their potential use to a wider audience, and there&amp;rsquo;s certainly been plenty of this debate about the semantic web.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.jupitermedia.com&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/jmedia.jpg&#34; alt=&#34;[Jupiter Media logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When people who feel that there&amp;rsquo;s money to be made start sniffing around, it&amp;rsquo;s an odd feeling; my only real experience with that was watching XML&amp;rsquo;s transition from a side project of some SGML geeks to a pillar technology of an economic bubble. The semantic web may never get that big (personally, I feel that as with Artificial Intelligence, useful and valuable technologies will spin off from it and become part of the plumbing while people continue to debate the &amp;ldquo;success&amp;rdquo; and &amp;ldquo;failure&amp;rdquo; of the blanket term), but people who see a market for it are starting to look into it.&lt;/p&gt;
&lt;p&gt;Jupitermedia has been around for a while. They had the foresight to grab the domain names &lt;a href=&#34;http://www.internet.com/&#34;&gt;internet.com&lt;/a&gt; and &lt;a href=&#34;http://www.graphics.com/&#34;&gt;graphics.com&lt;/a&gt; way back when, and I had heard their name in association with the &lt;a href=&#34;http://www.devx.com/&#34;&gt;devX.com&lt;/a&gt; tech news and features site and their &lt;a href=&#34;http://www.jupiterevents.com/&#34;&gt;conferences and trade shows&lt;/a&gt;. They&amp;rsquo;re planning on a Semantic Web conference, and more importantly for now, they&amp;rsquo;re looking to hire someone to cover the semantic web for them on a part-time basis. They want two or three articles a week of 400 – 500 words each. Pay would be based on the author&amp;rsquo;s experience in semantic web-related work and his or her writing background. According to Gus Venditto of jupitermedia.com (gvenditto@), &amp;ldquo;What we expect is someone who will do the research—talking to people, following up on leads, reading the wires for the latest—and then write solid newsy articles.&amp;rdquo; I&amp;rsquo;d do it, but my job keeps me busy enough.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a great opportunity for someone interested in evangelizing the semantic web to get paid to do so. Of course, to track potential story ideas, you&amp;rsquo;d want to point &lt;a href=&#34;http://planetrdf.com/&#34;&gt;Planet RDF&lt;/a&gt; and some &lt;a href=&#34;http://digg.com/&#34;&gt;digg&lt;/a&gt;, &lt;a href=&#34;http://del.icio.us/&#34;&gt;del.icio.us&lt;/a&gt;, &lt;a href=&#34;http://www.blogpulse.com/&#34;&gt;blogpulse&lt;/a&gt;, and &lt;a href=&#34;http://reddit.com/search?q=semantic+web&#34;&gt;reddit&lt;/a&gt; feeds into a &lt;a href=&#34;http://pipes.yahoo.com/&#34;&gt;Yahoo pipe&lt;/a&gt; &amp;hellip; and the rest is left as an exercise for the writer.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re interested, get in touch with Gus.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Generating RDFa from Movable Type, Part 2</title>
      <link>https://www.bobdc.com/blog/generating-rdfa-from-movable-t-1/</link>
      <pubDate>Fri, 09 Feb 2007 21:15:22 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/generating-rdfa-from-movable-t-1/</guid>
      
      
      <description><div>Why generate metadata that&#39;s redundant with data?</div><div>&lt;p&gt;After I &lt;a href=&#34;https://www.bobdc.com/blog/generating-rdfa-from-movable-t&#34;&gt;wrote recently&lt;/a&gt; about tweaking a Movable Type template so that RDFa metadata would be automatically generated with the individual archive versions of each weblog posting, Ben Adida &lt;a href=&#34;http://rdfa.info/2007/02/01/rdfa-in-movable-type/&#34;&gt;suggested&lt;/a&gt; that it would be better if I had added the markup inline with the weblog entry instead of grouping it into a single block in the web page&amp;rsquo;s &lt;code&gt;head&lt;/code&gt; element.&lt;/p&gt;
&lt;p&gt;I put the metadata in the &lt;code&gt;head&lt;/code&gt; element because I&amp;rsquo;ve been pushing the RDFa development effort to consider the use case of metadata that describes content without being content, such as workflow information about a document or about a component of a document. (For an example, see the &lt;a href=&#34;http://www.w3.org/2006/07/SWD/RDFa/scenarios/20070109/#use-case-3&#34;&gt;Content Management Metadata example&lt;/a&gt; that I submitted to the W3C&amp;rsquo;s RDFa Use Cases document.) I realized, though, that for most of the RDFa metadata that I had added to my weblog template, Ben was right, because so much of that metadata was redundant with the web page&amp;rsquo;s data. The following &lt;code&gt;meta&lt;/code&gt; element uses the MovableType tag &lt;code&gt;$MTEntryTitle&lt;/code&gt; to plug the document&amp;rsquo;s title into the &lt;code&gt;content&lt;/code&gt; attribute,&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;meta property=&amp;quot;dc:title&amp;quot; content=&amp;quot;&amp;lt;$MTEntryTitle encode_html=&amp;quot;1&amp;quot;$&amp;gt;&amp;quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;so that for a document like this one I&amp;rsquo;d end up with this &lt;code&gt;meta&lt;/code&gt; element:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;meta property=&amp;quot;dc:title&amp;quot; content=&amp;quot;Generating RDFa from Movable Type, Part 2&amp;quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since that title is already showing up elsewhere in the document as the title that you see when you read the web page, this second indication of the title is redundant with the displayed title. (It&amp;rsquo;s actually the third indication of the title—I&amp;rsquo;ve always hated how a typical HTML document needs to have its title specified in both the &lt;code&gt;title&lt;/code&gt; child of the &lt;code&gt;head&lt;/code&gt; element and in an &lt;code&gt;h1&lt;/code&gt; element near the top of the &lt;code&gt;body&lt;/code&gt;.)&lt;/p&gt;
&lt;p&gt;The Movable Type template that I use doesn&amp;rsquo;t store the main title in an &lt;code&gt;h1&lt;/code&gt; element, but I&amp;rsquo;m not going to screw around with its structure. I went with the RDFa philosophy of adding a few tags here and there to provide machine-readable clues about the meaning of the existing content. The bold text here shows how the template now generates the same dc:title triple that it generated before, without adding another copy of the title string to the web page.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;h3 class=&amp;quot;entry-header&amp;quot;&amp;gt;&amp;lt;meta property=&amp;quot;dc:title&amp;quot;&amp;gt;&amp;lt;$MTEntryTitle$&amp;gt;&amp;lt;/meta&amp;gt;&amp;lt;/h3&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The description was already in the document as well, so adding a few tags also made it serve double-duty as data and metadata.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;p&amp;gt;&amp;lt;b&amp;gt;&amp;lt;meta property=&amp;quot;dc:description&amp;quot;&amp;gt;&amp;lt;$MTEntryBody$&amp;gt;&amp;lt;/meta&amp;gt;&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The other place where I added tags into the content includes an assignment of the author name as the &lt;code&gt;dc:creator&lt;/code&gt; and another more interesting case. The &lt;code&gt;meta&lt;/code&gt; element for the &lt;code&gt;dc:date&lt;/code&gt; predicate isn&amp;rsquo;t re-using data as metadata—it wraps the &amp;ldquo;Posted by&amp;rdquo; date with tags that include a &lt;code&gt;content&lt;/code&gt; attribute to provide the object of the triple instead of using the PCDATA content of the &lt;code&gt;meta&lt;/code&gt; element.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;span class=&amp;quot;post-footers&amp;quot;&amp;gt;Posted by &amp;lt;meta property=&amp;quot;dc:creator&amp;quot;&amp;gt;
&amp;lt;$MTEntryAuthorDisplayName$&amp;gt;&amp;lt;/meta&amp;gt; on 
&amp;lt;meta property=&amp;quot;dc:date&amp;quot; 
content=&#39;&amp;lt;$MTEntryDate format=&amp;quot;%Y-%m-%dT%H:%M:%S&amp;quot;&amp;gt;&#39;&amp;gt;
&amp;lt;$MTEntryDate$&amp;gt;&amp;lt;/meta&amp;gt;&amp;lt;/span&amp;gt; 
&amp;lt;span class=&amp;quot;separator&amp;quot;&amp;gt;|&amp;lt;/span&amp;gt; &amp;lt;a class=&amp;quot;permalink&amp;quot; 
href=&amp;quot;&amp;lt;$MTEntryPermalink$&amp;gt;&amp;quot;&amp;gt;Permalink&amp;lt;/a&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result will look something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;span class=&amp;quot;post-footers&amp;quot;&amp;gt;Posted by &amp;lt;meta property=&amp;quot;dc:creator&amp;quot;&amp;gt;
Bob DuCharme&amp;lt;/meta&amp;gt; on 
&amp;lt;meta property=&amp;quot;dc:date&amp;quot; content=&#39;2007-01-31T03:45:05&#39;&amp;gt;
January 31, 2007 03:45 AM&amp;lt;/meta&amp;gt;&amp;lt;/span&amp;gt; 
&amp;lt;span class=&amp;quot;separator&amp;quot;&amp;gt;|&amp;lt;/span&amp;gt; &amp;lt;a class=&amp;quot;permalink&amp;quot; 
href=&amp;quot;http://www.snee.com/bobdc.blog/2007/01/the_economist_welcomes_the_sem.html&amp;quot;&amp;gt;
Permalink&amp;lt;/a&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second &lt;code&gt;meta&lt;/code&gt; element here has a great reason for using the &lt;code&gt;content&lt;/code&gt; attribute to provide an alternative object for the triple about the Dublin Core date: it&amp;rsquo;s using an ISO 8601 version of the date, which is more useful as metadata than one that spells out the month name, because it&amp;rsquo;s easier to sort and to use in query criteria. I might have put this &lt;code&gt;meta&lt;/code&gt; element in the document&amp;rsquo;s &lt;code&gt;head&lt;/code&gt; (where I still have some metadata), but wrapping it around the displayed date directly associates it with that very relevant part of the document. If I did the same for the dates shown with the comments about a weblog entry, someone could search for comments after 2007-02-01T12:00:00 and before 2007-02-01T15:00:00 and find one posted with a displayed date-time stamp of &amp;ldquo;February 1, 2007 at 2:30 PM&amp;rdquo;.&lt;/p&gt;
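&lt;p&gt;To make that search concrete, here is a small Python sketch (the function and variable names are just illustrative) of why the ISO 8601 form is the useful one as metadata: once the displayed date is converted, sorting and range checks work as plain string comparisons.&lt;/p&gt;

```python
from datetime import datetime

def in_window(stamp, start, end):
    # ISO 8601 timestamps sort and compare correctly as plain strings
    return stamp > start and end > stamp

# The human-readable date-time stamp displayed with a comment
displayed = "February 1, 2007 at 2:30 PM"
iso = datetime.strptime(displayed, "%B %d, %Y at %I:%M %p").strftime("%Y-%m-%dT%H:%M:%S")
print(iso)  # 2007-02-01T14:30:00
print(in_window(iso, "2007-02-01T12:00:00", "2007-02-01T15:00:00"))  # True
```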
&lt;p&gt;I disagree with Ben&amp;rsquo;s assertion that &amp;ldquo;marking up the actual rendered data&amp;hellip; is what RDFa is &lt;em&gt;all&lt;/em&gt; about&amp;rdquo; [my emphasis]. The beauty of RDFa is that its design lets us do that and more with it. Providing metadata that is not redundant with content will serve a real business need in the publishing world. Inline RDFa is more difficult to generate automatically than a single block inserted into a document&amp;rsquo;s &lt;code&gt;head&lt;/code&gt; element (and what&amp;rsquo;s that &lt;code&gt;head&lt;/code&gt; element for, if not document metadata?), and I don&amp;rsquo;t think that hand-crafted RDFa use will add up too quickly.&lt;/p&gt;
&lt;p&gt;Also, for a block of metadata in the header, it&amp;rsquo;s easy to specify the full URL of the containing document as the subject of all those triples by wrapping the &lt;code&gt;meta&lt;/code&gt; elements with another one that has the URL in an &lt;code&gt;about&lt;/code&gt; attribute. The way I have it now, the subject of many of the triples is the empty string, which is understood to be the containing document. Understood by what, in the semantic machine-readable sense? If I write an app that pulls such triples from two different documents, it&amp;rsquo;s going to see the same empty string subject for all of those triples unless I write extra code to fix that. One trick to get around this is an &lt;code&gt;about&lt;/code&gt; attribute on the &lt;code&gt;html&lt;/code&gt; start-tag with the full URL of the document; the triples come out great, but because I&amp;rsquo;ve never seen this done before, I wonder whether it&amp;rsquo;s considered good or bad practice. Let me know if you have any ideas about this.&lt;/p&gt;
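&lt;p&gt;For what it&amp;rsquo;s worth, resolving that empty string is just ordinary relative-URI resolution: an empty reference resolves to the base document&amp;rsquo;s URI, so a consuming application only gets globally meaningful subjects if it knows that base. A quick Python illustration (the URL here is made up):&lt;/p&gt;

```python
from urllib.parse import urljoin

# An empty-string subject is resolved against the base (containing
# document) URI, so the consumer has to supply that base itself.
base = "http://www.snee.com/bobdc.blog/2007/02/some_entry.html"
print(urljoin(base, ""))  # prints the base URI itself
```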
&lt;p&gt;Meanwhile, I&amp;rsquo;m accumulating useful triples. Looking for ways to add RDFa to auto-generated HTML is fun; it looks like I already have a nice start.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-685&#34;&gt;February 10, 2007 1:54 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;OT but important: can you &lt;em&gt;please&lt;/em&gt; switch to black lettering? I have to magnify your text about five times before I can read it, and it&amp;rsquo;s still not easy.&lt;/p&gt;
&lt;p&gt;Thanks.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://internet-apps.blogspot.com/&#34; title=&#34;http://internet-apps.blogspot.com/&#34;&gt;Mark Birbeck&lt;/a&gt; on &lt;a href=&#34;#comment-686&#34;&gt;February 10, 2007 4:39 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nice work Bob!&lt;/p&gt;
&lt;p&gt;The use-case you give for the @content attribute is a good one&amp;ndash;that of making sure that search engines can find information that may not be present in the document in a form that a machine might recognise. Other examples would be to attach the date information to words like &amp;ldquo;tomorrow&amp;rdquo; or &amp;ldquo;last week&amp;rdquo;, to attach names to titles like &amp;ldquo;the Prime Minister&amp;rdquo; or &amp;ldquo;the President&amp;rdquo;, or full names to countries like &amp;ldquo;the US&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Another thing that might be of interest is that RDFa is acquiring the idea of a &amp;lsquo;profile&amp;rsquo; that allows non-QName information that is known about to be converted to have a QName. If that&amp;rsquo;s gobbledygook for anyone reading, I apologise&amp;hellip;it just means that for words that are not qualified by being in some namespace, a preprocessor will find recognised ones and make it so that they &lt;em&gt;are&lt;/em&gt; in a namespace. Using this page as an example, it uses things like @rel=&amp;ldquo;start&amp;rdquo; and @rel=&amp;ldquo;prev&amp;rdquo; which are defined in HTML; an RDFa parser that uses the HTML profile preprocessor (still being defined, but currently called hGRDDL) will see those as the predicates h:start and h:prev, giving you even more triples to play with. :)&lt;/p&gt;
&lt;p&gt;Great to see RDFa being put through its paces.&lt;/p&gt;
&lt;p&gt;Best regards,&lt;/p&gt;
&lt;p&gt;Mark&lt;/p&gt;
&lt;p&gt;PS I almost forgot to mention; the triples you get in head should already include the full URL for the document, and not an empty string. There may be a problem with the processor you are using&amp;hellip;which is it?&lt;/p&gt;
&lt;p&gt;&amp;ndash;&lt;br /&gt;
Mark Birbeck, formsPlayer&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;mailto:mark.birbeck@x-port.net&#34;&gt;mark.birbeck@x-port.net&lt;/a&gt; * +44 (0) 20 7689 9232&lt;br /&gt;
&lt;a href=&#34;https://www.formsPlayer.com&#34;&gt;www.formsPlayer.com&lt;/a&gt; * internet-apps.blogspot.com&lt;/p&gt;
&lt;p&gt;standards. innovation.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-687&#34;&gt;February 10, 2007 1:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;John,&lt;/p&gt;
&lt;p&gt;Done.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-688&#34;&gt;February 10, 2007 1:25 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Mark,&lt;/p&gt;
&lt;p&gt;I wouldn&amp;rsquo;t expect the INRIA stylesheet to put the full URL in rdf:Description/@rdf:about, because an XSLT stylesheet has no way to know the name of the input document, but it doesn&amp;rsquo;t work for RDFlib either when run locally or on Elias&amp;rsquo;s web service (compare &lt;a href=&#34;http://www.snee.com/temp/test1.html&#34;&gt;http://www.snee.com/temp/test1.html&lt;/a&gt; with &lt;a href=&#34;http://torrez.us/services/rdfa/?url=http%3A%2F%2Fwww.snee.com%2Ftemp%2Ftest1.html&#34;&gt;http://torrez.us/services/rdfa/?url=http%3A%2F%2Fwww.snee.com%2Ftemp%2Ftest1.html&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;So the full URL should be plugged in? That&amp;rsquo;s good news.&lt;/p&gt;
&lt;p&gt;thanks,&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>The Economist welcomes the Semantic Web</title>
      <link>https://www.bobdc.com/blog/the-economist-welcomes-the-sem/</link>
      <pubDate>Wed, 31 Jan 2007 03:45:05 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-economist-welcomes-the-sem/</guid>
      
      
      <description><div>And, perhaps, vice versa.</div><div>&lt;p&gt;Last Saturday morning, before getting on a plane from Frankfurt to Charlotte, North Carolina with a book that I knew wouldn&amp;rsquo;t last that far, I went into a newsstand to find a big, glossy, British Formula One magazine to supplement my reading. I didn&amp;rsquo;t find one, but I bought something else glossy and British: a special New Year edition of &lt;a href=&#34;http://www.economist.com/index.html&#34;&gt;The Economist&lt;/a&gt; called &amp;ldquo;The World in 2007&amp;rdquo; with summaries of where various countries and industries are now and where they may go in the coming year.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.economist.com/index.html&#34;&gt;&lt;img src=&#34;http://www.economist.com/images/economist_logo.png&#34; alt=&#34;[Economist logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This special edition included lots of single-page essays from government and industry leaders such as Angela Merkel, John McCain, Eric Schmidt, and Pascal Lamy. All of the essays that I started were predictable enough that I didn&amp;rsquo;t finish them, until I got to the last page of the magazine, which had an essay with something I hadn&amp;rsquo;t expected to see: a parenthesized list of the three acronyms RDF, OWL, and SPARQL.&lt;/p&gt;
&lt;p&gt;Huh? In a special edition of The Economist? The essay was called &amp;ldquo;Welcome to the Semantic Web,&amp;rdquo; and of course the tech leadership figure who wrote it was Tim Berners-Lee. While some of us think that bnodes, reification, and traction for competing alternatives to RDF/XML are Big Issues to worry about, his essay made for a nice pep talk on the real big issues that form the grand vision of the Semantic Web.&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;ve probably read most of the essay&amp;rsquo;s main points in other pieces he&amp;rsquo;s written, but a newer one caught my eye:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Consider the financial services industry. Successful investment strategies are based on finding patterns and trends in an increasingly diverse set of information sources&amp;hellip; Leading-edge providers of financial information are now developing services that allow users easily to integrate the data they have themselves—about their own portfolios or from their in-house market models—with the data delivered by the information service. The unique value creation lies in the integration service itself, not in the raw data on its own or even in the software tools, most of which will be built on open-source components.&lt;/p&gt;
&lt;p&gt;The key to this integration is to use common data formats that link the information with identifiable vocabularies&amp;hellip;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you follow &lt;a href=&#34;http://www.planetrdf.com&#34;&gt;Planet RDF&lt;/a&gt;, I&amp;rsquo;m sure you can imagine where this discussion led, but the part quoted above reminded me of something that I&amp;rsquo;ve been thinking about. The financial community certainly has a lot of data. I&amp;rsquo;ve noticed that much of the effort in the use of the Extensible Business Reporting Language (&lt;a href=&#34;http://www.xbrl.org/Home/&#34;&gt;XBRL&lt;/a&gt;) goes into careful development of machine-readable taxonomies, and I&amp;rsquo;ve wondered whether RDF/OWL technology can take advantage of these taxonomies to do something worthwhile. While it&amp;rsquo;s a cliche that metadata is data about data, lots of RDF/OWL ontology metadata isn&amp;rsquo;t about anything—&amp;ldquo;Ontologies for the sake of ontologies,&amp;rdquo; as Dan Connolly put it—but financial metadata has plenty of data to go with it, so there should be lots of room for interesting new metadata applications.&lt;/p&gt;
&lt;p&gt;Has anyone heard of any use of semantic web technology with financial data, with or without building on XBRL work?&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By Josh Tauberer on &lt;a href=&#34;#comment-660&#34;&gt;January 31, 2007 6:46 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve played with the SEC&amp;rsquo;s Edgar data source and turned some of it into N3 for fun to see what it would look like:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.govtrack.us/data/rdf/sec.n3.gz&#34;&gt;http://www.govtrack.us/data/rdf/sec.n3.gz&lt;/a&gt;&lt;br /&gt;
&lt;a href=&#34;http://www.govtrack.us/rdfbrowse.xpd?uri=tag:govshare.info,2005:data/us/sec/cik0001308161&#34;&gt;http://www.govtrack.us/rdfbrowse.xpd?uri=tag:govshare.info,2005:data/us/sec/cik0001308161&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://kontrawize.blogs.com/kontrawize/&#34; title=&#34;http://kontrawize.blogs.com/kontrawize/&#34;&gt;Anthony B. Coates&lt;/a&gt; on &lt;a href=&#34;#comment-661&#34;&gt;January 31, 2007 8:31 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not aware of financial data being encoded using RDF, OWL, etc. It is under consideration as a technology for modelling business contexts, and perhaps whole business models, for a future version of the ISO 20022 standard for generating banking/finance XML Schemas from models, although I should stress that it is only under evaluation; it will be quite a while before any decision one way or another is made.&lt;/p&gt;
&lt;p&gt;It doesn&amp;rsquo;t make much sense to use RDF for data that fits neatly into a hierarchy, since XML does a perfectly good job of that. RDF might make sense for data where the structure needs to be flexible over time, e.g. perhaps for financial data distribution, since data vendors want to be able to add new information and new associations between information in a timely manner, without waiting for standards or Schemas to catch up. This kind of RDF data might also aid &amp;ldquo;mash ups&amp;rdquo; that integrate financial data with other data, or which integrate financial data from multiple sources, which is an interesting use case.&lt;/p&gt;
&lt;p&gt;Cheers, Tony.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-662&#34;&gt;January 31, 2007 11:50 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Tony,&lt;/p&gt;
&lt;p&gt;I hadn&amp;rsquo;t expected to find use of RDF in the financial community (you and I have discussed this before), but was more curious about people in the RDF/semweb community taking some financial data and building something around it. There is plenty of data and metadata out there in a nice machine-readable form, and I&amp;rsquo;m sure that it wouldn&amp;rsquo;t take much XSLT to turn it into RDF for such a mashup.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By Economoist peruser on &lt;a href=&#34;#comment-663&#34;&gt;January 31, 2007 12:33 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The SEC&amp;rsquo;s Edgar XBRL data feed is very low volume&amp;ndash;months pass between new items.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://yihongs-research.blogspot.com/&#34; title=&#34;http://yihongs-research.blogspot.com/&#34;&gt;Yihong Ding&lt;/a&gt; on &lt;a href=&#34;#comment-664&#34;&gt;January 31, 2007 8:01 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;There is a European project called &lt;a href=&#34;http://www.musing-project.eu/&#34;&gt;MUSING&lt;/a&gt;, which aims to provide innovative knowledge management solutions and services to support the execution of business intelligence activities, directly at the end-user premises. Part of the project includes representing XBRL data sources on Semantic Web ontologies. I participated in this work last summer in Innsbruck. For more detailed information, you may want to talk with &lt;a href=&#34;http://www.heppnetz.de/&#34;&gt;Martin Hepp&lt;/a&gt;, who is the leader of ontology development in this MUSING project.&lt;/p&gt;
&lt;p&gt;cheers,&lt;/p&gt;
&lt;p&gt;Yihong&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Generating RDFa from Movable Type</title>
      <link>https://www.bobdc.com/blog/generating-rdfa-from-movable-t/</link>
      <pubDate>Thu, 25 Jan 2007 14:23:17 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/generating-rdfa-from-movable-t/</guid>
      
      
      <description><div>Easy to generate, easy to use.</div><div>&lt;p&gt;Did you know that the default template for the Movable Type weblog publishing software adds metadata in commented-out RDF/XML to permalink pages? (By &amp;ldquo;permalink pages,&amp;rdquo; I mean the pages that store permanent versions of each weblog entry, as opposed to the versions on the main index page, which are only there for a few weeks.) For example, &lt;a href=&#34;http://www.bradchoate.com/weblog/2005/08/25/movable-type-32&#34;&gt;this weblog&lt;/a&gt; is one I picked at random after doing a Google search for &amp;ldquo;Movable Type&amp;rdquo;; doing a View Source on it will show the commented-out RDF. (For all I know, the weblog&amp;rsquo;s author never heard of RDF.) Being commented out, this RDF is not particularly useful, but this default habit of Movable Type inspired me to try to get Movable Type to put &lt;a href=&#34;http://rdfa.info/&#34;&gt;RDFa&lt;/a&gt; versions of the same metadata into this weblog&amp;rsquo;s permalink pages. It works, and you can now pull RDF/XML out of any permalink version of one of my weblog entries with a single URL, &lt;a href=&#34;http://www.w3.org/2005/08/online_xslt/xslt?xslfile=http%3A%2F%2Fwww-sop.inria.fr%2Facacia%2Fsoft%2FRDFa2RDFXML.xsl&amp;amp;xmlfile=http%3A%2F%2Fwww.snee.com%2Fbobdc.blog%2F2007%2F01%2Fgreat_survey_of_rdfweb_develop.html&amp;amp;content-type=&amp;amp;submit=transform&#34;&gt;like this&lt;/a&gt;. It&amp;rsquo;s much easier than trying to do something with commented-out RDF/XML.&lt;/p&gt;
&lt;p&gt;If you follow that link, which tells the &lt;a href=&#34;http://www.w3.org/2005/08/online_xslt/&#34;&gt;W3C online version of Saxon&lt;/a&gt; to process my last weblog entry with a stylesheet that Fabien Gandon of &lt;a href=&#34;http://www-sop.inria.fr/&#34;&gt;INRIA&lt;/a&gt; put on their server that converts the RDFa to RDF/XML, your browser will only display the PCDATA in the RDF/XML, which won&amp;rsquo;t look like much. Do a View Source on it to see how the INRIA stylesheet pulled the RDFa out of that weblog entry and converted it to RDF/XML. Better yet, follow &lt;a href=&#34;http://www.w3.org/RDF/Validator/ARPServlet?URI=http%3A%2F%2Fwww.w3.org%2F2005%2F08%2Fonline_xslt%2Fxslt%3Fxslfile%3Dhttp%253A%252F%252Fwww-sop.inria.fr%252Facacia%252Fsoft%252FRDFa2RDFXML.xsl%26xmlfile%3Dhttp%253A%252F%252Fwww.snee.com%252Fbobdc.blog%252F2007%252F01%252Fgreat_survey_of_rdfweb_develop.html%26content-type%3D%26submit%3Dtransform&amp;amp;PARSE=Parse+URI%3A+&amp;amp;TRIPLES_AND_GRAPH=PRINT_TRIPLES&amp;amp;FORMAT=PNG_EMBED&#34;&gt;this link&lt;/a&gt; to see the extracted RDF/XML document validated and converted into more readable triples. (Scroll to the right of that output to see the predicate and object of each triple.) Both literals and URIs appear as objects in these triples, which is nice.&lt;/p&gt;
&lt;p&gt;The first five triples for each document are based on &lt;code&gt;meta&lt;/code&gt; tags already inserted by Movable Type. The remaining triples are there because of the following tags that I added to my Movable Type &amp;ldquo;Individual Entry Archive&amp;rdquo; template just before the &lt;code&gt;script&lt;/code&gt; element at the end of the &lt;code&gt;head&lt;/code&gt; element:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  &amp;lt;meta about=&amp;quot;&amp;lt;$MTEntryPermalink$&amp;gt;&amp;quot;&amp;gt;
    &amp;lt;link rel=&amp;quot;trackback:ping&amp;quot; href=&amp;quot;http://madskills.com/public/xml/rss/module/trackback/&amp;quot;/&amp;gt;
    &amp;lt;link rel=&amp;quot;dc:identifier&amp;quot; href=&amp;quot;&amp;lt;$MTEntryPermalink$&amp;gt;&amp;quot;/&amp;gt;
    &amp;lt;meta property=&amp;quot;dc:creator&amp;quot; content=&amp;quot;Bob DuCharme&amp;quot;/&amp;gt;
    &amp;lt;meta property=&amp;quot;dc:title&amp;quot; content=&amp;quot;&amp;lt;$MTEntryTitle encode_html=&amp;quot;1&amp;quot;$&amp;gt;&amp;quot;/&amp;gt;
    &amp;lt;meta property=&amp;quot;dc:date&amp;quot; content=&amp;quot;&amp;lt;$MTEntryDate format=&amp;quot;%Y-%m-%dT%H:%M:%S&amp;quot;&amp;gt;&amp;quot;/&amp;gt;
    &amp;lt;meta property=&amp;quot;dc:description&amp;quot; content=&amp;quot;&amp;lt;$MTEntryExcerpt encode_html=&amp;quot;1&amp;quot;$&amp;gt;&amp;quot;/&amp;gt;
    &amp;lt;link rel=&amp;quot;dc:subject&amp;quot; href=&amp;quot;http://www.snee.com/ns/blogcat/&amp;lt;$MTCategoryLabel$&amp;gt;&amp;quot;/&amp;gt;
&amp;lt;/meta&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(I have a version with fewer tags and hard-coded values on the weblog&amp;rsquo;s &lt;a href=&#34;http://www.snee.com/bobdc.blog/&#34;&gt;main index page&lt;/a&gt;, so if you&amp;rsquo;re reading this entry from there, a View Source won&amp;rsquo;t show as much metadata as the permalink pages have.) In addition to the new tags above, the &lt;code&gt;html&lt;/code&gt; start-tag in the Movable Type template needs declarations for any referenced namespaces—in this case, dc and trackback.&lt;/p&gt;
&lt;p&gt;To make it all nice and well-formed, I also replaced the template&amp;rsquo;s &lt;code&gt;&amp;amp;laquo;&lt;/code&gt; and &lt;code&gt;&amp;amp;raquo;&lt;/code&gt; entity references that put the « and » characters near the top with the numeric character references &amp;amp;#171; and &amp;amp;#187;. I tried commenting out the DOCTYPE declaration in the Movable Type template, because these new &lt;code&gt;meta&lt;/code&gt; elements make the document invalid XHTML 1.0 and I wouldn&amp;rsquo;t be parsing it against a DTD anyway, but some odd interactions with the CSS made certain parts of the page disappear, so I left the DOCTYPE declaration alone. &lt;a href=&#34;http://home.ccil.org/~cowan/XML/tagsoup/&#34;&gt;TagSoup&lt;/a&gt; had no problem with the extra RDFa metadata, and didn&amp;rsquo;t include the DOCTYPE declaration in its output, so you might want to use that if you&amp;rsquo;re processing HTML files that contain RDFa metadata. (If you&amp;rsquo;re processing HTML from the wild, you&amp;rsquo;ll want to use TagSoup anyway.)&lt;/p&gt;
&lt;p&gt;Why is it worth all this trouble? Why is RDFa cool? Because document metadata that looks like the following is easy to read, easy to write, and easy to convert to RDF/XML:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;meta about= &amp;quot;http://www.snee.com/bobdc.blog/2007/01/great_survey_of_rdfweb_develop.html&amp;quot;&amp;gt;
  &amp;lt;link rel=&amp;quot;trackback:ping&amp;quot; href=&amp;quot;http://madskills.com/public/xml/rss/module/trackback/&amp;quot;/&amp;gt;
  &amp;lt;link rel=&amp;quot;dc:identifier&amp;quot; href=&amp;quot;http://www.snee.com/bobdc.blog/2007/01/great_survey_of_rdfweb_develop.html&amp;quot;/&amp;gt;
  &amp;lt;meta property=&amp;quot;dc:creator&amp;quot; content=&amp;quot;Bob DuCharme&amp;quot;/&amp;gt;
  &amp;lt;meta property=&amp;quot;dc:title&amp;quot; content=&amp;quot;Great survey of RDF/web development tools&amp;quot;/&amp;gt;
  &amp;lt;meta property=&amp;quot;dc:date&amp;quot; content=&amp;quot;2007-01-17T08:33:38&amp;quot;/&amp;gt;
  &amp;lt;meta property=&amp;quot;dc:description&amp;quot; content=&amp;quot;For both reading and writing RDF....&amp;quot;/&amp;gt;
  &amp;lt;link rel=&amp;quot;dc:subject&amp;quot; href=&amp;quot;http://www.snee.com/ns/blogcat/RDF/OWL&amp;quot;/&amp;gt;
&amp;lt;/meta&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;While everyone, including me, loves to beat RDF/XML, it&amp;rsquo;s not quite a dead horse—as an exchange format for moving metadata between applications, it&amp;rsquo;s just fine, and because a stylesheet such as the INRIA one can convert RDFa to RDF/XML, you can easily use RDFa metadata in a wide variety of applications.&lt;/p&gt;
&lt;p&gt;A new data format isn&amp;rsquo;t useful until there&amp;rsquo;s enough data in that format to drive some applications. I think that RDFa will be very useful, and making this modification to a Movable Type template automates the generation of useful RDFa metadata. Because Movable Type regenerated all of the weblog&amp;rsquo;s permalink entries after I changed the template, I now have lots of RDFa to play with, and I&amp;rsquo;ll have more each time I write a new weblog entry. And thanks to Fabien Gandon, I didn&amp;rsquo;t have to do any coding to make it happen!&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://scottysengineeringlog.net&#34; title=&#34;http://scottysengineeringlog.net&#34;&gt;Scott Hudson&lt;/a&gt; on &lt;a href=&#34;#comment-653&#34;&gt;January 25, 2007 3:58 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Very interesting! Is this compatible with the Dublin Core recommended encoding guidelines (&lt;a href=&#34;http://www.dublincore.org/documents/dcq-html/&#34;&gt;http://www.dublincore.org/documents/dcq-html/&lt;/a&gt;)?&lt;/p&gt;
&lt;p&gt;For example, on my blog today, you can view source and see:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;link href=&amp;quot;http://purl.org/dc/elements/1.1/&amp;quot; rel=&amp;quot;schema.DC&amp;quot; /&amp;gt;
&amp;lt;link href=&amp;quot;http://purl.org/dc/terms/&amp;quot; rel=&amp;quot;schema.DCTERMS&amp;quot; /&amp;gt;
&amp;lt;meta name=&amp;quot;DC.language&amp;quot; content=&amp;quot;en-US&amp;quot; /&amp;gt;
&amp;lt;meta name=&amp;quot;DC.type&amp;quot; content=&amp;quot;blog&amp;quot; /&amp;gt;
&amp;lt;meta name=&amp;quot;DC.publisher&amp;quot; content=&amp;quot;Scott C. Hudson&amp;quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;etc.&lt;/p&gt;
&lt;p&gt;With your approach, would I add a separate section, or add to my existing entry:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;meta name=&amp;quot;DC.language&amp;quot; property=&amp;quot;dc:language&amp;quot; content=&amp;quot;en-US&amp;quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-654&#34;&gt;January 25, 2007 4:52 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Scott,&lt;/p&gt;
&lt;p&gt;I hadn&amp;rsquo;t thought about that. It looks like the link elements are similar, having href and rel attributes, but the meta elements aren&amp;rsquo;t, with their name attributes. And, the approach for qualifying the names is obviously different&amp;ndash;RDFa uses a namespace prefix and a colon, which many people don&amp;rsquo;t like in an attribute value, but I&amp;rsquo;ve worked with enough XSLT to be used to it. And, I know of software that can understand those qualifiers, which doesn&amp;rsquo;t apply to the Dublin Core approach.&lt;/p&gt;
&lt;p&gt;Why would you put name qualifiers (DC and DCTERMS) after the period in the @rel values? It looks like &lt;a href=&#34;http://www.dublincore.org/documents/dcq-html/&#34;&gt;http://www.dublincore.org/documents/dcq-html/&lt;/a&gt; has them before.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://scottysengineeringlog.net&#34; title=&#34;http://scottysengineeringlog.net&#34;&gt;Scott Hudson&lt;/a&gt; on &lt;a href=&#34;#comment-655&#34;&gt;January 25, 2007 5:20 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Check out section 2.7 on that link. That&amp;rsquo;s where I got the example from. One other component I didn&amp;rsquo;t add to my posted example is that the head has a profile attribute:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;head profile=&amp;quot;http://dublincore.org/documents/dcq-html/&amp;quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So for completeness sake, should I add the property attribute to each of my meta elements?&lt;/p&gt;
&lt;p&gt;Does your RDFa extraction handle this type of meta info, or will I have to have a meta about wrapper?&lt;/p&gt;
&lt;p&gt;&amp;ndash;Scott&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://bnode.org/&#34; title=&#34;http://bnode.org/&#34;&gt;Benjamin Nowack&lt;/a&gt; on &lt;a href=&#34;#comment-656&#34;&gt;January 26, 2007 7:13 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Scott,&lt;/p&gt;
&lt;p&gt;if you&amp;rsquo;re looking for something that&amp;rsquo;s more in line with the DC encoding guidelines, you might want to check out eRDF (just google for &amp;ldquo;embedded RDF&amp;rdquo;), which also doesn&amp;rsquo;t invalidate your HTML or XHTML 1.0.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/rdf">RDF</category>
      
    </item>
    
    <item>
      <title>Great survey of RDF/web development tools</title>
      <link>https://www.bobdc.com/blog/great-survey-of-rdfweb-develop/</link>
      <pubDate>Wed, 17 Jan 2007 08:33:38 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/great-survey-of-rdfweb-develop/</guid>
      
      
      <description><div>For both reading and writing RDF.</div><div>&lt;p&gt;Lee Feigenbaum&amp;rsquo;s recent posting &lt;a href=&#34;http://www.thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_survey.html&#34;&gt;Using RDF on the Web: A Survey&lt;/a&gt; is worth reading for anyone considering any kind of RDF development work, web-based or otherwise. At first, I thought that he was limiting himself by requiring that applications and tools be capable of both reading and writing RDF data, but after reading his list, I&amp;rsquo;m glad he did. I also found the wide choice of JSON-related systems to be interesting—they could lead the way to something that definitively answers the question &amp;ldquo;_________ is to RDF/XML as XML is to SGML.&amp;rdquo; I like n3, but at this point more RDF/XML alternatives built into working applications will give us all a broader perspective on what works best.&lt;/p&gt;
&lt;p&gt;I know I&amp;rsquo;ll be re-reading Lee&amp;rsquo;s post each time I&amp;rsquo;m considering the development of an RDF-based application. I hope he moves it to a non-weblog page where he keeps it updated over time, because this kind of information evolves quickly.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Making up URIs</title>
      <link>https://www.bobdc.com/blog/making-up-uris/</link>
      <pubDate>Fri, 12 Jan 2007 08:40:18 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/making-up-uris/</guid>
      
      
      <description><div>Or not.</div><div>&lt;p&gt;I love this &lt;a href=&#34;http://www.mindswap.org/blog/2006/12/14/tales-from-the-dark-side-continued/&#34;&gt;recent quote&lt;/a&gt; from Jim Hendler:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If you and I decide that we will use the term &amp;ldquo;&lt;a href=&#34;http://www.cs.rpi.edu/&#34;&gt;http://www.cs.rpi.edu/&lt;/a&gt;~hendler/elephant&amp;rdquo; to designate some particular entity, then it really doesn&amp;rsquo;t matter what the other blind men think it is, they won&amp;rsquo;t be confused when they use the natural language term &amp;ldquo;elephant&amp;rdquo; which is not even close, lexigraphically, to the longer term you and I are using. And if they choose to use their own URI, &amp;ldquo;&lt;a href=&#34;http://www.other.blind.guys.org/elephant&#34;&gt;http://www.other.blind.guys.org/elephant&lt;/a&gt;&amp;rdquo;, it won&amp;rsquo;t get confused with ours.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you wrote a schema in which you defined a namespace and documented specific terms to mean specific things within that namespace, I can use those URIs to have the same meanings in my own application—but it&amp;rsquo;s not always that simple.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s say that there&amp;rsquo;s a metadata standard called xyz, and they&amp;rsquo;ve declared a schema somewhere. My document at &lt;a href=&#34;http://www.snee.com/docs/mydoc1.xml&#34;&gt;http://www.snee.com/docs/mydoc1.xml&lt;/a&gt;, which uses this standard, begins like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  &amp;lt;document xmlns:xyz=&amp;quot;http://www.xyz.org/schemas/docmetadata/&amp;quot;&amp;gt;
    &amp;lt;xyz:header&amp;gt;
      &amp;lt;xyz:foo bar=&amp;quot;56H&amp;quot;&amp;gt;northwest&amp;lt;/xyz:foo&amp;gt;
    &amp;lt;/xyz:header&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I feel confident that the following RDF triple makes sense to say that this document has a foo value of &amp;ldquo;northwest&amp;rdquo;, with no ambiguity about whose definition of &amp;ldquo;foo&amp;rdquo; I&amp;rsquo;m using:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  &amp;lt;http://www.snee.com/docs/mydoc1.xml&amp;gt;
  &amp;lt;http://www.xyz.org/schemas/docmetadata/foo&amp;gt;
  &amp;quot;northwest&amp;quot;.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I&amp;rsquo;m confident, that is, unless xyz:foo appears in another context in the same document (for example, as a child of another element besides xyz:header) with a different value. In a somewhat related issue, bar is an attribute above, and while the xyz.org people defined it as having a specific meaning in their schema, in my document above that conforms to their schema it doesn&amp;rsquo;t belong to any namespace, so it doesn&amp;rsquo;t feel right to create an RDF predicate for the bar value that begins with &lt;a href=&#34;http://www.xyz.org/schemas/docmetadata/&#34;&gt;http://www.xyz.org/schemas/docmetadata/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Is the best practice for defining URIs for such information to just make up my own URI for its predicate around a domain name that I have control over, and then use OWL to define an equivalence if the xyz.org people (or anyone else) define their own URI and triples for bar and I want to aggregate their triples with mine?&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m guessing that this is the case based on the output of the MIT Simile &lt;a href=&#34;http://simile.mit.edu/RDFizers/&#34;&gt;RDFizer&lt;/a&gt; project&amp;rsquo;s RDF version of a sample &lt;a href=&#34;http://www.loc.gov/standards/mods/&#34;&gt;MODS&lt;/a&gt; document. (Must&amp;hellip; fight&amp;hellip; temptation to link to something Quadrophenia-related&amp;hellip;) The MODS URI &lt;a href=&#34;http://www.loc.gov/mods/v3&#34;&gt;http://www.loc.gov/mods/v3&lt;/a&gt; doesn&amp;rsquo;t appear anywhere in the RDFizer representation of the MODS data, and most properties in the RDFizer version are in the namespace &lt;a href=&#34;http://simile.mit.edu/2006/01/ontologies/mods3&#34;&gt;http://simile.mit.edu/2006/01/ontologies/mods3&lt;/a&gt;#. So, just as the RDFizer folk built something around the &lt;a href=&#34;http://simile.mit.edu&#34;&gt;http://simile.mit.edu&lt;/a&gt; namespace that they had control over, I should probably do the same with my own domain name for information from the xyz.org schema instead of trying to make up my own conventions for representing attributes and different contexts for the same element from the &lt;a href=&#34;http://www.xyz.org/schemas/docmetadata/&#34;&gt;http://www.xyz.org/schemas/docmetadata/&lt;/a&gt; URI. For example, &lt;a href=&#34;http://www.xyz.org/schemas/docmetadata/foo/bar&#34;&gt;http://www.xyz.org/schemas/docmetadata/foo/bar&lt;/a&gt; makes sense to me as a way to represent the bar attribute, but xyz.org is not my domain name to build naming conventions around, so I&amp;rsquo;d be better off representing this as &lt;a href=&#34;http://www.snee.com/ns/xyz/foo/bar&#34;&gt;http://www.snee.com/ns/xyz/foo/bar&lt;/a&gt;. Right?&lt;/p&gt;
&lt;p&gt;(I made up xyz.org as an example domain name as I wrote this, and just now looked at &lt;a href=&#34;http://www.xyz.org/&#34;&gt;the actual website&lt;/a&gt;—they&amp;rsquo;re concerned with bigger issues than namespace URIs, such as the Knights Templar and the Secrets of the Bible.)&lt;/p&gt;
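The OWL-equivalence idea above can be sketched in a few lines. This is a toy illustration with hand-rolled tuples rather than a real RDF library or OWL reasoner, and it reuses the made-up URIs from this post; the xyz.org URI for bar is the hypothetical one discussed above.

```python
# Toy sketch (not a real OWL reasoner): if my predicate and xyz.org's
# are declared owl:equivalentProperty, restate statements across both
# URIs so that a query against either one finds the same data.
OWL_EQ = "http://www.w3.org/2002/07/owl#equivalentProperty"

triples = {
    # my triple for the bar attribute, under a domain I control
    ("http://www.snee.com/docs/mydoc1.xml",
     "http://www.snee.com/ns/xyz/foo/bar", "56H"),
    # hypothetical mapping, if xyz.org later coins its own URI for bar
    ("http://www.snee.com/ns/xyz/foo/bar", OWL_EQ,
     "http://www.xyz.org/schemas/docmetadata/foo/bar"),
}

def expand(triples):
    """Restate each triple with every predicate declared equivalent to its own."""
    equiv = {}
    for s, p, o in triples:
        if p == OWL_EQ:
            equiv.setdefault(s, set()).add(o)
            equiv.setdefault(o, set()).add(s)
    out = set(triples)
    for s, p, o in triples:
        for q in equiv.get(p, ()):
            out.add((s, q, o))
    return out

expanded = expand(triples)
# A query using xyz.org's predicate now finds my statement too:
print(("http://www.snee.com/docs/mydoc1.xml",
       "http://www.xyz.org/schemas/docmetadata/foo/bar", "56H") in expanded)
# prints True
```

An OWL reasoner such as Pellet does this kind of propagation (and much more) for you; the point here is only that the equivalence declaration is data, so the aggregation needs no custom code per vocabulary.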
&lt;h2 id=&#34;7-comments&#34;&gt;7 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-642&#34;&gt;January 12, 2007 9:51 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s why you should use example.org, example.com, example.net, or just example: all are reserved domain names.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://kontrawize.blogs.com/kontrawize/&#34; title=&#34;http://kontrawize.blogs.com/kontrawize/&#34;&gt;Anthony B. Coates&lt;/a&gt; on &lt;a href=&#34;#comment-643&#34;&gt;January 12, 2007 10:31 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This immediately reminds me of the &amp;ldquo;topic merging&amp;rdquo; in topic maps, where once two topics in a topic map were identified as referring to the same thing, they were merged into a single topic. For this discussion, a topic map topic is sufficiently similar to an RDF URI.&lt;/p&gt;
&lt;p&gt;The interesting thing is that some topic map engines implemented topic merging by doing a real merge within their internal data model, and others did it just by keeping track of the equivalence, but maintaining the original separate topics internally. So I guess both can work.&lt;/p&gt;
&lt;p&gt;Keeping track of multiple topics or URIs for the same thing must introduce some processing overhead, but it does have the advantage that you can &amp;ldquo;unmerge&amp;rdquo; later if you find that two topics were merged but should not have been (either they weren&amp;rsquo;t really the same, or the user had fat thumbs).&lt;/p&gt;
&lt;p&gt;Cheers, Tony.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-644&#34;&gt;January 12, 2007 11:51 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Right, that&amp;rsquo;s why I mentioned the possibility of using OWL to define the equivalence. That&amp;rsquo;s the kind of thing that makes OWL reasoners such as Pellet so much fun, in a semweb-geek kind of way.&lt;/p&gt;
&lt;p&gt;By Ed Davies on &lt;a href=&#34;#comment-646&#34;&gt;January 13, 2007 1:54 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sorry, I&amp;rsquo;m confused; is the first XML fragment supposed to be RDF/XML or just some arbitrary XML?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-647&#34;&gt;January 13, 2007 2:29 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The idea is that it&amp;rsquo;s arbitrary XML, and I want to create separate RDF representations of information in that XML using the namespace defined by xyz.org&amp;rsquo;s metadata schema.&lt;/p&gt;
&lt;p&gt;By Ed Davies on &lt;a href=&#34;#comment-648&#34;&gt;January 13, 2007 4:52 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The idea is that it&amp;rsquo;s arbitrary XML&lt;/em&gt;&amp;hellip; OK, thanks.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;I&amp;rsquo;m confident, that is, unless xyz:foo appears in another context in the same document (for example, as a child of another element besides xyz:header) with a different value.&lt;/em&gt; Finding another xyz:foo element implying the same subject but with a different object need not knock your confidence in this triple&amp;ndash;it&amp;rsquo;s only in contradiction with the idea that xyz:foo is a functional property.&lt;/p&gt;
&lt;p&gt;&amp;hellip;&lt;em&gt;bar is an attribute above&lt;/em&gt;&amp;hellip;&lt;em&gt;it doesn&amp;rsquo;t belong to any namespace, so it doesn&amp;rsquo;t feel right to create an RDF predicate for the bar value that begins with &lt;a href=&#34;http://www.xyz.org/schemas/docmetadata/&#34;&gt;http://www.xyz.org/schemas/docmetadata/&lt;/a&gt;.&lt;/em&gt; I&amp;rsquo;d definitely agree with this; it would be very presumptuous of you and not at all in the spirit of the game to define a new URI using their domain.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Selling content on the Internet, part 1</title>
      <link>https://www.bobdc.com/blog/bitpasspt1/</link>
      <pubDate>Mon, 08 Jan 2007 20:02:38 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/bitpasspt1/</guid>
      
      
      <description><div>Cheap!</div><div>&lt;p&gt;&lt;em&gt;2007-01-19 update:&lt;/em&gt; It looks like Bitpass is going under. While there&amp;rsquo;s no mention of it on their website, I just got email from them saying that &amp;ldquo;due to circumstances beyond our control, we are discontinuing our operations.&amp;rdquo; If anyone knows of a comparable service or a service with different ideas about enabling small vendors to sell content on the internet, please let me know.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been researching ways to buy and sell content over the Internet, especially when you want one fee of a dollar or less to buy access to multiple files. PayPal&amp;rsquo;s fees make such a low price impractical, but I&amp;rsquo;ve found an interesting alternative called &lt;a href=&#34;http://www.bitpass.com&#34;&gt;BitPass&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.bitpass.com/&#34;&gt;&lt;img src=&#34;http://www.bitpass.com/corp/images/bit_top_logo.gif&#34; alt=&#34;[BitPass logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;According to some BitPass &lt;a href=&#34;http://www.bitpass.com/corp/products/self-service-services.php&#34;&gt;marketing literature&lt;/a&gt;, &amp;ldquo;Pricing for your content can be as low as just one cent and you can sell via email, instant messaging or a web site&amp;rdquo;. To buy something, you must first &lt;a href=&#34;https://www.bitpass.com/spend/newaccnt/&#34;&gt;register as a buyer&lt;/a&gt;, which includes the selection of a payment option. For most payment options, you need to have a minimum balance of something like $3.00 deposited, but when you select the BitPass PayThru? option, each charge is taken directly from the specified source, such as PayPal. For example, if you spend ten cents on something, you&amp;rsquo;ll see ten cents deducted from your PayPal account, and if you never make another BitPass purchase, you will never have spent more than those ten cents. (For some odd reason, the question mark is part of the name, which leads to odd-looking questions on the generally useful &lt;a href=&#34;http://www.bitpass.com/corp/imedia/buyers-faq.php&#34;&gt;FAQ&lt;/a&gt; like &amp;ldquo;What is BitPass PayThru??&amp;rdquo; and &amp;ldquo;How does PayThru? help me?&amp;rdquo; Having seen how the UK tech news site The Register often headlines Yahoo! stories (&lt;a href=&#34;http://www.theregister.co.uk/2006/12/18/yahoo_messenger_security_flap/&#34;&gt;[1]&lt;/a&gt;, &lt;a href=&#34;http://www.theregister.co.uk/2006/10/26/yahoo_semel_job/&#34;&gt;[2]&lt;/a&gt;), I look forward to the day when they build a headline around this payment option.)&lt;/p&gt;
&lt;p&gt;You can learn more about the experience of selling content on BitPass if you&amp;rsquo;re willing to get a hands-on appreciation of the BitPass buyer experience. As an experiment, I&amp;rsquo;ve put &lt;a href=&#34;https://www.bitpass.com/gateway/0000115B/bitpasspt2/&#34;&gt;part 2&lt;/a&gt; of this post on BitPass, where it will cost you ten cents to read it. Or &lt;a href=&#34;https://www.bitpass.com/gateway/0000115B/bitpasspt2/bitpasspt2.mp3&#34;&gt;hear&lt;/a&gt; it—part of my experiment is to sell an XHTML file and an accompanying MP3 file as a pair, so I just read the entry as a podcast and added a little music to the beginning and end. Whether you follow the link to the XHTML file or the MP3 first, BitPass will ask you to become a customer, and then you&amp;rsquo;ll have access to both for 28 days. There&amp;rsquo;s no DRM on the MP3 file, so you can do whatever you like with it. If you already have at least ten cents in a PayPal account, the whole process is pretty quick and simple.&lt;/p&gt;
&lt;p&gt;Wherever this experiment leads, it won&amp;rsquo;t lead to my charging for entries on this weblog. I started bobdc.blog on my own domain name to give me a platform to play with the related technology, and this is just this week&amp;rsquo;s experiment. I have unrelated projects that may benefit from using BitPass, so I wanted to try it out and get your opinions on it and on its potential competitors. Any comments?&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-639&#34;&gt;January 8, 2007 10:19 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, I&amp;rsquo;m just not gonna see the second half of your post, not because I grudge the ten cents, but because the barriers to entry are too high. It wants me to create an account, or remember my Microsoft Passport one, which I think I do but I&amp;rsquo;m not sure; and then it would be either use Paypal, which I refuse to do because of their bad reputation in &lt;a href=&#34;http://www.paypalsucks.com&#34;&gt;certain circles&lt;/a&gt;, or deposit $3.00 with BitPass, which I don&amp;rsquo;t much want to bother with either.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a quote from my rotating .sig file:&lt;/p&gt;
&lt;p&gt;Micropayment advocates mistakenly believe that efficient allocation of resources is the purpose of markets. Efficiency is a byproduct of market systems, not their goal. The reasons markets work are not because users have embraced efficiency but because markets are the best place to allow users to maximize their preferences, and very often their preferences are not for conservation of cheap resources. &amp;ndash;Clay Shirky&lt;/p&gt;
&lt;p&gt;And here&amp;rsquo;s the &lt;a href=&#34;http://www.openp2p.com/pub/a/p2p/2000/12/19/micropayments.html?page=2&#34;&gt;whole article&lt;/a&gt;; it&amp;rsquo;s worth reading.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-640&#34;&gt;January 8, 2007 11:06 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;John,&lt;/p&gt;
&lt;p&gt;I completely understand. I almost added something to the entry about another issue with micropayments: if it&amp;rsquo;s too easy to pay, it&amp;rsquo;s too easy to game the system and steal, but if it&amp;rsquo;s too difficult&amp;hellip; then it&amp;rsquo;s too difficult.&lt;/p&gt;
&lt;p&gt;Jumping through a hoop and a half doesn&amp;rsquo;t seem like much trouble when we&amp;rsquo;re spending $34 on Amazon and we don&amp;rsquo;t want the wrong people charging to our accounts, but jumping through any hoops at all for something worth ten cents is rarely worth the trouble.&lt;/p&gt;
&lt;p&gt;I, and I assume others, just happen to have a few dollars ($14? $23?) sitting in a PayPal account, so it&amp;rsquo;s little trouble to spend 10 cents of it. For people who avoid having a PayPal account&amp;ndash;and I know there are good reasons to do so&amp;ndash;there&amp;rsquo;s no argument that it&amp;rsquo;s too much trouble.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2007">2007</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Generating a single, globally unique ID</title>
      <link>https://www.bobdc.com/blog/generating-a-single-globally-u/</link>
      <pubDate>Fri, 29 Dec 2006 18:33:59 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/generating-a-single-globally-u/</guid>
      
      
      <description><div>One that&#39;s XML-compliant and not too long.</div><div>&lt;p&gt;When something has a unique ID, it has identity, and you can do more with it. For example, you can link to it, and you can add metadata to it from &lt;a href=&#34;http://www.snee.com/xml/rdf-drdobbs.html&#34;&gt;anywhere&lt;/a&gt;. I wanted to be able to assign a unique ID to something with one or two keystrokes in Emacs. I came up with something that works, although I&amp;rsquo;m sure there are ways to make it work better.&lt;/p&gt;
&lt;p&gt;By &amp;ldquo;unique&amp;rdquo;, I mean reasonably unique in the whole universe. &amp;ldquo;Reasonably unique&amp;rdquo; is one of those phrases that English teachers hate, because by definition a unique string can&amp;rsquo;t come up a second time. The Wikipedia entries for &lt;a href=&#34;http://en.wikipedia.org/wiki/GUID&#34;&gt;GUID&lt;/a&gt; and &lt;a href=&#34;http://en.wikipedia.org/wiki/UUID&#34;&gt;UUID&lt;/a&gt; give a good overview of the world of globally unique ID generation, and the related entry for &lt;a href=&#34;http://en.wikipedia.org/wiki/Universally_Unique_Identifier&#34;&gt;Universally Unique Identifier&lt;/a&gt; gets into some probability figures for generating the same ID twice. Those figures, which compare favorably with the chance of being hit by a meteorite, make the IDs unique enough for me.&lt;/p&gt;
&lt;p&gt;Routines for creating these IDs typically generate a 16-byte number and represent it as a hexadecimal number of 32 digits. I wanted something that would definitely begin with a letter, so that it would work as an XML ID value, and I also wanted something shorter than 32 characters if possible. I came up with something 22 characters long that I can insert into a document with two keystrokes in Emacs. Some samples: W4QZtu5rTsWnfyiixxIuFQ, Y_JxAODuQ5uZiiBljYoj0w, DJ6AQkYgTnGal8aR9CM_Ig.&lt;/p&gt;
&lt;p&gt;The key to getting it shorter was to use a base64 representation of the 16-byte value instead of a hexadecimal one. Most programming languages offer a simple function to do this; for the 64 different digits used to represent possible values, they use the upper-case and lower-case Latin alphabet, the ten numerical digits, and two more characters. The default two extras are usually the plus sign and the slash, but these functions typically let you override that, often citing the needs of us XML folk as a reason for doing so. I used the hyphen and underscore.&lt;/p&gt;
&lt;p&gt;To make sure that the value begins with a letter, I first prepended an &amp;ldquo;i&amp;rdquo; onto it, but now I just repeat the value generation until I get one that begins with a letter. Here&amp;rsquo;s the whole thing in Python:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Generate a 22-character UUID beginning with a letter.
# (base64 encode the raw bytes from a 16 octet UUID.)

import uuid # from http://zesty.ca/python/uuid.html
import sys
import base64

b64uid = &#39;00000000&#39;

# Keep generating until we have one that starts with a letter.
while (b64uid[0:1] &amp;lt; &#39;A&#39;) or \
        (b64uid[0:1] &amp;gt; &#39;z&#39;) or \
        ((b64uid[0:1] &amp;gt; &#39;Z&#39;) and (b64uid[0:1] &amp;lt; &#39;a&#39;)):
    uid = uuid.uuid4()
    b64uid = base64.b64encode(uid.bytes, &#39;-_&#39;)

b64uid = b64uid[0:22] # lose the &amp;quot;==&amp;quot; that finishes a base64 value

sys.stdout.write(b64uid)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I used &lt;code&gt;sys.stdout.write&lt;/code&gt; instead of &lt;code&gt;print&lt;/code&gt; because I don&amp;rsquo;t want that extra carriage return, which seems to show up even when I add a comma after the string to print.&lt;/p&gt;
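For what it&#39;s worth, on a current Python 3 interpreter (an assumption on my part; the code above targets Python 2 with the then-external uuid module), the standard library alone can do the same job:

```python
import base64
import uuid

def short_uuid():
    """Return a 22-character, XML-ID-safe base64 form of a random UUID,
    regenerating until the first character is a letter."""
    while True:
        # 16 random bytes encode to 24 base64 characters ending in "==";
        # "-" and "_" stand in for "+" and "/" to keep the value name-safe
        b64 = base64.b64encode(uuid.uuid4().bytes, altchars=b"-_")
        candidate = b64[:22].decode("ascii")  # drop the "==" padding
        if candidate[0].isalpha():
            return candidate

print(short_uuid())
```

The name short_uuid is my own; the loop-until-letter approach, the altchars pair, and the padding trim are exactly the ones described above.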
&lt;p&gt;In Emacs, I wanted &lt;code&gt;Ctrl+C i&lt;/code&gt; to insert an ID value at the cursor location, and when editing an XML file, I wanted the same keystroke to insert &amp;quot; id=&#39;{id-value-here}&#39;&amp;quot;. The following in my .emacs file did it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(defun b64uuid () ; Insert base64 UUID
  (interactive)
  (shell-command &amp;quot;python c:/util/b64uuid.py&amp;quot; t)
)
(define-key global-map &amp;quot;\C-ci&amp;quot; &#39;b64uuid)


(defun xml-id ()
  (interactive)
  (insert &amp;quot; id=&#39;&#39;&amp;quot;)
  (backward-char 1)
  (shell-command &amp;quot;python c:/util/b64uuid.py&amp;quot; t)
)


(defun nxml-mode-additional-keys ()
  &amp;quot;Key bindings to add to `nxml-mode&#39;.&amp;quot;
  (define-key nxml-mode-map &amp;quot;\C-ci&amp;quot; &#39;xml-id)
  (define-key nxml-mode-map &amp;quot;\C-co&amp;quot; &#39;sgml-comment)
  ; more xml-specific keybindings...
)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The elisp code, like the python code, could probably be more efficient, but it works. Ideally, the whole thing would be done in elisp, but writing all that would have taken me a lot more time than it took to patch together the existing python code. I&amp;rsquo;d love to hear any suggestions about improving this.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>XML 2006 paper done and available</title>
      <link>https://www.bobdc.com/blog/xml-2006-paper-done-and-availa/</link>
      <pubDate>Sat, 23 Dec 2006 08:58:18 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/xml-2006-paper-done-and-availa/</guid>
      
      
      <description><div>The slides, too.</div><div>&lt;p&gt;Better late than never, I&amp;rsquo;ve finished my &lt;a href=&#34;http://www.snee.com/xml/xml2006/owlrdbms.html&#34;&gt;paper&lt;/a&gt; and PowerPoint &lt;a href=&#34;http://www.snee.com/xml/xml2006/owlrdbms.ppt&#34;&gt;slides&lt;/a&gt; for the &amp;ldquo;Relational database integration with RDF/OWL&amp;rdquo; talk that I gave in Boston. It&amp;rsquo;s a summary of work I described in this weblog as an ongoing project (&lt;a href=&#34;https://www.bobdc.com/blog/rdfowl-for-data-silo-integrati&#34;&gt;[1]&lt;/a&gt;, &lt;a href=&#34;http://www.snee.com/bobdc.blog/2006/10/all_the_personal_data_you_want.html&#34;&gt;[2]&lt;/a&gt;, &lt;a href=&#34;http://www.snee.com/bobdc.blog/2006/10/integrating_relational_databas.html&#34;&gt;[3]&lt;/a&gt;, &lt;a href=&#34;http://www.snee.com/bobdc.blog/2006/11/mapping_relational_data_to_rdf.html&#34;&gt;[4]&lt;/a&gt;), with a little more detail about how I actually did it.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/xml/xml2006/owlrdbms.html&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/howididit.jpg&#34; alt=&#34;[Young Frankenstein book]&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;240px&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If I were writing the piece for a paying publication (I eventually did, for &lt;a href=&#34;https://web.archive.org/web/20120717044029/http://www.devx.com/semantic/Article/38700&#34;&gt;devx&lt;/a&gt;), I would have put it through a few more drafts, but it was already late and I didn&amp;rsquo;t want to put off finishing it until after Christmas. Also, because I was more concerned with demonstrating the ideas presented than explaining them perfectly, the paper may explain some concepts without using completely correct vocabulary, so I&amp;rsquo;m happy to make corrections.&lt;/p&gt;
&lt;p&gt;To quote from the introduction:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In some research on who had actually used RDF/OWL to implement such an integration, I learned of several examples, but none that could be examined closely, so I decided to do one myself. My primary goal in this project was to use RDF/OWL to integrate two relational databases and then perform queries against the aggregate collection to answer realistic questions that could not be answered without the addition of an RDF/OWL ontology. Secondary goals included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Doing it all with free, portable, open-source software&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Making all the relevant files available so that others could reproduce my results (contained in the &lt;a href=&#34;http://www.snee.com/xml/xml2006/owlrdbms.zip&#34;&gt;owlrdbms.zip&lt;/a&gt; file—see its readme.txt file for instructions on using its files to execute the steps described in this paper)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Doing it with address book data, which has relevance to everyone&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Doing it all without any specialized programming (I did write a short XSLT 1.0 stylesheet utility, which is included in the zip file, but managed to avoid any coding that required compiling and deployment)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Now, maybe I&amp;rsquo;ll have time to read &lt;a href=&#34;http://2006.xmlconference.org/programme/&#34;&gt;some of the other papers&lt;/a&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Navigating the library metadata landscape</title>
      <link>https://www.bobdc.com/blog/navigating-the-library-metadat/</link>
      <pubDate>Wed, 20 Dec 2006 09:05:29 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/navigating-the-library-metadat/</guid>
      
      
      <description><div>With a subway map!</div><div>&lt;p&gt;I&amp;rsquo;ve always been a bit confused by the various library-related metadata standards. Recently while researching one of them I found an excellent PowerPoint presentation summarizing most of them by the &lt;a href=&#34;http://www.oclc.org/&#34;&gt;OCLC&lt;/a&gt;&amp;rsquo;s Eric Childress called &lt;a href=&#34;http://www.oclc.org/research/presentations/childress/fedlink_20031118.ppt&#34;&gt;Metadata Standards&lt;/a&gt;. (While I&amp;rsquo;m on the subject of the OCLC, don&amp;rsquo;t miss The Onion&amp;rsquo;s &lt;a href=&#34;https://www.theonion.com/dewey-decimal-system-helpless-to-categorize-new-jim-bel-1819568608&#34;&gt;mention of them&lt;/a&gt; last August.) He has individual slides on MARC 21, MODS, METS, ONIX, EAD, MIX, and more. He gets bonus points for adding descriptive comments to his slides, which too few people do.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://mapageweb.umontreal.ca/turner/meta/english/metamap.html&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/metadatasubway.jpg&#34; alt=&#34;[metamap metadata standards map]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;His third slide&amp;rsquo;s list of five types of metadata (Descriptive, Technical and Structural, Administrative, Rights, and Management) is something that one could quibble with, but he&amp;rsquo;s talking about his domain, which he knows better than I do. Without even getting to grouping of metadata categories, just the definition of metadata can be very subjective—I was once in a meeting where, after one participant described what &amp;ldquo;metadata&amp;rdquo; meant to his group, someone else responded &amp;ldquo;That&amp;rsquo;s not metadata! That&amp;rsquo;s control data!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Childress&amp;rsquo;s sixth slide illustrates the relationship between various standards and organizations as a subway map with color-coded subway lines representing different groupings of standards, organizations, domains, roles, and other concepts. An SVG-based &lt;a href=&#34;http://mapageweb.umontreal.ca/turner/meta/english/metamap.html&#34;&gt;interactive version&lt;/a&gt; of the same map links each acronym (and of course, it&amp;rsquo;s pretty much all acronyms) to descriptive text and relevant links for that acronym.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s a pretty handy reference. I know I&amp;rsquo;ll be coming back to it, if only when I get to my New Years resolution of &amp;ldquo;write GRDDL XSLT for a few standard metadata formats.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://w3future.com/weblog/&#34; title=&#34;http://w3future.com/weblog/&#34;&gt;Sjoerd Visscher&lt;/a&gt; on &lt;a href=&#34;#comment-628&#34;&gt;December 20, 2006 10:06 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Too bad the SVG is invalid XML. Firefox 2.0.0.1 refuses to show it.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-629&#34;&gt;December 20, 2006 11:30 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;With the only valid SVG that I tried displaying with Firefox 2.0, it showed the XML source, not the image. That same example, and the subway map, did display using IE with Adobe&amp;rsquo;s SVG add-in.&lt;/p&gt;
&lt;p&gt;By Pas B on &lt;a href=&#34;#comment-630&#34;&gt;December 21, 2006 9:46 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Firefox 1.5.0.8 no-likey, either:&lt;/p&gt;
&lt;p&gt;XML Parsing Error: prefix not bound to a namespace&lt;br /&gt;
Location: &lt;a href=&#34;http://mapageweb.umontreal.ca/turner/meta/english/meta_v12.svg&#34;&gt;http://mapageweb.umontreal.ca/turner/meta/english/meta_v12.svg&lt;/a&gt;&lt;br /&gt;
Line Number 309, Column 1:&lt;br /&gt;
[&lt;br /&gt;
^]{}&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>RDF versus XQuery</title>
      <link>https://www.bobdc.com/blog/rdf-versus-xquery/</link>
      <pubDate>Wed, 13 Dec 2006 21:01:32 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdf-versus-xquery/</guid>
      
      
      <description><div>Different tools for different problems.</div><div>&lt;p&gt;Danny Ayers recently emailed me about a &lt;a href=&#34;http://lists.w3.org/Archives/Public/public-sweo-ig/2006Dec/0080.html&#34;&gt;posting by IBM&amp;rsquo;s Lee Feigenbaum&lt;/a&gt; on the W3C&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/2001/sw/sweo/&#34;&gt;Semantic Web Education and Outreach Interest Group&lt;/a&gt; &lt;a href=&#34;http://lists.w3.org/Archives/Public/public-sweo-ig/&#34;&gt;mailing list&lt;/a&gt;. Lee had written about a colleague&amp;rsquo;s concerns about semantic web technologies, and Danny asked for my thoughts on the issue. I e-mailed him a few paragraphs, and since then I thought that I might as well post them here, with a bit of copy-editing and a few extra thoughts.&lt;/p&gt;
&lt;p&gt;Lee&amp;rsquo;s colleague&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;expressed concerns that SW technologies (and RDF / SPARQL in particular) may fall short in one prominent area in which XML / XQuery shines: dealing with content-oriented (often mixed content) documents. He was concerned about this given some of our claims about the value of RDF/SW technologies as a unifying environment for data and metadata.&lt;/p&gt;
&lt;p&gt;He gave various examples ranging from insurance policies to resumes to rental agreements, with the basic idea being that XQuery can easily answer questions that involve searching within a document (or, more-so, searching for text in a particular paragraph of a document, perhaps with emphasis added) which uses XML markup. He wondered aloud and we discussed what the SW approach to this would be, and we agreed that it&amp;rsquo;s lacking right now. He expressed worry that whereas XML can wrap data that might be best expressed as relational or RDF data (and then join that data in XQuery queries with document data), the RDF world may not have as nice a story.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Yes, RDF and related technologies fall short in areas where XML and XQuery shine, but XML and XQuery fall short in areas where RDF shines. (And they both fall short in areas where relational databases shine, and&amp;hellip; etc.) RDF is a data model. Certain problem domains map very well to that data model, especially large collections of assignments of values to objects that don&amp;rsquo;t normalize into relational tables or even a single XML schema well. An add-on like OWL makes it easier to define relationships between seemingly unrelated classes of information, making it easier to use the aggregate sources together.&lt;/p&gt;
&lt;p&gt;RDF can add a lot to a publishing system, but tracking the relationship between in-line elements and their containing block elements (that is, mixed content) is not something it can help with much. For example, it can be used to store metadata about document components and associations as document files move through a workflow. (So can plain XML as retrieved by XQuery, but RDF-based data from documents in different formats can be aggregated and used with less custom coding.)&lt;/p&gt;
&lt;p&gt;For some perspective on what RDF can contribute to an XML-based system, it helps to forget one thing (RDF/XML—everything I describe here would work just fine with other RDF syntaxes) and to remember something else: RDF&amp;rsquo;s ability to store metadata about anything with a URI means that it can be used to track information about any XML element with its own ID. In the case of block elements, this is useful for the publishing industry because if one block of a document stores a recipe, another a book excerpt, and another a picture, there will be separate metadata to store about each. (For this sort of thing, I think that RDFa will help to lure back people who were scared off by RDF/XML.) Even inline elements as independent units to track can have value added by RDF if they have an ID; a linking element may have a link type assigned, the date that the link&amp;rsquo;s validity was last verified, and other metadata. To take advantage of an inline element&amp;rsquo;s relationship to its text node siblings and their containing element, though, you&amp;rsquo;ll need something that can parse and read the combination such as an XSLT processor or, for sufficiently large XML, an XQuery processor.&lt;/p&gt;
&lt;p&gt;Searching within documents is certainly where XQuery shines, but unless you&amp;rsquo;re using an XQuery engine for pure substring search (for example, &amp;ldquo;show me which documents have the string &amp;lsquo;fireplace&amp;rsquo; in them&amp;rdquo;), the insurance policy and rental agreement examples would only work well with XQuery if all of the documents conformed to the same schema. The RDF/OWL strength that makes it popular for semantic web work is its ability to query collections of data in the same domain that aren&amp;rsquo;t necessarily all of identical structure. A collection of insurance policies from different companies will have some fields in common, some different fields, some fields that look different but mean the same thing&amp;hellip; treating them as a consistent collection will take a lot of XQuery custom coding, but with RDF + SPARQL, it will only take the application of an increasingly popular standard way of specifying the semantics of each company&amp;rsquo;s forms (OWL) to treat the collection as a single aggregate to query. If you add a set of insurance forms from another insurance company to the set, you only need to add a little more to your OWL, and you can leave your SPARQL queries alone. Done the XQuery way, accounting for this new data will mean checking all your FLWOR expressions to see whether they need revision.&lt;/p&gt;
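The insurance example above can be sketched in plain Python. This is not a real OWL or SPARQL engine, and every source, field, and record name below is invented; the point is only the shape of the approach: the &quot;ontology&quot; is a declarative mapping from each company&#39;s field names to one shared vocabulary, so adding a new source means extending the mapping, while the query itself never changes.

```python
# Toy illustration of OWL-style property equivalence (all names invented):
# each insurer's records use different field names, a declarative mapping
# rewrites them into one canonical vocabulary, and queries against the
# canonical names stay fixed as new sources are added.

# Canonical property for each source's local field name
# (loosely analogous to owl:equivalentProperty declarations).
ONTOLOGY = {
    "acme":   {"policy_no": "policyNumber", "covered": "insured_party"},
    "globex": {"PolicyID": "policyNumber", "Holder": "insured_party"},
}

def normalize(source, record):
    """Rewrite one record into the canonical vocabulary."""
    mapping = ONTOLOGY[source]
    return {mapping.get(field, field): value for field, value in record.items()}

def query(records, prop, value):
    """The query stays the same no matter how many sources are integrated."""
    return [r for r in records if r.get(prop) == value]

data = [
    normalize("acme",   {"policy_no": "A-1", "covered": "Alice"}),
    normalize("globex", {"PolicyID": "G-9", "Holder": "Alice"}),
]
print(query(data, "insured_party", "Alice"))  # both records match
```

Adding a third insurer only means adding one more entry to `ONTOLOGY`; the XQuery equivalent would mean revisiting every FLWOR expression that touches the data.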
&lt;p&gt;My &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/188.html&#34;&gt;XML 2006 talk&lt;/a&gt; was unfortunately in the same time slot as &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/57.html&#34;&gt;another one&lt;/a&gt; on integration of different data sources using RDF/OWL, and this other one used XQuery as well. I&amp;rsquo;m looking forward to finding out more about what Ken and Ronald did and how they did it; more information is available at a &lt;a href=&#34;http://www.rrecktek.com/xml2006/&#34;&gt;page they did for the project&lt;/a&gt;, although I haven&amp;rsquo;t had a chance to look closely at it yet.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By Conor Ryan on &lt;a href=&#34;#comment-626&#34;&gt;December 18, 2006 10:16 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Link to XML2006 project page for Ken and Ronald, &lt;a href=&#34;http://www.rrecktek.com/xml2006/,&#34;&gt;http://www.rrecktek.com/xml2006/,&lt;/a&gt; is not working.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-627&#34;&gt;December 19, 2006 1:52 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It looks like the whole site was down and is now back up.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Home from XML 2006</title>
      <link>https://www.bobdc.com/blog/home-from-xml-2006/</link>
      <pubDate>Sun, 10 Dec 2006 09:00:36 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/home-from-xml-2006/</guid>
      
      
      <description><div>New things for the future, interesting things from the past.</div><div>&lt;p&gt;Since my last posting, some weblogs have mentioned that I was blogging the XML 2006 conference, so I feel bad that I haven&amp;rsquo;t gotten to my second posting about it until after the end of the conference. Most of my time sitting at a computer in Boston was spent on a project for a client, and there was enough of this that I had to skip several talks that I wanted to see. (For a little multi-tasking, I reviewed some project documents while Jason Hunter discussed &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/50.html&#34;&gt;Web Publishing 2.0&lt;/a&gt;. Jason was more interesting.)&lt;/p&gt;
&lt;p&gt;There were plenty of presentations and conversations that gave me a lot of good ideas. Fabrice Desré&amp;rsquo;s talk &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/9.html&#34;&gt;Building Dynamic Applications With Mozilla, REX and XQuery&lt;/a&gt; has me looking forward to trying to build an application around Mozilla and the other components he described, and the discussion after my presentation gave me ideas about ways to build onto my OWL/RDBMS integration demo. An interesting point came up in the questions after my talk, when Claudia Lucía Jimenez-Guarin (who had spoken on a panel about Agile XML Development that I chaired) asked about trying to integrate data from a different ontology into your data. At first I wasn&amp;rsquo;t sure what to say, and then I remembered that much of the point of my talk was that OWL&amp;rsquo;s ability to describe relationships between data from different sources is a large part of its power, because (with the right query engine or other software that can understand those relationships) it&amp;rsquo;s the key to using data from different sources as an aggregate whole that is greater than the sum of its parts. Once the paper and slides for my talk are available on the web I&amp;rsquo;ll include links here.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m also going to stay in touch with Ken Sall and Ronald Reck to learn more about their &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/57.html&#34;&gt;Applying XQuery and OWL to The World Factbook, Wikipedia and Project Gutenberg&lt;/a&gt; project. Unfortunately, their presentation on integrating different data sources with OWL took place at the same time as my presentation on integrating different data sources with OWL. I&amp;rsquo;ll write more about theirs here once I learn more.&lt;/p&gt;
&lt;p&gt;The annual DocBook dinner organized by Norm Walsh was a lot of fun. When someone there mentioned that he wasn&amp;rsquo;t editing XML in Emacs with &lt;a href=&#34;http://www.thaiopensource.com/nxml-mode/&#34;&gt;nxml&lt;/a&gt; mode because of his fondness for the keystrokes in the psgml mode for editing SGML (I&amp;rsquo;ve written a full book chapter about this, and that chapter is &lt;a href=&#34;http://www.snee.com/bob/sgmlfree/&#34;&gt;available for free&lt;/a&gt;), I said that I had written some Emacs macros to make nxml fill in some of the psgml gaps. After reviewing my .emacs file, I&amp;rsquo;m not sure which macros those are, so I just posted &lt;a href=&#34;http://www.snee.com/xml/nxmlmacros.html&#34;&gt;several candidates&lt;/a&gt; if anyone&amp;rsquo;s interested.&lt;/p&gt;
&lt;p&gt;In Jon Bosak&amp;rsquo;s closing keynote of the conference, he told some great stories about XML&amp;rsquo;s birth from the inner core of the SGML community as he set the stage for a discussion of the current state of markup technology innovation. One story, concerning Charles &amp;ldquo;Father of SGML&amp;rdquo; Goldfarb&amp;rsquo;s insistence that even documents with no DOCTYPE declaration have an implied DTD, included references to Kant and Ben Jonson and had the old school SGML people laughing so hard that IBM&amp;rsquo;s Sharon Adler nearly choked on her drink. (When working on my &lt;a href=&#34;http://www.snee.com/bob/xmlann/index.html&#34;&gt;XML: The Annotated Specification&lt;/a&gt; book for Charles&amp;rsquo; Prentice Hall series, I remember long battles over SGML-rooted concepts that he insisted were implied in XML but that I kept pointing out were never mentioned in the specification that I was annotating. I certainly learned a lot from him though, while working on that and, before that one, a book on free SGML software that included the chapter on psgml Emacs mode.) There was lots of laughter and knowing nods from the veterans as Jon brought up SGML complexities that were painfully factored out by the working group. I wonder if those present at the dinner who hadn&amp;rsquo;t been working with XML as long were a bit puzzled by these reactions to talk of debates about DOCTYPE declaration syntax. They certainly didn&amp;rsquo;t double over at the mention of debates about whitespace handling like the people at the front tables did.&lt;/p&gt;
&lt;p&gt;In general, it was great to catch up at the conference with other old friends and some former and current LexisNexis employees, and to get confidential opinions from key players in the field on new developments. Some regulars and semi-regulars who were missed at the conference this year included Eve Maler, Tim Bray and Lauren Wood, Paul Prescod, Uche Ogbuji (although it was great to finally meet his brother Chimezie), Dale Waldt, Eric Freese (who has a &lt;a href=&#34;http://groups.yahoo.com/group/Adopt_5_More/&#34;&gt;grander project&lt;/a&gt; under way), Edd Dumbill, Sean McGrath, Rick Jelliffe, Henry Thompson&amp;hellip; it was a smaller conference, and with the opening and closing theme being the tenth anniversary of the announcement that XML even existed, this conference reminded me of those days, when the same annual gathering (then named &amp;ldquo;SGML 19yy&amp;rdquo;) was so much smaller than it got during the dot com boom.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Settled in at XML 2006</title>
      <link>https://www.bobdc.com/blog/settled-in-at-xml-2006/</link>
      <pubDate>Mon, 04 Dec 2006 17:55:02 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/settled-in-at-xml-2006/</guid>
      
      
      <description><div>The biggest annual XML event of them all.</div><div>&lt;p&gt;I just got into my Boston hotel room for XML 2006, a conference I&amp;rsquo;ve attended in one form or another every year since it was called SGML 95. On Thursday afternoon I&amp;rsquo;m giving a presentation titled &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/188.html&#34;&gt;Relational database integration with RDF/OWL&lt;/a&gt; on a project I&amp;rsquo;ve written about several times (&lt;a href=&#34;https://www.bobdc.com/blog/integrating-relational-databas&#34;&gt;[1]&lt;/a&gt;, &lt;a href=&#34;http://www.snee.com/bobdc.blog/2006/11/mapping_relational_data_to_rdf.html&#34;&gt;[2]&lt;/a&gt;) here; I&amp;rsquo;ll be sure to mention the help I got from the excellent comments for those entries of my weblog and several leading up to them. I was also asked to keep a presentation I have on XHTML2 and Publishing warm in case someone in the Publishing track doesn&amp;rsquo;t show up. It&amp;rsquo;s based on some research I did for the &lt;a href=&#34;http://www.prismstandard.org&#34;&gt;PRISM&lt;/a&gt; group on what XHTML2 could do for magazine publishers and other general interest content publishers. (Quite a bit, as it turned out.)&lt;/p&gt;
&lt;p&gt;On Tuesday afternoon I&amp;rsquo;ll host a panel on &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/155.html&#34;&gt;Agile XML Development&lt;/a&gt; featuring David Carver, Tony Coates, and Claudia Lucia Jimenez Guarin. (I hope that Tony brings his electric ukulele to Boston, but not necessarily to this panel.) I&amp;rsquo;ll host another panel on Thursday morning in the Enterprise XML track in which Ralph Hodgson will present on &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/119.html&#34;&gt;Ontology-Based XML Schemas for Interoperability between Systems and Tools&lt;/a&gt; (I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/schema-language-victory-and-ow#i110&#34;&gt;already mentioned&lt;/a&gt; how much I&amp;rsquo;m looking forward to that) and Cheryl Connors, Mary Ann Malloy, and Ed Masek of the MITRE Corporation will talk about &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/103.html&#34;&gt;Enabling Secure Interoperability among Federated National Entities&lt;/a&gt;. I almost ended up on a panel on the FEMA Common Alerting Protocol, but luckily I didn&amp;rsquo;t, because I barely know how to spell FEMA.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s always both fun and frustrating looking at the &lt;a href=&#34;http://2006.xmlconference.org/programme/&#34;&gt;program&lt;/a&gt; and picking out what I&amp;rsquo;m going to see. The frustration comes from seeing simultaneous talks that I want to attend—I&amp;rsquo;m already sorry that Michael Kay&amp;rsquo;s talk on &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/26.html&#34;&gt;Meta-stylesheets&lt;/a&gt; takes place at the same time as the talk from MITRE folk, and the &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/154.html&#34;&gt;XML Pipeline Processing panel&lt;/a&gt; is in the same time slot as the Agile XML Development panel. The only other talk on OWL is at the same time as my own, probably because mine was accepted as a &amp;ldquo;late-breaking&amp;rdquo; entry and missed the initial coordination of talk themes.&lt;/p&gt;
&lt;p&gt;Wednesday, I know that Marc Basch&amp;rsquo;s &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/89.html&#34;&gt;Case Study: Managing XML for a Global Content Delivery Platform&lt;/a&gt; will be good, because I did some peripheral work on that system while at LexisNexis, and Marc and a lot of sharp people have put together a good system that tackles some difficult (and common, for an international company) problems. Betty Harvey has taken a hard look at an issue that I and many others have wondered about: &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/73.html&#34;&gt;UML from an XML Perspective—Is the Hype Justified?&lt;/a&gt; As an Innodata Isogen employee, though, I should really go to the &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/150.html&#34;&gt;Panel on Content Management System APIs&lt;/a&gt; that competes with Betty.&lt;/p&gt;
&lt;p&gt;I won&amp;rsquo;t lay out my whole plan of what to see, because doing it on the fly and wandering from room to room is part of the fun. Just hanging out with people and finding out the real inside gossip on standards development, business relationships, and secret personal projects is the best part of the conference; I probably won&amp;rsquo;t get around to any tourism or evening work on my own personal projects, as I usually do on business trips, because I&amp;rsquo;d rather just hang out eating and drinking with people. I&amp;rsquo;ve often said that I&amp;rsquo;d rather go to this annual conference than a high school reunion, because I&amp;rsquo;ll see more old friends there. And now, back to mumbling through the slides for my presentation&amp;hellip;&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.megginson.com/blogs/quoderat/&#34; title=&#34;http://www.megginson.com/blogs/quoderat/&#34;&gt;David Megginson&lt;/a&gt; on &lt;a href=&#34;#comment-617&#34;&gt;December 4, 2006 9:34 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Welcome to Boston, Bob (and everyone else reading this). I apologize for not having a blog aggregation feed set up this year &amp;ndash; it looks like it would have been useful.&lt;/p&gt;
&lt;p&gt;By Keith Fahlgren on &lt;a href=&#34;#comment-618&#34;&gt;December 5, 2006 10:29 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;David: I&amp;rsquo;ll try to keep a list of the blogs I find (as I find them) here: &lt;a href=&#34;http://www.oreillynet.com/xml/blog/2006/12/xml_conf_2006_first_day.html&#34;&gt;http://www.oreillynet.com/xml/blog/2006/12/xml_conf_2006_first_day.html&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Quaint, old-world Europe</title>
      <link>https://www.bobdc.com/blog/quaint-oldworld-europe/</link>
      <pubDate>Sun, 03 Dec 2006 19:04:58 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/quaint-oldworld-europe/</guid>
      
      
      <description><div>A glimpse of some old technology that&#39;s still often useful.</div><div>&lt;p&gt;Gran Via is one of Madrid&amp;rsquo;s main streets, and while walking through the rain looking for its &lt;a href=&#34;http://www.museodeljamon.es/&#34;&gt;Museum of Ham&lt;/a&gt; (Madrid has six of these diner-like &amp;ldquo;museums&amp;rdquo;, and I&amp;rsquo;d already been to two that day, but neither had the gift shop with the crucial Museo del Jamon schwag) I passed this place:&lt;/p&gt;
&lt;img src=&#34;http://static.flickr.com/99/312312477_2a7ae460c5.jpg&#34; width=&#34;400px&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;Madrid sign advertising telnet&#34;/&gt;
&lt;p&gt;It looked like a video game arcade, but their advertisement that you could come in and telnet warmed my heart. I&amp;rsquo;ve pushed telnet to &lt;a href=&#34;http://www.xml.com/pub/a/2004/12/15/telnet-REST.html&#34;&gt;places it shouldn&amp;rsquo;t really go&lt;/a&gt; for fun and profit, and the ability of this lightweight, thirty-seven-year-old program to poke around into odd places is part of its appeal, even in a time when you rarely need to go far to find wi-fi access for all of your twenty-first century web applications. A lit-up sign on a main street of one of the world&amp;rsquo;s great cities is definitely one of these odd places.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/tourism">tourism</category>
      
    </item>
    
    <item>
      <title>Schema language victory (and OWL)</title>
      <link>https://www.bobdc.com/blog/schema-language-victory-and-ow/</link>
      <pubDate>Tue, 28 Nov 2006 21:21:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/schema-language-victory-and-ow/</guid>
      
      
      <description><div>Winning, losing, and influencing.</div><div>&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/The_Velvet_Underground&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/vu.jpg&#34; class=&#34;rightAlignedOpeningPicture&#34; alt=&#34;[Velvet Underground]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I was happy to see &lt;a href=&#34;http://www.tbray.org/ongoing/When/200x/2006/11/27/Choose-Relax&#34;&gt;Tim Bray&amp;rsquo;s endorsement&lt;/a&gt; of Elliotte Rusty Harold&amp;rsquo;s &lt;a href=&#34;http://cafe.elharo.com/xml/relax-wins/&#34;&gt;declaration that RELAX Wins&lt;/a&gt;, although given the size of Tim&amp;rsquo;s audience, I wouldn&amp;rsquo;t be surprised if a reasonable number of them barely knew that &lt;a href=&#34;http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/&#34;&gt;W3C Schema&lt;/a&gt; has any competition. It&amp;rsquo;s great that Tim reminded them about this competition, and of course RELAX NG is better, but still, W3C Schema is the context for most non-publishing XML development out there. (When I see how many publishing operations still use DTDs, I could make an argument that neither W3C Schema nor RELAX NG has won yet, because a lot of votes haven&amp;rsquo;t been cast.)&lt;/p&gt;
&lt;p&gt;Note that when Elliotte says that RELAX wins, he includes the qualifier &amp;ldquo;among the XML cognoscenti&amp;rdquo;. This reminded me of how Tim &lt;a href=&#34;http://lists.xml.org/archives/xml-dev/200203/msg00569.html&#34;&gt;once said&lt;/a&gt; that &amp;ldquo;SGML &amp;lsquo;mattered&amp;rsquo; in the same sense that Robert Johnson and the Velvet Underground matter to popular music, but nobody bought their records and 99% of the programming profession ignored SGML. Being important or mattering is not equivalent to success in the sense that trade-conference audiences think of it.&amp;rdquo; While giving a class on taking advantage of W3C schema typing from XSLT 2.0 and XQuery, Michael Kay said of W3C schema, based on his experience with clients, &amp;ldquo;I&amp;rsquo;m not selling it to you because it&amp;rsquo;s good; I&amp;rsquo;m selling it to you because it&amp;rsquo;s necessary.&amp;rdquo; He makes a similar good point in an &lt;a href=&#34;http://lists.xml.org/archives/xml-dev/200611/msg00189.html&#34;&gt;xml-dev discussion&lt;/a&gt; of this victory celebration.&lt;/p&gt;
&lt;p&gt;If the bad news is that the majority of XML developers have picked an ugly, convoluted syntax that is difficult to maintain when they store metadata about their types, the good news is that at least they&amp;rsquo;re storing metadata about their types in parsable XML. To get back to my running theme of using RDF/OWL technology to take advantage of existing data, I&amp;rsquo;m looking forward (speaking of trade show audiences) to chairing Ralph Hodgson&amp;rsquo;s &lt;a href=&#34;http://2006.xmlconference.org/programme/presentations/119.html&#34;&gt;Ontology-Based XML Schemas for Interoperability between Systems and Tools&lt;/a&gt; presentation at XML 2006 next week. With a co-worker at TopQuadrant and some people at NASA, Ralph put together a system to automate the creation of RDF and OWL structures from W3C schemas. It fits well with the increasingly important semantic web idea that we should do what we can with the data (and metadata) that&amp;rsquo;s out there instead of talking about the data that should be out there. And TopQuadrant is doing it with real data with a real client—the organization that put men on the moon.&lt;/p&gt;
&lt;p&gt;I won&amp;rsquo;t let Ralph off too easily, though; one of the best parts of chairing a session is that you have your own microphone and get to ask all the questions you want. I&amp;rsquo;m sure I&amp;rsquo;ll have several.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Word 2003 XML</title>
      <link>https://www.bobdc.com/blog/word-2003-xml/</link>
      <pubDate>Sun, 26 Nov 2006 22:46:46 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/word-2003-xml/</guid>
      
      
      <description><div>Better than I expected, but good enough for a production system?</div><div>&lt;p&gt;After styling some headers in a sample Word document as Heading 1, Heading 2, Heading 3, and so forth, I was pleased to see that when I saved the document as Word 2003 XML, sub-section container elements were wrapped around the appropriate elements, grouping a Heading 3 title and all block elements up to the next Heading 3 (or higher) block together, nested within the group that began with a Heading 2, etc. (Although Open Office 2.0 offers Word 2003 XML as a Save As choice, it does not add these containers.)&lt;/p&gt;
&lt;p&gt;I did this with a pretty simple example, so I don&amp;rsquo;t know how well it would work with more serious documents. Has anyone incorporated the use of this XML into a production system? How robust is it? I understand that getting users to use the styles consistently is a classic problem; for now, I&amp;rsquo;m more interested in how well the Word 2003 XML itself held up to the demands of a production XML system.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://surguy.net/&#34; title=&#34;http://surguy.net/&#34;&gt;Inigo&lt;/a&gt; on &lt;a href=&#34;#comment-600&#34;&gt;November 27, 2006 4:52 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Yes, I&amp;rsquo;ve used WordML 2003 in several production systems. It largely works - what causes problems is the occasional unexpected exception.&lt;/p&gt;
&lt;p&gt;For example, hyperlinks can be represented as a w:hlink, or as a set of (w:fldCode begin, w:instrText HYPERLINK, w:fldCode end) items. If you click on a hyperlink in a document, and then save it, then the XML changes from using the first representation to the second representation!&lt;/p&gt;
&lt;p&gt;Word automatically puts in wx:subsection elements around headings (actually, it&amp;rsquo;s around blocks delimited by styles marked with an outlineLvl). This is great, and really useful for processing. However, if you&amp;rsquo;re using WordML&amp;rsquo;s capability of including XML in your own namespace in the document, then it doesn&amp;rsquo;t put the wx:subsections in at all, and all your code depending on them breaks!&lt;/p&gt;
&lt;p&gt;And, as &lt;a href=&#34;http://www.griffinbrown.co.uk/blog/PermaLink,guid,f19a3daa-6cbe-4621-8add-b64f532c6743.aspx&#34;&gt;Andrew of Griffin Brown&lt;/a&gt; discovered, Word Service Pack 2 changes the format slightly. The example he describes is not too bad, I don&amp;rsquo;t think - the real problem is that elements like&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[w:r]
  [w:instrText]HYPERLINK blahblahblah[/w:instrText]
[/w:r]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;can (seemingly randomly) change into&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[w:r]
  [w:instrText]HYPERL[/w:instrText]
[/w:r]

[w:r]
  [w:instrText]INK blahblahblah[/w:instrText]
[/w:r]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;which is much harder to process.&lt;/p&gt;
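&lt;p&gt;One way to cope with such splits, sketched here in Python, is to join consecutive instruction-text runs back together before looking for the HYPERLINK keyword. (The (kind, text) pair representation of a run is an assumption standing in for whatever an earlier parsing step produces; it is not a real WordML API.)&lt;/p&gt;

```python
# Sketch: Word may split one field instruction ("HYPERLINK ...") across
# several w:instrText runs. Joining consecutive instruction-text runs
# restores the logical field code. The (kind, text) pairs here are a
# hypothetical representation produced by some earlier parse step.
def merge_instr_runs(runs):
    merged, buffer = [], []
    for kind, text in runs:
        if kind == "instrText":
            buffer.append(text)          # accumulate split pieces
        else:
            if buffer:
                merged.append(("instrText", "".join(buffer)))
                buffer = []
            merged.append((kind, text))
    if buffer:
        merged.append(("instrText", "".join(buffer)))
    return merged

runs = [("instrText", "HYPERL"),
        ("instrText", "INK blahblahblah"),
        ("text", "link text")]
print(merge_instr_runs(runs))
# [('instrText', 'HYPERLINK blahblahblah'), ('text', 'link text')]
```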
&lt;p&gt;So, in summary: yes, it&amp;rsquo;s usable. Yes, I&amp;rsquo;ve used it in production systems. But the documentation is poor, I&amp;rsquo;ve had to discover gotchas like these by experience, and it&amp;rsquo;s not as easy to process as ODF.&lt;/p&gt;
&lt;p&gt;The Word 2007 OOXML documentation is a useful reference, actually: the XML format hasn&amp;rsquo;t changed very much between the two, and the docs are much better.&lt;/p&gt;
&lt;p&gt;By Keith Fahlgren on &lt;a href=&#34;#comment-604&#34;&gt;November 29, 2006 10:40 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I ended up rewriting most of the conversion tools used for incoming manuscripts around Word 2003 XML. It was a relatively easy decision not because it was such a great format but rather because the non-XML-based tools were all horribly broken. I was thrilled at the thought of being able to attack the problem as one of document translation through XSLT2 rather than massaging the input to a (binary) Black Box.&lt;/p&gt;
&lt;p&gt;The end result was reasonably robust, although we have the luxury of authors who use a well-designed Word template with remarkable fidelity. Even after a few months of tweaking, it was never at the point where it could be run without supervision. &amp;ldquo;Fairly high quality&amp;rdquo;&amp;ndash;that&amp;rsquo;s about as good as I&amp;rsquo;d expect to get out of it.&lt;/p&gt;
&lt;p&gt;My experiments with translating Word 2003 XML into DocBook were much easier. However, that path was extremely fragile (often producing invalid DocBook that would have to be hand-validated).&lt;/p&gt;
&lt;p&gt;All that said, I wouldn&amp;rsquo;t build &lt;em&gt;anything&lt;/em&gt; against Word 2003 XML now that Word 2007 OOXML is around the corner (with the added promise of export plugins for older Word versions).&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-605&#34;&gt;November 29, 2006 10:47 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Keith! I have a question, and I&amp;rsquo;m going to guess at the answer: why did you need XSLT2? Was it because Word&amp;rsquo;s algorithm for determining wx:subsection tag placement wasn&amp;rsquo;t good enough and you needed XSLT2&amp;rsquo;s grouping ability?&lt;/p&gt;
&lt;p&gt;(If you&amp;rsquo;re in Boston next week, I&amp;rsquo;d love to talk about this more.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Frankfurt tourism</title>
      <link>https://www.bobdc.com/blog/frankfurt-tourism/</link>
      <pubDate>Mon, 20 Nov 2006 08:46:55 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/frankfurt-tourism/</guid>
      
      
      <description><div>Germany, barbecue, Moroccan pop, and hiphop.</div><div>&lt;p&gt;I recently heard on short notice that I would have some time to kill in Frankfurt, and just as I was wondering what to do there, Tim Bray &lt;a href=&#34;http://www.tbray.org/ongoing/When/200x/2006/11/08/Frankfurt-Verticals&#34;&gt;posted&lt;/a&gt; something about a recent visit, so I asked him. He suggested the Sachsenhausen district, across the Main River. I booked a room in a hotel near there and wandered around a lot. It was great; I&amp;rsquo;d certainly do it again.&lt;/p&gt;
&lt;p&gt;Sachsenhausen has its own Museum Mile on their side of the river, with a Saturday morning bonus of a large flea market. The only museum I went to was the Communications Museum (&lt;a href=&#34;http://www.museumsstiftung.de/frankfurt/d311_rundgang.asp&#34;&gt;Museum für Kommunikation&lt;/a&gt;), and I highly recommend it: there&amp;rsquo;s plenty for geeks, plenty for kids (there were several birthday parties being hosted when I was there, and USA Today actually lists it under &lt;a href=&#34;http://www.usatoday.com/travel/extraday/frankfurt/worth.htm&#34;&gt;Fun for the kids&lt;/a&gt;), plenty for everyone.&lt;/p&gt;
&lt;p&gt;The histories of television and telephones provide many great exhibits, like this 1961 Kuba Komet TV:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://flickr.com/photos/bobdc/300392770/&#34;&gt;&lt;img src=&#34;http://static.flickr.com/117/300392770_7e0cae2909_m.jpg&#34; alt=&#34;[Museum für Kommunikation TV]&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The telegraphy history also had some cool stuff. This input device is truly a keyboard&amp;rsquo;s keyboard:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://flickr.com/photos/bobdc/300392772/&#34;&gt;&lt;img src=&#34;http://static.flickr.com/113/300392772_35302d8efb_m.jpg&#34; alt=&#34;[Museum für Kommunikation telegraph keyboard]&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The museum had a lot of communications-related art. Europeans seem better at making art-technology connections than English-speaking countries. (I know there are exceptions.) This part of the museum had everything from nineteenth-century paintings to works by Max Ernst, Salvador Dali, Joseph Beuys, and Christo. My schedule didn&amp;rsquo;t allow my planned visit to the &lt;a href=&#34;http://www.mmk-frankfurt.de/&#34;&gt;Museum für Moderne Kunst&lt;/a&gt;, so this part of the Communications Museum gave me my big city fix of modern art. Their temporary exhibition, &lt;a href=&#34;http://www.pong-mythos.net/&#34;&gt;pong.mythos&lt;/a&gt;, was particularly good at combining technical history with artistic interpretations of the roles and potential roles of the first video game to enter modern consciousness on a wide scale.&lt;/p&gt;
&lt;p&gt;One of the museum&amp;rsquo;s many &amp;ldquo;specialized consumer hardware through the ages&amp;rdquo; exhibits was a collection of remote control units that could have been duplicated with about 20 euros at the flea market across the street. (Reproducing the &amp;ldquo;cell phones through the ages&amp;rdquo; exhibit would have cost about 200 euros.) Among the many old tape recorders and laptops at the flea market was one old computer that I had to take a picture of for my friends at Sun:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://flickr.com/photos/bobdc/300392768/&#34;&gt;&lt;img src=&#34;http://static.flickr.com/109/300392768_59be20e315_m.jpg&#34; alt=&#34;[Sun blade at flea market]&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When I lived in a New York apartment, I got into the flea market habit of focusing on old photographs, because they take up very little room. While I was looking through some at the Frankfurt flea market, the guy selling CDs in the next booth was playing some North African pop in which a vocoder-like-Cher-&amp;ldquo;Believe&amp;rdquo;-thing was triggered by an EQ filter so that it only affected notes in a certain range, which sounded really cool as the singer did that fast melismatic thing they do in and out of the affected range. I bought the CD, by &amp;ldquo;Talbi One&amp;rdquo;, for five Euros (I later found out that YouTube has a &lt;a href=&#34;http://www.youtube.com/watch?v=zp5nksLPhzE&#34;&gt;video&lt;/a&gt; for the CD&amp;rsquo;s first tune), then went back to the old photos. To replace the CD that I had bought, the guy put on some more Moroccan pop. This tune was much more familiar, because the Chemical Brothers had sampled it for their hit collaboration &lt;a href=&#34;http://www.youtube.com/watch?v=H2hzVV2Nwfs&#34;&gt;Galvanize&lt;/a&gt; (&amp;ldquo;The time has come to&amp;hellip;&amp;rdquo;) with A Tribe Called Quest alumnus Q-Tip: &lt;a href=&#34;http://www.najataatabou.com/&#34;&gt;Najat Atabou&amp;rsquo;s&lt;/a&gt; &amp;ldquo;Hadi kedba bayna&amp;rdquo; (&amp;ldquo;Just Tell Me the Truth&amp;rdquo;). (If you&amp;rsquo;re interested in hiphop usage of North African pop, check out this &lt;a href=&#34;http://riddimmethod.net/?p=23&#34;&gt;background on and mashup of&lt;/a&gt; Jay Z&amp;rsquo;s &amp;ldquo;Big Pimpin&amp;rdquo; and Abdel-Halim Hafez&amp;rsquo;s original &amp;ldquo;Khosara&amp;rdquo;, the song that Timbaland &amp;ldquo;borrowed&amp;rdquo; from to create Big Pimpin&amp;rsquo;s main riff.) &amp;ldquo;Fünf Euro auch?&amp;rdquo; &amp;ldquo;Fünf Euro auch.&amp;rdquo; She&amp;rsquo;s really impressive, and the CD is much better than the Talbi One disk.&lt;/p&gt;
&lt;p&gt;While in Germany, I also had an interesting insight about a classic German dish. A German co-worker had explained to me that real sauerbraten is beef cooked slowly in a vinegar-based sauce until it&amp;rsquo;s almost ready to fall apart. When a co-worker from the Philippines (of the seven people on my project team there were two Filipinos and no two others from the same country) ordered a &amp;ldquo;barbecue&amp;rdquo; plate, it turned out to really mean &amp;ldquo;grilled&amp;rdquo;. As I tried to explain to the others what &amp;ldquo;barbecue&amp;rdquo; meant in the States, especially in the south, it hit me: the closest thing to barbecue—&lt;a href=&#34;http://www.northcarolinatravels.com/food/barbecue/index.htm&#34;&gt;North Carolina style&lt;/a&gt;, anyway, although they favor pork over beef—in Germany, and maybe in all of Europe, is sauerbraten! Web searches for sauerbraten recipes show other typical barbecue sauce ingredients such as catsup or tomato paste, garlic, peppercorns, onion&amp;hellip; and of course you wash the finished product down with beer.&lt;/p&gt;
&lt;p&gt;The next time I make some barbecue, I may have to crunch up some ginger snaps in the sauce, which seems to be common in sauerbraten recipes. And the sauerbraten I had &lt;a href=&#34;http://www.sachsenhaeuserwarte.de/&#34;&gt;tonight&lt;/a&gt; was just great.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/tourism">tourism</category>
      
    </item>
    
    <item>
      <title>DAM! Subversion! RDF? (OWL?)</title>
      <link>https://www.bobdc.com/blog/dam-subversion-rdf-owl/</link>
      <pubDate>Fri, 10 Nov 2006 07:45:56 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/dam-subversion-rdf-owl/</guid>
      
      
      <description><div>One nice thing about blogging is that if you don&#39;t have the spare time to code up an idea, you can at least describe the design issues and see what people think.</div><div>&lt;p&gt;I read Elliot Kimber&amp;rsquo;s &lt;a href=&#34;http://www.google.com/search?q=site%3Adrmacros-xml-rants.blogspot.com%20xcmtdmw&#34;&gt;series on XML content management software&lt;/a&gt; as it came out, and I&amp;rsquo;ve been re-reading it lately for work project reasons. We work at the same company, where content management issues come up a lot. Content Management Systems is also one of those software categories where many products claim to do it all, but what exactly constitutes &amp;ldquo;it all&amp;rdquo; is very vague. Each vendor makes up their own features and puts their own spin on the au courant buzzwords, making it difficult to compare different products. Elliot&amp;rsquo;s approach of treating a basic CMS as a source control system like &lt;a href=&#34;http://subversion.tigris.org/&#34;&gt;Subversion&lt;/a&gt; plus x, y, and z and then analyzing what x, y, and z should be make it easier to sort expectations of a CMS system.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://subversion.apache.org/&#34;&gt;&lt;img src=&#34;http://subversion.apache.org/images/svn-name-banner.jpg&#34; alt=&#34;[subversion logo]&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;200px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve also been getting to know Subversion. One nice thing that I learned when &lt;a href=&#34;http://drmacros-xml-rants.blogspot.com/2006/07/xcmtdmw-subversion-cooler-than.html&#34;&gt;someone pointed it out to Elliot&lt;/a&gt; is that it can store arbitrary metadata, even passing my Arbitrary Metadata Test by letting me assign a goofinessFactor of 3.1416 to one file.&lt;/p&gt;
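&lt;p&gt;For anyone who wants to try the same test, the relevant commands are standard Subversion property operations; a session like the following (the file name is made up) sets such a property and reads it back:&lt;/p&gt;

```shell
# Attach an arbitrary property to a versioned file, commit, then read it back.
svn propset goofinessFactor 3.1416 logo.png
svn commit -m "record goofinessFactor"
svn propget goofinessFactor logo.png
svn proplist --verbose logo.png
```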
&lt;p&gt;Unfortunately, Subversion &lt;a href=&#34;http://svn.haxx.se/users/archive-2006-06/1092.shtml&#34;&gt;doesn&amp;rsquo;t let you&lt;/a&gt; search for files based on metadata values. I see the value of finding an object and then looking at its metadata values, but I want the ability to search the metadata values to find objects. There are ways to add this in, but first let&amp;rsquo;s address the always important question: why bother?&lt;/p&gt;
&lt;h2 id=&#34;i112&#34;&gt;Subversion + ? = (CMS | DAM)&lt;/h2&gt;
&lt;p&gt;I thought that I would learn a lot by adding whatever to Subversion to build a simplified CMS. (Subversion &lt;a href=&#34;http://svnbook.red-bean.com/nightly/en/svn.reposadmin.create.html#svn.reposadmin.create.hooks&#34;&gt;hook scripts&lt;/a&gt; make it easy to trigger python scripts upon events such as check-in.) Elliot &lt;a href=&#34;http://drmacros-xml-rants.blogspot.com/2006/07/xcmtdmw-import-is-everything.html&#34;&gt;makes it clear&lt;/a&gt; that link management is important in a CMS if you want to dynamically create documents from stored pieces and track dependency relationships—and for typical CMS use, you definitely want to do the former and probably the latter—but I didn&amp;rsquo;t want to add that much to Subversion, so I thought of some lower hanging fruit: a &lt;a href=&#34;http://www.google.com/search?q=%22digital%20asset%20manager%22&#34;&gt;Digital Asset Manager&lt;/a&gt;, a project that also gives me the benefit of a cheap pun to use. (Years ago, my future wife and I saw that upon finishing our group&amp;rsquo;s tour of the &lt;a href=&#34;http://en.wikipedia.org/wiki/Hoover_Dam&#34;&gt;Hoover Dam&lt;/a&gt;, the older gentleman leading the tour clearly enjoyed saying &amp;ldquo;Thanks for taking the Dam(n) tour!&amp;rdquo;) Like so many people doing semantic web related development, I could start by creating Yet Another Photo Management System Using RDF. I could also store to-do lists, XML files of all persuasions, Microsoft Office and Open Office files, and other &amp;ldquo;digital assets,&amp;rdquo; as I recently read about &lt;a href=&#34;http://www.onlamp.com/pub/a/onlamp/2006/11/02/personal_document_management.html&#34;&gt;Jason Hunter&lt;/a&gt; and &lt;a href=&#34;http://kitenet.net/~joey/svnhome.html&#34;&gt;Joey Hess&lt;/a&gt; doing.&lt;/p&gt;
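&lt;p&gt;As a sketch of the hook approach (the repository path and the parsing helper are illustrative, not from any real system), a post-commit hook could find out what changed at a revision and hand the paths off to whatever indexes the metadata:&lt;/p&gt;

```python
# Minimal sketch of what a Subversion post-commit hook could do.
# Subversion invokes the hook as "post-commit REPOS-PATH REVISION";
# the repository path and the metadata-indexing idea are assumptions.
def changed_paths_command(repos, revision):
    # The command a real hook would run (via subprocess) to learn
    # which paths a revision touched.
    return ["svnlook", "changed", "-r", str(revision), repos]

def parse_svnlook_changed(output):
    # svnlook prints lines like "U   docs/intro.html"; return
    # (action, path) pairs for the indexer.
    changes = []
    for line in output.splitlines():
        if line.strip():
            action, path = line.split(None, 1)
            changes.append((action, path.strip()))
    return changes

print(changed_paths_command("/var/svn/dam", 42))
print(parse_svnlook_changed("U   docs/intro.html\nA   img/logo.png\n"))
```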
&lt;p&gt;If I can forget about linking, I only need to add better metadata management to Subversion&amp;rsquo;s excellent storage and version control and add some of the x, y, and z features mentioned above. But do I really want to use RDF to store the metadata?&lt;/p&gt;
&lt;h2 id=&#34;i115&#34;&gt;The case for storing the metadata in MySQL&lt;/h2&gt;
&lt;p&gt;A relational database offers obvious benefits for storing data that fits easily into rows and columns, and you can have as many columns as you need. One great thing about Subversion&amp;rsquo;s metadata capabilities is that the metadata is versioned, like the files that you check in, so that you could say that at r9, the editor of a document was John Smith, but at r10 it was Jane Jones, and you could always go back and see who was the editor at r9. A simple relational table could store the document&amp;rsquo;s pathname as an ID, the property name (e.g. &amp;ldquo;editor&amp;rdquo;), the property value, and the release number. That&amp;rsquo;s four pieces of information, and therefore not a great fit for an RDF triplestore.&lt;/p&gt;
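&lt;p&gt;A minimal sketch of that table, using SQLite for convenience (the table and column names are made up), shows how the fourth piece of information lets you ask who the editor was at any given revision:&lt;/p&gt;

```python
import sqlite3

# Hypothetical versioned-metadata table mirroring the
# (path, property, value, revision) model described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (path TEXT, prop TEXT, value TEXT, rev INTEGER)")
conn.executemany("INSERT INTO meta VALUES (?, ?, ?, ?)", [
    ("doc/intro.html", "editor", "John Smith", 9),
    ("doc/intro.html", "editor", "Jane Jones", 10),
])

def prop_at(path, prop, rev):
    # The value in effect at a revision is the one with the highest
    # revision number not exceeding it.
    cur = conn.execute(
        "SELECT value FROM meta WHERE path = ? AND prop = ? "
        "AND rev BETWEEN 0 AND ? ORDER BY rev DESC LIMIT 1",
        (path, prop, rev))
    row = cur.fetchone()
    return row[0] if row else None

print(prop_at("doc/intro.html", "editor", 9))   # John Smith
print(prop_at("doc/intro.html", "editor", 10))  # Jane Jones
```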
&lt;h2 id=&#34;i117&#34;&gt;The case for storing the metadata in an RDF triplestore&lt;/h2&gt;
&lt;p&gt;It would be a bit kludgy to add the release number as a suffix to the file ID (e.g. doc/intro.htmlr9) and have a regular expression peel it off before presenting it for output, but at least it would squeeze these four pieces of information into a triple. It wouldn&amp;rsquo;t be that much trouble, and by helping to distinguish between two versions of a file, we could consider a release number to be part of the identifier anyway. (I&amp;rsquo;m sure others have thought harder about this than I have, so I&amp;rsquo;d appreciate any pointers.)&lt;/p&gt;
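&lt;p&gt;The kludge itself is trivial to sketch in Python (the helper names are mine, and the scheme assumes the revision digits always end the identifier):&lt;/p&gt;

```python
import re

# Pack a path and a revision number into one identifier string, and
# peel the revision back off with a regular expression before
# presenting the path for output.
def make_id(path, rev):
    return "%sr%d" % (path, rev)

def split_id(ident):
    # Greedy match: the final "r" followed by trailing digits is the
    # revision marker.
    m = re.match(r"(.*)r(\d+)$", ident)
    return m.group(1), int(m.group(2))

print(split_id(make_id("doc/intro.html", 9)))  # ('doc/intro.html', 9)
```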
&lt;p&gt;What would this little kludge, which lets us store RDF versions of the metadata, buy us? So far, I&amp;rsquo;ve thought of two things. First&amp;mdash;and this is what gave me the idea for the whole thing in the first place&amp;mdash;OWL reasoners could take advantage of the data. For example, if I want a picture of a logo, and I had declared that files with a type of JPG, JPEG, BMP, PNG, and TIF were image files, then I could easily search the metadata of just the image files without worrying about format. I&amp;rsquo;d love to hear more potential examples of useful, realistic OWL-based queries to do on such data, and probably won&amp;rsquo;t start any coding until I find some.&lt;/p&gt;
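&lt;p&gt;A sketch of those declarations could look like this in N3 (all the class names are hypothetical, and prefix declarations are omitted): each format-specific class becomes a subclass of one ImageFile class, so a reasoner can gather all image files regardless of format.&lt;/p&gt;

```turtle
# Hypothetical class declarations for the image-file example.
dam:JPEGFile rdfs:subClassOf dam:ImageFile .
dam:BMPFile  rdfs:subClassOf dam:ImageFile .
dam:PNGFile  rdfs:subClassOf dam:ImageFile .
dam:TIFFile  rdfs:subClassOf dam:ImageFile .
```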
&lt;p&gt;The second advantage of storing it in RDF is that more metadata extraction tools are already out there for the taking. For example, there are &lt;a href=&#34;http://www.ivan-herman.net/WebLog/WorkRelated/SemanticWeb/xmpextract.html&#34;&gt;free tools&lt;/a&gt; for pulling &lt;a href=&#34;https://www.bobdc.com/blog/using-or-not-using-adobes-xmp&#34;&gt;XMP&lt;/a&gt;-style RDF from JPEG and Adobe formats. More importantly, the &lt;a href=&#34;http://www.w3.org/2004/01/rdxh/spec&#34;&gt;GRDDL&lt;/a&gt; community is writing XSLT stylesheets to pull metadata from XML-based resources. I think that this community is a bit optimistic in hoping that movie theater and pizza shop web site owners will add processing instructions to their XHTML files that point to these stylesheets, but if the code is being written to do the extractions, there are all kinds of applications that can benefit. A Subversion-based DAM is one.&lt;/p&gt;
&lt;p&gt;Any other suggestions or ideas?&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.mediabeacon.com&#34; title=&#34;http://www.mediabeacon.com&#34;&gt;Alex M.&lt;/a&gt; on &lt;a href=&#34;#comment-591&#34;&gt;November 10, 2006 12:40 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Excellent write up. It&amp;rsquo;s great to see people being excited about RDF and XMP. We have been boiling in pretty much the same technologies for quite some time. We&amp;rsquo;ve also been a big proponent of the idea of embedding and encrypting arbitrary amounts of data and business logic/forms into the files and created a powerful library that can embed RDF/XMP data into any file type. :)&lt;/p&gt;
&lt;p&gt;If you ever feel bored give our office a buzz and somebody will give you a tour.&lt;/p&gt;
&lt;p&gt;Alex M., MediaBeacon, Inc.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Mapping relational data to RDF with D2RQ</title>
      <link>https://www.bobdc.com/blog/mapping-relational-data-to-rdf/</link>
      <pubDate>Mon, 06 Nov 2006 08:49:18 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/mapping-relational-data-to-rdf/</guid>
      
      
      <description><div>Getting more URIs into your triples&#39; objects, and why this is good.</div><div>&lt;p&gt;&lt;a href=&#34;https://www.bobdc.com/blog/integrating-relational-databas&#34;&gt;Last week&lt;/a&gt; I mentioned the role that &lt;a href=&#34;http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2rq/index.htm&#34;&gt;D2RQ&lt;/a&gt; played in a project I was working on, and I wanted to write a little more about this RDBMS/RDF interface if it&amp;rsquo;s any help to people who may use it. D2RQ is free, and it&amp;rsquo;s easy to use in its default setup, but I&amp;rsquo;m finding that the further you stray from the default setup, the more you can do with it.&lt;/p&gt;
&lt;p&gt;The main mistake I made at the beginning was that when I created a small dummy database for my first mapping attempt, I didn&amp;rsquo;t declare a key value for the database&amp;rsquo;s only table. SQL doesn&amp;rsquo;t require this, but D2RQ does, and for a very good reason: the key value becomes the subject of the triples that express the data values in each row of the table. The D2RQ mapping file lets you configure the URIs used in the triples, so a row from a book publishing database table where the ISBN column is the key and has a value of &amp;ldquo;0553213113&amp;rdquo; and the title column has a value of &amp;ldquo;Moby Dick&amp;rdquo; can generate a (subject, predicate, object) triple of (&lt;a href=&#34;http://foo-url/isbn/0553213113&#34;&gt;http://foo-url/isbn/0553213113&lt;/a&gt;, &lt;a href=&#34;http://bar-url/title&#34;&gt;http://bar-url/title&lt;/a&gt;, &amp;ldquo;Moby Dick&amp;rdquo;).&lt;/p&gt;
&lt;p&gt;While RDF subjects and predicates must be URIs, objects can be string values like &amp;ldquo;Moby Dick&amp;rdquo;, but they can also be URIs. A URI as an object can serve as the subject of other triples, letting you link up information to get more value out of it. (More on this in an &lt;a href=&#34;http://www.snee.com/xml/rdf-drdobbs.html&#34;&gt;article I did for Dr. Dobb&amp;rsquo;s&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;D2RQ turns most database values into literal strings when it plugs them into the &amp;ldquo;object&amp;rdquo; slot of an RDF triple statement, but it does know one kind of value in a relational database that&amp;rsquo;s better off being represented as a URI when it&amp;rsquo;s the object of a triple: foreign key values. For example, let&amp;rsquo;s say that in a sales database you set the custID column of an orders table to be a foreign key referencing the custID column of the customers table, because you don&amp;rsquo;t want any orders entered with custID values that aren&amp;rsquo;t in your list of customers. When D2RQ asks your database package about that database, upon finding out that orders.custID is a foreign key, it sets up the mapping file so that RDF triples showing an order&amp;rsquo;s custID value will represent it in a triple&amp;rsquo;s object as a URI, not as a string like the &amp;ldquo;Moby Dick&amp;rdquo; object of the triple shown above. With custID as the object of triples from the orders table and the subject of triples from the customers table (and similarly, with the orders table&amp;rsquo;s second foreign key column itemID as the object of other triples from that table and the subject of additional triples from the items table), the right SPARQL query can show us that order o003, in which customer c002 ordered item i004, really means that customer John Lennon bought an Epiphone Casino guitar. (When I create a customer database of four names, I pick four obvious ones.)&lt;/p&gt;
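&lt;p&gt;The shape of such a query would be something like the following, with hypothetical prefixes and property names standing in for the ones D2RQ actually generates; each foreign-key URI joins the triples from one table to the triples from another:&lt;/p&gt;

```sparql
SELECT ?customerName ?itemName
WHERE {
  ?order db:orders_custID  ?cust .
  ?order db:orders_itemID  ?item .
  ?cust  db:customers_name ?customerName .
  ?item  db:items_name     ?itemName .
}
```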
&lt;p&gt;Of course, this is even easier in SQL—it&amp;rsquo;s what relational databases are designed for—but the point of my exercise is to set up an RDF version of relational data so that I can see if OWL lets us do useful things with the data that I couldn&amp;rsquo;t do when the data was strictly relational. Which leads to the next D2RQ objects-as-URIs trick: the mapping file&amp;rsquo;s d2rq:uriPattern property.&lt;/p&gt;
&lt;p&gt;In a comment on my last posting, Richard Cyganiak suggested that setting an e-mail address field in my address book databases to be an inverse functional property would add metadata that gave more value to the database. To cut to the chase, it worked; because of this property, &lt;a href=&#34;http://www.mindswap.org/2003/pellet/&#34;&gt;Pellet&lt;/a&gt; can now tell that Bobby Fisher of 2304 Eighth Lane and Robert L. Fisher of 2304 8th Ln. are the same person. (Despite my use of Beatle names in four-person databases, I swear that my &lt;a href=&#34;https://www.bobdc.com/blog/all-the-personal-data-you-want&#34;&gt;random personal data generator&lt;/a&gt; just happened to come up with the name of the wacky former chess champion.) The identification of potentially redundant names is a key feature of some &lt;a href=&#34;http://www.siperian.com/&#34;&gt;expensive products&lt;/a&gt; out there—you don&amp;rsquo;t like getting two copies of the same catalog with your name spelled slightly differently, and the retailer doesn&amp;rsquo;t want to pay for sending you two. So, the case for the value of owl:InverseFunctionalProperty is pretty clear.&lt;/p&gt;
&lt;p&gt;When you&amp;rsquo;re using OWL DL and you want to say that a property is an inverse functional property, the value must be a URI and not a literal string value. (Explanations of the &lt;a href=&#34;http://www.w3.org/TR/2004/REC-owl-features-20040210/#s1.3&#34;&gt;differences&lt;/a&gt; between OWL Lite, OWL DL, and OWL Full have too much knowledge representation jargon for me to remember those differences, but since OWL DL is more powerful than Lite and Full is apparently difficult to develop software for, the OWL software developers&amp;rsquo; Goldilocks approach of going with the one in the middle has me working with OWL DL.) While D2RQ can tell on its own that foreign key values should be represented as URIs, it needs to be told explicitly if you want e-mail addresses to be represented as URIs, and this means a few simple changes to the mapping file generated by D2RQ&amp;rsquo;s generate-mapping utility.&lt;/p&gt;
&lt;p&gt;The following shows the map file entry for the email1 field of the entries table in my MySQL eudora database. The mapping files use the N3 dialect of RDF. The first four lines were generated by D2RQ&amp;rsquo;s generate-mapping utility. I commented out the fourth line, which declares entries.email1 to be a data property, and added the two new lines below it. The d2rq:uriPattern line says that entries.email1 should be treated as a URI, and it plugs that value into a string beginning with &amp;ldquo;mailto:&amp;rdquo; so that it really is a URI. The line after that says to only bother with this mapping if the entries.email1 value is not equal to an empty string. (The &amp;ldquo;&amp;lt;&amp;gt;&amp;rdquo; here means &amp;ldquo;not equal to&amp;rdquo;, so don&amp;rsquo;t confuse it with the more common XML and N3 use of the pointy brackets.) Without this d2rq:condition property, D2RQ creates a URI of just &amp;ldquo;mailto:&amp;rdquo; for blank email1 values. Pellet, with good reason, doesn&amp;rsquo;t like these.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;emap:entries_email1 a d2rq:PropertyBridge;
   d2rq:belongsToClassMap emap:entries;
   d2rq:property eud:entries_email1;
#  d2rq:column &amp;quot;entries.email1&amp;quot;;
   d2rq:uriPattern &amp;quot;mailto:@@entries.email1@@&amp;quot;;
   d2rq:condition &amp;quot;entries.email1 &amp;lt;&amp;gt; &#39;&#39;&amp;quot;;
   .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once the email1 values are treated as URIs, &lt;a href=&#34;http://www.mindswap.org/2004/SWOOP/&#34;&gt;SWOOP&lt;/a&gt; (one of the Goldilocks software packages I mentioned earlier) lets me set email1 to be an inverse functional property. As handy as the SWOOP interface is, it&amp;rsquo;s not that difficult to add the bolded line below to your OWL ontology using a text editor:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;owl:ObjectProperty rdf:about=&amp;quot;http://localhost:2020/resource/eudora/entries_email1&amp;quot;&amp;gt;
  &amp;lt;rdf:type rdf:resource=&amp;quot;http://www.w3.org/2002/07/owl#InverseFunctionalProperty&amp;quot; /&amp;gt;
  &amp;lt;rdfs:subPropertyOf rdf:resource=&amp;quot;http://localhost:2020/resource/entries/email&amp;quot; /&amp;gt;
&amp;lt;/owl:ObjectProperty&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(By the way, tools like SWOOP and Protégé are great for automating the writing and editing of RDF/OWL code, but only because the code is so verbose and redundant, not because it&amp;rsquo;s particularly difficult to understand. When I hear W3C Schema and XBRL advocates say &amp;ldquo;sure, the syntax is convoluted, but don&amp;rsquo;t worry about it, because the tools will take care of it&amp;rdquo; little alarms go off in my head—if I can&amp;rsquo;t understand the syntax used to model some information, I don&amp;rsquo;t want to have to take it on faith that the model is good. There&amp;rsquo;s no reason for RDF/OWL syntax to set off such alarms, and using these tools to generate the syntax and then reviewing the syntax is a great way to learn that syntax, but ultimately, you&amp;rsquo;ll get more work done more quickly if you use the tools.)&lt;/p&gt;
&lt;p&gt;To cut back to the chase I mentioned earlier, following these steps to designate email1 as an inverse functional property made it possible for Pellet to know that because the Bobby Fisher of 2304 Eighth Lane and the Robert L. Fisher of 2304 8th Ln in my MySQL database have the same email address, they&amp;rsquo;re the same person—giving me one more example of how RDF/OWL can help me to get more out of a traditional relational database. I&amp;rsquo;d love to hear more suggestions for things I can add to this ontology that let SPARQL queries find out things that straight SQL queries could not get from the same data.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Integrating relational databases with RDF/OWL</title>
      <link>https://www.bobdc.com/blog/integrating-relational-databas/</link>
      <pubDate>Mon, 30 Oct 2006 08:22:06 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/integrating-relational-databas/</guid>
      
      
      <description><div>Done, if on a fairly small scale.</div><div>&lt;p&gt;I &lt;a href=&#34;https://www.bobdc.com/blog/rdfowl-for-data-silo-integrati&#34;&gt;recently asked&lt;/a&gt; about the use of RDF/OWL to integrate databases, especially relational databases. The posting received many good comments, but no pointers to the kind of simple example I was hoping to find, so I&amp;rsquo;ve managed to create one myself.&lt;/p&gt;
&lt;p&gt;I loaded two different single-table address book databases into MySQL, with one based on Outlook&amp;rsquo;s structure and field names and the other based on Eudora&amp;rsquo;s. (I&amp;rsquo;ve already &lt;a href=&#34;https://www.bobdc.com/blog/all-the-personal-data-you-want&#34;&gt;written here&lt;/a&gt; about how I generated the data.) I used &lt;a href=&#34;http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2rq/index.htm&#34;&gt;D2RQ&lt;/a&gt; to treat the two databases as RDF and I used &lt;a href=&#34;http://www.mindswap.org/2004/SWOOP/&#34;&gt;SWOOP&lt;/a&gt; to generate an ontology, with equivalence rules such as that the eud:entries_workState field from one database was equivalent to the ol:entries_businessState field from the other.&lt;/p&gt;
&lt;p&gt;To find everyone in the address book with a business address in New York state, a SPARQL query using &lt;a href=&#34;http://www.mindswap.org/2003/pellet/&#34;&gt;Pellet&lt;/a&gt; can ask for all entries for which eud:entries_workState=&amp;ldquo;NY&amp;rdquo; and then also get the entries from the other database for which ol:entries_businessState=&amp;ldquo;NY&amp;rdquo;. By defining all the different phone number properties (home, work, mobile, fax, etc.) as subproperties of &amp;ldquo;phone&amp;rdquo;, I can also query for someone&amp;rsquo;s phone numbers without knowing exactly which kinds of phone numbers the database has for that person, and see them all listed. To me, these both demonstrate how metadata can add value to data, because they let me get answers to practical questions about my data that are more complete than these answers would have been without the metadata.&lt;/p&gt;
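&lt;p&gt;The phone subproperty trick means a query never has to enumerate the individual phone fields. A sketch of such a query (the property names here are illustrative; the namespace is the one D2RQ generated for my entries table):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX ab: &amp;lt;http://localhost:2020/resource/entries/&amp;gt;
SELECT ?phone
WHERE {
  ?entry ab:lastName &amp;quot;Fisher&amp;quot; ;
         ab:phone ?phone .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Without inference this returns nothing, because no triple uses the phone property directly; with the subproperty declarations applied, the home, work, mobile, and fax numbers all show up.&lt;/p&gt;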
&lt;p&gt;To use D2RQ as an interface to a MySQL database, you first run its utility that queries the database for database catalog information and generates a mapping file. Then, you start up the D2RQ server with the mapping file as a parameter, so that when you issue SPARQL queries against its server it can map your queries to the appropriate SQL queries to pull the data out. (I did this all on a Windows machine, by the way, and have no reason to believe that any of it would be different on a Linux machine.) To integrate two databases, I generated a mapping file for each and then combined the two mapping files into one that I used when I started the D2RQ server before making my Pellet queries. I had to rename some namespace prefixes, but the mapping file syntax was pretty easy to understand, as is the &lt;a href=&#34;http://sites.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/spec/&#34;&gt;spec&lt;/a&gt; that describes their syntax. I&amp;rsquo;d like to especially thank Richard Cyganiak, who patiently answered my questions on the &lt;a href=&#34;https://lists.sourceforge.net/lists/listinfo/d2rq-map-devel&#34;&gt;d2rq-map-devel mailing list&lt;/a&gt;. (I&amp;rsquo;m looking forward to checking out the new features of the D2RQ upgrade, which I just learned about this morning.)&lt;/p&gt;
&lt;p&gt;To query the data, I have a script pull all the latest data from the D2RQ server into a file where it&amp;rsquo;s added to the ontology information (which I created with SWOOP, as described below), and then Pellet queries that. Integrating additional RDF-based sources would be easy, whether they come via D2RQ or not; just add them to the same file before querying. This probably won&amp;rsquo;t scale way up, and some digging into the Pellet and D2RQ APIs should make it possible to integrate them more closely for this querying. My next step is to use this same routine to integrate some multi-table databases and to get some non-string data in there. At some point I&amp;rsquo;ll write up how I did all this in more detail and make all the files available so that someone can reproduce it on their own. Meanwhile, I&amp;rsquo;ll probably propose it as a conference topic someplace.&lt;/p&gt;
&lt;p&gt;I love how, when you pick &amp;ldquo;Load&amp;rdquo; and then &amp;ldquo;Ontology&amp;rdquo; from SWOOP&amp;rsquo;s File menu, if you load an RDF file with no ontology information defined, it declares all the predicates it finds as properties. You can save the file without doing anything else, open it up in a text editor, and see the properties all defined there in RDF/OWL syntax. Using SWOOP, you can then specify equivalence and property/subproperty relationships just by pointing and clicking, and then save again. After doing this with the combined address book databases, I pulled the ontology definitions out into a separate file so that when the relational data was updated, I could use D2RQ to pull updated RDF and apply the same ontology definitions to it when querying the new data.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d love to hear suggestions about additional OWL constructs that can let queries pull information from a database that they couldn&amp;rsquo;t have found without that OWL metadata. This last qualification is important—for example, while I understand the domain and range concepts (at least with pizza; &lt;a href=&#34;http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf%20&#34;&gt;PDF&lt;/a&gt;), I&amp;rsquo;d like to know a way that defining the domain or range of a property could let a query do more than it could before these were defined.&lt;/p&gt;
&lt;p&gt;For now, I&amp;rsquo;m pointing and clicking with a free tool to define an ontology that adds value to existing data, and I&amp;rsquo;m psyched.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.jogiles.co.nz&#34; title=&#34;http://www.jogiles.co.nz&#34;&gt;Jonathan Giles&lt;/a&gt; on &lt;a href=&#34;#comment-576&#34;&gt;October 30, 2006 4:10 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hello, good article (and I&amp;rsquo;m glad I stumbled across it on planet RDF). My question is, consider you have a client-side application that sends a large number of queries to the D2RQ server using SPARQL. Have you given any thought to there being some caching layer in between the relational database and the D2RQ server? Or does this already exist? Alternatively, is the response speed good enough such that a cache is not overly necessary?&lt;/p&gt;
&lt;p&gt;I was previously considering a solution where a Jena OWL ontology was populated from a database, which would act as a cache. It would still handle queries using SPARQL however. This blog post suggests a slightly different approach that I would be interested in learning more about.&lt;/p&gt;
&lt;p&gt;Cheers,&lt;br /&gt;
Jonathan Giles.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dowhatimean.net/&#34; title=&#34;http://dowhatimean.net/&#34;&gt;Richard Cyganiak&lt;/a&gt; on &lt;a href=&#34;#comment-577&#34;&gt;October 30, 2006 4:46 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Cool stuff – so this is what you&amp;rsquo;ve been up to. We&amp;rsquo;ve been looking at integrating D2RQ instances through &lt;a href=&#34;http://darq.sourceforge.net/&#34;&gt;federated SPARQL queries&lt;/a&gt;, and it&amp;rsquo;s nice to see an inference-based approach working as well.&lt;/p&gt;
&lt;p&gt;Things you could do with OWL, related to data integration: With IFPs, you can infer that people with the same email address from two different DBs are the same person. With subclass relationships, you can infer that a tech report from DB1 and a press release from DB2 are both documents, and therefore queries for documents shall return both.&lt;/p&gt;
&lt;p&gt;Jonathan: If you have enough memory, then loading everything into an in-memory model before doing inference will always be much faster. Databases are OK for doing SPARQL queries, but are no good for doing OWL inference.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-578&#34;&gt;October 30, 2006 5:14 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Richard! Jonathan&amp;ndash;my plan is to continue my research more horizontally than vertically, i.e. to investigate the potential role of other tools before I try to add a lot of scale and efficiency to the system created with this particular set of tools. Mostly, I want to push the possible role that OWL can play in this system.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>Finding free content</title>
      <link>https://www.bobdc.com/blog/finding-free-content/</link>
      <pubDate>Mon, 23 Oct 2006 08:15:57 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/finding-free-content/</guid>
      
      
<description><div>People who should know better often think it&#39;s easy.</div><div>&lt;p&gt;A few weeks ago I wrote about &lt;a href=&#34;https://www.bobdc.com/blog/all-the-personal-data-you-want&#34;&gt;free personal data&lt;/a&gt; that was really just randomly generated names and contact information created for some tests. Coherent prose by knowledgeable people is something that you can&amp;rsquo;t generate with a Python script, and it&amp;rsquo;s interesting to see the schemes that some people have devised to find such content.&lt;/p&gt;
&lt;p&gt;In the dot com days, I heard from several organizations that were each putting together a Portal for All Things XML. They all told me how great it would be for me to write for them for free because it would be such great exposure. Other XML &amp;ldquo;experts&amp;rdquo; would be writing there as well, you see, making this the central place for anyone concerned with XML. Weblogs hadn&amp;rsquo;t caught on yet, but since &lt;a href=&#34;http://mailman.ic.ac.uk/pipermail/xml-dev/1997-February/000000.html&#34;&gt;early 1997&lt;/a&gt; the &lt;a href=&#34;http://www.xml.org/xml/xmldev.shtml&#34;&gt;xml-dev&lt;/a&gt; mailing list had already been to XML what the &lt;a href=&#34;http://groups-beta.google.com/group/comp.text.sgml/topics&#34;&gt;comp.text.sgml&lt;/a&gt; Usenet group had been to SGML: the central place where people who knew XML well or wanted to know more exchanged ideas and opinions on related issues. If I had an XML-related idea or question and wanted opinions from key players, I knew that that was the place to take it. Weblogs still hadn&amp;rsquo;t supplanted that, and &amp;ldquo;portals&amp;rdquo; never would.&lt;/p&gt;
&lt;p&gt;I did write for XML.com, which offered the best reason of all to write for them: they paid, as did IBM developerWorks and various short-lived print magazines with &amp;ldquo;XML&amp;rdquo; in their title. When O&amp;rsquo;Reilly started the O&amp;rsquo;Reilly Network, I thought that that was a nice place to try the weblogging thing, because it had an audience in place that would see what I wrote, and the idea of trying to convince people to subscribe to a feed of just me was intimidating. (When I started the bobdc.blog weblog, I immediately contacted &lt;a href=&#34;http://planet.xmlhack.com/&#34;&gt;Planet XMLhack&lt;/a&gt; and &lt;a href=&#34;http://planetrdf.com/&#34;&gt;Planet RDF&lt;/a&gt; about becoming part of those feeds—it was like getting a distribution deal.) The O&amp;rsquo;Reilly Network weblog didn&amp;rsquo;t pay, but there was no pressure, and I thought it would be an interesting experiment to have a weblog devoted to one topic: linking.&lt;/p&gt;
&lt;p&gt;It looks like O&amp;rsquo;Reilly has shifted their online publishing emphasis away from paid articles to the O&amp;rsquo;Reilly Network weblogs. XML.com used to have three new articles every week, and it was part of my Wednesday morning routine to see what they were; now they only have one, because they&amp;rsquo;re more interested in relying on the free content of the O&amp;rsquo;Reilly Network blogs. When Bloglines tells me that the Planet XML feed has a new entry &amp;ldquo;by XML.com&amp;rdquo;, it&amp;rsquo;s more likely to be some musings completely unrelated to XML than it is to be a new XML.com article. Meanwhile, &lt;a href=&#34;http://copia.ogbuji.net/blog/&#34;&gt;Uche&lt;/a&gt;, &lt;a href=&#34;http://times.usefulinc.com/&#34;&gt;Edd&lt;/a&gt;, &lt;a href=&#34;http://dubinko.info/blog/&#34;&gt;Micah&lt;/a&gt;, &lt;a href=&#34;http://clarkparsia.com/weblog/&#34;&gt;Kendall&lt;/a&gt; and others of us with less distinctive first names have started our own weblogs on our own domains instead of letting O&amp;rsquo;Reilly make advertising revenue from content that we provide to them for free. (Note to &lt;a href=&#34;http://www.oreillynet.com/pub/au/1712&#34;&gt;Rick Jelliffe&lt;/a&gt;: join us&amp;hellip; join us&amp;hellip;) It seems odd that O&amp;rsquo;Reilly would put more emphasis on the weblogs as people move away from putting their postings there; I guess for some people, the O&amp;rsquo;Reilly branding gives them &amp;ldquo;good exposure,&amp;rdquo; and for O&amp;rsquo;Reilly, free is better.&lt;/p&gt;
&lt;p&gt;Academic publishers have received quality free content for at least a hundred years, but their authors had an incentive, because it advanced their careers in a publish-or-perish sort of way. This model is starting to &lt;a href=&#34;http://math.ucr.edu/home/baez/journals.html&#34;&gt;break up&lt;/a&gt;, especially in the world of math, where the journal you publish in is not quite as important as what you publish. (In expensive fields such as biology, research grants are tied more closely to where you publish as an indicator of your work&amp;rsquo;s importance.)&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;good exposure&amp;rdquo; argument works for the prominent academic journals, it works for YouTube, and it has worked for &lt;a href=&#34;http://abc.go.com/primetime/americasfunniest/index.html&#34;&gt;America&amp;rsquo;s Funniest Home Videos&lt;/a&gt; since fourteen years before YouTube existed. The Publishing 2.0 blog &lt;a href=&#34;http://publishing2.com/2006/09/14/are-users-who-generate-content-receiving-equal-pay-for-equal-work/&#34;&gt;frets&lt;/a&gt; about whether people who contribute to Frito Lay&amp;rsquo;s new campaign are duly compensated, but the contributors get a shot at the most exposure in all of US media.&lt;/p&gt;
&lt;p&gt;Just this past Sunday I got an email about &lt;a href=&#34;http://www.scribd.com/word/index&#34;&gt;Scribd&lt;/a&gt;, which is &amp;ldquo;kind of like YouTube for publishing, and is meant to be an alternative to a blog&amp;rdquo;, according to the email. Much of the point of YouTube was to simplify the public posting of video; the public posting of prose was never that difficult. As an alternative to a blog, Scribd&amp;rsquo;s pitch is that the timestamped nature of blog postings makes them look stale after a while, so they hope that their alternative will appeal more to someone who posts two short stories in February and then another one in November. According to their &lt;a href=&#34;http://www.scribd.com/static/tour#&#34;&gt;30 second tour&lt;/a&gt;, they &amp;ldquo;get wide readership for your work by people who care,&amp;rdquo; although clicking &amp;ldquo;SEE MORE&amp;rdquo; after that line displays information about how Google indexes their content (really!) and Scribd tags and categorizes the content, with no real explanation of how the &amp;ldquo;people who care&amp;rdquo; will come to read your work on their site. I know the answer, and it gets back to the YouTube analogy: they hope people will be blogging and emailing that they saw a funny/interesting/wacky piece on Scribd, just as we all do now with YouTube. They have a chicken-egg situation with their building of a content collection and their building of an audience, but if they can hustle some good press it might work.&lt;/p&gt;
&lt;p&gt;One interesting attempt at garnering free content is the recent launch, complete with a &lt;a href=&#34;http://gilbane.com/gr_news_9.19.06.html&#34;&gt;press release&lt;/a&gt;, of a &lt;a href=&#34;http://gilbane.com/ctoblog/&#34;&gt;Content Technology CTO Blog&lt;/a&gt; by the Gilbane Group (&amp;ldquo;Content Technologies, Trends &amp;amp; Advice&amp;rdquo;). They provide it &amp;ldquo;as a service to the content and information technology community. The purpose of the blog is to facilitate ongoing discussion and debate on technologies, approaches and architectures relevant to enterprise content applications,&amp;rdquo; according to its &lt;a href=&#34;http://gilbane.com/ctoblog/archives/2006/07/the_new_content_technology_cto.html&#34;&gt;FAQ&lt;/a&gt; and first posting. Since that posting, there have been a total of four more postings from the thirteen listed contributors, with the last being over a month ago. I have a good topic for discussion on the Content Technology CTO Blog: what incentive can you give knowledgeable people to contribute content for free? The Gilbane Group can only hope that listed contributors such as &lt;a href=&#34;http://bill.cava.us/&#34;&gt;Bill Cava&lt;/a&gt; and &lt;a href=&#34;http://newton.typepad.com/&#34;&gt;John Newton&lt;/a&gt; add their opinions on this topic to the Gilbane blog and not their own.&lt;/p&gt;
&lt;p&gt;If anyone wants to write for me for free, let me know. We could call it &amp;ldquo;The Snee Report&amp;rdquo; or something, as long as I own &lt;a href=&#34;http://www.snee.com/about.html&#34;&gt;the domain name&lt;/a&gt;. Advertising revenue won&amp;rsquo;t be enough to even bother with splitting it up, so I&amp;rsquo;ll keep it myself&amp;hellip; but just think of the exposure!&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://radar.oreilly.com&#34; title=&#34;http://radar.oreilly.com&#34;&gt;Tim O&amp;rsquo;Reilly&lt;/a&gt; on &lt;a href=&#34;#comment-571&#34;&gt;October 24, 2006 8:48 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob &amp;ndash; You&amp;rsquo;re right to note that O&amp;rsquo;Reilly has cut back on the number of xml.com articles we publish. And I&amp;rsquo;m not happy about it.&lt;/p&gt;
&lt;p&gt;But when you imply that we&amp;rsquo;re doing this because we&amp;rsquo;re just hoping for a free ride, I think you&amp;rsquo;re being a bit unfair. Yes, we are hoping to keep xml.com a valuable gathering place by means of free content, but that&amp;rsquo;s because the paid articles have never made any money.&lt;/p&gt;
&lt;p&gt;The sad fact is that CPMs are very low on technical content, and that while the O&amp;rsquo;Reilly Network as a whole has always made a bit of money, it&amp;rsquo;s been because of a very small number of articles that paid for themselves hundreds of times over, and free content. When we analyze costs vs. revenue from ORN articles, we see that only a tiny percentage of articles actually cover their costs. (And I mean tiny, fewer than a hundred of the many thousands of paid articles we&amp;rsquo;ve published over the years.)&lt;/p&gt;
&lt;p&gt;At our current average ad rates, an article has to get something like 60-80,000 page views to break even&amp;ndash;50,000 to cover just what we pay the authors. We could have addressed this by whoring after page views with sensational content, or tilting our online publishing towards more advertiser-friendly higher-CPM topics. But we&amp;rsquo;ve wanted to continue serving hardcore technical communities, and we hoped we could do so by harnessing the enthusiasm of people who want to write about their topics just because they love them.&lt;/p&gt;
&lt;p&gt;So while you position this as if we&amp;rsquo;ve been trying to get a free ride, it&amp;rsquo;s actually the other way around. We&amp;rsquo;ve been trying to make a business out of something that probably should remain (as it is once again becoming) a gathering of enthusiasts.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;re well aware that in the age of RSS, the advantages of aggregating content in one place are reduced, but we like to think that there is still a place for a site like xml.com, even if it only aggregates pointers to the excellent writing that folks like you, Edd, Kendall, and others are doing on your own blogs. And yes, we do think that for lesser known bloggers, having a blog on the O&amp;rsquo;Reilly Network will provide advantages of exposure and credibility. But it is unfortunate if those bloggers resort to random musings rather than on-topic postings.&lt;/p&gt;
&lt;p&gt;Online publishing is in transition. We&amp;rsquo;re all trying to find the best way forward. As you learn what works for you, I hope you&amp;rsquo;ll continue to report on it here.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-572&#34;&gt;October 24, 2006 9:42 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Tim. I know that you&amp;rsquo;re running a business with bills to pay. I don&amp;rsquo;t mean to sound accusing when I say &amp;ldquo;for O&amp;rsquo;Reilly, free is better,&amp;rdquo; because free is better for everyone, including people reading XML.com articles without paying for them. I&amp;rsquo;m just sorry that the business model can&amp;rsquo;t support the higher percentage of relevant, professionally written and edited content that used to be available.&lt;/p&gt;
&lt;p&gt;By Kendall on &lt;a href=&#34;#comment-573&#34;&gt;October 24, 2006 10:24 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hmm, while I might offer one or another divergent opinion here (most ironically: I&amp;rsquo;ve never published about XML on any weblog whatever, much less one not connected to XML.com. I&amp;rsquo;ve only ever written about XML for $$! :)), I will add another point that I know both Bob and Tim know:&lt;/p&gt;
&lt;p&gt;Yes, the online world is struggling to find a sustainable business model (non-porn, biz model, that is), but the &lt;em&gt;other&lt;/em&gt; issue that is confronting XML.com specifically is that XML has become infrastructure. Because of its success and ubiquity, the process of learning and teaching XML has been pushed off onto lots of other mechanisms (vendor training, academia, etc).&lt;/p&gt;
&lt;p&gt;XML was the hot thing for longer than most hot things, but even its star has faded. That&amp;rsquo;s right and fine and a mark of its success. So XML.com has to re-invent itself, and it&amp;rsquo;s done that in part by focusing on other, related things like AJAX, Web 2.0, etc.&lt;/p&gt;
&lt;p&gt;So, yes, everyone needs a new biz model. But we in the tech world better build into ours a set of assumptions about the cyclicality of technical issues, since that&amp;rsquo;s just an unavoidable fact of our working life.&lt;/p&gt;
&lt;p&gt;Oh, and I agree with Tim: I&amp;rsquo;m not happy about only publishing one piece per week either, but we are publishing more PDFs these days in the place of those articles. Of course those PDFs cost $$, so we&amp;rsquo;ll see if that&amp;rsquo;s something that the audience is interested in.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://norman.walsh.name/&#34; title=&#34;http://norman.walsh.name/&#34;&gt;Norman Walsh&lt;/a&gt; on &lt;a href=&#34;#comment-574&#34;&gt;October 26, 2006 8:01 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d be more interested if it was in XML, even XHTML, instead of PDF.&lt;/p&gt;
&lt;p&gt;ObOnTopic: Excellent post, Bob. Sorry for the tangential remark.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Somewhat customized mass publishing</title>
      <link>https://www.bobdc.com/blog/somewhat-customized-mass-publi/</link>
      <pubDate>Tue, 17 Oct 2006 07:57:48 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/somewhat-customized-mass-publi/</guid>
      
      
      <description><div>If you can find an audience of audiences.</div><div>&lt;p&gt;&lt;a href=&#34;http://realtytimes.com/c/sampleplus&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/yourname.jpg&#34; alt=&#34;[realty times demo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I recently received an email from a local friend who I know through his trumpet playing, and his email&amp;rsquo;s signature revealed his day job: he&amp;rsquo;s a real estate agent. The signature pointed to his &lt;a href=&#34;http://realtytimes.com/94/TomBibb&#34;&gt;&amp;ldquo;Realty Times&amp;rdquo; newsletter&lt;/a&gt; that greatly impressed me for about four seconds, as I thought: &amp;ldquo;He&amp;rsquo;s researching, writing, and finding illustrations for this many stories every month?&amp;rdquo; His picture is at the top, with a link to his business&amp;rsquo;s home page, and his phone number, email address, and postal address are all in the footer of the newsletter.&lt;/p&gt;
&lt;p&gt;Seeing &lt;code&gt;/94/TomBibb&lt;/code&gt; after the realtytimes.com domain name of his newsletter&amp;rsquo;s URL suggested to me that other people had a similar, if not identical newsletter, and the &lt;a href=&#34;http://www.google.com/search?hl=en&amp;amp;lr=&amp;amp;safe=off&amp;amp;q=site%3Arealtytimes.com+intitle%3A%22real+estate+update%22&amp;amp;btnG=Search&#34;&gt;right Google search&lt;/a&gt; showed that there were quite a few. The most interesting one is the newsletter for real estate agent &lt;a href=&#34;http://realtytimes.com/c/sampleplus&#34;&gt;Your Name&lt;/a&gt;, a blond woman whose hair adds five inches to her height. Before I saw it, I had planned to title this weblog posting &amp;ldquo;Fake customized publishing,&amp;rdquo; but Ms. Name&amp;rsquo;s newsletter showed me that the newsletter includes slots for &amp;ldquo;YOUR OWN ARTICLE&amp;rdquo; on the right and a &amp;ldquo;FEATURED LINK BOX&amp;rdquo; where you &amp;ldquo;Put a link of your choice here (i.e. listing of the week, helpful consumer site, etc.)&amp;rdquo; The realtytimes.com &lt;a href=&#34;http://realtytimes.com/&#34;&gt;main page&lt;/a&gt; does have news stories, but their main activity seems to be the accumulation of content so that they can charge others for rebranding it.&lt;/p&gt;
&lt;p&gt;Syndication (selling content to be rebranded and repackaged by your customers) is not a new idea. The Internet makes finding syndication customers and distributing to them easier, and it makes becoming a customer easier—it&amp;rsquo;s not very practical for a one-person business to become a customer of a grand old syndication network like the &lt;a href=&#34;http://www.ap.org/&#34;&gt;Associated Press&lt;/a&gt; or &lt;a href=&#34;http://www.kingfeatures.com/&#34;&gt;King Features&lt;/a&gt;. The Internet also makes syndication less necessary—why read a New York Times story on your local newspaper&amp;rsquo;s web site when you can read it at &lt;a href=&#34;http://www.nytimes.com/&#34;&gt;nytimes.com&lt;/a&gt;? To come up with a new, Internet-based syndication business model, as realtytimes.com did, you need to find customers with their own audiences and little reason to go after each other&amp;rsquo;s audiences. Local real estate markets are a great choice. Does anyone know of other examples of Internet-based syndication where someone found an audience of customers, each with their own separate audience to publish to?&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ccil.org/~cowan&#34; title=&#34;http://www.ccil.org/~cowan&#34;&gt;John Cowan&lt;/a&gt; on &lt;a href=&#34;#comment-566&#34;&gt;October 17, 2006 8:18 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Although &lt;a href=&#34;http://en.wikipedia.org/wiki/Associated_Press&#34;&gt;AP&lt;/a&gt; certainly began as a syndication network (indeed, as a &lt;a href=&#34;http://en.wikipedia.org/wiki/Virtual_corporation&#34;&gt;virtual corporation&lt;/a&gt;: it was originally just a bunch of agreements, some leased telegraph lines, and a few reporters covering the &lt;a href=&#34;http://en.wikipedia.org/wiki/Mexican%E2%80%93American_War&#34;&gt;Mexican War&lt;/a&gt;), the overwhelming majority of its content that is of more than local interest is now self-generated.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://ebiquity.umbc.edu/blogger/&#34; title=&#34;http://ebiquity.umbc.edu/blogger/&#34;&gt;Tim Finin&lt;/a&gt; on &lt;a href=&#34;#comment-567&#34;&gt;October 18, 2006 8:22 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://ebiquity.umbc.edu/blogger/splog-software-from-hell/&#34;&gt;Splogs&lt;/a&gt; provide an example from even deeper into the dark side. Sometimes we just can&amp;rsquo;t escape the fact that we are from the third species of chimpanzees.&lt;/p&gt;
&lt;p&gt;By Paul Anderson on &lt;a href=&#34;#comment-568&#34;&gt;October 20, 2006 10:06 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This idea is quite common in UK politics. A national party will produce a single general newsletter describing all the good things MP X has done, provide it to each local party and the local party then replaces the label MP X with the name and photo of their local MP and then distributes. I&amp;rsquo;ve not seen this done with e-mail or MP&amp;rsquo;s website content though (yet!).&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-569&#34;&gt;October 20, 2006 10:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Paul,&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s an excellent example. A national political party is a centralized organization with the resources to generate professional content that they can provide to geographically localized organizations for customization.&lt;/p&gt;
&lt;p&gt;Thinking about it as a marketing campaign that seeks to combine a consistent message with localized content, I realized that many wide-scale advertising campaigns for commercial products do something similar&amp;ndash;for example, Goodyear or Michelin might provide articles on tire care tips to auto repair shops to incorporate into their own hard copy or online publicity.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>All the personal data you want</title>
      <link>https://www.bobdc.com/blog/all-the-personal-data-you-want/</link>
      <pubDate>Wed, 04 Oct 2006 08:55:39 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/all-the-personal-data-you-want/</guid>
      
      
      <description><div>Except that it&#39;s all fake.</div><div>&lt;p&gt;I needed some sample address book data for a project that I&amp;rsquo;m working on. Because of the number of people who may see it, I didn&amp;rsquo;t want to use real address book entries, so I wrote some &lt;a href=&#34;http://www.snee.com/xml/misc/generateAddrBook.zip&#34;&gt;Python scripts&lt;/a&gt; to generate some.&lt;/p&gt;
&lt;p&gt;I spread it across a few scripts because I wanted to generate data for different schemas. I put the main data generation functions in one file and then call those functions and format the data in scripts that are specialized for their particular output format. You can use these to generate data for a relational database, XML, your favorite RDF flavor, or whatever you like. The basic library has functions such as &lt;code&gt;firstName()&lt;/code&gt; and &lt;code&gt;zipCode()&lt;/code&gt; to generate random values, with some, like &lt;code&gt;middleName()&lt;/code&gt; and &lt;code&gt;note()&lt;/code&gt;, sometimes returning nothing. I have two scripts that use the library: one generates a CSV file that emulates one exported from Microsoft Outlook 2003, and the other emulates the CSV address file exported by Eudora 7. (Did you know that Eudora can&amp;rsquo;t import the CSV files that it exports?)&lt;/p&gt;
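&lt;p&gt;The pattern behind the library is simple enough to sketch; this toy version (name lists abbreviated, probabilities my own) mimics the real scripts&amp;rsquo; habit of sometimes returning an empty middle name:&lt;/p&gt;

```python
import random

FIRST_NAMES = ["James", "Mary", "Robert", "Patricia", "John", "Jennifer"]
LAST_NAMES = ["Smith", "Johnson", "Williams", "Brown", "Jones"]

def firstName():
    return random.choice(FIRST_NAMES)

def middleName():
    # Like the real library, sometimes return nothing at all.
    return random.choice(FIRST_NAMES) if random.random() < 0.6 else ""

def zipCode():
    return "%05d" % random.randint(501, 99950)

def addressBookRow():
    """One row of fake address-book data, ready for CSV output."""
    return [firstName(), middleName(), random.choice(LAST_NAMES), zipCode()]
```

&lt;p&gt;A format-specific script then just calls the row function in a loop and writes the fields out in whatever column order Outlook or Eudora expects.&lt;/p&gt;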
&lt;p&gt;The data is pretty US-oriented, but a few tweaks should adapt it for other countries. It randomly picks first and middle names from the US census list of most popular &lt;a href=&#34;http://www.census.gov/genealogy/names/dist.male.first%20&#34;&gt;male&lt;/a&gt; and &lt;a href=&#34;http://www.census.gov/genealogy/names/dist.female.first&#34;&gt;female&lt;/a&gt; names and surnames from the census list of most popular &lt;a href=&#34;http://www.census.gov/genealogy/names/dist.all.last&#34;&gt;last names&lt;/a&gt;. It took very little web searching to find the most popular &lt;a href=&#34;http://www.santacruzpl.org/readyref/files/q-s/stnames.shtml&#34;&gt;street names&lt;/a&gt; and &lt;a href=&#34;http://www.hoobly.com/m/popular&#34;&gt;US Cities&lt;/a&gt;, and for employer names I went with the &lt;a href=&#34;http://en.wikipedia.org/wiki/List_of_Fortune_500#401-500&#34;&gt;last 100&lt;/a&gt; of the Fortune 500.&lt;/p&gt;
&lt;p&gt;The generated data has plenty of incongruities. Middle names are randomly picked separately from first names, so male and female names are often mixed. The same happens with city and state names, so that Albert Victoria Freeman Jr. may live in Baltimore, California. To convert an employer name to a domain name for a work email address, I just took out spaces and punctuation, converted to lower-case, and put &amp;ldquo;.com&amp;rdquo; at the end, which can result in some long domain names.&lt;/p&gt;
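The zip of scripts linked above has the real code; as a rough, hypothetical sketch of the shape of such a generator (with made-up name, street, city, and employer lists standing in for the census and Fortune 500 data), something like this works:

```python
import random

# Made-up stand-ins for the census-derived lists the real scripts use.
FIRST_NAMES = ["James", "Mary", "Robert", "Linda", "Albert", "Victoria"]
LAST_NAMES = ["Smith", "Johnson", "Freeman", "Jones"]
STREETS = ["Second", "Park", "Main", "Oak"]
CITIES = ["Baltimore", "Houston", "Columbus"]
STATES = ["California", "New York", "Texas"]
EMPLOYERS = ["Acme Products, Inc.", "Globex Corp.", "Initech"]

def middle_name():
    # Like middleName() in the post: sometimes returns nothing.
    return random.choice(FIRST_NAMES) if random.random() < 0.5 else ""

def work_email(first, last, employer):
    # The post's rule: drop spaces/punctuation, lower-case, append ".com".
    domain = "".join(ch for ch in employer if ch.isalnum()).lower() + ".com"
    return "%s.%s@%s" % (first.lower(), last.lower(), domain)

def record():
    first, last = random.choice(FIRST_NAMES), random.choice(LAST_NAMES)
    return {
        "name": " ".join(n for n in [first, middle_name(), last] if n),
        "street": "%d %s St." % (random.randint(1, 9999), random.choice(STREETS)),
        # City and state are picked independently, hence the incongruities.
        "city": random.choice(CITIES),
        "state": random.choice(STATES),
        "zip": "%05d" % random.randint(0, 99999),
        "email": work_email(first, last, random.choice(EMPLOYERS)),
    }

print(record())
```

Formatting the dictionary that `record()` returns is then the job of the per-schema scripts, whether the target is CSV, XML, or RDF.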
&lt;p&gt;I&amp;rsquo;ve &lt;a href=&#34;http://www.snee.com/bob/worksch.html#i1&#34;&gt;always enjoyed&lt;/a&gt; generating random content that faked the appearance of semantic value. One event in particular inspired me about twenty-three years ago, when the only programming languages I knew were Microsoft Basic and dBase II. I was in the early stages of a &amp;ldquo;poetry&amp;rdquo; generation program that only had seven or eight possible verbs, and all the nouns were pronouns, and it came out with this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;It thinks. 
It scares her.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Try to picture it on green and white paper in a dot matrix font.) The heart of all of these is the random function; when coding for fun, seeing different output each time is often more entertaining than consistent output. I&amp;rsquo;ve recently figured out how I can generate multi-part music from an XSLT script, which I&amp;rsquo;ll make public somewhere once I have the time to actually implement it and write it up.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-563&#34;&gt;October 4, 2006 10:18 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Fun stuff :)&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve recently been wondering about making a hosted version of the Dada Engine (the system behind the Postmodernism generator you link to). I was actually thinking of it for language learning apps, after noticing that many language courses seem based around exercising one&amp;rsquo;s ability to translate variations on a theme (&amp;ldquo;I want to&amp;hellip;&amp;rdquo;, &amp;ldquo;You need to &amp;hellip;&amp;rdquo;, &amp;ldquo;We used to&amp;hellip;&amp;rdquo;, / &amp;ldquo;eat spaghetti&amp;rdquo; / &amp;ldquo;drink red|white|green wine&amp;rdquo; / &amp;ldquo;quickly&amp;rdquo; &amp;ldquo;slowly&amp;rdquo; / &amp;ldquo;tonight&amp;rdquo; &amp;ldquo;tomorrow&amp;rdquo; &amp;ldquo;every day&amp;rdquo; &amp;hellip;etc.)&lt;/p&gt;
&lt;p&gt;After playing a little (&lt;a href=&#34;http://spypixel.com/2006/spanglish/testme.cgi&#34;&gt;http://spypixel.com/2006/spanglish/testme.cgi&lt;/a&gt; &lt;a href=&#34;http://spypixel.com/2006/spanglish/&#34;&gt;http://spypixel.com/2006/spanglish/&lt;/a&gt; &amp;hellip;) I realised my grammar skills (machine and human language!) weren&amp;rsquo;t up to it, &amp;hellip; and maybe some sort of wiki or collaborative effort would let people write better dada engine scripts communally.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve a hunch that a hosted dada engine system could catch on, &amp;hellip; but what it really needs is some changes to give a bit of a UI for grammar creation, and to have some mechanisms for modularisation so that sentence-fragments can more easily be shared across a group of users.&lt;/p&gt;
&lt;p&gt;File under: ProcrastinationOpportunities :)&lt;/p&gt;
&lt;p&gt;By deltabob on &lt;a href=&#34;#comment-564&#34;&gt;October 5, 2006 7:56 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Ah&amp;hellip;green bar paper. I miss it so. I still have a box somewhere of old email printouts from college on green bar paper.&lt;/p&gt;
&lt;p&gt;I like the idea of the randomly generated name/address info. Sometimes when looking into the people finder databases, it feels like that&amp;rsquo;s how they were populated.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>w2k batch file programming</title>
      <link>https://www.bobdc.com/blog/w2k-batch-file-programming/</link>
      <pubDate>Wed, 20 Sep 2006 09:06:26 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/w2k-batch-file-programming/</guid>
      
      
      <description><div>Windows batch files have come a long way since I last paid attention.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.xmission.com/~comphope/sethlp.htm&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/cprompt.jpg&#34; alt=&#34;[cmd window icon]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Besides Emacs and Firefox, the program I use most is probably the Windows command prompt. My daughters make fun of it, and telling them that in the old days the entire computer screen was just one big version of that window is like telling them that at their age, I didn&amp;rsquo;t know that &amp;ldquo;The Wizard of Oz&amp;rdquo; went from black and white to color when Dorothy landed in Oz because our TV set showed everything in black and white. Still, the &amp;ldquo;DOS box&amp;rdquo; is the command line interface to the operating system that I need to use the most.&lt;/p&gt;
&lt;p&gt;Years ago I often tried to push the capabilities of the batch file language further, but it&amp;rsquo;s not a real scripting language. To get beyond simple tasks, I would just write a perl script to do what I needed or else make the argument that the process in question was better off running on a Unix-based box. Recently, however, I learned that the Windows 2000 command language can do more proper programming language tasks than I realized, such as finding substrings of variables, prompting for input, differentiating between string and numeric variables, and even generating random numbers.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll keep the description to a minimum and just demonstrate. First, substrings:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   C:\&amp;gt;set teststring=&amp;quot;abcdefghijkl&amp;quot;

  
   C:\&amp;gt;set testsubstring=%teststring:~2,4%

  
   C:\&amp;gt;echo %testsubstring%
   bcde
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Prompting the user and storing the result in a variable:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   C:\&amp;gt;set /P color=&amp;quot;what color?&amp;quot;
   what color?red

  
   C:\&amp;gt;echo %color%
   red
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Numeric variables:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   C:\&amp;gt;set i=3

  
   C:\&amp;gt;set x=%i%+5

  
   C:\&amp;gt;echo %x%
   3+5

  
   C:\&amp;gt;set /a i=3
   3
   C:\&amp;gt;set /a x=%i%+5
   8
   C:\&amp;gt;echo %x%
   8
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;My favorite thing in any programming language: a random function!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   C:\&amp;gt;echo %random%
   28278

  
   C:\&amp;gt;echo %random%
   3127

  
   C:\&amp;gt;set /a i=%random% % 3
   1
   C:\&amp;gt;echo %i%
   1
   C:\&amp;gt;set /a i=%random% % 3
   1
   C:\&amp;gt;set /a i=%random% % 3
   0
   C:\&amp;gt;set /a i=%random% % 3
   1
   C:\&amp;gt;set /a i=%random% % 3
   0
   C:\&amp;gt;set /a i=%random% % 3
   2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can stack commands on one line, and if you use two ampersands, the failure of one command means that the command processor won&amp;rsquo;t try to execute succeeding ones:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   C:\&amp;gt;cd \bin &amp;amp;&amp;amp; echo cd successful
   cd successful

  
   C:\bin&amp;gt;cd \xx &amp;amp;&amp;amp; echo cd successful
   The system cannot find the path specified.

  
   C:\bin&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Something that isn&amp;rsquo;t part of the command language, but which an Emacs geek like me was happy to learn about, is tabbing for completion of both directory and file names. In addition to completing a name, it often adds quotes for you:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   C:&amp;gt;cd doc[tab]
   C:\&amp;gt;cd &amp;quot;Documents and Settings&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I learned about the substring trick on a mailing list, and when I did a few searches to find out more, I learned about the others at two articles that I found: &lt;a href=&#34;http://malektips.com/windows_2000_and_dos_help_and_tips.html&#34;&gt;Windows 2000 and DOS Help and Tips&lt;/a&gt; and &lt;a href=&#34;http://www.xmission.com/~comphope/sethlp.htm&#34;&gt;Information about the SET command&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m guessing that for each person reading this, some of this is old news, but that one or two things are new to you. I know that bash, sh, and other Unix scripting shells are in &lt;a href=&#34;http://www.cygwin.com/&#34;&gt;cygwin&lt;/a&gt; and therefore available to Windows users, so that there&amp;rsquo;s still no reason to write complex scripts in the Windows batch command language. Still, it was fun to see that an old dog that I&amp;rsquo;d lived with for many years had learned a few new tricks when I wasn&amp;rsquo;t watching.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://kontrawize.blogs.com/kontrawize/&#34; title=&#34;http://kontrawize.blogs.com/kontrawize/&#34;&gt;Anthony B. Coates&lt;/a&gt; on &lt;a href=&#34;#comment-552&#34;&gt;September 25, 2006 5:21 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You should also check out the information about the FOR command, which since Win2K has been able to recurse over all the files in a directory tree, among other things.&lt;/p&gt;
&lt;p&gt;Cheers, Tony.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-553&#34;&gt;September 25, 2006 7:44 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Tony! Also, I didn&amp;rsquo;t mention the IF command, which doesn&amp;rsquo;t seem to have progressed over the years, but does give batch files a fairly basic programming language capability. (And, I suppose, GOTO, something that the younger folk may not even recognize.)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>UMD is number one!</title>
      <link>https://www.bobdc.com/blog/umd-is-number-one/</link>
      <pubDate>Tue, 12 Sep 2006 08:19:04 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/umd-is-number-one/</guid>
      
      
      <description><div>At more than one thing, apparently.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.mindswap.org/&#34;&gt;&lt;img src=&#34;http://www.umresearch.umd.edu/images2/webglobe.gif&#34; alt=&#34;[UMD logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you were going to do academic research on the semantic web, the University of Maryland would have to merit your serious attention, because a lot of important work has gone on there and continues to do so. Is it pure coincidence that &lt;a href=&#34;http://www.hightimes.com/ht/entertainment/content.php?bid=976&amp;amp;aid=24&#34;&gt;High Times magazine picked them&lt;/a&gt; as the number one school for &amp;ldquo;counterculture activity&amp;rdquo;? Keep in mind that, as the author of the article &amp;ldquo;The High Times Guide to Higher Education&amp;rdquo; put it, a key criterion for school selection is &amp;ldquo;Is this a good school for stoners?&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://ebiquity.umbc.edu/blogger/&#34; title=&#34;http://ebiquity.umbc.edu/blogger/&#34;&gt;Tim Finin&lt;/a&gt; on &lt;a href=&#34;#comment-537&#34;&gt;September 12, 2006 9:40 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sigh&amp;hellip; There is &lt;a href=&#34;http://umbc.edu/&#34;&gt;another University of Maryland campus&lt;/a&gt; that is also doing a lot of &lt;a href=&#34;http://ebiquity.umbc.edu/tag/semantic%20web&#34;&gt;semantic web research&lt;/a&gt;. Unfortunately, we are just not in the same league as &lt;a href=&#34;http://www.umcp.umd.edu/&#34;&gt;UMCP&lt;/a&gt;. Not only did we not even make the High Times list of &lt;a href=&#34;http://www.hightimes.com/ht/entertainment/content.php?bid=976&amp;amp;aid=24&#34;&gt;stoner schools&lt;/a&gt;, we don&amp;rsquo;t even have a football team. Worse yet, our most successful intercollegiate team is the &lt;a href=&#34;http://sta.umbc.edu/orgs/chess/&#34;&gt;Chess team&lt;/a&gt;. It&amp;rsquo;s hopeless &amp;ndash; we&amp;rsquo;ll never catch up.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-538&#34;&gt;September 12, 2006 9:47 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I must admit, I didn&amp;rsquo;t realize that Swoop and Swoogle came from two different UMD campuses until I did a few searches before writing this post. It sounds like there are a lot of people thinking deep thoughts all over UMD, for a variety of reasons&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://ebiquity.umbc.edu/blogger/&#34; title=&#34;http://ebiquity.umbc.edu/blogger/&#34;&gt;Tim Finin&lt;/a&gt; on &lt;a href=&#34;#comment-539&#34;&gt;September 12, 2006 10:51 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Yes, we are all high on the Semantic Web.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>A great feature in AltaVista, alltheweb, and Ask.com, but not in Google</title>
      <link>https://www.bobdc.com/blog/a-great-feature-in-altavista-a/</link>
      <pubDate>Fri, 08 Sep 2006 08:04:58 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-great-feature-in-altavista-a/</guid>
      
      
      <description><div>Searching by date.</div><div>&lt;p&gt;What if you wanted to find documents on the web from before September 11, 2001 that mention Osama Bin Laden? What if you wanted to find the web&amp;rsquo;s earliest mention of Beck Hansen? You can&amp;rsquo;t do this on Google, but you can on &lt;a href=&#34;http://www.altavista.com/web/adv&#34;&gt;AltaVista&amp;rsquo;s advanced search page&lt;/a&gt; (&lt;a href=&#34;http://www.altavista.com/web/results?itag=ody&amp;amp;pg=aq&amp;amp;aqmode=s&amp;amp;aqa=&amp;amp;aqp=osama+bin+laden&amp;amp;aqo=&amp;amp;aqn=&amp;amp;aqb=&amp;amp;kgs=1&amp;amp;kls=0&amp;amp;d2=0&amp;amp;dt=dtrange&amp;amp;dfr%5Bd%5D=1&amp;amp;dfr%5Bm%5D=1&amp;amp;dfr%5By%5D=1980&amp;amp;dto%5Bd%5D=10&amp;amp;dto%5Bm%5D=9&amp;amp;dto%5By%5D=2001&amp;amp;filetype=&amp;amp;rc=dmn&amp;amp;swd=&amp;amp;lh=&amp;amp;nbq=10&#34;&gt;Bin Laden&lt;/a&gt;, &lt;a href=&#34;http://www.altavista.com/web/results?itag=ody&amp;amp;pg=aq&amp;amp;aqmode=s&amp;amp;aqa=&amp;amp;aqp=beck+hansen&amp;amp;aqo=&amp;amp;aqn=&amp;amp;aqb=&amp;amp;kgs=1&amp;amp;kls=0&amp;amp;d2=0&amp;amp;dt=dtrange&amp;amp;dfr%5Bd%5D=1&amp;amp;dfr%5Bm%5D=1&amp;amp;dfr%5By%5D=1980&amp;amp;dto%5Bd%5D=6&amp;amp;dto%5Bm%5D=9&amp;amp;dto%5By%5D=1996&amp;amp;filetype=&amp;amp;rc=dmn&amp;amp;swd=&amp;amp;lh=&amp;amp;nbq=10&#34;&gt;Beck&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;In between Yahoo first popularizing the idea of a web search engine and Google becoming the dominant one, AltaVista had its fifteen minutes as the trendy one among geeks. I now mostly use Google, but when I had free access to my former employer&amp;rsquo;s otherwise non-free search engine, I grew to appreciate the ability to limit a document search by date, especially when an unqualified search yielded an unmanageable number of hits. (A simple AltaVista search for &amp;ldquo;Osama Bin Laden&amp;rdquo; gets 23 million hits, and a Google search gets 28.5 million.)&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.alltheweb.com/advanced?advanced=1&#34;&gt;alltheweb&amp;rsquo;s advanced search page&lt;/a&gt; also offers this, as does &lt;a href=&#34;http://www.ask.com/webadvanced?o=0&amp;amp;l=dir&#34;&gt;Ask.com&amp;rsquo;s advanced search page&lt;/a&gt;. For some odd reason the &lt;a href=&#34;http://www.ask.com&#34;&gt;Ask.com main page&lt;/a&gt; doesn&amp;rsquo;t link to their advanced search page, but their Search Results page does. (&lt;a href=&#34;https://www.teoma.com&#34;&gt;www.teoma.com&lt;/a&gt; links to &lt;a href=&#34;http://search.ask.com/&#34;&gt;an Ask.com page&lt;/a&gt; that does include a link to the advanced search page.) When I filled out the date fields of Ask.com&amp;rsquo;s form, their system inserted the keyword &lt;code&gt;betweendate:&lt;/code&gt; into the query. It also suggested that the keyword it inserted was a spelling mistake, and that perhaps I had meant to insert the two words &amp;ldquo;between date.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The search engines&amp;rsquo; date range constraints don&amp;rsquo;t always work perfectly, but they&amp;rsquo;re often helpful. For example, as with similar searches on the &lt;a href=&#34;http://groups.google.com/advanced_search?hl=en&#34;&gt;Google Groups advanced search page&lt;/a&gt;, it&amp;rsquo;s fun to look for the earliest available mention of a term or name. I once saw an article decrying what the author called &amp;ldquo;nexis journalism,&amp;rdquo; in which someone simply used one of my former employer&amp;rsquo;s products to track the use of a term over time and then wrote an article describing their results; now you can do the same!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>RDFS without RDF/OWL?</title>
      <link>https://www.bobdc.com/blog/rdfs-without-rdfowl/</link>
      <pubDate>Fri, 01 Sep 2006 08:07:47 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdfs-without-rdfowl/</guid>
      
      
      <description><div>Has RDF Schema become merely a layer of RDF/OWL?</div><div>&lt;p&gt;&lt;a href=&#34;http://www.w3.org/TR/rdf-schema/&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/rdfs.png&#34; alt=&#34;[RDF/OWL and RDF schema]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;First, I&amp;rsquo;d like to thank &lt;a href=&#34;http://dannyayers.com/&#34;&gt;Danny&lt;/a&gt;, &lt;a href=&#34;http://www.wasab.dk/morten/blog/&#34;&gt;Morten&lt;/a&gt;, Damian, &lt;a href=&#34;http://dowhatimean.net/&#34;&gt;Richard&lt;/a&gt;, &lt;a href=&#34;http://clarkparsia.com/&#34;&gt;Kendall&lt;/a&gt;, and &lt;a href=&#34;http://www.topquadrant.com/tq_management.htm&#34;&gt;Ralph&lt;/a&gt; for their insights about my &lt;a href=&#34;https://www.bobdc.com/blog/rdfowl-for-data-silo-integrati&#34;&gt;question&lt;/a&gt; last week on existing use of RDF/OWL for integration of separate, non-RDF databases. I&amp;rsquo;ve been doing some follow-up research via email, and will report back on what I find out.&lt;/p&gt;
&lt;p&gt;This week&amp;rsquo;s question about real-world use of RDF metadata: is anybody using RDF Schema for the sake of RDF Schema, or has RDFS become little more than a layer of RDF/OWL? For example, we use &lt;code&gt;rdfs:domain&lt;/code&gt; and &lt;code&gt;rdfs:range&lt;/code&gt; in our RDF/OWL ontologies, but have &lt;code&gt;owl:Class&lt;/code&gt; and &lt;code&gt;owl:subClassOf&lt;/code&gt; completely replaced the use of &lt;code&gt;rdfs:Class&lt;/code&gt; and &lt;code&gt;rdfs:SubClassOf&lt;/code&gt;? In other words, has RDF/OWL, as an extension of RDFS, replaced the use of RDFS by itself, or is anyone still creating and using RDF Schemas that use nothing from the owl namespace?&lt;/p&gt;
&lt;p&gt;My guess is that the answer is no; no one creates RDF Schemas for the sake of RDF Schemas anymore. Tools exist to ease the creation of ontologies without requiring much interaction with RDF/OWL syntax, letting people add all the metadata that RDFS allows and more. RDFS has become just another namespace used to define certain aspects of RDF/OWL ontologies.&lt;/p&gt;
&lt;p&gt;Corrections? Confirmations? Counter-examples?&lt;/p&gt;
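To make concrete what &amp;ldquo;using RDFS by itself&amp;rdquo; buys you, here is a rough sketch, in plain Python with made-up &lt;code&gt;ex:&lt;/code&gt; terms and no RDF toolkit assumed, of two of the core RDFS entailment rules: rdfs2 (domain) and rdfs9 (subClassOf). Note that rdfs:domain licenses an inference about a subject's type rather than acting as a constraint:

```python
# Tiny forward-chaining sketch of two RDFS entailment rules. The ex: terms
# are invented for illustration; triples are plain (subject, predicate,
# object) tuples.
DOMAIN, SUBCLASS, TYPE = "rdfs:domain", "rdfs:subClassOf", "rdf:type"

def entail(triples):
    """Apply rules rdfs2 (domain) and rdfs9 (subClassOf) to a fixpoint."""
    triples = set(triples)
    while True:
        new = set()
        for (s, p, o) in triples:
            # rdfs2: (p rdfs:domain C) + (s p o)  =>  (s rdf:type C)
            for (p2, rel, c) in triples:
                if rel == DOMAIN and p2 == p:
                    new.add((s, TYPE, c))
            # rdfs9: (B rdfs:subClassOf A) + (s rdf:type B) => (s rdf:type A)
            if p == TYPE:
                for (b, rel, a) in triples:
                    if rel == SUBCLASS and b == o:
                        new.add((s, TYPE, a))
        if new <= triples:
            return triples
        triples |= new

g = entail({
    ("ex:worksFor", DOMAIN, "ex:Person"),
    ("ex:Employee", SUBCLASS, "ex:Person"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
})
# The domain statement *infers* a type; nothing is "violated":
assert ("ex:bob", TYPE, "ex:Person") in g
# And nothing concludes that ex:bob is an ex:Employee; domain statements
# do not flow down to subclasses the way OO-minded readers often expect.
assert ("ex:bob", TYPE, "ex:Employee") not in g
```

Everything the schema vocabulary does by itself is of this inferential flavor, which is part of why the question of using it without OWL arises at all.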
&lt;h2 id=&#34;13-comments&#34;&gt;13 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.dfki.uni-kl.de/~grimnes/&#34; title=&#34;http://www.dfki.uni-kl.de/~grimnes/&#34;&gt;Gunnar Grimnes&lt;/a&gt; on &lt;a href=&#34;#comment-515&#34;&gt;September 1, 2006 9:19 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;My impression, that has no real backing - apart from my gut feeling, is that less people use owl, as more and more realise that really understanding and implementing the inferencing required to make owl work is almost impossible. (AND if you manage it&amp;rsquo;s horribly slow)&lt;br /&gt;
I thought more people use RDFS plus a few selected properties from either OWL or the protege extensions, i.e. cardinality constraints and (inverse)functional properties. (which of course means they dont use the intended RDFS semantics either&amp;hellip; )&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-516&#34;&gt;September 1, 2006 11:03 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t want to begin a discussion here of whether OWL&amp;rsquo;s popularity is increasing or decreasing and why, so I&amp;rsquo;ll skip to your second point: I proposed that people are only using RDFS constraints in combination with OWL, and you said that &amp;ldquo;more people use RDFS plus a few selected properties from either OWL or the protege extensions to set up an ontology.&amp;rdquo; Are you saying that the use of RDFS+Protege extensions is a common use of RDFS without OWL?&lt;/p&gt;
&lt;p&gt;thanks,&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://copia.ogbuji.net/blog&#34; title=&#34;http://copia.ogbuji.net/blog&#34;&gt;Chimezie&lt;/a&gt; on &lt;a href=&#34;#comment-517&#34;&gt;September 1, 2006 11:09 AM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;but have owl:Class and owl:subClassOf completely replaced the use of rdfs:Class and rdfs:SubClassOf? In other words, has RDF/OWL, as an extension of RDFS, replaced the use of RDFS by itself, or is anyone still creating and using RDF Schemas that use nothing from the owl namespace?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;m not aware of owl:subClassOf, and owl:Class (and OWL in general) is only needed if the level of expressiveness needed in an ontology is not sufficiently covered by RDFS. In my experience, people tend to use OWL reflexively since its name suggests that it is an ontology language whereas RDFS isn&amp;rsquo;t.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;RDFS has become just another namespace used to define certain aspects of RDF/OWL ontologies.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Yes, essentially RDFS is really just a convenient subset of RDF/OWL with specific limitations for decidability.&lt;/p&gt;
&lt;p&gt;see: &lt;a href=&#34;http://owl1-1.cs.manchester.ac.uk/Tractable.html#6_RDFS&#34;&gt;http://owl1-1.cs.manchester.ac.uk/Tractable.html#6_RDFS&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-518&#34;&gt;September 1, 2006 11:25 AM&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In my experience, people tend to use OWL reflexively since its name suggests that it is an ontology language whereas RDFS isn&amp;rsquo;t.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I think you hit the nail on the head.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-519&#34;&gt;September 1, 2006 12:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The Swoogle folks should set up an Amazon wishlist&amp;hellip; then we can bug them with questions like this without feeling bad.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d love to see stats.&lt;/p&gt;
&lt;p&gt;RDFS, by design, is just one way of making simple claims about RDF vocabulary. OWL mostly just extends this, but there are a few awkwardnesses, where you&amp;rsquo;re forced to either jump one way, or the other, &amp;hellip; or be verbose and a bit tricksy. E.g., you can say that something is both an owl:Class and an rdfs:Class. Staying in OWL DL is an additional complication, of course&amp;hellip; And we&amp;rsquo;ve got a rules language coming down the pipeline soon, so this really is a neverending story&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-520&#34;&gt;September 1, 2006 1:11 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The tough part for the Swoogle query would be to ask by date, i.e. my question would really be answered by figures that show the total number of RDF Schemas with no use of the OWL namespace vs. the number of OWL ontologies that do use RDFS, broken down by year.&lt;/p&gt;
&lt;p&gt;My gut reaction is that the former figure would go down from year to year as the latter one went up.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://ebiquity.umbc.edu/blogger/&#34; title=&#34;http://ebiquity.umbc.edu/blogger/&#34;&gt;tim finin&lt;/a&gt; on &lt;a href=&#34;#comment-521&#34;&gt;September 1, 2006 2:56 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We wondered about that too and looked into it for a paper to appear in ISWC 2006. I just made a &lt;a href=&#34;http://ebiquity.umbc.edu/blogger/2006/09/01/rdfs-without-rdfowl/&#34;&gt;post&lt;/a&gt; with some data from our paper.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll think about adding a wish list that people can use to suggest new Swoogle features and also pose questions that we might be able to answer from the underlying database.&lt;/p&gt;
&lt;p&gt;Tim&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-522&#34;&gt;September 1, 2006 8:36 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Tim &amp;mdash; Re wishlist, &amp;hellip; even just a page in the ESW wiki might be useful. E.g., I&amp;rsquo;m definitely interested to learn more about FOAF property usage, especially the current mess around names (firstname, surname, given family etc etc). Nice work with the Geo writeup too btw, that was very handy.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://clarkparsia.com/weblog&#34; title=&#34;http://clarkparsia.com/weblog&#34;&gt;Bijan Parsia&lt;/a&gt; on &lt;a href=&#34;#comment-524&#34;&gt;September 3, 2006 12:06 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We have a &lt;a href=&#34;http://www.mindswap.org/papers/2006/survey.pdf&#34;&gt;similar paper&lt;/a&gt; to Tim&amp;rsquo;s in ISWC, with a different focus/analysis.&lt;/p&gt;
&lt;p&gt;First, a minor point: there is no owl:subClassOf. There is only rdfs:subClassOf.&lt;/p&gt;
&lt;p&gt;Second, I never see a point in using rdfs:Class unless you want to (rather artificially) stay in RDFS (oh, ok, you may want to work with RDFS *cough* reasoners, but most of them will do the right thing if you *use* the class as a class, e.g., as the object of a type triple). If you ever want to make use of OWL, it&amp;rsquo;s going to be a bit of a PITA. Not a huge one, I guess, but, eh, why cause that pain.&lt;br /&gt;
&lt;br /&gt;
Third, RDFS is just plain silly. I made an LC (or similar) comment advocating its removal. Alas, that didn&amp;rsquo;t fly :) OWL may be silly too, in a variety of ways, but it at least is fairly substantive in its expressive power. The tractable fragments page&amp;rsquo;s inclusion of RDFS (disclaimer: I work with Bernardo and chatted with him about it) is more for historical reasons, i.e., to isolate the fragment of RDFS that is OWL DL compatible.&lt;/p&gt;
&lt;p&gt;The person who wrote:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Yes, essentially RDFS is really just a convenient subset of RDF/OWL with specific limitations for decidability.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Is not correct. I do not believe that decidability was ever a concern of the designers of RDFS (&amp;ldquo;simplicity&amp;rdquo; perhaps) (I&amp;rsquo;d welcome a pointer to contrary information). RDFS metamodeling is not easily extensible without going undecidable, which is a hint.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;My impression, that has no real backing - apart from my gut feeling, is that less people use owl, as more and more realise that really understanding and implementing the inferencing required to make owl work is almost impossible. (AND if you manage it&amp;rsquo;s horribly slow)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sigh. Why claim so much if it&amp;rsquo;s backed by nothing? In point of fact, it&amp;rsquo;s not impossible or even almost impossible. It&amp;rsquo;s not remotely impossible. And OWL DL reasoning isn&amp;rsquo;t horribly slow (depending on what you do and what you mean by slow). Are there scalability issues? Bien sur! But so? It&amp;rsquo;s true for everything. And normal RDF entailment (with the actual semantics) is intractable too.&lt;/p&gt;
&lt;p&gt;I tend to object to the use of the word &amp;ldquo;constraint&amp;rdquo; with RDFS&amp;hellip;RDFS is not constraint minded at all. To be a constraint, there needs to be a meaningful way to &lt;em&gt;violate&lt;/em&gt; the constraint. Practically speaking, there isn&amp;rsquo;t one in RDFS.&lt;/p&gt;
&lt;p&gt;By Irene Polikoff on &lt;a href=&#34;#comment-526&#34;&gt;September 4, 2006 12:44 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A small nitpick - there is no owl:subClassOf statement. The idea was to layer OWL on top of RDFS, so OWL vocabulary does not add statements if an equivalent statement already exists in RDFS.&lt;/p&gt;
&lt;p&gt;As far as the use of owl:Class, why not? There are no clear disadvantages to using it as opposed to rdfs:Class.&lt;/p&gt;
&lt;p&gt;The problem with using RDFS on its own is that it has very limited utility. On the other hand, RDFS plus a few OWL axioms (namely, inverse, transitive and inverse functional) does quite a bit.&lt;/p&gt;
&lt;p&gt;Going forward I can also see people using RDFS plus rules, possibly by-passing OWL.&lt;/p&gt;
&lt;p&gt;Another problem with RDFS is that it has very unconventional semantics; most people just don&amp;rsquo;t understand it. By this I mean the interaction between domain statements and subclasses. The majority of people who are familiar with notions of classes and properties assume that if class A is in the domain of property p and class B is a subclass of A, then B is also in the domain of property p, because in their mind subclasses are supposed to &amp;lsquo;have&amp;rsquo; all the properties of their parents.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-527&#34;&gt;September 4, 2006 9:03 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;s/Dave Beckett recently suggested using a WIki/Dan Brickley recently suggested using a Wiki/ ;)&lt;/p&gt;
&lt;p&gt;(maybe Dave suggested it too&amp;hellip;)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://ebiquity.umbc.edu/blogger/&#34; title=&#34;http://ebiquity.umbc.edu/blogger/&#34;&gt;tim finin&lt;/a&gt; on &lt;a href=&#34;#comment-529&#34;&gt;September 4, 2006 10:36 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sorry. It was late.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://copia.ogbuji.net&#34; title=&#34;http://copia.ogbuji.net&#34;&gt;chimezie&lt;/a&gt; on &lt;a href=&#34;#comment-534&#34;&gt;September 9, 2006 9:50 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bijan,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Is not correct. I do not believe that decidability was ever a concern of the designers of RDFS (&amp;ldquo;simplicity&amp;rdquo; perhaps) (I&amp;rsquo;d welcome a pointer to contrary information). RDFS metamodeling is not easily extensible without going undecidable, which is a hint.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If decidability wasn&amp;rsquo;t an explicit concern at the time of RDFS&amp;rsquo; inception, it would only be because a fully expressive DL was not on the roadmap for RDF at the time. But I don&amp;rsquo;t think it is a coincidence that RDFS *is* a specific DL language. And that it is a subset of OWL *is* irrefutable - I would think you shouldn&amp;rsquo;t need to be convinced of that.&lt;/p&gt;
&lt;p&gt;And it&amp;rsquo;s sort of tongue in cheek to welcome a contrary pointer when you have none along with your own assertions&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>RDF/OWL for data silo integration?</title>
      <link>https://www.bobdc.com/blog/rdfowl-for-data-silo-integrati/</link>
      <pubDate>Thu, 24 Aug 2006 09:48:35 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdfowl-for-data-silo-integrati/</guid>
      
      
      <description><div>Are the rumors true?</div><div>&lt;p&gt;Earlier this week I &lt;a href=&#34;https://www.bobdc.com/blog/what-data-is-your-metadata-abo&#34;&gt;wrote&lt;/a&gt; about my frustration with metadata as data about data that may never exist—data that ontology designers merely wish that someone else would create around their fabulous ontologies. Lately I&amp;rsquo;ve become interested in the more difficult but useful idea of designing ontologies around existing data in order to get more value from that data. In theory, RDF/OWL descriptions of separate, related data collections make it easier to use those collections together; how does this work in practice?&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.int-media.com/web_integration.asp&#34;&gt;&lt;img src=&#34;http://www.int-media.com/images/database.gif&#34; alt=&#34;[database integration diagram]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;An interesting aspect of looking into new job opportunities last spring was researching what&amp;rsquo;s really going on in the XML and metadata world(s). Consultancies are getting more RDF/OWL work than I realized, especially in work for the U.S. government. I have a theory about this that I welcome people to debunk or, hopefully, back up with examples: one finding of the September 11th commission was that too much data was hidden in disparate silos in different government agencies, making it so difficult to share this data that assembling the right pieces to prevent the wrong people from entering the country (or getting their boarding passes, or whatever) could have happened but didn&amp;rsquo;t. People interested in database integration technology could find &amp;ldquo;Homeland Security&amp;rdquo; funding, and it worked the other way as well: people with this problem to solve heard about a W3C standard technology that could help them and went looking into it.&lt;/p&gt;
&lt;h2 id=&#34;i111&#34;&gt;Government and academic work&lt;/h2&gt;
&lt;p&gt;XML 2004 attendees will remember Department of Homeland Security Metadata Program Manager Michael Daconta&amp;rsquo;s keynote speech, which clearly demonstrated this interest. The Government Symposium on Information Sharing and Homeland Security recently held their &lt;a href=&#34;http://www.ncsi.com/ishs05/index.shtml&#34;&gt;fourth annual symposium&lt;/a&gt;. SICoP, or The Semantic Interoperability Community of Practice (&amp;ldquo;a Special Interest Group within the Knowledge Management Working Group sponsored by the Best Practices Committee of the Chief Information Officers Council, in partnership with the Federal XML Working Group&amp;rdquo;) seems to be doing interesting work in this area, although the full title on their &lt;a href=&#34;http://web-services.gov/&#34;&gt;home page&lt;/a&gt; (&amp;ldquo;Semantic Interoperability (XML Web Services) Community of Practice&amp;rdquo;) conflates two areas that most of us would consider to be completely separate—XML Web Services may play a role in Semantic Interoperability implementations, but they&amp;rsquo;re very different concepts. Maybe they wanted to justify the use of the domain web-services.gov.&lt;/p&gt;
&lt;p&gt;Some web searches on database integration and RDF/OWL, DAML, or RDFS turned up some interesting work in academia. The paper &amp;ldquo;RDF/RDFS-based Relational Database Integration&amp;rdquo; (&lt;a href=&#34;http://ccnt.zju.edu.cn/projects/dartgrid/files/publication/Huajun-ICDE2006.pdf&#34;&gt;PDF&lt;/a&gt;) by four researchers at China&amp;rsquo;s Zhejiang University looks promising; it describes the rewriting of SPARQL queries over RDF into a set of SQL queries and an application of their work at the China Academy of Traditional Chinese Medicine. &amp;ldquo;Knowledge Integration to Overcome Ontological Heterogeneity: Challenges from Financial Information Systems&amp;rdquo; (&lt;a href=&#34;http://ebiz.mit.edu/bgrosof/paps/icis2002-final.pdf&#34;&gt;PDF&lt;/a&gt;) by the Sloan School of Management&amp;rsquo;s Aykut Firat, Stuart Madnick, and Benjamin Grosof discusses the use of RuleML and OWL&amp;rsquo;s predecessor, DAML+OIL, in MIT&amp;rsquo;s COIN (COntext INterchange) project, which seems pretty defunct as of today (&lt;a href=&#34;http://context.mit.edu/~coin/&#34;&gt;dead home page&lt;/a&gt;, &lt;a href=&#34;http://72.14.209.104/search?q=cache:xv80fLNLPlIJ:context.mit.edu/~coin/+%22context+interchange%22+coin&amp;amp;hl=en&amp;amp;gl=us&amp;amp;ct=clnk&amp;amp;cd=1&#34;&gt;Google cache of it&lt;/a&gt;).&lt;/p&gt;
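The core move in that kind of work, rewriting a SPARQL-style triple pattern into SQL over a mapped relational table, can be sketched in a few lines of Python. The table, columns, and ex: property names below are invented for illustration; this isn't from any of the papers:

```python
import sqlite3

# Sketch: treat each column of a relational table as an RDF property, and
# rewrite one triple pattern of the form (?s <property> ?o) into SQL.
# The mapping (table "patients", property -> columns) is made up.
MAPPING = {"ex:name": ("patients", "id", "name"),
           "ex:age": ("patients", "id", "age")}

def rewrite(predicate):
    """Rewrite the pattern (?s <predicate> ?o) into a SQL query string."""
    table, subj_col, obj_col = MAPPING[predicate]
    return f"SELECT {subj_col}, {obj_col} FROM {table}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, name TEXT, age INTEGER)")
conn.execute("INSERT INTO patients VALUES (1, 'Ada', 52)")
rows = conn.execute(rewrite("ex:name")).fetchall()

# Each row becomes a triple: (row URI, ex:name, value)
triples = [(f"ex:patient/{s}", "ex:name", o) for s, o in rows]
assert triples == [("ex:patient/1", "ex:name", "Ada")]
```

A real rewriter also has to join tables when a query's triple patterns share variables, which is where most of the complexity in those papers lives.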
&lt;h2 id=&#34;i114&#34;&gt;Practical work?&lt;/h2&gt;
&lt;p&gt;Database integration has been an important area of computer science research for decades, and finding more papers on this topic that mention RDF/OWL or its predecessors shouldn&amp;rsquo;t be difficult. The government work that I&amp;rsquo;ve found looks like mostly top-down, big-picture design work. This is important work to do, but I&amp;rsquo;d like to find smaller scale working examples of these ideas somewhere: for example, two or more separate collections of non-RDF data, an RDF/OWL ontology describing some of the data in those collections, and something that uses that ontology to use the separate collections together. (Using separate collections of RDF together is a piece of cake—that&amp;rsquo;s part of the point of RDF.)&lt;/p&gt;
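As a toy illustration of the kind of working example I'm asking for, here is a sketch: two non-RDF "collections" with different field names, a minimal field-to-property mapping standing in for an ontology, and one query that spans both. All the names (ex:mbox, the field names) are made up:

```python
# Two silos that store the same kind of fact under different field names.
hr_db = [{"emp_id": "e1", "mail": "ada@example.org"}]       # "relational" source
ldap_dump = [{"uid": "e2", "email": "bob@example.org"}]     # "directory" source

# The "ontology" here is just a mapping of source fields to a shared property.
MAPPINGS = [
    (hr_db, "emp_id", {"mail": "ex:mbox"}),
    (ldap_dump, "uid", {"email": "ex:mbox"}),
]

# Lift both sources into one set of triples using the shared vocabulary.
triples = set()
for source, key, fields in MAPPINGS:
    for record in source:
        for field, prop in fields.items():
            triples.add((record[key], prop, record[field]))

# One query over both silos: every known mailbox, regardless of source.
mboxes = sorted(o for (s, p, o) in triples if p == "ex:mbox")
assert mboxes == ["ada@example.org", "bob@example.org"]
```

The point of the sketch is that once both silos are lifted into the shared vocabulary, the querying side never needs to know which silo a fact came from.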
&lt;p&gt;I&amp;rsquo;d love to see two different relational databases that are used together with the help of RDF/OWL, but I&amp;rsquo;m not holding my breath, because database vendors have been solving this problem without RDF/OWL for years. If the problem held the interest of the Zhejiang University researchers, though, I&amp;rsquo;m not giving up hope. How about RDF/OWL to integrate two or more collections of XML? Some XML and some relational data? XML, relational data, and some industry-specific notation systems or proprietary formats?&lt;/p&gt;
&lt;p&gt;If you know of anything, please let me know in comments here or via private email (see the bottom of my &lt;a href=&#34;http://www.snee.com/bob&#34;&gt;home page&lt;/a&gt; for the address). A description of something going on behind a firewall somewhere is better than nothing, but I&amp;rsquo;d prefer to hear about projects that anyone can reproduce on their own using free software.&lt;/p&gt;
&lt;h2 id=&#34;9-comments&#34;&gt;9 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-499&#34;&gt;August 24, 2006 1:13 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I reckon things have got quite a bit further than you suggest. You might want to check out activity around D2RQ and SquirrelRDF, a couple more SPARQL/RDBMS bridges. Damien Steer has used the latter on HP&amp;rsquo;s internal LDAP directories, very cool.&lt;/p&gt;
&lt;p&gt;A smallish-team commercial contract job I&amp;rsquo;ve been on for a while now involves the merging of medical patient data from different sources (two completely different SQL DBs for starters). Mapping to a common RDF model seems to work pretty well for the integration problem, and SPARQL seems adequate for the querying facilities required.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-500&#34;&gt;August 24, 2006 2:26 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Cool, thanks Danny!&lt;/p&gt;
&lt;p&gt;By Damian on &lt;a href=&#34;#comment-501&#34;&gt;August 24, 2006 4:03 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You might also want to look at &lt;a href=&#34;http://darq.sourceforge.net/&#34;&gt;darq&lt;/a&gt;, which does federated sparql queries. Bastien has been using this with SquirrelRDF over LDAP (iirc) and other data sources, some of which are RDB backed.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.wasab.dk/morten/blog/&#34; title=&#34;http://www.wasab.dk/morten/blog/&#34;&gt;Morten&lt;/a&gt; on &lt;a href=&#34;#comment-502&#34;&gt;August 24, 2006 5:39 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Regarding &amp;ldquo;database vendors have been solving this problem without RDF/OWL for years&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;That is true (I know, I&amp;rsquo;m doing it on a daily basis), but that doesn&amp;rsquo;t mean that new and better ways and methods will never show up.&lt;/p&gt;
&lt;p&gt;I believe RDF/OWL provides a path to those new and better solutions, but it&amp;rsquo;s arguably not as solid and mature as (relational or otherwise) databases&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-503&#34;&gt;August 24, 2006 5:59 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Morten -&lt;/p&gt;
&lt;p&gt;Tell me more about this path&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dowhatimean.net/&#34; title=&#34;http://dowhatimean.net/&#34;&gt;Richard Cyganiak&lt;/a&gt; on &lt;a href=&#34;#comment-504&#34;&gt;August 24, 2006 6:22 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, you might be interested in &lt;a href=&#34;http://www.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/&#34;&gt;D2R Server&lt;/a&gt;, which builds on the previously mentioned D2RQ. I&amp;rsquo;ve toyed around with integrating multiple D2R Server instances using Bastian&amp;rsquo;s DARQ and it works, at least in principle, although there are still scalability and performance issues all over the place.&lt;/p&gt;
&lt;p&gt;A couple of other related projects are &lt;a href=&#34;http://esw.w3.org/topic/RdfAndSql&#34;&gt;linked on the ESW wiki&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://clarkparsia.com/&#34; title=&#34;http://clarkparsia.com/&#34;&gt;Kendall Clark&lt;/a&gt; on &lt;a href=&#34;#comment-505&#34;&gt;August 25, 2006 8:19 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Just to chime in, &lt;a href=&#34;http://clarkparsia.com/&#34;&gt;Clark &amp;amp; Parsia&lt;/a&gt; is pretty active in the US federal gov&amp;rsquo;t space doing exactly what you outline, Bob. Which should come as no surprise to you, given yr new employer. :&amp;gt;&lt;/p&gt;
&lt;p&gt;(Also, as to yr theory about the 9/11 stuff prompting integration: that&amp;rsquo;s pretty plausible; but inside of NASA, where we&amp;rsquo;ve done most of our RDF integration work, they&amp;rsquo;ve been worrying about their special data problems, including the silo issue, for a good long while. And the OMB&amp;rsquo;s DRM memos certainly have added additional impetus.)&lt;/p&gt;
&lt;p&gt;In particular we&amp;rsquo;ve been pretty active with NASA doing database integration using RDF (though not OWL yet): the BIANCA project integrates several data sources to manage NASA networked assets (networks, servers, applications, and dependencies between these); and our POPS project is integrating disparate data sources (6 right now with probably dozens more in the next 18 months) using a virtual RDF federation model and a (novel) display client, &lt;a href=&#34;http://clarkparsia.com/projects/code/jspace/&#34;&gt;JSpace&lt;/a&gt;, which is, essentially, a visual RDF query builder (for RDQL and iTQL, SPARQL to come later).&lt;/p&gt;
&lt;p&gt;So, yeah, this stuff is happening, but it&amp;rsquo;s all rather under the radar in some sense.&lt;/p&gt;
&lt;p&gt;As for OWL, I suspect we won&amp;rsquo;t really get into using it for db integration till we have some R&amp;amp;D funds to polish that approach first. There&amp;rsquo;s some good work done by some of the Europeans in using OWL to align database schemas as represented by UML. But that work isn&amp;rsquo;t trivial and, so far, the RDF approach has been sufficient.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.topquadrant.com&#34; title=&#34;http://www.topquadrant.com&#34;&gt;Ralph Hodgson&lt;/a&gt; on &lt;a href=&#34;#comment-506&#34;&gt;August 27, 2006 5:28 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hello Bob,&lt;/p&gt;
&lt;p&gt;We are also using RDF/OWL ontologies for database integration, also at Government agencies (FAA, NASA). We have recently tooled up TopBraid Composer to support proxy mappings - see Holger&amp;rsquo;s blog entry &lt;a href=&#34;http://composing-the-semantic-web.blogspot.com/2006/08/update-automated-database-import-into.html&#34;&gt;http://composing-the-semantic-web.blogspot.com/2006/08/update-automated-database-import-into.html&lt;/a&gt;&lt;br /&gt;
for more information.&lt;/p&gt;
&lt;p&gt;In one approach we are using datalog with OWL. A second approach uses SPARQL with OWL and D2RQ.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
Regards, Ralph&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://ivas.nc3a.nato.int&#34; title=&#34;http://ivas.nc3a.nato.int&#34;&gt;Victor Rodriguez-Herola&lt;/a&gt; on &lt;a href=&#34;#comment-523&#34;&gt;September 2, 2006 6:04 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi,&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve not been doing db integration directly but somehow indirectly. We&amp;rsquo;ve been &lt;a href=&#34;http://ivas.nc3a.nato.int&#34;&gt;integrating several sources of information&lt;/a&gt; by wrapping them in very simple web services (SOAP-based and ReST-based). Each of the sources was providing data directly in RDF based on a one-to-one OWL mapping to their respective underlying data model (which in most of the cases was a db model). Then, by using cross-ontologies or interpretation ontologies, that is, connecting similar concepts from different ontologies with respect to some &amp;ldquo;generic&amp;rdquo; vocabulary, we were able to make queries.&lt;/p&gt;
&lt;p&gt;Those queries were based on the &amp;ldquo;generic&amp;rdquo; vocabulary and we were using Pellet and a specific algorithm to generate the proper target query that the different sources were able to understand. The point was to transform the source query to RDQL and then, from here, translate it to the target queries. Of course a limited subset of query syntax was used, and we focused on generating RDF queries (queries by example) and source-specific RDQL queries.&lt;/p&gt;
&lt;p&gt;Once data in RDF were served from different sources (again in RDF based on source-specific OWL), we populated the Pellet ABox and the classification did its job. From here, we could make further refinements of the query (either based on the generic query or based on some more source-specific vocabulary).&lt;br /&gt;
There are a couple of initiatives in our organisation aimed to encourage the use of RDF and OWL so any system or data provider will be able to wrap-up some of its services and provide information directly in RDF.&lt;/p&gt;
&lt;p&gt;Our point is not in making db integration directly but in giving systems and data provider developers a path to provide their information in RDF based on their own sort of &amp;ldquo;private&amp;rdquo; ontology. The trick then is to make the connections between those disparate models. But, at least, you don&amp;rsquo;t have to re-program again if you&amp;rsquo;ve got yet another source in town. Just create your OWL model and provide RDF, make the proper interpretation entries, let the inference engine know about them, and there you will have another data source to integrate with the rest.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not the most advanced proposal, but we&amp;rsquo;re trying to make the case for interoperability.&lt;/p&gt;
&lt;p&gt;Regards,&lt;br /&gt;
Victor&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/rdf/owl">RDF/OWL</category>
      
    </item>
    
    <item>
      <title>What data is your metadata about, and where is it?</title>
      <link>https://www.bobdc.com/blog/what-data-is-your-metadata-abo/</link>
      <pubDate>Mon, 21 Aug 2006 09:31:10 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/what-data-is-your-metadata-abo/</guid>
      
      
      <description><div>If metadata really is data about data...</div><div>&lt;p&gt;In a &lt;a href=&#34;https://www.bobdc.com/blog/xml-summer-school-in-oxford#i113&#34;&gt;recent posting&lt;/a&gt;, I mentioned that I&amp;rsquo;ve been thinking lately about how some people doing metadata work (in particular, people doing RDF Schema and RDF/OWL ontologies) don&amp;rsquo;t care much about whether there is any corresponding data to go with their metadata. We all dutifully define metadata as &amp;ldquo;data about data,&amp;rdquo; but a lot of metadata out there is not really about any existing, useful data. Dan Connolly &lt;a href=&#34;http://dig.csail.mit.edu/breadcrumbs/node/158&#34;&gt;called it&lt;/a&gt; &amp;ldquo;ontologies for the sake of ontologies.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;When describing this peeve of mine to Paul Prescod, he asked me &amp;ldquo;are you saying that people are creating metadata ontologies before they create the data and should do it the other way around?&amp;rdquo; I replied that too many people create a metadata ontology for data that doesn&amp;rsquo;t exist, announce its availability and the kind of data it would be good for, and then move on to create more ontologies. For example, a year or so ago on the &lt;a href=&#34;http://www.w3.org/2001/sw/interest/&#34;&gt;semantic web mailing list&lt;/a&gt;, someone posted an announcement about an RDF Schema that he had created with a description of the kind of data it would be useful for. I privately e-mailed him suggesting that he create a file of RDF triples as sample instance data that went with his schema to demonstrate how to use this schema, and he sent back a very appreciative email that thanked me and said that this was a great idea and that he would go ahead and do it. Here&amp;rsquo;s what I&amp;rsquo;m wondering: why did this have to be suggested to him? Why wasn&amp;rsquo;t it self-evident that he should create sample data before announcing his schema on the mailing list? Why is it so common for people working with RDF Schema and RDF/OWL to think that if they build it, the data will come, simply because they announced that their work is available? When people develop relational schemas or XML DTDs or schemas, creating sample data that conforms to them is a normal step in testing the suitability of their work; why is it less obvious to people doing ontology work that there should be &lt;a href=&#34;https://www.bobdc.com/blog/my-new-favorite-typo&#34;&gt;meatdata&lt;/a&gt; to go with their metadata?&lt;/p&gt;
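The sample-data step I suggested to him is cheap. Here is a sketch (with an invented ex: vocabulary, not his schema) of the basic sanity check: does the sample instance data actually exercise every property the schema declares?

```python
# Invented schema: the properties an announced vocabulary declares.
schema_properties = ["ex:title", "ex:author"]

# Invented sample instance data: triples that should demonstrate the schema.
sample_instance = [
    ("ex:doc1", "ex:title", "A sample document"),
    ("ex:doc1", "ex:author", "ex:someone"),
]

# Which declared properties does the sample actually use?
used = {p for (_, p, _) in sample_instance}
unexercised = [p for p in schema_properties if p not in used]

# An empty list is weak evidence the schema is usable as announced; a long
# one is the "might be useful to someone someday" smell described above.
assert unexercised == []
```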
&lt;p&gt;When I chose this topic for my five-minute rant at the XML Summer School, I quoted John Chelsom&amp;rsquo;s statement &lt;a href=&#34;http://www.xmlsummerschool.com/Content-Knowledge.asp&#34;&gt;earlier that week&lt;/a&gt; that the main purpose of metadata is to speed up and enrich the searching of resources. To give some credit to the people I&amp;rsquo;m ranting against, I wouldn&amp;rsquo;t agree with this assertion 100%; some people are doing valuable work with pure metadata about medical conditions and potential treatments as they use RDF/OWL tools to find new relationships, but I think too many are designing metadata for nonexistent data that they somehow think they will inspire someone else to create.&lt;/p&gt;
&lt;p&gt;In &lt;a href=&#34;http://lists.w3.org/Archives/Public/semantic-web/2006Jul/thread.html#msg56&#34;&gt;typical discussions&lt;/a&gt; about the lack of RDF data on the web, some people point out the progress in the development of tools that let us treat non-RDF as triples, thereby adding this data to a potential semantic web. I think that this is great, but what I&amp;rsquo;d really like to see is RDF/OWL ontologies that describe this data so that we can get more value from that data. In his talk, John also described the concept of &amp;ldquo;turning content into knowledge by adding metadata and ontology.&amp;rdquo; This would make a great mission statement for someone, and it gives us a clue about the appeal of designing metadata for non-existent data: it&amp;rsquo;s easier. As with many IT projects, starting with a body of existing data and then creating a model that works well with it is messier and more difficult than starting with a blank slate, but from the potential semantic web to the internal systems of many, many, companies, the greatest opportunities for the use of metadata are in building metadata around existing data.&lt;/p&gt;
&lt;p&gt;In forthcoming postings here, I&amp;rsquo;ll write about (or, more likely, ask about) the creation of RDF/OWL ontologies for existing sets of data and how those ontologies add value to that data. Please let me know, in comments here or by private e-mail, of any projects you know of that do this.&lt;/p&gt;
&lt;h2 id=&#34;6-comments&#34;&gt;6 Comments&lt;/h2&gt;
&lt;p&gt;By Gunnar Aastrand Grines on &lt;a href=&#34;#comment-487&#34;&gt;August 21, 2006 10:28 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Why is it so common for people working with RDF Schema and RDF/OWL to think that if they build it, the data will come, simply because they announced that their work is available?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Because people like making abstractions; it&amp;rsquo;s an intellectually satisfying job&amp;hellip; Look at the programmers: every Java programmer loves creating frameworks, the more general the better, but they are not so keen to use them to build something useful.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://copia.ogbuji.net&#34; title=&#34;http://copia.ogbuji.net&#34;&gt;Uche&lt;/a&gt; on &lt;a href=&#34;#comment-490&#34;&gt;August 21, 2006 5:26 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When Bill de hÓra calls me out as one of the &amp;ldquo;markup people out there who can live in the RDF world&amp;rdquo; [1], I like to think he&amp;rsquo;s characterizing folks who work with RDF, but strictly as an auxiliary to XML. IOW, it&amp;rsquo;s exactly what you&amp;rsquo;re saying: worry about what the data is, then decide what metadata representation and processing is needed to enhance its value. As an aside, I&amp;rsquo;d tend to include you, Eric van der Vlist and Edd Dumbill in Bill&amp;rsquo;s list, and I&amp;rsquo;m not sure I&amp;rsquo;d include Shelley Powers, unless I misunderstand her position on certain things (quite possible).&lt;/p&gt;
&lt;p&gt;Anyway, I&amp;rsquo;ve always been able to tolerate what I consider metadata fundamentalism from some RDF/topic maps folks. By that I mean folks who prefer to encode everything they process in such metadata technologies. I&amp;rsquo;ve heard advocates say that people don&amp;rsquo;t need the semantic vagaries of XML formats when they can have the rigor of RDF/TM. I disagree, but what ultimately soured me on RDF was the related, but distinct problem of over-engineering (and over-theorizing) in the RDF world [2]. (Of course you know that, because you&amp;rsquo;re one of those who commented :-) )&lt;/p&gt;
&lt;p&gt;Good piece, Bob. Thanks.&lt;/p&gt;
&lt;p&gt;[1] &lt;a href=&#34;http://www.dehora.net/journal/2004/07/rdf_101.html&#34;&gt;http://www.dehora.net/journal/2004/07/rdf_101.html&lt;/a&gt;&lt;br /&gt;
[2] &lt;a href=&#34;http://copia.ogbuji.net/blog/2005-09-14/Is_RDF_mov&#34;&gt;http://copia.ogbuji.net/blog/2005-09-14/Is_RDF_mov&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;ndash;Uche&lt;/p&gt;
&lt;p&gt;By Damian Steer on &lt;a href=&#34;#comment-492&#34;&gt;August 21, 2006 8:13 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The reason I like using RDF is because I can (infinitely) defer committing to regular structures. I take my data and add properties as and when I need them. I pick and choose existing properties and classes, or make my own up. The few &amp;lsquo;ontologies&amp;rsquo; I&amp;rsquo;ve written have been either loose collections of terms I couldn&amp;rsquo;t find elsewhere, or descriptions mined from the dataset (so people know what&amp;rsquo;s there).&lt;/p&gt;
&lt;p&gt;But working ontology first seems to throw that advantage away. They&amp;rsquo;re yearning for the shackles of XML schemas, when they ought to be enjoying the freedom of RDF schema.&lt;/p&gt;
&lt;p&gt;By Stephen De Gabrielle on &lt;a href=&#34;#comment-495&#34;&gt;August 22, 2006 5:32 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;turning content into knowledge by adding metadata and ontology&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Been into a library lately?&lt;/p&gt;
&lt;p&gt;Point taken about ontologies.&lt;/p&gt;
&lt;p&gt;Maybe there is an opportunity here. Even small libraries have huge datasets, often in electronic form. Most accept volunteer assistance. Maybe some of this intellectual energy could be devoted to real metadata projects in real libraries, especially small libraries with small budgets (towns under 300k pop for example, schools, small colleges, etc.).&lt;/p&gt;
&lt;p&gt;Regards,&lt;/p&gt;
&lt;p&gt;Stephen&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-497&#34;&gt;August 23, 2006 4:44 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m as guilty as anyone of creating schemas without data, but I&amp;rsquo;m not sure it&amp;rsquo;s necessarily a problem per se. For example, ages ago I hacked out a &lt;a href=&#34;http://dannyayers.com/xmlns/project/&#34;&gt;project ontology&lt;/a&gt;. It included a lot of guesswork about what I thought I would need. When I finally did some coding around it, working with instance data, I found that I was only using a fraction of the terms I&amp;rsquo;d defined. Ok, so there was some wasted effort here, but if you considered the v0.1 of the ontology just a working sketch, it&amp;rsquo;s not that bad.&lt;/p&gt;
&lt;p&gt;So although I agree that the ratio of ontologies to instance data is pretty silly, I&amp;rsquo;m not convinced this is inherently bad. URIs aren&amp;rsquo;t exactly expensive. Duplication of ontologies is generally undesirable, but I think probably unavoidable in the first few passes.&lt;/p&gt;
&lt;p&gt;Perhaps (a big perhaps) when Damian mentions &amp;ldquo;yearning for the shackles of XML schemas&amp;rdquo; this also applies to the criticism of the ontology:data ratio. In the XML world schemas are treated as precious resources. Whereas in RDF, by making *any* statement you are making at least one schema/ontology level assertion (that the predicate resource is a property).&lt;/p&gt;
&lt;p&gt;On your metadata point, I reckon the more interesting side of RDF is where it&amp;rsquo;s talking about things in general, rather than about documents (about things). But given the doc-nature of the current web, there is a lot of low-hanging fruit on the metadata side.&lt;/p&gt;
&lt;p&gt;Re. ontologies for existing data - that&amp;rsquo;s half the motivation behind the &lt;a href=&#34;http://micromodels.org/&#34;&gt;micromodels&lt;/a&gt; stuff. People are creating material in HTML, the microformats folks provide a way of making the data explicit, XSLT offers the bridge to RDF.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-498&#34;&gt;August 23, 2006 8:07 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Danny -&lt;/p&gt;
&lt;p&gt;Did you announce to the world that you had made a project ontology available before or after working with related instance data and discovering that only a fraction of it was useful?&lt;/p&gt;
&lt;p&gt;More generally, is it better to announce to the world &amp;ldquo;I&amp;rsquo;ve written this ontology and made it available&amp;rdquo; as soon as possible, which in your case meant that the majority of it was not useful, or after putting it through some paces and identifying which parts are useful?&lt;/p&gt;
&lt;p&gt;Of course this is a rhetorical question; the classic ontology developer answer would be &amp;ldquo;but some of the other parts might be useful to someone someday.&amp;rdquo; This is just wishful thinking that only increases the amount of useless ontology work that people must wade through to find something that might work for them. I&amp;rsquo;m not looking for rigorous testing, but just a demo, or as you did, some coding around it.&lt;/p&gt;
&lt;p&gt;In attitudes about ontology development today, there&amp;rsquo;s way too much &amp;ldquo;might be useful to someone someday&amp;rdquo; with no desire to put a little effort into determining how useful it might be and to whom. Developers of other kinds of schema or software (&amp;ldquo;coding&amp;rdquo;!) with this attitude would have a difficult time being taken seriously. I&amp;rsquo;m sure that sourceforge and download.com are full of software that does nothing for anyone but their authors, but at least it did something useful for them.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>My new favorite typo</title>
      <link>https://www.bobdc.com/blog/my-new-favorite-typo/</link>
      <pubDate>Mon, 14 Aug 2006 20:28:53 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/my-new-favorite-typo/</guid>
      
      
      <description><div>Data about meat?</div><div>&lt;p&gt;&lt;a href=&#34;http://www.google.com/search?q=meatdata&#34;&gt;&lt;img src=&#34;http://www.uwex.edu/ces/agmarkets/DM51b.gif&#34; alt=&#34;[sample meat label]&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;200pt&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A slip of the fingers can make you type &amp;ldquo;meatdata&amp;rdquo; instead of &amp;ldquo;metadata,&amp;rdquo; and a &lt;a href=&#34;http://www.google.com/search?q=meatdata&#34;&gt;Google search for meatdata&lt;/a&gt; brings up almost 2,000 hits. My favorite is the &lt;a href=&#34;http://mail.asis.org/pipermail/asis-l/2002-August/000252.html&#34;&gt;2002 job posting&lt;/a&gt; for a &amp;ldquo;Meatdata librarian&amp;rdquo; position at the University of Tennessee. Apparently, the relationship between &amp;ldquo;Meatdata and Active Object-Model Pattern Mining Workshop&amp;rdquo; was a hot topic at OOPSLA in &lt;a href=&#34;http://www.joeyoder.com/Research/metadata/OOPSLA98MetaDataWkshop.html&#34;&gt;1998&lt;/a&gt;, &lt;a href=&#34;http://www.adaptiveobjectmodel.com/OOPSLA99/&#34;&gt;1999&lt;/a&gt;, and &lt;a href=&#34;http://www.adaptiveobjectmodel.com/ECOOP2000/&#34;&gt;2000&lt;/a&gt; (note the &lt;code&gt;/html/head/title&lt;/code&gt; title at the top of the browser window for each, not the &lt;code&gt;h1&lt;/code&gt; title). At ektron.com, chrisk &lt;a href=&#34;http://dev.ektron.com/forum.aspx?g=posts&amp;amp;t=5092&#34;&gt;wants to know&lt;/a&gt; if one can &amp;ldquo;setup a Parent - Child relationship between meatdata searchable properties.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;On a less sarcastic note, Rohit Khare used the term &lt;a href=&#34;http://www.4k-associates.com/moma.html&#34;&gt;on purpose&lt;/a&gt; as far back as 1999 to reference the data that the metadata is about. Perhaps for my standard rant that despite our dutiful definitions of metadata as &amp;ldquo;data about data,&amp;rdquo; there&amp;rsquo;s too much metadata out there that lacks any corresponding instance data, I can start using the eighties catch phrase &amp;ldquo;&lt;a href=&#34;http://en.wikipedia.org/wiki/Where%27s_the_beef&#34;&gt;Where&amp;rsquo;s the beef?&lt;/a&gt;&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.xmlgrrl.com/blog&#34; title=&#34;http://www.xmlgrrl.com/blog&#34;&gt;Eve M.&lt;/a&gt; on &lt;a href=&#34;#comment-480&#34;&gt;August 14, 2006 10:25 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Yeah, I frequently mistype it as &amp;ldquo;meatdata&amp;rdquo; all the time, and it always reminds me of &lt;a href=&#34;http://www.terrybisson.com/meat.html&#34;&gt;They&amp;rsquo;re Made of Meat&lt;/a&gt;. I suspect &amp;ldquo;meat&amp;rdquo; is one of those inherently funny words&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>Wikipedia: more useful than good</title>
      <link>https://www.bobdc.com/blog/wikipedia-more-useful-than-goo/</link>
      <pubDate>Mon, 07 Aug 2006 11:43:43 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/wikipedia-more-useful-than-goo/</guid>
      
      
      <description><div>There&#39;s so much wrong with it... and I use it all the time.</div><div>&lt;p&gt;The program for a recent presentation at my daughter&amp;rsquo;s school footnoted Wikipedia for a few definitions that were supposed to provide background for whatever cultural thing the kids and we were learning about. I thought this was pretty funny, because the point of footnotes is to show that you didn&amp;rsquo;t just make something up; you got it from a source that the reader can consult to follow up on the information. A source that anyone can edit, that makes the news every week for the many silly errors it has, is not a sensible thing to reference in a footnote.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://en.wikipedia.org/wiki/Main_Page&#34;&gt;&lt;img src=&#34;http://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Wikipedia-logo-en-big.png/98px-Wikipedia-logo-en-big.png&#34; alt=&#34;[wikipedia logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Nevertheless, I use Wikipedia several times a day. I often don&amp;rsquo;t get past the first few sentences of an entry, but those first few sentences can be remarkably useful, especially in certain categories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Geek stuff&lt;/em&gt; When the phone company repair guy gave us a new DSL modem, he warned me that I might need to dig up certain dialog boxes to reset some things because my new modem was a &lt;a href=&#34;http://en.wikipedia.org/wiki/DHCP&#34;&gt;DHCP&lt;/a&gt; one, unlike my old &lt;a href=&#34;http://en.wikipedia.org/wiki/Dynamic_DNS&#34;&gt;Dynamic DNS&lt;/a&gt; one. I&amp;rsquo;d heard these acronyms before, but didn&amp;rsquo;t really understand them. Wikipedia straightened me out. (Of course, if either of these entries were full of errors, I&amp;rsquo;d have no way of knowing, but I&amp;rsquo;ll get to that.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Companies&lt;/em&gt; This is one category where Wikipedia is often better than a more official source. A typical company&amp;rsquo;s &amp;ldquo;About&amp;rdquo; web page offers a vague, buzzwordy extension of their already vague mission statement about the value-added synergy of the total solution lifecycle suite that they provide. The same company&amp;rsquo;s Wikipedia page probably starts off with a brief description of what they&amp;rsquo;re known for making or doing, where they&amp;rsquo;re based, how long they&amp;rsquo;ve been around, and their stock ticker symbols. (Of course, marketing communications people know that anyone can edit a Wikipedia entry, and you can often see their footprints there.) For example, Nike&amp;rsquo;s &lt;a href=&#34;http://www.nike.com/nikebiz/nikebiz.jhtml?page=3&#34;&gt;Company Overview&lt;/a&gt; page doesn&amp;rsquo;t mention that they&amp;rsquo;re a &amp;ldquo;sports and fitness&amp;rdquo; company until the third paragraph, and even then there&amp;rsquo;s nothing to distinguish them from a company that makes treadmills, skis, or lacrosse sticks, none of which they make. Their &lt;a href=&#34;http://en.wikipedia.org/wiki/Nike%2C_Inc.&#34;&gt;Wikipedia entry&lt;/a&gt; starts by saying &amp;ldquo;Nike, Inc. (Pronounced: NIGH-KEY) (NYSE: NKE) is a major American manufacturer of athletic shoes, Clothing/apparel, and sports equipment.&amp;rdquo; BP&amp;rsquo;s &lt;a href=&#34;http://www.bp.com/subsection.do?categoryId=5&amp;amp;contentId=2006530&#34;&gt;What we do&lt;/a&gt; page tells us that &amp;ldquo;[their] business is about finding, producing, and marketing the natural energy resources on which the modern world depends&amp;rdquo;; Wikipedia &lt;a href=&#34;http://en.wikipedia.org/wiki/Bp&#34;&gt;tells us&lt;/a&gt; that &amp;ldquo;BP plc (LSE: BP, NYSE: BP, TYO: 5051 ), originally British Petroleum, is a British energy company with headquarters in London, one of four vertically integrated private sector oil, natural gas, and petrol (gasoline) &amp;lsquo;supermajors&amp;rsquo; in the world, along with Royal Dutch Shell, ExxonMobil and Total&amp;rdquo;—far more useful information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Semi-famous people&lt;/em&gt; You hear a name, you&amp;rsquo;ve heard it before, you&amp;rsquo;re not sure why. If it&amp;rsquo;s a musician or band, I recommend &lt;a href=&#34;http://www.allmusic.com&#34;&gt;allmusic.com&lt;/a&gt;, but otherwise, Wikipedia will give you the general idea of why someone is famous. Jaron Lanier complained that his one short film seen by only a few people didn&amp;rsquo;t qualify him as a filmmaker, so he kept removing this title from &lt;a href=&#34;http://en.wikipedia.org/wiki/Jaron_Lanier&#34;&gt;his Wikipedia entry&lt;/a&gt;, and it kept re-appearing. (As of today, it&amp;rsquo;s not there.) Still, the first few sentences give a reasonable summary of why he&amp;rsquo;s famous in certain circles: &amp;ldquo;Jaron Lanier (born 1960) is an American musician and virtual reality developer. He claims to have popularized the term &amp;lsquo;Virtual Reality&amp;rsquo; (VR) in the early 1980s. At that time, he founded VPL Research, the first company to sell VR products.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Pop culture&lt;/em&gt; If &lt;a href=&#34;http://www.boingboing.net/&#34;&gt;BoingBoing&lt;/a&gt; starts throwing around some Japanese word that may be a kids card game, a video game, or some sex trick, I&amp;rsquo;m usually confident that Wikipedia will tell me what that thing really is. Looking at the &lt;a href=&#34;http://50.lycos.com/&#34;&gt;Lycos Top 50&lt;/a&gt; list recently, I had no idea what &amp;ldquo;&lt;a href=&#34;http://en.wikipedia.org/wiki/Naruto&#34;&gt;Naruto&lt;/a&gt;&amp;rdquo; at number 12 was; I just found out from Wikipedia that it&amp;rsquo;s a Japanese manga comic and TV series. Number 31, &amp;ldquo;&lt;a href=&#34;http://en.wikipedia.org/wiki/Limewire&#34;&gt;Limewire&lt;/a&gt;,&amp;rdquo; was also a mystery to me until I found out that it&amp;rsquo;s a Gnutella client. (Looking at the Lycos Top 50 just now, it seems broken; the actual list isn&amp;rsquo;t showing up.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Area codes&lt;/em&gt; This is more useful to Americans and Canadians, but others interested in the three-digit numbers sometimes repeated in hip-hop songs may find it handy. If I see a phone number and have no idea where in the country it is, I&amp;rsquo;ll find out from Wikipedia, and I&amp;rsquo;ll probably find out much more about it. For example, see their entry for &lt;a href=&#34;http://en.wikipedia.org/wiki/Area_code_718&#34;&gt;my last area code&lt;/a&gt;, when I was representin&amp;rsquo; the BK.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A &lt;a href=&#34;http://www.newyorker.com/printables/fact/060731fa_fact&#34;&gt;recent New Yorker&lt;/a&gt; article on Wikipedia (and don&amp;rsquo;t miss a recent &lt;a href=&#34;http://www.theonion.com/content/node/50902&#34;&gt;Onion piece&lt;/a&gt;), which is particularly good on the project&amp;rsquo;s subcultures, reminded me of something else about the quality of Wikipedia entries: the only way you know an article is really high-quality is if it&amp;rsquo;s telling you things you already know. One of the first times I looked at Wikipedia, I looked at their entry for &lt;a href=&#34;http://en.wikipedia.org/wiki/XSLT&#34;&gt;XSLT&lt;/a&gt;, and it was awful. Friends told me &amp;ldquo;Just fix it! That&amp;rsquo;s what Wikipedia is all about!&amp;rdquo; But, this entry needed replacement, not fixing, and it felt like bad etiquette to completely throw out an entry for a basic W3C standard, so I didn&amp;rsquo;t do it. Several months later, it had evolved to a more reasonable description of XSLT, so I felt comfortable making a few changes and additions—for example, adding a mention of the W3C.&lt;/p&gt;
&lt;p&gt;These two experiences with their XSLT entry form the yin and yang of my attitude about Wikipedia, reminding me that you have to take the entries with a grain of salt, but when you do, they can be pretty useful. Like I said, for all I know, their DHCP entry could be full of technical errors. If you&amp;rsquo;re doing serious research on something, Wikipedia is not a serious source, but because it may point to serious sources, it can provide a good starting point.&lt;/p&gt;
&lt;p&gt;And for God&amp;rsquo;s sake, don&amp;rsquo;t footnote it.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>XML summer school in Oxford</title>
      <link>https://www.bobdc.com/blog/xml-summer-school-in-oxford/</link>
      <pubDate>Mon, 31 Jul 2006 07:57:32 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/xml-summer-school-in-oxford/</guid>
      
      
      <description><div>The Hogwarts of XML.</div><div>&lt;p&gt;This summer was the seventh year that the &lt;a href=&#34;http://www.csw.co.uk/&#34;&gt;CSW Group&lt;/a&gt; sponsored their &lt;a href=&#34;http://www.xmlsummerschool.com&#34;&gt;XML Summer School&lt;/a&gt; in Oxford, England. CSW&amp;rsquo;s offices are on the outskirts of Oxford, but they rent facilities within the ancient university to hold the classes. For the last few years they&amp;rsquo;ve held it at &lt;a href=&#34;http://www.wadham.ox.ac.uk/public&#34;&gt;Wadham College&lt;/a&gt;, with a beautiful campus that was first built in the early seventeenth century. (To hear my father describe it, I&amp;rsquo;m an Oxford don, so I try to explain &amp;ldquo;Dad&amp;hellip; it&amp;rsquo;s a consulting company renting several rooms and dorms in the university&amp;hellip;&amp;rdquo;) The main Wadham dining hall, where Christopher Wren and Isaac Newton once sat, has a real Hogwarts effect, and other buildings in the neighborhood either directly inspired or were &lt;a href=&#34;http://www.londontaxitour.com/london-taxi-tour-Harry-Potter-tour-oxford-film-locations.htm&#34;&gt;used as sets&lt;/a&gt; for several scenes in the Harry Potter movies.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.xmlsummerschool.com&#34;&gt;&lt;img src=&#34;http://www.wadham.ox.ac.uk/public/overview/frontquad&#34; alt=&#34;[Wadham front quad]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I chaired the two-day XSLT track, which began with two sessions on basic XSLT 1.0 from me. &lt;a href=&#34;http://www.datypic.com/index.html&#34;&gt;Priscilla Walmsley&lt;/a&gt; then did an introduction to XSL-FO, which led to some great discussion from people in the audience such as Jeni Tennison and Debbie Lapeyre on XSL-FO implementations they&amp;rsquo;ve done for clients. Paul Prescod did a very popular presentation titled &amp;ldquo;AJAX + XSLT = ?&amp;rdquo; on the past, current, and potential future role of XSLT in AJAX applications. When he and I came up with the idea for the talk, I had no idea of the role that XSLT already played in Google Maps and the web interface to Microsoft Outlook, but I learned this and a lot more from his talk.&lt;/p&gt;
&lt;p&gt;This track&amp;rsquo;s second day focused on the batch of XSLT-related specs that recently moved to Candidate Recommendation status. Priscilla gave an overview of XQuery, the subject of her &lt;a href=&#34;http://safari.oreilly.com/0596006349&#34;&gt;forthcoming O&amp;rsquo;Reilly book&lt;/a&gt;, and Jeni Tennison explained most of the new features that XSLT 2.0 adds to XSLT 1.0. She left the schema support parts of XSLT 2.0 (and XQuery) to &lt;a href=&#34;http://saxonica.blogharbor.com/blog&#34;&gt;Michael Kay&lt;/a&gt;, who discussed and demonstrated those. I didn&amp;rsquo;t realize how much of the design of XSLT 1.0 was planning ahead for the use of XSLT processors that were built into browsers (something that Paul Prescod and I both demonstrated, but which never caught on the way the original XSL Working Group had hoped for), and part of this was a philosophy that runtime error propagation was bad because propagated errors would confuse people browsing the web. A key advantage of type-aware XQuery queries and XSLT 2.0 stylesheets is that when something doesn&amp;rsquo;t work correctly—especially in the common case of no output when you expected some—the use of schema-awareness is more likely to result in an error message that gives you a useful clue about what went wrong. Michael is not an enthusiastic supporter of the W3C Schema Language, but strongly believes that any schema language that provides typing is better than DTDs, and because the large complex schemas being created and used these days use the W3C Schema language, that&amp;rsquo;s the one to use. In his words, &amp;ldquo;I&amp;rsquo;m not selling it to you because it&amp;rsquo;s good; I&amp;rsquo;m selling it to you because it&amp;rsquo;s necessary.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Anyone familiar with XSLT will recognize all of these names, so we obviously had an impressive lineup. I was sorry that we were scheduled opposite the web services track, where &lt;a href=&#34;http://www.xmlgrrl.com/blog/&#34;&gt;Eve Maler&lt;/a&gt; had put together a great list of speakers. Earlier in the week, I managed to see the second day of the &amp;ldquo;Content and Knowledge with XML&amp;rdquo; track put together by &lt;a href=&#34;http://silmaril.ie/cgi-bin/blog&#34;&gt;Peter Flynn&lt;/a&gt;, which was great. John Chelsom (the &amp;ldquo;C&amp;rdquo; in &amp;ldquo;CSW&amp;rdquo;) opened the day with a talk titled &amp;ldquo;Turning XML Content to Executable Knowledge Bases.&amp;rdquo; He began with some basic philosophical ideas about what we mean by &amp;ldquo;knowledge&amp;rdquo; and then worked his way to how technologies like RDF/OWL can help store usable knowledge in applications. His talk, and a talk later that day by CSW&amp;rsquo;s Rueben Wright on &amp;ldquo;Applying Knowledge to Support Business Processes,&amp;rdquo; convinced me that CSW is more concerned with connecting metadata to data to improve the value of that data than a lot of people who dutifully define metadata as &amp;ldquo;data about data&amp;rdquo; and then assume that someone else will come up with some data that goes with their brilliant metadata.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.laurenwood.org/anyway/&#34;&gt;Lauren Wood&lt;/a&gt; put together the next day&amp;rsquo;s &amp;ldquo;Trends and Transients&amp;rdquo; track, but couldn&amp;rsquo;t attend in person because of a &lt;a href=&#34;http://www.laurenwood.org/anyway/archives/2006/06/14/first-days-home/&#34;&gt;new arrival&lt;/a&gt; in her family. This track&amp;rsquo;s final session gave all the track chairs five minutes to rant about whatever we wanted. A double espresso and my thoughts from the previous day about the relative practicality of different people doing metadata work inspired me to do a talk whose title alone took up a chunk of my five minutes: &amp;ldquo;If metadata is data about data, what data is your metadata about, and where can I find some, and don&amp;rsquo;t just tell me that it would be great if someone created some.&amp;rdquo; Since it&amp;rsquo;s the sort of rant that&amp;rsquo;s typical for a weblog, I&amp;rsquo;m going to organize my notes and post it as a separate entry here.&lt;/p&gt;
&lt;p&gt;Peter and I have done the summer school every year that CSW has held it, so by now many other speakers and CSW employees are old friends. Having these friends spread throughout the XML world has been very handy, although we don&amp;rsquo;t end up discussing XML with the attendees at the scheduled evening social events as much as John Chelsom probably hoped. The evening events include &lt;a href=&#34;http://en.wikipedia.org/wiki/Punting&#34;&gt;punting&lt;/a&gt; up a picture-postcard River Cherwell to an idyllic pub and a scheduled pub crawl, complete with documentation, every year. Additional events this year included a tour of Oxford Castle/Prison, in use from 1009 to 1995 (John&amp;rsquo;s wife, a lawyer, told me how she had visited prisoners there) and a tour of, and an amazing dinner in, &lt;a href=&#34;http://en.wikipedia.org/wiki/Cecil_Rhodes&#34;&gt;Cecil Rhodes&lt;/a&gt;&amp;rsquo; home.&lt;/p&gt;
&lt;p&gt;Because of CSW&amp;rsquo;s work in the health care and pharmaceutical field, several tracks and many speakers and attendees come from that world. My chance to learn about this came in the more informal settings, as I tried to express the combination of professional services (XML and metadata-related design, architecture, and implementation) and content services (conversion and enrichment, either automated or by hand, of potentially massive amounts of content) that my &lt;a href=&#34;http://www.innodata-isogen.com&#34;&gt;new employer&lt;/a&gt; provides.&lt;/p&gt;
&lt;p&gt;Most speakers and many attendees end up at the indoor/outdoor college bar at one end of the back quad at the end of the night. I may have asked too many British XML summer school participants &amp;ldquo;why are you drinking Corona when you have much better beer in this country?&amp;rdquo; Observing the attendees of other unrelated programs as they hung out in the bar was also a popular sport, although some made me embarrassed to be from the same country as them—imagine tables of drunken college-age girls wearing their new &amp;ldquo;Oxford&amp;rdquo; T-shirts and shrieking at each other: &amp;ldquo;Oh my GAWD! That&amp;rsquo;s AWESOME!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;CSW&amp;rsquo;s Kerry Poulter, who as usual kept the frenetic week running smoothly, had dug up a few acoustic guitars in case any evening drinking led to a hootenanny, so one night in the back quad bar Eve got out her rollup keyboard, Tony Coates pulled out his electric ukulele, and we played until the bartender cut us off about sixteen bars into Led Zeppelin&amp;rsquo;s &amp;ldquo;Rock and Roll.&amp;rdquo; Our strange trio had some success with &amp;ldquo;Midnight Hour,&amp;rdquo; &amp;ldquo;It&amp;rsquo;s All Over Now,&amp;rdquo; and the Velvet Underground&amp;rsquo;s &amp;ldquo;Oh Sweet Nothing,&amp;rdquo; and I insisted that we start with &amp;ldquo;Wish You Were Here&amp;rdquo; to honor the recent passing of its subject, Syd Barrett. I&amp;rsquo;ve sat around with acoustics and alcohol and friends before, but never with a wireless laptop to double-check on chords and lyrics.&lt;/p&gt;
&lt;p&gt;I certainly learn a lot at the XML Summer School each year, and judging by the number of attendees who return for multiple years to attend different tracks each time, they&amp;rsquo;re learning a lot too. Oxford is beautiful, and a little independent tourism in the area always pays off—many people (Paul Prescod this year, &lt;a href=&#34;http://seanmcgrath.blogspot.com/&#34;&gt;Sean McGrath&lt;/a&gt; and Lauren Wood and &lt;a href=&#34;http://tbray.org/ongoing/&#34;&gt;Tim Bray&lt;/a&gt; last year, me for a few past years and hopefully next year, and some attendees as well) bring their families along. If you&amp;rsquo;re interested in going next year, sign up early, because they ended up turning away people for this summer&amp;rsquo;s session.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-476&#34;&gt;July 31, 2006 9:04 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Any pointers on papers, docs? (In particular the RDF/OWL stuff &amp;amp; Ajax+XSLT).&lt;/p&gt;
&lt;p&gt;btw, it&amp;rsquo;s &amp;ldquo;Oh! Sweet Nuthin&amp;rsquo;&amp;rdquo; - only significant when you go hunting the tab ;-)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>So You Want to Write a Book (about software)?</title>
      <link>https://www.bobdc.com/blog/so-you-want-to-write-a-book-ab/</link>
      <pubDate>Thu, 20 Jul 2006 08:42:31 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/so-you-want-to-write-a-book-ab/</guid>
      
      
      <description><div>Advice from some people who know.</div><div>&lt;p&gt;This installment of my series on &lt;a href=&#34;http://www.snee.com/bobdc.blog/publishing/documenting_software/&#34;&gt;writing about software&lt;/a&gt; is brief: O&amp;rsquo;Reilly has an excellent online publication called &lt;a href=&#34;https://web.archive.org/web/20060719112206/http://www.oreilly.com/oreilly/author/&#34;&gt;So You Want to Write a Book?&lt;/a&gt;, and it&amp;rsquo;s full of good advice whether you plan to write a complete book or not. O&amp;rsquo;Reilly isn&amp;rsquo;t the only publisher of computer books out there, but they&amp;rsquo;re certainly one of the key ones, and this book offers excellent advice for preparing your content and for assembling all the necessary information if you want to move forward with publishing a complete book with them, with another publisher, or &lt;a href=&#34;https://www.bobdc.com/blog/selfpublishing-bound-hardcopy&#34;&gt;on your own&lt;/a&gt;. Even tech writers creating documentation for internal use at their company can learn something about book preparation from it.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/documenting-software">documenting software</category>
      
    </item>
    
    <item>
      <title>Nice parodies of &#34;Mac hipper than PC&#34; ads</title>
      <link>https://www.bobdc.com/blog/nice-parodies-of-mac-hipper-th/</link>
      <pubDate>Wed, 12 Jul 2006 08:16:38 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/nice-parodies-of-mac-hipper-th/</guid>
      
      
      <description><div>The one in the T-shirt and hoodie is the cool one, not the one in the suit, right?</div><div>&lt;p&gt;I&amp;rsquo;ve found the recent U.S. TV ads of Groovy Mac Guy and Staid PC Suit Guy to be pretty annoying, especially considering the recent discussions by Mac people of how appealing Ubuntu has been looking to them (&lt;a href=&#34;http://www.megginson.com/blogs/quoderat/archives/2006/06/29/macos-vs-ubuntu-linux/&#34;&gt;Dave Megginson&lt;/a&gt;, &lt;a href=&#34;http://www.tbray.org/ongoing/When/200x/2006/06/15/Switch-From-Mac&#34;&gt;Tim Bray&lt;/a&gt;, and others they point to). It would be nice to add a Linux guy (who, to be honest, would probably look a lot like &lt;a href=&#34;http://cbg.nohomers.net/&#34;&gt;Comic Book Guy&lt;/a&gt;) discussing with the PC guy how much &lt;a href=&#34;http://www.theopencd.org/&#34;&gt;open source software&lt;/a&gt; they had in common, but I suppose Mac guy would then say that with OS X, they&amp;rsquo;re catching up there—not in so many words, of course.&lt;/p&gt;
&lt;p&gt;Anyway, YouTube has some &lt;a href=&#34;http://www.youtube.com/watch?v=UA3NyRr4Eng&#34;&gt;hilarious parodies&lt;/a&gt; of the Mac ads.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Opening Pandora&#39;s (music) box</title>
      <link>https://www.bobdc.com/blog/opening-pandoras-music-box/</link>
      <pubDate>Thu, 06 Jul 2006 15:55:01 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/opening-pandoras-music-box/</guid>
      
      
      <description><div>Free online music and algorithmic suggestions.</div><div>&lt;p&gt;I&amp;rsquo;ve been listening to music on the &lt;a href=&#34;http://www.pandora.com/&#34;&gt;Pandora&lt;/a&gt; internet radio site for a while now. After creating a free account, you define a &amp;ldquo;channel&amp;rdquo; by naming one or more artists, and then they play whatever music they have by that artist and music that they judge to be similar. Similarity rankings are based on attributes they&amp;rsquo;ve assigned to different artists and judgments by their listeners who&amp;rsquo;ve clicked the Thumbs Up or Thumbs Down icons available with each song. Of course, links are provided if you like something enough to buy it, and ads for music include links to let you add that artist as a new channel in your collection.&lt;/p&gt;
&lt;p&gt;Sometimes the attributes they judge by are not the attributes you&amp;rsquo;re interested in. I created a channel called &amp;ldquo;crunchy art rock&amp;rdquo; that started off with Radiohead and Beck because I like music that balances rocking guitars with less melodic electronic noise (I still don&amp;rsquo;t own Roxy Music&amp;rsquo;s first album, and really should pick up a copy), especially the old school analog electronics. When that channel started playing lots of quiet music with obscure lyrics that were &amp;ldquo;sensitive&amp;rdquo; to the point of whining, I realized that what Pandora found that Radiohead and Beck had in common was different from what I liked about them, so I didn&amp;rsquo;t listen to that channel much. Recently, adding Sonic Youth and Pere Ubu to that channel has helped, although I hear a lot of fairly generic &amp;ldquo;alternative&amp;rdquo; rock there now.&lt;/p&gt;
&lt;p&gt;An even funnier misjudgment came when I added Doris Day&amp;rsquo;s name to a channel of mostly 1940&amp;rsquo;s swing that I started with Artie Shaw&amp;rsquo;s name (I&amp;rsquo;ve recently gotten to appreciate him more, but not enough to run out and buy a CD), because what little I&amp;rsquo;d heard of her early singing was surprisingly good. Pandora seemed to consider Day&amp;rsquo;s later, better-known hits as representative, and put songs by the Carpenters and Barbra Streisand on the channel, so I swiftly removed Doris Day. I usually listen to the &amp;ldquo;jazz (mostly bass)&amp;rdquo; channel that I started with the names of bass players who have fronted groups; it&amp;rsquo;s dominated by small group hardbop and makes good listening while working.&lt;/p&gt;
&lt;p&gt;My brother suggested that I try &lt;a href=&#34;http://www.last.fm/&#34;&gt;last.fm&lt;/a&gt;, a similar service, to compare the wisdom of their algorithm for guessing what I&amp;rsquo;d like with Pandora&amp;rsquo;s. I didn&amp;rsquo;t like last.fm as much, but not because of their algorithm. First, they make you download and install some client software before using it, which would have seemed reasonable enough three years ago but already seems old-fashioned—Pandora has a much more Web 2.0 approach, keeping all the logic and interface on the server (and providing one exception to the rule that Flash sucks). Secondly, when you&amp;rsquo;ve listened to enough music on one of these services from an artist you&amp;rsquo;re familiar with, you realize that the service licensed whatever they could, which may or may not be a large sample of that artist&amp;rsquo;s music. It&amp;rsquo;s fun to talk and guess about the algorithm they use to suggest music to you, but when you listen for more than an hour or two, the size of the pool that they can draw from based on your criteria becomes much more important than the magic predictive algorithm. Otherwise, you end up listening to the same things over and over. I was also annoyed when a last.fm search for &amp;ldquo;Ray Brown&amp;rdquo; didn&amp;rsquo;t turn anything up, but an artist that they suggested as similar to Charlie Mingus was the &amp;ldquo;Ray Brown Trio&amp;rdquo;—if last.fm&amp;rsquo;s ability to search artist names couldn&amp;rsquo;t match up &amp;ldquo;Ray Brown&amp;rdquo; with &amp;ldquo;Ray Brown Trio,&amp;rdquo; then I can&amp;rsquo;t have much faith in their predictive algorithm or even the overall architecture of their system. (For fun, I asked last.fm what artists were similar to the Beatles, and they listed the Rolling Stones, Radiohead, Led Zeppelin, and the Eagles. I guess that either being English or having multi-platinum albums makes you similar to the Beatles.)&lt;/p&gt;
&lt;p&gt;Pandora knows that the size of their collection is an issue, because every now and then I get an email mentioning new material they&amp;rsquo;ve acquired that fits in with one of my channels, so I may go back and listen to that channel after ignoring it for a while. Another neat touch is that they make it easy to email anyone about a channel, and the email includes a URL that lets the recipient jump right in and listen to that channel. (Let me know if you&amp;rsquo;d like me to have them email you any of these channels.) They don&amp;rsquo;t seem to have any classical music, which would also be nice for work purposes. Even big-name performers like Gil Shaham and Yo Yo Ma get no hits, and when I searched for &amp;ldquo;Ludwig van Beethoven,&amp;rdquo; after having me confirm that it was an artist name, Pandora asked me if I meant &amp;ldquo;Camper Van Beethoven.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;If I was younger, I&amp;rsquo;d play one of those &amp;ldquo;look how eclectic I am&amp;rdquo; games on Pandora and create a channel with several artists from around the world who have nothing to do with each other to see what Pandora&amp;rsquo;s algorithm comes up with, but I&amp;rsquo;m more interested in finding good music to listen to while working that isn&amp;rsquo;t too distracting but may lead me to discover new music in a particular category. For discovering international music from outside of my usual categories, I&amp;rsquo;ll stick with the streaming audio of the &lt;a href=&#34;http://www.wfmu.org/&#34;&gt;WFMU&lt;/a&gt; shows &lt;a href=&#34;http://www.wfmu.org/Playlists/Doug/&#34;&gt;Give the Drummer Some&lt;/a&gt; and &lt;a href=&#34;http://www.wfmu.org/playlists/TP&#34;&gt;Transpacific Sound Paradise&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-450&#34;&gt;July 6, 2006 5:38 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Made my day: &amp;ldquo;Radiohead&amp;hellip;Beck&amp;hellip;adding Sonic Youth and Pere Ubu to that channel has helped&amp;hellip;&amp;rdquo;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://xmlhacker.com&#34; title=&#34;http://xmlhacker.com&#34;&gt;M. David Peterson&lt;/a&gt; on &lt;a href=&#34;#comment-451&#34;&gt;July 7, 2006 1:42 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I haven&amp;rsquo;t clicked Danny&amp;rsquo;s trackback to see if he feels the same way I do, but from the lead-in it seems that he does&amp;hellip;&lt;/p&gt;
&lt;p&gt;To both read your analysis, and notice that you have such FANTASTIC taste in music brings a smile to my face. Anybody who can list out and compare Radiohead, Beck, Sonic Youth, etc&amp;hellip; and be able to understand how they are related, and how they are not is just fantastic in my book :)&lt;/p&gt;
&lt;p&gt;Nice! :D&lt;/p&gt;
&lt;p&gt;By Joe on &lt;a href=&#34;#comment-452&#34;&gt;July 8, 2006 2:39 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For what it&amp;rsquo;s worth, the Pandora folks openly acknowledge that it works better if you specify songs rather than artists for the radio stations, for reasons that your post suggests.&lt;/p&gt;
&lt;p&gt;Thus, if you don&amp;rsquo;t like Doris Day&amp;rsquo;s later work, you should specify a few specific early Doris Day songs that you do like. Also, the thumbs up/thumbs down thing is only relevant within a given channel, so you don&amp;rsquo;t necessarily have to feel like you&amp;rsquo;re permanently harming Pandora&amp;rsquo;s likelihood of giving you something you &amp;ldquo;sometimes like&amp;rdquo; &amp;ndash; if it doesn&amp;rsquo;t fit on your idea of a channel, give it a thumbs down and you can make another channel where it does belong.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not shilling for Pandora &amp;ndash; just sharing some stuff I&amp;rsquo;ve picked up along the way.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-454&#34;&gt;July 9, 2006 2:28 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Joe, this is useful stuff to know.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://plasmasturm.org/&#34; title=&#34;http://plasmasturm.org/&#34;&gt;Aristotle Pagaltzis&lt;/a&gt; on &lt;a href=&#34;#comment-462&#34;&gt;July 16, 2006 7:05 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;My understanding is that last.fm suggests to you songs that other people like who listen to stuff you like – the Web 2.0 “wisdom of the crowd” thing and all that. Of course, the result of that approach is that all popular music is deemed similar to all the other popular music, within rough genres – hence the Beatles being similar to the Rolling Stones, Radiohead, Led Zeppelin, and the Eagles.&lt;/p&gt;
&lt;p&gt;Also, having to download software, to me, makes more sense than the Pandora approach. With Pandora, I’ve got to have a browser open, and I can’t listen to my own music. With the last.fm client (or one of the many third-party plugins to connect various music/media players to the service), your playlist just piles up as a side-effect of your regular music listening: no need to change habits. So both approaches have their pros and cons.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>TagSoup 1.0 released</title>
      <link>https://www.bobdc.com/blog/tagsoup-10-released/</link>
      <pubDate>Tue, 27 Jun 2006 08:22:49 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/tagsoup-10-released/</guid>
      
      
      <description><div>A milestone for a very useful open source XML utility.</div><div>&lt;p&gt;John Cowan recently &lt;a href=&#34;http://recycledknowledge.blogspot.com/2006/06/tagsoup-10-final-released.html&#34;&gt;announced&lt;/a&gt; the availability of release 1.0 of &lt;a href=&#34;http://home.ccil.org/~cowan/XML/tagsoup/&#34;&gt;TagSoup&lt;/a&gt;, his Open Source Java tool that parses even the ugliest HTML and lets you treat it like well-formed XML.&lt;/p&gt;
&lt;p&gt;This single jar file does a lot for 50K. Although it started off purely as a library with a programming API, it eventually acquired a command line interface. Enter a command like the following (without the carriage return) at your operating system prompt to create an XHTML version of the input:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;java -jar tagsoup-1.0.jar 
  http://home.ccil.org/~cowan/XML/tagsoup/extreme.html
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The input in the example above is John&amp;rsquo;s file of particularly &amp;ldquo;evil&amp;rdquo; HTML; you can pass a local filename as the argument as well. The wide selection of optional command line parameters includes various options for dealing with &amp;ldquo;bogons,&amp;rdquo; or unknown elements: you can tell TagSoup to leave them alone (other than ensuring that they&amp;rsquo;re well-formed), delete them, or render them empty with their contents moved outside of their tag boundaries.&lt;/p&gt;
&lt;p&gt;Dave Raggett&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/People/Raggett/tidy/&#34;&gt;HTML Tidy&lt;/a&gt; program has been justly popular for cleaning up messy HTML, but it can be a bit picky. The TagSoup motto is &amp;ldquo;Keep On Truckin&amp;rsquo;,&amp;rdquo; and it will forge ahead to do its best with whatever you give it. (Try a View Source of the extreme.html file mentioned above to see the kind of HTML that it valiantly navigates.) The TagSoup home page further describes its differences from HTML Tidy.&lt;/p&gt;
&lt;p&gt;A companion to TagSoup is &lt;a href=&#34;http://mercury.ccil.org/~cowan/XML/tagsoup/tsaxon/&#34;&gt;TSaxon&lt;/a&gt;, a repackaging of version 6.5.3 of Michael Kay&amp;rsquo;s Saxon XSLT 1.0 implementation that includes TagSoup. (I would have posted about TagSoup 1.0 earlier, but John was straightening out some jar packaging problems with TSaxon.) Point TSaxon at an XSLT stylesheet and some ugly HTML, and the TagSoup parser will clean up the HTML before passing it along to Saxon to have the stylesheet applied. For example, the following (without the carriage returns), when using the TSaxon version of saxon.jar, adds &lt;code&gt;id&lt;/code&gt; attributes to block elements of the cleaned-up version of extreme.html:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;java -jar saxon.jar -H 
  http://home.ccil.org/~cowan/XML/tagsoup/extreme.html 
  http://www.snee.com/xml/xslt/addids.xsl
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;John has worked on this for four years, and was careful not to push it along any faster than it deserved to go (I wanted to write about TSaxon in my XML.com column on XSLT, but he insisted that it wasn&amp;rsquo;t ready yet), so reaching 1.0 is really a milestone for TagSoup. It&amp;rsquo;s quite a gift to people who do screenscraping or any kind of beating into shape of messy HTML content.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comment&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://xam.de&#34; title=&#34;http://xam.de&#34;&gt;Max Völkel&lt;/a&gt; on &lt;a href=&#34;#comment-449&#34;&gt;July 6, 2006 9:12 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You should also have a look at CyberNeko; that library by Andy Clark is well-maintained and well-performing.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>A nice Windows alternative to Acrobat Reader</title>
      <link>https://www.bobdc.com/blog/a-nice-windows-alternative-to/</link>
      <pubDate>Tue, 20 Jun 2006 09:04:01 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-nice-windows-alternative-to/</guid>
      
      
<description><div>A recent Adobe Acrobat Reader upgrade gave me a little too much extra time for Adobe&#39;s own good.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.foxitsoftware.com/pdf/rd_intro.php&#34;&gt;&lt;img src=&#34;http://www.foxitsoftware.com/image/foxit_logo.gif&#34; alt=&#34;[foxit software logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Tim Bray&amp;rsquo;s &lt;a href=&#34;http://www.tbray.org/ongoing/When/200x/2006/06/15/Switch-From-Mac&#34;&gt;recent posting&lt;/a&gt; on potentially moving away from the Mac to Ubuntu reminded me how much I like the small, lightweight alternatives to Adobe Acrobat Reader on Ubuntu. While trying to read a PDF file that same day on a Windows machine, I instead got yet another Adobe reminder about an Acrobat Reader upgrade and &amp;ldquo;related&amp;rdquo; useless add-ons that were far outside of Adobe&amp;rsquo;s core competencies (a photo album?). The Reader upgrade was marked as &amp;ldquo;critical,&amp;rdquo; and it also told me that I could continue using Reader while it installed, so I told it to go ahead.&lt;/p&gt;
&lt;p&gt;The latter part was a lie. It closed Reader, so during the upgrade I did a Google search for &amp;ldquo;pdf reader windows&amp;rdquo; and discovered &lt;a href=&#34;http://www.foxitsoftware.com/pdf/rd_intro.php&#34;&gt;Foxit Reader&lt;/a&gt;. It&amp;rsquo;s free, and there&amp;rsquo;s no installation routine; you just start up the one-meg EXE file and you&amp;rsquo;re using it. Quickly.&lt;/p&gt;
&lt;p&gt;I love it, and highly recommend it. At first Firefox still opened downloaded documents in Acrobat, but I found out &lt;a href=&#34;http://www.lifehacker.com/software/productivity/download-of-the-day-foxit-pdf-reader-109741.php&#34;&gt;from Lifehacker&lt;/a&gt; how to get Firefox to use Foxit instead. I&amp;rsquo;ve spent too much time on &lt;a href=&#34;http://sastools.com/b2/post/79394202&#34;&gt;various tricks&lt;/a&gt; to get Adobe Reader to start up faster, but no more. Foxit Reader will definitely add a few more free minutes to each of my working days, and I&amp;rsquo;ll be happy to never again have to look at an Adobe upgrade notice when I&amp;rsquo;m trying to read a PDF file.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Creating an affiliate website</title>
      <link>https://www.bobdc.com/blog/creating-an-affiliate-website/</link>
      <pubDate>Wed, 14 Jun 2006 08:53:06 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/creating-an-affiliate-website/</guid>
      
      
      <description><div>For fun and very little profit.</div><div>&lt;p&gt;If you click &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0262510871/bobducharmeA/&#34;&gt;this link&lt;/a&gt;, you&amp;rsquo;ll find that it leads to an Amazon web page where you can buy Abelson and Sussman&amp;rsquo;s &amp;ldquo;Structure and Interpretation of Computer Programs&amp;rdquo;. &lt;a href=&#34;http://www.amazon.com/gp/product/0262510871&#34;&gt;This link&lt;/a&gt; also links to an Amazon page where you can buy the classic computer science textbook, but I&amp;rsquo;d rather that you followed the first link if you&amp;rsquo;re going to buy it. The URL includes a parameter telling Amazon that you came there from a site created by someone with the Amazon affiliate ID bobducharmeA, so they&amp;rsquo;ll pay me a commission for the sale.&lt;/p&gt;
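&lt;p&gt;The tracking scheme is simple enough to sketch in a few lines. This toy function is my own illustration, not anything Amazon publishes; the URL layout is copied from the first link above, where the affiliate ID is just one more path segment in the product URL:&lt;/p&gt;

```python
def amazon_affiliate_url(isbn, affiliate_id):
    """Build an old-style Amazon associates URL for a given ISBN.

    The path layout matches the link in this post: the affiliate ID
    rides along as a trailing path segment, which is how Amazon knows
    whom to credit for the sale.
    """
    return f"http://www.amazon.com/exec/obidos/ISBN={isbn}/{affiliate_id}/"

# The book link from this post, rebuilt from its two pieces.
link = amazon_affiliate_url("0262510871", "bobducharmeA")
```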
&lt;p&gt;&lt;a href=&#34;http://www.hipstergifts.com&#34;&gt;&lt;img src=&#34;http://www.hipstergifts.com/img/hipstergifts.gif&#34; alt=&#34;[hipstergifts.com logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Amazon is the most famous web site with an affiliate program, but there are many others. I started keeping a running list of these sites when I noticed online stores with cool stuff that offered such a program, thinking that some day I could create a &amp;ldquo;virtual&amp;rdquo; store of nothing but links to products that I liked on a collection of affiliate sites. Somewhere in that time, I saw that the domain name hipstergifts.com wasn&amp;rsquo;t taken, and I couldn&amp;rsquo;t resist parking it.&lt;/p&gt;
&lt;p&gt;As a between-jobs project, I decided to follow through on this and created &lt;a href=&#34;http://www.hipstergifts.com&#34;&gt;hipstergifts.com&lt;/a&gt;. I wondered how little work and money was necessary to create a professional-looking website that would potentially make some money.&lt;/p&gt;
&lt;p&gt;The main work was registering as an affiliate on each site, selecting the products to point to, and trying to learn exactly what URL would point to each product while giving me credit for any resulting sales. Looking through gag gift web sites and hip hop discount bling jewelry and clothing sites (two common categories out there of affiliate sites with a sense of humor) was fun, although I think my wife got tired of me pointing out products like &lt;a href=&#34;http://www.anrdoezrs.net/click-1973330-610365?url=http%3A%2F%2Fwuwearshoes.com%2FMerchant2%2Fmerchant.mv%3FScreen%3DPROD%26Product_Code%3DUSZIDN%26Category_Code%3DWD%26Store_Code%3DW&#34;&gt;Wu Wear Baby Shoes&lt;/a&gt; and &lt;a href=&#34;http://www.prankplace.com/bullet.htm?KBID=3453&#34;&gt;fake bullet hole decals&lt;/a&gt; for cars. You won&amp;rsquo;t see too many entries on my &lt;a href=&#34;http://www.hipstergifts.com/vendors.html&#34;&gt;vendors page&lt;/a&gt; because applying to an affiliate program doesn&amp;rsquo;t automatically get you in—for some, I apparently didn&amp;rsquo;t make the grade.&lt;/p&gt;
&lt;p&gt;I stored the product and link information in an XML file so that I could generate the web pages with an XSLT stylesheet. To make the website look somewhat professional, I found an open source CSS stylesheet on &lt;a href=&#34;http://www.openwebdesign.org/&#34;&gt;Open Web Design&lt;/a&gt;, which I&amp;rsquo;ve written about here &lt;a href=&#34;https://www.bobdc.com/blog/easy-professionallooking-websi&#34;&gt;before&lt;/a&gt; (note the last line of that weblog posting in particular). I even spent $25 to have &lt;a href=&#34;http://www.gotlogos.com/&#34;&gt;gotlogos.com&lt;/a&gt; design a hipstergifts logo. I&amp;rsquo;m not completely thrilled with the logo they came up with, but considering what I spent, it&amp;rsquo;s fine and I have little right to complain.&lt;/p&gt;
&lt;p&gt;To keep the hipstergifts.com main page from looking too static, I have a cron job run a stylesheet that picks a random product from each category to recreate the index.html page each morning. For publicity, I bought a few related keywords on &lt;a href=&#34;https://adwords.google.com/select/&#34;&gt;Google AdWords&lt;/a&gt; and &lt;a href=&#34;http://searchmarketing.yahoo.com/&#34;&gt;Yahoo Sponsored Search&lt;/a&gt;, but the ads don&amp;rsquo;t rank too highly. &amp;ldquo;Gifts&amp;rdquo; is an expensive keyword.&lt;/p&gt;
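&lt;p&gt;The real version is an XSLT stylesheet run from cron, but the idea behind the daily shuffle can be sketched in a few lines of Python. The product data here is a made-up stand-in for my XML file, and seeding the generator with the date is just one way to make the &amp;ldquo;random&amp;rdquo; front page stable all day and different the next morning:&lt;/p&gt;

```python
import random

# Hypothetical stand-in for the product catalog; the real data lives in
# an XML file and is rendered by XSLT, so this only mirrors the idea.
products = {
    "gag gifts": ["fake bullet hole decals", "backwards clock"],
    "bling": ["Wu Wear baby shoes", "discount chain"],
}

def daily_picks(catalog, seed):
    # Seeding with the date means every run on the same day agrees on
    # the picks, and tomorrow's cron run gets a fresh selection.
    rng = random.Random(seed)
    return {category: rng.choice(items) for category, items in catalog.items()}

picks = daily_picks(products, seed="2006-06-14")
```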
&lt;p&gt;My fantasy was to set up this web site and then forget about it as money rolled in, but in the two months that the site has been up, I doubt that it has even paid for my logo. I could work harder at the &lt;a href=&#34;http://en.wikipedia.org/wiki/Search_engine_optimization&#34;&gt;search engine optimization&lt;/a&gt; part—there is an entire subculture of people who obsess over this, and some make very good money—but now that I have a real job again I have less time for that hobby.&lt;/p&gt;
&lt;p&gt;If you need any fake bullet holes for your car, though, or gimmicky shooting toys or a backwards clock or a fake computer mouse that gives people an electronic shock, please remember &lt;a href=&#34;http://www.hipstergifts.com&#34;&gt;hipstergifts.com&lt;/a&gt;. Especially at holiday time.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>RDF metadata in XHTML gets even easier</title>
      <link>https://www.bobdc.com/blog/rdf-metadata-in-xhtml-gets-eve/</link>
      <pubDate>Thu, 08 Jun 2006 08:39:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/rdf-metadata-in-xhtml-gets-eve/</guid>
      
      
      <description><div>Elias Torres did the hard part; join in with the fun part!</div><div>&lt;p&gt;I&amp;rsquo;ve &lt;a href=&#34;http://www.snee.com/cgi-sys/cgiwrap/bobd/managed-mt/mt-search.cgi?IncludeBlogs=2&amp;amp;search=rdf%2Fa&#34;&gt;written here before&lt;/a&gt; about RDF/A (now known as &lt;a href=&#34;http://www.w3.org/TR/xhtml-rdfa-primer/&#34;&gt;RDFa&lt;/a&gt;), the spec for embedding RDF triples into XHTML using existing XHTML markup. I&amp;rsquo;ve felt for a while that it holds great promise for making RDF easier to use and easier to incorporate into typical web pages, thereby allowing the creation of a real semantic web of RDF data. I had vague plans to write an XSLT stylesheet that would extract the RDF triples from an XHTML file&amp;rsquo;s RDFa markup, and for sample input I did put together a &lt;a href=&#34;http://www.snee.com/xml/rdfa/rdfa1.html&#34;&gt;test document&lt;/a&gt; that incorporates a lot of sample RDFa from a March version of the RDFa Primer.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.w3.org/TR/xhtml-rdfa-primer/&#34;&gt;&lt;img src=&#34;http://www.w3.org/RDF/icons/rdf_w3c_icon.128&#34; alt=&#34;[RDF logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;While I put off writing the stylesheet that would do this, Elias Torres wasn&amp;rsquo;t as lazy as me, and he went ahead and created an &lt;a href=&#34;http://torrez.us/rdfa&#34;&gt;RDFa Extractor&lt;/a&gt;. The REST interface makes it easy to extract RDF/XML triples from an existing document; check out the &lt;a href=&#34;http://torrez.us/services/rdfa/?url=http%3A%2F%2Fwww.snee.com%2Fxml%2Frdfa%2Frdfa1.html&#34;&gt;triples from my first test document&lt;/a&gt; that it pulled out.&lt;/p&gt;
&lt;p&gt;Many people are interested in RDFa for its ability to add semantics to existing data—for example, to add markup around a string of digits that already exist in a web page to indicate that the string is a &lt;a href=&#34;http://xmlns.com/foaf/0.1/phone&#34;&gt;http://xmlns.com/foaf/0.1/phone&lt;/a&gt; number. (As the &lt;a href=&#34;http://www.w3.org/TR/2006/WD-xhtml-rdfa-primer-20060516/&#34;&gt;primer&lt;/a&gt; tells us, &amp;ldquo;An important goal of RDFa is to achieve this RDF embedding without repeating existing XHTML content when that content is the metadata.&amp;rdquo;) Because of my work with the &lt;a href=&#34;http://www.prismstandard.org/&#34;&gt;PRISM&lt;/a&gt; group, I was interested in adding data that&amp;rsquo;s a little more meta, such as production workflow data, in which the actual metadata values are not part of the content. After a few questions on the &lt;a href=&#34;http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/&#34;&gt;rdf-in-xhtml&lt;/a&gt; mailing list, I found this pretty easy. To test this kind of metadata, I created &lt;a href=&#34;http://www.snee.com/xml/rdfa/rdfa2.html&#34;&gt;rdfa2.html&lt;/a&gt; yesterday to see what Elias&amp;rsquo;s program would do with it, and the &lt;a href=&#34;http://torrez.us/services/rdfa/?url=http%3A%2F%2Fwww.snee.com%2Fxml%2Frdfa%2Frdfa2.html&#34;&gt;results&lt;/a&gt; are great.&lt;/p&gt;
&lt;p&gt;There are several things that I like about these results. First, I put an empty string as the subject of the metadata about the document itself, as shown below, and Elias&amp;rsquo;s extractor created triples with the document&amp;rsquo;s full URL as the subject of the triples. It also created separate triples for the two dc:subject properties that I assigned to the document.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;meta about= &amp;quot;&amp;quot;&amp;gt;
  &amp;lt;meta property=&amp;quot;dc:title&amp;quot; content=&amp;quot;Meta-metadata&amp;quot;/&amp;gt;
  &amp;lt;meta property=&amp;quot;dc:date&amp;quot; content=&amp;quot;2006-06-06&amp;quot;/&amp;gt;
  &amp;lt;meta property=&amp;quot;dc:subject&amp;quot; content=&amp;quot;metadata&amp;quot;/&amp;gt;
  &amp;lt;meta property=&amp;quot;dc:subject&amp;quot; content=&amp;quot;RDFa test document 2&amp;quot;/&amp;gt;
&amp;lt;/meta&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
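&lt;p&gt;The subject resolution described above follows the standard relative-URL rules, so it can be sketched with nothing but the Python standard library. This is just the resolution rule, not Elias&amp;rsquo;s extractor:&lt;/p&gt;

```python
from urllib.parse import urljoin

# An empty about="" resolves to the document's own URL, and about="#s1"
# resolves to a fragment of that URL, exactly the behavior the extractor
# showed for the meta elements above.
doc = "http://www.snee.com/xml/rdfa/rdfa2.html"

subject_for_document = urljoin(doc, "")     # the document itself
subject_for_section = urljoin(doc, "#s1")   # section s1 within it
```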
&lt;p&gt;Publishers in the PRISM group were concerned about the ability to assign out-of-line metadata to specific sections, such as a recipe or image within a larger document. To test this, the rdfa2.html document has the following in the content of its &lt;code&gt;body&lt;/code&gt; element (the &lt;code&gt;section&lt;/code&gt; element is a nice new XHTML 2 feature):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;section id=&amp;quot;s1&amp;quot;&amp;gt;
  &amp;lt;h2&amp;gt;Part one&amp;lt;/h2&amp;gt;
  &amp;lt;p&amp;gt;This document has very little data, but plenty of metadata.&amp;lt;/p&amp;gt;
  &amp;lt;p&amp;gt;It&#39;s my second RDFa test document. I created my 
   &amp;lt;a id=&amp;quot;l1&amp;quot; href=&#39;rdfa1.html&#39;&amp;gt;first one&amp;lt;/a&amp;gt; several months ago.&amp;lt;/p&amp;gt;
&amp;lt;/section&amp;gt;
&amp;lt;section id=&amp;quot;s2&amp;quot;&amp;gt;
  &amp;lt;h2&amp;gt;Part two&amp;lt;/h2&amp;gt;
  &amp;lt;p&amp;gt;This concludes our test.&amp;lt;/p&amp;gt;
&amp;lt;/section&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It also has the following inside the &lt;code&gt;html/head&lt;/code&gt; element, where the &lt;code&gt;meta&lt;/code&gt; elements shown earlier with metadata about the document itself are stored:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;meta about=&amp;quot;#s1&amp;quot;&amp;gt;
  &amp;lt;meta property=&amp;quot;sn:goofinessFactor&amp;quot; content=&amp;quot;3.2&amp;quot;/&amp;gt;
  &amp;lt;meta property=&amp;quot;sn:direction&amp;quot; content=&amp;quot;south&amp;quot;/&amp;gt;
  &amp;lt;meta property=&amp;quot;sn:editor&amp;quot; content=&amp;quot;lj&amp;quot;/&amp;gt;
&amp;lt;/meta&amp;gt;


&amp;lt;meta about=&amp;quot;#s2&amp;quot;&amp;gt;
 &amp;lt;meta property=&amp;quot;sn:goofinessFactor&amp;quot; content=&amp;quot;4.3&amp;quot;/&amp;gt;
 &amp;lt;meta property=&amp;quot;sn:direction&amp;quot; content=&amp;quot;north&amp;quot;/&amp;gt;
 &amp;lt;meta property=&amp;quot;sn:editor&amp;quot; content=&amp;quot;tr&amp;quot;/&amp;gt;
&amp;lt;/meta&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(When I test the assignment of arbitrary metadata, I like to pick some pretty arbitrary metadata.) Again, Elias&amp;rsquo;s extractor extracted the triples I hoped to see. Right after those two sets of nested &lt;code&gt;meta&lt;/code&gt; elements, I had a third that assigned metadata to the &lt;code&gt;a&lt;/code&gt; link element inside of section s1:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;meta about=&amp;quot;#l1&amp;quot;&amp;gt;
  &amp;lt;meta property=&amp;quot;sn:cost&amp;quot; content=&amp;quot;0&amp;quot;/&amp;gt;
  &amp;lt;meta property=&amp;quot;sn:lastChecked&amp;quot; content=&amp;quot;2006-06-06T09:04&amp;quot;/&amp;gt;
  &amp;lt;meta property=&amp;quot;sn:type&amp;quot; content=&amp;quot;cite&amp;quot;/&amp;gt;
&amp;lt;/meta&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The history of advanced linking architectures is mostly a series of arguments over the appropriate metadata to store with the address (direct or indirect) of the link destination, the one piece of information that a link can&amp;rsquo;t do without. Different people have different ideas about what &amp;ldquo;typical&amp;rdquo; applications need, and a committee that comes up with a common set of additional metadata typically ends up with a mess. RDFa gives people the ability to add whatever metadata they like (with the precisely defined semantics that can come from property names in specific namespaces), which could enable some big advances in linking applications.&lt;/p&gt;
&lt;p&gt;This assignment of metadata to an entire document, to sections within it, and to a specific link within it was just some quick dabbling. There are many other ways that RDFa could be valuable, and Elias&amp;rsquo;s extractor makes them easy to test. So get out there and create new RDFa! Just take some existing web pages, or mock some up, and add semantics to them—movie schedules, directions, parts catalogs, home pages—and see what Elias&amp;rsquo;s RDFa extractor gets out of them.&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://captsolo.net/info/&#34; title=&#34;http://captsolo.net/info/&#34;&gt;CaptSolo&lt;/a&gt; on &lt;a href=&#34;#comment-437&#34;&gt;June 8, 2006 1:41 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Christoph Gorn has made a nice use of RDFa:&lt;/p&gt;
&lt;p&gt;He already has SIOC profiles with RDF metadata about his blog and its posts.&lt;/p&gt;
&lt;p&gt;Duplicating it all as embedded RDF would not make sense, but RDFa can be nicely used to create rdfs:seeAlso links to the profiles with more RDF data about a post. That&amp;rsquo;d be especially useful for pages with multiple posts per page, such as monthly archives.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://captsolo.net/info/&#34; title=&#34;http://captsolo.net/info/&#34;&gt;CaptSolo&lt;/a&gt; on &lt;a href=&#34;#comment-438&#34;&gt;June 8, 2006 1:46 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s more info:&lt;br /&gt;
&lt;a href=&#34;http://b4mad.net/datenbrei/archives/2006/06/07/seealso-for-sioc-hooked-in-page-via-rdfa/&#34;&gt;http://b4mad.net/datenbrei/archives/2006/06/07/seealso-for-sioc-hooked-in-page-via-rdfa/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.jasonkolb.com&#34; title=&#34;http://www.jasonkolb.com&#34;&gt;Jason Kolb&lt;/a&gt; on &lt;a href=&#34;#comment-439&#34;&gt;June 13, 2006 8:53 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is a very nice, elegant approach. The ONLY thing I don&amp;rsquo;t like about it as much as using microformats is that the data doesn&amp;rsquo;t stay with the text itself. How would say, a feed reader handle this, and would it be able to extract a brief &amp;ldquo;sample&amp;rdquo; text chunk from the text and carry the meta data with it?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-440&#34;&gt;June 13, 2006 10:11 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Jason,&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s actually the use case around which RDFa was designed, and what its designers more typically expect people to do. (I&amp;rsquo;m probably not the first to describe RDFa as &amp;ldquo;microformats done right.&amp;rdquo;) Because of publishing use cases from the PRISM group, I was more interested in seeing how well it worked with more out-of-line metadata, and as it turned out, it works fine. That&amp;rsquo;s why my examples focus on that.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Writing about software: lists</title>
      <link>https://www.bobdc.com/blog/writing-about-software-lists/</link>
      <pubDate>Thu, 01 Jun 2006 08:44:34 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/writing-about-software-lists/</guid>
      
      
      <description><div>Bulleted lists Numbered lists Other kinds of lists</div><div>&lt;p&gt;As part of my sporadic series on &lt;a href=&#34;http://www.snee.com/bobdc.blog/publishing/documenting_software/&#34;&gt;documenting software&lt;/a&gt;, I wanted to devote a posting to lists, because they play a much larger role in tech writing than they do in typical prose. People reading a chapter of software documentation are less likely to read it from beginning to end than they are to skim it looking for an answer, and if the question is &amp;ldquo;how do I accomplish task X&amp;rdquo;, the answer is probably a multi-part answer. When the instructions for task X are broken up into separate list items, they&amp;rsquo;re easier to find and easier to follow—after carrying out step 3, it&amp;rsquo;s easier to go back and find step 4 if it&amp;rsquo;s indented with a big &amp;ldquo;4&amp;rdquo; in front of it than if it&amp;rsquo;s in the middle of big dense paragraph.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/listicons.jpg&#34; alt=&#34;[list icons]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;The two basic kinds of lists, both so common that word processing programs usually offer toolbar buttons to quickly create each, are bulleted lists and numbered lists. The former is sometimes called an &amp;ldquo;unordered list,&amp;rdquo; giving us the &lt;code&gt;ul&lt;/code&gt; element name used by HTML to create them, and numbered lists are often called &amp;ldquo;ordered lists&amp;rdquo;, giving us the HTML &lt;code&gt;ol&lt;/code&gt; element.&lt;/p&gt;
&lt;p&gt;Sometimes people use one when they should use the other, especially in PowerPoint presentations. The test for which to use is simple:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Think about whether changing the order of your items would cause real problems. It may improve your presentation to save a certain point for last, but if that point came earlier and your audience still got all the information that they need, then that&amp;rsquo;s not a real problem. If you tell your readers to pour oil into the engine before telling them to remove the cap, that&amp;rsquo;s a real problem.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the ordering really matters, make it a numbered list. Numbered lists are for giving directions, whether you&amp;rsquo;re telling someone how to drive to your house, how to install a piece of software, or how to perform a particular task with that software.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I&amp;rsquo;ll admit, that was a pretty contrived example of a numbered list. Lists with only two items, especially numbered lists, are usually not worth formatting as lists. (There are more contrived examples to come.)&lt;/p&gt;
&lt;p&gt;Bulleted lists are popular for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Listing issues to keep in mind when considering a particular topic.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Highlighting the good points of something, which is why bulleted lists are popular in marketing literature.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Naming the bad points of an entity or course of action.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Showing short examples of something.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Identifying future issues to think about that build on what the reader has seen so far.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the sample bulleted list above, note that none of the list items are complete sentences and that they all end with a period. If all the items in a list are complete sentences, or if none ends with punctuation, that&amp;rsquo;s fine, but remember to be consistent throughout a given list. I once saw a bulleted list on a resume from someone applying for a tech writer job in which some items ended with periods and some didn&amp;rsquo;t. If the applicant had wanted to be anything but a tech writer (or perhaps a proofreader or copy editor) this would have been acceptable, but in this case it wasn&amp;rsquo;t. We didn&amp;rsquo;t call him in for an interview.&lt;/p&gt;
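&lt;p&gt;That consistency rule is mechanical enough to check automatically. Here&amp;rsquo;s a toy sketch of my own (not part of any real style checker) that flags a list mixing the two styles:&lt;/p&gt;

```python
def punctuation_is_consistent(items):
    """Return True when either every list item ends with a period or
    none of them does; mixing the two is the mistake described above."""
    endings = [item.rstrip().endswith(".") for item in items]
    return all(endings) or not any(endings)

# Every item ends with a period: consistent.
good = ["Listing issues to keep in mind.", "Showing short examples."]
# One item ends with a period and one does not: the resume mistake.
bad = ["Wrote the user manual.", "Edited release notes"]
```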
&lt;p&gt;You don&amp;rsquo;t have to limit each list item to one phrase, sentence, or even one paragraph. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The first list item in this bulleted list has two paragraphs.&lt;/p&gt;
&lt;p&gt;Additional paragraphs in a list item let you give more background on a particular point without having a large, dense paragraph as your list item. If you put too many paragraphs in a single list item, though, that section of the list looks less and less like a list, which confuses the reader.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the content that you&amp;rsquo;re writing must be converted from one format to another, make sure that these multi-paragraph list items don&amp;rsquo;t get mangled. Some conversion programs assume that every paragraph in a list is a new item.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Another advanced technique, which is pretty ubiquitous in PowerPoint presentations, is the nested list.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Inside a numbered list item:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A nested numbered list can break down a step from the main list into its constituent parts.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A nested bulleted list can highlight certain issues to keep in mind when executing the step.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Inside a bulleted list item:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A numbered list points out issues related to the list item where the order matters.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A bulleted list breaks out the point into subissues.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(I warned you that many of these examples would be contrived.) I didn&amp;rsquo;t pick the separate bullet characters for the list and sublists above. Because I put a &lt;code&gt;ul&lt;/code&gt; element inside of another &lt;code&gt;ul&lt;/code&gt; element&amp;rsquo;s &lt;code&gt;li&lt;/code&gt; list item element when creating this HTML, browsers change the bullet automatically to show you the different nesting levels. To change the nesting level of list items in a word processor, select them and click the indent or outdent button on the toolbar.&lt;/p&gt;
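&lt;p&gt;Stripped down to its markup, the nesting above looks something like this (a shortened sketch, not the exact source of this page):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;ul&amp;gt;
  &amp;lt;li&amp;gt;Inside a numbered list item:
    &amp;lt;ul&amp;gt;
      &amp;lt;li&amp;gt;A nested numbered list can break down a step.&amp;lt;/li&amp;gt;
      &amp;lt;li&amp;gt;A nested bulleted list can highlight issues.&amp;lt;/li&amp;gt;
    &amp;lt;/ul&amp;gt;
  &amp;lt;/li&amp;gt;
&amp;lt;/ul&amp;gt;
&lt;/code&gt;&lt;/pre&gt;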
&lt;p&gt;HTML offers a third type of list known as a definition list. Instead of demonstrating it, I&amp;rsquo;ll point you to the &lt;a href=&#34;http://www.docbook.org/tdg5/en/html/ch02.html#d0e2426&#34;&gt;list of eight list types offered by DocBook&lt;/a&gt;, which itself is a definition list. The DocBook DTD includes a wider choice of list types because it offers the ability to address technical documentation issues at a much finer-grained level of detail. It&amp;rsquo;s worth reading through the list describing the eight types.&lt;/p&gt;
&lt;p&gt;If you think that lists are not something to get passionate about, you&amp;rsquo;re absolutely right. They&amp;rsquo;re a tool for technical communication. I can&amp;rsquo;t finish up, though, without mentioning Information Mapping®, which has had an intermittent influence on tech writing over the years. Robert Horn, a psychologist at Columbia University, developed some theories, a methodology, and &lt;a href=&#34;http://www.infomap.com&#34;&gt;a company&lt;/a&gt; dedicated to more efficient organization and communication of information. If I can oversimplify their approach, I would say that whenever you can break something down into a list, you should. They call the process of breaking down information &amp;ldquo;chunking&amp;rdquo;, and I must say I do like the use of &amp;ldquo;chunk&amp;rdquo; as a verb. There&amp;rsquo;s a wide-eyed, evangelical zeal to the serious devotees&amp;rsquo; belief that &lt;em&gt;everything&lt;/em&gt; can be information mapped—when a former co-worker took a course in Information Mapping, the certificate he received for doing so was formatted as an Information Map. Of the only five questions on the &lt;a href=&#34;http://www.infomap.com/index.cfm/TheMethod/Mapping_FAQ&#39;s&#34;&gt;company&amp;rsquo;s FAQ&lt;/a&gt;, one is titled &amp;ldquo;Handling resistance to mapping.&amp;rdquo; All your information &lt;a href=&#34;http://en.wikipedia.org/wiki/All_your_base_are_belong_to_us&#34;&gt;are belong to us&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;Information Mapping is grounded in some serious quantitative research, and there is certainly some value to it; a tech writer can learn a lot from it without drinking the Kool Aid. If their approach to modular content architecture didn&amp;rsquo;t have a direct influence on DITA, it certainly had a strong indirect influence, because &lt;a href=&#34;http://www.stc.org/&#34;&gt;serious tech writers&lt;/a&gt; had been aware of it for years.&lt;/p&gt;
&lt;p&gt;I can&amp;rsquo;t resist closing with a bulleted list:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Take a closer look at how lists are used in the documentation you see and what choices their authors made in presenting them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you explore the more advanced possibilities such as multi-paragraph list items or nested lists, check that conversion routines converted them properly. People who write these routines may not have considered the less simplistic possibilities.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you&amp;rsquo;re doing more complex technical writing using the DocBook DTD, get to know its other list types and make sure that your production routines can handle any that you use. Lots of the DocBook DTD is optional, and conversion routines often don&amp;rsquo;t address everything that may come up.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://infinitesque.net/&#34; title=&#34;http://infinitesque.net/&#34;&gt;John L. Clark&lt;/a&gt; on &lt;a href=&#34;#comment-431&#34;&gt;June 1, 2006 1:55 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;What about using numbered lists in order to give each item in an otherwise unsequential list an identity (even if only for the visual presentation of the current version of the list)? &amp;ldquo;I want to make statement `foo&amp;rsquo; about item 7 in that list.&amp;rdquo;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Should we use lists like this?&lt;/li&gt;
&lt;li&gt;How does this approach integrate with your suggestions above?&lt;/li&gt;
&lt;li&gt;How can we differentiate between numbering lists for reference and numbering lists for sequence, first to our readers and second in our markup?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-432&#34;&gt;June 1, 2006 9:14 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Good point: the use of numbers to provide an addressing system. I&amp;rsquo;d be more inclined to do what contracts and laws do (although they also use regular Arabic numerals): use letters or upper- or lower-case roman numerals to make it easier to reference specific paragraphs. Of course, sometimes the order does matter in those and sometimes it doesn&amp;rsquo;t, and they often use these alternate ways to identify nesting levels, just like the different bullets shown in my nested bulleted list.&lt;/p&gt;
&lt;p&gt;I just learned about the &lt;a href=&#34;http://www.w3.org/TR/CSS21/generate.html#lists&#34;&gt;list-style-type&lt;/a&gt; CSS property for the ol element, which makes this possible. As to differentiating between the two types you describe in the markup, the use of a class attribute to trigger the use of the list-style-type would do that&amp;ndash;just pick a good name for the attribute value!&lt;/p&gt;
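&lt;p&gt;A sketch of what I mean (the class name &amp;ldquo;reference&amp;rdquo; and the CSS rule are just an illustration of the idea, not markup from this page):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ol.reference { list-style-type: lower-roman; }

&amp;lt;ol class="reference"&amp;gt;
  &amp;lt;li&amp;gt;first addressable item&amp;lt;/li&amp;gt;
  &amp;lt;li&amp;gt;second addressable item&amp;lt;/li&amp;gt;
&amp;lt;/ol&amp;gt;
&lt;/code&gt;&lt;/pre&gt;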
&lt;p&gt;As I explained at &lt;a href=&#34;http://www.oreillynet.com/pub/wlg/4772&#34;&gt;http://www.oreillynet.com/pub/wlg/4772&lt;/a&gt;, I put IDs on all my block level elements so that I can link to them. For example, &lt;a href=&#34;https://www.bobdc.com/blog/writing-about-software-lists#i110&#34;&gt;this&lt;/a&gt; links to the third item in the first bulleted list above.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.la-grange.net/&#34; title=&#34;http://www.la-grange.net/&#34;&gt;karl&lt;/a&gt; on &lt;a href=&#34;#comment-433&#34;&gt;June 4, 2006 8:28 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s an interesting thing about the nature of XML and its infoset: it is an &lt;strong&gt;ordered&lt;/strong&gt; tree. What does that mean? It means that an &amp;ldquo;unordered list&amp;rdquo; is really a semantic definition and not the nature of XML itself; at the markup level, there is no difference between ul and ol. They are exactly the same by nature.&lt;/p&gt;
&lt;p&gt;The real difference is made when a user agent (browser, authoring tool, bot, indexing engine, semantics engine) has been specifically programmed to make sense of it. This is one of the big troubles of the Web right now. Most browsers are not semantic tools but (CSS) renderers with very little semantic capacity.&lt;/p&gt;
&lt;p&gt;A graph is often not ordered at all, so lists in RDF have no sequential reading unless you create a mechanism which says they have to be read in a certain order.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/documenting-software">documenting software</category>
      
    </item>
    
    <item>
      <title>XML: too flexible?</title>
      <link>https://www.bobdc.com/blog/xml-too-flexible/</link>
      <pubDate>Fri, 26 May 2006 09:01:45 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/xml-too-flexible/</guid>
      
      
      <description><div>Some biologists really like relational databases.</div><div>&lt;p&gt;This week I gave a talk to some biology researchers at the University of Virginia. The basic thesis was that large databases typically need to fit into the neat rows and columns of relational tables, but that new XQuery/XML databases let you store and retrieve huge amounts of data with potentially much more complex structure, and that while this has obvious applications in the publishing world—the world that begat XML—it could have useful applications in other domains as well. I never got past ninth grade biology, but I&amp;rsquo;d read a little on bioinformatics recently, and these people are accumulating and combing through a lot of complex data.&lt;/p&gt;
&lt;p&gt;Going into the talk, I assumed that the listeners either were or weren&amp;rsquo;t familiar with XML, and for those who weren&amp;rsquo;t I&amp;rsquo;d explain the basics and we&amp;rsquo;d go from there. I didn&amp;rsquo;t count on people who considered themselves familiar with it but had some misconceptions based on their own use of it. XML is popular in the sciences as an interchange format, and one professor in particular had a difficult time believing that applications could be built with rigorously structured XML.&lt;/p&gt;
&lt;p&gt;He said several times (based on one of my slides showing sample XML) that if someone can put a &lt;code&gt;title&lt;/code&gt; element anywhere they want, then his application won&amp;rsquo;t know where to look for it, and I kept going back to my slide showing the declaration for the &lt;code&gt;chapter&lt;/code&gt; element that said that one and only one &lt;code&gt;title&lt;/code&gt; element had to go at the very beginning of each chapter. I&amp;rsquo;m probably misrepresenting some of what he said, because we went around in circles a few times without completely understanding each other, but one of his key points was that he likes how the normalization process forces someone (in his case, his graduate students) to really think through the relationships between the pieces of information they&amp;rsquo;re storing. This sounded to me like claiming that a well-designed relational database was better than a badly-designed XML database.&lt;/p&gt;
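&lt;p&gt;In DTD terms, the kind of declaration I kept pointing back to looks something like this (a simplified sketch, not the exact content model from my slide):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!ELEMENT chapter (title, para+)&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A validating parser rejects any chapter that doesn&amp;rsquo;t begin with exactly one title, so an application reading valid documents always knows where to look for it.&lt;/p&gt;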
&lt;p&gt;Thinking back on it, I realize that while XML is common outside of the publishing world, highly structured XML is not as common, except in cases where it&amp;rsquo;s an interchange format that maps directly to some pre-existing relational tables or Java classes, as is often the case with more transactional uses of XML. The biologists had heard of DTDs and schemas, but hadn&amp;rsquo;t bothered with them much, because looking at a handful of XML for a given data class showed them the structure they needed to know. Validation technologies such as Schematron and RELAX NG were understandably way off their radar.&lt;/p&gt;
&lt;p&gt;I did have a slide saying that an advantage of XML databases over object-oriented databases (the other technology that tried to take large databases beyond rows and columns) was that prototyping was a lot easier: you can just throw together some XML and start querying it, while the analysis and design of object-oriented systems can mean a lot of up-front work before you can actually do anything with your data—note how many big fat books there are just on OO analysis and just on OO design. While discussing this slide, I mentioned that for a serious production XML application you should create checkpoints for design review and analysis and so forth before you build too many application dependencies on your thrown-together data, but it looks like I need more slides to make it clearer that while XML can be as flexible as you want, the developer can have a lot of control over the degree of that flexibility, and that large, carefully controlled systems have been built that never would have worked with relational databases—in the print and online publishing worlds, at least.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Me as 80s New York lead guitarist</title>
      <link>https://www.bobdc.com/blog/me-as-80s-new-york-lead-guitar/</link>
      <pubDate>Wed, 24 May 2006 07:48:57 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/me-as-80s-new-york-lead-guitar/</guid>
      
      
      <description><div>A Christmas present from my brother: a CD of studio visits by a band I was in.</div><div>&lt;p&gt;In the first half of my twenty-five years of living in New York City, I played lead guitar in two serious bands and bass and miscellaneous in several fooling-around-with-friends bands. By &amp;ldquo;serious,&amp;rdquo; I mean gigs consisting of one or two forty-minute bam-bam-bam sets with over ninety per cent originals, and any covers better be obscure and cool enough that everyone who recognizes them says &amp;ldquo;ooh, nice&amp;rdquo; (e.g. &amp;ldquo;Glory&amp;rdquo; from &lt;a href=&#34;https://en.wikipedia.org/wiki/Adventure_(Television_album)&#34;&gt;Television&amp;rsquo;s second album&lt;/a&gt;. You play for minimal money in clubs with good sound and lights with the prime goal of showing that you sound and look professional and have the right material to be signed to a record label, and you can usually assume that there are label people around.&lt;/p&gt;
&lt;p&gt;The first band, the ExHusbands, began with the goal of being the most successful band at Columbia University and occasionally playing downtown clubs, and that was not too difficult. Becoming a big-time downtown band was much more difficult, although it took a big jump when we were discovered and managed by Danny Fields, who had worked with our idols (the Velvet Underground, Iggy and the Stooges, the MC5, the Doors, the Ramones) and whose roots in cool New York scenes went back to Andy Warhol&amp;rsquo;s Factory. We quit school and worked hard at it, and played Max&amp;rsquo;s Kansas City (the first time, shortly before my twentieth birthday, so I was happy to say I did it as a teenager), Hurrah&amp;rsquo;s, the Ritz, Danceteria, many now-defunct clubs, and mostly, CBGBs. We were considered a CB&amp;rsquo;s band. Although we never got a label deal, we went through a period of being treated like the Next Big Thing by people who seemed to know, and that was thrilling at that age. This was all around 1979 - 82, so you could call what we were doing New Wave, although outside of the Velvet Underground and Television (and especially for our singer, Iggy), we were mostly into sixties British Invasion bands. The first time I saw another CB&amp;rsquo;s band that has hit the big time since then, the Strokes, their poppy Velvet Underground approach and their choice of shirts reminded me of the ExHusbands.&lt;/p&gt;
&lt;p&gt;Four or so years later my second cousin Kris Woolsey started The Hunting Accident. We don&amp;rsquo;t remember the details of how we came up with the name, but a bottle of Jägermeister was involved. Today, I&amp;rsquo;d say we were shooting for post-&lt;a href=&#34;https://en.wikipedia.org/wiki/Big_Star&#34;&gt;Big Star&lt;/a&gt; rockin&amp;rsquo; pop, as were the dBs, Matthew Sweet, and the Replacements, who in particular seemed to define a time and place and sound and attitude for me and a lot of my friends. (I can complain about Wikipedia with the best of them, but I just looked at their entry for &lt;a href=&#34;http://en.wikipedia.org/wiki/Power_pop&#34;&gt;power pop&lt;/a&gt;, and it&amp;rsquo;s remarkably well done.) Kris has since gone on to do some work with Fountains of Wayne, which fits in well with my attempt to summarize the band&amp;rsquo;s sound.&lt;/p&gt;
&lt;p&gt;Why I&amp;rsquo;m writing all this: last Christmas, my brother made me a CD of the Hunting Accident&amp;rsquo;s two studio sessions, which makes up about an album&amp;rsquo;s worth of material. I made a &lt;a href=&#34;http://www.snee.com/music/ha&#34;&gt;web page of MP3s&lt;/a&gt; of it for the other guys in the band. As it says, I only wrote one of the songs, but it was one of the band&amp;rsquo;s more popular ones, and friends who were more serious about songwriting than I was liked it a lot, so I was pretty proud of it. We played the Bottom Line, Maxwell&amp;rsquo;s, Tramps, more now-defunct clubs, and CBGB a lot. One memory is standing on the stage at CB&amp;rsquo;s with the &lt;a href=&#34;https://en.wikipedia.org/wiki/The_Del-Lords&#34;&gt;Del Lords&lt;/a&gt; at midnight counting down to the new year of 1988, followed by an only somewhat rehearsed version of Auld Lang Syne. (I also remember how my future wife, who I had met about two months earlier, had folded over the &amp;ldquo;PPY&amp;rdquo; on a &amp;ldquo;HAPPY NEW YEAR&amp;rdquo; paper tiara so that it said &amp;ldquo;HA NEW YEAR.&amp;rdquo;) The Hunting Accident also fell apart before getting a deal.&lt;/p&gt;
&lt;p&gt;Wandering around downtown New York the summer before last, I stopped into CB&amp;rsquo;s, and BG Berlin was still working the door. He looked up and said &amp;ldquo;Bob DuCharme!&amp;rdquo; (He was always very good with names, and I had hung out there a lot when I first moved downtown from the Columbia area.) I told him that my parents still tell the story of waiting through &lt;a href=&#34;https://en.wikipedia.org/wiki/DNA_(American_band)&#34;&gt;DNA&lt;/a&gt;&amp;rsquo;s set to see the ExHusbands play there one Saturday night; DNA was probably Arto Lindsay&amp;rsquo;s most dissonant effort, as a leading light of the &amp;ldquo;No Wave&amp;rdquo; scene, and with a post-Pere Ubu Tim Wright on bass and Ikue Mori on drums, it was really great, but a bit much for my parents. In those days, the four sets of the night were at midnight, 1, 2, and 3, with the headliners at 2, and we were playing at 3. (Each time my parents tell the story, our set gets later.) As I told BG this story in the summer of 2004, I realized that I was the same age that my mother had been when my parents came to see us that night twenty-six years ago. My parents were remarkably supportive of the overall venture, though, considering that dropping out of an Ivy League school to be in a full-time rock and roll band is traditionally unpopular with parents. I&amp;rsquo;m sure that my father&amp;rsquo;s New York acting career, which was cut short by the draft, played a role in their understanding.&lt;/p&gt;
&lt;p&gt;Now that I&amp;rsquo;m too old for that sort of thing, I&amp;rsquo;ve been working on playing jazz on an upright bass for about two and a half years. I&amp;rsquo;ve got all kinds of ideas for making music on computers, but I spend too much time on computers, so struggling with a big tactile vibrating box to play music that is gradually revealing itself to me better and will never go out of style seems like a healthier way to spend unpaid time than more typing and staring at glowing screens.&lt;/p&gt;
&lt;p&gt;Of course, if the right equipment, bass player and drummer were all at an XML-related event in the future, Eve Maler and I would be happy to front a Zeppelin cover band. We&amp;rsquo;ve discussed it, and did finish a geeky rock and roll free-for-all at the 2004 Oxford XML Summer School with &amp;ldquo;Rock and Roll&amp;rdquo; from Zep 4 once with Kal Ahmed on bass. If Tim Bray has a cello and pickup handy, perhaps we could take on &amp;ldquo;Kashmir.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;7-comments&#34;&gt;7 Comments&lt;/h2&gt;
&lt;p&gt;By deltabob on &lt;a href=&#34;#comment-421&#34;&gt;May 24, 2006 11:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I love Matthew Sweet and the Replacements. I graduated high school in 1992 - in Appalachia - so those weren&amp;rsquo;t exactly popular sounds at that time and place.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s great to hear the names again though - I&amp;rsquo;ll have to go digging through my old cassette tape collection (provided my attic hasn&amp;rsquo;t completely ruined them).&lt;/p&gt;
&lt;p&gt;By Rick Jelliffe on &lt;a href=&#34;#comment-422&#34;&gt;May 24, 2006 11:55 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I wonder if we should make a podcast of music by XML Geeks? Len Bullard has sent me a CD he made, and of course Charles Goldfarb used to play piano professionally (in army bands, or for army audiences, IIRC). Dale Waldt plays guitar and I think Elliot Kimber used to organize some kind of musical events. A few other XML people have sent me comments on my Synthesizer articles at oreilly digital media.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-423&#34;&gt;May 24, 2006 12:24 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Rick,&lt;/p&gt;
&lt;p&gt;Tony Coates has had some ideas about some distributed cooperative music creation among XML geeks, which would work better for you synth guys than us string pluckers. (Of course, Tony qualifies as both, if the strings are a bit short.)&lt;/p&gt;
&lt;p&gt;Wouldn&amp;rsquo;t it be funny, though, if we spent more time discussing whether to allow CDATA sections in the podcast feed than in actually making music?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://scottysengineeringlog.net&#34; title=&#34;http://scottysengineeringlog.net&#34;&gt;Scott Hudson&lt;/a&gt; on &lt;a href=&#34;#comment-424&#34;&gt;May 24, 2006 12:55 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Don&amp;rsquo;t forget about the brass guys! I&amp;rsquo;m a F Horn player, though not professionally&amp;hellip;&lt;/p&gt;
&lt;p&gt;&amp;ndash;Scotty&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-425&#34;&gt;May 24, 2006 4:15 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Marvellous history Bob.&lt;/p&gt;
&lt;p&gt;After a *long* break the wife and I just recently flipped the other way, from synths (initially analog) to guitar and bass - tactile is good. I&amp;rsquo;m now very much a tender-fingered newbie, but if you have any suggestions for not-over-tricky Zep songz to cover (ideally in Turtle notation), I&amp;rsquo;d be most grateful.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.tbray.org/ongoing/&#34; title=&#34;http://www.tbray.org/ongoing/&#34;&gt;Tim Bray&lt;/a&gt; on &lt;a href=&#34;#comment-428&#34;&gt;May 25, 2006 11:23 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Kashmir? Hell, I wanna play Whole Lotta Love.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-429&#34;&gt;May 26, 2006 7:30 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And I believe Page did break out the bow for that one.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Download as spreadsheet</title>
      <link>https://www.bobdc.com/blog/download-as-spreadsheet/</link>
      <pubDate>Tue, 16 May 2006 19:56:02 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/download-as-spreadsheet/</guid>
      
      
      <description><div>A one-line text file tells your web server to send a directory&#39;s HTML files as &#34;spreadsheets&#34;.</div><div>&lt;p&gt;I used to think that a website&amp;rsquo;s &amp;ldquo;download as spreadsheet&amp;rdquo; button triggered some back end process that created a binary Excel spreadsheet on the server and sent that to your browser, much like many &amp;ldquo;download as PDF&amp;rdquo; links do. It turns out that it&amp;rsquo;s much, much simpler than that.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/xml/wks/wks1.html&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/xls.jpg&#34; alt=&#34;[Excel icon logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The key is that the server doesn&amp;rsquo;t really send the file as a spreadsheet. It sends it as HTML with a MIME &lt;a href=&#34;http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7&#34;&gt;media type&lt;/a&gt; of &amp;ldquo;application/vnd.ms-excel&amp;rdquo;. In other words, the server adds an HTTP header field that tells your browser &amp;ldquo;here comes an Excel spreadsheet, so display this with whatever program you use to view those&amp;rdquo;, and Excel can open up HTML files. (Tell OpenOffice Calc to open an HTML file and it opens it up in OpenOffice Writer, the word processing program.) This way, users think that they&amp;rsquo;re downloading spreadsheets. Lots of HTML formatting is preserved and numbers get treated as numbers, so that if after downloading you add a function like &lt;code&gt;=sum()&lt;/code&gt; in a cell that references other cells, it does the math properly. You&amp;rsquo;ll want your HTML to have one or more tables in it—there&amp;rsquo;s not much point in sending a Shakespeare play to Excel.&lt;/p&gt;
&lt;p&gt;The first few times I did this, I wrote &lt;a href=&#34;http://www.oreilly.com/openbook/webclient/appa.html&#34;&gt;perl&lt;/a&gt; and &lt;a href=&#34;http://gnosis.cx/publish/programming/feature_5min_python.html&#34;&gt;python&lt;/a&gt; CGI scripts to send the HTTP header identifying the MIME type to the browser with the file, but if you&amp;rsquo;re using an Apache web server, an &lt;a href=&#34;http://httpd.apache.org/docs/1.3/howto/htaccess.html&#34;&gt;.htaccess&lt;/a&gt; file gives you an easier way to do it. Among other things, an .htaccess file lets you say &amp;ldquo;for files in this directory with extension foo, the web server should deliver them with MIME type bar.&amp;rdquo; I first learned about these files when someone told me that this weblog&amp;rsquo;s Atom feed wasn&amp;rsquo;t being delivered with the correct MIME type, and he recommended that I fix it with an .htaccess file. I created the following one-line file in my &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34;&gt;http://www.snee.com/bobdc.blog&lt;/a&gt; directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;AddType application/atom+xml .atom
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To deliver HTML tables as spreadsheets with no CGI coding, I created a &lt;a href=&#34;http://www.snee.com/xml/wks&#34;&gt;http://www.snee.com/xml/wks&lt;/a&gt; directory and put the following .htaccess file in it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;AddType application/vnd.ms-excel .html
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This tells the Apache server to deliver any files with an extension of html &lt;em&gt;that are in this directory&lt;/em&gt; (I certainly wouldn&amp;rsquo;t want this to apply to all HTML files!) with a MIME type of &amp;ldquo;application/vnd.ms-excel&amp;rdquo;. You can see a test result by looking at &lt;a href=&#34;http://www.snee.com/xml/wks/wks1.html&#34;&gt;http://www.snee.com/xml/wks/wks1.html&lt;/a&gt;. I threw in two copies of a table and some other HTML elements to try to confuse Excel, but it wasn&amp;rsquo;t confused. To see what the file really looks like, see &lt;a href=&#34;http://www.snee.com/xml/wks/wks1.xml&#34;&gt;http://www.snee.com/xml/wks/wks1.xml&lt;/a&gt;, an exact copy whose delivery is unaffected by the .htaccess file.&lt;/p&gt;
&lt;p&gt;So, if your website includes tables and you&amp;rsquo;d like to offer viewers the option to download them as spreadsheets, you can keep HTML copies in a directory with an .htaccess file like the one shown above and point your &amp;ldquo;download as spreadsheet&amp;rdquo; links there. Or, if you have a more complex system that generates pages on the fly, the content generation routines hopefully give you some way to set the Content-type in the HTTP header to &amp;ldquo;application/vnd.ms-excel&amp;rdquo; for selected HTML output.&lt;/p&gt;
&lt;p&gt;If I was writing this from home instead of a hotel room, I&amp;rsquo;d try it on our Mac and Linux machines to see if the file opens up in OpenOffice Calc. Of course, if I was at home, I&amp;rsquo;d have other things to do besides playing with MIME media type tricks.&lt;/p&gt;
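&lt;p&gt;For what it&amp;rsquo;s worth, the CGI version of this trick amounts to very little code. Here&amp;rsquo;s a minimal Python sketch of the idea (the function name and command-line handling are illustrative, not taken from my original scripts):&lt;/p&gt;

```python
#!/usr/bin/env python
# Minimal sketch of the CGI approach: emit a Content-type header naming the
# Excel media type, a blank line to end the headers, then the HTML file itself.
import sys

def send_as_spreadsheet(path, out=sys.stdout):
    # HTTP headers end with a blank line; header lines use CRLF endings.
    out.write("Content-type: application/vnd.ms-excel\r\n\r\n")
    with open(path) as html_file:
        out.write(html_file.read())

if __name__ == "__main__" and len(sys.argv) > 1:
    send_as_spreadsheet(sys.argv[1])
```

&lt;p&gt;The browser never sees a real .xls file either way; it just believes the header, exactly as with the .htaccess approach.&lt;/p&gt;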
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-411&#34;&gt;May 16, 2006 8:27 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Firefox 1.5.0.3 (on Windows XP) says &lt;a href=&#34;http://www.snee.com/xml/wks/wks1.html&#34;&gt;http://www.snee.com/xml/wks/wks1.html&lt;/a&gt; &amp;ldquo;is an HTML document&amp;rdquo; and offers to open it with an application called, er, Firefox&amp;hellip; Maybe they&amp;rsquo;re content sniffing?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-412&#34;&gt;May 16, 2006 9:07 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hmmm, I&amp;rsquo;m using Firefox 1.5.0.3 on XP too&amp;ndash;XP professional? It also works from IE, which interestingly enough tells me &amp;ldquo;You are downloading the file: wks1.xls from &lt;a href=&#34;https://www.snee.com&#34;&gt;www.snee.com&lt;/a&gt;&amp;rdquo; even though the file is named wks1.html.&lt;/p&gt;
&lt;p&gt;I just made another duplicate of the file and called it wks1.xls in the same directory, and I added the following line to .htaccess for good measure:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;AddType application/vnd.ms-excel .xls
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Firefox and IE both send &lt;a href=&#34;http://www.snee.com/xml/wks/wks1.xls&#34;&gt;http://www.snee.com/xml/wks/wks1.xls&lt;/a&gt; to Excel without a problem when I try them. Let me know if it&amp;rsquo;s any better with your copy of 1.5.0.3.&lt;/p&gt;
&lt;p&gt;Of course, the real moral of the story is that anyone who does this for a serious production app instead of just playing around like me has a lot of testing to do.&lt;/p&gt;
&lt;p&gt;By Ed Davies on &lt;a href=&#34;#comment-413&#34;&gt;May 17, 2006 4:38 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Firefox 1.5.03 on XP Pro: I saw the same as Dan B for the .html file. For the .xls file it opened fine in OOo 1.1.2. It seems to me to be a bit naughty of Firefox to be looking inside the URL which is supposed to be opaque. curl --head confirms to me that the headers returned for the two files have no significant differences.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-414&#34;&gt;May 17, 2006 11:04 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That explains why I didn&amp;rsquo;t have this problem when doing it the CGI way: because the URL of the &amp;ldquo;resource&amp;rdquo; ended with .pl or .py.&lt;/p&gt;
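&lt;p&gt;The extension-to-media-type mapping that the AddType line sets up can be sketched with Python&amp;rsquo;s mimetypes module. This is only an analogy for illustration (Python isn&amp;rsquo;t what Apache runs to serve these files), but it shows the same idea of mapping a filename extension to a declared MIME type:&lt;/p&gt;

```python
# Sketch of the extension-to-media-type mapping behind Apache's
# AddType directive, illustrated with Python's mimetypes module.
# (An analogy for illustration only, not what the web server runs.)
import mimetypes

# Same idea as the .htaccess line: AddType application/vnd.ms-excel .xls
mimetypes.add_type("application/vnd.ms-excel", ".xls")

for name in ("wks1.xls", "wks1.html"):
    media_type, _encoding = mimetypes.guess_type(name)
    print(name, media_type)
```

&lt;p&gt;As the comments above show, the server&amp;rsquo;s declared type is only part of the story: a browser may still sniff the content or the URL and second-guess it.&lt;/p&gt;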
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Minor new email</title>
      <link>https://www.bobdc.com/blog/minor-new-email/</link>
      <pubDate>Fri, 12 May 2006 07:54:30 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/minor-new-email/</guid>
      
      
<description><div>A mail productivity trick.</div><div>&lt;p&gt;I check my email several times a day, and typically find two or three new messages each time. A mailing list on local issues is the source of a few too many of these, but I worried that if I routed these to their own folder I&amp;rsquo;d forget to check it. (I can go days without checking my xml-dev folder.) Because the messages are not made publicly available, I had reservations about &lt;a href=&#34;http://www.xml.com/pub/a/2005/11/23/hacking-ebay-turning-email-alerts-into-atom.html&#34;&gt;converting them to an Atom feed&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I had an idea that&amp;rsquo;s worked out well. I created a folder called &amp;ldquo;minor new email&amp;rdquo; and set up rules that forward mail from several sources there: the local mailing list, frequent flier statements, Ticketmaster announcements about upcoming concerts, and mail from dell.com and other companies who have my email address because I once bought something from them.&lt;/p&gt;
&lt;p&gt;I do remember to check this folder once a day, and new messages now show up in my main inbox less frequently and are more likely to be good reasons to interrupt my real work of the day. I hate to delete a Ticketmaster email until I&amp;rsquo;ve looked through it for potentially interesting concerts and forwarded it to my wife if so, but that&amp;rsquo;s not worth losing my train of thought on more serious work, and the cumulative effect of similar emails spread out across the day can take a chunk out of your schedule.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s another nice email productivity trick that I&amp;rsquo;ve done for a few years and that Sean McGrath &lt;a href=&#34;http://seanmcgrath.blogspot.com/2005_08_21_seanmcgrath_archive.html#112487156301619634&#34;&gt;discovered&lt;/a&gt; last summer: when reviewing your spam, sort it by sender name. This groups messages in foreign character sets together, along with repeat offenders, making it easier to skim through quickly without missing false positives that your spam checker flagged.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>My new job</title>
      <link>https://www.bobdc.com/blog/my-new-job/</link>
      <pubDate>Mon, 08 May 2006 12:36:19 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/my-new-job/</guid>
      
      
      <description><div>Joining Innodata Isogen.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.innodata-isogen.com/&#34;&gt;&lt;img src=&#34;http://www.innodata-isogen.com/img/innodata_isogen_logo.gif&#34; alt=&#34;[Innodata Isogen logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m happy to announce that I&amp;rsquo;ve accepted a full-time position as a senior consultant with the professional services group at &lt;a href=&#34;http://www.innodata-isogen.com&#34;&gt;Innodata Isogen&lt;/a&gt;. I have known Isogen employees and alumni since SGML days; one still-current employee, Eliot Kimber (a.k.a. &lt;a href=&#34;http://drmacros-xml-rants.blogspot.com/&#34;&gt;Dr. Macro&lt;/a&gt;), was on the W3C Working Group that invented XML. The Isogen people have remained on the cutting edge of XML and related work both inside and outside of the publishing world, and their acquisition by Innodata has broadened the scope of the kinds of work that both Innodata and Isogen can do. The Innodata Isogen consulting arm that grew out of the Isogen acquisition is based in Dallas, but I&amp;rsquo;ll be working from home. The company has other offices and production and development centers in Hackensack, Austin, India, Sri Lanka, and the Philippines.&lt;/p&gt;
&lt;p&gt;When looking for a technology-related job, much of the search is an investigation into what&amp;rsquo;s going on out there, and that&amp;rsquo;s been an interesting project. There&amp;rsquo;s definitely a lot of cool work going on. I look forward to taking part in this work with Innodata Isogen and with their development partners.&lt;/p&gt;
&lt;p&gt;(And of course, opinions expressed in this weblog do not represent those of Innodata Isogen.)&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://seanmcgrath.blogspot.com&#34; title=&#34;http://seanmcgrath.blogspot.com&#34;&gt;Sean McGrath&lt;/a&gt; on &lt;a href=&#34;#comment-408&#34;&gt;May 8, 2006 1:40 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Best of luck!&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://scottysengineeringlog.net&#34; title=&#34;http://scottysengineeringlog.net&#34;&gt;Scott Hudson&lt;/a&gt; on &lt;a href=&#34;#comment-409&#34;&gt;May 8, 2006 2:09 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Congratulations on your new position. I hope it&amp;rsquo;s a good fit for you. I wish you the best of luck!&lt;/p&gt;
&lt;p&gt;Best regards,&lt;/p&gt;
&lt;p&gt;&amp;ndash;Scott&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>The W3C&#39;s web-based interface to Saxon 8.5</title>
      <link>https://www.bobdc.com/blog/the-w3cs-webbased-interface-to/</link>
      <pubDate>Tue, 02 May 2006 10:17:23 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-w3cs-webbased-interface-to/</guid>
      
      
      <description><div>Running XSLT 2 stylesheets with a URL.</div><div>&lt;p&gt;The W3C has made a web-based version of James Clark&amp;rsquo;s XT XSLT processor available since June of 2000, and Dan Brickley recently announced to the &lt;a href=&#34;mailto:semantic-web@w3.org&#34;&gt;semantic-web@w3.org&lt;/a&gt; mailing list that the W3C replaced the processor behind this service with Michael Kay&amp;rsquo;s &lt;a href=&#34;http://saxon.sourceforge.net/&#34;&gt;Saxon&lt;/a&gt;. You can use it by filling out &lt;a href=&#34;http://www.w3.org/2005/08/online_xslt/&#34;&gt;this form&lt;/a&gt; with URLs for your source document and stylesheet. When you click &amp;ldquo;transform&amp;rdquo;, in addition to running your stylesheet on your source document, you&amp;rsquo;ll see the URL that you could have entered to run the stylesheet with a REST interface.&lt;/p&gt;
&lt;p&gt;Because it&amp;rsquo;s Saxon 8.5, you can run XSLT 2.0 stylesheets on it. (I found out exactly which processor using &lt;a href=&#34;http://www.w3.org/2005/08/online_xslt/xslt?xslfile=http%3A%2F%2Fwww.snee.com%2Fxml%2Fxslt%2Fprocessorversion.xsl&amp;amp;xmlfile=http%3A%2F%2Fwww.snee.com%2Fxml%2Fxslt%2Fgroups.xml&amp;amp;content-type=&amp;amp;submit=transform&#34;&gt;this&lt;/a&gt; URL, which passes it a stylesheet that I described &lt;a href=&#34;http://www.xml.com/pub/a/2004/08/04/tr-xml.html&#34;&gt;here&lt;/a&gt;.) To see a 2.0 feature in action, &lt;a href=&#34;http://www.w3.org/2005/08/online_xslt/xslt?xslfile=http%3A%2F%2Fwww.snee.com%2Fxml%2Fxslt%2Fgrouping.xsl&amp;amp;xmlfile=http%3A%2F%2Fwww.snee.com%2Fxml%2Fxslt%2Fgroups.xml&amp;amp;content-type=&amp;amp;submit=transform&#34;&gt;this URL&lt;/a&gt; runs the &lt;a href=&#34;http://www.snee.com/xml/xslt/grouping.xsl&#34;&gt;http://www.snee.com/xml/xslt/grouping.xsl&lt;/a&gt; stylesheet against the &lt;a href=&#34;http://www.snee.com/xml/xslt/groups.xml&#34;&gt;http://www.snee.com/xml/xslt/groups.xml&lt;/a&gt; input file. This stylesheet, further described &lt;a href=&#34;http://www.xml.com/pub/a/2003/11/05/tr.html&#34;&gt;here&lt;/a&gt;, demonstrates XSLT 2.0&amp;rsquo;s grouping capabilities. Its plain text output won&amp;rsquo;t look like much on a browser, so do a View Source to see the grouped data.&lt;/p&gt;
&lt;p&gt;You don&amp;rsquo;t have to do this from a browser. If you pass it your URL using a utility like &lt;a href=&#34;http://www.gnu.org/software/wget/&#34;&gt;wget&lt;/a&gt;, &lt;a href=&#34;http://www.penguin-soft.com/penguin/man/1/dog.html&#34;&gt;dog&lt;/a&gt;, or &lt;a href=&#34;http://curl.haxx.se/&#34;&gt;cURL&lt;/a&gt;, you can use the W3C&amp;rsquo;s Saxon processor from a line in a shell script or batch file, making this a real boon to REST app development. The page tells us that the service is &amp;ldquo;not to be utilized as a regular service by sites other than w3.org. [The W3C] will consider blocking high volume usage or any usage that causes a strain on our Web servers&amp;rdquo;, so it&amp;rsquo;s really a demo platform—one that lets you show off some great capabilities.&lt;/p&gt;
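&lt;p&gt;For example, a script could assemble the request URL itself before handing it to wget or cURL. A minimal sketch in Python (the xslfile and xmlfile parameter names are taken from the form&amp;rsquo;s generated URLs shown above):&lt;/p&gt;

```python
# Build the same kind of URL that the W3C form generates, so a
# script can pass it to wget or cURL. The xslfile and xmlfile
# parameter names come from the form's generated URLs.
from urllib.parse import urlencode

service = "http://www.w3.org/2005/08/online_xslt/xslt"
params = {
    "xslfile": "http://www.snee.com/xml/xslt/grouping.xsl",
    "xmlfile": "http://www.snee.com/xml/xslt/groups.xml",
}
url = service + "?" + urlencode(params)
print(url)
```

&lt;p&gt;(Remember the terms quoted above: this is for occasional demos, not regular production use.)&lt;/p&gt;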
&lt;p&gt;As Kanzaki Masahide pointed out to the same mailing list, this will also be a great resource for people doing &lt;a href=&#34;http://www.w3.org/2004/01/rdxh/spec&#34;&gt;GRDDL&lt;/a&gt; work; if RDF metadata can be pulled from a web document by simply specifying the right URL, there will be more incentive to add that metadata to XHTML documents. The same applies to &lt;a href=&#34;http://www.w3.org/2001/sw/BestPractices/HTML/2006-04-24-rdfa-primer&#34;&gt;RDFa&lt;/a&gt; documents.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ldodds.com/blog&#34; title=&#34;http://www.ldodds.com/blog&#34;&gt;Leigh Dodds&lt;/a&gt; on &lt;a href=&#34;#comment-406&#34;&gt;May 2, 2006 4:10 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s certainly great to see the W3C update this service, I&amp;rsquo;ve been a long-standing user of their original to do some quick online hacks and conversions.&lt;/p&gt;
&lt;p&gt;I released a similar service last week, also based on Saxon. Docs here:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://xmlarmyknife.org/docs/xslt/transform/&#34;&gt;http://xmlarmyknife.org/docs/xslt/transform/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m intending to extend this to support proper HTTP caching as well as having local copies of &amp;ldquo;popular&amp;rdquo; stylesheets, e.g. the EXSLT work. Suggestions for additional features gratefully appreciated!&lt;/p&gt;
&lt;p&gt;By Stephen De Gabrielle on &lt;a href=&#34;#comment-407&#34;&gt;May 3, 2006 10:36 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is obvious but still fun.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://validator.w3.org/check?uri=http%3A%2F%2Fwww.w3.org%2F2005%2F08%2Fonline_xslt%2Fxslt%3Fxslfile%3Dhttp%253A%252F%252Fwww.kanzaki.com%252Fparts%252Fxsltdoc.xsl%26xmlfile%3Dhttp%253A%252F%252Fwww.kanzaki.com%252Fparts%252Fxsltdoc.xsl%26content-type%3D%26submit%3Dtransform&amp;amp;charset=%28detect+automatically%29&amp;amp;doctype=Inline&amp;amp;ss=1&#34;&gt;Validation of the w3c saxon xslt processor&lt;/a&gt; using &lt;a href=&#34;http://www.kanzaki.com/parts/xsltdoc.xsl&#34;&gt;http://www.kanzaki.com/parts/xsltdoc.xsl&lt;/a&gt; to document itself.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/xslt">XSLT</category>
      
    </item>
    
    <item>
      <title>A nice XSLT documentation tool</title>
      <link>https://www.bobdc.com/blog/a-nice-xslt-documentation-tool/</link>
      <pubDate>Thu, 27 Apr 2006 15:13:48 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-nice-xslt-documentation-tool/</guid>
      
      
      <description><div>Taking &#34;self-documentation&#34; to a new level.</div><div>&lt;p&gt;I just learned from Uche and Chimezie&amp;rsquo;s &lt;a href=&#34;http://copia.ogbuji.net/blog/2006-04-26/del.icio.us.links&#34;&gt;del.icio.us bookmarks page&lt;/a&gt; about a very nice tool on &lt;a href=&#34;http://www.kanzaki.com/&#34;&gt;the Web Kanzaki&lt;/a&gt; for generating XSLT stylesheet documentation. When you go to the &lt;a href=&#34;http://www.kanzaki.com/parts/xsltdoc.xsl&#34;&gt;tool&amp;rsquo;s web page&lt;/a&gt;, you&amp;rsquo;re looking at the output of the tool run on itself. It looks great, due to two simple tricks, the first being quite clever:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.kanzaki.com/&#34;&gt;&lt;img src=&#34;http://www.kanzaki.com/parts/me.gif&#34; alt=&#34;[web kanzaki]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;It&amp;rsquo;s &lt;a href=&#34;http://www.xml.com/pub/a/2003/02/05/tr.html&#34;&gt;easy enough&lt;/a&gt; to add a processing instruction to an XML file to tell a browser that, upon retrieving that file, it should run a particular XSLT stylesheet on it. That stylesheet typically creates HTML output, and the browser displays that HTML, while still showing the original XML when you do a &amp;ldquo;View Source&amp;rdquo;. (I&amp;rsquo;ve enjoyed doing this to &lt;a href=&#34;http://www.xml.com/pub/a/2003/03/05/tr.html&#34;&gt;prototype new link architectures&lt;/a&gt;.) Kanzaki Masahide has done this recursively, pointing the XSLT documentation stylesheet at itself, so that when you go to the URL &lt;a href=&#34;http://www.kanzaki.com/parts/xsltdoc.xsl&#34;&gt;http://www.kanzaki.com/parts/xsltdoc.xsl&lt;/a&gt; you see an xsltdoc.xsl report about the xsltdoc.xsl stylesheet.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To make it look even nicer, the result also points at a CSS stylesheet.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
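&lt;p&gt;The processing instruction that makes the first trick work is a single line at the top of the XML file, before the root element. Sketched here with the standard xml-stylesheet form (for the recursive trick, the href simply names the stylesheet&amp;rsquo;s own file):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;?xml-stylesheet type=&#34;text/xsl&#34; href=&#34;xsltdoc.xsl&#34;?&amp;gt;
&lt;/code&gt;&lt;/pre&gt;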
&lt;p&gt;If you have a lot of XSLT that isn&amp;rsquo;t well-documented, running it all through xsltdoc.xsl is an easy first step.&lt;/p&gt;
&lt;p&gt;While I&amp;rsquo;m getting to appreciate XQuery more and more lately, this is the kind of area where XSLT really outshines XQuery—the idea of using XQuery scripts to read or write other XQuery scripts is pretty far-fetched, while using XSLT stylesheets to read or write other XSLT stylesheets is simple and productive. (Well, not too simple if you don&amp;rsquo;t have a good handle on how to manipulate namespaces in XSLT, but it&amp;rsquo;s &lt;a href=&#34;http://www.xml.com/pub/a/2001/04/04/trxml/&#34;&gt;not too difficult&lt;/a&gt;.) This kind of automation isn&amp;rsquo;t just an XML geek party trick, but something that becomes increasingly useful as processing-intensive XML manipulation scales up. It&amp;rsquo;s the manipulation itself that I&amp;rsquo;m talking about scaling up; when you scale up the size of the XML content, XQuery—or, more specifically, XQuery engines—start to demonstrate their advantage over the XSLT processors that need all of their input to be in memory at once.&lt;/p&gt;
&lt;p&gt;Mr. Kanzaki&amp;rsquo;s webpage shows that in addition to interesting XSLT work, he&amp;rsquo;s done some cool RDF projects and also plays the upright bass. I&amp;rsquo;d love to meet him someday; we&amp;rsquo;d have a lot to talk about.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By Bruce on &lt;a href=&#34;#comment-402&#34;&gt;April 28, 2006 8:19 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;See also:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.pnp-software.com/XSLTdoc/&#34;&gt;http://www.pnp-software.com/XSLTdoc/&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/xslt">XSLT</category>
      
    </item>
    
    <item>
      <title>Writing about software: bad words</title>
      <link>https://www.bobdc.com/blog/writing-about-software-bad-wor/</link>
      <pubDate>Mon, 24 Apr 2006 09:26:25 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/writing-about-software-bad-wor/</guid>
      
      
<description><div>Featuring the &#34;f&#34; word.</div><div>&lt;p&gt;As the third part of my series on &lt;a href=&#34;http://www.snee.com/bobdc.blog/publishing/documenting_software/&#34;&gt;writing about software&lt;/a&gt;, I&amp;rsquo;m writing about overused words to avoid because they&amp;rsquo;re nearly meaningless. For a start, read George Orwell&amp;rsquo;s essay &lt;a href=&#34;http://www.k-1.com/Orwell/index.cgi/work/essays/language.html&#34;&gt;Politics and the English Language&lt;/a&gt;. The problem he speaks of, in which people use bigger words than they need to because they think that it makes them sound well-informed and important, is particularly endemic to the tech world, and especially acute among the marketing and sales people at all but its highest levels. To quote the essay,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As soon as certain topics are raised, the concrete melts into the abstract and no one seems able to think of turns of speech that are not hackneyed: prose consists less and less of words chosen for the sake of their meaning, and more and more of phrases tacked together like the sections of a prefabricated henhouse.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Anyone up for some enterprise-wide world-class functionality? &amp;ldquo;Functionality&amp;rdquo;, the &amp;ldquo;f&amp;rdquo; word, is my least favorite of all popular words in the tech world; I have never used it, except as a joke, since I began documenting software in 1989. (By &amp;ldquo;f word&amp;rdquo;, I didn&amp;rsquo;t mean &amp;ldquo;fuck&amp;rdquo;, a concise, punchy old Anglo Saxon word with many layers of meaning that nevertheless is only &lt;a href=&#34;http://norman.walsh.name/2006/02/23/whitespace&#34;&gt;rarely&lt;/a&gt; appropriate when discussing tech issues.) Take a noun, add a suffix to make it an adjective, then add another suffix to make it a bigger noun, and it sounds more important, right? Wrong. Saying that release 3.0 is better than release 2.2 because it has &amp;ldquo;more functionality&amp;rdquo; is worthless; saying that release 3.0 has more features than release 2.2 is less ridiculous, but still says practically nothing—an upgrade has more features by definition. Saying &amp;ldquo;the upgrade has more features&amp;rdquo; just to lengthen the bulleted list of improvements in 3.0 is meaningless padding.&lt;/p&gt;
&lt;p&gt;The many ways of claiming that a product is good tell the reader just as much as telling them that an upgrade does more than the previous release: it tells them nothing. A former co-worker at Moody&amp;rsquo;s Investors Service who had once done software marketing told me that at the time he had a sign over his desk saying &amp;ldquo;Don&amp;rsquo;t Say &amp;lsquo;powerful&amp;rsquo;&amp;rdquo;. I have been guilty of this myself, but it&amp;rsquo;s great advice. Saying that a program is good carries no information unless you qualify as a disinterested observer. If someone from Apple says that the latest iPod variation is good, that means nothing. If Walt Mossberg says that it&amp;rsquo;s good, that means something. Of course, people selling software rarely use the word &amp;ldquo;good&amp;rdquo;; that would be too simple. They&amp;rsquo;ll say that it&amp;rsquo;s world-class, best-of-breed, next-generation, enhanced, value-added, game-changing, and especially, powerful, but they&amp;rsquo;re all the same thing: they say that something is good, and as someone with a vested interest in the software&amp;rsquo;s success, you&amp;rsquo;re obligated to &lt;em&gt;show&lt;/em&gt; that it&amp;rsquo;s good, to prove &lt;em&gt;why&lt;/em&gt; it&amp;rsquo;s good, or you&amp;rsquo;re wasting the reader&amp;rsquo;s time.&lt;/p&gt;
&lt;p&gt;When tempted to use the &amp;ldquo;f&amp;rdquo; word, finding a superior alternative is not always as easy as just saying &amp;ldquo;more features&amp;rdquo;, but as Orwell writes, &amp;ldquo;By using stale metaphors, similes, and idioms, you save much mental effort, at the cost of leaving your meaning vague, not only for your reader but for yourself&amp;rdquo;. Some words that fail in their attempt to sound impressive are much easier to replace. Instead of &amp;ldquo;utilize&amp;rdquo;, say &amp;ldquo;use&amp;rdquo;. Instead of &amp;ldquo;necessitate&amp;rdquo;, say &amp;ldquo;require&amp;rdquo;. Instead of &amp;ldquo;facilitate&amp;rdquo;, say &amp;ldquo;ease&amp;rdquo; (or, if you&amp;rsquo;re talking about a meeting, say &amp;ldquo;run&amp;rdquo;). Instead of &amp;ldquo;heterogeneous&amp;rdquo;, say &amp;ldquo;varied&amp;rdquo;; instead of &amp;ldquo;orthogonal&amp;rdquo;, say &amp;ldquo;different&amp;rdquo;, unless you&amp;rsquo;re describing lines that form right angles or a specific kind of &lt;a href=&#34;http://en.wikipedia.org/wiki/Orthogonal_projection&#34;&gt;geometrical projection&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;f&amp;rdquo; word also demonstrates a clue to watch for in avoiding bad tech words: too many suffixes. As I&amp;rsquo;ve quoted Orwell saying &lt;a href=&#34;https://www.bobdc.com/blog/measuring-information&#34;&gt;before&lt;/a&gt;, &amp;ldquo;Never use a long word where a short one will do&amp;rdquo;. When you see the words connectivity, utilization, orthogonality, or productionizing, you can&amp;rsquo;t just chop off the suffixes and move on in your writing, but the presence of all those suffixes should make you stop and think: is there a shorter word that can replace this entire word? It may take more than three or four seconds of thought to find that word, but if more than three or four people will be reading your sentence, you owe it to them.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0262521822/bobducharmeA/&#34;&gt;&lt;img src=&#34;http://images.amazon.com/images/P/0262521822.01._BO2,-64_AA240_SH20_SCLZZZZZZZ_.jpg&#34; alt=&#34;[technobabble cover]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As the title of Orwell&amp;rsquo;s essay tells us, he feels that political discussion is especially prone to the use of lazy, trite phrases. Technical discussion is even worse, because new terms are always cropping up and because of several issues common to marketing and sales people: they want to sound up-to-date, they don&amp;rsquo;t completely understand the technology they&amp;rsquo;re trying to sell, and they think that vagueness broadens the range of products and services that they might sell. In the early nineties I noticed that IBM had stopped describing their central hardware products as &amp;ldquo;computers&amp;rdquo; and instead started referring to them as &amp;ldquo;solutions&amp;rdquo;. For example, instead of the AS/400 being a &amp;ldquo;minicomputer&amp;rdquo;, it was a &amp;ldquo;midrange solution&amp;rdquo;. Selling a &amp;ldquo;solution&amp;rdquo; implies the selling of products plus the associated services (something IBM got good at long before electronic computers were invented), but it always amazes me how many tech vendors think that being vague will help them to sell more.&lt;/p&gt;
&lt;p&gt;Sometimes misuse of a technical term can drain it of its meaning. When a technology becomes hot, marketing and sales people misuse it to associate their products with it whether the products have earned this association or not, and words with specific meanings can lose those meanings. Fuzzy logic started off as a branch of set theory; now it means &amp;ldquo;vague reasoning&amp;rdquo;. &amp;ldquo;Artificial Intelligence&amp;rdquo; started off referring to a set of technologies that used specific tools to implement symbolic logic; the term &amp;ldquo;AI&amp;rdquo; got so overused that it became tainted, and semantic web advocates find themselves fighting off the associations with AI to avoid the taint. This loss of a technical term&amp;rsquo;s meaning can be temporary, which happened to &amp;ldquo;object-oriented&amp;rdquo; in the early nineties—I was told more than once that a system was object-oriented because an interface that consisted of clicking and dragging icons meant that I was &amp;ldquo;treating everything as an object&amp;rdquo;. Modern discussions of object-orientation seem back on track, probably because it&amp;rsquo;s not as hot as it used to be.&lt;/p&gt;
&lt;p&gt;An interesting 1991 book that attempted to address this vocabulary issue was &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0262521822/bobducharmeA/&#34;&gt;Technobabble&lt;/a&gt; by John A. Barry. It&amp;rsquo;s a bit dated now, but it provides good advice on avoiding bad technical jargon as well as other hints that can improve technical writing, such as how to use more of the active and less of the passive voice. The &lt;a href=&#34;http://www.buzzkiller.net/&#34;&gt;Buzzkiller&lt;/a&gt; weblog (&amp;ldquo;A few journalists trying to stop a few thousand buzzwords. Crusading writers from Forbes, Wired and elsewhere do their best to make the tech world safe for English majors&amp;rdquo;), while only rarely updated, also has good advice and is often hilarious.&lt;/p&gt;
&lt;p&gt;Earlier, I mentioned that this vocabulary problem is less of a problem at the highest levels of the tech world. I was thinking that companies with a bigger budget have slicker, more talented people doing the writing, but I just looked at &lt;a href=&#34;http://www.oracle.com&#34;&gt;oracle.com&lt;/a&gt; and the first sentence that I saw at the top was &amp;ldquo;Comprehensive customer relationship management solution brings enterprise-class functionality to customers of all sizes&amp;rdquo;. Oh well.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By Gunnar Grimnes on &lt;a href=&#34;#comment-401&#34;&gt;April 26, 2006 10:39 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Good post! Reading the Orwell essay I found his quote from Ecclesiastes much easier to read in the &amp;ldquo;translated&amp;rdquo; version - a sign that it&amp;rsquo;s too late for me, surely.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/documenting-software">documenting software</category>
      
    </item>
    
    <item>
      <title>The science of information</title>
      <link>https://www.bobdc.com/blog/the-science-of-information/</link>
      <pubDate>Wed, 19 Apr 2006 08:58:28 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/the-science-of-information/</guid>
      
      
      <description><div>&#34;Information: The New Language of Science&#34; by Hans Christian von Baeyer.</div><div>&lt;p&gt;I &lt;a href=&#34;https://www.bobdc.com/blog/measuring-information&#34;&gt;recently wrote about&lt;/a&gt; Claude Shannon and Warren Weaver&amp;rsquo;s book &amp;ldquo;The Mathematical Theory of Communication&amp;rdquo; and its insights into the idea of measuring information. I had planned to describe this book as an introduction to a review of a more recent, more easily digestible book, &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0674018575/bobducharmeA/&#34;&gt;Information: The New Language of Science&lt;/a&gt; by Hans Christian von Baeyer, but decided to write separate entries on the two books.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0674018575/bobducharmeA/&#34;&gt;&lt;img src=&#34;http://images.amazon.com/images/P/0674018575.01._SCLZZZZZZZ_.jpg&#34; alt=&#34;[&#39;Information&#39; cover]&#34; border=&#34;0&#34; align=&#34;right&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;200px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I enjoyed von Baeyer&amp;rsquo;s book a great deal, and recommend it to anyone interested in where the science of information has been (for example, Shannon and Weaver) and where it&amp;rsquo;s leading. The author, a physicist at Virginia&amp;rsquo;s College of William and Mary, has written &lt;a href=&#34;http://www.amazon.com/exec/obidos/search-handle-url/index=books&amp;amp;field-author-exact=Hans%20Christian%20von%20Baeyer&amp;amp;rank=-relevance%2C%2Bavailability%2C-daterank/103-0127035-3291047&#34;&gt;several&lt;/a&gt; popular science books on physics-related topics. This background leads him to draw an extended analogy throughout the book between the evolution of scientific approaches to dealing with information and historical approaches to dealing with energy:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The gradual crystallization of the concept of information during the last hundred years contrasts sharply with the birth of the equally abstract quantity called energy in the middle of the nineteenth century. Then, in the brief span of twenty years, energy was invented, defined, and established as a cornerstone, first of physics, then of all science. We don&amp;rsquo;t know what energy &lt;em&gt;is&lt;/em&gt;, any more than we know what information is, but as a now robust scientific concept we can describe it in precise mathematical terms, and as a commodity we can measure, market, regulate and tax it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;He certainly knows the history of approaches to energy, and from Ludwig Boltzmann&amp;rsquo;s work to the present he describes many issues common to the quantification of both energy and information, such as entropy, randomness, probability, noise, and the relationship between logarithmic measurement and human perception. Toward the end of the book he shows how more recent subatomic issues in physics have more direct implications for information theory than the history of energy does, as he reviews qubits and Schrödinger&amp;rsquo;s ideas about how we can know what&amp;rsquo;s what inside of an atom.&lt;/p&gt;
&lt;p&gt;The concept of linking, which I think of as an expression of resource relationships, has always been &lt;a href=&#34;http://www.oreillynet.com/pub/au/1191&#34;&gt;close to my heart&lt;/a&gt;. After von Baeyer quotes Henri Poincaré saying &amp;ldquo;The aim of science is not things in themselves, as the dogmatists in their simplicity imagine, but the relations between things; outside those relations there is no reality knowable&amp;rdquo;, I found it particularly interesting when von Baeyer described information as &amp;ldquo;the communication of relationships&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Claude Shannon is one of the book&amp;rsquo;s heroes (von Baeyer writes that Shannon&amp;rsquo;s &amp;ldquo;A Mathematical Theory of Communication&amp;rdquo; has been &amp;ldquo;likened to the Magna Carta, Newton&amp;rsquo;s laws of motion, and the explosion of a bomb&amp;rdquo;) and von Baeyer makes an interesting prediction about Shannon and Weaver&amp;rsquo;s distinction between semantic information and information as a set of symbols whose successful transmission can be accurately measured:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;hellip;the word &amp;lsquo;information&amp;rsquo; has two different senses. The colloquial usage, as in &amp;lsquo;personal information&amp;rsquo; and &amp;lsquo;directory information&amp;rsquo;, refers to the meaning of a message of some sort. The technical sense, on the other hand, emphasizes the symbols used to transmit a message&amp;hellip; Eventually the two definitions of information should converge, but that hasn&amp;rsquo;t happened yet. When it does, we will finally know what information is; until then we have to make do with compromises.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;His prediction of this convergence is pretty exciting. Overall, &amp;ldquo;Information&amp;rdquo; is a fascinating book, never too technical, and will especially appeal to geeks interested in the future of publishing applications and any applications that manipulate content or data with semantic value.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/book-reviews">book reviews</category>
      
    </item>
    
    <item>
      <title>Large eBay items without the shipping cost</title>
      <link>https://www.bobdc.com/blog/large-ebay-items-without-the-s/</link>
      <pubDate>Fri, 14 Apr 2006 10:04:36 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/large-ebay-items-without-the-s/</guid>
      
      
      <description><div>And, look them over before bidding on them. Some patience required.</div><div>&lt;p&gt;Did you ever see something on eBay that would be a good deal if it wasn&amp;rsquo;t for the shipping costs? Last summer I created a saved eBay search for &amp;ldquo;nordic track virginia&amp;rdquo;, hoping to find a local one that I could pick up myself. I wanted it to exercise when it&amp;rsquo;s too hot or too cold out to go jogging, and I hoped to find someone local who had bought one, didn&amp;rsquo;t use it, and wanted to make some quick cash from it. The shipping on something that big wouldn&amp;rsquo;t be trivial, which is why I wanted someone local.&lt;/p&gt;
&lt;img src=&#34;https://www.bobdc.com/img/main/blackguitar.jpg&#34; alt=&#34;[black acoustic guitar from ebay]&#34; border=&#34;0&#34; align=&#34;right&#34; width=&#34;200px&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;
&lt;p&gt;This search turned up a Nordic Track at a local store called &lt;a href=&#34;http://www.i-soldit.com/your_store.asp&#34;&gt;iSOLDIt&lt;/a&gt; that sells things on eBay for anyone who drops them off and doesn&amp;rsquo;t mind a commission being taken. A similar store plays an integral part in the plot of the movie &amp;ldquo;The 40 Year Old Virgin&amp;rdquo;, as in-store displays at iSOLDIt now remind you. They had a simple Nordic Track with none of the optional electronics, so I went over to check it out. It was in pretty good shape, so I bid on it, won it, and picked it up a few days later.&lt;/p&gt;
&lt;p&gt;Next, I created a saved search email alert for acoustic guitars that show up at that particular iSOLDIt store. (These emails don&amp;rsquo;t go to my inbox, but to a script that converts them to an Atom feed, which I described in the XML.com article &lt;a href=&#34;http://www.xml.com/pub/a/2005/11/23/hacking-ebay-turning-email-alerts-into-atom.html&#34;&gt;Hacking eBay: Turning Email Alerts into Atom&lt;/a&gt;.) One of the tuning pegs on my existing acoustic is shot, and it&amp;rsquo;s not a nice enough guitar to justify the cost of a new set of decent pegs. A musical instrument, even more than a Nordic Track, is something that you want to hold in your hands and look over before offering money for it, so when a simple black acoustic showed up at the local iSOLDIt, I went over to take a look at it. I decided that it was worth about $100 and that I would bid up to $50 or so for it. I ended up winning it for $26, and I don&amp;rsquo;t have to pay shipping!&lt;/p&gt;
&lt;p&gt;So remember: these stores aren&amp;rsquo;t just a convenience for eBay sellers; they&amp;rsquo;re great for buyers as well. If there&amp;rsquo;s one near you, set up a saved search or two to let you know when they have something you&amp;rsquo;d like, especially if it&amp;rsquo;s something large enough to make the shipping costs discouraging to more distant buyers.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Joining the Ruby on Rails chorus</title>
      <link>https://www.bobdc.com/blog/joining-the-ruby-on-rails-chor/</link>
      <pubDate>Tue, 11 Apr 2006 09:49:06 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/joining-the-ruby-on-rails-chor/</guid>
      
      
      <description><div>I tasted the Kool-Aid, and I liked it.</div><div>&lt;p&gt;I finally got around to working my way through a Ruby on Rails tutorial, and was very, very impressed. If you don&amp;rsquo;t have time to install Rails and follow along with the steps in the tutorial, at least do a quick read through Curt Hibbs&amp;rsquo; &lt;a href=&#34;http://web.archive.org/web/20060412083540/http://www.onlamp.com/pub/a/onlamp/2005/01/20/rails.html&#34;&gt;Rolling with Ruby on Rails&lt;/a&gt; article in O&amp;rsquo;Reilly&amp;rsquo;s onlamp.com to get an idea of how easy it is to set up a useful application.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.rubyonrails.org/&#34;&gt;&lt;img src=&#34;https://rubyonrails.org/images/rails-logo.svg&#34; width=&#34;150&#34; alt=&#34;[Ruby on Rails logo]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I had the impression that Rails was about quickly building websites, but Hibbs&amp;rsquo; article showed me that it&amp;rsquo;s more of a way to quickly build database applications with a web-based front end. And when I say &amp;ldquo;quickly&amp;rdquo;, I mean really quickly. Once you define your tables, Rails automates the creation of screens and logic to let your application&amp;rsquo;s users create, read, update, and delete data in those tables, and it provides clearly defined places to customize this behavior if you want to get fancier with your HTML, SQL, or Ruby logic.&lt;/p&gt;
&lt;p&gt;I tried a Ruby tutorial a few years ago, but I had just begun using Python, and I stuck with Python for the same reason I take up nearly any new language: there were all these great Python libraries (for reaching inside of Microsoft Outlook and Office, for manipulating RDF, and many more) that I wanted to patch together into new applications. Now Ruby has its Killer App.&lt;/p&gt;
&lt;p&gt;Most discussions of Rails focus on its inroads into the high-end web development that many are doing with large, complex Java libraries. I can picture it also playing an increasing role in low-end database development. For example, picture someone whose computer knowledge is limited to basic Microsoft Office usage. I&amp;rsquo;ll call her Brenda. Brenda needs to develop some sort of multi-table (and perhaps multi-user) database for her office. The obvious answer in the last few years would have been &lt;a href=&#34;http://office.microsoft.com/en-us/FX010857911033.aspx&#34;&gt;Microsoft Access&lt;/a&gt;; other choices over the years include dBase, FoxPro, Paradox, FileMaker, and other products that commanded portions of the PC database manager market before Microsoft steamrolled in. Now, Brenda has a free alternative that works equally well on Windows, the Mac, and Linux: Ruby on Rails. The most difficult part is the marketing message; how do you find the Brendas and get the message across? &amp;ldquo;Maximize your productivity and forget about J2EE&amp;rdquo;, a traditional Rails marketing message, won&amp;rsquo;t mean much to Brenda.&lt;/p&gt;
&lt;p&gt;Hibbs&amp;rsquo; article is a little too Windows-centric, with its use of &lt;a href=&#34;http://www.mysqlfront.de/&#34;&gt;MySQL-front&lt;/a&gt; to do the database parts, which is a shame considering how cross-platform Rails is. &lt;a href=&#34;http://www.onlamp.com/pub/a/onlamp/2005/03/03/rails.html&#34;&gt;Part 2&lt;/a&gt; of his article includes the SQL code that you would enter at the &lt;code&gt;mysql&amp;gt;&lt;/code&gt; prompt if you&amp;rsquo;re using Rails on a Mac or Linux box. (If you&amp;rsquo;re going to install Rails on Ubuntu Linux, as I did, I found the weblog posting &lt;a href=&#34;http://fo64.com/articles/2005/10/20/rails-on-breezy&#34;&gt;Ruby, Rails, Apache2, and Ubuntu Breezy&lt;/a&gt; by a Joe somebody to be very helpful.) If you&amp;rsquo;re going to carry out all the steps in his tutorial, then before you start typing it&amp;rsquo;s worth reading both part 1 and part 2 and skimming the comments to learn about a few caveats that will speed your progress when you actually get to entering the (minimal) code that the tutorial asks of you. (One more quibble: too much of the article&amp;rsquo;s sample code is shown in screen shots, instead of inside of HTML &lt;code&gt;pre&lt;/code&gt; tags, which would have let readers copy and paste it instead of rekeying it.)&lt;/p&gt;
&lt;p&gt;Amy Hoy&amp;rsquo;s &lt;a href=&#34;http://www.slash7.com/articles/2005/01/24/really-getting-started-in-rails&#34;&gt;Really Getting Started in Rails&lt;/a&gt; fills in some gaps in Hibbs&amp;rsquo; article, and her other articles listed down the left of that screen also look like helpful background.&lt;/p&gt;
&lt;p&gt;Hibbs&amp;rsquo; more recent article &lt;a href=&#34;http://www.onlamp.com/pub/a/onlamp/2005/06/09/rails_ajax.html&#34;&gt;AJAX on Rails&lt;/a&gt; demonstrates how Rails&amp;rsquo; ability to simplify the implementation of useful features has been applied to AJAX&amp;rsquo;s use of XMLHttpRequest Javascript calls (or more accurately, to shield you from the need to deal directly with XMLHttpRequest calls) to update portions of your screen instead of the whole thing, making the client side of your apps more responsive. This article helped me to better understand both the mechanics of AJAX applications and the elegance of Rails&amp;rsquo; ability to incorporate those mechanics.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Measuring information</title>
      <link>https://www.bobdc.com/blog/measuring-information/</link>
      <pubDate>Wed, 05 Apr 2006 18:11:49 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/measuring-information/</guid>
      
      
      <description><div>A short but dense classic offers some solid background.</div><div>&lt;p&gt;I&amp;rsquo;ve always been fascinated by the idea of information as something quantifiable. When William Strunk (of &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=020530902X/bobducharmeA/&#34;&gt;Strunk and White&lt;/a&gt; fame) wrote &lt;a href=&#34;http://www.bartleby.com/141/strunk5.html#13&#34;&gt;omit needless words&lt;/a&gt;, and when George Orwell wrote &amp;ldquo;If it is possible to cut a word out, always cut it out&amp;rdquo; in &lt;a href=&#34;http://www.resort.com/~prime8/Orwell/patee.html&#34;&gt;Politics and the English Language&lt;/a&gt;, they affirmed that good writing packs more information into fewer words (or syllables, or even letters—in the same essay, Orwell wrote &amp;quot; Never use a long word where a short one will do&amp;quot;) than bad writing does. While I won&amp;rsquo;t tag this weblog entry as part of my &lt;a href=&#34;http://www.snee.com/bobdc.blog/publishing/documenting_software/&#34;&gt;Documenting Software&lt;/a&gt; series, the idea of more &amp;ldquo;efficient&amp;rdquo; sentences holds obvious appeal to a geek writer.&lt;/p&gt;
&lt;p&gt;The phrase &amp;ldquo;more information&amp;rdquo; here, though, does not refer to something quantifiable. For example, the second sentence below has more information than the first, but we can&amp;rsquo;t assign a number to the difference:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;The Beatles&#39; album &amp;quot;Revolver&amp;quot; is really so, completely, totally, you know, like, awesome.&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;The Beatles recorded their &amp;quot;Revolver&amp;quot; album at Abbey Road studios from 4/7/66 to 6/17/66.&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Years ago, when I heard about Claude Shannon&amp;rsquo;s work on information theory, I sent away for the book &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0252725484/bobducharmeA/&#34;&gt;The Mathematical Theory of Communication&lt;/a&gt; that he co-authored with Warren Weaver. This was so long ago that I only realized today that the NYU building where I took every class of my computer science degree was named for the same co-author; if I ever referred to WWH in an e-mail to Matthew Fuchs, who got his PhD there, he&amp;rsquo;d know immediately that I meant Warren Weaver Hall.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0252725484/bobducharmeA/&#34;&gt;&lt;img src=&#34;http://images.amazon.com/images/P/0252725484.01._BO2-64_AA240_SH20_SCLZZZZZZZ_.gif&#34; alt=&#34;[Shannon and Weaver cover]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Much of information theory came out of the study of communication, with the engineering problem being the loss of information. To know what percentage of transmitted information was received, you must be able to measure information, so it&amp;rsquo;s no surprise that Shannon did this work at Bell Telephone Labs. He wrote pages 36 to 125 of this book, and the math is over my head despite my CS degree. (I may claim that I&amp;rsquo;ve always specialized in getting computers to manipulate text, not numbers, but that&amp;rsquo;s a poor excuse considering how much of the payoff from Shannon&amp;rsquo;s work has been in applications that transmit and store text.)&lt;/p&gt;
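As a rough illustration of what this symbol-level measurement looks like (my own sketch, not an example from the book), Shannon&rsquo;s entropy formula assigns a message a number of bits per symbol based on how predictable its symbols are; here the symbol probabilities are simply estimated from character frequencies within the message itself:

```python
import math
from collections import Counter

def entropy_bits(message: str) -> float:
    """Estimate the Shannon entropy of a message in bits per symbol,
    using each character's frequency in the message as its probability."""
    counts = Counter(message)
    total = len(message)
    # H = -sum(p * log2(p)) over all symbol probabilities p
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

# Eight copies of one symbol carry no surprise at all; eight distinct,
# equally likely symbols carry the maximum of 3 bits per symbol (2^3 = 8).
print(entropy_bits("aaaaaaaa"))  # prints 0.0
print(entropy_bits("abcdefgh"))  # prints 3.0
```

Two messages of the same length can therefore differ in measured information even at this purely technical level, before any question of meaning arises.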
&lt;p&gt;Weaver&amp;rsquo;s 1949 26-page introduction to the book includes a few logarithmic expressions, but I can handle that, and his whole section is fascinating. Part 2 of Weaver&amp;rsquo;s essay, the longest part, is an interpretation of Shannon&amp;rsquo;s work; Part 1 raises questions that add much clarity to my ruminations about the difference in the amount of information in the two sample sentences above, and Part 3 revisits the questions in light of Shannon&amp;rsquo;s work.&lt;/p&gt;
&lt;p&gt;Part 1 describes three levels of communications problems, which he describes like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;LEVEL A. How accurately can the symbols of communication be transmitted? (The technical problem.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;LEVEL B. How precisely do the transmitted symbols convey the desired meaning? (The semantic problem.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;LEVEL C. How effectively does the received meaning affect conduct in the desired way? (The effectiveness problem.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;So, the second of my two sentences about the Revolver album has more information in Level B terms (that is, more semantic information), while having an identical amount in Level A terms—with the two sentences being equal in length, my host provider&amp;rsquo;s server used the same energy to send each one to your computer. (I think that Level C concerns the receiving entity more than the received message, and Weaver doesn&amp;rsquo;t say much about it, so I won&amp;rsquo;t address it here.) Part 2 of Weaver&amp;rsquo;s piece is called &amp;ldquo;Communications Problems at Level A&amp;rdquo;; this is obviously where Shannon&amp;rsquo;s math has the most to offer. Weaver mostly discusses Levels B and C in terms of their relationship to Level A, and it&amp;rsquo;s an important relationship: they build on Level A, so problems at Level A cause problems in B and C. He suggests an interesting change to the following diagram, shown originally in Part 1:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.tcw.utwente.nl/theorieenoverzicht/Theory%20clusters/Communication%20and%20Information%20Technology/Information_Theory.doc/&#34;&gt;&lt;img src=&#34;http://www.tcw.utwente.nl/theorieenoverzicht/Theory%20clusters/Communication%20and%20Information%20Technology/Information_Theory.doc/Information_Theory-1.png&#34; alt=&#34;[communication system schematic]&#34; border=&#34;0&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;One can imagine, as an addition to the diagram, another box labeled &amp;ldquo;Semantic Receiver&amp;rdquo; interposed between the engineering receiver (which changes signals to the messages) and the destination. This semantic receiver subjects the message to a second decoding, the demand on this one being that it must match the statistical &lt;em&gt;semantic&lt;/em&gt; characteristics [his italics] of the message to the statistical semantic capacities of the totality of receivers, or of that subset of receivers which constitute the audience one wishes to affect.&lt;/p&gt;
&lt;p&gt;Similarly one can imagine another box in the diagram which, inserted between the information source and the transmitter, would be labeled &amp;ldquo;semantic noise,&amp;rdquo; the box previously labeled as simply &amp;ldquo;noise&amp;rdquo; now being labeled &amp;ldquo;engineering noise.&amp;rdquo; From this source is imposed into the signal the perturbations or distortions of meaning which are not intended by the source but which inescapably affect the destination. And the problem of semantic decoding must take this semantic noise into account. It is also possible to think of an adjustment of original message so that the sum of message meaning plus semantic noise is equal to the desired total message meaning at the destination.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To &amp;ldquo;match the statistical semantic characteristics&amp;rdquo; sounds like quite a challenge, but I&amp;rsquo;m sure there are semantic web researchers out there reading up on their Shannon and Weaver.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve written all this as an introduction to a review of a more recent, pop science oriented book that I&amp;rsquo;ve just read and enjoyed very much, Hans Christian von Baeyer&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0753817829/bobducharmeA/&#34;&gt;Information&lt;/a&gt;, but this is already long enough, so I&amp;rsquo;ll discuss von Baeyer&amp;rsquo;s book at some future point. I&amp;rsquo;ll finish with my favorite quote from &amp;ldquo;The Mathematical Theory of Communication&amp;rdquo;, in which Weaver discusses the effect of the probability of the existence of a given symbol string on efficient compression:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;hellip; anyone would agree that the probability is low for such a sequence of words as &amp;ldquo;Constantinople fishing nasty pink.&amp;rdquo; Incidentally, it is low, but not zero; for it is perfectly possible to think of a passage in which one sentence closes with &amp;ldquo;Constantinople fishing,&amp;rdquo; and the next begins with &amp;ldquo;Nasty pink.&amp;rdquo; And we might observe in passing that the unlikely four-word sequence under discussion &lt;em&gt;has&lt;/em&gt; occurred in a single good English sentence, namely, the one above.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A &lt;a href=&#34;http://www.google.com/search?q=%22Constantinople%20fishing%20nasty%20pink%22&#34;&gt;Google search&lt;/a&gt; on the phrase &amp;ldquo;Constantinople fishing nasty pink&amp;rdquo; today gets 246 hits, but of course Shannon and Weaver&amp;rsquo;s influence is much more extensive than that.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By Gavin Brelstaff on &lt;a href=&#34;#comment-385&#34;&gt;April 6, 2006 9:36 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Ezra Pound in his &amp;ldquo;ABC Of Reading&amp;rdquo; - Faber &amp;amp; Faber 1951&lt;br /&gt;
wrote p36&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Great literature is simply language charged with meaning to the utmost possible degree.&amp;rdquo;&lt;br /&gt;
DICHTEN = CONDENSARE&lt;/p&gt;
&lt;p&gt;p63&lt;br /&gt;
&amp;ldquo;Incompetence will show in the use of too many words&amp;rdquo;&lt;/p&gt;
&lt;p&gt;By Gavin Brelstaff on &lt;a href=&#34;#comment-398&#34;&gt;April 11, 2006 4:46 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Dear Bob&lt;/p&gt;
&lt;p&gt;I (also) think your thinking might be elucidated by the&lt;br /&gt;
Russian linguist:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://en.wikipedia.org/wiki/Roman_Jakobson&#34;&gt;http://en.wikipedia.org/wiki/Roman_Jakobson&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Jakobson distinguishes six communication functions, each associated with a dimension of the communication process:&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Dimensions&lt;br /&gt;
1 context&lt;br /&gt;
2 message&lt;br /&gt;
3 sender &amp;mdash;&amp;mdash;&amp;mdash;&amp;mdash;&amp;mdash; 4 receiver&lt;br /&gt;
5 channel&lt;br /&gt;
6 code&lt;/p&gt;
&lt;p&gt;Functions&lt;br /&gt;
1 referential (= contextual information)&lt;br /&gt;
2 poetic (= autotelic)&lt;br /&gt;
3 emotive (= self-expression)&lt;br /&gt;
4 conative (= vocative or imperative addressing of receiver)&lt;br /&gt;
5 phatic (= checking channel working)&lt;br /&gt;
6 metalingual (= checking code working)&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/book-reviews">book reviews</category>
      
      <category domain="https://www.bobdc.com//categories/technology-past">technology, past</category>
      
    </item>
    
    <item>
      <title>Document Engineering</title>
      <link>https://www.bobdc.com/blog/document-engineering/</link>
      <pubDate>Mon, 03 Apr 2006 09:12:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/document-engineering/</guid>
      
      
      <description><div>An excellent book by Bob Glushko and Tim McGrath.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0262072610/bobducharmeA/&#34;&gt;&lt;img src=&#34;http://ec1.images-amazon.com/images/P/0262072610.01._AA240_SCLZZZZZZZ_.jpg&#34; alt=&#34;[Document Engineering cover]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob Glushko and Tim McGrath&amp;rsquo;s new book &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0262072610/bobducharmeA/&#34;&gt;Document Engineering: Analyzing and Designing Documents for Business Informatics and Web Services&lt;/a&gt; describes &amp;ldquo;document engineering&amp;rdquo; as a new discipline. The discipline, if not the name, will sound familiar to people who work with XML in an automated publishing context, a web services context, or somewhere in between. (More on the &amp;ldquo;between&amp;rdquo; later.)&lt;/p&gt;
&lt;p&gt;I found the book&amp;rsquo;s subtitle to be misleading, because the book covers much more than the design of documents. As its introduction tells us,&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The essence of Document Engineering is the analysis and design methods that yield:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Precise specifications or models for the information that business processes require.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Rules by which related processes are coordinated, whether between different firms to create composite services or virtual enterprises or within a firm to streamline information flow between organizations.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Document Engineering provides the concepts and methods needed to align business strategy and information technology, to bridge the gap between what we want to do and how to do it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The two authors describe documents as &amp;ldquo;self-contained package[s] of related information&amp;rdquo;, which obviously means much more than publishable content that gets set into specific fonts to be read by human eyeballs. In particular, they describe documents as the interfaces between business processes. These processes are typically rendered as services these days, what with service-oriented architectures (SOA) being such a hot topic in IT architecture discussions. Such a document could be an invoice, a bill of lading, or a specific information package designed for the interaction between processes at two business partners.&lt;/p&gt;
&lt;p&gt;Or, of course, it could be a novel or a user&amp;rsquo;s guide or a company&amp;rsquo;s annual report. There&amp;rsquo;s a common distinction in the XML world between &amp;ldquo;data-oriented&amp;rdquo; XML and &amp;ldquo;document-oriented XML&amp;rdquo; that I prefer to describe as transaction-oriented XML versus publishing-oriented XML (see &lt;a href=&#34;http://www.snee.com/xml/xml2004paper.html&#34;&gt;Documents vs. Data, Schemas vs. Schemas&lt;/a&gt;). It&amp;rsquo;s all data, and it&amp;rsquo;s all documents. The status of all well-formed XML as both data and documents is something that Glushko and McGrath take very seriously, and they&amp;rsquo;ve studied engineering techniques from both the business transaction and the publishing worlds to help the reader address issues in both ends of the spectrum and in the many cases in between. For example, in addition to reviewing the methodology proposed in Eve Maler and Jeanne El Andaloussi&amp;rsquo;s classic book &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0133098818/bobducharmeA/&#34;&gt;Developing SGML DTDs: From Text to Model to Markup&lt;/a&gt;, they present the first detailed approach I&amp;rsquo;ve seen to applying classic &lt;a href=&#34;http://en.wikipedia.org/wiki/Database_normalization&#34;&gt;database normalization&lt;/a&gt; techniques to documents.&lt;/p&gt;
&lt;p&gt;Along with the use of existing data engineering techniques, they also build on existing business processing models and standards such as ebXML. While the book will be very useful for business process people who want to learn more about the processing of non-relational data, a chapter like &amp;ldquo;Describing What Businesses Do and How They Do It&amp;rdquo; will be just as valuable to data people who want to understand the different classes of business processes, the different levels of abstraction used to discuss them, and potential interactions between them, which is especially important considering the primary role that Glushko and McGrath see their idea of &amp;ldquo;documents&amp;rdquo; playing in those interactions.&lt;/p&gt;
&lt;p&gt;Consultants who need to repeatedly perform business process analysis and related document workflow analysis at multiple clients will find this book to be particularly helpful, both to educate themselves and their clients. For example, the chapter &amp;ldquo;When Models Don&amp;rsquo;t Match: The Interoperability Challenge&amp;rdquo; enumerates ways that exchanged data may not fill the intended purpose. I liked the chart showing the potential problems caused by differences in content, encoding, structure, and semantics for a simple piece of information like &amp;ldquo;100 US Dollars&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;The extended use case that the book repeatedly returns to is an integrated event calendaring system for the University of California at Berkeley. Anyone who considers shared calendaring to be the next killer app should consider themselves lucky that Glushko and McGrath chose this as a use case; the clear connections that they draw between the details of the use case and the more abstract discussions that form the bulk of the book will give calendar app developers a big jump in their analysis work. Also handy for anyone&amp;rsquo;s analysis are the lists of questions that the book suggests you ask of someone about any document that they use or send to others.&lt;/p&gt;
&lt;p&gt;As a 702-page (with indexes and backmatter) hardcover, the book does weigh a bit; I read much of it on a series of plane trips in which I was not carrying a laptop. Carrying this book with a laptop and normal luggage might have given me back problems. The book is definitely worth getting, though, for just about anyone who deals with XML data that gets passed from one process to another, and that&amp;rsquo;s a lot of us.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://svg.startpagina.nl&#34; title=&#34;http://svg.startpagina.nl&#34;&gt;stelt&lt;/a&gt; on &lt;a href=&#34;#comment-377&#34;&gt;April 3, 2006 12:29 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Some very smart ideas about Documents being way more than a dead instance on a website or print-out:&lt;br /&gt;
&lt;a href=&#34;http://www.google.com/search?q=%22future+of+Science+Communication+and+Publishing%22&#34;&gt;http://www.google.com/search?q=%22future+of+Science+Communication+and+Publishing%22&lt;/a&gt; and&lt;br /&gt;
&lt;a href=&#34;http://www.google.com/search?q=%22hypermedia+for+science%22+datament&#34;&gt;http://www.google.com/search?q=%22hypermedia+for+science%22+datament&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And what about many DOMs continuously interacting ?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.sims.berkeley.edu/~glushko&#34; title=&#34;http://www.sims.berkeley.edu/~glushko&#34;&gt;Bob Glushko&lt;/a&gt; on &lt;a href=&#34;#comment-378&#34;&gt;April 3, 2006 3:08 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, I&amp;rsquo;m pleased that you like the book. Tim McGrath and I have set up a &lt;a href=&#34;http://docengineering.com/&#34;&gt;site at docengineering.com&lt;/a&gt; with a couple of sample chapters and various other talks and papers.&lt;/p&gt;
&lt;p&gt;-Bob Glushko&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
      <category domain="https://www.bobdc.com//categories/book-reviews">book reviews</category>
      
    </item>
    
    <item>
      <title>XML, summer, and Oxford</title>
      <link>https://www.bobdc.com/blog/xml-summer-and-oxford/</link>
      <pubDate>Sat, 25 Mar 2006 15:56:39 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/xml-summer-and-oxford/</guid>
      
      
<description><div>Now in its sixth year!</div><div>&lt;p&gt;The &lt;a href=&#34;http://www.xmlsummerschool.com&#34;&gt;XML Summer School&lt;/a&gt;, a week of seminars on a wide range of XML-related topics sponsored by the &lt;a href=&#34;http://www.csw.co.uk/&#34;&gt;CSW Group&lt;/a&gt; at a college of Oxford University, is being held for the sixth year. Peter Flynn and I hold the distinction of being the only people to have taught every year, but the list of people who have taught for most of those years and the roster of new people each year are both very distinguished.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://oxford.openguides.org/wiki/?Punting&#34;&gt;&lt;img src=&#34;http://www.xmlsummerschool.com/images/river1.jpg&#34; alt=&#34;[punts]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;One of these distinguished members, Michael Kay, &lt;a href=&#34;http://saxonica.blogharbor.com/blog/_archives/2006/3/22/1834451.html&#34;&gt;recently wrote&lt;/a&gt; that &amp;ldquo;when you come in on the last day you get the impression of a strong sense of bonding that has taken place over the week.&amp;rdquo; There are people whom I once considered friends only because I ran into them at a conference each year and emailed occasionally; because of this event, where you get to spend a lot of time hanging out and discussing all kinds of things, I now consider them fairly close friends. (There is often some drinking involved once the day gets old enough; the schedule for the week includes &amp;ldquo;pub crawl&amp;rdquo; as an official event.) The marketing literature for the event has always pointed out that instead of just being lecturers talking through slides to attendees, it&amp;rsquo;s everyone hanging out together day and evening, with activities like &lt;a href=&#34;http://oxford.openguides.org/wiki/?Punting&#34;&gt;punting&lt;/a&gt; and private receptions held in local museums, and Michael&amp;rsquo;s comment shows that the system works.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been overseeing the track on XSLT and related technologies, which this year includes XSL-FO and XQuery. Michael Kay and Jeni Tennison will return to teach XSLT with me (I cover the easier parts), and this year Priscilla Walmsley joins us to teach XQuery and XSL-FO. Paul Prescod will cover a very interesting topic: the role that client-side XSLT can play in AJAX application development. (Michael&amp;rsquo;s talk will touch on both XSLT and XQuery issues, covering the contributions that schema awareness can bring to application development with these languages.)&lt;/p&gt;
&lt;p&gt;And that&amp;rsquo;s just one track. I should pay more attention to the Health Care and Drug Information tracks, because there&amp;rsquo;s always a good cast of presenters discussing interesting solutions to large problems. I always learn more about web services from people like Marc Hadley and John Kemp; Eve Maler makes some very dry issues about security and access control much more interesting, and there are always new things such as this year&amp;rsquo;s coverage of rich web clients by Paul Prescod and Chris Lilley. With people like Peter Flynn, Tony Coates, Peter Brown, and Sean McGrath, I&amp;rsquo;m never sure what I&amp;rsquo;ll be discussing, but it&amp;rsquo;s always interesting (which reminds me—there are often unscheduled pub crawls as well).&lt;/p&gt;
&lt;p&gt;To summarize, the CSW Oxford XML Summer School is a great opportunity to learn basic, advanced, old, and new aspects of XML, whether you&amp;rsquo;re starting the week as a beginner or as a long-time practitioner. Take a look through the &lt;a href=&#34;http://www.xmlsummerschool.com&#34;&gt;web site&lt;/a&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>XHTML 2 for storing content?</title>
      <link>https://www.bobdc.com/blog/xhtml-2-for-storing-content/</link>
      <pubDate>Sat, 18 Mar 2006 17:01:44 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/xhtml-2-for-storing-content/</guid>
      
      
      <description><div>People will use XHTML 2 for more than shipping pages to browsers.</div><div>&lt;p&gt;While discussing &lt;a href=&#34;http://www.w3.org/News/2006#item40/&#34;&gt;RDF/A&lt;/a&gt;, Shelley Powers &lt;a href=&#34;http://weblog.burningbird.net/2006/03/15/useful-semweb-posts/&#34;&gt;recently wrote&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I still believe that we don&amp;rsquo;t need to embed RDF directly into our web pages because many web sites are dynamic now. As such, if one accesses the page as a human, you get data formatted for human consumption through a browser; if you access the page as a webbot, by attaching /rdf to the end of the document, the same data is formatted for mechanical consumption. No need to clutter up web pages, or make page creation or generation that much harder.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I think that XHTML 2 will be used for more than delivery of content to browsers, adding a lot of value to the ability to add RDF metadata to XHTML 2. People will take XHTML 2 more seriously as a format for actually storing content than they ever took its predecessors for several reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Improvements such as nested section elements with content-dependent &lt;code&gt;h&lt;/code&gt; headers instead of &lt;code&gt;h1&lt;/code&gt;, &lt;code&gt;h2&lt;/code&gt;, &lt;code&gt;h3&lt;/code&gt;, and their brethren will let us make documents structurally richer and therefore easier to slice and dice. The &amp;ldquo;Structuring Advantages&amp;rdquo; slide of Steven Pemberton&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/2005/Talks/05-steven-xtech/&#34;&gt;XHTML2: Accessible, Usable, Device Independent and Semantic&lt;/a&gt; slideshow makes some excellent points about this. The whole slideshow is worth reading.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I&amp;rsquo;ve noticed a trend among web designers psyched about CSS to take more and more presentation out of their (X)HTML and store it in CSS. The ability to apply different CSS stylesheets to the same XHTML and have it look nice seems to be a mark of professionalism for them now. (Old SGML geeks are tempted to say: &amp;ldquo;Woo-hoo! Professional web designers are finally taking separation of content from presentation seriously! We won!&amp;rdquo;) Maybe the web designers&amp;rsquo; move away from messy HTML is just a fringe benefit of their moves toward the &lt;a href=&#34;http://www.csszengarden.com/&#34;&gt;CSS Zen Garden&lt;/a&gt;. It&amp;rsquo;s still a huge benefit.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Structurally, XHTML 1 wasn&amp;rsquo;t enough for some content applications, and DocBook—even DocBook Lite—was often too much. XHTML 2 will hit a sweet spot for a lot of applications, especially those involved in the interchange of content across workflow steps (which may cross business boundaries—think of it as B2B content). Content exchanged across workflow steps needs metadata; that&amp;rsquo;s often how you know which workflow steps have touched a document. Separate metadata means more documents to track. Embedded metadata is part of the appeal of &lt;a href=&#34;https://www.bobdc.com/blog/using-or-not-using-adobes-xmp&#34;&gt;XMP&lt;/a&gt;, which lets you embed (some) RDF into binary files such as PDF and JPEG files. Embedded metadata makes a lot of things easier, and RDF/A will do this for XHTML.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
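&lt;p&gt;To make the first of those points concrete, here is roughly what nested-section markup looks like in the XHTML 2 working drafts: one generic &lt;code&gt;h&lt;/code&gt; element whose heading level follows from its &lt;code&gt;section&lt;/code&gt; nesting (the sample content is invented):&lt;/p&gt;

```xml
<!-- Sketch of the XHTML 2 draft approach: heading depth comes from
     section nesting rather than from fixed h1/h2/h3 element names. -->
<section>
  <h>Birds</h>
  <p>Some introductory text about birds.</p>
  <section>
    <h>Waterfowl</h>
    <p>Ducks, geese, and swans.</p>
  </section>
</section>
```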
&lt;p&gt;If I&amp;rsquo;m writing something shorter than an entire book, I&amp;rsquo;m sure I&amp;rsquo;ll mostly use XHTML 2 once its schema gets more settled. The ability to put a list or &lt;code&gt;pre&lt;/code&gt; element inside of a &lt;code&gt;p&lt;/code&gt; element will be very handy for tech writing. If someone needs my content in some other format, it will be easy enough to transform. RDF/A looks to me like microformats done right, and it could even benefit the RDF community more than the XHTML community as it spreads RDF beyond the ivory towers where it&amp;rsquo;s been most comfortable.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m looking forward to XHTML 2, and RDF/A is a key reason. Again, read Steven&amp;rsquo;s slides.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://dubinko.info/blog/&#34; title=&#34;http://dubinko.info/blog/&#34;&gt;Micah Dubinko&lt;/a&gt; on &lt;a href=&#34;#comment-283&#34;&gt;March 18, 2006 6:22 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;100% Agree. -m&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>Law metadata on the web</title>
      <link>https://www.bobdc.com/blog/law-metadata-on-the-web/</link>
      <pubDate>Mon, 13 Mar 2006 07:54:22 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/law-metadata-on-the-web/</guid>
      
      
      <description><div>US laws and court decisions: fertile ground for semantic integration projects.</div><div>&lt;p&gt;More and more primary law (court cases and actual laws passed by governments at any level, as opposed to secondary law such as treatises explaining the meaning of primary law) is available on the web. In the United States, the federal government and most state governments and court systems make it a regular practice to publish this information on their own dot-gov websites. Governments typically have laws requiring that all laws be available where citizens can see them, and doing so on the web costs less than the time-honored tradition of publishing bound books.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.hypergrove.com/legalrdf.org/index.html&#34;&gt;&lt;img src=&#34;http://www.hypergrove.com/legalrdf.org/LegalXHTML6.gif&#34; alt=&#34;Legal-RDF logo&#34; align=&#34;right&#34; border=&#34;0&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;240px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Each state, though, has little incentive to encourage and then follow national standards for how this information is published. The web operation at a given state capital or court system is a nonprofit organization working on a limited budget, and if they can get readable HTML, PDF, Word, or even WordPerfect files up there, they&amp;rsquo;ve achieved their main goal. Much of what LexisNexis and WestLaw customers pay for is integrated access to normalized, indexed versions of such data, cross-linking between court decisions and laws, and a professionally-maintained taxonomy to guide them through laws that address particular subjects, and while many customers complain about the expense, the cost doesn&amp;rsquo;t surprise people who understand the work that goes into it.&lt;/p&gt;
&lt;p&gt;Some related efforts for standardizing law XML have been up and running for a while. The OASIS &lt;a href=&#34;http://www.legalxml.org/about/index.shtml&#34;&gt;LegalXML&lt;/a&gt; group is mostly concerned with the electronic exchange of legal data such as court filings and transcripts. Their &lt;a href=&#34;http://www.legalxml.org/members/index.shtml&#34;&gt;membership&lt;/a&gt; is an interesting array of for-profit and non-profit organizations from several countries. (It turns out that the &lt;a href=&#34;http://www.vscl.org.au/&#34;&gt;Victorian Society for Computers &amp;amp; the Law&lt;/a&gt; is based in the Australian state of Victoria and not concerned with potential applications of Charles Babbage&amp;rsquo;s Difference Engine to British law during the reign of Queen Victoria.) Joshua Tauberer&amp;rsquo;s &lt;a href=&#34;http://www.GovTrack.us&#34;&gt;www.GovTrack.us&lt;/a&gt; (see also his &lt;a href=&#34;http://www.xml.com/pub/a/2006/02/08/govtrack-us-public-data-semantic-web.html&#34;&gt;XML.com article&lt;/a&gt;) uses semantic web technologies to track ongoing US Government activity as it creates laws.&lt;/p&gt;
&lt;p&gt;A new organization called &lt;a href=&#34;http://www.legalrdf.org/&#34;&gt;Legal-RDF.org&lt;/a&gt; focuses more on something I&amp;rsquo;ve been wondering about: the use of semantic web technologies to allow for integrated use of the free primary law on the web. Integrating the similar yet often structurally different collections of federal and state US law sounds like an ideal semantic web project; US academic researchers in the field looking for projects that would attract grant money should take a close look at the possibilities.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.law.cornell.edu/&#34;&gt;&lt;img src=&#34;http://www.law.cornell.edu/images/tower.gif&#34; alt=&#34;Cornell Legal Information Institute logo&#34; align=&#34;right&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I recently scraped some Supreme Court decision metadata from the excellent collection at Cornell&amp;rsquo;s &lt;a href=&#34;http://www.law.cornell.edu/&#34;&gt;Legal Information Institute&lt;/a&gt;. (I was going to use it to learn &lt;a href=&#34;http://www.w3.org/TR/rdf-sparql-query/&#34;&gt;SPARQL&lt;/a&gt; better, querying for things like how often two judges were on the same side of a concurrence or a dissent for a particular opinion, but this fell in priority as &lt;a href=&#34;https://www.bobdc.com/blog/im-available&#34;&gt;job search&lt;/a&gt; related projects rose in priority.) The HTML at the LII is a bit messy, but the court decisions and opinions have plenty of &lt;code&gt;META&lt;/code&gt; tags, and tools for cleaning up the HTML are easy enough to find, so I turned the metadata into an &lt;a href=&#34;http://www.snee.com/rdf/ussupremect.rdf&#34;&gt;RDF file&lt;/a&gt;. (For an example of the existing metadata, do a View Source on my favorite case, &lt;a href=&#34;http://www.law.cornell.edu/supct/html/92-1292.ZS.html&#34;&gt;510 U.S. 569&lt;/a&gt;, and Justice David Souter&amp;rsquo;s &lt;a href=&#34;http://www.law.cornell.edu/supct/html/92-1292.ZO.html&#34;&gt;opinion&lt;/a&gt;. Don&amp;rsquo;t miss Appendix B of the latter; here&amp;rsquo;s a sample quote: &amp;ldquo;Big hairy woman all that hair it ain&amp;rsquo;t legit, Cause you look like Cousin It&amp;rdquo;.)&lt;/p&gt;
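&lt;p&gt;As a rough sketch of that scraping step, Python&amp;rsquo;s standard HTML parser can pull the &lt;code&gt;META&lt;/code&gt; name/content pairs out of a page and emit them as simple triples. The sample page, subject URI, and &amp;ldquo;lawmeta&amp;rdquo; vocabulary below are all invented for illustration; the LII pages use their own metadata names:&lt;/p&gt;

```python
# Hypothetical sketch: extract META name/content pairs with Python's
# standard html.parser and emit them as N-Triples. The sample page,
# subject URI, and "lawmeta" vocabulary are invented for illustration.
from html.parser import HTMLParser

class MetaScraper(HTMLParser):
    """Collect (name, content) pairs from META tags."""
    def __init__(self):
        super().__init__()
        self.meta = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta.append((d["name"], d["content"]))

def to_ntriples(page_uri, pairs, vocab="http://example.org/lawmeta#"):
    """One N-Triples line per META pair, with the content as a literal."""
    lines = []
    for name, content in pairs:
        literal = content.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('<%s> <%s%s> "%s" .' % (page_uri, vocab, name, literal))
    return lines

sample = ('<html><head>'
          '<meta name="CASENAME" content="Campbell v. Acuff-Rose Music">'
          '<meta name="COURT" content="U.S. Supreme Court">'
          '</head><body></body></html>')
scraper = MetaScraper()
scraper.feed(sample)
triples = to_ntriples("http://example.org/cases/510us569", scraper.meta)
```

&lt;p&gt;Real pages would need the HTML-tidying step mentioned above before parsing, but the basic name/content-to-triple mapping stays this simple.&lt;/p&gt;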
&lt;p&gt;One of the stated goals of Legal-RDF.org (2020-02-23 update: domain name now owned by an ambulance chaser law firm) is &amp;ldquo;to develop and publish domain vocabularies (that is, ontologies) used to label text within legal and related documents with their semantic meaning.&amp;rdquo; For the quick and dirty RDF that I created, I just made up namespace URLs, and Legal-RDF.org work like this ontology project will make it easier for projects like mine to use common namespaces so that they can integrate more easily with each other and, like, you know, form a web. (A folksonomy-oriented list of topics touched on by court cases would have a more difficult time being useful, considering that legal research is the classic use case in which &lt;a href=&#34;http://en.wikipedia.org/wiki/Information_retrieval#Recall&#34;&gt;recall&lt;/a&gt; is more important than &lt;a href=&#34;http://en.wikipedia.org/wiki/Information_retrieval#Precision&#34;&gt;precision&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;In general, Legal-RDF.org looks like a great place to get in touch with other people interested in taking on similar pieces of a project that looks like an obvious and useful application of semantic web technologies. I look forward to seeing the results of their work.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/legal-publishing">legal publishing</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Writing about software: Naming and spelling things correctly</title>
      <link>https://www.bobdc.com/blog/writing-about-software-naming/</link>
      <pubDate>Wed, 08 Mar 2006 09:51:53 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/writing-about-software-naming/</guid>
      
      
      <description><div>How do you check the spelling of something that isn&#39;t in the dictionary?</div><div>&lt;p&gt;(This is part two of an irregular series that I&amp;rsquo;m doing on writing about software. &lt;a href=&#34;https://www.bobdc.com/blog/writing-about-software-what-do&#34;&gt;What documentation does a product need?&lt;/a&gt; is part one.)&lt;/p&gt;
&lt;p&gt;New technologies often bring along new words and new uses of old ones. Perhaps you&amp;rsquo;ve made up new names for certain components of the application you&amp;rsquo;re documenting. More likely, someone else created new names for something that you must document. You also may be introducing acronyms to your reader, and acronyms less common than &amp;ldquo;HTML&amp;rdquo; should be spelled out the first time they&amp;rsquo;re used. Considering how many things &amp;ldquo;ATM&amp;rdquo; stands for these days, spelling out an acronym once gives the reader valuable context for its use.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0131428993/bobducharmeA&#34;&gt;&lt;img src=&#34;http://images.amazon.com/images/P/0131428993.01._BO2,204,203,200_PIlitb-dp-500-arrow,59_AA240_SH20_SCLZZZZZZZ_.jpg&#34; alt=&#34;Sun Read Me First cover&#34; align=&#34;right&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Where can you find a definitive reference for these spellings, acronym expansions, capitalization and other usage rules? These terms usually come from two classes of sources: specifications and software companies. For a freely available spec like &lt;a href=&#34;http://www.w3.org/TR/2004/REC-xml-20040204/&#34;&gt;the XML Recommendation&lt;/a&gt;, there&amp;rsquo;s no excuse for saying that the acronym stands for &amp;ldquo;eXtensible Markup Language&amp;rdquo; or for forgetting the hyphens in &amp;ldquo;start-tag&amp;rdquo; and &amp;ldquo;end-tag&amp;rdquo;. Just look at the spec. Should the &amp;ldquo;p&amp;rdquo; in XPath be capitalized or not? Look at &lt;a href=&#34;http://www.w3.org/TR/1999/REC-xpath-19991116&#34;&gt;the spec&lt;/a&gt;. Is it more proper to refer to &amp;ldquo;a SQL query&amp;rdquo; or &amp;ldquo;an SQL query&amp;rdquo;? This is a tougher one, because the complete spec is not available for free, but it doesn&amp;rsquo;t take much detective work to find something authoritative: &lt;a href=&#34;http://en.wikipedia.org/wiki/SQL&#34;&gt;Wikipedia&amp;rsquo;s SQL entry&lt;/a&gt; includes links to PDFs of a subset of the ISO/ANSI SQL specification. (It turns out that the phrase &amp;ldquo;an SQL&amp;rdquo; does come up in the SQL spec, and &amp;ldquo;a SQL&amp;rdquo; doesn&amp;rsquo;t.) So, sometimes a bit of detective work is necessary, but it&amp;rsquo;s rarely much and always worth it.&lt;/p&gt;
&lt;p&gt;Should &amp;ldquo;internet&amp;rdquo; be capitalized? Is &amp;ldquo;filename&amp;rdquo; one word or two? There are many cases where multiple renditions can be considered proper, but professional-looking documents come from organizations that pick one variation of each such term and stick with their choice. You shouldn&amp;rsquo;t see &amp;ldquo;internet&amp;rdquo; in the middle of one sentence and &amp;ldquo;Internet&amp;rdquo; in the middle of another sentence in the same document, in the same documentation set, or, ideally, in two different documents coming from the same company.&lt;/p&gt;
&lt;p&gt;To implement this consistency, those in the publishing business (and remember, a software company that wants their documentation to be taken seriously is in the publishing business) have what they call &lt;a href=&#34;http://en.wikipedia.org/wiki/Style_guide&#34;&gt;style guides&lt;/a&gt;, which document their use of words that may offer a choice of usage. (A former co-worker who had worked at Penthouse Magazine—note that I&amp;rsquo;ve resisted the temptation to turn that phrase into a link—once told me of a screaming fight in an editorial meeting over whether &amp;ldquo;butt cheeks&amp;rdquo; is one word or two. I guess every profession has its professionalism.) You can buy the &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=081296389X/bobducharmeA&#34;&gt;New York Times Manual of Style and Usage&lt;/a&gt; at Amazon or any large bookstore, but even better for people documenting software, large software companies often publish their own style guides and sometimes make them available for free.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0735617465/bobducharmeA&#34;&gt;&lt;img src=&#34;http://images.amazon.com/images/P/0735617465.01._BO2,204,203,200_PIsitb-dp-500-arrow,-64_AA240_SH20_SCLZZZZZZZ_.jpg&#34; alt=&#34;MS Manual for Style cover&#34; align=&#34;right&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sun publishes &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0131428993/bobducharmeA&#34;&gt;Read Me First! A Style Guide for the Computer Industry&lt;/a&gt; and Microsoft offers the &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0735617465/bobducharmeA&#34;&gt;Microsoft Manual of Style for Technical Publications&lt;/a&gt;. (The latter was once available for free on their website, but my link to it no longer works, and searches don&amp;rsquo;t turn anything up.) The &lt;a href=&#34;http://developer.apple.com/documentation/UserExperience/Conceptual/APStyleGuide/AppleStyleGuide2006.pdf&#34;&gt;Apple Publications Style Guide&lt;/a&gt; (PDF) is available for free download, and Oracle&amp;rsquo;s &lt;a href=&#34;http://www.oracle.com/technology/tech/blaf/index.html&#34;&gt;Browser Look and Feel Guidelines&lt;/a&gt; is available online.&lt;/p&gt;
&lt;p&gt;Some, like the Oracle one, concern application development more than documentation development, but they still provide the prose writer with a good reference for terminology. The Microsoft and Apple guides are particularly important when writing about GUI applications running on those platforms. When you tell a reader how to fill out a particular dialog box that pops up on a screen (and remember, a &lt;a href=&#34;http://dictionary.reference.com/search?q=dialog&#34;&gt;dialog&lt;/a&gt; is a conversation between two or more people—a pop-up window with a form to fill out is a dialog box, not a dialog) you need a solid handle on the correct vocabulary for describing the potential parts of a dialog box. &amp;ldquo;Combo box&amp;rdquo; is a proper technical Microsoft term, and combo boxes are different from list boxes, and you should look up the difference if you have to refer to either in documentation.&lt;/p&gt;
&lt;p&gt;Don&amp;rsquo;t underestimate the value of creating your own house style, especially if there is a team of more than one writer working together on documentation. When you do choose between &amp;ldquo;Internet&amp;rdquo; and &amp;ldquo;internet&amp;rdquo; or &amp;ldquo;filename&amp;rdquo; and &amp;ldquo;file name&amp;rdquo;, document it and make sure that everyone who does any writing knows where to find the list and what to do if they have something to discuss and add to the list. Terms related to your own software are especially important to cover on this list.&lt;/p&gt;
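&lt;p&gt;Once those choices are written down, they&amp;rsquo;re easy to enforce mechanically. Here&amp;rsquo;s a toy sketch of such a checker; the rejected/preferred pairs are invented examples standing in for a real house style list:&lt;/p&gt;

```python
# A toy house style checker: a documented list of term choices,
# enforced mechanically. The rejected/preferred pairs are invented
# examples standing in for a real style guide's decisions.
import re

HOUSE_STYLE = {
    # rejected variant (as a regex) -> preferred form
    r"\bfile name\b": "filename",
    r"\bweb site\b": "website",
    r"\bdialog\b(?! box)": "dialog box",  # a dialog is a conversation
}

def check_style(text):
    """Return one (line_number, rejected, preferred) tuple per violation."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, preferred in HOUSE_STYLE.items():
            for match in re.finditer(pattern, line, re.IGNORECASE):
                problems.append((lineno, match.group(0), preferred))
    return problems

doc = "Save the file name in a dialog.\nOur web site explains the rest."
violations = check_style(doc)  # three violations across the two lines
```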
&lt;p&gt;Call me picky if you like, but consistency (both internally and with authoritative material outside of your control) is a mark of professionalism, and if you&amp;rsquo;re charging money for your product, you want it to look professional.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://danbri.org/&#34; title=&#34;http://danbri.org/&#34;&gt;Dan Brickley&lt;/a&gt; on &lt;a href=&#34;#comment-248&#34;&gt;March 8, 2006 2:17 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Kinda related &amp;mdash; there are some glossaries on the W3C site, &lt;a href=&#34;http://www.w3.org/2003/glossary/&#34;&gt;http://www.w3.org/2003/glossary/&lt;/a&gt; with details here: &lt;a href=&#34;http://www.w3.org/QA/2003/01/Glossary&#34;&gt;http://www.w3.org/QA/2003/01/Glossary&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I couldn&amp;rsquo;t find the RDF, but it&amp;rsquo;s around there somewhere&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-249&#34;&gt;March 8, 2006 4:04 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Dan. Considering how many of those words have different meanings in different contexts (even when limited to the domain of information systems), these glossaries are very valuable supplements to the W3C specs themselves.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/documenting-software">documenting software</category>
      
    </item>
    
    <item>
      <title>Easy, professional-looking websites with open source CSS</title>
      <link>https://www.bobdc.com/blog/easy-professionallooking-websi/</link>
      <pubDate>Fri, 03 Mar 2006 09:22:21 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/easy-professionallooking-websi/</guid>
      
      
      <description><div>And, they work well with straightforward XHTML.</div><div>&lt;p&gt;Monday night I went to my first meeting of the &lt;a href=&#34;http://www.neonguild.org/&#34;&gt;Neon Guild&lt;/a&gt;, an association of local Charlottesville technology professionals. (Web designers seemed to dominate, perhaps due to the theme of this month&amp;rsquo;s meeting.) I learned something very valuable about web design: that free, open source CSS stylesheets are available at &lt;a href=&#34;http://www.openwebdesign.org/&#34;&gt;Open Web Design&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://openwebdesign.org/userinfo.phtml?user=kpgururaja&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/slick.jpg&#34; alt=&#34;[NoProbs thumbnail]&#34; align=&#34;right&#34; border=&#34;0&#34; hspace=&#34;20px&#34; vspace=&#34;20px&#34; width=&#34;240px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There are additional open source CSS stylesheets (and many of the same ones) at &lt;a href=&#34;http://www.oswd.org/&#34;&gt;Open Source Web Design&lt;/a&gt;, but after a quick skim I saw more designs that I liked at Open Web Design, which offers nearly 1500 to choose from. Each stylesheet comes with an index.html page to demonstrate it, and it&amp;rsquo;s always a pleasant surprise to see the simplicity of these files. They&amp;rsquo;re usually valid, vanilla XHTML with basic sections such as the main content, side column, menu, and footer wrapped with &lt;code&gt;div&lt;/code&gt; elements that have specific &lt;code&gt;id&lt;/code&gt; values to provide a handle to the CSS stylesheet. (Aging SGML alumni might be tempted to shout &amp;ldquo;We won! People who used to resist the separation of structure from content finally see its value!&amp;rdquo;)&lt;/p&gt;
&lt;p&gt;I looked through the highest-rated ones on the site and ended up taking one called NoProbs, written by a guy in Bangalore who goes by both kpgururaja and Gururaj, and using his CSS stylesheet and index.html model to redo &lt;a href=&#34;http://www.snee.com/bob&#34;&gt;my own home page&lt;/a&gt;. If you compare it to &lt;a href=&#34;http://www.snee.com/bob/index-old.html&#34;&gt;my old one&lt;/a&gt;, which has been through different color schemes and the same design for several years now, I&amp;rsquo;m sure you&amp;rsquo;ll see the improvement. My original one was still a collection of individually edited static pages, and this time I put everything I wanted in &lt;a href=&#34;http://www.snee.com/bob/homepage.xml&#34;&gt;one XML file&lt;/a&gt; and used an &lt;a href=&#34;http://www.snee.com/bob/noprobcss.xsl&#34;&gt;XSLT stylesheet&lt;/a&gt; to convert that to the collection of pages with the right &lt;code&gt;div&lt;/code&gt; wrappers and &lt;code&gt;div/@id&lt;/code&gt; attributes to work correctly with the NoProbs CSS stylesheet. I&amp;rsquo;m sure the same XSLT stylesheet could create web pages for other Open Web Design CSS stylesheets with only minor tweaks. The XML of pre-transformation content also includes a bit of XInclude to incorporate blurbs from the Atom feeds about this weblog and about my recent XML.com articles.&lt;/p&gt;
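&lt;p&gt;The same single-source idea can be sketched in a few lines of Python using only the standard library (which has no XSLT processor). The element names and &lt;code&gt;id&lt;/code&gt; values below are invented placeholders, not those of the actual NoProbs stylesheet:&lt;/p&gt;

```python
# Rough sketch of the single-source idea: read one XML file of page
# content and wrap each piece in the div/@id structure that an
# off-the-shelf CSS stylesheet hooks onto. Element names and id values
# are invented placeholders, not the actual NoProbs ones.
import xml.etree.ElementTree as ET

SOURCE = """<site>
  <page name="index"><title>Home</title><body>Welcome.</body></page>
  <page name="writing"><title>Writing</title><body>Books and articles.</body></page>
</site>"""

def page_to_html(page):
    """Build one HTML page whose div ids give the CSS its handles."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = page.findtext("title")
    body = ET.SubElement(html, "body")
    content = ET.SubElement(body, "div", {"id": "content"})
    ET.SubElement(content, "h1").text = page.findtext("title")
    ET.SubElement(content, "p").text = page.findtext("body")
    footer = ET.SubElement(body, "div", {"id": "footer"})
    footer.text = "Generated from one XML file"
    return ET.tostring(html, encoding="unicode")

site = ET.fromstring(SOURCE)
pages = {p.get("name"): page_to_html(p) for p in site.findall("page")}
```

&lt;p&gt;Adding a page then means adding one element to the source file and rerunning the script, which is the same property that makes the XSLT version easy to maintain.&lt;/p&gt;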
&lt;p&gt;The whole setup will make it very easy to add a new page and see it show up in the menu or to change details of menu entries without requiring the installation and use of any &amp;ldquo;framework.&amp;rdquo; And, if I ever need to put together a slick-looking web site for another project, I know where my first stop will be.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Writing about software: what documentation does a product need?</title>
      <link>https://www.bobdc.com/blog/writing-about-software-what-do/</link>
      <pubDate>Mon, 27 Feb 2006 09:49:47 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/writing-about-software-what-do/</guid>
      
      
      <description><div>Helping users get the most value from a piece of software.</div><div>&lt;p&gt;Based on my experience as a tech writer and a few books I&amp;rsquo;ve written since then, someone recently asked me for advice on documenting a complex software product, and I thought I&amp;rsquo;d share my advice here. After writing it I remembered that two years ago I submitted a proposal to O&amp;rsquo;Reilly for a book on writing about software. After kicking the idea around with Simon St. Laurent a bit, I decided not to go through with it. One thing that put me off was his warning that a place like O&amp;rsquo;Reilly was obviously full of people with very specific opinions on the subject, so a settled outline alone was starting to sound like too much work. Since I had gathered some potentially useful material, and this is a weblog where I can put my opinions and leave others&amp;rsquo; opinions to their own weblogs, I&amp;rsquo;m going to make this a &lt;a href=&#34;http://www.bobdc.com/categories/documenting-software/&#34;&gt;series&lt;/a&gt;. If you&amp;rsquo;re at a small software company, or thinking of launching a product, or just interested in augmenting your coding skills with some complementary prose writing skills, it may be useful. Big software companies already know all this (in fact, we&amp;rsquo;ll see that they provide some very handy resources), and everyone else won&amp;rsquo;t really care. This first installment addresses the question that I was recently asked: what documentation does a software product need?&lt;/p&gt;
&lt;p&gt;The most popular approach since the beginning of the PC era is to think in terms of three volumes, or, if you&amp;rsquo;re not creating bound, hardcopy books, three sets of information that should be easily available to the user. To let my tech writer roots show, I&amp;rsquo;ll put them in a bulleted list:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Getting Started&lt;/strong&gt;: (a.k.a. Tutorial) Assuming that the product in question lets the user create software widgets, this leads a user with only a vague idea of the product&amp;rsquo;s use through the minimal amount that he or she needs to know to get some work done. It may start by walking you through the loading of a widget into the product and demonstrating basic tasks that you can do with a loaded one. Then, it leads you through the creation and use of a new, much simpler widget. Finally, it describes the other things that you can do and where to find out more about these tasks in the other two volumes. The combination of this Getting Started guide and a free version of the product makes an excellent marketing tool.&lt;/p&gt;
&lt;p&gt;When putting something like this online with no hardcopy version available, remember: the classic advantages of hypertext fall short here, because after a given section, users don&amp;rsquo;t want a choice of where to go next; they won&amp;rsquo;t know where to go. They want to be shown. That&amp;rsquo;s why they&amp;rsquo;re reading the &amp;ldquo;Getting Started&amp;rdquo; manual.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User Guide&lt;/strong&gt;: (a.k.a. User/Users Manual) a task-oriented approach to using the product. For each &amp;ldquo;how do I do X?&amp;rdquo; that may come up in the user&amp;rsquo;s mind, there should be a corresponding entry in the User Guide table of contents.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reference Guide&lt;/strong&gt;: (a.k.a. Reference Manual) everything the user might want to know, organized more like the product itself. For example, every menu entry and dialog box should have an index entry. This volume should answer every &amp;ldquo;What&amp;rsquo;s that for?&amp;rdquo; question a user might have while using the product. It should also make it clear how to get to those dialog boxes and menu choices—I&amp;rsquo;ve seen too much documentation that tells me to set something by filling out a particular field on some dialog box without giving me a clue as to where to find that dialog box.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The User Guide is the most work here, because the Getting Started manual can be written in a weekend and the organization and content coverage of the Reference Manual will be self-evident from a detailed review of the product. The User Guide must solve the users&amp;rsquo; problems, and assembling a reasonably complete list of potential problems is much more difficult than listing all the menu choices and dialog boxes.&lt;/p&gt;
&lt;p&gt;When I first decided to write &lt;a href=&#34;http://www.snee.com/bob/xsltquickly/&#34;&gt;XSLT Quickly&lt;/a&gt;, which took a user guide approach to the language after the tutorial in the first twenty pages, I deliberately drafted a table of contents before I had learned much of XSLT. From years of writing Omnimark programs to process SGML, I knew the important tasks faced by someone developing markup conversion programs. I knew that a section title like &amp;ldquo;Getting the most out of xsl:output&amp;rdquo; would be useless to someone learning XSLT, so I had titles like &amp;ldquo;Non-XML Output&amp;rdquo; and &amp;ldquo;Whitespace: preserving and controlling.&amp;rdquo; (Lord knows the latter is a &lt;a href=&#34;http://norman.walsh.name/2006/02/23/whitespace&#34;&gt;common problem&lt;/a&gt; with XML processing.)&lt;/p&gt;
&lt;p&gt;Should you deliver documentation as printed, bound books? As online help included with the product? As HTML files on your company&amp;rsquo;s website? Be prepared to deliver all of them, as well as formats that haven&amp;rsquo;t been invented yet. This is why people store such documentation in &lt;a href=&#34;http://www.docbook.org&#34;&gt;DocBook&lt;/a&gt; XML and, increasingly, &lt;a href=&#34;http://en.wikipedia.org/wiki/DITA&#34;&gt;DITA&lt;/a&gt;; it&amp;rsquo;s also why so many first-generation XML geeks started off as SGML geeks employed as tech writers at software companies: they had to create content that could be mixed and matched and published in multiple media. At &lt;a href=&#34;http://www.informationbuilders.com/&#34;&gt;Information Builders&lt;/a&gt;, I developed scripts to convert documentation into online help for mainframes, OS/2, Unix flavors, and Windows winhelp—remember &lt;a href=&#34;http://msdn2.microsoft.com/en-us/library/5y1h2zxw.aspx&#34;&gt;hlp&lt;/a&gt; files? The main target for IBI at the time, though, was always printed bound books. If you&amp;rsquo;re planning to develop these three volumes and online help, the Reference Guide will provide the bulk of the on-line help material. Clicking the Help button on the Foobar dialog box should lead to the same information that the &amp;ldquo;Foobar dialog box&amp;rdquo; index entry points to in the printed Reference Guide.&lt;/p&gt;
&lt;p&gt;Once, when I complained about the difficulty of using a product&amp;rsquo;s documentation, a saleswoman from the product&amp;rsquo;s vendor kept saying &amp;ldquo;but it&amp;rsquo;s all online!&amp;rdquo; as if that alone made it wonderful. I replied &amp;ldquo;Yes, but the online documentation is badly organized, and I can never find the answers I need.&amp;rdquo; (Have you noticed lately how a Google search on the use of a particular Microsoft Word feature retrieves a good answer faster than Word&amp;rsquo;s built-in online help does?) A lack of paper documentation does not absolve the designer of a product costing hundreds of dollars from an obligation to help users quickly find answers to questions that fall into three categories: How do I get up and running with this product? How do I make the product do this particular task? What will that aspect of the product do for me? These answers may or may not be grouped under the headers &amp;ldquo;Getting Started,&amp;rdquo; &amp;ldquo;User Guide,&amp;rdquo; and &amp;ldquo;Reference Guide,&amp;rdquo; but a clear representation of these three concepts will help users get more efficient work done with the software product, turning them into happier customers.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/documenting-software">documenting software</category>
      
    </item>
    
    <item>
      <title>Googling DITA</title>
      <link>https://www.bobdc.com/blog/googling-dita/</link>
      <pubDate>Thu, 23 Feb 2006 18:34:10 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/googling-dita/</guid>
      
      
      <description><div>Information Typing Architecture or Fetish Star?</div><div>&lt;p&gt;For an upcoming post, I was going to link the acronym DITA to a good page for more information about the Darwin Information Typing Architecture, a DTD that is growing in popularity for representing technical information. Not for the first time, I did a Google search on &amp;ldquo;DITA&amp;rdquo; and laughed out loud upon seeing what the first hit was. I don&amp;rsquo;t want to spoil it for you, but suffice to say that I can&amp;rsquo;t tell you much anyway because I&amp;rsquo;m at work and the Websense filter won&amp;rsquo;t let me follow the link. Try it yourself: &lt;a href=&#34;http://www.google.com/search?q=dita&#34;&gt;http://www.google.com/search?q=dita&lt;/a&gt;. (2020-02-23 update: at the time I wrote that, the number one hit was &lt;a href=&#34;http://www.dita.net/&#34;&gt;dita.net&lt;/a&gt;.)&lt;/p&gt;
&lt;h2 id=&#34;5-comments&#34;&gt;5 Comments&lt;/h2&gt;
&lt;p&gt;By Erik Hennum on &lt;a href=&#34;#comment-237&#34;&gt;February 23, 2006 9:09 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Um, DITA enthusiasts (those of a topic orientation) often suggest searching for &amp;ldquo;DITA XML.&amp;rdquo; A modest demonstration of the need for semantic search?&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bobdc.blog&#34; title=&#34;http://www.snee.com/bobdc.blog&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-238&#34;&gt;February 23, 2006 11:41 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That would certainly imply different semantics for the &amp;ldquo;I&amp;rsquo;m Feeling Lucky&amp;rdquo; button.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By Eliot Kimber on &lt;a href=&#34;#comment-239&#34;&gt;February 24, 2006 7:28 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;That is very funny. Could certainly make the time spent on the DITA TC conference calls pass a little more quickly&amp;hellip;.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://norman.walsh.name/&#34; title=&#34;http://norman.walsh.name/&#34;&gt;Norman Walsh&lt;/a&gt; on &lt;a href=&#34;#comment-240&#34;&gt;February 24, 2006 10:30 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For better or worse, I tend to attach DITA to its Wikipedia page when I mention it. In fact, I use links to Wikipedia in constructing the &amp;ldquo;subject index&amp;rdquo; page for my site, but that&amp;rsquo;s a different topic altogether :-)&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.billtrippe.com&#34; title=&#34;http://www.billtrippe.com&#34;&gt;Bill Trippe&lt;/a&gt; on &lt;a href=&#34;#comment-241&#34;&gt;February 26, 2006 10:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;And she&amp;rsquo;s Marilyn Manson&amp;rsquo;s main squeeze, which, somehow, makes it even better.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/dita">DITA</category>
      
    </item>
    
    <item>
      <title>Making hot sauce</title>
      <link>https://www.bobdc.com/blog/making-hot-sauce/</link>
      <pubDate>Mon, 20 Feb 2006 17:08:52 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/making-hot-sauce/</guid>
      
      
      <description><div>Learning a new technology.</div><div>&lt;p&gt;When learning any new technology, I like to start with the smallest, most stripped-down &amp;ldquo;&lt;a href=&#34;http://en.wikipedia.org/wiki/Hello_world&#34;&gt;hello world&lt;/a&gt;&amp;rdquo; app I can. I want to create the most minimal demonstration that qualifies as a working example and then build from there so that it&amp;rsquo;s absolutely clear to me what is truly necessary and what each extra adds. As it turns out, this makes plenty of sense when learning how to make hot sauce, and the &amp;ldquo;hello world&amp;rdquo; of hot sauce is remarkably simple.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.texaspete.com&#34;&gt;&lt;img src=&#34;http://secure.mycart.net/product_images/catalog7805/picname1097375.jpg&#34; alt=&#34;[Texas Pete bottles]&#34; align=&#34;right&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30x&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Small children like to think of putting hot sauce on their pizza as a dangerous, exciting, grown-up thing to do. &lt;a href=&#34;http://www.texaspete.com/&#34;&gt;Texas Pete&lt;/a&gt; is a good one to start with because while it&amp;rsquo;s not very hot, it&amp;rsquo;s still a red sauce in a bottle with a red and yellow label that implies burning heat. My younger daughter has moved beyond Texas Pete, because a store near us has such a good selection that we love picking out new ones, and when she pointed out the &lt;a href=&#34;http://scientificsonline.com/Product.asp_Q_pn_E_3097700&#34;&gt;Hot Sauce Kit&lt;/a&gt; in the Edmund Scientific catalog, we got it for her for Christmas.&lt;/p&gt;
&lt;p&gt;Stories of the &lt;a href=&#34;http://tabasco.com/tabasco_history/hot_pepper.cfm?xcode=history_aged_oak&#34;&gt;Tabasco&lt;/a&gt; company packing peppers in salt to ferment them for years before using them always gave me the impression that making hot sauce was a complicated process, but it doesn&amp;rsquo;t have to be. The Edmund kit turned out to be bottles, dried spices, and directions. If you skip the kit, the most difficult part to acquire is the bottles, which you can get by cleaning out existing ones as you use up store-bought hot sauce. (A small funnel to get your product into the bottle is handy.) We started with the simplest recipe in the kit, so here&amp;rsquo;s my approximation, the &amp;ldquo;hello world&amp;rdquo; of hot sauce:&lt;/p&gt;
&lt;p&gt;Cut the stems off of some hot peppers and blanch them in boiling white vinegar for two or three minutes. Put the peppers, half a cup of the hot vinegar, and a teaspoon of salt into a food processor or minichopper, puree it, and put it into a bottle.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s it. There are hundreds of optional steps, with our first tier being:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://scientificsonline.com/Product.asp_Q_pn_E_3097700&#34;&gt;&lt;img src=&#34;http://scientificsonline.com/images/250/30977-00-silo.jpg&#34; alt=&#34;[Some Like It Hot kit]&#34; align=&#34;right&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30x&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Wait a week before extensive consumption, because it does improve with age.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For the brief cooking of the peppers, grilling them adds to the flavor, but you should still boil (or just microwave) the vinegar a bit before pureeing the ingredients together.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A little sugar is a typical ingredient in some of the hotter sauces.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For other optional ingredients, look at the label of your favorite hot sauces and do some web searches. The Edmund kit came with spices such as garlic powder and dried ginger, but fresh garlic and ginger are obviously better, and lime juice is great. We put enough ingredients (including peppers that weren&amp;rsquo;t too hot) into our first few hot sauces that the result was gloppier than your typical Tabasco-type sauce, so yesterday we used habaneros (hot enough that you should use rubber gloves when cutting them) to make a sauce that would be easier to shake out of a bottle while still being hot. Two orange habaneros made us a bottle that wasn&amp;rsquo;t quite as hot as a typical new bottle of standard Tabasco sauce. We also put slices of ginger in the vinegar when we heated it up and added sugar, lime juice, and chopped garlic to the minichopper mix.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s an excellent parent-kid project, especially when evaluating optional ingredients to add. If any of my relatives are reading this, please act surprised next Christmas when you receive bottles of hot sauce with elaborately designed labels as presents. And, as with making a few batches of beer or taking lessons on a musical instrument for just a few months, the experience gives you a better appreciation of which professionals are particularly good at what they do.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://planb.nicecupoftea.org/&#34; title=&#34;http://planb.nicecupoftea.org/&#34;&gt;Libby&lt;/a&gt; on &lt;a href=&#34;#comment-226&#34;&gt;February 21, 2006 8:18 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob, perhaps you&amp;rsquo;d also like &lt;a href=&#34;http://www.flickr.com/photos/nicecupoftea/68746990/&#34;&gt;open source cola&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By Bob DuCharme on &lt;a href=&#34;#comment-227&#34;&gt;February 21, 2006 8:44 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Libby!&lt;/p&gt;
&lt;p&gt;In my posting I forgot to mention &lt;a href=&#34;http://www.cookingforengineers.com&#34;&gt;Cooking for Engineers&lt;/a&gt;, a web site that should appeal to hungry geeks.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/neat-tricks">neat tricks</category>
      
    </item>
    
    <item>
      <title>Linking in</title>
      <link>https://www.bobdc.com/blog/linking-in/</link>
      <pubDate>Fri, 17 Feb 2006 12:57:09 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/linking-in/</guid>
      
      
      <description><div>The &#34;social networking&#34; site turns out to be useful and practically fun.</div><div>&lt;p&gt;I answered my first few &lt;a href=&#34;http://www.linkedin.com&#34;&gt;LinkedIn&lt;/a&gt; invitations with an RDF geek response: &amp;ldquo;look, I&amp;rsquo;ll point to your &lt;a href=&#34;http://www.foaf-project.org/&#34;&gt;FOAF&lt;/a&gt; file if you want to point one at &lt;a href=&#34;http://www.snee.com/bob/foaf.rdf&#34;&gt;mine&lt;/a&gt;.&amp;rdquo; When I started gathering information for a job search, &lt;a href=&#34;http://ourworld.compuserve.com/homepages/Ken_North/&#34;&gt;Ken North&lt;/a&gt; suggested that I reconsider my attitude about LinkedIn, so I joined up. The first surprise was how many RDF geeks I saw there. The XML community in general is pretty well-represented.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.linkedin.com&#34;&gt;&lt;img src=&#34;https://www.linkedin.com/img/logos/logo.gif&#34; alt=&#34;[LinkedIn logo]&#34; align=&#34;right&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30x&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The next surprise was how much fun it could be to troll around, checking out my contacts&amp;rsquo; contacts and finding out how many friends I didn&amp;rsquo;t think knew each other actually do. For example, Priscilla Walmsley, Micah Dubinko, Dale Waldt, Betty Harvey, Eve Maler, Tim Bray, and Zarella Rendon all know &lt;a href=&#34;http://www.billtrippe.com/&#34;&gt;Bill Trippe&lt;/a&gt;. I only met Bill briefly once after a talk he gave in New York, but my brother once worked for him at INSO, my LexisNexis boss Chet Ensign interviewed him when he did an SGML book for Prentice-Hall, I own a copy of Bill&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0764548891/bobducharmeA/&#34;&gt;book on DRM technology&lt;/a&gt;&amp;hellip; apparently there were far more connections than I realized. So I sent him a LinkedIn invitation, and he &lt;a href=&#34;http://www.billtrippe.com/archives/2006/02/linkedin.html&#34;&gt;mentioned it&lt;/a&gt; in his blog.&lt;/p&gt;
&lt;p&gt;If there&amp;rsquo;s someone you want to meet, LinkedIn tells you someone you know (in my case, usually former RIA co-worker &lt;a href=&#34;http://www.axtiveminds.com/&#34;&gt;Dale Waldt&lt;/a&gt;) who is one degree closer to the person you want to meet and offers to route a request for an introduction. You write a cover message to the person you want to meet and another to the person you know (&amp;ldquo;Yo, Dale, please forward this&amp;rdquo;).&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;product&amp;rdquo; LinkedIn offers is interesting: a collection of data, stepped access to it, and the ability to run one of the most classic computer science algorithms against it. (Well, actually, one from a &lt;a href=&#34;http://www.nist.gov/dads/HTML/shortestpath.html&#34;&gt;family of algorithms&lt;/a&gt;—if you came up with a good new member for the family, you would be considered quite the computer scientist.) This is all free, so far; they make their money from advertising and from offering memberships with greater access to data and features.&lt;/p&gt;
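&lt;p&gt;The simplest member of that family is breadth-first search over an unweighted contact graph, which is all that a degrees-of-separation lookup needs. Here is a minimal sketch in Python, with an invented four-person network standing in for real contact data:&lt;/p&gt;

```python
from collections import deque

def introduction_chain(contacts, start, target):
    """Breadth-first search over an unweighted contact graph: returns
    the shortest chain of people linking start to target, or None if
    they are not connected. This is the simplest member of the
    shortest-path family of algorithms."""
    queue = deque([[start]])  # each queue entry is a path so far
    seen = {start}
    while queue:
        path = queue.popleft()
        person = path[-1]
        if person == target:
            return path
        for friend in contacts.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(path + [friend])
    return None

# A made-up four-person network, not anyone's real contact list:
contacts = {
    "Bob": ["Dale"],
    "Dale": ["Bob", "Carol"],
    "Carol": ["Dale", "Ted"],
    "Ted": ["Carol"],
}
print(introduction_chain(contacts, "Bob", "Ted"))  # ['Bob', 'Dale', 'Carol', 'Ted']
```

&lt;p&gt;The chain that comes back is exactly the list of people who would have to forward your request for an introduction.&lt;/p&gt;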
&lt;p&gt;&lt;a href=&#34;http://www.foaf-project.org/&#34;&gt;&lt;img src=&#34;http://www.foaf-project.org/images/foaflets.jpg&#34; alt=&#34;[FOAF logo]&#34; width=&#34;120px&#34; align=&#34;right&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30x&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;FOAF files are fun, and interesting demo apps have been built around them, but instead of listing our actual friends they tend to only list friends of ours with FOAF files. I&amp;rsquo;ve joked that it might be more accurate to call them FOED files, for &amp;ldquo;Friends Of Edd Dumbill,&amp;rdquo; because Edd shows up in so many of the ones I&amp;rsquo;ve seen. (I added the URL for &lt;a href=&#34;http://www.snee.com/bob/img/EddBob12-03.jpg&#34;&gt;this picture&lt;/a&gt; to my FOAF link to Edd to give my FOAF file a little &lt;a href=&#34;http://rdfweb.org/2002/01/photo/&#34;&gt;co-depiction&lt;/a&gt; juice. I don&amp;rsquo;t know what the strange bowl of liquid near the sugar packets in the picture is, but I suppose if I assign it a URL then Edd and I will be linked to it.)&lt;/p&gt;
&lt;p&gt;It would be unfair to the FOAF project to draw too many comparisons to LinkedIn. I&amp;rsquo;ll just say that as a dot com with some real money behind them, LinkedIn has built a useful application, and if they opened it up with an API, the results would be fascinating. And thanks, Ken, for pushing me to join!&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://plindenbaum.blogspot.com&#34; title=&#34;http://plindenbaum.blogspot.com&#34;&gt;Pierre&lt;/a&gt; on &lt;a href=&#34;#comment-221&#34;&gt;February 17, 2006 4:49 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi, the problem with FOAF files is that it seems to me that the only people using them are computer geeks. Of course most people don&amp;rsquo;t know anything about RDF, how to install a FOAF file on a server, what to do with it, how to modify it, etc., whereas creating a profile on LinkedIn is straightforward and it is really easy to search in your network (the problem is to convince non-computer friends to accept my invitation and to extend their network :-) &amp;hellip;). Other, less well-known social networks generate FOAF from your profile:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;O&amp;rsquo;Reilly (&lt;a href=&#34;http://connection.oreilly.com/&#34;&gt;http://connection.oreilly.com/&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://videntity.org/&#34;&gt;http://videntity.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;?&amp;hellip;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On my side, I&amp;rsquo;ve been trying to generate and display FOAF files for Biologist: see &lt;a href=&#34;http://www.urbigene.com/foaf/&#34;&gt;SciFOAF&lt;/a&gt; and &lt;a href=&#34;http://www.urbigene.com/foafexplorer/&#34;&gt;MyFOAFExplorer&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://planb.nicecupoftea.org/&#34; title=&#34;http://planb.nicecupoftea.org/&#34;&gt;Libby&lt;/a&gt; on &lt;a href=&#34;#comment-222&#34;&gt;February 17, 2006 6:41 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Pierre, also if you&amp;rsquo;ve not seen it: &lt;a href=&#34;http://rdfweb.org/topic/DataSources&#34;&gt;http://rdfweb.org/topic/DataSources&lt;/a&gt; has a bunch of exporters of foaf. Your tools are very nice. Care to add something to &lt;a href=&#34;http://esw.w3.org/topic/SemanticWebDOAPBulletinBoard&#34;&gt;http://esw.w3.org/topic/SemanticWebDOAPBulletinBoard&lt;/a&gt; for us?&lt;/p&gt;
&lt;p&gt;Bob, I can&amp;rsquo;t work out how to &amp;lsquo;friend&amp;rsquo; you in linkedin - since I know you already it seems a bit weird to persuade one of my friends to introduce us. Guess I need to upgrade my account&amp;hellip;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>I&#39;m available</title>
      <link>https://www.bobdc.com/blog/im-available/</link>
      <pubDate>Sun, 12 Feb 2006 19:07:09 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/im-available/</guid>
      
      
      <description><div>Moving on from LexisNexis.</div><div>&lt;p&gt;After five years of working on XML architecture and metadata issues on a huge scale at perhaps the world&amp;rsquo;s oldest commercial online information provider, I&amp;rsquo;ll be moving on in late March, and I&amp;rsquo;m looking for interesting new opportunities. (&amp;ldquo;A-ha!&amp;rdquo; say all the friends wondering about my recent interest in LinkedIn.)&lt;/p&gt;
&lt;p&gt;As a wildly successful standard, XML lets us assemble tools and technologies from very different sources to create new possibilities for the creation and use of data, and these new possibilities are what excite me. Some in the XML world feel that XML&amp;rsquo;s 2006 status as &amp;ldquo;the new ASCII&amp;rdquo; that is simply &amp;ldquo;part of the plumbing&amp;rdquo; has made it boring, but to me, plumbing is how you hook systems together, and each new XML-related technology and HTTP-capable device or platform that crops up around us adds to the combination of things that we can hook up, and therefore to the number of cool applications that we can create. These applications can lead to new products from your company, new features in existing products, or to new, more efficient internal systems if you&amp;rsquo;re not in the business of selling software or information products.&lt;/p&gt;
&lt;p&gt;With my software development roots in the SGML world, information products are where I&amp;rsquo;ve had the most experience. Content is a business asset, and new technology that lets us sort, index, cross-reference, and slice and dice that content adds value to it because it lets people use that information to serve more purposes, both for content owners and for their customers. I&amp;rsquo;m particularly fascinated by the growing ability to do this with free software, because it puts so many opportunities into so many hands.&lt;/p&gt;
&lt;p&gt;I have extensive background in learning new technology and evaluating its potential for both end-user and management audiences. I&amp;rsquo;m happy to speak to small or large audiences in person or over the phone, and while I&amp;rsquo;ve never been a journalist, my tech writing background has taught me how to do professional writing quickly. This background also gives me a classic &amp;ldquo;doc-head&amp;rdquo; background, for those who follow the &lt;a href=&#34;http://www.snee.com/xml/xml2004paper.html&#34;&gt;document-oriented/data-oriented distinction&lt;/a&gt; in XML architectures, but since then, I received a Masters degree in computer science, giving me a broader system architecture perspective than that of more typical former tech writers.&lt;/p&gt;
&lt;p&gt;My family and I want to stay in Charlottesville, but I&amp;rsquo;m not averse to travel, and we are an afternoon&amp;rsquo;s drive away from Washington D.C. I do have plenty of telecommuting experience, including the giving of presentations to people I can&amp;rsquo;t see who are spread around the world.&lt;/p&gt;
&lt;p&gt;So send me an email (&lt;a href=&#34;mailto:bob@snee.com&#34;&gt;bob@snee.com&lt;/a&gt;) if you have any ideas, and feel free to let people know about my &lt;a href=&#34;http://www.snee.com/bob/resume.pdf&#34;&gt;resume&lt;/a&gt; (Acrobat format; for more information see my &lt;a href=&#34;http://www.snee.com/bob/&#34;&gt;home page&lt;/a&gt; under &lt;a href=&#34;http://www.snee.com/bob/xmlsgml.html&#34;&gt;xml, sgml&lt;/a&gt; and &lt;a href=&#34;http://www.snee.com/bob/worksch.html&#34;&gt;work, school&lt;/a&gt;). I&amp;rsquo;m excited by many industry and non-industry developments I&amp;rsquo;ve been hearing about, and I look forward to hearing more.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/miscellaneous">miscellaneous</category>
      
    </item>
    
    <item>
      <title>Pulling data out of computers in the mid-twentieth and early twenty-first centuries</title>
      <link>https://www.bobdc.com/blog/pulling-data-out-of-computers/</link>
      <pubDate>Wed, 08 Feb 2006 17:48:02 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/pulling-data-out-of-computers/</guid>
      
      
      <description><div>Report generation in the 1950s and the future of RDF.</div><div>&lt;p&gt;I&amp;rsquo;ve &lt;a href=&#34;https://www.bobdc.com/blog/25-years-of-database-history-s-1#more&#34;&gt;written before&lt;/a&gt; about W.C. McGee&amp;rsquo;s 1981 article in IBM&amp;rsquo;s Journal of Research and Development covering the history of database systems from 1955 to 1980, and I left off saying that I&amp;rsquo;d devote a separate entry to his history of report generation. The creation of reports may sound mundane, but throughout the history of computers the pulling of data meeting specific criteria is the most important thing we do with computers. (Why put data in or calculate new data in the first place?) Since writing that, I&amp;rsquo;ve found a more primary source, written by McGee in 1959 when he was at GE&amp;rsquo;s Hanford Atomic Products Operation, titled &lt;a href=&#34;http://portal.acm.org/citation.cfm?id=320955&#34;&gt;Generalization: Key to Successful Electronic Data Processing&lt;/a&gt; (unfortunately, this requires ACM membership or a fee to access).&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;generalization&amp;rdquo; he writes of is the creation of routines that can be re-used in multiple applications, such as a sorting routine. &amp;ldquo;Thus, by suitable &lt;em&gt;generalization&lt;/em&gt; [his italics] it is possible to design a sorting routine that will sort &lt;em&gt;any&lt;/em&gt; file, regardless of the data it contains.&amp;rdquo; While many take the principle of increased abstraction to promote code re-use for granted today, Harold Abelson and Gerald Jay Sussman don&amp;rsquo;t in their classic &lt;a href=&#34;http://mitpress.mit.edu/sicp/full-text/book/book.html&#34;&gt;Structure and Interpretation of Computer Programs&lt;/a&gt;, devoting plenty of pages to why it&amp;rsquo;s a Good Thing and the best way to go about it.&lt;/p&gt;
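&lt;p&gt;In today&amp;rsquo;s terms, McGee&amp;rsquo;s generalization is just parameterization: factor out the parts that vary so that one routine serves every application. A minimal sketch in Python, with made-up records and field names, of a sort routine that works on &amp;ldquo;any file, regardless of the data it contains&amp;rdquo;:&lt;/p&gt;

```python
def sort_file(records, field, descending=False):
    """A 'generalized' sort in McGee's sense: one routine that can sort
    any file of records on any field, because the field to sort on is a
    parameter rather than being wired into the code (or the plugboard)."""
    return sorted(records, key=lambda rec: rec[field], reverse=descending)

# Hypothetical payroll records for illustration:
payroll = [
    {"name": "Baker", "hours": 32},
    {"name": "Able", "hours": 40},
]
print(sort_file(payroll, "name"))                    # Able sorts first
print(sort_file(payroll, "hours", descending=True))  # 40 hours sorts first
```

&lt;p&gt;The same routine, handed a different file and a different field name, produces a different report; nothing inside it knows about payroll.&lt;/p&gt;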
&lt;p&gt;&lt;a href=&#34;http://portal.acm.org/citation.cfm?id=320955&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/rptgen.jpg&#34; alt=&#34;[diagram of records on magnetic tape]&#34; align=&#34;right&#34; border=&#34;0&#34; class=&#34;rightAlignedOpeningPicture&#34; width=&#34;450px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As an example of these routines, McGee describes &amp;ldquo;the generalized report generation and file maintenance routines [that] have been available to Data Processing planning personnel a little less than three months at the time of this writing.&amp;rdquo; Try to imagine the first ever techniques for modular generation of parameterized reports being more recent than &lt;a href=&#34;http://en.wikinews.org/wiki/Apple_introduces_new_iPod_with_video_playback_capabilities&#34;&gt;Apple&amp;rsquo;s introduction of the video iPod&lt;/a&gt; is today. The automated generation of printed reports based on mechanically stored data had already been around for decades in the use of punched card manipulation machines, but with no software to speak of, redesigning those reports meant rewiring plugboards. In McGee&amp;rsquo;s description of doing this with software, it&amp;rsquo;s interesting to read such a primary source on some of the earliest uses of terms such as file, record, and field. (Some of the earliest that I know of, anyway—I&amp;rsquo;d love to see pointers to earlier use of these terms.)&lt;/p&gt;
&lt;p&gt;McGee&amp;rsquo;s 1981 paper describes developments in report generation after his Hanford work, particularly in the early 1960s at IBM on their &lt;a href=&#34;http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP1401.html&#34;&gt;1401&lt;/a&gt; machine. This &amp;ldquo;Report Program Generator&amp;rdquo; program built on the Hanford work, and evolved into RPG II and RPG III, programs I remember seeing mentioned in job listings when I was younger. And now I know what &amp;ldquo;RPG&amp;rdquo; stands for!&lt;/p&gt;
&lt;p&gt;In the Hanford days, when people considered the use of alphanumeric assembly language symbols to represent machine opcodes to be a leap forward in ease and usability, a one-day turnaround for the design, implementation, and generation of reports added a huge amount of value to computers and to the data on them. As data models and delivery platforms have evolved since then, the ability of less technical users to more easily get the data they want has driven the adoption of many new platforms and data models, and, extrapolating to the future, I think that a report generator will be the killer app that RDF is waiting for. (Either that or a simplification of RDF comparable to what XML did to SGML.)&lt;/p&gt;
&lt;p&gt;While playing with Leigh Dodds&amp;rsquo; &lt;a href=&#34;http://www.ldodds.com/projects/twinkle/&#34;&gt;Twinkle&lt;/a&gt; SPARQL query tool, which I would describe as a simple IDE for the creation and running of queries and viewing of results, I was thinking about the possibility of a GUI tool that generates and runs queries against RDF data sets. (OK, RDF &amp;ldquo;graphs.&amp;rdquo; I don&amp;rsquo;t like this term because, while I know it refers to a class of data structures and not a class of pictures, it&amp;rsquo;s the kind of technical alternative use of plain English terms that has helped to confine most RDF use to academia.) This tool would allow a user unfamiliar with SPARQL or RDF syntax to fill out dialog boxes to show which data he or she wants, and it would then generate and run a SPARQL query without that user ever seeing that query. I think that a tool like this would help people appreciate the value of the flexibility of RDF, and Leigh agrees. Users of such a tool will miss out on several things, just as the legions of people using Crystal Report Writer who aren&amp;rsquo;t fluent in SQL are missing out on a few things, but like those people generating payroll and inventory reports from the relational databases, people using such an RDF query tool will get more useful work done than they would if they weren&amp;rsquo;t using data stored according to this model.&lt;/p&gt;
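&lt;p&gt;Under the hood, such a tool is mostly a query template filler: the user&amp;rsquo;s dialog-box choices become the variables and filters of a generated SPARQL query that the user never sees. A rough Python sketch of just the generation step, with invented property names (a real tool would also prepend the necessary PREFIX declarations and actually run the query):&lt;/p&gt;

```python
def build_sparql(fields, conditions):
    """Turn GUI selections into the SELECT/WHERE portion of a SPARQL query.
    fields: properties the user checked off to display; conditions:
    (property, value) pairs from filter dialogs. The 'ex:' prefix and all
    property names here are illustrative, not from any real vocabulary."""
    variables = " ".join(f"?{f}" for f in fields)
    # One triple pattern per displayed property:
    patterns = [f"?item ex:{f} ?{f} ." for f in fields]
    # One FILTER per dialog-box condition:
    filters = [f'FILTER (?{prop} = "{value}")' for prop, value in conditions]
    body = "\n  ".join(patterns + filters)
    return f"SELECT {variables}\nWHERE {{\n  {body}\n}}"

print(build_sparql(["title", "author"], [("author", "DuCharme")]))
```

&lt;p&gt;Checking off two properties and filling out one filter dialog yields a three-line WHERE clause; the user only ever sees the result table.&lt;/p&gt;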
&lt;p&gt;I&amp;rsquo;m going to try to prototype such a tool. So, if you see a lower frequency of postings on my weblog for a little while, rest assured that I&amp;rsquo;m doing something more productive than reading ancient computer science literature.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://codinginparadise.org&#34; title=&#34;http://codinginparadise.org&#34;&gt;Brad Neuberg&lt;/a&gt; on &lt;a href=&#34;#comment-199&#34;&gt;February 9, 2006 3:24 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Cool post; I love computer history, especially finding out the roots of concepts I take for granted.&lt;/p&gt;
&lt;p&gt;I also like how you mentioned someone doing to RDF what XML did to SGML&amp;hellip;. that&amp;rsquo;s been a long overdue thing that I&amp;rsquo;ve expected to happen, but hasn&amp;rsquo;t. You should do a drastic refactoring and paring down (or a reconceptualization) of RDF to make it more amenable to the mainstream and easier to work with. You&amp;rsquo;re right about how technical language can also create barriers to adoption; when the RDF community throws around terms like ontology markup languages, you bet this scares off your standard enterprise developer.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://clarkparsia.com/&#34; title=&#34;http://clarkparsia.com/&#34;&gt;Kendall Clark&lt;/a&gt; on &lt;a href=&#34;#comment-200&#34;&gt;February 9, 2006 3:37 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hey Bob,&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve done such a thing: &lt;a href=&#34;http://clarkparsia.com/projects/code/jspace/&#34;&gt;Jspace&lt;/a&gt; is a visual query builder for RDF databases with an interesting UI wrapped around it. Basically a &amp;ldquo;polyarchical browser&amp;rdquo;. There&amp;rsquo;s an in-browser version cooked up by the folks at Southampton, mSpace, which we&amp;rsquo;ve reimplemented as a Java app and used for the problem of database integration for NASA. They&amp;rsquo;re using it to do expertise location.&lt;/p&gt;
&lt;p&gt;So, basically, you convert existing databases into RDF, federate them, and then run JSpace against the result &amp;ndash; what you end up with is a pretty useful tool which, behind the scenes, is doing exactly what yr article suggests: building queries based on a user&amp;rsquo;s arbitrary navigation through an information space and presenting the results, i.e., reports, to the user in some novel ways. (Okay, I&amp;rsquo;m exaggerating a tiny bit: you also have to write a JSpace browser &amp;ldquo;model&amp;rdquo; which tells the tool how to relate the federated bits together, but it&amp;rsquo;s not very difficult to write and it&amp;rsquo;s just another RDF graph thingie.)&lt;/p&gt;
&lt;p&gt;For NASA we threw in social network graphs (to locate an expert in a rolodex culture, yr next best move is to call someone who is closer to that expert in the social network) just for fun! :&amp;gt;&lt;/p&gt;
&lt;p&gt;There are more features to add, including Atom channel generation for arbitrary queries, etc., but you might want to download and play with it some. It&amp;rsquo;s GPL&amp;rsquo;d. ;&amp;gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/technology-future">technology, future</category>
      
      <category domain="https://www.bobdc.com//categories/technology-past">technology, past</category>
      
    </item>
    
    <item>
      <title>Self-publishing bound, hardcopy books</title>
      <link>https://www.bobdc.com/blog/selfpublishing-bound-hardcopy/</link>
      <pubDate>Fri, 03 Feb 2006 12:31:28 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/selfpublishing-bound-hardcopy/</guid>
      
      
      <description><div>lulu.com plus free XML technology makes it pretty easy these days; I just did it.</div><div>&lt;p&gt;My first book was a crash course in basic end user tasks for using the MVS, VM/CMS, OS/400, VMS, and Unix operating systems: logging in, navigating the file system, using e-mail, using the text editor, listing, creating, and deleting files, and so forth. I wanted to call it &amp;ldquo;Fake Your Way Through Minis and Mainframes&amp;rdquo; but the McGraw-Hill Professional Book Division decided that &amp;ldquo;The Operating Systems Handbook&amp;rdquo; sounded more, well, professional, especially for a $49.50 hardcover. It was published in 1994 and sold a few thousand copies.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/bob/opsys.html&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/opsys75.jpg&#34; alt=&#34;[Operating Systems Handbook cover]&#34; align=&#34;right&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30x&#34; width=&#34;180px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When it went out of print, I had McGraw-Hill revert the rights to me and put free Acrobat files of the book&amp;rsquo;s contents on its &lt;a href=&#34;http://www.snee.com/bob/opsys.html&#34;&gt;home page&lt;/a&gt;. I had used &lt;a href=&#34;http://archive.salon.com/21st/feature/1998/08/25feature.html&#34;&gt;XyWrite&lt;/a&gt; to write the book, so a few perl scripts converted the XyWrite files to &lt;a href=&#34;https://docbook.org/&#34;&gt;DocBook&lt;/a&gt; XML, and Norm Walsh&amp;rsquo;s &lt;a href=&#34;http://docbook.sourceforge.net/projects/xsl/&#34;&gt;DocBook stylesheets&lt;/a&gt; and &lt;a href=&#34;http://xmlgraphics.apache.org/fop/&#34;&gt;FOP&lt;/a&gt; made it pretty easy to create the Acrobat files. People download them, and the MVS section is particularly popular—I get several e-mails a year, some from ibm.com, thanking me for making it available.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve had some fun with &lt;a href=&#34;http://www.cafepress.com/cp/info/&#34;&gt;cafepress.com&lt;/a&gt;&amp;rsquo;s ability to put any image on a T-shirt, a coffee mug, a clock, a lunchbox&amp;hellip; the list has grown over time, and when they added the ability to create books from PDF files, I started to think about making a new, inexpensive bound hardcopy version of &amp;ldquo;The Operating Systems Handbook&amp;rdquo; available. After some research, I discovered &lt;a href=&#34;http://www.lulu.com&#34;&gt;lulu.com&lt;/a&gt;, which is more focused on book publishing than cafepress and makes it even less expensive. As with cafepress, you don&amp;rsquo;t spend a penny on anything unless you want to buy something you&amp;rsquo;ve designed for them to sell, and you can charge a little extra and keep the difference if you&amp;rsquo;re willing to do some extra paperwork. Of course their main clientele is people with novels and poetry that no publisher was interested in, but they&amp;rsquo;re handy for republishing out-of-print books about mainframes and minicomputers. (The fact that you can publish a single copy of a hardcover book with a color cover of your own design for $18 inspires the gag gift reflex in me, too.) I decided to do the whole book and a smaller, less expensive version with just the MVS part.&lt;/p&gt;
&lt;p&gt;I had to revise the stylesheet a little and generate new PDF files, because a bound book needs a wider left margin on odd-numbered pages and a wider right margin on even-numbered pages. While I was at it, I tweaked various other font and margin settings. Customizing the DocBook stylesheets is the classic use case for the difference between &lt;code&gt;xsl:include&lt;/code&gt; and &lt;code&gt;xsl:import&lt;/code&gt; in XSLT: instead of revising the actual DocBook stylesheets, you&amp;rsquo;re better off creating a new stylesheet that uses &lt;code&gt;xsl:import&lt;/code&gt; to import the main DocBook stylesheet, copying the parts that you want to revise into your new stylesheet, and then revising them there, where they&amp;rsquo;ll override the original DocBook stylesheet code. This way, when the DocBook stylesheets get upgraded, you can just import the new ones without worrying about losing your edits. (More on this in &lt;a href=&#34;http://www.xml.com/pub/a/2000/11/01/xslt/index.html&#34;&gt;this XML.com column&lt;/a&gt;.)&lt;/p&gt;
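&lt;p&gt;As a rough sketch, a customization layer along those lines can be as small as this (the &lt;code&gt;href&lt;/code&gt; value is illustrative and depends on where the DocBook stylesheets are installed, and the parameter value is just an example):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;?xml version=&#34;1.0&#34;?&amp;gt;
&amp;lt;!-- mybook.xsl: a customization layer; overrides here win over
     anything pulled in by xsl:import --&amp;gt;
&amp;lt;xsl:stylesheet xmlns:xsl=&#34;http://www.w3.org/1999/XSL/Transform&#34;
                version=&#34;1.0&#34;&amp;gt;
  &amp;lt;xsl:import href=&#34;docbook-xsl/fo/docbook.xsl&#34;/&amp;gt;
  &amp;lt;!-- copied from param.xsl and revised --&amp;gt;
  &amp;lt;xsl:param name=&#34;page.margin.top&#34;&amp;gt;0.5in&amp;lt;/xsl:param&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You then run your transforms against mybook.xsl instead of the stock DocBook stylesheet, and upgrading the DocBook stylesheets later never touches your edits.&lt;/p&gt;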
&lt;p&gt;&lt;a href=&#34;http://www.lulu.com/content/224105&#34;&gt;&lt;img src=&#34;http://www.lulu.com/author/display_thumbnail.php?fCID=224105&amp;fSize=zoom_&#34; alt=&#34;[Fake Your Way Through Minis and Mainframes cover]&#34; align=&#34;right&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30x&#34; width=&#34;180px&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Here are three important hints if you embark on this yourself:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Get to know the param.xsl file that comes with the DocBook stylesheets. This gives you hooks to customize a lot of the stylesheet behavior without requiring you to dig deeply into the stylesheet code where the real programming logic is. For example, to reset the top margin of the printed pages, I just copied the &lt;code&gt;xsl:param&lt;/code&gt; element that created a &amp;ldquo;page.margin.top&amp;rdquo; parameter from param.xsl into my new stylesheet and assigned the value that I wanted.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Bob Stayton&amp;rsquo;s book &lt;a href=&#34;http://www.sagehill.net/book-description.html&#34;&gt;DocBook XSL: The Complete Guide&lt;/a&gt; lives up to its name and is an excellent reference for the tricky parts. You can buy it (Norm is quoted as saying &amp;ldquo;Buy this book&amp;rdquo;) and it&amp;rsquo;s also available online. The &lt;a href=&#34;http://www.sagehill.net/docbookxsl/PrintOutput.html&#34;&gt;Printed Output&lt;/a&gt; section was invaluable for my stylesheet tweaking. It especially helped me with the embedding of fonts in the PDF files, which lulu requires. Once you know how to do this, you&amp;rsquo;re no longer limited to the Courier, Helvetica, and Times fonts that default FOP usage allows.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A &lt;code&gt;-Xmx128m&lt;/code&gt; parameter on the java command line raises the maximum Java heap size to 128 megabytes, which FOP needed to handle the full-sized version of my lulu book.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I didn&amp;rsquo;t get too fancy with the book covers, choosing to pick one of their built-in backgrounds and then playing with the font size and color a little for the cover and spine text. Some of the &lt;a href=&#34;http://www.lulu.com/author/cover_browse.php&#34;&gt;cover image choices&lt;/a&gt; are pretty funny, intended for a mystery novel or something—a photo of a human skull partially obscured by shadow? A shiny knife blade dripping blood on a white rose? You can upload your own cover image if you like.&lt;/p&gt;
&lt;p&gt;So, if you wrote a cookbook, a mystery novel, some other kind of novel, or you just want to create a single copy of &amp;ldquo;The Wit and Wisdom of [name of testimonial dinner subject here],&amp;rdquo; look into lulu.com. If your content is stored in DocBook XML or something that can easily be converted to DocBook, most of the necessary work has already been done for you. If you&amp;rsquo;re interested in the basics of MVS, check out the free Acrobat version or the $9.98 paperback version of &amp;ldquo;Fake Your Way Through MVS,&amp;rdquo; or find out about all the mini and mainframe operating systems from the PDF file or from the $19.98 printed, bound book shown above. All these are available from the &lt;a href=&#34;http://www.snee.com/bob/opsys.html&#34;&gt;book&amp;rsquo;s home page&lt;/a&gt;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/publishing">publishing</category>
      
    </item>
    
    <item>
      <title>Putting semantics on the web</title>
      <link>https://www.bobdc.com/blog/putting-semantics-on-the-web/</link>
      <pubDate>Tue, 31 Jan 2006 09:17:35 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/putting-semantics-on-the-web/</guid>
      
      
      <description><div>Painlessly adding RDF-compatible semantics to XHTML.</div><div>&lt;p&gt;University of Maryland semantic web researcher Jim Hendler (XML 2005 attendees will remember his keynote speech in Atlanta) closed a &lt;a href=&#34;https://www.mindswap.org/blog/2006/01/26/thnking-about-the-semantic-web/&#34;&gt;recent mindswap weblog entry&lt;/a&gt; by writing this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I worry that most of the Semantic Web community is doing work in Semantics, most of the rest are looking at Web apps, and hardly anyone is actually looking at the &amp;ldquo;Semantic Web&amp;rdquo; that I really care about&amp;hellip;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;ve felt for a while that too much semantic web work is about building ontologies and schemas and that not enough is about creating actual instance data to use. Imagine if, in the early days of XML, everyone spent most of their time creating DTDs and assumed that others would create XML documents to conform to their DTDs, while those others were just creating more DTDs themselves.&lt;/p&gt;
&lt;p&gt;More semantic web data would give the people doing semantic web research more opportunities to create interesting applications, and the W3C&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/2001/sw/BestPractices/HTML/&#34;&gt;RDF in XHTML Taskforce&lt;/a&gt;, chaired by Ben Adida, is doing some great work here. While their earlier work focused on convincing the RDF crowd of the value of this RDF-XHTML hybrid, recent drafts of the &lt;a href=&#34;http://www.w3.org/2001/sw/BestPractices/HTML/2006-01-24-rdfa-primer&#34;&gt;RDF/A Primer&lt;/a&gt; take on the bigger task of demonstrating this value to people on the other side of the RDF-XHTML fence. For example, the Primer demonstrates how adding a few new attributes here and there in your existing XHTML lets automated programs pull out your contact information and update the departmental directory or pull picture metadata from your web page about your recent trip and add it to your central database of photo information.&lt;/p&gt;
&lt;p&gt;In a posting today on &lt;a href=&#34;http://www.bnode.org/archives2/53&#34;&gt;Cologne&amp;rsquo;s 2nd Web Montag&lt;/a&gt;, Benjamin Nowack writes that a common question about the semantic web is &amp;ldquo;&amp;lsquo;where is the connection to my HTML pages?&amp;rsquo; (link between the clickable web and the semantic web).&amp;rdquo; I think we have our answer in RDF-XHTML. Airlines and movie theater chains are already creating web versions of the data we want, and a few new attributes in that HTML will be a lot less trouble for them than creating parallel RDF/XML files or most other options for offering machine-readable semantic data on the web. I look forward to the new possibilities that this opens up for both developers and consumers.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.vanschklift.com/blog&#34; title=&#34;http://www.vanschklift.com/blog&#34;&gt;biou&lt;/a&gt; on &lt;a href=&#34;#comment-167&#34;&gt;January 31, 2006 10:28 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;perhaps an alternative can be found in the duo GRDDL / microformats. It is another approach, perhaps less expressive, but with practical implications which make a direct link with the classical web.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-168&#34;&gt;January 31, 2006 11:05 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s definitely good work going on with GRDDL as well, but while it will be possible for different websites to use a shared library of stylesheets, the XSLT stylesheet used for extraction is another moving part for each web site to contend with.&lt;/p&gt;
&lt;p&gt;I think that RDF/A takes a rather microformat approach itself.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/semantic-web">semantic web</category>
      
    </item>
    
    <item>
      <title>Metadata since the nineteenth century</title>
      <link>https://www.bobdc.com/blog/metadata-since-the-nineteenth/</link>
      <pubDate>Fri, 27 Jan 2006 11:29:43 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/metadata-since-the-nineteenth/</guid>
      
      
<description><div>Two good books.</div><div>&lt;p&gt;After following Dave Beckett&amp;rsquo;s &lt;a href=&#34;http://journal.dajobe.org/journal/posts/2006/01/19/reading-list/&#34;&gt;pointer&lt;/a&gt; to Stefano Mazzocchi&amp;rsquo;s essay &lt;a href=&#34;http://www.betaversion.org/~stefano/linotype/news/95/&#34;&gt;On the Quality of Metadata&lt;/a&gt; last week, I remembered that while we have people like Stefano and Bruce D&amp;rsquo;Arcus among us with stronger backgrounds in more classical approaches to &lt;a href=&#34;http://technorati.com/blogs/metadata&#34;&gt;metadata&lt;/a&gt;, most geeks think that technology from ten years ago is ancient history. I&amp;rsquo;d like to recommend two books I&amp;rsquo;ve read recently for the historical background they provide on the creation, organization, and use of metadata to locate information: Peter Morville&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0596007655/bobducharmeA/&#34;&gt;Ambient Findability&lt;/a&gt; and Elaine Svenonius&amp;rsquo;s &lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0262194333/bobducharmeA/&#34;&gt;The Intellectual Foundation of Information Organization&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0596007655/bobducharmeA/&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/0596007655.01._AA240_SCLZZZZZZZ_.jpg&#34; alt=&#34;[Morville book cover]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30x&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Morville&amp;rsquo;s book focuses on &amp;ldquo;&lt;a href=&#34;http://technorati.com/blogs/findability&#34;&gt;findability&lt;/a&gt;&amp;rdquo; as an engineering discipline. When you create something on the web, it&amp;rsquo;s no use to anyone if they can&amp;rsquo;t find it. While there is a seamier side to the search engine optimization efforts of people who see it as a way to get rich quickly (with yet another technology trail blazed by the porn industry), it&amp;rsquo;s a real problem for respectable companies with serious offerings. He writes that&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Hewlett-Packard has taken findability a step further by defining a &amp;ldquo;Findability Group&amp;rdquo; that includes an interdisciplinary team responsible for user interface design, information architecture, and search, thereby creating a vital bridge across vertical silos. Perhaps we will see more findability engineers and findability teams in the coming years.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;He focuses on metadata and classification as ways to improve findability, with an eye on the implications of the new information delivery technologies cropping up around us. While many discussions of metadata that you read briefly mention card catalogs before plowing into talk of RDF and the semantic web, Morville&amp;rsquo;s &lt;a href=&#34;http://technorati.com/blogs/library%20science&#34;&gt;library science&lt;/a&gt; background gives him a broader perspective on the work that&amp;rsquo;s gone on for over a hundred years to create usable metadata. His breathless buzzword slinging (&amp;ldquo;RFID is a disruptive technology poised to shift paradigms&amp;rdquo;&amp;hellip;&amp;ldquo;Millions of bloggers swap memes in exchange for karma, whuffie, and other tokens of a reputation economy&amp;rdquo;) makes the book read like something from Wired magazine and will make it look dated pretty quickly, but his efforts to draw on pioneers of &lt;a href=&#34;http://technorati.com/blogs/information%20science&#34;&gt;information science&lt;/a&gt; to inform our approaches to new classes of content delivery systems make his book worth reading.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.amazon.com/exec/obidos/ISBN=0262194333/bobducharmeA/&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/svenoniuscover.jpg&#34; alt=&#34;[Svenonius book cover]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30x&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;While &amp;ldquo;Ambient Findability&amp;rdquo; may be more of a beach read than the MIT Press book &amp;ldquo;The Intellectual Foundation of Information Organization,&amp;rdquo; I had no problem reading Svenonius&amp;rsquo;s book on the beach at Cape May last summer. While Morville was a student of library and information science, Elaine Svenonius is a professor of library information science at UCLA, and provides a more sober, rigorous treatment of information organization issues in a book that, without backmatter, is roughly the same length as Morville&amp;rsquo;s.&lt;/p&gt;
&lt;p&gt;Svenonius describes practical and theoretical background in the development and use of metadata with a good historical context. The book covers milestones such as Anthony Panizzi&amp;rsquo;s mid-nineteenth century plan for organizing books in the British Library, the beginning of library &amp;ldquo;science&amp;rdquo; in the 1930s (now known as &amp;ldquo;Information Science&amp;rdquo;) at the Chicago Graduate Library School, Cyril Cleverdon&amp;rsquo;s invention of the precision/recall distinction in the late 1950s, and the development of &lt;a href=&#34;http://dublincore.org/&#34;&gt;Dublin Core&lt;/a&gt;. Did you know that Colon Classification (keep your intestinal jokes to yourself and see &lt;a href=&#34;http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Food_and_Beverage/Beverages/Alcohol_and_Spirits/Beer/Organizations/&#34;&gt;Yahoo&lt;/a&gt; for an example, where they use a greater-than character instead of a colon) was invented in 1924? She shows examples of how the past can teach us lessons about dealing with new technology, especially regarding the politics of standardization:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the 1940s and 1950s, the Library of Congress also turned to specialists to draft rules for its growing collection of motion pictures, sound recordings, and pictures. The Library of Congress rules proved difficult to use and, as a result, were rejected by most school and public libraries. This led to a proliferation of locally developed manuals to describe nonbook materials, simultaneously abrogating the standardization principle and that of integration.&lt;/p&gt;
&lt;p&gt;Early in the 1970s reaction set in, and a swing began away from specialization and toward integration. Committees first in Canada and then in England and the United States began to formulate rules for nonbook materials that would be compatible with those used for books.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It&amp;rsquo;s not all libraries and Dewey Decimal Systems; she covers many topics that are important to data geeks such as keyword searching, faceted classification, issues around creating and imposing controlled vocabularies, mapping of different names for the same entity such as &amp;ldquo;Mark Twain&amp;rdquo; and &amp;ldquo;Samuel Clemens,&amp;rdquo; and especially bibliographies. The word &amp;ldquo;bibliography&amp;rdquo; may bring to mind some of the drearier aspects of reading boring books to write boring papers in school, but bibliographies are ultimately about the creation and organization of metadata to make a work easier to find, and a lot of sharp people have thought hard about this for a long time.&lt;/p&gt;
&lt;p&gt;Her afterword shows how well her perspective extends to the future:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Two trends appear to be dominating current research and development. One is the increasing formalization of information organization as an object of study through mathematical and entity-relationship modeling, linguistic conceptualization, definitional analysis of theoretical constructs, and empirical research. The second is the increasing reach of automation to develop new means to achieve the traditional bibliographic objectives, to design intelligent search engines, and to aid in the work of cataloging and classification.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Her book doesn&amp;rsquo;t read like a Wired magazine article, but it&amp;rsquo;s not long, and it provides great background in how we got where we are with metadata, which is important to know if you&amp;rsquo;re interested in where it&amp;rsquo;s going.&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://xml.com/&#34; title=&#34;http://xml.com/&#34;&gt;Kendall Clark&lt;/a&gt; on &lt;a href=&#34;#comment-176&#34;&gt;February 1, 2006 4:45 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;FWIW, I read Elaine S.&amp;rsquo;s great book a few years back, can&amp;rsquo;t remember how I got onto it, except that I had a pretty intense self-study program going for a while (in the 2000-2002 range) on library and information science, and Svenonius&amp;rsquo;s book is &lt;em&gt;so&lt;/em&gt; foundational, I just had to read it. Anyone interested in metadata and information architecture should read it, IMO.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/book-reviews">book reviews</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>All Your Google Base metadata taxonomy are belong to us</title>
      <link>https://www.bobdc.com/blog/all-your-google-base-metadata/</link>
      <pubDate>Tue, 24 Jan 2006 10:17:24 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/all-your-google-base-metadata/</guid>
      
      
      <description><div>Google gives us a taxonomy.</div><div>&lt;p&gt;&lt;a href=&#34;http://en.wikipedia.org/wiki/All_your_base_are_belong_to_us&#34;&gt;&lt;img src=&#34;https://www.bobdc.com/img/main/Aybabtu.png&#34; alt=&#34;[All your base!]&#34; border=&#34;0&#34; align=&#34;right&#34; hspace=&#34;30px&#34; vspace=&#34;30x&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When you upload data to Google Base, you can name your attributes whatever you like, but Google has given you a head start by providing a taxonomy of attribute names for the information that people are likely to upload, such as course schedules, jobs, and housing listings. Don&amp;rsquo;t worry about whether to store your magazine name as PubName or PublicationName; the Google Base page documenting &lt;a href=&#34;http://base.google.com/base/attribute_list.html&#34;&gt;XML attributes&lt;/a&gt; shows that you&amp;rsquo;re best off storing it as a g:publication_name. (Note to XML geeks: they mean &amp;ldquo;attribute&amp;rdquo; in the database sense here, not the XML sense, despite the &amp;ldquo;XML attributes&amp;rdquo; title of the page documenting their taxonomy. See the &lt;a href=&#34;http://openrecord.org/dojo/2006-01-09/data_model_comparison.html&#34;&gt;Data Model Comparison Table&lt;/a&gt; mentioned in my last blog entry to compare Google Base data modeling terms with others.)&lt;/p&gt;
&lt;p&gt;Hardcore markup people know that the &amp;ldquo;g&amp;rdquo; prefix isn&amp;rsquo;t really part of the name. It&amp;rsquo;s standing in for a URI that is the real identifier for a particular collection of names, and if a document declares &amp;ldquo;xxx&amp;rdquo; as the prefix that represents the same URI, then an application should treat xxx:publication_name the same as it would treat g:publication_name from one of Google&amp;rsquo;s sample Google Base documents. (If you&amp;rsquo;re not sure why, see Ron Bourret&amp;rsquo;s &lt;a href=&#34;http://www.rpbourret.com/xml/NamespacesFAQ.htm&#34;&gt;XML Namespace FAQ&lt;/a&gt;, which I try to reread at least once a year.)&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://base.google.com/base/provider_module.html&#34;&gt;Google Base - Provider Namespace&lt;/a&gt; documentation tells us that &amp;ldquo;The &amp;lsquo;g:&amp;rsquo; prefix is reserved for the Google Base XML module and should not be used,&amp;rdquo; which shows that someone got sloppy in coding the Google Base system somewhere. I took one of their sample documents, changed the namespace declaration to &lt;code&gt;xmlns:xxx=&#34;http://base.google.com/cns/1.0&#34;&lt;/code&gt;, changed all the g: prefixes to xxx:, and uploaded the document, and Google Base did the right thing and recognized names from that namespace even with the xxx prefix. It still worries me a bit that it was much easier to find the use of the g prefix at base.google.com than it was to find the URI that it represents, because too many people still think that the prefix name is the namespace name, not a temporary stand-in for the full name to reduce markup bulk.&lt;/p&gt;
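&lt;p&gt;In other words, the two declarations sketched below (using the namespace URI from the experiment above; the element content is elided) are equivalent as far as a namespace-aware processor is concerned, because each prefix is only a local stand-in for the URI:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;!-- both bind their prefix to the same namespace URI, so
     g:publication_name and xxx:publication_name are the same name --&amp;gt;
&amp;lt;entry xmlns:g=&#34;http://base.google.com/cns/1.0&#34;&amp;gt; ... &amp;lt;/entry&amp;gt;
&amp;lt;entry xmlns:xxx=&#34;http://base.google.com/cns/1.0&#34;&amp;gt; ... &amp;lt;/entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;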
&lt;p&gt;There&amp;rsquo;s another important issue about Google Base for people interested in metadata and taxonomies to consider: I mentioned earlier that you don&amp;rsquo;t have to worry about whether to store your magazine name as PubName or PublicationName, because Google has already picked a name. If Google Base gets legs, this taxonomy will get legs. Those of us who think of &lt;a href=&#34;http://dublincore.org/documents/dces/&#34;&gt;Dublin Core&lt;/a&gt; names such as dc:creator as pervasive and well-understood may eventually see more g:author elements than dc:creator elements out there.&lt;/p&gt;
&lt;p&gt;Sam Ruby has &lt;a href=&#34;http://www.intertwingly.net/blog/2005/11/20/Google-Base-Format-Review&#34;&gt;pointed out some sloppiness&lt;/a&gt; in the taxonomy and its documentation, and I&amp;rsquo;ve found a bit myself. For example, the first sample file I clicked on in section 4 of the &lt;a href=&#34;http://base.google.com/base/atom_specs.html&#34;&gt;Atom 0.3 Specification&lt;/a&gt; documentation, &lt;a href=&#34;http://base.google.com/base/news-atom-template.xml&#34;&gt;news-atom-template.xml&lt;/a&gt;, wasn&amp;rsquo;t well-formed—the first tag&amp;rsquo;s second attribute value was missing a closing quote, which is a pretty unprofessional mistake when a major brand name is presenting a model for people to follow.&lt;/p&gt;
&lt;p&gt;Google Base also lets you &lt;a href=&#34;http://base.google.com/support/bin/answer.py?answer=27882&#34;&gt;create your own attributes&lt;/a&gt;, which is nice, and I&amp;rsquo;m sure that their metadata experts will look closely at this folksonomy as it develops. And meanwhile, whether Google Base takes off or not, taxonomy specialists should prepare for the possibility that this &amp;ldquo;Google Core&amp;rdquo; namespace may show up in petabytes of data.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://lylejohnson.name/blog/&#34; title=&#34;http://lylejohnson.name/blog/&#34;&gt;Lyle Johnson&lt;/a&gt; on &lt;a href=&#34;#comment-161&#34;&gt;January 27, 2006 2:39 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Does Google Base even work at this point? I just tried &amp;ldquo;publishing&amp;rdquo; a few items (specifically, events) and even though their status is &amp;ldquo;Published&amp;rdquo;, they don&amp;rsquo;t show up in any searches.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-162&#34;&gt;January 27, 2006 2:58 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When I tried &amp;ldquo;bulk&amp;rdquo; uploading two 2K files, they were in a &amp;ldquo;pending&amp;rdquo; status before they showed up as regular items, but there was an indication of their pending status.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>Meta-metadata</title>
      <link>https://www.bobdc.com/blog/metametadata/</link>
      <pubDate>Fri, 20 Jan 2006 16:22:03 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/metametadata/</guid>
      
      
<description><div>If metadata is data about data, then here&#39;s some handy data about metadata.</div><div>&lt;p&gt;I haven&amp;rsquo;t looked too hard at &lt;a href=&#34;http://dojotoolkit.org/&#34;&gt;dojo&lt;/a&gt;, an open source JavaScript toolkit, but on &lt;a href=&#34;http://www.robotwisdom.com/&#34;&gt;robotwisdom&lt;/a&gt; I found out about the &lt;a href=&#34;http://openrecord.org/dojo/2006-01-09/data_model_comparison.html&#34;&gt;Data Model Comparison Table&lt;/a&gt; on dojo&amp;rsquo;s website. The page&amp;rsquo;s multiple tables compare various dojo data model and metadata concepts with comparable concepts in RDF, XML, SQL, spreadsheets, CSV files, Google Base, Ning, and more. If you don&amp;rsquo;t care about dojo and take its columns out of the table, the comparison of the remaining columns is still very interesting. I&amp;rsquo;ll bet this ends up on some cubicle walls.&lt;/p&gt;
&lt;h2 id=&#34;3-comments&#34;&gt;3 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://times.usefulinc.com/&#34; title=&#34;http://times.usefulinc.com/&#34;&gt;Edd Dumbill&lt;/a&gt; on &lt;a href=&#34;#comment-105&#34;&gt;January 20, 2006 5:37 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Handy! Could do with some updating re: RDF typing.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-106&#34;&gt;January 20, 2006 5:41 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Edd -&lt;/p&gt;
&lt;p&gt;Given the ambition of such a collection of charts, I figured that some of its details would be wrong. If you sent in some suggested edits, they&amp;rsquo;d be crazy not to make them.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://dannyayers.com&#34; title=&#34;http://dannyayers.com&#34;&gt;Danny&lt;/a&gt; on &lt;a href=&#34;#comment-107&#34;&gt;January 20, 2006 6:09 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Great find! &amp;lsquo;Tis a bit feeble on the RDF side, but the rest just happens to be something that&amp;rsquo;ll help with a write-up I&amp;rsquo;m working on - ta Bob.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>Bill Kent</title>
      <link>https://www.bobdc.com/blog/bill-kent/</link>
      <pubDate>Tue, 17 Jan 2006 09:16:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/bill-kent/</guid>
      
      
      <description><div>Bill Kent combined timeless abstract ideas about data modeling with practical advice, and always with a sense of humor.</div><div>&lt;p&gt;&lt;a href=&#34;http://www.authorhouse.com/BookStore/ItemDetail.aspx?bookid=2713&#34;&gt;&lt;img src=&#34;http://www.authorhouse.com/BookStore/Covers/2713.jpg&#34; alt=&#34;[Data and Reality cover]&#34; align=&#34;right&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30x&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bill Kent passed away on December 17th at the age of 69. Six years ago, in a discussion of a generalized definition for the word &amp;ldquo;schema&amp;rdquo; on xsl-list, Ben Pickering &lt;a href=&#34;http://www.biglist.com/cgi-bin/wilma/wilma_hiliter/xsl-list/200007/msg00368.html&#34;&gt;asked&lt;/a&gt; &amp;ldquo;are there any books or online documents dealing with the theory of &amp;lsquo;information structures&amp;rsquo;? Some kind of description of the ways in which information may be structured, and the advantages of doing it a particular way?&amp;rdquo; Michael Kay &lt;a href=&#34;http://www.biglist.com/cgi-bin/wilma/wilma_hiliter/xsl-list/200007/msg00381.html&#34;&gt;replied&lt;/a&gt; &amp;ldquo;Yes! Though most of the ones I know of are written in the &amp;lsquo;database&amp;rsquo; context rather than the &amp;lsquo;document&amp;rsquo; context. Some are very academic / mathematical / philosophical, some more oriented to the practitioner. One of the best in my view, but very hard to get now, is Bill Kent&amp;rsquo;s &amp;lsquo;Data and Reality&amp;rsquo;.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Between the book&amp;rsquo;s excellent title and the source of the recommendation, I had to seek it out, and found a 1979 North Holland printing at &lt;a href=&#34;http://www.powells.com/&#34;&gt;Powells&lt;/a&gt; completely set in a Courier typeface. (It has since become available from &lt;a href=&#34;http://www.authorhouse.com/BookStore/ItemDetail.aspx?bookid=2713&#34;&gt;authorhouse&lt;/a&gt; in both electronic form and as a paperback.) I have e-mailed Kent since then, and was thrilled to receive a reply and to later correspond with him about the possibility of compiling his other writings. I was also very sorry to miss the Extreme XML 2003 conference in Montreal, because he was the keynote speaker there. I&amp;rsquo;m sure he had some insightful things to say about where XML data modeling issues fit into the larger data modeling issues that he had thought so much about over the years.&lt;/p&gt;
&lt;p&gt;In a field where a ten-year-old book can look hopelessly out of date, &amp;ldquo;Data and Reality&amp;rdquo; has plenty of clear, prescient advice for those of us working nearly three decades after it was written. He talks of semiotics, set theory and realistic examples of data modeling problems that you often don&amp;rsquo;t realize are problematic until he explains why. Much of what he writes anticipates fundamental notions of object-oriented development, and upon my first reading I couldn&amp;rsquo;t help but wonder how he reacted to OO ideas when they came along. You don&amp;rsquo;t have to wonder much; he had plenty to say, much of which you can find in the &amp;ldquo;Object orientation&amp;rdquo; section of the essays he&amp;rsquo;s made available in the &lt;a href=&#34;http://www.bkent.net/catalogsource.htm&#34;&gt;Document List&lt;/a&gt; section of his website. He wasn&amp;rsquo;t simply pointing out problems that OO modeling would solve, though—&amp;ldquo;Data and Reality&amp;rdquo; also mentions issues that would point to problems with the OO model.&lt;/p&gt;
&lt;p&gt;One important issue in his writing is object identity. For example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What does &amp;ldquo;catching the same plane every Friday&amp;rdquo; really mean? It may or may not be the same physical airplane. But if a mechanic is scheduled to service the same plane every Friday, it had better be the same physical airplane.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Another issue is object boundaries. How do we, and when should we, represent two concepts separately? Once a concept is represented in software, what is the effect of the passing of time on it?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At what point is it appropriate to introduce a new representative into the system, because change has transformed something into a new and different thing?&lt;/p&gt;
&lt;p&gt;The problem is one of identifying or discovering some essential invariant characteristic of a thing, which gives it its identity. That invariant characteristic is often hard to identify, or may not exist at all.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;An important theme in &amp;ldquo;Data and Reality&amp;rdquo; is that the way we represent something has more to do with how it&amp;rsquo;s used than any intrinsic qualities of the thing:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The category of a thing (i.e., what it is) might be determined by its position, or environment, or use, rather than by its intrinsic form and composition. In the set of plastic letters my son plays with, there is an object which might be an &amp;ldquo;N&amp;rdquo; or a &amp;ldquo;Z&amp;rdquo;, depending on how he holds it. Another one could be a &amp;ldquo;u&amp;rdquo; or an &amp;ldquo;n&amp;rdquo;, and still another might be a &amp;ldquo;b&amp;rdquo;, &amp;ldquo;p&amp;rdquo;, &amp;ldquo;d&amp;rdquo;, or &amp;ldquo;q&amp;rdquo;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As he put it elsewhere, &amp;ldquo;we are not modeling reality, but the way information about reality is processed, by people.&amp;rdquo; Today, whether we&amp;rsquo;re talking about open data shared by anyone who wants it, protected data shared by business partners according to the terms of a strict contract, or any data sharing that falls between these two scenarios, the issues Kent describes are even more important than they were when he wrote these words 29 years ago.&lt;/p&gt;
&lt;p&gt;I drafted that last paragraph before I came across this, the ending of his book:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In an absolute sense, there is no singular objective reality. But we can share a common enough view of it for most of our working purposes, so that reality does appear to be objective and stable.&lt;/p&gt;
&lt;p&gt;But the chances of achieving such a shared view become poorer when we try to encompass broader purposes, and to involve more people. This is precisely why the question is becoming more relevant today: the thrust of technology is to foster interaction among greater numbers of people, and to integrate processes into monoliths serving wider and wider purposes. It is in this environment that discrepancies in fundamental assumptions will become increasingly exposed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;He knew where things were headed, and he had a lot of great advice for people processing information in the seventies, today, and years from now. Take a look at his &lt;a href=&#34;http://www.bkent.net/obituary.htm&#34;&gt;obituary&lt;/a&gt; and &lt;a href=&#34;http://www.bkent.net/&#34;&gt;web site&lt;/a&gt;, particularly his nature photography, &amp;ldquo;Data and Reality&amp;rdquo; excerpts, and other writing since then. He will live on in this work.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/technology-past">technology, past</category>
      
    </item>
    
    <item>
      <title>Selling IBM mainframes in 1979—the colors! The outfits!</title>
      <link>https://www.bobdc.com/blog/selling-ibm-mainframes-in-1979-1/</link>
      <pubDate>Fri, 06 Jan 2006 17:31:04 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/selling-ibm-mainframes-in-1979-1/</guid>
      
      
      <description><div>A machine with half a meg of memory cost $62,500... but it supported multiple concurrent users.</div><div>&lt;p&gt;&lt;a href=&#34;http://blog.raymondfrohlich.com/2006/01/retro-office-pictures-ibm-4331_05.html&#34;&gt;&lt;img src=&#34;http://blog.raymondfrohlich.com/images/post7-ibm4331/4.jpg&#34; alt=&#34;[4331 promo picture]&#34; width=&#34;200px&#34; align=&#34;right&#34; border=&#34;0&#34; hspace=&#34;30px&#34; vspace=&#34;30x&#34;/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Raymond Frohlich, an artist and master&amp;rsquo;s candidate at NYU&amp;rsquo;s Tisch School of the Arts, has posted scanned pictures of a 1979 marketing publication for the IBM 4331, a low-end mainframe of the time (&lt;a href=&#34;http://blog.raymondfrohlich.com/2006/01/retro-office-pictures-ibm-4331.html&#34;&gt;part 1&lt;/a&gt;, &lt;a href=&#34;http://blog.raymondfrohlich.com/2006/01/retro-office-pictures-ibm-4331_05.html&#34;&gt;part 2&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://www.boingboing.net/2006/01/05/scans_from_1979_ibm_.html&#34;&gt;BoingBoing&lt;/a&gt; and others who&amp;rsquo;ve noticed it clearly love the retro hair and outfits of the models as much as the retro price/performance ratio, but I looked a little more into the 4331. A Google search found a &lt;a href=&#34;http://twistypuzzles.com/cgi-bin/puzzle.cgi?pid=973&#34;&gt;4331 schwag Rubik&amp;rsquo;s cube&lt;/a&gt; (did I mention the retro seventies appeal?) and an &lt;a href=&#34;http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP4331.html&#34;&gt;IBM historical archive entry&lt;/a&gt; on the computer. It&amp;rsquo;s too bad the illustration there is in black and white; full-color renditions of the minimalist seventies office artwork on the wall would add much to the picture.&lt;/p&gt;
&lt;p&gt;These IBM archives were a real find for me. They have a lot of fascinating material, such as audio recordings of Thomas Watson Jr. from 1993 and 1938 and one from his father in 1915, and a 1931 recording of employees singing the song &amp;ldquo;Ever Onward IBM&amp;rdquo;, all on the &lt;a href=&#34;http://www-03.ibm.com/ibm/history/multimedia/&#34;&gt;multimedia&lt;/a&gt; page. There&amp;rsquo;s a 100-page FAQ (&lt;a href=&#34;http://www-03.ibm.com/ibm/history/documents/pdf/faq.pdf&#34;&gt;pdf&lt;/a&gt;) with questions like &amp;ldquo;What was the IBM 46?&amp;rdquo; and &amp;ldquo;What was the IBM 47?&amp;rdquo; (Apparently the question &amp;ldquo;What was the IBM 4331?&amp;rdquo; is not frequently asked.) Did you know that before Watson Sr. streamlined the business, they also made clocks, including one that would probably be considered an art deco masterpiece today (&lt;a href=&#34;http://www-03.ibm.com/ibm/history/exhibits/cc/pdf/cc_2407M351.pdf&#34;&gt;pdf&lt;/a&gt;)? I especially love the &lt;a href=&#34;http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_album.html&#34;&gt;Mainframes photo album&lt;/a&gt; and the color-coding of some of the big systems—would you prefer your mainframe in &lt;a href=&#34;http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_2423PH3031.html&#34;&gt;yellow&lt;/a&gt;, &lt;a href=&#34;http://www-03.ibm.com/ibm/history/exhibits/3033/3033_241902.html&#34;&gt;red&lt;/a&gt;, or &lt;a href=&#34;http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_2423PH3084.html&#34;&gt;blue&lt;/a&gt;?&lt;/p&gt;
&lt;p&gt;To set a period mood while looking at Frohlich&amp;rsquo;s pictures, I suggest that you put on Armed Forces, Elvis Costello&amp;rsquo;s 1979 album, and remember that two albums earlier he was recording &amp;ldquo;Watching the Detectives&amp;rdquo; and &amp;ldquo;Alison&amp;rdquo; at night and doing mainframe data entry by day.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By Len Fischer on &lt;a href=&#34;#comment-217&#34;&gt;February 14, 2006 3:06 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t know if this was added after your blog entry but there is a What was the IBM 4331 entry on page 57 of the IBM history faq.pdf.&lt;/p&gt;
&lt;p&gt;By Merideth Carleton on &lt;a href=&#34;#comment-392&#34;&gt;April 9, 2006 12:03 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Have you seen this before? It&amp;rsquo;s a number guessing game: &lt;a href=&#34;http://www.amblesideprimary.com/ambleweb/mentalmaths/guessthenumber.html.&#34;&gt;http://www.amblesideprimary.com/ambleweb/mentalmaths/guessthenumber.html.&lt;/a&gt; I guessed 34355, and it got it right! Pretty neat.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/technology-past">technology, past</category>
      
    </item>
    
    <item>
      <title>After Web 2.0? Web 2.0 2.0</title>
      <link>https://www.bobdc.com/blog/after-web-20-web-20-20/</link>
      <pubDate>Tue, 03 Jan 2006 14:17:39 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/after-web-20-web-20-20/</guid>
      
      
      <description><div>The next step for the web.</div><div>&lt;p&gt;The original &lt;a href=&#34;http://www.oreillynet.com/lpt/a/6228&#34;&gt;Web 2.0&lt;/a&gt; conference was held in October of 2004, and now that we&amp;rsquo;ve reached the second half of the decade, Web 2.0 already seems so&amp;hellip; turn-of-the-century. A group of bold visionaries is already creating a whole new Web 2.0 that I like to call Web 2.0 2.0.&lt;/p&gt;
&lt;p&gt;Like anyone hyping a hot new technology trend, I won&amp;rsquo;t define what I&amp;rsquo;m talking about, but instead show an arbitrary yet evocative list:&lt;/p&gt;
&lt;table&gt;
&lt;tr id=&#34;i12&#34; class=&#34;odd&#34;&gt;&lt;th&gt;&lt;b&gt;Web 2.0&lt;/b&gt;&lt;/th&gt;&lt;th &gt;&lt;b&gt;Web 2.0 2.0&lt;/b&gt;&lt;/th&gt;&lt;/tr&gt;
&lt;tr id=&#34;i13&#34; class=&#34;even&#34;&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://en.wikipedia.org/&#34;&gt;Wikipedia&lt;/a&gt;&lt;/td&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://uncyclopedia.org/wiki/Main_Page&#34;&gt;Uncyclopedia&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;i14&#34; class=&#34;odd&#34;&gt;&lt;td &gt;overuse of trendy music terms (remix, mashup)&lt;/td&gt;&lt;td &gt;overuse of French cinematic terms (mise-en-sc&amp;#232;ne, montage, v&amp;#233;rit&amp;#233;)&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;i15&#34; class=&#34;even&#34;&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://www.cnn.com/2004/TECH/ptech/01/09/bus2.feat.geek.camp/&#34;&gt;foo&lt;/a&gt; camp, &lt;a href=&#34;https://web.archive.org/web/20180524011022/http://barcamp.org/&#34;&gt;bar&lt;/a&gt; camp, &lt;a href=&#34;https://web.archive.org/web/20180524011022/http://longtailcamp.org/&#34;&gt;long tail&lt;/a&gt; camp&lt;/td&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://pages.zoom.co.uk/leveridge/sontag.html&#34;&gt;on&lt;/a&gt; camp&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;i16&#34; class=&#34;odd&#34;&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://iclock.org/ &#34;&gt;iClock&lt;/a&gt;&lt;/td&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://www.humanclock.com/&#34;&gt;human clock&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;i17&#34; class=&#34;even&#34;&gt;&lt;td &gt;Google&lt;/td&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://www.gizoogle.com/&#34;&gt;Gizoogle&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;i18&#34; class=&#34;odd&#34;&gt;&lt;td &gt;platforms as frameworks&lt;/td&gt;&lt;td &gt;frameworks as platforms&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;i19&#34; class=&#34;even&#34;&gt;&lt;td &gt;real weblogs by unknown people&lt;/td&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://stuckinrehabwithpatobrien.blogspot.com/#111144582372661444&#34;&gt;fake weblogs about famous people&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;i20&#34; class=&#34;odd&#34;&gt;&lt;td &gt;craigslist&lt;/td&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://www.kasperhauser.com/khmc/&#34;&gt;khraigslist &lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;i21&#34; class=&#34;even&#34;&gt;&lt;td &gt;RSS feeds of content other than blogs and news articles&lt;/td&gt;&lt;td &gt;Atom 1.0 feeds of content other than blogs and news articles&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;i22&#34; class=&#34;odd&#34;&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://labs.google.com/&#34;&gt;hot new Google apps&lt;/a&gt;&lt;/td&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://www.technorati.com/search/%22google+rumor%22&#34;&gt;rumors&lt;/a&gt; about Google apps that don&#39;t exist yet&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;i23&#34; class=&#34;even&#34;&gt;&lt;td &gt;folksonomies&lt;/td&gt;&lt;td &gt;Fahrvergn&amp;#252;gen&lt;/td&gt;&lt;/tr&gt; 
&lt;tr id=&#34;i24&#34; class=&#34;odd&#34;&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://www.flickr.com/&#34;&gt;flickr&lt;/a&gt;, &lt;a href=&#34;https://web.archive.org/web/20180524011022/http://del.icio.us/&#34;&gt;del.icio.us&lt;/a&gt;&lt;/td&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://www.tagtagger.com/&#34;&gt;tagtagger&lt;/a&gt;, &lt;a href=&#34;https://web.archive.org/web/20180524011022/http://supr.c.ilio.us/&#34;&gt;supr.c.ilio.us&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr id=&#34;i25&#34; class=&#34;even&#34;&gt;&lt;td &gt;Creative Commons Licensing&lt;/td&gt;&lt;td &gt;&lt;a href=&#34;https://web.archive.org/web/20180524011022/http://supr.c.ilio.us/un/hu-lb-dr/01/&#34;&gt;uncreative uncommons licensing&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;These last two are instructive. While del.icio.us lets people create tags about web sites and flickr lets people create tags about pictures, the mise-en-scène of supr.c.ilio.us takes it to the next level, letting people create tags about social bookmarking sites. Tagging about tagging! Very Web 2.0 2.0. And, they&amp;rsquo;ve even come up with a new licensing scheme for it: &lt;a href=&#34;http://supr.c.ilio.us/un/hu-lb-dr/01/&#34;&gt;uncreative uncommons&lt;/a&gt;. Free as in &amp;ldquo;association!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;If Web 2.0 was both &lt;a href=&#34;http://www.paulgraham.com/web20.html&#34;&gt;meaningless and meaningful&lt;/a&gt;, Web 2.0 2.0 is even more meaningless—and even more meaningful. Web 2.0 2.0 practitioners aren&amp;rsquo;t tied to the old, 2004 models of community building, perpetual betas, and XMLHttpRequest calls. They&amp;rsquo;re thinking big by thinking small, by thinking big, and we&amp;rsquo;re seeing a qualitative difference in the role of the web in our lives. It&amp;rsquo;s no longer about &amp;ldquo;us,&amp;rdquo; but about &amp;ldquo;we,&amp;rdquo; and you can&amp;rsquo;t spell &amp;ldquo;web&amp;rdquo; without &amp;ldquo;we.&amp;rdquo; (You also can&amp;rsquo;t spell &amp;ldquo;asshole&amp;rdquo; without &amp;ldquo;&lt;a href=&#34;http://www.lightreading.com/document.asp?doc_id=83507&#34;&gt;aol&lt;/a&gt;,&amp;rdquo; but that&amp;rsquo;s a separate topic.)&lt;/p&gt;
&lt;p&gt;Web 2.0 2.0 technologies like &lt;a href=&#34;http://www.technicalpursuit.com/ajax_indepth.htm%20&#34;&gt;HWAJAX&lt;/a&gt; (Hand Waving + Asynchronous JavaScript + XML) are transforming the value exchange paradigm as we know it. While Web 2.0 (or, as the French call it, &amp;ldquo;&lt;a href=&#34;http://fr.wikipedia.org/wiki/Web_2.0&#34;&gt;web deux point eaux&lt;/a&gt;&amp;rdquo;) was about two-way communication between providers and consumers, Web 2.0 2.0 is about three-way communication, between providers, consumers, and self-styled pundits like myself, all centered around &amp;ldquo;value&amp;rdquo; as a &amp;ldquo;meme.&amp;rdquo; A diagram will make it easier to understand:&lt;/p&gt;
&lt;img id=&#34;i10&#34; src=&#34;https://www.bobdc.com/img/main/web2020.png&#34; alt=&#34;Web 2.0 2.0&#34;/&gt;
&lt;p&gt;If you haven&amp;rsquo;t seen any examples of what I&amp;rsquo;m talking about, then I guess you just don&amp;rsquo;t read the right blogs, do you? Web 2.0 2.0 is truly the fulfillment of Tim Berners Lee&amp;rsquo;s &lt;a href=&#34;http://www.w3.org/People/Berners-Lee/Kids&#34;&gt;original vision&lt;/a&gt; of the World Wide Web: &amp;ldquo;Often it was just easier to go and ask people when they were having coffee.&amp;rdquo; Or, put another way, the Web is the network is the computer.&lt;/p&gt;
&lt;p&gt;William Gibson, who is some kind of prophet or something, once said &amp;ldquo;The future is here. It&amp;rsquo;s just not evenly distributed yet.&amp;rdquo; This is apparently a very cool thing to quote in vague, utopian blog postings, probably because the word &amp;ldquo;distributed&amp;rdquo; alludes to distributed computing, which is much hipper than centralized systems. Arthur C. Clarke wrote that &amp;ldquo;Any sufficiently advanced technology is indistinguishable from magic,&amp;rdquo; which proves my point just about as well. The use of at least one of these quotes is mandatory in any essay that&amp;rsquo;s going for a technology visionary effect, and I have both of them, so I have a particularly clear grasp of where the web is going.&lt;/p&gt;
&lt;p&gt;But in the words of filmmaker and Web 2.0 2.0 patron saint Marty DiBergi, enough of my yakkin&amp;rsquo; : get out there and start creating Web 2.0 2.0 applications! There&amp;rsquo;s money to be made—and this time we&amp;rsquo;ll do it right, without creating the mess that &lt;a href=&#34;http://bubble20.blogspot.com/index.html&#34;&gt;Bubble 2.0&lt;/a&gt; became.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://supr.c.ilio.us/un/hu-lb-dr/01/&#34;&gt;&lt;img src=&#34;http://supr.c.ilio.us/un/i/un-denied.png&#34; alt=&#34;most rights denied&#34; border=&#34;off&#34;/&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/web%202.0%202.0&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/web%202.0%202.0&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/web%202.0&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/web%202.0&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://supr.c.ilio.us/blog/&#34; title=&#34;http://supr.c.ilio.us/blog/&#34;&gt;Eran&lt;/a&gt; on &lt;a href=&#34;#comment-62&#34;&gt;January 3, 2006 4:17 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hey Bob,&lt;/p&gt;
&lt;p&gt;Good article. I really think this web 2.0 2.0 thing will catch on. Not sure about that whole french thing though but that&amp;rsquo;s mostly because i&amp;rsquo;ve no clue what all them words mean.&lt;/p&gt;
&lt;p&gt;PS. I think you meant Uncreative _Un_commons.&lt;/p&gt;
&lt;p&gt;Eran.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-63&#34;&gt;January 3, 2006 4:36 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks Eran, I fixed it.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2006">2006</category>
      
      <category domain="https://www.bobdc.com//categories/technology-future">technology, future</category>
      
    </item>
    
    <item>
      <title>Technorati tags as metadata: making them more meta</title>
      <link>https://www.bobdc.com/blog/technorati-tags-as-metadata-ma/</link>
      <pubDate>Tue, 27 Dec 2005 15:21:52 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/technorati-tags-as-metadata-ma/</guid>
      
      
      <description><div>Is it really metadata when you have to announce &#34;here are my posting&#39;s Technorati tags&#34;?</div><div>&lt;p&gt;More bloggers are embedding &lt;a href=&#34;http://www.technorati.com/help/tags.html&#34;&gt;Technorati Tags&lt;/a&gt; into their postings, and it&amp;rsquo;s great to see a major league app use user-created metadata for anything. The popular convention of announcing what your Technorati tags are as part of your blog&amp;rsquo;s content, though, made me wonder: can we add these tags where the casual reader doesn&amp;rsquo;t see them, so that they really are metadata instead of being additional tagged content? For example, if I write a posting about RDF and XMP without actually using the word &amp;ldquo;metadata&amp;rdquo;, how can I get a Technorati tag search on the word metadata to find that entry?&lt;/p&gt;
&lt;p&gt;A View Source on my &lt;a href=&#34;https://www.bobdc.com/blog/a-news-reader-wish-granted&#34;&gt;last posting here&lt;/a&gt; will reveal some experiments I did with Technorati tags that had no content between the &lt;code&gt;a&lt;/code&gt; elements&amp;rsquo; start- and end-tags. (Experiments &lt;a href=&#34;http://www.snee.com/sneetard/2005/12/another_technorati_tagging_tes_1.html&#34;&gt;elsewhere&lt;/a&gt; showed that a single-tag empty &lt;code&gt;a&lt;/code&gt; element got treated as a start-tag with no end, so that the text after it was underlined as if it were a link anchor.) Of the following three &lt;code&gt;a&lt;/code&gt; elements, the first resulted in an entry at &lt;a href=&#34;http://technorati.com/tag/bobtest4&#34;&gt;http://technorati.com/tag/bobtest4&lt;/a&gt; and the third in an entry at &lt;a href=&#34;http://www.technorati.com/blogs/bobtest6&#34;&gt;http://www.technorati.com/blogs/bobtest6&lt;/a&gt; (along with the comment &amp;ldquo;Hey Bob! Is your blog about bobtest6?&amp;rdquo; on the right):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;a href=&amp;quot;http://technorati.com/tag/bobtest4&amp;quot; rel=&amp;quot;tag&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
&amp;lt;a href=&amp;quot;http://technorati.com/tag/bobtest5&amp;quot; rel=&amp;quot;directory&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
&amp;lt;a href=&amp;quot;http://technorati.com/tag/bobtest6&amp;quot; rel=&amp;quot;tag directory&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Considering that a &lt;code&gt;rel&lt;/code&gt; attribute is &lt;a href=&#34;http://www.w3.org/TR/xhtml-modularization/abstraction.html#dt_LinkTypes&#34;&gt;supposed to&lt;/a&gt; hold &amp;ldquo;a space-separated list of link types,&amp;rdquo; and that &amp;ldquo;White space characters are not permitted within link types,&amp;rdquo; I would think that the second line above would put a bobtest5 entry at &lt;a href=&#34;http://www.technorati.com/blogs/bobtest5&#34;&gt;http://www.technorati.com/blogs/bobtest5&lt;/a&gt; and that the third would put entries at both &lt;a href=&#34;http://www.technorati.com/blogs/bobtest6&#34;&gt;http://www.technorati.com/blogs/bobtest6&lt;/a&gt; and &lt;a href=&#34;http://www.technorati.com/tag/bobtest6&#34;&gt;http://www.technorati.com/tag/bobtest6&lt;/a&gt;, but it looks like Technorati&amp;rsquo;s crawler treats &amp;ldquo;tag directory&amp;rdquo; as a single &lt;code&gt;rel&lt;/code&gt; entry and ignores the one with a value of &amp;ldquo;directory&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;I could be wrong; this is just guesswork based on experiments. (I&amp;rsquo;ve searched for further background from Technorati on how this really works but had no luck. Although I signed up for their developer&amp;rsquo;s program and sent various e-mails, I still have no idea how to subscribe to their mailing list.)&lt;/p&gt;
&lt;p&gt;A nice bonus is that the bobtest4 example sort of works from an &lt;a href=&#34;http://www.w3.org/2001/sw/BestPractices/HTML/2005-rdfa-syntax&#34;&gt;RDF/A&lt;/a&gt; perspective, too, although a predicate of &amp;ldquo;tag&amp;rdquo; is awfully vague. Even a name like &amp;ldquo;ttag&amp;rdquo; would give a better clue as to what kind of tag we&amp;rsquo;re talking about: a Technorati one.&lt;/p&gt;
&lt;p&gt;In this and future postings, I&amp;rsquo;m grouping such metadata inside of a &lt;code&gt;div&lt;/code&gt; element with a &lt;code&gt;class&lt;/code&gt; value of &amp;ldquo;technoratiTags&amp;rdquo;. Does anyone have ideas for a better way to incorporate empty &lt;code&gt;a&lt;/code&gt; elements with Technorati metadata into XHTML, particularly if it can take advantage of other metadata standards work?&lt;/p&gt;
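&lt;p&gt;As a rough sketch of what I mean (the tag names here are just placeholders, not the actual tags on this entry), such a grouping looks something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;div class=&amp;quot;technoratiTags&amp;quot;&amp;gt;
  &amp;lt;a href=&amp;quot;http://technorati.com/tag/metadata&amp;quot; rel=&amp;quot;tag&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
  &amp;lt;a href=&amp;quot;http://technorati.com/tag/RDF&amp;quot; rel=&amp;quot;tag&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the &lt;code&gt;a&lt;/code&gt; elements are empty, a browser renders nothing, but a crawler that reads &lt;code&gt;rel=&amp;quot;tag&amp;quot;&lt;/code&gt; still sees both tags, and the wrapping &lt;code&gt;div&lt;/code&gt; gives a stylesheet or script one place to find them.&lt;/p&gt;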
&lt;p&gt;(More writing on the use of Technorati tags: &lt;a href=&#34;http://www.oreillynet.com/pub/wlg/6247&#34;&gt;Big-time app uses the a/@rel attribute, boosting &amp;ldquo;folksonomy&amp;rdquo; development&lt;/a&gt; and &lt;a href=&#34;http://www.oreillynet.com/pub/wlg/6516&#34;&gt;Folksonomy tags for indirect linking&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://technorati.com/blogs/technorati%20tags&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/technorati%20tags&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/metadata&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/metadata&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;4-comments&#34;&gt;4 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://sevenroot.org/dlc/&#34; title=&#34;http://sevenroot.org/dlc/&#34;&gt;Darren Chamberlain&lt;/a&gt; on &lt;a href=&#34;#comment-51&#34;&gt;December 27, 2005 4:32 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Doesn&amp;rsquo;t Technorati also use atom:category or dc:subject elements for determining a story&amp;rsquo;s tags? I think this is what you really want, rather than putting empty anchor tags on your pages:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If your blog software supports categories and RSS/Atom feeds (like Movable Type, WordPress, TypePad, Blogware, Radio), just use the included category system and make sure you are publishing RSS/Atom feeds and your categories will be read as tag.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Also, and this is the best part, in my opinion:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You do not have to link to Technorati. You can link to any web page that ends in a tag - even your own site!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So a regular link to Wikipedia, or your own categories, or whatever, can have rel=&amp;ldquo;tag&amp;rdquo; on it, so you don&amp;rsquo;t need to artificially link.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-52&#34;&gt;December 27, 2005 6:00 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;MovableType puts a block of RDF metadata in the entry, but comments it out. I do see dc:subject=&amp;ldquo;metadata&amp;rdquo; in there for this entry, but nothing showing up at &lt;a href=&#34;http://www.technorati.com/blogs/metadata?sort=recent&#34;&gt;http://www.technorati.com/blogs/metadata?sort=recent&lt;/a&gt;, which is where I&amp;rsquo;d expect to find it, so I don&amp;rsquo;t think that Technorati is parsing anything inside the comments. I do see something at &lt;a href=&#34;http://technorati.com/tag/metadata&#34;&gt;http://technorati.com/tag/metadata&lt;/a&gt;, but I think that comes from the kinds of tags I describe in the posting.&lt;/p&gt;
&lt;p&gt;I did see the part about any URL working as long as you had rel=&amp;ldquo;tag&amp;rdquo; in their documentation, but if that worked then my zamfir10 test at &lt;a href=&#34;http://www.snee.com/sneetard/2005/12/december_13_test_2.html&#34;&gt;http://www.snee.com/sneetard/2005/12/december_13_test_2.html&lt;/a&gt; should have worked. The use of atom:category should work, but MovableType only lets me pick one category and (I think) converts that to atom:category in the Atom feed, and I&amp;rsquo;d like to assign multiple bits of metadata to a single posting.&lt;/p&gt;
&lt;p&gt;thanks,&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://epeus.blogspot.com&#34; title=&#34;http://epeus.blogspot.com&#34;&gt;Kevin Marks&lt;/a&gt; on &lt;a href=&#34;#comment-61&#34;&gt;January 3, 2006 3:38 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob, have a closer read of the &lt;a href=&#34;http://microformats.org/wiki/rel-directory&#34;&gt;http://microformats.org/wiki/rel-directory&lt;/a&gt; spec page.&lt;br /&gt;
Using rel=&amp;ldquo;directory&amp;rdquo; asserts that the linked page is a directory listing containing an entry for the current page.&lt;br /&gt;
However, the Technorati blog directory is a tagged one, and so you should assert that it is a tag directory for your page, if you want us to include it.&lt;br /&gt;
We do combine the meanings - rel=&amp;ldquo;tag&amp;rdquo; says &amp;lsquo;this is a tag&amp;rsquo;; rel=&amp;ldquo;directory&amp;rdquo; says this is a directory for the page, i.e. the blog as a whole (indicated as the homepage). What we assume from a bare rel=&amp;ldquo;tag&amp;rdquo; is that the context is the blog post. The scope of rel=&amp;ldquo;tag&amp;rdquo; is kept unspecified deliberately, as it is used in multiple contexts (see xFolk and hReview, for example, as well as blog posts). Adding rel=&amp;ldquo;directory&amp;rdquo; resolves this ambiguity.&lt;/p&gt;
&lt;p&gt;Hope that helps.&lt;/p&gt;
&lt;p&gt;PS Putting empty anchor tags is bad practice - if you care enough about the tagspace to use it you should link it, and your links may confuse your readers (especially those using screen-reading software) and search engines that are not aware of these microformats.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-64&#34;&gt;January 3, 2006 4:46 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks. I must admit that my example of tagging a post that doesn&amp;rsquo;t use the word &amp;ldquo;metadata&amp;rdquo; with the keyword &amp;ldquo;metadata&amp;rdquo; is an edge case. Tagging existing content words as keywords shouldn&amp;rsquo;t be that challenging, and I&amp;rsquo;ll try to get into that habit.&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2005">2005</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>Semantic web apparently moving along</title>
      <link>https://www.bobdc.com/blog/semantic-web-apparently-moving/</link>
      <pubDate>Thu, 22 Dec 2005 11:12:55 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/semantic-web-apparently-moving/</guid>
      
      
      <description><div>There are various ways to measure it, but Danny Ayers shows that where it counts, it&#39;s making progress.</div><div>&lt;p&gt;A few months ago, Danny Ayers said that he was looking for work. Now, &lt;a href=&#34;http://dannyayers.com/archives/2005/12/22/work/&#34;&gt;he writes&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I seem to have a fairly constant flow of work offers, the majority being frustratingly interesting Semantic Web-related contracts. So if anyone’s actually looking for this kind of thing let me know and I can pass names along.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You can count up your sourceforge projects, or the amount of &lt;a href=&#34;http://www.rdfdata.org&#34;&gt;RDF data&lt;/a&gt; or school projects or &lt;a href=&#34;http://www.technorati.com/search/%22semantic+web%22&#34;&gt;Technorati mentions&lt;/a&gt; or whatever you like, but if someone who specializes in semantic web work is finding too much of it out there, that&amp;rsquo;s a real indicator of progress.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://technorati.com/blogs/%22semantic+web%22&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/%22semantic+web%22&#34;&gt;&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2005">2005</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>A news reader wish, granted</title>
      <link>https://www.bobdc.com/blog/a-news-reader-wish-granted/</link>
      <pubDate>Sat, 17 Dec 2005 11:33:31 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/a-news-reader-wish-granted/</guid>
      
      
      <description><div>It turns out that Bloglines lets you choose, by feed, whether you want the complete content, the summary, or just the title displayed.</div><div>&lt;p&gt;&lt;em&gt;June 2018 update: when Bloglines went down, I used Google Reader for a few years. When that went away, I used FeedReader for a few years, but having been recently a bit frustrated with their interface, I now use &lt;a href=&#34;http://www.inoreader.com&#34;&gt;inoreader&lt;/a&gt;. Long live RSS!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I &lt;a href=&#34;https://www.bobdc.com/blog/short-descriptions-or-full-ent-1&#34;&gt;recently wished&lt;/a&gt; that news readers would let you pick whether you want to see the complete entries or just the summaries in a feed that provides both. I just found out that my newsreader of choice, &lt;a href=&#34;http://www.bloglines.com/&#34;&gt;Bloglines&lt;/a&gt;, does this and more: you can pick whether to see the complete entry, the summary, or just the title, and you can make different choices for different feeds. For Atom 1.0 feeds, it uses the &lt;code&gt;summary&lt;/code&gt; element for the summary and the &lt;code&gt;content&lt;/code&gt; element for the complete entry, as it should.&lt;/p&gt;
&lt;p&gt;The ability to view just the title is an extra bonus, because it lets you scan a lot of entries even more quickly, assuming that the titles are written by professionals who are interested in describing the entries well, which is true for serious news sites like the Guardian or the New York Times. If the title only piques your interest a little, so that you&amp;rsquo;d like to see the summary without opening up the complete story in a new tab or window, you can just click the plus sign to see the summary:&lt;/p&gt;
&lt;img id=&#34;i3&#34; src=&#34;https://www.bobdc.com/img/main/bloglinessum.jpg&#34; alt=&#34;bloglines screenshot&#34;/&gt;
&lt;p&gt;It&amp;rsquo;s yet another reason to use Bloglines, and yet another reason to use Atom 1.0.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://technorati.com/tag/bobtest4&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/tag/bobtest5&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/tag/bobtest6&#34;&gt;&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2005">2005</category>
      
      <category domain="https://www.bobdc.com//categories/blogging-about-blogging">blogging about blogging</category>
      
    </item>
    
    <item>
      <title>Scripting the addition of XML files to the eXist XQuery database</title>
      <link>https://www.bobdc.com/blog/scripting-the-addition-of-xml/</link>
      <pubDate>Mon, 12 Dec 2005 17:23:48 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/scripting-the-addition-of-xml/</guid>
      
      
      <description><div>This wasn&#39;t documented very well, so once I got it to work I thought I&#39;d post it.</div><div>&lt;p&gt;Saxon is great for getting to know XQuery syntax (see &lt;a href=&#34;http://www.xml.com/pub/a/2005/03/02/xquery.html&#34;&gt;part one&lt;/a&gt; and &lt;a href=&#34;http://www.xml.com/pub/a/2005/03/23/xquery-2.html&#34;&gt;part two&lt;/a&gt; of my &amp;ldquo;Getting Started with XQuery&amp;rdquo; articles in XML.com for more on this), but it reads all of the data to query into memory, and much of the point of XQuery is to work with large, indexed, disk-based collections of XML that won&amp;rsquo;t fit into memory. I&amp;rsquo;ve started playing with the open-source &lt;a href=&#34;http://exist.sourceforge.net/&#34;&gt;eXist&lt;/a&gt; XML database for this.&lt;/p&gt;
&lt;p&gt;After starting up the eXist server, you can start up the interactive client and load files from there, but if the client has any problems loading the files, it doesn&amp;rsquo;t show any error messages that I could find—all I knew was that the file I tried to load wasn&amp;rsquo;t showing up in the client&amp;rsquo;s list of loaded files. If you want to load a lot of files, you don&amp;rsquo;t want an interactive client, anyway; you want to create a script that does it for you. Apparently, the documentation and sample perl/python/java scripts that come with eXist are a bit behind the development of the system itself, so they don&amp;rsquo;t always work. I finally found a simple way to load files using an eXist extension to XQuery, demonstrated by the code below.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(: Load the files temp2a.xml, temp2b.xml, temp2c.xml
   from c:\temp into the eXist database. :)

xquery version &amp;quot;1.0&amp;quot;;

declare namespace xmldb=&amp;quot;http://exist-db.org/xquery/xmldb&amp;quot;;

&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;
{
  (: We&#39;ll load each file into the coll1 collection as the administrator. :)
  let $collection := xmldb:collection(&amp;quot;xmldb:exist:///db/coll1&amp;quot;, &amp;quot;admin&amp;quot;, &amp;quot;&amp;quot;)
  for $dataFilename in (&amp;quot;temp2a&amp;quot;,&amp;quot;temp2b&amp;quot;,&amp;quot;temp2c&amp;quot;)
      let $name := $dataFilename
      let $URI := xs:anyURI(concat(&amp;quot;file:///c:/temp/&amp;quot;,$name,&amp;quot;.xml&amp;quot;))
      let $retCode := xmldb:store($collection, $name, $URI)
      return &amp;lt;p&amp;gt;{$retCode}&amp;lt;/p&amp;gt;
 }
&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With eXist stored in c:\bin\eXist on a Windows machine and its server up and running, storing the XQuery script above as C:\bin\eXist\webapp\xquery\loadfiles.xq and then sending a browser to http://localhost:8080/exist/xquery/loadfiles.xq ran the query, loaded the files, and displayed the return codes in the browser.&lt;/p&gt;
&lt;p&gt;After getting this to work with simple dummy files, I found what was wrong with the file I was originally having problems with: &amp;ldquo;The document is too complex/irregularily structured to be mapped into eXist&amp;rsquo;s numbering scheme.&amp;rdquo; As a dayjob-related file, I can&amp;rsquo;t describe it in much detail, but this reaction to it didn&amp;rsquo;t surprise me. Still, I have plenty of ideas for eXist apps to build around less complex XML.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://technorati.com/blogs/exist&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/XML&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/XQuery&#34;&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.ruminate.co.uk&#34; title=&#34;http://www.ruminate.co.uk&#34;&gt;Jim Fuller&lt;/a&gt; on &lt;a href=&#34;#comment-30&#34;&gt;December 14, 2005 8:56 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I have been using eXist in anger for the past year&amp;hellip; a few bits of advice:&lt;/p&gt;
&lt;p&gt;* You must use the latest build&amp;hellip; too much volatility with older snapshots.&lt;/p&gt;
&lt;p&gt;* I would suggest using the very useful Ant tasks for doing mundane stuff like loading and exporting data to the database.&lt;/p&gt;
&lt;p&gt;* As for complex XML type errors&amp;hellip; I haven&amp;rsquo;t encountered this! The usual issues with getting an XML document into eXist are: not well-formed, DTD not registered in /exist/WEB-INF/catalog, or super large docs (break them up into collections/docs if possible).&lt;/p&gt;
&lt;p&gt;* The REST interface is fine with the current perl scripts&amp;hellip; I use them constantly.&lt;/p&gt;
&lt;p&gt;* Authentication and doc ownership is a little hit and miss with eXist at the moment across the various interfaces (WebDAV, servlet, REST, XML-RPC).&lt;/p&gt;
&lt;p&gt;gl, Jim Fuller&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2005">2005</category>
      
      <category domain="https://www.bobdc.com//categories/xml">XML</category>
      
    </item>
    
    <item>
      <title>Using (or not using) Adobe&#39;s XMP metadata format</title>
      <link>https://www.bobdc.com/blog/using-or-not-using-adobes-xmp/</link>
      <pubDate>Fri, 09 Dec 2005 18:21:29 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/using-or-not-using-adobes-xmp/</guid>
      
      
      <description><div>If Adobe is really interested in promoting XMP properly, they could learn a lot about developer relations from Yahoo and Amazon.</div><div>&lt;p&gt;Adobe is pushing for &lt;a href=&#34;http://www.adobe.com/products/xmp/main.html&#34;&gt;XMP&lt;/a&gt; to become the metadata format for the OASIS &lt;a href=&#34;http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office&#34;&gt;OpenDocument&lt;/a&gt; format, and Leigh Dodds just posted some &lt;a href=&#34;http://www.ldodds.com/blog/archives/000261.html&#34;&gt;notes on his review of XMP&lt;/a&gt;. He covers the XMP-RDF relationship issues better than I could. He also refers to my &lt;a href=&#34;http://www.xml.com/pub/a/2004/09/22/xmp.html&#34;&gt;article&lt;/a&gt; on XMP published in XML.com over a year ago, in which I wrote this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;For now, the use of XMP means either depending on commercial vendor tools or being comfortable with C++ so that you can use Adobe&amp;rsquo;s SDK, but this is changing. Activity in the XMP User-to-User forum shows that open source Java tools are on the way, which will make it much easier to incorporate the use of XMP into production workflows—for example, to extract the metadata from a batch of images and then load that data into a database.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I&amp;rsquo;d like to revise that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The use of XMP means either depending on commercial vendor tools or being comfortable with C++ so that you can use Adobe&amp;rsquo;s SDK.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As Leigh wrote, &amp;ldquo;The ability to embed metadata in arbitrary binary document formats is a huge benefit&amp;rdquo; of XMP. I&amp;rsquo;ve always thought that XMP had great potential to be a bridge between the often overly academic world of RDF and the world of commercial media production. It&amp;rsquo;s too bad that while Adobe is very interested in corporate tool support of XMP, they&amp;rsquo;re not interested in building the kind of grass-roots tool support that would make XMP much more useful and therefore popular.&lt;/p&gt;
&lt;p&gt;When Adobe representative Alan Lillich recently &lt;a href=&#34;http://lists.oasis-open.org/archives/office/200512/msg00009.html&#34;&gt;made the case&lt;/a&gt; to the OpenDocument TC for XMP to become the metadata format, he included a section &amp;ldquo;About the Adobe XMP SDK&amp;rdquo; that neglected to mention that it&amp;rsquo;s limited to use by C++ programmers. When he writes that the &amp;ldquo;revamped API&amp;hellip; [is] much easier to use&amp;rdquo; I assume that he means that some class libraries have been refactored and the documentation has been improved. (I just downloaded the current SDK again to check, and it&amp;rsquo;s mostly cpp, hpp, and html files.)&lt;/p&gt;
&lt;p&gt;If Adobe looked at the effort that Amazon and Yahoo have put into encouraging application development by part-time programmers using popular scripting languages, they might get some ideas about spreading the popularity of the XMP specification. If Adobe had a wrapper for a popular scripting language around the C++ based binaries two years ago (note that XMP is &lt;a href=&#34;http://www.adobe.com/aboutadobe/pressroom/pressreleases/200109/20010924xmp.html&#34;&gt;over four years&lt;/a&gt; old), by now flickr users would be clamoring for flickr to pull XMP metadata from their pictures and post it on each picture&amp;rsquo;s page, as many flickr users did for EXIF metadata. I know that if Adobe offered a python or perl binding, I would have written some apps to insert and pull XMP metadata by now. I did some C++ coding in school, but setting up a C++ environment (free or otherwise) and getting up to speed with it would take more time than I have available. It would take less time, if Adobe offered a Ruby or PHP binding to the API, for me to finally settle down and learn either of those languages.&lt;/p&gt;
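&lt;p&gt;To show how low the bar is: below is a minimal, hypothetical Python sketch (my own toy, nothing from Adobe&amp;rsquo;s SDK) that pulls an XMP packet out of a file&amp;rsquo;s bytes by scanning for the xpacket processing instructions that delimit it.&lt;/p&gt;

```python
# Toy XMP extractor: an XMP packet is delimited by "xpacket" processing
# instructions embedded in the host file's bytes, so a brute-force scan
# finds it without any format-specific parsing. (A real tool would honor
# each file format's wrapper rules; this only shows how small the job is.)
def extract_xmp(data: bytes):
    start_marker = b"\x3c?xpacket begin"   # "\x3c" is the left angle bracket
    end_marker = b"\x3c?xpacket end"
    start = data.find(start_marker)
    if start == -1:
        return None
    end = data.find(end_marker, start)
    if end == -1:
        return None
    end = data.find(b"?>", end)            # include the closing "?>"
    if end == -1:
        return None
    return data[start:end + 2]
```

&lt;p&gt;A scripting-language binding from Adobe could make the insertion side just as approachable.&lt;/p&gt;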
&lt;p&gt;I&amp;rsquo;d like to see the OpenDocument TC push for a stronger commitment from Adobe to support grass-roots XMP app development before the TC adds metadata structures to the OpenDocument formats that are, for now, mostly geared to use by tools from Adobe. (I&amp;rsquo;ve seen lists from Adobe of companies that claim some support for XMP, but a URL like &lt;a href=&#34;https://www.kodak.com&#34;&gt;www.kodak.com&lt;/a&gt; doesn&amp;rsquo;t tell me much about what that company&amp;rsquo;s support is.) Without easier ways to use XMP metadata, I don&amp;rsquo;t see much payoff to the TC for making it part of the OpenDocument format.&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://technorati.com/blogs/Adobe&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/XMP&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/RDF&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/metadata&#34;&gt;&lt;/a&gt; &lt;a href=&#34;http://technorati.com/blogs/OpenDocument&#34;&gt;&lt;/a&gt;&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2005">2005</category>
      
      <category domain="https://www.bobdc.com//categories/metadata">metadata</category>
      
    </item>
    
    <item>
      <title>Short descriptions or full entries in the feed: your choice</title>
      <link>https://www.bobdc.com/blog/short-descriptions-or-full-ent-1/</link>
      <pubDate>Fri, 09 Dec 2005 12:42:00 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/short-descriptions-or-full-ent-1/</guid>
      
      
      <description><div>If this is too terse for you, you now have an alternative.</div><div>&lt;p&gt;The debate over whether feeds should provide complete entries or brief descriptions of them is getting religious in its fervor. People on both sides are so damn sure that they&amp;rsquo;re right.&lt;/p&gt;
&lt;p&gt;Personally, I prefer short descriptions. When using Bloglines to catch up on feeds while eating lunch, I like to scan as many entries as possible, and paging down and paging down through someone&amp;rsquo;s C# code in the Planet XML feed really slows this process down. Still, I can see reasons to want the entire entry in the feed—it saves you some clicks and provides more possibilities for new apps that build on that data. I also hated the dependence of earlier feed formats on CDATA sections to make this possible, so the option to put a &lt;code&gt;type=&amp;quot;xhtml&amp;quot;&lt;/code&gt; attribute setting on the &lt;code&gt;content&lt;/code&gt; element is probably my single favorite thing about Atom 1.0.&lt;/p&gt;
&lt;p&gt;It would be nice if all Atom feeds had both the summary and the content, and news readers let you pick which one you want displayed. Meanwhile, I decided to provide two feeds for people to choose from. I was surprised at how few weblogs my web searches turned up that offer this choice. And meanwhile, Norm Walsh and Sean McGrath&amp;rsquo;s weblog feeds will continue to provide my model for pithy entry descriptions.&lt;/p&gt;
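&lt;p&gt;For reference, here&amp;rsquo;s a trimmed sketch of an Atom 1.0 entry that carries both (element content invented for illustration). The &lt;code&gt;type=&amp;quot;xhtml&amp;quot;&lt;/code&gt; content wraps real markup in an XHTML &lt;code&gt;div&lt;/code&gt;, so no CDATA section is needed:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;entry&amp;gt;
  &amp;lt;title&amp;gt;Short descriptions or full entries in the feed&amp;lt;/title&amp;gt;
  &amp;lt;summary type=&amp;quot;text&amp;quot;&amp;gt;If this is too terse for you, you now have an alternative.&amp;lt;/summary&amp;gt;
  &amp;lt;content type=&amp;quot;xhtml&amp;quot;&amp;gt;
    &amp;lt;div xmlns=&amp;quot;http://www.w3.org/1999/xhtml&amp;quot;&amp;gt;
      &amp;lt;p&amp;gt;The debate over whether feeds should provide complete
      entries or brief descriptions of them is getting religious
      in its fervor.&amp;lt;/p&amp;gt;
    &amp;lt;/div&amp;gt;
  &amp;lt;/content&amp;gt;
&amp;lt;/entry&amp;gt;
&lt;/code&gt;&lt;/pre&gt;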
&lt;h2 id=&#34;1-comments&#34;&gt;1 Comments&lt;/h2&gt;
&lt;p&gt;By Martin on &lt;a href=&#34;#comment-28&#34;&gt;December 11, 2005 5:18 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks!&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2005">2005</category>
      
      <category domain="https://www.bobdc.com//categories/blogging-about-blogging">blogging about blogging</category>
      
    </item>
    
    <item>
      <title>25 years of database history (starting in 1955)</title>
      <link>https://www.bobdc.com/blog/25-years-of-database-history-s-1/</link>
      <pubDate>Tue, 06 Dec 2005 17:19:01 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/25-years-of-database-history-s-1/</guid>
      
      
      <description><div>A 1981 article in IBM&#39;s Journal of Research and Development gave me a much better perspective on how database systems got where they are.</div><div>&lt;p&gt;A 1981 article in IBM&amp;rsquo;s Journal of Research and Development gave me a much better perspective on how database systems got where they are. The abstract of W.C. McGee&amp;rsquo;s article &lt;a href=&#34;http://domino.research.ibm.com/tchjr/journalindex.nsf/0/18c2b2dadee8a44985256bfa0067f4d8?OpenDocument&#34;&gt;Data Base Technology&lt;/a&gt; tells us that &amp;ldquo;The evolution of data base technology over the past twenty-five years is surveyed, and major IBM contributions to this technology are identified and briefly noted.&amp;rdquo; It put a lot of disjointed facts that I knew in perspective, showing how one thing led to another. All italicizing in indented block quotations below is his.&lt;/p&gt;
&lt;p&gt;Around 1964, the term &amp;ldquo;data base&amp;rdquo; was &amp;ldquo;coined by workers in military information systems to denote collections of data shared by end-users of time sharing computer systems.&amp;rdquo; In earlier days, each application had its own &amp;ldquo;master files&amp;rdquo; of data, so the concept of a data collection that could be shared by multiple applications was a new idea in efficiency.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The data structure classes of early systems were derived from punched card technology, and thus tended to be quite simple. A typical class was composed of &lt;em&gt;files&lt;/em&gt; of &lt;em&gt;records&lt;/em&gt; of a single type, with the record type being defined by an ordered set of fixed-length &lt;em&gt;fields&lt;/em&gt;. Because of their regularity, such files are now referred to as &lt;em&gt;flat files&lt;/em&gt;&amp;hellip; Files were typically implemented on sequential storage media, such as magnetic tape.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Representation of one-to-many relationships was an early challenge.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The processing required to reflect such associations was not unlike punched card processing, involving many separate sorting and merging steps.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When punched cards were the only practical form of memory, the kinds of RAM-based interim data structures such as arrays and lists that we now create on the way to a final result all had to be done as separate piles of cards. While magnetic tape was an obvious step forward, separate runs for each sort operation and extraction must have still been pretty tedious.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Early structuring methods had the additional problem of being hardware-oriented. As a result, the languages used to operate on structures were similarly oriented.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;He goes on to describe the evolution of the key &amp;ldquo;data structure classes&amp;rdquo; of databases: hierarchic (what we now call &amp;ldquo;hierarchical&amp;rdquo;), network, relational, and semantic. If you picture a database using each of these models as a collection of tables (or flat files), the great advantage of the relational model was the ability to create run-time connections between the tables—a JOIN. For hierarchic and network databases, the keys that represented links between tables had to be specified when you defined the tables. The advantage of the network model over the earlier hierarchic model was that the pattern of permanent joins did not need to fit into a tree structure. While I once worked at a &lt;a href=&#34;http://www.informationbuilders.com/&#34;&gt;company that made a multi-platform hierarchic database&lt;/a&gt;, I never realized that hierarchic databases were around before computers had hard disks, when everything was done using tapes and punch cards. IBM began designing its IMS hierarchic database in 1966 for the Apollo space program, and it&amp;rsquo;s &lt;a href=&#34;http://www-306.ibm.com/software/data/ims/&#34;&gt;still around today&lt;/a&gt;.&lt;/p&gt;
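&lt;p&gt;That run-time join is easy to show with a modern example; here&amp;rsquo;s a minimal sketch (table and column names invented) using SQLite from Python:&lt;/p&gt;

```python
# Minimal sketch of the relational model's big win: the emp/dept
# association is asserted at query time with a JOIN, not wired into
# the storage structure the way links were in hierarchic and network
# databases.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp(id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO emp VALUES (10, 'Codd', 1), (11, 'Bachman', 2);
""")
# The run-time connection between the two tables is made right here:
rows = con.execute("""
    SELECT emp.name, dept.name FROM emp
    JOIN dept ON emp.dept_id = dept.id
    ORDER BY emp.id
""").fetchall()
print(rows)
```

&lt;p&gt;In a hierarchic or network database, that emp-to-dept path would have had to be declared when the tables were defined.&lt;/p&gt;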
&lt;p&gt;Hierarchic databases were bad at storing many-to-many relationships, and in the mid-sixties the network model was developed. The use of the first commercial hard disks in computers enabled this more flexible access to data. While I&amp;rsquo;ve heard of the IDMS product and GE&amp;rsquo;s IDS that McGee mentions, I&amp;rsquo;ve never heard of &amp;ldquo;the TOTAL DBMS of CINCOM, perhaps the most widely used DBMS in the world today.&amp;rdquo;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the mid-1960s, a number of investigators began to grow dissatisfied with the hardware orientation of then extant data structuring methods, and in particular with the manner in which pointers and similar devices for implementing entity associations were being exposed to the users.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Mathematical approaches that applied set theory to data management used tables to represent sets of entities with attributes.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The key new concepts in the entity set method were the simplicity of the structures it provided and the use of entity identifiers (rather than pointers or hardware-dictated structures)&amp;hellip; In the late 1960s, [IBM&amp;rsquo;s] E.F. Codd noted that an entity set could be viewed as a mathematical relation on a set of domains D&lt;sub&gt;1&lt;/sub&gt;, D&lt;sub&gt;2&lt;/sub&gt;, &amp;hellip;, D&lt;sub&gt;&lt;em&gt;n&lt;/em&gt;&lt;/sub&gt;, where each domain corresponds to a different property of the entity set.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This led to the relational database model.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Aside from the mathematical relation parallel, Codd&amp;rsquo;s major contribution to data structures was the introduction of the notions of &lt;em&gt;normalization&lt;/em&gt; and &lt;em&gt;normal forms&lt;/em&gt;&amp;hellip; To avoid update anomalies, Codd recommended that all information be represented in third normal form. While this conclusion may seem obvious today [1980!], it should be remembered that at the time the recommendation was made, the relationship between data structures and information was not well understood. Codd&amp;rsquo;s work in effect paved the way for much of the work done on information modeling in the past ten years.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It also paved the way for a 1977 startup in Redwood Shores, California, called Software Development Laboratories, that could completely commit to the relational model, unlike IBM, who had many big customers using IMS and IDMS on IBM mainframes. When writing his paper three years later, McGee saw no reason to mention this little company, which would go on to become Oracle Corporation and play an obviously huge role in the use of relational databases.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Codd characterized his methodology as a &lt;em&gt;data model&lt;/em&gt; and thereby provided a concise term for an important but previously unarticulated data base concept, namely, the &lt;em&gt;combination&lt;/em&gt; of a class of data structures and the operations allowed on the structures of the class&amp;hellip; The term &amp;ldquo;model&amp;rdquo; has been applied retroactively to early data structuring methods, so that, for example, we now speak of &amp;ldquo;hierarchic models&amp;rdquo; and &amp;ldquo;network models&amp;rdquo; as well as the relational model.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Many today consider the main choices of data models to be relational databases versus object-oriented models, with relational models having the performance edge because of the regular structure of the data. It&amp;rsquo;s ironic to read McGee describe the performance problems of early relational databases; then, as now, higher levels of abstraction required more cycles—I guess those &amp;ldquo;hardware-dictated structures&amp;rdquo; had a payoff after all!&lt;/p&gt;
&lt;p&gt;Speaking of object-oriented databases, note that McGee&amp;rsquo;s snapshot of the state of the art in 1980 names &amp;ldquo;semantic data structures&amp;rdquo; as the next step after relational databases. He describes Peter Chen&amp;rsquo;s Entity Relationship Model as an example of a semantic model. Academic papers and database (or rather, &amp;ldquo;data base&amp;rdquo;) textbooks of the time are full of talk of the value of this next higher level of abstraction. Some could argue that the object-oriented approach was either competition to or an outgrowth of this work; I don&amp;rsquo;t have the background to make a case for either side. For a bit more irony, it&amp;rsquo;s kind of funny in this day of &amp;ldquo;semantic web&amp;rdquo; advocacy to read the big promises made in the name of &amp;ldquo;semantic data base systems&amp;rdquo; back then.&lt;/p&gt;
&lt;p&gt;McGee&amp;rsquo;s paper covers the development of other important DBMS concepts, often at IBM, such as the concept of the transaction (1976), views and authorization (1975), and report generators. This last development is interesting enough that I&amp;rsquo;ll cover it in a separate essay.&lt;/p&gt;
&lt;p&gt;At the end, after an introduction to the basic problem of distributed databases, McGee&amp;rsquo;s conclusion tells us:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The solution of these problems promises to make the next twenty-five years of database technology as eventful and stimulating as the past twenty-five years have been.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I wonder if he considers new database developments from 1980 to 2005 to have been as stimulating as those of the preceding twenty-five years. I&amp;rsquo;d be surprised if he did. While the role of computers in our lives has obviously leapt ahead in that period, the progress in database technology, outside of performance issues and progress on distributed databases, can&amp;rsquo;t compare to all the developments of those first twenty-five years. Advances that led to applications like Google are part of full-text search and information retrieval, a separate field with its own history going back to the early nineteen-sixties. I&amp;rsquo;ll write about that when I finish &lt;a href=&#34;http://www.amazon.com/gp/product/0262025388/&#34;&gt;something else that I&amp;rsquo;m reading&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.gentoo.ro&#34; title=&#34;http://www.gentoo.ro&#34;&gt;mudrii&lt;/a&gt; on &lt;a href=&#34;#comment-25&#34;&gt;December 7, 2005 12:12 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thank you for the article; it was very interesting to read and look into the past.&lt;br /&gt;
Thanks&lt;br /&gt;
Regards&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.SQLSummit.com&#34; title=&#34;http://www.SQLSummit.com&#34;&gt;Ken North&lt;/a&gt; on &lt;a href=&#34;#comment-1109&#34;&gt;August 3, 2007 12:58 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;W.C. McGee&amp;rsquo;s &lt;em&gt;IBM Systems Journal&lt;/em&gt; article provides an interesting history. There is another publication that provides quite a bit of detail. In March 1976, &lt;em&gt;ACM Computing Surveys&lt;/em&gt; published a special issue about &amp;ldquo;Data-Base Management Systems&amp;rdquo;. Several authors contributed six articles about the evolution of databases, about relational, CODASYL and hierarchical databases, and a comparison between relational and CODASYL (network) database technology. Don Chamberlin of IBM, co-inventor of SQL and XQuery, was the author of the article about relational DBMSs. The March 1976 articles are in the ACM digital archive at:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://portal.acm.org/ft_gateway.cfm?id=984386&amp;amp;type=pdf&#34;&gt;http://portal.acm.org/ft_gateway.cfm?id=984386&amp;amp;type=pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You mentioned &amp;ldquo;Mathematical approaches that applied set theory to data management&amp;rdquo; before going into an explanation of Codd&amp;rsquo;s relational theory. This presentation about technology trends includes information about computing history, including the origins of database and the relational model:&lt;br /&gt;
&lt;a href=&#34;http://www.webservicessummit.com/Trends/TechTrends1/ComputingTrends_part1.html&#34;&gt;Software and Database Technology Trends (slide presentation)&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
You&amp;rsquo;ll find it acknowledges the contribution of David L. Childs. Codd&amp;rsquo;s seminal paper about the relational model followed a Childs paper about set-theoretic data structures. In fact, Codd cited the Childs paper in his own paper.&lt;/p&gt;
&lt;p&gt;We discussed this history in comp.database.theory in 2004. That thread noted Childs&amp;rsquo; work wasn&amp;rsquo;t widely published at the time because it was government-funded research with a restricted audience.&lt;/p&gt;
&lt;p&gt;The relational model evolved over time. For example, by 1990, Chris Date was describing the relational model in three parts (&amp;ldquo;Introduction to Database Systems&amp;rdquo;):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;relational data structure&lt;/li&gt;
&lt;li&gt;relational integrity rules&lt;/li&gt;
&lt;li&gt;relational algebra&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Childs&amp;rsquo; 1968 papers and Codd&amp;rsquo;s 1970 paper discussed structure (independent sets, no fixed structure, access by name instead of by pointers) and operations (union, restriction, etc.). Childs&amp;rsquo; papers included benchmark times for doing set operations on an IBM 7090. Codd&amp;rsquo;s 1970 paper introduced normal forms, and his subsequent papers introduced the integrity rules.&lt;/p&gt;
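&lt;p&gt;(To make those operations concrete: if a relation is treated as a plain set of tuples, union and restriction are only a few lines in any language with set types. A toy Python sketch, with made-up sample data, not from either paper:)&lt;/p&gt;

```python
# A relation modeled as a plain set of tuples, in the spirit of the
# set-theoretic structures and operations in the Childs and Codd papers.

def union(r1, r2):
    """Union of two relations over the same attributes."""
    return r1 | r2

def restriction(r, predicate):
    """Restriction (selection): keep only the tuples satisfying a predicate."""
    return {t for t in r if predicate(t)}

# Attributes: (name, instrument) -- made-up sample data.
players = {("Robert Johnson", "guitar"), ("Little Richard", "piano")}
singers = {("Little Richard", "piano"), ("Bessie Smith", "voice")}

everyone = union(players, singers)
guitarists = restriction(everyone, lambda t: t[1] == "guitar")
print(sorted(guitarists))  # -> [('Robert Johnson', 'guitar')]
```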
&lt;p&gt;What&amp;rsquo;s interesting is the University of Michigan connection. Codd, Bing Yao, and Michael Stonebraker were graduates. Some of the work done at the University of Michigan during that time (Childs&amp;rsquo; STDS, Ash and Sibley&amp;rsquo;s TRAMP relational memory) was for the CONCOMP project. It was funded by the US government and the research was available only to &amp;ldquo;qualified requesters&amp;rdquo;.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2005">2005</category>
      
      <category domain="https://www.bobdc.com//categories/technology-past">technology, past</category>
      
    </item>
    
    <item>
      <title>&#34;Turing&#39;s Cathedral&#34; and XSLT</title>
      <link>https://www.bobdc.com/blog/turings-cathedral-and-xslt/</link>
      <pubDate>Fri, 02 Dec 2005 14:55:19 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/turings-cathedral-and-xslt/</guid>
      
      
      <description><div>George Dyson may not know anything about XSLT, but his recent essay about Google, John von Neumann, and biological computation reminded me of the two leading approaches to XSLT development.</div><div>&lt;p&gt;In George Dyson&amp;rsquo;s recent Third Culture essay &lt;a href=&#34;http://www.edge.org/3rd_culture/dyson05/dyson05_index.html&#34;&gt;Turing&amp;rsquo;s Cathedral&lt;/a&gt;, one theme is the value of a shift in programming models toward something closer to biological &amp;ldquo;computation,&amp;rdquo; and Google&amp;rsquo;s potential role in this. The general idea is that instead of writing instructions to act on data at specific locations in memory, which is how computers have worked since John von Neumann first set up the concept of the stored program computer, code would be written to act on certain data when it comes along. Apparently, von Neumann himself was getting interested in the biological model before he died of cancer. Here&amp;rsquo;s a quote from Dyson&amp;rsquo;s essay:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Once a system of template-based-addressing is in place, the door is opened to code that can interact directly with other code, free at last from a rigid bureaucracy requiring that every bit be assigned an exact address. You can (and a few people already are) write instructions that say &amp;ldquo;Do THIS with THAT&amp;rdquo;&amp;ndash;without having to specify exactly Where or When. This revolution will start with simple, basic coded objects, on the level of nucleotides heading out on their own and bringing amino acids back to a collective nest. It is 1945 all over again.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It&amp;rsquo;s easy to say &amp;ldquo;this isn&amp;rsquo;t new, event-driven/OO models, etc. etc.&amp;rdquo; What struck me, especially because Dyson&amp;rsquo;s essay uses the word &amp;ldquo;template&amp;rdquo; repeatedly, is how easily the two approaches he describes line up with the push and pull models of XSLT stylesheet development. Or conversely, what struck me is how well the push/pull distinction fit into the grand themes of a ponderous edge.org essay by a big name in the history of intellectual ideas. It makes the essay a fun read for XSLT geeks.&lt;/p&gt;
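&lt;p&gt;For readers who don&amp;rsquo;t know XSLT&amp;rsquo;s two styles, here&amp;rsquo;s a minimal sketch (with hypothetical doc, title, and para element names): a &amp;ldquo;push&amp;rdquo; rule acts on whatever nodes come its way, while a &amp;ldquo;pull&amp;rdquo; rule specifies exactly where to get its data.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;xsl:stylesheet version=&amp;quot;1.0&amp;quot;
                xmlns:xsl=&amp;quot;http://www.w3.org/1999/XSL/Transform&amp;quot;&amp;gt;

  &amp;lt;!-- Push: a rule that fires for any para element the processor
       hands it, wherever that element comes from. --&amp;gt;
  &amp;lt;xsl:template match=&amp;quot;para&amp;quot;&amp;gt;
    &amp;lt;p&amp;gt;&amp;lt;xsl:apply-templates/&amp;gt;&amp;lt;/p&amp;gt;
  &amp;lt;/xsl:template&amp;gt;

  &amp;lt;!-- Pull: a rule that reaches into the source tree at known
       addresses and asks for exactly the nodes it wants. --&amp;gt;
  &amp;lt;xsl:template match=&amp;quot;/&amp;quot;&amp;gt;
    &amp;lt;h1&amp;gt;&amp;lt;xsl:value-of select=&amp;quot;/doc/title&amp;quot;/&amp;gt;&amp;lt;/h1&amp;gt;
    &amp;lt;xsl:apply-templates select=&amp;quot;/doc/para&amp;quot;/&amp;gt;
  &amp;lt;/xsl:template&amp;gt;

&amp;lt;/xsl:stylesheet&amp;gt;
&lt;/code&gt;&lt;/pre&gt;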
&lt;h2 id=&#34;2-comments&#34;&gt;2 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://scottysengineeringlog.net&#34; title=&#34;http://scottysengineeringlog.net&#34;&gt;Scott Hudson&lt;/a&gt; on &lt;a href=&#34;#comment-13&#34;&gt;December 2, 2005 3:25 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Interesting! Since RDF is very much based on specific addressing, and Web 2.0 for that matter, what implications does the essay have? Seems to me that Topic Maps would be better suited under this scenario, since you are not required to have a specific address for topics&amp;hellip;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-14&#34;&gt;December 2, 2005 3:50 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Scott,&lt;/p&gt;
&lt;p&gt;RDF &lt;em&gt;data&lt;/em&gt; is (often) based on specific addressing, but much of the semantic web gospel is about building useful apps around potentially incomplete data&amp;ndash;i.e. whatever you can find, or, in terms of Dyson&amp;rsquo;s essay, whatever comes your way. It&amp;rsquo;s actually one of the things that made Web 1.0 so successful: apps that worked with a data set that didn&amp;rsquo;t necessarily have any normalization or referential integrity.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2005">2005</category>
      
      <category domain="https://www.bobdc.com//categories/xslt">XSLT</category>
      
    </item>
    
    <item>
      <title>No plans in place to upgrade Xalan Java to XSLT 2.0</title>
      <link>https://www.bobdc.com/blog/no-plans-in-place-to-upgrade-x/</link>
      <pubDate>Thu, 01 Dec 2005 19:00:08 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/no-plans-in-place-to-upgrade-x/</guid>
      
      
      <description><div>From one of the horse&#39;s mouths.</div><div>&lt;p&gt;On the Xalan Java user mailing list, Henry Zongaro of the IBM Xalan development team recently &lt;a href=&#34;http://mail-archives.apache.org/mod_mbox/xml-xalan-j-users/200511.mbox/%3cOF3E3464AA.2B1FCCC8-ON852570C9.004FA9AB-852570C9.0051ECD3@ca.ibm.com%3e&#34;&gt;replied&lt;/a&gt; to my request about plans for XSLT 2.0 support in Xalan Java: &amp;ldquo;the Xalan [Project Management Committee] hasn&amp;rsquo;t put in place any plans for XSLT 2.0 support.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;When I asked on &lt;a href=&#34;http://www.biglist.com/lists/xsl-list/archives/200511/msg00624.html&#34;&gt;xsl-list&lt;/a&gt; if anyone knew of current or forthcoming XSLT 2.0 implementations besides Saxon, the general response was that there is piecemeal support developing in various other XSLT processors, so this shouldn&amp;rsquo;t hold up XSLT 2.0&amp;rsquo;s progress toward becoming a W3C Recommendation. Still, I&amp;rsquo;d consider the three leading XSLT engines to be Saxon, Xalan, and libxslt, and if two of those three have no current plans to move beyond XSLT 1.0 then I have to worry about XSLT 2.0&amp;rsquo;s future.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;2005-12-28 update:&lt;/em&gt; I just found out that Xalan 2.2 is bundled with the Java 1.4 JRE, making Xalan pretty ubiquitous. This also explains why, after putting Xalan 2.7 jar files on an Ubuntu Linux machine at home and a Windows machine at work after a hard disk replacement, I couldn&amp;rsquo;t get either to work. The Sagehill documentation for using DocBook + XSLT (which is great all-around) has a &lt;a href=&#34;http://www.sagehill.net/docbookxsl/InstallingAProcessor.html#EndorsedXalan&#34;&gt;nice workaround&lt;/a&gt; for this.&lt;/p&gt;
&lt;h2 id=&#34;7-comments&#34;&gt;7 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://logopoeia.com/&#34; title=&#34;http://logopoeia.com/&#34;&gt;Michael(tm) Smith&lt;/a&gt; on &lt;a href=&#34;#comment-12&#34;&gt;December 2, 2005 12:47 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I have to worry also. In particular, I think that if the W3C has any concern at all about acceptance of XSLT 2.0 in the open-source community, it should not become a rec unless/until there is an implementation of it based on libxslt/libxml2 (if necessary, forked off from the current libxslt codebase, since the principal developer of libxslt has made it clear that he is personally not interested in implementing support for XSLT 2.0). Or if not libxslt, then in some other non-Java open-source library.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://netapps.muohio.edu/blogs/darcusb/darcusb/&#34; title=&#34;http://netapps.muohio.edu/blogs/darcusb/darcusb/&#34;&gt;Bruce D&amp;rsquo;Arcus&lt;/a&gt; on &lt;a href=&#34;#comment-15&#34;&gt;December 2, 2005 4:58 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I recall a personal communication with David Tolpin, who argued that XSLT 2.0 was likely to be just a single implementation language: Saxon.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s unfortunate we&amp;rsquo;re not seeing evidence yet that he&amp;rsquo;s wrong; I find a lot of the XSLT 2.0 features really useful.&lt;/p&gt;
&lt;p&gt;I probably need to look into reimplementing my citepoc, my XSLT 2.0-based citation processor in some other language (or combination of them; maybe Ruby or Python + XSLT 1.0?).&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://copia.ogbuji.net&#34; title=&#34;http://copia.ogbuji.net&#34;&gt;Uche&lt;/a&gt; on &lt;a href=&#34;#comment-16&#34;&gt;December 3, 2005 11:58 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Wow. I&amp;rsquo;m very skeptical of XSLT 2.0, but I&amp;rsquo;m surprised to hear that it&amp;rsquo;s becoming a Saxon-only thing. I know that Daniel V is even more skeptical than I am, so I don&amp;rsquo;t expect there will ever be a libxslt implementation. However, Michael, I think it&amp;rsquo;s grossly overstated to suggest that XSLT 2 should not proceed to REC without a libxslt impl. C&amp;rsquo;mon!&lt;/p&gt;
&lt;p&gt;It underscores the fact that we need to keep EXSLT going, with features borrowed from XSLT 2.0 but made available in 1.0.&lt;/p&gt;
&lt;p&gt;And Bruce, oh yeah. Python+XSLT 1.0 rocks the park. It&amp;rsquo;s the reason I don&amp;rsquo;t even have to care about what happens to XSLT 2. See:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://copia.ogbuji.net/blog/2005-10-20/Processing&#34;&gt;http://copia.ogbuji.net/blog/2005-10-20/Processing&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://w3future.com/weblog/&#34; title=&#34;http://w3future.com/weblog/&#34;&gt;Sjoerd Visscher&lt;/a&gt; on &lt;a href=&#34;#comment-21&#34;&gt;December 6, 2005 5:07 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think MSXML is a leading XSLT engine as well. And Microsoft doesn&amp;rsquo;t have plans to support XSLT 2.0 either.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-23&#34;&gt;December 6, 2005 7:45 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Yeah, and while they&amp;rsquo;ve made noises about XQuery support, they&amp;rsquo;re so big on this XLinq thing (Google: &amp;ldquo;did you mean XLink?&amp;rdquo;) that I have to wonder about that as well.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://weblog.philringnalda.com/&#34; title=&#34;http://weblog.philringnalda.com/&#34;&gt;Phil Ringnalda&lt;/a&gt; on &lt;a href=&#34;#comment-53&#34;&gt;December 28, 2005 7:29 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The unclosed &amp;ldquo;update&amp;rdquo; paragraph that&amp;rsquo;s currently blowing up your Atom feed is why I&amp;rsquo;ve never felt very good about including inline XHTML in a feed without also serving it as application/xhtml+xml, to see when it breaks (or, of course, having an XML toolchain involved that wouldn&amp;rsquo;t let you save anything that&amp;rsquo;s not well-formed).&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-54&#34;&gt;December 28, 2005 9:12 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;1. Good point.&lt;/p&gt;
&lt;p&gt;2. Oops, sorry. I write new entries in Emacs with nxml to make sure that it&amp;rsquo;s valid XHTML, and got sloppy in throwing that single paragraph into a MoveableType data entry form, and now I know what goes wrong if I&amp;rsquo;m not careful.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2005">2005</category>
      
      <category domain="https://www.bobdc.com//categories/xslt">XSLT</category>
      
    </item>
    
    <item>
      <title>Getting started</title>
      <link>https://www.bobdc.com/blog/getting-started/</link>
      <pubDate>Wed, 30 Nov 2005 18:31:12 -0500</pubDate>
      
      <guid>https://www.bobdc.com/blog/getting-started/</guid>
      
      
      <description><div>I&#39;ve decided to start a more general-purpose weblog.</div><div>&lt;p&gt;I&amp;rsquo;ve decided, with some encouragement last summer from Eve Maler, to start a general-purpose weblog. I&amp;rsquo;ve done a lot of writing on technical topics in various media over the years, but postponed doing this until I felt more of a sense of purpose about it.&lt;/p&gt;
&lt;p&gt;My use of blogs so far has been more experimental:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://www.oreillynet.com/pub/au/1191&#34;&gt;Thinking About Linking&lt;/a&gt; on the O&amp;rsquo;Reilly Network focused on linking-related issues. I began it in April of 2003 when a now-defunct mailing list on hypertext fizzled, thinking that I would write my ideas about linking and then whoever wanted to read them could when I wrote them or at some point in the future. My plan was for the list of postings to become a resource for myself and other people doing research on the topic, and that hopefully I&amp;rsquo;d get some feedback on those ideas. I also wanted to see if greater focus in a weblog made it more valuable than one about just anything.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;http://www.snee.com/sneetard/&#34;&gt;sneetard&lt;/a&gt; is a blog by and for two people, so that my brother and I can point out funny things we&amp;rsquo;ve seen to each other. This topic formed the bulk of our e-mail before (much to my wife&amp;rsquo;s frustration when I told her that Peter and I e-mailed each other three or four times on a given day and she then asked &amp;ldquo;how is he?&amp;rdquo; and I had no reply), and the blog reduces the chance that one of us will point out something on WFMU&amp;rsquo;s &lt;a href=&#34;http://blog.wfmu.org/&#34;&gt;Beware the Blog&lt;/a&gt; or &lt;a href=&#34;http://www.boingboing.net&#34;&gt;boingboing&lt;/a&gt; that the other has already seen.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Xanadu is a private weblog for about 45 people involved in metadata and the publishing industry.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What all three of these have in common is that none is a blog in which one person writes about anything they want for the whole world to see, like most weblogs. Each has a fairly specific purpose, goofy or otherwise. bobdc.blog will be a lot closer to the one-person-soapbox model, but I will continue to refrain from discussing my new favorite CD or what I had for breakfast. I&amp;rsquo;ll be discussing new and old technology, and a weblog on my own domain, in which I can tweak the stylesheets and play with the settings to understand that technology better, is a more appropriate place to do that. As to old technology, I&amp;rsquo;ll paste and edit the last two paragraphs from my last O&amp;rsquo;Reilly weblog posting here:&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve learned from recent reading that most histories of computers focus on computers that specialized in the most advanced math possible for their time, which was as much of a niche application in 1900 and 1945 as it is now. Many key tasks that we use computers for today—particularly database tasks—had been carried out by automated, usually electrical machines since the nineteenth century, in a separate but parallel history to the Colossus-Mark I-ENIAC-EDVAC history of computers that you typically read about. Did you know that during World War I the U.S. Army could run automated queries against a database to find, for example, French-speaking soldiers with a chauffeur&amp;rsquo;s license? Lately, I&amp;rsquo;ve been fascinated by large-scale database applications that predate any database technology that geeks currently take seriously. A lot of people now consider any pre-relational technology to be prehistoric; that&amp;rsquo;s a pretty limited perspective.&lt;/p&gt;
&lt;p&gt;The history of computing applications, with or without the use of electrical stored-program computers, has a lot to teach us about the problems and innovations we&amp;rsquo;re working on now. I&amp;rsquo;m sure I&amp;rsquo;ll be spouting opinions on other developments as well, especially XML-related ones, which I&amp;rsquo;ve worked with and written about since the days when XML was a &lt;a href=&#34;http://xml.coverpages.org/sgml.html&#34;&gt;four-letter word&lt;/a&gt;. (For example: now that the W3C &amp;ldquo;binary XML&amp;rdquo; effort has been renamed &amp;ldquo;&lt;a href=&#34;http://www.w3.org/XML/EXI&#34;&gt;efficient XML interchange&lt;/a&gt;,&amp;rdquo; there&amp;rsquo;s no reason to argue with it anymore, because who can argue with greater efficiency? Right?) When I see interesting linking-related news, I&amp;rsquo;ll probably add new entries to the O&amp;rsquo;Reilly weblog, but my main weblog from now on will be this one. I hope it&amp;rsquo;s worth reading.&lt;/p&gt;
&lt;h2 id=&#34;7-comments&#34;&gt;7 Comments&lt;/h2&gt;
&lt;p&gt;By &lt;a href=&#34;http://plasmasturm.org/&#34; title=&#34;http://plasmasturm.org/&#34;&gt;Aristotle Pagaltzis&lt;/a&gt; on &lt;a href=&#34;#comment-17&#34;&gt;December 4, 2005 6:02 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hi Bob,&lt;/p&gt;
&lt;p&gt;two requests on technicalia:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Since it’s now /bobdc.blog/bobdcblog.atom rather than /bobdc.blog/atom.xml, could you please update your templates so the link tags in your page headers will point to the correct location? I picked “Atom” from there and got subscribed to the wrong feed, then had to manually fix it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The right way to handle existing subscribers, assuming you have enough control over the server to institute it, would be to send a “301 Moved Permanently” redirect in response to requests for /bobdc.blog/atom.xml, rather than putting a dummy feed there. Aggregators can then automatically update their subscription to the correct location.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Anyway, I look forward to reading you here as much as I did on O’Reilly’s site.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-18&#34;&gt;December 5, 2005 9:55 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sorry, I&amp;rsquo;m still getting to know what Movable Type puts where in the various templates. I just fixed it in the templates that generate the Atom file, index.html, and the various archive html files, so please let me know if you find any references to atom.xml elsewhere.&lt;/p&gt;
&lt;p&gt;The 301 return code is a good idea, but I don&amp;rsquo;t have that level of control over the HTTP sent by my host provider&amp;rsquo;s server. I was happy to learn about .htaccess, which is how I get Atom files sent with a MIME type of atom+xml. In fact, that&amp;rsquo;s why I renamed the atom feed file to have an extension of atom.&lt;/p&gt;
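&lt;p&gt;The .htaccess directive that does this is a one-liner along these lines (a sketch of the idea, assuming the feed files end in .atom):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;AddType application/atom+xml .atom
&lt;/code&gt;&lt;/pre&gt;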
&lt;p&gt;thanks,&lt;/p&gt;
&lt;p&gt;Bob&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://plasmasturm.org/&#34; title=&#34;http://plasmasturm.org/&#34;&gt;Aristotle Pagaltzis&lt;/a&gt; on &lt;a href=&#34;#comment-20&#34;&gt;December 5, 2005 10:08 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;No problem, just pointing things out. :-) If you can create a .htaccess, you probably &lt;em&gt;can&lt;/em&gt; send a 301:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;RedirectMatch 301 ^/bobdc.blog/atom\.xml$ /bobdc.blog/bobdcblog.atom
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That would go in the .htaccess in your document root.&lt;/p&gt;
&lt;p&gt;By Martin on &lt;a href=&#34;#comment-22&#34;&gt;December 6, 2005 5:22 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Do you have any plans for providing a full-content feed? I&amp;rsquo;d appreciate it.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob DuCharme&lt;/a&gt; on &lt;a href=&#34;#comment-24&#34;&gt;December 6, 2005 7:49 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As a matter of fact I was planning on doing that after I get the next post up, which is a pretty long one. Can you point me to any examples of sites that provide both a summary Atom feed and a full one? I want to make sure I get all the link elements in the secondary one right.&lt;/p&gt;
&lt;p&gt;By &lt;a href=&#34;http://plasmasturm.org/&#34; title=&#34;http://plasmasturm.org/&#34;&gt;Aristotle Pagaltzis&lt;/a&gt; on &lt;a href=&#34;#comment-26&#34;&gt;December 7, 2005 9:08 AM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Bob: I assume you mean autodiscovery links? In that case, you’d do something like&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;link rel=&amp;quot;alternate&amp;quot; type=&amp;quot;application/atom+xml&amp;quot; title=&amp;quot;Atom&amp;quot; href=&amp;quot;http://www.snee.com/bobdc.blog/bobdcblog.atom&amp;quot; /&amp;gt;
&amp;lt;link rel=&amp;quot;alternate&amp;quot; type=&amp;quot;application/atom+xml&amp;quot; title=&amp;quot;Atom, full text&amp;quot; href=&amp;quot;http://www.snee.com/bobdc.blog/bobdcblog-full.atom&amp;quot; /&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By &lt;a href=&#34;http://www.snee.com/bob&#34; title=&#34;http://www.snee.com/bob&#34;&gt;Bob&lt;/a&gt; on &lt;a href=&#34;#comment-27&#34;&gt;December 8, 2005 12:54 PM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;No, I meant actual websites. &lt;a href=&#34;http://www.dhemery.com/cwd/&#34;&gt;http://www.dhemery.com/cwd/&lt;/a&gt; is about the only one I&amp;rsquo;ve seen, but I didn&amp;rsquo;t look very hard.&lt;/p&gt;
&lt;p&gt;I now have an Atom feed with full entries at &lt;a href=&#34;http://www.snee.com/bobdc.blog/bobdcblogfull.atom&#34;&gt;http://www.snee.com/bobdc.blog/bobdcblogfull.atom&lt;/a&gt; and will be publicizing it more in a blog entry.&lt;/p&gt;
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2005">2005</category>
      
      <category domain="https://www.bobdc.com//categories/blogging-about-blogging">blogging about blogging</category>
      
    </item>
    
    <item>
      <title>SPARQL queries of the Billboard Hot 100 PART 2</title>
      <link>https://www.bobdc.com/blog/hot100pt2/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      
      <guid>https://www.bobdc.com/blog/hot100pt2/</guid>
      
      
      <description><div>Current and historical data! REVISE THAT</div><div>&lt;p&gt;constructforwd.rq works, so if I can&amp;rsquo;t get the INSERT version to work just import from constructed triples.&lt;/p&gt;
&lt;p&gt;REMEMBER THE / AT THE END OF THE schema.org PREFIX DECLARATION&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Right now these are just notes that I pasted here so that they would get backed up&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;April blog entry should get up to the running of the Python script, loading those triples, and good queries below. But, this could have been done with an RDBMS. We&amp;rsquo;ll see how to make it a real knowledge graph next week.&lt;/p&gt;
&lt;p&gt;For that, constructforwd.rq works. Comment at top shows curl command that makes all the triples linking to wikidata. With those added to the repo, try a query that shows the youngest person to have a hit in a given week, and think of some other things. Who played the guitar, who played blues, who was influenced by Robert Johnson, who is in the Rock and Roll Hall of Fame (last few ideas all came from &lt;a href=&#34;https://www.wikidata.org/wiki/Q11036&#34;&gt;https://www.wikidata.org/wiki/Q11036&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;From the README:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Files in ~/git/billboard-hot-100-rdf/&lt;/p&gt;
&lt;p&gt;This works, so I can proceed, although I&amp;rsquo;d rather use annotation syntax:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PREFIX h1: &amp;lt;http://rdfdata.org/hot100#&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt;
PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;

SELECT * WHERE {
  ?recording a schema:Recording ;
             dc:title &amp;quot;Cruel Summer&amp;quot; ;
             schema:byArtist &amp;quot;Taylor Swift&amp;quot; .
  &amp;lt;&amp;lt; ?recording h1:charted ?chartDate &amp;gt;&amp;gt; h1:position ?chartPosition .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h1 id=&#34;calculating-the-values-we-didnt-convert-from-json&#34;&gt;Calculating the values we didn&amp;rsquo;t convert from JSON&lt;/h1&gt;
&lt;h2 id=&#34;chart-position-last-week&#34;&gt;Chart position last week&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;PREFIX h1: &amp;lt;http://rdfdata.org/hot100#&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt;
PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;
PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;
PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;

SELECT ?dateLastWeek ?positionLastWeek WHERE {
  {
    # 1. Find the date of the latest chart appearance.
    SELECT ?recording (MAX(?chartDate) AS ?latestChartDate) WHERE {
      ?recording a schema:Recording ;
                 dc:title &amp;quot;Snooze&amp;quot; ;
                 schema:byArtist/rdfs:label &amp;quot;SZA&amp;quot;@en .
      ?recording h1:charted ?chartDate .
    }
    GROUP BY ?recording
  }
  # 2. Find the week before the latest chart appearance
  #    and the position from that week.
  BIND (?latestChartDate - &amp;quot;P7D&amp;quot;^^xsd:duration AS ?dateLastWeek)
  &amp;lt;&amp;lt; ?recording h1:charted ?dateLastWeek &amp;gt;&amp;gt; h1:position ?positionLastWeek .
}
&lt;/code&gt;&lt;/pre&gt;
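&lt;p&gt;The date arithmetic in that query, subtracting a seven-day &amp;quot;P7D&amp;quot;^^xsd:duration to get the previous week&amp;rsquo;s chart date, is easy to sanity-check outside of SPARQL. A rough Python equivalent (the sample date is hypothetical):&lt;/p&gt;

```python
from datetime import date, timedelta

# Rough equivalent of subtracting the "P7D"^^xsd:duration value:
# step back seven days from the latest chart date to get last week's chart.
latest_chart_date = date.fromisoformat("2023-10-21")  # hypothetical value
date_last_week = latest_chart_date - timedelta(days=7)
print(date_last_week.isoformat())  # -> 2023-10-14
```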
&lt;h2 id=&#34;who-had-hits-in-the-most-decades&#34;&gt;Who had hits in the most decades?&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;PREFIX schema: &amp;lt;http://schema.org/&amp;gt;
PREFIX h1: &amp;lt;http://rdfdata.org/hot100#&amp;gt;

SELECT ?artist (COUNT(DISTINCT ?decade) AS ?decades) WHERE {
  ?recording a schema:Recording ;
             schema:byArtist ?artist ;
             h1:charted ?chartDate .
  BIND (SUBSTR(str(?chartDate),1,3) AS ?decade)
}
GROUP BY ?artist
ORDER BY DESC(?decades)
&lt;/code&gt;&lt;/pre&gt;
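&lt;p&gt;The SUBSTR in that query leans on the fact that the first three characters of an ISO 8601 date string identify its decade. A quick Python sketch of the same counting logic (the sample rows are hypothetical):&lt;/p&gt;

```python
from collections import defaultdict

# Same idea as BIND(SUBSTR(str(?chartDate),1,3) AS ?decade): the first
# three characters of an ISO date ("195", "196", ...) name a decade.
chart_rows = [  # (artist, chartDate) -- hypothetical sample rows
    ("Little Richard", "1956-03-03"),
    ("Little Richard", "1962-11-10"),
    ("Little Richard", "1970-05-16"),
    ("SZA", "2023-01-07"),
]

decades_per_artist = defaultdict(set)
for artist, chart_date in chart_rows:
    decades_per_artist[artist].add(chart_date[:3])

counts = {artist: len(decades) for artist, decades in decades_per_artist.items()}
print(counts)  # -> {'Little Richard': 3, 'SZA': 1}
```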
&lt;h2 id=&#34;what-were-the-decades-in-which-little-richard-had-hits&#34;&gt;What were the decades in which Little Richard had hits?&lt;/h2&gt;
&lt;p&gt;After I straighten out the SERVICE call part: revise everything up here to make a URI for the artist from the string and give schema:byArtist a range of that.&lt;/p&gt;
&lt;p&gt;All this could be done with a relational database, so for a  &amp;ldquo;knowledge graph&amp;rdquo; angle see if there is a way to add links to each artist&amp;rsquo;s Wikidata page.&lt;/p&gt;
&lt;h1 id=&#34;passing-artist-name-to-wikidata-to-get-artist-url&#34;&gt;Passing artist name to Wikidata to get artist URL&lt;/h1&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX wd: &amp;lt;http://www.wikidata.org/entity/&amp;gt;
PREFIX wdt: &amp;lt;http://www.wikidata.org/prop/direct/&amp;gt;

SELECT ?artistURI WHERE {
	BIND (&amp;#34;The Beatles&amp;#34;@en AS ?artistName)

  SERVICE &amp;lt;https://query.wikidata.org/sparql&amp;gt; {
    
      # Check if the artist is a musician (a human with a value for instrument; singers and 
      # rappers listed with &amp;#34;instrument&amp;#34; of &amp;#34;voice&amp;#34;) or a musical group. 
      { 
        ?artistURI rdfs:label ?artistName;
                   wdt:P31  wd:Q5 ;        # instance of human
                   wdt:P1303 ?instrument . 
      }
      UNION
      {
        ?artistURI rdfs:label ?artistName ;
                   wdt:P31 wd:Q215380 . # instance of musical group
      }
  } # end of SERVICE
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;strong&gt;json2rdf.py&lt;/strong&gt;&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;#!/usr/bin/env python3

# EXPLAIN MORE ABOUT IT HERE

# Run as:
# ./json2rdf.py ../all.json

import json
import sys
import urllib.parse

if len(sys.argv) &amp;lt; 2:
    print(&amp;quot;Enter an input filename as an argument.&amp;quot;)
    exit()

inputFile = sys.argv[1]

jsonBlock = &amp;quot;&amp;quot;

with open(inputFile) as fp:
    for line in fp:
        jsonBlock += line

jsonData = json.loads(jsonBlock)

print(&#39;@prefix h1: &amp;lt;http://rdfdata.org/hot100#&amp;gt; .&#39;)
print(&#39;@prefix schema: &amp;lt;http://schema.org/&amp;gt; .&#39;)
print(&#39;@prefix dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt; .&#39;)
print(&#39;@prefix xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt; .&#39;)
print()

for week in jsonData:

    chartDate = week[&amp;quot;date&amp;quot;]

    for recording in week[&amp;quot;data&amp;quot;]:
        artist = recording[&amp;quot;artist&amp;quot;]
        artist = artist.replace(&#39;&amp;quot;&#39;, &#39;\\&amp;quot;&#39;)
        song = recording[&amp;quot;song&amp;quot;]
        song = song.replace(&#39;&amp;quot;&#39;, &#39;\\&amp;quot;&#39;)
        # ID of song is artist + song because two different songs can have
        # the same title, e.g. Taylor Swift&#39;s and Bananarama&#39;s &amp;quot;Cruel Summer&amp;quot;.
        artistSong = artist + song
        # Lose characters that screw up the URI. 
        for c in &#39; &amp;amp;/.\&amp;quot;\&#39;&#39;:
            artistSong = artistSong.replace(c, &#39;&#39;)
        print(&amp;quot;h1:&amp;quot; + urllib.parse.quote(artistSong) + &amp;quot; a schema:Recording;&amp;quot;)
        print(&#39;     schema:byArtist &#39; + &#39;&amp;quot;&#39; + artist + &#39;&amp;quot;@en;&#39;)
        print(&#39;     dc:title &#39; + &#39;&amp;quot;&#39; + song + &#39;&amp;quot;;&#39;)
        print(&#39;     h1:charted &#39; + &#39;&amp;quot;&#39; + chartDate + &#39;&amp;quot;^^xsd:date {| &#39;)
        print(&#39;        h1:position &#39; + str(recording[&amp;quot;this_week&amp;quot;]))
        print(&#39;|}.&#39;)
&lt;/code&gt;&lt;/pre&gt;
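<p>As a sanity check on the conversion, here is the script&rsquo;s ID-cleanup logic isolated as a function (the artist/song values are made-up examples):</p>

```python
import urllib.parse

def recording_id(artist, song):
    # Mirror of the script's ID logic: concatenate artist and song,
    # drop characters that break URIs, then percent-encode the rest.
    artist_song = artist + song
    for c in ' &/.\"\'':
        artist_song = artist_song.replace(c, '')
    return urllib.parse.quote(artist_song)

print(recording_id("Taylor Swift", "Cruel Summer"))  # TaylorSwiftCruelSummer
print(recording_id("Hall & Oates", "Maneater"))      # HallOatesManeater
```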
&lt;h1 id=&#34;adding-in-links-to-wikidata&#34;&gt;Adding in links to Wikidata&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;https://chartdata.org/faq/&#34;&gt;https://chartdata.org/faq/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Not as Python, but as an INSERT update query with a SERVICE call to Wikidata. First do a CONSTRUCT and a COUNT to see what % of the Hot 100 this works for. With a URI like &lt;a href=&#34;http://www.wikidata.org/entity/Q194220&#34;&gt;http://www.wikidata.org/entity/Q194220&lt;/a&gt; representing the artist, what triple do I add? &lt;a href=&#34;https://schema.org/sameAs&#34;&gt;https://schema.org/sameAs&lt;/a&gt; mentions Wikidata entries as possibilities. Make it clear that unlike owl:sameAs, this is a link to follow and not for inferencing. But why not do that? Admit that there will be mistakes.&lt;/p&gt;
&lt;p&gt;The following works but only for 1000 artists. Maybe there is some GraphDB limit I can reset. See the two commented lines near the top of it.&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX rdfs: &amp;lt;http://www.w3.org/2000/01/rdf-schema#&amp;gt;
PREFIX wd: &amp;lt;http://www.wikidata.org/entity/&amp;gt;
PREFIX wdt: &amp;lt;http://www.wikidata.org/prop/direct/&amp;gt;

PREFIX schema: &amp;lt;http://schema.org/&amp;gt;
# Can I do anything with the query so that I don&amp;#39;t need the DISTINCT keyword?
# There are 10,773 artist names. Can I do this for all of them? Reset some GraphDB limit? 
SELECT DISTINCT ?artistName ?artistURI WHERE {
	 #BIND (&amp;#34;Ricky Nelson&amp;#34;@en AS ?artistName)
    ?s schema:byArtist ?artistName .

  SERVICE &amp;lt;https://query.wikidata.org/sparql&amp;gt; {
    
      # Check if the artist is a musician (a human with a value for instrument; singers and 
      # rappers are listed with &amp;#34;instrument&amp;#34; of &amp;#34;voice&amp;#34;) or a musical group. 
      { 
        ?artistURI rdfs:label ?artistName;
                   wdt:P31  wd:Q5 ;        # instance of human
                   wdt:P1303 ?instrument . 
      }
      UNION
      {
        ?artistURI rdfs:label ?artistName ;
                   wdt:P31 wd:Q215380 . # instance of musical group
      }
  } # end of SERVICE
}
    
&lt;/code&gt;&lt;/pre&gt;&lt;ul&gt;
&lt;li&gt;more readme notes&lt;/li&gt;
&lt;/ul&gt;
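<p>One workaround worth trying for the 1,000-result cap (assuming it is a result-set limit rather than something deeper in GraphDB): page through the artist names with LIMIT/OFFSET in an inner subselect and run the federated query once per page. A sketch of the query builder; the page size is an assumption:</p>

```python
def build_batch_query(offset, limit=1000):
    # Build one page of the federated artist-lookup query; run it
    # repeatedly, bumping offset by limit, until no rows come back.
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT ?artistName ?artistURI WHERE {{
  {{ SELECT DISTINCT ?artistName WHERE {{ ?s schema:byArtist ?artistName }}
     ORDER BY ?artistName LIMIT {limit} OFFSET {offset} }}
  SERVICE <https://query.wikidata.org/sparql> {{
    {{ ?artistURI rdfs:label ?artistName ;
                  wdt:P31 wd:Q5 ;
                  wdt:P1303 ?instrument . }}
    UNION
    {{ ?artistURI rdfs:label ?artistName ;
                  wdt:P31 wd:Q215380 . }}
  }}
}}
"""

print(build_batch_query(2000))
```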
&lt;p&gt;To point to a Wikidata resource, I used the LoC&amp;rsquo;s &lt;a href=&#34;http://www.loc.gov/mads/rdf/v1#hasCloseExternalAuthority&#34;&gt;http://www.loc.gov/mads/rdf/v1#hasCloseExternalAuthority&lt;/a&gt; (see &lt;a href=&#34;https://id.loc.gov/ontologies/madsrdf/v1.html&#34;&gt;https://id.loc.gov/ontologies/madsrdf/v1.html&lt;/a&gt; and &lt;a href=&#34;https://id.loc.gov/authorities/names/n81127048.html&#34;&gt;https://id.loc.gov/authorities/names/n81127048.html&lt;/a&gt; because they use it for the same thing.&lt;/p&gt;
&lt;p&gt;Turn &lt;a href=&#34;https://github.com/mhollingshead/billboard-hot-100&#34;&gt;https://github.com/mhollingshead/billboard-hot-100&lt;/a&gt; into RDF and query it with SPARQL. Maybe turn it into JSON-LD as an experiment in using that. Have a cron job on snee pull it down and put an RDF version on bobdc.com where I can pull it down. Or make it a local cron job that just pulls from there and loads the RDF into a repo.&lt;/p&gt;
&lt;p&gt;Model: &amp;ldquo;Cruel Summer&amp;rdquo; charted 2024-02-17; RDF-star about that triple has all the other data. But &amp;ldquo;Cruel Summer&amp;rdquo; isn&amp;rsquo;t a good ID because there has been another hit with the same title: &lt;a href=&#34;https://en.wikipedia.org/wiki/Cruel_Summer_(Bananarama_song)&#34;&gt;https://en.wikipedia.org/wiki/Cruel_Summer_(Bananarama_song)&lt;/a&gt;. So for ID http://whatever/Taylor-Swift-Cruel-Summer ? rdf:type of &lt;a href=&#34;https://schema.org/MusicRecording&#34;&gt;https://schema.org/MusicRecording&lt;/a&gt;. byArtist is a property. Actually, the ID would be better as an MD5() of &amp;ldquo;Taylor SwiftCruel Summer&amp;rdquo;; the 32 characters it returns are the shortest of the checksum functions.&lt;/p&gt;
&lt;p&gt;If I leave out last_week, peak_position, and weeks_on_chart, I should be able to calculate those, so try just storing the :charted value of the date and the this_week value.&lt;/p&gt;
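<p>The MD5 idea, sketched in Python (hashlib is in the standard library; hexdigest() returns 32 hex characters):</p>

```python
import hashlib

def recording_id(artist, song):
    # Hash the artist+song string so the ID is short, URI-safe, and
    # still distinguishes same-titled songs by different artists.
    return hashlib.md5((artist + song).encode("utf-8")).hexdigest()

print(recording_id("Taylor Swift", "Cruel Summer"))  # 32 hex characters
```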
&lt;p&gt;GraphDB imported all.ttl in 16 seconds.&lt;/p&gt;
&lt;p&gt;This works, so I can proceed, although I&amp;rsquo;d rather use annotation syntax:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX h1: &amp;lt;http://rdfdata.org/hot100#&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt;
PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;

SELECT * WHERE {
  ?recording a schema:Recording ;
             dc:title &amp;quot;Cruel Summer&amp;quot; ;
             schema:byArtist &amp;quot;Taylor Swift&amp;quot;@en .

  &amp;lt;&amp;lt; ?recording h1:charted ?chartDate &amp;gt;&amp;gt; h1:position ?chartPosition .
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h1 id=&#34;calculating-the-values-we-didnt-convert-from-json-1&#34;&gt;Calculating the values we didn&amp;rsquo;t convert from JSON&lt;/h1&gt;
&lt;h2 id=&#34;weeks-on-chart&#34;&gt;Weeks on chart&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX h1: &amp;lt;http://rdfdata.org/hot100#&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt;
PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;

SELECT (COUNT(?chartPosition) AS ?weeksOnChart) WHERE {
  ?recording a schema:Recording ;
             dc:title &amp;quot;Cruel Summer&amp;quot; ;
             schema:byArtist &amp;quot;Taylor Swift&amp;quot;@en .

  &amp;lt;&amp;lt; ?recording h1:charted ?chartDate &amp;gt;&amp;gt; h1:position ?chartPosition .
}
&lt;/code&gt;&lt;/pre&gt;
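<p>The derived values are easy to check offline: given one recording&rsquo;s (date, position) pairs pulled from the RDF-star annotations, weeks-on-chart is just the count and peak position the minimum (the sample numbers are made up):</p>

```python
# Hypothetical chart history for one recording: (chart date, position).
history = [
    ("2023-07-22", 18),
    ("2023-07-29", 9),
    ("2023-08-05", 2),
    ("2023-08-12", 5),
]

weeks_on_chart = len(history)
peak_position = min(position for _, position in history)

print(weeks_on_chart, peak_position)  # 4 2
```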
&lt;h2 id=&#34;highest-chart-position&#34;&gt;Highest chart position&lt;/h2&gt;
&lt;p&gt;Change the SELECT line of the previous query to:&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;SELECT (MIN(?chartPosition) AS ?highestPosition) WHERE {
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&#34;chart-position-last-week-1&#34;&gt;Chart position last week&lt;/h2&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;PREFIX h1: &amp;lt;http://rdfdata.org/hot100#&amp;gt;
PREFIX schema: &amp;lt;http://schema.org/&amp;gt;
PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;
PREFIX xsd: &amp;lt;http://www.w3.org/2001/XMLSchema#&amp;gt;

SELECT ?dateLastWeek ?positionLastWeek WHERE {

  # 2. Find the week before the latest chart
  # appearance and the position from that week.
  BIND (?latestDate - &amp;quot;P7D&amp;quot;^^xsd:duration AS ?dateLastWeek)
  &amp;lt;&amp;lt; ?recording h1:charted ?dateLastWeek &amp;gt;&amp;gt; h1:position ?positionLastWeek .
  {
    # 1. Find the date of the latest chart appearance.
    SELECT ?recording (MAX(?chartDate) AS ?latestDate) WHERE {
      ?recording a schema:Recording ;
                 dc:title &amp;quot;Snooze&amp;quot; ;
                 schema:byArtist &amp;quot;SZA&amp;quot;@en .
      ?recording h1:charted ?chartDate .
    }
    GROUP BY ?recording
  }
}
&lt;/code&gt;&lt;/pre&gt;
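<p>The &amp;quot;P7D&amp;quot;^^xsd:duration subtraction has a direct Python analogue with datetime.timedelta, handy for checking what date the query should find:</p>

```python
from datetime import date, timedelta

def previous_chart_week(latest):
    # The Hot 100 is weekly, so last week's chart is exactly 7 days
    # earlier, matching the SPARQL "P7D"^^xsd:duration subtraction.
    return latest - timedelta(days=7)

print(previous_chart_week(date(2024, 2, 17)))  # 2024-02-10
```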
&lt;p&gt;Checking on their Wikidata pages &amp;ndash; this works for Keith:
curl &lt;a href=&#34;https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ189599%3E&#34;&gt;https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ189599%3E&lt;/a&gt;
But that asks with his Wikidata ID. I want to ask with the name whether there is a page for that person.&lt;/p&gt;
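<p>That curl example is just a percent-encoded DESCRIBE query in the query parameter; building such a URL from any SPARQL string (including a name-based lookup query) is a one-liner with urllib:</p>

```python
import urllib.parse

def wikidata_query_url(sparql):
    # Percent-encode a SPARQL query for Wikidata's endpoint, as in the
    # curl example; a name-based lookup query would go in the same slot.
    endpoint = "https://query.wikidata.org/sparql"
    return endpoint + "?" + urllib.parse.urlencode({"query": sparql})

url = wikidata_query_url("DESCRIBE <http://www.wikidata.org/entity/Q189599>")
print(url)
```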
</div></description>
      
      <author>Bob DuCharme</author>
      <category domain="https://www.bobdc.com//categories/2024">2024</category>
      
      <category domain="https://www.bobdc.com//categories/sparql">SPARQL</category>
      
    </item>
    
  </channel>
</rss>