Querying for labels

The normal way and the wikibase:label service way

labeled tomato plants with SPARQL logo

In my last blog entry I discussed various ways that different RDF datasets assign human-readable labels to resources, with the rdfs:label property being at the center of them all. I mentioned how schema.org doesn’t use rdfs:label but its own equivalent, schema:name, which its schema declares as a subproperty of rdfs:label. Since I wrote that, Fan Li pointed out that Facebook’s Open Graph protocol also has its own equivalent: og:title, which you can see used in the HTML source of IMDb, Instagram, and Yelp. (I tried pointing each of those three links to the view-source version of the pages, and that didn’t work, so you’ll have to take the extra step with each to view their source and see each one’s og:title value.) The OGP schema also defines og:title as a subproperty of rdfs:label, so a serious RDFS application could parse that schema and then treat og:title values as rdfs:label values.

Treating those rdfs:label variations as rdfs:label values

Querying for rdfs:label values is simple enough. To demonstrate how a query for rdfs:label values will retrieve og:title and schema:name values when a query engine that can do inferencing has access to the Open Graph Protocol and schema.org schemas, I added some of those values to the following document with comments about where I found each. (Where I found them they were not in Turtle syntax like they are here, but they were in machine-readable formats that could easily be converted to Turtle.)

Sample data:

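@prefix og: <http://ogp.me/ns#> .
@prefix schema: <https://schema.org/> .

# og:title examples

<https://www.imdb.com/title/tt22041854/?ref_=ttls_li_tt>
  og:title "Priscilla (2023) ⭐ 6.9 | Biography, Drama, Music" .

<https://www.instagram.com/bobdcofficial/>
  og:title " (&#064;bobdcofficial) &#x2022; Instagram photos and videos" .

<https://www.yelp.com/biz/peter-changs-china-grill-charlottesville>
  og:title "Peter Chang's China Grill - Charlottesville, VA" .

# schema:name examples

## (added by Hugo as a default with no special configuration from me)
<https://www.bobdc.com/blog/rdflabels/>
  schema:name "Human-readable names in RDF" .

<https://www.newyorker.com/best-books-2023>
  schema:name "The Best Books We Read This Week" .

<https://www.landsend.com/products/mens-super-t-long-sleeve-t-shirt/id_130670>
  schema:name "Men's Super-T Long Sleeve T-Shirt" .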

I downloaded the schema.org and OGP schema files and combined them into a single schema file:

cat ogp.me.ttl schemaorg-current-https.ttl > comboschema.ttl

Then, as I described in Hidden gems included with Jena’s command line utilities, I used the Jena riot tool to do RDFS inferencing with the data above and the combined schemas. It produced a lot of triples, so I used grep to show only the ones that mentioned the rdfs:label property:

riot --rdfs comboschema.ttl labeldata.ttl | grep "#label" 

It produced these results:

<https://www.imdb.com/title/tt22041854/?ref_=ttls_li_tt> <http://www.w3.org/2000/01/rdf-schema#label> "Priscilla (2023) ⭐ 6.9 | Biography, Drama, Music" .
<https://www.instagram.com/bobdcofficial/> <http://www.w3.org/2000/01/rdf-schema#label> " (&#064;bobdcofficial) &#x2022; Instagram photos and videos" .
<https://www.yelp.com/biz/peter-changs-china-grill-charlottesville> <http://www.w3.org/2000/01/rdf-schema#label> "Peter Chang's China Grill - Charlottesville, VA" .
<https://www.bobdc.com/blog/rdflabels/> <http://www.w3.org/2000/01/rdf-schema#label> "Human-readable names in RDF" .
<https://www.newyorker.com/best-books-2023> <http://www.w3.org/2000/01/rdf-schema#label> "The Best Books We Read This Week" .
<https://www.landsend.com/products/mens-super-t-long-sleeve-t-shirt/id_130670> <http://www.w3.org/2000/01/rdf-schema#label> "Men's Super-T Long Sleeve T-Shirt" .

So, with the schemas available, asking for the rdfs:label values retrieved the schema:name and og:title values, because the schemas declare those properties as subproperties of rdfs:label and because I used a tool that could do RDFS inferencing. (When I created a repo that would do RDFS inferencing with the free version of GraphDB, the same thing happened. Standards!)
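To make the inferencing step less of a black box, here is a minimal Python sketch of the one RDFS rule that matters here: if a resource has a value for property p, and p is a subproperty of q, then the resource also has that value for q. This is not how riot or GraphDB implement it; the function names are my own, the triples are plain tuples, and the two schema triples are just the subproperty declarations described above.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OG = "http://ogp.me/ns#"
SCHEMA = "https://schema.org/"

# The two subproperty declarations discussed above (all the schema we need here).
schema_triples = [
    (OG + "title", RDFS + "subPropertyOf", RDFS + "label"),
    (SCHEMA + "name", RDFS + "subPropertyOf", RDFS + "label"),
]

# Two instance triples taken from the sample data.
data_triples = [
    ("https://www.yelp.com/biz/peter-changs-china-grill-charlottesville",
     OG + "title", "Peter Chang's China Grill - Charlottesville, VA"),
    ("https://www.bobdc.com/blog/rdflabels/",
     SCHEMA + "name", "Human-readable names in RDF"),
]


def superproperties(prop, schema):
    """Return every property reachable from prop via rdfs:subPropertyOf."""
    found, stack = set(), [prop]
    while stack:
        current = stack.pop()
        for s, p, o in schema:
            if s == current and p == RDFS + "subPropertyOf" and o not in found:
                found.add(o)
                stack.append(o)
    return found


def infer_subproperty_triples(data, schema):
    """Apply the rule: (s p o) plus (p subPropertyOf q) entails (s q o)."""
    inferred = set()
    for s, p, o in data:
        for q in superproperties(p, schema):
            inferred.add((s, q, o))
    return inferred


# Keep only the inferred rdfs:label triples, like the grep step above.
labels = sorted(t for t in infer_subproperty_triples(data_triples, schema_triples)
                if t[1] == RDFS + "label")
for s, p, o in labels:
    print(f'<{s}> <{p}> "{o}" .')
```

A real RDFS engine applies more rules than this one, which is why riot's unfiltered output contained so many other triples.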

Some extra help from the Wikidata Query Service

Querying for an rdfs:label value in Wikidata can be simple enough:

PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?name WHERE {
   wd:Q144 rdfs:label ?name
}

Doing this in Wikidata, though, gets about 300 results (and the number has gone up since I first drafted this blog entry) because Wikidata knows the word for “dog” in so many languages. We could FILTER it down to one or just a few languages like this:

PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?label WHERE {
   wd:Q144 rdfs:label ?label
   FILTER (lang(?label) IN ("en","es"))
}
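As a rough illustration of what that FILTER does, here is a small Python sketch that treats each label as a (text, language tag) pair and keeps only the pairs whose tag is in the wanted list. The three labels are a tiny hand-picked subset of the roughly 300 that Wikidata has for wd:Q144, and the function name is my own.

```python
# (text, language tag) pairs: a hand-picked subset of the labels of wd:Q144.
dog_labels = [("dog", "en"), ("Hund", "de"), ("perro", "es")]


def filter_by_language(labels, wanted):
    """Keep labels whose language tag exactly equals one of the wanted tags.

    This mirrors FILTER (lang(?label) IN (...)), which compares tags as
    plain strings, so "en-gb" would not match "en" here.
    """
    wanted_tags = set(wanted)
    return [(text, tag) for text, tag in labels if tag in wanted_tags]


kept = filter_by_language(dog_labels, ["en", "es"])
print(kept)  # [('dog', 'en'), ('perro', 'es')]
```

If you also want regional variants such as "en-gb" to count as English, SPARQL's standard langMatches() function is the usual alternative to the IN comparison.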

Wikidata has a special service to make this easier. To demonstrate it, let’s say I’m wondering about the topics of the Wikiquote pages https://en.wikiquote.org/wiki/Dogs and https://en.wikiquote.org/wiki/Cats (although it’s pretty clear from the URLs). The following query, which you can try on the Wikidata Query Service, will show me a ?foo value of wd:Q144 and a ?bar value of wd:Q146, which are not very informative:

SELECT ?foo ?bar WHERE {
  { <https://en.wikiquote.org/wiki/Dogs> schema:about ?foo }
  { <https://en.wikiquote.org/wiki/Cats> schema:about ?bar }
}

I could ask for rdfs:label values of ?foo and ?bar, but instead I’ll use the wikibase:label service built into the Wikidata Query Service. This not only looks up the labels but even creates variables for them by appending “Label” to the names of the variables representing the resources that I’m querying about:

SELECT ?fooLabel ?barLabel WHERE {
  { <https://en.wikiquote.org/wiki/Dogs> schema:about ?foo }
  { <https://en.wikiquote.org/wiki/Cats> schema:about ?bar }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
}

Running that query gives us the following results:

fooLabel    barLabel
--------    --------
dog         house cat
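If you would rather run a query like this from a script than from the web form, the following Python sketch shows the two pieces involved: building a GET URL for the query service's SPARQL endpoint and flattening the standard SPARQL JSON results format into rows. It makes no network call. I use an explicit "en" here because, as far as I know, [AUTO_LANGUAGE] is filled in by the web interface and not by the endpoint itself. The function names are my own, and the sample response is hand-written to match the result shown above, not captured output.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?fooLabel ?barLabel WHERE {
  { <https://en.wikiquote.org/wiki/Dogs> schema:about ?foo }
  { <https://en.wikiquote.org/wiki/Cats> schema:about ?bar }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
"""


def build_request_url(query, endpoint=ENDPOINT):
    """Return a GET URL that carries the query and asks for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


def rows_from_results(results_json, variables):
    """Flatten SPARQL JSON results into tuples; unbound variables become None."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [tuple(b[v]["value"] if v in b else None for v in variables)
            for b in bindings]


# Hand-written sample in the SPARQL JSON results format (an assumption, not real output).
SAMPLE_RESPONSE = """
{"head": {"vars": ["fooLabel", "barLabel"]},
 "results": {"bindings": [
   {"fooLabel": {"xml:lang": "en", "type": "literal", "value": "dog"},
    "barLabel": {"xml:lang": "en", "type": "literal", "value": "house cat"}}
 ]}}
"""

url = build_request_url(QUERY)
rows = rows_from_results(SAMPLE_RESPONSE, ["fooLabel", "barLabel"])
print(rows)  # [('dog', 'house cat')]
```

To actually send the request you could pass url to urllib.request.urlopen; if you do, set a descriptive User-Agent header, which Wikimedia asks automated clients to provide.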

I could name a specific language if I wanted; running the next one shows a ?fooLabel value of “Hund” and a ?barLabel value of “Hauskatze”.

SELECT ?fooLabel ?barLabel WHERE {
  { <https://en.wikiquote.org/wiki/Dogs> schema:about ?foo }
  { <https://en.wikiquote.org/wiki/Cats> schema:about ?bar }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de" }
}

A neat Wikidata Query Service trick that I only recently learned about is how the web interface lets you reset the default language. If I click on “English” in the upper right of the query screen I get a drop-down, searchable list of languages. If I pick “español” from this list, the query screen’s “Examples” button gets renamed as “Ejemplos”, “Help” becomes “Ayuda”, and so forth with the rest of the UI. When I run the [AUTO_LANGUAGE] query from above after doing this, it shows a ?fooLabel value of “perro” and a ?barLabel value of “gato doméstico”.

With a made-up language code of “xyz” that it doesn’t recognize, it gives me the Q names from the ?foo and ?bar values as ?fooLabel and ?barLabel values:

fooLabel  barLabel
--------  --------
Q144      Q146
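The behavior shown in these examples can be modeled in a few lines of Python. This is only a sketch of what I observed, not the service's actual implementation: the function name is my own, and the lookup table holds just the labels mentioned in this entry. The service's documentation also describes passing a comma-separated list of languages to try in order, so the sketch accepts that form as well.

```python
# Only the labels mentioned in this entry; the real entities have many more.
LABELS = {
    "Q144": {"en": "dog", "de": "Hund", "es": "perro"},
    "Q146": {"en": "house cat", "de": "Hauskatze", "es": "gato doméstico"},
}


def resolve_label(entity_id, languages, labels=LABELS):
    """Return the first available label among the comma-separated languages.

    Falls back to the bare Q name when no requested language has a label,
    which is what the "xyz" example above shows.
    """
    available = labels.get(entity_id, {})
    for lang in languages.split(","):
        label = available.get(lang.strip())
        if label is not None:
            return label
    return entity_id


print(resolve_label("Q144", "de"))      # Hund
print(resolve_label("Q146", "xyz"))     # Q146
print(resolve_label("Q146", "xyz,es"))  # gato doméstico
```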

The wikibase:label service is not standard SPARQL, but with the tremendous amount of multilingual data available in Wikidata, it adds a lot of convenience and can shorten your Wikidata queries.


CC BY 2.0 photo by F Delventhal