# Pulling SKOS prefLabel and altLabel values out of DBpedia

Or, using linked data to build a standards-compliant thesaurus with SPARQL.

When my TopQuadrant colleague Dean Allemang referred to the use of DBpedia as a controlled vocabulary, I said “Huh?” He helped me to realize that if you and I want to refer to the same person, place, or thing, but there’s a chance that we might use different names for it, DBpedia’s URI for it might make the best identifier for us to both use. For example, if you refer to the nineteenth-century American president and Civil War general Ulysses S. Grant and I refer to him as Ulysses Grant, and then we find out that DBpedia’s URI for him is http://dbpedia.org/resource/Ulysses_S._Grant, I’m not going to insist on leaving Grant’s middle initial out of the URI.

Grant once had the nickname “Useless S. Grant”, and DBpedia can help us here, too. If you try to go to a Wikipedia page for http://en.wikipedia.org/wiki/Useless_S._Grant, instead of sending you an error message, Wikipedia will redirect you to the http://en.wikipedia.org/wiki/Ulysses_S._Grant page. DBpedia uses the http://dbpedia.org/ontology/wikiPageRedirects property to track these redirect values, and a SPARQL query that uses it can list alternative names for things that have Wikipedia entries.
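As a minimal sketch of that idea, the following Python builds a redirect-listing query for a single resource and sends it over HTTP. The endpoint address `https://dbpedia.org/sparql` and the helper names (`redirect_query`, `run_select`, `alt_labels`) are my own assumptions for illustration; only the `wikiPageRedirects` property and the Grant URI come from the discussion above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: DBpedia's public SPARQL endpoint lives at this address.
ENDPOINT = "https://dbpedia.org/sparql"


def redirect_query(resource_uri):
    """Build a query for the English labels of pages that redirect to resource_uri."""
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?altLabel
WHERE
{{
  ?redirect <http://dbpedia.org/ontology/wikiPageRedirects> <{resource_uri}> ;
            rdfs:label ?altLabel .
  FILTER ( lang(?altLabel) = "en" )
}}
"""


def run_select(query, endpoint=ENDPOINT):
    """Send a SELECT query with HTTP GET and return the parsed SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    request = Request(url, headers={"Accept": "application/sparql-results+json"})
    with urlopen(request) as response:
        return json.load(response)


def alt_labels(resource_uri):
    """Return the label of every redirect that points at resource_uri."""
    results = run_select(redirect_query(resource_uri))
    return [row["altLabel"]["value"] for row in results["results"]["bindings"]]
```

Calling `alt_labels("http://dbpedia.org/resource/Ulysses_S._Grant")` should then return whatever redirect labels DBpedia currently holds for Grant; the exact list depends on the state of the dataset.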

I can use this and one of DBpedia’s Categories pages to drive a SPARQL query that selects preferred and alternative labels for a group of DBpedia entries at once. If you enter the following query on DBpedia’s snorql form, it will give you a list of the preferred names of all the 19th-century presidents of the United States, as well as other names they might be known by.

    SELECT ?prefLabel ?altLabel
    WHERE
    {
      ?president dcterms:subject
                   <http://dbpedia.org/resource/Category:19th-century_presidents_of_the_United_States> ;
                 rdfs:label ?prefLabel .
      ?nickname <http://dbpedia.org/ontology/wikiPageRedirects> ?president ;
                rdfs:label ?altLabel .
      FILTER ( lang(?prefLabel) = "en" )
      FILTER ( lang(?altLabel) = "en" )
    }
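The SELECT query returns one row per pairing of preferred and alternative label, so each president's name repeats once for every redirect. The sketch below shows one way to fold such rows into a single entry per preferred label, assuming the results arrive in the standard SPARQL JSON results layout. The function name `group_labels` is hypothetical, and the sample rows are hand-written from names mentioned in this post, not actual endpoint output.

```python
from collections import defaultdict


def group_labels(results):
    """Map each prefLabel to the sorted list of altLabels that appear with it."""
    grouped = defaultdict(set)
    for row in results["results"]["bindings"]:
        grouped[row["prefLabel"]["value"]].add(row["altLabel"]["value"])
    return {pref: sorted(alts) for pref, alts in grouped.items()}


def literal(value):
    """Build one English-language literal in the SPARQL JSON results layout."""
    return {"type": "literal", "xml:lang": "en", "value": value}


# Hand-written sample rows for illustration only, not real query output.
sample = {
    "head": {"vars": ["prefLabel", "altLabel"]},
    "results": {"bindings": [
        {"prefLabel": literal("Abraham Lincoln"),
         "altLabel": literal("The Great Emancipator")},
        {"prefLabel": literal("Abraham Lincoln"),
         "altLabel": literal("Abe Lincoln")},
        {"prefLabel": literal("Ulysses S. Grant"),
         "altLabel": literal("Useless S. Grant")},
    ]},
}

for pref, alts in group_labels(sample).items():
    print(pref, "->", alts)
```

Collecting the alternative labels in a set also drops any duplicate rows before the list is sorted.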


The variable names I used will give SKOS fans a clue where I’m going with this: the creation of SKOS triples from this data. The following variation on the SELECT query above declares that the URI for each president on the list of 19th-century presidents is a skos:Concept, and it then assigns skos:prefLabel and skos:altLabel values based on the same logic used in the query above.

    CONSTRUCT
    {
      ?pres a skos:Concept ;
            skos:prefLabel ?prefLabel ;
            skos:altLabel ?altLabel .
    }
    WHERE
    {
      ?pres dcterms:subject
              <http://dbpedia.org/resource/Category:19th-century_presidents_of_the_United_States> ;
            rdfs:label ?prefLabel .
      ?alt <http://dbpedia.org/ontology/wikiPageRedirects> ?pres ;
           rdfs:label ?altLabel .
      FILTER ( lang(?altLabel) = "en" )
      FILTER ( lang(?prefLabel) = "en" )
    }


When I ran this query against DBpedia, it created 300 triples. These include skos:altLabel values such as “The Great Emancipator” and “Abe Lincoln” for Abraham Lincoln (or rather, for the concept http://dbpedia.org/resource/Abraham_Lincoln, which has a skos:prefLabel of “Abraham Lincoln”) as well as popular misspellings such as “Abraham Linkin” and “Presedent Lincon”. (If I were going to use this in a production application, I’d change the skos:altLabel values based on misspellings to skos:hiddenLabel values.)
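Nothing in the query itself distinguishes a nickname from a misspelling, so the switch to skos:hiddenLabel needs some outside input. The sketch below assumes a hand-maintained set of known misspellings and writes one concept as Turtle, routing those labels to skos:hiddenLabel. The function names and the `misspellings` parameter are hypothetical, and the output omits the `@prefix skos: <http://www.w3.org/2004/02/skos/core#> .` line that a complete Turtle file would need.

```python
def turtle_literal(text, lang="en"):
    """Quote text as a Turtle literal with a language tag, escaping special characters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def concept_turtle(uri, pref_label, alt_labels, misspellings=frozenset()):
    """Render one skos:Concept, using skos:hiddenLabel for labels listed as misspellings."""
    lines = [
        f"<{uri}> a skos:Concept ;",
        f"    skos:prefLabel {turtle_literal(pref_label)} ;",
    ]
    for label in sorted(alt_labels):
        prop = "skos:hiddenLabel" if label in misspellings else "skos:altLabel"
        lines.append(f"    {prop} {turtle_literal(label)} ;")
    # Close the statement: swap the final ";" for ".".
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines)


print(concept_turtle(
    "http://dbpedia.org/resource/Abraham_Lincoln",
    "Abraham Lincoln",
    ["Abe Lincoln", "The Great Emancipator", "Abraham Linkin", "Presedent Lincon"],
    misspellings={"Abraham Linkin", "Presedent Lincon"},
))
```

Keeping the misspelling list separate from the query means the CONSTRUCT query stays unchanged and the decision about which labels to hide is reviewed by a person.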

It’s nice how a single query can pull data from DBpedia to populate a SKOS-based thesaurus with preferred and alternative labels. It makes a good example of how SPARQL can add value to linked data, in this case by reshaping the data to conform to a specialized standard.

## 2 Comments

Nice one Bob! Be sure and post anything else you might have on (quasi-) extracting domain vocabs from DBpedia. Seems a lot better than making up names/URIs from scratch - not only because there will be linkage already in place, but also it’ll save loads of work in looking for synonyms etc.

But what I really want to know - what is that guitar he’s playing!? Does seem to suit his name (and hairstyle).

By Bob DuCharme on February 23, 2011 6:00 PM

Hi Danny, and thanks! No idea about the guitar; I just found that with some searches.