More Picasso paintings in one year than all the Vermeer paintings?

Answering an art history question with SPARQL.

Woman Writing a Letter, with her Maid by Johannes Vermeer

Sometimes a question pops into my head that, although unrelated to computers, could likely be answered with a SPARQL query. I don’t necessarily know the query off the top of my head and have to work it out. I’m going to discuss an example of one that I worked out and the steps that I took, because I wanted to show how I navigated the Wikidata data model to get what I wanted.

On a recent trip to Dublin my wife and I went to Dublin’s wonderful National Gallery of Ireland. Among other paintings we saw Vermeer’s Woman Writing a Letter, with her Maid and Picasso’s Still Life with a Mandolin.

Seeing any Vermeer is a treat because there are so few of them around, and the way he depicts light makes for a huge difference between seeing a picture of the painting and seeing the real thing in front of you. (Remember, when you see these dumb discussions about AI-generated “paintings”: we can discuss whether they’re art or not, but they’re not paintings if there is no paint. They’re PNG and JPG files. If you compare the image above with the Vermeer hanging on the wall at the National Gallery of Ireland you’ll see what a tremendous difference that can be.) The Picasso was also great to see live because it was from his more colorful late cubist period; while some of his related collages included bits of wallpaper, for this one he painted wallpaper-like patterns onto the canvas.

Still Life with a Mandolin

We know that Picasso was very prolific for many decades. This led me to wonder: was there any single year of Picasso’s career where he produced more paintings than Vermeer produced in his whole life? (Judging, in both cases, by surviving paintings that we have record of.)

The Wikipedia page for Vermeer tells us that “only 34 paintings are universally attributed to him today”, so I didn’t need SPARQL for that. The question for me to answer was this: were there any years where Picasso painted more than 34 paintings?

What triples say “Picasso made this painting”?

First I had to identify how Wikidata tells us that Picasso painted a given painting. I started with one of his most famous ones and clicked Wikidata item on the left side of the Guernica (Picasso) Wikipedia page. This showed me that Q175036 is the Wikidata identifier for this painting. I knew that the Wikidata triples with subjects that build on this ID would provide some good clues about developing a query that could count up his paintings per year.

What triples say “It’s a painting”?

I didn’t want to count up all his artworks per year, but just his paintings, so I entered the following query and executed it to see what class Guernica was an instance of. (Note that instead of using rdf:type or a as a property meaning “is an instance of”, Wikidata uses wdt:P31. Being reminded of this was part of my navigation around the Wikidata data model that I mentioned above.)

SELECT * WHERE {
  wd:Q175036 wdt:P31 ?class .
  ?class rdfs:label ?name .
  FILTER (lang(?name) = "en")
}

This showed that it is an instance of wd:Q3305213, or “painting”.
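
As a side note, the Wikidata Query Service also has a built-in label service that can stand in for the rdfs:label pattern and the language filter. The following is a minimal sketch of the same lookup using it. The ?classLabel name follows the service's convention of appending "Label" to a variable name, and this form is specific to the Wikidata endpoint, so don't expect it to work in a generic SPARQL engine.

```sparql
# Same lookup as above, sketched with the Wikidata label service
# (a Wikidata Query Service feature, not standard SPARQL).
SELECT ?class ?classLabel WHERE {
  wd:Q175036 wdt:P31 ?class .   # Guernica is an instance of ?class
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```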

What triples say “It’s by Picasso”?

I went to the Wikipedia page for Picasso, picked Wikidata item, and saw that Picasso’s Wikidata identifier is Q5593.

Next, I did a very simple query for all the data about the painting Guernica:

SELECT * WHERE {
  wd:Q175036 ?p ?o 
} 

The result of this query included “wdt:P170 wd:Q5593”. If wd:Q5593 is Picasso, what is wdt:P170? This is easy enough to find out when executing the query with the Wikidata SPARQL endpoint HTML form: I just clicked on this name in the query result and it showed me that wdt:P170 means “creator”.
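
Clicking works fine for one property, but the property names can also be pulled into the result itself. Here is a rough sketch, assuming the Wikidata convention that each property entity (such as wd:P170) is linked to its wdt: form by wikibase:directClaim. The ?prop and ?propName variable names are only illustrative.

```sparql
# List Guernica's direct (wdt:) properties along with their English names.
SELECT ?p ?propName ?o WHERE {
  wd:Q175036 ?p ?o .
  ?prop wikibase:directClaim ?p ;   # ties the property entity to its wdt: form
        rdfs:label ?propName .
  FILTER (lang(?propName) = "en")
}
```

Because of the join on wikibase:directClaim, this version only returns the wdt: triples and leaves out everything else that the simpler query shows, such as labels and descriptions.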

What triples say what year a painting was created?

The Wikipedia page for Guernica says that it was created in 1937. The earlier result of asking for all the triples about the painting showed that it has a wdt:P571 value of “1 January 1937”, where wdt:P571 means “inception.”

What paintings in what years?

Next, I used this query to list all the paintings by Picasso and the dates they were created:

SELECT * WHERE {
  ?painting wdt:P31 wd:Q3305213 ; # it's a painting
  wdt:P170 wd:Q5593 ;             # by Picasso
  rdfs:label ?title ;
  wdt:P571 ?inceptionDate .
  FILTER (lang(?title) = "en")
} 

This listed them, but the Wikidata endpoint interface was displaying dates like 1913-01-01 as “1 January 1913” (with a suspicious number of them having that “1 January”, so that may be a default when the month and day were unavailable). I just wanted the year if I was going to look for total paintings per year. I eventually realized that the date values were in ISO 8601 format, so I tried pulling out the year values with this query:

SELECT * WHERE {
  ?painting wdt:P31 wd:Q3305213 ; # it's a painting
  wdt:P170 wd:Q5593 ;             # by Picasso
  rdfs:label ?title ;
  wdt:P571 ?inceptionDate .
  BIND(substr(?inceptionDate,1,4) AS ?year)
  FILTER (lang(?title) = "en")
} 

The dates still looked inconsistent, so I stored that query in the file pquery1.rq and used curl to run the query from my shell command line so that I could see the raw result:

curl --data-urlencode "query@pquery1.rq" https://query.wikidata.org/sparql

That showed me that the dates weren’t just arranged in ISO 8601 format; they were actually typed as xsd:dateTime values rather than plain strings. I revised the query above to convert them to regular strings with str() before pulling out the year value, and with this query the ?year values came out as the four-digit numbers I wanted to see:

SELECT * WHERE {
  ?painting wdt:P31 wd:Q3305213 ; # it's a painting
  wdt:P170 wd:Q5593 ;             # by Picasso
  rdfs:label ?title ;
  wdt:P571 ?inceptionDate .
  # added str() call to following
  BIND(substr(str(?inceptionDate),1,4) AS ?year)
  FILTER (lang(?title) = "en")
} 
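
Because the values are typed as dates, SPARQL's built-in YEAR() function is another option to consider in place of the string conversion. This is a sketch rather than the query behind the results in this post. It assumes that every wdt:P571 value is a proper xsd:dateTime, and it returns ?year as an integer instead of a string.

```sparql
SELECT * WHERE {
  ?painting wdt:P31 wd:Q3305213 ; # it's a painting
  wdt:P170 wd:Q5593 ;             # by Picasso
  rdfs:label ?title ;
  wdt:P571 ?inceptionDate .
  # YEAR() reads the year directly from a typed date value
  BIND(YEAR(?inceptionDate) AS ?year)
  FILTER (lang(?title) = "en")
}
```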

How many Picasso paintings per year?

I wasn’t really interested in the painting titles or their month and day of inception. I had everything I needed to answer my original question: how many paintings did Picasso do each year?

SELECT ?year (COUNT(?painting) AS ?paintingsInYear) WHERE {
  ?painting wdt:P31 wd:Q3305213 ; # it's a painting
  wdt:P170 wd:Q5593 ;             # by Picasso
  wdt:P571 ?inceptionDate .
  BIND(substr(str(?inceptionDate),1,4) AS ?year)
} 
GROUP BY ?year
ORDER BY DESC(?paintingsInYear)

Here are the first few rows of the results:

year  paintingsInYear
1901  52
1906  33
1908  31
1909  30
1905  25
1914  24
1903  23

So there’s the answer: we know of more Picasso paintings from 1901 than we know of Vermeer paintings from his whole life, and in 1906 Picasso came close to the Vermeer total. The first decade of the twentieth century was a very busy time for Picasso. (I then found a website showing his paintings by year; the 1901 page is interesting.)
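
The comparison with Vermeer's total could also be pushed into the query itself. A minimal sketch, assuming the figure of 34 from the Wikipedia page quoted earlier, uses a HAVING clause to keep only the years that beat it:

```sparql
SELECT ?year (COUNT(?painting) AS ?paintingsInYear) WHERE {
  ?painting wdt:P31 wd:Q3305213 ; # it's a painting
  wdt:P170 wd:Q5593 ;             # by Picasso
  wdt:P571 ?inceptionDate .
  BIND(substr(str(?inceptionDate),1,4) AS ?year)
}
GROUP BY ?year
HAVING (COUNT(?painting) > 34)    # more than the Vermeer total
ORDER BY DESC(?paintingsInYear)
```

Going by the counts listed above, only 1901 would pass this filter, although the data in Wikidata keeps changing, so a later run may give different numbers.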

The eye icon dropdown “Display result as” menu on the left side of the Wikidata Query Service page offers other ways to visualize the data. I changed the ORDER BY line in the last query to sort by the ?year value, ran the query, and then picked “line chart” from the dropdown and got this graph of the number of Picasso’s paintings per year:

Picasso paintings per year

This makes it even clearer how busy he was in the first decade of that century.
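
For anyone who wants to reproduce the chart, the sorted-by-year variant would look something like the following. It is a sketch of the change described above rather than a copy of the exact query used. Because ?year is a four-character string here, plain string ordering puts the years in chronological order.

```sparql
SELECT ?year (COUNT(?painting) AS ?paintingsInYear) WHERE {
  ?painting wdt:P31 wd:Q3305213 ; # it's a painting
  wdt:P170 wd:Q5593 ;             # by Picasso
  wdt:P571 ?inceptionDate .
  BIND(substr(str(?inceptionDate),1,4) AS ?year)
}
GROUP BY ?year
ORDER BY ?year                    # chronological order for the line chart
```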

There are other display types, and of course, many other painters. There is a lot more fun to have here!

The most difficult part of creating such a query is the cryptic nature of the entity and property IDs: a single letter followed by a few digits. If the resources and properties used more readable names such as “Guernica (painting)” and “creator” instead, queries would be more intuitive and easier to write, at least for those of us who speak English. But Wikidata is designed to be usable by everyone in the world, not just English speakers, and that’s a good thing. I won’t complain.

One more note: I included a digital-humanities tag with this post because it’s about using technology to answer an art history question. The field is often about accumulating data from different sources so that people can identify new patterns, but as the data in Wikidata accumulates more and more, there are more and more great things we can do with this wonderful source.

