Wikipedia:WikiProject Women in Red/Wikidata redlist guide

This Wikidata redlist guide provides step-by-step guidance for creating Women in Red redlists. Although the guide focuses on Women in Red, it may also be useful for creating Wikidata-based lists for other purposes.

Preliminaries

In order to create a Wikidata-based redlist, you will need:

You will use the following tools:

Basics

Simple example

Let's start with a trivial Wikidata list. It will have a single entry for Ada Lovelace and we'll use the following query:

SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P21 wd:Q6581072 .
  ?item wdt:P735 wd:Q346047 .
  ?item wdt:P734 wd:Q1260681 .
}

Click here to launch the Wikidata query

The above query returns every Wikidata item that fulfills these conditions:

  1. Is a human: instance of (P31) human (Q5).
  2. Is a female: sex or gender (P21) female (Q6581072).
  3. Has given name Ada: given name (P735) Ada (Q346047).
  4. Has family name Byron: family name (P734) Byron (Q1260681).

Now that we have a SPARQL query that returns the entries we want, we can create the redlist using {{Wikidata list}} (remember to close it with a {{Wikidata list end}} template):

wikitext
{{Wikidata list
|sparql=SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P21 wd:Q6581072 .
  ?item wdt:P735 wd:Q346047 .
  ?item wdt:P734 wd:Q1260681 .
}
|columns=label:name,P18,description,P106,P569,P570,P19,P20,item:wikidata item
|links=red
|thumb=40
}}
{{Wikidata list end}}

ListeriaBot will take care of updating it automatically, producing the following output:

result

This list is automatically generated from data in Wikidata and is periodically updated by Listeriabot.
Edits made within the list area will be removed on the next update!

name | image | description | occupation | date of birth | date of death | place of birth | place of death | wikidata item
Ada Lovelace | | English mathematician (1815–1852) | mathematician, programmer, poet, computer scientist, inventor, translator, writer, engineer | 1815-12-10 | 1852-11-27 | London | Marylebone | Q7259
End of auto-generated list.

Notice that the query returns only ?item. The columns of the generated table are specified in the |columns= parameter of the {{Wikidata list}} template. See Template:Wikidata list for more information on Wikidata list parameters.
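
Editors who maintain many redlists may prefer to generate the template call with a script. The following is a minimal Python sketch, not part of any Wikipedia tooling: the helper name build_wikidata_list is hypothetical, and it only uses the template parameters shown above. It also assumes that any | inside the query should be written as {{!}}, since MediaWiki reads a bare | inside a template call as a parameter separator.

```python
def build_wikidata_list(sparql, columns, links="red", thumb=40):
    """Assemble a {{Wikidata list}} block as wikitext (hypothetical helper)."""
    # A bare "|" would be read by MediaWiki as a parameter separator,
    # so pipes inside the query are written as the {{!}} magic word.
    escaped = sparql.strip().replace("|", "{{!}}")
    parts = [
        "{{Wikidata list",
        "|sparql=" + escaped,
        "|columns=" + ",".join(columns),
        "|links=" + links,
        "|thumb=" + str(thumb),
        "}}",
        "{{Wikidata list end}}",
    ]
    return "\n".join(parts)


# The Ada Lovelace query and columns from the simple example.
ADA_QUERY = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P21 wd:Q6581072 .
  ?item wdt:P735 wd:Q346047 .
  ?item wdt:P734 wd:Q1260681 .
}
"""
ADA_COLUMNS = [
    "label:name", "P18", "description", "P106", "P569",
    "P570", "P19", "P20", "item:wikidata item",
]

ada_wikitext = build_wikidata_list(ADA_QUERY, ADA_COLUMNS)
print(ada_wikitext)
```

The printed text can be pasted into a wiki page as is; the bot still does the actual list generation.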

Missing articles

In order to list only items without a corresponding article in the English Wikipedia, every redlist needs the following SPARQL fragment:

OPTIONAL { ?w schema:about ?item; schema:isPartOf <https://en.wikipedia.org/>. }
FILTER(!(BOUND(?w)))

You will also see the following equivalent form:

FILTER NOT EXISTS { ?w schema:about ?item; schema:isPartOf <https://en.wikipedia.org/> . }

Number of sites

When looking for notable subjects, it is often useful to look at how many Wikimedia projects have a page for a given item. This number can be retrieved with the following SPARQL fragment:

?item wikibase:sitelinks ?linkcount .

Here's a version of the simple example modified to add a column with the link count:

wikitext
{{Wikidata list
|sparql=SELECT ?item ?linkcount WHERE {
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P21 wd:Q6581072 .
  ?item wdt:P735 wd:Q346047 .
  ?item wdt:P734 wd:Q1260681 .
  ?item wikibase:sitelinks ?linkcount . # number of site links
}
|columns=label:name,P18,description,P106,P569,P570,P19,P20,item:wikidata item,?linkcount:site links
|links=red
|thumb=40
}}
{{Wikidata list end}}
result

This list is automatically generated from data in Wikidata and is periodically updated by Listeriabot.
Edits made within the list area will be removed on the next update!

name | image | description | occupation | date of birth | date of death | place of birth | place of death | wikidata item | site links
Ada Lovelace | | English mathematician (1815–1852) | mathematician, programmer, poet, computer scientist, inventor, translator, writer, engineer | 1815-12-10 | 1852-11-27 | London | Marylebone | Q7259 | 121
End of auto-generated list.

Handling large results

The number of results for a SPARQL query can often be in the thousands or tens of thousands. That is far more than we can handle in a wiki redlist, so we need to cut it down. The number of results of a query can be limited by adding a LIMIT clause to the end. For example, LIMIT 1000 caps the results at 1000.

However, if we use LIMIT alone, the results that make it into the list will be arbitrary, and they might not be the most relevant. So it is a good idea to always apply order criteria. A limit with our recommended order follows:

ORDER BY DESC(?linkcount) ASC(?item)
LIMIT 1000

This limits the results to the top 1000 by number of site links. If two items have the same number of site links, the one whose item identifier sorts first takes precedence; identifiers appear to be compared as text rather than as numbers, which is why Q105142220 is listed before Q27915559 in the Honduras example below. The tie-breaker makes the result deterministic: in the absence of actual data changes, the query always returns the same set of 1000 results. Without it, the bot would repeatedly remove and add back items in subsequent updates.
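
The effect of the tie-breaker can be modelled outside the query service. The Python sketch below is only an illustration of the ordering, not how the query service implements it. The item identifiers and link counts are copied from the Honduras example later in this guide, the function name top_n is hypothetical, and identifiers are assumed to be compared as text, which matches the order shown in that example.

```python
# (item, link count) pairs copied from the Honduras example in this guide,
# deliberately listed out of order.
rows = [
    ("Q27915559", 3),
    ("Q15709765", 2),
    ("Q105142220", 3),
    ("Q54887692", 3),
    ("Q11153510", 2),
]


def top_n(rows, n):
    """Model ORDER BY DESC(?linkcount) ASC(?item) LIMIT n on (item, count) pairs."""
    # Highest link count first; ties are broken by the item ID compared as text.
    ordered = sorted(rows, key=lambda row: (-row[1], row[0]))
    return [item for item, _ in ordered[:n]]


top_three = top_n(rows, 3)
print(top_three)

# Feeding the rows in a different order gives the same answer,
# which is what the tie-breaker is for.
assert top_n(list(reversed(rows)), 3) == top_three
```

Sorting on the link count alone would leave the relative order of tied items up to the engine, so a LIMIT that falls inside a group of ties could pick different items on each run.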

Occupation

One of the most common criteria for redlists is occupation (P106). Check out current redlists by occupation. We specify one or more occupations as follows:

?item wdt:P106 ?occ .
VALUES ?occ {
  wd:Q5468707  # forensic entomologist
  wd:Q27645949 # paleoentomologist
  wd:Q3055126  # entomologist
}

This will include items where occupation (P106) is either forensic entomologist (Q5468707), paleoentomologist (Q27645949), or entomologist (Q3055126). The comments in the query (e.g. # entomologist) are optional, but they can make the query more readable to humans.

Here's a full example of a redlist of 5 women entomologists (see also the actual Entomologists redlist):

wikitext
{{Wikidata list
|sparql=SELECT DISTINCT ?item ?linkcount WHERE {
  ?item wdt:P106 ?occ .
  VALUES ?occ {
    wd:Q5468707  # forensic entomologist
    wd:Q27645949 # paleoentomologist
    wd:Q3055126  # entomologist 
  }
  ?item wdt:P21 wd:Q6581072 .
  ?item wdt:P31 wd:Q5 .
  ?item wikibase:sitelinks ?linkcount .
  OPTIONAL { ?w schema:about ?item; schema:isPartOf <https://en.wikipedia.org/>. }
  FILTER(!(BOUND(?w)))
}
ORDER BY DESC(?linkcount) ASC(?item)
LIMIT 5
|columns=label:name,P18,description,P106,P569,P570,P19,P20,item:wikidata item,?linkcount:site links
|links=red
|thumb=40
}}
{{Wikidata list end}}
result

This list is automatically generated from data in Wikidata and is periodically updated by Listeriabot.
Edits made within the list area will be removed on the next update!

name image description occupation date of birth date of death place of birth place of death wikidata item site links
Ulrike Aspöck
Austrian entomologist entomologist
biologist
1941-07-12 Linz Q21339012 5
Inessa Sharova Soviet and Russian entomologist (1931-2021) entomologist 1931-10-28 2021-06-22 Moscow Q74602140 5
Ottó Merkl
Hungarian entomologist (1957–2021) zoologist
Wikipedian
entomologist
museologist
1957-08-26 2021-02-19 Budapest Budapest Q1176042 4
Irma Allodiatoris Romanian-born Hungarian entomologist, anthropologist, historian of science, bibliographer entomologist
anthropologist
historian of science
bibliographer
1912-02-01 1988-03-07 Arad Budapest Q12349377 4
Yvonne Kranz-Baltensperger Swiss arachnologist and entomologist arachnologist
entomologist
1973 Innsbruck Q14939200 4
End of auto-generated list.

Country

See our country redlists. A simple approach to creating one would be to use the country of citizenship (P27) property. However, Wikidata may be missing the country of citizenship for an item while having other geographical properties that are good enough for our purposes. So we can use a combination of country of citizenship (P27), country (P17), country of origin (P495), country for sport (P1532), and place of birth (P19), the last one through the country (P17) of the birthplace. We can do it with the following SPARQL fragment:

VALUES ?country {
  wd:Q189 # Iceland
}
{
  { ?item (wdt:P27|wdt:P17|wdt:P495|wdt:P1532) ?country. }
  UNION
  { ?item (wdt:P19/wdt:P17) ?country. }
}

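Here's a full example of a redlist of 5 women from Honduras (see also the actual Honduras redlist). Inside the template, each | in the query is written as {{!}}, because a bare | would be read as a template parameter separator:

wikitext
{{Wikidata list
|sparql=SELECT DISTINCT ?item ?linkcount WHERE {
  VALUES ?country {
    wd:Q783
  }
  {
    { ?item (wdt:P27{{!}}wdt:P17{{!}}wdt:P495{{!}}wdt:P1532) ?country. }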
    UNION
    { ?item (wdt:P19/wdt:P17) ?country. }
  }
  ?item wdt:P21 wd:Q6581072 .
  ?item wdt:P31 wd:Q5 .
  ?item wikibase:sitelinks ?linkcount .
  OPTIONAL { ?w schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> . }
  FILTER(!BOUND(?w))
}
ORDER BY DESC(?linkcount) ASC(?item)
LIMIT 5
|columns=label:name,P18,description,P106,P569,P570,P19,P20,item:wikidata item,?linkcount:site links
|links=red
|thumb=40
}}
{{Wikidata list end}}
result

This list is automatically generated from data in Wikidata and is periodically updated by Listeriabot.
Edits made within the list area will be removed on the next update!

name image description occupation date of birth date of death place of birth place of death wikidata item site links
Lidia López
Honduran financier, speaker and politician financier
orator
politician
fiduciary
company auditor
2000 Cortés Department Q105142220 3
Helen Umaña
Honduran university teacher university teacher
writer
1948
1942
Ocotepeque Q27915559 3
Celia Monterrosa
Honduran model model
beauty pageant contestant
1996-01-22 San José de Colinas Q54887692 3
Lena Karyn Gutiérrez Arévalo
politician politician 1977-04-19 Tegucigalpa Q11153510 2
Micaela Josefa Quezada Borjas First Lady 1795 Q15709765 2
End of auto-generated list.
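
The fragments covered so far (human, female, occupation, country, site links, missing article, order and limit) can be combined mechanically. The following Python sketch shows one way to do it. The function names values_block and build_redlist_query are hypothetical, the sketch uses the FILTER NOT EXISTS form of the missing-article check, and it produces plain SPARQL for the query service. Any | in the output would still need to be written as {{!}} before the query is pasted into a {{Wikidata list}} template.

```python
def values_block(variable, qids):
    """Return a VALUES clause listing the given Wikidata item IDs."""
    entries = " ".join("wd:" + qid for qid in qids)
    return "VALUES ?%s { %s }" % (variable, entries)


def build_redlist_query(occupations=(), countries=(), limit=1000):
    """Compose a redlist query from the fragments in this guide (hypothetical helper)."""
    body = [
        "?item wdt:P31 wd:Q5 .",        # human
        "?item wdt:P21 wd:Q6581072 .",  # female
    ]
    if occupations:
        body.append(values_block("occ", occupations))
        body.append("?item wdt:P106 ?occ .")
    if countries:
        body.append(values_block("country", countries))
        body.append("{")
        body.append("  { ?item (wdt:P27|wdt:P17|wdt:P495|wdt:P1532) ?country . }")
        body.append("  UNION")
        body.append("  { ?item (wdt:P19/wdt:P17) ?country . }")
        body.append("}")
    body.append("?item wikibase:sitelinks ?linkcount .")
    # Keep only items without an English Wikipedia article.
    body.append(
        "FILTER NOT EXISTS { ?w schema:about ?item ; "
        "schema:isPartOf <https://en.wikipedia.org/> . }"
    )
    lines = ["SELECT DISTINCT ?item ?linkcount WHERE {"]
    lines += ["  " + line for line in body]
    lines += ["}", "ORDER BY DESC(?linkcount) ASC(?item)", "LIMIT %d" % limit]
    return "\n".join(lines)


# Honduras (Q783), limited to 5 results as in the example above.
honduras_query = build_redlist_query(countries=["Q783"], limit=5)
print(honduras_query)

# Entomologist occupations from the Occupation section.
entomologist_query = build_redlist_query(
    occupations=["Q5468707", "Q27645949", "Q3055126"], limit=5
)
```

Passing both occupations and countries narrows the list to items that match both criteria, since all patterns in the WHERE block must hold together.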

Troubleshooting

Killed by OS for overloading memory

A list may fail to update because the bot ran out of memory. This is signaled by the error Killed by OS for overloading memory on a manual update. This is a known problem of ListeriaBot, and it usually happens because the list has many links to large entities. A workaround is to reduce the number of links to geographical entities, for example by removing the place of death (P20) column.
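
When the same change has to be applied to several redlists, the |columns= value can be trimmed with a few lines of code. This Python sketch uses a hypothetical helper, drop_columns, and the column specification from the examples in this guide. It assumes that each column is written either as a bare name such as P20 or as name:heading.

```python
def drop_columns(columns_spec, to_drop):
    """Remove columns from a |columns= value by property or field name."""
    kept = []
    for column in columns_spec.split(","):
        # "label:name" -> "label"; "P20" -> "P20"
        source = column.split(":", 1)[0].strip()
        if source not in to_drop:
            kept.append(column)
    return ",".join(kept)


SPEC = (
    "label:name,P18,description,P106,P569,P570,P19,P20,"
    "item:wikidata item,?linkcount:site links"
)

# Drop the place of death (P20) column, as suggested above.
lighter_spec = drop_columns(SPEC, {"P20"})
print(lighter_spec)
```

If the error persists, the same call can drop the place of birth (P19) column as well by passing {"P19", "P20"}.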