User talk:Andrew Gray/Archives/121

From Wikipedia, the free encyclopedia


sports problem

Hi, Andrew! I was reading the threads about your analysis at GGTF and WiR, and I started to kind of pull the discussion off track there with my statistical nerdiness so I thought I'd ask here. I was just astounded that 46% of BLPs are athletes. I'm trying to reconcile this in my head, because I'm reeling. Can it possibly be that almost half of BLPs are notable primarily due to their athletic career? That is, this isn't Donald Trump being coded as a golfer and therefore part of that 46%? If 46% of wikipedia's BLPs are for people who are notable primarily for their athletic career, that's not a gender problem, it's a sports problem. --valereee (talk) 09:58, 18 June 2019 (UTC)

@Valereee: It's really quite a startling figure, isn't it? I vaguely suspected it would be high, but nothing like that level. There are 536k athlete articles in total - 33% of all biographies, and about 9% of all articles. (It's a higher proportion of BLPs than all biographies because of the strong recency bias)
One caveat is that our definition of "athlete" is a little debatable. The Wikidata ontology, which we're using here, considers dancers to be athletes, which I guess is something you could argue either way; ditto chess players, racing drivers, etc. I would estimate no more than 20-40k people fall into these "maybe or maybe not what we mean when we say athlete" groups; the headline figure is probably reasonable, give or take 5-10%.
The second caveat is the one you raise - are we picking up people who are really notable for other reasons? In theory the Wikidata "occupation" field should not be used for people unless it is a significant part of their life - my feeling is that there will inevitably be some miscategorisation here, along the lines of your golf example, but it's relatively low level.
I've done some tests by looking for people who have some kind of other non-athletic occupation listed as well. Other than things like "football manager", the most common non-sport-related single occupation was "politician" (2.5k) or "actor" (3.5k, and a third of those are dancers). Given that some athletes do go on to become politicians (or even vice versa), this seems like a plausible sort of number and suggests the classification is reasonably clean.
Looking at the breakdown by field, I make it 171k (association) footballers, 33k (gridiron) footballers, 32k "athletes" in the more restrictive sense, 31k cricketers, 26k baseball players, 20k rugby players, 20k basketball players, 19k ice hockey players. Those groups together give us about two thirds of the total - there will be some overlap for individuals who're listed twice, of course, but it highlights the sort of thing that drives the numbers - large well-documented team sports. And of those, by far the largest proportion seems to be (historic?) football players... Andrew Gray (talk) 22:51, 18 June 2019 (UTC)
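As a quick check on the "about two thirds" estimate, the per-field counts quoted above can be summed and compared with the 536k total. This is only a sketch of the arithmetic: the variable names are illustrative, the figures are the rounded thousands from the comment, and overlap between fields is ignored, as the comment itself cautions.

```python
# Rounded counts (in thousands) as quoted in the comment above.
# Individuals listed under two sports are counted twice here.
by_field_k = {
    "association footballers": 171,
    "gridiron footballers": 33,
    "athletes (restrictive sense)": 32,
    "cricketers": 31,
    "baseball players": 26,
    "rugby players": 20,
    "basketball players": 20,
    "ice hockey players": 19,
}

total_athletes_k = 536  # all athlete biographies, from the earlier figure

combined_k = sum(by_field_k.values())
combined_share = combined_k / total_athletes_k
football_share = by_field_k["association footballers"] / total_athletes_k

print(f"Listed fields: {combined_k}k of {total_athletes_k}k ({combined_share:.1%})")
print(f"Association football alone: {football_share:.1%}")
```

On these rounded figures, association football alone accounts for a little under a third of all athlete biographies.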
As I asked at WiR talk, where does this 46% figure come from? I'm not seeing it at all. The Denelezh figures look like 22% "sports figures" to me, & that's not BLPs, but people born after 1800. So, no, it can't possibly be. We have to be careful with things like this, as the next thing you know, claims like this are plastered over the world's media, and believed. Johnbod (talk) 00:03, 19 June 2019 (UTC)
Johnbod, I actually think 'wikipedia has a sports problem' would be a better thing to be plastered all over the world's media than 'wikipedia is sexist.' --valereee (talk) 10:24, 19 June 2019 (UTC)
Not if it's 'wikipedia has a sports problem (that's twice as big as it actually is)'. Plus I think you'll find journalists and editors won't be very interested in that story. Can we confirm what the actual figure is? To avoid misleading quick skimmers, it should be corrected where you have posted it. Johnbod (talk) 13:27, 19 June 2019 (UTC)
@Johnbod: I haven't written this up properly yet - hoping to find some time soon - but it followed on from the gender work here, when I was trying to extend it with some numbers on occupational subgroups. (It worked for politicians and athletes, broke down for researchers, & as I was most interested in them I didn't push further). A worked set of numbers follows, to make it clear how I got there...
As you note, denelezh can't get us "living people". For all enwiki biographies, denelezh gives us 536,346 athletes out of 1,632,072 people - ~33% of all biographies on enwiki. The 22% figure you've got, I think, is the share of "all biographies with any sitelink" which are athletes. (Apparently other projects are less sports-oriented than we are, though "any sitelink" includes Commons & Wikisource so those will probably inflate the share for artists/writers). NB these numbers do include pre-1800 data, it's just that it's not shown on the breakdown lines.
To find the numbers for BLPs, I looked at the intersection of articles identified as athletes using the same process as denelezh (occupation tagging in Wikidata), and articles in Category:Living people. Overall, I got 64,871 matches for "female athletes", and 348,968 for "male athletes"; total 413,839. At the time I did this there were 906,720 entries in Category:Living people, so 45.64% of BLPs are identifiable as athletes, which rounds up to our 46% figure. (I was only looking at M/F ratios so this does not include anyone not coded as M/F, but I do not anticipate that would change the overall figures substantially - at most ±0.1%).
As valereee noted, there is still the underlying question of people who are counted as "sports" but who we're mostly interested in for other reasons. It's hard to say for sure how many of these there might be, but my feeling is that it's not overwhelmingly high; the relatively low rates of overlap between athletic and non-athletic occupation entries seem to support that. It would be quite reasonable to round down the overall numbers to take account of this, but even with a generous estimate for "people who are primarily famous in other fields", I wouldn't think it would get much below, say, ~40% of BLPs. I'll see if I can think of more rigorous ways to investigate this. As I said, these numbers did surprise me somewhat, but overall, I find them reasonably plausible. Andrew Gray (talk) 20:37, 19 June 2019 (UTC)
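The worked numbers in this comment can be reproduced in a few lines. The sketch below uses only the counts quoted above; the variable names are illustrative, and the final step (how many entries would have to be reclassified to bring the share down to 40%) is a derived figure, not one stated in the discussion.

```python
# Counts quoted in the comment above.
athletes_all = 536_346        # athletes among all enwiki biographies (denelezh)
biographies_all = 1_632_072   # all enwiki biographies (denelezh)
female_athletes = 64_871      # athletes also in Category:Living people, coded F
male_athletes = 348_968       # athletes also in Category:Living people, coded M
living_people = 906_720       # entries in Category:Living people at the time

overall_share = athletes_all / biographies_all
blp_athletes = female_athletes + male_athletes
blp_share = blp_athletes / living_people

# Derived for illustration: entries that would need to be "not really
# athletes" for the BLP share to fall to exactly 40%.
forty_percent_count = living_people * 40 // 100
reclassified_for_40 = blp_athletes - forty_percent_count

print(f"Athletes among all biographies: {overall_share:.2%}")
print(f"Athletes among BLPs: {blp_athletes:,} / {living_people:,} = {blp_share:.2%}")
print(f"Reclassifications needed to reach 40%: {reclassified_for_40:,}")
```

So the ~40% floor suggested above would correspond to roughly 51,000 living-person entries being primarily notable for something other than sport.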
Thanks, Andrew, but I must say I find it very hard to believe a figure this high. I wonder if there is a double-counting issue? Have you accounted for the lack of death dates problem? I suspect this affects athletes much more than other categories, for obvious reasons. What is the athletic % in the over-95 group for example, or over 100? One often finds people missing from Category:Living people btw - many editors don't add it. But thanks for doing this stuff, no doubt there are many wrinkles we will learn to identify. It's certainly worth nailing a correct figure. Johnbod (talk) 21:31, 19 June 2019 (UTC)
@Johnbod: Other than the issue of including people who're "not primarily athletes", there shouldn't be any double-counting based on people being athletes in multiple different ways - the methodology spits out a single list of distinct WP page titles/IDs and I've confirmed they're all unique.
WRT dates, I've only tested against the presence of Category:Living people and trusted to that being reasonably well maintained - at least, probably better maintained than the alternatives! I definitely agree that "we don't know if they're still living" may be an issue, particularly with all the "man who played two games for Partick Thistle in 1951 and then became a welder" type articles. I'll see if I can work out some way to isolate that group (say "born before 1940 or active before 1960, believed living"?) and run the stats separately, though it may be tricky to do so. Andrew Gray (talk) 11:46, 20 June 2019 (UTC)
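One way the proposed split could be set up is sketched below. Everything here is hypothetical: the `Person` record, its fields, the 1940 cutoff as a default, and the sample entries are invented for illustration, and this is not the process actually used for the figures above. It only shows the shape of the comparison: compute the athlete share among entries in the living-people category, separately for older and newer birth cohorts.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Person:
    """Hypothetical record for one biography; not a real data schema."""
    title: str
    birth_year: Optional[int]   # None when no birth date is recorded
    is_athlete: bool            # e.g. derived from occupation tagging
    in_living_category: bool    # article is in Category:Living people


def athlete_share(people: Iterable[Person]) -> float:
    """Share of athletes among entries in the living-people category."""
    living = [p for p in people if p.in_living_category]
    if not living:
        return 0.0
    return sum(p.is_athlete for p in living) / len(living)


def split_by_birth(people: Iterable[Person],
                   cutoff: int = 1940) -> Tuple[List[Person], List[Person]]:
    """Split into (born before cutoff, everyone else incl. unknown birth year)."""
    older: List[Person] = []
    rest: List[Person] = []
    for p in people:
        if p.birth_year is not None and p.birth_year < cutoff:
            older.append(p)
        else:
            rest.append(p)
    return older, rest


# Invented sample data, purely to exercise the functions.
sample = [
    Person("Example A", 1930, True, True),
    Person("Example B", 1935, False, True),
    Person("Example C", 1985, True, True),
    Person("Example D", 1990, True, True),
    Person("Example E", 1975, False, True),
    Person("Example F", None, True, True),
    Person("Example G", 1920, True, False),
]

older, rest = split_by_birth(sample)
print(f"All living: {athlete_share(sample):.0%}")
print(f"Born before 1940: {athlete_share(older):.0%}")
print(f"Born 1940 or later, or unknown: {athlete_share(rest):.0%}")
```

Entries with no recorded birth year are deliberately kept out of the "older" group in this sketch; whether that is the right choice for the real data is an open question.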

Wikidata weekly summary #370

Articles you might like to edit, from SuggestBot

Note: All columns in this table are sortable, allowing you to rearrange the table so the articles most interesting to you are shown at the top. All images have mouse-over popups with more information. For more information about the columns and categories, please consult the documentation and please get in touch on SuggestBot's talk page with any questions you might have.

Views/Day Quality Title Tagged with…
9 Quality: Low, Assessed class: Start, Predicted class: Start Baron Inglewood (talk) Add sources
6 Quality: Medium, Assessed class: C, Predicted class: C Burdett baronets (talk) Add sources
9 Quality: Medium, Assessed class: Start, Predicted class: B Blackett baronets (talk) Add sources
26 Quality: Medium, Assessed class: Start, Predicted class: C Dashwood baronets (talk) Add sources
4 Quality: Low, Assessed class: Start, Predicted class: Start Baron Alington (talk) Add sources
81 Quality: Medium, Assessed class: Start, Predicted class: C Marquess of Londonderry (talk) Add sources
44 Quality: Medium, Assessed class: B, Predicted class: B 8th Fighter Squadron (talk) Cleanup
259 Quality: Medium, Assessed class: Start, Predicted class: C Statehood movement in Puerto Rico (talk) Cleanup
32 Quality: Medium, Assessed class: Start, Predicted class: B Culture of Washington, D.C. (talk) Cleanup
63 Quality: Low, Assessed class: Stub, Predicted class: Start Agriculture in Germany (talk) Expand
44 Quality: Medium, Assessed class: Start, Predicted class: C Zoltán Dani (talk) Expand
46 Quality: Medium, Assessed class: Start, Predicted class: C Capital punishment in Sweden (talk) Expand
541 Quality: Medium, Assessed class: Start, Predicted class: C United States Air Force Thunderbirds (talk) Unencyclopaedic
200 Quality: Medium, Assessed class: Start, Predicted class: B Historiography of the British Empire (talk) Unencyclopaedic
201 Quality: Medium, Assessed class: C, Predicted class: B English-language spelling reform (talk) Unencyclopaedic
4 Quality: Medium, Assessed class: Start, Predicted class: C Charles Cardwell McCabe (talk) Merge
3 Quality: Low, Assessed class: Start, Predicted class: Start Carlton Tower and Portman Hotel shootings (talk) Merge
42 Quality: Medium, Assessed class: C, Predicted class: C Singing game (talk) Merge
35 Quality: Low, Assessed class: Start, Predicted class: Start Capital punishment in Iceland (talk) Wikify
307 Quality: Medium, Assessed class: C, Predicted class: C Kaylani Lei (talk) Wikify
887 Quality: Medium, Assessed class: C, Predicted class: C Belladonna (actress) (talk) Wikify
2 Quality: Low, Assessed class: NA, Predicted class: Start Andy McQuade (talk) Orphan
2 Quality: Medium, Assessed class: Start, Predicted class: C A M Nurul Islam (talk) Orphan
3 Quality: Low, Assessed class: Stub, Predicted class: Stub Anđelko Ćuk (talk) Orphan
8 Quality: Low, Assessed class: Stub, Predicted class: Stub Law and Justice Youth Forum (talk) Stub
6 Quality: Low, Assessed class: Stub, Predicted class: Start Sir Thomas Reade, 4th Baronet (talk) Stub
4 Quality: Low, Assessed class: Stub, Predicted class: Stub Alfred Arnold (talk) Stub
4 Quality: Low, Assessed class: Stub, Predicted class: Stub Bill Shanahan (talk) Stub
4 Quality: Low, Assessed class: Stub, Predicted class: Stub Rina De Liguoro (talk) Stub
5 Quality: Medium, Assessed class: Stub, Predicted class: C Wilfrid Normand, Baron Normand (talk) Stub

SuggestBot picks articles in a number of ways based on other articles you've edited, including straight text similarity, following wikilinks, and matching your editing patterns against those of other Wikipedians. It tries to recommend only articles that other Wikipedians have marked as needing work. We appreciate that you have signed up to receive suggestions regularly; your contributions make Wikipedia better — thanks for helping!

If you have feedback on how to make SuggestBot better, please let us know on SuggestBot's talk page. -- SuggestBot (talk) 23:32, 17 June 2019 (UTC)

Articles you might like to edit, from SuggestBot

Views/Day Quality Title Tagged with…
6 Quality: Low, Assessed class: Start, Predicted class: Start Alexander Tsiurupa (talk) Add sources
13 Quality: Medium, Assessed class: Start, Predicted class: C Brooke baronets (talk) Add sources
5 Quality: Medium, Assessed class: Start, Predicted class: C Lowther baronets (talk) Add sources
7 Quality: Medium, Assessed class: Start, Predicted class: C Sydney Arnold, 1st Baron Arnold (talk) Add sources
5 Quality: Low, Assessed class: Start, Predicted class: Start Cave baronets (talk) Add sources
3 Quality: Medium, Assessed class: Start, Predicted class: C 1928 St Ives by-election (talk) Add sources
199 Quality: High, Assessed class: C, Predicted class: FA Premiership of Theresa May (talk) Cleanup
146 Quality: High, Assessed class: GA, Predicted class: GA Zenbook (talk) Cleanup
9 Quality: Medium, Assessed class: Start, Predicted class: C Ralph Julian Rivers (talk) Cleanup
44 Quality: Medium, Assessed class: Start, Predicted class: C Plymouth City Council (talk) Expand
85 Quality: Low, Assessed class: Stub, Predicted class: Start Florence Parly (talk) Expand
4,487 Quality: High, Assessed class: C, Predicted class: GA Game of Thrones (season 7) (talk) Expand
9 Quality: Medium, Assessed class: Start, Predicted class: C Louisiana Farm Bureau Federation (talk) Unencyclopaedic
70 Quality: Medium, Assessed class: C, Predicted class: C Mark Clarke (politician) (talk) Unencyclopaedic
21 Quality: Medium, Assessed class: Start, Predicted class: C Wu Chinese-speaking people (talk) Unencyclopaedic
89 Quality: Low, Assessed class: Start, Predicted class: Start Leave Means Leave (talk) Merge
28 Quality: Low, Assessed class: NA, Predicted class: Start Dabqaad (talk) Merge
4 Quality: Low, Assessed class: NA, Predicted class: Stub Fateless Records (talk) Merge
1,550 Quality: Medium, Assessed class: B, Predicted class: C ITER (talk) Wikify
3 Quality: Medium, Assessed class: NA, Predicted class: C Ralph Waldo Swetman (talk) Wikify
25 Quality: Low, Assessed class: Stub, Predicted class: Start 1998 Southwark London Borough Council election (talk) Wikify
6 Quality: Medium, Assessed class: NA, Predicted class: C Psycho Village (talk) Orphan
3 Quality: Low, Assessed class: Stub, Predicted class: Start Edmund Storms (talk) Orphan
2 Quality: Low, Assessed class: Start, Predicted class: Start Alam Udang Bum (talk) Orphan
19 Quality: Low, Assessed class: Stub, Predicted class: Stub Capital punishment in Peru (talk) Stub
18 Quality: Low, Assessed class: Stub, Predicted class: Stub Capital punishment in Cape Verde (talk) Stub
26 Quality: Low, Assessed class: Stub, Predicted class: Stub Merrily We Roll Along (song) (talk) Stub
8 Quality: Medium, Assessed class: Stub, Predicted class: C James Cavendish (MP for Derby) (talk) Stub
23 Quality: Low, Assessed class: Stub, Predicted class: Stub Capital punishment in Macau (talk) Stub
2 Quality: Low, Assessed class: Stub, Predicted class: Start Ralph Freman (1666–1742) (talk) Stub

If you have feedback on how to make SuggestBot better, please let us know on SuggestBot's talk page. -- SuggestBot (talk) 23:38, 24 June 2019 (UTC)

The June 2019 Signpost is out!

Wikidata weekly summary #371

WikiCup 2019 July newsletter

The third round of the 2019 WikiCup has now come to an end. The 16 users who made it to the fourth round needed to score at least 68 points, which is substantially lower than last year's 227 points. Our top scorers in round 3 were:

  • Cas Liber, our winner in 2016, with 500 points derived mainly from a featured article and two GAs on natural history topics
  • Adam Cuerden, with 480 points, a tally built on 16 featured pictures, the result of meticulous restoration work
  • SounderBruce, a finalist in the last two years, with 306 points from a variety of submissions, mostly related to sport or the State of Washington
  • Usernameunique, with 305 points derived from a featured article and two GAs on archaeology and related topics

Contestants managed 4 (5) featured articles, 4 featured lists, 18 featured pictures, 29 good articles, 50 DYK entries, 9 ITN entries, and 39 good article reviews. As we enter the fourth round, remember that any content promoted after the end of round 3 but before the start of round 4 can be claimed in round 4. Please also remember that you must claim your points within 14 days of "earning" them, and it is imperative to claim them in the correct round; one FA claim had to be rejected because it was incorrectly submitted (claimed in Round 3 when it qualified for Round 2), so be warned! When doing GARs, please make sure that you check that all the GA criteria are fully met.

If you are concerned that your nomination—whether it is at good article nominations, a featured process, or anything else—will not receive the necessary reviews, please list it on Wikipedia:WikiCup/Reviews Needed (remember to remove your listing when no longer required). Questions are welcome on Wikipedia talk:WikiCup, and the judges are reachable on their talk pages or by email. Good luck! If you wish to start or stop receiving this newsletter, please feel free to add or remove your name from Wikipedia:WikiCup/Newsletter/Send. Godot13 (talk), Sturmvogel 66 (talk), Vanamonde (talk) and Cwmhiraeth (talk). MediaWiki message delivery (talk) 20:11, 2 July 2019 (UTC)

Articles you might like to edit, from SuggestBot

Views/Day Quality Title Tagged with…
1,839 Quality: Medium, Assessed class: C, Predicted class: C Brexit Party (talk) Add sources
20 Quality: Medium, Assessed class: Start, Predicted class: B Labour candidates and parties in Canada (talk) Add sources
3 Quality: Low, Assessed class: Stub, Predicted class: Start Mervyn Manningham-Buller (talk) Add sources
38,034 Quality: High, Assessed class: B, Predicted class: GA Rory Stewart (talk) Add sources
8 Quality: Medium, Assessed class: Start, Predicted class: C Massey Lopes (talk) Add sources
32 Quality: Low, Assessed class: Stub, Predicted class: Start ENEA (Italy) (talk) Add sources
204 Quality: Medium, Assessed class: B, Predicted class: C Elmendorf Air Force Base (talk) Cleanup
14 Quality: High, Assessed class: C, Predicted class: FA Tollmann's bolide hypothesis (talk) Cleanup
108 Quality: Medium, Assessed class: Start, Predicted class: B Religion and capital punishment (talk) Cleanup
776 Quality: Medium, Assessed class: C, Predicted class: B Colonial history of the United States (talk) Expand
4 Quality: Medium, Assessed class: Start, Predicted class: C Henry Slesser (talk) Expand
133 Quality: Medium, Assessed class: Start, Predicted class: B Tenant farmer (talk) Expand
621 Quality: Medium, Assessed class: NA, Predicted class: C Isotopes of hydrogen (talk) Unencyclopaedic
265 Quality: Medium, Assessed class: B, Predicted class: B Political status of Puerto Rico (talk) Unencyclopaedic
59 Quality: Medium, Assessed class: NA, Predicted class: C Immigration Enforcement (talk) Unencyclopaedic
796 Quality: High, Assessed class: C, Predicted class: FA Western philosophy (talk) Merge
260 Quality: Medium, Assessed class: C, Predicted class: C Simultaneous multithreading (talk) Merge
3 Quality: Low, Assessed class: Start, Predicted class: Start Moradei (talk) Merge
5 Quality: Medium, Assessed class: NA, Predicted class: C Philip Pedley (talk) Wikify
8 Quality: Medium, Assessed class: Start, Predicted class: C Nathaniel Bliss (talk) Wikify
468 Quality: Medium, Assessed class: Start, Predicted class: C Aurora Snow (talk) Wikify
2 Quality: Low, Assessed class: Stub, Predicted class: Stub Amado Tame Shear (talk) Orphan
2 Quality: Low, Assessed class: Start, Predicted class: Stub Abdul Attah (talk) Orphan
3 Quality: Low, Assessed class: Stub, Predicted class: Start Arrowhead Pawn Shop (talk) Orphan
4 Quality: Low, Assessed class: Stub, Predicted class: Stub Leeds Intelligencer (talk) Stub
3 Quality: Low, Assessed class: Stub, Predicted class: Start Giles Green (talk) Stub
51 Quality: Low, Assessed class: Stub, Predicted class: Start Patrick O'Flynn (talk) Stub
2 Quality: Low, Assessed class: Stub, Predicted class: Stub Brodie Westen (talk) Stub
9 Quality: Low, Assessed class: Stub, Predicted class: Stub Paul Johnson (United States Air Force) (talk) Stub
4 Quality: Low, Assessed class: Stub, Predicted class: Stub Wes Stevens (talk) Stub

If you have feedback on how to make SuggestBot better, please let us know on SuggestBot's talk page. -- SuggestBot (talk) 23:25, 1 July 2019 (UTC)

Wikidata weekly summary #372