Wikipedia:Bots/Requests for approval/ShortDescBot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard. The result of the discussion was Approved.
Operator: MichaelMaggs
Time filed: 13:52, Monday, December 14, 2020 (UTC)
Function overview: Add short description to pages in Moth categories that are currently lacking one
Automatic, Supervised, or Manual: Automatic, after pre-review
Programming language(s): Python (Pywikibot)
Source code available: GitHub
Links to relevant discussions (where appropriate): WikiProject. Also noted on the WP short description page
Edit period(s): One time
Estimated number of pages affected: 26,000
Namespace(s): Mainspace
Exclusion compliant (Yes/No): Yes
Function details: This is the first of a series of proposed bot tasks intended to make headway in adding short descriptions to the 3.5 million articles that still don’t have one. The gap matters to mobile users in particular, since a large number of articles still have no descriptive or disambiguating text under the title when a search is carried out. I have some experience of working with short descriptions, having added some 10,000 so far, most of them semi-manually with JWB and the short description helper gadget.
The moths seem a good place to start since suitably-precise short descriptions can’t trivially be generated from existing infoboxes (even for articles where one exists), at least without expensive Lua calls. This task will skip over all pages that already have an existing short description. The bot deals with Wikipedia short descriptions only, and doesn't make use of Wikidata short descriptions in any way. I could add Bots exclusion-compliance if needed, but that doesn't seem appropriate here.
The aim is to keep the new descriptions simple so that they can be added to many articles quickly, while still maintaining a low error rate. The procedure is, on a category-by-category basis:
- Run the bot in trial mode, exporting all of the bot-proposed changes to a local spreadsheet
- Review for obvious errors, adjust the code, and repeat the first step until the automated error rate is sufficiently low
- Manually remove any remaining evident errors from the list
- Run the bot in edit mode, making changes only to the pages in the final corrected list.
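The trial-then-edit procedure listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the bot's actual code: propose_description, run_trial, run_edit and the phrases being matched are all hypothetical, and the save callback stands in for whatever actually writes to the wiki.

```python
import csv


def propose_description(lead_sentence):
    """Hypothetical classifier: return a short description, or None to skip."""
    lead = lead_sentence.lower()
    if "is a genus of moth" in lead:
        return "Genus of moths"
    if "is a species of moth" in lead or "is a moth" in lead:
        return "Species of moth"
    return None  # lead sentence not recognised, so the page is skipped


def run_trial(pages, out):
    """Trial mode: write proposals to a CSV for manual review; edit nothing."""
    writer = csv.writer(out)
    writer.writerow(["title", "description"])
    for title, lead in pages.items():
        description = propose_description(lead)
        if description is not None:
            writer.writerow([title, description])


def run_edit(reviewed, save):
    """Edit mode: apply only the rows that survived manual review."""
    count = 0
    for row in csv.DictReader(reviewed):
        save(row["title"], row["description"])
        count += 1
    return count
```

Keeping the two modes separate means the file passed to run_edit is exactly the reviewed list, so rows deleted during review never reach the wiki.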
The moth articles are well structured, and it’s possible to identify “Species of moth” and “Genus of moths” with near 100% accuracy. You can see a sample of 200 or so proposed edits from Category:Moths of the United States at User:MichaelMaggs/Moths; note that the bot correctly identifies several articles as genus which Wikidata wrongly has as species. Of the 837 target articles in that category, the bot is able to fix over 98%, with just a few being skipped where it wasn't quite able to extract the first sentence of the lead.
Discussion
My initial reaction was that this should be possible with taxobox directly, but as noted in the discussions and function details it's difficult to do cheaply, so the bot makes sense. There is clear consensus for this specific task as well as for prior bots like this, so no concerns there.
Reviewing the list from User:MichaelMaggs/Moths, the cases where the bot and Wikidata differ (Apreta, Apocrisias, Abrenthia) are all monotypic genera. Our convention is for the article to be titled after the genus, but Wikidata doesn't seem to share this as far as I can tell; for example, it has separate items for the species and the genus (i.e. it may be that we are associating these articles with the wrong Wikidata items and not that the Wikidata items are wrong). I'm not sure it would be incorrect for such a short description to say "species" instead of "genus" (they are, in a sense, the same thing); in fact, the example from the guideline of a monotypic order, Amphionides, actually has "monotypic species" in its short description. I don't think what you are doing is wrong or the bot should be changed, but I'm wondering if it points to deeper issues with our categorization that might need to be noted and addressed later.
Exclusion compliance indeed seems unlikely to be an issue, but it is cheap to add and serves as an extra safety check. As you'll be editing the mainspace and lots of pages, I recommend you add it.
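For readers unfamiliar with exclusion compliance, the idea can be sketched as below. This is a deliberately simplified illustration that assumes only the most common forms of the {{bots}} and {{nobots}} templates; bot_may_edit and BOTS_RE are hypothetical names, and the real template has options not handled here. Pywikibot is understood to provide its own check (Page.botMayEdit()), which would normally be preferred over hand-rolled parsing.

```python
import re

# Matches {{nobots}} and {{bots|...}}; a simplification of the real syntax.
BOTS_RE = re.compile(r"\{\{\s*(nobots|bots)\s*(?:\|([^}]*))?\}\}", re.IGNORECASE)


def bot_may_edit(wikitext, bot_name):
    """Return False if the page text opts out of edits by bot_name."""
    wanted = {"all", bot_name.lower()}
    for match in BOTS_RE.finditer(wikitext):
        if match.group(1).lower() == "nobots":
            return False  # conservative assumption: any {{nobots}} blocks the bot
        for param in (match.group(2) or "").split("|"):
            key, _, value = param.partition("=")
            key = key.strip().lower()
            names = {name.strip().lower() for name in value.split(",")}
            if key == "deny" and names & wanted:
                return False
            if key == "allow" and not names & wanted:
                return False  # covers allow=none and allow lists without this bot
    return True
```

A bot would run a check of this kind just before saving and skip the page when it returns False.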
I also did a quick code review. I didn't find any major issues, but here are some suggestions:
- When we add {{Short description}}, should we set the |bot=ShortDescBot parameter?
- shortdesc_exists() only checks for {{Short description}} in the lead section. While it would be against the MOS, to be safe I think we should check it for anywhere in the page.
I also wanted to point out a few Python conventions to encourage cleaner code, unrelated to functionality. Feel free to ignore these:
- The script defines several global options (like required_words) but then passes them as function parameters with the same name. This variable shadowing isn't necessary and can be confusing; you can use the globals in the function body directly. This can greatly simplify your function signatures.
- On the subject of globals, there's no need to use the global keyword (e.g. global wikipedia) in a function unless you are assigning a value to that name in that function. Python knows to access names globally if they aren't defined in the function body.
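The two points about globals can be illustrated with a short sketch. Only the name required_words comes from the script under review; its value here, the pages_checked counter and both functions are hypothetical.

```python
required_words = ("moth",)  # module-level option that is only ever read
pages_checked = 0           # module-level counter that is reassigned below


def has_required_word(lead):
    # Reading a global needs neither a `global` statement nor a
    # same-named parameter; Python finds the name at module level.
    return any(word in lead.lower() for word in required_words)


def record_check():
    # Rebinding a global name is the one case that needs `global`.
    global pages_checked
    pages_checked += 1
    return pages_checked
```

Without the global statement, the += in record_check would raise UnboundLocalError, because assigning to a name inside a function makes that name local.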
Thanks! (Please ping me if responding.) — Earwig talk 07:06, 18 December 2020 (UTC)
- Just to note that the way I coded this originally, in [1] lines 22-27 and 220-227, looked at the page info rather than whether the template was in the lead section. I think this is the safer way to do it than the new check, as it works even in cases where a template auto-includes a short description without using the template. Thanks. Mike Peel (talk) 11:23, 18 December 2020 (UTC)
- Yes, that's a much safer check. Good point. — Earwig talk 18:35, 18 December 2020 (UTC)
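A page-info style check of the kind described above might look like the sketch below. It is not the code linked in the discussion. It assumes the MediaWiki API's prop=description query, which is understood to report a description together with a descriptionsource of "local" or "central"; build_query and has_local_short_description are hypothetical helper names, and the network request itself is left out so that the logic can be read and tested offline.

```python
def build_query(title):
    """Parameters for a MediaWiki API request for a page's description."""
    return {
        "action": "query",
        "prop": "description",
        "titles": title,
        "format": "json",
    }


def has_local_short_description(api_response):
    """True if any returned page has a locally defined short description.

    Assumes the default JSON layout, where query.pages is a dict keyed
    by page ID. Descriptions sourced from Wikidata ("central") are
    ignored, matching the bot's stated scope.
    """
    pages = api_response.get("query", {}).get("pages", {})
    for page in pages.values():
        if page.get("descriptionsource") == "local" and page.get("description"):
            return True
    return False
```

Because this looks at what the software has recorded for the page rather than at the wikitext, it also covers descriptions supplied indirectly by another template.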
- The Earwig, Mike Peel: thanks for your comments, and for the most helpful suggested improvements to the code. I'll make those changes.
- Interesting question about the |bot= parameter. I've never once come across that on any page I've looked at, though PearBOT 5 seems to have used it, and to be honest I can't see that it's of much use. All it does is to clutter the wikicode permanently with the bot/username that made the change - information which is easily available in the history, and which isn't so far as I know permanently recorded in connection with other bot edits. The parameter is 'optional' according to the template, and I'd prefer not to use it unless BAG recommends that I should. MichaelMaggs (talk) 18:00, 18 December 2020 (UTC)
- I agree that it clutters wikicode and requires a separate edit to clean up later. The main benefit seems to be categorization, but that can be similarly achieved with a query over the user contributions if we end up in a situation where we need to mass-revert or something. Previous BRFAs (1, 2) also did not add it. If no one else objects, I'm fine leaving it out. — Earwig talk 18:35, 18 December 2020 (UTC)
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Let's give it a shot after the suggested changes are made. — Earwig talk 18:35, 18 December 2020 (UTC)
- Trial complete. User:The Earwig Thanks. Here are the results of the trial run. I've also re-published the source code in case you want to review that. MichaelMaggs (talk) 14:31, 19 December 2020 (UTC)
- @MichaelMaggs: Thanks. The results look good, but I'd like to leave the discussion open for a bit longer in case anyone else wants to chime in. Also, please check your email. — Earwig talk 23:29, 19 December 2020 (UTC)
- @The Earwig: Yes that's fine. Will be ready to go when you are. MichaelMaggs (talk) 23:59, 19 December 2020 (UTC)
- Approved. — Earwig talk 00:54, 24 December 2020 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard.