Wikidata:Requests for permissions/Bot/GZWDer (flood) 4
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Lymantria (talk) 17:00, 13 September 2018 (UTC)
Task
GZWDer (flood) (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: GZWDer (talk • contribs • logs)
Task/s: Mass creation of new items across Wikimedia projects
- Repeatedly clearing Special:UnconnectedPages so that every wiki has zero unconnected articles and categories (for exceptions, see below)
Code: via PetScan and occasionally Pywikibot (the appropriate Pywikibot action is blocked by phab:T200399, but PetScan currently does not support some wikis)
Function details: This task has been run repeatedly since 2014 and currently accounts for almost half of the account's edits. More than 5,500,000 items were created using this tool. I am starting a review of this task because of Wikidata:Administrators'_noticeboard#Please_block_Special:Contributions/GZWDer_(flood).
To clarify, this review is about indiscriminate creation of items across Wikimedia projects (based on deep category/template links, or Special:UnconnectedPages). Because of objections already raised, this task currently does not plan to:
- Create items for pages that are excluded by Wikidata:Notability or Wikidata:Notability/Exclusion criteria.
- Create items for templates (except in the circumstances described below).
- Create items for Wikisource pages (except in the circumstances described below).
The following is what the account currently does; it is not that controversial and probably not subject to the review (however, you may also raise your concerns here):
- Create items for a specific set of pages, provided that all items created are manually curated and improved. (Original wording, since replaced: create items for a small set (<50) of predefined pages, which are improved later, manually or using scripts.)
- Before using PetScan or Pasleim's harvest_template, create items to ensure that related items exist. The number of items may be large, but all items will eventually be filled. (This does not include mass addition of categories or templates.)
- Create items using Pywikibot's harvest_template.py or coordinate_import.py (but not newitem.py). Cebwiki is excluded as it is addressed individually.
- Create items with no sitelinks, but only statements, provided that they do not duplicate an existing topic (but large tasks that are not already approved to be run by another bot need specific approval).
- Create items only for sitelink connection purposes (e.g. to connect templates or project pages imported across projects), given that they do not meet the exclusion criteria.
- Any item creation invoked manually (e.g. Mix'n'match).
--GZWDer (talk) 15:20, 1 September 2018 (UTC)
Discussion
- Hi! Could you clarify point 1, giving an example if possible? Thank you, --Epìdosis 15:53, 1 September 2018 (UTC)
- @Epìdosis: Point 1 in the first or the second part?--GZWDer (talk) 16:04, 1 September 2018 (UTC)
- Second part. The first part in my opinion is all OK. --Epìdosis 16:18, 1 September 2018 (UTC)
- After some thought I have clarified the first point.--GZWDer (talk) 16:38, 1 September 2018 (UTC)
- OK; could you maybe give an example of a "specific set of pages"? Thank you, --Epìdosis 17:17, 1 September 2018 (UTC)
- The set may be generated using any method (PetScan etc. for pages without items in a specific category). This point is meant to be an alternative to manual item creation (so it is important that all items created are later checked manually).--GZWDer (talk) 17:38, 1 September 2018 (UTC)
- Support I agree with all the points written above in the two lists. --Epìdosis 17:54, 1 September 2018 (UTC)
- This review is to decide whether mass item creation in general cases (i.e. except the circumstances listed above) is acceptable (see the AN discussion). @Epìdosis: Do you mean you will not support mass item creation other than the cases listed above?--GZWDer (talk) 18:27, 1 September 2018 (UTC)
- For information: My long-term plan is to repeatedly clear Special:UnconnectedPages so that every wiki has zero unconnected articles and categories (probably excluding newly created pages; non-notable pages will be excluded by phab:T97577). It is almost impossible to check for possible duplicates of every newly created article, so I propose to mass import them periodically. (Dexbot did this for some wikis previously.)--GZWDer (talk) 18:37, 1 September 2018 (UTC)
- I agree with clearing Special:UnconnectedPages periodically, excluding newly created pages (i.e. less than two months old); I would adopt a stricter term for pages with Template:Merge (Q6919004) (i.e. one year); I would also require checking Wikimedia disambiguation page (Q4167410) beforehand (duplicates can easily be found using their names). I support the exceptions in the first list and the cases in the second list. I support creation in other cases only if a human then checks the items manually. --Epìdosis 18:48, 1 September 2018 (UTC)
- But it is an issue that the community has concerns about mass importing unconnected pages. Yes, we may exclude pages whose title is the same as the label of an existing item, but this means there will be a number of pages that are never connected without human intervention.--GZWDer (talk) 19:00, 1 September 2018 (UTC)
- @Jura1: You posed a bunch of questions below, answered by GZWDer. Would you be so kind as to inform us whether you see any problems or omissions in the answers given? Lymantria (talk) 08:17, 7 September 2018 (UTC)
- I don't really see an advantage in approving this, but if people are happy with it in general, I guess I just have to filter wikibase:statements 0 --- Jura 09:59, 11 September 2018 (UTC)
- @GZWDer: Jura1 has posed some follow up questions below. Would you be so kind as to proceed on those? Lymantria (talk) 08:18, 7 September 2018 (UTC)
- The questions and answers have given a good insight in the controversy and the way the largest of problems might be avoided. Having read them and taking into account the support vote, I will be approving this task in a couple of days, provided that no new objections will be raised. Lymantria (talk) 16:36, 11 September 2018 (UTC)
General questions
- Question Where does the description of what it does exclude the duplicates it currently creates? How can admins monitor that it excludes them? --- Jura 20:38, 1 September 2018 (UTC)
- This bot can work in two modes. One skips pages whose title is the same as the label of an existing item (note that this does not eliminate all duplicates); this means a number of pages will never be connected without human intervention. The other creates items for all pages. This increases the work of the various merge processes, but actually reduces the number of "hidden" duplicates (by turning them into real duplicates, which are much easier to discover).--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- You don't seem to address any of the concerns raised with your bot activity up to now. Creating a duplicate without any statements can lead to Wikidata users doing the same work twice. Merging sometimes happens only after users have added more statements, leading to the identification of a duplicate. If the bot attempted to match an article against an existing item, this would have been much easier. --- Jura 07:39, 2 September 2018 (UTC)
- There is no simple and foolproof method to compare new articles and items automatically, so users who work on connecting new items can only do so manually. I do think the speed of article creation is much higher than the speed of linking them to Wikidata (assuming no faultless bots exist), so this will only result in a larger and larger backlog. Again, even if there are some duplicates, creating new items provides room for improvement (this includes future mergers).
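The two modes described in this thread can be illustrated with a minimal Python sketch. It is not the bot's actual code (the bot runs through PetScan and Pywikibot); the function names, the normalization rule and the sample data are all assumptions made for illustration. It only shows why the label-matching mode leaves some pages unconnected: any title that matches an existing label is held back for a human to review.

```python
from typing import Iterable


def normalize(title: str) -> str:
    """Collapse whitespace and case-fold (an assumed, deliberately crude rule)."""
    return " ".join(title.split()).casefold()


def split_by_label_match(
    page_titles: Iterable[str], existing_labels: Iterable[str]
) -> tuple[list[str], list[str]]:
    """Return (to_create, held_back) for the hypothetical label-matching mode.

    A title that equals an existing label after normalization is held back,
    because it may be a duplicate of the item carrying that label.
    """
    known = {normalize(label) for label in existing_labels}
    to_create: list[str] = []
    held_back: list[str] = []
    for title in page_titles:
        if normalize(title) in known:
            held_back.append(title)
        else:
            to_create.append(title)
    return to_create, held_back


def select_pages(
    page_titles: Iterable[str],
    existing_labels: Iterable[str],
    skip_label_matches: bool = True,
) -> list[str]:
    """Pick the pages that would get a new item under either mode."""
    titles = list(page_titles)
    if not skip_label_matches:
        # "Create for all pages" mode: possible duplicates become visible items.
        return titles
    return split_by_label_match(titles, existing_labels)[0]
```

For example, `split_by_label_match(["Paris", "Oslo"], ["paris"])` holds back "Paris" and leaves "Oslo" for creation. A matching label does not prove a duplicate, and a non-matching one does not rule it out, which is the limitation both sides of the thread point to.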
- Question Does this create items for any new page or wait two months before creating an item? --- Jura 20:38, 1 September 2018 (UTC)
- In my opinion, if there are bots (or admins willing) to periodically delete unclaimed or poorly claimed items with no sitelinks, and to merge items for redirect pages, I don't think it is a problem to create items as soon as local pages are created. By default, Pywikibot has a threshold of pages created 14 days ago and last edited 7 days ago; this may be a good option. However, I think the final solution is phab:T97577#1254527.--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- Question As there is no such bot, should we wait with approval until it exists? --- Jura 07:39, 2 September 2018 (UTC)
- Until it exists we can just exclude new pages from the generated list (both PetScan results and Special:UnconnectedPages are sorted by the date of page creation). Again, Pywikibot has a default threshold for item creation.--GZWDer (talk) 15:18, 2 September 2018 (UTC)
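The age thresholds discussed above amount to a simple filter over the list of unconnected pages. The sketch below is a hypothetical illustration in plain Python, not Pywikibot code; `PageInfo` and `is_old_enough` are invented names, and the 14-day and 7-day defaults are taken from the comment above rather than checked against Pywikibot itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class PageInfo:
    """Hypothetical record for one unconnected page."""
    title: str
    created: datetime
    last_edited: datetime


def is_old_enough(
    page: PageInfo,
    now: datetime,
    min_page_age_days: int = 14,  # assumed default, as quoted in the discussion
    min_quiet_days: int = 7,      # assumed default, as quoted in the discussion
) -> bool:
    """True if the page is old enough and has not been edited recently."""
    old_enough = now - page.created >= timedelta(days=min_page_age_days)
    quiet_enough = now - page.last_edited >= timedelta(days=min_quiet_days)
    return old_enough and quiet_enough


now = datetime(2018, 9, 2, tzinfo=timezone.utc)
pages = [
    PageInfo("Settled article", now - timedelta(days=30), now - timedelta(days=10)),
    PageInfo("Brand new article", now - timedelta(days=3), now - timedelta(days=3)),
    PageInfo("Recently edited", now - timedelta(days=30), now - timedelta(days=1)),
]
eligible = [p.title for p in pages if is_old_enough(p, now)]
```

With these sample values only "Settled article" passes. Passing `min_page_age_days=60` would approximate the two-month wait suggested earlier in the discussion.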
- Question How will it interact with cases where users create items with a substantive number of statements periodically? --- Jura 07:39, 2 September 2018 (UTC)
- Many people who create items with a substantive number of statements also deal with existing items. I don't see a kind of page that is periodically created (so that there will periodically be new unconnected ones) but has a common non-trivial (i.e. not only P31 and the like) property.--GZWDer (talk) 15:18, 2 September 2018 (UTC)
- Question How will it avoid that we get more items without statements? --- Jura 20:38, 1 September 2018 (UTC)
- I don't think "more items without statements" is harmful in itself. They do provide a starting point for adding statements using various tools (which do not work when the item does not exist). It will be much more useful when Wikidata:Database reports/items without claims categories and Wikidata:Database reports/templates and items with 0 claims get updated again (which may be blocked by phab:T114904).--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- I think items without statements are problematic and complicate many tasks, especially for people operating bots that check first before creating items. --- Jura 07:39, 2 September 2018 (UTC)
- Hidden duplicates are more problematic as we don't even have a simple way to find them.--GZWDer (talk) 15:18, 2 September 2018 (UTC)
- Question Will it merge any duplicates it creates or created? --- Jura 20:38, 1 September 2018 (UTC)
- The task only creates new items. Duplicates may be found and merged by various tools after items are created (none will work if no items exist).--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- Question What is meant by #5 ("Create items only for sitelink connection purposes")? Can you give samples other than templates? --- Jura 20:38, 1 September 2018 (UTC)
- See Q56374627 as an example.--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- Question Can we be sure these will have at least two sitelinks? --- Jura 07:39, 2 September 2018 (UTC)
- It only works on a predefined list of pages. After the work, items that cannot be connected to another wiki (i.e. have only one sitelink) should be few and can be checked manually. By the way, the scope of the review is the creation of items for wikis in general, not this highly specific but less controversial task.
- Question As notability policy has exclusion criteria, how will you attempt to gauge other requirements? Not everything not explicitly excluded is automatically notable. --- Jura 07:39, 2 September 2018 (UTC)
- All articles and categories (and Wikisource author pages etc.) that are not explicitly excluded are presumed to be notable. Even if, after a discussion, we need to exclude a specific kind of page, it's easy to generate the list of items affected via PetScan. Note that I don't plan to mass import pages in namespaces used mainly for maintenance (Project, Help, etc.), except in the six circumstances above.--GZWDer (talk) 15:18, 2 September 2018 (UTC)
Scope questions
- Question How can admins monitor that activity of the bot relates to this task and not another one? --- Jura 21:28, 1 September 2018 (UTC)
- If this task is approved, all kinds of item creation will become legitimate, other than for pages explicitly excluded.--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- So admins cannot monitor if activity matches this task. This is problematic. I think you should find a way to identify edits that relate to this task. --- Jura 07:39, 2 September 2018 (UTC)
- For a simple way, you can check [1] (items for new concepts without sitelinks will also appear here, but I will avoid doing work that involves Wikimedia pages and work that does not at the same time). For a long-term solution, see phab:T200234 and [2].--GZWDer (talk) 15:30, 2 September 2018 (UTC)
- Question Will this exclude items for primarily bot created pages/wikis, such as cebwiki? --- Jura 21:28, 1 September 2018 (UTC)
- For bot-created pages/wikis in general there is no reason not to import them. For cebwiki specifically see below.--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- I think we should consider excluding them from this task approval. --- Jura 07:39, 2 September 2018 (UTC)
- Provided that 1. the newly created pages have very few existing duplicates (i.e. are about completely new concepts) or 2. there is clearly a simple way to identify duplicates (such as making a list of existing instances and comparing them with new pages, especially if the concepts can be identified by a unique identifier and the identifiers are mostly complete for existing instances in Wikidata), I think the task can be done.
- Question Will this exclude anything related to Wikidata:Requests for permissions/Bot/GZWDer (flood) 2? --- Jura 21:28, 1 September 2018 (UTC)
- Yes for cebwiki only (for srwiki there's no strong concern specific to it so it's not excluded from the task).--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- As the other task isn't approved and raised some serious concerns, I don't think this should be used to circumvent its review and approval. --- Jura 07:39, 2 September 2018 (UTC)
- Question What happens with pages from inactive wikis? (e.g. wikis with less than 100 edits per year by local users) --- Jura 07:39, 2 September 2018 (UTC)
- What's the problem with pages there? (I may exclude small wikis; this only reduces my work.)--GZWDer (talk) 15:30, 2 September 2018 (UTC)
- Question How will it interact with cases where wikis added local interwikis? Some bots add these to Wikidata items. --- Jura 07:39, 2 September 2018 (UTC)
- For bots known to do this, we can exclude these pages and let EmausBot finish the task. (Before Lsjbot ceased to be active on svwiki, I did not import any pages from svwiki and cebwiki, as EmausBot would do it eventually.)--GZWDer (talk) 15:30, 2 September 2018 (UTC)
Specific cases
- Question What happens with user categories? --- Jura 20:38, 1 September 2018 (UTC)
- In 2017 many user categories were imported. No new ones have been imported in 2018. It's up to the community to decide (we should start a discussion about it) whether they are notable, especially those with multiple sitelinks. Note that I do not plan to import any categories from Commons.--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- Question Can we count on you not to create any? --- Jura 07:39, 2 September 2018 (UTC)
- I will not import user categories en masse.--GZWDer (talk) 15:41, 2 September 2018 (UTC)
- Question What happens with categories in general? Will it create large numbers of single sitelink items? --- Jura 21:28, 1 September 2018 (UTC)
- I don't see a reason against the creation, and there has been little previous concern about it. If you think they should be excluded, reach a consensus first.--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- The question here is whether we should approve such indiscriminate creations or whether we can expect some common sense choices before creating them. I'm not sure if there is a need to provide potential interwikis for any category, notably categories that Wikipedias consider project-specific maintenance and that they haven't linked to any other wiki themselves. Some admins, e.g. MisterSynergy, spend a lot of time deleting such categories. --- Jura 07:39, 2 September 2018 (UTC)
- Maintenance categories that are intended to live temporarily and eventually be deleted are already excluded by Wikidata:Notability/Exclusion criteria.--GZWDer (talk) 15:41, 2 September 2018 (UTC)
- Question Will items for categories have statements? --- Jura 07:39, 2 September 2018 (UTC)
- Basically all categories have P31=Q4167836. Pywikibot cannot add it directly (phab:T174994), but it can be added later (by browsing Special:Contributions).--GZWDer (talk) 15:41, 2 September 2018 (UTC)
- Question What happens with pages with soft redirects? --- Jura 20:38, 1 September 2018 (UTC)
- I recorded the soft redirect templates that are actually in use in articles on a page (see User:GZWDer/temp13). Pages using them can be excluded in PetScan. This is not intended to be the best solution in the future; they should instead be handled by phab:T97577.--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- Question Please confirm whether you will skip them or not. --- Jura 07:39, 2 September 2018 (UTC)
- Some wikis may have similar templates that I do not know of. If they are found, it's easy to generate a list of items affected via PetScan.--GZWDer (talk) 15:41, 2 September 2018 (UTC)
- Question Can you give samples for items that would be created for wikisource? --- Jura 20:38, 1 September 2018 (UTC)
- See Q15892200 for example. Given the concerns, I'm not planning to create items from Wikisource without statements, at least for the moment.--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- Question Seems like a standard Wikisource page. What would it exclude? Have you solved the problem with labels in your past creations? I recall fixing several 10000s of items after broken labels and the absence of statements led users to do all sorts of problematic edits. I don't think this task should be approved without a clear statement on what you will do. --- Jura 07:39, 2 September 2018 (UTC)
- Again, I'm not planning to create items from Wikisource unless they can be clearly managed.--GZWDer (talk) 15:41, 2 September 2018 (UTC)
- Question Will items for Wikisource pages have statements? --- Jura 07:39, 2 September 2018 (UTC)
- Question Will it create items for nlwiki? --- Jura 20:38, 1 September 2018 (UTC)
- No, at least while there are people taking care of unconnected pages there.--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- What do you consider a measure for that? --- Jura 07:39, 2 September 2018 (UTC)
- Question What happens with Wikinews? --- Jura 21:11, 1 September 2018 (UTC)
- Wikinews articles will be imported. I am currently not planning to import categories.--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- Question Will there be statements on Wikinews items? If there is no plan for categories, I think an approval should exclude categories. --- Jura 07:39, 2 September 2018 (UTC)
Malfunction and repair
- Question How will you act if the process malfunctions? How should admins intervene? --- Jura 20:38, 1 September 2018 (UTC)
- Currently all runs are manual. I do not plan fully automatic runs in the foreseeable future.--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- The question is how you will act or react? What will you do? --- Jura 07:39, 2 September 2018 (UTC)
- Question Will you repair defective item creations? --- Jura 20:38, 1 September 2018 (UTC)
- What kind of defect? 1. Items created for non-notable pages may be found via PetScan and mass removed. 2. For the label issue: once Pywikibot no longer generates inappropriate labels (see phab:T200399), we may generate a list from a dump and fix it (there are many more errors in old items!)--GZWDer (talk) 21:40, 1 September 2018 (UTC)
- Will you make a reasonable effort to clear this or just expect other users to do it? --- Jura 07:39, 2 September 2018 (UTC)
- Question Will you repair any past defective edits? --- Jura 20:38, 1 September 2018 (UTC)
- See above. For duplicated items there is no way to discover them all (you cannot guarantee that no duplicates occur in your creations either, unless you manually check all items), so it will take a long time to fix. But as I have said, nobody should assume that no duplicates exist in Wikidata, and entity IDs are still valid after merging.--GZWDer (talk) 21:40, 1 September 2018 (UTC)