Wikidata:Bot requests/Archive/2022/03
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Shakeosphere person ID
Shakeosphere person ID (P2886) error. Example:
- Leonhard Euler (Q7604) = 12007, need 36645.
- Peter Abelard (Q4295) = 39, need 24677.
- etc.
- Please delete or reindex. --Khodakov Pavel (talk) 14:25, 24 March 2022 (UTC)
- Sorry: id_new = id_old + 24638. --Khodakov Pavel (talk) 14:37, 24 March 2022 (UTC)
I am taking a look at this. Some care is required, e.g. Benjamin Franklin (Q34969) is at https://shakeosphere.lib.uiowa.edu/persons/person.jsp?pid=2437 William Avery (talk) 07:55, 4 June 2022 (UTC)
I ran a script to scan through all 2783 items with a Shakeosphere ID, added 24638 to the id, retrieved the corresponding page from Shakeosphere, and tried to match the name. Somewhat cryptic output is at User:William Avery Bot/Shakeosphere report.
In 117 cases, that gave an invalid ID that didn't return a valid page on Shakeosphere. These need further investigation.
In 2 cases, a valid page was returned, but the name on it didn't match at all:
- Sampling of lymph from lymph vessels afferent to the supramammary lymph gland in the cow. (Q51750968) - Doesn't make sense that this would have a Shakeosphere Person ID
- David Home of Crossrig (Q18527559) - already has correct id
The other 2644 items (95.7%) can be corrected by adding 24638 to the existing id.
The stability of the ID is clearly dubious, but I'll go ahead with an update, unless somebody proposes P2886 for deletion. William Avery (talk) 21:25, 4 June 2022 (UTC)
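The check described above can be sketched roughly as follows. This is only an illustration, not the script that was actually run: `classify`, `tokens` and the `fetch_name` callable are hypothetical names, and the token-overlap name test is an assumption about what "match the name" could mean. `fetch_name` stands in for whatever retrieves the person's name from the Shakeosphere page and returns `None` when the ID has no valid page.

```python
import re
from typing import Callable, Optional, Set, Tuple

OFFSET = 24638  # id_new = id_old + 24638, as stated in the request


def tokens(name: str) -> Set[str]:
    """Split a name into lower-case word tokens, ignoring punctuation."""
    return {t for t in re.split(r"\W+", name.lower()) if t}


def classify(old_id: int, label: str,
             fetch_name: Callable[[int], Optional[str]]) -> Tuple[str, int]:
    """Return (status, new_id) for one item.

    status is "invalid" if no page exists for the shifted ID,
    "mismatch" if the page name shares no word with the item label,
    and "ok" otherwise.
    """
    new_id = old_id + OFFSET
    page_name = fetch_name(new_id)
    if page_name is None:
        return ("invalid", new_id)
    if tokens(label) & tokens(page_name):
        return ("ok", new_id)
    return ("mismatch", new_id)
```

With the two examples from the request, `classify(12007, "Leonhard Euler", fetch_name)` would propose 36645 and `classify(39, "Peter Abelard", fetch_name)` would propose 24677, provided the pages at those IDs carry matching names.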
- Request process
BRFA filed at WD:BRFA § William Avery Bot 6 William Avery (talk) 10:22, 11 June 2022 (UTC)
Task completed: There is a list of items processed at User:William Avery Bot/Shakeosphere live William Avery (talk) 09:23, 9 July 2022 (UTC)
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. William Avery (talk) 09:23, 9 July 2022 (UTC)
request to merge true duplicates (2021-11-27)
Request date: 27 November 2021, by: Jura1
- Link to discussions justifying the request
- Task description
- A true duplicate is an item with a sitelink to the same wiki page as another item. This should be impossible, but for technical reasons some were created.
- Generally one of them isn't editable and remains without any statements.
- The technical problem is said to be solved (see Wikidata:Report_a_technical_problem#quadruplicate), so no new true duplicates should be created.
- There is an unknown quantity of true duplicates to be merged. We currently have >1,000,000 items without any statements.
- https://w.wiki/4TZd finds ca. 1700 created in one month by one user.
- Maybe an update of Wikidata:True duplicates can identify more. This was requested at Wikidata:Report_a_technical_problem#quadruplicate.
- Maybe Wikidata:Request a query can help too.
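As a rough illustration of the definition above, true duplicates can be found by grouping items on the page their sitelink points to. The sketch below assumes the sitelinks are already available as `(item_id, site, title)` rows, for example from a dump or a query result; the function name and the row layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_true_duplicates(
    sitelinks: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Group items by (site, title) and keep pages linked from more than one item.

    Item IDs in each group are sorted by their numeric part.
    """
    by_page = defaultdict(set)
    for item_id, site, title in sitelinks:
        by_page[(site, title)].add(item_id)
    return {
        page: sorted(items, key=lambda q: int(q[1:]))
        for page, items in by_page.items()
        if len(items) > 1
    }
```

Each returned group is a candidate merge; whether one of the items is the empty, non-editable one would still have to be checked per group.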
- Discussion
- Marius ran the report to find all the identical sitelinks. It has now completed. (phab:T299422#7712720) -Mohammed Sadat (WMDE) (talk) 18:18, 21 February 2022 (UTC)
- Thanks. I will have a look. --- Jura 12:23, 4 March 2022 (UTC)
- Request process
Request to deprecate P2190 string formats as property should use numeric format (2022-03-15)
Request date: 15 March 2022, by: Wolfgang8741
- Link to discussions justifying the request
- C-SPAN person ID (P2190) is transitioning to a numeric format for more reliable linking: the string format has been found to break, because C-SPAN does not always redirect when it changes the string. See the property discussion as well as Project chat. Notices about coordinated updates and cleanup have been posted on Wikipedia for the templates using this property.
For the entries added prior to 26 Feb 2022 all matched numeric formats have been uploaded. Strings added after that date have not been checked.
- Task description
1. Deprecate all existing statements using a string for the value in C-SPAN person ID (P2190) and add qualifier reason for deprecated rank (P2241) with withdrawn identifier value (Q21441764)
2. Remove any strings added for the property after 14 March 2022 when the property officially started validating for numeric IDs.
3. For string IDs added between 26 Feb and 14 March, resolve the string to the C-SPAN url. Parse the url response and extract the numeric ID.
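The three steps above could be sketched as a per-statement decision plus a URL parser. This is a minimal sketch under stated assumptions: the action names are made up, the date boundaries are treated as inclusive for the 26 Feb to 14 March window (the request does not say), and the shape of the resolved C-SPAN URL (`/person/?<digits>...`) is an assumption that would need checking against real responses.

```python
import re
from datetime import date
from typing import Optional

NUMERIC_ID = re.compile(r"^[1-9]\d*$")   # assumption: numeric IDs are plain positive integers
CHECKED_UNTIL = date(2022, 2, 26)        # strings added before this date were already matched
VALIDATION_START = date(2022, 3, 14)     # numeric validation officially started


def plan_action(value: str, added: date) -> str:
    """Decide what to do with one P2190 statement (hypothetical action names)."""
    if NUMERIC_ID.match(value):
        return "keep"                     # already in the numeric format
    if added > VALIDATION_START:
        return "remove"                   # step 2
    if added >= CHECKED_UNTIL:
        return "resolve_then_deprecate"   # step 3, followed by step 1
    return "deprecate"                    # step 1: P2241 = Q21441764


def extract_numeric_id(resolved_url: str) -> Optional[str]:
    """Pull the numeric ID out of a resolved URL, assuming a /person/?<digits> shape."""
    match = re.search(r"/person/\?(\d+)", resolved_url)
    return match.group(1) if match else None
```

Statements for which `extract_numeric_id` returns `None` would need manual review rather than automatic handling.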
- Discussion
- Request process
Request to extract music titles from headline (2022-03-21)
Request date: 21 March 2022, by: Bigbossfarin
- Link to discussions justifying the request
I would like to feed Wikidata with Offizielle Deutsche Charts album ID (P10262) of all the albums on the website offiziellecharts.de/album-details-$1 (examples).
Problem: The names of the artist and the album are in heading 1 (h1) and heading 2 (h2) of the page's HTML source code (example), and I don't know how to crawl this data.
- Task description
I need a list of the headers with ID number:
URL | ID | h1 | h2 |
---|---|---|---|
... | ... | ... | ... |
https://www.offiziellecharts.de/album-details-12 | 12 | Michael Jackson | Thriller |
https://www.offiziellecharts.de/album-details-13 | 13 | ZZ Top | Eliminator |
... | ... | ... | ... |
the same thing would be fine for artists
URL | ID | h1 |
---|---|---|
... | ... | ... |
https://www.offiziellecharts.de/suche/person-978 | 978 | Michael Jackson |
... | ... | ... |
and songs
URL | ID | h1 | h2 |
---|---|---|---|
... | ... | ... | ... |
https://www.offiziellecharts.de/titel-details-1680 | 1680 | Michael Jackson | Bad |
... | ... | ... | ... |
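Extracting the first h1 and h2 from a page and producing one row in the layout above might look roughly like this. It is a sketch using only the Python standard library: it works on HTML text that has already been obtained and does not fetch anything itself. `HeadingParser` and `table_row` are hypothetical names, and taking the ID from the part of the URL after the last hyphen is an assumption based on the example URLs.

```python
from html.parser import HTMLParser


class HeadingParser(HTMLParser):
    """Collect the text of the first <h1> and the first <h2> in a document."""

    def __init__(self):
        super().__init__()
        self.headings = {"h1": None, "h2": None}
        self._current = None   # heading tag currently being read, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start collecting only for the first occurrence of each heading.
        if tag in self.headings and self.headings[tag] is None and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace, including text from nested tags.
            self.headings[tag] = " ".join("".join(self._buffer).split())
            self._current = None


def table_row(url: str, html: str) -> str:
    """Format one 'URL | ID | h1 | h2 |' row for a page."""
    parser = HeadingParser()
    parser.feed(html)
    parser.close()
    item_id = url.rsplit("-", 1)[-1]   # assumption: the ID follows the last hyphen
    h1 = parser.headings["h1"] or ""
    h2 = parser.headings["h2"] or ""
    return f"{url} | {item_id} | {h1} | {h2} |"
```

For the artist pages, which only have an h1, the last column would simply stay empty.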
- Licence of data to import (if relevant)
- Discussion
Hello @Bigbossfarin, I'm not sure offiziellecharts.de would really appreciate having their whole website crawled. And I don't know if the license is compatible with adding the data to Wikidata. Myst (talk) 19:29, 24 March 2022 (UTC)
- Request process