Wikidata:Property proposal/amino acid (start, end) position
amino acid position, amino acid start position, amino acid end position
Originally proposed at Wikidata:Property proposal/Natural science
Description | 3 related properties:
|
---|---|
Represents | amino acid position (Q66424100) |
Data type | Quantity |
Domain | property: superclass is Wikidata property related to biology (Q22988603) |
Allowed values | integers > 0 |
Allowed units | none |
Example 1 | phenylalanine hydroxylase (Q420604):
|
Example 2 | phenylalanine hydroxylase (Q420604):
|
Example 3 | phenylalanine hydroxylase (Q420604):
|
Planned use | Manually specify where specific peptides lie within their proprotein. Bots could then also import such data, as well as the positions of protein domains, sites of posttranslational modifications, or disease mutations |
Robot and gadget jobs |
|
See also | genomic start (P644), genomic end (P645) |
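
As an illustration of the "Allowed values" and "Planned use" rows above, here is a minimal sketch of how start/end values could be validated and used to cut a peptide out of its precursor sequence. The names `AminoAcidRange` and `peptide_from_precursor` are hypothetical, the sequence is made up, and the sketch assumes 1-based positions with an inclusive end, which the proposal does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AminoAcidRange:
    """Hypothetical holder for the proposed start/end qualifier values."""
    start: int  # amino acid start position, assumed 1-based
    end: int    # amino acid end position, assumed 1-based and inclusive

    def __post_init__(self):
        # Allowed values in the proposal: integers > 0, no unit.
        for name, value in (("start", self.start), ("end", self.end)):
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be an integer > 0, got {value!r}")
        if self.start > self.end:
            raise ValueError("start must not exceed end")


def peptide_from_precursor(sequence: str, span: AminoAcidRange) -> str:
    """Return the residues of `sequence` covered by `span`."""
    if span.end > len(sequence):
        raise ValueError("range extends past the end of the sequence")
    # Convert 1-based inclusive positions to a 0-based half-open slice.
    return sequence[span.start - 1:span.end]


# Made-up 10-residue sequence, not a real protein.
precursor = "MKTAYIAKQR"
print(peptide_from_precursor(precursor, AminoAcidRange(3, 6)))  # TAYI
```

Under these assumptions, a single amino acid position corresponds to a range whose start and end coincide, for example `AminoAcidRange(5, 5)`.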
Motivation
The lack of this property prevents me from fully adding knowledge to protein and peptide items. It must have been an issue for the bots that import from UniProt as well, but I could not find any previous discussion. This is an essential addition to the properties used in statements about biological macromolecules that consist of amino acids. --SCIdude (talk) 09:52, 13 August 2019 (UTC)
Please note that I consider a single-value property necessary (instead of using identical start/end values) because I expect it to be used much more often than the start/end version, from disease variants alone. --SCIdude (talk) 15:21, 13 August 2019 (UTC)
Discussion
WikiProject Molecular biology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. ChristianKl ❪✉❫ 15:05, 13 August 2019 (UTC)
- Support David (talk) 05:34, 14 August 2019 (UTC)
- Support I've been trying to figure out how to add specific PTMs that are associated with diseases, and this would work well. My only question is whether it should be restricted to amino acid sequences, since there are similar issues with nucleic acid sequences, e.g. specific nucleic acid deletions resulting in dysfunctional proteins, or site-specific methylation. I'm not sure whether it would be better as one general property for amino acid and nucleic acid sequences, or as two distinct properties. Gtsulab (talk) 20:19, 13 August 2019 (UTC)
- @Gtsulab: there are genomic start (P644) and genomic end (P645) for nucleic acids (but no single-value version). A concept that mixes amino acids and nucleic acids exists only as the abstract, mathematical concept of a sequence; I would not object to a property "(start,end) position in sequence" if it existed. --SCIdude (talk) 08:07, 14 August 2019 (UTC)
- @SCIdude: Yes, exactly! I could see expanding the constraints/name of genomic start (P644) and genomic end (P645) to be more inclusive, so that they would be more like the "(start,end) position in sequence". In any case, I think a property for a single position in a sequence would be very valuable, whether it could be applied to both genes and proteins or just to proteins. Gtsulab (talk) 18:52, 14 August 2019 (UTC)
- It feels to me like it would make more sense to rename genomic start (P644) and genomic end (P645) to sequence start and sequence end. ChristianKl ❪✉❫ 19:16, 18 August 2019 (UTC)
- I think it's best to withdraw this and create the most general (and missing) "position in sequence" for any sequence. I mean, how would I extract from WD the president number of Lyndon B. Johnson (Q9640)? --SCIdude (talk) 17:15, 23 August 2019 (UTC)
- I finally found series ordinal (P1545), although applying it to DNA/proteins seems quite a stretch... --SCIdude (talk) 07:14, 24 August 2019 (UTC)
- Support The idea in general seems quite useful. I liked the discussions around making a more inclusive concept, and I agree with SCIdude that it gets stretched. In the end, what matters here is not the order itself but having a good pointer. That being said, the modelling of pointwise indications is promising, but a bit hazy. "has part" "protein phosphorylation" is not accurate (a biological process is not part of a protein). The qualifier for "gene substitution association with" "phenylketonuria" would have to be something like "position in a sequence inherent to an item (e.g. a specific gene or protein) for which a change has this effect". I guess that the local optimum would be changing the constraints of genomic start (P644) and genomic end (P645) to allow inserting the domain info, and keeping the discussion going on pointwise representations. Anyway, good work. TiagoLubiana (talk) 18:48, 24 August 2019 (UTC)
- @TiagoLubiana: Thanks. Please also comment on the successor proposal: Wikidata:Property proposal/position in sequence --SCIdude (talk) 06:17, 25 August 2019 (UTC)