Quantitative Biology > Biomolecules

arXiv:2410.15592 (q-bio)

[Submitted on 21 Oct 2024]

Title: CPE-Pro: A Structure-Sensitive Deep Learning Model for Protein Representation and Origin Evaluation

Authors: Wenrui Gou, Wenhui Ge, Yang Tan, Guisheng Fan, Mingchen Li, Huiqun Yu

Abstract: Protein structures are important for understanding protein functions and interactions. Many protein structure prediction methods are currently enriching structure databases. Discriminating the origin of structures is crucial for distinguishing between experimentally resolved and computationally predicted structures, evaluating the reliability of prediction methods, and guiding downstream biological studies. Building on work in structure prediction, we developed a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro), to represent protein structures and discriminate their origin. CPE-Pro learns the structural information of proteins and captures inter-structural differences to achieve accurate traceability on four data classes, and is expected to be extended to more. In addition, we used Foldseek to encode protein structures into "structure-sequences" and trained a protein Structural Sequence Language Model (SSLM). Preliminary experiments demonstrated that, compared to large-scale protein language models pre-trained on vast amounts of amino acid sequences, the "structure-sequences" enable the language model to learn more informative protein features, enhancing and optimizing structural representations. We have provided the code, model weights, and all related materials on this https URL.

Subjects:	Biomolecules (q-bio.BM); Computation and Language (cs.CL); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2410.15592 [q-bio.BM]
	(or arXiv:2410.15592v1 [q-bio.BM] for this version)
	https://doi.org/10.48550/arXiv.2410.15592

Submission history

From: Wenrui Gou
[v1] Mon, 21 Oct 2024 02:21:56 UTC (2,837 KB)
