Abstract
The human genome is traditionally represented as a DNA sequence of three billion base pairs. However, its intricacies are captured by many more complex signals, representing DNA variations, the expression of gene activity, or DNA’s structural rearrangements; a rich set of data formats is used to represent such signals. Different conceptual models explain such elaborate structure and behavior. Among them, the Conceptual Schema of the Human Genome (CSG) provides a concept-oriented, top-down representation of the genome behavior – independent of data formats. The Genomic Conceptual Model (GCM) instead provides a data-oriented, bottom-up representation, targeting a well-organized, unified description of these formats. We hereby propose to join these two approaches to achieve a more complete vision, linking (1) a concepts layer, describing genome elements and their conceptual connections, with (2) a data layer, describing datasets derived from genome sequencing with specific technologies. The link is established when specific genomic data types are chosen in the data layer, thereby triggering the selection of a view in the concepts layer. The benefit is mutual, as data records can be semantically described by high-level concepts and exploit their links. In turn, the continuously evolving abstract model can be extended thanks to the input provided by real datasets. As a result, it will be possible to express queries that employ a holistic conceptual perspective on the genome, directly translated onto data-oriented terms and organization. The approach is here exemplified using the DNA variation data type but is applicable to all genomic information.
A. Bernasconi and A. García S. should be regarded as joint first authors.
Acknowledgement
This research is funded by the ERC Advanced Grant 693174 GeCo (Data-Driven Genomic Computing), INNEST/2021/57, and MICIN/AEI/10.13039/501100011033.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bernasconi, A., García S., A., Ceri, S., Pastor, O. (2022). A Comprehensive Approach for the Conceptual Modeling of Genomic Data. In: Ralyté, J., Chakravarthy, S., Mohania, M., Jeusfeld, M.A., Karlapalem, K. (eds) Conceptual Modeling. ER 2022. Lecture Notes in Computer Science, vol 13607. Springer, Cham. https://doi.org/10.1007/978-3-031-17995-2_14
DOI: https://doi.org/10.1007/978-3-031-17995-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17994-5
Online ISBN: 978-3-031-17995-2
eBook Packages: Computer Science, Computer Science (R0)